Rank statistic


A statistic (cf. Statistical estimator) constructed from a rank vector. If $R = (R_1, \dots, R_n)$ is the rank vector constructed from a random observation vector $X = (X_1, \dots, X_n)$, then any statistic $T = T(R)$ which is a function of $R$ is called a rank statistic. A classical example of a rank statistic is the Kendall coefficient of rank correlation $\tau$ between the vectors $R$ and $l = (1, \dots, n)$, defined by the formula

$$ \tau = \frac{1}{n(n-1)} \sum_{i \neq j} \operatorname{sign}(i - j) \operatorname{sign}(R_i - R_j). $$
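
To make the definition concrete, here is a minimal Python sketch (the function names and example ranks are our own illustration, not from the article) that evaluates $\tau$ directly from the sign-sum above:

```python
# Minimal sketch (names ours): Kendall's tau evaluated directly from the
# sign-sum definition above, for a vector R of ranks.

def sign(x):
    return (x > 0) - (x < 0)

def kendall_tau(R):
    """tau = (1/(n(n-1))) * sum over i != j of sign(i-j) * sign(R_i - R_j)."""
    n = len(R)
    return sum(sign(i - j) * sign(R[i] - R[j])
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))

print(kendall_tau([1, 2, 3, 4, 5]))   # 1.0: perfect agreement with l = (1, ..., n)
print(kendall_tau([5, 4, 3, 2, 1]))   # -1.0: complete reversal
print(kendall_tau([2, 1, 3, 5, 4]))   # 0.6: two discordant pairs out of ten
```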

In the class of all rank statistics a special place is occupied by the so-called linear rank statistics, defined as follows. Let $A = \|a(i, j)\|$ be an arbitrary square matrix of order $n$. Then the statistic

$$ T = \sum_{i=1}^{n} a(i, R_i) $$

is called a linear rank statistic. For example, the Spearman coefficient of rank correlation $ \rho $, defined by the formula

$$ \rho = \frac{12}{n(n^2 - 1)} \sum_{i=1}^{n} \left( i - \frac{n+1}{2} \right) \left( R_i - \frac{n+1}{2} \right), $$

is a linear rank statistic.
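
A sketch of the general construction (again with our own names): the score matrix $a(i, j) = \frac{12}{n(n^2-1)} (i - \frac{n+1}{2})(j - \frac{n+1}{2})$, read off the formula above, turns the generic sum $T = \sum_i a(i, R_i)$ into $\rho$:

```python
# Sketch (names ours): a generic linear rank statistic T = sum_i a(i, R_i),
# plus the Spearman score matrix that specializes it to rho.

def linear_rank_statistic(a, R):
    """T = sum_{i=1}^n a(i, R_i); R holds 1-based ranks, a is an n x n matrix."""
    return sum(a[i][R[i] - 1] for i in range(len(R)))

def spearman_scores(n):
    """a(i, j) = 12/(n(n^2 - 1)) * (i - (n+1)/2) * (j - (n+1)/2)."""
    c, m = 12.0 / (n * (n * n - 1)), (n + 1) / 2.0
    return [[c * (i - m) * (j - m) for j in range(1, n + 1)]
            for i in range(1, n + 1)]

a = spearman_scores(5)
print(linear_rank_statistic(a, [1, 2, 3, 4, 5]))  # 1.0 (up to float rounding)
print(linear_rank_statistic(a, [5, 4, 3, 2, 1]))  # -1.0
```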

Linear rank statistics are, as a rule, computationally simple, and their distributions are easy to find. For this reason the notion of the projection of a rank statistic into the family of linear rank statistics plays an important role in the theory of rank statistics. If $T$ is a rank statistic constructed from a random vector $X$ under a hypothesis $H_0$ about its distribution, then a linear rank statistic $\widehat{T} = \widehat{T}(R)$ such that ${\mathsf E}\{(T - \widehat{T})^2\}$ is minimal when $H_0$ is true is called the projection of $T$ into the family of linear rank statistics. As a rule, $\widehat{T}$ approximates $T$ well, and the difference $T - \widehat{T}$ is negligibly small as $n \rightarrow \infty$. If the hypothesis $H_0$, under which the components $X_1, \dots, X_n$ of the random vector $X$ are independent random variables, is true, then the projection $\widehat{T}$ of $T$ can be determined by the formula

$$ \widehat{T} = \frac{n-1}{n} \sum_{i=1}^{n} \widehat{a}(i, R_i) - (n - 2) {\mathsf E}\{T\}, \tag{*} $$

where $\widehat{a}(i, j) = {\mathsf E}\{T \mid R_i = j\}$, $1 \leq i, j \leq n$ (see [1]).
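
For tiny $n$, formula (*) can be evaluated by brute force: under $H_0$ all $n!$ rank vectors are equally likely (as when the $X_i$ are i.i.d. with a continuous distribution), so $\widehat{a}(i, j)$ and ${\mathsf E}\{T\}$ are finite averages over permutations. A sketch, with names of our own choosing:

```python
# Brute-force sketch of formula (*) (names ours): project a rank statistic T
# onto the linear rank statistics by enumerating all n! rank vectors under H0.
from itertools import permutations

def projection_scores(T, n):
    """a_hat[i][j] = E{T | R_{i+1} = j+1} and ET = E{T}, by enumeration."""
    perms = list(permutations(range(1, n + 1)))
    ET = sum(T(p) for p in perms) / len(perms)
    a_hat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            vals = [T(p) for p in perms if p[i] == j + 1]
            a_hat[i][j] = sum(vals) / len(vals)
    return a_hat, ET

def project(T, n):
    """Formula (*): T_hat(R) = ((n-1)/n) * sum_i a_hat(i, R_i) - (n-2) * E{T}."""
    a_hat, ET = projection_scores(T, n)
    return lambda R: ((n - 1) / n) * sum(
        a_hat[i][R[i] - 1] for i in range(n)) - (n - 2) * ET
```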

There is an intrinsic connection between $ \tau $ and $ \rho $. It is shown in [1] that the projection $ \widehat \tau $ of the Kendall coefficient $ \tau $ into the family of linear rank statistics coincides, up to a multiplicative constant, with the Spearman coefficient $ \rho $; namely,

$$ \widehat{\tau} = \frac{2}{3} \left( 1 + \frac{1}{n} \right) \rho. $$
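
Reusing `kendall_tau`, `spearman_scores`, `linear_rank_statistic` and `project` from the sketches above, this identity can be confirmed numerically for a small $n$:

```python
# Check tau_hat == (2/3)(1 + 1/n) * rho on every rank vector, n = 5.
from itertools import permutations

n = 5
tau_hat = project(kendall_tau, n)   # projection of tau via formula (*)
rho_scores = spearman_scores(n)
for R in permutations(range(1, n + 1)):
    rho = linear_rank_statistic(rho_scores, R)
    assert abs(tau_hat(R) - (2 / 3) * (1 + 1 / n) * rho) < 1e-12
print("tau_hat = (2/3)(1 + 1/n) * rho holds for every rank vector")
```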

This equality implies that the correlation coefficient $\operatorname{corr}(\rho, \tau)$ between $\rho$ and $\tau$ is equal to

$$ \operatorname{corr}(\rho, \tau) = \sqrt{\frac{{\mathsf D} \widehat{\tau}}{{\mathsf D} \tau}} = \frac{2(n + 1)}{\sqrt{2n(2n + 5)}}, $$

implying that these rank statistics are asymptotically equivalent for large $n$ (cf. [2]).
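
This exact correlation can also be checked by enumeration under $H_0$ (our own verification sketch, again reusing the helpers above; not code from [2]):

```python
# Check corr(rho, tau) = 2(n+1)/sqrt(2n(2n+5)) by enumeration, n = 5.
from itertools import permutations
from math import sqrt

n = 5
a = spearman_scores(n)
pairs = [(linear_rank_statistic(a, R), kendall_tau(R))
         for R in permutations(range(1, n + 1))]
# Under H0 both statistics have mean zero, so the correlation reduces to
# E{rho * tau} / sqrt(E{rho^2} * E{tau^2}).
num = sum(r * t for r, t in pairs)
den = sqrt(sum(r * r for r, _ in pairs) * sum(t * t for _, t in pairs))
print(num / den)                                 # 0.97979...
print(2 * (n + 1) / sqrt(2 * n * (2 * n + 5)))   # 0.97979...
```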

References

[1] J. Hájek, Z. Šidák, "Theory of rank tests", Acad. Press (1967)
[2] M.G. Kendall, "Rank correlation methods", Griffin (1970)
How to Cite This Entry:
Rank statistic. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Rank_statistic&oldid=18903
This article was adapted from an original article by M.S. Nikulin (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article