Rank statistic

From Encyclopedia of Mathematics
Latest revision as of 17:47, 8 February 2021


A statistic (cf. Statistical estimator) constructed from a rank vector. If $R = (R_1, \dots, R_n)$ is the rank vector constructed from a random observation vector $X = (X_1, \dots, X_n)$, then any statistic $T = T(R)$ which is a function of $R$ is called a rank statistic. A classical example of a rank statistic is the Kendall coefficient of rank correlation $\tau$ between the vectors $R$ and $\ell = (1, \dots, n)$, defined by the formula

$$
\tau = \frac{1}{n(n-1)} \sum_{i \neq j} \mathop{\rm sign}(i - j) \, \mathop{\rm sign}(R_i - R_j).
$$
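The formula can be evaluated directly from an observation vector. The following is a minimal sketch, assuming the observations contain no ties; the helper names `rank_vector`, `sign` and `kendall_tau` are illustrative and are not taken from the references.

```python
def rank_vector(x):
    """Return the ranks R_1..R_n (1-based) of the observations, assuming no ties."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0] * len(x)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def sign(v):
    """Sign function: -1, 0 or 1."""
    return (v > 0) - (v < 0)


def kendall_tau(ranks):
    """Kendall coefficient between the rank vector and (1, ..., n)."""
    n = len(ranks)
    total = sum(
        sign(i - j) * sign(ranks[i] - ranks[j])
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return total / (n * (n - 1))


observations = [0.3, 2.5, 1.1, 4.0]
print(kendall_tau(rank_vector(observations)))
```

For the sample observations the rank vector is (1, 3, 2, 4); five of the six pairs are concordant and one is discordant, so $\tau = 2/3$.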

In the class of all rank statistics a special place is occupied by the so-called linear rank statistics, defined as follows. Let $A = \| a(i, j) \|$ be an arbitrary square matrix of order $n$. Then the statistic

$$
T = \sum_{i=1}^{n} a(i, R_i)
$$

is called a linear rank statistic. For example, the Spearman coefficient of rank correlation $\rho$, defined by the formula

$$
\rho = \frac{12}{n(n^2 - 1)} \sum_{i=1}^{n} \left( i - \frac{n+1}{2} \right) \left( R_i - \frac{n+1}{2} \right),
$$

is a linear rank statistic.
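To make the definition concrete, the sketch below evaluates a linear rank statistic from a score function $a(i, j)$ and obtains the Spearman coefficient as a special case. It assumes the centring constant $(n+1)/2$ and the normalizing factor $12/(n(n^2-1))$, the values for which identical rankings give $\rho = 1$; the function names are illustrative only.

```python
def linear_rank_statistic(a, ranks):
    """T = sum over i of a(i, R_i), with a taking 1-based indices (i, j)."""
    return sum(a(i, r) for i, r in enumerate(ranks, start=1))


def spearman_scores(n):
    """Score function a(i, j) whose linear rank statistic is Spearman's rho."""
    c = 12.0 / (n * (n * n - 1))
    mid = (n + 1) / 2.0
    return lambda i, j: c * (i - mid) * (j - mid)


def spearman_rho(ranks):
    """Spearman coefficient between the rank vector and (1, ..., n)."""
    return linear_rank_statistic(spearman_scores(len(ranks)), ranks)


print(spearman_rho([1, 3, 2, 4]))
```

Any other choice of the score function, passed to `linear_rank_statistic`, gives another member of the family.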

Linear rank statistics are, as a rule, computationally simple to construct, and their distributions are easy to find. For this reason the notion of the projection of a rank statistic into the family of linear rank statistics plays an important role in the theory of rank statistics. If $T$ is a rank statistic constructed from a random vector $X$ under a hypothesis $H_0$ about its distribution, then a linear rank statistic $\widehat{T} = \widehat{T}(R)$ such that ${\mathsf E}\{ (T - \widehat{T})^2 \}$ is minimal when $H_0$ is true is called the projection of $T$ into the family of linear rank statistics. As a rule, $\widehat{T}$ approximates $T$ well enough, and the difference $T - \widehat{T}$ is negligibly small as $n \rightarrow \infty$. If, under the hypothesis $H_0$, the components $X_1, \dots, X_n$ of the random vector $X$ are independent random variables, then the projection $\widehat{T}$ of $T$ can be determined by the formula

$$
\tag{*} \widehat{T} = \frac{n-1}{n} \sum_{i=1}^{n} \widehat{a}(i, R_i) - (n - 2) {\mathsf E}\{ T \},
$$

where $\widehat{a}(i, j) = {\mathsf E}\{ T \mid R_i = j \}$, $1 \leq i, j \leq n$ (see [1]).
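For small $n$ the projection can be computed by brute force. The sketch below assumes that under $H_0$ all $n!$ rank vectors are equally likely (as happens for independent, identically distributed continuous observations) and reads the leading coefficient in (*) as $(n-1)/n$, the value for which the projection reproduces every linear rank statistic. The name `project_onto_linear` is illustrative.

```python
from itertools import permutations


def project_onto_linear(stat, n):
    """Return the projection T_hat of the rank statistic `stat` for sample size n.

    Expectations are taken over all n! rank vectors with equal weight
    (an assumption about H_0 made for this sketch).
    """
    perms = list(permutations(range(1, n + 1)))
    values = {r: stat(r) for r in perms}
    mean_t = sum(values.values()) / len(perms)

    # a_hat[i][j] holds E{T | R_(i+1) = j+1}; each condition fixes one
    # coordinate, so exactly (n-1)! rank vectors satisfy it.
    per_cell = len(perms) // n
    a_hat = [[0.0] * n for _ in range(n)]
    for r, t in values.items():
        for i, j in enumerate(r):
            a_hat[i][j - 1] += t / per_cell

    def t_hat(ranks):
        s = sum(a_hat[i][j - 1] for i, j in enumerate(ranks))
        return (n - 1) / n * s - (n - 2) * mean_t

    return t_hat


# A statistic that is already linear is left unchanged by the projection.
linear = lambda r: sum(i * j for i, j in enumerate(r, start=1))
t_hat = project_onto_linear(linear, 4)
print(linear((2, 4, 1, 3)), t_hat((2, 4, 1, 3)))
```

The enumeration costs on the order of $n \cdot n!$ operations, so it is only meant as a check for very small $n$.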

There is an intrinsic connection between $\tau$ and $\rho$. It is shown in [1] that the projection $\widehat\tau$ of the Kendall coefficient $\tau$ into the family of linear rank statistics coincides, up to a multiplicative constant, with the Spearman coefficient $\rho$; namely,

$$
\widehat\tau = \frac{2}{3} \left( 1 + \frac{1}{n} \right) \rho.
$$
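This identity can be checked numerically for small $n$. The sketch below is only an illustration: it assumes that all $n!$ rank vectors are equally likely under $H_0$, uses the leading coefficient $(n-1)/n$ in (*) and the normalization $12/(n(n^2-1))$ for $\rho$, and all function names are hypothetical.

```python
from itertools import permutations


def sign(v):
    return (v > 0) - (v < 0)


def tau(r):
    """Kendall coefficient between r and (1, ..., n)."""
    n = len(r)
    s = sum(sign(i - j) * sign(r[i] - r[j])
            for i in range(n) for j in range(n) if i != j)
    return s / (n * (n - 1))


def rho(r):
    """Spearman coefficient between r and (1, ..., n)."""
    n = len(r)
    mid = (n + 1) / 2
    s = sum((i - mid) * (ri - mid) for i, ri in enumerate(r, start=1))
    return 12 * s / (n * (n * n - 1))


def projected_tau(r):
    """Projection (*) of tau, averaging over all n! equally likely rank vectors."""
    n = len(r)
    perms = list(permutations(range(1, n + 1)))
    mean_tau = sum(tau(p) for p in perms) / len(perms)
    total = 0.0
    for i, j in enumerate(r):
        matching = [tau(p) for p in perms if p[i] == j]
        total += sum(matching) / len(matching)  # a_hat(i, j)
    return (n - 1) / n * total - (n - 2) * mean_tau


def max_gap(n):
    """Largest deviation between the projection of tau and (2/3)(1 + 1/n) rho."""
    return max(abs(projected_tau(p) - (2 / 3) * (1 + 1 / n) * rho(p))
               for p in permutations(range(1, n + 1)))


for n in (3, 4, 5):
    print(n, max_gap(n))
```

The printed gaps should vanish up to floating-point rounding.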

This equality implies that the correlation coefficient $\mathop{\rm corr}(\rho, \tau)$ between $\rho$ and $\tau$ is equal to

$$
\mathop{\rm corr}(\rho, \tau) = \sqrt{ \frac{ {\mathsf D} \widehat\tau }{ {\mathsf D} \tau } } = \frac{2(n + 1)}{\sqrt{2n(2n + 5)}},
$$

implying that these rank statistics are asymptotically equivalent for large $n$ (cf. [2]).
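The closed form can be traced through the variances. The sketch below assumes the standard null-hypothesis variances ${\mathsf D}\rho = 1/(n-1)$ and ${\mathsf D}\tau = 2(2n+5)/(9n(n-1))$, which are not stated in this article, together with ${\mathsf D}\widehat\tau = \left(\frac{2}{3}\left(1 + \frac{1}{n}\right)\right)^2 {\mathsf D}\rho$; the function names are illustrative.

```python
from math import sqrt


def var_rho(n):
    """Variance of Spearman's rho under H_0 (assumed standard result)."""
    return 1 / (n - 1)


def var_tau(n):
    """Variance of Kendall's tau under H_0 (assumed standard result)."""
    return 2 * (2 * n + 5) / (9 * n * (n - 1))


def var_tau_hat(n):
    """Variance of the projection (2/3)(1 + 1/n) rho."""
    return ((2 / 3) * (1 + 1 / n)) ** 2 * var_rho(n)


def corr_from_variances(n):
    """corr(rho, tau) = sqrt(D tau_hat / D tau)."""
    return sqrt(var_tau_hat(n) / var_tau(n))


def corr_closed_form(n):
    """corr(rho, tau) = 2(n + 1) / sqrt(2n(2n + 5))."""
    return 2 * (n + 1) / sqrt(2 * n * (2 * n + 5))


for n in (2, 5, 10, 100, 1000):
    print(n, corr_from_variances(n), corr_closed_form(n))
```

The two columns agree, equal 1 for $n = 2$, and approach 1 again as $n$ grows, which is the asymptotic equivalence stated above.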

References

[1] J. Hájek, Z. Šidák, "Theory of rank tests", Acad. Press (1967)
[2] M.G. Kendall, "Rank correlation methods", Griffin (1970)
How to Cite This Entry:
Rank statistic. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Rank_statistic&oldid=51568
This article was adapted from an original article by M.S. Nikulin (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.