Rank statistic

Latest revision as of 17:47, 8 February 2021

A statistic (cf. Statistical estimator) constructed from a rank vector. If $ R = ( R _ {1} , \dots , R _ {n} ) $ is the rank vector constructed from a random observation vector $ X = ( X _ {1} , \dots , X _ {n} ) $, then any statistic $ T = T ( R) $ that is a function of $ R $ is called a rank statistic. A classical example of a rank statistic is the Kendall coefficient of rank correlation $ \tau $ between the vectors $ R $ and $ \ell = ( 1 , \dots , n ) $, defined by the formula

$$ \tau = \frac{1}{n ( n - 1 ) } \sum _ {i \neq j } \mathop{\rm sign} ( i - j ) \ \mathop{\rm sign} ( R _ {i} - R _ {j} ) . $$
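As a minimal illustration, the rank vector and the double sum defining $ \tau $ can be evaluated directly. The sketch assumes the observations have no ties; the helper names `rank_vector`, `sign` and `kendall_tau` are hypothetical, and Python's 0-based indices stand in for $ 1 , \dots , n $, which leaves $ \mathop{\rm sign} ( i - j ) $ unchanged.

```python
def rank_vector(x):
    """Ranks R_1, ..., R_n (1-based) of the observations x; assumes no ties."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0] * len(x)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(R):
    """Kendall's tau between R and (1, ..., n), from the double sum over i != j.

    Indices are 0-based here; sign(i - j) is the same as for 1-based indices.
    """
    n = len(R)
    s = sum(sign(i - j) * sign(R[i] - R[j])
            for i in range(n) for j in range(n) if i != j)
    return s / (n * (n - 1))


x = [2.3, 0.5, 1.7, 3.1]
R = rank_vector(x)  # [3, 1, 2, 4]
print(R, kendall_tau(R))
```

For $ x = ( 2.3 , 0.5 , 1.7 , 3.1 ) $ this gives $ R = ( 3 , 1 , 2 , 4 ) $; four of the six pairs are concordant and two are discordant, and each pair appears twice in the sum, so $ \tau = 2 ( 4 - 2 ) / 12 = 1/3 $.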

In the class of all rank statistics a special place is occupied by so-called linear rank statistics, defined as follows. Let $ A = \| a ( i , j ) \| $ be an arbitrary square matrix of order $ n $. Then the statistic

$$ T = \sum _ { i=1} ^ { n } a ( i , R _ {i} ) $$

is called a linear rank statistic. For example, the Spearman coefficient of rank correlation $ \rho $, defined by the formula

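$$ \rho = \frac{12}{n ( n ^ {2} - 1 ) } \sum _ { i=1} ^ { n } \left ( i - \frac{n+1}{2} \right ) \left ( R _ {i} - \frac{n+1}{2} \right ) , $$

is a linear rank statistic, corresponding to $ a ( i , j ) = \frac{12}{n ( n ^ {2} - 1 ) } \left ( i - \frac{n+1}{2} \right ) \left ( j - \frac{n+1}{2} \right ) $.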
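A short sketch can make the definition concrete: it evaluates $ T = \sum a ( i , R _ {i} ) $ for an arbitrary score matrix and obtains $ \rho $ from the Spearman scores. It assumes the standard normalization of Spearman's coefficient (centering constant $ ( n + 1 ) / 2 $, factor $ 12 / ( n ( n ^ {2} - 1 ) ) $) and uses exact rational arithmetic from Python's `fractions` module; the function names are hypothetical. The cross-check against the familiar form $ 1 - 6 \sum ( R _ {i} - i ) ^ {2} / ( n ( n ^ {2} - 1 ) ) $ relies on the absence of ties.

```python
from fractions import Fraction


def linear_rank_statistic(a, R):
    """T = sum_i a(i, R_i); a is an n x n list of lists, R holds ranks 1..n."""
    # Row i of `a` (0-based) stands for index i+1; column R[i]-1 for rank R[i].
    return sum(a[i][R[i] - 1] for i in range(len(R)))


def spearman_scores(n):
    """Matrix a(i, j) = 12/(n(n^2-1)) (i - (n+1)/2)(j - (n+1)/2), 1 <= i, j <= n."""
    c = Fraction(12, n * (n * n - 1))
    m = Fraction(n + 1, 2)
    return [[c * (i - m) * (j - m) for j in range(1, n + 1)]
            for i in range(1, n + 1)]


def spearman_rho(R):
    """Spearman's rho as a linear rank statistic."""
    return linear_rank_statistic(spearman_scores(len(R)), R)


R = [3, 1, 2, 4]
rho = spearman_rho(R)
# Cross-check against 1 - 6 * sum d_i^2 / (n(n^2 - 1)) with d_i = R_i - i.
n = len(R)
d2 = sum((r - i) ** 2 for i, r in enumerate(R, start=1))
assert rho == 1 - Fraction(6 * d2, n * (n * n - 1))
print(rho)  # 2/5
```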

Linear rank statistics are, as a rule, simple to compute, and their distributions are comparatively easy to find. For this reason the notion of the projection of a rank statistic into the family of linear rank statistics plays an important role in the theory of rank statistics. If $ T $ is a rank statistic constructed from a random vector $ X $ and $ H _ {0} $ is a hypothesis about the distribution of $ X $, then the linear rank statistic $ \widehat{T} = \widehat{T} ( R) $ that minimizes $ {\mathsf E} \{ ( T - \widehat{T} ) ^ {2} \} $ when $ H _ {0} $ is true is called the projection of $ T $ into the family of linear rank statistics. As a rule, $ \widehat{T} $ approximates $ T $ well, and the difference $ T - \widehat{T} $ becomes negligibly small as $ n \rightarrow \infty $. If $ H _ {0} $ is the hypothesis that the components $ X _ {1} , \dots , X _ {n} $ of $ X $ are independent and identically distributed with a continuous distribution (so that $ R $ is uniformly distributed over all $ n ! $ permutations of $ ( 1 , \dots , n ) $), then the projection $ \widehat{T} $ of $ T $ is given by the formula

$$ \tag{* } \widehat{T} = \frac{n-1}{n} \sum _ { i=1} ^ { n } \widehat{a} ( i , R _ {i} ) - ( n - 2 ) {\mathsf E} \{ T \} , $$

where $ \widehat{a} ( i , j ) = {\mathsf E} \{ T \mid R _ {i} = j \} $, $ 1 \leq i , j \leq n $ (see [1]).
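For small $ n $ the projection (*) can be computed exactly by enumerating all $ n ! $ rank vectors. The sketch below assumes that under $ H _ {0} $ the vector $ R $ is uniformly distributed over the permutations of $ ( 1 , \dots , n ) $, takes the leading factor in (*) to be $ ( n - 1 ) / n $, and uses hypothetical names (`projection`, `T_hat`, `a_hat`). It illustrates the formula only; it is not a practical algorithm, since its cost grows like $ n ! $.

```python
from fractions import Fraction
from itertools import permutations


def projection(T, n):
    """Return T_hat as a function of R, using formula (*) with exact arithmetic.

    Assumes R is uniform over all n! permutations of (1, ..., n), so that
    E{T} and a_hat(i, j) = E{T | R_i = j} are plain averages over permutations.
    """
    perms = list(permutations(range(1, n + 1)))
    values = {R: Fraction(T(R)) for R in perms}
    ET = sum(values.values()) / len(perms)

    # Accumulate T over the (n-1)! permutations with R_i = j, then average.
    totals = [[Fraction(0)] * (n + 1) for _ in range(n)]  # column 0 unused
    for R, t in values.items():
        for i in range(n):
            totals[i][R[i]] += t
    per_cell = len(perms) // n
    a_hat = [[s / per_cell for s in row] for row in totals]

    def T_hat(R):
        return (Fraction(n - 1, n) * sum(a_hat[i][R[i]] for i in range(n))
                - (n - 2) * ET)

    return T_hat
```

Two quick checks follow from the definition: a statistic that is already linear, such as $ 2 R _ {1} + R _ {3} ^ {2} $, is returned unchanged, and for any $ T $ the residual $ T - \widehat{T} $ averages to zero over each event $ \{ R _ {i} = j \} $.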

There is an intrinsic connection between $ \tau $ and $ \rho $. It is shown in [1] that the projection $ \widehat \tau $ of the Kendall coefficient $ \tau $ into the family of linear rank statistics coincides, up to a multiplicative constant, with the Spearman coefficient $ \rho $; namely,

$$ \widehat \tau = \frac{2}{3} \left ( 1 + \frac{1}{n} \right ) \rho . $$
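The relation between $ \widehat \tau $ and $ \rho $ can be checked numerically for small $ n $ with a self-contained sketch that computes $ \tau $, $ \rho $ and formula (*) by enumerating all permutations in exact rational arithmetic. It assumes $ R $ is uniform over the permutations, and the function names are hypothetical; it is a sanity check of the stated identity, not a proof.

```python
from fractions import Fraction
from itertools import permutations


def sign(v):
    return (v > 0) - (v < 0)


def kendall_tau(R):
    n = len(R)
    s = sum(sign(i - j) * sign(R[i] - R[j])
            for i in range(n) for j in range(n) if i != j)
    return Fraction(s, n * (n - 1))


def spearman_rho(R):
    n = len(R)
    m = Fraction(n + 1, 2)
    return Fraction(12, n * (n * n - 1)) * sum((i - m) * (r - m)
                                               for i, r in enumerate(R, start=1))


def projection(T, n):
    """Formula (*) with R uniform over the permutations of (1, ..., n)."""
    perms = list(permutations(range(1, n + 1)))
    values = {R: T(R) for R in perms}
    ET = sum(values.values()) / len(perms)
    totals = [[Fraction(0)] * (n + 1) for _ in range(n)]  # column 0 unused
    for R, t in values.items():
        for i in range(n):
            totals[i][R[i]] += t
    a_hat = [[s / (len(perms) // n) for s in row] for row in totals]
    return lambda R: (Fraction(n - 1, n) * sum(a_hat[i][R[i]] for i in range(n))
                      - (n - 2) * ET)


def check_tau_projection(n):
    """True if tau_hat(R) == (2/3)(1 + 1/n) rho(R) for every permutation R."""
    tau_hat = projection(kendall_tau, n)
    k = Fraction(2, 3) * (1 + Fraction(1, n))
    return all(tau_hat(R) == k * spearman_rho(R)
               for R in permutations(range(1, n + 1)))


print([check_tau_projection(n) for n in range(2, 7)])  # [True, True, True, True, True]
```

The identity holds exactly for every $ n \geq 2 $, not only asymptotically, so the check returns `True` for each $ n $ tested.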

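Since, under $ H _ {0} $, the difference $ \tau - \widehat \tau $ is uncorrelated with every linear rank statistic, and in particular with $ \widehat \tau $, this equality implies that the correlation coefficient $ \mathop{\rm corr} ( \rho , \tau ) $ between $ \rho $ and $ \tau $ is equal to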

$$ \mathop{\rm corr} ( \rho , \tau ) = \ \sqrt { \frac{ {\mathsf D} \widehat \tau }{ {\mathsf D} \tau } } = \ \frac{2 ( n + 1 ) }{\sqrt {2 n ( 2 n + 5 ) } } , $$

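where $ {\mathsf D} $ denotes the variance under $ H _ {0} $. Since this correlation tends to $ 1 $ as $ n \rightarrow \infty $, the two rank statistics are asymptotically equivalent (cf. [2]).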
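The correlation formula can likewise be checked exactly for small $ n $. The sketch below, an illustration under the same uniform-permutation assumption with hypothetical function names, computes the null variances and the covariance of $ \tau $ and $ \rho $ over all $ n ! $ permutations and compares $ \mathop{\rm corr} ^ {2} ( \rho , \tau ) $ with $ 4 ( n + 1 ) ^ {2} / ( 2 n ( 2 n + 5 ) ) $; squaring keeps the comparison in exact rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations
import math


def sign(v):
    return (v > 0) - (v < 0)


def kendall_tau(R):
    n = len(R)
    s = sum(sign(i - j) * sign(R[i] - R[j])
            for i in range(n) for j in range(n) if i != j)
    return Fraction(s, n * (n - 1))


def spearman_rho(R):
    n = len(R)
    m = Fraction(n + 1, 2)
    return Fraction(12, n * (n * n - 1)) * sum((i - m) * (r - m)
                                               for i, r in enumerate(R, start=1))


def null_moments(n):
    """Exact (D tau, D rho, cov(tau, rho)) with R uniform over all n! permutations."""
    perms = list(permutations(range(1, n + 1)))
    N = len(perms)
    taus = [kendall_tau(R) for R in perms]
    rhos = [spearman_rho(R) for R in perms]
    mt, mr = sum(taus) / N, sum(rhos) / N  # both means are 0 under H0
    var_t = sum((t - mt) ** 2 for t in taus) / N
    var_r = sum((r - mr) ** 2 for r in rhos) / N
    cov = sum((t - mt) * (r - mr) for t, r in zip(taus, rhos)) / N
    return var_t, var_r, cov


for n in range(2, 7):
    var_t, var_r, cov = null_moments(n)
    corr_sq = cov ** 2 / (var_t * var_r)
    assert cov > 0
    assert corr_sq == Fraction(4 * (n + 1) ** 2, 2 * n * (2 * n + 5))
    print(n, math.sqrt(corr_sq))  # equals 2(n+1)/sqrt(2n(2n+5))
```

The computed variances also agree with the well-known null values $ {\mathsf D} \tau = 2 ( 2 n + 5 ) / ( 9 n ( n - 1 ) ) $ and $ {\mathsf D} \rho = 1 / ( n - 1 ) $.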

References

[1]	J. Hájek, Z. Šidák, "Theory of rank tests", Acad. Press (1967)
[2]	M.G. Kendall, "Rank correlation methods", Griffin (1970)

How to Cite This Entry:
Rank statistic. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Rank_statistic&oldid=48435

This article was adapted from an original article by M.S. Nikulin (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.
