Difference between revisions of "Rank statistic"

Latest revision as of 17:47, 8 February 2021

A statistic (cf. Statistical estimator) constructed from a rank vector. If $R = ( R _ {1} , \dots , R _ {n} )$ is the rank vector constructed from a random observation vector $X = ( X _ {1} , \dots , X _ {n} )$, then any statistic $T = T ( R)$ which is a function of $R$ is called a rank statistic. A classical example of a rank statistic is the Kendall coefficient of rank correlation $\tau$ between the vectors $R$ and $\ell = ( 1 , \dots , n )$, defined by the formula

$\tau = \frac{1}{n ( n - 1 ) } \sum _ {i \neq j } \mathop{\rm sign} ( i - j ) \ \mathop{\rm sign} ( R _ {i} - R _ {j} ) .$
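The definition can be illustrated with a minimal Python sketch. It assumes that the observations contain no ties, and the function names `rank_vector` and `kendall_tau` are illustrative only; they do not come from the references.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def rank_vector(x):
    """Rank vector R of the observations x: R[i] is the 1-based rank of x[i].

    Assumption: the observations contain no ties.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0] * len(x)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def kendall_tau(ranks):
    """Kendall coefficient between ranks and (1, ..., n), summed over all i != j."""
    n = len(ranks)
    total = sum(
        sign(i - j) * sign(ranks[i] - ranks[j])
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return total / (n * (n - 1))


ranks = rank_vector([0.3, 2.5, 1.1])  # the ranks are [1, 3, 2]
print(ranks, kendall_tau(ranks))
```

The statistic depends on the observations only through their ranks, so any strictly increasing transformation of the data leaves it unchanged.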

In the class of all rank statistics a special place is occupied by so-called linear rank statistics, defined as follows. Let $A = \| a ( i , j ) \|$ be an arbitrary square matrix of order $n$ . Then the statistic

$T = \sum _ { i=1} ^ { n } a ( i , R _ {i} )$

is called a linear rank statistic. For example, the Spearman coefficient of rank correlation $\rho$ , defined by the formula

$\rho = \frac{12}{n ( n ^ {2} - 1 ) } \sum _ { i=1} ^ { n } \left ( i - \frac{n+1}{2} \right ) \left ( R _ {i} - \frac{n+1}{2} \right ) ,$

is a linear rank statistic.
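A short Python sketch makes the role of the matrix $A$ explicit. It assumes the standard normalization of the Spearman coefficient, with scores $a ( i , j ) = \frac{12}{n ( n ^ {2} - 1 ) } ( i - \frac{n+1}{2} ) ( j - \frac{n+1}{2} )$; the function names are illustrative.

```python
def spearman_scores(n):
    """Score matrix a(i, j) for the Spearman coefficient, with 1-based i and j.

    Assumption: the standard normalization 12 / (n (n^2 - 1)) is used.
    """
    c = 12.0 / (n * (n * n - 1))
    m = (n + 1) / 2
    return [
        [c * (i - m) * (j - m) for j in range(1, n + 1)]
        for i in range(1, n + 1)
    ]


def linear_rank_statistic(a, ranks):
    """T = sum_i a(i, R_i); the ranks are 1-based, the matrix a is indexed from 0."""
    return sum(a[i][r - 1] for i, r in enumerate(ranks))


a = spearman_scores(4)
print(linear_rank_statistic(a, [1, 2, 3, 4]))  # ranks in perfect agreement
print(linear_rank_statistic(a, [4, 3, 2, 1]))  # ranks in perfect disagreement
```

Any other choice of the matrix passed to `linear_rank_statistic` yields another linear rank statistic.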

Linear rank statistics are, as a rule, computationally simple to construct, and their distributions are easy to find. For this reason the notion of the projection of a rank statistic into the family of linear rank statistics plays an important role in the theory of rank statistics. If $T$ is a rank statistic constructed from a random vector $X$ under a hypothesis $H _ {0}$ about its distribution, then a linear rank statistic $\widehat{T} = \widehat{T} ( R)$ for which ${\mathsf E} \{ ( T - \widehat{T} ) ^ {2} \}$ is minimal when $H _ {0}$ is true is called the projection of $T$ into the family of linear rank statistics. As a rule, $\widehat{T}$ approximates $T$ well, and the difference $T - \widehat{T}$ is negligibly small as $n \rightarrow \infty$. If, under the hypothesis $H _ {0}$, the components $X _ {1} , \dots , X _ {n}$ of the random vector $X$ are independent random variables, then the projection $\widehat{T}$ of $T$ can be determined by the formula

$\tag{* } \widehat{T} = \frac{n-1}{n} \sum _ { i=1} ^ { n } \widehat{a} ( i , R _ {i} ) - ( n - 2 ) {\mathsf E} \{ T \} ,$

where $\widehat{a} ( i , j ) = {\mathsf E} \{ T \mid R _ {i} = j \}$ , $1 \leq i , j \leq n$ (see [1]).
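For small $n$ the projection can be computed by brute force. The following Python sketch assumes that under $H _ {0}$ the rank vector $R$ is uniformly distributed over all $n!$ permutations (an assumption not stated explicitly above), and that formula (*) has leading factor $\frac{n-1}{n}$. The names `projection` and `kendall_tau` are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def kendall_tau(ranks):
    """Kendall coefficient between ranks and (1, ..., n), as an exact fraction."""
    n = len(ranks)
    total = sum(
        sign(i - j) * sign(ranks[i] - ranks[j])
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return Fraction(total, n * (n - 1))


def projection(stat, n):
    """Build T_hat from formula (*) by enumerating all n! rank vectors.

    Assumption: R is uniform over the permutations of (1, ..., n) under H_0.
    """
    perms = list(permutations(range(1, n + 1)))
    values = {p: Fraction(stat(p)) for p in perms}
    mean = sum(values.values()) / len(perms)  # E{T}

    # a_hat[i][j - 1] = E{T | R_i = j}; i is 0-based, j is a 1-based rank.
    a_hat = []
    for i in range(n):
        row = []
        for j in range(1, n + 1):
            subset = [values[p] for p in perms if p[i] == j]
            row.append(sum(subset) / len(subset))
        a_hat.append(row)

    def t_hat(ranks):
        linear_part = sum(a_hat[i][r - 1] for i, r in enumerate(ranks))
        return Fraction(n - 1, n) * linear_part - (n - 2) * mean

    return t_hat


tau_hat = projection(kendall_tau, 3)
print(tau_hat((1, 2, 3)))
```

Exact fractions are used so that the result can be compared with closed-form expressions without rounding error. The enumeration costs on the order of $n ^ {2} \cdot n!$ operations, so the sketch is only practical for very small $n$.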

There is an intrinsic connection between $\tau$ and $\rho$ . It is shown in [1] that the projection $\widehat \tau$ of the Kendall coefficient $\tau$ into the family of linear rank statistics coincides, up to a multiplicative constant, with the Spearman coefficient $\rho$ ; namely,

$\widehat \tau = \frac{2}{3} \left ( 1 + \frac{1}{n} \right ) \rho .$
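The constant can be recovered from formula (*) by a short calculation. This is a sketch, assuming that under $H _ {0}$ the rank vector $R$ is uniformly distributed over all $n!$ permutations, so that ${\mathsf E} \{ \tau \} = 0$; it also assumes the leading factor $\frac{n-1}{n}$ in (*) and the normalization $\frac{12}{n ( n ^ {2} - 1 ) }$ of $\rho$. Given $R _ {i} = j$, only the pairs containing the index $i$ contribute to the conditional expectation. Since $\sum _ {k \neq i } \mathop{\rm sign} ( i - k ) = 2 i - n - 1$ and ${\mathsf E} \{ \mathop{\rm sign} ( j - R _ {k} ) \mid R _ {i} = j \} = \frac{2 j - n - 1 }{n - 1 }$ for $k \neq i$, one obtains

$\widehat{a} ( i , j ) = \frac{2}{n ( n - 1 ) } \cdot \frac{( 2 i - n - 1 ) ( 2 j - n - 1 ) }{n - 1 } = \frac{8}{n ( n - 1 ) ^ {2} } \left ( i - \frac{n+1}{2} \right ) \left ( j - \frac{n+1}{2} \right ) .$

Substituting into (*) gives

$\widehat \tau = \frac{8}{n ^ {2} ( n - 1 ) } \sum _ { i=1} ^ { n } \left ( i - \frac{n+1}{2} \right ) \left ( R _ {i} - \frac{n+1}{2} \right ) = \frac{8}{n ^ {2} ( n - 1 ) } \cdot \frac{n ( n ^ {2} - 1 ) }{12} \rho = \frac{2}{3} \left ( 1 + \frac{1}{n} \right ) \rho .$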

This equality implies that the correlation coefficient $\mathop{\rm corr} ( \rho , \tau )$ between $\rho$ and $\tau$ is equal to

$\mathop{\rm corr} ( \rho , \tau ) = \ \sqrt { \frac{ {\mathsf D} \widehat \tau }{ {\mathsf D} \tau } } = \ \frac{2 ( n + 1 ) }{\sqrt {2 n ( 2 n + 5 ) } } ,$

which tends to $1$ as $n \rightarrow \infty$, so that these rank statistics are asymptotically equivalent (cf. [2]).
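The closed form can be reproduced as follows. The null variances quoted here are standard results that are not stated in the text above; they are assumed to hold under $H _ {0}$ in the absence of ties. Since $\widehat \tau$ is the projection of $\tau$, the difference $\tau - \widehat \tau$ is uncorrelated with $\widehat \tau$, so $\mathop{\rm cov} ( \tau , \widehat \tau ) = {\mathsf D} \widehat \tau$. Because $\rho$ is a positive multiple of $\widehat \tau$,

$\mathop{\rm corr} ( \rho , \tau ) = \mathop{\rm corr} ( \widehat \tau , \tau ) = \frac{ {\mathsf D} \widehat \tau }{\sqrt { {\mathsf D} \widehat \tau \, {\mathsf D} \tau } } = \sqrt { \frac{ {\mathsf D} \widehat \tau }{ {\mathsf D} \tau } } .$

With ${\mathsf D} \rho = \frac{1}{n - 1 }$ and ${\mathsf D} \tau = \frac{2 ( 2 n + 5 ) }{9 n ( n - 1 ) }$,

${\mathsf D} \widehat \tau = \frac{4}{9} \left ( 1 + \frac{1}{n} \right ) ^ {2} \frac{1}{n - 1 } = \frac{4 ( n + 1 ) ^ {2} }{9 n ^ {2} ( n - 1 ) } , \qquad \frac{ {\mathsf D} \widehat \tau }{ {\mathsf D} \tau } = \frac{4 ( n + 1 ) ^ {2} }{2 n ( 2 n + 5 ) } .$

For $n = 10$, for instance, the correlation coefficient is $22 / \sqrt {500 } \approx 0.984$.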

References

[1]	J. Hájek, Z. Šidák, "Theory of rank tests", Acad. Press (1967)
[2]	M.G. Kendall, "Rank correlation methods", Griffin (1970)

How to Cite This Entry:
Rank statistic. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Rank_statistic&oldid=51568

This article was adapted from an original article by M.S. Nikulin (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.
