# Rank statistic


A statistic (cf. Statistical estimator) constructed from a rank vector. If $R = ( R _ {1} , \dots , R _ {n} )$ is the rank vector constructed from a random observation vector $X = ( X _ {1} , \dots , X _ {n} )$, then any statistic $T = T ( R)$ which is a function of $R$ is called a rank statistic. A classical example of a rank statistic is the Kendall coefficient of rank correlation $\tau$ between the vectors $R$ and $\ell = ( 1 , \dots , n )$, defined by the formula

$$\tau = \frac{1}{n ( n - 1 ) } \sum _ {i \neq j } \mathop{\rm sign} ( i - j ) \ \mathop{\rm sign} ( R _ {i} - R _ {j} ) .$$
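As an illustration, the formula for $\tau$ can be evaluated directly from a rank vector. The following is a minimal sketch, assuming observations without ties; the helper names `rank_vector`, `sign` and `kendall_tau` are hypothetical and introduced only for this example.

```python
def rank_vector(x):
    """Return the ranks R_1..R_n (1-based) of the observations x.

    Assumption: all observations are distinct (no ties).
    """
    order = sorted(range(len(x)), key=lambda k: x[k])
    ranks = [0] * len(x)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def sign(v):
    """Sign function: -1, 0 or 1."""
    return (v > 0) - (v < 0)


def kendall_tau(ranks):
    """Kendall's tau between ranks and (1, ..., n), summing over all i != j."""
    n = len(ranks)
    total = sum(
        sign(i - j) * sign(ranks[i] - ranks[j])
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return total / (n * (n - 1))


if __name__ == "__main__":
    r = rank_vector([0.3, 1.2, 0.7])
    print(r, kendall_tau(r))
```

The double sum visits every ordered pair, so each unordered pair is counted twice, which is why the normalizing constant is $n ( n - 1 )$ rather than $n ( n - 1 ) / 2$.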

In the class of all rank statistics a special place is occupied by so-called linear rank statistics, defined as follows. Let $A = \| a ( i , j ) \|$ be an arbitrary square matrix of order $n$. Then the statistic

$$T = \sum _ { i=1} ^ { n } a ( i , R _ {i} )$$

is called a linear rank statistic. For example, the Spearman coefficient of rank correlation $\rho$, defined by the formula

$$\rho = \frac{12}{n ( n ^ {2} - 1 ) } \sum _ { i=1} ^ { n } \left ( i - \frac{n + 1}{2} \right ) \left ( R _ {i} - \frac{n + 1}{2} \right ) ,$$

is a linear rank statistic.
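To make the definition concrete, the sketch below evaluates $T = \sum _ {i} a ( i , R _ {i} )$ for an arbitrary score matrix and then recovers the Spearman coefficient by choosing $a ( i , j ) = \frac{12}{n ( n ^ {2} - 1 ) } ( i - \frac{n+1}{2} ) ( j - \frac{n+1}{2} )$. It assumes the standard normalization of $\rho$ (so that $\rho = 1$ for the identity ranking); the function names are hypothetical.

```python
def linear_rank_statistic(a, ranks):
    """T = sum_i a(i, R_i); a is an n x n nested list, ranks are 1-based."""
    return sum(a[i][r - 1] for i, r in enumerate(ranks))


def spearman_matrix(n):
    """Score matrix a(i, j) whose linear rank statistic is Spearman's rho."""
    center = (n + 1) / 2
    scale = 12 / (n * (n * n - 1))
    return [
        [scale * (i - center) * (j - center) for j in range(1, n + 1)]
        for i in range(1, n + 1)
    ]


def spearman_rho(ranks):
    """Spearman's rho between ranks and (1, ..., n), as a linear rank statistic."""
    return linear_rank_statistic(spearman_matrix(len(ranks)), ranks)


if __name__ == "__main__":
    print(spearman_rho([1, 3, 2]))
```

Any other choice of the matrix $A$ gives another linear rank statistic through the same `linear_rank_statistic` function.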

Linear rank statistics are, as a rule, simple to construct from the computational point of view and their distributions are easy to find. For this reason the notion of projection of a rank statistic into the family of linear rank statistics plays an important role in the theory of rank statistics. If $T$ is a rank statistic constructed from a random vector $X$ under a hypothesis $H _ {0}$ about its distribution, then a linear rank statistic $\widehat{T} = \widehat{T} ( R)$ such that ${\mathsf E} \{ ( T - \widehat{T} ) ^ {2} \}$ is minimal under the condition that $H _ {0}$ is true, is called the projection of $T$ into the family of linear rank statistics. As a rule, $\widehat{T}$ approximates $T$ well enough and the difference $T - \widehat{T}$ is negligibly small as $n \rightarrow \infty$. If the hypothesis $H _ {0}$ under which the components $X _ {1} , \dots , X _ {n}$ of the random vector $X$ are independent identically distributed random variables is true, then the projection $\widehat{T}$ of $T$ can be determined by the formula

$$\tag{*} \widehat{T} = \frac{n - 1}{n} \sum _ { i=1} ^ { n } \widehat{a} ( i , R _ {i} ) - ( n - 2 ) {\mathsf E} \{ T \} ,$$

where $\widehat{a} ( i , j ) = {\mathsf E} \{ T \mid R _ {i} = j \}$, $1 \leq i , j \leq n$ (see [1]).

There is an intrinsic connection between $\tau$ and $\rho$. It is shown in [1] that the projection $\widehat \tau$ of the Kendall coefficient $\tau$ into the family of linear rank statistics coincides, up to a multiplicative constant, with the Spearman coefficient $\rho$; namely,

$$\widehat \tau = \frac{2}{3} \left ( 1 + \frac{1}{n} \right ) \rho .$$
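For small $n$ this relation can be checked numerically by exhaustive enumeration. The sketch below assumes that under $H _ {0}$ the rank vector is uniformly distributed over all $n!$ permutations and that the leading factor in (*) is $( n - 1 ) / n$; it computes $\widehat{a} ( i , j ) = {\mathsf E} \{ \tau \mid R _ {i} = j \}$ by brute force, forms $\widehat \tau$ and compares it with $\frac{2}{3} ( 1 + \frac{1}{n} ) \rho$. All function names are hypothetical.

```python
from itertools import permutations


def sign(v):
    return (v > 0) - (v < 0)


def kendall_tau(r):
    n = len(r)
    s = sum(sign(i - j) * sign(r[i] - r[j])
            for i in range(n) for j in range(n) if i != j)
    return s / (n * (n - 1))


def spearman_rho(r):
    n = len(r)
    c = (n + 1) / 2
    return 12 / (n * (n * n - 1)) * sum((i - c) * (ri - c)
                                        for i, ri in enumerate(r, start=1))


def projection(stat, n):
    """Return r -> T_hat(r), the projection of stat by formula (*).

    Assumption: R is uniform over the n! permutations (brute force, small n only).
    """
    perms = list(permutations(range(1, n + 1)))
    values = {p: stat(p) for p in perms}
    mean = sum(values.values()) / len(perms)
    # Accumulate sums of T over permutations with R_i = j, then average.
    a_hat = [[0.0] * n for _ in range(n)]
    for p, t in values.items():
        for i, r in enumerate(p):
            a_hat[i][r - 1] += t
    per_cell = len(perms) // n  # (n-1)! permutations satisfy R_i = j
    a_hat = [[s / per_cell for s in row] for row in a_hat]

    def t_hat(r):
        linear_part = sum(a_hat[i][ri - 1] for i, ri in enumerate(r))
        return (n - 1) / n * linear_part - (n - 2) * mean

    return t_hat


def max_gap(n):
    """Largest |tau_hat - (2/3)(1 + 1/n) rho| over all permutations."""
    tau_hat = projection(kendall_tau, n)
    factor = 2 / 3 * (1 + 1 / n)
    return max(abs(tau_hat(p) - factor * spearman_rho(p))
               for p in permutations(range(1, n + 1)))


if __name__ == "__main__":
    print(max_gap(5))  # expected to be at floating-point rounding level
```

The enumeration costs on the order of $n! \cdot n ^ {2}$ operations, so it is meant only as a sanity check for very small $n$.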

This equality implies that the correlation coefficient $\mathop{\rm corr} ( \rho , \tau )$ between $\rho$ and $\tau$ is equal to

$$\mathop{\rm corr} ( \rho , \tau ) = \ \sqrt { \frac{ {\mathsf D} \widehat \tau }{ {\mathsf D} \tau } } = \ \frac{2 ( n + 1 ) }{\sqrt {2 n ( 2 n + 5 ) } } ,$$

implying that these rank statistics are asymptotically equivalent for large $n$ (cf. [2]).
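The closed form can be compared with the exact correlation obtained by enumerating all permutations for small $n$. The following sketch assumes, as above, that the rank vector is uniformly distributed over permutations under $H _ {0}$; `exact_corr` and `formula_corr` are hypothetical names used only here.

```python
from itertools import permutations
from math import sqrt


def sign(v):
    return (v > 0) - (v < 0)


def kendall_tau(r):
    n = len(r)
    s = sum(sign(i - j) * sign(r[i] - r[j])
            for i in range(n) for j in range(n) if i != j)
    return s / (n * (n - 1))


def spearman_rho(r):
    n = len(r)
    c = (n + 1) / 2
    return 12 / (n * (n * n - 1)) * sum((i - c) * (ri - c)
                                        for i, ri in enumerate(r, start=1))


def exact_corr(n):
    """Correlation of rho and tau when R is uniform over all n! permutations."""
    perms = list(permutations(range(1, n + 1)))
    taus = [kendall_tau(p) for p in perms]
    rhos = [spearman_rho(p) for p in perms]
    m = len(perms)
    mean_t = sum(taus) / m
    mean_r = sum(rhos) / m
    cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(taus, rhos)) / m
    var_t = sum((t - mean_t) ** 2 for t in taus) / m
    var_r = sum((r - mean_r) ** 2 for r in rhos) / m
    return cov / sqrt(var_t * var_r)


def formula_corr(n):
    """Closed form 2(n + 1) / sqrt(2n(2n + 5))."""
    return 2 * (n + 1) / sqrt(2 * n * (2 * n + 5))


if __name__ == "__main__":
    for n in range(3, 7):
        print(n, exact_corr(n), formula_corr(n))
```

Since $2 ( n + 1 ) / \sqrt {2 n ( 2 n + 5 ) } \rightarrow 1$ as $n \rightarrow \infty$, evaluating `formula_corr` for increasing $n$ shows the correlation approaching one.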

#### References

[1] J. Hájek, Z. Šidák, "Theory of rank tests", Acad. Press (1967)

[2] M.G. Kendall, "Rank correlation methods", Griffin (1970)

How to Cite This Entry:
Rank statistic. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Rank_statistic&oldid=51568
This article was adapted from an original article by M.S. Nikulin (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.