Rank statistic
A statistic (cf. Statistical estimator) constructed from a rank vector. If $R = ( R_{1}, \dots, R_{n} )$ is the rank vector constructed from a random observation vector $X = ( X_{1}, \dots, X_{n} )$, then any statistic $T = T ( R )$ which is a function of $R$ is called a rank statistic. A classical example of a rank statistic is the Kendall coefficient of rank correlation $\tau$ between the vectors $R$ and $\ell = ( 1, \dots, n )$, defined by the formula
$$
\tau = \frac{1}{n ( n - 1 ) } \sum_{i \neq j} \mathop{\rm sign} ( i - j ) \, \mathop{\rm sign} ( R_{i} - R_{j} ) .
$$
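For concreteness, here is a minimal numerical sketch (not part of the original article) that evaluates $\tau$ directly from the sign-sum formula above; the rank vectors used are arbitrary illustrative examples.

```python
def kendall_tau(R):
    """tau = 1/(n(n-1)) * sum over i != j of sign(i - j) * sign(R_i - R_j)."""
    def sign(x):
        return (x > 0) - (x < 0)
    n = len(R)
    return sum(sign(i - j) * sign(R[i] - R[j])
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))


print(kendall_tau([1, 2, 3, 4, 5]))   # 1.0: R fully concordant with (1, ..., n)
print(kendall_tau([5, 4, 3, 2, 1]))   # -1.0: fully discordant
print(kendall_tau([2, 1, 4, 3, 5]))   # 0.6
```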
In the class of all rank statistics a special place is occupied by so-called linear rank statistics, defined as follows. Let $A = \| a ( i, j ) \|$ be an arbitrary square matrix of order $n$. Then the statistic
$$
T = \sum_{i=1}^{n} a ( i, R_{i} )
$$
is called a linear rank statistic. For example, the Spearman coefficient of rank correlation $\rho$, defined by the formula
$$
\rho = \frac{12}{n ( n^{2} - 1 ) } \sum_{i=1}^{n} \left ( i - \frac{n+1}{2} \right ) \left ( R_{i} - \frac{n+1}{2} \right ) ,
$$
is a linear rank statistic.
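Reading the formula above as $T = \sum_{i} a ( i, R_{i} )$ with scores $a ( i, j ) = \frac{12}{n ( n^{2} - 1 ) } ( i - \frac{n+1}{2} ) ( j - \frac{n+1}{2} )$, the following sketch (not part of the original article) evaluates $\rho$ as a linear rank statistic; the rank vectors are arbitrary illustrative examples.

```python
def spearman_rho(R):
    """Spearman's rho as the linear rank statistic sum_i a(i, R_i)."""
    n = len(R)
    c = (n + 1) / 2

    def a(i, j):                       # score attached to position i and rank j
        return 12 / (n * (n * n - 1)) * (i - c) * (j - c)

    return sum(a(i, R[i - 1]) for i in range(1, n + 1))


print(spearman_rho([2, 1, 4, 3, 5]))   # 0.8
print(spearman_rho([1, 2, 3, 4, 5]))   # 1.0 when R coincides with (1, ..., n)
```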
Linear rank statistics are, as a rule, computationally simple to construct and their distributions are easy to find. For this reason the notion of projection of a rank statistic into the family of linear rank statistics plays an important role in the theory of rank statistics. If $T$ is a rank statistic constructed from a random vector $X$ under a hypothesis $H_{0}$ about its distribution, then a linear rank statistic $\widehat{T} = \widehat{T} ( R )$ such that ${\mathsf E} \{ ( T - \widehat{T} )^{2} \}$ is minimal under the condition that $H_{0}$ is true is called the projection of $T$ into the family of linear rank statistics. As a rule, $\widehat{T}$ approximates $T$ well enough, and the difference $T - \widehat{T}$ is negligibly small as $n \rightarrow \infty$. If the hypothesis $H_{0}$ under which the components $X_{1}, \dots, X_{n}$ of the random vector $X$ are independent random variables is true, then the projection $\widehat{T}$ of $T$ can be determined by the formula
$$
\widehat{T} = \frac{n-1}{n} \sum_{i=1}^{n} \widehat{a} ( i, R_{i} ) - ( n - 2 ) {\mathsf E} \{ T \} , \tag{*}
$$
where $\widehat{a} ( i, j ) = {\mathsf E} \{ T \mid R_{i} = j \}$, $1 \leq i, j \leq n$ (see [1]).
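As an illustration (not part of the original article), under $H_{0}$ all $n!$ rank vectors are equally likely, so $\widehat{a} ( i, j ) = {\mathsf E} \{ T \mid R_{i} = j \}$ can be computed by direct enumeration for small $n$. A convenient sanity check of (*) is that a statistic which is already a linear rank statistic, such as the Spearman coefficient $\rho$, must coincide with its own projection; the sketch below checks this for $n = 4$.

```python
from itertools import permutations

n = 4


def rho(R):
    """Spearman's rho, written as the linear rank statistic from the article."""
    c = (n + 1) / 2
    return 12 / (n * (n * n - 1)) * sum((i + 1 - c) * (R[i] - c) for i in range(n))


perms = [list(p) for p in permutations(range(1, n + 1))]
ET = sum(rho(R) for R in perms) / len(perms)      # E{T} under H0 (equals 0 here)


def a_hat(i, j):
    """Conditional expectation E{T | R_i = j}, computed by enumeration."""
    sub = [R for R in perms if R[i - 1] == j]
    return sum(rho(R) for R in sub) / len(sub)


def projection(R):
    """Formula (*): (n-1)/n * sum_i a_hat(i, R_i) - (n-2) * E{T}."""
    return (n - 1) / n * sum(a_hat(i, R[i - 1]) for i in range(1, n + 1)) - (n - 2) * ET


R = [2, 1, 4, 3]
print(rho(R), projection(R))   # the two values coincide (0.6): rho is its own projection
```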
There is an intrinsic connection between $\tau$ and $\rho$. It is shown in [1] that the projection $\widehat\tau$ of the Kendall coefficient $\tau$ into the family of linear rank statistics coincides, up to a multiplicative constant, with the Spearman coefficient $\rho$; namely,
$$
\widehat\tau = \frac{2}{3} \left ( 1 + \frac{1}{n} \right ) \rho .
$$
This equality implies that the correlation coefficient $\mathop{\rm corr} ( \rho, \tau )$ between $\rho$ and $\tau$ is equal to
$$
\mathop{\rm corr} ( \rho, \tau ) = \sqrt{ \frac{ {\mathsf D} \widehat\tau }{ {\mathsf D} \tau } } = \frac{2 ( n + 1 ) }{\sqrt{2 n ( 2 n + 5 ) } } ,
$$
implying that these rank statistics are asymptotically equivalent for large $n$ (cf. [2]).
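The correlation formula can be checked numerically (a sketch not contained in the original article): for small $n$ one can enumerate all $n!$ equally likely rank vectors under $H_{0}$ and compute the exact correlation between $\rho$ and $\tau$.

```python
from itertools import permutations
from math import sqrt


def sign(x):
    return (x > 0) - (x < 0)


def tau(R):
    """Kendall's tau from the sign-sum formula."""
    n = len(R)
    return sum(sign(i - j) * sign(R[i] - R[j])
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))


def rho(R):
    """Spearman's rho as a linear rank statistic."""
    n = len(R)
    c = (n + 1) / 2
    return 12 / (n * (n * n - 1)) * sum((i + 1 - c) * (R[i] - c) for i in range(n))


n = 5
vals = [(rho(list(p)), tau(list(p))) for p in permutations(range(1, n + 1))]
m = len(vals)
cov = sum(r * t for r, t in vals) / m              # E(rho) = E(tau) = 0 under H0
var_r = sum(r * r for r, _ in vals) / m
var_t = sum(t * t for _, t in vals) / m

print(cov / sqrt(var_r * var_t))                   # exact correlation, ~0.9798
print(2 * (n + 1) / sqrt(2 * n * (2 * n + 5)))     # 12 / sqrt(150), ~0.9798
```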
References
[1] J. Hájek, Z. Šidák, "Theory of rank tests", Acad. Press (1967)
[2] M.G. Kendall, "Rank correlation methods", Griffin (1970)
Rank statistic. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Rank_statistic&oldid=18903