Spearman coefficient of rank correlation
A measure of the dependence of two random variables $ X $ and $ Y $, based on the rankings of the $ X _ {i} $'s and $ Y _ {i} $'s in independent pairs of observations $ ( X _ {1} , Y _ {1} ) , \dots , ( X _ {n} , Y _ {n} ) $. If $ R _ {i} $ is the rank of $ Y $ corresponding to that pair $ ( X , Y ) $ for which the rank of $ X $ is equal to $ i $, then the Spearman coefficient of rank correlation is defined by the formula
$$ r _ {s} = \frac{12}{n ( n ^ {2} - 1 ) } \sum _ {i=1} ^ {n} \left ( i - \frac{n + 1}{2} \right ) \left ( R _ {i} - \frac{n + 1}{2} \right ) $$
or, equivalently, by
$$ r _ {s} = 1 - \frac{6 }{n ( n ^ {2} - 1 ) } \sum _ {i=1} ^ { n } d _ {i} ^ {2} , $$
where $ d _ {i} $ is the difference between the ranks of $ X _ {i} $ and $ Y _ {i} $. The value of $ r _ {s} $ lies between $ - 1 $ and $ + 1 $; $ r _ {s} = + 1 $ when the rank sequences completely coincide, i.e. $ i = R _ {i} $, $ i = 1 , \dots , n $; and $ r _ {s} = - 1 $ when the rank sequences are completely opposite, i.e. $ i = ( n + 1 ) - R _ {i} $, $ i = 1 , \dots , n $.
This coefficient, like any other rank statistic, is applied to test the hypothesis of independence of two variables. If the variables are independent, then $ {\mathsf E} r _ {s} = 0 $ and the variance is $ {\mathsf D} r _ {s} = 1 / ( n - 1 ) $. Thus, the deviation of $ r _ {s} $ from zero gives information about the dependence or independence of the variables. To construct the corresponding test one computes the distribution of $ r _ {s} $ for independent variables $ X $ and $ Y $. When $ 4 \leq n \leq 10 $ one can use tables of the exact distribution (see [2], [4]), and when $ n > 10 $ one can use, for example, the fact that as $ n \rightarrow \infty $ the random variable $ \sqrt {n - 1} \, r _ {s} $ is asymptotically standard normal. In the latter case the hypothesis of independence is rejected at significance level $ \alpha $ if $ | r _ {s} | > u _ {1 - \alpha / 2 } / \sqrt {n - 1} $, where $ u _ {1 - \alpha / 2 } $ is the root of the equation $ \Phi ( u) = 1 - \alpha / 2 $ and $ \Phi ( u) $ is the standard normal distribution function.
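The two equivalent formulas and the large-sample test can be illustrated by a minimal Python sketch. It assumes that the observations contain no ties, so that the ranks form a permutation of $ 1 , \dots , n $; the function names (`ranks`, `spearman_rs`, `spearman_rs_centered`, `rejects_independence`) are hypothetical and chosen only for this illustration.

```python
import math
from statistics import NormalDist


def ranks(values):
    """Return 1-based ranks of the values (assumption: no ties)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        r[k] = rank
    return r


def spearman_rs(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), d_i = rank difference."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def spearman_rs_centered(x, y):
    """r_s via the product of ranks centred at (n + 1) / 2."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    c = (n + 1) / 2
    s = sum((a - c) * (b - c) for a, b in zip(rx, ry))
    return 12 * s / (n * (n * n - 1))


def rejects_independence(r_s, n, alpha=0.05):
    """Two-sided test based on the normal approximation (meant for n > 10)."""
    u = NormalDist().inv_cdf(1 - alpha / 2)  # root of Phi(u) = 1 - alpha/2
    return abs(r_s) > u / math.sqrt(n - 1)
```

For samples without ties both functions return the same value up to floating-point rounding. With tied observations the ranks would have to be replaced by mid-ranks, and the two formulas need no longer agree; that case is outside this sketch.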
Under the assumption that $ X $ and $ Y $ have a joint normal distribution with (ordinary) correlation coefficient $ \rho $,
$$ {\mathsf E} r _ {s} \sim \frac{6}{\pi} \arcsin \frac{\rho}{2} $$
as $ n \rightarrow \infty $, and therefore the variable $ 2 \sin ( \pi r _ {s} / 6 ) $ can be used as an estimator for $ \rho $.
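The transformation between $ r _ {s} $ and $ \rho $ under joint normality can be sketched in Python as follows. The function names are hypothetical, and the relation holds only asymptotically, so for a finite sample the result should be read as an approximate estimate.

```python
import math


def expected_rs_large_n(rho):
    """Limiting value of E r_s under joint normality: (6 / pi) * arcsin(rho / 2)."""
    return 6 / math.pi * math.asin(rho / 2)


def rho_from_spearman(r_s):
    """Estimate rho by inverting the limit relation: 2 * sin(pi * r_s / 6)."""
    return 2 * math.sin(math.pi * r_s / 6)
```

The two functions are inverse to each other on $ [ - 1 , 1 ] $, so applying `rho_from_spearman` to `expected_rs_large_n(rho)` recovers `rho` up to rounding.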
The Spearman coefficient of rank correlation is named in honour of the psychologist C. Spearman (1904), who used it in psychological research in place of the ordinary correlation coefficient. The tests based on the Spearman coefficient of rank correlation and on the Kendall coefficient of rank correlation are asymptotically equivalent (when $ n = 2 $, the corresponding rank statistics coincide).
References
[1] | C. Spearman, "The proof and measurement of association between two things" Amer. J. Psychol. , 15 (1904) pp. 72–101 |
[2] | M.G. Kendall, "Rank correlation methods" , Griffin (1962) |
[3] | B.L. van der Waerden, "Mathematische Statistik" , Springer (1957) |
[4] | L.N. Bol'shev, N.V. Smirnov, "Tables of mathematical statistics" , Libr. math. tables , 46 , Nauka (1983) (In Russian) (Processed by L.S. Bark and E.S. Kedrova) |
[a1] | J. Hájek, Z. Šidák, "Theory of rank tests" , Acad. Press (1967) |
[a2] | M. Hollander, D.A. Wolfe, "Nonparametric statistical methods" , Wiley (1973) |
Spearman coefficient of rank correlation. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Spearman_coefficient_of_rank_correlation&oldid=54862