Spearman coefficient of rank correlation
Latest revision as of 09:15, 6 January 2024
A measure of the dependence of two random variables $ X $ and $ Y $, based on the rankings of the $ X _ {i} $'s and $ Y _ {i} $'s in independent pairs of observations $ ( X _ {1} , Y _ {1} ) , \dots , ( X _ {n} , Y _ {n} ) $. If $ R _ {i} $ is the rank of $ Y $ corresponding to that pair $ ( X , Y ) $ for which the rank of $ X $ is equal to $ i $, then the Spearman coefficient of rank correlation is defined by the formula
$$ r _ {s} = \frac{12}{n ( n ^ {2} - 1 ) } \sum _ {i=1} ^ {n} \left ( i - \frac{n + 1}{2} \right ) \left ( R _ {i} - \frac{n + 1}{2} \right ) $$
or, equivalently, by
$$ r _ {s} = 1 - \frac{6 }{n ( n ^ {2} - 1 ) } \sum _ {i=1} ^ { n } d _ {i} ^ {2} , $$
where $ d _ {i} $ is the difference between the ranks of $ X _ {i} $ and $ Y _ {i} $. The value of $ r _ {s} $ lies between $ - 1 $ and $ + 1 $; $ r _ {s} = + 1 $ when the rank sequences completely coincide, i.e. $ R _ {i} = i $, $ i = 1 , \dots , n $; and $ r _ {s} = - 1 $ when the rank sequences are completely opposite, i.e. $ R _ {i} = ( n + 1 ) - i $, $ i = 1 , \dots , n $. This coefficient, like any other rank statistic, is applied to test the hypothesis of independence of two variables. If the variables are independent, then the mean and variance of $ r _ {s} $ are $ {\mathsf E} r _ {s} = 0 $ and $ {\mathsf D} r _ {s} = 1 / ( n - 1 ) $. Thus, the amount of deviation of $ r _ {s} $ from zero gives information about the dependence or independence of the variables. To construct the corresponding test one computes the distribution of $ r _ {s} $ for independent variables $ X $ and $ Y $. When $ 4 \leq n \leq 10 $ one can use tables of the exact distribution (see [2], [4]), and when $ n > 10 $ one can take advantage, for example, of the fact that as $ n \rightarrow \infty $ the random variable $ \sqrt {n - 1} \, r _ {s} $ asymptotically has a standard normal distribution. In the latter case the hypothesis of independence is rejected if $ | r _ {s} | > u _ {1 - \alpha / 2 } / \sqrt {n - 1} $, where $ u _ {1 - \alpha / 2 } $ is the root of the equation $ \Phi ( u) = 1 - \alpha / 2 $ and $ \Phi ( u) $ is the standard normal distribution function.
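As an illustration, the two expressions for $ r _ {s} $ and the large-sample test can be sketched in Python. This is a minimal sketch under stated assumptions, not a reference implementation: the function names (`ranks`, `spearman_product`, `spearman_differences`, `reject_independence`) are hypothetical, the observations are assumed to contain no ties, the ranks are centred at the mean rank $ ( n + 1 ) / 2 $, and the sample data are made up for the example.

```python
import math
from statistics import NormalDist


def ranks(values):
    """Return the ranks 1..n of the values (assumption: no ties)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        result[k] = rank
    return result


def spearman_product(x, y):
    """r_s from the centred product formula; ranks are centred at (n + 1) / 2."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    mean_rank = (n + 1) / 2
    # Summing over pairs is the same sum as over i = 1..n with R_i, reordered.
    total = sum((a - mean_rank) * (b - mean_rank) for a, b in zip(rx, ry))
    return 12 * total / (n * (n * n - 1))


def spearman_differences(x, y):
    """r_s from the rank differences d_i: 1 - 6 * sum(d_i^2) / (n (n^2 - 1))."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def reject_independence(r_s, n, alpha=0.05):
    """Normal-approximation test (intended for n > 10): |r_s| > u / sqrt(n - 1)."""
    u = NormalDist().inv_cdf(1 - alpha / 2)  # root of Phi(u) = 1 - alpha / 2
    return abs(r_s) > u / math.sqrt(n - 1)


# Made-up sample with n = 12 in which neighbouring ranks of y are swapped.
x = list(range(1, 13))
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11]
r = spearman_differences(x, y)
print(r, spearman_product(x, y), reject_independence(r, len(x)))
```

Both functions should agree up to rounding on any tie-free sample; with ties the two formulas are no longer equivalent and a tie-corrected version would be needed.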
Under the assumption that $ X $ and $ Y $ have a joint normal distribution with (ordinary) correlation coefficient $ \rho $,
$$ {\mathsf E} r _ {s} \sim \frac{6}{\pi} \arcsin \frac{\rho}{2} $$
as $ n \rightarrow \infty $, and therefore the variable $ 2 \sin ( \pi r _ {s} / 6 ) $ can be used as an estimator for $ \rho $.
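The estimator is obtained by inverting the limiting relation between $ {\mathsf E} r _ {s} $ and $ \rho $. The following Python lines are a small sketch of that inversion, assuming joint normality as above; the function names `limiting_spearman` and `rho_from_spearman` are hypothetical and chosen only for this illustration.

```python
import math


def limiting_spearman(rho):
    """Limiting value of E r_s under joint normality: (6 / pi) * arcsin(rho / 2)."""
    return 6 / math.pi * math.asin(rho / 2)


def rho_from_spearman(r_s):
    """Estimate rho from r_s by inverting the relation: 2 * sin(pi * r_s / 6)."""
    return 2 * math.sin(math.pi * r_s / 6)


# Round trip: mapping rho to the limiting r_s and back recovers rho up to rounding.
for rho in (-0.9, -0.3, 0.0, 0.5, 1.0):
    assert math.isclose(rho_from_spearman(limiting_spearman(rho)), rho, abs_tol=1e-12)
```

The map is close to the identity on $ [ - 1 , 1 ] $, so the correction to $ r _ {s} $ is small, and it leaves the values $ - 1 $, $ 0 $ and $ + 1 $ unchanged.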
The Spearman coefficient of rank correlation is named after the psychologist C. Spearman (1904), who used it in psychological research in place of the ordinary correlation coefficient. The tests based on the Spearman coefficient of rank correlation and on the Kendall coefficient of rank correlation are asymptotically equivalent (when $ n = 2 $, the corresponding rank statistics coincide).
References
[1] C. Spearman, "The proof and measurement of association between two things", Amer. J. Psychol., 15 (1904) pp. 72–101
[2] M.G. Kendall, "Rank correlation methods", Griffin (1962)
[3] B.L. van der Waerden, "Mathematische Statistik", Springer (1957)
[4] L.N. Bol'shev, N.V. Smirnov, "Tables of mathematical statistics", Libr. math. tables, 46, Nauka (1983) (In Russian) (Processed by L.S. Bark and E.S. Kedrova)
[a1] J. Hájek, Z. Šidák, "Theory of rank tests", Acad. Press (1967)
[a2] M. Hollander, D.A. Wolfe, "Nonparametric statistical methods", Wiley (1973)
Spearman coefficient of rank correlation. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Spearman_coefficient_of_rank_correlation&oldid=15078