Latest revision as of 18:56, 22 January 2024
Spearman rho
The non-parametric correlation coefficient (or measure of association) known as Spearman's rho was first discussed by the psychologist C. Spearman in 1904 [a4] as a coefficient of correlation on ranks (cf. also Correlation coefficient; Rank statistic). In modern use, the term "correlation" refers to a measure of a linear relationship between variates (such as the Pearson product-moment correlation coefficient), while "measure of association" refers to a measure of a monotone relationship between variates (such as the Kendall tau metric and Spearman's rho). For a historical review of Spearman's rho and related coefficients, see [a2].
Spearman's rho, denoted $r _ { S }$, is computed by applying the Pearson product-moment correlation coefficient procedure to the ranks associated with a sample $\{ ( x _ { i } , y _ { i } ) \} _ { i = 1 } ^ { n }$. Let $R _ { i } = \operatorname { rank } ( x _ { i } )$ and $S _ { i } = \operatorname { rank } ( y _ { i } )$; then computing the sample (Pearson) correlation coefficient $r$ for $\{ ( R _ { i } , S _ { i } ) \} _ { i = 1 } ^ { n }$ yields
\begin{equation*} \begin{aligned} r_{S} &= \frac{\sum_{i=1}^{n} (R_{i} - \overline{R})(S_{i} - \overline{S})}{\sqrt{\sum_{i=1}^{n} (R_{i} - \overline{R})^{2} \cdot \sum_{i=1}^{n} (S_{i} - \overline{S})^{2}}} \\ &= 1 - \frac{6 \sum_{i=1}^{n} (R_{i} - S_{i})^{2}}{n(n^{2} - 1)}, \end{aligned} \end{equation*}
where $\overline { R } = \sum _ { i = 1 } ^ { n } R _ { i } / n = ( n + 1 ) / 2 = \sum _ { i = 1 } ^ { n } S _ { i } / n = \overline { S }$. When ties exist in the data, the following adjusted formula for $r _ { S }$ is used:
\begin{equation*} r_{S} = \frac { n ( n ^ { 2 } - 1 ) - 6 \sum _ { i = 1 } ^ { n } ( R _ { i } - S _ { i } ) ^ { 2 } - 6 ( T + U ) } { \sqrt { n ( n ^ { 2 } - 1 ) - 12 T } \sqrt { n ( n ^ { 2 } - 1 ) - 12 U } }, \end{equation*}
where $T = \sum _ { t } t ( t ^ { 2 } - 1 ) / 12$ for $t$ the number of $X$ observations that are tied at a given rank, and $U = \sum _ { u } u ( u ^ { 2 } - 1 ) / 12$ for $u$ the number of $Y$ observations that are tied at a given rank. For details on the use of $r _ { S }$ in hypothesis testing, and for large-sample theory, see [a1].
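As an illustration of the tie-adjusted formula, the following Python sketch computes $r_S$ in two ways for a small sample with ties: once as the Pearson coefficient of the ranks, and once from the adjusted formula with the terms $T$ and $U$. It assumes that tied observations receive midranks (the average of the positions they occupy), which the text above does not state explicitly; the function names and the sample data are hypothetical.

```python
from collections import Counter
from math import sqrt


def midranks(values):
    """Ranks 1..n; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def pearson(a, b):
    """Sample Pearson product-moment correlation coefficient."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    den = sqrt(sum((p - mean_a) ** 2 for p in a) * sum((q - mean_b) ** 2 for q in b))
    return num / den


def tie_term(values):
    """T (or U): sum of t(t^2 - 1)/12 over groups of t tied observations."""
    return sum(t * (t * t - 1) for t in Counter(values).values()) / 12


def spearman_tie_adjusted(x, y):
    """Tie-adjusted formula for r_S, evaluated on midranks."""
    n = len(x)
    r, s = midranks(x), midranks(y)
    d2 = sum((p - q) ** 2 for p, q in zip(r, s))
    big_t, big_u = tie_term(x), tie_term(y)
    m = n * (n * n - 1)
    return (m - 6 * d2 - 6 * (big_t + big_u)) / (sqrt(m - 12 * big_t) * sqrt(m - 12 * big_u))


# Hypothetical sample with ties in both coordinates.
x = [3, 1, 4, 1, 5, 9, 2, 6]
y = [2, 7, 1, 8, 2, 8, 1, 8]
r_pearson = pearson(midranks(x), midranks(y))
r_adjusted = spearman_tie_adjusted(x, y)
print(r_pearson, r_adjusted)  # the two values agree up to rounding
```

Under the midrank assumption the two computations agree, since the adjustment terms account exactly for the reduced variance of tied ranks.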
If $X$ and $Y$ are random variables (cf. Random variable) with respective distribution functions $F _ { X }$ and $F _{Y}$, then the population parameter estimated by $r _ { S }$, usually denoted $\rho_{ S}$, is defined to be the Pearson product-moment correlation coefficient of the random variables $F _ { X } ( X )$ and $F _ { Y } ( Y )$:
\begin{equation*} \begin{aligned} \rho_{S} &= \operatorname{corr} [ F_{X}(X), F_{Y}(Y) ] \\ &= 12 \mathsf{E} [ F_{X}(X) F_{Y}(Y) ] - 3. \end{aligned} \end{equation*}
Spearman's $\rho_{ S}$ is occasionally referred to as the grade correlation coefficient, since $F _ { X } ( X )$ and $F _ { Y } ( Y )$ are sometimes called the "grades" of $X$ and $Y$.
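The sample analogue of the grade interpretation can be sketched in Python. Here the grades are estimated by the empirical distribution function, so that the grade of an observation is its rank divided by $n$; this estimator, the function names and the data are assumptions made for illustration. Because the Pearson coefficient is unchanged by rescaling, the correlation of the empirical grades coincides with $r_S$.

```python
from math import sqrt


def empirical_grades(values):
    """Empirical distribution function evaluated at each observation."""
    n = len(values)
    return [sum(1 for w in values if w <= v) / n for v in values]


def pearson(a, b):
    """Sample Pearson product-moment correlation coefficient."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    den = sqrt(sum((p - mean_a) ** 2 for p in a) * sum((q - mean_b) ** 2 for q in b))
    return num / den


# Hypothetical sample without ties.
x = [2.1, 0.4, 3.3, 1.7, 5.0]
y = [1.0, 0.2, 2.5, 3.1, 4.4]
n = len(x)

grades_x = empirical_grades(x)  # ranks divided by n
grades_y = empirical_grades(y)
r_grades = pearson(grades_x, grades_y)

# Shortcut formula on the ranks recovered from the grades (rank = n * grade).
d2 = sum((round(n * u) - round(n * v)) ** 2 for u, v in zip(grades_x, grades_y))
r_shortcut = 1 - 6 * d2 / (n * (n * n - 1))
print(r_grades, r_shortcut)  # both are approximately 0.7
```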
Like Kendall's tau, $\rho_{ S}$ is a measure of association based on the notion of concordance. One says that two pairs $( x _ { 1 } , y _ { 1 } )$ and $( x _ { 2 } , y _ { 2 } )$ of real numbers are concordant if $x _ { 1 } < x _ { 2 }$ and $y _ { 1 } < y _ { 2 }$ or if $x _ { 1 } > x _ { 2 }$ and $y _ { 1 } > y _ { 2 }$ (i.e., if $( x _ { 1 } - x _ { 2 } ) ( y _ { 1 } - y _ { 2 } ) > 0$); and discordant if $x _ { 1 } < x _ { 2 }$ and $y _ { 1 } > y _ { 2 }$ or if $x _ { 1 } > x _ { 2 }$ and $y _ { 1 } < y _ { 2 }$ (i.e., if $( x _ { 1 } - x _ { 2 } ) ( y _ { 1 } - y _ { 2 } ) < 0$). Now, let $( X _ { 1 } , Y _ { 1 } )$, $( X _ { 2 } , Y _ { 2 } )$ and $( X _ { 3 } , Y _ { 3 } )$ be independent random vectors with the same distribution as $( X , Y )$. Then
\begin{equation*} \begin{aligned} \rho_{S} &= 3 \mathsf{P} [ (X_{1} - X_{2})(Y_{1} - Y_{3}) > 0 ] \\ &\quad - 3 \mathsf{P} [ (X_{1} - X_{2})(Y_{1} - Y_{3}) < 0 ], \end{aligned} \end{equation*}
that is, $\rho_{ S}$ is proportional to the difference between the probabilities of concordance and discordance between the random vectors $( X _ { 1 } , Y _ { 1 } )$ and $( X _ { 2 } , Y _ { 3 } )$ (clearly, $( X _ { 2 } , Y _ { 3 } )$ can be replaced by $( X _ { 3 } , Y _ { 2 } )$).
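The concordance representation can be checked by exact enumeration in three simple cases. The Python sketch below relies on two assumptions not spelled out above: the signs in the formula depend only on the relative order of the observations, and for independent continuous draws all $3! = 6$ orderings of $(X_1, X_2, X_3)$ are equally likely. The cases considered (with $Y$ an increasing function of $X$, a decreasing function of $X$, or independent of $X$) and all names are chosen for illustration.

```python
from itertools import permutations, product


def rho_from_concordance(cases):
    """3 * (P[concordant] - P[discordant]) over equally likely (x_order, y_order) cases.

    Each case holds the relative ranks of (X1, X2, X3) and of (Y1, Y2, Y3);
    the compared vectors are (X1, Y1) and (X2, Y3).
    """
    concordant = discordant = 0
    for x_order, y_order in cases:
        sign = (x_order[0] - x_order[1]) * (y_order[0] - y_order[2])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return 3 * (concordant - discordant) / len(cases)


orders = list(permutations((1, 2, 3)))

# Y increasing in X: the Y ranks equal the X ranks.
comonotone = [(p, p) for p in orders]
# Y decreasing in X: the Y ranks are the reversed X ranks.
countermonotone = [(p, tuple(4 - r for r in p)) for p in orders]
# Y independent of X: every pair of orderings is equally likely.
independent = list(product(orders, orders))

print(rho_from_concordance(comonotone))       # 1.0
print(rho_from_concordance(countermonotone))  # -1.0
print(rho_from_concordance(independent))      # 0.0
```

In the increasing case, for example, $X_1$ is the smallest or the largest of the three draws with probability $2/3$, so the difference of the two probabilities is $1/3$ and $\rho_S = 1$.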
When $X$ and $Y$ are continuous,
\begin{equation*} \begin{aligned} \rho_{S} &= 12 \int_{0}^{1} \int_{0}^{1} u v \, d C_{X,Y}(u, v) - 3 \\ &= 12 \int_{0}^{1} \int_{0}^{1} [ C_{X,Y}(u, v) - u v ] \, d u \, d v, \end{aligned} \end{equation*}
where $C _ { X , Y }$ is the copula of $X$ and $Y$. Consequently, $\rho_{ S}$ is invariant under strictly increasing transformations of $X$ and $Y$, a property $\rho_{ S}$ shares with Kendall's tau but not with the Pearson product-moment correlation coefficient. Note that $\rho_{ S}$ is proportional to the signed volume between the graphs of the copula $C _ { X , Y } ( u , v )$ and the "product" copula $\Pi ( u , v ) = u v$, the copula of independent random variables. For a survey of copulas and their relationship with measures of association, see [a3].
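The second copula integral lends itself to a direct numerical check. The Python sketch below approximates $12 \int_0^1 \int_0^1 [C(u,v) - uv] \, du \, dv$ with a midpoint rule for four standard copulas that are not discussed above and are introduced here only as examples: $M(u,v) = \min(u,v)$, $W(u,v) = \max(u+v-1, 0)$, the product copula $\Pi$, and the Farlie-Gumbel-Morgenstern family $uv + \theta uv(1-u)(1-v)$, for which $\rho_S = \theta/3$. The grid size and function names are arbitrary choices.

```python
def spearman_rho_from_copula(copula, grid=200):
    """Midpoint-rule approximation of 12 * double integral of C(u, v) - u*v over the unit square."""
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        u = (i + 0.5) * h
        for j in range(grid):
            v = (j + 0.5) * h
            total += copula(u, v) - u * v
    return 12 * total * h * h


def copula_m(u, v):
    """Upper bound copula (perfect positive dependence)."""
    return min(u, v)


def copula_w(u, v):
    """Lower bound copula (perfect negative dependence)."""
    return max(u + v - 1.0, 0.0)


def copula_pi(u, v):
    """Product copula of independent random variables."""
    return u * v


def copula_fgm(theta):
    """Farlie-Gumbel-Morgenstern copula with parameter theta in [-1, 1]."""
    return lambda u, v: u * v + theta * u * v * (1 - u) * (1 - v)


rho_m = spearman_rho_from_copula(copula_m)           # close to 1
rho_w = spearman_rho_from_copula(copula_w)           # close to -1
rho_pi = spearman_rho_from_copula(copula_pi)         # close to 0
rho_fgm = spearman_rho_from_copula(copula_fgm(0.6))  # close to 0.6 / 3 = 0.2
print(rho_m, rho_w, rho_pi, rho_fgm)
```

The values for $M$ and $W$ illustrate the signed-volume interpretation: the graph of $M$ lies above that of $\Pi$ everywhere, and the graph of $W$ lies below it.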
Spearman [a5] also proposed an $L_1$ version of $r _ { S }$, known as Spearman's footrule, based on absolute differences $| R _ { i } - S _ { i } |$ in ranks rather than squared differences:
\begin{equation*} f _ { S } = 1 - \frac { 3 \sum _ { i = 1 } ^ { n } | R _ { i } - S _ { i } | } { n ^ {2} - 1 }. \end{equation*}
The population parameter $\phi_S$ estimated by $f _ { S }$ is given by
\begin{equation*} \begin{aligned} \phi_{S} &= 1 - 3 \int_{0}^{1} \int_{0}^{1} | u - v | \, d C_{X,Y}(u, v) \\ &= 6 \int_{0}^{1} C_{X,Y}(u, u) \, d u - 2. \end{aligned} \end{equation*}
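Both the sample footrule $f_S$ and its population counterpart can be sketched in Python. The sample version below assumes data without ties; the population version approximates $6 \int_0^1 C(u,u) \, du - 2$ with a midpoint rule. The example copulas $\min(u,v)$, $uv$ and $\max(u+v-1,0)$, the function names and the data are illustrative assumptions.

```python
def ranks_without_ties(values):
    """Ranks 1..n for a sample with distinct values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def footrule(x, y):
    """Spearman's footrule f_S = 1 - 3 * sum |R_i - S_i| / (n^2 - 1)."""
    n = len(x)
    r, s = ranks_without_ties(x), ranks_without_ties(y)
    return 1 - 3 * sum(abs(p - q) for p, q in zip(r, s)) / (n * n - 1)


def footrule_population(copula, grid=1000):
    """Midpoint-rule approximation of 6 * integral of C(u, u) du - 2."""
    h = 1.0 / grid
    diagonal = sum(copula((i + 0.5) * h, (i + 0.5) * h) for i in range(grid)) * h
    return 6 * diagonal - 2


f_same = footrule([1, 2, 3, 4], [10, 20, 30, 40])      # identical rankings
f_reversed = footrule([1, 2, 3, 4], [40, 30, 20, 10])  # reversed rankings

phi_m = footrule_population(lambda u, v: min(u, v))            # close to 1
phi_pi = footrule_population(lambda u, v: u * v)               # close to 0
phi_w = footrule_population(lambda u, v: max(u + v - 1.0, 0))  # close to -0.5
print(f_same, f_reversed, phi_m, phi_pi, phi_w)
```

Unlike $\rho_S$, the footrule is not symmetric about zero: in this sketch the value for perfectly reversed rankings is $-0.6$ for $n = 4$, and the population value for $\max(u+v-1,0)$ is $-1/2$ rather than $-1$.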
References
[a1] J.D. Gibbons, "Nonparametric methods for quantitative analysis", Holt, Rinehart & Winston (1976)
[a2] W.H. Kruskal, "Ordinal measures of association", J. Amer. Statist. Assoc., 53 (1958) pp. 814–861
[a3] R.B. Nelsen, "An introduction to copulas", Springer (1999)
[a4] C. Spearman, "The proof and measurement of association between two things", Amer. J. Psychol., 15 (1904) pp. 72–101
[a5] C. Spearman, "A footrule for measuring correlation", Brit. J. Psychol., 2 (1906) pp. 89–108
Spearman rho metric. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Spearman_rho_metric&oldid=15466