Spearman rho metric
The non-parametric correlation coefficient (or measure of association) known as Spearman's rho was first discussed by the psychologist C. Spearman in 1904 [a4] as a coefficient of correlation on ranks (cf. also Correlation coefficient; Rank statistic). In modern use, the term "correlation" refers to a measure of a linear relationship between variates (such as the Pearson product-moment correlation coefficient), while "measure of association" refers to a measure of a monotone relationship between variates (such as the Kendall tau metric and Spearman's rho). For an historical review of Spearman's rho and related coefficients, see [a2].
Spearman's rho, denoted $r _ { S }$, is computed by applying the Pearson product-moment correlation coefficient procedure to the ranks associated with a sample $\{ ( x _ { i } , y _ { i } ) \} _ { i = 1 } ^ { n }$. Let $R _ { i } = \operatorname { rank } ( x _ { i } )$ and $S _ { i } = \operatorname { rank } ( y _ { i } )$; then computing the sample (Pearson) correlation coefficient $r$ for $\{ ( R _ { i } , S _ { i } ) \} _ { i = 1 } ^ { n }$ yields
\begin{equation*} r _{S} = \frac { \sum _ { i = 1 } ^ { n } ( R _ { i } - \overline { R } ) ( S _ { i } - \overline{S} ) } { \sqrt { \sum _ { i = 1 } ^ { n } ( R _ { i } - \overline { R } ) ^ { 2 }\cdot \sum _ { i = 1 } ^ { n } ( S _ { i } - \overline { S } ) ^ { 2 } } } = \end{equation*}
\begin{equation*} = 1 - \frac { 6 \sum _ { i = 1 } ^ { n } ( R _ { i } - S _ { i } ) ^ { 2 } } { n ( n ^ { 2 } - 1 ) }, \end{equation*}
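where $\overline { R } = \sum _ { i = 1 } ^ { n } R _ { i } / n = ( n + 1 ) / 2 = \sum _ { i = 1 } ^ { n } S _ { i } / n = \overline { S }$. The second expression for $r _ { S }$ holds when there are no ties among the $x _ { i }$ or among the $y _ { i }$, so that each set of ranks is a permutation of $1 , \dots , n$. When ties exist in the data, the following adjusted formula for $r _ { S }$ is used: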
\begin{equation*} r_{S} = \frac { n ( n ^ { 2 } - 1 ) - 6 \sum _ { i = 1 } ^ { n } ( R _ { i } - S _ { i } ) ^ { 2 } - 6 ( T + U ) } { \sqrt { n ( n ^ { 2 } - 1 ) - 12 T } \sqrt { n ( n ^ { 2 } - 1 ) - 12 U } }, \end{equation*}
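where $T = \sum _ { t } t ( t ^ { 2 } - 1 ) / 12$, the sum being taken over the groups of tied $X$ observations with $t$ the number of observations in a group, and $U = \sum _ { u } u ( u ^ { 2 } - 1 ) / 12$, the sum being taken over the groups of tied $Y$ observations with $u$ the number of observations in a group. Tied observations are assigned the average of the ranks they would otherwise occupy (midranks). For details on the use of $r _ { S }$ in hypothesis testing, and for large-sample theory, see [a1].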
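The agreement between the rank-based Pearson computation and the tie-adjusted formula can be checked numerically. The following is a minimal sketch, assuming tied observations receive midranks; the function names (`midranks`, `pearson`, `tie_term`, `spearman_tie_adjusted`) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def midranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Sample Pearson product-moment correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den


def tie_term(values):
    """T (or U): sum of t(t^2 - 1)/12 over groups of t tied observations."""
    return sum(t * (t * t - 1) for t in Counter(values).values()) / 12


def spearman_tie_adjusted(x, y):
    """r_S from the tie-adjusted formula; reduces to 1 - 6*sum(d^2)/(n(n^2-1)) without ties."""
    n = len(x)
    r, s = midranks(x), midranks(y)
    d2 = sum((p - q) ** 2 for p, q in zip(r, s))
    t, u = tie_term(x), tie_term(y)
    m = n * (n * n - 1)
    return (m - 6 * d2 - 6 * (t + u)) / (sqrt(m - 12 * t) * sqrt(m - 12 * u))


x = [1, 2, 2, 4, 5]  # one tie among the x values
y = [3, 1, 3, 5, 4]  # one tie among the y values
print(pearson(midranks(x), midranks(y)), spearman_tie_adjusted(x, y))
```

For this sample both computations give $75/114$, and for untied data the adjusted formula reduces to the simpler expression since $T = U = 0$.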
If $X$ and $Y$ are random variables (cf. Random variable) with respective distribution functions $F _ { X }$ and $F _{Y}$, then the population parameter estimated by $r _ { S }$, usually denoted $\rho_{ S}$, is defined to be the Pearson product-moment correlation coefficient of the random variables $F _ { X } ( X )$ and $F _ { Y } ( Y )$:
\begin{equation*} \rho _ { S } = \operatorname { corr } [ F _ { X } ( X ) , F _ { Y } ( Y ) ] = \end{equation*}
\begin{equation*} = 12 \mathsf{E} [ F_{ X} ( X ) F _ { Y } ( Y ) ] - 3. \end{equation*}
Spearman's $\rho_{ S}$ is occasionally referred to as the grade correlation coefficient, since $F _ { X } ( X )$ and $F _ { Y } ( Y )$ are sometimes called the "grades" of $X$ and $Y$.
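The closed form $12 \mathsf{E} [ F _ { X } ( X ) F _ { Y } ( Y ) ] - 3$ given above can be verified in one step, under the assumption that $X$ and $Y$ are continuous, so that the grades $F _ { X } ( X )$ and $F _ { Y } ( Y )$ are each uniformly distributed on $( 0,1 )$ with mean $1 / 2$ and variance $1 / 12$:
\begin{equation*} \rho _ { S } = \frac { \mathsf{E} [ F _ { X } ( X ) F _ { Y } ( Y ) ] - \frac { 1 } { 2 } \cdot \frac { 1 } { 2 } } { \sqrt { \frac { 1 } { 12 } \cdot \frac { 1 } { 12 } } } = 12 \mathsf{E} [ F _ { X } ( X ) F _ { Y } ( Y ) ] - 3. \end{equation*}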
Like Kendall's tau, $\rho_{ S}$ is a measure of association based on the notion of concordance. One says that two pairs $( x _ { 1 } , y _ { 1 } )$ and $( x _ { 2 } , y _ { 2 } )$ of real numbers are concordant if $x _ { 1 } < x _ { 2 }$ and $y _ { 1 } < y _ { 2 }$ or if $x _ { 1 } > x _ { 2 }$ and $y _ { 1 } > y _ { 2 }$ (i.e., if $( x _ { 1 } - x _ { 2 } ) ( y _ { 1 } - y _ { 2 } ) > 0$); and discordant if $x _ { 1 } < x _ { 2 }$ and $y _ { 1 } > y _ { 2 }$ or if $x _ { 1 } > x _ { 2 }$ and $y _ { 1 } < y _ { 2 }$ (i.e., if $( x _ { 1 } - x _ { 2 } ) ( y _ { 1 } - y _ { 2 } ) < 0$). Now, let $( X _ { 1 } , Y _ { 1 } )$, $( X _ { 2 } , Y _ { 2 } )$ and $( X _ { 3 } , Y _ { 3 } )$ be independent random vectors with the same distribution as $( X , Y )$. Then
\begin{equation*} \rho _ { S } = 3 \left( \mathsf{P} [ ( X _ { 1 } - X _ { 2 } ) ( Y _ { 1 } - Y _ { 3 } ) > 0 ] - \mathsf{P} [ ( X _ { 1 } - X _ { 2 } ) ( Y _ { 1 } - Y _ { 3 } ) < 0 ] \right), \end{equation*}
that is, $\rho_{ S}$ is proportional to the difference between the probabilities of concordance and discordance between the random vectors $( X _ { 1 } , Y _ { 1 } )$ and $( X _ { 2 } , Y _ { 3 } )$ (clearly, $( X _ { 2 } , Y _ { 3 } )$ can be replaced by $( X _ { 3 } , Y _ { 2 } )$).
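A sample analogue of this concordance representation can be sketched by replacing the distribution of $( X , Y )$ with the empirical distribution of an untied sample and enumerating all $n ^ { 3 }$ index triples. Under that assumption (which is an illustration, not part of the original text), a short calculation with ranks shows that the resulting plug-in value equals $r _ { S } ( n ^ { 2 } - 1 ) / n ^ { 2 }$. The function names below are hypothetical.

```python
from itertools import product


def sign(a):
    """Return -1, 0 or 1 according to the sign of a."""
    return (a > 0) - (a < 0)


def triple_concordance_estimate(x, y):
    """3 * (P[concordant] - P[discordant]) for (X1, Y1) versus (X2, Y3),
    with X1, X2, X3 drawn independently from the empirical distribution."""
    n = len(x)
    total = 0
    for i, j, k in product(range(n), repeat=3):
        total += sign((x[i] - x[j]) * (y[i] - y[k]))
    return 3 * total / n ** 3


def spearman_no_ties(x, y):
    """r_S = 1 - 6 * sum(d^2) / (n (n^2 - 1)); assumes no ties in x or in y."""
    n = len(x)
    rx = [1 + sum(v < w for v in x) for w in x]  # rank = 1 + number of smaller values
    ry = [1 + sum(v < w for v in y) for w in y]
    d2 = sum((p - q) ** 2 for p, q in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


x = [10, 20, 30, 40, 50]
y = [2, 1, 4, 3, 5]
n = len(x)
print(triple_concordance_estimate(x, y))            # plug-in concordance value
print(spearman_no_ties(x, y) * (n * n - 1) / n**2)  # r_S scaled by (n^2 - 1)/n^2
```

The factor $( n ^ { 2 } - 1 ) / n ^ { 2 }$ arises because the empirical distribution is discrete, so triples with repeated indices contribute zero; it tends to $1$ as $n$ grows.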
When $X$ and $Y$ are continuous,
\begin{equation*} \rho _ { S } = 12 \int _ { 0 } ^ { 1 } \int _ { 0 } ^ { 1 } u v d C _ { X , Y } ( u , v ) - 3 = \end{equation*}
\begin{equation*} = 12 \int _ { 0 } ^ { 1 } \int _ { 0 } ^ { 1 } [ C _ { X , Y } ( u , v ) - u v ] d u d v, \end{equation*}
where $C _ { X , Y }$ is the copula of $X$ and $Y$. Consequently, $\rho_{ S}$ is invariant under strictly increasing transformations of $X$ and $Y$, a property $\rho_{ S}$ shares with Kendall's tau but not with the Pearson product-moment correlation coefficient. Note that $\rho_{ S}$ is proportional to the signed volume between the graphs of the copula $C _ { X , Y } ( u , v )$ and the "product" copula $\Pi ( u , v ) = u v$, the copula of independent random variables. For a survey of copulas and their relationship with measures of association, see [a3].
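As a worked check of the volume interpretation, consider the two extreme copulas $M ( u , v ) = \operatorname { min } ( u , v )$ and $W ( u , v ) = \operatorname { max } ( u + v - 1,0 )$ (the symbols $M$ and $W$ are introduced here for illustration and are not used elsewhere in this article). Since $\int _ { 0 } ^ { 1 } \int _ { 0 } ^ { 1 } u v \, d u \, d v = 1 / 4$, the second integral formula gives
\begin{equation*} 12 \int _ { 0 } ^ { 1 } \int _ { 0 } ^ { 1 } [ \operatorname { min } ( u , v ) - u v ] \, d u \, d v = 12 \left( \frac { 1 } { 3 } - \frac { 1 } { 4 } \right) = 1, \end{equation*}
\begin{equation*} 12 \int _ { 0 } ^ { 1 } \int _ { 0 } ^ { 1 } [ \operatorname { max } ( u + v - 1,0 ) - u v ] \, d u \, d v = 12 \left( \frac { 1 } { 6 } - \frac { 1 } { 4 } \right) = - 1, \end{equation*}
so $\rho_{ S}$ takes the values $1$ and $- 1$ for these two copulas, and $0$ for the product copula $\Pi$.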
Spearman [a5] also proposed an $L_1$ version of $r _ { S }$, known as Spearman's footrule, based on absolute differences $| R _ { i } - S _ { i } |$ in ranks rather than squared differences:
\begin{equation*} f _ { S } = 1 - \frac { 3 \sum _ { i = 1 } ^ { n } | R _ { i } - S _ { i } | } { n ^ {2} - 1 }. \end{equation*}
The population parameter $\phi_S$ estimated by $f _ { S }$ is given by
\begin{equation*} \phi _ { S } = 1 - 3 \int _ { 0 } ^ { 1 } \int _ { 0 } ^ { 1 } | u - v | d C _ { X , Y } ( u , v ) = \end{equation*}
\begin{equation*} = 6 \int _ { 0 } ^ { 1 } C _ { X , Y } ( u , u ) d u - 2. \end{equation*}
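The sample footrule $f _ { S }$ is straightforward to compute from ranks. The following is a minimal sketch, assuming a sample with no ties; the function names are hypothetical.

```python
def ranks_no_ties(values):
    """1-based ranks for untied data: 1 + number of strictly smaller values."""
    return [1 + sum(v < w for v in values) for w in values]


def spearman_footrule(x, y):
    """f_S = 1 - 3 * sum(|R_i - S_i|) / (n^2 - 1); assumes no ties."""
    n = len(x)
    r, s = ranks_no_ties(x), ranks_no_ties(y)
    return 1 - 3 * sum(abs(p - q) for p, q in zip(r, s)) / (n * n - 1)


print(spearman_footrule([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # identical rankings
print(spearman_footrule([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # reversed rankings
```

Identical rankings give $f _ { S } = 1$, while for $n = 5$ reversed rankings give $f _ { S } = - 1 / 2$, so unlike $r _ { S }$ the footrule is not symmetric about zero.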
References
[a1] J.D. Gibbons, "Nonparametric methods for quantitative analysis", Holt, Rinehart & Winston (1976)
[a2] W.H. Kruskal, "Ordinal measures of association", J. Amer. Statist. Assoc., 53 (1958) pp. 814–861
[a3] R.B. Nelsen, "An introduction to copulas", Springer (1999)
[a4] C. Spearman, "The proof and measurement of association between two things", Amer. J. Psychol., 15 (1904) pp. 72–101
[a5] C. Spearman, "A footrule for measuring correlation", Brit. J. Psychol., 2 (1906) pp. 89–108
Spearman rho metric. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Spearman_rho_metric&oldid=50006