Spearman rho metric
From Encyclopedia of Mathematics
Revision as of 16:46, 1 July 2020

Spearman rho

The non-parametric correlation coefficient (or measure of association) known as Spearman's rho was first discussed by the psychologist C. Spearman in 1904 [a4] as a coefficient of correlation on ranks (cf. also Correlation coefficient; Rank statistic). In modern use, the term "correlation" refers to a measure of a linear relationship between variates (such as the Pearson product-moment correlation coefficient), while "measure of association" refers to a measure of a monotone relationship between variates (such as the Kendall tau metric and Spearman's rho). For a historical review of Spearman's rho and related coefficients, see [a2].

Spearman's rho, denoted $r _ { S }$, is computed by applying the Pearson product-moment correlation coefficient procedure to the ranks associated with a sample $\{ ( x _ { i } , y _ { i } ) \} _ { i = 1 } ^ { n }$. Let $R _ { i } = \operatorname { rank } ( x _ { i } )$ and $S _ { i } = \operatorname { rank } ( y _ { i } )$; then computing the sample (Pearson) correlation coefficient $r$ for $\{ ( R _ { i } , S _ { i } ) \} _ { i = 1 } ^ { n }$ yields

\begin{equation*} r _{S} = \frac { \sum _ { i = 1 } ^ { n } ( R _ { i } - \overline { R } ) ( S _ { i } - \overline{S} ) } { \sqrt { \sum _ { i = 1 } ^ { n } ( R _ { i } - \overline { R } ) ^ { 2 }\cdot \sum _ { i = 1 } ^ { n } ( S _ { i } - \overline { S } ) ^ { 2 } } } = \end{equation*}

\begin{equation*} = 1 - \frac { 6 \sum _ { i = 1 } ^ { n } ( R _ { i } - S _ { i } ) ^ { 2 } } { n ( n ^ { 2 } - 1 ) }, \end{equation*}

where $\overline { R } = \sum _ { i = 1 } ^ { n } R _ { i } / n = ( n + 1 ) / 2 = \sum _ { i = 1 } ^ { n } S _ { i } / n = \overline { S }$. When ties exist in the data, the following adjusted formula for $r _ { S }$ is used:
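The two equivalent expressions above can be checked against each other numerically. A minimal Python sketch (the sample data are arbitrary, chosen free of ties so that the shortcut formula applies) computes $r_S$ both as the Pearson correlation of the ranks and via the closed form:

```python
from statistics import mean
from math import sqrt

def ranks(values):
    # Ranks 1..n; assumes no ties in the data.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def spearman_rho(xs, ys):
    # Pearson product-moment correlation applied to the ranks.
    R, S = ranks(xs), ranks(ys)
    Rb, Sb = mean(R), mean(S)
    num = sum((r - Rb) * (s - Sb) for r, s in zip(R, S))
    den = sqrt(sum((r - Rb) ** 2 for r in R) * sum((s - Sb) ** 2 for s in S))
    return num / den

def spearman_rho_shortcut(xs, ys):
    # Equivalent closed form 1 - 6*sum(d_i^2)/(n(n^2-1)), valid without ties.
    R, S = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((r - s) ** 2 for r, s in zip(R, S))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Arbitrary tie-free sample data for illustration.
xs = [86, 97, 99, 100, 101, 103, 106, 110, 112, 113]
ys = [2, 20, 28, 27, 50, 29, 7, 17, 6, 12]
assert abs(spearman_rho(xs, ys) - spearman_rho_shortcut(xs, ys)) < 1e-12
```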

\begin{equation*} r_{S} = \frac { n ( n ^ { 2 } - 1 ) - 6 \sum _ { i = 1 } ^ { n } ( R _ { i } - S _ { i } ) ^ { 2 } - 6 ( T + U ) } { \sqrt { n ( n ^ { 2 } - 1 ) - 12 T } \sqrt { n ( n ^ { 2 } - 1 ) - 12 U } }, \end{equation*}

where $T = \sum _ { t } t ( t ^ { 2 } - 1 ) / 12$, the sum extending over the groups of tied $X$ observations with $t$ the number of observations tied at a given rank, and $U = \sum _ { u } u ( u ^ { 2 } - 1 ) / 12$ is the analogous quantity for the $Y$ observations. For details on the use of $r _ { S }$ in hypothesis testing, and for large-sample theory, see [a1].
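When tied observations are assigned midranks (the average of the rank positions they occupy), the adjusted formula coincides exactly with the Pearson correlation of the midranks. A minimal Python check, on an arbitrary small sample with ties:

```python
from statistics import mean
from math import sqrt
from collections import Counter

def midranks(values):
    # Midranks: tied observations share the mean of their 1-based rank positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            r[order[k]] = avg
        pos = end + 1
    return r

def tie_correction(values):
    # T = sum over tie groups of t(t^2 - 1)/12, t = size of each group.
    return sum(t * (t * t - 1) / 12 for t in Counter(values).values())

def spearman_rho_ties(xs, ys):
    # Tie-adjusted formula from the text.
    R, S = midranks(xs), midranks(ys)
    n = len(xs)
    d2 = sum((r - s) ** 2 for r, s in zip(R, S))
    T, U = tie_correction(xs), tie_correction(ys)
    num = n * (n * n - 1) - 6 * d2 - 6 * (T + U)
    den = sqrt(n * (n * n - 1) - 12 * T) * sqrt(n * (n * n - 1) - 12 * U)
    return num / den

xs = [1, 2, 2, 3, 5]   # arbitrary sample with one tie group in each variable
ys = [2, 1, 3, 3, 5]
R, S = midranks(xs), midranks(ys)
Rb, Sb = mean(R), mean(S)
pearson = sum((r - Rb) * (s - Sb) for r, s in zip(R, S)) / sqrt(
    sum((r - Rb) ** 2 for r in R) * sum((s - Sb) ** 2 for s in S))
assert abs(spearman_rho_ties(xs, ys) - pearson) < 1e-12
```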

If $X$ and $Y$ are random variables (cf. Random variable) with respective distribution functions $F _ { X }$ and $F _{Y}$, then the population parameter estimated by $r _ { S }$, usually denoted $\rho_{ S}$, is defined to be the Pearson product-moment correlation coefficient of the random variables $F _ { X } ( X )$ and $F _ { Y } ( Y )$:

\begin{equation*} \rho _ { S } = \operatorname { corr } [ F _ { X } ( X ) , F _ { Y } ( Y ) ] = \end{equation*}

\begin{equation*} = 12 \mathsf{E} [ F_{ X} ( X ) F _ { Y } ( Y ) ] - 3. \end{equation*}
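For the bivariate normal distribution with correlation $\rho$, this parameter has the classical closed form $\rho_S = (6/\pi) \operatorname{arcsin}(\rho/2)$, which gives a way to sanity-check the defining expectation by simulation. A Monte Carlo sketch (the sample size, seed, and $\rho = 0.5$ are arbitrary choices):

```python
import random
from math import erf, sqrt, pi, asin

def std_normal_cdf(z):
    # Distribution function of the standard normal via the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

random.seed(1)
rho = 0.5
n = 200_000
acc = 0.0
for _ in range(n):
    # Bivariate normal pair with correlation rho (Cholesky-style construction).
    x = random.gauss(0, 1)
    y = rho * x + sqrt(1 - rho * rho) * random.gauss(0, 1)
    acc += std_normal_cdf(x) * std_normal_cdf(y)

rho_s_mc = 12 * acc / n - 3          # Monte Carlo estimate of 12 E[F(X)F(Y)] - 3
rho_s_exact = (6 / pi) * asin(rho / 2)  # classical bivariate-normal value
assert abs(rho_s_mc - rho_s_exact) < 0.05
```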

Spearman's $\rho_{ S}$ is occasionally referred to as the grade correlation coefficient, since $F _ { X } ( X )$ and $F _ { Y } ( Y )$ are sometimes called the "grades" of $X$ and $Y$.

Like Kendall's tau, $\rho_{ S}$ is a measure of association based on the notion of concordance. One says that two pairs $( x _ { 1 } , y _ { 1 } )$ and $( x _ { 2 } , y _ { 2 } )$ of real numbers are concordant if $x _ { 1 } < x _ { 2 }$ and $y _ { 1 } < y _ { 2 }$ or if $x _ { 1 } > x _ { 2 }$ and $y _ { 1 } > y _ { 2 }$ (i.e., if $( x _ { 1 } - x _ { 2 } ) ( y _ { 1 } - y _ { 2 } ) > 0$); and discordant if $x _ { 1 } < x _ { 2 }$ and $y _ { 1 } > y _ { 2 }$ or if $x _ { 1 } > x _ { 2 }$ and $y _ { 1 } < y _ { 2 }$ (i.e., if $( x _ { 1 } - x _ { 2 } ) ( y _ { 1 } - y _ { 2 } ) < 0$). Now, let $( X _ { 1 } , Y _ { 1 } )$, $( X _ { 2 } , Y _ { 2 } )$ and $( X _ { 3 } , Y _ { 3 } )$ be independent random vectors with the same distribution as $( X , Y )$. Then

\begin{equation*} \rho _ { S } = 3 \mathsf{P} [ ( X _ { 1 } - X _ { 2 } ) ( Y _ { 1 } - Y _ { 3 } ) > 0 ] + \end{equation*}

\begin{equation*} - 3 \mathsf{P} [ ( X _ { 1 } - X _ { 2 } ) ( Y _ { 1 } - Y _ { 3 } ) < 0 ], \end{equation*}

that is, $\rho_{ S}$ is proportional to the difference between the probabilities of concordance and discordance between the random vectors $( X _ { 1 } , Y _ { 1 } )$ and $( X _ { 2 } , Y _ { 3 } )$ (clearly, $( X _ { 2 } , Y _ { 3 } )$ can be replaced by $( X _ { 3 } , Y _ { 2 } )$).
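This probabilistic representation is easy to check by simulation in the extreme case $Y = X$, where $\rho_S = 1$ and the concordance probability $\mathsf{P}[(X_1 - X_2)(Y_1 - Y_3) > 0]$ equals $2/3$. A Monte Carlo sketch (seed and sample size are arbitrary):

```python
import random

random.seed(7)
n = 100_000
conc = disc = 0
for _ in range(n):
    # Independent copies of (X, Y) with Y = X (perfect monotone dependence).
    x1, x2, x3 = (random.random() for _ in range(3))
    y1, y3 = x1, x3
    prod = (x1 - x2) * (y1 - y3)
    if prod > 0:
        conc += 1
    elif prod < 0:
        disc += 1

rho_s_mc = 3 * conc / n - 3 * disc / n
# For Y = X one expects rho_S = 3*(2/3) - 3*(1/3) = 1.
assert abs(rho_s_mc - 1.0) < 0.05
```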

When $X$ and $Y$ are continuous,

\begin{equation*} \rho _ { S } = 12 \int _ { 0 } ^ { 1 } \int _ { 0 } ^ { 1 } u v d C _ { X , Y } ( u , v ) - 3 = \end{equation*}

\begin{equation*} = 12 \int _ { 0 } ^ { 1 } \int _ { 0 } ^ { 1 } [ C _ { X , Y } ( u , v ) - u v ] d u d v, \end{equation*}

where $C _ { X , Y }$ is the copula of $X$ and $Y$. Consequently, $\rho_{ S}$ is invariant under strictly increasing transformations of $X$ and $Y$, a property $\rho_{ S}$ shares with Kendall's tau but not with the Pearson product-moment correlation coefficient. Note that $\rho_{ S}$ is proportional to the signed volume between the graphs of the copula $C _ { X , Y } ( u , v )$ and the "product" copula $\Pi ( u , v ) = u v$, the copula of independent random variables. For a survey of copulas and their relationship with measures of association, see [a3].
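For the Farlie–Gumbel–Morgenstern family $C(u,v) = uv + \theta uv(1-u)(1-v)$, $|\theta| \leq 1$, the second integral evaluates in closed form to $\rho_S = \theta/3$, which gives a convenient numerical test of the copula formula. A midpoint-rule sketch in Python (the grid size is an arbitrary accuracy choice):

```python
def fgm_copula(u, v, theta):
    # Farlie-Gumbel-Morgenstern copula, a standard illustrative family.
    return u * v + theta * u * v * (1 - u) * (1 - v)

def spearman_rho_from_copula(C, m=400):
    # Midpoint-rule approximation of 12 * double integral of [C(u,v) - uv].
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        u = (i + 0.5) * h
        for j in range(m):
            v = (j + 0.5) * h
            total += C(u, v) - u * v
    return 12 * total * h * h

theta = 0.75
rho_s = spearman_rho_from_copula(lambda u, v: fgm_copula(u, v, theta))
assert abs(rho_s - theta / 3) < 1e-3  # FGM copulas satisfy rho_S = theta/3
```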

Spearman [a5] also proposed an $L_1$ version of $r _ { S }$, known as Spearman's footrule, based on absolute differences $| R _ { i } - S _ { i } |$ in ranks rather than squared differences:

\begin{equation*} f _ { S } = 1 - \frac { 3 \sum _ { i = 1 } ^ { n } | R _ { i } - S _ { i } | } { n ^ { 2 } - 1 }. \end{equation*}
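The statistic is immediate to compute from two tie-free rank vectors. A small Python sketch; note that for $n = 5$ the reversed ranking yields $f_S = -1/2$, so the footrule, unlike $r_S$, does not attain $-1$:

```python
def footrule(R, S):
    # Spearman's footrule f_S on two rank vectors of length n (no ties).
    n = len(R)
    return 1 - 3 * sum(abs(r - s) for r, s in zip(R, S)) / (n * n - 1)

identity = [1, 2, 3, 4, 5]
reverse = [5, 4, 3, 2, 1]
assert footrule(identity, identity) == 1.0   # perfect agreement
assert footrule(identity, reverse) == -0.5   # reversal: sum |R_i - S_i| = 12
```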

The population parameter $\phi_S$ estimated by $f _ { S }$ is given by

\begin{equation*} \phi _ { S } = 1 - 3 \int _ { 0 } ^ { 1 } \int _ { 0 } ^ { 1 } | u - v | d C _ { X , Y } ( u , v ) = \end{equation*}

\begin{equation*} = 6 \int _ { 0 } ^ { 1 } C _ { X , Y } ( u , u ) d u - 2. \end{equation*}
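Both integral expressions for $\phi_S$ can be compared numerically for a specific copula. For the Farlie–Gumbel–Morgenstern copula, which has density $c(u,v) = 1 + \theta(1-2u)(1-2v)$, both evaluate to $\theta/5$. A midpoint-rule sketch (grid sizes are arbitrary accuracy choices):

```python
def fgm_copula(u, v, theta):
    # Farlie-Gumbel-Morgenstern copula C(u,v) = uv + theta*uv(1-u)(1-v).
    return u * v + theta * u * v * (1 - u) * (1 - v)

def fgm_density(u, v, theta):
    # Its density dC/(du dv).
    return 1 + theta * (1 - 2 * u) * (1 - 2 * v)

def phi_via_absolute_difference(theta, m=400):
    # 1 - 3 * double integral of |u - v| against dC (midpoint rule).
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        u = (i + 0.5) * h
        for j in range(m):
            v = (j + 0.5) * h
            total += abs(u - v) * fgm_density(u, v, theta)
    return 1 - 3 * total * h * h

def phi_via_diagonal(theta, m=4000):
    # 6 * integral of C(u,u) du - 2 (midpoint rule on [0,1]).
    h = 1.0 / m
    total = sum(fgm_copula((i + 0.5) * h, (i + 0.5) * h, theta) for i in range(m))
    return 6 * total * h - 2

theta = 0.75
assert abs(phi_via_absolute_difference(theta) - phi_via_diagonal(theta)) < 1e-3
assert abs(phi_via_diagonal(theta) - theta / 5) < 1e-3  # FGM value phi_S = theta/5
```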

References

[a1] J.D. Gibbons, "Nonparametric methods for quantitative analysis" , Holt, Rinehart & Winston (1976)
[a2] W.H. Kruskal, "Ordinal measures of association" J. Amer. Statist. Assoc. , 53 (1958) pp. 814–861
[a3] R.B. Nelsen, "An introduction to copulas" , Springer (1999)
[a4] C. Spearman, "The proof and measurement of association between two things" Amer. J. Psychol. , 15 (1904) pp. 72–101
[a5] C. Spearman, "A footrule for measuring correlation" Brit. J. Psychol. , 2 (1906) pp. 89–108
How to Cite This Entry:
Spearman rho metric. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Spearman_rho_metric&oldid=50006
This article was adapted from an original article by R.B. Nelsen (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article