Kendall tau metric

Kendall tau

The non-parametric correlation coefficient (or measure of association) known as Kendall's tau was first discussed by G.T. Fechner and others about 1900, and was rediscovered (independently) by M.G. Kendall in 1938 [a3], [a4]. In modern use, the term "correlation" refers to a measure of a linear relationship between variates (such as the Pearson product-moment correlation coefficient), while "measure of association" refers to a measure of a monotone relationship between variates (such as Kendall's tau and the Spearman rho metric). For a historical review of Kendall's tau and related coefficients, see [a5].

Underlying the definition of Kendall's tau is the notion of concordance. If $(x_j, y_j)$ and $(x_k, y_k)$ are two elements of a sample $\{(x_i, y_i)\}_{i=1}^n$ from a bivariate population, one says that $(x_j, y_j)$ and $(x_k, y_k)$ are concordant if $x_j < x_k$ and $y_j < y_k$ or if $x_j > x_k$ and $y_j > y_k$ (i.e., if $(x_j - x_k)(y_j - y_k) > 0$); and discordant if $x_j < x_k$ and $y_j > y_k$ or if $x_j > x_k$ and $y_j < y_k$ (i.e., if $(x_j - x_k)(y_j - y_k) < 0$). There are $\binom{n}{2}$ distinct pairs of observations in the sample, and each pair (barring ties) is either concordant or discordant. Denoting by $S$ the number $c$ of concordant pairs minus the number $d$ of discordant pairs, Kendall's tau for the sample is defined as

\begin{equation*} \tau_n = \frac{c - d}{c + d} = \frac{S}{\binom{n}{2}} = \frac{2S}{n(n - 1)}. \end{equation*}
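
For illustration (this sketch is not part of the original article), the definition translates directly into code. The following Python function counts concordant and discordant pairs by brute force; it assumes a tie-free sample, and the name kendall_tau is ours.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Sample Kendall tau, tau_n = (c - d) / (n choose 2); assumes no ties."""
    c = d = 0
    for (xj, yj), (xk, yk) in combinations(zip(x, y), 2):
        s = (xj - xk) * (yj - yk)
        if s > 0:
            c += 1  # concordant pair
        elif s < 0:
            d += 1  # discordant pair
    n = len(x)
    return (c - d) / (n * (n - 1) / 2)

# A perfectly monotone sample has every pair concordant, so tau = 1.
print(kendall_tau([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
```

The brute-force count costs $O(n^2)$ comparisons; faster $O(n \log n)$ algorithms based on counting inversions exist, but the quadratic version mirrors the definition most transparently.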

When ties exist in the data, the following adjusted formula is used:

\begin{equation*} \tau_n = \frac{S}{\sqrt{n(n - 1)/2 - T} \sqrt{n(n - 1)/2 - U}}, \end{equation*}

where $T = \sum_t t(t - 1)/2$, with $t$ running over the numbers of $X$ observations tied at each given rank, and $U = \sum_u u(u - 1)/2$, with $u$ running over the numbers of $Y$ observations tied at each given rank. For details on the use of $\tau_n$ in hypothesis testing, and for large-sample theory, see [a2].
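
A minimal sketch of the tie-adjusted formula, again illustrative rather than canonical: tied pairs contribute $0$ to $S$, and $T$ and $U$ are accumulated over the groups of tied values.

```python
from collections import Counter
from itertools import combinations

def kendall_tau_b(x, y):
    """Tie-adjusted sample tau: S / (sqrt(n0 - T) * sqrt(n0 - U))."""
    S = 0
    for (xj, yj), (xk, yk) in combinations(zip(x, y), 2):
        s = (xj - xk) * (yj - yk)
        S += (s > 0) - (s < 0)  # tied pairs contribute 0
    n0 = len(x) * (len(x) - 1) / 2
    T = sum(t * (t - 1) / 2 for t in Counter(x).values())  # ties among the x's
    U = sum(u * (u - 1) / 2 for u in Counter(y).values())  # ties among the y's
    return S / ((n0 - T) ** 0.5 * (n0 - U) ** 0.5)
```

This tie-adjusted coefficient is commonly called $\tau_b$; scipy.stats.kendalltau computes it by default and can serve as a cross-check.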

Note that $\tau_n$ is equal to the probability of concordance minus the probability of discordance for a pair of observations $(x_j, y_j)$ and $(x_k, y_k)$ chosen randomly from the sample $\{(x_i, y_i)\}_{i=1}^n$. The population version $\tau$ of Kendall's tau is defined similarly for random variables $X$ and $Y$ (cf. also Random variable). Let $(X_1, Y_1)$ and $(X_2, Y_2)$ be independent random vectors with the same distribution as $(X, Y)$. Then

\begin{align*} \tau &= \mathsf{P}[(X_1 - X_2)(Y_1 - Y_2) > 0] - \mathsf{P}[(X_1 - X_2)(Y_1 - Y_2) < 0] \\ &= \operatorname{corr}[\operatorname{sign}(X_1 - X_2), \operatorname{sign}(Y_1 - Y_2)]. \end{align*}

Since $\tau$ is the Pearson product-moment correlation coefficient of the random variables $\operatorname{sign}(X_1 - X_2)$ and $\operatorname{sign}(Y_1 - Y_2)$, $\tau$ is sometimes called the difference sign correlation coefficient.
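
Since $\tau$ is a correlation of signs, it can be estimated by simulation. The sketch below is illustrative only: it assumes a bivariate normal population with correlation $\rho = 0.5$, for which the classical value is $\tau = \frac{2}{\pi} \arcsin \rho = \frac{1}{3}$.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.5
cov = [[1.0, rho], [rho, 1.0]]

# Two independent copies of (X, Y), as in the definition of tau.
xy1 = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
xy2 = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)

sx = np.sign(xy1[:, 0] - xy2[:, 0])
sy = np.sign(xy1[:, 1] - xy2[:, 1])

print(np.corrcoef(sx, sy)[0, 1])   # Monte Carlo estimate of tau
print(2 / np.pi * np.arcsin(rho))  # exact value for the bivariate normal: 1/3
```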

When $X$ and $Y$ are continuous,

\begin{equation*} \tau = 4 \int_0^1 \int_0^1 C_{X,Y}(u, v) \, dC_{X,Y}(u, v) - 1, \end{equation*}

where $C_{X,Y}$ is the copula of $X$ and $Y$. Consequently, $\tau$ is invariant under strictly increasing transformations of $X$ and $Y$, a property $\tau$ shares with Spearman's rho, but not with the Pearson product-moment correlation coefficient. For a survey of copulas and their relationship with measures of association, see [a6].
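
As a worked check of this formula: for independent $X$ and $Y$ the copula is $\Pi(u, v) = uv$, so that $dC_{X,Y}(u, v) = du \, dv$ and

\begin{equation*} \tau = 4 \int_0^1 \int_0^1 uv \, du \, dv - 1 = 4 \cdot \frac{1}{4} - 1 = 0, \end{equation*}

as one expects for independent variables.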

Besides Kendall's tau, there are other measures of association based on the notion of concordance, one of which is Blomqvist's coefficient [a1]. Let $\{(x_i, y_i)\}_{i=1}^n$ denote a sample from a continuous bivariate population, and let $\tilde{x}$ and $\tilde{y}$ denote the sample medians (cf. also Median (in statistics)). Divide the $(x, y)$-plane into four quadrants with the lines $x = \tilde{x}$ and $y = \tilde{y}$; and let $n_1$ be the number of sample points belonging to the first or third quadrants, and $n_2$ the number of points belonging to the second or fourth quadrants. If the sample size $n$ is even, the calculation of $n_1$ and $n_2$ is evident. If $n$ is odd, then one or two of the sample points fall on the lines $x = \tilde{x}$ and $y = \tilde{y}$. In the first case one ignores the point; in the second case one assigns one point to the quadrant touched by both points and ignores the other. Then Blomqvist's $q$ is defined as

\begin{equation*} q = \frac{n_1 - n_2}{n_1 + n_2}. \end{equation*}

For details on the use of $q$ in hypothesis testing, and for large-sample theory, see [a1].
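
A minimal sketch of this coefficient, assuming the straightforward case (even $n$, no sample point on either median line); the odd-$n$ conventions described above are omitted, and the name blomqvist_q is ours.

```python
import statistics

def blomqvist_q(x, y):
    """Blomqvist's q = (n1 - n2) / (n1 + n2); assumes no point on the median lines."""
    mx, my = statistics.median(x), statistics.median(y)
    n1 = sum((xi > mx) == (yi > my) for xi, yi in zip(x, y))  # quadrants I and III
    n2 = sum((xi > mx) != (yi > my) for xi, yi in zip(x, y))  # quadrants II and IV
    return (n1 - n2) / (n1 + n2)
```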

The population parameter estimated by $q$, denoted by $\beta$, is defined analogously to Kendall's tau. Denoting by $\tilde{X}$ and $\tilde{Y}$ the population medians of $X$ and $Y$, one has

\begin{align*} \beta &= \mathsf{P}[(X - \tilde{X})(Y - \tilde{Y}) > 0] - \mathsf{P}[(X - \tilde{X})(Y - \tilde{Y}) < 0] \\ &= 4 F_{X,Y}(\tilde{X}, \tilde{Y}) - 1, \end{align*}

where $F_{X,Y}$ denotes the joint distribution function of $X$ and $Y$. Since $\beta$ depends only on the value of $F_{X,Y}$ at the point whose coordinates are the population medians of $X$ and $Y$, it is sometimes called the medial correlation coefficient. When $X$ and $Y$ are continuous,

\begin{equation*} \beta = 4 C_{X,Y} \left( \frac{1}{2}, \frac{1}{2} \right) - 1, \end{equation*}

where $C_{X,Y}$ again denotes the copula of $X$ and $Y$. Thus $\beta$, like $\tau$, is invariant under strictly increasing transformations of $X$ and $Y$.
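
For example, for the comonotone copula $M(u, v) = \min(u, v)$ (the case of $Y$ an increasing function of $X$) this gives $\beta = 4 \cdot \frac{1}{2} - 1 = 1$, while for the independence copula $\Pi(u, v) = uv$ it gives $\beta = 4 \cdot \frac{1}{4} - 1 = 0$.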

References

[a1] N. Blomqvist, "On a measure of dependence between two random variables" Ann. Math. Stat. , 21 (1950) pp. 593–600
[a2] J.D. Gibbons, "Nonparametric methods for quantitative analysis" , Holt, Rinehart & Winston (1976)
[a3] M.G. Kendall, "A new measure of rank correlation" Biometrika , 30 (1938) pp. 81–93
[a4] M.G. Kendall, "Rank correlation methods" , Charles Griffin (1970) (Edition: Fourth)
[a5] W.H. Kruskal, "Ordinal measures of association" J. Amer. Statist. Assoc. , 53 (1958) pp. 814–861
[a6] R.B. Nelsen, "An introduction to copulas" , Springer (1999)
How to Cite This Entry:
Kendall tau metric. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Kendall_tau_metric&oldid=50407
This article was adapted from an original article by R.B. Nelsen (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.