Kendall tau metric
Kendall tau
The non-parametric correlation coefficient (or measure of association) known as Kendall's tau was first discussed by G.T. Fechner and others about 1900, and was rediscovered (independently) by M.G. Kendall in 1938 [a3], [a4]. In modern use, the term "correlation" refers to a measure of a linear relationship between variates (such as the Pearson product-moment correlation coefficient), while "measure of association" refers to a measure of a monotone relationship between variates (such as Kendall's tau and the Spearman rho metric). For a historical review of Kendall's tau and related coefficients, see [a5].
Underlying the definition of Kendall's tau is the notion of concordance. If $( x_j, y_j )$ and $( x _ { k } , y _ { k } )$ are two elements of a sample $\{ ( x _ { i } , y _ { i } ) \} _ { i = 1 } ^ { n }$ from a bivariate population, one says that $( x_j, y_j )$ and $( x _ { k } , y _ { k } )$ are concordant if $x _ { j } < x _ { k }$ and $y _ { j } < y _ { k }$ or if $x _ { j } > x _ { k }$ and $y _ { j } > y _ { k }$ (i.e., if $( x _ { j } - x _ { k } ) ( y _ { j } - y _ { k } ) > 0$); and discordant if $x _ { j } < x _ { k }$ and $y _ { j } > y _ { k }$ or if $x _ { j } > x _ { k }$ and $y _ { j } < y _ { k }$ (i.e., if $( x _ { j } - x _ { k } ) ( y _ { j } - y _ { k } ) < 0$). There are $\left( \begin{array} { l } { n } \\ { 2 } \end{array} \right)$ distinct pairs of observations in the sample, and each pair (barring ties) is either concordant or discordant. Denoting by $S$ the number $c$ of concordant pairs minus the number $d$ of discordant pairs, Kendall's tau for the sample is defined as
\begin{equation*} \tau _ { n } = \frac { c - d } { c + d } = \frac { S } { \left( \begin{array} { l } { n } \\ { 2 } \end{array} \right) } = \frac { 2 S } { n ( n - 1 ) } \end{equation*}
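The definition can be checked by counting pairs directly. The following is a minimal sketch, assuming a sample without ties; the function name `kendall_tau` and the example data are illustrative choices, not taken from the references.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Sample Kendall's tau for paired data without ties (illustrative helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length n >= 2")
    c = d = 0  # numbers of concordant and discordant pairs
    for (xj, yj), (xk, yk) in combinations(zip(xs, ys), 2):
        prod = (xj - xk) * (yj - yk)
        if prod > 0:
            c += 1
        elif prod < 0:
            d += 1
        else:
            raise ValueError("tied observations are not handled by this sketch")
    return (c - d) / (c + d)


# Hypothetical sample with n = 5, so there are 10 distinct pairs.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau(x, y)  # 8 concordant, 2 discordant pairs: (8 - 2) / 10
```

The double loop over all pairs costs $O ( n ^ { 2 } )$ operations, which is adequate for small samples.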
When ties exist in the data, the following adjusted formula is used:
\begin{equation*} \tau _ { n } = \frac { S } { \sqrt { n ( n - 1 ) / 2 - T } \sqrt { n ( n - 1 ) / 2 - U } }, \end{equation*}
where $T = \sum _ { t } t ( t - 1 ) / 2$ for $t$ the number of $X$ observations that are tied at a given rank, and $U = \sum _ { u } u ( u - 1 ) / 2$ for $u$ the number of $Y$ observations that are tied at a given rank. For details on the use of $\tau _ { n }$ in hypothesis testing, and for large-sample theory, see [a2].
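The tie-adjusted formula can be sketched in the same way. This is an illustrative implementation under the assumption that neither variable is constant (otherwise a factor in the denominator vanishes); the name `kendall_tau_ties` and the data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kendall_tau_ties(xs, ys):
    """Tie-adjusted sample Kendall's tau (illustrative helper)."""
    n = len(xs)
    s = 0  # S = (number of concordant pairs) - (number of discordant pairs)
    for (xj, yj), (xk, yk) in combinations(zip(xs, ys), 2):
        prod = (xj - xk) * (yj - yk)
        s += (prod > 0) - (prod < 0)  # a pair tied in x or y contributes 0
    pairs = n * (n - 1) / 2
    # T and U: sums of t(t - 1)/2 over the groups of tied x and tied y values.
    t = sum(m * (m - 1) / 2 for m in Counter(xs).values())
    u = sum(m * (m - 1) / 2 for m in Counter(ys).values())
    return s / (sqrt(pairs - t) * sqrt(pairs - u))


# Hypothetical sample: one tie among the x values, one among the y values.
x = [1, 2, 2, 3]
y = [1, 2, 3, 3]
tau_adjusted = kendall_tau_ties(x, y)  # S = 4, T = U = 1, six pairs in total
```

When there are no ties, $T = U = 0$ and the value agrees with the unadjusted formula.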
Note that $\tau _ { n }$ is equal to the probability of concordance minus the probability of discordance for a pair of observations $( x_j, y_j )$ and $( x _ { k } , y _ { k } )$ chosen randomly from the sample $\{ ( x _ { i } , y _ { i } ) \} _ { i = 1 } ^ { n }$. The population version $\tau$ of Kendall's tau is defined similarly for random variables $X$ and $Y$ (cf. also Random variable). Let $( X _ { 1 } , Y _ { 1 } )$ and $( X _ { 2 } , Y _ { 2 } )$ be independent random vectors with the same distribution as $( X , Y )$. Then
\begin{equation*} \tau = \mathsf{P} [ ( X _ { 1 } - X _ { 2 } ) ( Y _ { 1 } - Y _ { 2 } ) > 0 ] - \mathsf{P} [ ( X _ { 1 } - X _ { 2 } ) ( Y _ { 1 } - Y _ { 2 } ) < 0 ] = \operatorname { corr } [ \operatorname { sign } ( X _ { 1 } - X _ { 2 } ) , \operatorname { sign } ( Y _ { 1 } - Y _ { 2 } ) ]. \end{equation*}
Since $\tau$ is the Pearson product-moment correlation coefficient of the random variables $\operatorname { sign } ( X _ { 1 } - X _ { 2 } )$ and $\operatorname { sign } ( Y _ { 1 } - Y _ { 2 } )$, $\tau$ is sometimes called the difference sign correlation coefficient.
When $X$ and $Y$ are continuous,
\begin{equation*} \tau = 4 \int _ { 0 } ^ { 1 } \int _ { 0 } ^ { 1 } C _ { X , Y } ( u , v ) d C _ { X , Y } ( u , v ) - 1, \end{equation*}
where $C _ { X , Y }$ is the copula of $X$ and $Y$. Consequently, $\tau$ is invariant under strictly increasing transformations of $X$ and $Y$, a property $\tau$ shares with Spearman's rho, but not with the Pearson product-moment correlation coefficient. For a survey of copulas and their relationship with measures of association, see [a6].
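The invariance property can be illustrated numerically for the sample version. In the sketch below, the data and the transformations ($x \mapsto e ^ { x }$ and $y \mapsto y ^ { 3 }$, both strictly increasing) are arbitrary choices made for illustration, and the helper `kendall_tau` assumes a sample without ties.

```python
from itertools import combinations
from math import exp


def kendall_tau(xs, ys):
    """Sample Kendall's tau, assuming no ties (illustrative helper)."""
    s = 0
    pairs = 0
    for (xj, yj), (xk, yk) in combinations(zip(xs, ys), 2):
        prod = (xj - xk) * (yj - yk)
        s += (prod > 0) - (prod < 0)
        pairs += 1
    return s / pairs


# Hypothetical sample without ties.
x = [0.3, -1.2, 2.5, 0.9, 1.7]
y = [1.1, -0.4, 0.8, 2.0, 1.5]

tau_raw = kendall_tau(x, y)
# Strictly increasing maps preserve the sign of every difference x_j - x_k
# and y_j - y_k, so each pair stays concordant or discordant.
tau_transformed = kendall_tau([exp(v) for v in x], [v ** 3 for v in y])
```

Both values coincide because only the signs of the pairwise differences enter the computation.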
Besides Kendall's tau, there are other measures of association based on the notion of concordance, one of which is Blomqvist's coefficient [a1]. Let $\{ ( x _ { i } , y _ { i } ) \} _ { i = 1 } ^ { n }$ denote a sample from a continuous bivariate population, and let $\tilde{x}$ and $\tilde{y}$ denote sample medians (cf. also Median (in statistics)). Divide the $( x , y )$-plane into four quadrants with the lines $x = \tilde { x }$ and $y = \tilde { y }$; and let $n_ 1$ be the number of sample points belonging to the first or third quadrants, and $n_{2}$ the number of points belonging to the second or fourth quadrants. If the sample size $n$ is even, the calculation of $n_ 1$ and $n_{2}$ is evident. If $n$ is odd, then one or two of the sample points fall on the lines $x = \tilde { x }$ and $y = \tilde { y }$. In the first case one ignores the point; in the second case one assigns one point to the quadrant touched by both points and ignores the other. Then Blomqvist's $q$ is defined as
\begin{equation*} q = \frac { n_1 - n_2 } { n_1 + n_2 }. \end{equation*}
For details on the use of $q$ in hypothesis testing, and for large-sample theory, see [a1].
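The quadrant count, including the rule for odd $n$, can be sketched as follows. This is an illustration only: it assumes a sample without ties (as for a continuous population), and the function name `blomqvist_q` and the data are hypothetical.

```python
from statistics import median


def blomqvist_q(xs, ys):
    """Blomqvist's q for a sample without ties (illustrative helper)."""
    x_med, y_med = median(xs), median(ys)
    n1 = n2 = 0
    on_vertical = None    # y-offset of a point lying on the line x = x_med
    on_horizontal = None  # x-offset of a point lying on the line y = y_med
    for x, y in zip(xs, ys):
        dx, dy = x - x_med, y - y_med
        if dx == 0 and dy == 0:
            continue              # one point on both lines: ignore it
        if dx == 0:
            on_vertical = dy
        elif dy == 0:
            on_horizontal = dx
        elif dx * dy > 0:
            n1 += 1               # first or third quadrant
        else:
            n2 += 1               # second or fourth quadrant
    if on_vertical is not None and on_horizontal is not None:
        # Two points on the lines: count one point in the quadrant that
        # both of them touch, and ignore the other.
        if on_vertical * on_horizontal > 0:
            n1 += 1
        else:
            n2 += 1
    return (n1 - n2) / (n1 + n2)


# Even n: no sample point lies on a median line.
q_even = blomqvist_q([1, 2, 3, 4, 5, 6], [2, 1, 4, 6, 3, 5])
# Odd n: the points (3, 5) and (4, 3) lie on the lines x = 3 and y = 3.
q_odd = blomqvist_q([1, 2, 3, 4, 5], [4, 1, 5, 3, 2])
```

In the odd example the two points on the median lines both touch the first quadrant, so one of them is counted in $n_1$.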
The population parameter estimated by $q$, denoted by $\beta$, is defined analogously to Kendall's tau. Denoting by $\tilde{X}$ and $\tilde{Y}$ the population medians of $X$ and $Y$, one has
\begin{equation*} \beta = \mathsf{P} [ ( X - \tilde { X } ) ( Y - \tilde { Y } ) > 0 ] - \mathsf{P} [ ( X - \tilde { X } ) ( Y - \tilde { Y } ) < 0 ] = 4 F _ { X , Y } ( \tilde { X } , \tilde { Y } ) - 1, \end{equation*}
where $F_{ X , Y}$ denotes the joint distribution function of $X$ and $Y$. Since $\beta$ depends only on the value of $F_{ X , Y}$ at the point whose coordinates are the population medians of $X$ and $Y$, it is sometimes called the medial correlation coefficient. When $X$ and $Y$ are continuous,
\begin{equation*} \beta = 4 C _ { X , Y } \left( \frac { 1 } { 2 } , \frac { 1 } { 2 } \right) - 1, \end{equation*}
where $C _ { X , Y }$ again denotes the copula of $X$ and $Y$. Thus $\beta$, like $\tau$, is invariant under strictly increasing transformations of $X$ and $Y$.
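As an illustrative check of the two copula formulas, consider two standard copulas; the symbols $\Pi$ and $M$ are notation introduced here for the example. For independent continuous $X$ and $Y$ the copula is $\Pi ( u , v ) = u v$, so that $d \Pi ( u , v ) = d u \, d v$ and
\begin{equation*} \tau = 4 \int _ { 0 } ^ { 1 } \int _ { 0 } ^ { 1 } u v \, d u \, d v - 1 = 4 \cdot \frac { 1 } { 4 } - 1 = 0 , \quad \beta = 4 \Pi \left( \frac { 1 } { 2 } , \frac { 1 } { 2 } \right) - 1 = 4 \cdot \frac { 1 } { 4 } - 1 = 0 . \end{equation*}
At the other extreme, if $Y$ is almost surely a strictly increasing function of $X$, the copula is $M ( u , v ) = \operatorname { min } ( u , v )$, and
\begin{equation*} \beta = 4 M \left( \frac { 1 } { 2 } , \frac { 1 } { 2 } \right) - 1 = 4 \cdot \frac { 1 } { 2 } - 1 = 1 . \end{equation*}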
References
[a1] N. Blomqvist, "On a measure of dependence between two random variables" Ann. Math. Stat. , 21 (1950) pp. 593–600
[a2] J.D. Gibbons, "Nonparametric methods for quantitative analysis" , Holt, Rinehart & Winston (1976)
[a3] M.G. Kendall, "A new measure of rank correlation" Biometrika , 30 (1938) pp. 81–93
[a4] M.G. Kendall, "Rank correlation methods" , Charles Griffin (1970) (Edition: Fourth)
[a5] W.H. Kruskal, "Ordinal measures of association" J. Amer. Statist. Assoc. , 53 (1958) pp. 814–861
[a6] R.B. Nelsen, "An introduction to copulas" , Springer (1999)
Kendall tau metric. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Kendall_tau_metric&oldid=12869