Multiple-correlation coefficient
A measure of the linear dependence between one random variable and a collection of other random variables. More precisely, if $ ( X _ {1} \dots X _ {k} ) $ is a random vector with values in $ \mathbf R ^ {k} $, then the multiple-correlation coefficient between $ X _ {1} $ and $ X _ {2} \dots X _ {k} $ is defined as the usual correlation coefficient between $ X _ {1} $ and its best linear approximation in terms of $ X _ {2} \dots X _ {k} $, i.e. its linear regression relative to $ X _ {2} \dots X _ {k} $. If the joint distribution is normal, this regression coincides with the conditional expectation $ {\mathsf E} ( X _ {1} \mid X _ {2} \dots X _ {k} ) $.
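The definition can be illustrated on data. The following is a minimal sketch, not a reference implementation: it assumes a finite sample in which expectations are replaced by sample averages, and the helper names (`solve`, `centred`, `pearson`, `multiple_correlation_from_data`) are hypothetical. It fits the least-squares linear combination of the predictors and returns its ordinary correlation with the first variable.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def centred(v):
    """Subtract the sample mean from every observation."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def pearson(u, v):
    """Ordinary correlation of two centred samples (0.0 if either is constant)."""
    suv = sum(x * y for x, y in zip(u, v))
    suu = sum(x * x for x in u)
    svv = sum(y * y for y in v)
    if suu == 0.0 or svv == 0.0:
        return 0.0
    return suv / math.sqrt(suu * svv)


def multiple_correlation_from_data(x1, predictors):
    """Correlation between x1 and its least-squares fit on the predictors.

    x1 is a list of n observations; predictors is a list of k - 1 lists,
    each holding n observations of one of X_2 ... X_k.
    """
    y = centred(x1)
    cols = [centred(p) for p in predictors]
    # Normal equations for the regression coefficients beta_2 ... beta_k.
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = solve(gram, rhs)
    fitted = [sum(bj * c[i] for bj, c in zip(beta, cols)) for i in range(len(y))]
    return pearson(y, fitted)
```

With a single predictor the function returns the absolute value of the ordinary sample correlation, and it returns a value equal to $ 1 $ up to rounding when `x1` is an exact linear function of the predictors.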
The multiple-correlation coefficient has the property that if $ {\mathsf E} X _ {1} = \dots = {\mathsf E} X _ {k} = 0 $
and if
$$ X _ {1} ^ {*} = \ \beta _ {2} X _ {2} + \dots + \beta _ {k} X _ {k} $$
is the regression of $ X _ {1} $ relative to $ X _ {2} \dots X _ {k} $, then among all linear combinations of $ X _ {2} \dots X _ {k} $ the variable $ X _ {1} ^ {*} $ has the largest correlation with $ X _ {1} $. In this sense the multiple-correlation coefficient is a special case of the canonical correlation coefficient (cf. Canonical correlation coefficients). For $ k = 2 $ the multiple-correlation coefficient is the absolute value of the usual correlation coefficient $ \rho _ {12} $ between $ X _ {1} $ and $ X _ {2} $.
The multiple-correlation coefficient between $ X _ {1} $ and $ X _ {2} \dots X _ {k} $ is denoted by $ \rho _ {1 \cdot ( 2 \dots k ) } $ and is expressed in terms of the entries of the correlation matrix $ R = \| \rho _ {ij} \| $, $ i , j = 1 \dots k $, by
$$ \rho _ {1 \cdot ( 2 \dots k ) } ^ {2} = 1 - \frac{| R | }{R _ {11} } , $$
where $ | R | $ is the determinant of $ R $ and $ R _ {11} $ is the cofactor of $ \rho _ {11} = 1 $; here $ 0 \leq \rho _ {1 \cdot ( 2 \dots k) } \leq 1 $. If $ \rho _ {1 \cdot ( 2 \dots k ) } = 1 $, then, with probability $ 1 $, $ X _ {1} $ is equal to a linear combination of $ X _ {2} \dots X _ {k} $, that is, the joint distribution of $ X _ {1} \dots X _ {k} $ is concentrated on a hyperplane in $ \mathbf R ^ {k} $. On the other hand, $ \rho _ {1 \cdot ( 2 \dots k ) } = 0 $ if and only if $ \rho _ {12} = \dots = \rho _ {1k} = 0 $, that is, if $ X _ {1} $ is not correlated with any of $ X _ {2} \dots X _ {k} $.
To calculate the multiple-correlation coefficient one can also use the formula
$$ \rho _ {1 \cdot ( 2 \dots k ) } ^ {2} = 1 - \frac{\sigma _ {1 \cdot ( 2 \dots k ) } ^ {2} }{\sigma _ {1} ^ {2} } , $$
where $ \sigma _ {1} ^ {2} $ is the variance of $ X _ {1} $ and
$$ \sigma _ {1 \cdot ( 2 \dots k ) } ^ {2} = {\mathsf E} [ X _ {1} - ( \beta _ {2} X _ {2} + \dots + \beta _ {k} X _ {k} ) ] ^ {2} $$
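is the residual variance of $ X _ {1} $ about the regression, i.e. the mean-square error of the best linear approximation (with the $ X _ {i} $ centred, as above).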
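The two formulas above can be compared numerically. For $ k = 3 $ the determinant formula reduces to $ \rho _ {1 \cdot ( 23 ) } ^ {2} = ( \rho _ {12} ^ {2} + \rho _ {13} ^ {2} - 2 \rho _ {12} \rho _ {13} \rho _ {23} ) / ( 1 - \rho _ {23} ^ {2} ) $, which gives a convenient check. The sketch below is illustrative only: the function names are hypothetical, the example matrix is made up, and the second function assumes standardized variables (so that $ \sigma _ {1} ^ {2} = 1 $ and the covariance matrix equals $ R $).

```python
def determinant(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return det


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X_2 ... X_k are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def rho2_from_determinant(R):
    """rho^2 = 1 - |R| / R_11, with R_11 the cofactor of the (1,1) entry."""
    minor = [row[1:] for row in R[1:]]
    cofactor_11 = determinant(minor)  # sign (-1)^(1+1) = +1
    if cofactor_11 <= 0.0:
        raise ValueError("X_2 ... X_k are linearly dependent")
    return 1.0 - determinant(R) / cofactor_11


def rho2_from_residual_variance(R):
    """rho^2 = 1 - sigma_res^2 / sigma_1^2 for standardized variables."""
    r = R[0][1:]                      # correlations of X_1 with X_2 ... X_k
    R22 = [row[1:] for row in R[1:]]  # correlation matrix of X_2 ... X_k
    beta = solve(R22, r)              # regression coefficients
    residual_variance = 1.0 - sum(ri * bi for ri, bi in zip(r, beta))
    sigma1_squared = 1.0              # assumption: X_1 is standardized
    return 1.0 - residual_variance / sigma1_squared


# Hypothetical correlation matrix for k = 3.
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.2],
     [0.5, 0.2, 1.0]]
print(rho2_from_determinant(R), rho2_from_residual_variance(R))
```

Both functions should agree up to rounding, and for a $ 2 \times 2 $ matrix the result is $ \rho _ {12} ^ {2} $, in line with the case $ k = 2 $ mentioned above.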
The sample analogue of the multiple-correlation coefficient $ \rho _ {1 \cdot ( 2 \dots k ) } $ is
$$ r _ {1 \cdot ( 2 \dots k ) } = \ \sqrt {1 - \frac{s _ {1 \cdot ( 2 \dots k ) } ^ {2} }{s _ {1} ^ {2} } } , $$
where $ s _ {1 \cdot ( 2 \dots k ) } ^ {2} $ and $ s _ {1} ^ {2} $ are estimators of $ \sigma _ {1 \cdot ( 2 \dots k ) } ^ {2} $ and $ \sigma _ {1} ^ {2} $ based on a sample of size $ n $. The hypothesis $ \rho _ {1 \cdot ( 2 \dots k ) } = 0 $ of no linear relationship between $ X _ {1} $ and $ X _ {2} \dots X _ {k} $ is tested by means of the sampling distribution of $ r _ {1 \cdot ( 2 \dots k) } $. If the sample is taken from a multivariate normal distribution and $ \rho _ {1 \cdot ( 2 \dots k ) } = 0 $, then the variable $ r _ {1 \cdot ( 2 \dots k ) } ^ {2} $ has the beta-distribution with parameters $ ( ( k - 1 ) / 2 , ( n - k ) / 2 ) $, where $ n > k $; if $ \rho _ {1 \cdot ( 2 \dots k ) } \neq 0 $, then the distribution of $ r _ {1 \cdot ( 2 \dots k ) } ^ {2} $ is known, but is somewhat complicated.
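As an illustration of the test, the upper tail probability of $ r _ {1 \cdot ( 2 \dots k ) } ^ {2} $ under the hypothesis $ \rho _ {1 \cdot ( 2 \dots k ) } = 0 $ can be computed from the beta-distribution stated above. The sketch below uses only the Python standard library; the function names are hypothetical, and the regularized incomplete beta function is evaluated by a standard continued-fraction expansion (a choice made here for self-containment, not something prescribed by the article). It assumes a multivariate normal sample with $ n > k \geq 2 $.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b), the Beta(a, b) distribution function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) for faster convergence.
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def null_p_value(r2, n, k):
    """P(r^2 >= r2) when rho = 0, for a normal sample of size n on k variables."""
    if not (k >= 2 and n > k):
        raise ValueError("need n > k >= 2")
    if not (0.0 <= r2 <= 1.0):
        raise ValueError("r2 must lie in [0, 1]")
    a, b = (k - 1) / 2.0, (n - k) / 2.0
    # Upper tail of Beta(a, b) at r2 equals I_{1 - r2}(b, a).
    return reg_inc_beta(b, a, 1.0 - r2)
```

For $ k = 3 $ the first beta parameter equals $ 1 $, and the tail probability has the closed form $ ( 1 - r ^ {2} ) ^ {( n - 3 ) / 2 } $, which can be used to check the routine. A small value of `null_p_value(r2, n, k)` is evidence against the hypothesis of no linear relationship.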
References
[1] H. Cramér, "Mathematical methods of statistics", Princeton Univ. Press (1946)
[2] M.G. Kendall, A. Stuart, "The advanced theory of statistics", 2. Inference and relationship, Griffin (1979)
Comments
For the distribution of $ r _ {1 \cdot ( 2 \dots k ) } ^ {2} $ if $ \rho _ {1 \cdot ( 2 \dots k ) } \neq 0 $ see [a2], Chapt. 10.
References
[a1] T.W. Anderson, "An introduction to multivariate statistical analysis", Wiley (1958)
[a2] M.L. Eaton, "Multivariate statistics: A vector space approach", Wiley (1983)
[a3] R.J. Muirhead, "Aspects of multivariate statistical theory", Wiley (1982)
Multiple-correlation coefficient. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Multiple-correlation_coefficient&oldid=47929