# Multiple-correlation coefficient


A measure of the linear dependence between one random variable and a certain collection of random variables. More precisely, if $( X _ {1} , \dots , X _ {k} )$ is a random vector with values in $\mathbf R ^ {k}$, then the multiple-correlation coefficient between $X _ {1}$ and $X _ {2} , \dots , X _ {k}$ is defined as the usual correlation coefficient between $X _ {1}$ and its best (in the mean-square sense) linear approximation by $X _ {2} , \dots , X _ {k}$, i.e. its regression on $X _ {2} , \dots , X _ {k}$, written ${\mathsf E} ( X _ {1} \mid X _ {2} , \dots , X _ {k} )$ (for a normally distributed vector this regression coincides with the conditional expectation). The multiple-correlation coefficient has the property that if ${\mathsf E} X _ {1} = \dots = {\mathsf E} X _ {k} = 0$ and if

$$X _ {1} ^ {*} = \ \beta _ {2} X _ {2} + \dots + \beta _ {k} X _ {k}$$

is the regression of $X _ {1}$ on $X _ {2} , \dots , X _ {k}$, then among all linear combinations of $X _ {2} , \dots , X _ {k}$ the variable $X _ {1} ^ {*}$ has the largest correlation with $X _ {1}$. In this sense the multiple-correlation coefficient is a special case of the canonical correlation coefficient (cf. Canonical correlation coefficients). For $k = 2$ the multiple-correlation coefficient is the absolute value of the usual correlation coefficient $\rho _ {12}$ between $X _ {1}$ and $X _ {2}$. The multiple-correlation coefficient between $X _ {1}$ and $X _ {2} , \dots , X _ {k}$ is denoted by $\rho _ {1 \cdot ( 2 \dots k ) }$ and is expressed in terms of the entries of the correlation matrix $R = \| \rho _ {ij} \|$, $i , j = 1 , \dots , k$, by

$$\rho _ {1 \cdot ( 2 \dots k ) } ^ {2} = 1 - \frac{| R | }{R _ {11} } ,$$

where $| R |$ is the determinant of $R$ and $R _ {11}$ is the cofactor of $\rho _ {11} = 1$; here $0 \leq \rho _ {1 \cdot ( 2 \dots k) } \leq 1$. If $\rho _ {1 \cdot ( 2 \dots k ) } = 1$, then, with probability $1$, $X _ {1}$ is equal to a linear combination of $X _ {2} , \dots , X _ {k}$, that is, the joint distribution of $X _ {1} , \dots , X _ {k}$ is concentrated on a hyperplane in $\mathbf R ^ {k}$. On the other hand, $\rho _ {1 \cdot ( 2 \dots k ) } = 0$ if and only if $\rho _ {12} = \dots = \rho _ {1k} = 0$, that is, if and only if $X _ {1}$ is uncorrelated with each of $X _ {2} , \dots , X _ {k}$. To calculate the multiple-correlation coefficient one can also use the formula

$$\rho _ {1 \cdot ( 2 \dots k ) } ^ {2} = 1 - \frac{\sigma _ {1 \cdot ( 2 \dots k ) } ^ {2} }{\sigma _ {1} ^ {2} } ,$$

where $\sigma _ {1} ^ {2}$ is the variance of $X _ {1}$ and

$$\sigma _ {1 \cdot ( 2 \dots k ) } ^ {2} = {\mathsf E} [ X _ {1} - ( \beta _ {2} X _ {2} + \dots + \beta _ {k} X _ {k} ) ] ^ {2}$$

is the residual variance of $X _ {1}$ about its regression $X _ {1} ^ {*}$.
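The determinant formula and the regression formula for $\rho _ {1 \cdot ( 2 \dots k ) } ^ {2}$ can be checked against each other numerically. The following is a minimal sketch in plain Python (no external libraries), assuming the covariance matrix of $( X _ {1} , \dots , X _ {k} )$ is known and nonsingular; the function names (`det`, `solve`, `rho2_from_correlation`, `rho2_from_covariance`) and the example matrix are hypothetical illustrations, not part of the original article. The coefficients $\beta _ {2} , \dots , \beta _ {k}$ are taken from the normal equations $\Sigma _ {22} \beta = \sigma _ {21}$, where $\Sigma _ {22}$ is the covariance matrix of $X _ {2} , \dots , X _ {k}$ and $\sigma _ {21}$ the vector of their covariances with $X _ {1}$; then $\sigma _ {1 \cdot ( 2 \dots k ) } ^ {2} = \sigma _ {1} ^ {2} - \beta ^ \top \sigma _ {21}$.

```python
import math


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]  # work on a copy
    n = len(a)
    result = 1.0
    for col in range(n):
        # Use the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def solve(a, b):
    """Solve a x = b (a square, nonsingular) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def rho2_from_correlation(R):
    """1 - |R| / R_11; R_11 is the cofactor of rho_11, i.e. the minor
    obtained by deleting the first row and column (its sign is +)."""
    minor = [list(row[1:]) for row in R[1:]]
    return 1.0 - det(R) / det(minor)


def rho2_from_covariance(S):
    """1 - sigma^2_{1.(2...k)} / sigma_1^2, using the regression X_1^*."""
    s11 = S[0][0]                           # variance of X_1
    s21 = [row[0] for row in S[1:]]         # Cov(X_j, X_1), j = 2..k
    S22 = [list(row[1:]) for row in S[1:]]  # covariances among X_2..X_k
    beta = solve(S22, s21)                  # normal equations: S22 beta = s21
    # E[X_1 - X_1^*]^2 = s11 - beta . s21 once beta solves the normal equations
    resid = s11 - sum(b * c for b, c in zip(beta, s21))
    return 1.0 - resid / s11


def correlation_from_covariance(S):
    sd = [math.sqrt(S[i][i]) for i in range(len(S))]
    return [[S[i][j] / (sd[i] * sd[j]) for j in range(len(S))]
            for i in range(len(S))]


# Hypothetical example covariance matrix (positive definite).
S = [[4.0, 2.0, 1.0],
     [2.0, 3.0, 0.5],
     [1.0, 0.5, 2.0]]
R = correlation_from_covariance(S)
print(rho2_from_correlation(R), rho2_from_covariance(S))  # both ~0.391304 (= 9/23)

# k = 2: the coefficient reduces to |rho_12|.
print(math.sqrt(rho2_from_correlation([[1.0, -0.6], [-0.6, 1.0]])))  # ~0.6
```

For the example matrix both routes give $9/23$. The hand-written solver only keeps the sketch dependency-free; in practice a library linear-algebra routine would normally be used instead.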

The sample analogue of the multiple-correlation coefficient $\rho _ {1 \cdot ( 2 \dots k ) }$ is

$$r _ {1 \cdot ( 2 \dots k ) } = \ \sqrt {1 - \frac{s _ {1 \cdot ( 2 \dots k ) } ^ {2} }{s _ {1} ^ {2} } } ,$$

where $s _ {1 \cdot ( 2 \dots k ) } ^ {2}$ and $s _ {1} ^ {2}$ are estimators of $\sigma _ {1 \cdot ( 2 \dots k ) } ^ {2}$ and $\sigma _ {1} ^ {2}$ based on a sample of size $n$. To test the hypothesis $\rho _ {1 \cdot ( 2 \dots k ) } = 0$ (no linear relationship between $X _ {1}$ and $X _ {2} , \dots , X _ {k}$), one uses the sampling distribution of $r _ {1 \cdot ( 2 \dots k) }$. If the sample is drawn from a multivariate normal distribution and $\rho _ {1 \cdot ( 2 \dots k ) } = 0$, then $r _ {1 \cdot ( 2 \dots k ) } ^ {2}$ has the beta-distribution with parameters $( ( k - 1 ) / 2 , ( n - k ) / 2 )$; if $\rho _ {1 \cdot ( 2 \dots k ) } \neq 0$, the distribution of $r _ {1 \cdot ( 2 \dots k ) } ^ {2}$ is also known, but it is considerably more complicated.
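The sample coefficient and its null distribution can be illustrated with a short plain-Python sketch. It assumes that $s _ {1 \cdot ( 2 \dots k ) } ^ {2}$ and $s _ {1} ^ {2}$ are computed with the same divisor (here $n$, after centring at the sample means), which gives the usual sample $r ^ {2}$ to which the beta-distribution result refers; the helper names `solve`, `sample_r2`, `f_statistic` and `null_mean_check` are hypothetical. Since $r ^ {2}$ has the beta-distribution with parameters $( ( k - 1 ) / 2 , ( n - k ) / 2 )$ under $\rho _ {1 \cdot ( 2 \dots k ) } = 0$, the statistic $F = \frac{r ^ {2} / ( k - 1 ) }{( 1 - r ^ {2} ) / ( n - k ) }$ has the $F$-distribution with $( k - 1 , n - k )$ degrees of freedom, and the mean of $r ^ {2}$ is $( k - 1 ) / ( n - 1 )$; the Monte Carlo check below compares simulated values with that mean.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def sample_r2(rows):
    """Sample r^2_{1.(2...k)} = 1 - s^2_{1.(2...k)} / s_1^2.

    rows: list of n observations, each a list [x1, x2, ..., xk].
    """
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    # Sample covariance matrix with divisor n (it cancels in the ratio).
    S = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
          for j in range(k)] for i in range(k)]
    s21 = [S[j][0] for j in range(1, k)]
    S22 = [row[1:] for row in S[1:]]
    beta = solve(S22, s21)  # least-squares regression coefficients
    resid = S[0][0] - sum(b * c for b, c in zip(beta, s21))
    return 1.0 - resid / S[0][0]


def f_statistic(r2, n, k):
    """F(k - 1, n - k)-distributed under rho = 0 and normality."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


def null_mean_check(n=20, k=3, reps=2000, seed=1):
    """Average r^2 over samples with independent normal columns (rho = 0),
    returned together with the beta mean (k - 1) / (n - 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        rows = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
        total += sample_r2(rows)
    return total / reps, (k - 1) / (n - 1)


simulated, theoretical = null_mean_check()
print(simulated, theoretical)  # simulated should lie close to 2/19
```

An exact linear dependence in the data, such as $x _ {1} = 2 x _ {2} - 3 x _ {3}$ for every observation, makes `sample_r2` return $1$ up to rounding, in line with the case $\rho _ {1 \cdot ( 2 \dots k ) } = 1$.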

How to Cite This Entry:
Multiple-correlation coefficient. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Multiple-correlation_coefficient&oldid=47929
This article was adapted from an original article by A.V. Prokhorov (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.