# Multiple-correlation coefficient

A measure of the linear dependence between one random variable and a collection of other random variables. More precisely, if $( X _ {1} , \dots , X _ {k} )$ is a random vector with values in $\mathbf R ^ {k}$, then the multiple-correlation coefficient between $X _ {1}$ and $X _ {2} , \dots , X _ {k}$ is defined as the usual correlation coefficient between $X _ {1}$ and its best linear approximation ${\mathsf E} ( X _ {1} \mid X _ {2} , \dots , X _ {k} )$ in terms of $X _ {2} , \dots , X _ {k}$, i.e. its regression relative to $X _ {2} , \dots , X _ {k}$. The multiple-correlation coefficient has the property that if ${\mathsf E} X _ {1} = \dots = {\mathsf E} X _ {k} = 0$ and if

$$X _ {1} ^ {*} = \ \beta _ {2} X _ {2} + \dots + \beta _ {k} X _ {k}$$

is the regression of $X _ {1}$ relative to $X _ {2} , \dots , X _ {k}$, then among all linear combinations of $X _ {2} , \dots , X _ {k}$ the variable $X _ {1} ^ {*}$ has the largest correlation with $X _ {1}$. In this sense the multiple-correlation coefficient is a special case of the canonical correlation coefficient (cf. Canonical correlation coefficients). For $k = 2$ the multiple-correlation coefficient is the absolute value of the usual correlation coefficient $\rho _ {12}$ between $X _ {1}$ and $X _ {2}$. The multiple-correlation coefficient between $X _ {1}$ and $X _ {2} , \dots , X _ {k}$ is denoted by $\rho _ {1 \cdot ( 2 \dots k ) }$ and is expressed in terms of the entries of the correlation matrix $R = \| \rho _ {ij} \|$, $i , j = 1 , \dots , k$, by

$$\rho _ {1 \cdot ( 2 \dots k ) } ^ {2} = 1 - \frac{| R | }{R _ {11} } ,$$

where $| R |$ is the determinant of $R$ and $R _ {11}$ is the cofactor of $\rho _ {11} = 1$; here $0 \leq \rho _ {1 \cdot ( 2 \dots k) } \leq 1$. If $\rho _ {1 \cdot ( 2 \dots k ) } = 1$, then, with probability $1$, $X _ {1}$ is equal to a linear combination of $X _ {2} , \dots , X _ {k}$, that is, the joint distribution of $X _ {1} , \dots , X _ {k}$ is concentrated on a hyperplane in $\mathbf R ^ {k}$. On the other hand, $\rho _ {1 \cdot ( 2 \dots k ) } = 0$ if and only if $\rho _ {12} = \dots = \rho _ {1k} = 0$, that is, if $X _ {1}$ is not correlated with any of $X _ {2} , \dots , X _ {k}$. To calculate the multiple-correlation coefficient one can use the formula

$$\rho _ {1 \cdot ( 2 \dots k ) } ^ {2} = 1 - \frac{\sigma _ {1 \cdot ( 2 \dots k ) } ^ {2} }{\sigma _ {1} ^ {2} } ,$$

where $\sigma _ {1} ^ {2}$ is the variance of $X _ {1}$ and

$$\sigma _ {1 \cdot ( 2 \dots k ) } ^ {2} = {\mathsf E} [ X _ {1} - ( \beta _ {2} X _ {2} + \dots + \beta _ {k} X _ {k} ) ] ^ {2}$$

is the residual variance of $X _ {1}$ about the regression.
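The two formulas above can be checked against each other numerically. The following is a minimal sketch in Python with NumPy, not a reference implementation: the function names `multiple_corr_sq_det` and `multiple_corr_sq_regression` and the example matrix `R_example` are illustrative assumptions. The first function evaluates $1 - | R | / R _ {11}$ directly; the second works with standardized variables, for which $\sigma _ {1} ^ {2} = 1$ and the residual variance equals $1 - \sum _ {j} \beta _ {j} \rho _ {1j}$.

```python
import numpy as np


def multiple_corr_sq_det(R):
    """Squared multiple correlation via 1 - |R| / R_11 (hypothetical helper)."""
    R = np.asarray(R, dtype=float)
    # The cofactor of the (1,1) entry is the determinant of R
    # with its first row and first column removed.
    cofactor_11 = np.linalg.det(R[1:, 1:])
    return 1.0 - np.linalg.det(R) / cofactor_11


def multiple_corr_sq_regression(R):
    """Same quantity via the regression of the standardized X_1 (hypothetical helper)."""
    R = np.asarray(R, dtype=float)
    r = R[1:, 0]                          # rho_12, ..., rho_1k
    beta = np.linalg.solve(R[1:, 1:], r)  # standardized regression coefficients
    residual_var = 1.0 - r @ beta         # sigma^2_{1.(2...k)} / sigma_1^2
    return 1.0 - residual_var


# Illustrative correlation matrix for k = 3 (made-up values).
R_example = np.array([[1.0, 0.5, 0.4],
                      [0.5, 1.0, 0.3],
                      [0.4, 0.3, 1.0]])

rho_sq = multiple_corr_sq_det(R_example)
print(rho_sq, multiple_corr_sq_regression(R_example), np.sqrt(rho_sq))
```

Both functions assume that the correlation matrix of $X _ {2} , \dots , X _ {k}$ is non-singular; otherwise $R _ {11} = 0$ and neither expression is defined as written.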

The sample analogue of the multiple-correlation coefficient $\rho _ {1 \cdot ( 2 \dots k ) }$ is

$$r _ {1 \cdot ( 2 \dots k ) } = \ \sqrt {1 - \frac{s _ {1 \cdot ( 2 \dots k ) } ^ {2} }{s _ {1} ^ {2} } } ,$$

where $s _ {1 \cdot ( 2 \dots k ) } ^ {2}$ and $s _ {1} ^ {2}$ are estimators of $\sigma _ {1 \cdot ( 2 \dots k ) } ^ {2}$ and $\sigma _ {1} ^ {2}$ based on a sample of size $n$. To test the hypothesis of no relationship, that is, $\rho _ {1 \cdot ( 2 \dots k ) } = 0$, the sampling distribution of $r _ {1 \cdot ( 2 \dots k) }$ is used. If the sample is taken from a multivariate normal distribution and $\rho _ {1 \cdot ( 2 \dots k ) } = 0$, then the variable $r _ {1 \cdot ( 2 \dots k ) } ^ {2}$ has a beta distribution with parameters $( ( k - 1 ) / 2 , ( n - k ) / 2 )$; if $\rho _ {1 \cdot ( 2 \dots k ) } \neq 0$, then the distribution of $r _ {1 \cdot ( 2 \dots k ) } ^ {2}$ is known, but is more complicated.
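As an illustration of the sample coefficient and its null distribution, here is a minimal sketch in Python with NumPy. It makes several assumptions that are not part of the text above: the helper name `sample_multiple_corr` is hypothetical, the regression is fitted by least squares with an intercept, and both variance estimators use the divisor $n$, so that their ratio is the residual sum of squares over the total sum of squares. The simulation draws independent normal columns, for which $\rho _ {1 \cdot ( 2 \dots k ) } = 0$, and compares the average of $r _ {1 \cdot ( 2 \dots k ) } ^ {2}$ with the mean $( k - 1 ) / ( n - 1 )$ of the beta distribution with parameters $( ( k - 1 ) / 2 , ( n - k ) / 2 )$.

```python
import numpy as np


def sample_multiple_corr(X):
    """Sample multiple correlation of column 0 on the other columns of X (n x k)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    y = X[:, 0]
    # Design matrix with an intercept column, so sample means are removed.
    A = np.column_stack([np.ones(n), X[:, 1:]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    s2_resid = resid @ resid / n                 # estimate of sigma^2_{1.(2...k)}
    s2_total = np.sum((y - y.mean()) ** 2) / n   # estimate of sigma_1^2
    # Clipping guards against tiny negative values caused by rounding.
    return np.sqrt(np.clip(1.0 - s2_resid / s2_total, 0.0, 1.0))


# Monte Carlo check of the null case: independent normal columns, so rho = 0.
rng = np.random.default_rng(0)
n, k, reps = 20, 4, 2000
r2_null = np.array([
    sample_multiple_corr(rng.standard_normal((n, k))) ** 2
    for _ in range(reps)
])

# The simulated mean of r^2 should be close to the beta mean (k - 1) / (n - 1).
print(r2_null.mean(), (k - 1) / (n - 1))
```

The simulated values in `r2_null` can also serve as an empirical reference distribution: the fraction of them that exceed an observed $r _ {1 \cdot ( 2 \dots k ) } ^ {2}$ gives a rough Monte Carlo approximation of the significance level under the normality assumption.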

#### References

[1] H. Cramér, "Mathematical methods of statistics", Princeton Univ. Press (1946)

[2] M.G. Kendall, A. Stuart, "The advanced theory of statistics", 2. Inference and relationship, Griffin (1979)

For the distribution of $r _ {1 \cdot ( 2 \dots k ) } ^ {2}$ if $\rho _ {1 \cdot ( 2 \dots k ) } \neq 0$ see [a2], Chapt. 10.