Multiple-correlation coefficient

From Encyclopedia of Mathematics
Latest revision as of 08:02, 6 June 2020


A measure of the linear dependence between one random variable and a certain collection of random variables. More precisely, if $ ( X _ {1}, \dots, X _ {k} ) $ is a random vector with values in $ \mathbf R ^ {k} $, then the multiple-correlation coefficient between $ X _ {1} $ and $ X _ {2}, \dots, X _ {k} $ is defined as the usual correlation coefficient between $ X _ {1} $ and its best linear approximation $ {\mathsf E} ( X _ {1} \mid X _ {2}, \dots, X _ {k} ) $ relative to $ X _ {2}, \dots, X _ {k} $, i.e. as its regression relative to $ X _ {2}, \dots, X _ {k} $. The multiple-correlation coefficient has the property that if $ {\mathsf E} X _ {1} = \dots = {\mathsf E} X _ {k} = 0 $ and if

$$ X _ {1} ^ {*} = \ \beta _ {2} X _ {2} + \dots + \beta _ {k} X _ {k} $$

is the regression of $ X _ {1} $ relative to $ X _ {2}, \dots, X _ {k} $, then among all linear combinations of $ X _ {2}, \dots, X _ {k} $ the variable $ X _ {1} ^ {*} $ has the largest correlation with $ X _ {1} $. In this sense the multiple-correlation coefficient is a special case of the canonical correlation coefficient (cf. Canonical correlation coefficients). For $ k = 2 $ the multiple-correlation coefficient is the absolute value of the usual correlation coefficient $ \rho _ {12} $ between $ X _ {1} $ and $ X _ {2} $. The multiple-correlation coefficient between $ X _ {1} $ and $ X _ {2}, \dots, X _ {k} $ is denoted by $ \rho _ {1 \cdot ( 2 \dots k ) } $ and is expressed in terms of the entries of the correlation matrix $ R = \| \rho _ {ij} \| $, $ i , j = 1, \dots, k $, by

$$ \rho _ {1 \cdot ( 2 \dots k ) } ^ {2} = 1 - \frac{| R | }{R _ {11} } , $$

where $ | R | $ is the determinant of $ R $ and $ R _ {11} $ is the cofactor of $ \rho _ {11} = 1 $; here $ 0 \leq \rho _ {1 \cdot ( 2 \dots k ) } \leq 1 $. If $ \rho _ {1 \cdot ( 2 \dots k ) } = 1 $, then, with probability $ 1 $, $ X _ {1} $ is equal to a linear combination of $ X _ {2}, \dots, X _ {k} $, that is, the joint distribution of $ X _ {1}, \dots, X _ {k} $ is concentrated on a hyperplane in $ \mathbf R ^ {k} $. On the other hand, $ \rho _ {1 \cdot ( 2 \dots k ) } = 0 $ if and only if $ \rho _ {12} = \dots = \rho _ {1k} = 0 $, that is, if $ X _ {1} $ is not correlated with any of $ X _ {2}, \dots, X _ {k} $. To calculate the multiple-correlation coefficient one can use the formula

$$ \rho _ {1 \cdot ( 2 \dots k ) } ^ {2} = 1 - \frac{\sigma _ {1 \cdot ( 2 \dots k ) } ^ {2} }{\sigma _ {1} ^ {2} } , $$

where $ \sigma _ {1} ^ {2} $ is the variance of $ X _ {1} $ and

$$ \sigma _ {1 \cdot ( 2 \dots k ) } ^ {2} = {\mathsf E} [ X _ {1} - ( \beta _ {2} X _ {2} + \dots + \beta _ {k} X _ {k} ) ] ^ {2} $$

is the variance of $ X _ {1} $ with respect to the regression.
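Both expressions for $ \rho _ {1 \cdot ( 2 \dots k ) } ^ {2} $ can be checked against each other numerically. The sketch below (Python with NumPy; the covariance matrix is a hypothetical example chosen for illustration, not taken from the article) evaluates the determinant/cofactor formula and the residual-variance formula for $ k = 3 $ and confirms that they agree.

```python
import numpy as np

# Hypothetical covariance matrix of (X1, X2, X3); values chosen for illustration.
Sigma = np.array([
    [4.0, 1.2, 0.6],
    [1.2, 1.0, 0.2],
    [0.6, 0.2, 1.0],
])

# Induced correlation matrix R = D^{-1/2} Sigma D^{-1/2}.
d = np.sqrt(np.diag(Sigma))
R = Sigma / np.outer(d, d)

# Formula 1: rho^2 = 1 - |R| / R_11, where the cofactor R_11 of rho_11 = 1
# is the determinant of R with its first row and first column deleted.
R11 = np.linalg.det(R[1:, 1:])
rho_sq_det = 1.0 - np.linalg.det(R) / R11

# Formula 2: rho^2 = 1 - sigma^2_{1.(2 3)} / sigma_1^2, where the residual
# variance comes from the regression coefficients beta solving S22 beta = s12.
sigma1_sq = Sigma[0, 0]
s12 = Sigma[0, 1:]          # covariances of X1 with (X2, X3)
S22 = Sigma[1:, 1:]         # covariance matrix of (X2, X3)
beta = np.linalg.solve(S22, s12)
resid_var = sigma1_sq - s12 @ beta      # E[X1 - (beta_2 X2 + beta_3 X3)]^2
rho_sq_var = 1.0 - resid_var / sigma1_sq

print(rho_sq_det, rho_sq_var)   # both equal 0.39375 for this Sigma
```

For this matrix both routes give $ \rho _ {1 \cdot ( 2 3 ) } ^ {2} = 0.39375 $; the agreement is exact because the two formulas are algebraically equivalent, not an artifact of the chosen numbers.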

The sample analogue of the multiple-correlation coefficient $ \rho _ {1 \cdot ( 2 \dots k ) } $ is

$$ r _ {1 \cdot ( 2 \dots k ) } = \ \sqrt {1 - \frac{s _ {1 \cdot ( 2 \dots k ) } ^ {2} }{s _ {1} ^ {2} } } , $$

where $ s _ {1 \cdot ( 2 \dots k ) } ^ {2} $ and $ s _ {1} ^ {2} $ are estimators of $ \sigma _ {1 \cdot ( 2 \dots k ) } ^ {2} $ and $ \sigma _ {1} ^ {2} $ based on a sample of size $ n $. To test the hypothesis of no relationship, the sampling distribution of $ r _ {1 \cdot ( 2 \dots k ) } $ is used. Given that the sample is taken from a multivariate normal distribution, the variable $ r _ {1 \cdot ( 2 \dots k ) } ^ {2} $ has the beta-distribution with parameters $ ( ( k - 1 ) / 2 , ( n - k ) / 2 ) $ if $ \rho _ {1 \cdot ( 2 \dots k ) } = 0 $; if $ \rho _ {1 \cdot ( 2 \dots k ) } \neq 0 $, then the distribution of $ r _ {1 \cdot ( 2 \dots k ) } ^ {2} $ is known but somewhat complicated.
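The stated null distribution can be checked by simulation. The following sketch (Python with NumPy; the sample size, number of simulations and seed are arbitrary choices) draws independent normal samples, computes $ r _ {1 \cdot ( 2 \dots k ) } ^ {2} $ by least squares, and compares the average of $ r ^ {2} $ with the mean $ ( k - 1 ) / ( n - 1 ) $ of the Beta$ ( ( k - 1 ) / 2 , ( n - k ) / 2 ) $ distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
n_sims = 2000

def sample_r_sq(X):
    """Sample analogue r^2 = 1 - s^2_{1.(2...k)} / s^2_1 via least squares."""
    y, Z = X[:, 0], X[:, 1:]
    A = np.column_stack([np.ones(len(y)), Z])      # regressors plus intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - np.var(resid) / np.var(y)

# Under independence (rho = 0), r^2 follows Beta((k-1)/2, (n-k)/2),
# whose mean is (k-1)/(n-1); the simulated average should be close to it.
r_sq = np.array([sample_r_sq(rng.standard_normal((n, k)))
                 for _ in range(n_sims)])
print(r_sq.mean(), (k - 1) / (n - 1))
```

Note that even under the null hypothesis the expected $ r ^ {2} $ is positive, of order $ ( k - 1 ) / ( n - 1 ) $, which is why a formal test against the beta-distribution is needed rather than simply checking whether $ r ^ {2} $ is small.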

References

[1] H. Cramér, "Mathematical methods of statistics", Princeton Univ. Press (1946)
[2] M.G. Kendall, A. Stuart, "The advanced theory of statistics", 2. Inference and relationship, Griffin (1979)

Comments

For the distribution of $ r _ {1 \cdot ( 2 \dots k ) } ^ {2} $ if $ \rho _ {1 \cdot ( 2 \dots k ) } \neq 0 $ see [a2], Chapt. 10.
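The non-null case can be illustrated by simulation as well. The sketch below (Python with NumPy; the linear model, coefficients and seed are hypothetical choices) generates samples in which $ X _ {1} $ genuinely depends on $ X _ {2} , X _ {3} $, so the distribution of $ r ^ {2} $ centres near the population value $ \rho ^ {2} $ rather than near the null mean $ ( k - 1 ) / ( n - 1 ) $.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, n_sims = 200, 3, 500

# Hypothetical dependent case: X1 = 1.0*X2 + 0.5*X3 + noise with unit variances,
# so rho^2 = explained/total variance = (1 + 0.25) / (1 + 0.25 + 1).
rho_sq_true = 1.25 / 2.25

def sample_r_sq(y, Z):
    """r^2 = 1 - s^2_{1.(2 3)} / s^2_1 from an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 1.0 - np.var(y - A @ coef) / np.var(y)

vals = []
for _ in range(n_sims):
    Z = rng.standard_normal((n, k - 1))
    y = 1.0 * Z[:, 0] + 0.5 * Z[:, 1] + rng.standard_normal(n)
    vals.append(sample_r_sq(y, Z))
vals = np.array(vals)

# When rho != 0 the sampling distribution of r^2 concentrates near rho^2;
# its exact form is the complicated non-null distribution referred to above.
print(vals.mean(), rho_sq_true)
```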

References

[a1] T.W. Anderson, "An introduction to multivariate statistical analysis", Wiley (1958)
[a2] M.L. Eaton, "Multivariate statistics: A vector space approach", Wiley (1983)
[a3] R.J. Muirhead, "Aspects of multivariate statistical theory", Wiley (1982)
How to Cite This Entry:
Multiple-correlation coefficient. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Multiple-correlation_coefficient&oldid=47929
This article was adapted from an original article by A.V. Prokhorov (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article