A numerical characteristic of the joint distribution of two random variables, expressing a relationship between them. The correlation coefficient $ \rho = \rho ( X _ {1} , X _ {2} ) $ for random variables $ X _ {1} $ and $ X _ {2} $ with mathematical expectations $ a _ {1} = {\mathsf E} X _ {1} $ and $ a _ {2} = {\mathsf E} X _ {2} $ and non-zero variances $ \sigma _ {1} ^ {2} = {\mathsf D} X _ {1} $ and $ \sigma _ {2} ^ {2} = {\mathsf D} X _ {2} $ is defined by

$$ \rho ( X _ {1} , X _ {2} ) = \frac{ {\mathsf E} ( X _ {1} - a _ {1} ) ( X _ {2} - a _ {2} ) }{\sigma _ {1} \sigma _ {2} } . $$
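In practice $ \rho $ is estimated by replacing the expectations in this definition with sample means. A minimal numerical sketch in Python (not part of the original article; NumPy, the simulated distribution and all names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a pair with a noisy linear relation: X2 = 0.8*X1 + noise.
x1 = rng.normal(loc=2.0, scale=1.5, size=100_000)
x2 = 0.8 * x1 + rng.normal(scale=1.0, size=100_000)

# Sample analogue of rho = E[(X1 - a1)(X2 - a2)] / (sigma1 * sigma2).
a1, a2 = x1.mean(), x2.mean()
rho = np.mean((x1 - a1) * (x2 - a2)) / (x1.std() * x2.std())

print(rho)                        # ~0.77 for these parameters
print(np.corrcoef(x1, x2)[0, 1])  # library estimate; agrees up to sampling error
```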

The correlation coefficient of $ X _ {1} $ and $ X _ {2} $ is simply the [[Covariance|covariance]] of the normalized variables $ ( X _ {1} - a _ {1} )/ \sigma _ {1} $ and $ ( X _ {2} - a _ {2} )/ \sigma _ {2} $. The correlation coefficient is symmetric with respect to $ X _ {1} $ and $ X _ {2} $ and is invariant under changes of origin and scale. In all cases $ - 1 \leq \rho \leq 1 $. The importance of the correlation coefficient as one of the possible measures of dependence rests on the following properties: 1) if $ X _ {1} $ and $ X _ {2} $ are independent, then $ \rho ( X _ {1} , X _ {2} ) = 0 $ (the converse is not necessarily true); random variables for which $ \rho = 0 $ are said to be non-correlated. 2) $ | \rho | = 1 $ if and only if the dependence between the random variables is linear:

$$ X _ {2} = \rho \frac{\sigma _ {2} }{\sigma _ {1} } ( X _ {1} - a _ {1} ) + a _ {2} . $$
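Property 2) can be checked numerically: an exact affine relation forces the sample coefficient to $ \pm 1 $, with the sign of the slope. A short sketch (illustrative, same assumptions as above):

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=1_000)

# Exact linear dependence X2 = c*X1 + d gives rho = +1 or -1, matching sign(c).
print(np.corrcoef(x1,  3.0 * x1 + 2.0)[0, 1])  # 1.0
print(np.corrcoef(x1, -3.0 * x1 + 2.0)[0, 1])  # -1.0
```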

The difficulty of interpreting $ \rho $ as a measure of dependence is that the equality $ \rho = 0 $ may hold for both independent and dependent random variables; in the general case, a necessary and sufficient condition for independence is that the [[Maximal correlation coefficient|maximal correlation coefficient]] equals zero. Thus, the correlation coefficient does not exhaust all types of dependence between random variables and is a measure of linear dependence only.
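A standard instance of this caveat: if $ X _ {1} $ is symmetric about zero and $ X _ {2} = X _ {1} ^ {2} $, then $ {\mathsf E} ( X _ {1} - a _ {1} ) ( X _ {2} - a _ {2} ) = {\mathsf E} X _ {1} ^ {3} = 0 $, so $ \rho = 0 $ even though $ X _ {2} $ is a function of $ X _ {1} $. A quick check (illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=200_000)  # symmetric about 0, so E[X1^3] = 0
x2 = x1 ** 2                   # completely determined by X1

print(np.corrcoef(x1, x2)[0, 1])  # ~0.0 despite the functional dependence
```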

The degree of the linear dependence measured by $ \rho $ is characterized as follows: the random variable

$$ \widehat{X} _ {2} = \rho \frac{\sigma _ {2} }{\sigma _ {1} } ( X _ {1} - a _ {1} ) + a _ {2} $$

gives a linear representation of $ X _ {2} $ in terms of $ X _ {1} $ which is best in the sense that

$$ {\mathsf E} ( X _ {2} - \widehat{X} _ {2} ) ^ {2} = \min _ {c _ {1} , c _ {2} } {\mathsf E} ( X _ {2} - c _ {1} X _ {1} - c _ {2} ) ^ {2} . $$
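The minimizing coefficients can be verified directly: $ c _ {1} = \rho \sigma _ {2} / \sigma _ {1} $ and $ c _ {2} = a _ {2} - c _ {1} a _ {1} $ attain the minimum, whose value is $ \sigma _ {2} ^ {2} ( 1 - \rho ^ {2} ) $; this is the sense in which $ \rho $ measures the strength of the linear dependence. A numerical sketch under the same illustrative setup as above:

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(loc=2.0, scale=1.5, size=100_000)
x2 = 0.8 * x1 + rng.normal(scale=1.0, size=100_000)

a1, a2 = x1.mean(), x2.mean()
s1, s2 = x1.std(), x2.std()
rho = np.corrcoef(x1, x2)[0, 1]

# Coefficients of the best linear predictor X2_hat = c1*X1 + c2.
c1 = rho * s2 / s1
c2 = a2 - c1 * a1

def mse(b1, b2):
    return np.mean((x2 - b1 * x1 - b2) ** 2)

print(mse(c1, c2))              # minimal mean-square error
print(s2**2 * (1 - rho**2))     # matches the theoretical minimum
print(mse(c1 * 1.1, c2 + 0.3))  # any perturbation increases the error
```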

See also [[Regression|Regression]]. Characteristics of the correlation between several random variables include the [[Partial correlation coefficient|partial correlation coefficient]] and the [[Multiple-correlation coefficient|multiple-correlation coefficient]]. For methods of testing hypotheses of independence and of using correlation coefficients to study correlation, see [[Correlation (in statistics)|Correlation (in statistics)]].

How to Cite This Entry:
Correlation coefficient. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Correlation_coefficient&oldid=46522
This article was adapted from an original article by A.V. Prokhorov (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.