# Correlation coefficient

A numerical characteristic of the joint distribution of two random variables, expressing a relationship between them. The correlation coefficient $ \rho = \rho ( X _ {1} , X _ {2} ) $ for random variables $ X _ {1} $ and $ X _ {2} $ with mathematical expectations $ a _ {1} = {\mathsf E} X _ {1} $ and $ a _ {2} = {\mathsf E} X _ {2} $ and non-zero variances $ \sigma _ {1} ^ {2} = {\mathsf D} X _ {1} $ and $ \sigma _ {2} ^ {2} = {\mathsf D} X _ {2} $ is defined by

$$ \rho ( X _ {1} , X _ {2} ) = \ \frac{ {\mathsf E} ( X _ {1} - a _ {1} ) ( X _ {2} - a _ {2} ) }{\sigma _ {1} \sigma _ {2} } . $$
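As an illustration of the definition, the following minimal sketch computes $ \rho $ for a finite joint distribution. The function name `correlation`, the dictionary representation of the joint probabilities, and the example distribution are all assumptions made for this sketch, not part of the entry.

```python
from fractions import Fraction
from math import sqrt


def correlation(pmf):
    """Correlation coefficient of (X1, X2) for a finite joint distribution.

    pmf maps pairs (x1, x2) to probabilities that sum to 1.
    """
    # Mathematical expectations a1 = E X1 and a2 = E X2.
    a1 = sum(p * x1 for (x1, _), p in pmf.items())
    a2 = sum(p * x2 for (_, x2), p in pmf.items())
    # Variances sigma1^2 = D X1 and sigma2^2 = D X2.
    var1 = sum(p * (x1 - a1) ** 2 for (x1, _), p in pmf.items())
    var2 = sum(p * (x2 - a2) ** 2 for (_, x2), p in pmf.items())
    if var1 == 0 or var2 == 0:
        raise ValueError("rho is defined only for non-zero variances")
    # Numerator of the definition: E (X1 - a1)(X2 - a2).
    cov = sum(p * (x1 - a1) * (x2 - a2) for (x1, x2), p in pmf.items())
    return float(cov) / sqrt(float(var1) * float(var2))


# Hypothetical joint distribution of two 0/1-valued variables.
pmf = {
    (0, 0): Fraction(2, 5),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(2, 5),
}
rho = correlation(pmf)  # approximately 0.6
```

Here both variances equal $ 1/4 $ and the covariance equals $ 3/20 $, so the quotient is $ 3/5 $.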

The correlation coefficient of $ X _ {1} $ and $ X _ {2} $ is simply the covariance of the normalized variables $ ( X _ {1} - a _ {1} )/ \sigma _ {1} $ and $ ( X _ {2} - a _ {2} )/ \sigma _ {2} $. It is symmetric with respect to $ X _ {1} $ and $ X _ {2} $ and is invariant under a change of origin and of scale (rescaling by positive constants; multiplying one of the variables by a negative constant changes the sign of $ \rho $). In all cases $ - 1 \leq \rho \leq 1 $. The importance of the correlation coefficient as one of the possible measures of dependence stems from the following properties:

1) If $ X _ {1} $ and $ X _ {2} $ are independent, then $ \rho ( X _ {1} , X _ {2} ) = 0 $ (the converse is not necessarily true). Random variables for which $ \rho = 0 $ are said to be uncorrelated.

2) $ | \rho | = 1 $ if and only if the dependence between the random variables is linear (with probability 1):

$$ X _ {2} = \ \rho \frac{\sigma _ {2} }{\sigma _ {1} } ( X _ {1} - a _ {1} ) + a _ {2} . $$
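The properties above can be checked numerically on small examples. The sketch below is illustrative only: the helper `rho` and the sample points are assumptions, and each listed pair is treated as equally likely, so the quantities computed are exact characteristics of that finite distribution rather than statistical estimates.

```python
from math import sqrt


def rho(pairs):
    """Correlation coefficient when each listed (x1, x2) pair is equally likely."""
    n = len(pairs)
    a1 = sum(x for x, _ in pairs) / n
    a2 = sum(y for _, y in pairs) / n
    cov = sum((x - a1) * (y - a2) for x, y in pairs) / n
    s1 = sqrt(sum((x - a1) ** 2 for x, _ in pairs) / n)
    s2 = sqrt(sum((y - a2) ** 2 for _, y in pairs) / n)
    return cov / (s1 * s2)


xs = [-1, 0, 1]

# Property 1, converse fails: X2 = X1**2 is a function of X1, yet rho = 0.
uncorrelated = rho([(x, x * x) for x in xs])

# Property 2: an exact linear dependence gives rho = +1 or -1,
# according to the sign of the slope.
increasing = rho([(x, 3 * x + 2) for x in xs])
decreasing = rho([(x, -3 * x + 2) for x in xs])

# Invariance: shifting and positively rescaling either variable leaves rho
# unchanged, while a negative factor on one variable flips its sign.
base = [(0, 1), (1, 3), (2, 2), (3, 5)]
moved = [(10 * x - 7, 0.5 * y + 4) for x, y in base]
flipped = [(-x, y) for x, y in base]
```

In the first example $ X _ {2} $ is completely determined by $ X _ {1} $, which shows that zero correlation is compatible with a strong non-linear dependence.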

The difficulty in interpreting $ \rho $ as a measure of dependence is that the equality $ \rho = 0 $ may hold for both independent and dependent random variables; in the general case, a necessary and sufficient condition for independence is that the maximal correlation coefficient equals zero. Thus the correlation coefficient does not capture all types of dependence between random variables; it measures linear dependence only. The degree of this linear dependence is characterized as follows: the random variable

$$ \widehat{X} _ {2} = \ \rho \frac{\sigma _ {2} }{\sigma _ {1} } ( X _ {1} - a _ {1} ) + a _ {2} $$

gives a linear representation of $ X _ {2} $ in terms of $ X _ {1} $ which is best in the sense that

$$ {\mathsf E} ( X _ {2} - \widehat{X} _ {2} ) ^ {2} = \ \min _ {c _ {1} , c _ {2} } {\mathsf E} ( X _ {2} - c _ {1} X _ {1} - c _ {2} ) ^ {2} ; $$

see also Regression. To characterize correlation among several random variables, one uses the partial correlation coefficient and the multiple-correlation coefficient. For methods of testing hypotheses of independence and of using correlation coefficients to study correlation, see Correlation (in statistics).
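The minimizing property of $ \widehat{X} _ {2} $ stated above can be illustrated with a small numerical check. This is a sketch under assumptions: the data points are hypothetical, each pair is taken as equally likely, and the names `mse`, `c1`, `c2` are introduced only for the example. The final comparison uses the standard identity that the minimal mean-square error equals $ \sigma _ {2} ^ {2} ( 1 - \rho ^ {2} ) $.

```python
from math import sqrt

# Hypothetical finite distribution: each pair (x1, x2) is equally likely.
pairs = [(0, 1), (1, 3), (2, 2), (3, 5)]
n = len(pairs)

a1 = sum(x for x, _ in pairs) / n
a2 = sum(y for _, y in pairs) / n
s1 = sqrt(sum((x - a1) ** 2 for x, _ in pairs) / n)
s2 = sqrt(sum((y - a2) ** 2 for _, y in pairs) / n)
rho = sum((x - a1) * (y - a2) for x, y in pairs) / n / (s1 * s2)

# Coefficients of the linear representation rho * (s2 / s1) * (X1 - a1) + a2,
# rewritten in the form c1 * X1 + c2.
c1 = rho * s2 / s1
c2 = a2 - c1 * a1


def mse(k1, k2):
    """Mean-square error E (X2 - k1 * X1 - k2)^2 for the distribution above."""
    return sum((y - k1 * x - k2) ** 2 for x, y in pairs) / n


best = mse(c1, c2)

# Perturbing either coefficient should never reduce the error.
perturbed = [
    mse(c1 + d1, c2 + d2)
    for d1 in (-0.1, 0.0, 0.1)
    for d2 in (-0.1, 0.0, 0.1)
]
never_better = all(best <= m + 1e-12 for m in perturbed)

# Minimal error predicted by the identity sigma2^2 * (1 - rho^2).
predicted = s2 ** 2 * (1 - rho ** 2)
```

The closer $ | \rho | $ is to 1, the smaller this minimal error is relative to $ \sigma _ {2} ^ {2} $, which is the sense in which $ \rho $ measures the degree of linear dependence.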

**How to Cite This Entry:**

Correlation coefficient.

*Encyclopedia of Mathematics.* URL: http://encyclopediaofmath.org/index.php?title=Correlation_coefficient&oldid=46522