Pearson product-moment correlation coefficient
While the modern theory of correlation and regression has its roots in the work of F. Galton, the version of the product-moment correlation coefficient in current use (2000) is due to K. Pearson [a2]. Pearson's product-moment correlation coefficient $\rho$ is a measure of the strength of a linear relationship between two random variables $X$ and $Y$ (cf. also Random variable) with means $\mu_x=\mathsf{E}(X)$, $\mu_y=\mathsf{E}(Y)$ and finite variances $\sigma^2_x=\text{var}(X)$, $\sigma^2_y=\text{var}(Y)$:
\begin{equation}\rho=\text{corr}(X,Y)=\frac{\text{cov}(X,Y)}{\sigma_x\:\sigma_y},\end{equation}
where $\text{cov}(X,Y)$ is the covariance of $X$ and $Y$,
\begin{equation}\text{cov}(X,Y)=\mathsf{E}[(X-\mu_x)(Y-\mu_y)]=\mathsf{E}(XY)-\mu_x\:\mu_y.\end{equation}
It readily follows that $-1\leq\rho\leq+1$, and that $\rho$ is equal to $-1$ or $+1$ if and only if each of $X$ and $Y$ is almost surely a linear function of the other, i.e., $Y=\alpha+\beta X$ ($\beta\neq0$) with probability $1$ (furthermore, $\rho$ and $\beta$ have the same sign). If $\rho=0$, $X$ and $Y$ are said to be uncorrelated. Independent random variables are always uncorrelated; however, uncorrelated random variables need not be independent (cf. also Independence).
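The last assertion admits a classical example: if $X$ is standard normal and $Y=X^2$, then $\text{cov}(X,Y)=\mathsf{E}(X^3)=0$, so the pair is uncorrelated although $Y$ is a function of $X$. The following sketch (using NumPy; the sample size and seed are arbitrary choices) illustrates both this and the $\rho=\pm1$ case:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)

# Uncorrelated but dependent: cov(X, X^2) = E[X^3] = 0 for X ~ N(0, 1).
y_dep = x**2
print(np.corrcoef(x, y_dep)[0, 1])   # approximately 0

# Exact linear relationship: correlation is +1 or -1 (the sign of beta).
y_lin = 3.0 - 2.0 * x                # beta = -2 < 0
print(np.corrcoef(x, y_lin)[0, 1])   # approximately -1
```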
The term "product-moment" refers to the observation that $\rho=\mu_{11}/\sqrt{\mu_{20}\:\mu_{02}}$, where $\mu_{ij}=\mathsf{E}[(X-\mu_x)^i(Y-\mu_y)^j]$ denotes the $(i,j)$th product moment of $X$ and $Y$ about their means.
The coefficient $\rho$ also plays a role in linear regression (cf. also Regression analysis). If the regression of $Y$ on $X$ is linear, then $y=\mathsf{E}(Y|X=x)=\mu_y+\rho(\sigma_y/\sigma_x)(x-\mu_x)$, and if the regression of $X$ on $Y$ is linear, then $x=\mathsf{E}(X|Y=y)=\mu_x+\rho(\sigma_x/\sigma_y)(y-\mu_y)$. Note that the product of the two slopes is $\rho^2$.
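This relationship between the slopes can be checked numerically; the sketch below (NumPy again, with hypothetical data) estimates the two least-squares slopes and compares their product with $r^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1_000)
y = 0.5 * x + rng.standard_normal(1_000)    # a correlated pair

r = np.corrcoef(x, y)[0, 1]

# Least-squares slope of y on x, and of x on y (ddof=1 matches np.cov).
slope_yx = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
slope_xy = np.cov(x, y)[0, 1] / np.var(y, ddof=1)

print(slope_yx * slope_xy, r**2)            # the two values agree
```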
When $X$ and $Y$ have a bivariate normal distribution (cf. also Normal distribution), $\rho$ is a parameter of the joint density function
\begin{equation}\phi(x,y)=\frac{1}{2\pi\:\sigma_x\:\sigma_y\sqrt{1-\rho^2}}\exp\bigg[\frac{-1}{2(1-\rho^2)}Q\bigg],\qquad-\infty<x,y<\infty,\end{equation}
with
\begin{equation}Q=\bigg(\frac{x-\mu_x}{\sigma_x}\bigg)^2-2\rho\bigg(\frac{x-\mu_x}{\sigma_x}\bigg)\bigg(\frac{y-\mu_y}{\sigma_y}\bigg)+\bigg(\frac{y-\mu_y}{\sigma_y}\bigg)^2.\end{equation}
In contrast to the general situation, uncorrelated random variables with a bivariate normal distribution are independent.
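The following sketch (NumPy, with illustrative values of $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$ and $\rho$) draws from a bivariate normal distribution of this form and confirms that the sample correlation recovers $\rho$:

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.8
mu = [1.0, -2.0]                       # mu_x, mu_y
sx, sy = 2.0, 0.5                      # sigma_x, sigma_y
cov = [[sx**2, rho * sx * sy],
       [rho * sx * sy, sy**2]]         # covariance matrix of (X, Y)

xy = rng.multivariate_normal(mu, cov, size=50_000)
print(np.corrcoef(xy[:, 0], xy[:, 1])[0, 1])   # approximately 0.8
```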
For a random sample $\{(x_i,y_i)\}^n_{i=1}$ from a bivariate population, $\rho$ is estimated by the sample correlation coefficient (cf. also Correlation coefficient) $r$, given by
\begin{equation}r=\frac{\sum^n_{i=1}(x_i-\overline{x})(y_i-\overline{y})}{\sqrt{\sum^n_{i=1}(x_i-\overline{x})^2\sum^n_{i=1}(y_i-\overline{y})^2}}.\end{equation}
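The formula translates directly into code; in the sketch below (NumPy; the function name and the data are illustrative), note that the numerator and the two sums under the root are $n$ times the sample analogues of the product moments $\mu_{11}$, $\mu_{20}$, $\mu_{02}$ introduced above:

```python
import numpy as np

def pearson_r(x, y):
    """Sample correlation coefficient r, computed from the definition."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
print(pearson_r(x, y))                 # matches np.corrcoef(x, y)[0, 1]
```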
If $x$ and $y$ denote, respectively, the vectors $(x_1-\overline{x},...,x_n-\overline{x})$ and $(y_1-\overline{y},...,y_n-\overline{y})$, and $\theta$ denotes the angle between $x$ and $y$, then
\begin{equation}r=\frac{x\cdot y}{|x|\,|y|}=\cos\theta.\end{equation}
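In code, the same identity reads as follows (a sketch reusing the illustrative data from above): centering the observations and taking the cosine of the angle between the resulting vectors reproduces $r$.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

dx, dy = x - x.mean(), y - y.mean()          # centered vectors
cos_theta = dx @ dy / (np.linalg.norm(dx) * np.linalg.norm(dy))
print(cos_theta)                             # equals r from the formula above
```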
Further interpretations of $r$ can be found in [a3]. For details on the use of $r$ in hypothesis testing, and for large-sample theory, see [a1].
References
[a1] O.J. Dunn, V.A. Clark, "Applied statistics: analysis of variance and regression", Wiley (1974)
[a2] K. Pearson, "Mathematical contributions to the theory of evolution. III. Regression, heredity and panmixia", Philos. Trans. Royal Soc. London Ser. A, 187 (1896) pp. 253–318
[a3] J.L. Rodgers, W.A. Nicewander, "Thirteen ways to look at the correlation coefficient", The Amer. Statistician, 42 (1988) pp. 59–65