Pearson product-moment correlation coefficient

From Encyclopedia of Mathematics

While the modern theory of correlation and regression has its roots in the work of F. Galton, the version of the product-moment correlation coefficient in current use (2000) is due to K. Pearson [a2]. Pearson's product-moment correlation coefficient $\rho$ is a measure of the strength of a linear relationship between two random variables $X$ and $Y$ (cf. also Random variable) with means $\mu_x=\mathsf{E}(X)$, $\mu_y=\mathsf{E}(Y)$ and finite, positive variances $\sigma^2_x=\text{var}(X)$, $\sigma^2_y=\text{var}(Y)$:

$$\rho=\frac{\text{cov}(X,Y)}{\sigma_x\sigma_y},$$

where $\text{cov}(X,Y)$ is the covariance of $X$ and $Y$,

$$\text{cov}(X,Y)=\mathsf{E}[(X-\mu_x)(Y-\mu_y)].$$

It readily follows from the Cauchy-Schwarz inequality that $-1\leq\rho\leq+1$, and that $\rho$ is equal to $-1$ or $+1$ if and only if each of $X$ and $Y$ is almost surely a linear function of the other, i.e., $Y=\alpha+\beta X$ ($\beta\neq0$) with probability $1$ (furthermore, $\rho$ and $\beta$ have the same sign). If $\rho=0$, $X$ and $Y$ are said to be uncorrelated. Independent random variables are always uncorrelated; however, uncorrelated random variables need not be independent (cf. also Independence).
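
The last point can be checked on a small example. The sketch below assumes a hypothetical pair that is not taken from the article: $X$ uniform on $\{-1,0,1\}$ and $Y=X^2$. Exact rational arithmetic shows that the covariance vanishes although $Y$ is a function of $X$.

```python
from fractions import Fraction

# Hypothetical example (an assumption for illustration):
# X is uniform on {-1, 0, 1} and Y = X**2.
support = [-1, 0, 1]
p = Fraction(1, 3)  # probability of each value of X

mean_x = sum(p * x for x in support)       # E(X)
mean_y = sum(p * x**2 for x in support)    # E(Y) = E(X^2)

# cov(X, Y) = E[(X - E(X)) (Y - E(Y))]
cov_xy = sum(p * (x - mean_x) * (x**2 - mean_y) for x in support)

# Dependence check: compare P(X=0, Y=0) with P(X=0) * P(Y=0).
p_joint = p        # X = 0 forces Y = 0, so the joint probability is 1/3
p_product = p * p  # P(X=0) = 1/3 and P(Y=0) = 1/3

print("cov(X, Y) =", cov_xy)
print("P(X=0, Y=0) =", p_joint, "but P(X=0) P(Y=0) =", p_product)
```

Since the covariance is zero, $\rho=0$, yet the joint probability does not factor, so $X$ and $Y$ are uncorrelated but not independent.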

The term "product-moment" refers to the observation that $\rho=\mu_{11}/\sqrt{\mu_{20}\:\mu_{02}}$, where $\mu_{ij}=\mathsf{E}[(X-\mu_x)^i(Y-\mu_y)^j]$ denotes the $(i,j)$th product moment of $X$ and $Y$ about their means.

The coefficient $\rho$ also plays a role in linear regression (cf. also Regression analysis). If the regression of $Y$ on $X$ is linear, then $y=\mathsf{E}(Y|X=x)=\mu_y+\rho(\sigma_y/\sigma_x)(x-\mu_x)$, and if the regression of $X$ on $Y$ is linear, then $x=\mathsf{E}(X|Y=y)=\mu_x+\rho(\sigma_x/\sigma_y)(y-\mu_y)$. Note that the product of the two slopes is $\rho^2$.
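
The statement about the product of the slopes has a sample analogue that is easy to verify numerically. The following is a minimal sketch, assuming least-squares lines fitted to a small made-up data set (the data and variable names are illustrative, not from the article): the slope of $y$ on $x$ multiplied by the slope of $x$ on $y$ equals the squared sample correlation.

```python
import math

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Sums of squares and products of deviations from the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

slope_y_on_x = sxy / sxx          # least-squares slope of y regressed on x
slope_x_on_y = sxy / syy          # least-squares slope of x regressed on y
r = sxy / math.sqrt(sxx * syy)    # sample correlation coefficient

print("product of slopes:", slope_y_on_x * slope_x_on_y)
print("r squared:        ", r * r)
```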

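When $X$ and $Y$ have a bivariate normal distribution (cf. also Normal distribution), $\rho$ is a parameter of the joint density function

$$f(x,y)=\frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2-2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right)+\left(\frac{y-\mu_y}{\sigma_y}\right)^2\right]\right\},\quad|\rho|<1.$$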

Unlike the general situation, uncorrelated random variables with a bivariate normal distribution are independent.
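
One way to see this is that the standard bivariate normal density factors into the product of its two normal marginal densities when $\rho=0$. The sketch below is a numerical spot check under that standard form of the density; the function names and the evaluation point are chosen here for illustration.

```python
import math

def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Standard bivariate normal density; requires -1 < rho < 1."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * quad) / norm

def normal_pdf(x, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Arbitrary evaluation point and parameters (illustrative values).
x, y = 0.3, -1.2
mu_x, mu_y, sigma_x, sigma_y = 0.0, 0.0, 1.0, 2.0

joint_uncorrelated = bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, 0.0)
joint_correlated = bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, 0.5)
product_of_marginals = normal_pdf(x, mu_x, sigma_x) * normal_pdf(y, mu_y, sigma_y)

print("rho = 0:   joint", joint_uncorrelated, "product", product_of_marginals)
print("rho = 0.5: joint", joint_correlated, "product", product_of_marginals)
```

A check at a single point is not a proof, but the factorization for $\rho=0$ can be read off directly from the exponent, where the cross term disappears.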

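For a random sample $\{(x_i,y_i)\}^n_{i=1}$ from a bivariate population, $\rho$ is estimated by the sample correlation coefficient (cf. also Correlation coefficient) $r$, given by

$$r=\frac{\sum^n_{i=1}(x_i-\overline{x})(y_i-\overline{y})}{\sqrt{\sum^n_{i=1}(x_i-\overline{x})^2\sum^n_{i=1}(y_i-\overline{y})^2}},$$

where $\overline{x}$ and $\overline{y}$ are the sample means.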

If $x$ and $y$ denote, respectively, the vectors $(x_1-\overline{x},...,x_n-\overline{x})$ and $(y_1-\overline{y},...,y_n-\overline{y})$, and $\theta$ denotes the angle between $x$ and $y$, then

$$r=\cos\theta=\frac{x\cdot y}{\|x\|\,\|y\|}.$$

Further interpretations of $r$ can be found in [a3]. For details on the use of $r$ in hypothesis testing, and for large-sample theory, see [a1].
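
The sample coefficient $r$ and its reading as the cosine of an angle can be sketched in a few lines. The code below is an illustrative sketch, not a reference implementation: the function names and the data are made up here, and $r$ is computed once from sums of products of deviations and once as the dot product of the normalized centred vectors.

```python
import math

def center(values):
    """Subtract the arithmetic mean from every entry."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def pearson_r(xs, ys):
    """Sample correlation from sums of squares and products of deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    dx, dy = center(xs), center(ys)
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one sample is constant")
    return sxy / math.sqrt(sxx * syy)

def cosine_of_angle(u, v):
    """Cosine of the angle between u and v, via their unit vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    unit_u = [a / norm_u for a in u]
    unit_v = [b / norm_v for b in v]
    return sum(a * b for a, b in zip(unit_u, unit_v))

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]

r = pearson_r(xs, ys)
cos_theta = cosine_of_angle(center(xs), center(ys))
# Clamp before acos to guard against rounding slightly outside [-1, 1].
theta_degrees = math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

print("r =", r, " cos(theta) =", cos_theta, " theta =", theta_degrees, "degrees")
```

The two computations agree up to floating-point rounding. Centring the data first is essential: the cosine of the angle between the raw data vectors is in general not equal to $r$.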


[a1] O.J. Dunn, V.A. Clark, "Applied statistics: analysis of variance and regression", Wiley (1974)
[a2] K. Pearson, "Mathematical contributions to the theory of evolution. III. Regression, heredity and panmixia", Philos. Trans. Royal Soc. London Ser. A, 187 (1896) pp. 253–318
[a3] J.L. Rodgers, W.A. Nicewander, "Thirteen ways to look at the correlation coefficient", The Amer. Statistician, 42 (1988) pp. 59–65
This article was adapted from an original article by R.B. Nelsen (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.