# Regression

Dependence of the mean value of some random variable on another variable or on several variables. If, for example, for every value $ x = x _ {i} $ one observes $ n _ {i} $ values $ y _ {i1} , \dots , y _ {i n _ {i} } $ of a random variable $ Y $, then the dependence of the arithmetic mean

$$ \overline{y} _ {i} = \frac{1}{n _ {i} } ( y _ {i1} + \dots + y _ {i n _ {i} } ) $$

of these values on $ x _ {i} $ is a regression in the statistical sense of the term. If $ \overline{y} _ {i} $ varies systematically with $ x _ {i} $, one assumes, on the basis of the observed phenomenon, that there is a probabilistic dependence: for every fixed value $ x $ the random variable $ Y $ has a definite probability distribution whose mathematical expectation is a function of $ x $:

$$ {\mathsf E} ( Y \mid x ) = m ( x) . $$
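The statistical notion above (the arithmetic means $ \overline{y} _ {i} $ computed separately at each $ x _ {i} $) can be sketched in a few lines of Python. The function name `group_means` and the observations are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def group_means(pairs):
    """Return {x_i: arithmetic mean of the y-values observed at x_i}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y in pairs:
        sums[x] += y
        counts[x] += 1
    return {x: sums[x] / counts[x] for x in sums}


# Hypothetical observations: n_1 = 2, n_2 = 3 and n_3 = 1 values of Y.
obs = [(1, 2.0), (1, 4.0), (2, 5.0), (2, 6.0), (2, 7.0), (3, 9.0)]
means = group_means(obs)
print(means)  # {1: 3.0, 2: 6.0, 3: 9.0}
```

Here the means grow steadily with $ x $, the kind of systematic variation that motivates the probabilistic model.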

The relation $ y = m ( x) $, where $ x $ acts as an "independent" variable, is called a regression (or regression function) in the probabilistic sense of the word. The graph of $ m ( x) $ is called the regression line, or regression curve, of $ Y $ on $ x $. The variable $ x $ is called the regression variable or regressor. The accuracy with which the regression curve of $ Y $ on $ x $ reflects the average variation of $ Y $ with $ x $ is measured, for each value of $ x $, by the variance of $ Y $ (cf. Dispersion):

$$ {\mathsf D} ( Y \mid x ) = \sigma ^ {2} ( x) . $$

Graphically, the dependence of $ \sigma ^ {2} ( x) $ on $ x $ is represented by the scedastic curve. If $ \sigma ^ {2} ( x) = 0 $ for all values of $ x $, then with probability 1 the variables are connected by an exact functional dependence $ Y = m ( x) $. If $ \sigma ^ {2} ( x) \neq 0 $ for every value of $ x $ and $ m ( x) $ does not depend on $ x $, then there is no regression of $ Y $ with respect to $ x $.

In probability theory, the problem of regression is solved when the values of the regression variable $ x $ are the values of a random variable $ X $ and the joint probability distribution of $ X $ and $ Y $ is assumed known (here $ {\mathsf E} ( Y \mid x ) $ and $ {\mathsf D} ( Y \mid x ) $ are the conditional expectation and the conditional variance of $ Y $, respectively, given $ X = x $). In this case two regressions are defined, that of $ Y $ with respect to $ x $ and that of $ X $ with respect to $ y $, and the concept of regression can also be used to introduce certain measures of the interrelation between $ X $ and $ Y $, defined as characteristics of the degree of concentration of the distribution around the regression curves (see Correlation (in statistics)).
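For a discrete joint distribution, the conditional expectation $ m ( x) $ and the conditional variance $ \sigma ^ {2} ( x) $ can be computed directly from their definitions. The following is a minimal sketch, assuming the joint law is given as a table of probabilities $ {\mathsf P} ( X = x , Y = y ) $; the function name and the numbers are illustrative assumptions.

```python
from fractions import Fraction


def conditional_moments(joint):
    """For a discrete joint law {(x, y): P(X=x, Y=y)}, return
    {x: (m(x), sigma2(x))}: conditional mean and variance of Y given X = x."""
    p_x = {}
    for (x, _), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p          # marginal law of X
    result = {}
    for x0, px0 in p_x.items():
        # Conditional law of Y given X = x0.
        cond = [(y, p / px0) for (x, y), p in joint.items() if x == x0]
        m = sum(y * q for y, q in cond)
        s2 = sum((y - m) ** 2 * q for y, q in cond)
        result[x0] = (m, s2)
    return result


# Hypothetical joint distribution; Fractions keep the arithmetic exact.
joint = {(0, 0): Fraction(1, 5), (0, 2): Fraction(1, 5),
         (1, 1): Fraction(3, 10), (1, 3): Fraction(3, 10)}
moments = conditional_moments(joint)
```

In this example $ m ( 0) = 1 $, $ m ( 1) = 2 $ and $ \sigma ^ {2} ( x) = 1 $ at both values, so the scedastic curve is constant.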

Regression functions have the property that, among all real-valued functions $ f ( x) $, the expectation $ {\mathsf E} ( Y - f ( X) ) ^ {2} $ is smallest when $ f ( x) = m ( x) $; that is, the regression of $ Y $ with respect to $ x $ gives the best representation of the variable $ Y $ in the mean-square sense. The most important case is when the regression of $ Y $ with respect to $ x $ is linear, that is,

$$ {\mathsf E} ( Y \mid x ) = \beta _ {0} + \beta _ {1} x . $$
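The minimizing property of $ m ( x) $ stated above can be checked on a small discrete distribution. This sketch, with hypothetical data and helper names, evaluates $ {\mathsf E} ( Y - f ( X) ) ^ {2} $ for the regression function and for a few other functions $ f $; exact `Fraction` arithmetic avoids rounding noise in the comparison.

```python
from fractions import Fraction


def mse(joint, f):
    """E (Y - f(X))^2 for a discrete joint law {(x, y): probability}."""
    return sum(p * (y - f(x)) ** 2 for (x, y), p in joint.items())


def regression_function(joint):
    """Return m as a dict: m[x] = E(Y | X = x)."""
    p_x, s_x = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p
        s_x[x] = s_x.get(x, 0) + p * y
    return {x: s_x[x] / p_x[x] for x in p_x}


# Hypothetical joint distribution.
joint = {(0, 0): Fraction(1, 5), (0, 2): Fraction(1, 5),
         (1, 1): Fraction(3, 10), (1, 3): Fraction(3, 10)}
m = regression_function(joint)
best = mse(joint, lambda x: m[x])

# Other candidate functions of x: none of them does better than m.
candidates = [lambda x: Fraction(3, 2), lambda x: 2 * x,
              lambda x: Fraction(1, 2) + 2 * x]
for f in candidates:
    assert mse(joint, f) >= best
```

Here the minimum equals $ {\mathsf E} \sigma ^ {2} ( X) = 1 $, and each of the other candidates gives a strictly larger value.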

The coefficients $ \beta _ {0} $ and $ \beta _ {1} $ are called regression coefficients, and are easily calculated:

$$ \beta _ {0} = \ m _ {Y} - \rho \frac{\sigma _ {Y} }{\sigma _ {X} } m _ {X} ,\ \ \beta _ {1} = \rho \frac{\sigma _ {Y} }{\sigma _ {X} } $$

(where $ \rho $ is the correlation coefficient of $ X $ and $ Y $, $ m _ {X} = {\mathsf E} X $, $ m _ {Y} = {\mathsf E} Y $, $ \sigma _ {X} ^ {2} = {\mathsf D} X $, and $ \sigma _ {Y} ^ {2} = {\mathsf D} Y $), and the regression curve of $ Y $ with respect to $ x $ has the form

$$ y = m _ {Y} + \rho \frac{\sigma _ {Y} }{\sigma _ {X} } ( x - m _ {X} ) ; $$

the regression curve of $ X $ with respect to $ y $ is found in a similar way: $ x = m _ {X} + \rho ( \sigma _ {X} / \sigma _ {Y} ) ( y - m _ {Y} ) $. The regression is exactly linear, so that these formulas give the true regression, when the two-dimensional distribution of $ X $ and $ Y $ is normal.
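As an illustration of the formulas above, the following sketch computes the coefficients of both regression lines from $ m _ {X} $, $ m _ {Y} $, $ \sigma _ {X} $, $ \sigma _ {Y} $ and $ \rho $. The parameter values and the function name are hypothetical; for a normal two-dimensional distribution these lines are the exact regressions.

```python
def regression_lines(m_x, m_y, s_x, s_y, rho):
    """Coefficients of the two linear regressions:
    y = b0 + b1 * x  (Y with respect to x)  and
    x = a0 + a1 * y  (X with respect to y)."""
    b1 = rho * s_y / s_x
    b0 = m_y - b1 * m_x
    a1 = rho * s_x / s_y
    a0 = m_x - a1 * m_y
    return (b0, b1), (a0, a1)


# Hypothetical parameters of a two-dimensional normal distribution.
(b0, b1), (a0, a1) = regression_lines(m_x=1.0, m_y=2.0, s_x=2.0, s_y=3.0, rho=0.5)
print(b0, b1)  # 1.25 0.75
```

Both lines pass through the point $ ( m _ {X} , m _ {Y} ) $, and the product of the slope coefficients $ b _ {1} a _ {1} $ equals $ \rho ^ {2} $ (here 0.25), so the two lines coincide only when $ | \rho | = 1 $.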

In statistical applications, the form of the joint probability distribution is usually not known well enough to determine the regression exactly, and the problem arises of determining the regression approximately. To solve this problem, one can choose, from all functions $ g ( x) $ belonging to a given class, the function that gives the best representation of the variable $ Y $, in the sense that the expectation $ {\mathsf E} ( Y - g ( X) ) ^ {2} $ is minimized. This function is called the mean-square (mean-quadratic) regression.

The simplest case is linear mean-square regression, in which one looks for the best linear approximation to $ Y $ by means of $ X $, that is, a linear function $ g ( x) = \beta _ {0} + \beta _ {1} x $ for which $ {\mathsf E} ( Y - g ( X) ) ^ {2} $ takes the smallest possible value. Provided $ \sigma _ {X} > 0 $, this extremal problem has a unique solution:

$$ \beta _ {0} = m _ {Y} - \beta _ {1} m _ {X} ,\ \ \beta _ {1} = \rho \frac{\sigma _ {Y} }{\sigma _ {X} } , $$

that is, the approximate regression line is given by the same formula as in the case of exact linear regression:

$$ y = m _ {Y} + \rho \frac{\sigma _ {Y} }{\sigma _ {X} } ( x - m _ {X} ) . $$
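When the regression is not linear, the linear mean-square regression can still be computed from the moments. The sketch below (hypothetical distribution and function names) does so for a discrete law whose regression function is not linear, and checks the first-order conditions of the minimization, $ {\mathsf E} ( Y - g ( X) ) = 0 $ and $ {\mathsf E} [ X ( Y - g ( X) ) ] = 0 $.

```python
from fractions import Fraction


def expect(joint, h):
    """E h(X, Y) for a discrete joint law {(x, y): probability}."""
    return sum(p * h(x, y) for (x, y), p in joint.items())


def linear_ms_regression(joint):
    """(beta0, beta1) minimizing E(Y - beta0 - beta1 X)^2."""
    m_x = expect(joint, lambda x, y: x)
    m_y = expect(joint, lambda x, y: y)
    var_x = expect(joint, lambda x, y: (x - m_x) ** 2)
    cov_xy = expect(joint, lambda x, y: (x - m_x) * (y - m_y))
    beta1 = cov_xy / var_x  # equals rho * sigma_Y / sigma_X
    beta0 = m_y - beta1 * m_x
    return beta0, beta1


# Hypothetical law: m(-1) = 1, m(0) = 1/2, m(1) = 2, so m(x) is not linear.
third, sixth = Fraction(1, 3), Fraction(1, 6)
joint = {(-1, 1): third, (0, 0): sixth, (0, 1): sixth, (1, 2): third}
b0, b1 = linear_ms_regression(joint)

# First-order conditions: the residual is uncorrelated with 1 and with X.
resid = lambda x, y: y - b0 - b1 * x
assert expect(joint, resid) == 0
assert expect(joint, lambda x, y: x * resid(x, y)) == 0
print(b0, b1)  # 7/6 1/2
```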

The minimal value of $ {\mathsf E} ( Y - g ( X) ) ^ {2} $, attained at these values of the parameters, is equal to $ \sigma _ {Y} ^ {2} ( 1 - \rho ^ {2} ) $. If a regression $ m ( x) $ exists, then, for all $ \beta _ {0} $ and $ \beta _ {1} $,

$$ {\mathsf E} [ Y - \beta _ {0} - \beta _ {1} X ] ^ {2} = \ {\mathsf E} [ Y - m ( X) ] ^ {2} + {\mathsf E} [ m ( X) - \beta _ {0} - \beta _ {1} X ] ^ {2} . $$

This implies that the mean-square regression line $ y = \beta _ {0} + \beta _ {1} x $ is also the best linear approximation, in the mean-square sense with deviations measured along the $ y $-axis, to the regression curve $ y = m ( x) $. Therefore, if the curve $ y = m ( x) $ is a straight line, it coincides with the mean-square regression line.
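The minimal value $ \sigma _ {Y} ^ {2} ( 1 - \rho ^ {2} ) $ and the decomposition above can both be verified exactly on a discrete example. In this sketch the distribution is hypothetical and its regression function $ m $ is entered by hand; the decomposition is checked for arbitrary lines as well as for the mean-square line.

```python
from fractions import Fraction


def expect(joint, h):
    """E h(X, Y) for a discrete joint law {(x, y): probability}."""
    return sum(p * h(x, y) for (x, y), p in joint.items())


# Hypothetical law and its regression function m(x) = E(Y | X = x).
third, sixth = Fraction(1, 3), Fraction(1, 6)
joint = {(-1, 1): third, (0, 0): sixth, (0, 1): sixth, (1, 2): third}
m = {-1: Fraction(1), 0: Fraction(1, 2), 1: Fraction(2)}

# Moments and the linear mean-square coefficients.
m_x = expect(joint, lambda x, y: x)
m_y = expect(joint, lambda x, y: y)
var_x = expect(joint, lambda x, y: (x - m_x) ** 2)
var_y = expect(joint, lambda x, y: (y - m_y) ** 2)
cov = expect(joint, lambda x, y: (x - m_x) * (y - m_y))
beta1 = cov / var_x
beta0 = m_y - beta1 * m_x


def both_sides(b0, b1):
    """Left- and right-hand sides of the decomposition for the line y = b0 + b1 x."""
    lhs = expect(joint, lambda x, y: (y - b0 - b1 * x) ** 2)
    rhs = (expect(joint, lambda x, y: (y - m[x]) ** 2)
           + expect(joint, lambda x, y: (m[x] - b0 - b1 * x) ** 2))
    return lhs, rhs


# The identity holds for arbitrary lines, not only the optimal one.
for b0, b1 in [(0, 0), (1, -2), (beta0, beta1)]:
    lhs, rhs = both_sides(b0, b1)
    assert lhs == rhs

# At the optimum the error equals sigma_Y^2 (1 - rho^2).
rho2 = cov ** 2 / (var_x * var_y)
min_mse, _ = both_sides(beta0, beta1)
assert min_mse == var_y * (1 - rho2)
print(min_mse)  # 11/36
```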

In the general case, when the regression is far from being linear, one can pose the problem of finding a polynomial $ g ( x) = \beta _ {0} + \beta _ {1} x + \dots + \beta _ {m} x ^ {m} $ of a certain degree $ m $ for which $ {\mathsf E} ( Y - g ( X) ) ^ {2} $ is as small as possible.
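Setting the partial derivatives of $ {\mathsf E} ( Y - g ( X) ) ^ {2} $ with respect to $ \beta _ {0} , \dots , \beta _ {m} $ to zero gives the normal equations $ \sum _ {j=0} ^ {m} \beta _ {j} {\mathsf E} X ^ {i+j} = {\mathsf E} ( Y X ^ {i} ) $, $ i = 0 , \dots , m $. The sketch below solves them exactly for a hypothetical discrete distribution; it assumes $ X $ takes at least $ m + 1 $ distinct values, so that the system is non-singular, and the helper names are illustrative.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the linear system a @ beta = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: X takes too few distinct values")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def polynomial_ms_regression(joint, degree):
    """Coefficients beta_0, ..., beta_degree minimizing E(Y - g(X))^2 over
    polynomials g of the given degree, for a discrete law {(x, y): probability}."""
    def expect(h):
        return sum(p * h(x, y) for (x, y), p in joint.items())

    # Normal equations: sum_j beta_j E X^(i+j) = E(Y X^i), i = 0..degree.
    a = [[expect(lambda x, y, k=i + j: x ** k) for j in range(degree + 1)]
         for i in range(degree + 1)]
    b = [expect(lambda x, y, k=i: y * x ** k) for i in range(degree + 1)]
    return solve(a, b)


# Hypothetical law with non-linear regression: m(-1) = 1, m(0) = 1/2, m(1) = 2.
third, sixth = Fraction(1, 3), Fraction(1, 6)
joint = {(-1, 1): third, (0, 0): sixth, (0, 1): sixth, (1, 2): third}
print(polynomial_ms_regression(joint, 1))  # linear case
print(polynomial_ms_regression(joint, 2))  # quadratic case
```

Here the degree-1 call returns $ \beta _ {0} = 7/6 $, $ \beta _ {1} = 1/2 $, the linear mean-square regression, while the degree-2 call returns $ g ( x) = 1/2 + x/2 + x ^ {2} $, which agrees with $ m ( x) $ at every value of $ X $, as it must when $ X $ takes only three values.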

The solution of this problem is called polynomial mean-square regression (see Parabolic regression). The polynomial $ y = g ( x) $ of degree $ m $ obtained in this way gives the best mean-square approximation to the true regression curve among all polynomials of degree at most $ m $. A generalization of polynomial regression is a regression function expressed as a linear combination of certain given functions:

$$ g ( x) = \beta _ {0} \phi _ {0} ( x) + \dots + \beta _ {m} \phi _ {m} ( x) . $$

The most important case is when $ \phi _ {0} ( x) , \dots , \phi _ {m} ( x) $ are orthogonal polynomials of degrees $ 0 , \dots , m $ constructed from the distribution of $ X $. Other examples of non-linear (curvilinear) regression include trigonometric regression and exponential regression.
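With orthogonal polynomials the coefficients decouple: if $ {\mathsf E} [ \phi _ {i} ( X) \phi _ {j} ( X) ] = 0 $ for $ i \neq j $, then $ \beta _ {j} = {\mathsf E} [ Y \phi _ {j} ( X) ] / {\mathsf E} [ \phi _ {j} ^ {2} ( X) ] $. A minimal sketch, assuming a discrete distribution and building the $ \phi _ {j} $ by Gram-Schmidt orthogonalization of $ 1 , x , x ^ {2} , \dots $ with respect to the law of $ X $ (function names and data are hypothetical):

```python
from fractions import Fraction


def poly_eval(coeffs, x):
    """Evaluate sum_k coeffs[k] * x**k."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def orthogonal_ms_regression(joint, degree):
    """Mean-square regression on polynomials orthogonal w.r.t. the law of X.

    Returns (phis, betas): phis[j] is the coefficient list of phi_j (degree j),
    and betas[j] = E[Y phi_j(X)] / E[phi_j(X)^2]."""
    p_x = {}
    for (x, _), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p      # marginal law of X

    def inner(c, d):  # E[c(X) d(X)] under the marginal law of X
        return sum(p * poly_eval(c, x) * poly_eval(d, x) for x, p in p_x.items())

    phis = []
    for j in range(degree + 1):
        phi = [0] * j + [1]  # start from the monomial x^j
        for q in phis:       # Gram-Schmidt: subtract the projection onto q
            coef = inner(phi, q) / inner(q, q)
            phi = [a - coef * (q[k] if k < len(q) else 0) for k, a in enumerate(phi)]
        phis.append(phi)
    betas = [sum(p * y * poly_eval(q, x) for (x, y), p in joint.items()) / inner(q, q)
             for q in phis]
    return phis, betas


third, sixth = Fraction(1, 3), Fraction(1, 6)
joint = {(-1, 1): third, (0, 0): sixth, (0, 1): sixth, (1, 2): third}
phis, betas = orthogonal_ms_regression(joint, 2)


def g(x):
    """Fitted regression g(x) = sum_j beta_j phi_j(x)."""
    return sum(b * poly_eval(q, x) for b, q in zip(betas, phis))
```

No linear system has to be solved, and increasing the degree leaves the earlier $ \phi _ {j} $ and $ \beta _ {j} $ unchanged. For this law $ \phi _ {2} ( x) = x ^ {2} - 2/3 $, and the fitted $ g $ reproduces $ m ( -1) = 1 $, $ m ( 0) = 1/2 $, $ m ( 1) = 2 $.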

The concept of regression extends naturally to the case where, instead of one regression variable, a set of variables is considered. If the random variables $ X _ {1} , \dots , X _ {n} $ have a joint probability distribution, then one can define a multiple regression, e.g. the regression of $ X _ {1} $ with respect to $ x _ {2} , \dots , x _ {n} $:

$$ {\mathsf E} ( X _ {1} \mid X _ {2} = x _ {2} , \dots , X _ {n} = x _ {n} ) = m _ {1} ( x _ {2} , \dots , x _ {n} ) . $$

The corresponding equation defines the regression surface of $ X _ {1} $ with respect to $ x _ {2} , \dots , x _ {n} $. The linear regression of $ X _ {1} $ with respect to $ x _ {2} , \dots , x _ {n} $ has the form

$$ {\mathsf E} ( X _ {1} \mid x _ {2} , \dots , x _ {n} ) = \beta _ {2} x _ {2} + \dots + \beta _ {n} x _ {n} , $$

where $ \beta _ {2} , \dots , \beta _ {n} $ are the regression coefficients (it is assumed here that $ {\mathsf E} X _ {k} = 0 $ for all $ k $, so that no constant term appears). The linear mean-square regression of $ X _ {1} $ with respect to $ x _ {2} , \dots , x _ {n} $ is defined as the best linear estimator of $ X _ {1} $ in terms of $ X _ {2} , \dots , X _ {n} $, in the sense that

$$ {\mathsf E} ( X _ {1} - \beta _ {2} X _ {2} - \dots - \beta _ {n} X _ {n} ) ^ {2} $$

is minimized. The corresponding regression plane gives the best approximation to the regression surface $ x _ {1} = m _ {1} ( x _ {2} , \dots , x _ {n} ) $, if the latter exists. If the regression surface is a plane, then it necessarily coincides with the mean-square regression plane (as is the case when the joint distribution of all $ n $ variables is normal).
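For zero-mean variables, minimizing $ {\mathsf E} ( X _ {1} - \beta _ {2} X _ {2} - \dots - \beta _ {n} X _ {n} ) ^ {2} $ leads to the linear system $ \sum _ {j=2} ^ {n} \beta _ {j} {\mathsf E} ( X _ {k} X _ {j} ) = {\mathsf E} ( X _ {1} X _ {k} ) $, $ k = 2 , \dots , n $. The sketch below solves it from a given covariance matrix; the model generating that matrix and the function names are hypothetical.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the linear system a @ beta = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular covariance matrix of X_2, ..., X_n")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def multiple_ms_regression(cov):
    """Linear mean-square regression of X_1 on X_2, ..., X_n (zero means assumed).

    cov[i][j] = E(X_{i+1} X_{j+1}). Returns (beta_2..beta_n, minimal error)."""
    a = [row[1:] for row in cov[1:]]  # covariances among X_2, ..., X_n
    b = [row[0] for row in cov[1:]]   # covariances of X_2, ..., X_n with X_1
    beta = solve(a, b)
    # At the optimum E(X_1 - sum beta_k X_k)^2 = D X_1 - sum beta_k cov(X_1, X_k).
    error = cov[0][0] - sum(bk * ck for bk, ck in zip(beta, b))
    return beta, error


# Hypothetical model X_1 = 2 X_2 - X_3 + eps, with D X_2 = 1, D X_3 = 2,
# cov(X_2, X_3) = 1/2, and eps independent of (X_2, X_3), E eps = 0, D eps = 1.
F = Fraction
cov = [[F(5), F(3, 2), F(-1)],
       [F(3, 2), F(1), F(1, 2)],
       [F(-1), F(1, 2), F(2)]]
beta, error = multiple_ms_regression(cov)
print(beta, error)
```

Here the recovered coefficients are $ \beta _ {2} = 2 $, $ \beta _ {3} = -1 $, and the minimal error equals $ {\mathsf D} \epsilon = 1 $; because this joint model makes the regression exactly linear, the plane found is the true regression surface.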

A simple example of regression of $ Y $ with respect to $ X $ is given by the relation $ Y = u ( X) + \delta $, where $ X $ and $ \delta $ are independent random variables and $ {\mathsf E} \delta = 0 $, so that $ u ( x) = {\mathsf E} ( Y \mid X = x ) $. This representation is useful when designing an experiment for studying a functional relation $ y = u ( x) $ between two non-random variables $ y $ and $ x $. The same regression model is used in numerous applications to study the nature of the dependence of a random variable $ Y $ on a non-random variable $ x $. In practice, the choice of the function $ y = u ( x) $ and the estimation of the unknown regression coefficients from experimental data are carried out by the methods of regression analysis.
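In practice the coefficients are estimated from data. As one common method of regression analysis, least squares replaces the moments in the linear mean-square formulas by sample moments. The sketch below simulates the model $ Y = u ( x) + \delta $ at fixed, non-random design points, assuming (hypothetically) $ u ( x) = 1 + 2 x $ and normal $ \delta $ with standard deviation 0.5; the function name is illustrative.

```python
import random


def least_squares_line(xs, ys):
    """Sample analogue of the linear mean-square regression: the moments
    m_X, m_Y, D X and cov(X, Y) are replaced by their sample counterparts."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Hypothetical experiment: design points x = 0, 0.001, ..., 10 and
# Y = 1 + 2x + delta with delta ~ N(0, 0.5^2) drawn independently.
rng = random.Random(0)
xs = [k / 1000 for k in range(10_001)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 0.5) for x in xs]
b0_hat, b1_hat = least_squares_line(xs, ys)
print(round(b0_hat, 2), round(b1_hat, 2))
```

With this many observations the estimates should lie close to the assumed values 1 and 2.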


**How to Cite This Entry:**

Regression.

*Encyclopedia of Mathematics.* URL: http://encyclopediaofmath.org/index.php?title=Regression&oldid=48472