Regression
Dependence of the mean value of some random variable on another variable or on several variables. If, for example, for every value $x = x_i$ one observes $n_i$ values $y_{i1}, \dots, y_{in_i}$ of a random variable $Y$, then the dependence of the arithmetic mean
$$\bar{y}_i = \frac{y_{i1} + \dots + y_{in_i}}{n_i}$$
of these values on $x_i$ is a regression in the statistical meaning of the term. If $\bar{y}$ varies systematically with $x$, one assumes, on the basis of the observed phenomenon, that there is a probabilistic dependence: for every fixed value $x$ the random variable $Y$ has a definite probability distribution whose mathematical expectation is a function of $x$:
$$\mathsf{E}(Y \mid x) = m(x).$$
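The statistical notion of regression described above can be sketched in a few lines: group the observed values by the value of the independent variable and take the arithmetic mean within each group. The function name `empirical_regression` and the sample observations are assumptions made for this illustration only.

```python
from collections import defaultdict
from statistics import fmean


def empirical_regression(pairs):
    """Map each distinct x to the arithmetic mean of the y values observed at that x."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: fmean(ys) for x, ys in sorted(groups.items())}


# Hypothetical observations: several y values recorded at each of three x values.
observations = [(1, 2.0), (1, 2.4), (2, 3.9), (2, 4.1), (2, 4.0), (3, 6.2)]
means = empirical_regression(observations)
for x, mean_y in means.items():
    print(f"x = {x}: mean of y = {mean_y:.3f}")
```

Plotting the resulting pairs against x would give the empirical counterpart of the regression curve.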
The relation $y = m(x)$, where $m(x) = \mathsf{E}(Y \mid x)$ and $x$ acts as an "independent" variable, is called a regression (or regression function) in the probabilistic sense of the word. The graph of $m(x)$ is called the regression line, or regression curve, of $Y$ on $x$. The variable $x$ is called the regression variable or regressor. The accuracy with which the regression curve of $Y$ on $x$ reflects the average variation of $Y$ with variation in $x$ is measured by the variance of $Y$ (cf. Dispersion), computed for every value $x$ as follows:
$$\mathsf{D}(Y \mid x) = \sigma^2(x).$$
Graphically, the dependence of $\sigma^2(x)$ on $x$ is expressed by the scedastic curve. If $\sigma^2(x) = 0$ for all values of $x$, then with probability 1 the variables are connected by a perfect functional dependence. If $\sigma^2(x) \neq 0$ for all values of $x$ and $m(x)$ does not depend on $x$, then regression of $Y$ with respect to $x$ is absent.
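As a rough illustration of the scedastic curve, the sketch below computes, for each value of the regressor, both the mean and the variance of the values observed there. The data are hypothetical and chosen so that the mean is the same at every x (regression is absent) while the variance changes with x; the name `scedastic_points` is an assumption of this example.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def scedastic_points(pairs):
    """Map each x to (mean of y at x, variance of y at x), dividing by the group size."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: (fmean(ys), pvariance(ys)) for x, ys in sorted(groups.items())}


# Hypothetical data: constant conditional mean, varying conditional variance.
data = [(0, 1.0), (0, 3.0), (1, 2.0), (1, 2.0), (2, 0.0), (2, 4.0)]
curve = scedastic_points(data)
for x, (mean_y, var_y) in curve.items():
    print(f"x = {x}: mean = {mean_y}, variance = {var_y}")
```

At x = 1 the variance is zero, so at that point the observed values are determined exactly by x.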
In probability theory, the regression problem is solved in the case where the values of the regression variable $x$ correspond to the values of a certain random variable $X$, and it is assumed that the joint probability distribution of $X$ and $Y$ is known (here, the expectation $\mathsf{E}(Y \mid x)$ and the variance $\mathsf{D}(Y \mid x)$ are the conditional expectation and conditional variance of $Y$, respectively, for a fixed value $X = x$). In this case, two regressions are defined: $Y$ with respect to $x$ and $X$ with respect to $y$, and the concept of regression can also be used to introduce certain measures of the interrelation between $X$ and $Y$, defined as characteristics of the degree of concentration of the distribution around the regression curves (see Correlation (in statistics)).
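Regression functions possess the property that among all real-valued functions $f(x)$ the minimum of the expectation $\mathsf{E}(Y - f(X))^2$ is attained when $f(x) = m(x)$, that is, the regression of $Y$ with respect to $x$ gives the best (in the above sense) representation of the variable $Y$. The most important case is when the regression of $Y$ with respect to $x$ is linear, that is,
$$\mathsf{E}(Y \mid x) = \beta_0 + \beta_1 x.$$
The coefficients $\beta_0$ and $\beta_1$ are called regression coefficients, and are easily calculated:
$$\beta_0 = m_Y - \rho \frac{\sigma_Y}{\sigma_X} m_X, \qquad \beta_1 = \rho \frac{\sigma_Y}{\sigma_X}$$
(where $\rho$ is the correlation coefficient of $X$ and $Y$, $m_X = \mathsf{E}X$, $m_Y = \mathsf{E}Y$, $\sigma_X^2 = \mathsf{D}X$, and $\sigma_Y^2 = \mathsf{D}Y$), and the regression curve of $Y$ with respect to $x$ has the form
$$y = m_Y + \rho \frac{\sigma_Y}{\sigma_X} (x - m_X);$$
the regression curve of $X$ with respect to $y$ is found in a similar way. The linear regression is exact in the case when the two-dimensional distribution of the variables $X$ and $Y$ is normal.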
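The formulas for the regression coefficients can be tried out on data by replacing the theoretical means, variances and correlation coefficient with their sample counterparts. The following is a minimal sketch under that assumption; the function name and the five data points are hypothetical.

```python
from math import sqrt
from statistics import fmean


def linear_regression_from_moments(xs, ys):
    """Return (beta0, beta1, rho) computed from sample moments of paired data."""
    m_x, m_y = fmean(xs), fmean(ys)
    # Moments with divisor n, standing in for DX, DY and the covariance of X and Y.
    var_x = fmean([(x - m_x) ** 2 for x in xs])
    var_y = fmean([(y - m_y) ** 2 for y in ys])
    cov_xy = fmean([(x - m_x) * (y - m_y) for x, y in zip(xs, ys)])
    rho = cov_xy / sqrt(var_x * var_y)
    beta1 = rho * sqrt(var_y) / sqrt(var_x)
    beta0 = m_y - beta1 * m_x
    return beta0, beta1, rho


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
beta0, beta1, rho = linear_regression_from_moments(xs, ys)
print(f"y = {beta0:.3f} + {beta1:.3f} x, rho = {rho:.3f}")
```

Swapping the roles of `xs` and `ys` gives the coefficients of the other regression line, of X with respect to y.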
In statistical applications, when too little is known about the form of the joint probability distribution to determine the regression exactly, the problem arises of determining the regression approximately. To solve this problem, one can choose, out of all functions $g(x)$ belonging to a given class, the function which gives the best representation of the variable $Y$, in the sense that the expectation $\mathsf{E}(Y - g(X))^2$ is minimized. This function is called the mean-square (mean-quadratic) regression.
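The simplest case is that of linear mean-square regression, when one looks for the best linear approximation to $Y$ by means of $X$, that is, a linear function $g(x) = \beta_0 + \beta_1 x$ for which the expression $\mathsf{E}(Y - g(X))^2$ takes the smallest possible value. This extremal problem has a unique solution:
$$\beta_0 = m_Y - \rho \frac{\sigma_Y}{\sigma_X} m_X, \qquad \beta_1 = \rho \frac{\sigma_Y}{\sigma_X},$$
that is, the calculation of an approximate regression curve leads to the same result as that obtained in the case of exact linear regression:
$$y = m_Y + \rho \frac{\sigma_Y}{\sigma_X} (x - m_X).$$
The minimal value of $\mathsf{E}(Y - g(X))^2$, for the calculated values of the parameters, is equal to $\sigma_Y^2 (1 - \rho^2)$. If a regression $m(x)$ exists, then, for all $\beta_0$ and $\beta_1$,
$$\mathsf{E}[Y - \beta_0 - \beta_1 X]^2 = \mathsf{E}[Y - m(X)]^2 + \mathsf{E}[m(X) - \beta_0 - \beta_1 X]^2.$$
This implies that the mean-square regression line $y = \beta_0 + \beta_1 x$ gives the best approximation to the regression curve $m(x)$, with distance measured along the $y$-axis. Therefore, if the curve $m(x)$ is a straight line, it coincides with the mean-square regression line.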
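The decomposition of the mean-square error of a line into a part around the regression and a part between the regression and the line can be checked numerically. The sketch below does so for the empirical distribution of a small hypothetical data set, where the regression is the group mean at each x; the function name and the data are assumptions of this example.

```python
from collections import defaultdict
from statistics import fmean


def decomposition_terms(pairs, b0, b1):
    """Return (total, within, between) mean squares under the empirical distribution."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    m = {x: fmean(ys) for x, ys in groups.items()}  # empirical regression m(x)
    total = fmean([(y - b0 - b1 * x) ** 2 for x, y in pairs])
    within = fmean([(y - m[x]) ** 2 for x, y in pairs])
    between = fmean([(m[x] - b0 - b1 * x) ** 2 for x, _ in pairs])
    return total, within, between


# Hypothetical data whose group means (2, 3, 7) do not lie on a straight line.
pairs = [(0, 1.0), (0, 3.0), (1, 2.0), (1, 4.0), (2, 5.0), (2, 9.0)]
for b0, b1 in [(1.0, 2.0), (0.0, 0.0), (-3.0, 5.5)]:
    total, within, between = decomposition_terms(pairs, b0, b1)
    print(f"b0 = {b0}, b1 = {b1}: total = {total:.6f}, within + between = {within + between:.6f}")
```

Since the first term does not depend on the coefficients of the line, minimizing the total over lines is the same as minimizing the second term, which is the sense in which the mean-square line best approximates the regression curve.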
In the general case, when the regression is far from being linear, one can pose the problem of finding a polynomial $g(x) = \beta_0 + \beta_1 x + \dots + \beta_k x^k$ of a certain degree $k$ for which $\mathsf{E}(Y - g(X))^2$ is as small as possible.
A solution of this problem corresponds to polynomial mean-square regression (see Parabolic regression). The function $y = g(x)$ is a polynomial of order $k$, and gives the best approximation to the true regression curve. A generalization of polynomial regression is the regression function expressed as a linear combination of certain given functions:
$$g(x) = \beta_0 \varphi_0(x) + \dots + \beta_k \varphi_k(x).$$
The most important case is when $\varphi_0(x), \dots, \varphi_k(x)$ are orthogonal polynomials of corresponding orders constructed from the distribution of $X$. There are other examples of non-linear (curvilinear) regression, such as trigonometric regression and exponential regression.
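One way to compute a mean-square regression over a linear combination of given functions is to solve the normal equations, with sample sums standing in for the expectations. The sketch below assumes this approach and uses the plain monomials 1, x, x^2 as the given functions rather than orthogonal polynomials; the helper names and the data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    beta = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (aug[r][n] - tail) / aug[r][r]
    return beta


def basis_regression(xs, ys, basis):
    """Coefficients of the combination of basis functions minimizing the sum of squared errors."""
    design = [[phi(x) for phi in basis] for x in xs]
    k = len(basis)
    gram = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(gram, rhs)


# Hypothetical data lying exactly on y = 1 + 2x + 3x^2.
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]
beta = basis_regression(xs, ys, basis)
print([round(b, 6) for b in beta])
```

With functions that are orthogonal with respect to the distribution of the regressor, the matrix `gram` becomes diagonal and each coefficient can be computed independently of the others, which is one reason the orthogonal case is singled out.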
The concept of regression can be extended in a natural way to the case where, instead of one regression variable, some set of variables is considered. If the random variables $X_1, \dots, X_n$ have a joint probability distribution, then one can define a multiple regression, e.g. as the regression of $X_1$ with respect to $x_2, \dots, x_n$:
$$\mathsf{E}(X_1 \mid X_2 = x_2, \dots, X_n = x_n) = m_1(x_2, \dots, x_n).$$
The corresponding equation defines the regression surface of $X_1$ with respect to $x_2, \dots, x_n$. The linear regression of $X_1$ with respect to $x_2, \dots, x_n$ has the form
$$\mathsf{E}(X_1 \mid x_2, \dots, x_n) = \beta_2 x_2 + \dots + \beta_n x_n,$$
where $\beta_2, \dots, \beta_n$ are the regression coefficients (if $\mathsf{E}X_k = 0$ for all $k$). The linear mean-square regression of $X_1$ with respect to $x_2, \dots, x_n$ is defined as the best linear estimator of the variable $X_1$ in terms of the variables $X_2, \dots, X_n$, in the sense that
$$\mathsf{E}(X_1 - \beta_2 X_2 - \dots - \beta_n X_n)^2$$
is minimized. The corresponding regression plane gives the best approximation to the regression surface $x_1 = m_1(x_2, \dots, x_n)$, if the latter exists. If the regression surface is a plane, then it necessarily coincides with the mean-square regression plane (as happens in the case when the joint distribution of all variables is normal).
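For two regressors, the coefficients of the linear mean-square regression can be written out explicitly in terms of second moments. The sketch below assumes sample moments in place of expectations, centres the data first so that the zero-mean condition holds, and solves the resulting pair of normal equations by Cramer's rule; all names and data are hypothetical.

```python
from statistics import fmean


def centered(values):
    """Subtract the sample mean so that the variable has mean zero."""
    mean = fmean(values)
    return [v - mean for v in values]


def two_regressor_coefficients(x1, x2, x3):
    """Return (beta2, beta3) minimizing the mean of (x1 - beta2*x2 - beta3*x3)^2."""
    x1, x2, x3 = centered(x1), centered(x2), centered(x3)
    c22 = fmean([a * a for a in x2])
    c33 = fmean([a * a for a in x3])
    c23 = fmean([a * b for a, b in zip(x2, x3)])
    c12 = fmean([a * b for a, b in zip(x1, x2)])
    c13 = fmean([a * b for a, b in zip(x1, x3)])
    det = c22 * c33 - c23 * c23  # zero when x2 and x3 are linearly dependent
    if det == 0:
        raise ValueError("regressors are linearly dependent")
    beta2 = (c12 * c33 - c13 * c23) / det
    beta3 = (c13 * c22 - c12 * c23) / det
    return beta2, beta3


# Hypothetical data lying exactly on the plane x1 = 3*x2 - 2*x3.
x2 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x3 = [-1.0, -2.0, 0.0, 2.0, 1.0]
x1 = [3.0 * a - 2.0 * b for a, b in zip(x2, x3)]
beta2, beta3 = two_regressor_coefficients(x1, x2, x3)
print(f"beta2 = {beta2:.6f}, beta3 = {beta3:.6f}")
```

Because the data lie exactly on a plane, the mean-square regression plane recovers it, in line with the coincidence property stated above.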
A simple example of regression of $Y$ with respect to $X$ is given by the dependence between $Y$ and $X$ expressed by the relation $Y = u(X) + \delta$, where $u(x) = \mathsf{E}(Y \mid X = x)$ and $X$ and $\delta$ are independent random variables. This representation is useful when designing an experiment for studying a functional relation $y = u(x)$ between two non-random variables $y$ and $x$. The same regression model is used in numerous applications to study the nature of the dependence of a random variable $Y$ on a non-random variable $x$. In practice, the choice of the function $y = u(x)$ and the estimation of the unknown regression coefficients from experimental data are made using methods of regression analysis.
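The model of a functional relation observed with an independent additive error can be simulated to see how averaging at fixed design points recovers the underlying function. In the sketch below the function `u`, the normal error distribution, the design points and the seed are all assumptions made for illustration.

```python
import random
from statistics import fmean


def u(x):
    """Hypothetical functional relation chosen for this illustration."""
    return x * x


def simulate(design_points, repeats, noise_sd, seed=0):
    """Observe u(x) plus an independent zero-mean error several times at each fixed x."""
    rng = random.Random(seed)
    return {
        x: [u(x) + rng.gauss(0.0, noise_sd) for _ in range(repeats)]
        for x in design_points
    }


samples = simulate(design_points=[0.0, 1.0, 2.0, 3.0], repeats=2000, noise_sd=0.5)
estimates = {x: fmean(ys) for x, ys in samples.items()}
for x, est in estimates.items():
    print(f"x = {x}: mean of observations = {est:.3f}, u(x) = {u(x):.3f}")
```

With more repetitions at each design point the averages settle closer to the values of `u`, since the errors have zero mean.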
Regression. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Regression&oldid=48472