Regression

Dependence of the mean value of some random variable on another variable or on several variables. If, for example, for every value $ x = x _ {i} $ one observes $ n _ {i} $ values $ y _ {i1} \dots y _ {i n _ {i} } $ of a random variable $ Y $, then the dependence of the arithmetic mean

$$ \overline{y} _ {i} = \frac{1}{n _ {i} } ( y _ {i1} + \dots + y _ {i n _ {i} } ) $$

of these values on $ x _ {i} $ is a regression in the statistical meaning of the term. If $ \overline{y} $ varies systematically with $ x $, one assumes, on the basis of the observed phenomenon, that there is a probabilistic dependence: For every fixed value $ x $ the random variable $ Y $ has a definite probability distribution whose mathematical expectation is a function of $ x $:

$$ {\mathsf E} ( Y \mid x ) = m ( x) . $$
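
As an illustration, the sketch below (with an arbitrarily assumed model $ Y = 2 + 0.5 x + $ noise) forms the arithmetic means $ \overline{y} _ {i} $ of repeated observations at each fixed $ x _ {i} $; these group means estimate the regression function $ m ( x) = {\mathsf E} ( Y \mid x ) $.

```python
# Empirical regression: arithmetic means of repeated observations of Y at each
# fixed x_i estimate m(x) = E(Y | x).  The model used here is an illustrative
# assumption, not part of the article.
import numpy as np

rng = np.random.default_rng(0)

def m(x):                        # the "true" regression function (assumed)
    return 2.0 + 0.5 * x

x_values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n_i = 200                        # number of observations at each x_i

for x in x_values:
    y = m(x) + rng.normal(0.0, 1.0, size=n_i)    # values y_{i1}, ..., y_{i n_i}
    print(f"x = {x:.1f}   group mean = {y.mean():.3f}   m(x) = {m(x):.3f}")
```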

The relation $ y = m ( x) $, where $ x $ acts as an "independent" variable, is called a regression (or regression function) in the probabilistic sense of the word. The graph of $ m ( x) $ is called the regression line, or regression curve, of $ Y $ on $ x $. The variable $ x $ is called the regression variable or regressor. The accuracy with which the regression curve of $ Y $ on $ x $ reflects the average variation of $ Y $ with variation in $ x $ is measured by the variance of $ Y $ (cf. Dispersion), which is computed for every value of $ x $ as follows:

$$ {\mathsf D} ( Y \mid x ) = \sigma ^ {2} ( x) . $$

Graphically, the dependence of $ \sigma ^ {2} ( x) $ on $ x $ is expressed by the scedastic curve. If $ \sigma ^ {2} ( x) = 0 $ for all values of $ x $, then with probability 1 the variables are connected by a perfect functional dependence. If $ \sigma ^ {2} ( x) \neq 0 $ for some value of $ x $ and $ m ( x) $ does not depend on $ x $, then regression of $ Y $ with respect to $ x $ is absent.
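
The conditional variance, and with it the scedastic curve, can be estimated in the same empirical way, by taking the sample variance of the observations made at each fixed $ x $. In the sketch below (an assumed model, with the noise scale growing with $ x $) the estimated values of $ \sigma ^ {2} ( x) $ grow accordingly.

```python
# Estimating the scedastic curve sigma^2(x) = D(Y | x) from grouped data.
# The data model (mean 2 + 0.5*x, standard deviation 0.3*x) is assumed.
import numpy as np

rng = np.random.default_rng(1)

for x in (1.0, 2.0, 3.0, 4.0, 5.0):
    y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3 * x, size=500)
    print(f"x = {x:.1f}   estimated sigma^2(x) = {y.var(ddof=1):.3f}"
          f"   true value = {(0.3 * x) ** 2:.3f}")
```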

In probability theory, the problem of regression is solved in the case when the values of the regression variable $ x $ correspond to the values of a certain random variable $ X $, and it is assumed that the joint probability distribution of the variables $ X $ and $ Y $ is known (here, the expectation $ {\mathsf E} ( Y \mid x ) $ and the variance $ {\mathsf D} ( Y \mid x ) $ are the conditional expectation and conditional variance of $ Y $, respectively, for a fixed value $ X = x $). In this case, two regressions are defined: that of $ Y $ with respect to $ x $ and that of $ X $ with respect to $ y $, and the concept of regression can also be used to introduce certain measures of the interrelation between $ X $ and $ Y $, defined as characteristics of the degree of concentration of the distribution around the regression curves (see Correlation (in statistics)).

Regression functions possess the property that, among all real-valued functions $ f ( x) $, the expectation $ {\mathsf E} ( Y - f ( x) ) ^ {2} $ attains its minimum when $ f ( x) = m ( x) $; that is, the regression of $ Y $ with respect to $ x $ gives the best (in the above sense) representation of the variable $ Y $. The most important case is when the regression of $ Y $ with respect to $ x $ is linear, that is,

$$ {\mathsf E} ( Y \mid x ) = \beta _ {0} + \beta _ {1} x . $$

The coefficients $ \beta _ {0} $ and $ \beta _ {1} $ are called regression coefficients, and are easily calculated:

$$ \beta _ {0} = m _ {Y} - \rho \frac{\sigma _ {Y} }{\sigma _ {X} } m _ {X} , \qquad \beta _ {1} = \rho \frac{\sigma _ {Y} }{\sigma _ {X} } $$

(where $ \rho $ is the correlation coefficient of $ X $ and $ Y $, $ m _ {X} = {\mathsf E} X $, $ m _ {Y} = {\mathsf E} Y $, $ \sigma _ {X} ^ {2} = {\mathsf D} X $, and $ \sigma _ {Y} ^ {2} = {\mathsf D} Y $), and the regression curve of $ Y $ with respect to $ x $ has the form

$$ y = m _ {Y} + \rho \frac{\sigma _ {Y} }{\sigma _ {X} } ( x - m _ {X} ) ; $$

the regression curve of $ X $ with respect to $ y $ is found in a similar way. The linear regression is exact in the case when the two-dimensional distribution of the variables $ X $ and $ Y $ is normal.
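
For a bivariate normal distribution the conditional mean $ {\mathsf E} ( Y \mid x ) $ is exactly the line written above. The sketch below (with arbitrarily chosen parameters $ m _ {X} $, $ m _ {Y} $, $ \sigma _ {X} $, $ \sigma _ {Y} $, $ \rho $) checks this on simulated data by comparing conditional means of $ Y $ over narrow bins of $ X $ with $ \beta _ {0} + \beta _ {1} x $.

```python
# Bivariate normal case: the regression of Y on x is exactly linear, with
# beta_1 = rho*sigma_Y/sigma_X and beta_0 = m_Y - beta_1*m_X.  All parameter
# values below are assumed for illustration.
import numpy as np

rng = np.random.default_rng(2)
m_X, m_Y, s_X, s_Y, rho = 1.0, 3.0, 2.0, 1.5, 0.8
cov = [[s_X**2, rho * s_X * s_Y], [rho * s_X * s_Y, s_Y**2]]
X, Y = rng.multivariate_normal([m_X, m_Y], cov, size=200_000).T

beta_1 = rho * s_Y / s_X
beta_0 = m_Y - beta_1 * m_X

# Conditional means of Y over narrow bins of X should lie close to beta_0 + beta_1*x.
for lo in (-2.0, 0.0, 2.0, 4.0):
    x_mid = lo + 0.1
    mask = (X >= lo) & (X < lo + 0.2)
    print(f"x ~ {x_mid:.1f}   mean of Y in bin = {Y[mask].mean():.3f}"
          f"   beta_0 + beta_1*x = {beta_0 + beta_1 * x_mid:.3f}")
```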

In statistical applications, where the available information about the form of the joint probability distribution is insufficient for an exact determination of the regression, there arises the problem of determining the regression approximately. To solve this problem, one can choose, out of all functions $ g ( x) $ belonging to a given class, that function which gives the best representation of the variable $ Y $, in the sense that the expectation $ {\mathsf E} ( Y - g ( X) ) ^ {2} $ is minimized. This function is called the mean-square (mean-quadratic) regression.

The simplest case is that of linear mean-square regression, in which one looks for the best linear approximation to $ Y $ by means of $ X $, that is, a linear function $ g ( x) = \beta _ {0} + \beta _ {1} x $ for which the expression $ {\mathsf E} ( Y - g ( X) ) ^ {2} $ takes the smallest possible value. This extremal problem has a unique solution:

$$ \beta _ {0} = m _ {Y} - \beta _ {1} m _ {X} ,\ \ \beta _ {1} = \rho \frac{\sigma _ {Y} }{\sigma _ {X} } , $$

that is, the calculation of an approximate regression curve leads to the same result as that obtained in the case of exact linear regression:

$$ y = m _ {Y} + \rho \frac{\sigma _ {Y} }{\sigma _ {X} } ( x - m _ {X} ) . $$
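
The following sketch (with an assumed linear data model) illustrates this coincidence numerically: fitting a straight line to a sample by least squares minimizes the sample analogue of $ {\mathsf E} ( Y - \beta _ {0} - \beta _ {1} X ) ^ {2} $ and reproduces the moment formulas for $ \beta _ {0} $ and $ \beta _ {1} $.

```python
# Linear mean-square regression: the least-squares estimates coincide with the
# moment formulas beta_1 = rho*sigma_Y/sigma_X, beta_0 = m_Y - beta_1*m_X.
# The data model is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(1.0, 2.0, size=100_000)
Y = 0.7 * X + rng.normal(0.0, 1.0, size=X.size)

b1_ls, b0_ls = np.polyfit(X, Y, 1)               # least squares: [slope, intercept]

rho = np.corrcoef(X, Y)[0, 1]                    # sample moments
b1_mom = rho * Y.std(ddof=1) / X.std(ddof=1)
b0_mom = Y.mean() - b1_mom * X.mean()

print(f"least squares:   beta_0 = {b0_ls:.4f}, beta_1 = {b1_ls:.4f}")
print(f"moment formulas: beta_0 = {b0_mom:.4f}, beta_1 = {b1_mom:.4f}")
```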

The minimal value of $ {\mathsf E} ( Y - g ( X) ) ^ {2} $, attained at these values of the parameters, is equal to $ \sigma _ {Y} ^ {2} ( 1 - \rho ^ {2} ) $. If a regression $ m ( x) $ exists, then, for all $ \beta _ {0} $ and $ \beta _ {1} $,

$$ {\mathsf E} [ Y - \beta _ {0} - \beta _ {1} X ] ^ {2} = \ {\mathsf E} [ Y - m ( X) ] ^ {2} + {\mathsf E} [ m ( X) - \beta _ {0} - \beta _ {1} X ] ^ {2} . $$
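
The identity holds for arbitrary $ \beta _ {0} $ and $ \beta _ {1} $ because $ {\mathsf E} ( Y - m ( X) \mid X ) = 0 $, so the cross term vanishes. The sketch below (with an assumed nonlinear regression function $ m ( x) = x ^ {2} $) checks the decomposition by Monte Carlo averaging.

```python
# Monte Carlo check of E[Y - b0 - b1*X]^2 = E[Y - m(X)]^2 + E[m(X) - b0 - b1*X]^2
# for an assumed model with m(x) = x^2 and arbitrary coefficients b0, b1.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(0.0, 1.0, size=1_000_000)
m_of_X = X**2                                    # assumed regression function
Y = m_of_X + rng.normal(0.0, 0.5, size=X.size)   # noise independent of X

b0, b1 = 0.3, -1.2                               # arbitrary coefficients
lhs = np.mean((Y - b0 - b1 * X) ** 2)
rhs = np.mean((Y - m_of_X) ** 2) + np.mean((m_of_X - b0 - b1 * X) ** 2)
print(f"lhs = {lhs:.3f},  rhs = {rhs:.3f}")      # equal up to Monte Carlo error
```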

This implies that the mean-square regression line $ y = \beta _ {0} + \beta _ {1} x $ gives the best approximation to the regression curve $ m ( x) $ in the direction of the $ y $-axis. Therefore, if the curve $ m ( x) $ is a straight line, it coincides with the mean-square regression line.

In the general case, when the regression is far from being linear, one can pose the problem of finding a polynomial $ g ( x) = \beta _ {0} + \beta _ {1} x + \dots + \beta _ {m} x ^ {m} $ of a certain degree $ m $ for which $ {\mathsf E} ( Y - g ( X) ) ^ {2} $ is as small as possible.
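
Such a polynomial mean-square regression can be computed by ordinary least squares on a sample; in the sketch below the regression function $ m ( x) = \sin x $ and the degree $ m = 3 $ are illustrative assumptions.

```python
# Polynomial mean-square regression: choose the coefficients of a degree-m
# polynomial g(x) to minimize the sample analogue of E(Y - g(X))^2.
import numpy as np

rng = np.random.default_rng(5)
X = rng.uniform(-2.0, 2.0, size=50_000)
Y = np.sin(X) + rng.normal(0.0, 0.2, size=X.size)   # assumed model, m(x) = sin(x)

coeffs = np.polyfit(X, Y, 3)       # least-squares cubic, highest power first
g = np.poly1d(coeffs)

for x in (-1.5, 0.0, 1.5):
    print(f"x = {x:+.1f}   g(x) = {g(x):+.3f}   m(x) = sin(x) = {np.sin(x):+.3f}")
```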

A solution of this problem corresponds to polynomial mean-square regression (see Parabolic regression). The function $ y = g ( x) $ is a polynomial of degree $ m $ and gives the best approximation to the true regression curve. A generalization of polynomial regression is the regression function expressed as a linear combination of certain given functions:

$$ g ( x) = \beta _ {0} \phi _ {0} ( x) + \dots + \beta _ {m} \phi _ {m} ( x) . $$

The most important case is when $ \phi _ {0} ( x) \dots \phi _ {m} ( x) $ are orthogonal polynomials of corresponding orders constructed from the distribution of $ X $. There are other examples of non-linear (curvilinear) regression, such as trigonometric regression and exponential regression.
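
For example, under the assumption that $ X $ is standard normal, the probabilists' Hermite polynomials $ \mathrm{He} _ {0} ( x) = 1 $, $ \mathrm{He} _ {1} ( x) = x $, $ \mathrm{He} _ {2} ( x) = x ^ {2} - 1 $, $ \mathrm{He} _ {3} ( x) = x ^ {3} - 3 x $ are orthogonal with respect to the distribution of $ X $, that is, $ {\mathsf E} [ \mathrm{He} _ {j} ( X) \mathrm{He} _ {k} ( X) ] = 0 $ for $ j \neq k $; the sketch below verifies this by simulation.

```python
# Orthogonality of the probabilists' Hermite polynomials under X ~ N(0, 1):
# E[He_j(X) He_k(X)] = 0 for j != k (and equals j! for j = k).
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(0.0, 1.0, size=1_000_000)

He = [np.ones_like(X), X, X**2 - 1.0, X**3 - 3.0 * X]
gram = np.array([[np.mean(He[j] * He[k]) for k in range(4)] for j in range(4)])
print(np.round(gram, 2))   # approximately diag(1, 1, 2, 6)
```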

The concept of regression can be extended in a natural way to the case where, instead of one regression variable, some set of variables is considered. If the random variables $ X _ {1} \dots X _ {n} $ have a joint probability distribution, then one can define a multiple regression, e.g. as the regression of $ X _ {1} $ with respect to $ x _ {2} \dots x _ {n} $:

$$ {\mathsf E} ( X _ {1} \mid X _ {2} = x _ {2} \dots X _ {n} = x _ {n} ) = \ m _ {1} ( x _ {2} \dots x _ {n} ) . $$

The corresponding equation defines the regression surface of $ X _ {1} $ with respect to $ x _ {2} \dots x _ {n} $. The linear regression of $ X _ {1} $ with respect to $ x _ {2} \dots x _ {n} $ has the form

$$ {\mathsf E} ( X _ {1} \mid x _ {2} \dots x _ {n} ) = \ \beta _ {2} x _ {2} + \dots + \beta _ {n} x _ {n} , $$

where $ \beta _ {2} \dots \beta _ {n} $ are the regression coefficients (under the assumption that $ {\mathsf E} X _ {k} = 0 $ for all $ k $). The linear mean-square regression of $ X _ {1} $ with respect to $ x _ {2} \dots x _ {n} $ is defined as the best linear estimator of the variable $ X _ {1} $ in terms of the variables $ X _ {2} \dots X _ {n} $, in the sense that

$$ {\mathsf E} ( X _ {1} - \beta _ {2} X _ {2} - \dots - \beta _ {n} X _ {n} ) ^ {2} $$

is minimized. The corresponding regression plane gives the best approximation to the regression surface $ x _ {1} = m _ {1} ( x _ {2} \dots x _ {n} ) $, if the latter exists. If the regression surface is a plane, then it necessarily coincides with the mean-square regression plane (as happens when the joint distribution of all $ n $ variables is normal).
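
For centered variables the minimizing coefficients solve the normal equations formed from the covariances of the variables. The sketch below (with an arbitrarily assumed data model for three variables) estimates these covariances from a simulated sample and solves for $ \beta _ {2} $ and $ \beta _ {3} $.

```python
# Linear mean-square multiple regression of X_1 on X_2, X_3 (centered variables):
# the coefficients solve Cov(X_2..X_3) * beta = Cov(X_2..X_3, X_1).
# The data model below is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
X2 = rng.normal(0.0, 1.0, size=n)
X3 = 0.5 * X2 + rng.normal(0.0, 1.0, size=n)              # correlated predictors
X1 = 1.5 * X2 - 0.8 * X3 + rng.normal(0.0, 0.5, size=n)

predictors = np.vstack([X2, X3])
S22 = np.cov(predictors)                  # covariance matrix of X_2, X_3
s21 = np.cov(predictors, X1)[:2, 2]       # covariances of X_2, X_3 with X_1
beta = np.linalg.solve(S22, s21)
print("beta_2, beta_3 =", np.round(beta, 3))   # close to the assumed 1.5 and -0.8
```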

A simple example of regression of $ Y $ with respect to $ X $ is given by the dependence between $ Y $ and $ X $ expressed by the relation $ Y = u ( x) + \delta $, where $ u ( x) = {\mathsf E} ( Y \mid X = x ) $ and where $ X $ and $ \delta $ are independent random variables. This representation is useful when designing an experiment for studying a functional relation $ y = u ( x) $ between two non-random variables $ y $ and $ x $. The same regression model is used in numerous applications to study the nature of the dependence of a random variable $ Y $ on a non-random variable $ x $. In practice, the choice of the function $ y = u ( x) $ and the estimation of the unknown regression coefficients from experimental data are made using methods of regression analysis.
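
A sketch of this model (the design points, the form $ u ( x) = a + b x $ and the noise level are illustrative assumptions): observations $ Y = u ( x) + \delta $ are generated at fixed, non-random design points, and the unknown coefficients of $ u $ are then estimated from the data by least squares, as in regression analysis.

```python
# Designed experiment for Y = u(x) + delta with u(x) = a + b*x (assumed form):
# the coefficients a, b are estimated from the observations by least squares.
import numpy as np

rng = np.random.default_rng(8)
x_design = np.repeat([0.0, 1.0, 2.0, 3.0], 25)     # 25 runs at each design point
a_true, b_true = 1.0, 2.0
y_obs = a_true + b_true * x_design + rng.normal(0.0, 0.4, size=x_design.size)

A = np.vstack([np.ones_like(x_design), x_design]).T
(a_hat, b_hat), *_ = np.linalg.lstsq(A, y_obs, rcond=None)
print(f"estimated u(x) = {a_hat:.3f} + {b_hat:.3f} x   (true: {a_true} + {b_true} x)")
```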

References

[1] H. Cramér, "Mathematical methods of statistics", Princeton Univ. Press (1946)
[2] M.G. Kendall, A. Stuart, "The advanced theory of statistics", Vol. 2: Inference and relationship, Griffin (1979)