# Linear regression

of one random variable $\mathbf Y = ( Y^{(1)}, \dots, Y^{(m)} )^\prime$ on another $\mathbf X = ( X^{(1)}, \dots, X^{(p)} )^\prime$

An $m$-dimensional vector form, linear in $\mathbf x$, that is assumed to be the conditional mean (given $\mathbf X = \mathbf x$) of the random vector $\mathbf Y$. The corresponding equations

$$\tag{*} y^{(k)} ( \mathbf x , \mathbf b ) = {\mathsf E} ( Y^{(k)} \mid \mathbf X = \mathbf x ) = \sum_{j=0}^{p} b_{kj} x^{(j)} ,$$

$$x^{(0)} \equiv 1 ,\quad k = 1, \dots, m,$$

are called the linear regression equations of $\mathbf Y$ on $\mathbf X$, and the parameters $b_{kj}$ are called the regression coefficients (see also Regression). Here $\mathbf X$ is an observable parameter (not necessarily random) on which the mean of the response $\mathbf Y ( \mathbf X )$ under investigation depends.
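As a minimal sketch of how the regression equations (*) are evaluated, the following Python fragment computes the $m$ conditional means for a given $\mathbf x$. The function name `regression_mean` and the coefficient values are illustrative assumptions, not part of the original article; row $k$ of `B` is assumed to hold $b_{k0}, \dots, b_{kp}$.

```python
from typing import List, Sequence


def regression_mean(B: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Evaluate y^(k)(x, b) = sum_{j=0}^{p} b_kj x^(j) for k = 1..m, with x^(0) = 1."""
    x_ext = [1.0, *x]  # prepend the constant regressor x^(0) = 1
    for row in B:
        if len(row) != len(x_ext):
            raise ValueError("each row of B needs p + 1 coefficients")
    return [sum(b * xj for b, xj in zip(row, x_ext)) for row in B]


# m = 2 responses, p = 2 regressors; the coefficients are made-up illustration values.
B = [[1.0, 2.0, 3.0],
     [0.0, -1.0, 0.5]]
print(regression_mean(B, [2.0, 4.0]))  # [17.0, 0.0]
```

The first component is $1 + 2 \cdot 2 + 3 \cdot 4 = 17$ and the second is $0 - 2 + 0.5 \cdot 4 = 0$.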

In addition, the linear regression of $Y^{(k)}$ on $\mathbf X$ is frequently understood to be the "best" (in a well-defined sense) linear approximation of $Y^{(k)}$ by means of $\mathbf X$, or even the result of the best (in a well-defined sense) smoothing of a system of experimental points ("observations") $( Y_i^{(k)} , \mathbf X_i )$, $i = 1, \dots, n$, by means of a hyperplane in the space $( Y^{(k)} , \mathbf X )$, in situations where these points cannot necessarily be interpreted as a sample from a corresponding general population. With such a definition one has to distinguish different versions of linear regression, depending on how the errors of the linear approximation of $Y^{(k)}$ by means of $\mathbf X$ are measured (or on the actual choice of the smoothing criterion). The most widespread criteria for the quality of the approximation of $Y^{(k)}$ by linear combinations of $\mathbf X$ (linear smoothing of the points $( Y_i^{(k)} , \mathbf X_i )$) are:

$$Q_1 ( \mathbf b ) = {\mathsf E} \left \{ \omega^2 ( \mathbf X ) \cdot \left ( Y^{(k)} ( \mathbf X ) - \sum_{j=0}^{p} b_{kj} X^{(j)} \right )^2 \right \} ,$$

$$\widetilde{Q}_1 ( \mathbf b ) = \sum_{i=1}^{n} \omega_i^2 \left ( Y_i^{(k)} - \sum_{j=0}^{p} b_{kj} X_i^{(j)} \right )^2 ,$$

$$Q_2 ( \mathbf b ) = {\mathsf E} \left \{ \omega ( \mathbf X ) \left | Y^{(k)} ( \mathbf X ) - \sum_{j=0}^{p} b_{kj} X^{(j)} \right | \right \} ,$$

$$\widetilde{Q}_2 ( \mathbf b ) = \sum_{i=1}^{n} \omega_i \left | Y_i^{(k)} - \sum_{j=0}^{p} b_{kj} X_i^{(j)} \right | ,$$

$$Q_3 ( \mathbf b ) = {\mathsf E} \left \{ \omega^2 ( \mathbf X ) \cdot \rho^2 \left ( Y^{(k)} ( \mathbf X ) , \sum_{j=0}^{p} b_{kj} X^{(j)} \right ) \right \} ,$$

$$\widetilde{Q}_3 ( \mathbf b ) = \sum_{i=1}^{n} \omega_i^2 \cdot \rho^2 \left ( Y_i^{(k)} , \sum_{j=0}^{p} b_{kj} X_i^{(j)} \right ) .$$

In these relations the choice of the "weights" $\omega ( \mathbf X )$ or $\omega_i$ depends on the nature of the actual scheme under investigation. For example, if the $Y^{(k)} ( \mathbf X )$ are interpreted as random variables with known variances ${\mathsf D} Y^{(k)} ( \mathbf X )$ (or with known estimates of them), then $\omega^2 ( \mathbf X ) = [ {\mathsf D} Y^{(k)} ( \mathbf X ) ]^{-1}$. In the last two criteria, $Q_3$ and $\widetilde{Q}_3$, the "discrepancies" of the approximation or the smoothing are measured by the distances $\rho ( \cdot , \cdot )$ from $Y^{(k)} ( \mathbf X )$ or $Y_i^{(k)}$ to the required regression hyperplane. If the coefficients $b_{kj}$ are determined by minimizing $Q_1 ( \mathbf b )$ or $\widetilde{Q}_1 ( \mathbf b )$, then the linear regression is said to be least squares or $L_2$; if the criteria $Q_2 ( \mathbf b )$ and $\widetilde{Q}_2 ( \mathbf b )$ are used, it is said to be least absolute deviations or $L_1$; if the criteria $Q_3 ( \mathbf b )$ and $\widetilde{Q}_3 ( \mathbf b )$ are used, it is said to be minimum $\rho$-distance.
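The empirical criteria can be illustrated with a short Python sketch for a single response $Y^{(k)}$. All function names and the data are hypothetical. Two assumptions are made that the article does not fix: $\rho$ is taken to be the Euclidean (perpendicular) distance from an observation to the hyperplane, and $\widetilde{Q}_1$ is minimized by solving the weighted normal equations with a small Gaussian elimination.

```python
import math


def residuals(b, xs, ys):
    """Residuals Y_i - sum_j b_j X_i^(j), with X_i^(0) = 1 (b[0] is the intercept)."""
    return [y - (b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)))
            for x, y in zip(xs, ys)]


def q1(b, xs, ys, w):
    """Weighted sum of squared residuals: sum_i w_i^2 r_i^2."""
    return sum((wi * r) ** 2 for wi, r in zip(w, residuals(b, xs, ys)))


def q2(b, xs, ys, w):
    """Weighted sum of absolute residuals: sum_i w_i |r_i|."""
    return sum(wi * abs(r) for wi, r in zip(w, residuals(b, xs, ys)))


def q3(b, xs, ys, w):
    """ASSUMPTION: rho is the perpendicular distance |r_i| / sqrt(1 + sum_{j>=1} b_j^2)."""
    norm = math.sqrt(1.0 + sum(bj * bj for bj in b[1:]))
    return sum((wi * r / norm) ** 2 for wi, r in zip(w, residuals(b, xs, ys)))


def solve(a, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[piv][c]) < 1e-12:
            raise ValueError("singular normal equations")
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    sol = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = m[r][n] - sum(m[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = s / m[r][r]
    return sol


def fit_q1(xs, ys, w):
    """Minimize q1 by solving the weighted normal equations (A' W A) b = A' W y."""
    rows = [[1.0, *x] for x in xs]  # design matrix with the constant column
    d = len(rows[0])
    ata = [[sum(wi * wi * r[i] * r[j] for wi, r in zip(w, rows)) for j in range(d)]
           for i in range(d)]
    aty = [sum(wi * wi * r[i] * y for wi, r, y in zip(w, rows, ys)) for i in range(d)]
    return solve(ata, aty)


# Hypothetical data: four points on the line y = 1 + 2x and one outlier at x = 4.
xs = [[0.0], [1.0], [2.0], [3.0], [4.0]]
ys = [1.0, 3.0, 5.0, 7.0, 15.0]
unit = [1.0] * 5

b_ls = fit_q1(xs, ys, unit)                        # approximately [-0.2, 3.2]
b_line = [1.0, 2.0]                                # the line through the first four points
b_down = fit_q1(xs, ys, [1.0, 1.0, 1.0, 1.0, 0.1])  # outlier down-weighted
```

On these data the least squares fit has $\widetilde{Q}_1 \approx 14.4$ and $\widetilde{Q}_2 \approx 7.2$, while the line $y = 1 + 2x$ has $\widetilde{Q}_1 = 36$ and $\widetilde{Q}_2 = 6$, so the $L_2$ and $L_1$ criteria rank the two candidate lines differently. Down-weighting the outlier pulls the fitted slope back towards 2.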

In certain cases, linear regression in the classical sense (*) coincides with linear regression defined by means of functionals of the type $Q_i$. Thus, if the vector $( \mathbf X^\prime , Y^{(k)} )$ has a multivariate normal distribution, then the regression of $Y^{(k)}$ on $\mathbf X$ in the sense of (*) is linear and coincides with the least squares (minimum mean-square) linear regression (for $\omega ( \mathbf X ) \equiv 1$).
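The normal case can be checked numerically for $p = 1$. The sketch below is only an illustration under assumed parameter values: it draws a sample from a bivariate normal distribution (built from two independent standard normal variables via a Cholesky factor) and compares the ordinary least squares line with the known conditional mean ${\mathsf E} ( Y \mid X = x ) = \mu_Y + ( \sigma_{XY} / \sigma_{XX} ) ( x - \mu_X )$. The names, the seed and the parameters are all hypothetical.

```python
import math
import random


def least_squares_line(xs, ys):
    """Return (intercept, slope) minimizing sum_i (y_i - b0 - b1 x_i)^2."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Assumed parameters of the bivariate normal law (illustration values only).
mu_x, mu_y = 1.0, 2.0
var_x, var_y, cov_xy = 4.0, 3.0, 2.0

# Cholesky factor of the covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
l11 = math.sqrt(var_x)
l21 = cov_xy / l11
l22 = math.sqrt(var_y - l21 ** 2)

rng = random.Random(0)  # fixed seed so the run is reproducible
xs, ys = [], []
for _ in range(20000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    xs.append(mu_x + l11 * z1)
    ys.append(mu_y + l21 * z1 + l22 * z2)

b0_hat, b1_hat = least_squares_line(xs, ys)

# Coefficients of the conditional mean E(Y | X = x) under normality.
b1_true = cov_xy / var_x          # 0.5
b0_true = mu_y - b1_true * mu_x   # 1.5
print(b0_hat, b1_hat, b0_true, b1_true)
```

With a sample of this size the estimated intercept and slope should lie close to 1.5 and 0.5, the coefficients of the conditional mean.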

How to Cite This Entry:
Linear regression. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Linear_regression&oldid=47663
This article was adapted from an original article by S.A. Aivazyan (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.