
Linear regression

From Encyclopedia of Mathematics


of one random variable $ \mathbf Y = ( Y ^ {(1)}, \dots, Y ^ {(m)} ) ^ \prime $ on another $ \mathbf X = ( X ^ {(1)}, \dots, X ^ {(p)} ) ^ \prime $

An $ m $-dimensional vector function, linear in $ \mathbf x $, which is taken to be the conditional mean (given $ \mathbf X = \mathbf x $) of the random vector $ \mathbf Y $. The corresponding equations

$$ \tag{* } y ^ {(k)} ( \mathbf x , \mathbf b ) = {\mathsf E} ( Y ^ {(k)} \mid \mathbf X = \mathbf x ) = \ \sum_{j=0}^ { p } b _ {kj} x ^ {(j)} , $$

$$ x ^ {(0)} \equiv 1 ,\ k = 1, \dots, m, $$

are called the linear regression equations of $ \mathbf Y $ on $ \mathbf X $, and the parameters $ b _ {kj} $ are called the regression coefficients (see also Regression). Here $ \mathbf X $ is an observable variable (not necessarily random) on which the mean of the response $ \mathbf Y ( \mathbf X ) $ under investigation depends.
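In the simplest case $ m = 1 $, $ p = 1 $, with a random regressor $ X $ of finite positive variance and when the regression of $ Y $ on $ X $ is indeed linear, the coefficients in (*) are expressed through the first two moments of the pair $ ( X , Y ) $ (the notation is used only in this illustration):

$$ {\mathsf E} ( Y \mid X = x ) = b _ {0} + b _ {1} x ,\ \ b _ {1} = \frac{ \mathop{\rm cov} ( X , Y ) }{ {\mathsf D} X } ,\ \ b _ {0} = {\mathsf E} Y - b _ {1} {\mathsf E} X . $$

These are also the coefficients of the best linear mean-square approximation of $ Y $ by $ X $, whether or not the regression itself is linear.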

In addition, the linear regression of $ Y ^ {(k)} $ on $ \mathbf X $ is frequently also understood to be the "best" (in a well-defined sense) linear approximation of $ Y ^ {(k)} $ by means of $ \mathbf X $, or even the result of the best (in a well-defined sense) smoothing of a system of experimental points ("observations") $ ( Y _ {i} ^ {(k)} , \mathbf X _ {i} ) $, $ i = 1, \dots, n $, by a hyperplane in the space $ ( Y ^ {(k)} , \mathbf X ) $, in situations when the interpretation of these points as a sample from a corresponding population may not be admissible. With such a definition one has to distinguish different versions of linear regression, depending on the method chosen for measuring the errors of the linear approximation of $ Y ^ {(k)} $ by means of $ \mathbf X $ (or on the actual choice of a criterion for the quality of the smoothing). The most widespread criteria for the quality of the approximation of $ Y ^ {(k)} $ by linear combinations of $ \mathbf X $ (of the linear smoothing of the points $ ( Y _ {i} ^ {(k)} , \mathbf X _ {i} ) $) are:

$$ Q _ {1} ( \mathbf b ) = {\mathsf E} \left \{ \omega ^ {2} ( \mathbf X ) \cdot \left ( Y ^ {(k)} ( \mathbf X ) - \sum _ {j=0}^ { p } b _ {kj} X ^ {(j)} \right ) ^ {2} \right \} , $$

$$ \widetilde{Q} _ {1} ( \mathbf b ) = \sum_{i=1}^ { n } \omega _ {i} ^ {2} \left ( Y _ {i} ^ {(k)} - \sum_{j=0}^ { p } b _ {kj} X _ {i} ^ {(j)} \right ) ^ {2} , $$

$$ Q _ {2} ( \mathbf b ) = {\mathsf E} \left \{ \omega ( \mathbf X ) \left | Y ^ {(k)} ( \mathbf X ) - \sum_{j=0}^ { p } b _ {kj} X ^ {(j)} \right | \right \} , $$

$$ \widetilde{Q} _ {2} ( \mathbf b ) = \sum_{i=1}^ { n } \omega _ {i} \left | Y _ {i} ^ {(k)} - \sum_{j=0}^ { p } b _ {kj} X _ {i} ^ {(j)} \right | , $$

$$ Q _ {3} ( \mathbf b ) = {\mathsf E} \left \{ \omega ^ {2} ( \mathbf X ) \cdot \rho ^ {2} \left ( Y ^ {(k)} ( \mathbf X ) , \sum_{j=0}^ { p } b _ {kj} X ^ {(j)} \right ) \right \} , $$

$$ \widetilde{Q} _ {3} ( \mathbf b ) = \sum_{i=1}^ { n } \omega _ {i} ^ {2} \cdot \rho ^ {2} \left ( Y _ {i} ^ {(k)} , \sum_{j=0}^ { p } b _ {kj} X _ {i} ^ {(j)} \right ) . $$

In these relations the choice of the "weights" $ \omega ( \mathbf X ) $ or $ \omega _ {i} $ depends on the nature of the actual scheme under investigation. For example, if the $ Y ^ {(k)} ( \mathbf X ) $ are interpreted as random variables with known variances $ {\mathsf D} Y ^ {(k)} ( \mathbf X ) $ (or with known estimates of them), then $ \omega ^ {2} ( \mathbf X ) = [ {\mathsf D} Y ^ {(k)} ( \mathbf X ) ] ^ {-1} $. In the last two criteria the "discrepancies" of the approximation or the smoothing are measured by the distances $ \rho ( \cdot , \cdot ) $ from $ Y ^ {(k)} ( \mathbf X ) $ or $ Y _ {i} ^ {(k)} $ to the required hyperplane of regression. If the coefficients $ b _ {kj} $ are determined by minimizing $ Q _ {1} ( \mathbf b ) $ or $ \widetilde{Q} _ {1} ( \mathbf b ) $, the linear regression is called least-squares or $ L _ {2} $-regression; if the criteria $ Q _ {2} ( \mathbf b ) $ and $ \widetilde{Q} _ {2} ( \mathbf b ) $ are used, it is called minimum-absolute-deviations or $ L _ {1} $-regression; if the criteria $ Q _ {3} ( \mathbf b ) $ and $ \widetilde{Q} _ {3} ( \mathbf b ) $ are used, it is called minimum $ \rho $-distance regression.
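For the criterion $ \widetilde{Q} _ {1} $ the minimizing coefficients admit an explicit expression. Writing (only for this illustration) $ \mathbf y ^ {(k)} = ( Y _ {1} ^ {(k)}, \dots, Y _ {n} ^ {(k)} ) ^ \prime $, letting $ \mathbf Z $ denote the $ n \times ( p + 1 ) $ matrix with rows $ ( 1 , X _ {i} ^ {(1)}, \dots, X _ {i} ^ {(p)} ) $ and $ \mathbf W = \mathop{\rm diag} ( \omega _ {1} ^ {2}, \dots, \omega _ {n} ^ {2} ) $, one has, provided $ \mathbf Z ^ \prime \mathbf W \mathbf Z $ is non-singular,

$$ \widehat{\mathbf b} _ {k} = ( \mathbf Z ^ \prime \mathbf W \mathbf Z ) ^ {-1} \mathbf Z ^ \prime \mathbf W \mathbf y ^ {(k)} , $$

the usual weighted least-squares solution. No such closed form exists in general for the $ L _ {1} $ and minimum $ \rho $-distance criteria, which are minimized numerically.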

In certain cases, linear regression in the classical sense (*) is the same as linear regression defined by means of functionals of the type $ Q _ {i} $. Thus, if the vector $ ( \mathbf X ^ \prime , Y ^ {(k)} ) $ has a multi-dimensional normal distribution, then the regression of $ Y ^ {(k)} $ on $ \mathbf X $ in the sense of (*) is linear and coincides with the least-squares (minimum mean-square) linear regression (for $ \omega ( \mathbf X ) \equiv 1 $).
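In the normal case the regression coefficients can be written explicitly. If $ \mu _ {X} = {\mathsf E} \mathbf X $, $ \mu _ {k} = {\mathsf E} Y ^ {(k)} $, $ \Sigma _ {XX} $ is the (non-singular) covariance matrix of $ \mathbf X $ and $ \sigma _ {kX} $ is the row vector of covariances of $ Y ^ {(k)} $ with the components of $ \mathbf X $ (notation introduced only here), then

$$ {\mathsf E} ( Y ^ {(k)} \mid \mathbf X = \mathbf x ) = \mu _ {k} + \sigma _ {kX} \Sigma _ {XX} ^ {-1} ( \mathbf x - \mu _ {X} ) , $$

and the right-hand side is at the same time the best linear mean-square approximation of $ Y ^ {(k)} $ by $ \mathbf X $, whether or not the joint distribution is normal.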

How to Cite This Entry:
Linear regression. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Linear_regression&oldid=55042
This article was adapted from an original article by S.A. Aivazyan (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.