Linear regression
of one random variable $ \mathbf Y = ( Y ^ {(1)} \dots Y ^ {(m)} ) ^ \prime $
on another $ \mathbf X = ( X ^ {(1)} \dots X ^ {(p)} ) ^ \prime $
An $ m $-dimensional vector form, linear in $ \mathbf x $, taken to be the conditional mean (given $ \mathbf X = \mathbf x $) of the random vector $ \mathbf Y $. The corresponding equations
$$ \tag{* } y ^ {(k)} ( \mathbf x , \mathbf b ) = {\mathsf E} ( Y ^ {(k)} \mid \mathbf X = \mathbf x ) = \ \sum_{j=0}^ { p } b _ {kj} x ^ {(j)} , $$
$$ x ^ {(0)} \equiv 1 ,\ k = 1 \dots m, $$
are called the linear regression equations of $ \mathbf Y $ on $ \mathbf X $, and the parameters $ b _ {kj} $ are called the regression coefficients (see also Regression). Here $ \mathbf X $ is an observable parameter (not necessarily random) on which the mean of the resulting function (response) $ \mathbf Y ( \mathbf X ) $ under investigation depends.
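For example, in the simplest case $ m = 1 $, $ p = 1 $, the system (*) reduces to the single equation of a regression line,

$$ y ( x , \mathbf b ) = {\mathsf E} ( Y \mid X = x ) = b _ {10} + b _ {11} x . $$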
In addition, the linear regression of $ Y ^ {(k)} $ on $ \mathbf X $ is frequently also understood to be the "best" (in a well-defined sense) linear approximation of $ Y ^ {(k)} $ by means of $ \mathbf X $, or even the result of the best (in a well-defined sense) smoothing of a system of experimental points ("observations") $ ( Y _ {i} ^ {(k)} , \mathbf X _ {i} ) $, $ i = 1 \dots n $, by a hyperplane in the space $ ( Y ^ {(k)} , \mathbf X ) $, in situations where the interpretation of these points as a sample from a corresponding general population may not be justified. With such a definition one has to distinguish different versions of linear regression, depending on the method chosen for measuring the errors of the linear approximation of $ Y ^ {(k)} $ by means of $ \mathbf X $ (or on the actual choice of a criterion for the quality of the smoothing). The most widespread criteria for the quality of the approximation of $ Y ^ {(k)} $ by linear combinations of $ \mathbf X $ (linear smoothing of the points $ ( Y _ {i} ^ {(k)} , \mathbf X _ {i} ) $) are:
$$ Q _ {1} ( \mathbf b ) = {\mathsf E} \left \{ \omega ^ {2} ( \mathbf X ) \cdot \left ( Y ^ {(k)} ( \mathbf X ) - \sum _ {j=0}^ { p } b _ {kj} X ^ {(j)} \right ) ^ {2} \right \} , $$
$$ \widetilde{Q} _ {1} ( \mathbf b ) = \sum_{i=1}^ { n } \omega _ {i} ^ {2} \left ( Y _ {i} ^ {(k)} - \sum_{j=0}^ { p } b _ {kj} X _ {i} ^ {(j)} \right ) ^ {2} , $$
$$ Q _ {2} ( \mathbf b ) = {\mathsf E} \left \{ \omega ( \mathbf X ) \left | Y ^ {(k)} ( \mathbf X ) - \sum_{j=0}^ { p } b _ {kj} X ^ {(j)} \right | \right \} , $$
$$ \widetilde{Q} _ {2} ( \mathbf b ) = \sum_{i=1}^ { n } \omega _ {i} \left | Y _ {i} ^ {(k)} - \sum_{j=0}^ { p } b _ {kj} X _ {i} ^ {(j)} \right | , $$
$$ Q _ {3} ( \mathbf b ) = {\mathsf E} \left \{ \omega ^ {2} ( \mathbf X ) \cdot \rho ^ {2} \left ( Y ^ {(k)} ( \mathbf X ) , \sum_{j=0}^ { p } b _ {kj} X ^ {(j)} \right ) \right \} , $$
$$ \widetilde{Q} _ {3} ( \mathbf b ) = \sum_{i=1}^ { n } \omega _ {i} ^ {2} \cdot \rho ^ {2} \left ( Y _ {i} ^ {(k)} , \sum_{j=0}^ { p } b _ {kj} X _ {i} ^ {(j)} \right ) . $$
In these relations the choice of "weights" $ \omega ( \mathbf X ) $ or $ \omega _ {i} $ depends on the nature of the actual scheme under investigation. For example, if the $ Y ^ {(k)} ( \mathbf X ) $ are interpreted as random variables with known variances $ {\mathsf D} Y ^ {(k)} ( \mathbf X ) $ (or with known estimates of them), then $ \omega ^ {2} ( \mathbf X ) = [ {\mathsf D} Y ^ {(k)} ( \mathbf X ) ] ^ {-1} $. In the last two criteria the "discrepancies" of the approximation or the smoothing are measured by the distances $ \rho ( \cdot , \cdot ) $ from $ Y ^ {(k)} ( \mathbf X ) $ or $ Y _ {i} ^ {(k)} $ to the required hyperplane of regression. If the coefficients $ b _ {kj} $ are determined by minimizing $ Q _ {1} ( \mathbf b ) $ or $ \widetilde{Q} _ {1} ( \mathbf b ) $, the linear regression is called least-squares or $ L _ {2} $ regression; if the criteria $ Q _ {2} ( \mathbf b ) $ and $ \widetilde{Q} _ {2} ( \mathbf b ) $ are used, it is called least-absolute-deviations or $ L _ {1} $ regression; if the criteria $ Q _ {3} ( \mathbf b ) $ and $ \widetilde{Q} _ {3} ( \mathbf b ) $ are used, it is called minimum $ \rho $-distance regression.
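As a numerical illustration, minimizing $ \widetilde{Q} _ {1} ( \mathbf b ) $ for one component $ Y ^ {(k)} $ amounts to a weighted least-squares computation; the following minimal Python sketch shows one way to carry it out, with synthetic data, weights and all names chosen purely for the example.

# Minimal sketch: weighted least-squares fit of one component Y^(k),
# i.e. minimization of Q~_1(b); all data below are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 2                                  # observations and regressors
X = rng.normal(size=(n, p))                    # observed X_i^(1), ..., X_i^(p)
b_true = np.array([1.0, 2.0, -0.5])            # illustrative b_k0, b_k1, b_k2
sigma = 0.3
y = b_true[0] + X @ b_true[1:] + sigma * rng.normal(size=n)   # Y_i^(k)

A = np.column_stack([np.ones(n), X])           # design matrix, with x^(0) = 1
omega = np.full(n, 1.0 / sigma)                # weights omega_i = (D Y_i^(k))^(-1/2)

# Q~_1 is minimized by ordinary least squares applied to the rescaled data.
b_hat, *_ = np.linalg.lstsq(omega[:, None] * A, omega * y, rcond=None)
print(b_hat)                                   # close to b_true

Minimizing $ \widetilde{Q} _ {2} $ or $ \widetilde{Q} _ {3} $ instead generally requires an iterative procedure (for example, linear programming or iteratively reweighted least squares in the $ L _ {1} $ case).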
In certain cases, linear regression in the classical sense (*) is the same as linear regression defined by using functionals of the type $ Q _ {i} $. Thus, if the vector $ ( \mathbf X ^ \prime , Y ^ {(k)} ) $ is subject to a multi-dimensional normal law, then the regression of $ Y ^ {(k)} $ on $ \mathbf X $ in the sense of (*) is linear and is the same as the least-squares, or minimum mean-square, linear regression (for $ \omega ( \mathbf X ) \equiv 1 $).
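In particular, for a two-dimensional normal vector $ ( X , Y ) $ this common linear regression takes the familiar explicit form

$$ {\mathsf E} ( Y \mid X = x ) = {\mathsf E} Y + \frac{ \mathop{\rm cov} ( X , Y ) }{ {\mathsf D} X } ( x - {\mathsf E} X ) . $$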
Linear regression. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Linear_regression&oldid=11531