Linear regression
of one random variable $ \mathbf Y = ( Y ^ {(1)} \dots Y ^ {(m)} ) ^ \prime $
on another $ \mathbf X = ( X ^ {(1)} \dots X ^ {(p)} ) ^ \prime $
An $ m $-dimensional vector form, linear in $ \mathbf x $, taken to be the conditional mean (given $ \mathbf X = \mathbf x $) of the random vector $ \mathbf Y $. The corresponding equations
$$ \tag{* } y ^ {(k)} ( \mathbf x , \mathbf b ) = {\mathsf E} ( Y ^ {(k)} \mid \mathbf X = \mathbf x ) = \ \sum_{j=0}^ { p } b _ {kj} x ^ {(j)} , $$
$$ x ^ {(0)} \equiv 1 ,\ k = 1 \dots m, $$
are called the linear regression equations of $ \mathbf Y $ on $ \mathbf X $, and the parameters $ b _ {kj} $ are called the regression coefficients (see also Regression). Here $ \mathbf X $ is an observable parameter (not necessarily random) on which the mean of the resulting function (response) $ \mathbf Y ( \mathbf X ) $ under investigation depends.
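The coefficients $ b _ {kj} $ in (*) are usually unknown and must be estimated from observations. The article does not fix an estimation method at this point; as a minimal illustrative sketch (not part of the original text), the following estimates them by ordinary least squares, i.e. by minimizing the criterion $ \widetilde{Q} _ {1} $ given below with unit weights. The array names and shapes are assumptions made for the example.

```python
import numpy as np

def fit_linear_regression(X, Y):
    """Estimate the regression coefficients b_{kj} of equation (*).

    X : (n, p) array of observed regressor vectors X_i (one row per observation).
    Y : (n, m) array of observed responses Y_i.
    Returns B of shape (m, p + 1), where B[k, j] estimates b_{kj}
    and column j = 0 corresponds to the constant term x^{(0)} = 1.
    """
    n = X.shape[0]
    # Prepend the constant column x^{(0)} = 1, as in equation (*).
    D = np.hstack([np.ones((n, 1)), X])
    # Least-squares solution of D @ B.T ~ Y (criterion Q~_1 with unit weights).
    B_T, *_ = np.linalg.lstsq(D, Y, rcond=None)
    return B_T.T

# Hypothetical usage with n = 100 observations, p = 3 regressors, m = 2 responses:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Y = X @ rng.normal(size=(3, 2)) + 1.0 + 0.1 * rng.normal(size=(100, 2))
B = fit_linear_regression(X, Y)   # B.shape == (2, 4)
```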
In addition, the linear regression of $ Y ^ {(k)} $ on $ \mathbf X $ is frequently also understood to be the "best" (in a well-defined sense) linear approximation of $ Y ^ {(k)} $ by means of $ \mathbf X $, or even the result of the best (in a well-defined sense) smoothing of a system of experimental points ("observations") $ ( Y _ {i} ^ {(k)} , \mathbf X _ {i} ) $, $ i = 1 \dots n $, by means of a hyperplane in the space $ ( Y ^ {(k)} , \mathbf X ) $, in situations when the interpretation of these points as a sample from a corresponding general population need not be admissible. With such a definition one has to distinguish different versions of linear regression, depending on the choice of the method of computing the errors of the linear approximation of $ Y ^ {(k)} $ by means of $ \mathbf X $ (or depending on the actual choice of a criterion for the amount of smoothing). The most widespread criteria for the quality of the approximation of $ Y ^ {(k)} $ by means of linear combinations of $ \mathbf X $ (linear smoothing of the points $ ( Y _ {i} ^ {(k)} , \mathbf X _ {i} ) $) are:
$$ Q _ {1} ( \mathbf b ) = {\mathsf E} \left \{ \omega ^ {2} ( \mathbf X ) \cdot \left ( Y ^ {(k)} ( \mathbf X ) - \sum _ {j=0}^ { p } b _ {kj} X ^ {(j)} \right ) ^ {2} \right \} , $$
$$ \widetilde{Q} _ {1} ( \mathbf b ) = \sum_{i=1}^ { n } \omega _ {i} ^ {2} \left ( Y _ {i} ^ {(k)} - \sum_{j=0}^ { p } b _ {kj} X _ {i} ^ {(j)} \right ) ^ {2} , $$
$$ Q _ {2} ( \mathbf b ) = {\mathsf E} \left \{ \omega ( \mathbf X ) \left | Y ^ {(k)} ( \mathbf X ) - \sum_{j=0}^ { p } b _ {kj} X ^ {(j)} \right | \right \} , $$
$$ \widetilde{Q} _ {2} ( \mathbf b ) = \sum_{i=1}^ { n } \omega _ {i} \left | Y _ {i} ^ {(k)} - \sum_{j=0}^ { p } b _ {kj} X _ {i} ^ {(j)} \right | , $$
$$ Q _ {3} ( \mathbf b ) = {\mathsf E} \left \{ \omega ^ {2} ( \mathbf X ) \cdot \rho ^ {2} \left ( Y ^ {(k)} ( \mathbf X ) , \sum_{j=0}^ { p } b _ {kj} X ^ {(j)} \right ) \right \} , $$
$$ \widetilde{Q} _ {3} ( \mathbf b ) = \sum_{i=1}^ { n } \omega _ {i} ^ {2} \cdot \rho ^ {2} \left ( Y _ {i} ^ {(k)} , \sum_{j=0}^ { p } b _ {kj} X _ {i} ^ {(j)} \right ) . $$
In these relations the choice of "weights" $ \omega ( \mathbf X ) $ or $ \omega _ {i} $ depends on the nature of the actual scheme under investigation. For example, if the $ Y ^ {(k)} ( \mathbf X ) $ are interpreted as random variables with known variances $ {\mathsf D} Y ^ {(k)} ( \mathbf X ) $ (or with known estimates of them), then $ \omega ^ {2} ( \mathbf X ) = [ {\mathsf D} Y ^ {(k)} ( \mathbf X ) ] ^ {-1} $. In the last two criteria the "discrepancies" of the approximation or the smoothing are measured by the distances $ \rho ( \cdot , \cdot ) $ from $ Y ^ {(k)} ( \mathbf X ) $ or $ Y _ {i} ^ {(k)} $ to the required hyperplane of regression. If the coefficients $ b _ {kj} $ are determined by minimizing $ Q _ {1} ( \mathbf b ) $ or $ \widetilde{Q} _ {1} ( \mathbf b ) $, the linear regression is called least squares or $ L _ {2} $; if the criteria $ Q _ {2} ( \mathbf b ) $ and $ \widetilde{Q} _ {2} ( \mathbf b ) $ are used, it is called minimum absolute deviations or $ L _ {1} $; if the criteria $ Q _ {3} ( \mathbf b ) $ and $ \widetilde{Q} _ {3} ( \mathbf b ) $ are used, it is called minimum $ \rho $-distance.
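For the $ L _ {2} $ case the minimizer of $ \widetilde{Q} _ {1} $ has a closed form, whereas the $ L _ {1} $ and general $ \rho $-distance cases must typically be found numerically. A minimal sketch of the weighted least-squares solution for a single response component, again with illustrative array names and assuming the weighted design matrix has full rank:

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Minimize Q~_1(b) = sum_i w_i^2 (y_i - sum_j b_j x_i^{(j)})^2
    for a single response component Y^{(k)}.

    X : (n, p) regressors, y : (n,) responses, w : (n,) weights omega_i.
    Returns b of shape (p + 1,), with b[0] the coefficient of x^{(0)} = 1.
    """
    n = X.shape[0]
    D = np.hstack([np.ones((n, 1)), X])
    # Absorbing the weights into the data reduces weighted to ordinary
    # least squares: minimize ||diag(w) (y - D b)||^2.
    Dw = D * w[:, None]
    yw = y * w
    b, *_ = np.linalg.lstsq(Dw, yw, rcond=None)
    return b

# With known variances s2 of the Y_i, the text's choice
# omega_i^2 = 1 / D Y^{(k)} corresponds to w = 1 / np.sqrt(s2).
```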
In certain cases, linear regression in the classical sense (*) is the same as linear regression defined by using functionals of the type $ Q _ {i} $. Thus, if the vector $ ( \mathbf X ^ \prime , Y ^ {(k)} ) $ has a multi-dimensional normal distribution, then the regression of $ Y ^ {(k)} $ on $ \mathbf X $ in the sense of (*) is linear and is the same as least squares or minimum mean squares linear regression (for $ \omega ( \mathbf X ) \equiv 1 $).
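In this normal case the regression coefficients can be written explicitly (a standard fact, added here for completeness): with $ \mu _ {X} = {\mathsf E} \mathbf X $, $ \mu _ {Y} = {\mathsf E} Y ^ {(k)} $, $ \Sigma _ {XX} $ the covariance matrix of $ \mathbf X $ and $ \Sigma _ {YX} $ the row vector of covariances of $ Y ^ {(k)} $ with the components of $ \mathbf X $,

$$ {\mathsf E} ( Y ^ {(k)} \mid \mathbf X = \mathbf x ) = \mu _ {Y} + \Sigma _ {YX} \Sigma _ {XX} ^ {-1} ( \mathbf x - \mu _ {X} ) , $$

so that $ ( b _ {k1} \dots b _ {kp} ) = \Sigma _ {YX} \Sigma _ {XX} ^ {-1} $ and $ b _ {k0} = \mu _ {Y} - \Sigma _ {YX} \Sigma _ {XX} ^ {-1} \mu _ {X} $.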