Linear regression


of one random variable $ \mathbf Y = ( Y^{(1)} , \dots , Y^{(m)} )^\prime $ on another $ \mathbf X = ( X^{(1)} , \dots , X^{(p)} )^\prime $

An $ m $-dimensional vector form, linear in $ \mathbf x $, supposed to be the conditional mean (given $ \mathbf X = \mathbf x $) of the random vector $ \mathbf Y $. The corresponding equations

$$ \tag{*} y^{(k)} ( \mathbf x , \mathbf b ) = {\mathsf E} ( Y^{(k)} \mid \mathbf X = \mathbf x ) = \sum_{j=0}^{p} b_{kj} x^{(j)} , $$

$$ x^{(0)} \equiv 1 , \quad k = 1 , \dots , m , $$

are called the linear regression equations of $ \mathbf Y $ on $ \mathbf X $, and the parameters $ b_{kj} $ are called the regression coefficients (see also Regression); $ \mathbf X $ is an observable parameter (not necessarily random) on which the mean of the response $ \mathbf Y ( \mathbf X ) $ under investigation depends.
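A computational reading of (*) collects the coefficients $ b_{kj} $ in an $ m \times ( p + 1 ) $ matrix. The following Python sketch (a minimal illustration; the names B, x and regression_mean are assumptions of the example, not part of the article) evaluates the right-hand side of (*) with the convention $ x^{(0)} \equiv 1 $.

```python
import numpy as np

def regression_mean(B: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Return y(x, b) with components sum_j b_{kj} x^{(j)}, where x^{(0)} = 1."""
    x_aug = np.concatenate(([1.0], x))  # prepend the constant component x^(0) = 1
    return B @ x_aug                    # one conditional mean per component Y^(k)

# Illustrative example: m = 2 response components, p = 3 predictors.
B = np.array([[0.5, 1.0, -2.0, 0.3],
              [1.5, 0.0,  0.7, -1.1]])
print(regression_mean(B, np.array([2.0, 1.0, 3.0])))
```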

In addition, the linear regression of $ Y^{(k)} $ on $ \mathbf X $ is frequently also understood to be the "best" (in a well-defined sense) linear approximation of $ Y^{(k)} $ by means of $ \mathbf X $, or even the result of the best (in a well-defined sense) smoothing of a system of experimental points ("observations") $ ( Y_i^{(k)} , \mathbf X_i ) $, $ i = 1 , \dots , n $, by means of a hyperplane in the space $ ( Y^{(k)} , \mathbf X ) $, in situations where the interpretation of these points as a sample from a corresponding general population need not be admissible. With such a definition one has to distinguish different versions of linear regression, depending on how the errors of the linear approximation of $ Y^{(k)} $ by means of $ \mathbf X $ are computed (or on the actual choice of a criterion for the quality of the smoothing). The most widespread criteria for the quality of the approximation of $ Y^{(k)} $ by linear combinations of $ \mathbf X $ (linear smoothing of the points $ ( Y_i^{(k)} , \mathbf X_i ) $) are:

$$ Q_1 ( \mathbf b ) = {\mathsf E} \left \{ \omega^2 ( \mathbf X ) \cdot \left ( Y^{(k)} ( \mathbf X ) - \sum_{j=0}^{p} b_{kj} X^{(j)} \right )^2 \right \} , $$

$$ \widetilde{Q}_1 ( \mathbf b ) = \sum_{i=1}^{n} \omega_i^2 \left ( Y_i^{(k)} - \sum_{j=0}^{p} b_{kj} X_i^{(j)} \right )^2 , $$

$$ Q_2 ( \mathbf b ) = {\mathsf E} \left \{ \omega ( \mathbf X ) \left | Y^{(k)} ( \mathbf X ) - \sum_{j=0}^{p} b_{kj} X^{(j)} \right | \right \} , $$

$$ \widetilde{Q}_2 ( \mathbf b ) = \sum_{i=1}^{n} \omega_i \left | Y_i^{(k)} - \sum_{j=0}^{p} b_{kj} X_i^{(j)} \right | , $$

$$ Q_3 ( \mathbf b ) = {\mathsf E} \left \{ \omega^2 ( \mathbf X ) \cdot \rho^2 \left ( Y^{(k)} ( \mathbf X ) , \sum_{j=0}^{p} b_{kj} X^{(j)} \right ) \right \} , $$

$$ \widetilde{Q}_3 ( \mathbf b ) = \sum_{i=1}^{n} \omega_i^2 \cdot \rho^2 \left ( Y_i^{(k)} , \sum_{j=0}^{p} b_{kj} X_i^{(j)} \right ) . $$

In these relations the choice of "weights" $ \omega ( \mathbf X ) $ or $ \omega_i $ depends on the nature of the actual scheme under investigation. For example, if the $ Y^{(k)} ( \mathbf X ) $ are interpreted as random variables with known variances $ {\mathsf D} Y^{(k)} ( \mathbf X ) $ (or with known estimates of them), then $ \omega^2 ( \mathbf X ) = [ {\mathsf D} Y^{(k)} ( \mathbf X ) ]^{-1} $. In the last two criteria the "discrepancies" of the approximation or the smoothing are measured by the distances $ \rho ( \cdot , \cdot ) $ from $ Y^{(k)} ( \mathbf X ) $ or $ Y_i^{(k)} $ to the required regression hyperplane. If the coefficients $ b_{kj} $ are determined by minimizing the quantities $ Q_1 ( \mathbf b ) $ or $ \widetilde{Q}_1 ( \mathbf b ) $, the linear regression is said to be least squares or $ L_2 $; if the criteria $ Q_2 ( \mathbf b ) $ and $ \widetilde{Q}_2 ( \mathbf b ) $ are used, the linear regression is said to be minimum absolute deviations or $ L_1 $; if the criteria $ Q_3 ( \mathbf b ) $ and $ \widetilde{Q}_3 ( \mathbf b ) $ are used, it is said to be minimum $ \rho $-distance.
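The empirical criteria can be minimized directly. Below is a hedged Python sketch on simulated data (numpy and scipy are assumed available; the names X, y, w, b_l2 and b_l1 are illustrative, not part of the article): the $ L_2 $ criterion $ \widetilde{Q}_1 $ is minimized by weighted least squares, and the $ L_1 $ criterion $ \widetilde{Q}_2 $ by a generic numerical optimizer.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # column x^(0) = 1
true_b = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ true_b + rng.normal(scale=0.3, size=n)
w = np.ones(n)                                              # weights omega_i

# L2 (least squares): minimize Q~_1(b) = sum_i w_i^2 (y_i - x_i' b)^2,
# which is ordinary least squares on the data rescaled by the weights.
b_l2, *_ = np.linalg.lstsq(w[:, None] * X, w * y, rcond=None)

# L1 (minimum absolute deviations): minimize Q~_2(b) = sum_i w_i |y_i - x_i' b|
# numerically, starting from the L2 solution.
q2 = lambda b: np.sum(w * np.abs(y - X @ b))
b_l1 = minimize(q2, b_l2, method="Nelder-Mead").x

print("L2 coefficients:", np.round(b_l2, 3))
print("L1 coefficients:", np.round(b_l1, 3))
```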

In certain cases, linear regression in the classical sense (*) is the same as linear regression defined by using functionals of the type $ Q_i $. Thus, if the vector $ ( \mathbf X^\prime , Y^{(k)} ) $ is subject to a multi-dimensional normal law, then the regression of $ Y^{(k)} $ on $ \mathbf X $ in the sense of (*) is linear and is the same as least squares or minimum mean squares linear regression (for $ \omega ( \mathbf X ) \equiv 1 $).
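This coincidence can be checked numerically. The sketch below (simulated data; the covariance matrix and variable names are assumptions of the example) compares the theoretical coefficients $ \Sigma_{YX} \Sigma_{XX}^{-1} $ of the conditional mean of a jointly normal vector with an ordinary least-squares fit ($ \omega ( \mathbf X ) \equiv 1 $).

```python
import numpy as np

rng = np.random.default_rng(1)
cov = np.array([[2.0, 0.6, 0.3],
                [0.6, 1.0, 0.4],
                [0.3, 0.4, 1.5]])        # joint covariance of (X1, X2, Y)
mean = np.array([0.0, 1.0, 2.0])
data = rng.multivariate_normal(mean, cov, size=100_000)
X, y = data[:, :2], data[:, 2]

# Theoretical regression E(Y | X = x): slope Sigma_YX Sigma_XX^{-1}.
slope_theory = cov[2, :2] @ np.linalg.inv(cov[:2, :2])
intercept_theory = mean[2] - slope_theory @ mean[:2]

# Least-squares fit with an intercept column (omega(X) = 1).
A = np.column_stack([np.ones(len(y)), X])
b_ls, *_ = np.linalg.lstsq(A, y, rcond=None)

print("theory:       ", intercept_theory, slope_theory)
print("least squares:", b_ls[0], b_ls[1:])
```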
