Covariance analysis

A collection of methods in mathematical statistics for analyzing models of the dependence of the mean value of a random variable $Y$ on a set of non-quantitative factors $F$ and, simultaneously, on a set of quantitative factors $x$. The variables $x$ are called concomitant variables relative to $Y$. The factors $F$ define a set of conditions of a qualitative nature under which the observations of $Y$ and $x$ are obtained, and are described by so-called indicator variables. Both the concomitant and the indicator variables may be random or non-random (controlled in the experiment). If the random variable $Y$ is a vector, one speaks of multivariate analysis of covariance.

The basic theoretical and applied problems in the analysis of covariance relate to linear models. For example, if the scheme under analysis consists of $n$ observations $Y_1, \dots, Y_n$ with $p$ concomitant variables and $k$ possible types of experimental conditions, then the linear model of the corresponding analysis of covariance is defined by the equations

$$ \tag{*} Y_i = \sum_{j=1}^{k} f_{ij} \theta_j + \sum_{s=1}^{p} \beta_s(F_i)\, x_i^{(s)} + \epsilon_i(F_i), \qquad i = 1, \dots, n, $$

where the indicator variables $f_{ij}$ are equal to 1 if the $j$-th experimental condition prevails for the observation $Y_i$ and 0 otherwise; the coefficients $\theta_j$ measure the influence of the $j$-th condition; $x_i^{(s)}$ is the value of the concomitant variable $x^{(s)}$ under which $Y_i$ is obtained, $i = 1, \dots, n$, $s = 1, \dots, p$; the $\beta_s(F_i)$ are the values of the corresponding regression coefficients of $Y$ on $x^{(s)}$, which in general depend on the concrete combination of experimental conditions, that is, on the vector $F_i = (f_{i1}, \dots, f_{ik})$; and the $\epsilon_i(F_i)$ are random errors with zero mean. The main content of the analysis of covariance is the construction of statistical estimators for the unknown parameters $\theta_1, \dots, \theta_k$ and $\beta_1, \dots, \beta_p$, and of statistical tests for various hypotheses about the values of these parameters.
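For concreteness, the following minimal sketch (in Python with NumPy; all data, dimensions and parameter values are simulated and purely illustrative) builds the design matrix of the model (*) in the common-slopes case, i.e. under the simplifying assumption that the regression coefficients $\beta_s(F_i) = \beta_s$ do not depend on the experimental conditions, and computes the least-squares estimators of $\theta_1, \dots, \theta_k$ and $\beta_1, \dots, \beta_p$.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative dimensions: n observations, k experimental conditions,
    # p concomitant variables (all data below are simulated).
    n, k, p = 120, 3, 2

    # Indicator variables f_ij: each observation falls under exactly one condition.
    conditions = rng.integers(0, k, size=n)
    F = np.zeros((n, k))
    F[np.arange(n), conditions] = 1.0

    # Concomitant variables x_i^(s).
    X = rng.normal(size=(n, p))

    # Simulate responses from the model (*) with common slopes
    # (beta_s(F_i) = beta_s for every condition) and zero-mean errors.
    theta_true = np.array([1.0, 2.5, -0.5])
    beta_true = np.array([0.8, -1.2])
    Y = F @ theta_true + X @ beta_true + rng.normal(scale=0.5, size=n)

    # Least-squares estimation of (theta, beta) from the stacked design [F | X].
    design = np.hstack([F, X])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    theta_hat, beta_hat = coef[:k], coef[k:]
    print("theta estimates:", theta_hat)
    print("beta estimates:", beta_hat)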

If in the model (*) one postulates a priori that $\beta_1 = \dots = \beta_p = 0$, then a dispersion analysis model is obtained; if in (*) one excludes the influence of the non-quantitative factors (by setting $\theta_1 = \dots = \theta_k = 0$), then a regression analysis model is obtained. The term "analysis of covariance" refers to the fact that its calculations make use of the decomposition of the covariance of $Y$ and $x$ in precisely the same way as the decomposition of the sum of squares of the deviations of $Y$ is used in dispersion analysis.
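Continuing the simulated sketch above (again illustrative only; it reuses Y, F, design, coef, n, k and p from the previous snippet), the hypothesis $\beta_1 = \dots = \beta_p = 0$, i.e. the reduction of (*) to a dispersion analysis model, can be tested with the usual F-statistic built from the residual sums of squares of the full and the reduced model:

    import numpy as np
    from scipy import stats

    # Residual sum of squares of the full model (*) ...
    rss_full = np.sum((Y - design @ coef) ** 2)

    # ... and of the dispersion-analysis submodel with beta_1 = ... = beta_p = 0.
    coef0, *_ = np.linalg.lstsq(F, Y, rcond=None)
    rss_reduced = np.sum((Y - F @ coef0) ** 2)

    # F-test of H0: beta_1 = ... = beta_p = 0 (p restrictions,
    # n - k - p residual degrees of freedom in the full model).
    F_stat = ((rss_reduced - rss_full) / p) / (rss_full / (n - k - p))
    p_value = stats.f.sf(F_stat, p, n - k - p)
    print(f"F = {F_stat:.2f}, p-value = {p_value:.3g}")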

References

[1] H. Scheffé, "The analysis of variance", Wiley (1959)
[2] M.G. Kendall, A. Stuart, "The advanced theory of statistics", 3, Griffin (1983)
[3] Biometrics, 13 : 3 (1957) (Special issue devoted to the analysis of covariance)

Comments

Concomitant variables are also called covariates, and instead of dispersion analysis one more often uses the term analysis of variance.

This article was adapted from an original article by S.A. Aivazyan (originator), which appeared in Encyclopedia of Mathematics, ISBN 1402006098.