# Covariance analysis

A collection of methods in mathematical statistics for analysing models in which the mean value of a random variable $Y$ depends simultaneously on a set of non-quantitative factors $F$ and on a set of quantitative factors $x$. The variables $x$ are called the concomitant variables relative to $Y$. The factors $F$ define a set of conditions of a qualitative nature under which the observations on $Y$ and $x$ are obtained, and are described by so-called indicator variables. The concomitant and indicator variables may be either random or non-random (controlled in the experiment). If the random variable $Y$ is a vector, one speaks of multivariate analysis of covariance.

The basic theoretical and applied problems in the analysis of covariance relate to linear models. For example, if the scheme under analysis consists of $n$ observations $Y_1, \dots, Y_n$ with $p$ concomitant variables and $k$ possible types of experimental conditions, then the linear model of the corresponding analysis of covariance is defined by the equations

$$\tag{*} Y_i = \sum_{j=1}^{k} f_{ij} \theta_j + \sum_{s=1}^{p} \beta_s(F_i) \, x_i^{(s)} + \epsilon_i(F_i), \qquad i = 1, \dots, n,$$

where the indicator variables $f_{ij}$ are equal to 1 if the $j$-th experimental condition prevails for the observation $Y_i$ and 0 otherwise; the coefficients $\theta_j$ measure the influence of the $j$-th condition; $x_i^{(s)}$ is the value of the concomitant variable $x^{(s)}$ for which $Y_i$ is obtained, $i = 1, \dots, n$, $s = 1, \dots, p$; the $\beta_s(F_i)$ are the values of the corresponding regression coefficients of $Y$ on $x^{(s)}$ which, in general, depend on the concrete combination of the conditions of the experiment, that is, on the vector $F_i = (f_{i1}, \dots, f_{ik})$; the $\epsilon_i(F_i)$ are random errors having zero mean values. The main content of the analysis of covariance is the construction of statistical estimators for the unknown parameters $\theta_1, \dots, \theta_k$ and $\beta_1, \dots, \beta_p$, and of statistical criteria for testing various hypotheses about the values of these parameters.
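As an illustration of how the parameters in (*) might be estimated, the following is a minimal sketch of an ordinary least-squares fit. It assumes the simplest case, in which the regression coefficients $\beta_s$ do not depend on $F_i$ (common slopes) and each observation falls under exactly one of the $k$ conditions. The function name `fit_ancova`, the simulated data and the chosen parameter values are hypothetical and serve only to make the example runnable.

```python
import numpy as np

def fit_ancova(y, groups, x, k):
    """Least-squares estimates of theta_1..theta_k and beta_1..beta_p in model (*).

    Assumption (not from the article): the slopes beta_s are common to all
    conditions, and `groups` holds the condition index 0..k-1 of each observation.
    """
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)   # n x p concomitant values
    # Indicator variables f_ij: 1 if condition j holds for observation i, else 0.
    f = (groups[:, None] == np.arange(k)[None, :]).astype(float)
    design = np.hstack([f, x])                            # n x (k + p) design matrix
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[:k], coef[k:]                             # (theta_hat, beta_hat)

# Simulated data (hypothetical values): k = 3 conditions, p = 2 concomitant variables.
rng = np.random.default_rng(0)
n, k = 60, 3
groups = np.arange(n) % k
x = rng.normal(size=(n, 2))
theta_true = np.array([1.0, 2.0, 4.0])
beta_true = np.array([0.5, -1.5])
y = theta_true[groups] + x @ beta_true + 0.1 * rng.normal(size=n)

theta_hat, beta_hat = fit_ancova(y, groups, x, k)
print("theta estimates:", theta_hat)
print("beta estimates: ", beta_hat)
```

Because the $k$ indicator columns sum to a column of ones, no separate intercept is included in the design matrix; adding one would make the parameters non-identifiable without a further constraint.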

If in the model (*) one postulates a priori that $\beta_1 = \dots = \beta_p = 0$, then a dispersion analysis (analysis of variance) model is obtained; if in (*) one excludes the influence of the non-quantitative factors (by setting $\theta_1 = \dots = \theta_k = 0$), then a regression analysis model is obtained. The term "analysis of covariance" refers to the fact that its calculations make use of the decomposition of the covariance of $Y$ and $x$ in precisely the same way as the decomposition of the sum of squares of the deviations of $Y$ is used in dispersion analysis.
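The decomposition mentioned above can be sketched numerically for a single concomitant variable ($p = 1$). The example assumes a one-way layout with a common slope: the total sum of products of deviations of $x$ and $Y$ splits into a between-conditions and a within-conditions part, and the ratio of the within-conditions sum of products to the within-conditions sum of squares of $x$ coincides with the least-squares slope in (*). The helper name `product_decomposition` and the simulated data are hypothetical.

```python
import numpy as np

def product_decomposition(u, v, groups):
    """Split sum((u - mean u)(v - mean v)) into between- and within-condition parts."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    groups = np.asarray(groups)
    total = np.sum((u - u.mean()) * (v - v.mean()))
    between = 0.0
    within = 0.0
    for g in np.unique(groups):
        mask = groups == g
        ug, vg = u[mask], v[mask]
        # Contribution of the condition means around the overall means.
        between += mask.sum() * (ug.mean() - u.mean()) * (vg.mean() - v.mean())
        # Contribution of deviations around the condition means.
        within += np.sum((ug - ug.mean()) * (vg - vg.mean()))
    return total, between, within

# Simulated one-way layout (hypothetical values): 3 conditions, one concomitant variable.
rng = np.random.default_rng(1)
n = 45
groups = np.arange(n) % 3
x = rng.normal(size=n) + groups
y = np.array([0.0, 1.0, 3.0])[groups] + 2.0 * x + 0.2 * rng.normal(size=n)

t_xy, b_xy, w_xy = product_decomposition(x, y, groups)   # products of x and Y
t_xx, b_xx, w_xx = product_decomposition(x, x, groups)   # squares of x
slope_within = w_xy / w_xx

# Least-squares slope from the model with indicator variables, for comparison.
f = (groups[:, None] == np.arange(3)[None, :]).astype(float)
coef, *_ = np.linalg.lstsq(np.hstack([f, x[:, None]]), y, rcond=None)
slope_ls = coef[3]

print("total = between + within:", np.isclose(t_xy, b_xy + w_xy))
print("within-condition slope equals least-squares slope:", np.isclose(slope_within, slope_ls))
```

Calling the same helper with `(y, y, groups)` gives the sum-of-squares decomposition used in dispersion analysis, which is the parallel drawn in the text.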

#### References

[1] H. Scheffé, "The analysis of variance", Wiley (1959)

[2] M.G. Kendall, A. Stuart, "The advanced theory of statistics", 3, Griffin (1983)

[3] Biometrics, 13 : 3 (1957) (Special issue devoted to the analysis of covariance)