# Chi-squared test

A test of the hypothesis $H _ {0}$ that a random vector of frequencies $\nu = ( \nu _ {1} \dots \nu _ {k} )$ has a given multinomial distribution, characterized by a vector of positive probabilities $p = ( p _ {1} \dots p _ {k} )$, $p _ {1} + \dots + p _ {k} = 1$. The "chi-squared" test is based on the Pearson statistic

$$X ^ {2} = \ \sum _ {i = 1 } ^ { k } \frac{( \nu _ {i} - np _ {i} ) ^ {2} }{np _ {i} } = \ { \frac{1}{n} } \sum \frac{\nu _ {i} ^ {2} }{p _ {i} } - n,\ \ n = \nu _ {1} + \dots + \nu _ {k} ,$$

which has in the limit, as $n \rightarrow \infty$, a "chi-squared" distribution with $k - 1$ degrees of freedom, that is,

$$\lim\limits _ {n \rightarrow \infty } \ {\mathsf P} \{ X ^ {2} \leq x \mid H _ {0} \} = \ {\mathsf P} \{ \chi _ {k - 1 } ^ {2} \leq x \} .$$

According to the "chi-squared" test with significance level $\approx \alpha$, the hypothesis $H _ {0}$ is rejected if $X ^ {2} \geq \chi _ {k - 1 } ^ {2} ( \alpha )$, where $\chi _ {k - 1 } ^ {2} ( \alpha )$ is the upper $\alpha$-quantile of the "chi-squared" distribution with $k - 1$ degrees of freedom, that is,

$${\mathsf P} \{ \chi _ {k - 1 } ^ {2} \geq \chi _ {k - 1 } ^ {2} ( \alpha ) \} = \alpha .$$
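The decision rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names and the die-roll counts are hypothetical, and the upper tail of the "chi-squared" distribution is evaluated with a plain power series for the incomplete gamma function, which is adequate for moderate $x$ but loses accuracy for very small tail probabilities. Rejecting when the tail probability is at most $\alpha$ is equivalent to rejecting when $X ^ {2} \geq \chi _ {k - 1 } ^ {2} ( \alpha )$.

```python
import math

def pearson_statistic(nu, p):
    """Pearson X^2 for observed counts nu under cell probabilities p."""
    n = sum(nu)
    return sum((v - n * q) ** 2 / (n * q) for v, q in zip(nu, p))

def chi2_upper_tail(x, df):
    """P{chi2_df >= x}, from the power series of the lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # first series term: 1 / a
    total = term
    j = 0
    while term > 1e-16 * total:
        j += 1
        term *= z / (a + j)  # next term: z^j / (a (a+1) ... (a+j))
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower

def chi2_test(nu, p, alpha=0.05):
    """Return (X^2, tail probability, reject?) using k - 1 degrees of freedom."""
    x2 = pearson_statistic(nu, p)
    p_value = chi2_upper_tail(x2, len(nu) - 1)
    return x2, p_value, p_value <= alpha

# Hypothetical data: 60 rolls of a die tested against equal probabilities 1/6.
counts = [8, 12, 10, 9, 11, 10]
x2, p_value, reject = chi2_test(counts, [1 / 6] * 6)
print(x2, p_value, reject)
```

Because the limit distribution is only approximate for finite $n$, the resulting significance level is approximately, not exactly, $\alpha$.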

The statistic $X ^ {2}$ is also used to test the hypothesis $H _ {0}$ that the distribution function of independent identically-distributed random variables $X _ {1} \dots X _ {n}$ belongs to a family of continuous functions $F ( x, \theta )$, $x \in \mathbf R ^ {1}$, $\theta = ( \theta _ {1} \dots \theta _ {m} ) \in \Theta \subset \mathbf R ^ {m}$, $\Theta$ an open set. After dividing the real line by points $x _ {0} < \dots < x _ {k}$, $x _ {0} = - \infty$, $x _ {k} = + \infty$, into $k$ intervals $( x _ {0} , x _ {1} ] \dots ( x _ {k - 1 } , x _ {k} )$, $k > m$, such that for all $\theta \in \Theta$,

$$p _ {i} ( \theta ) = \ {\mathsf P} \{ X _ {1} \in ( x _ {i - 1 } , x _ {i} ] \} > 0,$$

$i = 1 \dots k$; $p _ {1} ( \theta ) + \dots + p _ {k} ( \theta ) = 1$, one forms the frequency vector $\nu = ( \nu _ {1} \dots \nu _ {k} )$, which is obtained as a result of grouping the values of the random variables $X _ {1} \dots X _ {n}$ into these intervals. Let

$$X ^ {2} ( \theta ) = \ \sum _ {i = 1 } ^ { k } \frac{[ \nu _ {i} - np _ {i} ( \theta )] ^ {2} }{np _ {i} ( \theta ) }$$

be a random variable depending on the unknown parameter $\theta$. To test the hypothesis $H _ {0}$ one uses the statistic $X ^ {2} ( \widetilde \theta _ {n} )$, where $\widetilde \theta _ {n}$ is an estimator of the parameter $\theta$ computed by the minimum "chi-squared" method, that is,

$$X ^ {2} ( \widetilde \theta _ {n} ) = \ \min _ {\theta \in \Theta } \ X ^ {2} ( \theta ).$$
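As an illustration of the minimum "chi-squared" method, the sketch below groups a sample into intervals and minimizes $X ^ {2} ( \theta )$ numerically for a one-parameter family ($m = 1$). Everything specific here is an assumption made for the example: the exponential family $F ( x, \theta ) = 1 - e ^ {- \theta x }$, the cut points, the search bracket, the observed counts and the function names. The golden-section search presumes that $X ^ {2} ( \theta )$ has a single minimum on the bracket, and the tail probability uses $k - m - 1 = 2$ degrees of freedom, the limit the article gives for this estimator.

```python
import bisect
import math

CUTS = [0.5, 1.0, 2.0]  # hypothetical interior points x_1 < x_2 < x_3, so k = 4

def group_counts(sample, cuts=CUTS):
    """Frequencies nu_i of the intervals (x_{i-1}, x_i], with x_0 = -inf, x_k = +inf."""
    nu = [0] * (len(cuts) + 1)
    for x in sample:
        nu[bisect.bisect_left(cuts, x)] += 1  # a point equal to x_i falls in (x_{i-1}, x_i]
    return nu

def cell_probs(theta, cuts=CUTS):
    """p_i(theta) for the assumed family F(x, theta) = 1 - exp(-theta * x), x > 0."""
    cdf = [0.0] + [1.0 - math.exp(-theta * x) for x in cuts] + [1.0]
    return [cdf[i + 1] - cdf[i] for i in range(len(cdf) - 1)]

def x2(theta, nu):
    """X^2(theta) for grouped counts nu."""
    n = sum(nu)
    return sum((v - n * q) ** 2 / (n * q) for v, q in zip(nu, cell_probs(theta)))

def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimizer of a unimodal function f on [lo, hi]."""
    invphi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2

def min_chi2(nu, lo=0.05, hi=10.0):
    """Return (theta_tilde, X^2(theta_tilde)) over the assumed bracket [lo, hi]."""
    theta = golden_section_min(lambda t: x2(t, nu), lo, hi)
    return theta, x2(theta, nu)

nu = [80, 50, 45, 25]  # hypothetical grouped counts, n = 200
theta_tilde, x2_min = min_chi2(nu)
p_value = math.exp(-x2_min / 2.0)  # exact upper tail of chi-squared with 2 d.f.
print(theta_tilde, x2_min, p_value)
```

For $m > 1$ the one-dimensional search would have to be replaced by a multivariate minimizer; the structure of the computation is otherwise unchanged.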

If the grouping intervals are chosen so that all $p _ {i} ( \theta ) > 0$, if the functions $\partial ^ {2} p _ {i} ( \theta )/ \partial \theta _ {j} \partial \theta _ {r}$ are continuous for all $\theta \in \Theta$, $i = 1 \dots k$; $j, r = 1 \dots m$, and if the matrix $\| \partial p _ {i} ( \theta )/ \partial \theta _ {j} \|$ has rank $m$, then, under $H _ {0}$ and as $n \rightarrow \infty$, the statistic $X ^ {2} ( \widetilde \theta _ {n} )$ has in the limit a "chi-squared" distribution with $k - m - 1$ degrees of freedom, which can be used to test $H _ {0}$ by the "chi-squared" test. If instead one substitutes into $X ^ {2} ( \theta )$ a maximum-likelihood estimator $\widehat \theta _ {n}$ computed from the non-grouped data $X _ {1} \dots X _ {n}$, then, under $H _ {0}$ and as $n \rightarrow \infty$, the statistic $X ^ {2} ( \widehat \theta _ {n} )$ is distributed in the limit like

$$\xi _ {1} ^ {2} + \dots + \xi _ {k - m - 1 } ^ {2} + \mu _ {1} \xi _ {k - m } ^ {2} + \dots + \mu _ {m} \xi _ {k - 1 } ^ {2} ,$$

where $\xi _ {1} \dots \xi _ {k - 1 }$ are independent standard normally-distributed random variables, and the numbers $\mu _ {1} \dots \mu _ {m}$ lie between 0 and 1 and, generally speaking, depend upon the unknown parameter $\theta$. From this it follows that the use of maximum-likelihood estimators in applications of the "chi-squared" test for the verification of the hypothesis $H _ {0}$ leads to difficulties connected with the computation of a non-standard limit distribution.
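The effect of the weights $\mu _ {j}$ can be explored by simulation. The following sketch draws from the limit law above by Monte Carlo; the values of $k$, of $\mu$, the number of draws and the function names are all hypothetical choices for illustration. Since each $\mu _ {j}$ lies between 0 and 1, the simulated tail probability falls between those of the "chi-squared" distributions with $k - m - 1$ and $k - 1$ degrees of freedom, which correspond to $\mu = 0$ and $\mu = 1$.

```python
import random

def sample_limit(k, mu, rng):
    """One draw of xi_1^2 + ... + xi_{k-m-1}^2 + mu_1 xi_{k-m}^2 + ... + mu_m xi_{k-1}^2."""
    m = len(mu)
    xi = [rng.gauss(0.0, 1.0) for _ in range(k - 1)]
    free = sum(z * z for z in xi[: k - m - 1])
    weighted = sum(w * z * z for w, z in zip(mu, xi[k - m - 1:]))
    return free + weighted

def tail_probability(x, k, mu, draws=20000, seed=0):
    """Monte Carlo estimate of the probability that the limit variable is >= x."""
    rng = random.Random(seed)
    hits = sum(sample_limit(k, mu, rng) >= x for _ in range(draws))
    return hits / draws

# Hypothetical setting: k = 4 cells, m = 1 parameter, threshold 5.991
# (the upper 0.05-quantile of chi-squared with k - m - 1 = 2 degrees of freedom).
for mu in ([0.0], [0.5], [1.0]):
    print(mu, tail_probability(5.991, 4, mu))
```

Using the quantile for $k - m - 1$ degrees of freedom with a maximum-likelihood estimator therefore tends to reject more often than the nominal level, while the quantile for $k - 1$ degrees of freedom is conservative.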

The literature contains recommendations concerning the $\chi ^ {2}$-test in this case, in particular for the normal case, the general continuous case, the discrete case, and the problem of several samples.

How to Cite This Entry:
Chi-squared test. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Chi-squared_test&oldid=46338
This article was adapted from an original article by M.S. Nikulin (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.