# Chi-squared test

A test of the hypothesis $H _ {0}$ that a random vector of frequencies $\nu = ( \nu _ {1}, \dots, \nu _ {k} )$ has a given multinomial distribution, characterized by a vector of positive probabilities $p = ( p _ {1}, \dots, p _ {k} )$, $p _ {1} + \dots + p _ {k} = 1$. The "chi-squared" test is based on the Pearson statistic

$$X ^ {2} = \ \sum _ {i = 1 } ^ { k } \frac{( \nu _ {i} - np _ {i} ) ^ {2} }{np _ {i} } = \ { \frac{1}{n} } \sum \frac{\nu _ {i} ^ {2} }{p _ {i} } - n,\ \ n = \nu _ {1} + \dots + \nu _ {k} ,$$
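As a small check of the identity above, the following Python sketch computes $X ^ {2}$ both from the defining sum and from the shortened form $n ^ {-1} \sum \nu _ {i} ^ {2} / p _ {i} - n$. The function names and the die-rolling frequencies are hypothetical illustrations, not part of the article.

```python
def pearson_x2(nu, p):
    """Pearson statistic: sum over cells of (nu_i - n p_i)^2 / (n p_i)."""
    n = sum(nu)
    return sum((v - n * q) ** 2 / (n * q) for v, q in zip(nu, p))


def pearson_x2_short(nu, p):
    """Equivalent form: (1/n) * sum(nu_i^2 / p_i) - n."""
    n = sum(nu)
    return sum(v * v / q for v, q in zip(nu, p)) / n - n


# Hypothetical data: 60 rolls of a die, tested against the uniform distribution p_i = 1/6.
nu = [8, 12, 9, 11, 6, 14]
p = [1 / 6] * 6
print(pearson_x2(nu, p))        # approximately 4.2
print(pearson_x2_short(nu, p))  # approximately 4.2, equal up to rounding
```

The two forms agree algebraically; the short form needs only the squared frequencies, which is convenient for hand computation.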

whose distribution under $H _ {0}$ converges, as $n \rightarrow \infty$, to the "chi-squared" distribution with $k - 1$ degrees of freedom, that is,

$$\lim\limits _ {n \rightarrow \infty } \ {\mathsf P} \{ X ^ {2} \leq x \mid H _ {0} \} = \ {\mathsf P} \{ \chi _ {k - 1 } ^ {2} \leq x \} .$$

According to the "chi-squared" test with significance level $\approx \alpha$ (approximate, because the "chi-squared" distribution of $X ^ {2}$ is only a limit distribution), the hypothesis $H _ {0}$ is rejected if $X ^ {2} \geq \chi _ {k - 1 } ^ {2} ( \alpha )$, where $\chi _ {k - 1 } ^ {2} ( \alpha )$ is the upper $\alpha$-quantile of the "chi-squared" distribution with $k - 1$ degrees of freedom, that is,

$${\mathsf P} \{ \chi _ {k - 1 } ^ {2} \geq \chi _ {k - 1 } ^ {2} ( \alpha ) \} = \alpha .$$
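The decision rule needs the upper quantile $\chi _ {k - 1 } ^ {2} ( \alpha )$. A statistics library would normally supply it (for example SciPy's `scipy.stats.chi2.isf`); the sketch below instead stays within the Python standard library, computing the upper tail probability from the power series of the regularized incomplete gamma function and inverting it by bisection. The helper names are hypothetical, and the series approach is an assumption adequate only for moderate arguments, not a robust numerical implementation.

```python
import math


def chi2_sf(x, df):
    """Upper tail P{chi^2_df >= x} = 1 - P(df/2, x/2), P the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series: P(a, z) = z^a e^{-z} / Gamma(a) * sum_{j>=0} z^j / (a (a+1) ... (a+j)).
    term = 1.0 / a
    total = term
    j = 1
    while term > 1e-16 * total:
        term *= z / (a + j)
        total += term
        j += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi2_upper_quantile(alpha, df):
    """Upper alpha-quantile chi^2_df(alpha), found by bisection on the decreasing tail function."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chi2_test_rejects(x2, df, alpha=0.05):
    """Reject H_0 when X^2 >= chi^2_df(alpha)."""
    return x2 >= chi2_upper_quantile(alpha, df)


# X^2 = 4.2 observed with k = 6 cells, so k - 1 = 5 degrees of freedom, at alpha = 0.05.
print(chi2_upper_quantile(0.05, 5))  # about 11.07
print(chi2_test_rejects(4.2, 5))     # False
```

Rejecting when $X ^ {2} \geq \chi _ {k - 1 } ^ {2} ( \alpha )$ is equivalent to rejecting when the tail probability `chi2_sf(x2, k - 1)` is at most $\alpha$, since the tail function is continuous and decreasing.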

The statistic $X ^ {2}$ is also used to test the hypothesis $H _ {0}$ that the distribution function of independent identically distributed random variables $X _ {1}, \dots, X _ {n}$ belongs to a family of continuous functions $F ( x, \theta )$, $x \in \mathbf R ^ {1}$, $\theta = ( \theta _ {1}, \dots, \theta _ {m} ) \in \Theta \subset \mathbf R ^ {m}$, where $\Theta$ is an open set. After dividing the real line by points $x _ {0} < \dots < x _ {k}$, $x _ {0} = - \infty$, $x _ {k} = + \infty$, into $k$ intervals $( x _ {0} , x _ {1} ], \dots, ( x _ {k - 1 } , x _ {k} )$, $k > m$, such that for all $\theta \in \Theta$,

$$p _ {i} ( \theta ) = \ {\mathsf P} \{ X _ {1} \in ( x _ {i - 1 } , x _ {i} ] \} = F ( x _ {i} , \theta ) - F ( x _ {i - 1 } , \theta ) > 0,$$

$i = 1, \dots, k$, with $p _ {1} ( \theta ) + \dots + p _ {k} ( \theta ) = 1$, one forms the frequency vector $\nu = ( \nu _ {1}, \dots, \nu _ {k} )$ obtained by grouping the values of the random variables $X _ {1}, \dots, X _ {n}$ into these intervals. Let
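The grouping step can be sketched in Python as follows. For illustration only, it assumes the normal family $F ( x, \theta ) = \Phi ( ( x - \theta _ {1} ) / \theta _ {2} )$ with $m = 2$; the cut points, the sample and the function names are hypothetical. The list `cuts` holds the finite points $x _ {1}, \dots, x _ {k - 1 }$, and `bisect.bisect_left` sends a value equal to a cut point $x _ {i}$ into $( x _ {i - 1 } , x _ {i} ]$, matching the right-closed intervals above.

```python
import bisect
import math


def normal_cdf(x, mu, sigma):
    """Normal distribution function F(x, theta) with theta = (mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def cell_probabilities(cuts, cdf):
    """p_i = F(x_i) - F(x_{i-1}), i = 1..k, with x_0 = -inf and x_k = +inf."""
    values = [0.0] + [cdf(c) for c in cuts] + [1.0]
    return [right - left for left, right in zip(values, values[1:])]


def group(sample, cuts):
    """Frequencies nu_i of the sample in the intervals (x_{i-1}, x_i]."""
    nu = [0] * (len(cuts) + 1)
    for x in sample:
        nu[bisect.bisect_left(cuts, x)] += 1  # number of cut points strictly below x
    return nu


cuts = [-1.0, 0.0, 1.0]  # k = 4 intervals
sample = [-1.5, -1.0, -0.3, 0.0, 0.2, 0.9, 1.0, 2.5]
print(group(sample, cuts))  # [2, 2, 3, 1]
print(cell_probabilities(cuts, lambda x: normal_cdf(x, 0.0, 1.0)))
```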

$$X ^ {2} ( \theta ) = \ \sum _ {i = 1 } ^ { k } \frac{[ \nu _ {i} - np _ {i} ( \theta )] ^ {2} }{np _ {i} ( \theta ) }$$

be a random variable depending on the unknown parameter $\theta$. To test $H _ {0}$ one uses the statistic $X ^ {2} ( \widetilde \theta _ {n} )$, where $\widetilde \theta _ {n}$ is the estimator of $\theta$ obtained by the minimum "chi-squared" method, that is,

$$X ^ {2} ( \widetilde \theta _ {n} ) = \ \min _ {\theta \in \Theta } \ X ^ {2} ( \theta ).$$
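A minimal sketch of the minimum "chi-squared" estimator for a one-parameter family ($m = 1$). Purely for illustration it assumes the exponential family $F ( x, \theta ) = 1 - e ^ {- \theta x }$, $x \geq 0$, positive cut points (so that every $p _ {i} ( \theta ) > 0$), hypothetical grouped frequencies, and that $X ^ {2} ( \theta )$ is unimodal on the search bracket. Golden-section search is a generic one-dimensional minimizer chosen here for simplicity, not a method prescribed by the article; the function names are hypothetical.

```python
import math


def exp_cell_probabilities(theta, cuts):
    """p_i(theta) for the assumed family F(x, theta) = 1 - exp(-theta x); cuts must be positive.

    Differences of the survival function exp(-theta x) keep the tail cells positive.
    """
    surv = [1.0] + [math.exp(-theta * c) for c in cuts] + [0.0]
    return [left - right for left, right in zip(surv, surv[1:])]


def x2_of_theta(theta, nu, cuts):
    """X^2(theta) = sum (nu_i - n p_i(theta))^2 / (n p_i(theta))."""
    n = sum(nu)
    p = exp_cell_probabilities(theta, cuts)
    return sum((v - n * q) ** 2 / (n * q) for v, q in zip(nu, p))


def min_chi2_estimate(nu, cuts, lo=1e-3, hi=20.0, iters=60):
    """Golden-section search for theta minimizing X^2(theta) on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = x2_of_theta(c, nu, cuts), x2_of_theta(d, nu, cuts)
    for _ in range(iters):
        if fc < fd:  # the minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = x2_of_theta(c, nu, cuts)
        else:  # the minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = x2_of_theta(d, nu, cuts)
    theta = 0.5 * (a + b)
    return theta, x2_of_theta(theta, nu, cuts)


cuts = [0.5, 1.0, 2.0]  # k = 4 intervals
nu = [40, 25, 20, 15]   # hypothetical grouped frequencies, n = 100
theta_tilde, x2_min = min_chi2_estimate(nu, cuts)
print(theta_tilde, x2_min)
```

Here $k = 4$ and $m = 1$, so under the conditions stated in the next paragraph the minimized statistic would be compared with a quantile of the "chi-squared" distribution with $k - m - 1 = 2$ degrees of freedom.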

If the grouping intervals are chosen so that all $p _ {i} ( \theta ) > 0$, if the functions $\partial ^ {2} p _ {i} ( \theta )/ \partial \theta _ {j} \partial \theta _ {r}$ are continuous for all $\theta \in \Theta$, $i = 1, \dots, k$; $j, r = 1, \dots, m$, and if the matrix $\| \partial p _ {i} ( \theta )/ \partial \theta _ {j} \|$ has rank $m$, then under $H _ {0}$ the statistic $X ^ {2} ( \widetilde \theta _ {n} )$ has, as $n \rightarrow \infty$, a limiting "chi-squared" distribution with $k - m - 1$ degrees of freedom. Hence $H _ {0}$ can be tested by the "chi-squared" test with $k - m - 1$ degrees of freedom in place of $k - 1$. If instead one substitutes into $X ^ {2} ( \theta )$ a maximum-likelihood estimator $\widehat \theta _ {n}$ computed from the ungrouped data $X _ {1}, \dots, X _ {n}$, then under $H _ {0}$, as $n \rightarrow \infty$, the statistic $X ^ {2} ( \widehat \theta _ {n} )$ is distributed in the limit like

$$\xi _ {1} ^ {2} + \dots + \xi _ {k - m - 1 } ^ {2} + \mu _ {1} \xi _ {k - m } ^ {2} + \dots + \mu _ {m} \xi _ {k - 1 } ^ {2} ,$$

where $\xi _ {1}, \dots, \xi _ {k - 1 }$ are independent standard normal random variables, and the numbers $\mu _ {1}, \dots, \mu _ {m}$ lie between 0 and 1 and, in general, depend on the unknown parameter $\theta$. Consequently, using maximum-likelihood estimators in the "chi-squared" test of $H _ {0}$ leads to difficulties connected with computing a non-standard limit distribution.
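The contrast between the two estimators can be illustrated by a small Monte Carlo experiment. The Python sketch below assumes, for illustration only, exponential data with $\theta = 1$ generated under $H _ {0}$, grouping into $k = 4$ intervals ($m = 1$), a golden-section search for $\widetilde \theta _ {n}$ (assumed to find the global minimum), and the closed-form maximum-likelihood estimator $\widehat \theta _ {n} = n / ( X _ {1} + \dots + X _ {n} )$ computed from the ungrouped data. All names and simulation settings are hypothetical. Provided the search succeeds, $X ^ {2} ( \widetilde \theta _ {n} ) \leq X ^ {2} ( \widehat \theta _ {n} )$ in every replicate; by the results above, the average of the first should be close to $k - m - 1 = 2$, and that of the second should lie roughly between $2$ and $k - 1 = 3$.

```python
import math
import random


def x2_of_theta(theta, nu, cuts):
    """X^2(theta) for the assumed family F(x, theta) = 1 - exp(-theta x); cuts must be positive."""
    n = sum(nu)
    surv = [1.0] + [math.exp(-theta * c) for c in cuts] + [0.0]
    p = [left - right for left, right in zip(surv, surv[1:])]
    return sum((v - n * q) ** 2 / (n * q) for v, q in zip(nu, p))


def min_over_theta(nu, cuts, lo=1e-3, hi=20.0, iters=60):
    """Golden-section search for min_theta X^2(theta); assumes unimodality on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if x2_of_theta(c, nu, cuts) < x2_of_theta(d, nu, cuts):
            b = d
        else:
            a = c
    return x2_of_theta(0.5 * (a + b), nu, cuts)


def simulate(reps=500, n=200, cuts=(0.5, 1.0, 2.0), seed=12345):
    """Average X^2 with the minimum chi-squared estimator and with the ungrouped ML estimator."""
    rng = random.Random(seed)
    cuts = list(cuts)
    sum_min, sum_ml = 0.0, 0.0
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]  # data generated under H_0 with theta = 1
        nu = [0] * (len(cuts) + 1)
        for x in xs:
            nu[sum(1 for c in cuts if c < x)] += 1  # index of the interval (x_{i-1}, x_i] holding x
        theta_ml = n / sum(xs)  # maximum-likelihood estimator from the ungrouped data
        sum_min += min_over_theta(nu, cuts)
        sum_ml += x2_of_theta(theta_ml, nu, cuts)
    return sum_min / reps, sum_ml / reps


mean_min, mean_ml = simulate()
print(mean_min, mean_ml)  # compare with k - m - 1 = 2 and k - 1 = 3
```

Using the ordinary $\chi _ {k - m - 1 } ^ {2}$ quantiles with $X ^ {2} ( \widehat \theta _ {n} )$ would therefore tend to reject $H _ {0}$ more often than the nominal level.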

In [3]–[8] there are some recommendations concerning the $\chi ^ {2}$-test in this case; in particular, for the normal case [3], the general continuous case [4], [8], the discrete case [6], [8], and the problem of several samples [7].

#### References

[1] M.G. Kendall, A. Stuart, "The advanced theory of statistics", 2. Inference and relationship, Griffin (1983)

[2] D.M. Chibisov, "Certain chi-square type tests for continuous distributions", Theory Probab. Appl., 16 : 1 (1971) pp. 1–22; Teor. Veroyatnost. i Primenen., 16 : 1 (1971) pp. 3–20

[3] M.S. Nikulin, "Chi-square test for continuous distributions with shift and scale parameters", Theory Probab. Appl., 18 : 3 (1973) pp. 559–568; Teor. Veroyatnost. i Primenen., 18 : 3 (1973) pp. 583–592

[4] K.O. Dzhaparidze, M.S. Nikulin, "On a modification of the standard statistics of Pearson", Theory Probab. Appl., 19 : 4 (1974) pp. 851–853; Teor. Veroyatnost. i Primenen., 19 : 4 (1974) pp. 886–888

[5] M.S. Nikulin, "On a quantile test", Theory Probab. Appl., 19 : 2 (1974) pp. 410–413; Teor. Veroyatnost. i Primenen., 19 : 2 (1974) pp. 410–414

[6] L.N. Bol'shev, M. Mirvaliev, "Chi-square goodness-of-fit test for the Poisson, binomial and negative binomial distributions", Theory Probab. Appl., 23 : 3 (1978) pp. 461–474; Teor. Veroyatnost. i Primenen., 23 : 3 (1978) pp. 481–494

[7] L.N. Bol'shev, M.S. Nikulin, "A certain solution of the homogeneity problem", Serdica, 1 (1975) pp. 104–109 (In Russian)

[8] P.E. Greenwood, M.S. Nikulin, "Investigations in the theory of probabilities distributions. X", Zap. Nauchn. Sem. Leningr. Otdel. Mat. Inst. Steklov., 156 (1987) pp. 42–65 (In Russian)

The "chi-squared" test is also called the "chi-square" test or $\chi ^ {2}$-test.