Latest revision as of 16:43, 4 June 2020
A test for the verification of a hypothesis $ H _ {0} $
according to which a random vector of frequencies $ \nu = ( \nu _ {1} \dots \nu _ {k} ) $
has a given multinomial distribution, characterized by a vector of positive probabilities $ p = ( p _ {1} \dots p _ {k} ) $,
$ p _ {1} + \dots + p _ {k} = 1 $.
The "chi-squared" test is based on the Pearson statistic
$$ X ^ {2} = \ \sum _ {i = 1 } ^ { k } \frac{( \nu _ {i} - np _ {i} ) ^ {2} }{np _ {i} } = \ { \frac{1}{n} } \sum \frac{\nu _ {i} ^ {2} }{p _ {i} } - n,\ \ n = \nu _ {1} + \dots + \nu _ {k} , $$
which has in the limit, as $ n \rightarrow \infty $, a "chi-squared" distribution with $ k - 1 $ degrees of freedom, that is,
$$ \lim\limits _ {n \rightarrow \infty } \ {\mathsf P} \{ X ^ {2} \leq x \mid H _ {0} \} = \ {\mathsf P} \{ \chi _ {k - 1 } ^ {2} \leq x \} . $$
According to the "chi-squared" test with significance level $ \approx \alpha $, the hypothesis $ H _ {0} $ must be rejected if $ X ^ {2} \geq \chi _ {k - 1 } ^ {2} ( \alpha ) $, where $ \chi _ {k - 1 } ^ {2} ( \alpha ) $ is the upper $ \alpha $-quantile of the "chi-squared" distribution with $ k - 1 $ degrees of freedom, that is,
$$ {\mathsf P} \{ \chi _ {k - 1 } ^ {2} \geq \chi _ {k - 1 } ^ {2} ( \alpha ) \} = \alpha . $$
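As an illustration of the test described above, the following Python sketch computes the Pearson statistic and compares it with the upper $ \alpha $-quantile; the cell probabilities and observed frequencies are hypothetical values chosen only to make the example self-contained.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical cell probabilities p_1, ..., p_k under H_0 and
# hypothetical observed frequencies nu_1, ..., nu_k
p = np.array([0.2, 0.3, 0.4, 0.1])
nu = np.array([18, 35, 38, 9])
n = nu.sum()            # n = nu_1 + ... + nu_k
k = len(p)

# Pearson statistic X^2 = sum_i (nu_i - n p_i)^2 / (n p_i)
X2 = np.sum((nu - n * p) ** 2 / (n * p))

# Upper alpha-quantile chi^2_{k-1}(alpha) of the "chi-squared"
# distribution with k - 1 degrees of freedom
alpha = 0.05
critical = chi2.ppf(1 - alpha, df=k - 1)

# Reject H_0 if X^2 >= chi^2_{k-1}(alpha)
reject = X2 >= critical
```

For these hypothetical data, $ X ^ {2} \approx 1.23 $ stays below the critical value $ \chi _ {3} ^ {2} ( 0.05 ) \approx 7.81 $, so $ H _ {0} $ is not rejected at the $ 5\% $ level.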
The statistic $ X ^ {2} $ is also used to verify the hypothesis $ H _ {0} $ that the distribution functions of independent identically-distributed random variables $ X _ {1} \dots X _ {n} $ belong to a family of continuous functions $ F ( x, \theta ) $, $ x \in \mathbf R ^ {1} $, $ \theta = ( \theta _ {1} \dots \theta _ {m} ) \in \Theta \subset \mathbf R ^ {m} $, $ \Theta $ an open set. After dividing the real line by the points $ x _ {0} < \dots < x _ {k} $, $ x _ {0} = - \infty $, $ x _ {k} = + \infty $, into $ k $ intervals $ ( x _ {0} , x _ {1} ] \dots ( x _ {k - 1 } , x _ {k} ) $, $ k > m $, such that for all $ \theta \in \Theta $,
$$ p _ {i} ( \theta ) = \ {\mathsf P} \{ X _ {i} \in ( x _ {i - 1 } , x _ {i} ] \} > 0, $$
$ i = 1 \dots k $; $ p _ {1} ( \theta ) + \dots + p _ {k} ( \theta ) = 1 $, one forms the frequency vector $ \nu = ( \nu _ {1} \dots \nu _ {k} ) $, which is obtained as a result of grouping the values of the random variables $ X _ {1} \dots X _ {n} $ into these intervals. Let
$$ X ^ {2} ( \theta ) = \ \sum _ {i = 1 } ^ { k } \frac{[ \nu _ {i} - np _ {i} ( \theta )] ^ {2} }{np _ {i} ( \theta ) } $$
be a random variable depending on the unknown parameter $ \theta $. To verify the hypothesis $ H _ {0} $ one uses the statistic $ X ^ {2} ( \widetilde \theta _ {n} ) $, where $ \widetilde \theta _ {n} $ is an estimator of the parameter $ \theta $ computed by the method of minimum "chi-squared", that is,
$$ X ^ {2} ( \widetilde \theta _ {n} ) = \ \min _ {\theta \in \Theta } \ X ^ {2} ( \theta ). $$
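For a one-parameter family the minimization can be sketched numerically. The binomial-type cell probabilities and the frequencies below are hypothetical, chosen only to make the example self-contained (here $ k = 3 $, $ m = 1 $, so the limit law has $ k - m - 1 = 1 $ degree of freedom).

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical family: cell probabilities of a Bin(2, theta) grouping,
# p_1(theta) = (1-theta)^2, p_2(theta) = 2 theta (1-theta), p_3(theta) = theta^2
def probs(theta):
    return np.array([(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2])

nu = np.array([35, 48, 17])   # hypothetical observed frequencies
n = nu.sum()

def X2(theta):
    p = probs(theta)
    return np.sum((nu - n * p) ** 2 / (n * p))

# Minimum "chi-squared" estimator: X^2(theta_tilde) = min over theta of X^2(theta)
res = minimize_scalar(X2, bounds=(1e-3, 1 - 1e-3), method="bounded")
theta_tilde, X2_min = res.x, res.fun
```

Under $ H _ {0} $ the value $ X ^ {2} ( \widetilde \theta _ {n} ) $ is then compared with the quantiles of the "chi-squared" distribution with $ k - m - 1 $ degrees of freedom.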
If the intervals of the grouping are chosen so that all $ p _ {i} ( \theta ) > 0 $, if the functions $ \partial ^ {2} p _ {i} ( \theta )/ \partial \theta _ {j} \partial \theta _ {r} $ are continuous for all $ \theta \in \Theta $, $ i = 1 \dots k $; $ j, r = 1 \dots m $, and if the matrix $ \| \partial p _ {i} ( \theta )/ \partial \theta _ {j} \| $ has rank $ m $, then if the hypothesis $ H _ {0} $ is valid and as $ n \rightarrow \infty $, the statistic $ X ^ {2} ( \widetilde \theta _ {n} ) $ has in the limit a "chi-squared" distribution with $ k - m - 1 $ degrees of freedom, which can be used to verify $ H _ {0} $ by the "chi-squared" test. If one substitutes a maximum-likelihood estimator $ \widehat \theta _ {n} $ in $ X ^ {2} ( \theta ) $, computed from the non-grouped data $ X _ {1} \dots X _ {n} $, then under the validity of $ H _ {0} $ and as $ n \rightarrow \infty $, the statistic $ X ^ {2} ( \widehat \theta _ {n} ) $ is distributed in the limit like
$$ \xi _ {1} ^ {2} + \dots + \xi _ {k - m - 1 } ^ {2} + \mu _ {1} \xi _ {k - m } ^ {2} + \dots + \mu _ {m} \xi _ {k - 1 } ^ {2} , $$
where $ \xi _ {1} \dots \xi _ {k - 1 } $ are independent standard normally-distributed random variables, and the numbers $ \mu _ {1} \dots \mu _ {m} $ lie between 0 and 1 and, generally speaking, depend upon the unknown parameter $ \theta $. From this it follows that the use of maximum-likelihood estimators in applications of the "chi-squared" test for the verification of the hypothesis $ H _ {0} $ leads to difficulties connected with the computation of a non-standard limit distribution.
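The non-standard limit law can be examined by simulation. The weight $ \mu _ {1} $ below is a hypothetical value in $ ( 0, 1 ) $; the sketch only illustrates that the limit law is stochastically larger than $ \chi _ {k - m - 1 } ^ {2} $, so that using $ \chi _ {k - m - 1 } ^ {2} $ critical values with $ X ^ {2} ( \widehat \theta _ {n} ) $ understates the true significance level.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
k, m = 5, 1
mu = np.array([0.6])   # hypothetical weight; in practice it depends on theta

# Simulate xi_1^2 + ... + xi_{k-m-1}^2 + mu_1 xi_{k-m}^2 + ... + mu_m xi_{k-1}^2
xi2 = rng.standard_normal((100_000, k - 1)) ** 2
weights = np.concatenate([np.ones(k - m - 1), mu])
limit_samples = xi2 @ weights

# Upper 5% point of the simulated limit law vs. the chi^2_{k-m-1} quantile
q_sim = np.quantile(limit_samples, 0.95)
q_chi2 = chi2.ppf(0.95, df=k - m - 1)
# q_sim exceeds q_chi2, so rejecting when X^2(theta_hat) >= q_chi2
# is anti-conservative
```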
In [3]–[8] there are some recommendations concerning the $ \chi ^ {2} $-test in this case; in particular, in the normal case [3], the general continuous case [4], [8], the discrete case [6], [8], and in the problem of several samples [7].
References
[1] M.G. Kendall, A. Stuart, "The advanced theory of statistics", 2. Inference and relationship, Griffin (1983)
[2] D.M. Chibisov, "Certain chi-square type tests for continuous distributions", Theory Probab. Appl., 16 : 1 (1971) pp. 1–22; Teor. Veroyatnost. i Primenen., 16 : 1 (1971) pp. 3–20
[3] M.S. Nikulin, "Chi-square test for continuous distributions with shift and scale parameters", Theory Probab. Appl., 18 : 3 (1973) pp. 559–568; Teor. Veroyatnost. i Primenen., 18 : 3 (1973) pp. 583–592
[4] K.O. Dzhaparidze, M.S. Nikulin, "On a modification of the standard statistics of Pearson", Theory Probab. Appl., 19 : 4 (1974) pp. 851–853; Teor. Veroyatnost. i Primenen., 19 : 4 (1974) pp. 886–888
[5] M.S. Nikulin, "On a quantile test", Theory Probab. Appl., 19 : 2 (1974) pp. 410–413; Teor. Veroyatnost. i Primenen., 19 : 2 (1974) pp. 410–414
[6] L.N. Bol'shev, M. Mirvaliev, "Chi-square goodness-of-fit test for the Poisson, binomial and negative binomial distributions", Theory Probab. Appl., 23 : 3 (1978) pp. 461–474; Teor. Veroyatnost. i Primenen., 23 : 3 (1978) pp. 481–494
[7] L.N. Bol'shev, M.S. Nikulin, "A certain solution of the homogeneity problem", Serdica, 1 (1975) pp. 104–109 (In Russian)
[8] P.E. Greenwood, M.S. Nikulin, "Investigations in the theory of probability distributions. X", Zap. Nauchn. Sem. Leningr. Otdel. Mat. Inst. Steklov., 156 (1987) pp. 42–65 (In Russian)
Comments
The "chi-squared" test is also called the "chi-square" test or $ \chi ^ {2} $-test.
Chi-squared test. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Chi-squared_test&oldid=15852