Chi-squared test

A test for verifying a hypothesis $H_0$ according to which a random vector of frequencies $\nu = (\nu_1, \dots, \nu_k)$ has a given polynomial (multinomial) distribution, characterized by a vector of positive probabilities $p = (p_1, \dots, p_k)$, $p_1 + \dots + p_k = 1$. The "chi-squared" test is based on the Pearson statistic

$$ X^2 = \sum_{i=1}^{k} \frac{(\nu_i - np_i)^2}{np_i} = \frac{1}{n} \sum_{i=1}^{k} \frac{\nu_i^2}{p_i} - n, \qquad n = \nu_1 + \dots + \nu_k, $$

which has in the limit, as $n \rightarrow \infty$, a "chi-squared" distribution with $k - 1$ degrees of freedom, that is,

$$ \lim_{n \rightarrow \infty} {\mathsf P} \{ X^2 \leq x \mid H_0 \} = {\mathsf P} \{ \chi_{k-1}^2 \leq x \}. $$

According to the "chi-squared" test with significance level $\approx \alpha$, the hypothesis $H_0$ must be rejected if $X^2 \geq \chi_{k-1}^2(\alpha)$, where $\chi_{k-1}^2(\alpha)$ is the upper $\alpha$-quantile of the "chi-squared" distribution with $k - 1$ degrees of freedom, that is,

$$ {\mathsf P} \{ \chi_{k-1}^2 \geq \chi_{k-1}^2(\alpha) \} = \alpha. $$
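For illustration, the test is simple to carry out numerically. A minimal sketch in Python (assuming `numpy` and `scipy` are available, with hypothetical die-roll frequencies):

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical observed frequencies nu_1, ..., nu_k (120 rolls of a die, k = 6)
nu = np.array([18, 23, 16, 21, 18, 24])
p = np.full(6, 1 / 6)          # hypothesized probabilities, p_1 + ... + p_k = 1
n = nu.sum()                   # n = nu_1 + ... + nu_k
k = len(nu)

# Pearson statistic X^2 = sum_i (nu_i - n p_i)^2 / (n p_i)
X2 = np.sum((nu - n * p) ** 2 / (n * p))

# Upper alpha-quantile chi^2_{k-1}(alpha): P{chi^2_{k-1} >= c} = alpha
alpha = 0.05
c = chi2.ppf(1 - alpha, df=k - 1)

print(f"X^2 = {X2:.3f}, critical value = {c:.3f}")
print("reject H_0" if X2 >= c else "do not reject H_0")
```

(The same statistic is also computed directly by `scipy.stats.chisquare`.)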

The statistic $X^2$ is also used to verify the hypothesis $H_0$ that the distribution functions of independent identically-distributed random variables $X_1, \dots, X_n$ belong to a family of continuous functions $F(x, \theta)$, $x \in \mathbf R^1$, $\theta = (\theta_1, \dots, \theta_m) \in \Theta \subset \mathbf R^m$, $\Theta$ an open set. After dividing the real line by points $x_0 < \dots < x_k$, $x_0 = -\infty$, $x_k = +\infty$, into $k$ intervals $(x_0, x_1], \dots, (x_{k-1}, x_k)$, $k > m$, such that for all $\theta \in \Theta$,

$$ p_i(\theta) = {\mathsf P} \{ X_1 \in (x_{i-1}, x_i] \} = F(x_i, \theta) - F(x_{i-1}, \theta) > 0, $$

$i = 1, \dots, k$; $p_1(\theta) + \dots + p_k(\theta) = 1$, one forms the frequency vector $\nu = (\nu_1, \dots, \nu_k)$ obtained by grouping the values of the random variables $X_1, \dots, X_n$ into these intervals. Let

$$ X^2(\theta) = \sum_{i=1}^{k} \frac{[\nu_i - np_i(\theta)]^2}{np_i(\theta)} $$

be a random variable depending on the unknown parameter $\theta$. To verify the hypothesis $H_0$ one uses the statistic $X^2(\widetilde\theta_n)$, where $\widetilde\theta_n$ is an estimator of the parameter $\theta$ computed by the method of minimum "chi-squared", that is,

$$ X^2(\widetilde\theta_n) = \min_{\theta \in \Theta} X^2(\theta). $$
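A minimal computational sketch of this procedure (Python with `scipy`; purely for illustration the family $F(x, \theta)$ is taken to be normal with $\theta = (\mu, \sigma)$, so $m = 2$, and a generic numerical minimizer stands in for an exact minimization over $\Theta$):

```python
import numpy as np
from scipy.stats import norm, chi2
from scipy.optimize import minimize

# Illustrative family: F(x, theta) normal with theta = (mu, sigma), m = 2
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=500)    # simulated sample, n = 500

# Grouping intervals (x_0, x_1], ..., (x_{k-1}, x_k), x_0 = -inf, x_k = +inf
edges = np.array([-np.inf, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, np.inf])
nu, _ = np.histogram(x, bins=edges)             # frequency vector nu, k = 7 > m
n, k, m = nu.sum(), len(nu), 2

def X2(theta):
    mu, log_sigma = theta                       # log-scale keeps sigma > 0
    cdf = norm.cdf(edges, loc=mu, scale=np.exp(log_sigma))
    p = np.diff(cdf)                            # p_i(theta) = F(x_i) - F(x_{i-1})
    return np.sum((nu - n * p) ** 2 / (n * p))

# Minimum-"chi-squared" estimator: minimize X^2(theta) over Theta
res = minimize(X2, x0=np.array([0.0, 0.0]), method="Nelder-Mead")

# Under H_0, X^2 at the minimum is asymptotically chi^2 with k - m - 1 d.f.
alpha = 0.05
c = chi2.ppf(1 - alpha, df=k - m - 1)
print(f"X^2 = {res.fun:.3f}, critical value = {c:.3f}")
```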

If the intervals of the grouping are chosen so that all $p_i(\theta) > 0$, if the functions $\partial^2 p_i(\theta)/\partial \theta_j \partial \theta_r$ are continuous for all $\theta \in \Theta$, $i = 1, \dots, k$; $j, r = 1, \dots, m$, and if the matrix $\| \partial p_i(\theta)/\partial \theta_j \|$ has rank $m$, then, under the hypothesis $H_0$ and as $n \rightarrow \infty$, the statistic $X^2(\widetilde\theta_n)$ has in the limit a "chi-squared" distribution with $k - m - 1$ degrees of freedom, which can be used to verify $H_0$ by the "chi-squared" test. If instead one substitutes into $X^2(\theta)$ a maximum-likelihood estimator $\widehat\theta_n$ computed from the non-grouped data $X_1, \dots, X_n$, then, under the validity of $H_0$ and as $n \rightarrow \infty$, the statistic $X^2(\widehat\theta_n)$ is distributed in the limit like

$$ \xi_1^2 + \dots + \xi_{k-m-1}^2 + \mu_1 \xi_{k-m}^2 + \dots + \mu_m \xi_{k-1}^2, $$

where $\xi_1, \dots, \xi_{k-1}$ are independent standard normally-distributed random variables, and the numbers $\mu_1, \dots, \mu_m$ lie between 0 and 1 and, generally speaking, depend on the unknown parameter $\theta$. It follows that the use of maximum-likelihood estimators in applications of the "chi-squared" test for verifying the hypothesis $H_0$ leads to difficulties connected with the computation of a non-standard limit distribution.
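Since the weights $\mu_1, \dots, \mu_m$ are generally unknown and the quantiles of this limit law are not available in closed form, they are typically computed numerically or by simulation. A minimal Monte Carlo sketch, with assumed illustrative values of the $\mu_j$:

```python
import numpy as np

# Monte Carlo sketch of the limit law
#   xi_1^2 + ... + xi_{k-m-1}^2 + mu_1 xi_{k-m}^2 + ... + mu_m xi_{k-1}^2
# with hypothetical weights mu_j in (0, 1); in practice they depend on theta.
rng = np.random.default_rng(1)
k, m = 7, 2
mu = np.array([0.3, 0.7])                     # assumed illustrative weights

xi = rng.standard_normal((200_000, k - 1))    # independent N(0, 1) variables
w = np.concatenate([np.ones(k - m - 1), mu])  # weights 1, ..., 1, mu_1, ..., mu_m
sample = (w * xi ** 2).sum(axis=1)            # draws from the limit distribution

# Simulated upper 5% point of the limit law
print(np.quantile(sample, 0.95))
```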

In [3]–[8] there are some recommendations concerning the $\chi^2$-test in this case; in particular, for the normal case [3], the general continuous case [4], [8], the discrete case [6], [8], and the problem of several samples [7].

References

[1] M.G. Kendall, A. Stuart, "The advanced theory of statistics", 2: Inference and relationship, Griffin (1983)
[2] D.M. Chibisov, "Certain chi-square type tests for continuous distributions" Theory Probab. Appl., 16:1 (1971) pp. 1–22; Teor. Veroyatnost. i Primenen., 16:1 (1971) pp. 3–20
[3] M.S. Nikulin, "Chi-square test for continuous distributions with shift and scale parameters" Theory Probab. Appl., 18:3 (1973) pp. 559–568; Teor. Veroyatnost. i Primenen., 18:3 (1973) pp. 583–592
[4] K.O. Dzhaparidze, M.S. Nikulin, "On a modification of the standard statistics of Pearson" Theory Probab. Appl., 19:4 (1974) pp. 851–853; Teor. Veroyatnost. i Primenen., 19:4 (1974) pp. 886–888
[5] M.S. Nikulin, "On a quantile test" Theory Probab. Appl., 19:2 (1974) pp. 410–413; Teor. Veroyatnost. i Primenen., 19:2 (1974) pp. 410–414
[6] L.N. Bol'shev, M. Mirvaliev, "Chi-square goodness-of-fit test for the Poisson, binomial and negative binomial distributions" Theory Probab. Appl., 23:3 (1978) pp. 461–474; Teor. Veroyatnost. i Primenen., 23:3 (1978) pp. 481–494
[7] L.N. Bol'shev, M.S. Nikulin, "A certain solution of the homogeneity problem" Serdica, 1 (1975) pp. 104–109 (In Russian)
[8] P.E. Greenwood, M.S. Nikulin, "Investigations in the theory of probability distributions. X" Zap. Nauchn. Sem. Leningr. Otdel. Mat. Inst. Steklov., 156 (1987) pp. 42–65 (In Russian)

Comments

The "chi-squared" test is also called the "chi-square" test or $ \chi ^ {2} $- test.
