Difference between revisions of "Smirnov test"

Latest revision as of 08:14, 6 June 2020

Smirnov two-sample test

A non-parametric (or distribution-free) statistical test of hypotheses about the homogeneity of two samples.

Let $ X _ {1} , \dots , X _ {n} $ and $ Y _ {1} , \dots , Y _ {m} $ be mutually independent random variables, where each sample consists of identically distributed elements with a continuous distribution function, and suppose one wishes to test the hypothesis $ H _ {0} $ that both samples are taken from the same population. If

$$ X _ {(1)} \leq \dots \leq X _ {(n)} \ \ \textrm{ and } \ Y _ {(1)} \leq \dots \leq Y _ {(m)} $$

are the order statistics corresponding to the given samples, and $ F _ {n} ( x) $ and $ G _ {m} ( x) $ are the empirical distribution functions corresponding to them, then $ H _ {0} $ can be written in the form of the identity:

$$ H _ {0} :\ {\mathsf E} F _ {n} ( x) \equiv {\mathsf E} G _ {m} ( x) . $$
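
As a minimal sketch, assuming plain Python lists of real numbers with no tied values, the empirical distribution functions $ F _ {n} $ and $ G _ {m} $ can be built from the sorted samples (the order statistics) as right-continuous step functions that jump by $ 1/n $ (respectively $ 1/m $) at each order statistic. The helper name `ecdf` and the sample values are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right


def ecdf(sample):
    """Empirical distribution function of `sample` (hypothetical helper).

    Returns x -> #{i : sample[i] <= x} / len(sample), a right-continuous
    step function with a jump of 1/len(sample) at each order statistic.
    """
    ordered = sorted(sample)  # order statistics s_(1) <= ... <= s_(len)
    size = len(ordered)

    def value_at(x):
        # bisect_right returns how many order statistics are <= x
        return bisect_right(ordered, x) / size

    return value_at


xs = [0.2, 0.6, 2.0]        # hypothetical X-sample, n = 3
ys = [0.4, 1.0, 1.5, 2.5]   # hypothetical Y-sample, m = 4
F_n, G_m = ecdf(xs), ecdf(ys)
print(F_n(1.0), G_m(1.0))   # F_n(1.0) = 2/3, G_m(1.0) = 1/2
```

Using `bisect_right` (rather than `bisect_left`) is what makes the step functions right-continuous: at an observation the jump is already included.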

Further, consider the following hypotheses as possible alternatives to $ H _ {0} $:

$$ H _ {1} ^ {+} :\ \sup _ {| x | < \infty } {\mathsf E} [ G _ {m} ( x) - F _ {n} ( x) ] > 0 , $$

$$ H _ {1} ^ {-} : \ \inf _ {| x | < \infty } {\mathsf E} [ G _ {m} ( x) - F _ {n} ( x) ] < 0 , $$

$$ H _ {1} : \ \sup _ {| x | < \infty } | {\mathsf E} [ G _ {m} ( x) - F _ {n} ( x) ] | > 0 . $$

To test $ H _ {0} $ against the one-sided alternatives $ H _ {1} ^ {+} $ and $ H _ {1} ^ {-} $, and also against the two-sided alternative $ H _ {1} $, N.V. Smirnov [1] proposed tests based on the statistics

$$ D _ {m,n} ^ {+} = \sup _ {| x | < \infty } [ G _ {m} ( x) - F _ {n} ( x) ] = $$

$$ = \ \max _ {1 \leq k \leq m } \left ( \frac{k}{m} - F _ {n} ( Y _ {(k)} ) \right ) = \max _ {1 \leq s \leq n } \left ( G _ {m} ( X _ {(s)} ) - \frac{s-1}{n} \right ) , $$

$$ D _ {m,n} ^ {-} = - \inf _ {| x| < \infty } [ G _ {m} ( x) - F _ {n} ( x) ] = $$

$$ = \ \max _ {1 \leq k \leq m } \left ( F _ {n} ( Y _ {(k)} ) - \frac{k-1}{m} \right ) = \max _ {1 \leq s \leq n } \left ( \frac{s}{n} - G _ {m} ( X _ {(s)} ) \right ) , $$

$$ D _ {m,n} = \sup _ {| x | < \infty } | G _ {m} ( x) - F _ {n} ( x) | = \max ( D _ {m,n} ^ {+} , D _ {m,n} ^ {-} ), $$

respectively. Under the hypothesis $ H _ {0} $ the statistics $ D _ {m,n} ^ {+} $ and $ D _ {m,n} ^ {-} $ have the same distribution; this follows from their definitions, since reversing the order of the pooled sample interchanges $ D _ {m,n} ^ {+} $ and $ D _ {m,n} ^ {-} $ and, under $ H _ {0} $, does not change the distribution of the arrangement of the $ X $'s and $ Y $'s. Asymptotic tests can be based on the following theorem: if $ H _ {0} $ holds and $ \min ( m , n ) \rightarrow \infty $, then

$$ \lim\limits _ {\min ( m , n ) \rightarrow \infty } {\mathsf P} \left \{ \sqrt { \frac{mn}{m+n} } D _ {m,n} ^ {+} < y \right \} = 1 - e ^ {- 2 y ^ {2} } ,\ y > 0 , $$

$$ \lim\limits _ {\min ( m , n ) \rightarrow \infty } {\mathsf P} \left \{ \sqrt { \frac{mn}{m+n} } D _ {m,n} < y \right \} = K ( y) ,\ y > 0 , $$

where $ K ( y) = \sum _ {k=-\infty} ^ {\infty} ( -1) ^ {k} e ^ {- 2 k ^ {2} y ^ {2} } $ is the Kolmogorov distribution function (cf. Statistical estimator). Asymptotic expansions for the distribution functions of the statistics $ D _ {m,n} ^ {+} $ and $ D _ {m,n} ^ {-} $ have been found (see [4]–[6]).
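
The order-statistic formulas for $ D _ {m,n} ^ {+} $ and $ D _ {m,n} ^ {-} $, and the claim that they are equidistributed under $ H _ {0} $, can be checked numerically with the sketch below. It assumes no ties; `smirnov_statistics`, `sup_definition` and `null_counts` are hypothetical helper names. The exact null distributions are obtained by brute-force enumeration, using the fact that under $ H _ {0} $ all $ \binom{n+m}{n} $ placements of the $ X $'s among the pooled order statistics are equally likely; this is only practical for very small $ m $ and $ n $.

```python
from bisect import bisect_right
from collections import Counter
from fractions import Fraction
from itertools import combinations


def ecdf_at(sorted_sample, x):
    """Empirical distribution function of an already sorted sample, at x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def smirnov_statistics(xs, ys):
    """(D+, D-, D) for an X-sample xs of size n and a Y-sample ys of size m."""
    x_ord, y_ord = sorted(xs), sorted(ys)
    n, m = len(x_ord), len(y_ord)
    # D+ = max over k of [k/m - F_n(Y_(k))]
    d_plus = max(k / m - ecdf_at(x_ord, y) for k, y in enumerate(y_ord, start=1))
    # D+ again, as max over s of [G_m(X_(s)) - (s-1)/n]
    d_plus_alt = max(ecdf_at(y_ord, x) - (s - 1) / n for s, x in enumerate(x_ord, start=1))
    # D- = max over s of [s/n - G_m(X_(s))]
    d_minus = max(s / n - ecdf_at(y_ord, x) for s, x in enumerate(x_ord, start=1))
    # D- again, as max over k of [F_n(Y_(k)) - (k-1)/m]
    d_minus_alt = max(ecdf_at(x_ord, y) - (k - 1) / m for k, y in enumerate(y_ord, start=1))
    assert abs(d_plus - d_plus_alt) < 1e-12 and abs(d_minus - d_minus_alt) < 1e-12
    return d_plus, d_minus, max(d_plus, d_minus)


def sup_definition(xs, ys):
    """D+ and D- straight from sup [G_m - F_n] and -inf [G_m - F_n].

    G_m - F_n is a right-continuous step function that is 0 to the left of
    all observations, so its sup and inf are attained at a pooled
    observation or equal 0.
    """
    x_ord, y_ord = sorted(xs), sorted(ys)
    diffs = [ecdf_at(y_ord, z) - ecdf_at(x_ord, z) for z in x_ord + y_ord] + [0.0]
    return max(diffs), -min(diffs)


def null_counts(n, m):
    """Exact null distributions (as counts) of D+ and D- by enumeration."""
    plus, minus = Counter(), Counter()
    for x_slots in combinations(range(n + m), n):
        taken = set(x_slots)
        seen_x = seen_y = 0
        hi = lo = Fraction(0)  # running sup / inf of G_m - F_n, starting at 0
        for slot in range(n + m):
            if slot in taken:
                seen_x += 1
            else:
                seen_y += 1
            diff = Fraction(seen_y, m) - Fraction(seen_x, n)
            hi, lo = max(hi, diff), min(lo, diff)
        plus[hi] += 1
        minus[-lo] += 1
    return plus, minus


xs = [0.2, 0.6, 2.0]        # hypothetical X-sample, n = 3
ys = [0.4, 1.0, 1.5, 2.5]   # hypothetical Y-sample, m = 4
print(smirnov_statistics(xs, ys))  # (1/12, 5/12, 5/12) up to rounding
print(sup_definition(xs, ys))
plus, minus = null_counts(3, 4)
print(plus == minus)  # True
```

For these made-up samples the two formulas for each statistic agree with each other and with the sup/inf definition, and the enumerated null counts of $ D _ {m,n} ^ {+} $ and $ D _ {m,n} ^ {-} $ coincide.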

Using the Smirnov test at significance level $ \alpha $, one rejects $ H _ {0} $ in favour of the alternative $ H _ {1} ^ {+} $, $ H _ {1} ^ {-} $ or $ H _ {1} $ when the corresponding statistic $ D _ {m,n} ^ {+} $, $ D _ {m,n} ^ {-} $ or $ D _ {m,n} $ exceeds the $ \alpha $-critical value of the test. These critical values can be computed from the approximations that L.N. Bol'shev [2] obtained by means of asymptotically Pearson transformations.
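
A rough sketch of the asymptotic version of this decision rule, using only the limit laws of the theorem above. It does not implement Bol'shev's refined approximations [2] or the tables [3], so for small $ m $, $ n $ its critical values may be noticeably inaccurate. $ K ( y) $ is evaluated as $ 1 + 2 \sum _ {k \geq 1} ( -1) ^ {k} e ^ {- 2 k ^ {2} y ^ {2} } $ for $ y \geq 1 $ and through the equivalent form $ \frac{\sqrt {2 \pi }}{y} \sum _ {k \geq 1} e ^ {- ( 2k-1) ^ {2} \pi ^ {2} / ( 8 y ^ {2} ) } $ for $ y < 1 $; all function names, sample sizes and the observed value are hypothetical.

```python
import math


def kolmogorov_cdf(y, terms=50):
    """Kolmogorov distribution function K(y) for y > 0 (returns 0 for y <= 0)."""
    if y <= 0:
        return 0.0
    if y < 1.0:
        # Equivalent theta-function form; converges quickly for small y.
        c = math.pi ** 2 / (8.0 * y * y)
        return math.sqrt(2.0 * math.pi) / y * sum(
            math.exp(-(2 * k - 1) ** 2 * c) for k in range(1, terms + 1))
    # Alternating form 1 + 2 * sum_{k>=1} (-1)^k exp(-2 k^2 y^2); fast for y >= 1.
    return 1.0 + 2.0 * sum(
        (-1) ** k * math.exp(-2.0 * k * k * y * y) for k in range(1, terms + 1))


def kolmogorov_quantile(p, lo=0.1, hi=5.0, iterations=100):
    """Solve K(y) = p by bisection (K is increasing); [0.1, 5] covers usual p."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def asymptotic_critical_values(alpha, m, n):
    """Approximate alpha-critical values of D+ (same for D-) and of D.

    One-sided: exp(-2 y^2) = alpha, so y = sqrt(ln(1/alpha) / 2).
    Two-sided: K(y) = 1 - alpha, solved numerically.
    Both are divided by sqrt(mn / (m + n)).
    """
    scale = math.sqrt(m * n / (m + n))
    one_sided = math.sqrt(math.log(1.0 / alpha) / 2.0) / scale
    two_sided = kolmogorov_quantile(1.0 - alpha) / scale
    return one_sided, two_sided


def asymptotic_p_value(statistic, m, n, two_sided):
    """Approximate P-value of an observed D+ (or D-), or of D if two_sided."""
    y = math.sqrt(m * n / (m + n)) * statistic
    if two_sided:
        return 1.0 - kolmogorov_cdf(y)
    return math.exp(-2.0 * y * y) if y > 0 else 1.0


m = n = 50                      # hypothetical sample sizes
crit_plus, crit_two = asymptotic_critical_values(0.05, m, n)
d_observed = 0.30               # hypothetical observed value of D_{m,n}
reject = d_observed > crit_two  # reject H0 in favour of H1 if True
print(crit_plus, crit_two, reject, asymptotic_p_value(d_observed, m, n, two_sided=True))
```

With $ m = n = 50 $ this sketch gives approximate 5% critical values of about 0.245 for $ D _ {m,n} ^ {+} $ and about 0.272 for $ D _ {m,n} $, so the hypothetical value $ D _ {m,n} = 0.30 $ would lead to rejecting $ H _ {0} $ in favour of $ H _ {1} $ (approximate P-value about 0.022).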

See also Kolmogorov test; Kolmogorov–Smirnov test.

References

[1]	N.V. Smirnov, "Estimates of the divergence between empirical distribution curves in two independent samples", Byull. Moskov. Gosudarstv. Univ. (A), 2 : 2 (1939), pp. 3–14
[2]	L.N. Bol'shev, "Asymptotically Pearson transformations", Theor. Probab. Appl., 8 (1963), pp. 121–146; Teor. Veroyatnost. i Primenen., 8 : 2 (1963), pp. 129–155
[3]	L.N. Bol'shev, N.V. Smirnov, "Tables of mathematical statistics", Libr. math. tables, 46, Nauka (1983) (In Russian) (Processed by L.S. Bark and E.S. Kedrova)
[4]	V.S. Korolyuk, "Asymptotic analysis of the distribution of the maximum deviation in the Bernoulli scheme", Theor. Probab. Appl., 4 (1959), pp. 339–366; Teor. Veroyatnost. i Primenen., 4 (1959), pp. 369–397
[5]	Li-Chien Chang, "On the exact distribution of A.N. Kolmogorov's statistic and its asymptotic expansion (I and II)", Matematika, 4 : 2 (1960), pp. 135–139 (In Russian)
[6]	A.A. Borovkov, "On the two-sample problem", Izv. Akad. Nauk SSSR Ser. Mat., 26 : 4 (1962), pp. 605–624 (In Russian)

Comments

References

[a1]	D.B. Owen, "A handbook of statistical tables", Addison-Wesley (1962)
[a2]	E.S. Pearson, H.O. Hartley, "Biometrika tables for statisticians", 2, Cambridge Univ. Press (1972)

How to Cite This Entry:
Smirnov test. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Smirnov_test&oldid=48739

This article was adapted from an original article by M.S. Nikulin (originator), which appeared in Encyclopedia of Mathematics, ISBN 1402006098.
