Difference between revisions of "Smirnov test"
(Importing text file)
Ulf Rehmann (talk | contribs) m (tex encoded by computer)
Latest revision as of 08:14, 6 June 2020
Smirnov $ 2 $-samples test
A non-parametric (or distribution-free) statistical test for testing hypotheses about the homogeneity of two samples.
Let $ X _ {1} , \dots , X _ {n} $ and $ Y _ {1} , \dots , Y _ {m} $ be mutually independent random variables, where the elements of each sample are identically distributed with a continuous distribution function, and suppose one wishes to test the hypothesis $ H _ {0} $ that both samples are taken from the same population. If
$$ X _ {(1)} \leq \dots \leq X _ {(n)} \ \ \textrm{ and } \ Y _ {(1)} \leq \dots \leq Y _ {(m)} $$
are the order statistics corresponding to the given samples, and $ F _ {n} ( x) $ and $ G _ {m} ( x) $ are the empirical distribution functions corresponding to them, then $ H _ {0} $ can be written in the form of the identity:
$$ H _ {0} :\ {\mathsf E} F _ {n} ( x) \equiv {\mathsf E} G _ {m} ( x) . $$
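The empirical distribution functions used above can be illustrated with a minimal Python sketch. The helper name `ecdf` and the sample values are illustrative assumptions, not part of the article; the function simply returns the fraction of observations not exceeding $ x $.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical distribution function of `sample` as a callable.

    `ecdf` is a hypothetical helper name chosen for this illustration.
    """
    ordered = sorted(sample)  # order statistics X_(1) <= ... <= X_(n)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def F(x):
        # Fraction of observations that are <= x.
        return bisect_right(ordered, x) / n

    return F


F_n = ecdf([0.3, 1.2, 2.5, 4.0])
print(F_n(0.0), F_n(1.2), F_n(3.0), F_n(10.0))  # prints: 0.0 0.5 0.75 1.0
```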
Further, consider the following hypotheses as possible alternatives to $ H _ {0} $:
$$ H _ {1} ^ {+} :\ \sup _ {| x | < \infty } {\mathsf E} [ G _ {m} ( x) - F _ {n} ( x) ] > 0 , $$
$$ H _ {1} ^ {-} : \ \inf _ {| x | < \infty } {\mathsf E} [ G _ {m} ( x) - F _ {n} ( x) ] < 0 , $$
$$ H _ {1} : \ \sup _ {| x | < \infty } | {\mathsf E} [ G _ {m} ( x) - F _ {n} ( x) ] | > 0 . $$
To test $ H _ {0} $ against the one-sided alternatives $ H _ {1} ^ {+} $ and $ H _ {1} ^ {-} $, and also against the two-sided $ H _ {1} $, N.V. Smirnov proposed a test based on the statistics
$$ D _ {m,n} ^ {+} = \sup _ {| x | < \infty } [ G _ {m} ( x) - F _ {n} ( x) ] = \max _ {1 \leq k \leq m } \left ( \frac{k}{m} - F _ {n} ( Y _ {(k)} ) \right ) = \max _ {1 \leq s \leq n } \left ( G _ {m} ( X _ {(s)} ) - \frac{s-1}{n} \right ) , $$
$$ D _ {m,n} ^ {-} = - \inf _ {| x | < \infty } [ G _ {m} ( x) - F _ {n} ( x) ] = \max _ {1 \leq k \leq m } \left ( F _ {n} ( Y _ {(k)} ) - \frac{k-1}{m} \right ) = \max _ {1 \leq s \leq n } \left ( \frac{s}{n} - G _ {m} ( X _ {(s)} ) \right ) , $$
$$ D _ {m,n} = \sup _ {| x | < \infty } | G _ {m} ( x) - F _ {n} ( x) | = \max ( D _ {m,n} ^ {+} , D _ {m,n} ^ {-} ), $$
respectively, where it follows from the definitions of $ D _ {m,n} ^ {+} $ and $ D _ {m,n} ^ {-} $ that under the hypothesis $ H _ {0} $, $ D _ {m,n} ^ {+} $ and $ D _ {m,n} ^ {-} $ have the same distribution. Asymptotic tests can be based on the following theorem: If $ \min ( m , n ) \rightarrow \infty $, then the validity of $ H _ {0} $ implies that
$$ \lim\limits _ {\min ( m , n ) \rightarrow \infty } {\mathsf P} \left \{ \sqrt { \frac{mn}{m+n} } D _ {m,n} ^ {+} < y \right \} = 1 - e ^ {- 2 y ^ {2} } ,\ y > 0 , $$
$$ \lim\limits _ {\min ( m , n ) \rightarrow \infty } {\mathsf P} \left \{ \sqrt { \frac{mn}{m+n} } D _ {m,n} < y \right \} = K ( y) ,\ y > 0 , $$
where $ K ( y) $ is the Kolmogorov distribution function (cf. Statistical estimator). Asymptotic expansions for the distribution functions of the statistics $ D _ {m,n} ^ {+} $ and $ D _ {m,n} ^ {-} $ have been found (see [4]–[6]).
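As an illustration, the statistics $ D _ {m,n} ^ {+} $, $ D _ {m,n} ^ {-} $, $ D _ {m,n} $ and the normalising factor $ \sqrt {mn / ( m + n ) } $ from the theorem can be computed with the following Python sketch. It uses the order-statistic formulas over the $ Y $ sample and assumes there are no ties, in line with the continuity assumption; the function names `smirnov_statistics` and `normalised` and the sample values are hypothetical.

```python
from bisect import bisect_right
from math import sqrt


def smirnov_statistics(x, y):
    """Return (D_plus, D_minus, D) for samples x (size n) and y (size m).

    Assumes no ties, as for continuously distributed observations.
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    def F(t):
        # Empirical distribution function F_n of the X sample.
        return bisect_right(xs, t) / n

    # D+ = max over k of ( k/m - F_n(Y_(k)) ), k = 1..m
    d_plus = max(k / m - F(yk) for k, yk in enumerate(ys, start=1))
    # D- = max over k of ( F_n(Y_(k)) - (k-1)/m ), k = 1..m
    d_minus = max(F(yk) - (k - 1) / m for k, yk in enumerate(ys, start=1))
    return d_plus, d_minus, max(d_plus, d_minus)


def normalised(d, m, n):
    """Scale a statistic by sqrt(mn / (m + n)), as in the limit theorem."""
    return sqrt(m * n / (m + n)) * d


# Illustrative data only: n = 3 observations of X, m = 4 observations of Y.
d_plus, d_minus, d = smirnov_statistics([1.0, 3.0, 5.0], [2.0, 4.0, 6.0, 8.0])
print(d_plus, d_minus, d)  # prints: 0.0 0.5 0.5
print(normalised(d, 4, 3))
```

Here the $ Y $ observations tend to be larger than the $ X $ observations, so $ G _ {m} $ lies below $ F _ {n} $ and only $ D _ {m,n} ^ {-} $ is positive.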
Using the Smirnov test with significance level $ \alpha $, the hypothesis $ H _ {0} $ may be rejected in favour of one of the above alternatives $ H _ {1} ^ {+} $, $ H _ {1} ^ {-} $ when the corresponding statistic exceeds the $ \alpha $-critical value of the test; this value can be calculated using the approximations obtained by L.N. Bol'shev [2] by means of asymptotically Pearson transformations.
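A rough sketch of the decision rule in Python follows. It does not implement Bol'shev's approximations; as a simpler stand-in it takes critical values directly from the limit distributions $ 1 - e ^ {- 2 y ^ {2} } $ and $ K ( y) $, so it is only reasonable for large $ m $ and $ n $. The series used for $ K ( y) $ is the standard one, $ K ( y) = \sum _ {k = - \infty } ^ \infty ( - 1 ) ^ {k} e ^ {- 2 k ^ {2} y ^ {2} } $; all function names and numerical inputs are illustrative assumptions.

```python
from math import exp, log, sqrt


def kolmogorov_cdf(y, terms=100):
    """K(y) = 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 y^2) for y > 0."""
    if y < 0.2:
        # K(y) is negligibly small here and the series converges slowly.
        return 0.0
    tail = sum((-1) ** (k - 1) * exp(-2.0 * k * k * y * y)
               for k in range(1, terms + 1))
    return 1.0 - 2.0 * tail


def one_sided_critical_value(alpha):
    """Solve exp(-2 y^2) = alpha for y (limit law of the one-sided statistics)."""
    return sqrt(-log(alpha) / 2.0)


def two_sided_critical_value(alpha, lo=0.2, hi=5.0, tol=1e-10):
    """Solve 1 - K(y) = alpha by bisection; assumes the root lies in [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if 1.0 - kolmogorov_cdf(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def asymptotic_decision(d, m, n, alpha=0.05, two_sided=False):
    """Return (normalised statistic, asymptotic p-value, reject H0?)."""
    y = sqrt(m * n / (m + n)) * d
    if two_sided:
        p = 1.0 - kolmogorov_cdf(y)
    else:
        p = exp(-2.0 * y * y) if y > 0 else 1.0
    # Rejecting when p < alpha is the same as the statistic exceeding
    # the asymptotic alpha-critical value.
    return y, p, p < alpha


# Hypothetical observed value D+ = 0.3 with m = n = 50.
y, p, reject = asymptotic_decision(0.3, 50, 50, alpha=0.05)
print(y, p, reject)
print(one_sided_critical_value(0.05), two_sided_critical_value(0.05))
```

For small samples the exact distribution or published tables (for example [3]) would be preferable to these limiting values.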
See also Kolmogorov test; Kolmogorov–Smirnov test.
References
[1] N.V. Smirnov, "Estimates of the divergence between empirical distribution curves in two independent samples", Byull. Moskov. Gosudarstv. Univ. (A), 2 : 2 (1939) pp. 3–14
[2] L.N. Bol'shev, "Asymptotically Pearson transformations", Theor. Probab. Appl., 8 (1963) pp. 121–146; Teor. Veroyatnost. i Primenen., 8 : 2 (1963) pp. 129–155
[3] L.N. Bol'shev, N.V. Smirnov, "Tables of mathematical statistics", Libr. math. tables, 46, Nauka (1983) (In Russian) (Processed by L.S. Bark and E.S. Kedrova)
[4] V.S. Korolyuk, "Asymptotic analysis of the distribution of the maximum deviation in the Bernoulli scheme", Theor. Probab. Appl., 4 (1959) pp. 339–366; Teor. Veroyatnost. i Primenen., 4 (1959) pp. 369–397
[5] Li-Chien Chang, "On the exact distribution of A.N. Kolmogorov's statistic and its asymptotic expansion (I and II)", Matematika, 4 : 2 (1960) pp. 135–139 (In Russian)
[6] A.A. Borovkov, "On the two-sample problem", Izv. Akad. Nauk SSSR Ser. Mat., 26 : 4 (1962) pp. 605–624 (In Russian)
Smirnov test. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Smirnov_test&oldid=48739