Kolmogorov-Smirnov test
Revision as of 22:14, 5 June 2020
2020 Mathematics Subject Classification: Primary: 62G10 [MSN][ZBL]
A non-parametric test used for testing a hypothesis $ H_0 $, according to which independent random variables $ X_1, \dots, X_n $ have a given continuous distribution function $ F $, against the one-sided alternative $ H_1^+ $: $ \sup_{|x| < \infty} ( {\mathsf E} F_n(x) - F(x) ) > 0 $, where $ {\mathsf E} F_n $ is the mathematical expectation of the empirical distribution function $ F_n $. The Kolmogorov–Smirnov test is constructed from the statistic
$$ D_n^+ = \sup_{|x| < \infty} ( F_n(x) - F(x) ) = \max_{1 \leq m \leq n} \left( \frac{m}{n} - F(X_{(m)}) \right), $$
where $ X_{(1)} \leq \dots \leq X_{(n)} $ is the variational series (or set of order statistics) obtained from the sample $ X_1, \dots, X_n $. Thus, the Kolmogorov–Smirnov test is a variant of the Kolmogorov test for testing the hypothesis $ H_0 $ against a one-sided alternative $ H_1^+ $. By studying the distribution of the statistic $ D_n^+ $, N.V. Smirnov [1] showed that
$$ \tag{1} {\mathsf P} \{ D_n^+ \geq \lambda \} = \lambda \sum_{k=0}^{[ n(1-\lambda) ]} \binom{n}{k} \left( \lambda + \frac{k}{n} \right)^{k-1} \left( 1 - \lambda - \frac{k}{n} \right)^{n-k}, $$
where $ 0 < \lambda < 1 $ and $ [a] $ is the integer part of the number $ a $. In addition to the exact distribution (1) of $ D_n^+ $, Smirnov obtained its limit distribution, namely: if $ n \rightarrow \infty $ and $ 0 < \lambda_0 < \lambda = O(n^{1/6}) $, then
$$ {\mathsf P} \{ D_n^+ \geq \lambda n^{-1/2} \} = e^{-2 \lambda^2} \left[ 1 + O\left( \frac{1}{\sqrt{n}} \right) \right], $$
where $ \lambda_0 $ is any positive number. By means of the technique of asymptotically Pearson transformations it has been proved [2] that if $ n \rightarrow \infty $ and $ 0 < \lambda_0 < \lambda = O(n^{1/3}) $, then
$$ \tag{2 } {\mathsf P} \left \{ \frac{1}{18n} ( 6 n D _ {n} ^ {+} + 1 ) ^ {2} \geq \lambda \right \} = e ^ {- \lambda } \left [ 1 + O \left ( \frac{1}{n} \right ) \right ] . $$
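For illustration, the statistic $ D_n^+ $, the exact tail probability (1) and the limit approximation can be computed directly. The following is a minimal Python sketch, in which the hypothesized $ F $ is taken, for concreteness only, to be the standard normal distribution function; the function names are illustrative, not standard.

```python
import math
import random

def F(x):
    # Hypothesized continuous distribution function; the standard normal
    # is an assumption made for this example only.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d_plus(sample, F):
    # D_n^+ = max_{1 <= m <= n} ( m/n - F(X_(m)) ),
    # where X_(1) <= ... <= X_(n) is the variational series of the sample.
    xs = sorted(sample)
    n = len(xs)
    return max(m / n - F(x) for m, x in enumerate(xs, start=1))

def tail_exact(n, lam):
    # Exact P{ D_n^+ >= lam } from Smirnov's formula (1), valid for 0 < lam < 1.
    s = sum(math.comb(n, k)
            * (lam + k / n) ** (k - 1)
            * (1 - lam - k / n) ** (n - k)
            for k in range(int(n * (1 - lam)) + 1))
    return lam * s

def tail_limit(n, lam):
    # Limit approximation: P{ D_n^+ >= lam } ~ exp(-2 n lam^2).
    return math.exp(-2 * n * lam ** 2)

random.seed(1)
sample = [random.gauss(0.0, 1.0) for _ in range(50)]   # data generated under H_0
d = d_plus(sample, F)
print(d, tail_exact(len(sample), d), tail_limit(len(sample), d))
```

Evaluating the finite sum in (1) directly is practical for the moderate sample sizes for which the test is usually tabulated; the limit approximation is accurate to the stated $ O(1/\sqrt{n}) $ relative error.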
According to the Kolmogorov–Smirnov test, the hypothesis $ H _ {0} $ must be rejected with significance level $ \alpha $ whenever
$$ \mathop{\rm exp} \left[ - \frac{( 6 n D_n^+ + 1 )^2}{18n} \right] \leq \alpha, $$
where, by virtue of (2),
$$ {\mathsf P} \left\{ \mathop{\rm exp} \left[ - \frac{( 6 n D_n^+ + 1 )^2}{18n} \right] \leq \alpha \right\} = \alpha \left( 1 + O\left( \frac{1}{n} \right) \right). $$
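In computational form the decision rule may be sketched as follows; the name `reject_H0` and the numerical values in the example are purely illustrative.

```python
import math

def reject_H0(d_plus_value, n, alpha):
    # By (2), exp(-(6 n D_n^+ + 1)^2 / (18 n)) is approximately uniform on (0, 1)
    # under H_0, so it serves as an asymptotic p-value for the one-sided test:
    # reject H_0 at significance level alpha whenever it is <= alpha.
    p_approx = math.exp(-(6 * n * d_plus_value + 1) ** 2 / (18 * n))
    return p_approx <= alpha, p_approx

# Example: D_n^+ = 0.18 observed in a sample of size n = 50, alpha = 0.05.
print(reject_H0(0.18, 50, 0.05))   # approximately (True, 0.035)
```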
The testing of $ H _ {0} $ against the alternative $ H _ {1} ^ {-} $: $ \inf _ {| x | < \infty } ( {\mathsf E} F _ {n} ( x) - F ( x) ) < 0 $ is dealt with similarly. In this case the statistic of the Kolmogorov–Smirnov test is the random variable
$$ D_n^- = - \inf_{|x| < \infty} ( F_n(x) - F(x) ) = \max_{1 \leq m \leq n} \left( F(X_{(m)}) - \frac{m-1}{n} \right), $$
whose distribution is the same as that of the statistic $ D _ {n} ^ {+} $ when $ H _ {0} $ is true.
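A sketch of the mirrored computation, under the same illustrative conventions as the earlier snippets (the name `d_minus` and the small sample are arbitrary):

```python
import math

def F(x):
    # Standard normal distribution function, as in the earlier sketch (an assumption).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d_minus(sample, F):
    # D_n^- = max_{1 <= m <= n} ( F(X_(m)) - (m - 1)/n ); under H_0 it has the same
    # distribution as D_n^+, so the same tail probabilities and decision rule apply.
    xs = sorted(sample)                      # variational series X_(1) <= ... <= X_(n)
    n = len(xs)
    return max(F(x) - (m - 1) / n for m, x in enumerate(xs, start=1))

print(d_minus([-0.3, 0.1, 0.5, 1.2, 2.0], F))
```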
References
[1] N.V. Smirnov, "Approximate distribution laws for random variables, constructed from empirical data", Uspekhi Mat. Nauk, 10 (1944) pp. 179–206 (In Russian)
[2] L.N. Bol'shev, "Asymptotically Pearson transformations", Theor. Probab. Appl., 8 (1963) pp. 121–146; Teor. Veroyatnost. i Primenen., 8 : 2 (1963) pp. 129–155
[3] L.N. Bol'shev, N.V. Smirnov, "Tables of mathematical statistics", Libr. math. tables, 46, Nauka (1983) (In Russian) (Processed by L.S. Bark and E.S. Kedrova)
[4] B.L. van der Waerden, "Mathematische Statistik", Springer (1957)
Comments
There is also a two-sample Kolmogorov–Smirnov test, cf. the editorial comments to Kolmogorov test and, for details, [a1], [a2].
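The two-sample statistic compares the two empirical distribution functions over the pooled sample. A minimal sketch follows; the function name and the tiny samples are illustrative, and critical values or p-values would again come from tables or asymptotics as described in [a1], [a2].

```python
import bisect

def two_sample_ks(xs, ys):
    # D_{m,n} = sup_x | F_m(x) - G_n(x) |, where F_m and G_n are the empirical
    # distribution functions of the two samples; since both are right-continuous
    # step functions, the supremum is attained at one of the pooled data points.
    xs, ys = sorted(xs), sorted(ys)
    m, n = len(xs), len(ys)
    d = 0.0
    for t in xs + ys:
        fm = bisect.bisect_right(xs, t) / m    # F_m(t) = #{x_i <= t} / m
        gn = bisect.bisect_right(ys, t) / n    # G_n(t) = #{y_j <= t} / n
        d = max(d, abs(fm - gn))
    return d

print(two_sample_ks([0.1, 0.4, 0.9], [0.2, 0.3, 0.5, 1.1]))
```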
References
[a1] G.E. Noether, "A brief survey of nonparametric statistics", in R.V. Hogg (ed.), Studies in statistics, Math. Assoc. Amer. (1978) pp. 39–65
[a2] M. Hollander, D.A. Wolfe, "Nonparametric statistical methods", Wiley (1973)
Kolmogorov-Smirnov test. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Kolmogorov-Smirnov_test&oldid=22659