Difference between revisions of "Hypergeometric distribution"
(Importing text file) |
Ulf Rehmann (talk | contribs) m (tex encoded by computer) |
Latest revision as of 22:11, 5 June 2020
The probability distribution defined by the formula
$$ \tag{*} p_m = \frac{\binom{M}{m} \binom{N-M}{n-m}}{\binom{N}{n}}, \quad m = 0, 1, \dots $$
where $ M $, $ N $ and $ n $ are non-negative integers with $ M \leq N $ and $ n \leq N $ (here $ \binom{a}{b} $ is the binomial coefficient, sometimes also denoted by $ C _ {a} ^ {b} $). The hypergeometric distribution is usually connected with sampling without replacement: formula (*) gives the probability of obtaining exactly $ m $ "marked" elements when $ n $ items are sampled at random from a population of $ N $ elements, of which $ M $ are "marked" and $ N - M $ are "unmarked". The probability (*) is defined only for
$$ \max ( 0, M + n - N) \leq m \leq \min ( n, M). $$
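Formula (*) and the range of $ m $ given above can be checked numerically. The following is a minimal sketch in Python, not part of the original article; the function name `hypergeom_pmf` and the values of $ N $, $ M $ and $ n $ are illustrative assumptions. It relies on `math.comb` returning 0 when the lower index exceeds the upper one, so values of $ m $ outside the stated range receive probability zero.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(m: int, N: int, M: int, n: int) -> Fraction:
    """Probability (*) of exactly m marked elements in a sample of size n."""
    if m < 0 or m > n:
        return Fraction(0)
    # math.comb(a, b) is 0 when b > a, so impossible values of m give 0.
    return Fraction(comb(M, m) * comb(N - M, n - m), comb(N, n))


# Illustrative values (an assumption of this sketch): 20 elements, 12 marked,
# sample of size 11, so the lower bound max(0, M + n - N) is 3, not 0.
N, M, n = 20, 12, 11
probs = [hypergeom_pmf(m, N, M, n) for m in range(n + 1)]

support = [m for m, pm in enumerate(probs) if pm > 0]
lower, upper = max(0, M + n - N), min(n, M)

print(support == list(range(lower, upper + 1)))  # support matches the bounds
print(sum(probs) == 1)                           # exact, thanks to Fraction
```

Exact rational arithmetic is used here so that both comparisons are equalities and do not depend on a floating-point tolerance.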
However, the definition (*) may be used for all $ m \geq 0 $, because one may assume that $ \binom{a}{b} = 0 $ if $ b > a $, so that the equality $ p _ {m} = 0 $ may be understood as the impossibility of obtaining $ m $ "marked" elements in the sample. The sum of the values $ p _ {m} $ over the entire sample space is one. If one puts $ M/N = p $, then (*) may be written as follows:
$$ p_m = \binom{n}{m} \frac{A_{Np}^{m} A_{Nq}^{n-m}}{A_{N}^{n}}, $$
where
$$ A_{a}^{b} = \binom{a}{b} b! \quad \textrm{and} \quad p + q = 1. $$
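The rewritten form can be compared with (*) directly. The sketch below is an illustration under stated assumptions, not part of the original article: the function names and parameter values are hypothetical, and $ A _ {a} ^ {b} $ is computed with `math.perm(a, b)`, which equals $ a!/(a-b)! $. Because each ratio of arrangements is close to a power of $ p $ or $ q $ when $ N $ is large, the second half of the sketch also prints how far the probabilities are from $ \binom{n}{m} p ^ {m} q ^ {n-m} $ as $ N $ grows with $ p = 3/10 $ held fixed.

```python
from fractions import Fraction
from math import comb, perm


def pmf_binomial_form(m, N, M, n):
    """Formula (*), written with binomial coefficients."""
    return Fraction(comb(M, m) * comb(N - M, n - m), comb(N, n))


def pmf_arrangement_form(m, N, M, n):
    """Rewritten form with A_a^b = perm(a, b), using Np = M and Nq = N - M."""
    return Fraction(comb(n, m) * perm(M, m) * perm(N - M, n - m), perm(N, n))


# Illustrative values (an assumption of this sketch).
N, M, n = 15, 6, 5
forms_agree = all(
    pmf_binomial_form(m, N, M, n) == pmf_arrangement_form(m, N, M, n)
    for m in range(n + 1)
)
print(forms_agree)

# Keep p = M/N = 3/10 fixed and let N grow.
p = Fraction(3, 10)
gaps = []
for N in (50, 500, 5000, 50000):
    M = 3 * N // 10
    gap = max(
        abs(pmf_arrangement_form(m, N, M, n) - comb(n, m) * p**m * (1 - p) ** (n - m))
        for m in range(n + 1)
    )
    gaps.append(float(gap))
    print(N, float(gap))  # largest pointwise gap for this N
```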
If $ p $ is constant and $ N \rightarrow \infty $, the binomial approximation
$$ p_m \sim \binom{n}{m} p^{m} q^{n-m} $$
results. The expectation of the hypergeometric distribution is independent of $ N $ and coincides with the expectation $ np $ of the corresponding binomial distribution. The variance of the hypergeometric distribution,
$$ \sigma ^ {2} = npq \frac{N - n }{N - 1 } , $$
is smaller than that of the binomial law, $ \sigma ^ {2} = npq $, whenever $ n > 1 $. If $ N \rightarrow \infty $, the moments of the hypergeometric distribution of any order tend to the corresponding moments of the binomial distribution. The generating function of the hypergeometric distribution has the form
$$ P(x) = \frac{A_{N-M}^{n}}{A_{N}^{n}} \sum_{m=0}^{n} \frac{A_{M}^{m} A_{n}^{m}}{A_{N-M-n+m}^{m}} \frac{x^{m}}{m!}. $$
The series on the right-hand side of this equation represents the hypergeometric function $ F ( \alpha , \beta ; \gamma ; x ) $, where $ \alpha = - n $, $ \beta = - M $ and $ \gamma = N - M - n + 1 $ (hence the name of the distribution). The probability (*) and the corresponding distribution function have been tabulated for a wide range of values.
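The generating function and its link to $ F ( \alpha , \beta ; \gamma ; x ) $ can be checked term by term. The following sketch is an illustration, not part of the original article; the helper `pochhammer` and the parameter values are assumptions, and the values are chosen with $ n \leq N - M $ so that the prefactor $ A _ {N-M} ^ {n} / A _ {N} ^ {n} $ is non-zero. It compares the series coefficients with the terms of the hypergeometric series and with (*), and then recovers the expectation $ np $ and the variance $ npq ( N - n ) / ( N - 1 ) $ from the derivatives of $ P $ at $ x = 1 $.

```python
from fractions import Fraction
from math import comb, factorial, perm

# Illustrative values (an assumption of this sketch), with n <= N - M.
N, M, n = 25, 9, 6
p = Fraction(M, N)
q = 1 - p


def pochhammer(a, m):
    """Rising factorial a (a + 1) ... (a + m - 1)."""
    result = 1
    for k in range(m):
        result *= a + k
    return result


prefactor = Fraction(perm(N - M, n), perm(N, n))

# Coefficient of x**m in the series, with A_a^b computed as perm(a, b).
series = [
    Fraction(perm(M, m) * perm(n, m), perm(N - M - n + m, m) * factorial(m))
    for m in range(n + 1)
]
# The same coefficients written as terms of F(-n, -M; N - M - n + 1; x).
hyper = [
    Fraction(
        pochhammer(-n, m) * pochhammer(-M, m),
        pochhammer(N - M - n + 1, m) * factorial(m),
    )
    for m in range(n + 1)
]
pmf = [Fraction(comb(M, m) * comb(N - M, n - m), comb(N, n)) for m in range(n + 1)]

coeffs = [prefactor * c for c in series]
d1 = sum(m * c for m, c in enumerate(coeffs))            # P'(1)
d2 = sum(m * (m - 1) * c for m, c in enumerate(coeffs))  # P''(1)
mean = d1
variance = d2 + d1 - d1**2

print(series == hyper, coeffs == pmf, sum(coeffs) == 1)
print(mean == n * p, variance == n * p * q * Fraction(N - n, N - 1))
```

The identities used for the moments are the standard ones for a probability generating function: the expectation is $ P ^ \prime ( 1 ) $ and the variance is $ P ^ {\prime\prime} ( 1 ) + P ^ \prime ( 1 ) - P ^ \prime ( 1 ) ^ {2} $.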
References
[1] G.I. Lieberman, D.B. Owen, "Tables of the hypergeometric probability distribution", Stanford Univ. Press (1961)
[2] D.B. Owen, "Handbook of statistical tables", Addison-Wesley (1962)
[3] L.N. Bol'shev, N.V. Smirnov, "Tables of mathematical statistics", Libr. math. tables, 46, Nauka (1983) (In Russian) (Processed by L.S. Bark and E.S. Kedrova)
[4] W. Feller, "An introduction to probability theory and its applications", 1, Wiley (1970)
Hypergeometric distribution. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Hypergeometric_distribution&oldid=17430