
Hypergeometric distribution

From Encyclopedia of Mathematics

Latest revision as of 22:11, 5 June 2020


The probability distribution defined by the formula

$$ \tag{* } p _ {m} = \ \frac{\left ( \begin{array}{c} M \\ m \end{array} \right ) \left ( \begin{array}{c} N- M \\ n- m \end{array} \right ) }{\left ( \begin{array}{c} N \\ n \end{array} \right ) } ,\ \ m = 0, 1, \dots $$

where $ M $, $ N $ and $ n $ are non-negative integers with $ M \leq N $ and $ n \leq N $ (here $ \left ( \begin{array}{c} a \\ b \end{array} \right ) $ is the binomial coefficient, sometimes also denoted by $ C _ {a} ^ {b} $). The hypergeometric distribution is usually connected with sampling without replacement: formula (*) gives the probability of obtaining exactly $ m $ "marked" elements when $ n $ items are sampled at random from a population of $ N $ elements, of which $ M $ are "marked" and $ N - M $ are "unmarked". The probability (*) is defined only for

$$ \max ( 0, M + n - N) \leq m \leq \min ( n, M). $$
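As an illustration, formula (*) and the range of admissible $ m $ can be sketched in Python as follows. The function names `hypergeom_pmf` and `support` are hypothetical, chosen only for this example. Exact `Fraction` arithmetic is assumed so that probabilities can be compared without rounding. Python's `math.comb(a, b)` already returns 0 when $ b > a $, but it rejects negative arguments, so the case $ m > n $ is guarded explicitly.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(m: int, N: int, M: int, n: int) -> Fraction:
    """Probability (*) of exactly m marked items in a sample of size n drawn
    without replacement from N items, of which M are marked."""
    if not (0 <= M <= N and 0 <= n <= N):
        raise ValueError("need 0 <= M <= N and 0 <= n <= N")
    if m < 0 or m > n:
        # math.comb raises ValueError for negative arguments such as n - m < 0
        return Fraction(0)
    # comb(M, m) is 0 when m > M, so impossible outcomes get probability 0
    return Fraction(comb(M, m) * comb(N - M, n - m), comb(N, n))


def support(N: int, M: int, n: int) -> range:
    """Values of m with max(0, M + n - N) <= m <= min(n, M)."""
    return range(max(0, M + n - N), min(n, M) + 1)


print(list(support(10, 7, 5)))    # [2, 3, 4, 5]: at least 2 of the 5 are marked
print(hypergeom_pmf(2, 10, 4, 5))  # 10/21
```

In the second call, $ 6 \cdot 20 / 252 = 10/21 $, because $ \left ( \begin{array}{c} 4 \\ 2 \end{array} \right ) = 6 $, $ \left ( \begin{array}{c} 6 \\ 3 \end{array} \right ) = 20 $ and $ \left ( \begin{array}{c} 10 \\ 5 \end{array} \right ) = 252 $.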

However, the definition (*) may be used for all $ m \geq 0 $, because one may assume that $ \left ( \begin{array}{c} a \\ b \end{array} \right ) = 0 $ if $ b > a $; the equality $ p _ {m} = 0 $ then expresses the impossibility of obtaining $ m $ "marked" elements in the sample. The sum of the values $ p _ {m} $ over the entire sample space is one. If one puts $ M/N = p $, then (*) may be written as follows:

$$ p _ {m} = \ \left ( \begin{array}{c} n \\ m \end{array} \right ) \frac{A _ {Np} ^ {m} A _ {Nq} ^ {n - m } }{A _ {N} ^ {n} } , $$

where

$$ A _ {a} ^ {b} = \left ( \begin{array}{c} a \\ b \end{array} \right ) b! \ \ \textrm{ and } \ \ p + q = 1. $$
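The rewritten form can be checked against (*) numerically. This sketch assumes Python 3.8 or later, where `math.perm(a, b)` returns $ a(a-1)\cdots(a-b+1) = A _ {a} ^ {b} $ and returns 0 when $ b > a $. The helper names `falling`, `pmf_via_p` and `pmf_direct` are hypothetical. Because $ Np = M $ and $ Nq = N - M $, every argument remains an integer.

```python
from fractions import Fraction
from math import comb, perm


def falling(a: int, b: int) -> int:
    """A_a^b = C(a, b) * b! = a (a - 1) ... (a - b + 1); zero when b > a."""
    return perm(a, b)


def pmf_via_p(m: int, N: int, M: int, n: int) -> Fraction:
    """p_m = C(n, m) A_{Np}^m A_{Nq}^{n-m} / A_N^n, with Np = M and Nq = N - M."""
    if m < 0 or m > n:
        return Fraction(0)
    return comb(n, m) * Fraction(falling(M, m) * falling(N - M, n - m), falling(N, n))


def pmf_direct(m: int, N: int, M: int, n: int) -> Fraction:
    """Formula (*) with binomial coefficients, used as the reference."""
    if m < 0 or m > n:
        return Fraction(0)
    return Fraction(comb(M, m) * comb(N - M, n - m), comb(N, n))


# Exhaustive comparison over all small parameter sets with M <= N and n <= N.
for N in range(9):
    for M in range(N + 1):
        for n in range(N + 1):
            for m in range(n + 1):
                assert pmf_via_p(m, N, M, n) == pmf_direct(m, N, M, n)
```

The two forms agree because the factors $ m! $ and $ (n - m)! $ introduced by $ A $ cancel against $ \left ( \begin{array}{c} n \\ m \end{array} \right ) $ and the $ n! $ in $ A _ {N} ^ {n} $.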

If $ p $ is constant and $ N \rightarrow \infty $, the binomial approximation

$$ p _ {m} \sim \left ( \begin{array}{c} n \\ m \end{array} \right ) p ^ {m} q ^ {n - m } $$

results. For fixed $ p $ the expectation of the hypergeometric distribution, $ nM/N $, is independent of $ N $ and coincides with the expectation $ np $ of the corresponding binomial distribution. The variance of the hypergeometric distribution,

$$ \sigma ^ {2} = npq \frac{N - n }{N - 1 } , $$

is, for $ n > 1 $, smaller than the variance $ \sigma ^ {2} = npq $ of the binomial law; the ratio of the two is $ (N - n)/(N - 1) $. If $ N \rightarrow \infty $, the moments of the hypergeometric distribution of any order tend to the corresponding moments of the binomial distribution. The generating function of the hypergeometric distribution has the form

$$ P ( x) = \ \frac{A _ {N - M } ^ {n} }{A _ {N} ^ {n} } \sum _ {m = 0 } ^ { n } \frac{A _ {M} ^ {m} A _ {n} ^ {m} }{A _ {N - M - n + m } ^ {m} } \frac{x ^ {m} }{m!} . $$

The series on the right-hand side of this equation represents the hypergeometric function $ F ( \alpha , \beta ; \gamma ; x ) $, where $ \alpha = - n $, $ \beta = - M $ and $ \gamma = N - M - n + 1 $ (hence the name of the distribution). The probability (*) and the corresponding distribution function have been tabulated for a wide range of values.
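Several of the statements above can be checked with exact rational arithmetic: the expectation and variance formulas, the approach to the binomial law for fixed $ p $, and the hypergeometric form of $ P ( x) $. The sketch below is illustrative only. The helper names and the example parameters are hypothetical. It uses the standard series $ F ( \alpha , \beta ; \gamma ; x ) = \sum _ {m} ( \alpha ) _ {m} ( \beta ) _ {m} x ^ {m} / ( ( \gamma ) _ {m} m! ) $ with the rising factorial $ ( a ) _ {m} = a ( a + 1) \cdots ( a + m - 1) $. The function `gf_coefficients` assumes $ N - M - n \geq 0 $, so that $ \gamma $ is a positive integer and no denominator vanishes.

```python
from fractions import Fraction
from math import comb, factorial


def pmf(m: int, N: int, M: int, n: int) -> Fraction:
    """Formula (*) as an exact fraction; zero outside 0 <= m <= n."""
    if m < 0 or m > n:
        return Fraction(0)
    return Fraction(comb(M, m) * comb(N - M, n - m), comb(N, n))


def moments(N: int, M: int, n: int) -> tuple[Fraction, Fraction]:
    """Mean and variance computed directly from the probabilities p_m."""
    mean = sum(m * pmf(m, N, M, n) for m in range(n + 1))
    second = sum(m * m * pmf(m, N, M, n) for m in range(n + 1))
    return mean, second - mean**2


def binomial_gap(N: int, M: int, n: int) -> Fraction:
    """Largest |p_m - C(n, m) p^m q^(n-m)| over m, with p = M/N kept exact."""
    p = Fraction(M, N)
    return max(abs(pmf(m, N, M, n) - comb(n, m) * p**m * (1 - p) ** (n - m))
               for m in range(n + 1))


def rising(a: int, k: int) -> int:
    """Rising factorial (a)_k = a (a + 1) ... (a + k - 1)."""
    out = 1
    for i in range(k):
        out *= a + i
    return out


def gf_coefficients(N: int, M: int, n: int) -> list[Fraction]:
    """Coefficients of x^m in P(x) = p_0 * F(-n, -M; N - M - n + 1; x).

    Assumes N - M - n >= 0, so that no (gamma)_m in a denominator is zero.
    """
    alpha, beta, gamma = -n, -M, N - M - n + 1
    p0 = Fraction(comb(N - M, n), comb(N, n))  # equals A_{N-M}^n / A_N^n
    return [p0 * Fraction(rising(alpha, m) * rising(beta, m),
                          rising(gamma, m) * factorial(m))
            for m in range(n + 1)]


N, M, n = 20, 7, 5
mean, var = moments(N, M, n)
p, q = Fraction(M, N), Fraction(N - M, N)
assert mean == n * p                                 # expectation np
assert var == n * p * q * Fraction(N - n, N - 1)     # npq (N - n)/(N - 1)
assert var < n * p * q                               # smaller than binomial, n > 1
assert gf_coefficients(N, M, n) == [pmf(m, N, M, n) for m in range(n + 1)]

# With p = 3/10 fixed, the distance to the binomial law shrinks as N grows.
for size in (10, 100, 1000):
    print(size, float(binomial_gap(size, 3 * size // 10, 5)))
```

Computing the moments from the probabilities is equivalent to evaluating $ P ^ \prime (1) $ and $ P ^ {\prime\prime} (1) $. The direct sums are used here only because they are shorter.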

References

[1] G.I. Lieberman, D.B. Owen, "Tables of the hypergeometric probability distribution", Stanford Univ. Press (1961)
[2] D.B. Owen, "Handbook of statistical tables", Addison-Wesley (1962)
[3] L.N. Bol'shev, N.V. Smirnov, "Tables of mathematical statistics", Libr. math. tables, 46, Nauka (1983) (In Russian) (Processed by L.S. Bark and E.S. Kedrova)
[4] W. Feller, "An introduction to probability theory and its applications", 1, Wiley (1970)
How to Cite This Entry:
Hypergeometric distribution. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Hypergeometric_distribution&oldid=25955
This article was adapted from an original article by A.V. Prokhorov (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.