Empirical distribution
sample distribution
A probability distribution determined from a random sample and used to estimate the true underlying distribution. Suppose that $ X_{1},\ldots,X_{n} $ are independent and identically-distributed random variables with distribution function $ F $, and let $ X_{(1)} \leq \ldots \leq X_{(n)} $ be the corresponding order statistics. The empirical distribution corresponding to $ (X_{1},\ldots,X_{n}) $ is the discrete distribution that assigns probability $ \dfrac{1}{n} $ to each value $ X_{k} $. The empirical distribution function $ F_{n} $ is the step-function whose jumps are multiples of $ \dfrac{1}{n} $ at the points $ X_{(1)},\ldots,X_{(n)} $:
$$
{F_{n}}(x) =
\begin{cases}
0, & \text{if} ~ x \leq X_{(1)}; \\
\dfrac{k}{n}, & \text{if} ~ X_{(k)} < x \leq X_{(k + 1)} ~ \text{and} ~ 1 \leq k \leq n - 1; \\
1, & \text{if} ~ x > X_{(n)}.
\end{cases}
$$
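As a concrete illustration (not part of the original article), the step-function above can be computed directly from the order statistics. The following Python sketch uses the article's left-continuous convention, counting sample points strictly below $ x $; the function name `ecdf` is illustrative.

```python
import numpy as np

def ecdf(sample):
    """Return the empirical distribution function F_n of a one-dimensional sample.

    Uses the convention F_n(x) = #{X_i < x} / n, so F_n is the step-function
    over the order statistics X_(1) <= ... <= X_(n) given in the article.
    """
    xs = np.sort(np.asarray(sample, dtype=float))  # order statistics
    n = xs.size

    def F_n(x):
        # side="left" counts the sample points strictly below x
        return np.searchsorted(xs, x, side="left") / n

    return F_n

# Each observation carries probability 1/n:
F5 = ecdf([2.0, 0.5, 1.0, 3.0, 1.5])
print(F5(0.5), F5(1.0), F5(10.0))  # 0.0 0.2 1.0
```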
For fixed values of $ X_{1},\ldots,X_{n} $, the function $ F_{n} $ has all the properties of an ordinary distribution function. For every fixed $ x \in \mathbf{R} $, however, $ {F_{n}}(x) $ is a random variable as a function of $ X_{1},\ldots,X_{n} $; hence, the empirical distribution corresponding to a random sample $ (X_{1},\ldots,X_{n}) $ is given by the family $ ({F_{n}}(x))_{x \in \mathbf{R}} $ of random variables. For a fixed $ x \in \mathbf{R} $, the count $ n {F_{n}}(x) $ of sample points below $ x $ has a binomial distribution with parameters $ n $ and $ F(x) $, so
$$
\mathsf{E} {F_{n}}(x) = F(x), \qquad
\mathsf{D} {F_{n}}(x) = \frac{1}{n} F(x) [1 - F(x)]
$$
and
$$
\mathsf{P} \! \left\{ {F_{n}}(x) = \frac{k}{n} \right\} = \binom{n}{k} [F(x)]^{k} [1 - F(x)]^{n - k}.
$$
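These identities are easy to check by simulation. Here is a minimal Monte Carlo sketch (the uniform distribution, sample size and replication count are arbitrary choices, not from the article), for which $ F(x) = x $ on $ [0, 1] $:

```python
import numpy as np

rng = np.random.default_rng(0)
n, x, reps = 50, 0.3, 20_000   # F = uniform on [0, 1], so F(x) = x = 0.3

# n * F_n(x) = #{X_i < x} is Binomial(n, F(x)); simulate F_n(x) many times
Fn_x = (rng.random((reps, n)) < x).sum(axis=1) / n

print(Fn_x.mean())   # ≈ E F_n(x) = F(x) = 0.3
print(Fn_x.var())    # ≈ D F_n(x) = F(x)(1 - F(x))/n = 0.0042
```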
In accordance with the law of large numbers, $ {F_{n}}(x) \to F(x) $ with probability $ 1 $ as $ n \to \infty $, for each $ x \in \mathbf{R} $; together with the identity $ \mathsf{E} {F_{n}}(x) = F(x) $ above, this means that $ {F_{n}}(x) $ is an unbiased and consistent estimator of the distribution function $ F(x) $. In fact, the empirical distribution function converges to $ F(x) $ uniformly in $ x $ with probability $ 1 $ as $ n \to \infty $: if
$$
D_{n} \stackrel{\text{df}}{=} \sup_{x \in \mathbf{R}} |{F_{n}}(x) - F(x)|,
$$
then the Glivenko–Cantelli theorem states that
$$
\mathsf{P} \! \left\{ \lim_{n \to \infty} D_{n} = 0 \right\} = 1.
$$
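Because $ F_{n} $ only jumps at the order statistics, for continuous $ F $ the supremum defining $ D_{n} $ is attained there and reduces to a finite maximum. A sketch (function name illustrative):

```python
import numpy as np

def kolmogorov_statistic(sample, F):
    """D_n = sup_x |F_n(x) - F(x)| for a continuous distribution function F.

    The supremum reduces to
    max_k max(k/n - F(X_(k)), F(X_(k)) - (k - 1)/n).
    """
    xs = np.sort(np.asarray(sample, dtype=float))
    n = xs.size
    u = F(xs)                      # F at the order statistics
    k = np.arange(1, n + 1)
    return max(np.max(k / n - u), np.max(u - (k - 1) / n))

# Glivenko–Cantelli in action: D_n shrinks as n grows (uniform sample, F(x) = x)
rng = np.random.default_rng(1)
for n in (10, 100, 10_000):
    print(n, kolmogorov_statistic(rng.random(n), lambda x: x))
```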
The quantity $ D_{n} $ is a measure of the proximity of $ {F_{n}}(x) $ to $ F(x) $. A.N. Kolmogorov found its limit distribution in 1933: for a continuous distribution function $ F $,
$$
\forall z \in \mathbf{R}_{> 0}: \qquad
\lim_{n \to \infty} \mathsf{P} \{ \sqrt{n} D_{n} < z \}
= K(z)
= \sum_{k = - \infty}^{\infty} (- 1)^{k} e^{- 2 k^{2} z^{2}}.
$$
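The series for $ K(z) $ converges very rapidly, so it is straightforward to evaluate numerically. A minimal sketch with a symmetric truncation (the function name and truncation length are arbitrary choices):

```python
import math

def kolmogorov_K(z, terms=100):
    """K(z) = sum over k of (-1)^k exp(-2 k^2 z^2), truncated at |k| <= terms."""
    return sum((-1) ** k * math.exp(-2.0 * k * k * z * z)
               for k in range(-terms, terms + 1))

print(kolmogorov_K(1.36))   # ≈ 0.95: sqrt(n) D_n rarely exceeds 1.36 for large n
```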
If $ F $ is not known, then to verify the hypothesis that it is a given continuous function $ F_{0} $, one uses tests based on statistics of type $ D_{n} $ (see Kolmogorov test; Kolmogorov–Smirnov test; Non-parametric methods in statistics).
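In practice such tests are available in standard libraries. For instance, assuming SciPy is installed, `scipy.stats.kstest` computes $ D_{n} $ against a hypothesized continuous $ F_{0} $ together with the corresponding p-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.normal(size=200)

# H_0: the sample comes from F_0 = standard normal distribution
result = stats.kstest(sample, "norm")
print(result.statistic, result.pvalue)   # large p-value: no evidence against H_0
```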
Moments and any other characteristics of an empirical distribution are called sample or empirical; for example, $ \displaystyle \bar{X} = \sum_{k = 1}^{n} \frac{X_{k}}{n} $ is the sample mean, $ \displaystyle s^{2} = \sum_{k = 1}^{n} \frac{\left( X_{k} - \bar{X} \right)^{2}}{n} $ is the sample variance, and $ \displaystyle \widehat{\alpha}_{r} = \sum_{k = 1}^{n} \frac{X_{k}^{r}}{n} $ is the sample moment of order $ r $.
Sample characteristics serve as statistical estimators of the corresponding characteristics of the original distribution.
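Equivalently, the sample characteristics are just the moments of the empirical distribution itself. A minimal sketch with arbitrary data:

```python
import numpy as np

X = np.array([1.2, 0.7, 2.5, 1.9, 0.4])

x_bar = X.mean()                 # sample mean
s2 = np.mean((X - x_bar) ** 2)   # sample variance (divisor n, as defined above)
alpha_3 = np.mean(X ** 3)        # sample moment of order r = 3

print(x_bar, s2, alpha_3)
```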
References
[1] L.N. Bol’shev, N.V. Smirnov, "Tables of mathematical statistics", Libr. math. tables, 46, Nauka (1983). (In Russian) (Processed by L.S. Bark and E.S. Kedrova)
[2] B.L. van der Waerden, "Mathematische Statistik", Springer (1957).
[3] A.A. Borovkov, "Mathematical statistics", Moscow (1984). (In Russian)
Comments
The use of the empirical distribution in statistics and the associated theory has been greatly developed in recent years. This has been surveyed in [a2]. For developments in strong convergence theory associated with the empirical distribution, see [a1].
References
[a1] M. Csörgö, P. Révész, "Strong approximation in probability and statistics", Acad. Press (1981).
[a2] G.R. Shorack, J.A. Wellner, "Empirical processes with applications to statistics", Wiley (1986).
[a3] M. Loève, "Probability theory", Princeton Univ. Press (1963), Sect. 16.3.
[a4] P. Gaenssler, W. Stute, "Empirical processes: a survey of results for independent and identically distributed random variables", Ann. Prob., 7 (1977), pp. 193–243.
Empirical distribution. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Empirical_distribution&oldid=41639