Empirical distribution

sample distribution

The probability distribution determined by a random sample, used to estimate the true (unknown) distribution. Suppose that $ X_{1},\ldots,X_{n} $ are independent and identically distributed random variables with distribution function $ F $, and let $ X_{(1)} \leq \ldots \leq X_{(n)} $ be the corresponding order statistics. The empirical distribution corresponding to $ (X_{1},\ldots,X_{n}) $ is the discrete distribution that assigns the probability $ \dfrac{1}{n} $ to each value $ X_{k} $. The empirical distribution function $ F_{n} $ is the step function that jumps at the points $ X_{(1)},\ldots,X_{(n)} $. Each jump is a multiple of $ \dfrac{1}{n} $, and a jump exceeds $ \dfrac{1}{n} $ only where sample values coincide: $$ {F_{n}}(x) = \begin{cases} 0, & \text{if} ~ x \leq X_{(1)}; \\ \dfrac{k}{n}, & \text{if} ~ X_{(k)} < x \leq X_{(k + 1)} ~ \text{and} ~ 1 \leq k \leq n - 1; \\ 1, & \text{if} ~ x > X_{(n)}. \end{cases} $$
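The definition can be evaluated directly. By the case distinction above, $ {F_{n}}(x) = k/n $ exactly when $ k $ sample values are strictly less than $ x $. So $ {F_{n}}(x) $ is the proportion of sample values below $ x $, and a binary search on the order statistics counts them. The following is a minimal Python sketch of this; the name `ecdf` is hypothetical and not part of the source.

```python
from bisect import bisect_left


def ecdf(sample):
    """Return F_n for the given sample, following the convention above:
    F_n(x) = (number of X_i strictly less than x) / n."""
    xs = sorted(sample)  # order statistics X_(1) <= ... <= X_(n)
    n = len(xs)

    def F_n(x):
        # bisect_left returns how many order statistics are strictly below x,
        # so F_n(x) = 0 for x <= X_(1) and F_n(x) = 1 for x > X_(n).
        return bisect_left(xs, x) / n

    return F_n


F3 = ecdf([3.0, 1.0, 2.0])
print(F3(1.0), F3(1.5), F3(2.0), F3(3.5))
```

For the sample $ (3, 1, 2) $ the sketch gives $ {F_{n}}(2) = 1/3 $, not $ 2/3 $. With the convention used here, $ F_{n} $ is continuous from the left, so the jump at $ X_{(2)} = 2 $ takes effect only for $ x > 2 $.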

For fixed values of $ X_{1},\ldots,X_{n} $, the function $ F_{n} $ has all the properties of an ordinary distribution function. For every fixed $ x \in \mathbf{R} $, $ {F_{n}}(x) $ is a random variable when regarded as a function of $ X_{1},\ldots,X_{n} $. Hence, the empirical distribution corresponding to a random sample $ (X_{1},\ldots,X_{n}) $ is given by the family $ ({F_{n}}(x))_{x \in \mathbf{R}} $ of random variables. For each fixed $ x \in \mathbf{R} $, $$ \mathsf{E} {F_{n}}(x) = F(x), \qquad \mathsf{D} {F_{n}}(x) = \frac{1}{n} F(x) [1 - F(x)], $$ where $ \mathsf{D} $ denotes the variance, and $$ \mathsf{P} \! \left\{ {F_{n}}(x) = \frac{k}{n} \right\} = \binom{n}{k} [F(x)]^{k} [1 - F(x)]^{n - k}, \qquad k = 0,1,\ldots,n. $$
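All three formulas follow from a single observation. The sketch below assumes that $ F(x) = \mathsf{P} \{ X_{1} < x \} $, which is implicit in the step convention used above; for continuous $ F $ the choice between $ < $ and $ \leq $ does not matter. Writing $ \mathbf{1}\{\cdot\} $ for an indicator, $$ n {F_{n}}(x) = \sum_{i = 1}^{n} \mathbf{1}\{X_{i} < x\}, $$ which is a sum of $ n $ independent Bernoulli variables with success probability $ F(x) $. Hence $ n {F_{n}}(x) $ has the binomial distribution with parameters $ n $ and $ F(x) $, which is the stated formula for $ \mathsf{P} \{ {F_{n}}(x) = k/n \} $. It also gives $$ \mathsf{E} {F_{n}}(x) = \frac{1}{n} \cdot n F(x) = F(x), \qquad \mathsf{D} {F_{n}}(x) = \frac{1}{n^{2}} \cdot n F(x) [1 - F(x)] = \frac{1}{n} F(x) [1 - F(x)]. $$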

By the strong law of large numbers, $ {F_{n}}(x) \to F(x) $ with probability $ 1 $ as $ n \to \infty $, for each $ x \in \mathbf{R} $. Together with $ \mathsf{E} {F_{n}}(x) = F(x) $, this means that $ {F_{n}}(x) $ is an unbiased and consistent estimator of the distribution function $ F(x) $. Moreover, the convergence is uniform in $ x $ with probability $ 1 $. If $$ D_{n} \stackrel{\text{df}}{=} \sup_{x \in \mathbf{R}} |{F_{n}}(x) - F(x)|, $$ then the Glivenko–Cantelli theorem states that $$ \mathsf{P} \! \left\{ \lim_{n \to \infty} D_{n} = 0 \right\} = 1. $$
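When $ F $ is continuous, the supremum $ D_{n} $ can be computed exactly from the order statistics. On each interval between consecutive order statistics $ F_{n} $ is constant, so $ |{F_{n}}(x) - F(x)| $ is largest near the endpoints. This gives $ D_{n} = \max_{1 \leq k \leq n} \max \{ k/n - F(X_{(k)}), \ F(X_{(k)}) - (k-1)/n \} $.

The Python sketch below uses this formula and prints $ D_{n} $ for simulated samples of increasing size, which illustrates the Glivenko–Cantelli theorem. The function names are hypothetical, and taking $ F $ to be the uniform distribution on $ [0, 1] $ is an arbitrary choice. The printed values are random, but they should typically decrease roughly like $ 1/\sqrt{n} $.

```python
import random


def ks_distance(sample, cdf):
    """D_n = sup_x |F_n(x) - F(x)| for a continuous distribution function cdf.

    F_n is constant between order statistics and F is continuous and
    non-decreasing, so the supremum is approached at the order statistics:
    just before X_(k), F_n equals (k-1)/n; just after it, F_n equals k/n.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs, start=1):
        fx = cdf(x)
        d = max(d, k / n - fx, fx - (k - 1) / n)
    return d


def uniform_cdf(x):
    # Distribution function of the uniform distribution on [0, 1].
    return min(max(x, 0.0), 1.0)


# D_n for uniform samples of growing size (Glivenko-Cantelli).
rng = random.Random(0)  # fixed seed so the run is reproducible
for n in (10, 100, 1000, 10000):
    sample = [rng.random() for _ in range(n)]
    print(n, round(ks_distance(sample, uniform_cdf), 4))
```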

The quantity $ D_{n} $ measures how close $ F_{n} $ is to $ F $. A.N. Kolmogorov found its limit distribution in 1933. If the distribution function $ F $ is continuous, then $$ \forall z \in \mathbf{R}_{> 0}: \qquad \lim_{n \to \infty} \mathsf{P} \{ \sqrt{n} D_{n} < z \} = K(z) = \sum_{k = - \infty}^{\infty} (- 1)^{k} e^{- 2 k^{2} z^{2}}. $$
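The terms for $ k $ and $ -k $ in the series for $ K(z) $ coincide, so $ K(z) = 1 + 2 \sum_{k \geq 1} (-1)^{k} e^{-2k^{2}z^{2}} $. Numerically, this series converges very quickly for moderate and large $ z $ but slowly for small $ z $.

The Python sketch below therefore switches, for $ z < 1 $, to the equivalent theta-function representation $ K(z) = \frac{\sqrt{2\pi}}{z} \sum_{k \geq 1} e^{-(2k-1)^{2}\pi^{2}/(8z^{2})} $. This is a standard identity that is not stated in the text above. The function name, the switch point $ z = 1 $ and the number of terms are illustrative choices.

```python
import math


def kolmogorov_cdf(z, terms=100):
    """Kolmogorov's limit distribution K(z); returns 0.0 for z <= 0."""
    if z <= 0:
        return 0.0
    if z >= 1:
        # Series from the text folded onto k >= 1 (the k and -k terms are equal):
        # K(z) = 1 + 2 * sum_{k>=1} (-1)^k exp(-2 k^2 z^2); converges very fast here.
        s = sum((-1) ** k * math.exp(-2 * k * k * z * z) for k in range(1, terms + 1))
        return 1.0 + 2.0 * s
    # For small z the series above converges slowly, so use the equivalent
    # theta-function form K(z) = sqrt(2 pi)/z * sum_{k>=1} exp(-(2k-1)^2 pi^2 / (8 z^2)).
    c = math.pi ** 2 / (8.0 * z * z)
    s = sum(math.exp(-((2 * k - 1) ** 2) * c) for k in range(1, terms + 1))
    return math.sqrt(2.0 * math.pi) / z * s


# The asymptotic 5% critical value of sqrt(n) * D_n is about 1.358.
print(kolmogorov_cdf(1.3581))  # approximately 0.95
```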

If $ F $ is unknown, then to test the hypothesis that it coincides with a given continuous distribution function $ F_{0} $, one uses tests based on statistics of the type $ D_{n} $ (see Kolmogorov test; Kolmogorov–Smirnov test; Non-parametric methods in statistics).
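The following sketch shows how $ D_{n} $ and $ K $ combine into a test of $ F = F_{0} $. It is written in Python, and the function name, the choice of $ F_{0} $ as the uniform distribution on $ [0, 1] $ and the truncation of the series are assumptions made for illustration.

The hypothesis is rejected at asymptotic level $ \alpha $ when $ \sqrt{n} D_{n} $ exceeds the root of $ K(z) = 1 - \alpha $. Equivalently, it is rejected when the approximate p-value $ 1 - K(\sqrt{n} D_{n}) $ is below $ \alpha $. Because $ K $ is only the limit distribution, this p-value is approximate for finite $ n $. Library routines such as SciPy's `scipy.stats.kstest` may also offer exact small-sample computations.

```python
import math


def kolmogorov_test(sample, cdf, terms=100):
    """Asymptotic Kolmogorov test of H0: F = cdf, for a continuous cdf.

    Returns (sqrt(n) * D_n, p), where p = 1 - K(sqrt(n) * D_n) is only an
    approximation for finite n, since K is a limit distribution.
    The sample must be non-empty.
    """
    xs = sorted(sample)
    n = len(xs)
    # D_n = sup |F_n - F|; for continuous F it is approached at the order statistics.
    d = max(max(k / n - cdf(x), cdf(x) - (k - 1) / n) for k, x in enumerate(xs, start=1))
    z = math.sqrt(n) * d  # d >= 1/(2n) > 0, so z > 0
    if z < 1:
        # Theta-function form of K(z), accurate for small z.
        c = math.pi ** 2 / (8.0 * z * z)
        k_z = math.sqrt(2.0 * math.pi) / z * sum(
            math.exp(-((2 * k - 1) ** 2) * c) for k in range(1, terms + 1))
        p = 1.0 - k_z
    else:
        # 1 - K(z) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 z^2), summed directly
        # so that tiny p-values are not lost to cancellation in 1 - K(z).
        p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * z * z)
                      for k in range(1, terms + 1))
    return z, min(max(p, 0.0), 1.0)


def uniform_cdf(x):
    # Hypothesised F_0: the uniform distribution on [0, 1].
    return min(max(x, 0.0), 1.0)


print(kolmogorov_test([0.1, 0.5, 0.9], uniform_cdf))
```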

Moments and any other characteristics of an empirical distribution are called sample or empirical; for example, $ \displaystyle \bar{X} = \sum_{k = 1}^{n} \frac{X_{k}}{n} $ is the sample mean, $ \displaystyle s^{2} = \sum_{k = 1}^{n} \frac{\left( X_{k} - \bar{X} \right)^{2}}{n} $ is the sample variance, and $ \displaystyle \widehat{\alpha}_{r} = \sum_{k = 1}^{n} \frac{X_{k}^{r}}{n} $ is the sample moment of order $ r $.
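The empirical distribution puts mass $ 1/n $ on each $ X_{k} $. The sample characteristics above are therefore simply the mean, variance and moments of that discrete distribution. A direct Python transcription follows; the helper names are hypothetical.

```python
def sample_mean(xs):
    # X-bar = (1/n) * sum of X_k
    return sum(xs) / len(xs)


def sample_variance(xs):
    # s^2 = (1/n) * sum of (X_k - X-bar)^2, with divisor n as in the text
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def sample_moment(xs, r):
    # alpha-hat_r = (1/n) * sum of X_k^r, the sample moment of order r
    return sum(x ** r for x in xs) / len(xs)


data = [1.0, 2.0, 3.0, 4.0]
print(sample_mean(data), sample_variance(data), sample_moment(data, 2))
# 2.5 1.25 7.5
```

The divisor $ n $ in $ s^{2} $ matches the definition above, and $ s^{2} = \widehat{\alpha}_{2} - \bar{X}^{2} $. In the Python standard library this corresponds to `statistics.pvariance`, whereas `statistics.variance` divides by $ n - 1 $.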

Sample characteristics serve as statistical estimators of the corresponding characteristics of the original distribution.


Comments

The use of the empirical distribution in statistics and the associated theory has been greatly developed in recent years. This has been surveyed in [a2]. For developments in strong convergence theory associated with the empirical distribution, see [a1].

References

[a1]	M. Csörgő, P. Révész, "Strong approximation in probability and statistics", Acad. Press (1981).
[a2]	G.R. Shorack, J.A. Wellner, "Empirical processes with applications to statistics", Wiley (1986).
[a3]	M. Loève, "Probability theory", Princeton Univ. Press (1963), Sect. 16.3.
[a4]	P. Gaenssler, W. Stute, "Empirical processes: a survey of results for independent and identically distributed random variables", Ann. Prob., 7 (1979), pp. 193–243.

How to Cite This Entry:
Empirical distribution. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Empirical_distribution&oldid=41639

This article was adapted from an original article by A.V. Prokhorov (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.