Law of large numbers

2010 Mathematics Subject Classification: Primary: 60F05 [MSN][ZBL]

A general principle according to which under certain very general conditions the simultaneous action of random factors leads to a result which is practically non-random. That the frequency of occurrence of a random event tends to become equal to its probability as the number of trials increases (a phenomenon which was probably first noted for games of chance) may serve as the first example of this principle.

At the turn of the 17th century J. Bernoulli [B], [B2] demonstrated a theorem stating that, in a sequence of independent trials, in each of which the probability of occurrence of a certain event $A$ has the same value $p$, $0 < p < 1$, the relationship

$$\tag{1 } {\mathsf P} \left \{ \left | \frac{\mu _ {n} }{n} - p \right | > \epsilon \right \} \rightarrow 0$$

is valid for any $\epsilon > 0$ if $n \rightarrow \infty$; here, $\mu _ {n}$ is the number of occurrences of the event in the first $n$ trials and ${\mu _ {n} } /n$ is the frequency of the occurrence. This Bernoulli theorem was extended by S. Poisson [P] to the case of a sequence of independent trials in which the probability of the occurrence of an event $A$ varies with the number of the trial. Let this probability in the $k$- th trial be $p _ {k}$, $k = 1, 2 \dots$ and let

$$\overline{p}\; _ {n} = \ \frac{p _ {1} + \dots + p _ {n} }{n} .$$

Then, according to the Poisson theorem,

$$\tag{2 } {\mathsf P} \left \{ \left | \frac{\mu _ {n} }{n} - {\overline{p}\; _ {n} } \right | > \epsilon \right \} \rightarrow 0$$

for any $\epsilon > 0$ if $n \rightarrow \infty$. A rigorous proof of this theorem was first given in 1846 by P.L. Chebyshev, whose method was quite different from that of Poisson and was based on certain extremal considerations, whereas Poisson deduced (2) from an approximate formula for the above probability, based on a law of Gauss which at that time had not yet been proved rigorously. Poisson was the first to use the term "law of large numbers" , by which he denoted his own generalization of the Bernoulli theorem.

A further natural extension of the Bernoulli and Poisson theorems is a consequence of the fact that the random variables $\mu _ {n}$ may be represented as the sum

$$\mu _ {n} = X _ {1} + \dots + X _ {n}$$

of independent random variables, where $X _ {k} = 1$ if $A$ occurs in the $k$- th trial and $X _ {k} = 0$ otherwise. The mathematical expectation ${\mathsf E} ( \mu _ {n} / n)$( which is identical with the average of the mathematical expectations ${\mathsf E} X _ {k}$, $1 \leq k \leq n$) is $p$ in the case of Bernoulli and is $\overline{p}\; _ {n}$ in the case of Poisson. In other words, in both cases one deals with a deviation of the arithmetical average of the variables $X _ {k}$ from the arithmetical average of their mathematical expectations.

In his work "On average quantities" , which appeared in 1867, Chebyshev established that the relationship

$$\tag{3 } {\mathsf P} \left \{ \left | \frac{X _ {1} + \dots + X _ {n} }{n} - \frac{ {\mathsf E} X _ {1} + \dots + {\mathsf E} X _ {n} }{n} \right | > \epsilon \right \} \rightarrow 0$$

is valid, for any $\epsilon > 0$ if $n \rightarrow \infty$, for independent random variables $X _ {1} \dots X _ {n} \dots$ under fairly general assumptions. Chebyshev assumed that the mathematical expectations ${\mathsf E} X _ {k} ^ {2}$ are all bounded by the same constant, even though his proof makes it clear that the requirement of boundedness of the variances of $X _ {k}$, ${\mathsf D} X _ {k} = {\mathsf E} X _ {k} ^ {2} - ( {\mathsf E} X _ {k} ) ^ {2}$, or even the requirement

$$B _ {n} ^ {2} = \ {\mathsf D} X _ {1} + \dots + {\mathsf D} X _ {n} = o ( n ^ {2} ),\ \ n \rightarrow \infty ,$$

is sufficient. It was thus shown by Chebyshev that the Bernoulli theorem is susceptible of extensive generalizations. A.A. Markov noted the possibility of further extensions and proposed to apply the term "law of large numbers" to all extensions of the Bernoulli theorem (and, in particular, to (3)). Chebyshev's method is based on a rigorous formulation of all the properties of the mathematical expectations and on the use of the so- called Chebyshev inequality in probability theory. For the probability (3) it yields an estimate of the form

$$n ^ {-} 2 \epsilon ^ {-} 2 \sum _ { k= } 1 ^ { n } {\mathsf D} X _ {k} .$$

This estimate may be replaced by a more exact one — but under more restrictive conditions, see Bernstein inequality. Subsequent proofs of the law of large numbers are all, to varying extents, developments of Chebyshev's method. Using an appropriate "truncation" of the random variables $X _ {k}$( replacement of these variables by auxiliary variables $X _ {n,k} ^ \prime$, viz. $X _ {n,k} ^ \prime = X _ {k}$ if $| X _ {k} - {\mathsf E} X _ {k} | \leq L _ {n}$ and $X _ {n,k} ^ \prime = 0$ if $| X _ {k} - {\mathsf E} X _ {k} | > L _ {n}$, where $L _ {n}$ is a constant), Markov extended the law of large numbers to the case when the variance of the terms does not exist. He showed, for instance, that (3) is valid if, for certain constants $\delta > 0$ and $L > 0$ and for all $n$,

$${\mathsf E} | X _ {n} - {\mathsf E} X _ {n} | ^ {1+ \delta } < L .$$

Khinchin's theorem (1929) is proved in a similar manner: If $X _ {n}$ have the same distribution and if ${\mathsf E} X _ {n}$ exists, then the law of large numbers (3) is valid.

It is possible to formulate more or less final versions of the law of large numbers for sums of independent random variables. To do this, it is expedient to adopt a more general point of view, involving the concept of a sequence of asymptotically-constant random variables. The random variables of a sequence $Y _ {1} \dots Y _ {n} \dots$ are said to be asymptotically constant if there exists a sequence of constants $C _ {1} \dots C _ {n} \dots$ such that for any $\epsilon > 0$, if $n \rightarrow \infty$,

$$\tag{4 } {\mathsf P} \{ | Y _ {n} - C _ {n} | > \epsilon \} \rightarrow 0$$

(i.e. $Y _ {n} - C _ {n}$ converges to zero in probability). If (4) is valid for some $C _ {n}$, it is also valid for $C _ {n} ^ { \prime } = m Y _ {n}$, where $mY$ is the median (cf. Median (in statistics)) of the random variable $Y$. Furthermore, instead of the sequence $X _ {1} \dots X _ {n} \dots$ of independent random variables, one may also take a so-called triangular array:

$$\begin{array}{c} X _ {1,1 } \dots X _ {1,k _ {1} } , \\ X _ {2,1 } \dots X _ {2,k _ {2} } , \\ {} \dots \dots \dots , \\ X _ {n,1 } \dots X _ {n,k _ {n} } , \\ {} \dots \dots \dots \\ \end{array}$$

of random variables (the first subscript is the row number, while the second indicates the number of the variable within the row). The random variables of each individual row are assumed to be mutually independent. A sequence of partial sums is readily reduced to a triangular array if one assumes that $k _ {1} = 1$, $k _ {2} = 2 \dots$ $X _ {n,k } = {X _ {k} } /n$.

Let

$$Y _ {n} = X _ {n,1 } + \dots + X _ {n,k _ {n} } .$$

The general problem of the applicability of the law of large numbers to sums of independent random variables may then be stated as follows: Under what conditions are the sums $Y _ {n}$ asymptotically constant?

This question was answered in 1928 by A.N. Kolmogorov. Assume, without loss of generality, that the medians of the variables $X _ {n,k }$ are zero. Let $\widetilde{X} _ {n,k } = X _ {n,k }$ if $| X _ {n,k } | \leq 1$ and $\widetilde{X} _ {n,k } = 0$ if $| X _ {n,k } | > 1$. The simultaneous fulfillment of the two conditions

$$\sum _ { k = 1 } ^ { {k _ n } } {\mathsf P} \{ | X _ {n,k } | > 1 \} \rightarrow 0 \ \textrm{ if } n \rightarrow \infty$$

and

$$\sum _ { k = 1 } ^ { {k _ n } } {\mathsf E} \widetilde{X} {} _ {n,k } ^ {2} \rightarrow 0 \ \textrm{ if } n \rightarrow \infty ,$$

is necessary and sufficient for the sums $Y _ {n}$ to be asymptotically constant. The sum $\sum _ {k=} 1 ^ {k _ {n} } {\mathsf E} {\widetilde{X} _ {n,k } }$ may be taken as $C _ {n}$. That these conditions are sufficient is readily demonstrated by Chebyshev's method. If the mathematical expectations ${\mathsf E} X _ {n,k}$ exist, it is easy to find supplementary conditions under which it is permissible to choose $C _ {n} = {\mathsf E} Y _ {n}$, which yields necessary and sufficient conditions for the law of large numbers in the classical formulation (3). For a sequence of independent identically-distributed variables $\{ X _ {n} \}$ these conditions are reduced — in accordance with Khinchin's theorem quoted above — to the existence of the mathematical expectation. At the same time, the condition

$$\tag{5 } n {\mathsf P} \{ | X _ {1} | > n \} \rightarrow 0 \ \textrm{ if } n \rightarrow \infty$$

is necessary and sufficient for the arithmetical averages $Y _ {n}$ to be asymptotically constant. Examples of cases in which condition (5) is not met are easily found. Thus, the condition is not met if all $X _ {n}$ have a Cauchy distribution with density $1/ \pi ( 1 + x ^ {2} )$( to which corresponds the characteristic function $\mathop{\rm exp} \{ - | t | \}$). Here, the arithmetical averages $Y _ {n} = ( X _ {1} + \dots + X _ {n} ) / n$ have the characteristic function $\mathop{\rm exp} \{ - n | t/n | \} = \mathop{\rm exp} \{ - | t | \}$ and, as a result, have the same distribution for any $n$ as do the individual terms.

The most important cases to which the law of large numbers does not apply involve the times of return to the starting point in a random walk. Thus, in a symmetric Bernoulli random walk the time $T _ {n}$ passing prior to the $n$- th return to the starting point is the sum of the $n$ independent random variables $X _ {1} \dots X _ {n}$, where $X _ {1}$ is the time passed prior to the first return, $X _ {2}$ is the time between the first and the second return, etc. The distribution of the variable $2T _ {n} / \pi n ^ {2}$ converges as $n \rightarrow \infty$ to a non-degenerate limit distribution with density

$$p ( x) = \ \frac{1}{\sqrt {2 \pi } } e ^ {- 1/2x } x ^ {- 3/2 } \ \textrm{ for } x > 0$$

and zero for $x \leq 0$. Accordingly, in this case the distribution of the arithmetical average of $X _ {i}$, i.e. ${T _ {n} } /n$, is located, roughly speaking, on a segment of length of order $n$( whereas, in cases where the law of large numbers is applicable, it is located on segments of length $o( 1)$).

The applicability of the law of large numbers to sums of dependent variables (both in its classical and in its more general formulations) is connected, first and foremost, with the asymptotic independence of the random variables $X _ {i}$ and $X _ {j}$ as the difference between their subscripts $| i - j |$ increases. The corresponding theorems were first proved by Markov in 1907 for variables connected in a Markov chain. In fact, let $X _ {1} \dots X _ {n} \dots$ be a homogeneous Markov chain with finite number of states, and let all one-step transition probabilities be positive. Here, the asymptotic independence of $X _ {j}$ and $X _ {i}$ as $| i- j | \rightarrow \infty$ follows from the fact that the conditional distribution of $X _ {j}$ under a fixed value of $X _ {i}$ tends, as $n \rightarrow \infty$, to a limit which does not depend on the chosen value of $X _ {i}$( Markov's ergodic theorem). The law of large numbers is deduced from this theorem: It is first established that, as $n \rightarrow \infty$,

$${\mathsf E} \left ( \frac{X _ {1} + \dots + X _ {n} }{n} - a \right ) ^ {2} \rightarrow 0,$$

where $a = \lim\limits _ {k \rightarrow \infty } {\mathsf E} X _ {k}$; hence it follows that as $n \rightarrow \infty$,

$${\mathsf P} \left \{ \left | \frac{X _ {1} + \dots + X _ {n} }{n} - a \ \right | > \epsilon \right \} \rightarrow 0.$$

The following, more general case, is governed by Bernstein's conditions: If ${\mathsf D} X _ {j} < L$, $| R( X _ {i} , X _ {j} ) | \leq \phi ( | i - j |)$, where $L$ is a constant, $R$ is the correlation coefficient and $\phi ( n)$ is a function which tends to zero as $n \rightarrow \infty$, then the law of large numbers (3) is applicable to the variables $\{ X _ {n} \}$. For sequences $\{ X _ {n} \}$ which are stationary in the wide sense, the correlation condition may be weakened somewhat, by replacing it by

$$\lim\limits _ {n \rightarrow \infty } { \frac{1}{n} } \sum _ {j = 1 } ^ { n } R _ {j} = 0,$$

where $R _ {j} = R( X _ {i} , X _ {i+} j )$.

The above-mentioned results may be extended in various ways. First, only convergence in probability was considered so far. Other types of convergence may also be considered, including convergence with probability one, mean-square convergence, etc. (in fact, many of the above conditions ensure mean-square convergence, which implies convergence in probability). The case of convergence with probability one is important, and for this reason has been especially named the strong law of large numbers.

Furthermore, many of these theorems can be applied, with suitable modifications, to random vectors having values in Euclidean spaces of any dimension, in a Hilbert space or in certain Banach spaces. For instance, if $\{ X _ {n} \}$ is a sequence of independent identically-distributed random vectors with values in a separable Banach space, and if ${\mathsf E} \| X _ {n} \|$( where $\| x \|$ is the norm of $x$) is finite, one has

$${\mathsf P} \left \{ \left \| \frac{X _ {1} + \dots + X _ {n} }{n} - {\mathsf E} X _ {1} \right \| > \epsilon \right \} \rightarrow 0$$

for any $\epsilon > 0$ if $n \rightarrow \infty$.

The law of large numbers, when considered in its most general form, is closely related to ergodic theorems (cf. Ergodic theorem). Clearly, many theorems are also applicable to the case of the average $( 1 / T) \int _ {0} ^ {T} X( t) dt$, where $X( t)$ is a random process depending on a continuous parameter (see, for example, [L]).

Finally, instead of the sums of random variables one may consider other symmetric functions of them. This was in fact done by A.Ya. Khinchin (1951–1955) to prove certain conclusions in statistical mechanics [K]. The result obtained by him may be illustrated by the following example. Let $X _ {n,1} \dots X _ {n,n}$ be the coordinates of a point uniformly distributed on the surface of the sphere

$$X _ {n,1 } ^ {2} + \dots + X _ {n,n } ^ {2} = n \mu ,\ \ \mu > 0.$$

The law of large numbers then applies to a wide class of symmetric functions $f( X _ {n,1} \dots X _ {n,n} )$ in the sense that as $n \rightarrow \infty$, their values are asymptotically constant (this is similar to the observation made in 1925 by P. Lévy to the effect that sufficiently regular functions of a very large number of variables are almost constant in a large part of their domain of definition).

Extensive statistical data, illustrating the law of large numbers, can be found in older textbooks; see, for example, [M], [U].

References

 [B] J. Bernoulli, "Ars conjectandi" , Werke , 3 , Birkhäuser (1975) pp. 107–286 (Original: Basle, 1713) MR2349550 MR2393219 MR0935946 MR0850992 MR0827905 Zbl 0957.01032 Zbl 0694.01020 Zbl 30.0210.01 [P] S.D. Poisson, "Récherches sur la probabilité des jugements en matière criminelle et en matière civile" , Paris (1837) [C] P.L. Chebyshev, "Oeuvres de P.L. Tchebycheff" , 2 , Chelsea (1961) (Translated from Russian) [M] A.A. Markov, "Wahrscheinlichkeitsrechung" , Teubner (1912) (Translated from Russian) Zbl 39.0292.02 [Be] S.N. Bernshtein, "Probability theory" , Moscow-Leningrad (1946) (In Russian) MR1868030 [GK] B.V. Gnedenko, A.N. Kolmogorov, "Limit distributions for sums of independent random variables" , Addison-Wesley (1954) (Translated from Russian) MR0062975 Zbl 0056.36001 [D] J.L. Doob, "Stochastic processes" , Chapman & Hall (1953) MR1570654 MR0058896 Zbl 0053.26802 [G] U. Grenander, "Probabilities on algebraic structures" , Wiley (1963) MR0206994 Zbl 0131.34804 [K] A.Ya. Khinchin, "Symmetric functions on multi-dimensional surfaces" , To the memory of A.A. Adronov , Moscow (1955) pp. 541–574 (In Russian) MR74335 [L] M. Loève, "Probability theory" , Princeton Univ. Press (1963) MR0203748 Zbl 0108.14202 [U] J.V. Uspensky, "Introduction to mathematical probability" , McGraw-Hill (1937) MR1524355 Zbl 63.1069.01 [B2] J. Bernoulli, "On the law of large numbers" , Moscow (1986) (In Russian) Zbl 0646.01008