# Law of large numbers

A general principle according to which under certain very general conditions the simultaneous action of random factors leads to a result which is practically non-random. That the frequency of occurrence of a random event tends to become equal to its probability as the number of trials increases (a phenomenon which was probably first noted for games of chance) may serve as the first example of this principle.

At the turn of the 17th century J. Bernoulli [1], [12] demonstrated a theorem stating that, in a sequence of independent trials, in each of which the probability of occurrence of a certain event has the same value , , the relationship

(1) |

is valid for any if ; here, is the number of occurrences of the event in the first trials and is the frequency of the occurrence. This Bernoulli theorem was extended by S. Poisson [2] to the case of a sequence of independent trials in which the probability of the occurrence of an event varies with the number of the trial. Let this probability in the -th trial be , and let

Then, according to the Poisson theorem,

(2) |

for any if . A rigorous proof of this theorem was first given in 1846 by P.L. Chebyshev, whose method was quite different from that of Poisson and was based on certain extremal considerations, whereas Poisson deduced (2) from an approximate formula for the above probability, based on a law of Gauss which at that time had not yet been proved rigorously. Poisson was the first to use the term "law of large numbers" , by which he denoted his own generalization of the Bernoulli theorem.

A further natural extension of the Bernoulli and Poisson theorems is a consequence of the fact that the random variables may be represented as the sum

of independent random variables, where if occurs in the -th trial and otherwise. The mathematical expectation (which is identical with the average of the mathematical expectations , ) is in the case of Bernoulli and is in the case of Poisson. In other words, in both cases one deals with a deviation of the arithmetical average of the variables from the arithmetical average of their mathematical expectations.

In his work "On average quantities" , which appeared in 1867, Chebyshev established that the relationship

(3) |

is valid, for any if , for independent random variables under fairly general assumptions. Chebyshev assumed that the mathematical expectations are all bounded by the same constant, even though his proof makes it clear that the requirement of boundedness of the variances of , , or even the requirement

is sufficient. It was thus shown by Chebyshev that the Bernoulli theorem is susceptible of extensive generalizations. A.A. Markov noted the possibility of further extensions and proposed to apply the term "law of large numbers" to all extensions of the Bernoulli theorem (and, in particular, to (3)). Chebyshev's method is based on a rigorous formulation of all the properties of the mathematical expectations and on the use of the so- called Chebyshev inequality in probability theory. For the probability (3) it yields an estimate of the form

This estimate may be replaced by a more exact one — but under more restrictive conditions, see Bernstein inequality. Subsequent proofs of the law of large numbers are all, to varying extents, developments of Chebyshev's method. Using an appropriate "truncation" of the random variables (replacement of these variables by auxiliary variables , viz. if and if , where is a constant), Markov extended the law of large numbers to the case when the variance of the terms does not exist. He showed, for instance, that (3) is valid if, for certain constants and and for all ,

Khinchin's theorem (1929) is proved in a similar manner: If have the same distribution and if exists, then the law of large numbers (3) is valid.

It is possible to formulate more or less final versions of the law of large numbers for sums of independent random variables. To do this, it is expedient to adopt a more general point of view, involving the concept of a sequence of asymptotically-constant random variables. The random variables of a sequence are said to be asymptotically constant if there exists a sequence of constants such that for any , if ,

(4) |

(i.e. converges to zero in probability). If (4) is valid for some , it is also valid for , where is the median (cf. Median (in statistics)) of the random variable . Furthermore, instead of the sequence of independent random variables, one may also take a so-called triangular array:

of random variables (the first subscript is the row number, while the second indicates the number of the variable within the row). The random variables of each individual row are assumed to be mutually independent. A sequence of partial sums is readily reduced to a triangular array if one assumes that , .

Let

The general problem of the applicability of the law of large numbers to sums of independent random variables may then be stated as follows: Under what conditions are the sums asymptotically constant?

This question was answered in 1928 by A.N. Kolmogorov. Assume, without loss of generality, that the medians of the variables are zero. Let if and if . The simultaneous fulfillment of the two conditions

and

is necessary and sufficient for the sums to be asymptotically constant. The sum may be taken as . That these conditions are sufficient is readily demonstrated by Chebyshev's method. If the mathematical expectations exist, it is easy to find supplementary conditions under which it is permissible to choose , which yields necessary and sufficient conditions for the law of large numbers in the classical formulation (3). For a sequence of independent identically-distributed variables these conditions are reduced — in accordance with Khinchin's theorem quoted above — to the existence of the mathematical expectation. At the same time, the condition

(5) |

is necessary and sufficient for the arithmetical averages to be asymptotically constant. Examples of cases in which condition (5) is not met are easily found. Thus, the condition is not met if all have a Cauchy distribution with density (to which corresponds the characteristic function ). Here, the arithmetical averages have the characteristic function and, as a result, have the same distribution for any as do the individual terms.

The most important cases to which the law of large numbers does not apply involve the times of return to the starting point in a random walk. Thus, in a symmetric Bernoulli random walk the time passing prior to the -th return to the starting point is the sum of the independent random variables , where is the time passed prior to the first return, is the time between the first and the second return, etc. The distribution of the variable converges as to a non-degenerate limit distribution with density

and zero for . Accordingly, in this case the distribution of the arithmetical average of , i.e. , is located, roughly speaking, on a segment of length of order (whereas, in cases where the law of large numbers is applicable, it is located on segments of length ).

The applicability of the law of large numbers to sums of dependent variables (both in its classical and in its more general formulations) is connected, first and foremost, with the asymptotic independence of the random variables and as the difference between their subscripts increases. The corresponding theorems were first proved by Markov in 1907 for variables connected in a Markov chain. In fact, let be a homogeneous Markov chain with finite number of states, and let all one-step transition probabilities be positive. Here, the asymptotic independence of and as follows from the fact that the conditional distribution of under a fixed value of tends, as , to a limit which does not depend on the chosen value of (Markov's ergodic theorem). The law of large numbers is deduced from this theorem: It is first established that, as ,

where ; hence it follows that as ,

The following, more general case, is governed by Bernstein's conditions: If , , where is a constant, is the correlation coefficient and is a function which tends to zero as , then the law of large numbers (3) is applicable to the variables . For sequences which are stationary in the wide sense, the correlation condition may be weakened somewhat, by replacing it by

where .

The above-mentioned results may be extended in various ways. First, only convergence in probability was considered so far. Other types of convergence may also be considered, including convergence with probability one, mean-square convergence, etc. (in fact, many of the above conditions ensure mean-square convergence, which implies convergence in probability). The case of convergence with probability one is important, and for this reason has been especially named the strong law of large numbers.

Furthermore, many of these theorems can be applied, with suitable modifications, to random vectors having values in Euclidean spaces of any dimension, in a Hilbert space or in certain Banach spaces. For instance, if is a sequence of independent identically-distributed random vectors with values in a separable Banach space, and if (where is the norm of ) is finite, one has

for any if .

The law of large numbers, when considered in its most general form, is closely related to ergodic theorems (cf. Ergodic theorem). Clearly, many theorems are also applicable to the case of the average , where is a random process depending on a continuous parameter (see, for example, [10]).

Finally, instead of the sums of random variables one may consider other symmetric functions of them. This was in fact done by A.Ya. Khinchin (1951–1955) to prove certain conclusions in statistical mechanics [9]. The result obtained by him may be illustrated by the following example. Let be the coordinates of a point uniformly distributed on the surface of the sphere

The law of large numbers then applies to a wide class of symmetric functions in the sense that as , their values are asymptotically constant (this is similar to the observation made in 1925 by P. Lévy to the effect that sufficiently regular functions of a very large number of variables are almost constant in a large part of their domain of definition).

Extensive statistical data, illustrating the law of large numbers, can be found in older textbooks; see, for example, [4], [11].

#### References

[1] | J. Bernoulli, "Ars conjectandi" , Werke , 3 , Birkhäuser (1975) pp. 107–286 (Original: Basle, 1713) |

[2] | S.D. Poisson, "Récherches sur la probabilité des jugements en matière criminelle et en matière civile" , Paris (1837) |

[3] | P.L. Chebyshev, "Oeuvres de P.L. Tchebycheff" , 2 , Chelsea (1961) (Translated from Russian) |

[4] | A.A. Markov, "Wahrscheinlichkeitsrechung" , Teubner (1912) (Translated from Russian) |

[5] | S.N. Bernshtein, "Probability theory" , Moscow-Leningrad (1946) (In Russian) |

[6] | B.V. Gnedenko, A.N. Kolmogorov, "Limit distributions for sums of independent random variables" , Addison-Wesley (1954) (Translated from Russian) |

[7] | J.L. Doob, "Stochastic processes" , Chapman & Hall (1953) |

[8] | U. Grenander, "Probabilities on algebraic structures" , Wiley (1963) |

[9] | A.Ya. Khinchin, "Symmetric functions on multi-dimensional surfaces" , To the memory of A.A. Adronov , Moscow (1955) pp. 541–574 (In Russian) |

[10] | M. Loève, "Probability theory" , Princeton Univ. Press (1963) |

[11] | J.V. Uspensky, "Introduction to mathematical probability" , McGraw-Hill (1937) |

[12] | J. Bernoulli, "On the law of large numbers" , Moscow (1986) (In Russian) |

#### Comments

#### References

[a1] | P. Révész, "The laws of large numbers" , Acad. Press (1968) |

[a2] | J. Hoffman-Jørgensen, G. Pisier, "The law of large numbers and the central limit theorem in Banach spaces" Ann. Prob. , 4 (1976) pp. 587–599 |

[a3] | P. Hall, C.C. Heyde, "Martingale limit theory and its application" , Acad. Press (1980) |

**How to Cite This Entry:**

Law of large numbers.

*Encyclopedia of Mathematics.*URL: http://encyclopediaofmath.org/index.php?title=Law_of_large_numbers&oldid=15659