# Central limit theorem

2010 Mathematics Subject Classification: Primary: 60F05

A common name for a number of limit theorems in probability theory stating conditions under which sums or other functions of a large number of independent or weakly-dependent random variables have a probability distribution close to the normal distribution.

The classical version of the central limit theorem is concerned with a sequence

$$\tag{1 } X _ {1} , \dots , X _ {n} , \dots$$

of independent random variables having finite (mathematical) expectations ${\mathsf E} X _ {k} = a _ {k}$, and finite variances ${\mathsf D} X _ {k} = b _ {k}$, and with the sums

$$\tag{2 } S _ {n} = \ X _ {1} + \dots + X _ {n} .$$

Suppose that $A _ {n} = {\mathsf E} S _ {n} = a _ {1} + \dots + a _ {n}$, $B _ {n} = {\mathsf D} S _ {n} = b _ {1} + \dots + b _ {n}$. The distribution functions

$$F _ {n} ( x) = \ {\mathsf P} \{ Z _ {n} < x \}$$

of the "normalized" sums

$$\tag{3 } Z _ {n} = \ \frac{S _ {n} - A _ {n} }{\sqrt {B _ {n} } } ,$$

which have expectation 0 and variance 1, are compared with the "standard" normal distribution function

$$\Phi ( x) = \ \frac{1}{\sqrt {2 \pi } } \int\limits _ {- \infty } ^ { x } e ^ {- z ^ {2} /2 } dz$$

corresponding to the normal distribution with expectation 0 and variance 1. In this case the central limit theorem asserts that under certain conditions, as $n \rightarrow \infty$, for any $x \in \mathbf R$,

$$F _ {n} ( x) \rightarrow \Phi ( x),$$

or, what is the same, for any interval $( \alpha , \beta )$:

$${\mathsf P} \{ \alpha < Z _ {n} < \beta \} = \ {\mathsf P} \{ A _ {n} + \alpha \sqrt {B _ {n} } < S _ {n} < A _ {n} + \beta \sqrt {B _ {n} } \} \rightarrow \Phi ( \beta ) - \Phi ( \alpha ) .$$
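The convergence $F _ {n} ( x) \rightarrow \Phi ( x)$ can be observed in a small simulation. The following is a minimal sketch, assuming summands uniformly distributed on $[ 0, 1]$ (so that $a _ {k} = 1/2$ and $b _ {k} = 1/12$); the function names, the sample sizes and the random seed are illustrative choices, not part of the theorem.

```python
import math
import random


def phi_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def empirical_cdf_of_normalized_sum(n, x, trials, rng):
    """Monte Carlo estimate of F_n(x) = P{Z_n < x} for uniform [0, 1] summands."""
    a_n = n * 0.5        # A_n = E S_n
    b_n = n / 12.0       # B_n = D S_n
    hits = 0
    for _ in range(trials):
        s_n = sum(rng.random() for _ in range(n))
        z_n = (s_n - a_n) / math.sqrt(b_n)
        if z_n < x:
            hits += 1
    return hits / trials


rng = random.Random(0)   # fixed seed, chosen only for reproducibility
for x in (-1.0, 0.0, 1.5):
    f_n = empirical_cdf_of_normalized_sum(30, x, 10000, rng)
    print(f"x = {x:+.1f}   F_n(x) ~ {f_n:.3f}   Phi(x) = {phi_cdf(x):.3f}")
```

The estimates carry a Monte Carlo error of order $1/\sqrt{\textrm{trials}}$, so agreement with $\Phi ( x)$ is expected only up to roughly two decimal places here.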

A clearer understanding of the conditions under which a normal distribution emerges as the limit of distributions of sums of independent random variables is obtained by considering a triangular array of random variables instead of a sequence (see [GK]). In this case one considers for every $n = 1, 2, \dots$ a sequence of variables

$$X _ {n,1} , \dots , X _ {n,n} ,$$

putting

$$X _ {n,k} = \ \frac{X _ {k} - a _ {k} }{\sqrt {B _ {n} } } ,\ \ 1 \leq k \leq n.$$

Then the random variables inside each sequence (row) are independent, and

$$Z _ {n} = \ X _ {n,1} + \dots + X _ {n,n} .$$

The usual conditions for applicability of the central limit theorem (such as Lyapunov's condition or the condition of the Lindeberg–Feller theorem) imply that $X _ {n,k}$ is asymptotically negligible. For example, from Lyapunov's condition with third moments, that is, from the condition that as $n \rightarrow \infty$,

$$\tag{4 } L _ {n} = \ \frac{1}{B _ {n} ^ {3/2} } \sum _ {k = 1 } ^ { n } {\mathsf E} | X _ {k} - a _ {k} | ^ {3} \rightarrow 0 ,$$

for any $\epsilon > 0$ the inequality

$$\max _ {1 \leq k \leq n } \ {\mathsf P} \{ | X _ {n,k} | > \epsilon \} = \ \max _ {1 \leq k \leq n } \ {\mathsf P} \{ | X _ {k} - a _ {k} | > \epsilon \sqrt {B _ {n} } \} \leq \ \max _ {1 \leq k \leq n } \ \frac{ {\mathsf E} | X _ {k} - a _ {k} | ^ {3} }{\epsilon ^ {3} B _ {n} ^ {3/2} } \leq \frac{L _ {n} }{\epsilon ^ {3} } \rightarrow 0$$

follows as $n \rightarrow \infty$, and the fact that the quantities at the left-hand side of this chain of inequalities tend to zero indicates the asymptotic negligibility of the random variables forming the array.
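To make the Lyapunov ratio $L _ {n}$ concrete, it can be evaluated in closed form for simple summands. The sketch below assumes independent Bernoulli variables $X _ {k}$ with success probabilities $p _ {k}$ (an illustrative choice; the helper name `lyapunov_ratio` is hypothetical). For such variables ${\mathsf D} X _ {k} = p _ {k} ( 1 - p _ {k} )$ and ${\mathsf E} | X _ {k} - p _ {k} | ^ {3} = p _ {k} ( 1 - p _ {k} ) ( ( 1 - p _ {k} ) ^ {2} + p _ {k} ^ {2} )$.

```python
def lyapunov_ratio(ps):
    """L_n = B_n^(-3/2) * sum_k E|X_k - a_k|^3 for independent Bernoulli(p_k) variables."""
    # B_n: sum of the variances p_k (1 - p_k)
    b_n = sum(p * (1 - p) for p in ps)
    # sum of third absolute central moments: p (1-p)^3 + (1-p) p^3 for each k
    third = sum(p * (1 - p) * ((1 - p) ** 2 + p ** 2) for p in ps)
    return third / b_n ** 1.5


for n in (10, 100, 1000):
    fair = lyapunov_ratio([0.5] * n)                              # p_k = 1/2: L_n = 1/sqrt(n)
    shrinking = lyapunov_ratio([2.0 ** -k for k in range(1, n + 1)])  # p_k = 2^(-k)
    print(f"n = {n:4d}   fair coins: {fair:.4f}   p_k = 2^-k: {shrinking:.4f}")
```

For fair coins the ratio equals $1/ \sqrt n$ and tends to zero. For $p _ {k} = 2 ^ {-k}$ the variances are summable, $B _ {n}$ stays bounded, and the ratio does not tend to zero, so Lyapunov's condition fails in that case.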

Suppose now that

$$\tag{5 } X _ {n,1} , \dots , X _ {n,k _ {n} } ,$$

$n = 1, 2, \dots$, is an arbitrary triangular array of asymptotically-negligible random variables that are independent within each sequence. If the limit distribution for the sums $Z _ {n} = X _ {n,1} + \dots + X _ {n,k _ {n} }$ exists and is non-degenerate, then it is normal if and only if, as $n \rightarrow \infty$, for any $\epsilon > 0$,

$$\tag{6 } {\mathsf P} \left \{ \max _ {1 \leq k \leq k _ {n} } \ | X _ {n,k} | > \epsilon \right \} \rightarrow 0,$$

that is, if the maximal term in $Z _ {n}$ becomes vanishingly small in comparison with the whole sum. (Without condition (6) one can only assert that the limit law for $Z _ {n}$ belongs to the class of infinitely-divisible distributions, cf. Infinitely-divisible distribution.) Two additional conditions that together with (6) are necessary and sufficient for the convergence of the distributions of the sums $Z _ {n}$ to a limit can be found in the article Triangular array.
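A standard illustration of what happens when condition (6) fails is the Poisson scheme. The sketch below assumes rows of $n$ independent Bernoulli variables with success probability $\lambda /n$ (an assumption made here for illustration; the function name is hypothetical). Each term is asymptotically negligible, since ${\mathsf P} \{ | X _ {n,k} | > \epsilon \} = \lambda /n \rightarrow 0$, yet the probability in (6) tends to $1 - e ^ {- \lambda } > 0$, and the limit law of the row sums is the Poisson law, which is infinitely divisible but not normal.

```python
import math


def prob_max_exceeds(n, lam):
    """P{max_k |X_{n,k}| > eps} for n independent Bernoulli(lam/n) variables, 0 < eps < 1.

    The maximum exceeds eps exactly when at least one variable equals 1.
    """
    return 1.0 - (1.0 - lam / n) ** n


lam = 2.0
for n in (10, 1000, 100000):
    print(f"n = {n:6d}   P(max > eps) = {prob_max_exceeds(n, lam):.6f}")
print(f"limit 1 - exp(-lam)  = {1.0 - math.exp(-lam):.6f}")
```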

When the condition of asymptotic negligibility of the variables in the triangular array considered above does not hold, the situation becomes complicated. The well-known theorem of H. Cramér, that the sum of several independent random variables is normally distributed if and only if each of the summands is, makes it possible to assume (as P. Lévy did, see [Le], Chapt. 5, Theor. 38) that the sum of independent random variables has a distribution close to normal if the "large" terms are almost normal and if the collection of "small" terms is subject to the condition of "normality" of the distributions of the sums of the asymptotically-negligible terms. A precise form of an argument of this kind was first obtained for the triangular array (5) with ${\mathsf E} X _ {n,k} = 0$, $\sum _ {k = 1 } ^ {k _ {n} } {\mathsf D} X _ {n,k} = 1$ (see [Z]). Here, for the convergence of the distribution functions $F _ {n} ( x) = {\mathsf P} \{ Z _ {n} < x \}$ to the normal distribution function $\Phi ( x)$ it is necessary and sufficient that the following two conditions hold simultaneously:

1) as $n \rightarrow \infty$,

$$\alpha _ {n} = \ \max _ {1 \leq k \leq k _ {n} } \ L ( F _ {n,k} , \Phi _ {n,k} ) \rightarrow 0,$$

where $L ( F _ {n,k} , \Phi _ {n,k} )$ is the Lévy distance (see Lévy metric) between the distribution functions $F _ {n,k} ( x)$ of the random variables $X _ {n,k}$ and the normal distribution functions $\Phi _ {n,k} ( x)$ with the same expectation and variance as $F _ {n,k} ( x)$; and

2) for any $\epsilon > 0$, as $n \rightarrow \infty$,

$$\Delta _ {n} ( \epsilon ) = \ \sum _ { k = 1 } ^ { {k _ n } } \int\limits _ {| x | > \epsilon } x ^ {2} dF _ {n,k} ( x) \rightarrow 0,$$

where the sum is over those $k$, $1 \leq k \leq k _ {n}$, for which ${\mathsf D} X _ {n,k} < \sqrt {\alpha _ {n} }$.

This form of the statement is quite close to the one originally proposed by Lévy. Other formulations are possible (see, for example, [R]), which in a certain sense are more reminiscent of the Lindeberg–Feller theorem.

Nowadays this form of the central limit theorem can be obtained as a special case of a more general summation theorem on a triangular array without the condition of asymptotic negligibility.

In practical respects it is important to have an idea of the rate of convergence of the distributions of the sums to the normal distribution. For this purpose there are inequalities and asymptotic expansions (and also the theory of probability of large deviations; see also Cramér theorem; Limit theorems). In what follows, for simplicity of the exposition, only the sequence (1) is considered rather than a general triangular array, and the variables in (1) are assumed to be identically distributed. Let $F ( x) = {\mathsf P} \{ X _ {k} < x \}$. A typical example of an inequality for the deviation of the distribution functions $F _ {n} ( x)$ of the normalized sums (3) from $\Phi ( x)$ is the Berry–Esseen inequality: for all $x$,

$$\tag{7 } | F _ {n} ( x) - \Phi ( x) | \leq \ C \frac{ {\mathsf E} | X _ {1} - a _ {1} | ^ {3} }{\sigma _ {1} ^ {3} } \cdot \frac{1}{\sqrt n } ,$$

where $C$ is an absolute constant and $\sigma _ {1} ^ {2} = {\mathsf D} X _ {1} = b _ {1}$. (The best possible value of $C$ is not known at present (1984); however, it does not exceed 0.7655.) Inequalities like (7) become less informative if the terms $X _ {k}$ themselves are "almost normal". Thus, if they are actually normal, then the left-hand side of (7) is zero, while the right-hand side is $4C/ \sqrt {2 \pi n }$, since ${\mathsf E} | X _ {1} - a _ {1} | ^ {3} = 4 \sigma _ {1} ^ {3} / \sqrt {2 \pi }$ for a normal distribution. Therefore, from the beginning of the 1960's onwards, analogues of (7) were proposed in which the moments of the random variables $X _ {k}$ on the right-hand side are replaced by other characteristics, similar to the moments but determined by the difference

$$F ( x) - \Phi \left ( \frac{x - a _ {1} }{\sigma _ {1} } \right )$$

in such a way that they become smaller as this difference becomes smaller. On the right-hand side of (7) and its generalizations one can put a function of $x$ that decreases to zero as $| x | \rightarrow \infty$ (so-called inhomogeneous estimators). One also considers (see [P]) other methods of measuring the "proximity" of $F _ {n} ( x)$ to $\Phi ( x)$, for example, in the sense of the space $L _ {p}$ (in so-called global versions of the central limit theorem) or methods based on a comparison of local characteristics of the distributions (see Local limit theorems).
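The Berry–Esseen inequality (7) can be checked exactly for Bernoulli summands, for which $S _ {n}$ is binomial and $F _ {n}$ is a step function. The following sketch computes $\sup _ {x} | F _ {n} ( x) - \Phi ( x) |$ by examining both one-sided limits at every jump point and compares it with the right-hand side of (7), using the value $0.7655$ quoted above for $C$. The function names are illustrative, and the Bernoulli choice is an assumption made for the example.

```python
import math


def phi_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sup_distance_binomial(n, p):
    """sup_x |F_n(x) - Phi(x)| for the normalized sum of n Bernoulli(p) variables."""
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    cdf_left = 0.0      # P{S_n < j}: value of F_n at the jump point
    worst = 0.0
    for j in range(n + 1):
        x = (j - mean) / sd
        pmf = math.comb(n, j) * p ** j * (1 - p) ** (n - j)
        cdf_right = cdf_left + pmf      # P{S_n <= j}: limit of F_n from the right
        # F_n is constant between jumps and Phi is monotone, so the supremum
        # is attained in one of the one-sided limits at a jump point
        worst = max(worst, abs(cdf_left - phi_cdf(x)), abs(cdf_right - phi_cdf(x)))
        cdf_left = cdf_right
    return worst


def berry_esseen_bound(n, p, c=0.7655):
    """Right-hand side of (7) for Bernoulli(p) summands; c is the constant from the text."""
    sigma3 = (p * (1 - p)) ** 1.5
    rho = p * (1 - p) * ((1 - p) ** 2 + p ** 2)   # E|X_1 - a_1|^3
    return c * rho / sigma3 / math.sqrt(n)


for n in (10, 100, 400):
    for p in (0.5, 0.1):
        d, b = sup_distance_binomial(n, p), berry_esseen_bound(n, p)
        print(f"n = {n:3d}  p = {p:.1f}   sup|F_n - Phi| = {d:.4f}   bound = {b:.4f}")
```

Both columns decrease at the rate $1/ \sqrt n$, which shows that for lattice distributions the order of the bound in (7) cannot be improved.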

The asymptotic expansion for the difference $F _ {n} ( x) - \Phi ( x)$ has the form (see [GK], [Cr]) for $\sigma = 1$:

$$F _ {n} ( x) - \Phi ( x) = \ \frac{e ^ {- x ^ {2} /2 } }{\sqrt {2 \pi } } \left ( \frac{Q _ {1} ( x) }{n ^ {1/2} } + \frac{Q _ {2} ( x) }{n} + \frac{Q _ {3} ( x) }{n ^ {3/2} } + \dots \right ) .$$

Here $Q _ {k} ( x)$ are polynomials of degree $3k - 1$ in $x$ with coefficients depending only on the first $k + 2$ moments of the terms. For the binomial distribution, the first term of the asymptotic expansion was indicated by P. Laplace in 1812, and the complete expansion was described, without a rigorous justification, by P.L. Chebyshev in 1887. The first estimate of the remainder, under the assumption that the $s$-th moment $\beta _ {s} = {\mathsf E} | X _ {k} | ^ {s}$, $s \geq 3$, is finite and that

$$\limsup _ {| t | \rightarrow \infty } \ | {\mathsf E} e ^ {it X _ {k} } | < 1,$$

the so-called Cramér condition, was given by Cramér in 1928. This result, in a somewhat stronger form, asserts that

$$F _ {n} ( x) - \Phi ( x) = \ \frac{e ^ {- x ^ {2} /2 } }{\sqrt {2 \pi } } \left ( \frac{Q _ {1} ( x) }{\sqrt n } + \dots + \frac{Q _ {s - 2 } ( x) }{n ^ {( s - 2)/2 } } + o \left ( \frac{1}{n ^ {( s - 2)/2 } } \right ) \right ) ,$$

uniformly in $x$. This asymptotic expansion serves as the basis for the construction of a broad class of transformations of random variables (cf. Random variables, transformations of).
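The effect of the first correction term can be seen on an example where $F _ {n}$ is known exactly. The sketch below assumes summands with the exponential distribution of mean 1 (so $\sigma = 1$, the third central moment equals 2, and Cramér's condition holds because the distribution has a density); then $S _ {n}$ has a gamma distribution. It also assumes the standard form of the first polynomial, $Q _ {1} ( x) = - \kappa _ {3} ( x ^ {2} - 1) /6$, where $\kappa _ {3}$ is the third central moment; this explicit formula and all function names are not taken from the text above.

```python
import math


def phi_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def exact_cdf_exponential_sum(n, x):
    """F_n(x) for Z_n = (S_n - n) / sqrt(n), where S_n is a sum of n Exp(1) variables."""
    s = n + x * math.sqrt(n)
    if s <= 0.0:
        return 0.0
    # S_n is gamma distributed: P{S_n < s} = 1 - exp(-s) * sum_{j < n} s^j / j!
    term, total = 1.0, 1.0
    for j in range(1, n):
        term *= s / j
        total += term
    return 1.0 - math.exp(-s) * total


def one_term_expansion(n, x, kappa3=2.0):
    """Phi(x) plus the first term of the expansion, with Q_1(x) = -kappa3 (x^2 - 1) / 6."""
    q1 = -kappa3 * (x * x - 1.0) / 6.0
    return phi_cdf(x) + phi_pdf(x) * q1 / math.sqrt(n)


n = 20
for x in (-1.5, 0.0, 2.0):
    exact = exact_cdf_exponential_sum(n, x)
    print(f"x = {x:+.1f}   exact = {exact:.5f}   Phi = {phi_cdf(x):.5f}   "
          f"one term = {one_term_expansion(n, x):.5f}")
```

At $x = 0$ the normal approximation is off by about $0.03$ for $n = 20$, while the one-term expansion is much closer to the exact value.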

The central limit theorem can be extended to the case when (1) (or the triangular array generalizing it) is formed by vectors in the $m$-dimensional Euclidean space $\mathbf R ^ {m}$. Suppose, for example, that the random vectors (1) are independent and identically distributed, that their distribution is not concentrated on any hyperplane, and that ${\mathsf E} X _ {k} = 0$ and ${\mathsf E} \| X _ {k} \| ^ {2} < \infty$ with the usual Euclidean norm in $\mathbf R ^ {m}$. Under these conditions, as $n \rightarrow \infty$, the probability distributions $P _ {n}$ of the normalized sums

$$Z _ {n} ^ { \prime } = \ \frac{X _ {1} + \dots + X _ {n} }{\sqrt n }$$

converge weakly (see Convergence of distributions) to the normal distribution $\Phi _ \Lambda$ in $\mathbf R ^ {m}$ with expectation equal to the zero vector and covariance matrix $\Lambda$ equal to that of the $X _ {k}$. Moreover, this convergence is uniform on broad classes of subsets of $\mathbf R ^ {m}$ (see [BR]). For example, it is uniform on the class $\mathfrak C$ of all convex Borel subsets of $\mathbf R ^ {m}$: as $n \rightarrow \infty$,

$$\tag{8 } \sup _ {A \in \mathfrak C } \ | P _ {n} ( A) - \Phi _ \Lambda ( A) | \rightarrow 0.$$

Under additional assumptions the rate of the convergence (8) can be estimated.
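The multidimensional statement can be illustrated for one convex set at a time. The sketch below assumes random vectors in $\mathbf R ^ {2}$ whose coordinates are independent and uniformly distributed on $[ - \sqrt 3 , \sqrt 3 ]$, so that the expectation is zero and $\Lambda$ is the identity matrix; for a disc $A$ of radius $r$ centred at the origin one then has $\Phi _ \Lambda ( A) = 1 - e ^ {- r ^ {2} /2 }$. The function names, sample sizes and seed are illustrative.

```python
import math
import random


def prob_in_disc(n, r, trials, rng):
    """Monte Carlo estimate of P{Z'_n in disc of radius r} in R^2."""
    h = math.sqrt(3.0)       # uniform on [-h, h] has variance h^2 / 3 = 1
    hits = 0
    for _ in range(trials):
        sx = sum(rng.uniform(-h, h) for _ in range(n))
        sy = sum(rng.uniform(-h, h) for _ in range(n))
        # Z'_n = (sx, sy) / sqrt(n), so ||Z'_n||^2 = (sx^2 + sy^2) / n
        if (sx * sx + sy * sy) / n <= r * r:
            hits += 1
    return hits / trials


def normal_prob_in_disc(r):
    """Probability of the same disc under the standard normal law in R^2."""
    return 1.0 - math.exp(-r * r / 2.0)


rng = random.Random(0)   # fixed seed, chosen only for reproducibility
for r in (0.5, 1.0, 2.0):
    est = prob_in_disc(20, r, 5000, rng)
    print(f"r = {r:.1f}   P_n(A) ~ {est:.3f}   Phi_Lambda(A) = {normal_prob_in_disc(r):.3f}")
```

A simulation of this kind examines individual sets only; the uniformity over all convex sets asserted in (8) is a stronger statement that it does not verify.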

The central limit theorem can also be extended to sequences (and arrays) of independent random vectors with values in infinite-dimensional spaces. In this setting the central limit theorem in the "customary" form need not hold. (Here the influence of the "geometry" of the space manifests itself, see Random element.) Of special interest is the case when the terms of (1) take values in a separable Hilbert space $H$. The assertion quoted above on the weak convergence in $\mathbf R ^ {m}$ of the distributions of the normalized sums $Z _ {n} ^ { \prime }$ to the normal distribution remains true verbatim in $H$. The convergence is uniform on comparatively narrow classes (for example, on the class of all balls with centre at the origin, or balls the centres of which lie in some fixed ball; the convergence on the class of all balls need not be uniform). Let $S _ {r}$ be the ball in $H$ of radius $r$ with centre at the origin. Here an analogue of (7) is an inequality of the following type. Suppose that

$${\mathsf E} X _ {k} = 0,\ \ {\mathsf E} \| X _ {k} \| ^ {2} < \infty ,$$

and that the distribution of the $X _ {k}$ is not concentrated on any finite-dimensional subspace of $H$; then in special cases (similar to the one analyzed in the example below)

$$\Delta _ {n} = \ \sup _ { r } \ | {\mathsf P} \{ Z _ {n} ^ { \prime } \in S _ {r} \} - \Phi _ \Lambda ( S _ {r} ) | = \ O \left ( { \frac{1}{n} } \right ) .$$

Under the condition ${\mathsf E} \| X _ {k} \| ^ {3 + \alpha } < \infty$, where $\alpha$ is a fixed number that is not too small, it can be asserted that for any $\epsilon > 0$,

$$\Delta _ {n} = O \left ( \frac{1}{n ^ {1 - \epsilon } } \right )$$

(this is true, for example, when $\alpha = 1$).

Quite specific problems, e.g. of mathematical statistics, may lead to a central limit theorem in infinite-dimensional spaces, in particular, in $H$.

Example. Let $\theta _ {1} , \theta _ {2} , \dots$ be a sequence of independent random variables that are uniformly distributed on the interval $[ 0, 1]$. Let $X _ {k} ( t)$, $k = 1, 2, \dots$, be random elements in the space $L _ {2} [ 0, 1]$ (the space of functions with integrable squares with respect to the Lebesgue measure on $[ 0, 1]$) given as follows:

$$X _ {k} ( t) = \left \{ \begin{array}{ll} - t & \textrm{ for } 0 \leq t \leq \theta _ {k} , \\ 1- t &\ \textrm{ for } \theta _ {k} < t \leq 1 . \\ \end{array} \right .$$

Then ${\mathsf E} X _ {k} ( t) = 0$, $0 \leq t \leq 1$, and

$$Z _ {n} ^ { \prime } ( t) = \ \frac{X _ {1} ( t) + \dots + X _ {n} ( t) }{\sqrt n } = \ \sqrt n ( G _ {n} ( t) - t),$$

where $G _ {n} ( t)$ is the empirical distribution function constructed from the sample $\theta _ {1} , \dots , \theta _ {n}$ of size $n$ from a uniform distribution on $[ 0, 1]$. Here the square of the norm,

$$\| Z _ {n} ^ { \prime } \| ^ {2} = \ \int\limits _ { 0 } ^ { 1 } ( Z _ {n} ^ { \prime } ( t)) ^ {2} \ dt = n \int\limits _ { 0 } ^ { 1 } ( G _ {n} ( t) - t) ^ {2} dt ,$$

coincides with the statistic $\omega _ {n} ^ {2}$ of the Cramér–von Mises–Smirnov test (see Cramér–von Mises test). In accordance with the central limit theorem there exists a limit distribution for the $\omega _ {n} ^ {2}$ as $n \rightarrow \infty$. It coincides with the distribution of the square of the norm of a certain normally-distributed vector in $H$ and is known as the "omega-squared" distribution. Thus, the central limit theorem justifies the replacement, for large $n$, of the distribution of $\omega _ {n} ^ {2}$ by the $\omega ^ {2}$ distribution, and this is at the basis of applications of the statistical tests mentioned above.
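The squared norm in this example is easy to compute from a sample. The sketch below evaluates $n \int _ {0} ^ {1} ( G _ {n} ( t) - t) ^ {2} dt$ in two ways: by the usual computing formula $1/( 12n) + \sum _ {i} ( ( 2i - 1)/( 2n) - \theta _ {(i)} ) ^ {2}$ in terms of the ordered sample $\theta _ {(1)} \leq \dots \leq \theta _ {(n)}$, and by integrating the step function piece by piece. The computing formula and the function names are additions made for illustration, not taken from the text above.

```python
import random


def omega_squared(sample):
    """n * integral_0^1 (G_n(t) - t)^2 dt via the computing formula on the ordered sample."""
    n = len(sample)
    ordered = sorted(sample)
    return 1.0 / (12 * n) + sum(
        ((2 * i - 1) / (2 * n) - t) ** 2 for i, t in enumerate(ordered, start=1)
    )


def omega_squared_by_integration(sample):
    """The same quantity, integrating the step function G_n over each interval."""
    n = len(sample)
    knots = [0.0] + sorted(sample) + [1.0]
    total = 0.0
    for i in range(n + 1):
        a, b, c = knots[i], knots[i + 1], i / n     # G_n(t) = i/n between the knots a and b
        total += ((b - c) ** 3 - (a - c) ** 3) / 3.0  # integral of (t - c)^2 over [a, b]
    return n * total


rng = random.Random(0)   # fixed seed, chosen only for reproducibility
sample = [rng.random() for _ in range(50)]
print(omega_squared(sample), omega_squared_by_integration(sample))

# Average over many samples: E n (G_n(t) - t)^2 = t (1 - t), whose integral is 1/6
values = [omega_squared([rng.random() for _ in range(50)]) for _ in range(2000)]
print(sum(values) / len(values))
```

The two functions agree up to rounding, and the average over repeated samples is close to $1/6$, the expectation of the statistic for every $n$.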

Numerous generalizations of the central limit theorem to sums of dependent variables are known. (For homogeneous finite Markov chains, the simplest non-homogeneous chains with two states, and certain other schemes, this was done by Markov himself in 1907–1911; subsequent generalizations are connected in the first instance with the name of S.N. Bernshtein [B].) A basic feature peculiar to all generalizations of this kind of the central limit theorem (if one is concerned with a triangular array) consists in the fact that the dependence between the events determined by $X _ {1} , \dots , X _ {k}$, and those determined by $X _ {k + p } , X _ {k + p + 1 } , \dots$ becomes vanishingly small when $p$ grows indefinitely.

As regards the methods of proof of the central limit theorem, in the case of independent terms the most powerful is, generally, the method of characteristic functions; it completes and occasionally replaces the so-called "method of compositions" (see [Sa]) (and also the method known as that of "metric distances"). In the case of dependent variables the most effective method, on the whole, is the method of semi-invariants (see, for example, [St]). This method is suitable for the study of functions of random variables more general than sums or linear functions (for example, for quadratic and other forms).

Concerning the central limit theorem in number theory see Number theory, probabilistic methods in. The central limit theorem is also applicable in certain problems in function theory and in the theory of dynamical systems.

#### References

[G] B.V. Gnedenko, "A course of probability theory", Moscow (1969) (In Russian)

[F] W. Feller, "An introduction to probability theory and its applications", 1–2, Wiley (1957–1971)

[Cr] H. Cramér, "Mathematical methods of statistics", Princeton Univ. Press (1946) MR0016588 Zbl 0063.01014

[GK] B.V. Gnedenko, A.N. Kolmogorov, "Limit distributions for sums of independent random variables", Addison-Wesley (1954) (Translated from Russian) MR0062975 Zbl 0056.36001

[IL] I.A. Ibragimov, Yu.V. Linnik, "Independent and stationary sequences of random variables", Wolters-Noordhoff (1971) (Translated from Russian) MR0322926 Zbl 0219.60027

[P] V.V. Petrov, "Sums of independent random variables", Springer (1975) (Translated from Russian) MR0388499 Zbl 0322.60043 Zbl 0322.60042

[Z] V.M. Zolotarev, "A generalization of the Lindeberg–Feller theorem", Theory Probab. Appl., 12 (1967) pp. 608–618; Teor. Veroyatnost. i Primenen., 12 : 4 (1967) pp. 666–677 MR0225367 Zbl 0234.60031

[R] V.I. Rotar', "An extension of the Lindeberg–Feller theorem", Math. Notes, 18 (1975) pp. 123–128; Mat. Zametki, 18 : 1 (1975) pp. 129–135 Zbl 0348.60025

[Ch] P.L. Chebyshev, "Selected works", Moscow (1955) (In Russian)

[BR] R.N. Bhattacharya, R. Ranga Rao, "Normal approximations and asymptotic expansions", Wiley (1976) MR0436272

[Sa] V.V. Sazonov, "Normal approximation: some recent advances", Springer (1981) (Translated from Russian)

[B] S.N. Bernshtein, "Collected works", 4, Moscow (1964) (In Russian)

[M] A.A. Markov, "Selected works", Moscow-Leningrad (1951) (In Russian) MR0050525 Zbl 0054.00305

[St] V.A. Statulyavichus, "?", Teor. Veroyatnost. i Primenen., 5 : 2 (1960) MR2222750

[Le] P. Lévy, "Théorie de l'addition des variables aléatoires", Gauthier-Villars (1937)