Difference between revisions of "Empirical process"
(Importing text file) |
Ulf Rehmann (talk | contribs) m (tex encoded by computer) |
||
Line 1: | Line 1: | ||
− | + | <!-- | |
+ | e1100801.png | ||
+ | $#A+1 = 59 n = 0 | ||
+ | $#C+1 = 59 : ~/encyclopedia/old_files/data/E110/E.1100080 Empirical process | ||
+ | Automatically converted into TeX, above some diagnostics. | ||
+ | Please remove this comment and the {{TEX|auto}} line below, | ||
+ | if TeX found to be correct. | ||
+ | --> | ||
− | + | {{TEX|auto}} | |
+ | {{TEX|done}} | ||
− | + | A [[Stochastic process|stochastic process]] constructed from a [[Sample|sample]] and the corresponding [[Probability measure|probability measure]]. Let $ X _ {1} \dots X _ {n} , \dots $ | |
+ | be a sequence of independent random elements with common law $ P $, | ||
+ | taking values in a [[Measurable space|measurable space]] $ ( S, {\mathcal S} ) $. | ||
+ | The empirical measure $ P _ {n} $ | ||
+ | of the first $ n $ | ||
+ | $ X _ {i} $ | ||
+ | s is the discrete random measure that places mass $ {1 / n } $ | ||
+ | on each such $ X _ {i} $: | ||
− | + | $$ | |
+ | P _ {n} ( C ) = { | ||
+ | \frac{1}{n} | ||
+ | } \# \left \{ {1 \leq i \leq n } : {X _ {i} \in C } \right \} , \quad C \in {\mathcal S}. | ||
+ | $$ | ||
− | + | Obviously, $ n P _ {n} ( C ) $ | |
+ | is binomially distributed with parameters $ n $ | ||
+ | and $ P ( C ) $( | ||
+ | cf. [[Binomial distribution|Binomial distribution]]). Hence $ {\mathsf E} P _ {n} ( C ) = P ( C ) $, | ||
+ | $ {\mathsf P} ( {\lim\limits } _ {n \rightarrow \infty } P _ {n} ( C ) = P ( C ) ) = 1 $, | ||
+ | and $ \sqrt n ( P _ {n} ( C ) - P ( C ) ) $ | ||
+ | converges in distribution, as $ n \rightarrow \infty $, | ||
+ | to a centred normal random variable with variance $ P ( C ) ( 1 - P ( C ) ) $( | ||
+ | cf. [[Convergence in distribution|Convergence in distribution]]). Therefore it is natural to define an empirical process indexed by sets by | ||
− | + | $$ \tag{a1 } | |
+ | \alpha _ {n} ( C ) = \sqrt n ( P _ {n} ( C ) - P ( C ) ) , \quad C \in {\mathcal C}, | ||
+ | $$ | ||
− | where | + | where $ {\mathcal C} \subset {\mathcal S} $. |
+ | If $ ( S, {\mathcal S} ) = ( \mathbf R, {\mathcal B} ) $ | ||
+ | and $ {\mathcal C} = \{ {( - \infty,x ] } : {x \in \mathbf R } \} $, | ||
+ | one writes $ F _ {n} ( x ) = P _ {n} ( ( - \infty,x ] ) $ | ||
+ | for the empirical distribution function, and the empirical process specializes to the classical empirical process | ||
− | + | $$ \tag{a2 } | |
+ | \alpha _ {n} ( x ) = \sqrt n ( F _ {n} ( x ) - F ( x ) ) , \quad x \in \mathbf R, | ||
+ | $$ | ||
+ | |||
+ | where $ F ( x ) = {\mathsf P} ( X _ {i} \leq x ) $, | ||
+ | $ x \in \mathbf R $, | ||
+ | is the distribution function of the elements $ X _ {i} $. | ||
+ | Replacing sets by their indicator functions leads, more generally, to the definition of an empirical process indexed by functions: | ||
+ | |||
+ | $$ \tag{a3 } | ||
+ | \alpha _ {n} ( f ) = \sqrt n ( P _ {n} ( f ) - P ( f ) ) , \quad f \in {\mathcal F}, | ||
+ | $$ | ||
where | where | ||
− | + | $$ | |
+ | P _ {n} ( f ) = \int\limits _ { S } f {d P _ {n} } = { | ||
+ | \frac{1}{n} | ||
+ | } \sum _ {i = 1 } ^ { n } f ( X _ {i} ) , | ||
+ | $$ | ||
− | + | $$ | |
+ | P ( f ) = \int\limits _ { S } f {d P } = {\mathsf E} f ( X _ {i} ) , | ||
+ | $$ | ||
− | and | + | and $ {\mathcal F} $ |
+ | is a suitable class of measurable functions from $ S $ | ||
+ | to $ \mathbf R $. | ||
− | The main aim of the theory of empirical processes is to obtain results uniformly in | + | The main aim of the theory of empirical processes is to obtain results uniformly in $ C $, |
+ | $ x $ | ||
+ | or $ f $; | ||
+ | in particular, Glivenko–Cantelli-type theorems, central limit theorems, laws of the iterated logarithm, and probability inequalities (cf., e.g., [[Empirical distribution|Empirical distribution]]; [[Central limit theorem|Central limit theorem]]; [[Law of the iterated logarithm|Law of the iterated logarithm]]). (Measurability issues will be disregarded in the sequel.) The concept of a [[Vapnik–Chervonenkis class|Vapnik–Chervonenkis class]] plays an important role in set-indexed situations. E.g., if $ {\mathcal C} $ | ||
+ | is a Vapnik–Chervonenkis class, then for every probability measure $ P $ | ||
+ | on $ ( S, {\mathcal S} ) $, | ||
− | + | $$ \tag{a4 } | |
+ | \sup _ {C \in {\mathcal C} } \left | {P _ {n} ( C ) - P ( C ) } \right | \rightarrow 0 \textrm{ a.s. } , | ||
+ | $$ | ||
− | and | + | and $ \alpha _ {n} ( C ) $, |
+ | $ C \in {\mathcal C} $, | ||
+ | converges weakly (see [[#References|[a10]]] and [[Weak topology|Weak topology]]) to $ B _ {P} ( C ) $, | ||
+ | $ C \in {\mathcal C} $, | ||
+ | a centred, bounded [[Gaussian process|Gaussian process]], which is uniformly continuous (with respect to the pseudometric $ d $ | ||
+ | defined by $ d ( C _ {1} ,C _ {2} ) = P ( C _ {1} \Delta C _ {2} ) $) | ||
+ | and has covariance structure | ||
− | + | $$ | |
+ | {\mathsf E} B _ {P} ( C _ {1} ) B _ {P} ( C _ {2} ) = P ( C _ {1} \cap C _ {2} ) - P ( C _ {1} ) P ( C _ {2} ) , | ||
+ | $$ | ||
− | + | $$ | |
+ | C _ {1} ,C _ {2} \in {\mathcal C}. | ||
+ | $$ | ||
− | For the classical empirical process in (a2), this limiting process specializes to | + | For the classical empirical process in (a2), this limiting process specializes to $ B \circ F $, |
+ | where $ B $ | ||
+ | is a Brownian bridge (cf. [[Non-parametric methods in statistics|Non-parametric methods in statistics]]). A sharp version of the first result is the following: (a4) holds if and only if | ||
− | + | $$ \tag{a5 } | |
+ | {\mathsf P} roman \AAh {\lim\limits } { | ||
+ | \frac{ { \mathop{\rm log} } \Delta ^ {\mathcal C} ( X _ {1} \dots X _ {n} ) }{n} | ||
+ | } = 0, | ||
+ | $$ | ||
where | where | ||
− | + | $$ | |
+ | \Delta ^ {\mathcal C} ( X _ {1} \dots X _ {n} ) = \# \left \{ {C \cap \{ X _ {1} \dots X _ {n} \} } : {C \in {\mathcal C} } \right \} | ||
+ | $$ | ||
− | (see [[Vapnik–Chervonenkis class|Vapnik–Chervonenkis class]]). A corresponding sharp version of the [[Central limit theorem|central limit theorem]] exists too; essentially the only change is that the | + | (see [[Vapnik–Chervonenkis class|Vapnik–Chervonenkis class]]). A corresponding sharp version of the [[Central limit theorem|central limit theorem]] exists too; essentially the only change is that the $ n $ |
+ | in the denominator of (a5) has to be replaced by $ \sqrt n $ | ||
+ | to obtain an "if and only if" condition for the central limit theorem. Other useful concepts in connection with empirical processes are various notions of [[Entropy|entropy]], see [[#References|[a12]]], [[#References|[a13]]], [[#References|[a9]]], [[#References|[a10]]]. Also, for the function-indexed process in (a3), the analogues of (a4) and the central limit theorem above have been studied thoroughly, see [[#References|[a5]]], [[#References|[a9]]], [[#References|[a10]]]. | ||
− | For the classical empirical process in (a2), approximation theorems which yield a rate of convergence in the central limit theorem are extremely useful: A sequence of Brownian bridges | + | For the classical empirical process in (a2), approximation theorems which yield a rate of convergence in the central limit theorem are extremely useful: A sequence of Brownian bridges $ \{ {B _ {n} ( t ) } : {t \in [ 0,1 ] } \} $, |
+ | $ n = 2,3, \dots $, | ||
+ | can be constructed such that for all $ \lambda > 0 $ | ||
− | + | $$ | |
+ | {\mathsf P} \left ( \sup _ {x \in \mathbf R } \left | {\alpha _ {n} ( x ) - B _ {n} ( F ( x ) ) } \right | > { | ||
+ | \frac{12 { \mathop{\rm log} } n + \lambda }{\sqrt n } | ||
+ | } \right ) \leq 2e ^ {- {\lambda / 6 } } . | ||
+ | $$ | ||
− | A similar, only slightly less sharp, result can be obtained for the situation where the joint distribution of the | + | A similar, only slightly less sharp, result can be obtained for the situation where the joint distribution of the $ B _ {n} $ |
+ | s is known, i.e., the $ B _ {n} $ | ||
+ | s are defined by means of one single Kiefer process, see [[#References|[a3]]]. | ||
Empirical and related processes have many applications in many different subfields of probability theory and (non-parametric) statistics. | Empirical and related processes have many applications in many different subfields of probability theory and (non-parametric) statistics. |
Latest revision as of 19:37, 5 June 2020
A stochastic process constructed from a sample and the corresponding probability measure. Let $ X _ {1} \dots X _ {n} , \dots $
be a sequence of independent random elements with common law $ P $,
taking values in a measurable space $ ( S, {\mathcal S} ) $.
The empirical measure $ P _ {n} $
of the first $ n $
$ X _ {i} $
s is the discrete random measure that places mass $ {1 / n } $
on each such $ X _ {i} $:
$$ P _ {n} ( C ) = { \frac{1}{n} } \# \left \{ {1 \leq i \leq n } : {X _ {i} \in C } \right \} , \quad C \in {\mathcal S}. $$
Obviously, $ n P _ {n} ( C ) $ is binomially distributed with parameters $ n $ and $ P ( C ) $( cf. Binomial distribution). Hence $ {\mathsf E} P _ {n} ( C ) = P ( C ) $, $ {\mathsf P} ( {\lim\limits } _ {n \rightarrow \infty } P _ {n} ( C ) = P ( C ) ) = 1 $, and $ \sqrt n ( P _ {n} ( C ) - P ( C ) ) $ converges in distribution, as $ n \rightarrow \infty $, to a centred normal random variable with variance $ P ( C ) ( 1 - P ( C ) ) $( cf. Convergence in distribution). Therefore it is natural to define an empirical process indexed by sets by
$$ \tag{a1 } \alpha _ {n} ( C ) = \sqrt n ( P _ {n} ( C ) - P ( C ) ) , \quad C \in {\mathcal C}, $$
where $ {\mathcal C} \subset {\mathcal S} $. If $ ( S, {\mathcal S} ) = ( \mathbf R, {\mathcal B} ) $ and $ {\mathcal C} = \{ {( - \infty,x ] } : {x \in \mathbf R } \} $, one writes $ F _ {n} ( x ) = P _ {n} ( ( - \infty,x ] ) $ for the empirical distribution function, and the empirical process specializes to the classical empirical process
$$ \tag{a2 } \alpha _ {n} ( x ) = \sqrt n ( F _ {n} ( x ) - F ( x ) ) , \quad x \in \mathbf R, $$
where $ F ( x ) = {\mathsf P} ( X _ {i} \leq x ) $, $ x \in \mathbf R $, is the distribution function of the elements $ X _ {i} $. Replacing sets by their indicator functions leads, more generally, to the definition of an empirical process indexed by functions:
$$ \tag{a3 } \alpha _ {n} ( f ) = \sqrt n ( P _ {n} ( f ) - P ( f ) ) , \quad f \in {\mathcal F}, $$
where
$$ P _ {n} ( f ) = \int\limits _ { S } f {d P _ {n} } = { \frac{1}{n} } \sum _ {i = 1 } ^ { n } f ( X _ {i} ) , $$
$$ P ( f ) = \int\limits _ { S } f {d P } = {\mathsf E} f ( X _ {i} ) , $$
and $ {\mathcal F} $ is a suitable class of measurable functions from $ S $ to $ \mathbf R $.
The main aim of the theory of empirical processes is to obtain results uniformly in $ C $, $ x $ or $ f $; in particular, Glivenko–Cantelli-type theorems, central limit theorems, laws of the iterated logarithm, and probability inequalities (cf., e.g., Empirical distribution; Central limit theorem; Law of the iterated logarithm). (Measurability issues will be disregarded in the sequel.) The concept of a Vapnik–Chervonenkis class plays an important role in set-indexed situations. E.g., if $ {\mathcal C} $ is a Vapnik–Chervonenkis class, then for every probability measure $ P $ on $ ( S, {\mathcal S} ) $,
$$ \tag{a4 } \sup _ {C \in {\mathcal C} } \left | {P _ {n} ( C ) - P ( C ) } \right | \rightarrow 0 \textrm{ a.s. } , $$
and $ \alpha _ {n} ( C ) $, $ C \in {\mathcal C} $, converges weakly (see [a10] and Weak topology) to $ B _ {P} ( C ) $, $ C \in {\mathcal C} $, a centred, bounded Gaussian process, which is uniformly continuous (with respect to the pseudometric $ d $ defined by $ d ( C _ {1} ,C _ {2} ) = P ( C _ {1} \Delta C _ {2} ) $) and has covariance structure
$$ {\mathsf E} B _ {P} ( C _ {1} ) B _ {P} ( C _ {2} ) = P ( C _ {1} \cap C _ {2} ) - P ( C _ {1} ) P ( C _ {2} ) , $$
$$ C _ {1} ,C _ {2} \in {\mathcal C}. $$
For the classical empirical process in (a2), this limiting process specializes to $ B \circ F $, where $ B $ is a Brownian bridge (cf. Non-parametric methods in statistics). A sharp version of the first result is the following: (a4) holds if and only if
$$ \tag{a5 } {\mathsf P} roman \AAh {\lim\limits } { \frac{ { \mathop{\rm log} } \Delta ^ {\mathcal C} ( X _ {1} \dots X _ {n} ) }{n} } = 0, $$
where
$$ \Delta ^ {\mathcal C} ( X _ {1} \dots X _ {n} ) = \# \left \{ {C \cap \{ X _ {1} \dots X _ {n} \} } : {C \in {\mathcal C} } \right \} $$
(see Vapnik–Chervonenkis class). A corresponding sharp version of the central limit theorem exists too; essentially the only change is that the $ n $ in the denominator of (a5) has to be replaced by $ \sqrt n $ to obtain an "if and only if" condition for the central limit theorem. Other useful concepts in connection with empirical processes are various notions of entropy, see [a12], [a13], [a9], [a10]. Also, for the function-indexed process in (a3), the analogues of (a4) and the central limit theorem above have been studied thoroughly, see [a5], [a9], [a10].
For the classical empirical process in (a2), approximation theorems which yield a rate of convergence in the central limit theorem are extremely useful: A sequence of Brownian bridges $ \{ {B _ {n} ( t ) } : {t \in [ 0,1 ] } \} $, $ n = 2,3, \dots $, can be constructed such that for all $ \lambda > 0 $
$$ {\mathsf P} \left ( \sup _ {x \in \mathbf R } \left | {\alpha _ {n} ( x ) - B _ {n} ( F ( x ) ) } \right | > { \frac{12 { \mathop{\rm log} } n + \lambda }{\sqrt n } } \right ) \leq 2e ^ {- {\lambda / 6 } } . $$
A similar, only slightly less sharp, result can be obtained for the situation where the joint distribution of the $ B _ {n} $ s is known, i.e., the $ B _ {n} $ s are defined by means of one single Kiefer process, see [a3].
Empirical and related processes have many applications in many different subfields of probability theory and (non-parametric) statistics.
References
[a1] | K.S. Alexander, "Rates of growth and sample moduli for weighted empirical processes indexed by sets" Probab. Th. Rel. Fields , 75 (1987) pp. 379–423 |
[a2] | M. Csörgő, S. Csörgő, L. Horváth, D.M. Mason, "Weighted empirical and quantile processes" Ann. of Probab. , 14 (1986) pp. 31–85 |
[a3] | M. Csörgő, P. Révész, "Strong approximations in probability and statistics" , Acad. Press (1981) |
[a4] | P. Deheuvels, D.M. Mason, "Functional laws of the iterated logarithm for the increments of empirical and quantile processes" Ann. of Probab. , 20 (1992) pp. 1248–1287 |
[a5] | R.M. Dudley, "Universal Donsker classes and metric entropy" Ann. of Probab. , 15 (1987) pp. 1306–1326 |
[a6] | J.H.J. Einmahl, "The a.s. behavior of the weighted empirical process and the LIL for the weighted tail empirical process" Ann. of Probab. , 20 (1992) pp. 681–695 |
[a7] | E. Giné, "Empirical processes and applications: an overview" Bernoulli , 2 (1996) pp. 1–28 |
[a8] | P. Massart, "The tight constant in the Dvoretzky–Kiefer–Wolfowitz inequality" Ann. of Probab. , 18 (1990) pp. 1269–1283 |
[a9] | D. Pollard, "Convergence of stochastic processes" , Springer (1984) |
[a10] | A. Sheehy, J.A. Wellner, "Uniform Donsker classes of functions" Ann. of Probab. , 20 (1992) pp. 1983–2030 |
[a11] | G.R. Shorack, J.A. Wellner, "Empirical processes with applications to statistics" , Wiley (1986) |
[a12] | K.S. Alexander, "Probability inequalities for empirical processes and a law of the iterated logarithm" Ann. of Probab. , 12 (1984) pp. 1041–1067 |
[a13] | K.S. Alexander, "Correction: Probability inequalities for empirical processes and a law of the iterated logarithm" Ann. of Probab. , 15 (1987) pp. 428–430 |
Empirical process. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Empirical_process&oldid=12746