Bootstrap method
A computer-intensive re-sampling method, introduced in statistics by B. Efron in 1979 ([a3]) for estimating the variability of statistical quantities and for setting confidence regions. The name ‘bootstrap’ refers to the analogy of pulling oneself up by one’s own bootstraps. Efron’s bootstrap is to re-sample the data: given observations $X_{1},\ldots,X_{n}$, artificial bootstrap samples are drawn with replacement from $X_{1},\ldots,X_{n}$, putting an equal probability mass of $\dfrac{1}{n}$ on $X_{i}$ for each $i \in \{ 1,\ldots,n \}$. For example, with a sample size of $n = 5$ and distinct observations $X_{1},X_{2},X_{3},X_{4},X_{5}$, one might obtain $X_{3},X_{3},X_{1},X_{5},X_{4}$ as a bootstrap sample. In fact, there are 126 distinct bootstrap samples in this case.
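As a purely illustrative sketch (not part of the original article), the re-sampling step can be written in a few lines of Python; the data vector and the random seed below are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical observed sample of size n = 5.
x = np.array([2.1, 0.4, 1.7, 3.3, 0.9])
n = len(x)

# One bootstrap sample: draw n values from x with replacement,
# so that each X_i carries probability mass 1/n.
indices = rng.integers(0, n, size=n)
x_star = x[indices]
print(x_star)
```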
A more rigorous description of Efron’s non-parametric bootstrap in a simple setting is as follows. Suppose that $(X_{1},\ldots,X_{n})$ is a random sample of size $n$ drawn from a population whose underlying probability space is $(\Omega,\mathscr{S},\mathsf{P})$, i.e., the $X_{i}$’s are independent and identically-distributed random variables on $(\Omega,\mathscr{S},\mathsf{P})$. Let $F$ denote the common cumulative distribution function (cdf) of the $X_{i}$’s, which we assume to be unknown. Let $\Theta$ be some pre-specified (non-linear) functional on the space of cdf’s; the parameter that we want to estimate is $\theta \stackrel{\text{df}}{=} \Theta(F)$. Let $T: \mathbf{R}^{n} \to \mathbf{R}$ be a statistical estimator for $\theta$ based on $(X_{1},\ldots,X_{n})$ (cf. also statistical estimation), and let $T_{n} \stackrel{\text{df}}{=} T(X_{1},\ldots,X_{n})$. The object of interest is then the cdf $G_{n}$ of the random variable $\sqrt{n} (T_{n} - \theta)$ on $(\Omega,\mathscr{S},\mathsf{P})$, defined by
$$
\forall x \in \mathbf{R}: \qquad
{G_{n}}(x) \stackrel{\text{df}}{=} \mathsf{P}(\{ \omega \in \Omega \mid \sqrt{n} [{T_{n}}(\omega) - \theta] \leq x \}).
$$
This is the cdf of $T_{n}$, properly normalized. The scaling factor $\sqrt{n}$ is a classical one, while the centering of $T_{n}$ is by the parameter $\theta$.
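For intuition, and under assumptions not made in the article (a known $F$, here an exponential distribution, with $\Theta(F)$ taken to be the mean), $G_{n}$ itself can be approximated by simulating many samples from $F$. This is the ‘real world’ distribution that the bootstrap will later try to estimate from a single sample.

```python
import numpy as np

# Sketch only: approximate G_n for the sample mean when F is known
# (hypothetically, an exponential distribution with mean theta = 1).
rng = np.random.default_rng(0)
n, n_rep = 30, 10_000
theta = 1.0                                  # Theta(F) = mean of F
samples = rng.exponential(scale=1.0, size=(n_rep, n))
roots = np.sqrt(n) * (samples.mean(axis=1) - theta)

def G_n(t):
    """Monte-Carlo approximation of G_n(t) = P(sqrt(n)(T_n - theta) <= t)."""
    return np.mean(roots <= t)

print(G_n(0.0))
```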
Efron’s non-parametric bootstrap estimator of $G_{n}$, which we denote by $G_{n}^{\ast}$, is now defined by
$$
\forall x \in \mathbf{R}, ~ \forall \omega \in \Omega: \qquad
{G_{n}^{\ast}}(x;\omega) \stackrel{\text{df}}{=}
[{\mathsf{P}_{n}^{\ast}}(\omega)](\{ \mathbf{a} \in \{ {X_{1}}(\omega),\ldots,{X_{n}}(\omega) \}^{n} \mid \sqrt{n} [[{T_{n}^{\ast}}(\omega)](\mathbf{a}) - {\theta_{n}}(\omega)] \leq x \}).
$$
Here, ${T_{n}^{\ast}}(\omega) \stackrel{\text{df}}{=} T({X_{1}^{\ast}}(\omega),\ldots,{X_{n}^{\ast}}(\omega))$, where we have the following:
- $({X_{1}^{\ast}}(\omega),\ldots,{X_{n}^{\ast}}(\omega))$ is a random sample (the bootstrap sample) drawn from the set $\{ {X_{1}}(\omega),\ldots,{X_{n}}(\omega) \}^{n}$.
- For each $i \in \{ 1,\ldots,n \}$, the random variable ${X_{i}^{\ast}}(\omega)$ is defined by $[{X_{i}^{\ast}}(\omega)](\mathbf{a}) \stackrel{\text{df}}{=} \mathbf{a}(i)$ for all $\mathbf{a} \in \{ {X_{1}}(\omega),\ldots,{X_{n}}(\omega) \}^{n}$.
The cdf of the ${X_{i}^{\ast}}(\omega)$’s is designated to be ${\hat{F}_{n}}(\bullet;\omega)$, where $\hat{F}_{n}$ denotes the empirical cdf associated with the sample $(X_{1},\ldots,X_{n})$. Note that $\hat{F}_{n}$ (which is a random step function) puts a probability mass of $\dfrac{1}{n}$ on $X_{i}$ for each $i \in \{ 1,\ldots,n \}$, and it is sometimes referred to as the re-sampling distribution. Finally, ${\mathsf{P}_{n}^{\ast}}(\omega)$ denotes the probability measure on $\{ {X_{1}}(\omega),\ldots,{X_{n}}(\omega) \}^{n}$ that corresponds to ${\hat{F}_{n}}(\bullet;\omega)$, and ${\theta_{n}}(\omega) \stackrel{\text{df}}{=} \Theta \! \left( {\hat{F}_{n}}(\bullet;\omega) \right)$.
Obviously, given an observed sample $({X_{1}}(\omega),\ldots,{X_{n}}(\omega))$, the cdf ${\hat{F}_{n}}(\bullet;\omega)$ is completely known and hence ${G_{n}^{\ast}}(\bullet;\omega)$ is also completely known (at least in principle). One may view $G_{n}^{\ast}$ as the empirical counterpart in the ‘bootstrap world’ to $G_{n}$ in the ‘real world’. In practice, an exact computation of $G_{n}^{\ast}$ is usually impractical (for an observed sample $({X_{1}}(\omega),\ldots,{X_{n}}(\omega))$ consisting of $n$ distinct numbers, there are $\displaystyle \binom{2 n - 1}{n}$ distinct bootstrap samples), but $G_{n}^{\ast}$ can be approximated by means of Monte-Carlo simulation. Efficient bootstrap simulation is discussed, for example, in [a2] and [a10].
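The Monte-Carlo approximation of $G_{n}^{\ast}$ can be sketched as follows; this is not code from the cited references, and the choice of the functional (here the mean, so that $\theta_{n} = \Theta(\hat{F}_{n})$ is the sample mean) and the number of bootstrap replications are hypothetical.

```python
import numpy as np

def bootstrap_cdf_estimate(x, statistic, n_boot=2000, seed=0):
    """Monte-Carlo approximation of G_n^*: return the sorted values of
    sqrt(n) * (T_n^* - theta_n) over n_boot bootstrap samples."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    theta_n = statistic(x)                  # theta_n = Theta(F_hat_n)
    roots = np.empty(n_boot)
    for b in range(n_boot):
        x_star = rng.choice(x, size=n, replace=True)
        roots[b] = np.sqrt(n) * (statistic(x_star) - theta_n)
    return np.sort(roots)

def G_star(roots, t):
    """Evaluate the approximated bootstrap cdf at t."""
    return np.searchsorted(roots, t, side="right") / len(roots)

# Hypothetical usage with the sample mean as the statistic.
x = np.random.default_rng(1).exponential(size=50)
roots = bootstrap_cdf_estimate(x, np.mean)
print(G_star(roots, 0.0))                   # approximate value of G_n^*(0)
```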
When does Efron’s bootstrap work? The consistency of the bootstrap approximation $G_{n}^{\ast}$, viewed as an estimate of $G_{n}$, i.e., the requirement that
$$
\sup_{x \in \mathbf{R}} |{G_{n}^{\ast}}(x;\bullet) - {G_{n}}(x)| \stackrel{\mathsf{P}}{\longrightarrow} 0 \qquad (\text{convergence in probability}),
$$
is generally viewed as an absolute prerequisite for Efron’s bootstrap to work in the problem at hand. Of course, bootstrap consistency is only a first-order asymptotic result, and the error committed when $G_{n}$ is estimated by $G_{n}^{\ast}$ may still be quite large in finite samples. Second-order asymptotics (cf. Edgeworth series) enables one to investigate the speed at which $\displaystyle \sup_{x \in \mathbf{R}} |{G_{n}^{\ast}}(x;\bullet) - {G_{n}}(x)|$ converges to $0$ in probability, and also to identify cases where the rate of convergence is faster than $\dfrac{1}{\sqrt{n}}$, the classical Berry-Esseen-type rate for the normal approximation. An example in which the bootstrap possesses the beneficial property of being more accurate than the traditional normal approximation is the Student $t$-statistic and, more generally, Studentized statistics. For this reason, the use of bootstrapped Studentized statistics for setting confidence intervals is strongly advocated in a number of important problems. A general reference is [a7].
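A minimal sketch of a bootstrap-t (Studentized) confidence interval for the mean, assuming the usual standard error $s/\sqrt{n}$; the function name and the default settings are hypothetical and are not taken from [a7].

```python
import numpy as np

def bootstrap_t_interval(x, alpha=0.05, n_boot=2000, seed=0):
    """Bootstrap-t confidence interval for the mean: bootstrap the
    Studentized root (mean* - mean) / se* and invert its quantiles."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    mean, se = x.mean(), x.std(ddof=1) / np.sqrt(n)
    t_star = np.empty(n_boot)
    for b in range(n_boot):
        xs = rng.choice(x, size=n, replace=True)
        t_star[b] = (xs.mean() - mean) / (xs.std(ddof=1) / np.sqrt(n))
    lo, hi = np.quantile(t_star, [alpha / 2, 1 - alpha / 2])
    # Note the reversal of the quantiles when inverting the root.
    return mean - hi * se, mean - lo * se

x = np.random.default_rng(2).gamma(shape=2.0, size=40)
print(bootstrap_t_interval(x))
```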
When does the bootstrap fail? It has been proved in [a1] that in the case of the mean, Efron’s bootstrap fails when $F$ is in the domain of attraction of an $\alpha$-stable law with $0 < \alpha < 2$. However, by re-sampling from $\hat{F}_{n}$ with a (smaller) re-sample size $m(n)$ that satisfies $m(n) \to \infty$ and $\dfrac{m(n)}{n} \to 0$ as $n \to \infty$, it can be shown that the (modified) bootstrap works. More generally, in recent years, the importance of a proper choice of the re-sampling distribution has become clear (see [a5], [a9] and [a10]).
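A sketch of the m-out-of-n modification described above; the statistic (here the median, for which the $\sqrt{m}$ normalization is appropriate even for heavy-tailed data) and the rule $m = \lfloor \sqrt{n} \rfloor$ are illustrative choices, not prescriptions from [a1].

```python
import numpy as np

def m_out_of_n_bootstrap(x, statistic, m, n_boot=2000, seed=0):
    """m-out-of-n bootstrap: re-sample only m < n observations with
    replacement and return the roots sqrt(m) * (T_m^* - theta_n)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    theta_n = statistic(x)
    return np.array([
        np.sqrt(m) * (statistic(rng.choice(x, size=m, replace=True)) - theta_n)
        for _ in range(n_boot)
    ])

# Hypothetical usage: heavy-tailed data, m = floor(sqrt(n)).
x = np.random.default_rng(3).standard_cauchy(size=400)
m = int(np.sqrt(len(x)))
roots = m_out_of_n_bootstrap(x, np.median, m)
print(np.quantile(roots, [0.05, 0.95]))
```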
The bootstrap can be an effective tool in many problems of statistical inference, for example, the construction of a confidence band in non-parametric regression, testing for the number of modes of a density, or the calibration of confidence bounds (see [a2], [a4] and [a8]). Re-sampling methods for dependent data, such as the block bootstrap, are another important topic of recent research (see [a2] and [a6]).
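A sketch of one replicate of the moving-block bootstrap for a stationary series; the block length and the AR(1)-type example below are hypothetical, and in practice the block length must be tuned to the dependence structure of the data (see [a2] and [a6]).

```python
import numpy as np

def moving_block_bootstrap(x, block_length, seed=0):
    """One moving-block bootstrap replicate: concatenate randomly chosen
    overlapping blocks of fixed length until the sample size is reached."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    n_blocks = int(np.ceil(n / block_length))
    starts = rng.integers(0, n - block_length + 1, size=n_blocks)
    return np.concatenate([x[s:s + block_length] for s in starts])[:n]

# Hypothetical usage: an AR(1)-type series with block length 10.
rng = np.random.default_rng(4)
e = rng.standard_normal(500)
x = np.empty(500)
x[0] = e[0]
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + e[t]
x_star = moving_block_bootstrap(x, block_length=10)
print(x.mean(), x_star.mean())
```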
References
[a1] K.B. Athreya, “Bootstrap of the mean in the infinite variance case”, Ann. Statist., 15 (1987), pp. 724–731.
[a2] A.C. Davison, D.V. Hinkley, “Bootstrap methods and their application”, Cambridge Univ. Press (1997).
[a3] B. Efron, “Bootstrap methods: another look at the jackknife”, Ann. Statist., 7 (1979), pp. 1–26.
[a4] B. Efron, R.J. Tibshirani, “An introduction to the bootstrap”, Chapman & Hall (1993).
[a5] E. Giné, “Lectures on some aspects of the bootstrap”, in P. Bernard (ed.), Ecole d'Eté de Probab. Saint Flour XXVI-1996, Lecture Notes Math., 1665, Springer (1997).
[a6] F. Götze, H.R. Künsch, “Second order correctness of the blockwise bootstrap for stationary observations”, Ann. Statist., 24 (1996), pp. 1914–1933.
[a7] P. Hall, “The bootstrap and Edgeworth expansion”, Springer (1992).
[a8] E. Mammen, “When does bootstrap work? Asymptotic results and simulations”, Lecture Notes Statist., 77, Springer (1992).
[a9] H. Putter, W.R. van Zwet, “Resampling: consistency of substitution estimators”, Ann. Statist., 24 (1996), pp. 2297–2318.
[a10] J. Shao, D. Tu, “The jackknife and bootstrap”, Springer (1995).
Bootstrap method. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Bootstrap_method&oldid=11753