# Bootstrap asymptotics

This article, Bootstrap Asymptotics, was adapted from an original article by Rudolf J. Beran, which appeared in StatProb: The Encyclopedia Sponsored by Statistics and Probability Societies. The original article ([http://statprob.com/encyclopedia/BootstrapAsymptotics.html StatProb Source]) is copyrighted by the author(s). The article has been donated to Encyclopedia of Mathematics, and its further issues are under the Creative Commons Attribution Share-Alike License. All pages from StatProb are contained in the Category StatProb.

2010 Mathematics Subject Classification: Primary: 62F40 Secondary: 62E20

$\def\thetahat{\hat{\theta}}$ $\def\chat{\hat{c}}$ $\def\Xbar{\bar{X}}$ $\def\E{\textrm{E}}$

Rudolf Beran

Distinguished Professor Emeritus, Department of Statistics

University of California, Davis, CA 95616, USA

E-mail: beran@wald.ucdavis.edu

The bootstrap, introduced by Efron (1979), merges simulation with formal model-based statistical inference. A statistical model for a sample $X_n$ of size $n$ is a family of distributions $\{P_{\theta,n} \colon \theta \in \Theta\}$. The parameter space $\Theta$ is typically metric, possibly infinite-dimensional. The value of $\theta$ that identifies the true distribution from which $X_n$ is drawn is unknown. Suppose that $\thetahat_n = \thetahat_n(X_n)$ is a consistent estimator of $\theta$. The bootstrap idea is:

(a) Create an artificial \textsl{bootstrap world} in which the true parameter value is $\thetahat_n$ and the sample $X_n^*$ is generated from the fitted model $P_{\thetahat_n,n}$. That is, the conditional distribution of $X_n^*$, given the data $X_n$, is $P_{\thetahat_n,n}$.

(b) Act as if a sampling distribution computed in the fully known bootstrap world is a trustworthy approximation to the corresponding, but unknown, sampling distribution in the model world.

For example, consider constructing a confidence set for a parametric function $\tau(\theta)$, whose range is the set $T$. As in the classical pivotal method, let $R_n(X_n,\tau(\theta))$ be a specified \textsl{root}, a real-valued function of the sample and $\tau(\theta)$. Let $H_n(\theta)$ be the sampling distribution of the root under the model. The \textsl{bootstrap distribution} of the root is $H_n(\thetahat_n)$, a random probability measure that can also be viewed as the conditional distribution of $R_n(X_n^*, \tau(\thetahat_n))$ given the sample $X_n$. An associated \textsl{bootstrap confidence set} for $\tau(\theta)$, of nominal coverage probability $\beta$, is then $C_{n,B} = \{t \in T \colon R_n(X_n,t) \le H_n^{-1}(\beta, \thetahat_n)\}$. The quantile on the right can be approximated, for instance, by Monte Carlo techniques. The intuitive expectation is that the coverage probability of $C_{n,B}$ will be close to $\beta$ whenever $\thetahat_n$ is close to $\theta$.
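The Monte Carlo approximation of the quantile can be sketched as follows. This is a minimal illustration under assumptions that are not in the original article: the model is i.i.d. $N(\theta, 1)$, the parametric function is $\tau(\theta) = \theta$, the root is $R_n(X_n, t) = n^{1/2}|\Xbar_n - t|$, and the function names `root` and `bootstrap_interval` are hypothetical. In this simple model the root is exactly pivotal, so the bootstrap distribution does not depend on $\thetahat_n$; the code only shows the mechanics of steps (a) and (b).

```python
import numpy as np

def root(sample, t):
    """Root R_n(X_n, t) = sqrt(n) * |mean(X_n) - t| (illustrative choice)."""
    n = sample.shape[-1]
    return np.sqrt(n) * np.abs(sample.mean(axis=-1) - t)

def bootstrap_interval(x, beta=0.95, n_boot=5000, seed=0):
    """Parametric bootstrap confidence set for the mean of N(theta, 1) data."""
    rng = np.random.default_rng(seed)
    n = x.size
    theta_hat = x.mean()  # consistent estimator of theta
    # Bootstrap world: draw samples X_n^* from the fitted model N(theta_hat, 1).
    x_star = rng.normal(loc=theta_hat, scale=1.0, size=(n_boot, n))
    # Monte Carlo draws from the bootstrap distribution of the root.
    roots_star = root(x_star, theta_hat)
    # Approximate the beta-quantile H_n^{-1}(beta, theta_hat).
    q = np.quantile(roots_star, beta)
    # {t : sqrt(n)|xbar - t| <= q} is an interval centered at xbar.
    half_width = q / np.sqrt(n)
    return theta_hat - half_width, theta_hat + half_width

x = np.random.default_rng(1).normal(loc=2.0, scale=1.0, size=50)
lower, upper = bootstrap_interval(x)
```

For this root the confidence set inverts to an interval around $\Xbar_n$; for other roots the set $\{t \in T \colon R_n(X_n,t) \le q\}$ may have to be found numerically.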

When does the bootstrap approach work? Bootstrap samples are perturbations of the data from which they are generated. If the goal is to probe how a statistical procedure performs on data sets similar to the one at hand, then repeating the statistical procedure on bootstrap samples stands to be instructive. An exploratory rationale for the bootstrap appeals intellectually when empirically supported probability models for the data are lacking. Indeed, the literature on "statistical inference" continues to struggle with an uncritical tendency to view data as a \textsl{random} sample from a statistical model \textsl{known} to the statistician apart from parameter values. In discussing the history of probability theory, Doob (1972) described the mysterious interplay between probability models and physical phenomena: "But deeper and subtler investigations had to await until the blessing and curse of direct physical significance had been replaced by the bleak reliability of abstract mathematics."

Efron (1979) and most of the subsequent bootstrap literature postulate that the statistical model $\{P_{\theta,n} \colon \theta \in \Theta\}$ for the data is credible. "The bootstrap works" is taken to mean that bootstrap distributions, and interesting functionals thereof, converge in probability to the correct limits as sample size $n$ increases. The convergence is typically established pointwise for each value of $\theta$ in the parameter space $\Theta$. A template argument: Suppose that $\Theta$ is metric and that (a) $\thetahat_n \rightarrow \theta$ in $P_{\theta, n}$-probability as $n \rightarrow \infty$; (b) for any sequence $\{\theta_n \in \Theta\}$ that converges to $\theta$, $H_n (\theta_n) \Rightarrow H(\theta)$. Then $H_n(\thetahat_n) \Rightarrow H(\theta)$ in $P_{\theta,n}$-probability. Moreover, any weakly continuous functional of the bootstrap distribution converges in probability to the value of that functional at the limit distribution.
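One way to see why conditions (a) and (b) suffice is a subsequence argument. The following is only a sketch, under assumptions not stated in the original: $d$ denotes some metric that metrizes weak convergence, the samples for different $n$ are defined on a common probability space, and measurability questions are set aside.

$$
\thetahat_n \rightarrow \theta \ \text{in probability}
\;\Longrightarrow\;
\text{every subsequence has a further subsequence } \{n_k\} \text{ with } \thetahat_{n_k} \rightarrow \theta \ \text{almost surely},
$$

$$
\thetahat_{n_k} \rightarrow \theta \ \text{almost surely}
\;\overset{\text{(b)}}{\Longrightarrow}\;
d\bigl(H_{n_k}(\thetahat_{n_k}), H(\theta)\bigr) \rightarrow 0 \ \text{almost surely}.
$$

Since every subsequence has a further subsequence along which $d(H_n(\thetahat_n), H(\theta))$ tends to zero almost surely, this distance tends to zero in probability, which is the stated conclusion.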

Such equicontinuity reasoning, in various formulations, is widespread in the literature on bootstrap convergence. For statistical models of practical interest, considerable insight may be needed to devise a metric on $\Theta$ such that the template sufficient conditions both hold. Some early papers on bootstrap convergence after Efron (1979) are Bickel and Freedman (1981), Hall (1986), and Beran (1987). Broader references are the books and monographs by Hall (1992), Mammen (1992), Efron and Tibshirani (1993), and Davison and Hinkley (1997), and the review articles in the bootstrap issue of Statistical Science 18 (2003).

These references leave the impression that bootstrap methods often work, in the sense of correct pointwise asymptotic convergence or pointwise second-order accuracy, at every $\theta$ in the parameter space $\Theta$. Counter-examples to this impression have prompted further investigations. One line of research has established necessary and sufficient conditions for correct pointwise convergence of bootstrap distributions as $n$ tends to infinity (cf. Beran (1997), van Zwet and van Zwet (1999)).

In another direction, Putter (1994) showed: Suppose that the parameter space $\Theta$ is complete metric and that (a) $H_n(\theta) \Rightarrow H(\theta)$ for every $\theta \in \Theta$ as $n \rightarrow \infty$; (b) $H_n(\theta)$ is continuous in $\theta$, in the topology of weak convergence, for every $n \ge 1$; (c) $\thetahat_n \rightarrow \theta$ in $P_{\theta, n}$-probability for every $\theta \in \Theta$ as $n \rightarrow \infty$. Then $H_n(\thetahat_n) \Rightarrow H(\theta)$ in $P_{\theta, n}$-probability for "almost all" $\theta \in \Theta$. The technical definition of "almost all" is a set of Baire category II. While "almost all" $\theta$ may sound harmless, the failure of bootstrap convergence on a tiny set in the parameter space typically stems from non-uniform convergence of bootstrap distributions over neighborhoods of that set. When that is the case, pointwise limits are highly deceptive.

To see this concretely, let $\thetahat_{n,S}$ denote the James-Stein estimator for an unknown $p$-dimensional mean vector $\theta$ on which we have $n$ i.i.d. observations, each having a $N(0,I_p)$ error. Let $H_n(\theta)$ be the sampling distribution of the root $n^{1/2}(\thetahat_{n,S} - \theta)$ under this model. As $n$ tends to infinity with $p \ge 3$ fixed, we find (cf. Beran (1997)):

(a) The natural bootstrap distribution $H_n(\Xbar_n)$, where $\Xbar_n$ is the sample mean vector, converges correctly almost everywhere on the parameter space, except at $\theta = 0$. A similar failure occurs for the bootstrap distribution $H_n(\thetahat_{n,S})$.

(b) The weak convergences of the sampling distribution $H_n(\theta)$ and of the two bootstrap distributions just described are \textsl{not} uniform over neighborhoods of the point of bootstrap failure, $\theta = 0$.

(c) The exact quadratic risk of the James-Stein estimator strictly dominates that of $\Xbar_n$ at \textsl{every} $\theta$, especially at $\theta =0$. If the dimension $p$ is held fixed, the region of substantial dominance in risk shrinks towards $\theta = 0$ as $n$ increases. The asymptotic risk of the James-Stein estimator dominates that of the sample mean only at $\theta = 0$. That the dominance is strict for every finite $n \ge 1$ is missed by the non-uniform limit. Apt in describing non-uniform limits is George Berkeley's celebrated comment on infinitesimals: "ghosts of departed quantities."
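The contrast between items (a) and (c) can be checked numerically. The sketch below is a Monte Carlo illustration, not taken from the original article. It assumes the standard form of the James-Stein estimator for unit error covariance, $\thetahat_{n,S} = (1 - (p-2)/(n|\Xbar_n|^2))\Xbar_n$, and it simulates the sample mean directly from $\Xbar_n \sim N(\theta, I_p/n)$. The function names (`james_stein`, `normalized_risk`, `demo`) and the settings $p = 10$, $n = 100$ are illustrative choices. The quantity examined is the normalized quadratic risk $n\E|\thetahat_{n,S} - \theta|^2$, which equals 2 at $\theta = 0$ for any $p \ge 3$ and approaches $p$ far from the origin.

```python
import numpy as np

def james_stein(xbar, n):
    """James-Stein estimate (1 - (p-2)/(n|xbar|^2)) * xbar; xbar has shape (..., p)."""
    p = xbar.shape[-1]
    norm_sq = np.sum(xbar**2, axis=-1, keepdims=True)
    return (1.0 - (p - 2) / (n * norm_sq)) * xbar

def normalized_risk(theta, n, reps, rng):
    """Monte Carlo estimate of n * E|thetahat_S - theta|^2 when the true mean is theta."""
    p = theta.shape[0]
    # The sample mean of n i.i.d. N(theta, I_p) vectors is N(theta, I_p / n).
    xbar = theta + rng.standard_normal((reps, p)) / np.sqrt(n)
    err = james_stein(xbar, n) - theta
    return float(n * np.mean(np.sum(err**2, axis=1)))

def demo(p=10, n=100, reps=100_000, n_datasets=20, seed=0):
    rng = np.random.default_rng(seed)
    zero = np.zeros(p)
    # True normalized risk at theta = 0.
    true_risk = normalized_risk(zero, n, reps, rng)
    # Natural bootstrap: plug the observed sample mean in as the "true" parameter,
    # repeated over several simulated data sets drawn at theta = 0.
    boot_risks = []
    for _ in range(n_datasets):
        xbar_obs = zero + rng.standard_normal(p) / np.sqrt(n)
        boot_risks.append(normalized_risk(xbar_obs, n, reps // 5, rng))
    # Normalized risk far from the origin, where shrinkage is negligible.
    far_risk = normalized_risk(np.full(p, 5.0), n, reps, rng)
    return true_risk, float(np.mean(boot_risks)), far_risk

true_risk, boot_risk, far_risk = demo()
```

With these settings `true_risk` should come out near 2 and `far_risk` near $p = 10$, while the average bootstrap risk estimate at $\theta = 0$ should sit well above 2. The reason is that $n^{1/2}\Xbar_n$ remains of order one at $\theta = 0$ rather than converging to zero, so the bootstrap world behaves like a point at a random local distance from the origin.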

In the James-Stein example, correct pointwise convergence of bootstrap distributions as $n$ tends to infinity is an inadequate "bootstrap works" concept, doomed by lack of uniform convergence. The example provides a leading instance of an estimator that dominates classical counterparts in risk and fails to bootstrap naively. The message extends farther. Stein (1956, first section) already noted that multiple shrinkage estimators, which apply different shrinkage factors to the summands in a projective decomposition of the mean vector, are "better for most practical purposes." Stein (1966) developed multiple shrinkage estimators in detail. In recent years, low risk multiple shrinkage estimators have been constructed implicitly through regularization techniques, among them adaptive penalized least squares with quadratic penalties, adaptive submodel selection, or adaptive symmetric linear estimators. Naive bootstrapping of such modern estimators fails as it does in the James-Stein case.

Research into these difficulties has taken two paths: (a) devising bootstrap patches that fix \textsl{pointwise} convergence of bootstrap distributions as the number of replications $n$ tends to infinity (cf. Beran (1997) for examples and references to the literature); (b) studying bootstrap procedures under asymptotics in which the dimension $p$ of the parameter space increases while $n$ is held fixed or increases. Large $p$ bootstrap asymptotics turn out to be uniform over usefully large subsets of the parameter space and yield effective bootstrap confidence sets around the James-Stein estimator and other regularization estimators (cf. Beran (1995), Beran and Dümbgen (1998)). The first section of Stein (1956) foreshadowed the role of large $p$ asymptotics in studies of modern estimators.

How to Cite This Entry:
Bootstrap asymptotics. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Bootstrap_asymptotics&oldid=37732