%%% Title of object: Bootstrap Asymptotics
%%% Canonical Name: BootstrapAsymptotics
%%% Type: Topic
%%% Created on: 2010-08-30 21:22:52
%%% Modified on: 2010-08-30 21:22:52
%%% Creator: statprobberan
%%% Modifier: nicholst
%%%
%%% Classification: msc:62F40, msc:62E20
%%% Keywords: bootstrap success, bootstrap failure, bootstrap confidence sets, regularization estimators, dimensional asymptotics
%%% Synonyms: Bootstrap Asymptotics=Resampling Asymptotics
%%% Preamble:
% this is the default PlanetMath preamble. as your knowledge
% of TeX increases, you will probably want to edit this, but
% it should be fine as is for beginners.
% almost certainly you want these
\documentclass[10pt]{article}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{amsfonts}
% used for TeXing text within eps files
%\usepackage{psfrag}
% need this for including graphics (\includegraphics)
%\usepackage{graphicx}
% for neatly defining theorems and propositions
%\usepackage{amsthm}
% making logically defined graphics
%\usepackage{xypic}
% there are many more packages, add them here as you need them
% define commands here
%%Content:
\usepackage{amsmath,amsthm,amssymb}
\usepackage[dvips]{graphicx}
\theoremstyle{plain}
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}{Lemma}
\newtheorem{corollary}{Corollary}
\newtheorem{proposition}{Proposition}
\theoremstyle{definition}
\newtheorem{definition}{Definition}
\theoremstyle{remark}
\newtheorem{remark}{Remark}
\theoremstyle{remark}
\newtheorem{example}{Example}
\renewcommand{\figurename}{Fig.}
\def\thetahat{\hat{\theta}}
\def\chat{\hat{c}}
\def\Xbar{\bar{X}}
\def\E{\textrm{E}}
\oddsidemargin 16.5mm
\evensidemargin 16.5mm
\textwidth 28cc
\textheight 42cc
\parskip .5mm
\parindent 2cc
\begin{document}
\begin{center}
{\bf \large BOOTSTRAP ASYMPTOTICS}\\[4mm]
{\large \it Rudolf Beran \footnote{Rudolf Beran was Department Chair at UC
Davis (2003--07) and at UC Berkeley (1986--89). He received in 2006 the
Memorial Medal of the Faculty of Mathematics and Physics, Charles
University, Prague, in recognition of ``distinguished and wide-ranging
achievements in mathematical statistics, $\ldots$ devoted service to the
international statistical community, and a long-lasting
collaboration with Czech statisticians.'' During 1997--99 he held an
Alexander von Humboldt U.S.\ Senior Scientist Award at Heidelberg
University. He has authored or co-authored over 100 papers in international
journals and published lecture notes (with G.\ R.\ Ducharme) on
\textit{Asymptotic Theory for Bootstrap Methods in Statistics}
(Publications CRM, Universit\'{e} de Montr\'{e}al, 1991).}}\\[2mm]
Distinguished Professor Emeritus, Department of Statistics\\
University of California, Davis, CA 95616, USA\\
E-mail: beran@wald.ucdavis.edu
\end{center}
The bootstrap, introduced by Efron (1979), merges simulation with formal
model-based statistical inference. A statistical model for a sample $X_n$
of size $n$ is a family of distributions $\{P_{\theta,n} \colon \theta \in
\Theta\}$. The parameter space $\Theta$ is typically metric, possibly
infinite-dimensional. The value of $\theta$ that identifies the true
distribution from which $X_n$ is drawn is unknown. Suppose that
$\thetahat_n = \thetahat_n(X_n)$ is a consistent estimator of $\theta$. The
bootstrap idea is:

(a) Create an artificial \textsl{bootstrap world} in which the true
parameter value is $\thetahat_n$ and the sample $X_n^*$ is generated from
the fitted model $P_{\thetahat_n,n}$. That is, the conditional distribution
of $X_n^*$, given the data $X_n$, is $P_{\thetahat_n,n}$.

(b) Act as if a sampling distribution computed in the fully known
bootstrap world is a trustworthy approximation to the corresponding, but
unknown, sampling distribution in the model world.

For example, consider constructing a confidence set for a parametric
function $\tau(\theta)$, whose range is the set $T$. As in the classical
pivotal method, let $R_n(X_n,\tau(\theta))$ be a specified \textsl{root}, a
real-valued function of the sample and $\tau(\theta)$. Let $H_n(\theta)$ be
the sampling distribution of the root under the model. The
\textsl{bootstrap distribution} of the root is $H_n(\thetahat_n)$, a random
probability measure that can also be viewed as the conditional distribution
of $R_n(X_n^*, \tau(\thetahat_n))$ given the sample $X_n$. An associated
\textsl{bootstrap confidence set} for $\tau(\theta)$, of nominal coverage
probability $\beta$, is then $C_{n,B} = \{t \in T \colon R_n(X_n,t) \le
H_n^{-1}(\beta, \thetahat_n)\}$. The quantile on the right can be
approximated, for instance, by Monte Carlo techniques. The intuitive
expectation is that the coverage probability of $C_{n,B}$ will be close to
$\beta$ whenever $\thetahat_n$ is close to $\theta$.
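The construction above can be sketched in a short simulation. The Gaussian location model $N(\theta, 1)$, the root $R_n = n^{1/2}|\Xbar_n - \theta|$, and all Monte Carlo settings below are illustrative assumptions, not part of the discussion above; the sketch shows steps (a) and (b) together with the Monte Carlo approximation of the quantile $H_n^{-1}(\beta, \thetahat_n)$.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(x, beta=0.95, B=1000):
    """Root-based bootstrap confidence interval for a normal mean theta,
    assuming the model N(theta, 1) and root R_n = sqrt(n)*|mean(X_n) - theta|."""
    n = len(x)
    theta_hat = x.mean()
    # (a) Bootstrap world: B samples X_n* drawn from the fitted model
    #     P_{theta_hat, n}, i.e. n i.i.d. N(theta_hat, 1) draws each.
    boot = rng.normal(theta_hat, 1.0, size=(B, n))
    # (b) Bootstrap distribution H_n(theta_hat) of the root; its
    #     beta-quantile is approximated by Monte Carlo.
    roots = np.sqrt(n) * np.abs(boot.mean(axis=1) - theta_hat)
    c = np.quantile(roots, beta)
    # For this root, C_{n,B} = {t : R_n(X_n, t) <= c} is an interval.
    return theta_hat - c / np.sqrt(n), theta_hat + c / np.sqrt(n)

# Coverage check: the fraction of intervals containing theta should be
# close to the nominal beta = 0.95.
theta, n_obs = 1.7, 50
hits = 0
for _ in range(400):
    lo, hi = bootstrap_ci(rng.normal(theta, 1.0, n_obs))
    hits += lo <= theta <= hi
print(hits / 400)
```

In this toy model the interval reduces to the classical one, which is exactly why the coverage lands near the nominal level; the point of the sketch is only the mechanics of resampling from $P_{\thetahat_n,n}$ and reading off a bootstrap quantile.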

When does the bootstrap approach work? Bootstrap samples are perturbations
of the data from which they are generated. If the goal is to probe how a
statistical procedure performs on data sets similar to the one at hand,
then repeating the statistical procedure on bootstrap samples stands to be
instructive. An exploratory rationale for the bootstrap appeals
intellectually when empirically supported probability models for the data
are lacking. Indeed, the literature on ``statistical inference'' continues
to struggle with an uncritical tendency to view data as a \textsl{random}
sample from a statistical model \textsl{known} to the statistician apart
from parameter values. In discussing the history of probability theory,
Doob (1972) described the mysterious interplay between probability models
and physical phenomena: ``But deeper and subtler investigations had to
await until the blessing and curse of direct physical significance had been
replaced by the bleak reliability of abstract mathematics.''

Efron (1979) and most of the subsequent bootstrap literature postulate that
the statistical model $\{P_{\theta,n} \colon \theta \in \Theta\}$ for the
data is credible. ``The bootstrap works'' is taken to mean that bootstrap
distributions, and interesting functionals thereof, converge in probability
to the correct limits as sample size $n$ increases. The convergence is
typically established pointwise for each value of $\theta$ in the parameter
space $\Theta$. A template argument: Suppose that $\Theta$ is metric and
that (a) $\thetahat_n \rightarrow \theta$ in $P_{\theta, n}$-probability as
$n \rightarrow \infty$; (b) for any sequence $\{\theta_n \in \Theta\}$ that
converges to $\theta$, $H_n (\theta_n) \Rightarrow H(\theta)$. Then
$H_n(\thetahat_n) \Rightarrow H(\theta)$ in $P_{\theta,n}$-probability.
Moreover, any weakly continuous functional of the bootstrap distribution
converges in probability to the value of that functional at the limit
distribution.
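The template can be seen at work in a small parametric-bootstrap simulation. The model (i.i.d.\ exponential observations with mean $\theta$), the root $n^{1/2}(\Xbar_n - \theta)$, and the Monte Carlo settings are illustrative assumptions; by the central limit theorem the limit law is $H(\theta) = N(0, \theta^2)$, so the template predicts that quantiles of the bootstrap distribution $H_n(\thetahat_n)$ approach those of $N(0, \theta^2)$.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, B = 2.0, 400, 4000

# Model world: X_n i.i.d. exponential with mean theta; theta_hat = sample mean
# is consistent, satisfying condition (a) of the template.
x = rng.exponential(theta, n)
theta_hat = x.mean()

# Bootstrap world: resample from the fitted model P_{theta_hat, n} and
# evaluate the root sqrt(n) * (mean - theta_hat) in each replication.
boot = rng.exponential(theta_hat, size=(B, n))
roots = np.sqrt(n) * (boot.mean(axis=1) - theta_hat)

# Compare the 0.95-quantile of H_n(theta_hat) with that of the CLT limit
# H(theta) = N(0, theta^2), namely z_{0.95} * theta.
q_boot = np.quantile(roots, 0.95)
q_limit = 1.6449 * theta
print(q_boot, q_limit)
```

The two quantiles agree up to Monte Carlo noise and the $O(n^{-1/2})$ error of the normal approximation, which is the pointwise convergence $H_n(\thetahat_n) \Rightarrow H(\theta)$ in action for a weakly continuous functional (a quantile of the limit's continuity set).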

Such equicontinuity reasoning, in various formulations, is widespread in
the literature on bootstrap convergence. For statistical models of
practical interest, considerable insight may be needed to devise a metric
on $\Theta$ such that the template sufficient conditions both hold. Some
early papers on bootstrap convergence after Efron (1979) are Bickel and
Freedman (1981), Hall (1986), Beran (1987). Broader references are the
books and monographs by Hall (1992), Mammen (1992), Efron and Tibshirani
(1993), Davison and Hinkley (1997) and the review articles in the bootstrap
issue of \textit{Statistical Science} \textbf{18} (2003).

These references leave the impression that bootstrap methods often work, in
the sense of correct pointwise asymptotic convergence or pointwise
second-order accuracy, at every $\theta$ in the parameter space $\Theta$.
Counter-examples to this impression have prompted further investigations.
One line of research has established necessary and sufficient conditions
for correct pointwise convergence of bootstrap distributions as $n$ tends
to infinity (cf.\ Beran (1997), van Zwet and van Zwet (1999)).

In another direction, Putter (1994) showed: Suppose that the parameter
space $\Theta$ is complete metric and that (a) $H_n(\theta) \Rightarrow
H(\theta)$ for every $\theta \in \Theta$ as $n \rightarrow \infty$; (b)
$H_n(\theta)$ is continuous in $\theta$, in the topology of weak
convergence, for every $n \ge 1$; (c) $\thetahat_n \rightarrow \theta$ in
$P_{\theta, n}$-probability for every $\theta \in \Theta$ as $n \rightarrow
\infty$. Then $H_n(\thetahat_n) \Rightarrow H(\theta)$ in $P_{\theta,
n}$-probability for ``almost all'' $\theta \in \Theta$. Technically,
``almost all'' means every $\theta$ outside an exceptional set of the
first Baire category. While ``almost all'' $\theta$ may sound harmless,
the failure of bootstrap convergence on
a tiny set in the parameter space typically stems from non-uniform
convergence of bootstrap distributions over neighborhoods of that set. When
that is the case, pointwise limits are highly deceptive.

To see this concretely, let $\thetahat_{n,S}$ denote the James-Stein
estimator for an unknown $p$-dimensional mean vector $\theta$ on which we
have $n$ i.i.d.\ observations, each having a $N(0,I_p)$ error. Let
$H_n(\theta)$ be the sampling distribution of the root
$n^{1/2}(\thetahat_{n,S} - \theta)$ under this model. As $n$ tends to
infinity with $p \ge 3$ fixed, we find (cf.\ Beran (1997)):

(a) The natural bootstrap distribution $H_n(\Xbar_n)$, where $\Xbar_n$ is
the sample mean vector, converges correctly at every point of the
parameter space except at $\theta = 0$. A similar failure occurs for the
bootstrap distribution $H_n(\thetahat_{n,S})$.

(b) The weak convergences of the sampling distribution $H_n(\theta)$ and of
the two bootstrap distributions just described are \textsl{not} uniform
over neighborhoods of the point of bootstrap failure, $\theta = 0$.

(c) The James-Stein estimator strictly dominates $\Xbar_n$ in exact
quadratic risk at \textsl{every} $\theta$, most markedly at $\theta = 0$.
If the dimension $p$ is held fixed, the region of substantial risk
improvement shrinks towards $\theta = 0$ as $n$ increases, and the
asymptotic risk of the James-Stein estimator improves on that of the
sample mean only at $\theta = 0$. That the dominance is strict for every
finite $n \ge 1$ is missed by the non-uniform limit. Apt in describing non-uniform
limits is George Berkeley's celebrated comment on infinitesimals: ``ghosts
of departed quantities.''
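The failure in (a) can be made numerically concrete. In the sketch below, the settings $p = 5$ and $n = 100$ and the choice of $\E\|n^{1/2}(\thetahat_{n,S} - \theta)\|^2$ as the functional compared are illustrative assumptions; at $\theta = 0$ this expectation equals $p - (p-2) = 2$ exactly, while the naive bootstrap value, computed by treating $\Xbar_n$ as the true parameter of the bootstrap world, fluctuates from data set to data set and typically overshoots the truth.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 5, 100   # p >= 3 fixed, as in the text

def mean_sq_root(theta, reps=4000):
    """Monte Carlo estimate of E || sqrt(n) (thetahat_S - theta) ||^2
    when X_1, ..., X_n are i.i.d. N(theta, I_p) and thetahat_S is the
    James-Stein estimator applied to the sample mean."""
    xbar = theta + rng.standard_normal((reps, p)) / np.sqrt(n)
    shrink = 1.0 - (p - 2) / (n * np.sum(xbar**2, axis=1))
    roots = np.sqrt(n) * (shrink[:, None] * xbar - theta)
    return np.mean(np.sum(roots**2, axis=1))

# Truth at the point of bootstrap failure, theta = 0: equals p - (p-2) = 2.
truth = mean_sq_root(np.zeros(p))

# Naive bootstrap: generate data at theta = 0, then act as if xbar_n were
# the true parameter of the bootstrap world. Across independent data sets
# the bootstrap value does not settle down to the truth.
boot = [mean_sq_root(rng.standard_normal(p) / np.sqrt(n)) for _ in range(8)]
print(truth, min(boot), max(boot))
```

The bootstrap values scatter well above 2 because $n^{1/2}\Xbar_n$ sits a non-vanishing random distance from $0$, which is precisely the non-uniformity over neighborhoods of $\theta = 0$ described in (b): the bootstrap limit at the failure point is random rather than the correct $H(0)$.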

In the James-Stein example, correct pointwise convergence of bootstrap
distributions as $n$ tends to infinity is an inadequate ``bootstrap works''
concept, doomed by lack of uniform convergence. The example provides a
leading instance of an estimator that dominates classical counterparts in
risk and fails to bootstrap naively. The message extends farther. Stein
(1956, first section) already noted that multiple shrinkage estimators,
which apply different shrinkage factors to the summands in a projective
decomposition of the mean vector, are ``better for most practical
purposes.'' Stein (1966) developed multiple shrinkage estimators in detail.
In recent years, low risk multiple shrinkage estimators have been
constructed implicitly through regularization techniques, among them
adaptive penalized least squares with quadratic penalties, adaptive
submodel selection, or adaptive symmetric linear estimators. Naive
bootstrapping of such modern estimators fails as it does in the James-Stein
case.

Research into these difficulties has taken two paths: (a) devising
bootstrap patches that restore \textsl{pointwise} convergence of bootstrap
distributions as the sample size $n$ tends to infinity (cf.\
Beran (1997) for examples and references to the literature); (b) studying
bootstrap procedures under asymptotics in which the dimension $p$ of the
parameter space increases while $n$ is held fixed or increases. Large $p$
bootstrap asymptotics turn out to be uniform over usefully large subsets of
the parameter space and yield effective bootstrap confidence sets around
the James-Stein estimator and other regularization estimators (cf.\ Beran
(1995), Beran and D\"umbgen (1998)). The first section of Stein (1956)
foreshadowed the role of large $p$ asymptotics in studies of modern
estimators.
\vspace{1.5cc}
\noindent{\bf References}
\newcounter{ref}
\begin{list}{\small [\,\arabic{ref}\,]}{\usecounter{ref} \leftmargin 4mm
\itemsep -1mm}
{\small
\item
Beran, R. (1987).
Prepivoting to reduce level error of confidence sets.
\textit{Biometrika}
\textbf{74} 457--468.
\item
Beran, R. (1995).
Stein confidence sets and the bootstrap.
\textit{Statistica Sinica}
\textbf{5} 109--127.
\item
Beran, R. (1997).
Diagnosing bootstrap success.
\textit{Annals of the Institute of Statistical Mathematics}
\textbf{49} 1--24.
\item
Beran, R. and D\"umbgen, L. (1998).
Modulation of estimators and confidence sets.
\textit{Annals of Statistics}
\textbf{26} 1826--1856.
\item
Bickel, P. J. and Freedman, D. A. (1981).
Some asymptotic theory for the bootstrap.
\textit{Annals of Statistics}
\textbf{9} 1196--1217.
\item
Davison, A. C. and Hinkley, D. V. (1997).
\textit{ Bootstrap Methods and their Application}.
Cambridge University Press.
\item
Doob, J. L. (1972).
William Feller and twentieth century probability.
In \textit{Proceedings of the Sixth Berkeley Symposium on Mathematical
Statistics and Probability} (L.\ M.\ Le Cam, J.\ Neyman, E.\ L.\ Scott,
eds.)
\textbf{II}, xv--xx.
University of California Press, Berkeley and Los Angeles.
\item
Efron, B. (1979).
Bootstrap methods: another look at the jackknife.
\textit{Annals of Statistics}
\textbf{7} 1--26.
\item
Efron, B. and Tibshirani, R. (1993).
\textit{An Introduction to the Bootstrap}.
Chapman and Hall, New York.
\item
Hall, P. (1986).
On the bootstrap and confidence intervals.
\textit{Annals of Statistics}
\textbf{14} 1431--1452.
\item
Hall, P. (1992).
\textit{The Bootstrap and Edgeworth Expansion}.
Springer, New York.
\item
Mammen, E. (1992).
\textit{When Does Bootstrap Work?}
Lecture Notes in Statistics
\textbf{77}.
Springer, New York.
\item
Putter, H. (1994).
\textit{Consistency of Resampling Methods}.
Ph.D.\ dissertation, Leiden University.
\item
Stein, C. (1956).
Inadmissibility of the usual estimator for the mean of a multivariate
normal distribution.
In \textit{Proceedings of the Third Berkeley Symposium on Mathematical
Statistics and Probability} (J.\ Neyman, ed.)
\textbf{I}, 197--206.
University of California Press, Berkeley and Los Angeles.
\item
Stein, C. (1966).
An approach to the recovery of inter-block information in balanced
incomplete block designs.
In \textit{Festschrift for Jerzy Neyman} (F.\ N.\ David, ed.)
351--364.
Wiley, New York.
\item
van Zwet, E. W. and van Zwet, W. R. (1999).
A remark on consistent estimation.
\textit{Mathematical Methods of Statistics}
\textbf{8} 277--284.
}
\end{list}
\end{document}