# Bernstein-von Mises theorem

Let $\{ X_j : j \geq 1 \}$ be independent identically distributed random variables with a probability density depending on a parameter $\theta$ (cf. Random variable; Probability distribution). Suppose that an a priori distribution for $\theta$ is chosen. One of the fundamental theorems in the asymptotic theory of Bayesian inference (cf. Bayesian approach) concerns the convergence of the a posteriori density of $\theta$, given $X_1, \dots, X_n$, to a normal density. In other words, the a posteriori distribution asymptotically looks like a normal distribution. This phenomenon was first noted by P.S. Laplace in the case of independent and identically distributed observations. A related but different result was proved by S.N. Bernstein [a2], who considered the a posteriori distribution of $\theta$ given the average $n^{-1}(X_1 + \dots + X_n)$. R. von Mises [a12] extended the result to a posteriori distributions conditioned on a finite number of differentiable functionals of the empirical distribution function. L. Le Cam [a5] studied the problem in his work on asymptotic properties of maximum-likelihood and related Bayes estimates. For independent and identically distributed random variables, the Bernstein–von Mises theorem on convergence in $L_1$-mean reads as follows (see [a3]).

Let $X _ {i}$, $1 \leq i \leq n$, be independent identically distributed random variables with probability density $f ( x, \theta )$, $\theta \in \Theta \subset \mathbf R$. Suppose $\Theta$ is open and $\lambda$ is an a priori probability density on $\Theta$ which is continuous and positive in an open neighbourhood of the true parameter $\theta _ {0}$. Let $h ( x, \theta ) = { \mathop{\rm log} } f ( x, \theta )$. Suppose that ${ {\partial h } / {\partial \theta } }$ and ${ {\partial ^ {2} h } / {\partial \theta ^ {2} } }$ exist and are continuous in $\theta$. Further, suppose that $i ( \theta ) = - {\mathsf E} _ \theta [ { {\partial ^ {2} h } / {\partial \theta ^ {2} } } ]$ is continuous, with $0 < i ( \theta ) < \infty$. Let $K ( \cdot )$ be a non-negative function satisfying

$$\int\limits _ {- \infty } ^ \infty {K ( t ) { \mathop{\rm exp} } \left [ - { \frac{( i ( \theta _ {0} ) - \epsilon ) t ^ {2} }{2} } \right ] } {d t } < \infty$$

for some $0 < \epsilon < i(\theta_0)$. Let $\widehat\theta_n$ be a maximum-likelihood estimator of $\theta$ based on $X_1, \dots, X_n$ (cf. Maximum-likelihood method) and let $L_n(\theta)$ be the corresponding likelihood function. It is known that, under certain regularity conditions, there exists a compact neighbourhood $U_{\theta_0}$ of $\theta_0$ such that:

${\widehat \theta } _ {n} \rightarrow \theta _ {0}$ almost surely;

$\left. \partial \log L_n(\theta) / \partial \theta \right|_{\theta = \widehat\theta_n} = 0$ for large $n$;

$n ^ {1/2 } ( {\widehat \theta } _ {n} - \theta _ {0} )$ converges in distribution (cf. Convergence in distribution) to the normal distribution with mean $0$ and variance ${1 / {i ( \theta _ {0} ) } }$ as $n \rightarrow \infty$.
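As an illustration (the model is an assumption of this sketch, not part of the theorem), take the exponential density $f(x, \theta) = \theta e^{-\theta x}$, $x > 0$, $\theta > 0$. Then $h(x, \theta) = \log\theta - \theta x$ and $\partial^2 h / \partial\theta^2 = -1/\theta^2$, so $i(\theta) = 1/\theta^2$, and the likelihood equation $n/\theta - \sum_i x_i = 0$ gives $\widehat\theta_n = n / \sum_i X_i$. The Python sketch below, with hypothetical helper names, simulates $n^{1/2}(\widehat\theta_n - \theta_0)$ and compares its empirical variance with $1/i(\theta_0) = \theta_0^2$.

```python
import math
import random
import statistics


def mle_exponential(xs):
    """Maximum-likelihood estimator of theta for f(x, theta) = theta * exp(-theta * x).

    Solves the likelihood equation n / theta - sum(xs) = 0.
    """
    return len(xs) / sum(xs)


def scaled_errors(theta0, n, reps, seed=0):
    """Simulate n^{1/2} (theta_hat_n - theta0) over `reps` independent samples of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        # random.expovariate takes the rate, so each draw has density theta0 * exp(-theta0 * x)
        xs = [rng.expovariate(theta0) for _ in range(n)]
        out.append(math.sqrt(n) * (mle_exponential(xs) - theta0))
    return out


if __name__ == "__main__":
    theta0 = 2.0
    errs = scaled_errors(theta0, n=500, reps=1000)
    # The empirical variance should be close to 1 / i(theta0) = theta0 ** 2
    print(statistics.mean(errs), statistics.variance(errs), theta0 ** 2)
```

The sample size, number of replications and seed are arbitrary choices; larger values bring the empirical mean closer to $0$ and the variance closer to $\theta_0^2$.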

Let $f_n(\theta \mid x_1, \dots, x_n)$ denote the a posteriori density of $\theta$ given the observations $(x_1, \dots, x_n)$ and the a priori probability density $\lambda(\theta)$, that is,

$$f _ {n} ( \theta \mid x _ {1} \dots x _ {n} ) = { \frac{\prod _ {i = 1 } ^ { n } f ( x _ {i} , \theta ) \lambda ( \theta ) }{\int\limits _ \Theta {\prod _ {i = 1 } ^ { n } f ( x _ {i} , \phi ) \lambda ( \phi ) } {d \phi } } } .$$
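A minimal sketch of this formula, assuming (for illustration only) the exponential density $f(x, \theta) = \theta e^{-\theta x}$ on $\Theta = (0, \infty)$ and a Gamma$(a, b)$ a priori density $\lambda(\theta) = b^a \theta^{a-1} e^{-b\theta} / \Gamma(a)$, which is continuous and positive on $\Theta$. The denominator integral is approximated by the trapezoidal rule on a truncated grid, and the computation works with logarithms so that the product over $i$ does not underflow for large $n$. The function names and the values $a = 2$, $b = 1$ are hypothetical.

```python
import math
import random


def log_likelihood(theta, xs):
    """log of prod_i f(x_i, theta) for f(x, theta) = theta * exp(-theta * x)."""
    return len(xs) * math.log(theta) - theta * sum(xs)


def log_gamma_prior(theta, a=2.0, b=1.0):
    """log of a Gamma(a, b) a priori density lambda(theta) (hypothetical choice of prior)."""
    return a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(theta) - b * theta


def posterior_on_grid(xs, grid, a=2.0, b=1.0):
    """Normalised a posteriori density f_n(theta | x_1, ..., x_n) on an equally spaced grid."""
    logs = [log_likelihood(t, xs) + log_gamma_prior(t, a, b) for t in grid]
    m = max(logs)  # subtract the maximum before exponentiating to avoid underflow
    unnorm = [math.exp(v - m) for v in logs]
    step = grid[1] - grid[0]
    # Trapezoidal rule for the integral over Theta in the denominator
    z = step * (sum(unnorm) - 0.5 * (unnorm[0] + unnorm[-1]))
    return [u / z for u in unnorm]


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.expovariate(2.0) for _ in range(50)]      # simulated data with theta_0 = 2
    grid = [0.001 * k for k in range(1, 8001)]           # Theta truncated to (0, 8]
    dens = posterior_on_grid(xs, grid)
    step = grid[1] - grid[0]
    approx_mean = step * sum(t * d for t, d in zip(grid, dens))
    exact_mean = (2.0 + len(xs)) / (1.0 + sum(xs))      # mean of Gamma(a + n, b + sum x)
    print(approx_mean, exact_mean)
```

With this conjugate prior the exact a posteriori density is Gamma$(a + n, b + \sum_i x_i)$, which gives an independent check on the grid approximation.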

Let $f_n^*(t \mid x_1, \dots, x_n) = n^{-1/2} f_n(\widehat\theta_n + t n^{-1/2} \mid x_1, \dots, x_n)$. Then $f_n^*(t \mid x_1, \dots, x_n)$ is the a posteriori density of $t = n^{1/2}(\theta - \widehat\theta_n)$.
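The factor $n^{-1/2}$ is the Jacobian of the change of variables $\theta = \widehat\theta_n + s n^{-1/2}$. Taking $f_n(\cdot \mid x_1, \dots, x_n)$ to be zero outside $\Theta$, one has for every Borel set $A \subset \mathbf R$:

$$\mathsf P \left( n^{1/2}(\theta - \widehat\theta_n) \in A \mid x_1, \dots, x_n \right) = \int\limits_{\{ \theta : n^{1/2}(\theta - \widehat\theta_n) \in A \}} f_n(\theta \mid x_1, \dots, x_n) \, d\theta = \int\limits_A n^{-1/2} f_n(\widehat\theta_n + s n^{-1/2} \mid x_1, \dots, x_n) \, ds ,$$

so the integrand on the right is exactly $f_n^*(s \mid x_1, \dots, x_n)$.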

A generalized version of the Bernstein–von Mises theorem, under the assumptions stated above and some additional technical conditions, reads as follows.

If, for every $h > 0$ and $\delta > 0$,

$$e ^ {- n \delta } \int\limits _ {\left | t \right | > h } {K ( n ^ {1/2 } t ) \lambda ( {\widehat \theta } _ {n} + t ) } {d t } \rightarrow 0 \textrm{ a.s. } [ {\mathsf P} _ {\theta _ {0} } ] ,$$

then

$$\lim_{n \rightarrow \infty} \int\limits_{-\infty}^{\infty} K(t) \left| f_n^*(t \mid X_1, \dots, X_n) - \left( \frac{i(\theta_0)}{2\pi} \right)^{1/2} e^{-\frac{1}{2} i(\theta_0) t^2} \right| dt = 0 \quad \textrm{a.s. } [\mathsf P_{\theta_0}] .$$
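For $K(t) \equiv 1$ the conclusion can be checked numerically in a simple case. This sketch assumes the exponential density $f(x, \theta) = \theta e^{-\theta x}$, for which $i(\theta) = 1/\theta^2$ and $\widehat\theta_n = n / \sum_i x_i$, together with a Gamma$(a, b)$ a priori density (hypothetical choices); the a posteriori density is then Gamma$(a + n, b + \sum_i x_i)$ in closed form. The integral $\int |f_n^*(t \mid x_1, \dots, x_n) - (i(\theta_0)/2\pi)^{1/2} e^{-i(\theta_0) t^2 / 2}| \, dt$ is approximated by a Riemann sum on a truncated $t$-grid; the truncation width, step and function names are assumptions of the sketch.

```python
import math
import random


def gamma_log_pdf(theta, shape, rate):
    """log density of a Gamma(shape, rate) distribution at theta > 0."""
    return shape * math.log(rate) - math.lgamma(shape) + (shape - 1) * math.log(theta) - rate * theta


def l1_distance_to_normal(xs, theta0, a=2.0, b=1.0, half_width=20.0, step=0.01):
    """Riemann-sum approximation of the L1 distance between f_n^* and N(0, 1 / i(theta0))."""
    n, s = len(xs), sum(xs)
    theta_hat = n / s                  # maximum-likelihood estimator
    info = 1.0 / theta0 ** 2           # i(theta_0) for the exponential model
    total = 0.0
    k_max = int(round(half_width / step))
    for k in range(-k_max, k_max + 1):
        t = k * step
        theta = theta_hat + t / math.sqrt(n)
        # f_n^*(t) = n^{-1/2} f_n(theta_hat + t n^{-1/2}); the posterior vanishes for theta <= 0
        if theta > 0:
            post = math.exp(gamma_log_pdf(theta, a + n, b + s)) / math.sqrt(n)
        else:
            post = 0.0
        normal = math.sqrt(info / (2 * math.pi)) * math.exp(-0.5 * info * t * t)
        total += abs(post - normal) * step
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    theta0 = 2.0
    data = [rng.expovariate(theta0) for _ in range(10_000)]
    for n in (10, 100, 1_000, 10_000):
        print(n, l1_distance_to_normal(data[:n], theta0))
```

For a typical sample the printed distances shrink as $n$ grows; results on the rate of this convergence are cited below.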

For $K(t) \equiv 1$ this shows that the a posteriori density $f_n^*$ of $t$ converges to the normal density in $L_1$-mean. The result can be extended to a multi-dimensional parameter. As an application of the above theorem, one can show that Bayes estimators are strongly consistent and asymptotically efficient for a suitable class of loss functions (cf. [a11]). For rates of convergence see [a4], [a7], [a8].
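A small illustration of the efficiency statement, under assumptions chosen only for this sketch: the exponential density $f(x, \theta) = \theta e^{-\theta x}$ (so $\widehat\theta_n = n / \sum_i X_i$), a Gamma$(a, b)$ a priori density, and squared-error loss, for which the Bayes estimator is the a posteriori mean $\tilde\theta_n = (a + n)/(b + \sum_i X_i)$. Since $\tilde\theta_n - \widehat\theta_n = (a \sum_i X_i - b n) / (\sum_i X_i (b + \sum_i X_i))$, the scaled gap $n^{1/2}(\tilde\theta_n - \widehat\theta_n)$ tends to $0$ almost surely, so $\tilde\theta_n$ has the same limiting $N(0, 1/i(\theta_0))$ law as the maximum-likelihood estimator. The helper name is hypothetical.

```python
import math
import random


def scaled_bayes_mle_gap(xs, a=2.0, b=1.0):
    """n^{1/2} (posterior mean - MLE) in the exponential model with a Gamma(a, b) prior."""
    n, s = len(xs), sum(xs)
    posterior_mean = (a + n) / (b + s)   # Bayes estimator under squared-error loss
    mle = n / s                          # maximum-likelihood estimator
    return math.sqrt(n) * (posterior_mean - mle)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.expovariate(2.0) for _ in range(100_000)]   # simulated data with theta_0 = 2
    for n in (10, 1_000, 100_000):
        print(n, scaled_bayes_mle_gap(data[:n]))
```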

B.L.S. Prakasa Rao [a6] has generalized the result to arbitrary discrete-time stochastic processes (cf. [a1]); for extensions to diffusion processes and diffusion fields, see [a9], [a10].

#### References

[a1] I.V. Basawa, B.L.S. Prakasa Rao, "Statistical inference for stochastic processes", Acad. Press (1980)

[a2] S.N. Bernstein, "Theory of probability" (1917) (In Russian)

[a3] J.D. Borwanker, G. Kallianpur, B.L.S. Prakasa Rao, "The Bernstein–von Mises theorem for Markov processes", Ann. Math. Stat., 43 (1971) pp. 1241–1253

[a4] C. Hipp, R. Michael, "On the Bernstein–von Mises approximation of posterior distribution", Ann. Stat., 4 (1976) pp. 972–980

[a5] L. Le Cam, "On some asymptotic properties of maximum likelihood estimates and related Bayes estimates", Univ. California Publ. Stat., 1 (1953) pp. 277–330

[a6] B.L.S. Prakasa Rao, "Statistical inference for stochastic processes", G. Sankaranarayanan (ed.), Proc. Advanced Symp. on Probability and its Applications, Annamalai Univ. (1976) pp. 43–150

[a7] B.L.S. Prakasa Rao, "Rate of convergence of Bernstein–von Mises approximation for Markov processes", Serdica, 4 (1978) pp. 36–42

[a8] B.L.S. Prakasa Rao, "The equivalence between (modified) Bayes estimator and maximum likelihood estimator for Markov processes", Ann. Inst. Statist. Math., 31 (1979) pp. 499–513

[a9] B.L.S. Prakasa Rao, "The Bernstein–von Mises theorem for a class of diffusion processes", Teor. Sluch. Prots., 9 (1981) pp. 95–104 (In Russian)

[a10] B.L.S. Prakasa Rao, "On Bayes estimation for diffusion fields", J.K. Ghosh (ed.), J. Roy (ed.), Statistics: Applications and New Directions, Statistical Publishing Soc. (1984) pp. 504–511

[a11] B.L.S. Prakasa Rao, "Asymptotic theory of statistical inference", Wiley (1987)

[a12] R. von Mises, "Wahrscheinlichkeitsrechnung", Springer (1931)
Bernstein-von Mises theorem. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Bernstein-von_Mises_theorem&oldid=46024
This article was adapted from an original article by B.L.S. Prakasa-Rao (originator), which appeared in Encyclopedia of Mathematics (ISBN 1402006098).