# Bayesian approach, empirical

A statistical interpretation of the Bayesian approach yielding conclusions on unobservable parameters even if their a priori distribution is unknown. Let $(Y, X)$ be a random vector for which the density $p(y \mid x)$ of the conditional distribution of $Y$ for any given value of the random parameter $X = x$ is known. If, as a result of some experiment, only the realization of $Y$ is observed, while the corresponding realization of $X$ is unknown, and if it is necessary to estimate the value of a given function $\phi (X)$ of the non-observed realization, then, in accordance with the empirical Bayesian approach, the conditional mathematical expectation ${\mathsf E} \{ \phi (X) \mid Y \}$ should be used as an approximate value $\psi (Y)$ for $\phi (X)$. In view of the Bayes formula, this expectation is given by the formula

$$\tag{1 } \psi (Y) = \ \frac{\int\limits \phi (x) p (Y \mid x) p (x) d \mu (x) }{q (Y) } ,$$

where

$$\tag{2 } q (y) = \int\limits p (y \mid x) p (x) d \mu (x),$$

$p(x)$ is the density of the unconditional (a priori) distribution of $X$, $\mu (x)$ is the corresponding $\sigma$-finite measure, and the function $q(y)$ represents the density of the unconditional distribution of $Y$.
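When the a priori distribution is known, formulas (1) and (2) can be evaluated directly. The following is a minimal sketch, assuming a Poisson conditional density and a hypothetical two-point prior (so that $\mu$ is a counting measure and the integrals become finite sums); the names `poisson_pmf`, `posterior_mean`, `support` and `prior` are illustrative and do not come from the article.

```python
import math

def poisson_pmf(y, x):
    """Conditional density p(y | x) of a Poisson variable with mean x."""
    return x ** y * math.exp(-x) / math.factorial(y)

def posterior_mean(y, phi, support, prior):
    """Evaluate formula (1) for a prior concentrated on finitely many points.

    `support` lists the points x_i and `prior` their probabilities p(x_i),
    so the integrals with respect to mu reduce to finite sums.
    """
    numerator = sum(phi(x) * poisson_pmf(y, x) * p
                    for x, p in zip(support, prior))
    q_y = sum(poisson_pmf(y, x) * p
              for x, p in zip(support, prior))  # formula (2)
    return numerator / q_y

# Hypothetical prior: X equals 1 or 4 with probability 1/2 each.
support = [1.0, 4.0]
prior = [0.5, 0.5]
psi = posterior_mean(3, lambda x: x, support, prior)  # E{X | Y = 3}
```

Here `psi` is pulled towards the point $x = 4$, under which the observation $Y = 3$ is more probable than under $x = 1$.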

If the a priori density $p(x)$ is unknown, it is not possible to compute the values of $\psi$ and $q$. However, if a sufficiently large number of realizations of random variables $Y_1, \dots, Y_k$, drawn from the distribution with density $q(y)$, is known, it is possible to construct a consistent estimator $\widehat{q} (y)$ which depends only on $Y_1, \dots, Y_k$. S.N. Bernshtein [1] proposed to estimate the value of $\psi (Y)$ by substituting $\widehat{q} (y)$ for $q(y)$ in (2), finding the solution $\widehat{p} (x)$ of the resulting integral equation, and then substituting $\widehat{p}$ and $\widehat{q}$ in the right-hand side of (1). However, this method is difficult to apply, since solving the integral equation (2) for $p$ is an ill-posed problem of numerical mathematics.

In certain special cases the statistical approach may be employed to estimate not only $q$ but also $\psi$ [3]. This is possible if the identity

$$\tag{3 } \phi (x)p(y \mid x) = \ \lambda (y) r [z(y) \mid x ] ,$$

involving $x$ and $y$, is true. In (3), $\lambda (y)$ and $z(y)$ are functions which depend on $y$ only, while $r(z \mid x)$, regarded as a function of $z$, is a probability density (i.e. it may be regarded as the density of the distribution of some random variable $Z$ for a given value $X = x$). If (3) is true, the numerator of (1) is equal to the product $\lambda (Y)s[z(Y)]$, where $s(z) = \int r(z \mid x)p(x) d \mu (x)$ is the density of the unconditional distribution of $Z$. Thus, if a sufficiently large number of realizations of independent random variables $Z_1, \dots, Z_m$ with density $s(z)$ is available, then it is possible to construct a consistent estimator $\widehat{s} (z)$ for $s (z)$, and hence also to find a consistent estimator $\widehat \psi (Y)$ for $\psi (Y)$:

$$\tag{4 } \phi (X) \approx \psi (Y) \approx \ \widehat \psi (Y) = \ \frac{\lambda (Y) \widehat{s} [z (Y)] }{\widehat{q} (Y) } .$$
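The step from (3) to (4) can be written out explicitly. Substituting (3) into the numerator of (1) and taking the factor $\lambda (Y)$, which does not depend on $x$, outside the integral gives

$$\int\limits \phi (x) p (Y \mid x) p (x) d \mu (x) = \lambda (Y) \int\limits r [z(Y) \mid x] p (x) d \mu (x) = \lambda (Y) s [z(Y)] ,$$

so that $\psi (Y) = \lambda (Y) s[z(Y)] / q(Y)$, and replacing $s$ and $q$ by their estimators yields the right-hand side of (4).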

For instance, if one has to estimate $\phi (X) = X ^ {h}$, where $h$ is a positive integer, and $p(y \mid x) = x ^ {y} e ^ {-x} / y !$ ($y = 0, 1, \dots$; $x > 0$), then $\phi (x) p (y \mid x) = \lambda (y)p(y + h \mid x)$, where $\lambda (y) = (y+h) ! /y !$. Since, here, $r(z \mid x) = p(z \mid x)$, one has $s(z) = q(z)$. Accordingly, $\widehat \psi (Y) = \lambda (Y) \widehat{q} (Y+h)/ \widehat{q} (Y)$, i.e. only the sequence of realizations $Y_1, Y_2, \dots$ is required to find $\widehat \psi$. If, on the other hand, $p(y \mid x) = b_n (y \mid x) = C_n^y x ^ {y} (1 - x) ^ {n-y}$ ($y = 0, \dots, n$; $n$ is a positive integer; $0 \leq x \leq 1$; $C_n^y = \binom{n}{y}$), then $\phi (x)p(y \mid x) = \lambda (y)r(y+h \mid x)$, where $\lambda (y) = C_n^y / C_{n+h}^{y+h}$ and $r(z \mid x) = b_{n+h} (z \mid x) \neq p(z \mid x)$. For this reason two sequences of empirical values $Y _ {i}$ and $Z _ {j}$ are required in this case to construct $\widehat \psi (Y)$.
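The Poisson case can be sketched in a few lines. The code below is an illustration only: it assumes that $\widehat{q}$ is taken to be the empirical frequency of each value among the observations (one possible consistent estimator; the article does not fix a particular one), and the function name `robbins_estimate` and the sample are hypothetical.

```python
from collections import Counter
from math import factorial

def robbins_estimate(y, sample, h=1):
    """Estimate E{X**h | Y = y} as lambda(y) * q_hat(y + h) / q_hat(y).

    q_hat is the empirical frequency of a value in `sample`; the sample
    size cancels in the ratio, so raw counts are used directly.
    """
    counts = Counter(sample)
    if counts[y] == 0:
        raise ValueError("q_hat(y) is zero: y does not occur in the sample")
    lam = factorial(y + h) // factorial(y)  # lambda(y) = (y + h)! / y!
    return lam * counts[y + h] / counts[y]

# Hypothetical observations Y_1, ..., Y_8.
sample = [0, 0, 1, 1, 1, 2, 2, 3]
estimate_mean = robbins_estimate(1, sample)         # 2 * (2 / 3)
estimate_square = robbins_estimate(0, sample, h=2)  # 2 * (2 / 2)
```

With so few observations the ratio of frequencies is very unstable; the estimator is only justified when the number of observations is large.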

This form of the empirical Bayesian approach is applicable only to the very narrow class of densities $p(y \mid x)$ and functions $\phi (x)$ which satisfy condition (3). Even if this condition is in fact met, the construction of the estimator (4) requires that the random variables $Z _ {j}$ be observable, and their distribution usually differs from that of the variables $Y _ {i}$ which are observed directly.

For practical purposes, it is preferable to use the empirical Bayesian approach in a modified form, in which these disadvantages are absent. In this modification the approximation which is constructed does not yield a consistent estimator of $\psi (Y)$ (such an estimator may even be non-existent), but rather upper and lower estimators of this function, which are found by solving a problem of linear programming, as follows. Let $\Psi _ {1} (Y)$ and $\Psi _ {2} (Y)$ be the constrained minimum and maximum of the numerator of (1), regarded as a linear functional of the unknown a priori density $p(x)$, calculated under the linear constraints $p(x) \geq 0$, $\int p(x) d \mu (x) = 1$ and $q(Y) \equiv \int p(Y \mid x)p(x) d \mu (x) = \widehat{q} (Y)$, where $\widehat{q} (Y)$ is the estimator of $q(Y)$ mentioned above, constructed from the results of the observations $Y_1, \dots, Y_k$. One may conclude in such a case that $\Psi _ {1} (Y)/ \widehat{q} (Y) \leq \psi (Y) \leq \Psi _ {2} (Y)/ \widehat{q} (Y)$, where the probability of the truth of this conclusion tends to one (by virtue of the law of large numbers) as the number of random variables $Y _ {i}$ used to construct the estimator $\widehat{q} (Y)$ increases without limit.

Other modifications of the empirical Bayesian approach are also possible, for example, by adding to the last-named condition $q(Y) = \widehat{q} (Y)$ a finite number of conditions of the form $q( y _ {i} ) = \widehat{q} (y _ {i} )$, where $y _ {i}$ are preliminarily given numbers; if $\widehat{q}$ is replaced by the corresponding confidence bounds for $q$, the conditions are obtained in the form of inequalities $q _ {1} (y _ {i} ) \leq q(y _ {i} ) \leq q _ {2} (y _ {i} )$, etc.
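The linear programming problem for $\Psi _ {1}$ and $\Psi _ {2}$ can be illustrated numerically. The sketch below rests on assumptions that are not part of the article: the a priori distribution is restricted to a finite grid of points, the conditional density is Poisson, and only the single constraint $q(Y) = \widehat{q} (Y)$ is imposed. With two equality constraints, an optimum of the linear programme is attained at a vertex with at most two non-zero masses, so the code simply enumerates such vertices instead of calling a linear programming solver. The names `bayes_bounds` and `grid` and the value of `q_hat` are hypothetical.

```python
import math
from itertools import combinations

def poisson_pmf(y, x):
    """Conditional density p(y | x) of a Poisson variable with mean x."""
    return x ** y * math.exp(-x) / math.factorial(y)

def bayes_bounds(y, q_hat, phi, grid, pmf=poisson_pmf):
    """Return (Psi_1 / q_hat, Psi_2 / q_hat) for priors supported on `grid`.

    Constraints on the masses w_i: w_i >= 0, sum w_i = 1, sum a_i w_i = q_hat,
    where a_i = p(y | x_i). The objective is sum c_i w_i with c_i = phi(x_i) a_i.
    """
    a = [pmf(y, x) for x in grid]
    c = [phi(x) * a_i for x, a_i in zip(grid, a)]
    values = []
    # Vertices carried by a single grid point.
    for i in range(len(grid)):
        if a[i] == q_hat:
            values.append(c[i])
    # Vertices carried by two grid points: solve the two equality constraints.
    for i, j in combinations(range(len(grid)), 2):
        if a[i] == a[j]:
            continue
        w_i = (q_hat - a[j]) / (a[i] - a[j])
        if 0.0 <= w_i <= 1.0:
            values.append(w_i * c[i] + (1.0 - w_i) * c[j])
    if not values:
        raise ValueError("no prior on the grid reproduces q_hat")
    return min(values) / q_hat, max(values) / q_hat

grid = [0.5 * k for k in range(1, 21)]  # hypothetical grid 0.5, 1.0, ..., 10.0
lower, upper = bayes_bounds(3, 0.15, lambda x: x, grid)
```

Because the prior is confined to the grid, `lower` and `upper` are only approximations to the bounds over all a priori densities; a finer or wider grid can only widen the interval.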

In certain cases, which are important in practice, satisfactory majorants, which can be computed without the use of the laborious method of linear programming, can be found for the functions $\Psi _ {1}$ and $\Psi _ {2}$ (see the example in the entry Sample method which deals with statistical control).

See the entry Discriminant analysis for the applications of the empirical Bayesian approach to hypothesis testing concerning the values of random parameters.

#### References

[1] S.N. Bernshtein, "On "fiducial" probabilities of Fisher", Izv. Akad. Nauk SSSR Ser. Mat., 5 (1941) pp. 85–94 (In Russian) (English abstract)

[2] L.N. Bol'shev, "Applications of the empirical Bayes approach", Proc. Internat. Congress Mathematicians (Nice, 1970), 3, Gauthier-Villars (1971) pp. 241–247

[3] H. Robbins, "An empirical Bayes approach to statistics", Proc. Berkeley Symp. Math. Statist. Probab., 1, Berkeley-Los Angeles (1956) pp. 157–163

How to Cite This Entry:
Bayesian approach, empirical. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Bayesian_approach,_empirical&oldid=45998
This article was adapted from an original article by L.N. Bol'shev (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.