Bayesian approach, empirical

A statistical interpretation of the Bayesian approach yielding conclusions on unobservable parameters even if their a priori distribution is unknown. Let $(Y, X)$ be a random vector for which the density $p(y \mid x)$ of the conditional distribution of $Y$ for any given value of the random parameter $X = x$ is known. If, as a result of some experiment, only the realization of $Y$ is observed, while the corresponding realization of $X$ is unknown, and if it is necessary to estimate the value of a given function $\phi(X)$ of the unobserved realization, then, in accordance with the empirical Bayesian approach, the conditional mathematical expectation ${\mathsf E}\{\phi(X) \mid Y\}$ should be used as an approximate value $\psi(Y)$ for $\phi(X)$. In view of the Bayes formula, this expectation is given by

$$ \tag{1} \psi(Y) = \frac{\int \phi(x)\, p(Y \mid x)\, p(x)\, d\mu(x)}{q(Y)}, $$

where

$$ \tag{2} q(y) = \int p(y \mid x)\, p(x)\, d\mu(x), $$

$p(x)$ is the density of the unconditional (a priori) distribution of $X$, $\mu$ is the corresponding $\sigma$-finite measure, and the function $q(y)$ is the density of the unconditional distribution of $Y$.
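When the a priori density is actually known, formulas (1) and (2) can be evaluated directly by quadrature. The following minimal sketch (in Python; the normal likelihood, normal prior and $\phi(x) = x$ are illustrative assumptions, not taken from this article) approximates both integrals on a grid; in this case $\psi(Y) = {\mathsf E}\{X \mid Y\} = Y/2$ exactly, which checks the computation.

```python
# Numerical sketch of (1)-(2), assuming (for illustration only) a fully
# known prior: Y | X = x ~ N(x, 1), X ~ N(0, 1), phi(x) = x.
# Then psi(Y) = E[X | Y] = Y / 2 exactly, which verifies the quadrature.
import numpy as np

def psi(y, phi, p_cond, p_prior, x_grid):
    """Approximate psi(y) from (1), with q(y) from (2), by a Riemann sum."""
    dx = x_grid[1] - x_grid[0]
    lik = p_cond(y, x_grid) * p_prior(x_grid)      # p(y|x) p(x) on the grid
    q_y = np.sum(lik) * dx                         # formula (2)
    return np.sum(phi(x_grid) * lik) * dx / q_y    # formula (1)

norm_pdf = lambda t: np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)
x_grid = np.linspace(-8.0, 8.0, 2001)
y_obs = 1.7
print(psi(y_obs, lambda x: x, lambda y, x: norm_pdf(y - x), norm_pdf, x_grid))
print(y_obs / 2)   # both print approximately 0.85
```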

If the a priori density $p(x)$ is unknown, it is not possible to compute the values of $\psi$ and $q$. However, if a sufficiently large number of realizations of the random variables $Y_1, \dots, Y_k$, which are drawn from the distribution with density $q(y)$, is known, it is possible to construct a consistent estimator $\widehat{q}(y)$ which depends only on $Y_1, \dots, Y_k$. S.N. Bernshtein [1] proposed to estimate the value of $\psi(Y)$ by substituting $\widehat{q}(y)$ for $q(y)$ in (2), finding the solution $\widehat{p}(x)$ of this integral equation, and then substituting $\widehat{p}$ and $\widehat{q}$ into the right-hand side of (1). However, this method is difficult to carry out, since solving the integral equation (2) is an ill-posed problem of numerical mathematics.
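The difficulty is visible already in a crude discretization: on a grid, (2) becomes a linear system $q = Ap$, and for smooth kernels the matrix $A$ is so ill-conditioned that tiny errors in the estimate of $q$ are enormously amplified in the recovered prior. A hypothetical sketch (the normal kernel and the grid sizes are illustrative choices, not from this article):

```python
# Sketch of why inverting (2) is ill-posed: discretize it as q = A p and
# observe that A (here a normal kernel, an illustrative assumption) has an
# enormous condition number, so 0.1% noise in q ruins the recovered prior.
import numpy as np

rng = np.random.default_rng(0)
norm_pdf = lambda t: np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)

x = np.linspace(-4.0, 4.0, 80)
y = np.linspace(-6.0, 6.0, 80)
dx = x[1] - x[0]

A = norm_pdf(y[:, None] - x[None, :]) * dx     # discretized p(y|x) d mu(x)
p_true = norm_pdf(x)                           # prior used only to make data
q_exact = A @ p_true

print(f"cond(A) = {np.linalg.cond(A):.2e}")    # astronomically large
q_noisy = q_exact * (1 + 1e-3 * rng.standard_normal(q_exact.size))
p_rec, *_ = np.linalg.lstsq(A, q_noisy, rcond=None)
print(f"max error in recovered prior: {np.max(np.abs(p_rec - p_true)):.2e}")
```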

In certain special cases the statistical approach may be employed not only to estimate $q$, but also $\psi$ [3]. This is possible if the identity

$$ \tag{3} \phi(x)\, p(y \mid x) = \lambda(y)\, r[z(y) \mid x], $$

involving $x$ and $y$, is true. In (3), $\lambda(y)$ and $z(y)$ are functions which depend on $y$ only, while $r(z \mid x)$, as a function of $z$, is a probability density (i.e. it may be regarded as the density of the distribution of some random variable $Z$ for a given value $X = x$). If (3) is true, the numerator of (1) is equal to the product $\lambda(Y) s[z(Y)]$, where $s(z) = \int r(z \mid x) p(x)\, d\mu(x)$ is the density of the unconditional distribution of $Z$. Thus, if a sufficiently large number of realizations of independent random variables $Z_1, \dots, Z_m$ with density $s(z)$ is available, then it is possible to construct a consistent estimator $\widehat{s}(z)$ for $s(z)$, and hence also to find a consistent estimator $\widehat\psi(Y)$ for $\psi(Y)$:

$$ \tag{4} \phi(X) \approx \psi(Y) \approx \widehat\psi(Y) = \frac{\lambda(Y)\, \widehat{s}[z(Y)]}{\widehat{q}(Y)}. $$

For instance, if one has to estimate $\phi(X) = X^h$, where $h$ is a positive integer, and $p(y \mid x) = x^y e^{-x}/y!$ ($y = 0, 1, \dots$; $x > 0$), then $\phi(x) p(y \mid x) = \lambda(y) p(y+h \mid x)$, where $\lambda(y) = (y+h)!/y!$. Since, here, $r(z \mid x) = p(z \mid x)$, one has $s(z) = q(z)$. Accordingly, $\widehat\psi(Y) = \lambda(Y) \widehat{q}(Y+h)/\widehat{q}(Y)$, i.e. only the sequence of realizations $Y_1, Y_2, \dots$ is required to find $\widehat\psi$. If, on the other hand, $p(y \mid x) = b_n(y \mid x) = C_n^y x^y (1-x)^{n-y}$ ($y = 0, \dots, n$; $n$ is a positive integer; $0 \leq x \leq 1$; $C_n^y = \binom{n}{y}$), then $\phi(x) p(y \mid x) = \lambda(y) r(y+h \mid x)$, where $\lambda(y) = C_n^y / C_{n+h}^{y+h}$ and $r(z \mid x) = b_{n+h}(z \mid x) \neq p(z \mid x)$. For this reason two sequences of empirical values $Y_i$ and $Z_j$ are required in this case to construct $\widehat\psi(Y)$.
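The Poisson case is the classical empirical Bayes estimator of Robbins [3]. A short simulation sketch with $h = 1$, so that $\lambda(y) = y + 1$ (the Gamma prior below is a hypothetical choice used only to generate data; the estimator itself never sees it):

```python
# Simulation sketch of the Poisson case above with h = 1:
# psi_hat(Y) = (Y + 1) q_hat(Y + 1) / q_hat(Y), formula (4),
# where q_hat is the empirical frequency of the observed counts.
import numpy as np

rng = np.random.default_rng(1)
k = 100_000
x = rng.gamma(shape=3.0, scale=1.0, size=k)    # hypothetical prior on X
y = rng.poisson(x)                             # observed counts Y_1, ..., Y_k

counts = np.bincount(y, minlength=y.max() + 2)
q_hat = counts / k                             # consistent estimator of q(y)

Y = 4                                          # a freshly observed value
psi_hat = (Y + 1) * q_hat[Y + 1] / q_hat[Y]    # uses only the Y_i, as stated
print(psi_hat)
# For a Gamma(3, 1) prior the exact psi(Y) = E[X | Y] = (Y + 3) / 2 = 3.5,
# so psi_hat should print a value close to 3.5.
```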

This form of the empirical Bayesian approach is applicable only to the very narrow class of densities $p(y \mid x)$ and functions $\phi(x)$ which satisfy condition (3); even when this condition is met, the construction of the estimator (4) requires the observability of the random variables $Z_j$, whose distribution usually differs from that of the variables $Y_i$ which are observed directly. For practical purposes it is preferable to use the empirical Bayesian approach in a modified form free of these disadvantages. In this modification the approximation constructed does not yield a consistent estimator of $\psi(Y)$ (such an estimator may even fail to exist), but rather upper and lower bounds for this function, which are found by solving a linear programming problem, as follows. Let $\Psi_1(Y)$ and $\Psi_2(Y)$ be the constrained minimum and maximum of the linear functional (with respect to the unknown a priori density $p(x)$) in the numerator of (1), calculated under the linear constraints $p(x) \geq 0$, $\int p(x)\, d\mu(x) = 1$ and $q(Y) \equiv \int p(Y \mid x) p(x)\, d\mu(x) = \widehat{q}(Y)$, where $\widehat{q}(Y)$ is the estimator of $q(Y)$ mentioned above, constructed from the observations $Y_1, \dots, Y_k$. One may then conclude that $\Psi_1(Y)/\widehat{q}(Y) \leq \psi(Y) \leq \Psi_2(Y)/\widehat{q}(Y)$, where the probability that this conclusion is true tends to one (by virtue of the law of large numbers) as the number of random variables $Y_i$ used to construct the estimator $\widehat{q}(Y)$ increases without limit. Other modifications of the empirical Bayesian approach are also possible, for example, by adding to the last-named condition $q(Y) = \widehat{q}(Y)$ a finite number of conditions of the form $q(y_i) = \widehat{q}(y_i)$, where the $y_i$ are preliminarily given numbers; if $\widehat{q}$ is replaced by the corresponding confidence bounds for $q$, the conditions take the form of inequalities $q_1(y_i) \leq q(y_i) \leq q_2(y_i)$, etc.
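After discretizing the prior, $\Psi_1$ and $\Psi_2$ are ordinary linear programs. A sketch under stated assumptions (Poisson likelihood, $\phi(x) = x$, $h$-free setting, a hypothetical grid and a hypothetical value for $\widehat{q}(Y)$; `scipy.optimize.linprog` does the optimization):

```python
# LP sketch of the bounds Psi_1, Psi_2: the unknown prior is replaced by
# point masses w_j >= 0 at grid points x_j, subject to sum w_j = 1 and the
# single moment constraint q(Y) = q_hat(Y). Grid, Y and q_hat(Y) are
# illustrative assumptions.
import numpy as np
from math import factorial
from scipy.optimize import linprog

x = np.linspace(0.01, 20.0, 400)            # grid for a Poisson example
Y, q_hat_Y = 4, 0.12                        # observed value, assumed estimate

lik = x ** Y * np.exp(-x) / factorial(Y)    # p(Y | x_j)
c = x * lik                                 # numerator of (1) with phi(x) = x

A_eq = np.vstack([np.ones_like(x), lik])    # sum w = 1 and q(Y) = q_hat(Y)
b_eq = np.array([1.0, q_hat_Y])

lo = linprog(c,  A_eq=A_eq, b_eq=b_eq, bounds=(0, None))   # Psi_1(Y)
hi = linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))   # Psi_2(Y)
print(lo.fun / q_hat_Y, -hi.fun / q_hat_Y)  # lower and upper bound on psi(Y)
```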

In certain cases which are important in practice, satisfactory majorants, computable without the laborious method of linear programming, can be found for the functions $\Psi_1$ and $\Psi_2$ (see the example dealing with statistical control in the entry Sample method).

See the entry Discriminant analysis for applications of the empirical Bayesian approach to the testing of hypotheses concerning the values of random parameters.

References

[1] S.N. Bernshtein, "On 'fiducial' probabilities of Fisher", Izv. Akad. Nauk SSSR Ser. Mat. 5 (1941), pp. 85–94 (in Russian; English abstract).
[2] L.N. Bol'shev, "Applications of the empirical Bayes approach", Proc. Internat. Congress Mathematicians (Nice, 1970), Vol. 3, Gauthier-Villars (1971), pp. 241–247.
[3] H. Robbins, "An empirical Bayes approach to statistics", Proc. Berkeley Symp. Math. Statist. Probab., Vol. 1, Berkeley–Los Angeles (1956), pp. 157–163.
How to Cite This Entry:
Bayesian approach, empirical. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Bayesian_approach,_empirical&oldid=45998
This article was adapted from an original article by L.N. Bol'shev (originator), which appeared in Encyclopedia of Mathematics, ISBN 1402006098.