Bayesian approach

to statistical problems

An approach based on the assumption that to any parameter in a statistical problem there can be assigned a definite probability distribution. Any general statistical decision problem is determined by the following elements: a space $ (X,\ {\mathcal B} _ {X} ) $ of (potential) samples $ x $, a space $ ( \Theta ,\ {\mathcal B} _ \Theta ) $ of values of the unknown parameter $ \theta $, a family of probability distributions $ \{ { {\mathsf P} _ \theta } : {\theta \in \Theta } \} $ on $ (X,\ {\mathcal B} _ {X} ) $, a space of decisions $ (D,\ {\mathcal B} _ {D} ) $ and a function $ L( \theta ,\ d) $, which characterizes the losses caused by accepting the decision $ d $ when the true value of the parameter is $ \theta $. The objective of decision making is to find a rule (decision function) $ \delta = \delta (x) $, optimal in a certain sense, assigning to each result of an observation $ x \in X $ the decision $ \delta (x) \in D $. In the Bayesian approach it is assumed that the unknown parameter $ \theta $ is a random variable with a given (a priori) distribution $ \pi = \pi (d \theta ) $ on $ ( \Theta ,\ {\mathcal B} _ \Theta ) $; the best decision function (Bayesian decision function) $ {\delta ^ {*} } = {\delta ^ {*} } (x) $ is then defined as the function for which the minimum expected loss $ \inf _ \delta \ \rho ( \pi ,\ \delta ) $, where

$$ \rho ( \pi ,\ \delta ) \ = \ \int\limits _ \Theta \rho ( \theta ,\ \delta ) \ \pi (d \theta ) , $$

and

$$ \rho ( \theta ,\ \delta ) \ = \ \int\limits _ { X } L ( \theta ,\ \delta (x)) \ {\mathsf P} _ \theta (dx) $$

is attained. Thus,

$$ \rho ( \pi ,\ \delta ^ {*} ) \ = \ \inf _ \delta \ \int\limits _ \Theta \int\limits _ { X } L ( \theta ,\ \delta (x)) \ {\mathsf P} _ \theta (dx) \ \pi ( d \theta ) . $$
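
For concreteness, here is a minimal numerical sketch of these definitions (not from the original article): the finite spaces, sampling distributions and loss matrix below are invented for illustration, and NumPy is assumed. It computes $ \rho ( \theta ,\ \delta ) $ and the expected loss $ \rho ( \pi ,\ \delta ) $ of one fixed decision function $ \delta $.

```python
import numpy as np

# Hypothetical finite problem: X = {x_0, x_1}, Theta = {theta_0, theta_1},
# D = {d_0, d_1}; all numbers are invented for illustration.
P = np.array([[0.8, 0.2],     # P[i, j] = P_{theta_i}(x_j)
              [0.3, 0.7]])
L = np.array([[0.0, 1.0],     # L[i, j] = L(theta_i, d_j)
              [1.0, 0.0]])
pi = np.array([0.5, 0.5])     # a priori distribution on Theta

def rho_theta(i, delta):
    # rho(theta, delta) = integral over X of L(theta, delta(x)) P_theta(dx)
    return sum(L[i, delta[j]] * P[i, j] for j in range(P.shape[1]))

def rho_pi(delta):
    # rho(pi, delta) = integral over Theta of rho(theta, delta) pi(d theta)
    return sum(pi[i] * rho_theta(i, delta) for i in range(len(pi)))

delta = [0, 1]                # decision function: delta(x_0) = d_0, delta(x_1) = d_1
print(rho_pi(delta))          # 0.5 * 0.2 + 0.5 * 0.3 = 0.25
```

Enumerating all four decision functions in this toy problem confirms that this $ \delta $ already attains $ \inf _ \delta \ \rho ( \pi ,\ \delta ) $ here.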

In searching for the Bayesian decision function $ \delta ^ {*} = \delta ^ {*} (x) $, the following remark is useful. Let $ {\mathsf P} _ \theta (dx) = p (x \mid \theta ) \ d \mu (x) $, $ \pi (d \theta ) = \pi ( \theta ) \ d \nu ( \theta ) $, where $ \mu $ and $ \nu $ are certain $ \sigma $-finite measures. One then finds, assuming that the order of integration may be changed,

$$ \int\limits _ \Theta \int\limits _ { X } L ( \theta ,\ \delta (x)) \ {\mathsf P} _ \theta (dx) \ \pi (d \theta ) \ = \ \int\limits _ \Theta \int\limits _ { X } L ( \theta ,\ \delta (x)) p (x \mid \theta ) \pi ( \theta ) \ d \mu (x) \ d \nu ( \theta ) \ = \ \int\limits _ { X } \ d \mu (x) \left [ \int\limits _ \Theta L ( \theta ,\ \delta (x)) p (x \mid \theta ) \pi ( \theta ) \ d \nu ( \theta ) \right ] . $$

It is seen from the above that, for a given $ x \in X $, $ \delta ^ {*} (x) $ is the value of $ d $ for which

$$ \inf _ { d } \ \int\limits _ \Theta L ( \theta ,\ d) p (x \mid \theta ) \pi ( \theta ) \ d \nu ( \theta ) $$

is attained, or, what is equivalent, for which

$$ \inf _ { d } \ \int\limits _ \Theta L ( \theta ,\ d) \frac{p (x \mid \theta ) \pi ( \theta ) }{p (x) } \ d \nu ( \theta ) $$

is attained, where

$$ p (x) \ = \ \int\limits _ \Theta p (x \mid \theta ) \pi ( \theta ) \ d \nu ( \theta ) . $$

But, according to the Bayes formula,

$$ \int\limits _ \Theta L( \theta ,\ d) \frac{p (x \mid \theta ) \pi ( \theta ) }{p (x) } \ d \nu ( \theta ) \ = \ {\mathsf E} [L ( \theta ,\ d) \mid x]. $$

Thus, for a given $ x $, $ \delta ^ {*} (x) $ is the value of $ d $ for which the conditional average loss $ {\mathsf E} [L ( \theta ,\ d) \mid x] $ attains a minimum.
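
Continuing the same hypothetical sketch as above, the Bayesian decision function can be computed pointwise: for each $ x $ one forms the posterior by the Bayes formula and picks the $ d $ minimizing the conditional average loss $ {\mathsf E} [L ( \theta ,\ d) \mid x] $.

```python
import numpy as np

# Same invented finite problem as in the previous sketch.
P = np.array([[0.8, 0.2], [0.3, 0.7]])   # P[i, j] = P_{theta_i}(x_j)
L = np.array([[0.0, 1.0], [1.0, 0.0]])   # L[i, j] = L(theta_i, d_j)
pi = np.array([0.5, 0.5])                # a priori distribution on Theta

def bayes_rule(j):
    joint = P[:, j] * pi                 # p(x | theta) pi(theta)
    posterior = joint / joint.sum()      # Bayes formula: divide by p(x)
    cond_loss = posterior @ L            # E[L(theta, d) | x] for each d
    return int(np.argmin(cond_loss))     # the d minimizing the conditional loss

delta_star = [bayes_rule(j) for j in range(P.shape[1])]
print(delta_star)                        # [0, 1]
```

This reproduces the minimizer found by direct enumeration in the first sketch.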

Example. (The Bayesian approach applied to the case of distinguishing between two simple hypotheses.) Let $ \Theta = \{ \theta _ {1} ,\ \theta _ {2} \} $, $ D = \{ d _ {1} ,\ d _ {2} \} $, $ L _ {ij } = L ( \theta _ {i} ,\ d _ {j} ) $, $ i,\ j = 1,\ 2 $; $ \pi ( \theta _ {1} ) = \pi _ {1} $, $ \pi ( \theta _ {2} ) = \pi _ {2} $, $ \pi _ {1} + \pi _ {2} = 1 $. If the decision $ d _ {i} $ is identified with the acceptance of the hypothesis $ H _ {i} $: $ \theta = \theta _ {i} $, it is natural to assume that $ L _ {11} < L _ {12} $, $ L _ {22} < L _ {21} $. Then

$$ \rho ( \pi ,\ \delta ) \ = \ \int\limits _ { X } [ \pi _ {1} p (x \mid \theta _ {1} ) L ( \theta _ {1} ,\ \delta ( x)) + \pi _ {2} p (x \mid \theta _ {2} ) L ( \theta _ {2} ,\ \delta (x))] \ d \mu (x) $$

implies, on minimizing the integrand for each $ x $ separately, that $ \inf _ \delta \ \rho ( \pi ,\ \delta ) $ is attained for the function

$$ \delta ^ {*} (x) \ = \ \left \{ \begin{array}{l} d _ {1} ,\ \ \textrm{ if } \ \frac{p (x \mid \theta _ {2} ) }{p (x \mid \theta _ {1} ) } \ \leq \ \frac{\pi _ {1} }{\pi _ {2} } \ \frac{L _ {12} - L _ {11} }{L _ {21} - L _ {22} } , \\ d _ {2} ,\ \ \textrm{ if } \ \frac{p (x \mid \theta _ {2} ) }{p (x \mid \theta _ {1} ) } \ \geq \ \frac{\pi _ {1} }{\pi _ {2} } \ \frac{L _ {12} - L _ {11} }{L _ {21} - L _ {22} } . \\ \end{array} \right . $$
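
As a hedged illustration of this likelihood-ratio rule (the Gaussian densities and all constants below are invented, and SciPy is assumed), $ \delta ^ {*} $ may be implemented directly:

```python
from scipy.stats import norm

# Invented two-hypothesis setup: p(x | theta_i) are Gaussian densities.
pi1, pi2 = 0.5, 0.5                       # a priori probabilities, pi1 + pi2 = 1
L11, L12, L21, L22 = 0.0, 1.0, 1.0, 0.0   # losses with L11 < L12 and L22 < L21
p1 = lambda x: norm.pdf(x, loc=0.0, scale=1.0)   # p(x | theta_1)
p2 = lambda x: norm.pdf(x, loc=1.0, scale=1.0)   # p(x | theta_2)

# Threshold from the formula above.
c = (pi1 / pi2) * (L12 - L11) / (L21 - L22)

def delta_star(x):
    # Accept H_1 (decision d_1) iff the likelihood ratio does not exceed c.
    return "d1" if p2(x) / p1(x) <= c else "d2"

print(delta_star(0.2), delta_star(0.8))   # d1 d2 (the rule switches at x = 0.5)
```

For these symmetric losses and priors the rule reduces to the maximum-likelihood test.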

The advantage of the Bayesian approach consists in the fact that, unlike the losses $ \rho ( \theta ,\ \delta ) $, the expected losses $ \rho ( \pi ,\ \delta ) $ are numbers that do not depend on the unknown parameter $ \theta $; consequently, solutions $ \delta _ \epsilon ^ {*} $ for which

$$ \rho ( \pi ,\ \delta _ \epsilon ^ {*} ) \ \leq \ \inf _ \delta \ \rho ( \pi ,\ \delta ) + \epsilon , $$

and which are, if not optimal, at least $ \epsilon $-optimal $ ( \epsilon > 0) $, are certain to exist. The disadvantage of the Bayesian approach is the necessity of postulating both the existence of an a priori distribution of the unknown parameter and its precise form (the latter disadvantage may be overcome to a certain extent by adopting an empirical Bayesian approach, cf. Bayesian approach, empirical).

References

[1] A. Wald, "Statistical decision functions", Wiley (1950)
[2] M.H. de Groot, "Optimal statistical decisions", McGraw-Hill (1970)
How to Cite This Entry:
Bayesian approach. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Bayesian_approach&oldid=44399
This article was adapted from an original article by A.N. Shiryaev (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.