Bayesian approach to statistical problems
An approach based on the assumption that to any parameter in a statistical problem a definite probability distribution can be assigned. Any general statistical decision problem is determined by the following elements: a space $ (X,\ {\mathcal B} _ {X} ) $ of (potential) samples $ x $; a space $ ( \Theta ,\ {\mathcal B} _ \Theta ) $ of values of the unknown parameter $ \theta $; a family of probability distributions $ \{ { {\mathsf P} _ \theta } : {\theta \in \Theta } \} $ on $ (X,\ {\mathcal B} _ {X} ) $; a space of decisions $ (D,\ {\mathcal B} _ {D} ) $; and a function $ L( \theta ,\ d) $, which characterizes the loss incurred by accepting the decision $ d $ when the true value of the parameter is $ \theta $. The objective of decision making is to find an (in a certain sense) optimal rule (decision function) $ \delta = \delta (x) $, assigning to each observation $ x \in X $ a decision $ \delta (x) \in D $.

In the Bayesian approach the unknown parameter $ \theta $ is assumed to be a random variable with a given (a priori) distribution $ \pi = \pi (d \theta ) $ on $ ( \Theta ,\ {\mathcal B} _ \Theta ) $. The best decision function (Bayesian decision function) $ {\delta ^ {*} } = {\delta ^ {*} } (x) $ is then defined as a function at which the minimum expected loss $ \inf _ \delta \ \rho ( \pi ,\ \delta ) $ is attained, where
$$ \rho ( \pi ,\ \delta ) \ = \ \int\limits _ \Theta \rho ( \theta ,\ \delta ) \ \pi (d \theta ) , $$
and
$$ \rho ( \theta ,\ \delta ) \ = \ \int\limits _ { X } L ( \theta ,\ \delta (x)) \ {\mathsf P} _ \theta (dx) $$
Thus,
$$ \rho ( \pi ,\ \delta ^ {*} ) \ = \ \inf _ \delta \ \int\limits _ \Theta \int\limits _ { X } L ( \theta ,\ \delta (x)) \ {\mathsf P} _ \theta (dx) \ \pi ( d \theta ) . $$
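When $ \Theta $, $ X $ and $ D $ are finite, the integrals above reduce to sums and both risks can be computed directly. The following Python sketch is purely illustrative: the spaces, the matrices `P` and `L`, the prior `prior` and the rule `delta` are hypothetical numbers chosen only to make the definitions of $ \rho ( \theta ,\ \delta ) $ and $ \rho ( \pi ,\ \delta ) $ concrete.

```python
import numpy as np

# Minimal illustrative setting (all names and numbers are hypothetical):
# Theta = {theta_1, theta_2}, X = {x_1, x_2, x_3}, D = {d_1, d_2}.

# P[t, x] = P_theta_t(x): sampling distributions, one row per parameter value.
P = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.3, 0.6]])

# L[t, d] = L(theta_t, d): loss of taking decision d when the true parameter is theta_t.
L = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# prior[t] = pi(theta_t): a priori distribution on Theta.
prior = np.array([0.5, 0.5])

# A decision rule delta: X -> D, stored as the index of the chosen decision for each x.
delta = np.array([0, 0, 1])

# rho(theta, delta) = sum_x L(theta, delta(x)) * P_theta(x)
rho_theta = np.array([np.dot(L[t, delta], P[t]) for t in range(P.shape[0])])

# rho(pi, delta) = sum_theta rho(theta, delta) * pi(theta)
rho_pi = np.dot(rho_theta, prior)

print(rho_theta, rho_pi)  # per-parameter risks and the expected (Bayes) risk
```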
In searching for the Bayesian decision function $ \delta ^ {*} = \delta ^ {*} (x) $, the following remark is useful. Let $ {\mathsf P} _ \theta (dx) = p (x \mid \theta ) \ d \mu (x) $, $ \pi (d \theta ) = \pi ( \theta ) \ d \nu ( \theta ) $, where $ \mu $ and $ \nu $ are certain $ \sigma $-finite measures. Assuming that the order of integration may be interchanged, one then finds
$$ \int\limits _ \Theta \int\limits _ { X } L ( \theta ,\ \delta (x)) \ {\mathsf P} _ \theta (dx ) \ \pi ( d \theta )\ = $$
$$ = \ \int\limits _ \Theta \int\limits _ { X } L ( \theta ,\ \delta (x)) p ( x \mid \theta ) \pi ( \theta ) \ d \mu (x) \ d \nu ( \theta )\ = $$
$$ = \ \int\limits _ { X } \ d \mu (x) \left [ \int\limits _ \Theta L ( \theta ,\ \delta (x)) p (x \mid \theta ) \pi ( \theta ) \ d \nu ( \theta ) \right ] . $$
It is seen from the above that, for a given $ x \in X $, $ \delta ^ {*} (x) $ is the value of $ d $ at which
$$ \inf _ { d } \ \int\limits _ \Theta L ( \theta ,\ d) p (x \mid \theta ) \pi ( \theta ) \ d \nu ( \theta ) $$
is attained or, equivalently, at which
$$ \inf _ { d } \ \int\limits _ \Theta L ( \theta ,\ d) \frac{p (x \mid \theta ) \pi ( \theta ) }{p (x) } \ d \nu ( \theta ) $$
is attained, where
$$ p (x) \ = \ \int\limits _ \Theta p (x \mid \theta ) \pi ( \theta ) \ d \nu ( \theta ) . $$
But, according to the Bayes formula,
$$ \int\limits _ \Theta L( \theta ,\ d) \frac{p (x \mid \theta ) \pi ( \theta ) }{p (x) } \ d \nu ( \theta ) \ = \ {\mathsf E} [L ( \theta ,\ d) \mid x]. $$
Thus, for a given $ x $, $ \delta ^ {*} (x) $ is the value of $ d $ at which the conditional expected loss $ {\mathsf E} [L ( \theta ,\ d) \mid x] $ attains its minimum.
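In the same hypothetical finite setting as in the previous sketch, this remark becomes a pointwise recipe: for each $ x $, form the posterior by the Bayes formula and choose the decision with the smallest posterior expected loss. The function below is a sketch of that recipe, not a general-purpose implementation.

```python
import numpy as np

def bayes_rule(P, L, prior):
    """Compute delta*(x) for every x by minimizing the posterior expected loss.

    P[t, x]  : p(x | theta_t) for a finite sample space
    L[t, d]  : L(theta_t, d)
    prior[t] : pi(theta_t)
    """
    n_theta, n_x = P.shape
    n_d = L.shape[1]
    delta_star = np.empty(n_x, dtype=int)
    for x in range(n_x):
        p_x = np.dot(P[:, x], prior)        # p(x) = sum_theta p(x | theta) pi(theta)
        posterior = P[:, x] * prior / p_x   # pi(theta | x), the Bayes formula
        # E[L(theta, d) | x] for each decision d; keep the minimizer.
        expected_loss = L.T @ posterior
        delta_star[x] = int(np.argmin(expected_loss))
    return delta_star
```

With the hypothetical arrays from the previous sketch, `bayes_rule(P, L, prior)` returns the rule `[0, 0, 1]`, i.e. exactly the rule `delta` whose risk was computed there.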
Example. (The Bayesian approach applied to the problem of distinguishing between two simple hypotheses.) Let $ \Theta = \{ \theta _ {1} ,\ \theta _ {2} \} $, $ D = \{ d _ {1} ,\ d _ {2} \} $, $ L _ {ij} = L ( \theta _ {i} ,\ d _ {j} ) $, $ i,\ j = 1,\ 2 $; $ \pi ( \theta _ {1} ) = \pi _ {1} $, $ \pi ( \theta _ {2} ) = \pi _ {2} $, $ \pi _ {1} + \pi _ {2} = 1 $. If the decision $ d _ {i} $ is identified with the acceptance of the hypothesis $ H _ {i} $: $ \theta = \theta _ {i} $, it is natural to assume that $ L _ {11} < L _ {12} $, $ L _ {22} < L _ {21} $. Then the representation
$$ \rho ( \pi ,\ \delta ) \ = \ \int\limits _ { X } [ \pi _ {1} p (x \mid \theta _ {1} ) L ( \theta _ {1} ,\ \delta ( x)) + $$
$$ + {} \pi _ {2} p (x \mid \theta _ {2} ) L ( \theta _ {2} ,\ \delta (x))] \ d \mu (x) $$
implies that $ \inf _ \delta \ \rho ( \pi ,\ \delta ) $ is attained at the function
$$ \delta ^ {*} (x) \ = \ \left \{ \begin{array}{l} d _ {1} ,\ \ \textrm{ if } \ \frac{p (x \mid \theta _ {2} ) }{p (x \mid \theta _ {1} ) } \ \leq \ \frac{\pi _ {1} }{\pi _ {2} } \ \frac{L _ {12} - L _ {11} }{L _ {21} - L _ {22} } , \\ d _ {2} ,\ \ \textrm{ if } \ \frac{p (x \mid \theta _ {2} ) }{p (x \mid \theta _ {1} ) } \ \geq \ \frac{\pi _ {1} }{\pi _ {2} } \ \frac{L _ {12} - L _ {11} }{L _ {21} - L _ {22} } . \\ \end{array} \right . $$
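As a small sketch, this threshold rule can be coded directly; the Gaussian densities, the equal prior weights and the 0-1 loss in the usage lines below are hypothetical choices made only to show the rule in action.

```python
from math import exp, pi, sqrt

def bayes_test(x, p1, p2, pi1, pi2, L11, L12, L21, L22):
    """Bayes choice between H1: theta = theta_1 and H2: theta = theta_2.

    p1, p2 are the densities p(x | theta_1) and p(x | theta_2) (callables);
    returns 1 (accept H1) or 2 (accept H2) via the likelihood-ratio threshold.
    """
    threshold = (pi1 / pi2) * (L12 - L11) / (L21 - L22)
    return 1 if p2(x) / p1(x) <= threshold else 2

def normal_pdf(x, mu):
    return exp(-(x - mu) ** 2 / 2.0) / sqrt(2.0 * pi)

# Hypothetical usage: N(0, 1) versus N(1, 1), equal prior weights, 0-1 loss.
# The threshold is then 1, so H2 is accepted exactly when x > 0.5.
decision = bayes_test(0.7, lambda x: normal_pdf(x, 0.0), lambda x: normal_pdf(x, 1.0),
                      pi1=0.5, pi2=0.5, L11=0.0, L12=1.0, L21=1.0, L22=0.0)
print(decision)  # 2
```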
The advantage of the Bayesian approach is that, unlike the losses $ \rho ( \theta ,\ \delta ) $, the expected losses $ \rho ( \pi ,\ \delta ) $ are numbers that do not depend on the unknown parameter $ \theta $; consequently, there are certain to exist decision functions $ \delta _ \epsilon ^ {*} $ for which
$$ \rho ( \pi ,\ \delta _ \epsilon ^ {*} ) \ \leq \ \inf _ \delta \ \rho ( \pi ,\ \delta ) + \epsilon , $$
that is, decision functions which are, if not optimal, at least $ \epsilon $-optimal $ ( \epsilon > 0) $. The disadvantage of the Bayesian approach is the necessity of postulating both the existence of an a priori distribution of the unknown parameter and its precise form (the latter disadvantage may be overcome to a certain extent by adopting an empirical Bayesian approach; cf. Bayesian approach, empirical).