Bayesian approach
Revision by Ulf Rehmann (minor edit: tex done)
Latest revision as of 11:31, 10 February 2020
to statistical problems
An approach based on the assumption that a definite probability distribution can be assigned to any parameter in a statistical problem. Any general statistical decision problem is determined by the following elements: a space $ (X,\ {\mathcal B} _ {X} ) $ of (potential) samples $ x $, a space $ ( \Theta ,\ {\mathcal B} _ \Theta ) $ of values of the unknown parameter $ \theta $, a family of probability distributions $ \{ { {\mathsf P} _ \theta } : {\theta \in \Theta } \} $ on $ (X,\ {\mathcal B} _ {X} ) $, a space of decisions $ (D,\ {\mathcal B} _ {D} ) $, and a function $ L( \theta ,\ d) $, which characterizes the loss incurred by accepting the decision $ d $ when the true value of the parameter is $ \theta $. The objective of decision making is to find a rule (decision function) $ \delta = \delta (x) $ that is optimal in a certain sense and assigns to each result of an observation $ x \in X $ the decision $ \delta (x) \in D $. In the Bayesian approach it is assumed that the unknown parameter $ \theta $ is a random variable with a given (a priori) distribution $ \pi = \pi (d \theta ) $ on $ ( \Theta ,\ {\mathcal B} _ \Theta ) $; the best decision function (Bayesian decision function) $ {\delta ^ {*} } = {\delta ^ {*} } (x) $ is then defined as the function for which the minimum expected loss $ \inf _ \delta \ \rho ( \pi ,\ \delta ) $, where
$$ \rho ( \pi ,\ \delta ) \ = \ \int\limits _ \Theta \rho ( \theta ,\ \delta ) \ \pi (d \theta ) , $$
and
$$ \rho ( \theta ,\ \delta ) \ = \ \int\limits _ { X } L ( \theta ,\ \delta (x)) \ {\mathsf P} _ \theta (dx) $$
is attained. Thus,
$$ \rho ( \pi ,\ \delta ^ {*} ) \ = \ \inf _ \delta \ \int\limits _ \Theta \int\limits _ { X } L ( \theta ,\ \delta (x)) \ {\mathsf P} _ \theta (dx) \ \pi ( d \theta ) . $$
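For finite spaces the integrals above reduce to sums, so the definition can be checked by brute force. The following is a minimal sketch under that assumption; the parameter values, sample values, probabilities and losses are hypothetical numbers chosen only for illustration, and the names `risk` and `bayes_risk` are not from the text.

```python
from itertools import product

# Hypothetical finite problem: two parameter values, two samples, two decisions.
thetas = ["t1", "t2"]
xs = [0, 1]
ds = ["d1", "d2"]

prior = {"t1": 0.5, "t2": 0.5}                           # pi(theta)
lik = {"t1": {0: 0.8, 1: 0.2}, "t2": {0: 0.3, 1: 0.7}}   # P_theta({x})
loss = {("t1", "d1"): 0.0, ("t1", "d2"): 1.0,
        ("t2", "d1"): 1.0, ("t2", "d2"): 0.0}            # L(theta, d)


def risk(theta, delta):
    """rho(theta, delta): the loss averaged over samples x drawn from P_theta."""
    return sum(loss[(theta, delta[x])] * lik[theta][x] for x in xs)


def bayes_risk(delta):
    """rho(pi, delta): the risk averaged over the prior pi."""
    return sum(risk(t, delta) * prior[t] for t in thetas)


# Enumerate every decision function delta: X -> D and keep a minimizer.
rules = [dict(zip(xs, choice)) for choice in product(ds, repeat=len(xs))]
best = min(rules, key=bayes_risk)
```

With these hypothetical numbers the enumeration selects $ d _ {1} $ for $ x = 0 $ and $ d _ {2} $ for $ x = 1 $, with expected loss 0.25. Enumeration is feasible only for very small problems, since the number of decision functions grows as $ |D| ^ {|X|} $.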
In searching for the Bayesian decision function $ \delta ^ {*} = \delta ^ {*} (x) $, the following remark is useful. Let $ {\mathsf P} _ \theta (dx) = p (x \mid \theta ) \ d \mu (x) $, $ \pi (d \theta ) = \pi ( \theta ) \ d \nu ( \theta ) $, where $ \mu $ and $ \nu $ are certain $ \sigma $-finite measures. One then finds, assuming that the order of integration may be changed,
$$ \int\limits _ \Theta \int\limits _ { X } L ( \theta ,\ \delta (x)) \ {\mathsf P} _ \theta (dx ) \ \pi ( d \theta )\ = $$
$$ = \ \int\limits _ \Theta \int\limits _ { X } L ( \theta ,\ \delta (x)) p ( x \mid \theta ) \pi ( \theta ) \ d \mu (x) \ d \nu ( \theta )\ = $$
$$ = \ \int\limits _ { X } \ d \mu (x) \left [ \int\limits _ \Theta L ( \theta ,\ \delta (x)) p (x \mid \theta ) \pi ( \theta ) \ d \nu ( \theta ) \right ] . $$
It is seen from the above that for a given $ x \in X $, $ \delta ^ {*} (x) $ is that value $ d ^ {*} $ of $ d $ for which
$$ \inf _ { d } \ \int\limits _ \Theta L ( \theta ,\ d) p (x \mid \theta ) \pi ( \theta ) \ d \nu ( \theta ) $$
is attained, or, what is equivalent, for which
$$ \inf _ { d } \ \int\limits _ \Theta L ( \theta ,\ d) \frac{p (x \mid \theta ) \pi ( \theta ) }{p (x) } \ d \nu ( \theta ) $$
is attained, where
$$ p (x) \ = \ \int\limits _ \Theta p (x \mid \theta ) \pi ( \theta ) \ d \nu ( \theta ) . $$
But, according to the Bayes formula
$$ \int\limits _ \Theta L( \theta ,\ d) \frac{p (x \mid \theta ) \pi ( \theta ) }{p (x) } \ d \nu ( \theta ) \ = \ {\mathsf E} [L ( \theta ,\ d) \mid x]. $$
Thus, for a given $ x $, $ \delta ^ {*} (x) $ is that value $ d ^ {*} $ of $ d $ for which the conditional average loss $ {\mathsf E} [L ( \theta ,\ d) \mid x] $ attains a minimum.
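The pointwise characterization can be illustrated on a finite problem, where the posterior density is a normalized product of likelihood and prior. This is a minimal sketch with hypothetical numbers; the function names `posterior`, `posterior_expected_loss` and `bayes_decision` are illustrative and do not come from the text.

```python
# Hypothetical finite problem with an asymmetric loss.
decisions = ["d1", "d2"]
prior = {"t1": 0.7, "t2": 0.3}                           # pi(theta)
lik = {"t1": {0: 0.8, 1: 0.2}, "t2": {0: 0.3, 1: 0.7}}   # p(x | theta)
loss = {("t1", "d1"): 0.0, ("t1", "d2"): 1.0,
        ("t2", "d1"): 4.0, ("t2", "d2"): 0.0}            # L(theta, d)


def posterior(x):
    """Bayes formula: p(x | theta) pi(theta) / p(x) for each theta."""
    joint = {t: lik[t][x] * prior[t] for t in prior}
    p_x = sum(joint.values())  # marginal density p(x)
    return {t: joint[t] / p_x for t in joint}


def posterior_expected_loss(d, x):
    """E[L(theta, d) | x], the conditional average loss of decision d."""
    post = posterior(x)
    return sum(loss[(t, d)] * post[t] for t in post)


def bayes_decision(x):
    """delta*(x): a decision minimizing the conditional average loss."""
    return min(decisions, key=lambda d: posterior_expected_loss(d, x))
```

For these numbers the posterior after observing $ x = 1 $ is $ (0.4,\ 0.6) $, so the conditional average losses are 2.4 for $ d _ {1} $ and 0.4 for $ d _ {2} $, and `bayes_decision(1)` returns `"d2"`. No search over whole decision functions is needed, since each $ x $ is handled separately.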
Example. (The Bayesian approach applied to the case of distinguishing between two simple hypotheses.) Let $ \Theta = \{ \theta _ {1} ,\ \theta _ {2} \} $, $ D = \{ d _ {1} ,\ d _ {2} \} $, $ L _ {ij } = L ( \theta _ {i} ,\ d _ {j} ) $, $ i,\ j = 1,\ 2 $; $ \pi ( \theta _ {1} ) = \pi _ {1} $, $ \pi ( \theta _ {2} ) = \pi _ {2} $, $ \pi _ {1} + \pi _ {2} = 1 $. If the decision $ d _ {i} $ is identified with the acceptance of the hypothesis $ H _ {i} $: $ \theta = \theta _ {i} $, it is natural to assume that $ L _ {11} < L _ {12} $, $ L _ {22} < L _ {21} $. Then
$$ \rho ( \pi ,\ \delta ) \ = \ \int\limits _ { X } [ \pi _ {1} p (x \mid \theta _ {1} ) L ( \theta _ {1} ,\ \delta ( x)) + \pi _ {2} p (x \mid \theta _ {2} ) L ( \theta _ {2} ,\ \delta (x))] \ d \mu (x) $$
implies that $ \inf _ \delta \ \rho ( \pi ,\ \delta ) $ is attained for the function
$$ \delta ^ {*} (x) \ = \ \left \{ \begin{array}{l} d _ {1} ,\ \ \textrm{ if } \ \frac{p (x \mid \theta _ {2} ) }{p (x \mid \theta _ {1} ) } \ \leq \ \frac{\pi _ {1} }{\pi _ {2} } \ \frac{L _ {12} - L _ {11} }{L _ {21} - L _ {22} } , \\ d _ {2} ,\ \ \textrm{ if } \ \frac{p (x \mid \theta _ {2} ) }{p (x \mid \theta _ {1} ) } \ \geq \ \frac{\pi _ {1} }{\pi _ {2} } \ \frac{L _ {12} - L _ {11} }{L _ {21} - L _ {22} } . \\ \end{array} \right . $$
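The rule compares the likelihood ratio with a threshold fixed by the prior and the losses. The sketch below is one possible implementation under stated assumptions: the two hypotheses are taken, purely for illustration, to be unit-variance normal densities with means 0 and 1, and the names `normal_pdf` and `bayes_test` are hypothetical. In the case of equality, where the formula allows either decision, the sketch chooses $ d _ {1} $.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of a normal distribution (illustrative choice of p(x | theta))."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bayes_test(x, p1, p2, pi1, pi2, L11, L12, L21, L22):
    """Return "d1" or "d2" according to the likelihood-ratio rule.

    Assumes L11 < L12 and L22 < L21, so the threshold is positive.
    """
    threshold = (pi1 / pi2) * (L12 - L11) / (L21 - L22)
    # Compare p2 <= threshold * p1 instead of dividing, so that a
    # vanishing density p1(x) does not cause a division by zero.
    return "d1" if p2(x) <= threshold * p1(x) else "d2"


# Hypothetical hypotheses: H1 is N(0, 1), H2 is N(1, 1).
def p1(x):
    return normal_pdf(x, 0.0)


def p2(x):
    return normal_pdf(x, 1.0)


# Equal priors and 0-1 loss: the threshold is 1, i.e. accept H1 when x <= 0.5.
equal_prior = [bayes_test(x, p1, p2, 0.5, 0.5, 0, 1, 1, 0) for x in (0.0, 1.0)]

# A prior favouring H1 raises the threshold to 9, i.e. x <= 0.5 + ln 9.
skewed_prior = [bayes_test(x, p1, p2, 0.9, 0.1, 0, 1, 1, 0) for x in (1.0, 3.0)]
```

In this normal example the likelihood ratio equals $ e ^ {x - 1/2} $, so the rule reduces to a cut-off on $ x $ itself. Increasing $ \pi _ {1} $ or the loss $ L _ {12} $ moves the cut-off to the right and makes the acceptance of $ H _ {1} $ more frequent.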
The advantage of the Bayesian approach consists in the fact that, unlike the losses $ \rho ( \theta ,\ \delta ) $, the expected losses $ \rho ( \pi ,\ \delta ) $ are numbers which do not depend on the unknown parameter $ \theta $, and, consequently, it is known that solutions $ \delta _ \epsilon ^ {*} $ for which
$$ \rho ( \pi ,\ \delta _ \epsilon ^ {*} ) \ \leq \ \inf _ \delta \ \rho ( \pi ,\ \delta ) + \epsilon , $$
and which are, if not optimal, at least $ \epsilon $-optimal $ ( \epsilon > 0) $, are certain to exist. The disadvantage of the Bayesian approach is the necessity of postulating both the existence of an a priori distribution of the unknown parameter and its precise form (the latter disadvantage may be overcome to a certain extent by adopting an empirical Bayesian approach, cf. Bayesian approach, empirical).
References
[1] A. Wald, "Statistical decision functions", Wiley (1950)
[2] M.H. de Groot, "Optimal statistical decisions", McGraw-Hill (1970)
Bayesian approach. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Bayesian_approach&oldid=44399