# Bayesian approach to statistical problems

An approach based on the assumption that to any parameter in a statistical problem there can be assigned a definite probability distribution. Any general statistical decision problem is determined by the following elements: by a space $(X,\ {\mathcal B} _ {X} )$ of (potential) samples $x$, by a space $( \Theta ,\ {\mathcal B} _ \Theta )$ of values of the unknown parameter $\theta$, by a family of probability distributions $\{ { {\mathsf P} _ \theta } : {\theta \in \Theta } \}$ on $(X,\ {\mathcal B} _ {X} )$, by a space of decisions $(D,\ {\mathcal B} _ {D} )$ and by a function $L( \theta ,\ d)$, which characterizes the losses caused by accepting the decision $d$ when the true value of the parameter is $\theta$. The objective of decision making is to find in a certain sense an optimal rule (decision function) $\delta = \delta (x)$, assigning to each result of an observation $x \in X$ the decision $\delta (x) \in D$. In the Bayesian approach, when it is assumed that the unknown parameter $\theta$ is a random variable with a given (a priori) distribution $\pi = \pi (d \theta )$ on $( \Theta ,\ {\mathcal B} _ \Theta )$ the best decision function (Bayesian decision function) ${\delta ^ {*} } = {\delta ^ {*} } (x)$ is defined as the function for which the minimum expected loss $\inf _ \delta \ \rho ( \pi ,\ \delta )$, where

$$\rho ( \pi ,\ \delta ) \ = \ \int\limits _ \Theta \rho ( \theta ,\ \delta ) \ \pi (d \theta ) ,$$

and

$$\rho ( \theta ,\ \delta ) \ = \ \int\limits _ { X } L ( \theta ,\ \delta (x)) \ {\mathsf P} _ \theta (dx)$$

is attained. Thus,

$$\rho ( \pi ,\ \delta ^ {*} ) \ = \ \inf _ \delta \ \int\limits _ \Theta \int\limits _ { X } L ( \theta ,\ \delta (x)) \ {\mathsf P} _ \theta (dx) \ \pi ( d \theta ) .$$
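When $X$, $\Theta$ and $D$ are all finite, the integrals above reduce to sums, and the Bayesian decision function can be found by direct enumeration of all decision functions. The following is only a minimal sketch under that assumption; the prior, the likelihood table and the 0-1 loss are hypothetical values chosen for illustration and do not come from the article.

```python
from itertools import product

# Hypothetical finite problem (illustrative values only).
thetas = [0, 1]          # parameter space Theta
xs = [0, 1, 2]           # sample space X
decisions = [0, 1]       # decision space D
prior = {0: 0.6, 1: 0.4}                      # pi(theta)
likelihood = {                                # p(x | theta)
    0: {0: 0.7, 1: 0.2, 2: 0.1},
    1: {0: 0.1, 1: 0.2, 2: 0.7},
}


def loss(theta, d):
    """0-1 loss L(theta, d): no loss when the decision names the true parameter."""
    return 0.0 if theta == d else 1.0


def risk(theta, delta):
    """rho(theta, delta): loss averaged over the samples x for a fixed theta."""
    return sum(loss(theta, delta[x]) * likelihood[theta][x] for x in xs)


def bayes_risk(delta):
    """rho(pi, delta): the risk averaged over the prior."""
    return sum(risk(theta, delta) * prior[theta] for theta in thetas)


# A decision function is a map x -> d; here there are 2**3 = 8 of them.
all_rules = [dict(zip(xs, choice)) for choice in product(decisions, repeat=len(xs))]
best_rule = min(all_rules, key=bayes_risk)

print(best_rule, bayes_risk(best_rule))
```

Enumeration is only feasible for very small problems, since the number of decision functions grows as $|D|^{|X|}$.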

In searching for the Bayesian decision function $\delta ^ {*} = \delta ^ {*} (x)$, the following remark is useful. Let ${\mathsf P} _ \theta (dx) = p (x \mid \theta ) \ d \mu (x)$, $\pi (d \theta ) = \pi ( \theta ) \ d \nu ( \theta )$, where $\mu$ and $\nu$ are certain $\sigma$-finite measures. One then finds, assuming that the order of integration may be changed,

$$\int\limits _ \Theta \int\limits _ { X } L ( \theta ,\ \delta (x)) \ {\mathsf P} _ \theta (dx ) \ \pi ( d \theta )\ =$$

$$= \ \int\limits _ \Theta \int\limits _ { X } L ( \theta ,\ \delta (x)) p ( x \mid \theta ) \pi ( \theta ) \ d \mu (x) \ d \nu ( \theta )\ =$$

$$= \ \int\limits _ { X } \ d \mu (x) \left [ \int\limits _ \Theta L ( \theta ,\ \delta (x)) p (x \mid \theta ) \pi ( \theta ) \ d \nu ( \theta ) \right ] .$$

It is seen from the above that, for a given $x \in X$, $\delta ^ {*} (x)$ is the value $d ^ {*}$ of $d$ for which

$$\inf _ { d } \ \int\limits _ \Theta L ( \theta ,\ d) p (x \mid \theta ) \pi ( \theta ) \ d \nu ( \theta )$$

is attained, or, what is equivalent, for which

$$\inf _ { d } \ \int\limits _ \Theta L ( \theta ,\ d) \frac{p (x \mid \theta ) \pi ( \theta ) }{p (x) } \ d \nu ( \theta ) ,$$

is attained, where

$$p (x) \ = \ \int\limits _ \Theta p (x \mid \theta ) \pi ( \theta ) \ d \nu ( \theta ) .$$

But, according to the Bayes formula,

$$\int\limits _ \Theta L( \theta ,\ d) \frac{p (x \mid \theta ) \pi ( \theta ) }{p (x) } \ d \nu ( \theta ) \ = \ {\mathsf E} [L ( \theta ,\ d) \mid x].$$

Thus, for a given $x$, $\delta ^ {*} (x)$ is the value $d ^ {*}$ of $d$ for which the conditional average loss ${\mathsf E} [L ( \theta ,\ d) \mid x]$ attains its minimum.
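The pointwise characterization can be sketched in code for a finite parameter space: form the posterior by the Bayes formula, then pick the decision with the smallest conditional average loss. This is an illustrative sketch only; the function names, the two-point parameter space and the loss values are assumptions made for the example, and $p(x) > 0$ is assumed for the observed $x$.

```python
def posterior(x, prior, likelihood):
    """Posterior p(x | theta) pi(theta) / p(x) on a finite parameter space."""
    joint = {theta: likelihood[theta][x] * prior[theta] for theta in prior}
    p_x = sum(joint.values())  # marginal p(x); assumed to be positive
    return {theta: w / p_x for theta, w in joint.items()}


def bayes_decision(x, prior, likelihood, loss, decisions):
    """Return a decision d minimizing the conditional average loss E[L(theta, d) | x]."""
    post = posterior(x, prior, likelihood)

    def expected_loss(d):
        return sum(loss(theta, d) * post[theta] for theta in post)

    return min(decisions, key=expected_loss)


# Hypothetical example: two parameter values and an asymmetric loss.
prior = {"a": 0.5, "b": 0.5}
likelihood = {"a": {0: 0.8, 1: 0.2}, "b": {0: 0.3, 1: 0.7}}


def loss(theta, d):
    if theta == d:
        return 0.0
    # Deciding "a" when the truth is "b" is taken to be twice as costly.
    return 2.0 if d == "a" else 1.0


for x in (0, 1):
    print(x, bayes_decision(x, prior, likelihood, loss, ["a", "b"]))
```

Because the minimization is carried out separately for each $x$, the cost grows linearly in the number of samples rather than exponentially, in contrast with a search over all decision functions.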

Example. (The Bayesian approach applied to the case of distinguishing between two simple hypotheses.) Let $\Theta = \{ \theta _ {1} ,\ \theta _ {2} \}$, $D = \{ d _ {1} ,\ d _ {2} \}$, $L _ {ij } = L ( \theta _ {i} ,\ d _ {j} )$, $i,\ j = 1,\ 2$; $\pi ( \theta _ {1} ) = \pi _ {1}$, $\pi ( \theta _ {2} ) = \pi _ {2}$, $\pi _ {1} + \pi _ {2} = 1$. If the decision $d _ {i}$ is identified with the acceptance of the hypothesis $H _ {i}$: $\theta = \theta _ {i}$, it is natural to assume that $L _ {11} < L _ {12}$, $L _ {22} < L _ {21}$. Then

$$\rho ( \pi ,\ \delta ) \ = \ \int\limits _ { X } [ \pi _ {1} p (x \mid \theta _ {1} ) L ( \theta _ {1} ,\ \delta ( x)) +$$

$$+ {} \pi _ {2} p (x \mid \theta _ {2} ) L ( \theta _ {2} ,\ \delta (x))] \ d \mu (x)$$

implies that $\inf _ \delta \ \rho ( \pi ,\ \delta )$ is attained for the function

$$\delta ^ {*} (x) \ = \ \left \{ \begin{array}{l} d _ {1} ,\ \ \textrm{ if } \ \frac{p (x \mid \theta _ {2} ) }{p (x \mid \theta _ {1} ) } \ \leq \ \frac{\pi _ {1} }{\pi _ {2} } \ \frac{L _ {12} - L _ {11} }{L _ {21} - L _ {22} } , \\ d _ {2} ,\ \ \textrm{ if } \ \frac{p (x \mid \theta _ {2} ) }{p (x \mid \theta _ {1} ) } \ \geq \ \frac{\pi _ {1} }{\pi _ {2} } \ \frac{L _ {12} - L _ {11} }{L _ {21} - L _ {22} } . \\ \end{array} \right .$$
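The rule above compares the likelihood ratio with a threshold determined by the prior and the losses. A minimal sketch follows, assuming for illustration that the two hypotheses are unit-variance normal densities with means 0 and 1; these densities, the function names and the numerical values are hypothetical. Ties, which the formula allows to be resolved either way, are resolved here in favour of $d_1$, and $p(x \mid \theta_1) > 0$ is assumed.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bayes_test(x, p1, p2, pi1, pi2, L11, L12, L21, L22):
    """Two-hypothesis Bayes rule: compare p2(x)/p1(x) with the threshold.

    Assumes L11 < L12 and L22 < L21, so the threshold is positive.
    """
    threshold = (pi1 / pi2) * (L12 - L11) / (L21 - L22)
    ratio = p2(x) / p1(x)
    return "d1" if ratio <= threshold else "d2"


# Hypothetical densities p(x | theta_1) and p(x | theta_2).
def p1(x):
    return normal_pdf(x, 0.0)


def p2(x):
    return normal_pdf(x, 1.0)


# Equal priors and 0-1 loss give threshold 1, i.e. accept H_1 when x <= 0.5.
print(bayes_test(0.2, p1, p2, 0.5, 0.5, 0.0, 1.0, 1.0, 0.0))
print(bayes_test(0.9, p1, p2, 0.5, 0.5, 0.0, 1.0, 1.0, 0.0))
```

With these densities the likelihood ratio equals $e^{x - 1/2}$, so raising the prior weight $\pi_1$ moves the cut-off point in $x$ to the right, making the rule more reluctant to abandon $H_1$.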

The advantage of the Bayesian approach is that, unlike the losses $\rho ( \theta ,\ \delta )$, the expected losses $\rho ( \pi ,\ \delta )$ are numbers which do not depend on the unknown parameter $\theta$, and, consequently, it is known that solutions $\delta _ \epsilon ^ {*}$ for which

$$\rho ( \pi ,\ \delta _ \epsilon ^ {*} ) \ \leq \ \inf _ \delta \ \rho ( \pi ,\ \delta ) + \epsilon ,$$

and which are, if not optimal, at least $\epsilon$-optimal $( \epsilon > 0)$, are certain to exist. The disadvantage of the Bayesian approach is the necessity of postulating both the existence of an a priori distribution of the unknown parameter and its precise form (the latter disadvantage may be overcome to a certain extent by adopting an empirical Bayesian approach, cf. Bayesian approach, empirical).

How to Cite This Entry:
Bayesian approach. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Bayesian_approach&oldid=44399
This article was adapted from an original article by A.N. Shiryaev (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.