
User:Maximilian Janisch/Sandbox

From Encyclopedia of Mathematics

Latest revision as of 13:45, 17 October 2019

This page is a copy of the article Bayesian approach in order to test automatic LaTeXification. This article is not my work.

to statistical problems

An approach based on the assumption that to any parameter in a statistical problem there can be assigned a definite probability distribution. Any general statistical decision problem is determined by the following elements: by a space $( X , B _ { X } )$ of (potential) samples $x$, by a space $( \Theta , B _ { \Theta } )$ of values of the unknown parameter $\theta$, by a family of probability distributions $\{ P _ { \theta } : \theta \in \Theta \}$ on $( X , B _ { X } )$, by a space of decisions $( D , B _ { D } )$ and by a function $L ( \theta , d )$, which characterizes the losses caused by accepting the decision $d$ when the true value of the parameter is $\theta$. The objective of decision making is to find in a certain sense an optimal rule (decision function) $\delta = \delta ( x )$, assigning to each result of an observation $x \in X$ the decision $\delta ( x ) \in D$. In the Bayesian approach, when it is assumed that the unknown parameter $\theta$ is a random variable with a given (a priori) distribution $\pi = \pi ( d \theta )$ on $( \Theta , B _ { \Theta } )$, the best decision function (Bayesian decision function) $\delta ^ { * } = \delta ^ { * } ( x )$ is defined as the function for which the minimum expected loss $\operatorname { inf } _ { \delta } \rho ( \pi , \delta )$, where

\begin{equation} \rho ( \pi , \delta ) = \int _ { \Theta } \rho ( \theta , \delta ) \pi ( d \theta ) \end{equation}

and

\begin{equation} \rho ( \theta , \delta ) = \int _ { X } L ( \theta , \delta ( x ) ) P _ { \theta } ( d x ) \end{equation}

is attained. Thus,

\begin{equation} \rho ( \pi , \delta ^ { * } ) = \operatorname { inf } _ { \delta } \int _ { \Theta } \int _ { X } L ( \theta , \delta ( x ) ) P _ { \theta } ( d x ) \pi ( d \theta ) \end{equation}
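For orientation, consider the simple special case (chosen here only for illustration) of a finite parameter set $\Theta = \{ \theta _ { 1 } , \dots , \theta _ { k } \}$ with a priori probabilities $\pi ( \{ \theta _ { i } \} ) = \pi _ { i }$. The expected loss then becomes a finite weighted average of the risks,

\begin{equation} \rho ( \pi , \delta ) = \sum _ { i = 1 } ^ { k } \pi _ { i } \rho ( \theta _ { i } , \delta ) = \sum _ { i = 1 } ^ { k } \pi _ { i } \int _ { X } L ( \theta _ { i } , \delta ( x ) ) P _ { \theta _ { i } } ( d x ) \end{equation}

and a Bayesian decision function is one minimizing this weighted average over all decision functions $\delta$.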

In searching for the Bayesian decision function $\delta ^ { * } = \delta ^ { * } ( x )$, the following remark is useful. Let $P _ { \theta } ( d x ) = p ( x | \theta ) d \mu ( x )$, $\pi ( d \theta ) = \pi ( \theta ) d \nu ( \theta )$, where $\mu$ and $\nu$ are certain $\sigma$-finite measures. One then finds, assuming that the order of integration may be changed,

\begin{equation} \int _ { \Theta } \int _ { X } L ( \theta , \delta ( x ) ) P _ { \theta } ( d x ) \pi ( d \theta ) = \end{equation}

\begin{equation} = \int _ { \Theta } \int _ { X } L ( \theta , \delta ( x ) ) p ( x | \theta ) \pi ( \theta ) d \mu ( x ) d \nu ( \theta ) = \end{equation}

\begin{equation} = \int _ { X } d \mu ( x ) [ \int _ { \Theta } L ( \theta , \delta ( x ) ) p ( x | \theta ) \pi ( \theta ) d \nu ( \theta ) ] \end{equation}

It is seen from the above (the expression in square brackets depends on $\delta$ only through the single value $d = \delta ( x )$, so it can be minimized for each $x$ separately) that for a given $x \in X$, $\delta ^ { * } ( x )$ is that value of $d$ for which

\begin{equation} \operatorname { inf } _ { d } \int _ { \Theta } L ( \theta , d ) p ( x | \theta ) \pi ( \theta ) d \nu ( \theta ) \end{equation}

is attained, or, what is equivalent, for which

\begin{equation} \operatorname { inf } _ { d } \int _ { \Theta } L ( \theta , d ) \frac { p ( x | \theta ) \pi ( \theta ) } { p ( x ) } d \nu ( \theta ) \end{equation}

is attained, where

\begin{equation} p ( x ) = \int _ { \Theta } p ( x | \theta ) \pi ( \theta ) d \nu ( \theta ) \end{equation}
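For instance, if $\nu$ is the counting measure on a finite parameter set $\Theta = \{ \theta _ { 1 } , \dots , \theta _ { k } \}$ (an illustrative special case), this last integral is simply the marginal density of the observation,

\begin{equation} p ( x ) = \sum _ { i = 1 } ^ { k } p ( x | \theta _ { i } ) \pi ( \theta _ { i } ) \end{equation}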

But, according to the Bayes formula,

\begin{equation} \int _ { \Theta } L ( \theta , d ) \frac { p ( x | \theta ) \pi ( \theta ) } { p ( x ) } d \nu ( \theta ) = E [ L ( \theta , d ) | x ] \end{equation}

Thus, for a given $x$, $\delta ^ { * } ( x )$ is that value of $d$ for which the conditional average loss $E [ L ( \theta , d ) | x ]$ attains a minimum.
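For example, if $\Theta , D \subset \mathbf { R }$ and the loss is quadratic, $L ( \theta , d ) = ( \theta - d ) ^ { 2 }$ (a standard special case, assuming the a posteriori distribution has a finite second moment), the conditional average loss is minimized by the a posteriori mean,

\begin{equation} \delta ^ { * } ( x ) = E [ \theta | x ] = \int _ { \Theta } \theta \frac { p ( x | \theta ) \pi ( \theta ) } { p ( x ) } d \nu ( \theta ) \end{equation}

while for the absolute loss $L ( \theta , d ) = | \theta - d |$ any median of the a posteriori distribution may be taken.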

Example. (The Bayesian approach applied to the case of distinguishing between two simple hypotheses.) Let $\Theta = \{ \theta _ { 1 } , \theta _ { 2 } \}$, $D = \{ d _ { 1 } , d _ { 2 } \}$, $L _ { i j } = L ( \theta _ { i } , d _ { j } )$, $i , j = 1,2$; $\pi ( \theta _ { 1 } ) = \pi _ { 1 }$, $\pi ( \theta _ { 2 } ) = \pi _ { 2 }$, $\pi _ { 1 } + \pi _ { 2 } = 1$. If the decision $d _ { j }$ is identified with the acceptance of the hypothesis $H _ { j }$: $\theta = \theta _ { j }$, it is natural to assume that $L _ { 11 } < L _ { 12 }$, $L _ { 22 } < L _ { 21 }$. Then

\begin{equation} \rho ( \pi , \delta ) = \int _ { X } [ \pi _ { 1 } p ( x | \theta _ { 1 } ) L ( \theta _ { 1 } , \delta ( x ) ) + \pi _ { 2 } p ( x | \theta _ { 2 } ) L ( \theta _ { 2 } , \delta ( x ) ) ] d \mu ( x ) \end{equation}

implies that $\operatorname { inf } _ { \delta } \rho ( \pi , \delta )$ is attained for the function

\begin{equation} \delta ^ { * } ( x ) = \left\{ \begin{array} { l l } { d _ { 1 } , } & { \text { if } \frac { p ( x | \theta _ { 2 } ) } { p ( x | \theta _ { 1 } ) } \leq \frac { \pi _ { 1 } } { \pi _ { 2 } } \frac { L _ { 12 } - L _ { 11 } } { L _ { 21 } - L _ { 22 } } } \\ { d _ { 2 } , } & { \text { if } \frac { p ( x | \theta _ { 2 } ) } { p ( x | \theta _ { 1 } ) } \geq \frac { \pi _ { 1 } } { \pi _ { 2 } } \frac { L _ { 12 } - L _ { 11 } } { L _ { 21 } - L _ { 22 } } } \end{array} \right. \end{equation}
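For a concrete illustration (with numbers chosen here only as an example), let $\pi _ { 1 } = \pi _ { 2 } = 1 / 2$ and $L _ { 11 } = L _ { 22 } = 0$, $L _ { 12 } = L _ { 21 } = 1$. The threshold $\frac { \pi _ { 1 } } { \pi _ { 2 } } \frac { L _ { 12 } - L _ { 11 } } { L _ { 21 } - L _ { 22 } }$ equals $1$, and the Bayesian decision function reduces to accepting the hypothesis with the larger likelihood (equivalently, with equal priors, the larger a posteriori probability):

\begin{equation} \delta ^ { * } ( x ) = \left\{ \begin{array} { l l } { d _ { 1 } , } & { \text { if } p ( x | \theta _ { 1 } ) \geq p ( x | \theta _ { 2 } ) , } \\ { d _ { 2 } , } & { \text { otherwise. } } \end{array} \right. \end{equation}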

The advantage of the Bayesian approach consists in the fact that, unlike the losses $\rho ( \theta , \delta )$, the expected losses $\rho ( \pi , \delta )$ are numbers which do not depend on the unknown parameter $\theta$, and, consequently, solutions $\delta _ { \epsilon } ^ { * }$ for which

\begin{equation} \rho ( \pi , \delta _ { \epsilon } ^ { * } ) \leq \operatorname { inf } _ { \delta } \rho ( \pi , \delta ) + \epsilon \end{equation}

and which are, if not optimal, at least $\epsilon$-optimal $( \epsilon > 0 )$, are certain to exist. The disadvantage of the Bayesian approach is the necessity of postulating both the existence of an a priori distribution of the unknown parameter and its precise form (the latter disadvantage may be overcome to a certain extent by adopting an empirical Bayesian approach, cf. Bayesian approach, empirical).

References

[1] A. Wald, "Statistical decision functions" , Wiley (1950)
[2] M.H. de Groot, "Optimal statistical decisions" , McGraw-Hill (1970)
How to Cite This Entry:
Maximilian Janisch/Sandbox. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Maximilian_Janisch/Sandbox&oldid=43662