Difference between revisions of "Bayesian estimator"

Latest revision as of 10:33, 29 May 2020

An estimator of an unknown parameter from the results of observations using the Bayesian approach. In such an approach to the problem of statistical estimation it is usually assumed that the unknown parameter $ \theta \in \Theta \subseteq \mathbf R ^ {k} $ is a random variable with a given a priori distribution $ \pi = \pi (d \theta ) $, that the space of decisions $ D $ is identical to the set $ \Theta $, and that the loss $ L ( \theta , d) $ expresses the deviation between the variable $ \theta $ and its estimator $ d $. It is therefore supposed, as a rule, that the function $ L( \theta , d) $ has the form $ L ( \theta , d) = a ( \theta ) \lambda ( \theta - d) $, where $ \lambda $ is some non-negative function of the error vector $ \theta - d $. If $ k = 1 $, it is often assumed that $ \lambda ( \theta - d) = | \theta - d | ^ \alpha $, $ \alpha > 0 $; the most useful and mathematically most convenient choice is the quadratic loss function $ L ( \theta , d) = | \theta - d | ^ {2} $. For such a loss function the Bayesian estimator (Bayesian decision function) $ \delta ^ {*} = \delta ^ {*} (x) $ is defined as the function for which the minimum total loss

$$ \inf _ \delta \rho ( \pi , \delta ) = \ \inf _ \delta \ \int\limits _ \Theta \int\limits _ { X } | \theta - \delta (x) | ^ {2} {\mathsf P} _ \theta \ (dx) \pi (d \theta ), $$

is attained, or, equivalently, for which the minimum conditional loss

$$ \inf _ \delta \ {\mathsf E} \{ [ \theta - \delta (x)] ^ {2} \mid x \} $$

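is attained. It follows that in the case of a quadratic loss function the Bayesian estimator $ \delta ^ {*} (x) $ coincides with the a posteriori mean $ \delta ^ {*} (x) = {\mathsf E} ( \theta \mid x) $, since for every decision $ d $ the conditional loss $ {\mathsf E} \{ ( \theta - d) ^ {2} \mid x \} $ equals the variance of the a posteriori distribution plus $ [ {\mathsf E} ( \theta \mid x) - d ] ^ {2} $, and is therefore smallest when $ d = {\mathsf E} ( \theta \mid x) $. The Bayes risk is then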

$$ \rho ( \pi , \delta ^ {*} ) = \ {\mathsf E} [ {\mathsf D} ( \theta \mid x)], $$

where $ {\mathsf D} ( \theta \mid x) $ is the variance of the a posteriori distribution:

$$ {\mathsf D} ( \theta \mid x) = \ {\mathsf E} \{ [ \theta - {\mathsf E} ( \theta \mid x)] ^ {2} \mid x \} . $$
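The claim that the a posteriori mean minimizes the conditional quadratic loss, with minimum value equal to the a posteriori variance, can be checked numerically. The Python sketch below is illustrative only: it assumes a hypothetical discrete a posteriori distribution (the support points and weights are invented for the example) and a hypothetical grid of candidate decisions $ d $ with step $ 1/100 $. Exact rational arithmetic is used so that the comparison is free of rounding error.

```python
from fractions import Fraction

# Hypothetical discrete a posteriori distribution of theta given x
# (support points and probabilities chosen only for illustration).
support = [Fraction(0), Fraction(1), Fraction(3)]
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

def conditional_loss(d):
    """E{(theta - d)^2 | x} under the discrete posterior above."""
    return sum(w * (t - d) ** 2 for t, w in zip(support, weights))

# A posteriori mean E(theta | x) and variance D(theta | x).
posterior_mean = sum(w * t for t, w in zip(support, weights))
posterior_var = conditional_loss(posterior_mean)

# Scan candidate decisions d = 0, 1/100, ..., 3 and keep the minimizer.
grid = [Fraction(i, 100) for i in range(0, 301)]
best_d = min(grid, key=conditional_loss)

print(posterior_mean, posterior_var, best_d)
```

Here the grid minimizer coincides with the a posteriori mean, and the minimal conditional loss is the a posteriori variance, as the decomposition of the conditional loss predicts.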

Example. Let $ x = (x _ {1} , \dots , x _ {n} ) $, where $ x _ {1} , \dots , x _ {n} $ are independent, identically distributed random variables with the normal distribution $ N ( \theta , \sigma ^ {2} ) $ and known variance $ \sigma ^ {2} $, while the unknown parameter $ \theta $ has the normal a priori distribution $ N ( \mu , \tau ^ {2} ) $. Since the a posteriori distribution of $ \theta $ (given $ x $) is normal $ N ( \mu _ {n} , \tau _ {n} ^ {2} ) $ with

$$ \mu _ {n} = \ \frac{n \overline{x}\; \sigma ^ {-2} + \mu \tau ^ {-2} }{n \sigma ^ {-2} + \tau ^ {-2} } ,\ \ \tau _ {n} ^ {-2} = n \sigma ^ {-2} + \tau ^ {-2} , $$

where $ \overline{x} = (x _ {1} + \dots + x _ {n} ) / n $, it follows that for the quadratic loss function $ | \theta - d | ^ {2} $ the Bayesian estimator is $ \delta ^ {*} (x) = \mu _ {n} $. Since the a posteriori variance $ \tau _ {n} ^ {2} $ does not depend on $ x $, the Bayes risk is $ \rho ( \pi , \delta ^ {*} ) = \tau _ {n} ^ {2} = \sigma ^ {2} \tau ^ {2} / (n \tau ^ {2} + \sigma ^ {2} ) $.
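The formulas of the example can be checked by simulation. The Python sketch below is illustrative only: the function names `posterior_params` and `monte_carlo_bayes_risk`, the parameter values $ n = 5 $, $ \sigma ^ {2} = 4 $, $ \mu = 1 $, $ \tau ^ {2} = 2 $, the number of trials and the random seed are all hypothetical choices, not taken from the article. It computes $ \mu _ {n} $ and $ \tau _ {n} ^ {2} $ from the formulas above and compares the closed-form Bayes risk with a Monte Carlo average of $ ( \theta - \mu _ {n} ) ^ {2} $, drawing $ \theta $ from the prior and then $ x $ from the model.

```python
import random

def posterior_params(xs, sigma2, mu, tau2):
    """Return (mu_n, tau_n^2) for N(theta, sigma2) data and an N(mu, tau2) prior."""
    n = len(xs)
    xbar = sum(xs) / n
    precision = n / sigma2 + 1 / tau2           # tau_n^{-2}
    mu_n = (n * xbar / sigma2 + mu / tau2) / precision
    return mu_n, 1 / precision

def monte_carlo_bayes_risk(n, sigma2, mu, tau2, trials=20000, seed=0):
    """Estimate rho(pi, delta*) = E(theta - mu_n)^2 by joint sampling of (theta, x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        theta = rng.gauss(mu, tau2 ** 0.5)       # theta ~ N(mu, tau^2)
        xs = [rng.gauss(theta, sigma2 ** 0.5) for _ in range(n)]  # x_i ~ N(theta, sigma^2)
        mu_n, _ = posterior_params(xs, sigma2, mu, tau2)
        total += (theta - mu_n) ** 2
    return total / trials

# Hypothetical parameter values for the check.
n, sigma2, mu, tau2 = 5, 4.0, 1.0, 2.0
closed_form = sigma2 * tau2 / (n * tau2 + sigma2)
mc = monte_carlo_bayes_risk(n, sigma2, mu, tau2)
print(closed_form, mc)
```

With these values the closed-form risk is $ 8/14 \approx 0.571 $; the simulated average should agree with it up to Monte Carlo error, which for 20000 trials is roughly 1% in relative terms.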


How to Cite This Entry:
Bayesian estimator. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Bayesian_estimator&oldid=45999

This article was adapted from an original article by A.N. Shiryaev (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.
