Least-favourable distribution
An a priori distribution maximizing the risk function in a statistical problem of decision making.
Suppose that, based on a realization of a random variable $ X $ with values in a sample space $ ( \mathfrak X , \mathfrak B _ {\mathfrak X} , P _ \theta ) $, $ \theta \in \Theta $, one has to choose a decision $ d $ from a decision space $ ( \mathfrak D , \mathfrak B _ {\mathfrak D} ) $; it is assumed here that the unknown parameter $ \theta $ is a random variable taking values in a sample space $ ( \Theta , \mathfrak B _ \Theta , \pi _ {t} ) $, $ t \in T $. Let $ L( \theta , d) $ be a function representing the loss incurred by adopting the decision $ d $ if the true value of the parameter is $ \theta $. An a priori distribution $ \pi _ {t ^ {*} } $ from the family $ \{ {\pi _ {t} } : {t \in T } \} $ is said to be least favourable for a decision $ d $ in the statistical problem of decision making using the Bayesian approach if
$$ \sup_{t \in T} \rho(\pi_t, d) = \rho(\pi_{t^*}, d), $$
where
$$ \rho(\pi_t, d) = \int_\Theta \int_{\mathfrak X} L(\theta, d(x)) \, dP_\theta(x) \, d\pi_t(\theta) $$
is the risk function, representing the mean loss incurred by adopting the decision $ d $. A least-favourable distribution $ \pi _ {t ^ {*} } $ makes it possible to calculate the "greatest" (on the average) loss $ \rho ( \pi _ {t ^ {*} } , d) $ incurred by adopting $ d $. In practical work one is, as a rule, not guided by the least-favourable distribution; on the contrary, one strives to adopt a decision that safeguards against the maximum loss as $ \theta $ varies. This leads to the search for a minimax decision $ d ^ {*} $ minimizing the maximum risk, i.e.
$$ \inf_{d \in \mathfrak D} \sup_{t \in T} \rho(\pi_t, d) = \sup_{t \in T} \rho(\pi_t, d^*). $$
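To make these definitions concrete, here is a minimal numerical sketch in Python (all distributions, losses, priors and rule sets below are hypothetical toy choices, not taken from the article). With finite $ \Theta $, $ T $, sample space and decision space, the risk $ \rho ( \pi _ {t} , d) $ becomes a double sum; the least-favourable distribution for a fixed rule $ d $ is found by maximizing over $ t $, and a minimax rule by minimizing the maximum risk:

```python
import numpy as np

# Toy setup: Theta = {0, 1}, sample space {0, 1} (all numbers hypothetical)
P = np.array([[0.8, 0.2],                # P_theta(x) for theta = 0
              [0.3, 0.7]])               # P_theta(x) for theta = 1

# Family {pi_t : t in T} of a priori distributions on Theta
priors = {t: np.array([1.0 - t, t]) for t in (0.25, 0.5, 0.75)}

# Loss L(theta, d) for decisions d in {0, 1} (0-1 loss: "guess theta")
L = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# Non-randomized decision rules d: x -> {0, 1}, enumerated exhaustively
rules = [(a, b) for a in (0, 1) for b in (0, 1)]

def risk(pi, rule):
    """rho(pi_t, d): the double integral reduced to a double sum."""
    return sum(pi[i] * P[i, x] * L[i, rule[x]]
               for i in range(2) for x in (0, 1))

d = (0, 1)                               # a fixed rule, d(x) = x
t_star = max(priors, key=lambda t: risk(priors[t], d))   # least favourable
print("sup_t rho(pi_t, d) =", risk(priors[t_star], d), "at t* =", t_star)

# Minimax rule d*: attains inf over d of sup over t of rho(pi_t, d)
d_star = min(rules, key=lambda r: max(risk(pi, r) for pi in priors.values()))
print("minimax rule d* =", d_star)
```

In this toy problem the rule $ d( x) = x $ turns out to be minimax within the enumerated family, with maximum risk $ 0.275 $ attained at the least-favourable prior $ \pi _ {0.75 } $.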
When testing a composite statistical hypothesis against a simple alternative, within the Bayesian approach, one defines a least-favourable distribution with the aid of Wald reduction, which may be described as follows. Suppose that, based on a realization of a random variable $ X $, one has to test a composite hypothesis $ H _ {0} $, according to which the distribution law of $ X $ belongs to a family $ H _ {0} = \{ {P _ \theta } : {\theta \in \Theta } \} $, against a simple alternative $ H _ {1} $, according to which $ X $ obeys a law $ Q $; let
$$ p_\theta(x) = \frac{dP_\theta(x)}{d\mu(x)} \quad \textrm{and} \quad q(x) = \frac{dQ(x)}{d\mu(x)}, $$
where $ \mu ( \cdot ) $ is a $ \sigma $-finite measure on $ ( \mathfrak X , \mathfrak B _ {\mathfrak X} ) $ and $ \{ {\pi _ {t} } : {t \in T } \} $ is a family of a priori distributions on $ ( \Theta , \mathfrak B _ \Theta ) $. Then, for any $ t \in T $, the composite hypothesis $ H _ {0} $ can be associated with a simple hypothesis $ H _ {t} $, according to which $ X $ obeys the probability law with density
$$ f_t(x) = \int_\Theta p_\theta(x) \, d\pi_t(\theta). $$
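For a finite sample space this reduction is a one-line computation: $ f _ {t} $ is the prior-weighted average of the densities $ p _ \theta $. A small sketch (the densities are hypothetical toy numbers):

```python
import numpy as np

# p_theta(x) on a two-point sample space (hypothetical toy densities)
P = np.array([[0.8, 0.2],     # p_0(x)
              [0.3, 0.7]])    # p_1(x)

def mixture_density(pi, P):
    """f_t(x) = sum over theta of p_theta(x) * pi_t(theta)."""
    return pi @ P

print(mixture_density(np.array([0.5, 0.5]), P))   # -> [0.55 0.45]
```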
By the Neyman–Pearson lemma for testing a simple hypothesis $ H _ {t} $ against a simple alternative $ H _ {1} $, there exists a most-powerful test, based on the likelihood ratio. Let $ \beta _ {t} $ be the power of this test (cf. Power of a statistical test). Then the least-favourable distribution is the a priori distribution $ \pi _ {t ^ {*} } $ from the family $ \{ {\pi _ {t} } : {t \in T } \} $ such that $ \beta _ {t ^ {*} } \leq \beta _ {t} $ for all $ t \in T $. The least-favourable distribution has the property that the density $ f _ {t ^ {*} } ( x) $ of $ X $ under the hypothesis $ H _ {t ^ {*} } $ is "least distant" from the alternative density $ q ( x) $, i.e. the hypothesis $ H _ {t ^ {*} } $ is the member of the family $ \{ {H _ {t} } : {t \in T } \} $ "nearest" to the rival hypothesis $ H _ {1} $. See Bayesian approach.
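The selection of $ \pi _ {t ^ {*} } $ can be sketched numerically as well (again with hypothetical toy densities and an assumed level $ \alpha $): for each $ t $ one forms the mixture $ f _ {t} $, computes the power $ \beta _ {t} $ of the most-powerful level-$ \alpha $ test of $ H _ {t} $ against $ H _ {1} $ via the Neyman–Pearson construction (randomizing on the boundary point, since the sample space is discrete), and keeps the $ t $ with the smallest power:

```python
import numpy as np

P = np.array([[0.8, 0.2],                      # p_theta(x), toy densities
              [0.3, 0.7]])
priors = {t: np.array([1.0 - t, t]) for t in (0.25, 0.5, 0.75)}
q = np.array([0.5, 0.5])                       # density of X under H_1
alpha = 0.2                                    # significance level (assumed)

def np_power(f_t, q, alpha):
    """Power of the most-powerful level-alpha test of f_t against q."""
    order = np.argsort(-(q / f_t))             # decreasing likelihood ratio
    size = power = 0.0
    for x in order:
        if size + f_t[x] <= alpha:             # whole point fits the region
            size += f_t[x]; power += q[x]
        else:                                  # randomize on boundary point
            power += (alpha - size) / f_t[x] * q[x]
            break
    return power

betas = {t: np_power(pi @ P, q, alpha) for t, pi in priors.items()}
t_star = min(betas, key=betas.get)             # beta_{t*} <= beta_t for all t
print("least-favourable t* =", t_star, betas)
```

In this example the minimizing prior is the one whose mixture $ f _ {t} $ lies closest to $ q $, illustrating the "nearest hypothesis" interpretation above.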
References
[1] E.L. Lehmann, "Testing statistical hypotheses", Wiley (1986)
[2] S. Zacks, "Theory of statistical inference", Wiley (1971)
Least-favourable distribution. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Least-favourable_distribution&oldid=47598