Statistical decision theory

From Encyclopedia of Mathematics

A general theory for the processing and use of statistical observations. In a broader interpretation of the term, statistical decision theory is the theory of choosing an optimal non-deterministic behaviour in incompletely known situations.
 
Inverse problems of probability theory are a subject of mathematical statistics. Suppose that a random phenomenon $\phi$ occurs, described qualitatively by the measure space $(\Omega, {\mathcal A})$ of all its elementary events $\omega$ and quantitatively by a probability distribution $P$ of the events. The statistician knows only the qualitative description of $\phi$, and has only incomplete information on $P$ of the type $P \in {\mathcal P}$, where ${\mathcal P}$ is a family of probability distributions. By making one or more observations of $\phi$ and processing the data thus obtained, the statistician has to make a decision on $P$ and choose the most profitable way to proceed (in particular, it may be decided that insufficient material has been collected and that the set of observations has to be extended before final inferences are made). In classical problems of mathematical statistics, the number of independent observations (the size of the sample) was fixed and optimal estimators of the unknown distribution $P$ were sought. The general modern conception of a statistical decision is attributed to A. Wald (see [[#References|[2]]]). It is assumed that every experiment has a cost which has to be paid for, and the statistician must meet the loss of a wrong decision by paying the "fine" corresponding to his error. Therefore, from the statistician's point of view, a decision rule (procedure) $\Pi$ is optimal when it minimizes the risk $\mathfrak R = \mathfrak R(P, \Pi)$, the mathematical expectation of his total loss. This approach was proposed by Wald as the basis of statistical [[Sequential analysis|sequential analysis]] and led to the creation in [[Statistical quality control|statistical quality control]] of procedures which, with the same accuracy of inference, use on average almost half as many observations as the classical decision rule. In the formulation described, any statistical decision problem can be seen as a two-player game in the sense of J. von Neumann, in which the statistician is one of the players and nature is the other (see [[#References|[3]]]). However, as early as 1820, P. Laplace had likewise described a statistical estimation problem as a game of chance in which the statistician is defeated if his estimates are bad.
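For instance (a hypothetical illustration, not part of the original article): when estimating the success probability $p$ of a Bernoulli sample of size $n$ under squared-error loss, the risk of any estimator based on the number of successes $k$ can be computed exactly, and the classical sample mean can be compared with a shrinkage rule whose risk does not depend on $p$:

```python
from math import comb, sqrt

def risk(rule, p, n):
    """Exact risk R(p, rule): expected squared error of the estimate
    rule(k, n) when k ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) * (rule(k, n) - p)**2
               for k in range(n + 1))

def mean_rule(k, n):      # classical rule: the sample mean
    return k / n

def shrink_rule(k, n):    # Bayes rule for a Beta(sqrt(n)/2, sqrt(n)/2) prior
    return (k + sqrt(n) / 2) / (n + sqrt(n))

n = 10
for p in (0.1, 0.3, 0.5):
    print(f"p={p}: mean {risk(mean_rule, p, n):.5f}, "
          f"shrink {risk(shrink_rule, p, n):.5f}")
```

The risk of the sample mean is $p(1-p)/n$, while the shrinkage rule has constant risk $n/4(n+\sqrt n)^2$; being an equalizer and a Bayes rule, it is the minimax estimator in this problem, and the sample mean beats it only for $p$ near $0$ or $1$.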
  
The value of the risk $\mathfrak R(P, \Pi)$ depends both on the decision rule $\Pi$ and on the probability distribution $P$ that governs the distribution of the results of the observed phenomenon. As this "true" value of $P$ is unknown, the entire risk function $P \mapsto \mathfrak R(P, \Pi)$, $P \in {\mathcal P}$, has to be minimized with respect to $\Pi$. A decision rule $\Pi_1$ is said to be uniformly better than $\Pi_2$ if $\mathfrak R(P, \Pi_1) \leq \mathfrak R(P, \Pi_2)$ for all $P \in {\mathcal P}$ and $\mathfrak R(P, \Pi_1) < \mathfrak R(P, \Pi_2)$ for at least one $P \in {\mathcal P}$. A decision rule $\Pi$ is said to be admissible if no uniformly-better decision rules exist. A class $C$ of decision rules is said to be complete (essentially complete) if for any decision rule $\Pi \notin C$ there is a uniformly-better (not worse) decision rule $\Pi^\star \in C$. The most important is a minimal complete class of decision rules, which coincides (when it exists) with the set of all admissible decision rules. If the minimal complete class contains precisely one decision rule, then that rule is optimal. Generally, the risk functions corresponding to admissible decision rules must also be compared by the value of some other functional, for example, the maximum risk. The optimal decision rule $\Pi_0$ in this sense,
  
$$
\sup_{P \in {\mathcal P}} \mathfrak R(P, \Pi_0) = \inf_\Pi \sup_{P \in {\mathcal P}} \mathfrak R(P, \Pi) = \mathfrak R^\star,
$$
  
 
is called the minimax rule. Comparison using the Bayesian risk is also possible:
 
$$
\mathfrak R_\mu(\Pi) = \int\limits_{\mathcal P} \mathfrak R(P, \Pi)\, \mu \{ dP(\cdot) \}
$$
  
i.e. averaging the risk over an a priori probability distribution $\mu$ on the family ${\mathcal P}$. This choice of functional is natural, especially when sets of experiments are repeated with a fixed marginal distribution $P_m$ in the $m$-th set, whereas the $\{P_1, P_2, \dots\}$ prove to be a random series of measures with unknown distribution $\mu$ (see [[Bayesian approach|Bayesian approach]]). The optimal decision rule in this sense,
  
$$
\mathfrak R_\mu(\Pi_0) = \inf_\Pi \mathfrak R_\mu(\Pi),
$$
  
is called the Bayesian decision rule with [[A priori distribution|a priori distribution]] $\mu$. Finally, an a priori distribution $\nu$ is said to be least favourable (for the given problem) if
  
$$
\inf_\Pi \mathfrak R_\nu(\Pi) = \sup_\mu \inf_\Pi \mathfrak R_\mu(\Pi) = \mathfrak R_0 .
$$
  
Under very general assumptions it has been proved that: 1) for any a priori distribution $\mu$, a Bayesian decision rule exists; 2) the totality of all Bayes decision rules and their limits forms a complete class; and 3) minimax decision rules exist and are Bayesian rules relative to the least-favourable a priori distribution, and $\mathfrak R^\star = \mathfrak R_0$ (see [[#References|[4]]]). The concrete form of optimal decision rules essentially depends on the type of statistical problem. However, in classical problems of statistical estimation, the optimal decision rule when the samples are large depends weakly on the chosen method of comparing risk functions.
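These facts can be checked by brute force in a toy finite problem (a hypothetical example; the loss matrix and grids below are not from the article): the minimax value $\inf_\Pi \sup_P \mathfrak R$ and the least-favourable Bayes value $\sup_\mu \inf_\Pi \mathfrak R_\mu$ coincide.

```python
import numpy as np

# Hypothetical two-state problem: loss[i, j] is the loss of decision j
# when the true distribution is P_{i+1}; a randomized rule chooses
# decision 0 with probability q.
loss = np.array([[0.0, 4.0],
                 [2.0, 0.0]])

q = np.linspace(0.0, 1.0, 801)                     # grid of randomized rules
risk = loss[:, :1] * q + loss[:, 1:] * (1.0 - q)   # risk[i, j] = R(P_{i+1}, q_j)

# Minimax value: smallest maximum risk over the rules.
minimax = risk.max(axis=0).min()

# Bayes value under each prior (m, 1-m), then the least favourable prior.
m = np.linspace(0.0, 1.0, 801)[:, None]
bayes = (m * risk[0] + (1.0 - m) * risk[1]).min(axis=1)
least_favourable = bayes.max()

print(minimax, least_favourable)   # both close to 4/3 for this loss matrix
```

Here the minimax rule is randomized ($q = 2/3$), and the least favourable prior puts mass $1/3$ on the first state; both values equal $4/3$, illustrating $\mathfrak R^\star = \mathfrak R_0$.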
  
Decision rules in problems of statistical decision theory can be deterministic or randomized. Deterministic rules are defined by functions, for example by a measurable mapping of the space $\Omega^n$ of all samples $(\omega^{(1)}, \dots, \omega^{(n)})$ of size $n$ into a measurable space $(\Delta, {\mathcal B})$ of decisions $\delta$. Randomized rules are defined by Markov transition probability distributions of the form $\Pi(\omega^{(1)}, \dots, \omega^{(n)}; d\delta)$ from $(\Omega^n, {\mathcal A}^n)$ into $(\Delta, {\mathcal B})$, which describe the probability distribution according to which the selected value $\delta$ must also be independently "chosen" (see [[Statistical experiments, method of|Statistical experiments, method of]]; [[Monte-Carlo method|Monte-Carlo method]]). The allowance of randomized procedures makes the set of decision rules of the problem convex, which greatly facilitates theoretical analysis. Moreover, problems exist in which the optimal decision rule is randomized. Even so, statisticians try to avoid them whenever possible in practice, since the use of tables or other sources of random numbers for "determining" inferences complicates the work and may even seem unscientific.
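On a finite sample space such a transition probability distribution is simply a row-stochastic table, and "choosing" a decision means sampling from the row selected by the observation. A minimal sketch (the outcomes, decisions and probabilities are invented for illustration):

```python
import random

# Hypothetical randomized decision rule: Pi[omega] is the transition
# probability distribution Pi(omega; .) over the decision space.
Pi = {
    "heads": {"accept": 0.8, "reject": 0.2},
    "tails": {"accept": 0.1, "reject": 0.9},
}

def decide(omega, rng=random):
    """Independently draw a decision according to Pi(omega; .)."""
    decisions, weights = zip(*Pi[omega].items())
    return rng.choices(decisions, weights=weights, k=1)[0]

random.seed(0)
draws = [decide("heads") for _ in range(10_000)]
print(draws.count("accept") / 10_000)   # close to 0.8
```

Convexity is visible in this representation: mixing two such tables row by row, $\lambda \Pi_1 + (1-\lambda)\Pi_2$, again yields a transition probability distribution, i.e. another randomized rule.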
  
A statistical decision rule is by definition a transition probability distribution from a certain measurable space $(\Omega, {\mathcal A})$ of results of the experiment into a measurable space $(\Delta, {\mathcal B})$ of decisions. Conversely, every transition probability distribution $\Pi(\omega; d\delta)$ can be interpreted as a decision rule in any statistical decision problem with a measurable space $(\Omega, {\mathcal A})$ of results and a measurable space $(\Delta, {\mathcal B})$ of inferences (it can also be interpreted as a memoryless communication channel with input alphabet $\Omega$ and output alphabet $\Delta$). The statistical decision rules form an algebraic category whose objects are the sets $\mathop{\rm Cap}(\Omega, {\mathcal A})$ of all probability distributions on measurable spaces $(\Omega, {\mathcal A})$, and whose morphisms are the transition probability distributions $\Pi$. The invariants and equivariants of this category define many natural concepts and laws of mathematical statistics (see [[#References|[5]]]). For example, an invariant Riemannian metric, unique up to a factor, exists on the objects of this category. It is defined by the Fisher [[Information matrix|information matrix]]. The morphisms of the category generate equivalence and order relations for parametrized families of probability distributions and for statistical decision problems, which permits one to give a natural definition of a [[Sufficient statistic|sufficient statistic]]. The Kullback non-symmetrical information deviation $I(Q:P)$, which characterizes the dissimilarity of the probability distributions $Q$ and $P$ (see [[Information distance|Information distance]]), is a monotone invariant in the category:
  
$$
I(Q_1 : P_1) \geq I(Q_2 : P_2)
$$
  
if $(Q_1, P_1) \geq (Q_2, P_2)$, i.e. if $Q_2 = Q_1 \Pi$ and $P_2 = P_1 \Pi$ for a certain $\Pi$. If in the problem of statistical estimation from a sample of fixed size $N$ there is a need to estimate the actual marginal probability distribution $P$ of the results of observations, which belongs a priori to a smooth family ${\mathcal P}$, then, given the choice of $2I(Q:P)$ as an invariant loss function for the decision $Q$, the minimax risk proves to be
  
$$
\mathfrak R^\star = N^{-1} \mathop{\rm dim} {\mathcal P} + o(N^{-1}).
$$
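The monotone-invariance property $I(Q_1:P_1) \geq I(Q_2:P_2)$ stated above (the data-processing inequality) can be verified numerically in the finite case. A sketch with randomly generated distributions and a random transition kernel (all names and sizes are arbitrary choices for illustration):

```python
import numpy as np

def kl(q, p):
    """Kullback information deviation I(Q:P) = sum_i q_i log(q_i / p_i)."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    return float(np.sum(q * np.log(q / p)))

rng = np.random.default_rng(1)

# Distributions on 4 points and a row-stochastic kernel Pi into 3 points
# (a morphism of the category); Dirichlet draws are strictly positive.
Q1 = rng.dirichlet(np.ones(4))
P1 = rng.dirichlet(np.ones(4))
Pi = rng.dirichlet(np.ones(3), size=4)

Q2, P2 = Q1 @ Pi, P1 @ Pi   # images of Q1 and P1 under Pi

print(kl(Q1, P1), kl(Q2, P2))   # the second value is never larger
```

Whatever the choice of $Q_1$, $P_1$ and $\Pi$, passing the distributions through the kernel can only lose dissimilarity, which is exactly the monotonicity of the invariant.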
  
 
The logic of quantum events is not Aristotelean; random phenomena of the micro-physics are therefore not a subject of classical probability theory. The formalism designed to describe them accepts the existence of non-commuting random variables and contains the classical theory as a degenerate commutative scheme. In the corresponding interpretation, many problems of the theory of quantum-mechanical measurements become non-commutative analogues of problems of statistical decision theory (see [[#References|[6]]]).
 
 
====References====
 
<table><TR><TD valign="top">[1]</TD> <TD valign="top">  A. Wald,  "Sequential analysis" , Wiley  (1947)</TD></TR><TR><TD valign="top">[2]</TD> <TD valign="top">  A. Wald,  "Statistical decision functions" , Wiley  (1950)</TD></TR><TR><TD valign="top">[3]</TD> <TD valign="top">  J. von Neumann,  O. Morgenstern,  "The theory of games and economic behavior" , Princeton Univ. Press  (1944)</TD></TR><TR><TD valign="top">[4]</TD> <TD valign="top">  E.L. Lehmann,  "Testing statistical hypotheses" , Wiley  (1986)</TD></TR><TR><TD valign="top">[5]</TD> <TD valign="top">  N.N. Chentsov,  "Statistical decision rules and optimal inference" , Amer. Math. Soc.  (1982)  (Translated from Russian)</TD></TR><TR><TD valign="top">[6]</TD> <TD valign="top">  A.S. Kholevo,  "Probabilistic and statistical aspects of quantum theory" , North-Holland  (1982)  (Translated from Russian)</TD></TR></table>
 
 
  
 
====Comments====
 
  
 
====References====
 
<table><TR><TD valign="top">[a1]</TD> <TD valign="top">  J.O. Berger,  "Statistical decision theory and Bayesian analysis" , Springer  (1985)</TD></TR></table>

Latest revision as of 08:23, 6 June 2020


A general theory for the processing and use of statistical observations. In a broader interpretation of the term, statistical decision theory is the theory of choosing an optimal non-deterministic behaviour in incompletely known situations.

Inverse problems of probability theory are a subject of mathematical statistics. Suppose that a random phenomenon $ \phi $ occurs, described qualitatively by the measurable space $ ( \Omega , {\mathcal A}) $ of all its elementary events $ \omega $ and quantitatively by a probability distribution $ P $ of the events. The statistician knows only the qualitative description of $ \phi $ and has only incomplete information on $ P $ of the type $ P \in {\mathcal P} $, where $ {\mathcal P} $ is a family of probability distributions. By making one or more observations of $ \phi $ and processing the data thus obtained, the statistician has to make a decision on $ P $ and choose the most profitable way to proceed (in particular, it may be decided that insufficient material has been collected and that the set of observations has to be extended before final inferences are made). In classical problems of mathematical statistics the number of independent observations (the size of the sample) was fixed, and optimal estimators of the unknown distribution $ P $ were sought. The general modern conception of a statistical decision is due to A. Wald (see [2]). It is assumed that every experiment has a cost which has to be paid for, and the statistician must meet the loss from a wrong decision by paying the "fine" corresponding to his error. Therefore, from the statistician's point of view, a decision rule (procedure) $ \Pi $ is optimal when it minimizes the risk $ \mathfrak R = \mathfrak R ( P, \Pi ) $, the mathematical expectation of his total loss. This approach was proposed by Wald as the basis of statistical sequential analysis and led to the creation, in statistical quality control, of procedures which, with the same accuracy of inference, use on average almost half as many observations as the classical decision rules. In the formulation described, any statistical decision problem can be seen as a two-player game in the sense of J. von Neumann, in which the statistician is one of the players and nature is the other (see [3]). However, as early as 1820, P. Laplace had likewise described a statistical estimation problem as a game of chance in which the statistician is defeated if his estimates are bad.

The value of the risk $ \mathfrak R ( P, \Pi ) $ depends both on the decision rule $ \Pi $ and on the probability distribution $ P $ that governs the results of the observed phenomenon. As the "true" value of $ P $ is unknown, the entire risk function $ \mathfrak R ( P, \Pi ) $, regarded for a given $ \Pi $ as a function of $ P \in {\mathcal P} $, has to be minimized with respect to $ \Pi $. A decision rule $ \Pi _ {1} $ is said to be uniformly better than $ \Pi _ {2} $ if $ \mathfrak R ( P, \Pi _ {1} ) \leq \mathfrak R ( P, \Pi _ {2} ) $ for all $ P \in {\mathcal P} $, with strict inequality $ \mathfrak R ( P, \Pi _ {1} ) < \mathfrak R ( P, \Pi _ {2} ) $ for at least one $ P \in {\mathcal P} $. A decision rule $ \Pi $ is said to be admissible if no uniformly better decision rule exists. A class $ C $ of decision rules is said to be complete (essentially complete) if for any decision rule $ \Pi \notin C $ there is a uniformly better (respectively, not worse) decision rule $ \Pi ^ \star \in C $. Most important is the minimal complete class of decision rules, which coincides (when it exists) with the set of all admissible decision rules. If the minimal complete class contains precisely one decision rule, then that rule is optimal. In general, the risk functions corresponding to admissible decision rules must also be compared by the value of some other functional, for example the maximum risk. The decision rule $ \Pi _ {0} $ optimal in this sense,

$$ \sup _ {P \in {\mathcal P} } \mathfrak R ( P, \Pi _ {0} ) = \ \inf _ \Pi \sup _ {P \in {\mathcal P} } \mathfrak R ( P, \Pi ) = \mathfrak R ^ \star , $$

is called the minimax rule. Comparison using the Bayesian risk is also possible:

$$ \mathfrak R _ \mu ( \Pi ) = \int\limits _ {\mathcal P} \mathfrak R ( P, \Pi ) \mu \{ dP( \cdot ) \} $$

— averaging the risk over an a priori probability distribution $ \mu $ on the family $ {\mathcal P} $. This choice of functional is natural, especially when sets of experiments are repeated with a fixed marginal distribution $ P _ {m} $ in the $ m $-th set, whereas $ \{ P _ {1} , P _ {2} ,\dots \} $ prove to be a random series of measures with unknown distribution $ \mu $ (see Bayesian approach). The decision rule optimal in this sense,

$$ \mathfrak R _ \mu ( \Pi _ {0} ) = \inf _ \Pi \mathfrak R _ \mu ( \Pi ), $$

is called the Bayesian decision rule with a priori distribution $ \mu $. Finally, an a priori distribution $ \nu $ is said to be least favourable (for the given problem) if

$$ \inf _ \Pi \mathfrak R _ \nu ( \Pi ) = \ \sup _ \mu \inf _ \Pi \mathfrak R _ \mu ( \Pi ) = \mathfrak R _ {0} . $$

Under very general assumptions it has been proved that: 1) for any a priori distribution $ \mu $ a Bayesian decision rule exists; 2) the totality of all Bayesian decision rules and their limits forms a complete class; and 3) minimax decision rules exist, are Bayesian rules relative to the least-favourable a priori distribution, and satisfy $ \mathfrak R ^ \star = \mathfrak R _ {0} $ (see [4]). The concrete form of optimal decision rules depends essentially on the type of statistical problem; however, in classical problems of statistical estimation the optimal decision rule for large samples depends only weakly on the chosen method of comparing risk functions.
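The relations between minimax rules, Bayesian rules and the least-favourable prior can be illustrated on a toy finite problem. The following sketch is not from the article: the risk matrix is invented for illustration, with two states of nature and three decision rules, and the least-favourable prior is found by a simple grid search.

```python
# A minimal numerical sketch (hypothetical risk matrix): a finite decision
# problem given by R[i][j] = risk of rule j when state P_i is true.
import numpy as np

R = np.array([[1.0, 4.0, 2.5],   # risks under P_1
              [4.0, 1.0, 2.5]])  # risks under P_2

# Minimax rule: minimize the worst-case risk sup_P R(P, Pi).
worst = R.max(axis=0)
minimax_rule = int(worst.argmin())
minimax_risk = float(worst.min())

# Bayes risk R_mu(Pi) for a prior mu on the states; a Bayesian rule
# minimizes it.
def bayes_value(mu):
    return float((mu @ R).min())

# Grid search for the least-favourable prior: the prior whose Bayes
# risk inf_Pi R_mu(Pi) is largest.
grid = [np.array([p, 1.0 - p]) for p in np.linspace(0.0, 1.0, 1001)]
values = [bayes_value(mu) for mu in grid]
lf_prior = grid[int(np.argmax(values))]
lf_value = max(values)

print(minimax_rule, minimax_risk)   # 2 2.5
print(lf_prior, lf_value)           # the minimax risk equals sup_mu inf_Pi R_mu(Pi)
```

For this matrix the least-favourable prior is the uniform one, and the minimax risk $ \mathfrak R ^ \star = 2.5 $ coincides with $ \mathfrak R _ {0} $, in agreement with assertion 3).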

Decision rules in problems of statistical decision theory can be deterministic or randomized. Deterministic rules are defined by functions, for example by a measurable mapping of the space $ \Omega ^ {n} $ of all samples $ ( \omega ^ {(1)} \dots \omega ^ {(n)} ) $ of size $ n $ into a measurable space $ ( \Delta , {\mathcal B}) $ of decisions $ \delta $. Randomized rules are defined by Markov transition probability distributions of the form $ \Pi ( \omega ^ {(1)} \dots \omega ^ {(n)} ; d \delta ) $ from $ ( \Omega ^ {n} , {\mathcal A} ^ {n} ) $ into $ ( \Delta , {\mathcal B}) $, which describe the probability distribution according to which the value $ \delta $ must be independently "chosen" (see Statistical experiments, method of; Monte-Carlo method). Allowing randomized procedures makes the set of decision rules of the problem convex, which greatly facilitates theoretical analysis. Moreover, problems exist in which the optimal decision rule is randomized. Even so, statisticians try to avoid randomized rules in practice whenever possible, since the use of tables or other sources of random numbers for "determining" inferences complicates the work and may even seem unscientific.
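For a finite observation space and a finite decision space, a randomized rule is just a stochastic matrix: each observation indexes a probability distribution over decisions, from which the decision is drawn. The sketch below uses an invented two-point kernel; the names and numbers are illustrative assumptions, not from the article.

```python
# A toy sketch (hypothetical kernel): a randomized decision rule as a
# transition probability Pi(omega; d delta) on a finite decision space.
import random

decisions = ["accept", "reject"]

# Kernel: for each observation omega, a distribution over decisions.
# Each row must sum to 1.
kernel = {
    0: [0.9, 0.1],   # Pi(0; .)
    1: [0.2, 0.8],   # Pi(1; .)
}

def decide(omega, rng=random):
    """Draw a decision delta according to Pi(omega; d delta)."""
    return rng.choices(decisions, weights=kernel[omega])[0]

# A deterministic rule is the degenerate case in which every row of the
# kernel puts all its mass on a single decision.
print(decide(0))
```

Replacing each row of `kernel` by a one-hot vector recovers a deterministic rule, which shows how the deterministic rules sit as extreme points of the convex set of randomized ones.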

A statistical decision rule is by definition a transition probability distribution from a certain measurable space $ ( \Omega , {\mathcal A}) $ of results of the experiment into a measurable space $ ( \Delta , {\mathcal B}) $ of decisions. Conversely, every transition probability distribution $ \Pi ( \omega ; d \delta ) $ can be interpreted as a decision rule in any statistical decision problem with a measurable space $ ( \Omega , {\mathcal A}) $ of results and a measurable space $ ( \Delta , {\mathcal B}) $ of inferences (it can also be interpreted as a memoryless communication channel with input alphabet $ \Omega $ and output alphabet $ \Delta $). The statistical decision rules form an algebraic category whose objects are the totalities $ \mathop{\rm Cap} ( \Omega , {\mathcal A}) $ of all probability distributions on measurable spaces $ ( \Omega , {\mathcal A}) $ and whose morphisms are the transition probability distributions $ \Pi $. The invariants and equivariants of this category define many natural concepts and laws of mathematical statistics (see [5]). For example, an invariant Riemannian metric, unique up to a factor, exists on the objects of this category; it is defined by the Fisher information matrix. The morphisms of the category generate equivalence and order relations for parametrized families of probability distributions and for statistical decision problems, which permits one to give a natural definition of a sufficient statistic. The Kullback non-symmetric information deviation $ I( Q: P) $, which characterizes the dissimilarity of the probability distributions $ Q $ and $ P $ (see Information distance), is a monotone invariant in the category:

$$ I( Q _ {1} : P _ {1} ) \geq I( Q _ {2} : P _ {2} ) $$

if $ ( Q _ {1} , P _ {1} ) \geq ( Q _ {2} , P _ {2} ) $, i.e. if $ Q _ {2} = Q _ {1} \Pi $ and $ P _ {2} = P _ {1} \Pi $ for a certain $ \Pi $. If in the problem of statistical estimation from a sample of fixed size $ N $ one has to estimate the actual marginal probability distribution $ P $ of the results of observations, which belongs a priori to a smooth family $ {\mathcal P} $, then, for the choice of $ 2I( Q: P) $ as an invariant loss function for the decision $ Q $, the minimax risk proves to be

$$ \mathfrak R ^ \star = N ^ {-1} \mathop{\rm dim} {\mathcal P} + o( N ^ {-1} ) . $$

The logic of quantum events is not Aristotelean; random phenomena of microphysics are therefore not a subject of classical probability theory. The formalism designed to describe them admits the existence of non-commuting random variables and contains the classical theory as a degenerate commutative scheme. Under the corresponding interpretation, many problems of the theory of quantum-mechanical measurements become non-commutative analogues of problems of statistical decision theory (see [6]).

References

[1] A. Wald, "Sequential analysis" , Wiley (1947)
[2] A. Wald, "Statistical decision functions" , Wiley (1950)
[3] J. von Neumann, O. Morgenstern, "The theory of games and economic behavior" , Princeton Univ. Press (1944)
[4] E.L. Lehmann, "Testing statistical hypotheses" , Wiley (1986)
[5] N.N. Chentsov, "Statistical decision rules and optimal inference" , Amer. Math. Soc. (1982) (Translated from Russian)
[6] A.S. Kholevo, "Probabilistic and statistical aspects of quantum theory" , North-Holland (1982) (Translated from Russian)

Comments

References

[a1] J.O. Berger, "Statistical decision theory and Bayesian analysis" , Springer (1985)
How to Cite This Entry:
Statistical decision theory. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Statistical_decision_theory&oldid=15126
This article was adapted from an original article by N.N. Chentsov (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article