One of the fundamental parts of mathematical statistics, devoted to the estimation of various characteristics of a distribution from random observations drawn from it.

===Example 1.===
Let $ X_1, \dots, X_n $ be independent random variables (or observations) with a common unknown distribution $ {\mathcal P} $ on the real line. The empirical (sample) distribution $ {\mathcal P}_n^\star $, which ascribes the weight $ 1/n $ to each of the observed points $ X_i $, is a [[Statistical estimator|statistical estimator]] for $ {\mathcal P} $. The empirical moments

$$
a_\nu = \int\limits x^\nu \, d {\mathcal P}_n^\star = \frac{1}{n} \sum_{i=1}^{n} X_i^\nu
$$
  
serve as estimators for the moments $ \alpha_\nu = \int x^\nu \, d {\mathcal P} $. In particular,
  
$$
\overline{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
$$
  
 
is an estimator for the mean, and
  
$$
s^2 = \frac{1}{n} \sum_{i=1}^{n} ( X_i - \overline{X} )^2
$$
  
 
is an estimator for the variance.
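A minimal numerical sketch of these empirical estimators (Python with NumPy; the simulated sample, its size and the chosen moments are purely illustrative assumptions, not part of the article):

<pre>
# Empirical estimators of example 1: sample moments, mean and variance.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=1000)   # illustrative sample X_1, ..., X_n

a = {nu: np.mean(x**nu) for nu in (1, 2, 3)}    # empirical moments a_nu = (1/n) sum_i X_i^nu
x_bar = np.mean(x)                              # estimator of the mean (equals a_1)
s2 = np.mean((x - x_bar)**2)                    # estimator of the variance (divisor n)

print(a[1], x_bar, s2)
</pre>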
  
 
==Basic concepts.==
In the general theory of estimation, an observation $ X $ is a [[Random element|random element]] with values in a measurable space $ ( \mathfrak X , \mathfrak A ) $, whose unknown distribution belongs to a given family of distributions $ P $. The family of distributions can always be parametrized and written in the form $ \{ {\mathcal P}_\theta : \theta \in \Theta \} $. Here the form of the dependence on the parameter and the set $ \Theta $ are assumed to be known. The problem of estimating, from an observation $ X $, an unknown parameter $ \theta $, or the value $ g( \theta ) $ of a function $ g $ at the point $ \theta $, consists in constructing a function $ \theta^\star ( X) $ of the observations which gives a sufficiently good approximation to $ \theta $ (respectively, to $ g( \theta ) $).
  
A comparison of estimators is carried out in the following way. Let a non-negative loss function $ w( y_1 ; y_2 ) $ be defined on $ \Theta \times \Theta $ (respectively, on $ g( \Theta ) \times g( \Theta ) $), the sense of this being that the use of the value $ \theta^\star $ when the actual value is $ \theta $ leads to losses $ w( \theta^\star ; \theta ) $. The mean loss, i.e. the risk function $ R_w ( \theta^\star ; \theta ) = {\mathsf E}_\theta w( \theta^\star ; \theta ) $, is taken as a measure of the quality of the statistic $ \theta^\star $ as an estimator of $ \theta $ given the loss function $ w $. A partial order relation is thereby introduced on the set of estimators: An estimator $ T_1 $ is preferable to an estimator $ T_2 $ if $ R_w ( T_1 ; \theta ) \leq R_w ( T_2 ; \theta ) $ for all $ \theta \in \Theta $. In particular, an estimator $ T $ of the parameter $ \theta $ is said to be inadmissible (in relation to the loss function $ w $) if an estimator $ T^\prime $ exists such that $ R_w ( T^\prime ; \theta ) \leq R_w ( T ; \theta ) $ for all $ \theta \in \Theta $, with strict inequality for some $ \theta $. In this method of comparing the quality of estimators many estimators prove to be incomparable, and, moreover, the choice of a loss function is to a large extent arbitrary.
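For example, for the quadratic loss function $ w( \theta^\star ; \theta ) = | \theta^\star - \theta |^2 $ the risk is the mean-square error, and it admits the standard decomposition (recalled here for orientation)

$$
R_w ( \theta^\star ; \theta ) = {\mathsf E}_\theta | \theta^\star - {\mathsf E}_\theta \theta^\star |^2 + | {\mathsf E}_\theta \theta^\star - \theta |^2 ,
$$

the sum of the variance of the estimator and the square of its bias.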
  
 
It is sometimes possible to find estimators that are optimal within a certain narrower class of estimators. Unbiased estimators form an important class. If the initial experiment is invariant relative to a certain group of transformations, it is natural to restrict to estimators that do not disrupt the symmetry of the problem (see [[Equivariant estimator|Equivariant estimator]]).
  
Estimators can be compared by their behaviour at "worst" points: An estimator $ T_0 $ of $ \theta $ is called a [[Minimax estimator|minimax estimator]] relative to the loss function $ w $ if

$$
\sup_\theta R_w ( T_0 ; \theta ) = \inf_T \sup_\theta R_w ( T ; \theta ) ,
$$

where the lower bound is taken over all estimators $ T = T( X) $.
  
In the Bayesian formulation of the problem (cf. [[Bayesian approach|Bayesian approach]]), the unknown parameter is regarded as a random variable with a given [[A priori distribution|a priori distribution]] $ Q $ on $ \Theta $. In this case the best estimator $ T_0 $ relative to the loss function $ w $ is defined by the relation

$$
r_w ( T_0 ) = {\mathsf E} \, w( T_0 ; \theta ) = \int\limits_\Theta {\mathsf E}_\theta w( T_0 ; \theta ) \, Q( d \theta ) = \inf_T \int\limits_\Theta {\mathsf E}_\theta w( T ; \theta ) \, Q( d \theta ) ,
$$

and the lower bound is taken over all estimators $ T = T( X) $.
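For instance, for the quadratic loss function the Bayesian estimator is the a posteriori mean (a standard fact, stated here for orientation):

$$
T_0 ( X) = {\mathsf E} ( \theta \mid X ) ,
$$

the conditional expectation of $ \theta $ given $ X $ under the joint distribution determined by $ Q $ and $ \{ {\mathcal P}_\theta \} $.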
  
There is a distinction between parametric estimation problems, in which $ \Theta $ is a subset of a finite-dimensional Euclidean space, and non-parametric problems. In parametric problems one usually considers loss functions of the form $ l( | \theta_1 - \theta_2 | ) $, where $ l $ is a non-negative, non-decreasing function on $ \mathbf R^{+} $. The most frequently used quadratic loss function $ | \theta_1 - \theta_2 |^2 $ plays an important part.
  
If $ T = T( X) $ is a [[Sufficient statistic|sufficient statistic]] for the family $ \{ {\mathcal P}_\theta : \theta \in \Theta \} $, then it is often possible to restrict attention to estimators of the form $ \theta^\star = h( T) $. Thus, if $ \Theta \subset \mathbf R^{k} $ and $ w( \theta_1 ; \theta_2 ) = l( | \theta_1 - \theta_2 | ) $, where $ l $ is a convex function, then for any estimator $ \theta^\star $ of $ \theta $ there exists an estimator of the form $ h( T) $ that is not worse than $ \theta^\star $; if $ \theta^\star $ is unbiased, $ h( T) $ can also be chosen unbiased (Blackwell's theorem). If $ T $ is a complete sufficient statistic for the family $ \{ {\mathcal P}_\theta \} $ and $ \theta^\star $ is an unbiased estimator for $ g( \theta ) $, then an unbiased estimator of the form $ h( T) $ with minimum variance in the class of unbiased estimators exists (the Lehmann–Scheffé theorem).
  
As a rule, it is assumed in parametric estimation problems that the elements of the family $ \{ {\mathcal P}_\theta : \theta \in \Theta \} $ are absolutely continuous with respect to a certain $ \sigma $-finite measure $ \mu $ and that the density $ d {\mathcal P}_\theta / d \mu = p( x ; \theta ) $ exists. If $ p( x ; \theta ) $ is a sufficiently smooth function of $ \theta $ and the Fisher information matrix
  
$$
I( \theta ) = \int\limits_{\mathfrak X} \frac{d p }{d \theta } ( x , \theta ) \left( \frac{d p }{d \theta } ( x , \theta ) \right)^{T} \frac{\mu ( dx) }{p( x ; \theta ) }
$$

exists, the estimation problem is said to be regular. For regular problems the accuracy of estimation is bounded from below by the Cramér–Rao inequality: If $ \Theta \subset \mathbf R^{1} $, then for any estimator $ T $,

$$
{\mathsf E}_\theta | T - \theta |^2 \geq \frac{( 1 + ( db / d \theta ) ( \theta ))^{2} }{I( \theta ) } + b^2 ( \theta ) , \qquad b( \theta ) = {\mathsf E}_\theta T - \theta .
$$
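For an unbiased estimator, $ b( \theta ) \equiv 0 $ and the bound reduces to $ {\mathsf E}_\theta | T - \theta |^2 \geq 1 / I( \theta ) $. As a simple illustration (added here for concreteness): if $ X $ is a single observation from a normal distribution with mean $ \theta $ and known variance $ \sigma^2 $, then $ I( \theta ) = \sigma^{-2} $ and the unbiased estimator $ T = X $ attains the bound, since $ {\mathsf E}_\theta | X - \theta |^2 = \sigma^2 $.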
  
 
===Examples of estimation problems 2.===
The most widespread formulation is that in which a sample of size $ n $ is observed: $ X_1, \dots, X_n $ are independent identically-distributed variables taking values in a measurable space $ ( \mathfrak X , \mathfrak A ) $ with common distribution density $ f( x , \theta ) $ relative to a measure $ \nu $, and $ \theta \in \Theta $. In regular problems, if $ I( \theta ) $ is the Fisher information of one observation, then the Fisher information of the whole sample is $ I_n ( \theta ) = n I( \theta ) $. The Cramér–Rao inequality takes the form

$$
{\mathsf E}_\theta | T - \theta |^2 \geq \frac{( 1 + ( db / d \theta ) ( \theta ))^{2} }{n I( \theta ) } + b^2 ( \theta ) , \qquad T = T( X_1, \dots, X_n ) .
$$
  
$ 2.1 $. Let $ X_j $ be normal random variables with distribution density

$$
\frac{1}{\sigma \sqrt{2 \pi}} \exp \left\{ - \frac{( x - a )^{2} }{2 \sigma^{2} } \right\} .
$$

Let the unknown parameter be $ \theta = ( a , \sigma^2 ) $; $ \overline{X} $ and $ s^2 $ can serve as estimators for $ a $ and $ \sigma^2 $, and $ ( \overline{X} , s^2 ) $ is then a sufficient statistic. The estimator $ \overline{X} $ is unbiased, while $ s^2 $ is biased. If $ \sigma^2 $ is known, $ \overline{X} $ is an unbiased estimator of minimal variance, and is a minimax estimator relative to the quadratic loss function.
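A short calculation (standard, added here to make the bias and optimality statements concrete): $ {\mathsf E}_\theta s^2 = \frac{n-1}{n} \sigma^2 $, so $ \frac{n}{n-1} s^2 $ is an unbiased estimator of $ \sigma^2 $; being a function of the complete sufficient statistic $ ( \overline{X} , s^2 ) $, it has minimum variance among unbiased estimators of $ \sigma^2 $ by the Lehmann–Scheffé theorem. For known $ \sigma^2 $, $ {\mathsf E}_\theta | \overline{X} - a |^2 = \sigma^2 / n = ( n I( a))^{-1} $ with $ I( a) = \sigma^{-2} $, so $ \overline{X} $ attains the Cramér–Rao bound.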
$ 2.2 $. Let $ X_j $ be normal random variables in $ \mathbf R^{k} $ with density

$$
\frac{1}{( 2 \pi )^{k/2} } \exp \left\{ - \frac{| x - \theta |^{2} }{2} \right\} , \qquad \theta \in \mathbf R^{k} .
$$

The statistic $ \overline{X} $ is an unbiased estimator of $ \theta $; if $ k \leq 2 $, it is admissible relative to the quadratic loss function, while if $ k > 2 $ it is inadmissible.
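The inadmissibility for $ k > 2 $ is exhibited, for example, by the classical James–Stein estimator (stated here for illustration)

$$
\left( 1 - \frac{k - 2 }{n | \overline{X} |^{2} } \right) \overline{X} ,
$$

whose risk under the quadratic loss function is smaller than that of $ \overline{X} $ for every $ \theta $.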
  
$ 2.3 $. Let $ X_j $ be random variables in $ \mathbf R^{1} $ with unknown distribution density $ f $ belonging to a given family $ F $ of densities. For a sufficiently broad class $ F $, this is a non-parametric problem. The problem of estimating $ f $ at a point $ x_0 $ is the problem of estimating the functional $ g( f) = f( x_0 ) $.
  
 
===Example 3.===
The linear regression model. The variables

$$
X_i = \sum_{\alpha = 1 }^{ p } a_{\alpha i} \theta_\alpha + \xi_i
$$

are observed; the $ \xi_i $ are random disturbances, $ i = 1, \dots, n $; the matrix $ \| a_{\alpha i} \| $ is known; and the parameter $ ( \theta_1, \dots, \theta_p ) $ must be estimated.
  
 
===Example 4.===
A segment of a stationary [[Gaussian process|Gaussian process]] $ x( t) $, $ 0 \leq t \leq T $, with rational spectral density $ \left| \sum_{j=0}^{m} a_j \lambda^j \right|^2 \cdot \left| \sum_{j=0}^{n} b_j \lambda^j \right|^{-2} $ is observed; the unknown parameters $ \{ a_j \} $, $ \{ b_j \} $ are to be estimated.
  
 
==Methods of producing estimators.==
The most widely used [[Maximum-likelihood method|maximum-likelihood method]] recommends taking the estimator $ \widehat\theta ( X) $ defined as the maximum point of the random function $ p( X ; \theta ) $, the so-called maximum-likelihood estimator. If $ \Theta \subset \mathbf R^{k} $, the maximum-likelihood estimators are to be found among the roots of the likelihood equation

$$
\frac{d}{d \theta } \ln p( X ; \theta ) = 0 .
$$
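A minimal numerical sketch of the maximum-likelihood method for the normal model of example 2.1 (Python with NumPy/SciPy; the simulated data, the starting point and the choice of optimizer are illustrative assumptions):

<pre>
# Maximum-likelihood estimation of (a, sigma) in the normal model of example 2.1,
# by numerically minimizing the negative log-likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(loc=1.5, scale=2.0, size=500)    # illustrative sample

def neg_log_likelihood(params):
    a, log_sigma = params                       # sigma parametrized by its logarithm to keep it positive
    return -np.sum(norm.logpdf(x, loc=a, scale=np.exp(log_sigma)))

res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
a_hat, sigma2_hat = res.x[0], np.exp(res.x[1])**2
# For this model the likelihood equations have the closed-form roots a_hat = X_bar, sigma2_hat = s^2:
print(a_hat, np.mean(x), sigma2_hat, np.var(x))
</pre>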
  
 
In example 3, the method of least squares (cf. [[Least squares, method of|Least squares, method of]]) recommends that the minimum point of the function

$$
m( \theta ) = \sum_{i=1}^{ n } \left( X_i - \sum_\alpha a_{\alpha i} \theta_\alpha \right)^{2}
$$
  
 
be used as the estimator.
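A minimal sketch of the least-squares estimator for the linear regression model of example 3 (Python with NumPy; the coefficient matrix, the noise level and the parameter value used to simulate data are illustrative assumptions):

<pre>
# Least squares for example 3: X_i = sum_alpha a_{alpha i} theta_alpha + xi_i.
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
A = rng.normal(size=(p, n))                      # known coefficients a_{alpha i}
theta = np.array([1.0, -2.0, 0.5])               # parameter used only to simulate observations
X = A.T @ theta + rng.normal(scale=0.1, size=n)  # observations with random disturbances xi_i

# The minimizer of m(theta) = sum_i (X_i - sum_alpha a_{alpha i} theta_alpha)^2:
theta_hat, *_ = np.linalg.lstsq(A.T, X, rcond=None)
print(theta_hat)
</pre>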
  
Another method is to take a Bayesian estimator $ T $ relative to a loss function $ w $ and an a priori distribution $ Q $, although the initial formulation is not Bayesian. For example, if $ \Theta = \mathbf R^{k} $, it is possible to estimate $ \theta $ by means of

$$
\frac{\int\limits_{- \infty }^{\infty} \theta \, p ( X ; \theta ) \, d \theta }{\int\limits_{- \infty }^{\infty} p( X ; \theta ) \, d \theta } .
$$
  
 
This is a Bayesian estimator relative to the quadratic loss function and a uniform a priori distribution.
  
The method of moments (cf. [[Moments, method of (in probability theory)|Moments, method of (in probability theory)]]) consists of the following. Let $ \Theta \subset \mathbf R^{k} $, and suppose that there are $ k $ "good" estimators $ a_1 ( X), \dots, a_k ( X) $ for the moments $ \alpha_1 ( \theta ), \dots, \alpha_k ( \theta ) $. Estimators by the method of moments are solutions of the system $ \alpha_i ( \theta ) = a_i $. Empirical moments are frequently chosen as the $ a_i $ (see example 1).
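For example (a standard calculation), in the normal model of example 2.1 one has $ \alpha_1 ( \theta ) = a $ and $ \alpha_2 ( \theta ) = \sigma^2 + a^2 $; equating these to the empirical moments $ a_1 = \overline{X} $ and $ a_2 = n^{-1} \sum X_i^2 $ gives the method-of-moments estimators $ \widehat a = \overline{X} $ and $ \widehat \sigma^2 = a_2 - a_1^2 = s^2 $.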
  
If the sample $ X_1, \dots, X_n $ is observed, then (see example 1) as an estimator for $ g( {\mathcal P} ) $ it is possible to choose $ g( {\mathcal P}_n^\star ) $. If the function $ g( {\mathcal P}_n^\star ) $ is not defined (for example, $ g( {\mathcal P} ) = ( d {\mathcal P} / d \lambda )( x) $, where $ \lambda $ is Lebesgue measure), appropriate modifications $ g_n ( {\mathcal P}_n^\star ) $ are chosen. For example, for an estimator of the density a [[Histogram|histogram]] or an estimator of the form
  
$$
\int\limits \phi_n ( x - y) \, d {\mathcal P}_n^\star ( y)
$$
  
 
is used.
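A minimal sketch of an estimator of this form with a Gaussian kernel $ \phi_n ( u) = h_n^{-1} ( 2 \pi )^{-1/2} \exp \{ - u^2 / 2 h_n^2 \} $ (Python with NumPy; the kernel and the bandwidth $ h_n $ are illustrative choices):

<pre>
# Density estimator integral phi_n(x - y) dP_n*(y) = (1/n) sum_i phi_n(x - X_i)
# with a Gaussian kernel of bandwidth h.
import numpy as np

rng = np.random.default_rng(3)
sample = rng.normal(size=400)                   # illustrative sample X_1, ..., X_n
h = 0.3                                         # bandwidth h_n (illustrative choice)

def density_estimate(grid, sample, h):
    u = (grid[:, None] - sample[None, :]) / h   # (x - X_i) / h for each grid point and observation
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / h              # (1/n) sum_i phi_n(x - X_i)

grid = np.linspace(-4.0, 4.0, 201)
print(density_estimate(grid, sample, h)[:5])
</pre>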
  
 
==Asymptotic behaviour of estimators.==
For the sake of being explicit, a problem such as Example 2 is examined, in which $ \Theta \subset \mathbf R^{k} $. It is to be expected that, as $ n \rightarrow \infty $, "good" estimators will come arbitrarily close to the characteristic being estimated. A sequence of estimators $ \theta_n^\star ( X_1, \dots, X_n ) $ is called a consistent sequence of estimators of $ \theta $ if $ \theta_n^\star \rightarrow \theta $ in $ {\mathcal P}_\theta $-probability for all $ \theta $. The above methods of producing estimators lead, under broad hypotheses, to consistent estimators (cf. [[Consistent estimator|Consistent estimator]]). The estimators in example 1 are consistent. For regular estimation problems, maximum-likelihood estimators and Bayesian estimators are asymptotically normal with mean $ \theta $ and covariance matrix $ ( n I( \theta ))^{-1} $. Under such conditions these estimators are asymptotically locally minimax relative to a broad class of loss functions, and they can be considered as asymptotically optimal (see [[Asymptotically-efficient estimator|Asymptotically-efficient estimator]]).
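For example, in the model of example 2.1 with $ \sigma^2 $ known, the maximum-likelihood estimator of $ a $ is $ \overline{X} $, and $ \sqrt n ( \overline{X} - a ) $ is normal with mean $ 0 $ and variance $ \sigma^2 = ( I( a))^{-1} $ exactly, not only asymptotically, in agreement with the general statement above.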
  
 
==Interval estimation.==
A random subset $ E = E( X) $ of the set $ \Theta $ is called a confidence region for the parameter $ \theta $ with confidence coefficient $ \gamma $ if $ {\mathcal P}_\theta \{ E \ni \theta \} = \gamma $ (or $ \geq \gamma $). Many confidence regions with a given $ \gamma $ usually exist, and the problem is to choose the one possessing certain optimal properties (for example, the interval of minimum length, if $ \Theta \subset \mathbf R^{1} $). Under the conditions of example 2.1, let $ \sigma = 1 $. Then the interval
  
$$
\left[ \overline{X} - \frac{\lambda}{\sqrt n } , \; \overline{X} + \frac{\lambda}{\sqrt n } \right] , \qquad 1 - \gamma = \sqrt{\frac{2}{\pi}} \int\limits_\lambda^\infty \exp \left\{ - \frac{u^{2} }{2} \right\} \, du ,
$$

is a confidence interval with confidence coefficient $ \gamma $ (see [[Interval estimator|Interval estimator]]).
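A minimal numerical sketch of this interval (Python with NumPy/SciPy; the simulated sample and the confidence coefficient are illustrative assumptions):

<pre>
# Confidence interval [X_bar - lambda/sqrt(n), X_bar + lambda/sqrt(n)] for the mean
# of a normal sample with sigma = 1, as in the example above.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
x = rng.normal(loc=0.7, scale=1.0, size=200)    # illustrative sample, sigma = 1
gamma = 0.95                                    # confidence coefficient

lam = norm.ppf((1.0 + gamma) / 2.0)             # solves 1 - gamma = 2 (1 - Phi(lambda)); about 1.96
x_bar = np.mean(x)
half_width = lam / np.sqrt(x.size)
print((x_bar - half_width, x_bar + half_width))
</pre>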
  
 
====References====
<table><TR><TD valign="top">[1]</TD> <TD valign="top">  R.A. Fisher,  "On the mathematical foundations of theoretical statistics"  ''Phil. Trans. Roy. Soc. London Ser. A'' , '''222'''  (1922)  pp. 309–368</TD></TR><TR><TD valign="top">[2]</TD> <TD valign="top">  A.N. Kolmogorov,  "Sur l'estimation statistique des paramètres de la loi de Gauss"  ''Izv. Akad. Nauk SSSR Ser. Mat.'' , '''6''' :  1  (1942)  pp. 3–32</TD></TR><TR><TD valign="top">[3]</TD> <TD valign="top">  H. Cramér,  "Mathematical methods of statistics" , Princeton Univ. Press  (1946)</TD></TR><TR><TD valign="top">[4]</TD> <TD valign="top">  M.G. Kendall,  A. Stuart,  "The advanced theory of statistics" , '''2. Inference and relationship''' , Griffin  (1979)</TD></TR><TR><TD valign="top">[5]</TD> <TD valign="top">  I.A. Ibragimov,  R.Z. [R.Z. Khas'minskii] Has'minskii,  "Statistical estimation: asymptotic theory" , Springer  (1981)  (Translated from Russian)</TD></TR><TR><TD valign="top">[6]</TD> <TD valign="top">  N.N. Chentsov,  "Statistical decision laws and optimal inference" , Amer. Math. Soc.  (1982)  (Translated from Russian)</TD></TR><TR><TD valign="top">[7]</TD> <TD valign="top">  S. Zacks,  "The theory of statistical inference" , Wiley  (1975)</TD></TR><TR><TD valign="top">[8]</TD> <TD valign="top">  U. Grenander,  "Abstract inference" , Wiley  (1981)</TD></TR></table>
 
 
  
 
====Comments====
 
  
 
====References====
<table><TR><TD valign="top">[a1]</TD> <TD valign="top">  E.L. Lehmann,  "Theory of point estimation" , Wiley  (1986)</TD></TR></table>

Revision as of 08:23, 6 June 2020


One of the fundamental parts of mathematical statistics, dedicated to the estimation using random observations of various characteristics of their distribution.

Example 1.

Let $ X _ {1} \dots X _ {n} $ be independent random variables (or observations) with a common unknown distribution $ {\mathcal P} $ on the straight line. The empirical (sample) distribution $ {\mathcal P} _ {n} ^ \star $ which ascribes the weight $ 1/n $ to every random point $ X _ {n} $ is a statistical estimator for $ {\mathcal P} $. The empirical moments

$$ a _ \nu = \int\limits x ^ \nu d {\mathcal P} _ {n} ^ \star = \ \frac{1}{n} \sum _ { i= } 1 ^ { n } X _ {i} $$

serve as estimators for the moments $ \alpha _ \nu = \int x ^ \nu d {\mathcal P} $. In particular,

$$ \overline{X}\; = \frac{1}{n} \sum _ { i= } 1 ^ { n } X _ {i} $$

is an estimator for the mean, and

$$ s ^ {2} = \frac{1}{n} \sum _ { i= } 1 ^ { n } ( X _ {i} - \overline{X}\; ) ^ {2} $$

is an estimator for the variance.

Basic concepts.

In the general theory of estimation, an observation of $ X $ is a random element with values in a measurable space $ ( \mathfrak X , \mathfrak A) $, whose unknown distribution belongs to a given family of distributions $ P $. The family of distributions can always be parametrized and written in the form $ \{ { {\mathcal P} _ \theta } : {\theta \in \Theta } \} $. Here the form of dependence on the parameter and the set $ \Theta $ are assumed to be known. The problem of estimation using an observation $ X $ of an unknown parameter $ \theta $ or of the value $ g( \theta ) $ of a function $ g $ at the point $ \theta $ consists of constructing a function $ \theta ^ \star ( X) $ from the observations made, which gives a sufficiently good approximation of $ \theta $ $ ( g( \theta )) $.

A comparison of estimators is carried out in the following way. Let a non-negative loss function $ w( y _ {1} ; y _ {2} ) $ be defined on $ \Theta \times \Theta $ $ ( g( \Theta ) \times g( \Theta )) $, the sense of this being that the use of $ \theta ^ \star $ for the actual value of $ \theta $ leads to losses $ w( \theta ^ \star ; \theta ) $. The mean losses and the risk function $ R _ {w} ( \theta ^ \star ; \theta ) = {\mathsf E} _ \theta w( \theta ^ \star ; \theta ) $ are taken as a measure of the quality of the statistic $ \theta ^ \star $ as an estimator of $ \theta $ given the loss function $ w $. A partial order relation is thereby introduced on the set of estimators: An estimator $ T _ {1} $ is preferable to an estimator $ T _ {2} $ if $ R _ {w} ( T _ {1} ; \theta ) \leq R _ {w} ( T _ {2} ; \theta ) $. In particular, an estimator $ T $ of the parameter $ \theta $ is said to be inadmissible (in relation to the loss function $ w $) if an estimator $ T ^ \prime $ exists such that $ R _ {w} ( T ^ \prime ; \theta ) \leq R _ {w} ( T; \theta ) $ for all $ \theta \in \Theta $, and for some $ \theta $ strict inequality occurs. In this method of comparing the quality of estimators, many estimators prove to be incomparable, and, moreover, the choice of a loss function is to a large extent arbitrary.

It is sometimes possible to find estimators that are optimal within a certain narrower class of estimators. Unbiased estimators form an important class. If the initial experiment is invariant relative to a certain group of transformations, it is natural to restrict to estimators that do not disrupt the symmetry of the problem (see Equivariant estimator).

Estimators can be compared by their behaviour at "worst" points: An estimator $ T _ {0} $ of $ \theta $ is called a minimax estimator relative to the loss function $ w $ if

$$ \sup _ \theta R _ {w} ( T _ {0} ; \theta ) = \ \inf _ { T } \sup _ \theta R _ {w} ( T; \theta ) , $$

where the lower bound is taken over all estimators $ T = T( X) $.

In the Bayesian formulation of the problem (cf. Bayesian approach), the unknown parameter is considered to represent values of the random variable with a priori distribution $ Q $ on $ \Theta $. In this case, the best estimator $ T _ {0} $ relative to the loss function $ w $ is defined by the relation

$$ r _ {w} ( T _ {0} ) = \ {\mathsf E} _ {w} ( T _ {0} ; \theta ) = \ \int\limits _ \Theta {\mathsf E} _ \theta w( T _ {0} ; \theta ) Q( d \theta ) = $$

$$ = \ \inf _ { T } \int\limits _ \Theta {\mathsf E} _ \theta w( T; \theta ) Q( d \theta ) , $$

and the lower bound is taken over all estimators $ T = T( X) $.

There is a distinction between parametric estimation problems, in which $ \Theta $ is a subset of a finite-dimensional Euclidean space, and non-parametric problems. In parametric problems one usually considers loss functions in the form $ l( | \theta _ {1} - \theta _ {2} | ) $, where $ l $ is a non-negative, non-decreasing function on $ \mathbf R ^ {+} $. The most frequently used quadratic loss function $ | \theta _ {1} - \theta _ {2} | ^ {2} $ plays an important part.

If $ T = T( X) $ is a sufficient statistic for the family $ \{ { {\mathcal P} _ \theta } : {\theta \in \Theta } \} $, then it is often possible to restrict attention to estimators of the form $ \theta ^ \star = h( T) $. Thus, if $ \Theta \subset \mathbf R ^ {k} $, $ w( \theta _ {1} ; \theta _ {2} ) = l( | \theta _ {1} - \theta _ {2} | ) $, where $ l $ is a convex function, and $ \theta ^ \star $ is any estimator of $ \theta $, then an estimator of the form $ h( T) $ exists that is not worse than $ \theta ^ \star $; if $ \theta ^ \star $ is unbiased, $ h( T) $ can also be chosen unbiased (Blackwell's theorem, also known as the Rao–Blackwell theorem). If $ T $ is a complete sufficient statistic for the family $ \{ {\mathcal P} _ \theta \} $ and $ \theta ^ \star $ is an unbiased estimator of $ g( \theta ) $, then an unbiased estimator of the form $ h( T) $ with minimum variance in the class of unbiased estimators exists (the Lehmann–Scheffé theorem).
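
The improvement obtained by passing from $ \theta ^ \star $ to $ h( T) = {\mathsf E} ( \theta ^ \star \mid T) $ can be checked numerically. The following sketch (an illustrative computation, not part of the original article; the parameter values are arbitrary) takes a Poisson sample with mean $ \lambda $ and the target $ g( \lambda ) = e ^ {- \lambda } $; the naive unbiased estimator $ \mathbf 1 \{ X _ {1} = 0 \} $ is conditioned on the sufficient statistic $ T = \sum X _ {i} $, which gives the estimator $ ( 1- 1/n) ^ {T} $, and the simulation compares the variances of the two unbiased estimators.

import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 1.5, 20, 50000          # illustrative parameters
X = rng.poisson(lam, size=(reps, n))

naive = (X[:, 0] == 0).astype(float)   # unbiased for exp(-lam), but ignores most of the data
T = X.sum(axis=1)                      # sufficient statistic for the Poisson family
rao_blackwell = (1 - 1 / n) ** T       # E[naive | T], still unbiased

print(naive.mean(), rao_blackwell.mean(), np.exp(-lam))   # both means are close to exp(-lam)
print(naive.var(), rao_blackwell.var())                   # the conditioned estimator has smaller variance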

As a rule, it is assumed in parametric estimation problems that the elements of the family $ \{ { {\mathcal P} _ \theta } : {\theta \in \Theta } \} $ are absolutely continuous with respect to a certain $ \sigma $-finite measure $ \mu $ and that the density $ d {\mathcal P} _ \theta /d \mu = p( x; \theta ) $ exists. If $ p( x; \theta ) $ is a sufficiently smooth function of $ \theta $ and the Fisher information matrix

$$ I( \theta ) = \ \int\limits _ { \mathfrak X } \frac{dp}{d \theta } ( x; \theta ) \left ( \frac{dp}{d \theta } ( x; \theta ) \right ) ^ {T} \frac{\mu ( dx) }{p( x; \theta ) } $$

exists, the estimation problem is said to be regular. For regular problems the mean-square error of estimation is bounded from below by the Cramér–Rao inequality: If $ \Theta \subset \mathbf R ^ {1} $, then for any estimator $ T $,

$$ {\mathsf E} _ \theta | T- \theta | ^ {2} \geq \ \frac{( 1+ ( db / {d \theta } ) ( \theta )) ^ {2} }{I( \theta ) } + b ^ {2} ( \theta ) ,\ \ b( \theta ) = {\mathsf E} _ \theta T- \theta . $$

Examples of estimation problems.

Example 2.

The most widespread formulation is that in which a sample of size $ n $ is observed: $ X _ {1} \dots X _ {n} $ are independent identically-distributed variables taking values in a measurable space $ ( \mathfrak X , \mathfrak A) $ with common distribution density $ f( x, \theta ) $ relative to a measure $ \nu $, and $ \theta \in \Theta $. In regular problems, if $ I( \theta ) $ is the Fisher information in one observation, then the Fisher information of the whole sample is $ I _ {n} ( \theta ) = nI( \theta ) $. The Cramér–Rao inequality takes the form

$$ {\mathsf E} _ \theta | T- \theta | ^ {2} \geq \ \frac{( 1+ ( db / {d \theta } )( \theta )) ^ {2} }{nI( \theta ) } + b ^ {2} ( \theta ),\ \ $$

$$ T = T( X _ {1} \dots X _ {n} ). $$
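
For instance, for a normal sample with known variance $ \sigma ^ {2} $ and unknown mean $ \theta $ one has $ I( \theta ) = 1/ \sigma ^ {2} $, and the unbiased estimator $ \overline{X} $ attains the bound: $ {\mathsf E} _ \theta | \overline{X} - \theta | ^ {2} = \sigma ^ {2} /n = ( nI( \theta )) ^ {-1} $. A brief simulation confirming this (an illustrative sketch, not part of the original article; the parameter values are arbitrary):

import numpy as np

rng = np.random.default_rng(2)
theta, sigma, n, reps = 0.7, 2.0, 25, 100000

X = rng.normal(theta, sigma, size=(reps, n))
mse_of_mean = np.mean((X.mean(axis=1) - theta) ** 2)   # E_theta |X_bar - theta|^2
cramer_rao_bound = sigma ** 2 / n                       # 1 / (n I(theta)) with I(theta) = 1/sigma^2

print(mse_of_mean, cramer_rao_bound)   # the two numbers agree up to Monte Carlo error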

$ 2.1 $. Let $ X _ {j} $ be normal random variables with distribution density

$$ \frac{1}{\sigma \sqrt {2 \pi } } \mathop{\rm exp} \left \{ - \frac{( x- a) ^ {2} }{2 \sigma ^ {2} } \right \} . $$

Let the unknown parameter be $ \theta = ( a, \sigma ^ {2} ) $; $ \overline{X} $ and $ s ^ {2} $ can serve as estimators of $ a $ and $ \sigma ^ {2} $, and $ ( \overline{X} , s ^ {2} ) $ is then a sufficient statistic. The estimator $ \overline{X} $ is unbiased, while $ s ^ {2} $ is biased. If $ \sigma ^ {2} $ is known, $ \overline{X} $ is an unbiased estimator of minimal variance, and is a minimax estimator relative to the quadratic loss function.
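
The bias of $ s ^ {2} $ can be seen in a short simulation (an illustrative sketch, not part of the original article; it is assumed here that $ s ^ {2} $ denotes the empirical second central moment $ n ^ {-1} \sum ( X _ {i} - \overline{X} ) ^ {2} $, in which case $ {\mathsf E} s ^ {2} = ( n- 1) \sigma ^ {2} /n $):

import numpy as np

rng = np.random.default_rng(3)
a, sigma2, n, reps = 1.0, 4.0, 10, 200000   # illustrative parameters

X = rng.normal(a, np.sqrt(sigma2), size=(reps, n))
x_bar = X.mean(axis=1)
s2 = ((X - x_bar[:, None]) ** 2).mean(axis=1)   # divisor n, i.e. the empirical second central moment

print(x_bar.mean(), a)                      # X_bar is unbiased for a
print(s2.mean(), (n - 1) / n * sigma2)      # s^2 is biased: its mean is (n-1)/n * sigma^2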

$ 2.2 $. Let $ X _ {j} $ be normal random variables in $ \mathbf R ^ {k} $ with density

$$ \frac{1}{( 2 \pi ) ^ {k/2} } \mathop{\rm exp} \left \{ - \frac{| x- \theta | ^ {2} }{2} \right \} , \ \theta \in \mathbf R ^ {k} . $$

The statistic $ \overline{X} $ is an unbiased estimator of $ \theta $; if $ k \leq 2 $, it is admissible relative to the quadratic loss function, while if $ k > 2 $ it is inadmissible (it is dominated by shrinkage estimators of James–Stein type).
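
The inadmissibility for $ k > 2 $ can be illustrated numerically. In the sketch below (an illustrative computation, not part of the original article; the values of $ k $, $ n $ and $ \theta $ are arbitrary) the sample mean $ \overline{X} \sim N( \theta , n ^ {-1} I _ {k} ) $ is compared under quadratic loss with the James–Stein-type shrinkage estimator $ ( 1- ( k- 2)/( n | \overline{X} | ^ {2} )) \overline{X} $, which has smaller risk at every $ \theta $ when $ k > 2 $.

import numpy as np

rng = np.random.default_rng(4)
k, n, reps = 5, 10, 50000
theta = np.full(k, 0.5)

X = rng.normal(theta, 1.0, size=(reps, n, k))
x_bar = X.mean(axis=1)                                   # x_bar ~ N(theta, I_k / n)

shrink = 1 - (k - 2) / (n * (x_bar ** 2).sum(axis=1))    # James-Stein shrinkage factor
js = shrink[:, None] * x_bar

risk_mean = np.mean(((x_bar - theta) ** 2).sum(axis=1))  # quadratic risk of the sample mean, about k/n
risk_js = np.mean(((js - theta) ** 2).sum(axis=1))       # quadratic risk of the shrinkage estimator
print(risk_mean, risk_js)                                # risk_js < risk_mean for k > 2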

$ 2.3 $. Let $ X _ {j} $ be random variables in $ \mathbf R ^ {1} $ with unknown distribution density $ f $ belonging to a given family $ F $ of densities. For a sufficiently broad class $ F $, this is a non-parametric problem. The problem of estimating $ f( x _ {0} ) $ at a point $ x _ {0} $ is a problem of estimating the functional $ g( f) = f( x _ {0} ) $.

Example 3.

The linear regression model. The variables

$$ X _ {i} = \sum _ {\alpha = 1 } ^ { p } a _ {\alpha i } \theta _ \alpha + \xi _ {i} $$

are observed; the $ \xi _ {i} $ are random disturbances, $ i = 1 \dots n $; the matrix $ \| a _ {\alpha i } \| $ is known; and the parameter $ ( \theta _ {1} \dots \theta _ {p} ) $ must be estimated.

Example 4.

A segment of a stationary Gaussian process $ x( t) $, $ 0 \leq t \leq T $, with rational spectral density $ \left | \sum _ {j=0} ^ {m} a _ {j} \lambda ^ {j} \right | ^ {2} \cdot \left | \sum _ {j=0} ^ {n} b _ {j} \lambda ^ {j} \right | ^ {-2} $ is observed; the unknown parameters $ \{ a _ {j} \} $, $ \{ b _ {j} \} $ are to be estimated.

Methods of producing estimators.

The most widely used method, the maximum-likelihood method, recommends taking as the estimator the maximum point $ \widehat \theta ( X) $ of the random function $ p( X; \theta ) $ (the likelihood function); this is the so-called maximum-likelihood estimator. If $ \Theta \subset \mathbf R ^ {k} $, the maximum-likelihood estimators are to be found among the roots of the likelihood equation

$$ \frac{d}{d \theta } \mathop{\rm ln} p( X; \theta ) = 0. $$
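
As a concrete illustration (a sketch, not part of the original article), for an exponential sample with density $ \theta e ^ {- \theta x } $ the likelihood equation $ n/ \theta - \sum X _ {i} = 0 $ gives $ \widehat \theta = 1/ \overline{X} $; the code below checks this by maximizing the log-likelihood numerically (the use of scipy.optimize and the particular numbers are arbitrary choices).

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
theta_true, n = 2.0, 200
X = rng.exponential(scale=1 / theta_true, size=n)

def neg_log_likelihood(theta):
    # -ln p(X; theta) for the density theta * exp(-theta * x), theta > 0
    return -(n * np.log(theta) - theta * X.sum())

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
print(result.x, 1 / X.mean())   # the numerical maximizer agrees with the closed form 1 / X_bar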

In example 3, the method of least squares (cf. Least squares, method of) recommends that the minimum point of the function

$$ m( \theta ) = \sum _ {i=1} ^ { n } \left ( X _ {i} - \sum _ \alpha a _ {\alpha i } \theta _ \alpha \right ) ^ {2} $$

be used as the estimator.
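
In matrix form, writing $ D $ for the $ n \times p $ matrix with entries $ D _ {i \alpha } = a _ {\alpha i } $, the model of example 3 reads $ X = D \theta + \xi $ and the minimizer of $ m( \theta ) $ solves the normal equations $ D ^ {T} D \theta = D ^ {T} X $. A sketch on simulated data (not part of the original article; the design matrix, the true parameter and the noise level are arbitrary):

import numpy as np

rng = np.random.default_rng(6)
n, p = 100, 3
D = rng.normal(size=(n, p))                          # design matrix, D[i, alpha] = a_{alpha i}
theta_true = np.array([1.0, -2.0, 0.5])
X = D @ theta_true + rng.normal(scale=0.3, size=n)   # observations with random disturbances

theta_ls, *_ = np.linalg.lstsq(D, X, rcond=None)     # minimizes m(theta) = |X - D theta|^2
print(theta_ls)                                      # close to theta_true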

Another method is to take as the estimator a Bayesian estimator $ T $ relative to some loss function $ w $ and a priori distribution $ Q $, even though the initial formulation of the problem is not Bayesian. For example, if $ \Theta = \mathbf R ^ {k} $, it is possible to estimate $ \theta $ by means of

$$ \frac{\int\limits _ {- \infty } ^ \infty \theta p ( X; \theta ) d \theta }{\int\limits _ {- \infty } ^ \infty p( X; \theta ) d \theta } . $$

This is a Bayesian estimator relative to the quadratic loss function and a uniform a priori distribution.
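
The sketch below (an illustrative computation, not part of the original article; the sample and the integration grid are arbitrary) evaluates this ratio of integrals numerically for a normal sample with unit variance and unknown mean $ \theta $; in this case the estimator reduces exactly to $ \overline{X} $, which the computation confirms.

import numpy as np

rng = np.random.default_rng(7)
n = 15
X = rng.normal(1.0, 1.0, size=n)

theta_grid = np.linspace(-10, 10, 20001)
# p(X; theta) up to a constant factor, for a N(theta, 1) sample
log_lik = -0.5 * ((X[:, None] - theta_grid[None, :]) ** 2).sum(axis=0)
lik = np.exp(log_lik - log_lik.max())    # rescale to avoid underflow; constants cancel in the ratio

# the grid is uniform, so the grid spacing cancels in the ratio of the two integrals
posterior_mean = (theta_grid * lik).sum() / lik.sum()
print(posterior_mean, X.mean())          # the two values coincide up to quadrature error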

The method of moments (cf. Moments, method of (in probability theory)) consists of the following. Let $ \Theta \subset \mathbf R ^ {k} $, and suppose that there are $ k $ "good" estimators $ a _ {1} ( X) \dots a _ {k} ( X) $ for the quantities $ \alpha _ {1} ( \theta ) \dots \alpha _ {k} ( \theta ) $. Estimators by the method of moments are the solutions of the system of equations $ \alpha _ {i} ( \theta ) = a _ {i} $. Empirical moments are frequently chosen in the capacity of the $ a _ {i} $ (see example 1).
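
As an illustration (a sketch, not part of the original article; the distribution and parameter values are arbitrary), for a gamma density with shape $ k _ {0} $ and scale $ c $ the first two moments are $ \alpha _ {1} = k _ {0} c $ and $ \alpha _ {2} = k _ {0} ( k _ {0} + 1) c ^ {2} $; replacing them by the empirical moments and solving gives $ \widehat{k} _ {0} = a _ {1} ^ {2} /( a _ {2} - a _ {1} ^ {2} ) $ and $ \widehat{c} = ( a _ {2} - a _ {1} ^ {2} )/ a _ {1} $.

import numpy as np

rng = np.random.default_rng(8)
k0_true, c_true, n = 3.0, 2.0, 5000
X = rng.gamma(shape=k0_true, scale=c_true, size=n)

a1 = X.mean()            # empirical first moment
a2 = (X ** 2).mean()     # empirical second moment

k0_hat = a1 ** 2 / (a2 - a1 ** 2)     # solves k0 * c = a1 and k0 * (k0 + 1) * c^2 = a2
c_hat = (a2 - a1 ** 2) / a1
print(k0_hat, c_hat)                  # close to (3.0, 2.0)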

If the sample $ X _ {1} \dots X _ {n} $ is observed, then (see example 1) $ g( {\mathcal P} _ {n} ^ \star ) $ can be chosen as an estimator for $ g( {\mathcal P}) $. If $ g( {\mathcal P} _ {n} ^ \star ) $ is not defined (for example, when $ g( {\mathcal P}) = ( d {\mathcal P} /d \lambda )( x) $, where $ \lambda $ is Lebesgue measure), appropriate modifications $ g _ {n} ( {\mathcal P} _ {n} ^ \star ) $ are chosen. For example, as an estimator of the density a histogram or an estimator of the form

$$ \int\limits \phi _ {n} ( x- y) d {\mathcal P} _ {n} ^ \star ( y) $$

is used.
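
Taking $ \phi _ {n} ( u) = h _ {n} ^ {-1} K( u/h _ {n} ) $ for a fixed kernel $ K $ and a bandwidth $ h _ {n} \rightarrow 0 $ turns this integral into the kernel density estimator $ n ^ {-1} \sum _ {i} \phi _ {n} ( x- X _ {i} ) $. A sketch with a Gaussian kernel (the kernel, bandwidth and sample are illustrative choices, not part of the original article):

import numpy as np

rng = np.random.default_rng(9)
n = 500
X = rng.normal(0.0, 1.0, size=n)     # sample from an unknown density f

def kernel_density(x, sample, h):
    # integral of phi_n(x - y) dP_n*(y) = (1/n) * sum_i phi_n(x - X_i),
    # with phi_n(u) = (1/h) * K(u/h) and K the standard normal density
    u = (x - sample) / h
    return np.exp(-0.5 * u ** 2).sum() / (len(sample) * h * np.sqrt(2 * np.pi))

x0, h = 0.0, 0.3
print(kernel_density(x0, X, h), 1 / np.sqrt(2 * np.pi))   # estimate of f(0) vs the true value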

Asymptotic behaviour of estimators.

For definiteness, consider a problem of the type of example 2, in which $ \Theta \subset \mathbf R ^ {k} $. It is to be expected that, as $ n \rightarrow \infty $, "good" estimators will come arbitrarily close to the characteristic being estimated. A sequence of estimators $ \theta _ {n} ^ \star ( X _ {1} \dots X _ {n} ) $ is called a consistent sequence of estimators of $ \theta $ if $ \theta _ {n} ^ \star \rightarrow \theta $ in $ P _ \theta $-probability for all $ \theta $. The methods of producing estimators described above lead, under broad hypotheses, to consistent estimators (cf. Consistent estimator); in particular, the estimators in example 1 are consistent. For regular estimation problems, maximum-likelihood estimators and Bayesian estimators are asymptotically normal with mean $ \theta $ and correlation matrix $ ( nI( \theta )) ^ {-1} $. Under such conditions these estimators are asymptotically locally minimax relative to a broad class of loss functions, and they can be considered asymptotically optimal (see Asymptotically-efficient estimator).
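
The asymptotic normality can be observed in a simulation (an illustrative sketch, not part of the original article; the parameter values are arbitrary): for a Bernoulli sample with success probability $ \theta $ the maximum-likelihood estimator is $ \overline{X} $ and $ I( \theta ) = 1/( \theta ( 1- \theta )) $, so $ \sqrt {nI( \theta ) } ( \overline{X} - \theta ) $ should be approximately standard normal for large $ n $.

import numpy as np

rng = np.random.default_rng(10)
theta, n, reps = 0.3, 400, 20000

X = rng.binomial(1, theta, size=(reps, n))
mle = X.mean(axis=1)                                  # maximum-likelihood estimator of theta
z = np.sqrt(n / (theta * (1 - theta))) * (mle - theta)

print(z.mean(), z.std())                              # approximately 0 and 1
print(np.mean(np.abs(z) <= 1.96))                     # approximately 0.95, as for a standard normal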

Interval estimation.

A random subset $ E = E( X) $ of the set $ \Theta $ is called a confidence region for the parameter $ \theta $ with confidence coefficient $ \gamma $ if $ P _ \theta \{ E \ni \theta \} = \gamma $ (or $ \geq \gamma $) for all $ \theta $. Many confidence regions with a given $ \gamma $ usually exist, and the problem is to choose one possessing certain optimality properties (for example, the interval of minimum length, if $ \Theta \subset \mathbf R ^ {1} $). Under the conditions of example 2.1, let $ \sigma = 1 $. Then the interval

$$ \left [ \overline{X} - \frac \lambda {\sqrt n } , \overline{X} + \frac \lambda {\sqrt n } \right ] ,\ \ 1 - \gamma = \sqrt { 2/ \pi } \int\limits _ \lambda ^ \infty \mathop{\rm exp} \left \{ - \frac{u ^ {2} }{2} \right \} du , $$

is a confidence interval with confidence coefficient $ \gamma $ (see Interval estimator).
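
A sketch of this construction (an illustrative computation, not part of the original article; the sample size and $ \gamma $ are arbitrary): $ \lambda $ is the $ ( 1+ \gamma )/2 $-quantile of the standard normal distribution, and the simulated coverage of the interval is close to $ \gamma $.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(11)
a_true, n, gamma, reps = 0.0, 50, 0.95, 20000
lam = norm.ppf((1 + gamma) / 2)          # solves sqrt(2/pi) * integral_lam^inf exp(-u^2/2) du = 1 - gamma

X = rng.normal(a_true, 1.0, size=(reps, n))
x_bar = X.mean(axis=1)
covered = np.abs(x_bar - a_true) <= lam / np.sqrt(n)   # the interval [x_bar +/- lam/sqrt(n)] contains a_true
print(covered.mean(), gamma)                           # empirical coverage is close to gamma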

References

[1] R.A. Fisher, "On the mathematical foundations of theoretical statistics", Phil. Trans. Roy. Soc. London Ser. A, 222 (1922), pp. 309–368
[2] A.N. Kolmogorov, "Sur l'estimation statistique des paramètres de la loi de Gauss", Izv. Akad. Nauk SSSR Ser. Mat., 6 : 1 (1942), pp. 3–32
[3] H. Cramér, "Mathematical methods of statistics", Princeton Univ. Press (1946)
[4] M.G. Kendall, A. Stuart, "The advanced theory of statistics", 2. Inference and relationship, Griffin (1979)
[5] I.A. Ibragimov, R.Z. Khas'minskii, "Statistical estimation: asymptotic theory", Springer (1981) (Translated from Russian)
[6] N.N. Chentsov, "Statistical decision laws and optimal inference", Amer. Math. Soc. (1982) (Translated from Russian)
[7] S. Zacks, "The theory of statistical inference", Wiley (1975)
[8] U. Grenander, "Abstract inference", Wiley (1981)

Comments

References

[a1] E.L. Lehmann, "Theory of point estimation" , Wiley (1986)
This article was adapted from an original article by I.A. Ibragimov (originator), which appeared in Encyclopedia of Mathematics, ISBN 1402006098.