Asymptotic optimality

of estimating functions

Efficient estimation (cf. Efficient estimator) of parameters in stochastic models is most conveniently approached via properties of estimating functions, namely functions of the data and the parameter of interest, rather than estimators derived therefrom. For a detailed explanation see [a1], Chapt. 1.

Let $\{ X_t : 0 \leq t \leq T \}$ be a sample in discrete or continuous time from a stochastic system taking values in an $r$-dimensional Euclidean space. The distribution of $X_t$ depends on a parameter of interest $\theta$ taking values in an open subset of a $p$-dimensional Euclidean space. The possible probability measures (cf. Probability measure) for $X_t$ are $\{ \mathsf{P}_\theta \}$, a union of families of models.

Consider the class $\mathcal{G}$ of zero-mean, square-integrable estimating functions $G_T = G_T( \{ X_t : 0 \leq t \leq T \}, \theta )$, which are vectors of dimension $p$ and for which the matrices used below are non-singular.
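As a simple illustration (not drawn from [a1]): if the $X_t$ are independent observations with common mean $\mathsf{E}_\theta X_t = \theta$ (so $p = 1$), then

$$ G_T(\theta) = \sum_{t=1}^{T} (X_t - \theta) $$

is zero-mean and square-integrable, hence belongs to $\mathcal{G}$; solving the estimating equation $G_T(\theta) = 0$ gives the sample mean as the estimator.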

Optimality in both the fixed-sample and the asymptotic sense is considered. The former involves choosing an estimating function $G_T$ to maximize, in the partial order of non-negative definite matrices, the information criterion

$$ \mathcal{E}(G_T) = ( \mathsf{E} \nabla G_T )' ( \mathsf{E} G_T G_T' )^{-1} ( \mathsf{E} \nabla G_T ), $$

which is a natural generalization of the Fisher amount of information. Here $\nabla G$ is the $(p \times p)$-matrix of derivatives of the elements of $G$ with respect to those of $\theta$, and the prime denotes transposition. If $\mathcal{H} \subset \mathcal{G}$ is a prespecified family of estimating functions, then $G_T^{*} \in \mathcal{H}$ is said to be fixed-sample optimal in $\mathcal{H}$ if $\mathcal{E}(G_T^{*}) - \mathcal{E}(G_T)$ is non-negative definite for all $G_T \in \mathcal{H}$, $\theta$ and $\mathsf{P}_\theta$. Then $G_T^{*}$ is the element of $\mathcal{H}$ whose dispersion distance from the maximum-information estimating function in $\mathcal{G}$ (often the likelihood score) is least.
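In the illustration above, with the $X_t$ independent with mean $\theta$ and variance $\sigma^2$, one has $\mathsf{E} \nabla G_T = -T$ and $\mathsf{E} G_T G_T' = T \sigma^2$, so that

$$ \mathcal{E}(G_T) = \frac{(-T)^2}{T \sigma^2} = \frac{T}{\sigma^2}, $$

which coincides with the Fisher amount of information about $\theta$ in a sample of size $T$ from a normal population; in that case $G_T$ is proportional to the likelihood score.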

Asymptotic properties are conveniently studied by confining attention to the subset $\mathcal{M} \subset \mathcal{G}$ of estimating functions which are martingales (cf. Martingale). Here one lets $T$ range over the positive real numbers, and for $\{ G_T \} \in \mathcal{M}$ one writes $\{ \langle G \rangle_T \}$ for the quadratic characteristic, the predictable increasing process for which $\{ G_T G_T' - \langle G \rangle_T \}$ is a martingale. Also, write $\{ \overline{G}_T \}$ for the predictable process for which $\{ \nabla G_T - \overline{G}_T \}$ is a martingale. Then $G_T^{*} \in \mathcal{M}_1 \subset \mathcal{M}$ is asymptotically optimal in $\mathcal{M}_1$ if $\overline{\mathcal{E}}(G_T^{*}) - \overline{\mathcal{E}}(G_T)$ is almost surely non-negative definite for all $G_T \in \mathcal{M}_1$, $\theta$, $\mathsf{P}_\theta$, and $T > 0$, where

$$ \overline{\mathcal{E}}(G_T) = \overline{G}_T' \langle G \rangle_T^{-1} \overline{G}_T. $$
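Continuing the illustration above: if $\mathsf{E}[ X_t \mid \mathcal{F}_{t-1} ] = \theta$, where $\mathcal{F}_{t-1}$ denotes the history of the process up to time $t-1$, then $G_T(\theta) = \sum_{t=1}^{T} (X_t - \theta)$ is a martingale, and $\nabla G_T = -T$ is deterministic, hence predictable, so $\overline{G}_T = -T$. Then

$$ \langle G \rangle_T = \sum_{t=1}^{T} \mathsf{E} [ (X_t - \theta)^2 \mid \mathcal{F}_{t-1} ], \qquad \overline{\mathcal{E}}(G_T) = \frac{T^2}{\langle G \rangle_T}. $$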

Under suitable regularity conditions, asymptotically optimal estimating functions produce estimators for $\theta$ which are consistent (cf. Consistent estimator), asymptotically unbiased (cf. Unbiased estimator) and asymptotically normally distributed (cf. Normal distribution) with minimum-size asymptotic confidence zones (cf. Confidence estimation). For further details see [a2], [a3].
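As a numerical sketch of these properties (an illustration, assuming a first-order autoregressive model not treated explicitly above): for $X_t = \theta X_{t-1} + \varepsilon_t$ with independent zero-mean noise $\varepsilon_t$ of variance $\sigma^2$, the martingale estimating function $G_T(\theta) = \sum_{t=1}^{T} X_{t-1} ( X_t - \theta X_{t-1} )$ is the quasi-score, with $\langle G \rangle_T = \sigma^2 \sum_{t=1}^{T} X_{t-1}^2$, and its root is consistent and asymptotically normal:

```python
# Minimal simulation sketch (illustrative assumption: AR(1) model
# X_t = theta0 * X_{t-1} + eps_t, not taken from the cited references).
import numpy as np

rng = np.random.default_rng(0)
theta0, sigma, T = 0.5, 1.0, 10_000

# Simulate the sample {X_t : 0 <= t <= T}.
X = np.zeros(T + 1)
for t in range(1, T + 1):
    X[t] = theta0 * X[t - 1] + sigma * rng.standard_normal()

# Root of G_T(theta) = sum X_{t-1} (X_t - theta X_{t-1}) = 0:
# the least-squares estimator theta_hat = sum X_{t-1} X_t / sum X_{t-1}^2.
den = np.sum(X[:-1] ** 2)
theta_hat = np.sum(X[:-1] * X[1:]) / den

# The quadratic characteristic <G>_T = sigma^2 * den standardizes the error:
# (theta_hat - theta0) * sqrt(den) / sigma is approximately N(0, 1) for large T.
z = (theta_hat - theta0) * np.sqrt(den) / sigma
print(f"theta_hat = {theta_hat:.4f}, standardized error = {z:.3f}")
```

Repeating the simulation over many seeds, theta_hat concentrates around theta0 and the standardized errors are approximately standard normal, in line with the confidence-zone statement above.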

References

[a1] D.L. McLeish, C.G. Small, "The theory and applications of statistical inference functions", Lecture Notes in Statistics, Springer (1988)
[a2] V.P. Godambe, C.C. Heyde, "Quasi-likelihood and optimal estimation", Internat. Statist. Rev., 55 (1987) pp. 231–244
[a3] C.C. Heyde, "Quasi-likelihood and its application. A general approach to optimal parameter estimation", Springer (1997)
This article was adapted from an original article by C.C. Heyde (originator), which appeared in Encyclopedia of Mathematics, ISBN 1402006098.