Difference between revisions of "Informant"

Revision as of 22:12, 5 June 2020

The gradient of the logarithmic likelihood function. The concept of the informant arose in so-called parametric problems in mathematical statistics. Suppose one has the a priori information that an observed random phenomenon can be described by a probability distribution $ P ^ \theta ( d \omega ) $ from a family $ \{ {P ^ {t} } : {t \in \Theta } \} $, where $ t $ is a numerical or vector parameter, but that the true value $ \theta $ of the parameter is unknown. An observation (or a series of independent observations) has led to the outcome $ \omega $ (or to a series of outcomes $ \omega ^ {(1)} , \dots, \omega ^ {(N)} $). It is required to estimate $ \theta $ from the outcome(s). Suppose that the family $ \{ {P ^ {t} } : {t \in \Theta } \} $ is given by a family of densities $ p ( \omega ; t ) $ with respect to a measure $ \mu ( d \omega ) $ on the space $ \Omega $ of outcomes of observations. If $ \Omega $ is discrete, then the probabilities $ P ^ {t} ( \omega ) $ themselves can be taken for $ p ( \omega ; t ) $. For $ \omega $ fixed, $ p ( \omega ; t ) $, as a function of $ t = ( t _ {1} , \dots, t _ {m} ) $, is called a likelihood function, and its logarithm is called a logarithmic likelihood function.

For smooth families the informant can conveniently be introduced as the vector

$$ \mathop{\rm grad} _ {t} \mathop{\rm ln} p ( \omega ; t ) = \left ( \frac{1}{p ( \omega ; t ) } \frac{\partial p ( \omega ; t ) }{\partial t _ {1} } , \dots, \frac{1}{p ( \omega ; t ) } \frac{\partial p ( \omega ; t ) }{\partial t _ {m} } \right ) , $$

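which, unlike the logarithmic likelihood function, does not depend on the choice of $ \mu $: passing to another dominating measure multiplies $ p ( \omega ; t ) $ by a factor that depends on $ \omega $ but not on $ t $, and this factor disappears when the logarithm is differentiated with respect to $ t $. The informant contains all essential information, both that obtained from the observations and the a priori information, for the problem of estimating $ \theta $. Moreover, it is additive: for independent observations, i.e. when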

$$ p ( \omega ^ {(1)} , \dots, \omega ^ {(N)} ; t ) = \prod _ {k=1} ^ {N} p _ {k} ( \omega ^ {(k)} ; t ) , $$

the informants are summed:

$$ \mathop{\rm grad} \mathop{\rm ln} p ( \omega ^ {(1)} , \dots, \omega ^ {(N)} ; t ) = \sum _ {k=1} ^ {N} \mathop{\rm grad} \mathop{\rm ln} p _ {k} ( \omega ^ {(k)} ; t ) . $$
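
As a hedged illustration of this additivity (the normal family below is an assumption chosen for the example and is not part of the original entry), let each $ p _ {k} ( \omega ; t ) $ be the density, with respect to Lebesgue measure, of the normal distribution with unknown mean $ t $ and known variance $ \sigma ^ {2} $. Then

$$ \frac{\partial }{\partial t } \mathop{\rm ln} p ( \omega ; t ) = \frac{\omega - t }{\sigma ^ {2} } , \qquad \frac{\partial }{\partial t } \mathop{\rm ln} p ( \omega ^ {(1)} , \dots, \omega ^ {(N)} ; t ) = \sum _ {k=1} ^ {N} \frac{\omega ^ {(k)} - t }{\sigma ^ {2} } = \frac{N ( \overline \omega - t ) }{\sigma ^ {2} } , $$

where $ \overline \omega = N ^ {-1} \sum _ {k=1} ^ {N} \omega ^ {(k)} $ is the sample mean.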

In statistical estimation theory the properties of the informant as a vector function are important. Under the assumptions that the logarithmic likelihood function is regular (in particular, twice differentiable), that its derivatives are integrable and that differentiation with respect to the parameter may be interchanged with integration with respect to the outcomes, one has

$$ {\mathsf E} _ {t} \frac{\partial \mathop{\rm ln} p ( \omega ; t ) }{\partial t _ {k} } = \int\limits _ \Omega \frac{\partial \mathop{\rm ln} p ( \omega ; t ) }{\partial t _ {k} } p ( \omega ; t ) d \mu = 0 ,\ \ \forall k ; $$

$$ - {\mathsf E} _ {t} \frac{\partial ^ {2} \mathop{\rm ln} p ( \omega ; \ t ) }{\partial t _ {j} \partial t _ {k} } = \ I _ {jk} ( t) = {\mathsf E} _ {t} \frac{\partial \mathop{\rm ln} p }{\partial t _ {j} } \frac{\partial \mathop{\rm ln} p }{\partial t _ {k} } ,\ \ \forall j , k . $$

The covariance matrix $ \| I _ {jk} ( t) \| _ {j,k=1} ^ {m} $ of the informant is called the information matrix. An inequality bounding the accuracy of statistical estimators for $ \theta $ (the Rao–Cramér inequality) can be given in terms of this matrix.
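
The two identities above can be checked exactly on a small example. The following is a minimal sketch, assuming a one-parameter Bernoulli family $ P ^ {t} ( 1 ) = t $, $ P ^ {t} ( 0 ) = 1 - t $ (an assumption made for illustration; the function names are hypothetical). Expectations are finite sums over the two outcomes, so no sampling is involved.

```python
def bernoulli_pmf(omega, t):
    """Density p(omega; t) with respect to counting measure on {0, 1}."""
    return t if omega == 1 else 1.0 - t


def score(omega, t):
    """Informant: derivative of ln p(omega; t) with respect to t."""
    return omega / t - (1 - omega) / (1.0 - t)


def second_derivative(omega, t):
    """Second derivative of ln p(omega; t) with respect to t."""
    return -omega / t**2 - (1 - omega) / (1.0 - t) ** 2


def expectation(f, t):
    """E_t f(omega, t), computed as a sum over both outcomes."""
    return sum(f(omega, t) * bernoulli_pmf(omega, t) for omega in (0, 1))


t = 0.3
mean_score = expectation(score, t)                      # should vanish
info_from_hessian = -expectation(second_derivative, t)  # -E of second derivative
info_from_square = expectation(lambda w, s: score(w, s) ** 2, t)  # E of score squared

print(mean_score, info_from_hessian, info_from_square)
```

For this family both expressions for the information reduce to $ 1 / ( t ( 1 - t ) ) $, which the sketch reproduces up to rounding.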

When estimating $ \theta $ by the maximum-likelihood method, one assigns to the observed outcome $ \omega $ (or series $ \omega ^ {(1)} , \dots, \omega ^ {(N)} $) the most likely value $ t = \theta ^ {*} ( \omega ) $, i.e. one maximizes the likelihood function or, equivalently, its logarithm. At an extremal point the informant must vanish. However, the likelihood equation that arises,

$$ { \mathop{\rm grad} \mathop{\rm ln} } p ( \omega ; t ) = 0 , $$

can have roots $ t = \theta ^ {*} $ corresponding to maxima of the logarithmic likelihood function that are only local (or to minima); these must be discarded. If, in a neighbourhood of $ t = \theta $,

$$ \mathop{\rm det} \| I _ {jk} ( t) \| \neq 0 , $$

then the asymptotic optimality of the maximum-likelihood estimator $ \theta _ {N} ^ {*} $ follows from the listed properties of the informant, as the number $ N $ of independent observations used grows indefinitely.
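
The remark that the likelihood equation may have spurious roots can be illustrated numerically. The sketch below assumes a Cauchy location family with density $ 1 / ( \pi ( 1 + ( \omega - t ) ^ {2} ) ) $ and a hypothetical three-point sample; neither appears in the original entry. It brackets the roots of the informant on a grid, refines each by bisection, and keeps the root with the largest logarithmic likelihood.

```python
import math


def score(t, xs):
    """Informant of the Cauchy location family, summed over the sample."""
    return sum(2.0 * (x - t) / (1.0 + (x - t) ** 2) for x in xs)


def log_lik(t, xs):
    """Logarithmic likelihood of the sample at location t."""
    return -sum(math.log(math.pi * (1.0 + (x - t) ** 2)) for x in xs)


def bisect_root(f, lo, hi, tol=1e-10):
    """Bisection on [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def likelihood_roots(xs, lo, hi, steps=4000):
    """Roots of the likelihood equation detected by sign changes on a grid."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    roots = []
    for a, b in zip(grid, grid[1:]):
        if score(a, xs) * score(b, xs) < 0.0:
            roots.append(bisect_root(lambda t: score(t, xs), a, b))
    return roots


sample = [-10.0, 10.0, 11.0]  # hypothetical observations
roots = likelihood_roots(sample, -20.0, 20.0)
# Keep only the root at which the logarithmic likelihood is largest.
theta_star = max(roots, key=lambda t: log_lik(t, sample))

print(roots, theta_star)
```

For this sample the informant vanishes at a local maximum near the isolated observation, at a local minimum between the clusters, and at the global maximum between the two close observations; only the last is retained. A grid search of this kind is a simple device for illustration and can miss roots if the grid is too coarse.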

References

[1]	S.S. Wilks, "Mathematical statistics" , Wiley (1962)

How to Cite This Entry:
Informant. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Informant&oldid=16853

This article was adapted from an original article by N.N. Chentsov (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.


@@ Line 1: / Line 1: @@
-The gradient of the logarithmic likelihood function. The concept of the informant arose in so-called parametric problems in mathematical statistics. Suppose one has the a priori information that an observed random phenomenon can be described by a probability distribution <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i0510301.png" /> from a family <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i0510302.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i0510303.png" /> is a numerical or vector parameter, but for which the true value of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i0510304.png" /> is unknown. The observation (series of independent observations) made led to the outcome <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i0510305.png" /> (series of outcomes <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i0510306.png" />). It is required to estimate <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i0510307.png" /> from the outcome(s). Suppose that the family <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i0510308.png" /> is given by a family of densities <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i0510309.png" /> with respect to a measure <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103010.png" /> on the space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103011.png" /> of outcomes of observations. 
If <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103012.png" /> is discrete, then the probabilities <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103013.png" /> itself can be taken for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103014.png" />. For <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103015.png" /> fixed, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103016.png" />, as a function of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103017.png" />, is called a likelihood function, and its logarithm is called a logarithmic likelihood function.
+<!--
+i0510301.png
+$#A+1 = 37 n = 0
+$#C+1 = 37 : ~/encyclopedia/old_files/data/I051/I.0501030 Informant
+Automatically converted into TeX, above some diagnostics.
+Please remove this comment and the {{TEX|auto}} line below,
+if TeX found to be correct.
+-->
+{{TEX|auto}}
+{{TEX|done}}
+The gradient of the logarithmic likelihood function. The concept of the informant arose in so-called parametric problems in mathematical statistics. Suppose one has the a priori information that an observed random phenomenon can be described by a probability distribution  $  P  ^  \theta  ( d \omega ) $
+from a family  $  \{ {P  ^ {t} } : {t \in \Theta } \} $,
+where  $  t $
+is a numerical or vector parameter, but for which the true value of  $  \theta $
+is unknown. The observation (series of independent observations) made led to the outcome  $  \omega $
+(series of outcomes  $  \omega  ^ {(1)} , \dots, \omega  ^ {(N)} $).
+It is required to estimate  $  \theta $
+from the outcome(s). Suppose that the family  $  \{ {P  ^ {t} } : {t \in \Theta } \} $
+is given by a family of densities  $  p ( \omega ;  t ) $
+with respect to a measure  $  \mu ( d \omega ) $
+on the space  $  \Omega $
+of outcomes of observations. If  $  \Omega $
+is discrete, then the probabilities  $  P  ^ {t} ( \omega ) $
+themselves can be taken for  $  p ( \omega ;  t ) $.
+For  $  \omega $
+fixed,  $  p ( \omega ;  t ) $,
+as a function of  $  t = ( t _ {1} \dots t _ {m} ) $,
+is called a likelihood function, and its logarithm is called a logarithmic likelihood function.
 For smooth families the informant can conveniently be introduced as the vector
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103018.png" /></td> </tr></table>
+$$
+ \mathop{\rm grad} _ {t}   \mathop{\rm ln}  p ( \omega ;  t ) =
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103019.png" /></td> </tr></table>
+$$
+= \
+\left (
+\frac{1}{p ( \omega ;  t ) }
+\frac{\partial
+p ( \omega ;  t ) }{\partial  t _ {1} }
+ \dots
+\frac{1}{p ( \omega ;  t ) }
+\frac{\partial  p ( \omega ;  t ) }{\partial  t _ {m} }
+ \right ) ,
+$$
-which, unlike the logarithmic likelihood function, does not depend on the choice of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103020.png" />. The informant contains all essential information, both that obtained from the observations, as well as the a priori information, for the problem of estimating <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103021.png" />. Moreover, it is additive: For independent observations, i.e. when
+which, unlike the logarithmic likelihood function, does not depend on the choice of  $  \mu $.
+The informant contains all essential information, both that obtained from the observations, as well as the a priori information, for the problem of estimating  $  \theta $.
+Moreover, it is additive: For independent observations, i.e. when
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103022.png" /></td> </tr></table>
+$$
+p ( \omega  ^ {(1)} , \dots, \omega  ^ {(N)} ;  t )  = \
+\prod _ {k=1} ^ {N}  p _ {k} ( \omega  ^ {(k)} ;  t ) ,
+$$
 the informants are summed:
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103023.png" /></td> </tr></table>
+$$
+{ \mathop{\rm grad}   \mathop{\rm ln} }  p
+( \omega  ^ {(1)} , \dots, \omega  ^ {(N)} ;  t )  = \
+\sum _ {k=1} ^ {N}
+{ \mathop{\rm grad}   \mathop{\rm ln} }  p _ {k} ( \omega  ^ {(k)} ;  t ) .
+$$
 In statistical estimation theory the properties of the informant as a vector function are important. Under the assumptions that the logarithmic likelihood function is regular (in particular, twice differentiable), that its derivatives are integrable and that differentiation with respect to the parameter may be interchanged with integration with respect to the outcomes, one has
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103024.png" /></td> </tr></table>
+$$
+{\mathsf E} _ {t}
+\frac{\partial   \mathop{\rm ln}  p ( \omega ;  t ) }{\partial  t _ {k} }
+ =  \int\limits _  \Omega
+\frac{\partial   \mathop{\rm ln}  p ( \omega ;  t ) }{\partial  t _ {k} }
+p ( \omega ;  t )  d \mu  =  0 ,\ \
+\forall k ;
+$$
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103025.png" /></td> </tr></table>
+$$
+- {\mathsf E} _ {t}
+\frac{\partial   ^ {2}  \mathop{\rm ln}  p ( \omega ; \
+t ) }{\partial  t _ {j} \partial  t _ {k} }
+  = \
+I _ {jk} ( t)  =  {\mathsf E} _ {t}
+\frac{\partial   \mathop{\rm ln}  p }{\partial  t _ {j} }
+\frac{\partial   \mathop{\rm ln}  p }{\partial  t _ {k} }
+ ,\ \
+\forall j , k .
+$$
-The covariance matrix <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103026.png" /> is called the [[Information matrix|information matrix]]. An inequality expressing a bound on the exactness of statistical estimators for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103027.png" /> can be given in terms of this matrix.
+The covariance matrix  $  \| I _ {jk} ( t) \| _ {j,k=1}  ^ {m} $
+is called the [[Information matrix|information matrix]]. An inequality expressing a bound on the exactness of statistical estimators for  $  \theta $
+can be given in terms of this matrix.
-When estimating <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103028.png" /> by the [[Maximum-likelihood method|maximum-likelihood method]], one assigns to the observed outcome <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103029.png" /> (or series <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103030.png" />) the most likely value <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103031.png" />, i.e. one maximizes the likelihood function and its logarithm. At an extremal point the informant must vanish. However, the likelihood equation that arises,
+When estimating  $  \theta $
+by the [[Maximum-likelihood method|maximum-likelihood method]], one assigns to the observed outcome  $  \omega $
+(or series  $  \omega  ^ {(1)} , \dots, \omega  ^ {(N)} $)
+the most likely value  $  t = \theta  ^ {*} ( \omega ) $,
+i.e. one maximizes the likelihood function and its logarithm. At an extremal point the informant must vanish. However, the likelihood equation that arises,
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103032.png" /></td> </tr></table>
+$$
+{ \mathop{\rm grad}   \mathop{\rm ln} }  p ( \omega ;  t )  =  0 ,
+$$
-can have roots <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103033.png" />, corresponding to maxima of the logarithmic likelihood function that are only local (or to minima); these must be discarded. If, in a neighbourhood of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103034.png" />,
+can have roots  $  t = \theta  ^ {*} $,
+corresponding to maxima of the logarithmic likelihood function that are only local (or to minima); these must be discarded. If, in a neighbourhood of  $  t = \theta $,
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103035.png" /></td> </tr></table>
+$$
+ \mathop{\rm det}  \| I _ {jk} ( t) \|  \neq  0 ,
+$$
-then the asymptotic optimality of the maximum-likelihood estimator <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103036.png" /> follows from the listed properties of the informant, as the number <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/i/i051/i051030/i05103037.png" /> of independent observations used grows indefinitely.
+then the asymptotic optimality of the maximum-likelihood estimator  $  \theta _ {N}  ^ {*} $
+follows from the listed properties of the informant, as the number  $  N $
+of independent observations used grows indefinitely.
 ====References====
 <table><TR><TD valign="top">[1]</TD> <TD valign="top">  S.S. Wilks,   "Mathematical statistics" , Wiley  (1962)</TD></TR></table>