Difference between revisions of "Informant"
(Importing text file)
Ulf Rehmann m (tex encoded by computer)
Revision as of 22:12, 5 June 2020
The gradient of the logarithmic likelihood function. The concept of the informant arose in so-called parametric problems in mathematical statistics. Suppose one has the a priori information that an observed random phenomenon can be described by a probability distribution $ P^{\theta}(d\omega) $ from a family $ \{ P^{t} : t \in \Theta \} $, where $ t $ is a numerical or vector parameter, but the true value $ \theta $ of the parameter is unknown. An observation (or a series of independent observations) yields the outcome $ \omega $ (or the series of outcomes $ \omega^{(1)}, \dots, \omega^{(N)} $), and it is required to estimate $ \theta $ from the outcome(s). Suppose that the family $ \{ P^{t} : t \in \Theta \} $ is given by a family of densities $ p(\omega; t) $ with respect to a measure $ \mu(d\omega) $ on the space $ \Omega $ of outcomes of observations. If $ \Omega $ is discrete, then the probabilities $ P^{t}(\omega) $ themselves can be taken for $ p(\omega; t) $. For fixed $ \omega $, the density $ p(\omega; t) $, regarded as a function of $ t = (t_{1}, \dots, t_{m}) $, is called the likelihood function, and its logarithm is called the logarithmic likelihood function.
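As an illustration of the discrete case, the following minimal sketch treats a series of independent 0/1 outcomes, assuming a Bernoulli family with success probability $ t $. The model, the outcomes and the function names are illustrative assumptions, not part of the article.

```python
import math

def likelihood(t, outcomes):
    """p(outcomes; t): product of Bernoulli probabilities for a 0/1 series."""
    value = 1.0
    for x in outcomes:
        value *= t if x == 1 else 1.0 - t
    return value

def log_likelihood(t, outcomes):
    """Logarithmic likelihood function of t for fixed outcomes."""
    return math.log(likelihood(t, outcomes))

# Made-up outcomes: 5 successes in 8 trials.
outcomes = [1, 0, 1, 1, 0, 1, 0, 1]

# With the outcomes held fixed, the likelihood is a function of t alone.
grid = [i / 1000 for i in range(1, 1000)]
best_t = max(grid, key=lambda t: log_likelihood(t, outcomes))
print(best_t)
```

On this grid the logarithmic likelihood is largest at the observed frequency of successes, 5/8.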
For smooth families the informant can conveniently be introduced as the vector
$$ \mathop{\rm grad} _ {t} \mathop{\rm ln} p ( \omega ; t ) = \left ( \frac{1}{p ( \omega ; t ) } \frac{\partial p ( \omega ; t ) }{\partial t _ {1} } , \dots , \frac{1}{p ( \omega ; t ) } \frac{\partial p ( \omega ; t ) }{\partial t _ {m} } \right ) , $$
which, unlike the logarithmic likelihood function, does not depend on the choice of $ \mu $ (replacing $ \mu $ by an equivalent measure multiplies the density by a factor that does not depend on $ t $, so the gradient of its logarithm is unchanged). The informant contains all essential information, both that obtained from the observations and the a priori information, for the problem of estimating $ \theta $. Moreover, it is additive: for independent observations, i.e. when
$$ p ( \omega^{(1)} , \dots , \omega^{(N)} ; t ) = \prod_{k=1}^{N} p_{k} ( \omega^{(k)} ; t ) , $$
the informants are summed:
$$ \mathop{\rm grad} \mathop{\rm ln} p ( \omega^{(1)} , \dots , \omega^{(N)} ; t ) = \sum_{k=1}^{N} \mathop{\rm grad} \mathop{\rm ln} p_{k} ( \omega^{(k)} ; t ) . $$
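The definition and the additivity property can be checked numerically. The sketch below assumes a normal family with parameter $ t = ( \textrm{mean} , \textrm{std} ) $. This family, the sample values and all function names are illustrative choices rather than part of the article. It compares the sum of the per-observation informants with a finite-difference gradient of the joint logarithmic likelihood.

```python
import math

def log_density(x, t):
    """ln p(x; t) for a normal law with t = (mean, std)."""
    mean, std = t
    return (-math.log(std) - 0.5 * math.log(2 * math.pi)
            - (x - mean) ** 2 / (2 * std ** 2))

def informant(x, t):
    """Analytic gradient of ln p(x; t) with respect to (mean, std)."""
    mean, std = t
    return ((x - mean) / std ** 2, -1 / std + (x - mean) ** 2 / std ** 3)

def joint_log_density(sample, t):
    """Logarithm of the product density of independent observations."""
    return sum(log_density(x, t) for x in sample)

def numeric_gradient(f, t, h=1e-6):
    """Central finite-difference gradient of f at the point t."""
    grad = []
    for i in range(len(t)):
        up, down = list(t), list(t)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return tuple(grad)

sample = [1.3, -0.4, 2.1]  # made-up outcomes
t = (0.5, 2.0)

single = informant(sample[0], t)
# Additivity: sum the informants of the individual observations, componentwise.
summed = tuple(sum(parts) for parts in zip(*(informant(x, t) for x in sample)))
# Independent check: differentiate the joint log-likelihood numerically.
joint = numeric_gradient(lambda s: joint_log_density(sample, s), t)
print(single, summed, joint)
```

The two gradients agree up to the finite-difference error, as the additivity formula predicts.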
In statistical estimation theory the properties of the informant as a vector function are important. Under the assumptions that the logarithmic likelihood function is regular (in particular, twice differentiable), that its derivatives are integrable, and that differentiation with respect to the parameter may be interchanged with integration with respect to the outcomes, one has
$$ {\mathsf E} _ {t} \frac{\partial \mathop{\rm ln} p ( \omega ; t ) }{\partial t _ {k} } = \int\limits _ \Omega \frac{\partial \mathop{\rm ln} p ( \omega ; t ) }{\partial t _ {k} } p ( \omega ; t ) d \mu = 0 ,\ \ \forall k ; $$
$$ - {\mathsf E} _ {t} \frac{\partial ^ {2} \mathop{\rm ln} p ( \omega ; \ t ) }{\partial t _ {j} \partial t _ {k} } = \ I _ {jk} ( t) = {\mathsf E} _ {t} \frac{\partial \mathop{\rm ln} p }{\partial t _ {j} } \frac{\partial \mathop{\rm ln} p }{\partial t _ {k} } ,\ \ \forall j , k . $$
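Both identities can be verified exactly in a small discrete case. The sketch below assumes a single Bernoulli observation with success probability $ t $, so that expectations are finite sums over the two outcomes. The model and the function names are illustrative assumptions.

```python
def pmf(x, t):
    """p(x; t) for one Bernoulli observation, x in {0, 1}."""
    return t if x == 1 else 1.0 - t

def score(x, t):
    """d/dt ln p(x; t): the informant (scalar parameter)."""
    return x / t - (1 - x) / (1 - t)

def second_derivative(x, t):
    """Second derivative of ln p(x; t) with respect to t."""
    return -x / t ** 2 - (1 - x) / (1 - t) ** 2

def expectation(g, t):
    """E_t g(x, t): sum over the two outcomes, weighted by p(x; t)."""
    return sum(g(x, t) * pmf(x, t) for x in (0, 1))

t = 0.3
mean_score = expectation(score, t)                               # first identity
info_from_square = expectation(lambda x, s: score(x, s) ** 2, t)  # E[score^2]
info_from_hessian = -expectation(second_derivative, t)           # -E[second derivative]
print(mean_score, info_from_square, info_from_hessian)
```

For this family both expressions for the information reduce to $ 1 / ( t ( 1 - t ) ) $.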
The covariance matrix of the informant, $ \| I_{jk} ( t ) \|_{j,k=1}^{m} $, is called the information matrix. A lower bound on the accuracy of statistical estimators of $ \theta $ (the Cramér–Rao inequality) can be expressed in terms of this matrix.
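A minimal sketch of such a bound, assuming the inequality meant is the Cramér–Rao bound, which for a scalar parameter and $ N $ independent observations states that an unbiased estimator has variance at least $ 1 / ( N I ( t ) ) $. The Bernoulli model, the choice of the sample mean as estimator and the function names are illustrative assumptions. The variance is computed exactly from the binomial distribution rather than by simulation.

```python
import math

def binomial_pmf(s, n, t):
    """Probability of s successes in n independent Bernoulli(t) trials."""
    return math.comb(n, s) * t ** s * (1 - t) ** (n - s)

def variance_of_sample_mean(n, t):
    """Exact variance of the estimator s / n under the Bernoulli model."""
    mean = sum((s / n) * binomial_pmf(s, n, t) for s in range(n + 1))
    return sum((s / n - mean) ** 2 * binomial_pmf(s, n, t) for s in range(n + 1))

def information(t):
    """Information of one Bernoulli observation: 1 / (t (1 - t))."""
    return 1.0 / (t * (1.0 - t))

n, t = 20, 0.3
variance = variance_of_sample_mean(n, t)
bound = 1.0 / (n * information(t))
print(variance, bound)
```

In this particular model the sample mean attains the bound exactly. In general the inequality only guarantees that the variance is not smaller than the bound.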
When estimating $ \theta $ by the maximum-likelihood method, one assigns to the observed outcome $ \omega $ (or series $ \omega^{(1)} , \dots , \omega^{(N)} $) the most likely value $ t = \theta^{*} ( \omega ) $, i.e. one maximizes the likelihood function or, equivalently, its logarithm. At an extremal point in the interior of $ \Theta $ the informant must vanish. However, the likelihood equation that arises,
$$ { \mathop{\rm grad} \mathop{\rm ln} } p ( \omega ; t ) = 0 , $$
can have roots $ t = \theta^{*} $ corresponding to maxima of the logarithmic likelihood function that are only local (or to minima); these must be discarded. If, in a neighbourhood of $ t = \theta $,
$$ \mathop{\rm det} \| I _ {jk} ( t) \| \neq 0 , $$
then the asymptotic optimality of the maximum-likelihood estimator $ \theta _ {N} ^ {*} $ follows from the listed properties of the informant, as the number $ N $ of independent observations used grows indefinitely.
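The warning about superfluous roots of the likelihood equation can be illustrated with a toy case. The sketch below assumes a Cauchy location family with density $ 1 / ( \pi ( 1 + ( x - t )^{2} ) ) $ and the two made-up observations $ -3 $ and $ 3 $. The family, the data, the bracketing intervals and the function names are all illustrative assumptions. For this sample the likelihood equation has three roots, $ t = 0 $ and $ t = \pm \sqrt{8} $, which the code locates by bisection.

```python
import math

def log_likelihood(t, sample):
    """Joint ln p(sample; t) for the Cauchy location family."""
    return sum(-math.log(math.pi * (1 + (x - t) ** 2)) for x in sample)

def informant(t, sample):
    """d/dt of the joint log-likelihood (scalar parameter)."""
    return sum(2 * (x - t) / (1 + (x - t) ** 2) for x in sample)

def bisect_root(f, lo, hi, tol=1e-12):
    """Root of f in [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sample = [-3.0, 3.0]  # made-up outcomes
score = lambda t: informant(t, sample)

# Intervals on which the informant changes sign (found by inspection).
brackets = [(-4.0, -1.0), (-1.0, 1.0), (1.0, 4.0)]
roots = [bisect_root(score, lo, hi) for lo, hi in brackets]
values = [log_likelihood(r, sample) for r in roots]
print(roots, values)
```

The middle root $ t = 0 $ is a local minimum of the logarithmic likelihood and must be discarded. The two outer roots give equal values of the likelihood, which is a degenerate feature of this symmetric sample.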
References
[1] S.S. Wilks, "Mathematical statistics", Wiley (1962)
Informant. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Informant&oldid=47347