Information matrix

From Encyclopedia of Mathematics
Latest revision as of 07:58, 14 January 2024


Fisher information

The covariance matrix of the informant. For a dominated family of probability distributions $ P ^ {t} ( d \omega ) $ (cf. Density of a probability distribution) with densities $ p ( \omega ; t ) $ that depend sufficiently smoothly on a vector (in particular, numerical) parameter $ t = ( t _ {1} \dots t _ {m} ) \in \Theta $, the elements of the information matrix are defined, for $ t = \theta $, as

$$ \tag{1 } I _ {jk} ( \theta ) = \int\limits _ \Omega \left . \frac{\partial \ln p ( \omega ; t ) }{\partial t _ {j} } \cdot \frac{\partial \ln p ( \omega ; t ) }{\partial t _ {k} } \right | _ {t = \theta } p ( \omega ; \theta ) \, d \mu , $$

where $ j , k = 1 \dots m $. For a scalar parameter $ t $ the information matrix can be described by one number — the variance (cf. Dispersion) of the informant.
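On a finite outcome space, formula (1) reduces to a weighted sum over outcomes, which makes it easy to verify numerically. The following sketch (an illustration added here, not part of the original entry) computes the Fisher information of a Bernoulli family directly from the definition and checks it against the closed form $ 1 / ( \theta ( 1 - \theta ) ) $:

```python
import math

def fisher_information_bernoulli(theta):
    """Fisher information of a Bernoulli(theta) family, computed directly
    from definition (1): the squared score, weighted by the probability
    of each outcome, summed over the finite outcome space {0, 1}."""
    info = 0.0
    for omega in (0, 1):
        p = theta if omega == 1 else 1.0 - theta
        # score: d/dtheta of ln p(omega; theta)
        score = 1.0 / theta if omega == 1 else -1.0 / (1.0 - theta)
        info += score * score * p
    return info

theta = 0.3
# closed form for the Bernoulli family: I(theta) = 1 / (theta (1 - theta))
assert math.isclose(fisher_information_bernoulli(theta),
                    1.0 / (theta * (1.0 - theta)))
```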

The information matrix $ I ( \theta ) $ determines a non-negative quadratic differential form

$$ \tag{2 } \sum _ {j , k } I _ {jk} ( \theta ) d t _ {j} d t _ {k} = \Delta _ \theta , $$

endowing the family $ \{ P ^ {t} \} $ with a Riemannian metric. If the space $ \Omega $ of outcomes $ \omega $ is finite, then

$$ \Delta _ {P} = \ \sum _ { j } \frac{( d p _ {j} ) ^ {2} }{p _ {j} } ; \ \ p _ {j} = P ( \omega _ {j} ) ,\ \ \forall \omega _ {j} \in \Omega . $$
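For example, on a two-point outcome space with $ p _ {1} = \theta $ and $ p _ {2} = 1 - \theta $ (a Bernoulli family), one has $ d p _ {1} = d t $, $ d p _ {2} = - d t $, and the formula gives

$$ \Delta _ {P} = \frac{( d t ) ^ {2} }{\theta } + \frac{( d t ) ^ {2} }{1 - \theta } = \frac{( d t ) ^ {2} }{\theta ( 1 - \theta ) } = I ( \theta ) ( d t ) ^ {2} , $$

in agreement with the quadratic form (2) for the scalar Fisher information $ I ( \theta ) = 1 / ( \theta ( 1 - \theta ) ) $ of this family.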

The Fisher quadratic differential form (2) is the unique (up to a constant multiplier) quadratic differential form that is invariant under the category of statistical decision rules. Because of this fact it arises in the formulation of many statistical laws.

Any measurable mapping $ f $ of the outcome space $ \Omega $ generates a new smooth family of distributions $ Q ^ {t} = P ^ {t} f ^ { - 1 } $ with information matrix $ I ^ {Q} ( \theta ) $, which is not greater than the initial one, i.e.

$$ \sum _ {j , k } I _ {jk} ^ {Q} z _ {j} z _ {k} \leq \ \sum _ {j , k } I _ {jk} ^ {P} z _ {j} z _ {k} , $$

whatever $ z _ {1} \dots z _ {m} $. The information matrix also has the property of additivity. If $ I ^ {(i)} ( \theta ) $ is the information matrix for a family of densities $ p _ {i} ( \omega ^ {(i)} ; t ) $, then the family

$$ p ( \omega ^ {(1)} \dots \omega ^ {(N)} ; t ) = \prod_{i=1}^ { N } p _ {i} ( \omega ^ {(i)} ; t ) $$

has information matrix $ I _ {N} ( \theta ) = \sum _ {i} I ^ {(i)} ( \theta ) $. In particular, $ I _ {N} ( \theta ) = N I ( \theta ) $ for $ N $ independent identically-distributed measurements. The information matrix allows one to characterize the statistical accuracy of decision rules in the problem of estimating the parameter of a distribution law. The variance of any unbiased estimator $ \tau ( \omega ) = \tau ( \omega^{(1)} \dots \omega ^ {(N)}) $ of a scalar parameter $ t $ satisfies
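The additivity property can be confirmed by brute force for small $ N $. The sketch below (a numerical illustration added here, using a Bernoulli family as a stand-in) enumerates the joint density of $ N $ independent observations, computes its information from definition (1), and recovers $ I _ {N} ( \theta ) = N I ( \theta ) $:

```python
import itertools
import math

def product_fisher_info(theta, N):
    """Fisher information of N independent Bernoulli(theta) observations,
    computed from the joint density by enumerating all 2**N outcomes."""
    info = 0.0
    for outcome in itertools.product((0, 1), repeat=N):
        prob = 1.0
        score = 0.0
        for omega in outcome:
            prob *= theta if omega == 1 else 1.0 - theta
            # the log of a product is a sum, so the scores add up
            score += 1.0 / theta if omega == 1 else -1.0 / (1.0 - theta)
        info += score * score * prob
    return info

theta, N = 0.3, 4
single = 1.0 / (theta * (1.0 - theta))  # I(theta) for one observation
assert math.isclose(product_fisher_info(theta, N), N * single)
```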

$$ {\mathsf D} _ \theta \tau \geq \ [ N I ( \theta ) ] ^ {-1} . $$
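This bound is attained by the sample mean of Bernoulli trials, whose variance is exactly $ \theta ( 1 - \theta ) / N = [ N I ( \theta ) ] ^ {-1} $. The simulation below (an illustration added here, with arbitrarily chosen sample sizes) checks this empirically:

```python
import random

random.seed(0)
theta, N, trials = 0.3, 50, 5000

# tau = sample mean, an unbiased estimator of theta for Bernoulli trials
estimates = []
for _ in range(trials):
    sample = [1 if random.random() < theta else 0 for _ in range(N)]
    estimates.append(sum(sample) / N)

mean = sum(estimates) / trials
var = sum((e - mean) ** 2 for e in estimates) / (trials - 1)

# Cramer-Rao bound: [N I(theta)]^{-1} with I(theta) = 1/(theta(1-theta))
bound = theta * (1.0 - theta) / N

# the sample mean attains the bound, so the empirical variance should
# match it up to Monte Carlo error
assert abs(var - bound) < 0.2 * bound
```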

The analogous matrix inequality for the information holds for estimators of a vector parameter. Its scalar consequence,

$$ \tag{3 } {\mathsf E} _ \theta \sum _ {j , k = 1 } ^ { m } [ \tau _ {j} ( \omega ) - \theta _ {j} ] [ \tau _ {k} ( \omega ) - \theta _ {k} ] I _ {jk} ( \theta ) \geq \ m N ^ {-1} , $$

shows that unbiased estimation cannot be too exact anywhere. For arbitrary (possibly biased) estimators this is no longer true. However, restrictions remain, e.g., for the average accuracy:

$$ \tag{4 } \mathfrak M _ {\Theta ^ \prime } {\mathsf E} _ \theta \langle \tau - \theta | I ( \theta ) | \tau - \theta \rangle \geq m N ^ {-1} + o ( N ^ {-1} ) , $$

where the average $ \mathfrak M $ on the left-hand side of (4) is with respect to the invariant volume $ V $ of any compact subdomain $ \Theta ^ \prime \subset \Theta $,

$$ d V ( \theta ) = \ \sqrt { \mathop{\rm det} I ( \theta ) } d \theta _ {1} \dots d \theta _ {m} ; $$

the remainder depends on the dimension of $ \Theta ^ \prime $. Inequalities (4) are asymptotically exact, while the maximum-likelihood estimator is asymptotically optimal in this sense.

At degenerate points, for which $ \mathop{\rm det} I ( \theta ) = 0 $, joint estimation of parameters is difficult. If $ \mathop{\rm det} I ( \theta ) = 0 $ in a certain domain, then joint estimation is not possible at all. Following R. Fisher [1], one may say with appropriate qualification that the information matrix describes the amount of information (cf. Information, amount of) on the parameters of a distribution law that is contained in the random sample.
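A degenerate information matrix typically signals a redundant parametrization. As a toy illustration (added here, not from the original entry), suppose a Bernoulli family is parametrized through $ \theta = t _ {1} + t _ {2} $: both partial scores coincide, so the information matrix has rank one and its determinant vanishes identically:

```python
import math

def info_matrix_redundant(t1, t2):
    """Information matrix of a Bernoulli family parametrized redundantly
    through theta = t1 + t2; by the chain rule both partial derivatives
    of theta equal 1, so the matrix is I(theta) times a rank-one matrix."""
    theta = t1 + t2
    i_theta = 1.0 / (theta * (1.0 - theta))  # scalar Fisher information
    return [[i_theta, i_theta],
            [i_theta, i_theta]]

I = info_matrix_redundant(0.1, 0.2)
det = I[0][0] * I[1][1] - I[0][1] * I[1][0]
assert math.isclose(det, 0.0, abs_tol=1e-12)  # det I = 0: joint estimation fails
```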

References

[1] R.A. Fisher, "Theory of statistical estimation" Trans. Cambridge Philos. Soc. , 22 (1925) pp. 700–725
[2] J.R. Barra, "Notions fondamentales de statistique mathématique" , Dunod (1971)
[3] N.N. [N.N. Chentsov] Čencov, "Statistical decision rules and optimal inference" , Amer. Math. Soc. (1982) (Translated from Russian)
[a1] C.R. Rao, "Linear statistical inference and its applications" , Wiley (1965)
How to Cite This Entry:
Information matrix. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Information_matrix&oldid=12468
This article was adapted from an original article by N.N. Chentsov (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article