Differential geometry in statistical inference

Many of the key concepts and results of statistical inference (cf. also Statistics) can be expressed efficiently in terms of differential geometry. Such re-expressions have been helpful both in illuminating classical statistical procedures and in developing new methodology. The role which differential geometry can play in statistical theory has been realized effectively only since the late 1970s. The historical development can be seen from [a1] and [a6].

Any (sufficiently regular) parametric statistical model determines two types of geometries on the parameter space: i) expected geometries; and ii) observed geometries. Both types are based on derivatives of the likelihood function. Construction of the observed geometries requires an appropriate auxiliary statistic. Each of these geometries consists of a Riemannian metric and a one-parameter family of affine connections (cf. Affine connection) on the parameter space, together with various higher-order geometrical objects. Observed geometries are more directly relevant to the actual data, whereas expected geometries are more closely related to the underlying statistical population as a whole.

A parametric statistical model with sampling space $ {\mathcal X} $ is a set of probability density functions $ p ( \cdot ; \omega ) $ on $ {\mathcal X} $ (with respect to some dominating measure) indexed by a parameter $ \omega $ in the parameter space $ \Omega $ (cf. also Probability measure; Density of a probability distribution). Given an observation $ x $ in $ {\mathcal X} $, the corresponding log-likelihood function $ l ( \cdot ; x ) $ is defined by

$$ l ( \omega ; x ) = { \mathop{\rm log} } p ( x ; \omega ) . $$
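
For example, for the normal model with $ \omega = ( \mu , \sigma ) $, $ \sigma > 0 $, and densities $ p ( x ; \mu , \sigma ) = ( 2 \pi \sigma ^ {2} ) ^ {- 1/2 } { \mathop{\rm exp} } \{ - ( x - \mu ) ^ {2} / ( 2 \sigma ^ {2} ) \} $, the log-likelihood function is

$$ l ( \mu , \sigma ; x ) = - { \mathop{\rm log} } \sigma - \frac{( x - \mu ) ^ {2} }{2 \sigma ^ {2} } - \frac{1}{2} { \mathop{\rm log} } ( 2 \pi ) . $$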

In most cases of interest, $ \Omega $ is a differentiable manifold and $ l ( \cdot; x ) $ is smooth. The expected (or Fisher) information $ i $ is the Riemannian metric given in terms of some local coordinate system $ \omega = ( \omega ^ {1} \dots \omega ^ {d} ) $ on $ \Omega $ by

$$ i _ {rs } = {\mathsf E} \left [ { \frac{\partial l }{\partial \omega ^ {r} } } { \frac{\partial l }{\partial \omega ^ {s} } } \right ] , $$
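
For the normal model above, with coordinates $ ( \omega ^ {1} , \omega ^ {2} ) = ( \mu , \sigma ) $, the scores are $ {\partial l } / {\partial \mu } = ( x - \mu ) / \sigma ^ {2} $ and $ {\partial l } / {\partial \sigma } = - 1 / \sigma + ( x - \mu ) ^ {2} / \sigma ^ {3} $, and a direct computation of the expectations gives

$$ [ i _ {rs } ] = \left ( \begin{array}{cc} 1 / \sigma ^ {2} & 0 \\ 0 & 2 / \sigma ^ {2} \end{array} \right ) . $$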

where $ {\mathsf E} [ \cdot ] $ denotes mathematical expectation. For any real $ \alpha $, the expected $ \alpha $-connection $ {\nabla ^ \alpha } $ ([a1], [a9]) is the connection on $ \Omega $ with Christoffel symbols (cf. Christoffel symbol)

$$ {\Gamma {} ^ \alpha } ^ {r} _ {st } = {\Gamma {} ^ { 0 } } ^ {r} _ {st } + { \frac \alpha {2} } i ^ {ru } T _ {ust } , $$

where $ {\Gamma {} ^ { 0 } } ^ {r} _ {st } $ are the Christoffel symbols of the Levi-Civita connection of the expected information, $ [ i ^ {ru } ] $ denotes the inverse matrix of $ i $, and the expected skewness tensor $ T _ {rst } $ is defined by

$$ T _ {rst } = {\mathsf E} \left [ { \frac{\partial l }{\partial \omega ^ {r} } } { \frac{\partial l }{\partial \omega ^ {s} } } { \frac{\partial l }{\partial \omega ^ {t} } } \right ] . $$
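
These expectations are easily approximated numerically. The following minimal Python sketch (an illustrative computation, with the normal model and sample size chosen arbitrarily) estimates $ i _ {rs } $ and $ T _ {rst } $ by Monte Carlo averaging of products of scores; the estimate of $ [ i _ {rs } ] $ approaches the matrix $ { \mathop{\rm diag} } ( 1 / \sigma ^ {2} , 2 / \sigma ^ {2} ) $ computed above.

# Monte Carlo estimates of the expected information i_{rs} = E[(dl/dw^r)(dl/dw^s)]
# and the expected skewness tensor T_{rst} = E[(dl/dw^r)(dl/dw^s)(dl/dw^t)]
# for the normal model N(mu, sigma^2) with coordinates omega = (mu, sigma).
import numpy as np

def scores(x, mu, sigma):
    """Partial derivatives of l(mu, sigma; x) with respect to mu and sigma."""
    d_mu = (x - mu) / sigma**2
    d_sigma = -1.0 / sigma + (x - mu)**2 / sigma**3
    return np.stack([d_mu, d_sigma], axis=-1)          # shape (n, 2)

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0
x = rng.normal(mu, sigma, size=500_000)                # samples from p(.; omega)
s = scores(x, mu, sigma)

i_hat = np.einsum('nr,ns->rs', s, s) / len(x)          # approx [[1, 0], [0, 2]] at (mu, sigma) = (0, 1)
T_hat = np.einsum('nr,ns,nt->rst', s, s, s) / len(x)   # nonzero entries approx T[0,0,1] = 2 (and permutations), T[1,1,1] = 8
print(np.round(i_hat, 2))
print(np.round(T_hat, 2))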

The most important of the expected $ \alpha $-connections are the $ 1 $-connection (or exponential connection) and the $ ( - 1 ) $-connection (or mixture connection). The connections $ {\nabla ^ \alpha } $ and $ {\nabla ^ { {- } \alpha } } $ are dual with respect to the metric $ i $, i.e.

$$ \frac{\partial i _ {rs } }{\partial \omega ^ {t} } = {\Gamma {} ^ \alpha } ^ {u} _ {tr } i _ {us } + {\Gamma {} ^ { {- } \alpha } } ^ {u} _ {ts } i _ {ur } . $$

For the definition of observed geometries [a3], an auxiliary statistic $ a $ is required, such that the function $ x \mapsto ( {\widehat \omega } , a ) $ is bijective, where $ {\widehat \omega } $ denotes the maximum-likelihood estimate of $ \omega $ (see Maximum-likelihood method). Given the value of $ a $, the corresponding observed geometry is based on the quantities

$$ {/ \; l} _ {r _ {1} \dots r _ {p} ; s _ {1} \dots s _ {q} } = \left . { \frac{\partial ^ {p + q } l ( \omega; {\widehat \omega } ,a ) }{\partial \omega ^ {r _ {1} } \dots \partial \omega ^ {r _ {p} } \partial { {\widehat \omega } } ^ {s _ {1} } \dots \partial { {\widehat \omega } } ^ {s _ {q} } } } \right | _ {\omega = {\widehat \omega } } , $$
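
In this notation, derivatives with respect to the components of $ \omega $ appear before the semi-colon and derivatives with respect to the components of $ {\widehat \omega } $ after it. For instance, since $ {\widehat \omega } $ maximizes the log-likelihood, in the regular case the first pure $ \omega $-derivative vanishes:

$$ {/ \; l} _ {r} = \left . \frac{\partial l ( \omega ; {\widehat \omega } , a ) }{\partial \omega ^ {r} } \right | _ {\omega = {\widehat \omega } } = 0 . $$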

where $ l $ is regarded as depending on the data $ x $ through $ ( {\widehat \omega } , a ) $. In particular, the observed information is the Riemannian metric $ {/ \; j} $ given by

$$ {/ \; j} _ {rs } = {/ \; l} _ {r;s } . $$

The observed $ \alpha $-connection $ { {{/ \; \nabla} } ^ \alpha } $ has Christoffel symbols

$$ { {{/ \; \Gamma} } {} ^ \alpha } ^ {r} _ {st } = { {{/ \; \Gamma} } {} ^ { 0 } } ^ {r} _ {st } + { \frac \alpha {2} } {/ \; j} ^ {ru } {/ \; T} _ {ust } , $$

where $ { {{/ \; \Gamma} } {} ^ { 0 } } ^ {r} _ {st } $ are the Christoffel symbols of the Levi-Civita connection of the observed information, $ [ {/ \; j} ^ {ru } ] $ denotes the inverse matrix of $ {/ \; j} $, and the observed skewness tensor $ {/ \; T} _ {rst } $ is defined by

$$ {/ \; T} _ {rst } = {/ \; l} _ {r;st } - {/ \; l} _ {st;r } . $$

The observed connections $ { {{/ \; \nabla} } ^ \alpha } $ and $ { {{/ \; \nabla} } ^ { {- } \alpha } } $ are dual with respect to the metric $ {/ \; j} $.

The expected and observed geometries can be placed in the common setting of geometries obtained from yokes (see [a4] and Yoke). Any yoke gives rise to families of tensors [a8]. In the statistical context, these tensors have various applications, notably in:

1) concise expressions [a8] for Bartlett correction factors, which enable adjustment of the likelihood ratio test statistic to bring its distribution close to the large-sample asymptotic distribution;

2) expansions ([a3], [a5]) for the probability density function of $ {\widehat \omega } $. Yokes also give rise to symplectic structures (see Symplectic structure; Yoke).
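
A prominent example of the expansions in 2) ([a3], [a5]) approximates the conditional density of $ {\widehat \omega } $ given $ a $ in terms of the log-likelihood and the observed information:

$$ p ( {\widehat \omega } \mid a ; \omega ) \approx c | {/ \; j} | ^ {1/2 } { \mathop{\rm exp} } \{ l ( \omega ; {\widehat \omega } , a ) - l ( {\widehat \omega } ; {\widehat \omega } , a ) \} , $$

where $ c = c ( \omega , a ) $ is a norming constant and $ | {/ \; j} | $ denotes the determinant of the observed information.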

An offshoot of research into differential-geometric aspects of statistical inference has been the exploration of invariant Taylor expansions (see Yoke) and of generalizations of tensors with transformation laws based on those of higher-order derivatives [a7].

Although differential geometry is of importance for parametric statistical models generally, it has been particularly useful in considering the following two major classes of models.

Exponential models, which have probability density functions of the form

$$ \tag{a1 } p ( x ; \omega ) = b ( x ) { \mathop{\rm exp} } \{ \left \langle {\omega, t ( x ) } \right \rangle - \kappa ( \omega ) \} , $$

where $ \Omega $ is an open subset of $ \mathbf R ^ {*d } $, and $ b : {\mathcal X} \rightarrow \mathbf R $, $ t : {\mathcal X} \rightarrow {\mathbf R ^ {d} } $ and $ \kappa : \Omega \rightarrow \mathbf R $ are suitable functions.
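
For example, the Poisson model with mean $ \lambda > 0 $ is of the form (a1) with $ d = 1 $: writing $ \omega = { \mathop{\rm log} } \lambda $,

$$ p ( x ; \omega ) = \frac{1}{x! } { \mathop{\rm exp} } \{ \omega x - e ^ \omega \} , \quad x = 0 , 1 , 2 , \dots , $$

so that $ t ( x ) = x $, $ b ( x ) = 1 / x! $ and $ \kappa ( \omega ) = e ^ \omega $.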

Transformation models, which are preserved under the action of a group on $ {\mathcal X} $.
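
For example, the location model on $ {\mathcal X} = \mathbf R $ with densities $ p ( x ; \omega ) = f ( x - \omega ) $ is a transformation model under the translation group, since for every $ c \in \mathbf R $

$$ p ( x + c ; \omega + c ) = p ( x ; \omega ) . $$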

For exponential models the expected and observed geometries coincide and are determined by the cumulant function $ \kappa $. Curved exponential models have the form (a1) but with $ \Omega $ a submanifold of $ \mathbf R ^ {*d } $. Various applications of differential geometry to curved exponential models are given in [a1].
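
Explicitly, for the model (a1) the score is $ {\partial l } / {\partial \omega ^ {r} } = t _ {r} ( x ) - {\partial \kappa } / {\partial \omega ^ {r} } $, so that

$$ i _ {rs } = \frac{\partial ^ {2} \kappa }{\partial \omega ^ {r} \partial \omega ^ {s} } , \quad T _ {rst } = \frac{\partial ^ {3} \kappa }{\partial \omega ^ {r} \partial \omega ^ {s} \partial \omega ^ {t} } , $$

while $ {/ \; j} _ {rs } $ and $ {/ \; T} _ {rst } $ are the same derivatives evaluated at $ \omega = {\widehat \omega } $. In the Poisson example above, $ \kappa ( \omega ) = e ^ \omega $, so both the information and the skewness tensor equal $ e ^ \omega = \lambda $.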

In many applications the parameter space $ \Omega $ is finite-dimensional, but the fairly recent and important area of semi-parametric modelling has led to the consideration [a2] of cases in which $ \Omega $ is the product of a finite-dimensional manifold and a function space.

Apart from giving rise to various developments of a purely mathematical nature, concepts and results from the differential-geometric approach to statistics are diffusing into control theory, information theory, neural networks and quantum probability. Of particular interest is the connection [a10] with quantum analogues of exponential models.

References

[a1] S.-I. Amari, "Differential-geometrical methods in statistics", Lecture Notes in Statistics, 28, Springer (1985)
[a2] S.-I. Amari, M. Kawanabe, "Information geometry of estimating functions in semi-parametric models", Bernoulli (1995)
[a3] O.E. Barndorff-Nielsen, "Likelihood and observed geometries", Ann. Stat., 14 (1986) pp. 856–873
[a4] O.E. Barndorff-Nielsen, "Differential geometry and statistics: some mathematical aspects", Indian J. Math., 29 (1987) pp. 335–350
[a5] O.E. Barndorff-Nielsen, "Parametric statistical models and likelihood", Lecture Notes in Statistics, 50, Springer (1988)
[a6] O.E. Barndorff-Nielsen, D.R. Cox, N. Reid, "The role of differential geometry in statistical theory", Int. Statist. Rev., 54 (1986) pp. 83–96
[a7] O.E. Barndorff-Nielsen, P.E. Jupp, W.S. Kendall, "Stochastic calculus, statistical asymptotics, Taylor strings and phyla", Ann. Fac. Sci. Toulouse, Sér. G, III (1994) pp. 5–62
[a8] P. Blæsild, "Yokes and tensors derived from yokes", Ann. Inst. Stat. Math., 43 (1991) pp. 95–113
[a9] N.N. Chentsov, "Statistical decision rules and optimal inference", Trans. Math. Monographs, 53, Amer. Math. Soc. (1982)
[a10] H. Nagaoka, "Differential geometrical aspects of quantum state estimation and relative entropy", Techn. Report, Dept. Math. Eng. Inf. Physics, Univ. Tokyo (1994)
How to Cite This Entry:
Differential geometry in statistical inference. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Differential_geometry_in_statistical_inference&oldid=14333
This article was adapted from an original article by P.E. Jupp and O.E. Barndorff-Nielsen (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.