Differential geometry in statistical inference
Many of the key concepts and results of statistical inference (cf. also Statistics) can be expressed efficiently in terms of differential geometry. Such re-expressions have been helpful both in illuminating classical statistical procedures and in developing new methodology. The role which differential geometry can play in statistical theory has been realized effectively only since the late 1970s. The historical development can be seen from [a1] and [a6].
Any (sufficiently regular) parametric statistical model determines two types of geometries on the parameter space: i) expected geometries; and ii) observed geometries. Both types are based on derivatives of the likelihood function. Construction of the observed geometries requires an appropriate auxiliary statistic. Each of these geometries consists of a Riemannian metric and a one-parameter family of affine connections (cf. Affine connection) on the parameter space, together with various higher-order geometrical objects. Observed geometries are more directly relevant to the actual data, whereas expected geometries are more closely related to the underlying statistical population as a whole.
A parametric statistical model with sampling space $ {\mathcal X} $ is a set of probability density functions $ p ( \cdot ; \omega ) $ on $ {\mathcal X} $ (with respect to some dominating measure) indexed by a parameter $ \omega $ in the parameter space $ \Omega $ (cf. also Probability measure; Density of a probability distribution). Given an observation $ x $ in $ {\mathcal X} $, the corresponding log-likelihood function $ l ( \cdot; x ) $ is defined by
$$ l ( \omega ; x ) = { \mathop{\rm log} } p ( x ; \omega ) . $$
In most cases of interest, $ \Omega $ is a differentiable manifold and $ l ( \cdot; x ) $ is smooth. The expected (or Fisher) information $ i $ is the Riemannian metric given in terms of some local coordinate system $ \omega = ( \omega ^ {1} \dots \omega ^ {d} ) $ on $ \Omega $ by
$$ i _ {rs } = {\mathsf E} \left [ { \frac{\partial l }{\partial \omega ^ {r} } } { \frac{\partial l }{\partial \omega ^ {s} } } \right ] , $$
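For example, for the univariate normal model with mean $ \mu $ and standard deviation $ \sigma $, parametrized by $ \omega = ( \mu , \sigma ) $, $ \sigma > 0 $, a standard computation gives

$$ [ i _ {rs } ] = \left ( \begin{array}{cc} \sigma ^ {-2 } & 0 \\ 0 & 2 \sigma ^ {-2 } \end{array} \right ) . $$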
where $ {\mathsf E} [ \cdot ] $ denotes mathematical expectation. For any real $ \alpha $, the expected $ \alpha $- connection $ {\nabla ^ \alpha } $ ([a1], [a9]) is the connection on $ \Omega $ with Christoffel symbols (cf. Christoffel symbol)
$$ {\Gamma {} ^ \alpha } ^ {r} _ {st } = {\Gamma {} ^ { 0 } } ^ {r} _ {st } - { \frac \alpha {2} } i ^ {ru } T _ {ust } , $$
where $ {\Gamma {} ^ { 0 } } ^ {r} _ {st } $ are the Christoffel symbols of the Levi-Civita connection of the expected information, $ [ i ^ {ru } ] $ denotes the inverse matrix of $ i $, and the expected skewness tensor $ T _ {rst } $ is defined by
$$ T _ {rst } = {\mathsf E} \left [ { \frac{\partial l }{\partial \omega ^ {r} } } { \frac{\partial l }{\partial \omega ^ {s} } } { \frac{\partial l }{\partial \omega ^ {t} } } \right ] . $$
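Continuing the normal example above, a direct computation shows that the only non-zero components of the skewness tensor are

$$ T _ {\mu \mu \sigma } = { \frac{2}{\sigma ^ {3} } } , \qquad T _ {\sigma \sigma \sigma } = { \frac{8}{\sigma ^ {3} } } , $$

together with the components obtained from $ T _ {\mu \mu \sigma } $ by permuting indices.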
The most important of the expected $ \alpha $- connections are the $ 1 $- connection (or exponential connection) and the $ ( - 1 ) $- connection (or mixture connection). The connections $ {\nabla ^ \alpha } $ and $ {\nabla ^ { {- } \alpha } } $ are dual with respect to the metric $ i $, i.e.
$$ { \frac{\partial i _ {st } }{\partial \omega ^ {r} } } = {\Gamma {} ^ \alpha } ^ {u} _ {rs } i _ {ut } + {\Gamma {} ^ { {- } \alpha } } ^ {u} _ {rt } i _ {us } . $$
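For instance, putting $ \alpha = 0 $ in this relation gives

$$ { \frac{\partial i _ {st } }{\partial \omega ^ {r} } } = {\Gamma {} ^ { 0 } } ^ {u} _ {rs } i _ {ut } + {\Gamma {} ^ { 0 } } ^ {u} _ {rt } i _ {us } , $$

which is the usual condition that the metric $ i $ be parallel with respect to $ {\nabla ^ {0} } $; thus the $ 0 $- connection (the Levi-Civita connection of $ i $) is self-dual.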
For the definition of observed geometries [a3], an auxiliary statistic $ a $ is required such that the function $ x \mapsto ( {\widehat \omega } , a ) $ is bijective, where $ {\widehat \omega } $ denotes the maximum-likelihood estimate of $ \omega $ (see Maximum-likelihood method). Given the value of $ a $, the corresponding observed geometry is based on the quantities
$$ {/ \; l} _ {r _ {1} \dots r _ {p} ; s _ {1} \dots s _ {q} } = \left . { \frac{\partial ^ {p + q } l ( \omega; {\widehat \omega } ,a ) }{\partial \omega ^ {r _ {1} } \dots \partial \omega ^ {r _ {p} } \partial { {\widehat \omega } } ^ {s _ {1} } \dots \partial { {\widehat \omega } } ^ {s _ {q} } } } \right | _ {\omega = {\widehat \omega } } , $$
where $ l $ is regarded as depending on the data $ x $ through $ ( {\widehat \omega } , a ) $. In particular, the observed information is the Riemannian metric $ {/ \; j} $ given by
$$ {/ \; j} _ {rs } = {/ \; l} _ {r;s } . $$
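This is the observed information in the familiar sense: since the likelihood equation $ {\partial l ( \omega; {\widehat \omega } ,a ) } / {\partial \omega ^ {r} } | _ {\omega = {\widehat \omega } } = 0 $ holds identically in $ ( {\widehat \omega } , a ) $, differentiation with respect to $ { {\widehat \omega } } ^ {s} $ gives

$$ {/ \; l} _ {rs } + {/ \; l} _ {r;s } = 0 , $$

where $ {/ \; l} _ {rs } $ denotes the pure second $ \omega $- derivative of $ l $ evaluated at $ \omega = {\widehat \omega } $; hence $ {/ \; j} _ {rs } = - {/ \; l} _ {rs } $, minus the Hessian of the log-likelihood function at the maximum-likelihood estimate.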
The observed $ \alpha $- connection $ { {{/ \; \nabla} } ^ \alpha } $ has Christoffel symbols
$$ { {{/ \; \Gamma} } {} ^ \alpha } ^ {r} _ {st } = { {{/ \; \Gamma} } {} ^ { 0 } } ^ {r} _ {st } - { \frac \alpha {2} } {/ \; j} ^ {ru } {/ \; T} _ {ust } , $$
where $ { {{/ \; \Gamma} } {} ^ { 0 } } ^ {r} _ {st } $ are the Christoffel symbols of the Levi-Civita connection of the observed information, $ [ {/ \; j} ^ {ru } ] $ denotes the inverse matrix of $ {/ \; j} $, and the observed skewness tensor $ {/ \; T} _ {rst } $ is defined by
$$ {/ \; T} _ {rst } = {/ \; l} _ {r;st } - {/ \; l} _ {st;r } . $$
The observed connections $ { {{/ \; \nabla} } ^ \alpha } $ and $ { {{/ \; \nabla} } ^ { {- } \alpha } } $ are dual with respect to the metric $ {/ \; j} $.
The expected and observed geometries can be placed in the common setting of geometries obtained from yokes (see [a4] and Yoke). Any yoke gives rise to families of tensors [a8]. In the statistical context, these tensors have various applications, notably in:
1) concise expressions [a8] for Bartlett correction factors, which enable the likelihood ratio test statistic to be adjusted so that its distribution is brought closer to its large-sample asymptotic (chi-squared) distribution (the form of the adjustment is sketched after this list);
2) expansions ([a3], [a5]) for the probability density function of $ {\widehat \omega } $. Yokes also give rise to symplectic structures (see Symplectic structure; Yoke).
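To indicate the form of the adjustment in 1): in outline, and under suitable regularity conditions, if $ w $ denotes the likelihood ratio test statistic for a hypothesis with $ d $ degrees of freedom based on a sample of size $ n $, and if $ {\mathsf E} [ w ] = d ( 1 + b ( \omega ) n ^ {-1 } + O ( n ^ {-2 } ) ) $, then the Bartlett-adjusted statistic

$$ w ^ {*} = { \frac{w}{1 + b ( \omega ) n ^ {-1 } } } $$

is distributed as chi-squared with $ d $ degrees of freedom to error $ O ( n ^ {-2 } ) $ rather than $ O ( n ^ {-1 } ) $; in practice $ b ( \omega ) $ is replaced by $ b ( {\widehat \omega } ) $, and the tensors referred to in 1) yield concise expressions for $ b $.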
An offshoot of research into differential-geometric aspects of statistical inference has been the exploration of invariant Taylor expansions (see Yoke) and of generalizations of tensors with transformation laws based on those of higher-order derivatives [a7].
Although differential geometry is of importance for parametric statistical models generally, it has been particularly useful in considering the following two major classes of models.
Exponential models, which have probability density functions of the form
$$ \tag{a1 } p ( x ; \omega ) = b ( x ) { \mathop{\rm exp} } \{ \left \langle {\omega, t ( x ) } \right \rangle - \kappa ( \omega ) \} , $$
where $ \Omega $ is an open subset of $ \mathbf R ^ {*d } $, and $ b : {\mathcal X} \rightarrow \mathbf R $, $ t : {\mathcal X} \rightarrow {\mathbf R ^ {d} } $ and $ \kappa : \Omega \rightarrow \mathbf R $ are suitable functions.
Transformation models, which are preserved under the action of a group on $ {\mathcal X} $.
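A simple example of a transformation model is the location-scale family

$$ p ( x; \mu , \sigma ) = { \frac{1}{\sigma } } f \left ( { \frac{x - \mu }{\sigma } } \right ) , \quad x \in \mathbf R , \ \mu \in \mathbf R , \ \sigma > 0 , $$

where $ f $ is a fixed probability density on $ \mathbf R $; this family is preserved under the action $ x \mapsto ax + b $, $ a > 0 $, of the affine group of the real line.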
For exponential models the expected and observed geometries coincide and are determined by the cumulant function $ \kappa $. Curved exponential models have the form (a1) but with $ \Omega $ a submanifold of $ \mathbf R ^ {*d } $. Various applications of differential geometry to curved exponential models are given in [a1].
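To indicate how the geometries are determined by $ \kappa $: for the full model (a1) in its canonical parametrization, writing $ t _ {r} $ for the $ r $- th component of $ t $, one has $ {\partial l } / {\partial \omega ^ {r} } = t _ {r} ( x ) - {\partial \kappa } / {\partial \omega ^ {r} } $ and $ {\partial ^ {2} l } / {\partial \omega ^ {r} \partial \omega ^ {s} } = - {\partial ^ {2} \kappa } / {\partial \omega ^ {r} \partial \omega ^ {s} } $, so that

$$ i _ {rs } = { \frac{\partial ^ {2} \kappa }{\partial \omega ^ {r} \partial \omega ^ {s} } } , \qquad T _ {rst } = { \frac{\partial ^ {3} \kappa }{\partial \omega ^ {r} \partial \omega ^ {s} \partial \omega ^ {t} } } , $$

and the Christoffel symbols of the $ 1 $- connection vanish identically, i.e. the exponential connection is flat in the canonical coordinates.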
In many applications the parameter space $ \Omega $ is finite-dimensional but the fairly recent and important area of semi-parametric modelling has led [a2] to consideration of cases in which $ \Omega $ is the product of a finite-dimensional manifold and a function space.
Apart from giving rise to various developments of a purely mathematical nature, concepts and results from the differential-geometric approach to statistics are diffusing into control theory, information theory, neural networks and quantum probability. Of particular interest is the connection [a10] with quantum analogues of exponential models.
References
[a1] S.-I. Amari, "Differential-geometrical methods in statistics", Lecture Notes in Statistics, 28, Springer (1985)
[a2] S.-I. Amari, M. Kawanabe, "Information geometry of estimating functions in semi-parametric models", Bernoulli (1995)
[a3] O.E. Barndorff-Nielsen, "Likelihood and observed geometries", Ann. Stat., 14 (1986) pp. 856–873
[a4] O.E. Barndorff-Nielsen, "Differential geometry and statistics: some mathematical aspects", Indian J. Math., 29 (1987) pp. 335–350
[a5] O.E. Barndorff-Nielsen, "Parametric statistical models and likelihood", Lecture Notes in Statistics, 50, Springer (1988)
[a6] O.E. Barndorff-Nielsen, D.R. Cox, N. Reid, "The role of differential geometry in statistical theory", Int. Statist. Rev., 54 (1986) pp. 83–96
[a7] O.E. Barndorff-Nielsen, P.E. Jupp, W.S. Kendall, "Stochastic calculus, statistical asymptotics, Taylor strings and phyla", Ann. Fac. Sci. Toulouse, Sér. 6, 3 (1994) pp. 5–62
[a8] P. Blæsild, "Yokes and tensors derived from yokes", Ann. Inst. Stat. Math., 43 (1991) pp. 95–113
[a9] N.N. Chentsov, "Statistical decision rules and optimal inference", Trans. Math. Monographs, 53, Amer. Math. Soc. (1982)
[a10] H. Nagaoka, "Differential geometrical aspects of quantum state estimation and relative entropy", Techn. Report, Dept. Math. Eng. Inf. Physics, Univ. Tokyo (1994)
Differential geometry in statistical inference. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Differential_geometry_in_statistical_inference&oldid=46689