Natural exponential family of probability distributions

Given a finite-dimensional real linear space $E$, denote by $E ^ { * }$ the space of linear forms $\theta$ from $E$ to $\mathbf{R}$. Let $\mathcal{M} ( E )$ be the set of positive Radon measures $\mu$ on $E$ with the following two properties (cf. also Radon measure):

i) $\mu$ is not concentrated on some affine hyperplane of $E$;

ii) the interior $\Theta ( \mu )$ of the convex set of those $\theta \in E ^ { * }$ for which

\begin{equation*} L _ { \mu } ( \theta ) = \int _ { E } \operatorname { exp } \langle \theta , x \rangle \mu ( d x ) \end{equation*}

is finite, is not empty. For notation, see also Exponential family of probability distributions.
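For example, take $E = \mathbf{R}$ and $\mu = \sum _ { n = 0 } ^ { \infty } \delta _ { n } / n !$; then $\mu$ satisfies i) and ii), since

\begin{equation*} L _ { \mu } ( \theta ) = \sum _ { n = 0 } ^ { \infty } \frac { e ^ { \theta n } } { n ! } = \operatorname { exp } ( e ^ { \theta } ) < \infty \end{equation*}

for every $\theta$, so that $\Theta ( \mu ) = \mathbf{R}$. This example is continued below.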

For $\mu \in \mathcal{M} ( E )$, the cumulant function $k _ { \mu } = \operatorname { log } L _ { \mu }$ is a real-analytic strictly convex function defined on $\Theta ( \mu )$. Thus, its differential

\begin{equation*} \theta \mapsto k _ { \mu } ^ { \prime } ( \theta ) , \quad \Theta ( \mu ) \rightarrow E, \end{equation*}

is injective. Denote by $M _ { \mu } \subset E$ its image, and by $\psi _ { \mu }$ the inverse mapping of $k _ { \mu } ^ { \prime }$ from $M _ { \mu }$ onto $\Theta ( \mu )$. The natural exponential family of probability distributions (abbreviated NEF) generated by $\mu$ is the set $F = F ( \mu )$ of probabilities

\begin{equation*} \mathsf{P} ( \theta , \mu ) = \operatorname { exp } [ \langle \theta , x \rangle - k _ { \mu } ( \theta ) ] \mu ( d x ), \end{equation*}

when $\theta$ varies in $\Theta ( \mu )$. Note that for $\mu ^ { \prime } \in \mathcal{M} ( E )$ the two sets $F ( \mu )$ and $F ( \mu ^ { \prime } )$ coincide if and only if there exist an $\alpha \in E ^ { * }$ and a $b \in \mathbf{R}$ such that $\mu ^ { \prime } ( d x ) = \operatorname { exp } ( \langle \alpha , x \rangle + b ) \mu ( d x )$. The mean of $\mathsf{P} ( \theta , \mu )$ is given by

\begin{equation*} m = k _ { \mu } ^ { \prime } ( \theta ) = \int _ { E } x \mathsf{P} ( \theta , \mu ) ( d x ), \end{equation*}

and for this reason $M _ { \mu } = M _ { F }$ is called the domain of the means of $F$. It is easily seen that it depends on $F$ and not on a particular $\mu$ generating $F$. Also,

\begin{equation*} m \mapsto \mathsf{P} ( \psi _ { \mu } ( m ) , \mu ) = \mathsf{P} ( m , F ) , \quad M _ { F } \rightarrow F, \end{equation*}

is the parametrization of the natural exponential family by the mean. The domain of the means is contained in the interior $C _ { F }$ of the convex hull of the support of $F$. When $C _ { F } = M _ { F }$, the natural exponential family is said to be steep. A sufficient condition for steepness is that $D ( \mu ) = \Theta ( \mu )$, where $D ( \mu ) = \{ \theta \in E ^ { * } : L _ { \mu } ( \theta ) < \infty \}$. The natural exponential family generated by a stable distribution in $\mathcal{M} ( \mathbf{R} )$ with parameter $\alpha \in [ 1,2 )$ provides an example of a non-steep natural exponential family. A more elementary example is given by $\mu = \sum _ { n = 1 } ^ { \infty } n ^ { - 3 } \delta _ { n }$.
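Indeed, for this measure $D ( \mu ) = ( - \infty , 0 ]$ and $\Theta ( \mu ) = ( - \infty , 0 )$, while

\begin{equation*} k _ { \mu } ^ { \prime } ( \theta ) = \frac { \sum _ { n = 1 } ^ { \infty } n ^ { - 2 } e ^ { \theta n } } { \sum _ { n = 1 } ^ { \infty } n ^ { - 3 } e ^ { \theta n } } \rightarrow \frac { \zeta ( 2 ) } { \zeta ( 3 ) } \quad \text { as } \theta \uparrow 0, \end{equation*}

so that $M _ { F } = ( 1 , \zeta ( 2 ) / \zeta ( 3 ) )$ is strictly contained in $C _ { F } = ( 1 , \infty )$.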

For one observation $X$, the maximum-likelihood estimator (cf. also Maximum-likelihood method) of $m$ is simply $\widehat { m } = X$: it has to be in $M _ { F }$ to be defined, and in this case the maximum-likelihood estimator of the canonical parameter $\theta$ is $\hat { \theta } = \psi _ { \mu } ( X )$. In the case of $n$ observations, $X$ has to be replaced by $\overline{X} _ { n } = ( X _ { 1 } + \ldots + X _ { n } ) / n$. Since $M _ { F }$ is an open set, the strong law of large numbers implies that almost surely there exists an $N$ such that $\overline{X} _ { n } \in M _ { F }$ for all $n \geq N$; thus $\hat { \theta } _ { n } = \psi _ { \mu } ( \overline{X} _ { n } )$ is well-defined after enough observations.
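A minimal numerical sketch of this procedure, for the Poisson family generated by $\mu = \sum _ { n = 0 } ^ { \infty } \delta _ { n } / n !$ above, for which $\mathsf{P} ( m , F ) ( \{ n \} ) = e ^ { - m } m ^ { n } / n !$, $M _ { F } = ( 0 , \infty )$ and $\psi _ { \mu } ( m ) = \operatorname { log } m$ (the sampler and the function names below are only illustrative):

    import math
    import random

    def sample_poisson(lam):
        # Knuth's method: count the uniform factors needed to push the
        # running product below exp(-lam).
        threshold = math.exp(-lam)
        k, p = 0, 1.0
        while True:
            p *= random.random()
            if p <= threshold:
                return k
            k += 1

    def mle_theta(observations):
        # theta-hat = psi_mu(X-bar) = log(X-bar); it is defined only once
        # the sample mean lies in the domain of the means M_F = (0, infinity).
        m_hat = sum(observations) / len(observations)
        if m_hat <= 0:
            return None  # X-bar not yet in M_F: the MLE is undefined
        return math.log(m_hat)

    random.seed(0)
    theta0 = 1.0
    data = [sample_poisson(math.exp(theta0)) for _ in range(5000)]
    print(mle_theta(data))  # consistent: close to theta0 = 1.0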

Exponential families also have a striking property in information theory: they minimize the entropy (more precisely, the Kullback–Leibler divergence with respect to $\mu$; cf. also Entropy) in the following sense. Let $F = F ( \mu )$ be a natural exponential family on $E$ and fix $m \in M _ { F }$. Let $C$ be the convex set of probabilities $\mathsf{P}$ on $E$ which are absolutely continuous with respect to $\mu$ and such that $\int _ { E } x \, d \mathsf{P} ( x ) = m$. Then the minimum of $\int _ { E } \operatorname { log } ( d \mathsf{P} / d \mu ) d \mathsf{P}$ on $C$ is attained at the unique point $\mathsf{P} ( m , F )$. The extension to general exponential families is straightforward. See, e.g., [a5], 3(A).
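The minimal value is explicit: since $d \mathsf{P} ( m , F ) / d \mu = \operatorname { exp } [ \langle \psi _ { \mu } ( m ) , x \rangle - k _ { \mu } ( \psi _ { \mu } ( m ) ) ]$,

\begin{equation*} \operatorname { min } _ { \mathsf{P} \in C } \int _ { E } \operatorname { log } \frac { d \mathsf{P} } { d \mu } d \mathsf{P} = \langle \psi _ { \mu } ( m ) , m \rangle - k _ { \mu } ( \psi _ { \mu } ( m ) ), \end{equation*}

i.e. the Legendre transform of $k _ { \mu }$ evaluated at $m$.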

Denote by $V _ { F } ( m )$ the covariance operator of $\mathsf{P} ( m , F )$. The space of symmetric linear operators from $E ^ { * }$ to $E$ is denoted by $L _ { s } ( E ^ { * } , E )$, and the mapping from $M _ { F }$ to $L _ { s } ( E ^ { * } , E )$ defined by $m \mapsto V _ { F } ( m )$ is called the variance function of the natural exponential family $F$.

Because it satisfies the relation $k _ { \mu } ^ { \prime \prime } ( \theta ) = V _ { F } ( k _ { \mu } ^ { \prime } ( \theta ) )$, the variance function $V _ { F }$ determines the natural exponential family $F$. For each $m \in M _ { F }$, $V _ { F } ( m )$ is a positive-definite operator. The variance function also satisfies the following symmetry condition: for all $\alpha$ and $\beta$ in $E ^ { * }$ one has

\begin{equation*} V _ { F } ^ { \prime } ( m ) ( V _ { F } ( m ) ( \alpha ) ) ( \beta ) = V _ { F } ^ { \prime } ( m ) ( V _ { F } ( m ) ( \beta ) ) ( \alpha ). \end{equation*}
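For the Poisson example above, $k _ { \mu } ( \theta ) = e ^ { \theta }$, so $k _ { \mu } ^ { \prime \prime } ( \theta ) = e ^ { \theta } = k _ { \mu } ^ { \prime } ( \theta )$ and

\begin{equation*} V _ { F } ( m ) = m , \quad m \in M _ { F } = ( 0 , \infty ) \end{equation*}

(in dimension one the symmetry condition above holds automatically, all quantities being scalars).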

For dimension one, the variance function provides an explicit formula for the large deviations theorem: If $m _ { 0 } < m$ are in $M _ { F }$, and if $X _ { 1 } , \dots , X _ { n } , \dots$ are independent real random variables with the same distribution $\mathsf{P} ( m _ { 0 } , F )$, then

\begin{equation*} \operatorname { lim } _ { n \rightarrow \infty } \frac { 1 } { n } \operatorname { log } \mathsf {P} [ X _ { 1 } + \ldots + X _ { n } \geq n m ] = \int _ { m _ { 0 } } ^ { m } \frac { x - m } { V _ { F } ( x ) } d x. \end{equation*}
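For instance, for the Poisson family, where $V _ { F } ( x ) = x$, the integral equals

\begin{equation*} \int _ { m _ { 0 } } ^ { m } \frac { x - m } { x } d x = ( m - m _ { 0 } ) - m \operatorname { log } \frac { m } { m _ { 0 } }, \end{equation*}

i.e. minus the classical Cramér rate for sums of independent Poisson variables.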

More generally, the right-hand side can easily be computed for natural exponential families on $\mathbf{R}$ whose variance functions are simple. A rough guiding principle turns out to hold: the simpler $V _ { F }$ is, the more useful $F$ is. C. Morris [a9] has observed that $V _ { F }$ is the restriction to $M _ { F }$ of a polynomial of degree $\leq 2$ if and only if $F$ is either normal, Poisson, binomial, negative binomial, gamma, or hyperbolic (i.e., with characteristic function $( \operatorname { cosh } t ) ^ { - 1 }$), at least up to an affinity and a convolution power (the six variance functions are listed below). Similarly, in [a8], the classification into six types of the variance functions which are third-degree polynomials is carried out: the corresponding distributions are also classical, but occur in the literature as distributions of stopping times of Lévy processes or random walks in $\bf Z$ (cf. also Random walk; Stopping time). Other classes, like $V _ { F } ( m ) = A m ^ { a }$ or $V _ { F } = P R + Q \sqrt { R }$, where $P$, $Q$, $R$ are polynomials of low degree, have also been classified (see [a1] and [a7]).
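With convenient normalizations (each family being determined only up to affinity and convolution power), the six quadratic variance functions of the Morris class [a9] are:

\begin{equation*} \begin{array} { l l } \text { normal: } & V _ { F } ( m ) = \sigma ^ { 2 } , \\ \text { Poisson: } & V _ { F } ( m ) = m , \\ \text { gamma: } & V _ { F } ( m ) = m ^ { 2 } / p , \\ \text { binomial: } & V _ { F } ( m ) = m ( 1 - m / N ) , \\ \text { negative binomial: } & V _ { F } ( m ) = m ( 1 + m / N ) , \\ \text { hyperbolic: } & V _ { F } ( m ) = 1 + m ^ { 2 } . \end{array} \end{equation*}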

In higher dimensions the same principle holds. For instance, M. Casalis [a3] has shown that $V _ { F }$ is homogeneous of degree $2$ if and only if $F$ is a family of Wishart distributions on a Euclidean Jordan algebra. She [a4] has also found the $2 d + 4$ types of natural exponential families on $\mathbf{R} ^ { d }$ whose variance function is $am \otimes m + m _ { 1 } B _ { 1 } + \ldots + m _ { d } B _ { d } + C$, where $B _ { j }$ and $C$ are real $( d , d )$-matrices and $a \in \bf R$, thus providing a generalization of the above-mentioned result by Morris. Another extension is obtained in [a2], where all non-trivial natural exponential families in $\mathbf{R} ^ { d }$ whose marginal distributions are still natural exponential families are found; surprisingly, these marginal distributions are necessarily of Morris type.
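For example, in matrix notation, the Wishart family on the cone of positive-definite symmetric matrices generated by $L _ { \mu } ( \theta ) = \operatorname { det } ( - \theta ) ^ { - p }$ has

\begin{equation*} V _ { F } ( m ) ( \theta ) = \frac { 1 } { p } m \theta m, \end{equation*}

which is indeed homogeneous of degree $2$ in $m$.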

Finally, the cubic class is generalized in a deep way to $\mathbf{R} ^ { d }$ in [a6].

References

[a1] S. Bar-Lev, P. Enis, "Reproducibility and natural exponential families with power variance functions" Ann. Statist. , 14 (1987) pp. 1507–1522
[a2] S. Bar-Lev, D. Bshouty, P. Enis, G. Letac, I-Li Lu, D. Richards, "The diagonal multivariate natural exponential families and their classification" J. Theor. Probab. , 7 (1994) pp. 883–929
[a3] M. Casalis, "Les familles exponentielles à variance quadratique homogène sont des lois de Wishart sur un cône symétrique" C.R. Acad. Sci. Paris Ser. I , 312 (1991) pp. 537–540
[a4] M. Casalis, "The $2 d + 4$ simple quadratic natural exponential families on $\mathbf{R} ^ { d }$" Ann. Statist. , 24 (1996) pp. 1828–1854
[a5] I. Csiszár, "I-divergence geometry of probability distributions and minimization problems" Ann. of Probab. , 3 (1975) pp. 146–158
[a6] A. Hassaïri, "La classification des familles exponentielles naturelles sur $\mathbf{R} ^ { n }$ par l'action du groupe linéaire de $\mathbf{R} ^ { n + 1 }$" C.R. Acad. Sci. Paris Ser. I , 315 (1992) pp. 207–210
[a7] C. Kokonendji, "Sur les familles exponentielles naturelles de grand-Babel" Ann. Fac. Sci. Toulouse , 4 (1995) pp. 763–800
[a8] G. Letac, M. Mora, "Natural exponential families with cubic variance functions" Ann. Statist. , 18 (1990) pp. 1–37
[a9] C.N. Morris, "Natural exponential families with quadratic variance functions" Ann. Statist. , 10 (1982) pp. 65–80
This article was adapted from an original article by Gérard Letac (originator), which appeared in Encyclopedia of Mathematics, ISBN 1402006098.