# Natural exponential family of probability distributions

Given a finite-dimensional real linear space $E$, denote by $E ^ { * }$ the space of linear forms $\theta$ from $E$ to $\mathbf{R}$. Let $\mathcal{M} ( E )$ be the set of positive Radon measures $\mu$ on $E$ with the following two properties (cf. also Radon measure):

i) $\mu$ is not concentrated on some affine hyperplane of $E$;

ii) the set $\Theta ( \mu )$ is not empty, where $\Theta ( \mu )$ denotes the interior of the convex set of those $\theta \in E ^ { * }$ for which

\begin{equation*} L _ { \mu } ( \theta ) = \int _ { E } \operatorname { exp } \langle \theta , x \rangle \mu ( d x ) \end{equation*}

is finite. For notation, see also Exponential family of probability distributions.

For $\mu \in \mathcal{M} ( E )$, the cumulant function $k _ { \mu } = \operatorname { log } L _ { \mu }$ is a real-analytic strictly convex function defined on $\Theta ( \mu )$. Thus, its differential

\begin{equation*} \theta \mapsto k _ { \mu } ^ { \prime } ( \theta ) , \quad \Theta ( \mu ) \rightarrow E, \end{equation*}

is injective. Denote by $M _ { \mu } \subset E$ its image, and by $\psi _ { \mu }$ the inverse mapping of $k _ { \mu } ^ { \prime }$ from $M _ { \mu }$ onto $\Theta ( \mu )$. The natural exponential family of probability distributions (abbreviated NEF) generated by $\mu$ is the set $F = F ( \mu )$ of probabilities

\begin{equation*} \mathsf{P} ( \theta , \mu ) = \operatorname { exp } [ \langle \theta , x \rangle - k _ { \mu } ( \theta ) ] \mu ( d x ), \end{equation*}

as $\theta$ varies in $\Theta ( \mu )$. Note that, for $\mu ^ { \prime } \in \mathcal{M} ( E )$, the two sets $F ( \mu )$ and $F ( \mu ^ { \prime } )$ coincide if and only if there exist an $\alpha \in E ^ { * }$ and a $b \in \mathbf{R}$ such that $\mu ^ { \prime } ( d x ) = \operatorname { exp } ( \langle \alpha , x \rangle + b ) \mu ( d x )$. The mean of $\mathsf{P} ( \theta , \mu )$ is given by

\begin{equation*} m = k _ { \mu } ^ { \prime } ( \theta ) = \int _ { E } x \mathsf{P} ( \theta , \mu ) ( d x ), \end{equation*}

and for this reason $M _ { \mu } = M _ { F }$ is called the domain of the means of $F$. It is easily seen that it depends on $F$ and not on a particular $\mu$ generating $F$. Also,

\begin{equation*} m \mapsto \mathsf{P} ( \psi _ { \mu } ( m ) , \mu ) = \mathsf{P} ( m , F ) , \quad M _ { F } \rightarrow F, \end{equation*}

is the parametrization of the natural exponential family by the mean. The domain of the means is contained in the interior $C _ { F }$ of the convex hull of the support of $F$. When $C _ { F } = M _ { F }$, the natural exponential family is said to be steep. A sufficient condition for steepness is that $D ( \mu ) = \Theta ( \mu )$, where $D ( \mu )$ denotes the set of those $\theta \in E ^ { * }$ for which $L _ { \mu } ( \theta )$ is finite; in other words, that this set is open. The natural exponential family generated by a stable distribution in $\mathcal{M} ( \mathbf{R} )$ with parameter $\alpha \in [ 1,2 )$ provides an example of a non-steep natural exponential family. A more elementary example is given by $\mu = \sum _ { n = 1 } ^ { \infty } n ^ { - 3 } \delta _ { n }$: here $\Theta ( \mu ) = ( - \infty , 0 )$ and $M _ { F } = ( 1 , \zeta ( 2 ) / \zeta ( 3 ) )$, whereas $C _ { F } = ( 1 , \infty )$.
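The objects defined so far can be made concrete on a classical case. The following is a minimal sketch, assuming the generating measure $\mu = \sum _ { n \geq 0 } \delta _ { n } / n !$ (which generates the Poisson family) truncated at a hypothetical cut-off `n_max`; the function names are illustrative only. For this $\mu$ one has $k _ { \mu } ( \theta ) = e ^ { \theta }$, so the mean is $e ^ { \theta }$ and $\psi _ { \mu } ( m ) = \operatorname { log } m$.

```python
import math

def poisson_base_measure(n_max=60):
    """Weights mu({n}) = 1/n! for n = 0..n_max (truncated generating measure; assumption)."""
    return [1.0 / math.factorial(n) for n in range(n_max + 1)]

def laplace_transform(theta, weights):
    """L_mu(theta) = sum over n of exp(theta * n) * mu({n})."""
    return sum(w * math.exp(theta * n) for n, w in enumerate(weights))

def cumulant(theta, weights):
    """k_mu(theta) = log L_mu(theta)."""
    return math.log(laplace_transform(theta, weights))

def nef_member(theta, weights):
    """Probabilities of P(theta, mu) on {0, ..., n_max}."""
    k = cumulant(theta, weights)
    return [w * math.exp(theta * n - k) for n, w in enumerate(weights)]

def mean_and_variance(probs):
    """Mean and variance of a distribution carried by {0, 1, 2, ...}."""
    m = sum(n * p for n, p in enumerate(probs))
    v = sum((n - m) ** 2 * p for n, p in enumerate(probs))
    return m, v

weights = poisson_base_measure()
theta = 0.7
probs = nef_member(theta, weights)
mean, variance = mean_and_variance(probs)

print(mean, math.exp(theta))   # mean of P(theta, mu) versus k'_mu(theta) = exp(theta)
print(math.log(mean), theta)   # psi_mu(mean) = log(mean) recovers theta
print(variance, mean)          # for this family the variance equals the mean
```

In this example $M _ { F } = ( 0 , \infty )$ coincides with the interior of the convex hull of the support, so the family is steep.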

For one observation $X$, the maximum-likelihood estimator (cf. also Maximum-likelihood method) of $m$ is simply $\widehat { m } = X$; it is defined only when $X \in M _ { F }$, and in this case the maximum-likelihood estimator of the canonical parameter $\theta$ is $\hat { \theta } = \psi _ { \mu } ( X )$. In the case of $n$ observations, $X$ has to be replaced by $\overline{X} _ { n } = ( X _ { 1 } + \ldots + X _ { n } ) / n$. Since $M _ { F }$ is an open set, the strong law of large numbers implies that almost surely there exists an $N$ such that $\overline{X} _ { n } \in M _ { F }$ for $n \geq N$, so that $\hat { \theta } _ { n } = \psi _ { \mu } ( \overline{X} _ { n } )$ is well-defined after enough observations.
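The requirement that the sample mean lie in $M _ { F }$ can be illustrated as follows. This is a minimal sketch, assuming the generating measure $\mu = \delta _ { 0 } + \delta _ { 1 }$ (the Bernoulli family), for which $k _ { \mu } ( \theta ) = \operatorname { log } ( 1 + e ^ { \theta } )$, $M _ { F } = ( 0,1 )$ and $\psi _ { \mu } ( m ) = \operatorname { log } ( m / ( 1 - m ) )$; the function names are hypothetical.

```python
import math

def psi_bernoulli(m):
    """Inverse of k'_mu for mu = delta_0 + delta_1; defined only on M_F = (0, 1)."""
    if not 0.0 < m < 1.0:
        raise ValueError("mean outside M_F = (0, 1)")
    return math.log(m / (1.0 - m))

def mle_theta(sample):
    """Return psi_mu(sample mean), or None when the sample mean is not in M_F."""
    x_bar = sum(sample) / len(sample)
    try:
        return psi_bernoulli(x_bar)
    except ValueError:
        return None

print(mle_theta([1, 1, 1]))     # sample mean 1 lies on the boundary of M_F: no estimator
print(mle_theta([1, 0, 1, 1]))  # sample mean 0.75 lies in M_F: estimator is log(3)
```

A sample consisting only of ones (or only of zeros) has its mean on the boundary of $M _ { F }$, which is exactly the situation in which the estimator is not yet defined.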

Exponential families also have a striking property in information theory: they minimize the entropy in the following sense. Let $F = F ( \mu )$ be a natural exponential family on $E$ and fix $m \in M _ { F }$. Let $C$ be the convex set of probabilities $\mathsf{P}$ on $E$ which are absolutely continuous with respect to $\mu$ and such that $\int _ { E } x d \mathsf{P} ( x ) = m$. Then the minimum of $\int _ { E } \operatorname { log } ( d \mathsf{P} / d \mu ) d \mathsf{P}$ on $C$ is attained at the unique point $\mathsf{P} ( m , F )$. The extension to general exponential families is immediate. See, e.g., [a5], 3(A).
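The minimization property can be checked numerically on a small example. The sketch below assumes the illustrative generating measure $\mu = \delta _ { 0 } + \delta _ { 1 } + \delta _ { 2 }$ and the mean $m = 0.8$; on this support the set $C$ is a one-parameter family, indexed here by the hypothetical variable `t` $= \mathsf{P} ( \{ 2 \} )$, and the element of $F$ with mean $m$ is found by bisection.

```python
import math

SUPPORT = (0, 1, 2)  # mu = delta_0 + delta_1 + delta_2 (an illustrative choice)

def nef_member(theta):
    """Probabilities of P(theta, mu) on the support."""
    weights = [math.exp(theta * x) for x in SUPPORT]
    total = sum(weights)  # this is L_mu(theta)
    return [w / total for w in weights]

def mean(probs):
    return sum(x * p for x, p in zip(SUPPORT, probs))

def solve_theta(m, lo=-30.0, hi=30.0, iters=200):
    """Bisection for k'_mu(theta) = m; valid because k'_mu is increasing."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(nef_member(mid)) < m:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def relative_entropy(probs):
    """Integral of log(dP/dmu) dP; here dP/dmu is the probability vector itself."""
    return sum(p * math.log(p) for p in probs if p > 0.0)

def competitor(m, t):
    """Element of C with P({2}) = t; the mean constraint fixes the other two masses."""
    return [1.0 - m + t, m - 2.0 * t, t]

m = 0.8
p_star = nef_member(solve_theta(m))
h_star = relative_entropy(p_star)
grid = [0.4 * i / 1000 for i in range(1001)]  # t ranges over [0, m/2]
h_min_on_grid = min(relative_entropy(competitor(m, t)) for t in grid)

# No competitor on the grid should fall below the value attained by P(m, F).
print(h_star, h_min_on_grid)
```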

Denote by $V _ { F } ( m )$ the covariance operator of $\mathsf{P} ( m , F )$. The space of symmetric linear operators from $E ^ { * }$ to $E$ is denoted by $L _ { s } ( E ^ { * } , E )$, and the mapping from $M _ { F }$ to $L _ { s } ( E ^ { * } , E )$ defined by $m \mapsto V _ { F } ( m )$ is called the variance function of the natural exponential family $F$.

Because it satisfies the relation $k _ { \mu } ^ { \prime \prime } ( \theta ) = V _ { F } ( k _ { \mu } ^ { \prime } ( \theta ) )$, the variance function $V _ { F }$ determines the natural exponential family $F$. For each $m$, $V _ { F } ( m )$ is a positive-definite operator. The variance function also satisfies the following condition: For all $\alpha$ and $\beta$ in $E ^ { * }$ one has

\begin{equation*} V _ { F } ^ { \prime } ( m ) ( V _ { F } ( m ) ( \alpha ) ) ( \beta ) = V _ { F } ^ { \prime } ( m ) ( V _ { F } ( m ) ( \beta ) ) ( \alpha ). \end{equation*}

For dimension one, the variance function provides an explicit formula for the large deviations theorem: If $m _ { 0 } < m$ are in $M _ { F }$, and if $X _ { 1 } , \dots , X _ { n } , \dots$ are independent real random variables with the same distribution $\mathsf{P} ( m _ { 0 } , F )$, then

\begin{equation*} \operatorname { lim } _ { n \rightarrow \infty } \frac { 1 } { n } \operatorname { log } \mathsf {P} [ X _ { 1 } + \ldots + X _ { n } \geq n m ] = \int _ { m _ { 0 } } ^ { m } \frac { x - m } { V _ { F } ( x ) } d x. \end{equation*}
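As an illustration of how the right-hand side is evaluated, the following sketch integrates $( x - m ) / V _ { F } ( x )$ numerically and compares the result with the value obtained by hand for the Poisson case $V _ { F } ( x ) = x$, namely $( m - m _ { 0 } ) - m \operatorname { log } ( m / m _ { 0 } )$. The Simpson rule, the function names and the numerical values of $m _ { 0 }$ and $m$ are assumptions made for the example.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def ld_exponent(variance_function, m0, m):
    """Integral of (x - m) / V_F(x) over [m0, m], the limit in the formula above."""
    return simpson(lambda x: (x - m) / variance_function(x), m0, m)

def poisson_closed_form(m0, m):
    """The same integral computed by hand for V_F(x) = x."""
    return (m - m0) - m * math.log(m / m0)

m0, m = 1.0, 2.5
numeric = ld_exponent(lambda x: x, m0, m)
exact = poisson_closed_form(m0, m)
print(numeric, exact)  # the two values should agree; both are negative
```

For the normal family with $V _ { F } = 1$ the same function returns $- ( m - m _ { 0 } ) ^ { 2 } / 2$, the familiar Gaussian exponent.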

The right-hand side can be easily computed for natural exponential families on $\mathbf{R}$ whose variance functions are simple. A kind of vague principle like "the simpler $V _ { F }$ is, the more useful $F$ is" holds. C. Morris [a9] has observed that $V _ { F }$ is the restriction to $M _ { F }$ of a polynomial of degree $\leq 2$ if and only if $F$ is either normal, Poisson, binomial, negative binomial, gamma, or hyperbolic (i.e., with a Fourier transform $( \operatorname { cosh } t ) ^ { - 1 }$), at least up to an affine transformation and a convolution power. Similarly, in [a8], the classification into $6$ types of the variance functions which are third-degree polynomials is performed: the corresponding distributions are also classical, but occur in the literature as distributions of stopping times of Lévy processes or random walks in $\mathbf{Z}$ (cf. also Random walk; Stopping time). Other classes, like $V _ { F } ( m ) = A m ^ { a }$ or $V _ { F } = P R + Q \sqrt { R }$, where $P$, $Q$, $R$ are polynomials of low degree, have also been classified (see [a1] and [a7]).
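The relation $k _ { \mu } ^ { \prime \prime } ( \theta ) = V _ { F } ( k _ { \mu } ^ { \prime } ( \theta ) )$ can be checked numerically for one representative of each of the six quadratic types. The sketch below uses standard cumulant functions with assumed normalizations (unit variance for the normal type, and illustrative parameters `N`, `r`, `p`), and approximates the derivatives by central finite differences.

```python
import math

# Illustrative parameters (assumptions): binomial size N, negative binomial r, gamma shape p.
N, r, p = 5, 3.0, 2.0

# Each entry: (name, cumulant k, variance function V, a point theta inside Theta).
FAMILIES = [
    ("normal", lambda t: 0.5 * t * t, lambda m: 1.0, 0.3),
    ("Poisson", lambda t: math.exp(t), lambda m: m, 0.3),
    ("binomial", lambda t: N * math.log1p(math.exp(t)), lambda m: m * (1 - m / N), 0.3),
    ("negative binomial", lambda t: -r * math.log(1 - math.exp(t)), lambda m: m * (1 + m / r), -0.7),
    ("gamma", lambda t: -p * math.log(-t), lambda m: m * m / p, -0.7),
    ("hyperbolic", lambda t: -math.log(math.cos(t)), lambda m: 1 + m * m, 0.3),
]

def derivatives(k, theta, h=1e-4):
    """Central finite differences approximating k'(theta) and k''(theta)."""
    d1 = (k(theta + h) - k(theta - h)) / (2 * h)
    d2 = (k(theta + h) - 2 * k(theta) + k(theta - h)) / (h * h)
    return d1, d2

def max_discrepancy():
    """Largest value of |k''(theta) - V(k'(theta))| over the six families."""
    worst = 0.0
    for _, k, v, theta in FAMILIES:
        m, second = derivatives(k, theta)
        worst = max(worst, abs(second - v(m)))
    return worst

print(max_discrepancy())  # small, limited only by finite-difference error
```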

In higher dimensions the same principle holds. For instance, M. Casalis [a3] has shown that $V _ { F }$ is homogeneous of degree $2$ if and only if $F$ is a family of Wishart distributions on a Euclidean Jordan algebra. She [a4] has also found the $2 d + 4$ types of natural exponential families on $\mathbf{R} ^ { d }$ whose variance function is $am \otimes m + m _ { 1 } B _ { 1 } + \ldots + m _ { d } B _ { d } + C$, where $B _ { j }$ and $C$ are real $( d , d )$-matrices and $a \in \bf R$, thus providing a generalization of the above-mentioned result by Morris. Another extension is obtained in [a2], where all non-trivial natural exponential families in $\mathbf{R} ^ { d }$ whose marginal distributions are still natural exponential families are found; surprisingly, these marginal distributions are necessarily of Morris type.

Finally, the cubic class is generalized in a deep way to $\mathbf{R} ^ { d }$ in [a6].

#### References

[a1] | S. Bar-Lev, P. Enis, "Reproducibility and natural exponential families with power variance functions" Ann. Statist. , 14 (1986) pp. 1507–1522 |

[a2] | S. Bar-Lev, D. Bshouty, P. Enis, G. Letac, I-Li Lu, D. Richards, "The diagonal multivariate natural exponential families and their classification" J. Theor. Probab. , 7 (1994) pp. 883–929 |

[a3] | M. Casalis, "Les familles exponentielles à variance quadratique homogène sont des lois de Wishart sur un cône symétrique" C.R. Acad. Sci. Paris Ser. I , 312 (1991) pp. 537–540 |

[a4] | M. Casalis, "The $2 d + 4$ simple quadratic natural exponential families on $\mathbf{R} ^ { d }$" Ann. Statist. , 24 (1996) pp. 1828–1854 |

[a5] | I. Csiszár, "I-Divergence, geometry of probability distributions, and minimization problems" Ann. Probab. , 3 (1975) pp. 146–158 |

[a6] | A. Hassaïri, "La classification des familles exponentielles naturelles sur ${\bf R} ^ { n }$ par l'action du groupe linéaire de $\mathbf R ^ { n + 1 }$" C.R. Acad. Sci. Paris Ser. I , 315 (1992) pp. 207–210 |

[a7] | C. Kokonendji, "Sur les familles exponentielles naturelles de grand-Babel" Ann. Fac. Sci. Toulouse , 4 (1995) pp. 763–800 |

[a8] | G. Letac, M. Mora, "Natural exponential families with cubic variance functions" Ann. Statist. , 18 (1990) pp. 1–37 |

[a9] | C.N. Morris, "Natural exponential families with quadratic variance functions" Ann. Statist. , 10 (1982) pp. 65–80 |

**How to Cite This Entry:**

Natural exponential family of probability distributions.

*Encyclopedia of Mathematics.* URL: http://encyclopediaofmath.org/index.php?title=Natural_exponential_family_of_probability_distributions&oldid=49929