# Kullback-Leibler-type distance measures

In mathematical statistics one considers, among other problems, estimation, hypothesis testing and discrimination. For the statistical problem of discrimination, S. Kullback and R.A. Leibler [a13] introduced a measure of the "distance" or "divergence" between statistical populations, known variously as information for discrimination, $I$-divergence, the error, or the directed divergence. While the Shannon entropy is fundamental in information theory, several generalizations of Shannon's entropy have also been proposed. In statistical estimation problems, measures between probability distributions play a significant role. The Chernoff coefficient, the Hellinger–Bhattacharyya coefficient, the Jeffreys distance, the directed divergence and its symmetrization, the $J$-divergence, and the $f$-divergence are examples of such measures. These measures have many applications in statistics, pattern recognition and numerical taxonomy.

Let

$$\Gamma _ {n} = \left \{ P = ( p _ {1} , \dots, p _ {n} ) : p _ {i} > 0 \textrm{ and } \sum _ {i = 1 } ^ { n } p _ {i} = 1 \right \}$$

be the set of all complete discrete probability distributions of length $n \geq 2$ (cf. Density of a probability distribution). Let $I = ( 0,1 )$ and let $\mathbf R$ be the set of real numbers. For $P, Q \in \Gamma _ {n}$, Kullback and Leibler [a13] defined the directed divergence as

$$\tag{a1 } D _ {n} ( P \| Q ) = \sum _ {i = 1 } ^ { n } p _ {i} { \mathop{\rm log} } { \frac{p _ {i} }{q _ {i} } } = \sum p _ {i} ( { \mathop{\rm log} } p _ {i} - { \mathop{\rm log} } q _ {i} ) .$$
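As a quick numerical illustration, (a1) can be computed directly. This is a minimal sketch, not part of the original article; the function name `kl_divergence` and the sample distributions are chosen here for illustration.

```python
import math

def kl_divergence(p, q):
    """Directed divergence D_n(P || Q) of (a1), using the natural logarithm.

    p, q: sequences of positive probabilities, each summing to 1.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

P = [0.5, 0.3, 0.2]
Q = [0.4, 0.4, 0.2]
print(kl_divergence(P, Q))  # positive, since P != Q
print(kl_divergence(P, P))  # 0.0: the divergence of a distribution from itself
```

Note that $D _ {n} ( P \| Q ) \neq D _ {n} ( Q \| P )$ in general, a point taken up below.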

Measures such as (a1) are usually characterized through the algebraic properties they possess; see, for example, [a8]. A sequence of measures ${\mu _ {n} } : {\Gamma _ {n} \times \Gamma _ {n} } \rightarrow \mathbf R$ is said to have the sum property if there exists a function $f : {I ^ {2} } \rightarrow \mathbf R$ such that $\mu _ {n} ( P \| Q ) = \sum _ {i = 1 } ^ {n} f ( p _ {i} , q _ {i} )$ for all $P, Q \in \Gamma _ {n}$. In this case $f$ is said to be a generating function of $\{ \mu _ {n} \}$. A stronger version of the sum property is $f$-divergence [a6]: the measure $\mu _ {n}$ is an $f$-divergence if and only if it has a representation

$$\mu _ {n} ( P \| Q ) = \sum p _ {i} f \left ( { \frac{p _ {i} }{q _ {i} } } \right )$$

for some $f : {( 0, \infty ) } \rightarrow \mathbf R$. The measures $\mu _ {n}$ are said to be $( m,n )$-additive if $\mu _ {mn } ( P \star R \| Q \star S ) = \mu _ {m} ( R \| S ) + \mu _ {n} ( P \| Q )$, where $P \star R = ( p _ {1} r _ {1} , \dots, p _ {1} r _ {m} , p _ {2} r _ {1} , \dots, p _ {2} r _ {m} , \dots, p _ {n} r _ {1} , \dots, p _ {n} r _ {m} )$ denotes the product distribution.
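In the article's convention, taking $f ( t ) = { \mathop{\rm log} } t$ in the $f$-divergence representation recovers (a1), and the directed divergence is $( m,n )$-additive. A small sketch, with illustrative names of my own (`f_div`, `star`), checks both claims numerically:

```python
import math

def f_div(p, q, f):
    # f-divergence in the article's convention: sum p_i * f(p_i / q_i)
    return sum(pi * f(pi / qi) for pi, qi in zip(p, q))

def star(p, r):
    # product distribution P * R = (p_1 r_1, ..., p_1 r_m, ..., p_n r_m)
    return [pi * rj for pi in p for rj in r]

def kl(p, q):
    # f(t) = log t yields the directed divergence (a1)
    return f_div(p, q, math.log)

P, Q = [0.5, 0.5], [0.7, 0.3]
R, S = [0.2, 0.8], [0.6, 0.4]

lhs = kl(star(P, R), star(Q, S))
rhs = kl(P, Q) + kl(R, S)
print(abs(lhs - rhs) < 1e-12)  # (m, n)-additivity holds for (a1)
```

The additivity follows from ${ \mathop{\rm log} } ( p _ {i} r _ {j} / q _ {i} s _ {j} ) = { \mathop{\rm log} } ( p _ {i} / q _ {i} ) + { \mathop{\rm log} } ( r _ {j} / s _ {j} )$.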

Measures $\mu _ {n}$ having the sum property with a Lebesgue-measurable generating function $f$ are $( 2, 2 )$- additive if and only if they are given by

$$\mu _ {n} ( P \| Q ) = 4 aH _ {n} ^ {3} ( P ) + 4a ^ \prime H _ {n} ^ {3} ( Q ) - 9a H _ {n} ^ {2} ( P ) - 9a ^ \prime H _ {n} ^ {2} ( Q ) + b H _ {n} ( P ) + b ^ \prime H _ {n} ( Q ) + c I _ {n} ( P \| Q ) + c ^ \prime I _ {n} ( Q \| P ) + dn,$$

where $a$, $a ^ \prime$, $b$, $b ^ \prime$, $c$, $c ^ \prime$, $d$ are constants, $H _ {n} ( P ) = - \sum p _ {i} { \mathop{\rm log} } p _ {i}$ (the Shannon entropy), $H _ {n} ^ \beta ( P ) = ( 2 ^ {1 - \beta } - 1 ) ^ {- 1 } ( \sum p _ {i} ^ \beta - 1 )$ (the entropy of degree $\beta \neq 1$) and $I _ {n} ( P \| Q ) = - \sum p _ {i} { \mathop{\rm log} } q _ {i}$ (the inaccuracy). However, (a1) is not symmetric and does not satisfy the triangle inequality, so its use as a metric is limited. In [a7], the symmetric divergence or $J$-divergence $J _ {n} ( P \| Q ) = D _ {n} ( P \| Q ) + D _ {n} ( Q \| P )$ was introduced to restore symmetry.
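The symmetrization is immediate to verify numerically. A minimal sketch (function names and sample distributions are mine, not from the source):

```python
import math

def kl(p, q):
    # directed divergence (a1)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def j_divergence(p, q):
    # J-divergence: J_n(P || Q) = D_n(P || Q) + D_n(Q || P)
    return kl(p, q) + kl(q, p)

P = [0.5, 0.3, 0.2]
Q = [0.2, 0.5, 0.3]
# symmetric by construction, while kl itself is not
print(abs(j_divergence(P, Q) - j_divergence(Q, P)) < 1e-12)
```

Symmetry is restored, but the $J$-divergence still fails the triangle inequality in general, so it too is not a metric.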

A sequence of measures $\{ \mu _ {m} \}$ is said to be symmetrically additive if

$$\mu _ {nm } ( P \star R \| Q \star S ) + \mu _ {nm } ( P \star S \| Q \star R ) = 2 \mu _ {n} ( P \| Q ) + 2 \mu _ {m} ( R \| S )$$

for all $P, Q \in \Gamma _ {n}$, $R, S \in \Gamma _ {m}$.

Sum-form measures $\{ \mu _ {n} \}$ with a measurable symmetric generating function $f : {I ^ {2} } \rightarrow \mathbf R$ are symmetrically additive for all pairs of integers $m, n \geq 2$ and have the form [a5]

$$\mu _ {n} ( P \| Q ) = \sum _ {i = 1 } ^ { n } [ p _ {i} ( a { \mathop{\rm log} } p _ {i} + b { \mathop{\rm log} } q _ {i} ) + q _ {i} ( a { \mathop{\rm log} } q _ {i} + b { \mathop{\rm log} } p _ {i} ) ] .$$
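For the choice $a = 1$, $b = - 1$ this sum form reduces to the $J$-divergence, since each summand becomes $p _ {i} { \mathop{\rm log} } ( p _ {i} / q _ {i} ) + q _ {i} { \mathop{\rm log} } ( q _ {i} / p _ {i} )$. A sketch checking this identity (the helper names are mine):

```python
import math

def sum_form(p, q, a, b):
    # mu_n(P || Q) = sum [p_i(a log p_i + b log q_i) + q_i(a log q_i + b log p_i)]
    return sum(pi * (a * math.log(pi) + b * math.log(qi))
               + qi * (a * math.log(qi) + b * math.log(pi))
               for pi, qi in zip(p, q))

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

P = [0.5, 0.3, 0.2]
Q = [0.2, 0.5, 0.3]
j = kl(P, Q) + kl(Q, P)                       # J-divergence
print(abs(sum_form(P, Q, 1, -1) - j) < 1e-12)  # a = 1, b = -1 recovers J
```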

It is well known that $H _ {n} ( P ) \leq I _ {n} ( P \| Q )$, that is,

$$- \sum p _ {i} { \mathop{\rm log} } p _ {i} \leq - \sum p _ {i} { \mathop{\rm log} } q _ {i} ,$$

which is known as the Shannon inequality. This inequality is equivalent to the non-negativity $D _ {n} ( P \| Q ) \geq 0$ of the error (a1). A function ${\mu _ {n} } : {\Gamma _ {n} ^ {2} } \rightarrow \mathbf R$ is called a separability measure if and only if $\mu _ {n} ( P \| Q ) \geq 0$ and $\mu _ {n} ( P \| Q )$ attains a minimum when $P = Q$, for all $P, Q \in \Gamma _ {n}$ with $n \geq 2$. A separability measure $\mu _ {n}$ is a distance measure of Kullback–Leibler type if there exists an $f : I \rightarrow \mathbf R$ such that $\mu _ {n} ( P \| Q ) = \sum p _ {i} ( f ( p _ {i} ) - f ( q _ {i} ) )$. Any Kullback–Leibler-type distance measure with generating function $f$ satisfies the inequality $\sum p _ {k} f ( q _ {k} ) \leq \sum p _ {k} f ( p _ {k} )$ (see [a10], [a2]).
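Taking the generating function $f = { \mathop{\rm log} }$ recovers (a1), and the inequality $\sum p _ {k} f ( q _ {k} ) \leq \sum p _ {k} f ( p _ {k} )$ is then exactly the Shannon inequality. A small sketch (the name `kl_type` is mine):

```python
import math

def kl_type(p, q, f):
    # Kullback-Leibler-type distance: sum p_i * (f(p_i) - f(q_i))
    return sum(pi * (f(pi) - f(qi)) for pi, qi in zip(p, q))

P = [0.5, 0.3, 0.2]
Q = [0.2, 0.5, 0.3]

# f = log recovers (a1); non-negativity is the Shannon inequality,
# with equality (value 0) at P = Q
print(kl_type(P, Q, math.log) >= 0)
print(kl_type(P, P, math.log) == 0)
```

Not every $f$ yields a separability measure; the non-negativity must be checked for the generating function at hand.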

How to Cite This Entry:
Kullback-Leibler-type distance measures. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Kullback-Leibler-type_distance_measures&oldid=47532
This article was adapted from an original article by Pl. Kannappan (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.