Kullback-Leibler-type distance measures



In mathematical statistics one usually considers, among others, estimation, hypothesis testing, discrimination, etc. When considering the statistical problem of discrimination, S. Kullback and R.A. Leibler [a13] introduced a measure of the "distance" or "divergence" between statistical populations, known variously as information for discrimination, $I$-divergence, the error, or the directed divergence. While the Shannon entropy is fundamental in information theory, several generalizations of Shannon's entropy have also been proposed. In statistical estimation problems, measures between probability distributions play a significant role. The Chernoff coefficient, the Hellinger–Bhattacharyya coefficient, the Jeffreys distance, the directed divergence and its symmetrization, the $J$-divergence, the $f$-divergence, etc. are examples of such measures. These measures have many applications in statistics, pattern recognition, numerical taxonomy, etc.

Let

$$ \Gamma_n = \left\{ P = ( p_1, \dots, p_n ) \mid p_i > 0 \textrm{ and } \sum_{i = 1}^{n} p_i = 1 \right\} $$

be the set of all complete discrete probability distributions of length $n \geq 2$ (cf. Density of a probability distribution). Let $I = (0,1)$ and let $\mathbf R$ be the set of real numbers. For $P, Q$ in $\Gamma_n$, Kullback and Leibler [a13] defined the directed divergence as

$$ \tag{a1} D_n ( P \| Q ) = \sum_{i = 1}^{n} p_i \log \frac{p_i}{q_i} = \sum p_i ( \log p_i - \log q_i ). $$
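As a minimal numerical sketch (not part of the original article), (a1) can be computed as follows in Python, assuming NumPy, natural logarithms and strictly positive distributions in $\Gamma_n$; the function name is chosen here purely for illustration.

```python
import numpy as np

def directed_divergence(p, q):
    # D_n(P||Q) of (a1): sum_i p_i (log p_i - log q_i), natural logarithms.
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p) - np.log(q))))

P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.4, 0.4, 0.2])
print(directed_divergence(P, Q))  # approx. 0.0253
print(directed_divergence(Q, P))  # approx. 0.0258 -- not symmetric in P and Q
```

The two printed values differ, anticipating the lack of symmetry of (a1) discussed below.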

Usually, such measures are characterized by the algebraic properties they possess; see, for example, [a8] for (a1). A sequence of measures $\mu_n : \Gamma_n \times \Gamma_n \rightarrow \mathbf R$ is said to have the sum property if there exists a function $f : I^2 \rightarrow \mathbf R$ such that $\mu_n ( P \| Q ) = \sum_{i = 1}^{n} f ( p_i, q_i )$ for $P, Q \in \Gamma_n$. In this case $f$ is said to be a generating function of $\{ \mu_n \}$. A stronger version of the sum property is the $f$-divergence [a6]. The measure $\mu_n$ is an $f$-divergence if and only if it has a representation

$$ \mu_n ( P \| Q ) = \sum p_i f \left( \frac{p_i}{q_i} \right) $$

for some $f : ( 0, \infty ) \rightarrow \mathbf R$. The measures $\mu_n$ are said to be $(m,n)$-additive if $\mu_{mn} ( P \star R \| Q \star S ) = \mu_m ( R \| S ) + \mu_n ( P \| Q )$, where $P \star R = ( p_1 r_1, \dots, p_1 r_m, p_2 r_1, \dots, p_2 r_m, \dots, p_n r_m )$.
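The following informal Python sketch (assuming NumPy; an illustration, not part of the original article) shows the $f$-divergence representation and a numerical check of $(m,n)$-additivity: the choice $f = \log$ recovers the directed divergence (a1), which is $(m,n)$-additive.

```python
import numpy as np

def f_divergence(p, q, f):
    # mu_n(P||Q) = sum_i p_i f(p_i / q_i)
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * f(p / q)))

def star(p, r):
    # P * R = (p_1 r_1, ..., p_1 r_m, p_2 r_1, ..., p_2 r_m, ..., p_n r_m)
    return np.outer(p, r).ravel()

P, Q = np.array([0.5, 0.5]), np.array([0.3, 0.7])              # P, Q in Gamma_2
R, S = np.array([0.2, 0.3, 0.5]), np.array([0.25, 0.25, 0.5])  # R, S in Gamma_3

D = lambda p, q: f_divergence(p, q, np.log)   # f = log gives (a1)
lhs = D(star(P, R), star(Q, S))
rhs = D(R, S) + D(P, Q)
print(np.isclose(lhs, rhs))  # True: (m,n)-additivity for this choice of f
```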

Measures $\mu_n$ having the sum property with a Lebesgue-measurable generating function $f$ are $(2,2)$-additive if and only if they are given by

$$ \mu_n ( P \| Q ) = 4a H_n^3 ( P ) + 4a^\prime H_n^3 ( Q ) - 9a H_n^2 ( P ) - 9a^\prime H_n^2 ( Q ) + b H_n ( P ) + b^\prime H_n ( Q ) + c I_n ( P \| Q ) + c^\prime I_n ( Q \| P ) + dn, $$

where $a$, $a^\prime$, $b$, $b^\prime$, $c$, $c^\prime$, $d$ are constants, $H_n ( P ) = - \sum p_i \log p_i$ (the Shannon entropy), $H_n^\beta ( P ) = ( 2^{1 - \beta} - 1 )^{-1} ( \sum p_i^\beta - 1 )$ (the entropy of degree $\beta \neq 1$) and $I_n ( P \| Q ) = - \sum p_i \log q_i$ (the inaccuracy). However, (a1) is neither symmetric nor does it satisfy the triangle inequality, so its use as a metric is limited. In [a7], the symmetric divergence or $J$-divergence $J_n ( P \| Q ) = D_n ( P \| Q ) + D_n ( Q \| P )$ was introduced to restore symmetry.
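The quantities just introduced can be compared numerically; the sketch below (illustrative only, assuming NumPy, natural logarithms and hypothetical function names) checks the identity $D_n ( P \| Q ) = I_n ( P \| Q ) - H_n ( P )$ and the symmetry of $J_n$.

```python
import numpy as np

def shannon_entropy(p):
    # H_n(P) = -sum p_i log p_i
    p = np.asarray(p, float)
    return float(-np.sum(p * np.log(p)))

def inaccuracy(p, q):
    # I_n(P||Q) = -sum p_i log q_i
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(-np.sum(p * np.log(q)))

def entropy_of_degree(p, beta):
    # H_n^beta(P) = (2^{1-beta} - 1)^{-1} (sum p_i^beta - 1), beta != 1
    p = np.asarray(p, float)
    return float((np.sum(p ** beta) - 1.0) / (2.0 ** (1.0 - beta) - 1.0))

def directed_divergence(p, q):
    # D_n(P||Q) of (a1)
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * (np.log(p) - np.log(q))))

def j_divergence(p, q):
    # J_n(P||Q) = D_n(P||Q) + D_n(Q||P)
    return directed_divergence(p, q) + directed_divergence(q, p)

P, Q = np.array([0.5, 0.3, 0.2]), np.array([0.4, 0.4, 0.2])
print(np.isclose(directed_divergence(P, Q),
                 inaccuracy(P, Q) - shannon_entropy(P)))   # True: D_n = I_n - H_n
print(np.isclose(j_divergence(P, Q), j_divergence(Q, P)))  # True: J_n is symmetric
print(entropy_of_degree(P, 2))                             # entropy of degree beta = 2
```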

A sequence of measures $ \{ \mu _ {m} \} $ is said to be symmetrically additive if

$$ \mu_{nm} ( P \star R \| Q \star S ) + \mu_{nm} ( P \star S \| Q \star R ) = 2 \mu_n ( P \| Q ) + 2 \mu_m ( R \| S ) $$

for all $ P, Q \in \Gamma _ {n} $, $ R, S \in \Gamma _ {m} $.

Sum-form measures $ \{ \mu _ {n} \} $ with a measurable symmetric generating function $ f : {I ^ {2} } \rightarrow \mathbf R $ are symmetrically additive for all pairs of integers $ m, n \geq 2 $ and have the form [a5]

$$ \mu_n ( P \| Q ) = \sum_{i = 1}^{n} [ p_i ( a \log p_i + b \log q_i ) + q_i ( a \log q_i + b \log p_i ) ]. $$
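A rough numerical check of this form (an informal Python sketch assuming NumPy; the constants $a$, $b$ and the test distributions are arbitrary choices) verifies symmetric additivity and the fact that $a = 1$, $b = -1$ recovers the $J$-divergence.

```python
import numpy as np

def sum_form(p, q, a, b):
    # mu_n(P||Q) = sum_i [p_i(a log p_i + b log q_i) + q_i(a log q_i + b log p_i)]
    p, q = np.asarray(p, float), np.asarray(q, float)
    lp, lq = np.log(p), np.log(q)
    return float(np.sum(p * (a * lp + b * lq) + q * (a * lq + b * lp)))

def star(p, r):
    # P * R as in the (m,n)-additivity definition above
    return np.outer(p, r).ravel()

P, Q = np.array([0.5, 0.5]), np.array([0.3, 0.7])              # in Gamma_2
R, S = np.array([0.2, 0.3, 0.5]), np.array([0.25, 0.25, 0.5])  # in Gamma_3
a, b = 0.7, -1.3  # arbitrary constants

# symmetric additivity:
lhs = sum_form(star(P, R), star(Q, S), a, b) + sum_form(star(P, S), star(Q, R), a, b)
rhs = 2 * sum_form(P, Q, a, b) + 2 * sum_form(R, S, a, b)
print(np.isclose(lhs, rhs))  # True

# a = 1, b = -1 gives J_n(P||Q) = D_n(P||Q) + D_n(Q||P):
J = np.sum(P * np.log(P / Q)) + np.sum(Q * np.log(Q / P))
print(np.isclose(sum_form(P, Q, 1.0, -1.0), J))  # True
```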

It is well known that $ H _ {n} ( P ) \leq I _ {n} ( P \| Q ) $, that is,

$$ - \sum p_i \log p_i \leq - \sum p_i \log q_i, $$

which is known as the Shannon inequality. This inequality gives rise to the error $D_n ( P \| Q ) \geq 0$ in (a1). A function $\mu_n : \Gamma_n^2 \rightarrow \mathbf R$ is called a separability measure if and only if $\mu_n ( P \| Q ) \geq 0$ and $\mu_n ( P \| Q )$ attains a minimum if $P = Q$, for all $P, Q \in \Gamma_n$ with $n \geq 2$. A separability measure $\mu_n$ is a distance measure of Kullback–Leibler type if there exists an $f : I \rightarrow \mathbf R$ such that $\mu_n ( P \| Q ) = \sum p_i ( f ( p_i ) - f ( q_i ) )$. Any Kullback–Leibler-type distance measure with generating function $f$ satisfies the inequality $\sum p_k f ( q_k ) \leq \sum p_k f ( p_k )$ (see [a10], [a2]).
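To make the last definition concrete, here is an informal Python sketch (assuming NumPy; $f = \log$ is one admissible generating function and the names are illustrative) showing that the resulting measure is non-negative, vanishes at $P = Q$, and that the stated inequality is then exactly the Shannon inequality.

```python
import numpy as np

def kl_type_measure(p, q, f):
    # mu_n(P||Q) = sum_i p_i (f(p_i) - f(q_i))
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * (f(p) - f(q))))

P, Q = np.array([0.5, 0.3, 0.2]), np.array([0.25, 0.25, 0.5])

print(kl_type_measure(P, Q, np.log) >= 0)              # True (separability)
print(np.isclose(kl_type_measure(P, P, np.log), 0.0))  # True (minimum at P = Q)
print(np.sum(P * np.log(Q)) <= np.sum(P * np.log(P)))  # True (Shannon inequality)
```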

References

[a1] J. Aczél, Z. Daróczy, "On measures of information and their characterizations" , Acad. Press (1975) Zbl 0345.94022
[a2] J. Aczél, A.M. Ostrowski, "On the characterization of Shannon's entropy by Shannon's inequality" J. Austral. Math. Soc. , 16 (1973) pp. 368–374
[a3] A. Bhattacharyya, "On a measure of divergence between two statistical populations defined by their probability distributions" Bull. Calcutta Math. Soc. , 35 (1943) pp. 99–109
[a4] A. Bhattacharyya, "On a measure of divergence between two multinomial populations" Sankhya , 7 (1946) pp. 401–406
[a5] J.K. Chung, P.L. Kannappan, C.T. Ng, P.K. Sahoo, "Measures of distance between probability distributions" J. Math. Anal. Appl. , 139 (1989) pp. 280–292 DOI 10.1016/0022-247X(89)90335-1 Zbl 0669.60025
[a6] I. Csiszár, "Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten" Magyar Tud. Kutato Int. Közl. , 8 (1963) pp. 85–108
[a7] H. Jeffreys, "An invariant form for the prior probability in estimation problems" Proc. Roy. Soc. London A , 186 (1946) pp. 453–461 DOI 10.1098/rspa.1946.0056 Zbl 0063.03050
[a8] Pl. Kannappan, P.N. Rathie, "On various characterizations of directed divergence" , Proc. Sixth Prague Conf. on Information Theory, Statistical Decision Functions and Random Process (1971)
[a9] Pl. Kannappan, C.T. Ng, "Representation of measures of information" , Trans. Eighth Prague Conf. , C , Prague (1979) pp. 203–206
[a10] Pl. Kannappan, P.K. Sahoo, "Kullback–Leibler type distance measures between probability distributions" J. Math. Phys. Sci. , 26 (1993) pp. 443–454
[a11] Pl. Kannappan, P.K. Sahoo, J.K. Chung, "On a functional equation associated with the symmetric divergence measures" Utilitas Math. , 44 (1993) pp. 75–83
[a12] S. Kullback, "Information theory and statistics" , Peter Smith, reprint , Gloucester MA (1978)
[a13] S. Kullback, R.A. Leibler, "On information and sufficiency" Ann. Math. Stat. , 22 (1951) pp. 79–86 DOI 10.1214/aoms/1177729694 Zbl 0042.38403
[a14] C.E. Shannon, "A mathematical theory of communication" Bell System Techn. J. , 27 (1948) pp. 379–423; 623–656
How to Cite This Entry:
Kullback-Leibler-type distance measures. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Kullback-Leibler-type_distance_measures&oldid=17993
This article was adapted from an original article by Pl. Kannappan (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article