Namespaces
Variants
Actions

Difference between revisions of "Entropy"

From Encyclopedia of Mathematics
Jump to: navigation, search
(Importing text file)
 
(latex details)
 
(One intermediate revision by one other user not shown)
Line 1: Line 1:
An information-theoretical measure of the degree of indeterminacy of a random variable. If <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e0357401.png" /> is a discrete random variable defined on a probability space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e0357402.png" /> and assuming values <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e0357403.png" /> with probability distribution <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e0357404.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e0357405.png" />, then the entropy is defined by the formula
+
<!--
 +
e0357401.png
 +
$#A+1 = 92 n = 0
 +
$#C+1 = 92 : ~/encyclopedia/old_files/data/E035/E.0305740 Entropy
 +
Automatically converted into TeX, above some diagnostics.
 +
Please remove this comment and the {{TEX|auto}} line below,
 +
if TeX found to be correct.
 +
-->
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e0357406.png" /></td> <td valign="top" style="width:5%;text-align:right;">(1)</td></tr></table>
+
{{TEX|auto}}
 +
{{TEX|done}}
  
(here it is assumed that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e0357407.png" />). The base of the logarithm can be any positive number, but as a rule one takes logarithms to the base 2 or <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e0357408.png" />, which corresponds to the choice of a [[Bit|bit]] or a nat (natural unit) as the unit of measurement.
+
An information-theoretical measure of the degree of indeterminacy of a random variable. If  $  \xi $
 +
is a discrete random variable defined on a probability space  $  ( \Omega , \mathfrak A , {\mathsf P} ) $
 +
and assuming values  $  x _ {1} , x _ {2} \dots $
 +
with probability distribution  $  \{ {p _ {k} } : {1 , 2 ,\dots } \} $,
 +
$  p _ {k} = {\mathsf P} \{ \xi = x _ {k} \} $,  
 +
then the entropy is defined by the formula
  
If <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e0357409.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574010.png" /> are two discrete random variables taking values <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574011.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574012.png" /> with probability distributions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574013.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574014.png" />, and if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574015.png" /> is the conditional distribution of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574016.png" /> assuming that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574017.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574018.png" /> then the (mean) conditional entropy <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574019.png" /> of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574020.png" /> given <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574021.png" /> is defined as
+
$$ \tag{1 }
 +
H ( \xi ) = - \sum_{k=1} ^  \infty p _ {k}  \mathop{\rm log}  p _ {k}  $$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574022.png" /></td> <td valign="top" style="width:5%;text-align:right;">(2)</td></tr></table>
+
(here it is assumed that  $  0  \mathop{\rm log}  0 = 0 $).  
 +
The base of the logarithm can be any positive number, but as a rule one takes logarithms to the base 2 or  $  e $,
 +
which corresponds to the choice of a [[Bit|bit]] or a nat (natural unit) as the unit of measurement.
  
Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574023.png" /> be a stationary process with discrete time and discrete space of values such that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574024.png" />. Then the entropy (more accurately, the mean entropy) <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574025.png" /> of this stationary process is defined as the limit
+
If  $  \xi $
 +
and  $  \eta $
 +
are two discrete random variables taking values  $  x _ {1} , x _ {2} \dots $
 +
and  $  y _ {1} , y _ {2} \dots $
 +
with probability distributions  $  \{ {p _ {k} } : {k = 1 , 2 ,\dots } \} $
 +
and  $  \{ {q _ {j} } : {j = 1 , 2 ,\dots } \} $,
 +
and if  $  \{ {p _ {k\mid } j } : {k = 1 , 2 , . . . } \} $
 +
is the conditional distribution of $  \xi $
 +
assuming that $  \eta = y _ {j} $,
 +
$  j = 1 , 2 \dots $
 +
then the (mean) conditional entropy $  H ( \xi \mid  \eta ) $
 +
of $  \xi $
 +
given  $  \eta $
 +
is defined as
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574026.png" /></td> <td valign="top" style="width:5%;text-align:right;">(3)</td></tr></table>
+
$$ \tag{2 }
 +
H ( \xi \mid  \eta )  = - \sum_{j=1} ^  \infty 
 +
q _ {j} \sum_{k=1} ^  \infty  p _ {k\mid } j  \mathop{\rm log}  p _ {k\mid } j .
 +
$$
  
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574027.png" /> is the entropy of the random variable <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574028.png" />. It is known that the limit on the right-hand side of (3) always exists and that
+
Let  $  \xi = \{ {\xi _ {k} } : {k = \dots, - 1 , 0 , 1 ,\dots } \} $
 +
be a stationary process with discrete time and discrete space of values such that  $  H ( \xi _ {1} ) < \infty $.  
 +
Then the entropy (more accurately, the mean entropy)  $  \overline{H}\; ( \xi ) $
 +
of this stationary process is defined as the limit
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574029.png" /></td> <td valign="top" style="width:5%;text-align:right;">(4)</td></tr></table>
+
$$ \tag{3 }
 +
\overline{H}\; ( \xi ) =  \lim\limits _ {n \rightarrow \infty } 
 +
\frac{1}{n}
  
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574030.png" /> is the conditional entropy of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574031.png" /> given <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574032.png" />. The entropy of stationary processes has important applications in the theory of dynamical systems.
+
H ( \xi  ^ {n} ) ,
 +
$$
  
If <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574033.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574034.png" /> are two measures on a measurable space <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574035.png" /> and if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574036.png" /> is absolutely continuous relative to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574037.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574038.png" /> is the corresponding Radon–Nikodým derivative, then the entropy <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574039.png" /> of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574040.png" /> relative to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574041.png" /> is defined as the integral
+
where  $  H ( \xi  ^ {n} ) $
 +
is the entropy of the random variable  $  \xi  ^ {n} = ( \xi _ {1} \dots \xi _ {n} ) $.  
 +
It is known that the limit on the right-hand side of (3) always exists and that
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574042.png" /></td> <td valign="top" style="width:5%;text-align:right;">(5)</td></tr></table>
+
$$ \tag{4 }
 +
\overline{H}\; ( \xi )  = \lim\limits _ {n \rightarrow \infty }  H
 +
( \xi _ {n} \mid  \xi _ {1} \dots \xi _ {n-} 1 ) ,
 +
$$
 +
 
 +
where  $  H ( \xi _ {n} \mid  \xi _ {1} \dots \xi _ {n-} 1 ) $
 +
is the conditional entropy of  $  \xi _ {n} $
 +
given  $  \xi  ^ {n-} 1 = ( \xi _ {1} \dots \xi _ {n-} 1 ) $.  
 +
The entropy of stationary processes has important applications in the theory of dynamical systems.
 +
 
 +
If  $  \mu $
 +
and  $  \nu $
 +
are two measures on a measurable space  $  ( \Omega , \mathfrak A ) $
 +
and if  $  \mu $
 +
is absolutely continuous relative to  $  \nu $
 +
and  $  d \mu / d \nu $
 +
is the corresponding Radon–Nikodým derivative, then the entropy  $  H ( d \mu / d \nu ) $
 +
of  $  \mu $
 +
relative to  $  \nu $
 +
is defined as the integral
 +
 
 +
$$ \tag{5 }
 +
H \left (
 +
\frac{d \mu }{d \nu }
 +
\right )  =  \int\limits _  \Omega
 +
\mathop{\rm log} 
 +
\frac{d \mu }{d \nu }
 +
  \nu ( d \omega ) .
 +
$$
  
 
A special case of the entropy of one measure with respect to another is the [[Differential entropy|differential entropy]].
 
A special case of the entropy of one measure with respect to another is the [[Differential entropy|differential entropy]].
  
Of the many possible generalizations of the concept of entropy in information theory one of the most important is the following. Let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574043.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574044.png" /> be two random variables taking values in certain measurable spaces <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574045.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574046.png" />. Suppose that the distribution <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574047.png" /> of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574048.png" /> is given and let <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574049.png" /> be a class of admissible joint distributions of the pair <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574050.png" /> in the set of all probability measures in the product <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574051.png" />. Then the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574053.png" />-entropy (or the entropy for a given condition <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574054.png" /> of exactness of reproduction of information (cf. [[Information, exactness of reproducibility of|Information, exactness of reproducibility of]])) is defined as the quantity
+
Of the many possible generalizations of the concept of entropy in information theory one of the most important is the following. Let $  \xi $
 +
and $  \widetilde \xi  $
 +
be two random variables taking values in certain measurable spaces $  ( \mathfrak X , S _ {\mathfrak X} ) $
 +
and $  ( \widetilde{\mathfrak X}  , S _ {\widetilde{\mathfrak X}  }  ) $.  
 +
Suppose that the distribution $  p ( \cdot ) $
 +
of $  \xi $
 +
is given and let $  W $
 +
be a class of admissible joint distributions of the pair $  ( \xi , \widetilde \xi  ) $
 +
in the set of all probability measures in the product $  ( \mathfrak X \times \widetilde{\mathfrak X}  , S _ {\mathfrak X} \times S _ {\widetilde{\mathfrak X}  }  ) $.  
 +
Then the $  W $-
 +
entropy (or the entropy for a given condition $  W $
 +
of exactness of reproduction of information (cf. [[Information, exactness of reproducibility of|Information, exactness of reproducibility of]])) is defined as the quantity
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574055.png" /></td> <td valign="top" style="width:5%;text-align:right;">(6)</td></tr></table>
+
$$ \tag{6 }
 +
H _ {W} ( \xi )  = \inf  I ( \xi , \widetilde \xi  ) ,
 +
$$
  
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574056.png" /> is the amount of information (cf. [[Information, amount of|Information, amount of]]) in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574057.png" /> given <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574058.png" /> and the infimum is taken over all pairs of random variables <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574059.png" /> such that the joint distribution <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574060.png" /> of the pair <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574061.png" /> belongs to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574062.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574063.png" /> has the distribution <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574064.png" />. The class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574065.png" /> of joint distributions <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574066.png" /> is often given by means of a certain non-negative measurable real-valued function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574067.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574068.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574069.png" />, a measure of distortion, in the following manner:
+
where $  I ( \xi , \widetilde \xi  ) $
 +
is the amount of information (cf. [[Information, amount of|Information, amount of]]) in $  \widetilde \xi  $
 +
given $  \xi $
 +
and the infimum is taken over all pairs of random variables $  ( \xi , \widetilde \xi  ) $
 +
such that the joint distribution $  p ( \cdot , \cdot ) $
 +
of the pair $  ( \xi , \widetilde \xi  ) $
 +
belongs to $  W $
 +
and $  \xi $
 +
has the distribution $  p ( \cdot ) $.  
 +
The class $  W $
 +
of joint distributions $  p ( \cdot , \cdot ) $
 +
is often given by means of a certain non-negative measurable real-valued function $  \rho ( x , \widetilde{x}  ) $,  
 +
$  x \in X $,  
 +
$  \widetilde{x}  \in \widetilde{X}  $,  
 +
a measure of distortion, in the following manner:
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574070.png" /></td> <td valign="top" style="width:5%;text-align:right;">(7)</td></tr></table>
+
$$ \tag{7 }
 +
= \{ {p ( \cdot , \cdot ) } : {
 +
{\mathsf E} \rho ( \xi , \widetilde \xi  ) \leq  \epsilon } \}
 +
,
 +
$$
  
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574071.png" /> is fixed. In this case the quantity defined by (6), where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574072.png" /> is given by (7), is called the [[Epsilon-entropy|<img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574073.png" />-entropy]] (or the rate as a function of the distortion) and is denoted by <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574074.png" />. For example, if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574075.png" /> is a Gaussian random vector with independent components, if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574076.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574077.png" />, and if the function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574078.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574079.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574080.png" />, has the form
+
where $  \epsilon > 0 $
 +
is fixed. In this case the quantity defined by (6), where $  W $
 +
is given by (7), is called the [[Epsilon-entropy| $  \epsilon $-
 +
entropy]] (or the rate as a function of the distortion) and is denoted by $  H _  \epsilon  ( \xi ) $.  
 +
For example, if $  \xi = ( \xi _ {1} \dots \xi _ {n} ) $
 +
is a Gaussian random vector with independent components, if $  {\mathsf E} \xi _ {k} = 0 $,  
 +
$  k = 1 \dots n $,  
 +
and if the function $  \rho ( x , \widetilde{x}  ) $,  
 +
$  x = ( x _ {1} \dots x _ {n} ) $,  
 +
$  \widetilde{x}  = ( \widetilde{x}  _ {1} \dots \widetilde{x}  _ {n} ) $,  
 +
has the form
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574081.png" /></td> </tr></table>
+
$$
 +
\rho ( x , \widetilde{x}  )  = \sum_{k=1} ^ { n }
 +
( x _ {k} - \widetilde{x}  _ {k} )  ^ {2} ,
 +
$$
  
then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574082.png" /> can be found by the formula
+
then $  H _  \epsilon  ( \xi ) $
 +
can be found by the formula
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574083.png" /></td> </tr></table>
+
$$
 +
H _  \epsilon  ( \xi )  =
 +
\frac{1}{2}
  
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574084.png" /> is defined by
+
\sum_{k=1} ^ { n }  { \mathop{\rm log}  \max }
 +
\left (
 +
\frac{ {\mathsf E} \xi _ {k}  ^ {2} } \lambda
 +
, 1 \right ) ,
 +
$$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574085.png" /></td> </tr></table>
+
where  $  \lambda $
 +
is defined by
  
If <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574086.png" /> is a discrete random variable, if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574087.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574088.png" /> are the same, and if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574089.png" /> has the form
+
$$
 +
\sum_{k=1} ^ { n }  \min ( \lambda , {\mathsf E} \xi _ {k}  ^ {2} )  = \epsilon .
 +
$$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574090.png" /></td> </tr></table>
+
If  $  \xi $
 +
is a discrete random variable, if  $  ( \mathfrak X , S _ {\mathfrak X} ) $
 +
and  $  ( \widetilde{\mathfrak X}  , S _ \widetilde{ {\mathfrak X}}  ) $
 +
are the same, and if  $  \rho ( x , \widetilde{x}  ) $
 +
has the form
  
then the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574091.png" />-entropy for <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574092.png" /> is equal to the ordinary entropy defined in (1), that is, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/e/e035/e035740/e03574093.png" />.
+
$$
 +
\rho ( x , \widetilde{x}  )  = \left \{ \begin{array}{l}
 +
 
 +
0 \  \textrm{ if }  x = \widetilde{x}  , \\
 +
 
 +
1 \  \textrm{ if }  x \neq \widetilde{x}  ,
 +
\end{array}
 +
 
 +
\right .$$
 +
 
 +
then the  $  \epsilon $-
 +
entropy for $  \epsilon = 0 $
 +
is equal to the ordinary entropy defined in (1), that is, $  H _ {0} ( \xi ) = H ( \xi ) $.
  
 
====References====
 
====References====
 
<table><TR><TD valign="top">[1]</TD> <TD valign="top">  C. Shannon,  "A mathematical theory of communication"  ''Bell System Techn. J.'' , '''27'''  (1948)  pp. 379–423; 623–656</TD></TR><TR><TD valign="top">[2]</TD> <TD valign="top">  R.G. Gallager,  "Information theory and reliable communication" , Wiley  (1968)</TD></TR><TR><TD valign="top">[3]</TD> <TD valign="top">  T. Berger,  "Rate distortion theory" , Prentice-Hall  (1971)</TD></TR><TR><TD valign="top">[4]</TD> <TD valign="top">  P. Billingsley,  "Ergodic theory and information" , Wiley  (1956)</TD></TR></table>
 
<table><TR><TD valign="top">[1]</TD> <TD valign="top">  C. Shannon,  "A mathematical theory of communication"  ''Bell System Techn. J.'' , '''27'''  (1948)  pp. 379–423; 623–656</TD></TR><TR><TD valign="top">[2]</TD> <TD valign="top">  R.G. Gallager,  "Information theory and reliable communication" , Wiley  (1968)</TD></TR><TR><TD valign="top">[3]</TD> <TD valign="top">  T. Berger,  "Rate distortion theory" , Prentice-Hall  (1971)</TD></TR><TR><TD valign="top">[4]</TD> <TD valign="top">  P. Billingsley,  "Ergodic theory and information" , Wiley  (1956)</TD></TR></table>
 
 
  
 
====Comments====
 
====Comments====
For entropy in the theory of dynamical systems see [[Entropy theory of a dynamical system|Entropy theory of a dynamical system]] and [[Topological entropy|Topological entropy]].
+
For entropy in the theory of dynamical systems, see [[Entropy theory of a dynamical system]] and [[Topological entropy]].

Latest revision as of 13:06, 6 January 2024


An information-theoretical measure of the degree of indeterminacy of a random variable. If $ \xi $ is a discrete random variable defined on a probability space $ ( \Omega , \mathfrak A , {\mathsf P} ) $ and assuming values $ x _ {1} , x _ {2} \dots $ with probability distribution $ \{ {p _ {k} } : {1 , 2 ,\dots } \} $, $ p _ {k} = {\mathsf P} \{ \xi = x _ {k} \} $, then the entropy is defined by the formula

$$ \tag{1 } H ( \xi ) = - \sum_{k=1} ^ \infty p _ {k} \mathop{\rm log} p _ {k} $$

(here it is assumed that $ 0 \mathop{\rm log} 0 = 0 $). The base of the logarithm can be any positive number, but as a rule one takes logarithms to the base 2 or $ e $, which corresponds to the choice of a bit or a nat (natural unit) as the unit of measurement.

If $ \xi $ and $ \eta $ are two discrete random variables taking values $ x _ {1} , x _ {2} \dots $ and $ y _ {1} , y _ {2} \dots $ with probability distributions $ \{ {p _ {k} } : {k = 1 , 2 ,\dots } \} $ and $ \{ {q _ {j} } : {j = 1 , 2 ,\dots } \} $, and if $ \{ {p _ {k\mid } j } : {k = 1 , 2 , . . . } \} $ is the conditional distribution of $ \xi $ assuming that $ \eta = y _ {j} $, $ j = 1 , 2 \dots $ then the (mean) conditional entropy $ H ( \xi \mid \eta ) $ of $ \xi $ given $ \eta $ is defined as

$$ \tag{2 } H ( \xi \mid \eta ) = - \sum_{j=1} ^ \infty q _ {j} \sum_{k=1} ^ \infty p _ {k\mid } j \mathop{\rm log} p _ {k\mid } j . $$

Let $ \xi = \{ {\xi _ {k} } : {k = \dots, - 1 , 0 , 1 ,\dots } \} $ be a stationary process with discrete time and discrete space of values such that $ H ( \xi _ {1} ) < \infty $. Then the entropy (more accurately, the mean entropy) $ \overline{H}\; ( \xi ) $ of this stationary process is defined as the limit

$$ \tag{3 } \overline{H}\; ( \xi ) = \lim\limits _ {n \rightarrow \infty } \frac{1}{n} H ( \xi ^ {n} ) , $$

where $ H ( \xi ^ {n} ) $ is the entropy of the random variable $ \xi ^ {n} = ( \xi _ {1} \dots \xi _ {n} ) $. It is known that the limit on the right-hand side of (3) always exists and that

$$ \tag{4 } \overline{H}\; ( \xi ) = \lim\limits _ {n \rightarrow \infty } H ( \xi _ {n} \mid \xi _ {1} \dots \xi _ {n-} 1 ) , $$

where $ H ( \xi _ {n} \mid \xi _ {1} \dots \xi _ {n-} 1 ) $ is the conditional entropy of $ \xi _ {n} $ given $ \xi ^ {n-} 1 = ( \xi _ {1} \dots \xi _ {n-} 1 ) $. The entropy of stationary processes has important applications in the theory of dynamical systems.

If $ \mu $ and $ \nu $ are two measures on a measurable space $ ( \Omega , \mathfrak A ) $ and if $ \mu $ is absolutely continuous relative to $ \nu $ and $ d \mu / d \nu $ is the corresponding Radon–Nikodým derivative, then the entropy $ H ( d \mu / d \nu ) $ of $ \mu $ relative to $ \nu $ is defined as the integral

$$ \tag{5 } H \left ( \frac{d \mu }{d \nu } \right ) = \int\limits _ \Omega \mathop{\rm log} \frac{d \mu }{d \nu } \nu ( d \omega ) . $$

A special case of the entropy of one measure with respect to another is the differential entropy.

Of the many possible generalizations of the concept of entropy in information theory one of the most important is the following. Let $ \xi $ and $ \widetilde \xi $ be two random variables taking values in certain measurable spaces $ ( \mathfrak X , S _ {\mathfrak X} ) $ and $ ( \widetilde{\mathfrak X} , S _ {\widetilde{\mathfrak X} } ) $. Suppose that the distribution $ p ( \cdot ) $ of $ \xi $ is given and let $ W $ be a class of admissible joint distributions of the pair $ ( \xi , \widetilde \xi ) $ in the set of all probability measures in the product $ ( \mathfrak X \times \widetilde{\mathfrak X} , S _ {\mathfrak X} \times S _ {\widetilde{\mathfrak X} } ) $. Then the $ W $- entropy (or the entropy for a given condition $ W $ of exactness of reproduction of information (cf. Information, exactness of reproducibility of)) is defined as the quantity

$$ \tag{6 } H _ {W} ( \xi ) = \inf I ( \xi , \widetilde \xi ) , $$

where $ I ( \xi , \widetilde \xi ) $ is the amount of information (cf. Information, amount of) in $ \widetilde \xi $ given $ \xi $ and the infimum is taken over all pairs of random variables $ ( \xi , \widetilde \xi ) $ such that the joint distribution $ p ( \cdot , \cdot ) $ of the pair $ ( \xi , \widetilde \xi ) $ belongs to $ W $ and $ \xi $ has the distribution $ p ( \cdot ) $. The class $ W $ of joint distributions $ p ( \cdot , \cdot ) $ is often given by means of a certain non-negative measurable real-valued function $ \rho ( x , \widetilde{x} ) $, $ x \in X $, $ \widetilde{x} \in \widetilde{X} $, a measure of distortion, in the following manner:

$$ \tag{7 } W = \{ {p ( \cdot , \cdot ) } : { {\mathsf E} \rho ( \xi , \widetilde \xi ) \leq \epsilon } \} , $$

where $ \epsilon > 0 $ is fixed. In this case the quantity defined by (6), where $ W $ is given by (7), is called the $ \epsilon $- entropy (or the rate as a function of the distortion) and is denoted by $ H _ \epsilon ( \xi ) $. For example, if $ \xi = ( \xi _ {1} \dots \xi _ {n} ) $ is a Gaussian random vector with independent components, if $ {\mathsf E} \xi _ {k} = 0 $, $ k = 1 \dots n $, and if the function $ \rho ( x , \widetilde{x} ) $, $ x = ( x _ {1} \dots x _ {n} ) $, $ \widetilde{x} = ( \widetilde{x} _ {1} \dots \widetilde{x} _ {n} ) $, has the form

$$ \rho ( x , \widetilde{x} ) = \sum_{k=1} ^ { n } ( x _ {k} - \widetilde{x} _ {k} ) ^ {2} , $$

then $ H _ \epsilon ( \xi ) $ can be found by the formula

$$ H _ \epsilon ( \xi ) = \frac{1}{2} \sum_{k=1} ^ { n } { \mathop{\rm log} \max } \left ( \frac{ {\mathsf E} \xi _ {k} ^ {2} } \lambda , 1 \right ) , $$

where $ \lambda $ is defined by

$$ \sum_{k=1} ^ { n } \min ( \lambda , {\mathsf E} \xi _ {k} ^ {2} ) = \epsilon . $$

If $ \xi $ is a discrete random variable, if $ ( \mathfrak X , S _ {\mathfrak X} ) $ and $ ( \widetilde{\mathfrak X} , S _ \widetilde{ {\mathfrak X}} ) $ are the same, and if $ \rho ( x , \widetilde{x} ) $ has the form

$$ \rho ( x , \widetilde{x} ) = \left \{ \begin{array}{l} 0 \ \textrm{ if } x = \widetilde{x} , \\ 1 \ \textrm{ if } x \neq \widetilde{x} , \end{array} \right .$$

then the $ \epsilon $- entropy for $ \epsilon = 0 $ is equal to the ordinary entropy defined in (1), that is, $ H _ {0} ( \xi ) = H ( \xi ) $.

References

[1] C. Shannon, "A mathematical theory of communication" Bell System Techn. J. , 27 (1948) pp. 379–423; 623–656
[2] R.G. Gallager, "Information theory and reliable communication" , Wiley (1968)
[3] T. Berger, "Rate distortion theory" , Prentice-Hall (1971)
[4] P. Billingsley, "Ergodic theory and information" , Wiley (1956)

Comments

For entropy in the theory of dynamical systems, see Entropy theory of a dynamical system and Topological entropy.

How to Cite This Entry:
Entropy. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Entropy&oldid=15099
This article was adapted from an original article by R.L. DobrushinV.V. Prelov (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article