Bregman function

Let $ C $ be a closed convex subset of $ \mathbf R ^ {n} $ and $ C ^ {o} $ its interior. Consider a real-valued convex function (cf. also convex function (of a real variable)) $ f $ whose effective domain contains $ C $ and let $ {D _ {f} } : {C \times C ^ {o} } \rightarrow \mathbf R $ be defined as

$$ D _ {f} ( x,y ) = f ( x ) - f ( y ) - \left \langle {\nabla f ( y ) ,x - y } \right \rangle . $$

$ f $ is said to be a Bregman function with zone $ C $ (and $ D _ {f} $ the Bregman distance induced by $ f $) if the following conditions hold:

B1) $ f $ is continuously differentiable on $ C ^ {o} $;

B2) $ f $ is strictly convex and continuous on $ C $;

B3) for all $ \delta \in \mathbf R $ the partial level sets $ \Gamma ( x, \delta ) = \{ {y \in C ^ {o} } : {D _ {f} ( x,y ) \leq \delta } \} $ are bounded for all $ x \in C $;

B4) if $ \{ y ^ {k} \} \subset C ^ {o} $ converges to $ y ^ {*} $, then $ D _ {f} ( y ^ {*} ,y ^ {k} ) $ converges to $ 0 $;

B5) if $ \{ x ^ {k} \} \subset C $ and $ \{ y ^ {k} \} \subset C ^ {o} $ are sequences such that $ \{ x ^ {k} \} $ is bounded, $ {\lim\limits } _ {k \rightarrow \infty } y ^ {k} = y ^ {*} $ and $ {\lim\limits } _ {k \rightarrow \infty } D _ {f} ( x ^ {k} ,y ^ {k} ) = 0 $, then $ {\lim\limits } _ {k \rightarrow \infty } x ^ {k} = y ^ {*} $.
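
Note that $ D _ {f} ( x,y ) \geq 0 $ for all $ x \in C $ and $ y \in C ^ {o} $, with equality if and only if $ x = y $: by the gradient inequality for the convex function $ f $,

$$ f ( x ) \geq f ( y ) + \left \langle {\nabla f ( y ) ,x - y } \right \rangle , $$

and this inequality is strict for $ x \neq y $ when $ f $ is strictly convex, as required in B2). On the other hand, $ D _ {f} $ is in general neither symmetric nor does it satisfy the triangle inequality, so it is not a metric.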

Bregman functions were introduced in [a1]. B4) and B5) hold automatically when $ x ^ {k} ,y ^ {*} $ are in $ C ^ {o} $, as a consequence of B1), B2) and B3), and so they need to be checked only at points on the boundary $ \partial C $ of $ C $. When $ C = \mathbf R ^ {n} $, a sufficient condition for a strictly convex differentiable function $ f $ to be a Bregman function is

$$ {\lim\limits } _ {\left \| x \right \| \rightarrow \infty } { \frac{f ( x ) }{\left \| x \right \| } } = \infty $$

(see [a2]).
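
For instance, the quadratic function $ f ( x ) = x ^ {t} Mx $ of example i) below, with $ M $ symmetric and positive definite, satisfies this condition, since

$$ { \frac{f ( x ) }{\left \| x \right \| } } \geq \lambda _ {\min } ( M ) \left \| x \right \| , $$

where $ \lambda _ {\min } ( M ) > 0 $ is the smallest eigenvalue of $ M $, and the right-hand side tends to $ \infty $ as $ \left \| x \right \| \rightarrow \infty $.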

A Bregman function $ f $ is said to be boundary coercive if for all $ \{ y ^ {k} \} \subset C ^ {o} $ such that $ {\lim\limits } _ {k \rightarrow \infty } y ^ {k} = y \in \partial C $ one has $ {\lim\limits } _ {k \rightarrow \infty } D _ {f} ( x,y ^ {k} ) = \infty $ for all $ x \in C ^ {o} $, and zone coercive if the image of $ C ^ {o} $ under $ \nabla f $ is equal to $ \mathbf R ^ {n} $. Zone coerciveness implies boundary coerciveness (see [a3]). These notions are closely related to essential smoothness, as defined in [a5]. For a boundary-coercive Bregman function $ f $ the zone $ C $ is uniquely determined from $ f $, i.e. $ f $ cannot be finitely extended outside $ C $. This property is essential in most applications of Bregman functions.
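
For instance, for the negative entropy function of example ii) below, $ f ( x ) = \sum _ {j = 1 } ^ {n} x _ {j} { \mathop{\rm log} } x _ {j} $ with zone $ C = \mathbf R _ {+} ^ {n} $, one has

$$ ( \nabla f ( x ) ) _ {j} = { \mathop{\rm log} } x _ {j} + 1, \quad 1 \leq j \leq n , $$

so $ \nabla f $ maps $ C ^ {o} $ onto $ \mathbf R ^ {n} $ and $ f $ is zone coercive (hence also boundary coercive).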

Examples.

$ \mathbf R ^ {n} _ {+} $ denotes the non-negative orthant of $ \mathbf R ^ {n} $.

i) $ C = \mathbf R ^ {n} $, $ f ( x ) = \| x \| ^ {2} $. In this case $ D _ {f} ( x,y ) = \| {x - y } \| ^ {2} $. More generally, $ f ( x ) = x ^ {t} Mx $, with $ M \in \mathbf R ^ {n \times n } $ symmetric and positive definite, in which case $ D _ {f} ( x,y ) = ( x - y ) ^ {t} M ( x - y ) $.
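
Indeed, for $ f ( x ) = \| x \| ^ {2} $ one has $ \nabla f ( y ) = 2y $, so that

$$ D _ {f} ( x,y ) = \left \| x \right \| ^ {2} - \left \| y \right \| ^ {2} - \left \langle {2y,x - y } \right \rangle = \left \| {x - y } \right \| ^ {2} , $$

and the quadratic case $ f ( x ) = x ^ {t} Mx $ is handled in the same way, using $ \nabla f ( y ) = 2My $.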

ii) $ C = \mathbf R _ {+} ^ {n} $, $ f ( x ) = \sum _ {j = 1 } ^ {n} x _ {j} { \mathop{\rm log} } x _ {j} $, extended by continuity to $ \partial \mathbf R _ {+} ^ {n} $ with the convention that $ 0 { \mathop{\rm log} } 0 = 0 $. In this case

$$ D _ {f} ( x,y ) = \sum _ {j = 1 } ^ { n } \left ( x _ {j} { \mathop{\rm log} } { \frac{x _ {j} }{y _ {j} } } + y _ {j} - x _ {j} \right ) , $$

which is the Kullback–Leibler information divergence, widely used in statistics (see Kullback–Leibler-type distance measures; [a4]).
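
This expression follows directly from the definition of $ D _ {f} $: here $ ( \nabla f ( y ) ) _ {j} = { \mathop{\rm log} } y _ {j} + 1 $, so

$$ D _ {f} ( x,y ) = \sum _ {j = 1 } ^ { n } x _ {j} { \mathop{\rm log} } x _ {j} - \sum _ {j = 1 } ^ { n } y _ {j} { \mathop{\rm log} } y _ {j} - \sum _ {j = 1 } ^ { n } ( { \mathop{\rm log} } y _ {j} + 1 ) ( x _ {j} - y _ {j} ) = \sum _ {j = 1 } ^ { n } \left ( x _ {j} { \mathop{\rm log} } { \frac{x _ {j} }{y _ {j} } } + y _ {j} - x _ {j} \right ) . $$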

iii) $ C = \mathbf R _ {+} ^ {n} $, $ f ( x ) = \sum _ {j = 1 } ^ {n} ( x _ {j} ^ \alpha - x _ {j} ^ \beta ) $ with $ \alpha \geq 1 $, $ 0 < \beta < 1 $. For $ \alpha = 2 $, $ \beta = {1 / 2 } $ one has

$$ D _ {f} ( x,y ) = \left \| {x - y } \right \| ^ {2} + \sum _ {j = 1 } ^ { n } \left [ { \frac{( \sqrt {x _ {j} } - \sqrt {y _ {j} } ) ^ {2} }{2 \sqrt {y _ {j} } } } \right ] , $$

and for $ \alpha = 1 $, $ \beta = 1/2 $ one has

$$ D _ {f} ( x,y ) = \sum _ {j = 1 } ^ { n } \left [ { \frac{( \sqrt {x _ {j} } - \sqrt {y _ {j} } ) ^ {2} }{2 \sqrt {y _ {j} } } } \right ] . $$
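
Note that in the case $ \alpha = 1 $ the linear part $ \sum _ {j = 1 } ^ {n} x _ {j} $ of $ f $ makes no contribution to $ D _ {f} $: for any affine function $ a $,

$$ a ( x ) - a ( y ) - \left \langle {\nabla a ( y ) ,x - y } \right \rangle = 0 , $$

so that $ D _ {f + a } = D _ {f} $. This is why only the terms $ - \sqrt {x _ {j} } $ appear in the last expression and, more generally, why Bregman distances are unchanged when an affine function is added to $ f $.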

iv) $ C $ is a box (i.e., $ C = [ a _ {1} ,b _ {1} ] \times \dots \times [ a _ {n} ,b _ {n} ] $ with $ a _ {j} < b _ {j} $, $ 1 \leq j \leq n $),

$$ f ( x ) = \sum _ {j = 1 } ^ { n } [ ( x _ {j} - a _ {j} ) { \mathop{\rm log} } ( x _ {j} - a _ {j} ) + ( b _ {j} - x _ {j} ) { \mathop{\rm log} } ( b _ {j} - x _ {j} ) ] . $$

In this case

$$ D _ {f} ( x,y ) = \sum _ {j = 1 } ^ { n } ( x _ {j} - a _ {j} ) { \mathop{\rm log} } \left ( { \frac{x _ {j} - a _ {j} }{y _ {j} - a _ {j} } } \right ) + \sum _ {j = 1 } ^ { n } ( b _ {j} - x _ {j} ) { \mathop{\rm log} } \left ( { \frac{b _ {j} - x _ {j} }{b _ {j} - y _ {j} } } \right ) . $$
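
This expression is obtained as in example ii), using

$$ ( \nabla f ( x ) ) _ {j} = { \mathop{\rm log} } \left ( { \frac{x _ {j} - a _ {j} }{b _ {j} - x _ {j} } } \right ) ; $$

each coordinate contributes one Kullback–Leibler-type term for the distance to the lower bound $ a _ {j} $ and one for the distance to the upper bound $ b _ {j} $.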

v) $ C $ is a polyhedron with non-empty interior (i.e., $ C = \{ {x \in \mathbf R ^ {n} } : {Ax \leq b } \} $ with $ A \in \mathbf R ^ {m \times n } $, $ b \in \mathbf R ^ {m} $ and $ { \mathop{\rm rank} } ( A ) = n $, so that $ m \geq n $), $ f ( x ) = \sum _ {i = 1 } ^ {m} ( b _ {i} - \langle {a ^ {i} ,x } \rangle ) { \mathop{\rm log} } ( b _ {i} - \langle {a ^ {i} ,x } \rangle ) $, where $ a ^ {i} $ ($ 1 \leq i \leq m $) are the rows of $ A $. In this case

$$ D _ {f} ( x,y ) = \sum _ {i = 1 } ^ { m } \left [ ( b _ {i} - \left \langle {a ^ {i} ,x } \right \rangle ) { \mathop{\rm log} } \left ( { \frac{b _ {i} - \left \langle {a ^ {i} ,x } \right \rangle }{b _ {i} - \left \langle {a ^ {i} ,y } \right \rangle } } \right ) + \left \langle {a ^ {i} ,x - y } \right \rangle \right ] . $$
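
This follows from the definition as in the previous examples, using

$$ \nabla f ( x ) = - \sum _ {i = 1 } ^ { m } ( { \mathop{\rm log} } ( b _ {i} - \left \langle {a ^ {i} ,x } \right \rangle ) + 1 ) a ^ {i} ; $$

each row $ a ^ {i} $ of $ A $ contributes a Kullback–Leibler-type term in the slack variable $ b _ {i} - \left \langle {a ^ {i} ,x } \right \rangle $.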

All the Bregman functions in the above examples are zone coercive, except for iii) with $ \alpha = 1 $, which is only boundary coercive.
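
The closed-form expressions above can also be checked numerically against the defining formula for $ D _ {f} $. The following is a minimal sketch (not part of the original article; the helper names are illustrative only) for examples i) and ii):

```
# Minimal numerical sketch: compare the defining formula
#   D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>
# with the closed forms of examples i) and ii). Names are illustrative only.
import numpy as np

def bregman(f, grad_f, x, y):
    return f(x) - f(y) - np.dot(grad_f(y), x - y)

x = np.array([0.3, 1.7, 2.0])
y = np.array([1.0, 0.5, 4.0])

# Example i): f(x) = ||x||^2 gives D_f(x, y) = ||x - y||^2
f1 = lambda z: np.dot(z, z)
g1 = lambda z: 2.0 * z
print(np.isclose(bregman(f1, g1, x, y), np.sum((x - y) ** 2)))                # True

# Example ii): negative entropy gives the Kullback-Leibler-type distance
f2 = lambda z: np.sum(z * np.log(z))
g2 = lambda z: np.log(z) + 1.0
print(np.isclose(bregman(f2, g2, x, y), np.sum(x * np.log(x / y) + y - x)))   # True
```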

Bregman functions are used in algorithms for convex feasibility problems and linearly constrained convex optimization (cf. Bregman distance), as well as for generalizations of the proximal point method for convex optimization (cf. Proximal point methods in mathematical programming).
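
As an illustration of the latter use, a Bregman generalization of the proximal point method replaces the quadratic term of the classical method by $ D _ {f} $, iterating $ x ^ {k + 1 } = { \mathop{\rm argmin} } _ {x \in C } \{ g ( x ) + c _ {k} D _ {f} ( x,x ^ {k} ) \} $ for a convex objective $ g $. The following is a minimal sketch under simplifying assumptions (fixed $ c _ {k} $, inner subproblems solved by a general-purpose routine; not taken from the references), using the Kullback–Leibler distance of example ii):

```
# Minimal sketch of a Bregman proximal point iteration on the non-negative
# orthant, with the Kullback-Leibler distance of example ii). Illustrative only.
import numpy as np
from scipy.optimize import minimize

def kl_distance(x, y):
    # D_f(x, y) = sum_j ( x_j log(x_j / y_j) + y_j - x_j ), with 0 log 0 = 0
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(x * np.log(np.maximum(x, 1e-300) / y) + y - x)

def bregman_prox_step(g, y, c):
    # x^{k+1} = argmin_{x > 0}  g(x) + c * D_f(x, y), started from y
    obj = lambda x: g(x) + c * kl_distance(x, y)
    res = minimize(obj, y, bounds=[(1e-12, None)] * len(y))
    return res.x

g = lambda x: np.sum((x - np.array([2.0, 0.5])) ** 2)   # a simple convex objective
x = np.array([1.0, 1.0])
for _ in range(30):
    x = bregman_prox_step(g, x, c=1.0)
print(x)   # close to the constrained minimizer (2, 0.5)
```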

References

[a1] L.M. Bregman, "The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming" USSR Comput. Math. Math. Phys. , 7 : 3 (1967) pp. 200–217 (In Russian)
[a2] A.R. de Pierro, A.N. Iusem, "A relaxed version of Bregman's method for convex programming" J. Optimization Th. Appl. , 51 (1986) pp. 421–440
[a3] A.N. Iusem, "On some properties of generalized proximal point methods for quadratic and linear programming" J. Optimization Th. Appl. , 85 (1995) pp. 593–612
[a4] F. Liese, I. Vajda, "Convex statistical distances" , Teubner (1987)
[a5] R.T. Rockafellar, "Convex analysis" , Princeton Univ. Press (1970)
This article was adapted from an original article by A.N. Iusem (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.