Proximal point methods in mathematical programming

The proximal point method for finding a zero of a maximal monotone operator $ T : {\mathbf R ^ {n} } \rightarrow { {\mathcal P} ( \mathbf R ^ {n} ) } $ generates a sequence $ \{ x ^ {k} \} $, starting with any $ x ^ {0} \in \mathbf R ^ {n} $, whose iteration formula is given by

$$ \tag{a1 } 0 \in T _ {k} ( x ^ {k + 1 } ) , $$

where $ T _ {k} ( x ) = T ( x ) + \lambda _ {k} ( x - x ^ {k} ) $ and $ \{ \lambda _ {k} \} $ is a bounded sequence of positive real numbers. The origin of the method can be traced back to the study of regularization of ill-posed problems (see [a3]; Regularization method; Ill-posed problems) and is closely related to Moreau–Yosida convolution (see [a5]). A survey on the method can be found in [a4]. The basic properties of the method, as established in [a7], are as follows.
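
For a concrete instance of (a1), suppose $ T ( x ) = Ax - b $ with $ A $ symmetric positive semi-definite, so that $ T = \nabla f $ for the quadratic $ f ( x ) = { \frac{1}{2} } x ^ {T} Ax - b ^ {T} x $. Then (a1) amounts to solving the linear system $ ( A + \lambda _ {k} I ) x ^ {k + 1 } = b + \lambda _ {k} x ^ {k} $. The following minimal sketch implements this special case; the function name and parameter choices are illustrative and not part of the original article.

    import numpy as np

    def proximal_point_affine(A, b, x0, lams, tol=1e-10, max_iter=500):
        """Proximal point iteration (a1) for the affine monotone operator
        T(x) = A @ x - b: each step solves 0 = T(x_next) + lam*(x_next - x)."""
        x = np.asarray(x0, dtype=float)
        n = x.size
        for k in range(max_iter):
            lam = lams[k % len(lams)]  # bounded sequence of positive parameters
            # (a1) with T_k(x) = T(x) + lam*(x - x^k) is here a linear system
            x_next = np.linalg.solve(A + lam * np.eye(n), b + lam * x)
            if np.linalg.norm(x_next - x) < tol:
                return x_next
            x = x_next
        return x

    # The iterates converge to the unique zero of T, i.e. the minimizer of
    # f(x) = 0.5*x'Ax - b'x when A is positive definite.
    A = np.array([[2.0, 0.0], [0.0, 0.5]])
    b = np.array([1.0, 1.0])
    print(proximal_point_affine(A, b, x0=np.zeros(2), lams=[1.0]))  # approx [0.5, 2.0]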

i) The sequence $ \{ x ^ {k} \} $ is well defined (in the sense that $ T _ {k} $ has a unique zero), converges to a zero of $ T $ if such a zero exists, and is unbounded otherwise.

ii) If $ T $ is strongly monotone, then the convergence rate of $ \{ x ^ {k} \} $ is linear, and superlinear when $ {\lim\limits } _ {k \rightarrow \infty } \lambda _ {k} = 0 $. When $ T $ is the subdifferential of a convex polyhedral function $ f $ (i.e., the epigraph of $ f $ is a polyhedron), convergence is finite.

iii) Convergence properties are preserved under inexact computation of the zero of $ T _ {k} $, in the following sense: (a1) is replaced by $ \| {x ^ {k + 1 } - {\widetilde{x} } ^ {k} } \| \leq \varepsilon _ {k} $, where $ 0 \in T _ {k} ( {\widetilde{x} } ^ {k} ) $ and $ \sum _ {k = 0 } ^ \infty \varepsilon _ {k} < \infty $.
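
One way to verify this criterion without knowing $ {\widetilde{x}  }  ^ {k} $ itself: since $ T _ {k} $ is strongly monotone with modulus $ \lambda _ {k} $ and $ 0 \in T _ {k} ( {\widetilde{x}  }  ^ {k} ) $, any $ w \in T _ {k} ( x ^ {k + 1 } ) $ satisfies

$$ \lambda _ {k} \| x ^ {k + 1 } - {\widetilde{x}  }  ^ {k} \| ^ {2} \leq \langle w - 0 , x ^ {k + 1 } - {\widetilde{x}  }  ^ {k} \rangle \leq \| w \| \, \| x ^ {k + 1 } - {\widetilde{x}  }  ^ {k} \| , $$

so $ \| x ^ {k + 1 } - {\widetilde{x}  }  ^ {k} \| \leq { \| w \| } / { \lambda _ {k} } $, and it suffices to stop the inner solver once some $ w \in T _ {k} ( x ^ {k + 1 } ) $ has $ \| w \| \leq \lambda _ {k} \varepsilon _ {k} $.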

When $ T = \partial f $ (the subdifferential of a convex function $ f $), (a1) is equivalent to

$$ \tag{a2 } x ^ {k + 1 } = { \mathop{\rm argmin} } \left \{ f ( x ) + { \frac{\lambda _ {k} }{2} } \left \| {x - x ^ {k} } \right \| ^ {2} \right \} $$

and the zeros of $ T $ are the minimizers of $ f $, so that $ \{ x ^ {k} \} $ converges to a minimizer of $ f $. If $ T $ is the subdifferential of the dual objective of a convex optimization problem, then the sequence $ \{ x ^ {k} \} $ coincides with the dual sequence generated by the augmented Lagrangian method (see [a6]).
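
As a concrete instance of (a2), assume $ f ( x ) = \| x \| _ {1} $ (a polyhedral function); then the minimization in (a2) has a closed-form componentwise solution, soft-thresholding with threshold $ {1 / \lambda _ {k} } $. A minimal sketch of one such step (names are illustrative):

    import numpy as np

    def prox_step_l1(x_k, lam_k):
        """One step of (a2) for f(x) = ||x||_1:
        argmin_x ||x||_1 + (lam_k/2)*||x - x_k||^2,
        solved componentwise by soft-thresholding with threshold 1/lam_k."""
        t = 1.0 / lam_k
        return np.sign(x_k) * np.maximum(np.abs(x_k) - t, 0.0)

    # With lam_k = 2 (threshold 0.5) the iterates reach the minimizer x = 0 of
    # ||x||_1 after finitely many steps, matching the finite convergence noted
    # in property ii) for polyhedral f.
    print(prox_step_l1(np.array([1.2, -0.3, 0.7]), lam_k=2.0))  # [0.7, 0.0, 0.2]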

All these properties hold also for maximal monotone operators in a Hilbert space, in which case convergence is understood in the sense of the weak topology.

The proximal point method can be applied to problems with convex constraints, e.g. the variational inequality problem $ { \mathop{\rm VI} } ( T,C ) $, for a closed and convex set $ C \subset \mathbf R ^ {n} $, which consists of finding a $ z \in C $ such that there exists a $ u \in T ( z ) $ satisfying $ \langle {u,x - z } \rangle \geq 0 $ for all $ x \in C $. When $ T = \partial f $ with convex $ f $, $ { \mathop{\rm VI} } ( T,C ) $ reduces to minimizing $ f ( x ) $ subject to $ x \in C $. To apply the proximal point method to $ { \mathop{\rm VI} } ( T,C ) $, it suffices to replace $ T $ by $ T + N _ {C} $, where $ N _ {C} $ is the normal cone operator of $ C $. In such a case, (a1) becomes equivalent to stating that $ x ^ {k + 1 } $ solves $ { \mathop{\rm VI} } ( T _ {k} ,C ) $ and (a2) becomes

$$ x ^ {k + 1 } = { \mathop{\rm argmin} } \left \{ {f ( x ) + { \frac{\lambda _ {k} }{2} } \left \| {x - x ^ {k} } \right \| ^ {2} } : {x \in C } \right \} . $$
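
As an illustration, assume $ C $ is the non-negative orthant and $ f ( x ) = c ^ {T} x $ is linear; then the constrained minimization above separates by coordinate, and the step is the unconstrained minimizer $ x ^ {k} - { c / \lambda _ {k} } $ projected onto $ C $. A sketch under these assumptions (names are illustrative):

    import numpy as np

    def constrained_prox_step(x_k, c, lam_k):
        """Constrained analogue of (a2) with C = {x : x >= 0} and f(x) = c @ x:
        argmin { c @ x + (lam_k/2)*||x - x_k||^2 : x >= 0 }.
        The problem separates by coordinate, so the unconstrained minimizer
        x_k - c/lam_k is simply projected onto the orthant."""
        return np.maximum(x_k - c / lam_k, 0.0)

    # Minimizing c @ x over x >= 0 with c >= 0 has solution x = 0, and the
    # iterates move monotonically toward it.
    print(constrained_prox_step(np.array([1.0, 2.0]), np.array([0.5, 3.0]), lam_k=1.0))
    # -> [0.5, 0.0]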

There is another way to deal with constrained problems in the context of the proximal point method, without introducing the normal cone of $ C $, for the case when $ C $ has non-empty interior and when a boundary-coercive Bregman function $ g $ with zone $ C $ is available. In this case the method starts with $ x ^ {0} \in C ^ {o} $ (the interior of $ C $) and the iterative formula is still (a1) but now $ T _ {k} ( x ) = T ( x ) + \lambda _ {k} ( \nabla g ( x ) - \nabla g ( x ^ {k} ) ) $. In the optimization case, i.e. $ T = \partial f $, this formula is equivalent to

$$ x ^ {k + 1 } = { \mathop{\rm argmin} } \{ f ( x ) + \lambda _ {k} D _ {g} ( x,x ^ {k} ) \} $$

where $ D _ {g} $ is the Bregman distance associated with $ g $. This generalization of the proximal point method is an interior point algorithm, in the sense that the sequence $ \{ x ^ {k} \} $ lies in the interior of $ C $ (cf. also Interior-point methods in mathematical programming). Under certain technical hypotheses on $ T $, satisfied, e.g., when $ T = \partial f $, property i) above still holds, i.e. the sequence $ \{ x ^ {k} \} $ converges to a solution of $ { \mathop{\rm VI} } ( T,C ) $ when a solution exists and is unbounded otherwise, [a8]. A property of this generalization, not shared by the original proximal point method (a1), is that for certain problems, including linear programming and quadratic programming, the limit of the sequence $ \{ x ^ {k} \} $ is the solution of the problem which minimizes the Bregman distance to the initial iterate $ x ^ {0} $ (see [a2], [a10]). This procedure also generates augmented Lagrangian-type methods when $ T $ is the subdifferential of the dual objective of a convex optimization problem. For instance, when $ C $ is the non-negative orthant of $ \mathbf R ^ {n} $ and $ g ( x ) = \sum _ {j = 1 } ^ {n} x _ {j} { \mathop{\rm log} } x _ {j} $, the sequence $ \{ x ^ {k} \} $ coincides with the dual sequence of the Bertsekas exponential multipliers method (see [a1], [a11]).
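
For the entropy kernel just mentioned, $ \nabla g ( x ) _ {j} = 1 + { \mathop{\rm log} } x _ {j} $, so when $ f ( x ) = c ^ {T} x $ is linear the optimality condition $ c _ {j} + \lambda _ {k} ( { \mathop{\rm log} } x _ {j} ^ {k + 1 } - { \mathop{\rm log} } x _ {j} ^ {k} ) = 0 $ gives the multiplicative update $ x _ {j} ^ {k + 1 } = x _ {j} ^ {k} { \mathop{\rm exp} } ( - { c _ {j} / \lambda _ {k} } ) $, which keeps every iterate in the interior of $ C $. A minimal sketch under these assumptions (names are illustrative):

    import numpy as np

    def bregman_prox_step_entropy(x_k, c, lam_k):
        """Bregman proximal step with g(x) = sum_j x_j*log(x_j) (zone: the
        non-negative orthant) and a linear objective f(x) = c @ x.
        Solving c + lam_k*(grad g(x) - grad g(x_k)) = 0 componentwise gives
        x_j = x_k_j * exp(-c_j/lam_k), so the iterates stay strictly positive."""
        return x_k * np.exp(-c / lam_k)

    # Starting from the interior point (1, 1), coordinates with positive cost
    # shrink while the iterates remain in the interior of C.
    x = np.array([1.0, 1.0])
    for k in range(3):
        x = bregman_prox_step_entropy(x, c=np.array([1.0, 0.0]), lam_k=2.0)
    print(x)  # approx [0.2231, 1.0]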

Finally, some results are available for the proximal point method with Bregman distances under inexact solutions of the subproblems, in the spirit of property iii) above, [a9]. Convergence is guaranteed for the sequence generated by $ 0 \in {\widehat{T} } _ {k} ( x ^ {k + 1 } ) $, with $ {\widehat{T} } _ {k} ( x ) = T ^ {\varepsilon _ {k} } ( x ) + \lambda _ {k} ( \nabla g ( x ) - \nabla g ( x ^ {k} ) ) $, where $ \sum _ {k = 0 } ^ \infty \varepsilon _ {k} < \infty $ and, for any monotone operator $ T $ and any $ \varepsilon \geq 0 $, $ T ^ \varepsilon $ is defined as

$$ T ^ \varepsilon ( x ) = \left \{ u \in \mathbf R ^ {n} : \left \langle u - v , x - y \right \rangle \geq - \varepsilon \textrm{ for all } y \in \mathbf R ^ {n} , v \in T ( y ) \right \} . $$
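
As a small check of this definition, take $ n = 1 $ and $ T ( y ) = y $ (the subdifferential of $ y ^ {2} / 2 $). Then $ u \in T ^ \varepsilon ( x ) $ requires $ {\inf } _ {y} ( u - y ) ( x - y ) \geq - \varepsilon $; the infimum is attained at $ y = { ( u + x ) / 2 } $ with value $ - { ( u - x ) ^ {2} / 4 } $, so

$$ T ^ \varepsilon ( x ) = [ x - 2 \sqrt \varepsilon , x + 2 \sqrt \varepsilon ] , $$

which collapses to $ T ( x ) = \{ x \} $ for $ \varepsilon = 0 $; in general $ T ( x ) \subset T ^ \varepsilon ( x ) $ and $ T ^ {0} = T $ for maximal monotone $ T $.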

References

[a1] D. Bertsekas, "Constrained optimization and Lagrange multipliers" , Acad. Press (1982)
[a2] A.N. Iusem, "On some properties of generalized proximal point methods for quadratic and linear programming" J. Optimization Th. Appl. , 85 (1995) pp. 593–612
[a3] M.A. Krasnoselskii, "Two observations about the method of successive approximations" Uspekhi Mat. Nauk , 10 (1955) pp. 123–127 (In Russian)
[a4] B. Lemaire, "The proximal algorithm" J.P. Penot (ed.) , Internat. Ser. Numer. Math. , 87 , Birkhäuser (1989) pp. 73–87
[a5] J. Moreau, "Proximité et dualité dans un espace hilbertien" Bull. Soc. Math. France , 93 (1965) pp. 273–299
[a6] R.T. Rockafellar, "Augmented Lagrangians and applications of the proximal point algorithm in convex programming" Math. of Oper. Res. , 1 (1976) pp. 97–116
[a7] R.T. Rockafellar, "Monotone operators and the proximal point algorithm" SIAM J. Control Optim. , 14 (1976) pp. 877–898
[a8] R.S. Burachik, A.N. Iusem, "A generalized proximal point algorithm for the variational inequality in a Hilbert space" SIAM J. Optim. (to appear)
[a9] R.S. Burachik, A.N. Iusem, B.F. Svaiter, "Enlargement of monotone operators with applications to variational inequalities" Set Valued Anal. (to appear)
[a10] A.N. Iusem, "On some properties of generalized proximal point methods for variational inequalities" J. Optimization Th. Appl. (to appear)
[a11] A.N. Iusem, B.F. Svaiter, M. Teboulle, "Entropy-like proximal methods in convex programming" Math. of Oper. Res. , 19 (1994) pp. 790–814
This article was adapted from an original article by A.N. Iusem (originator), which appeared in Encyclopedia of Mathematics, ISBN 1402006098.