# Proximal point methods in mathematical programming

The proximal point method for finding a zero of a maximal monotone operator $ T : {\mathbf R ^ {n} } \rightarrow { {\mathcal P} ( \mathbf R ^ {n} ) } $
generates a sequence $ \{ x ^ {k} \} $,
starting with any $ x ^ {0} \in \mathbf R ^ {n} $,
whose iteration formula is given by

$$ \tag{a1 } 0 \in T _ {k} ( x ^ {k + 1 } ) , $$

where $ T _ {k} ( x ) = T ( x ) + \lambda _ {k} ( x - x ^ {k} ) $ and $ \{ \lambda _ {k} \} $ is a bounded sequence of positive real numbers. The origin of the method can be traced back to the study of regularization of ill-posed problems (see [a3]; Regularization method; Ill-posed problems) and is closely related to Moreau–Yoshida convolution (see [a5]). A survey on the method can be found in [a4]. The basic properties of the method, as established in [a7], are as follows.

i) The sequence $ \{ x ^ {k} \} $ is well defined (in the sense that $ T _ {k} $ has a unique zero), converges to a zero of $ T $ if such a zero exists, and is unbounded otherwise.

ii) If $ T $ is strongly monotone, then the convergence rate of $ \{ x ^ {k} \} $ is linear, and super-linear when $ {\lim\limits } _ {k \rightarrow \infty } \lambda _ {k} = 0 $. When $ T $ is the subdifferential of a convex polyhedral function $ f $( i.e. the epigraph of $ f $ is a polyhedron), then convergence is finite.

iii) Convergence properties are preserved under inexact computation of the zero of $ T _ {k} $, in the following sense: (a1) is replaced by $ \| {x ^ {k + 1 } - {\widetilde{x} } ^ {k} } \| \leq \varepsilon _ {k} $, where $ 0 \in T _ {k} ( {\widetilde{x} } ^ {k} ) $ and $ \sum _ {k = 0 } ^ \infty \varepsilon _ {k} < \infty $.

When $ T = \partial f $( the subdifferential of a convex function $ f $), then (a1) is equivalent to

$$ \tag{a2 } x ^ {k + 1 } = { \mathop{\rm argmin} } \left \{ f ( x ) + { \frac{\lambda _ {k} }{2} } \left \| {x - x ^ {k} } \right \| ^ {2} \right \} $$

and the zeros of $ T $ are the minimizers of $ f $, so that $ \{ x ^ {k} \} $ converges to a minimizer of $ f $. If $ T $ is the subdifferential of the dual objective of a convex optimization problem, then the sequence $ \{ x ^ {k} \} $ coincides with the dual sequence generated by the augmented Lagrangian method (see [a6]).

All these properties hold also for maximal monotone operators in a Hilbert space, in which case convergence is understood in the sense of the weak topology.

The proximal point method can be applied to problems with convex constraints, e.g. the variational inequality problem $ { \mathop{\rm VI} } ( T,C ) $, for a closed and convex set $ C \subset \mathbf R ^ {n} $, which consists of finding a $ z \in C $ such that there exists an $ u \in T ( z ) $ satisfying $ \langle {u,x - z } \rangle \geq 0 $ for all $ x \in C $. When $ T = \partial f $ with convex $ f $, $ { \mathop{\rm VI} } ( T,C ) $ reduces to minimizing $ f ( x ) $ subject to $ x \in C $. For applying the proximal point method to $ { \mathop{\rm VI} } ( T,C ) $ it suffices to replace $ T $ by $ T + N _ {C} $, where $ N _ {C} $ is the normal cone operator of $ C $. In such a case, (a1) becomes equivalent to stating that $ x ^ {k + 1 } $ solves $ { \mathop{\rm VI} } ( T,C ) $ and (a2) becomes

$$ x ^ {k + 1 } = { \mathop{\rm argmin} } \left \{ {f ( x ) + { \frac{\lambda _ {k} }{2} } \left \| {x - x ^ {k} } \right \| ^ {2} } : {x \in C } \right \} . $$

There is another way to deal with constrained problems in the context of the proximal point method, without introducing the normal cone of $ C $, for the case when $ C $ has non-empty interior and when a boundary-coercive Bregman function $ g $ with zone $ C $ is available. In this case the method starts with $ x ^ {0} \in C ^ {o} $( the interior of $ C $) and the iterative formula is still (a1) but now $ T _ {k} ( x ) = T ( x ) + \lambda _ {k} ( \nabla g ( x ) - \nabla g ( x ^ {k} ) ) $. In the optimization case, i.e. $ T = \partial f $, this formula is equivalent to

$$ x ^ {k + 1 } = { \mathop{\rm argmin} } \{ f ( x ) + \lambda _ {k} D _ {g} ( x,x ^ {k} ) \} $$

where $ D _ {g} $ is the Bregman distance associated with $ g $. This generalization of the proximal point method is an interior point algorithm, in the sense that the sequence $ \{ x ^ {k} \} $ lies in the interior of $ C $( cf. also Interior-point methods in mathematical programming). Under certain technical hypotheses on $ T $, satisfied, e.g., when $ T = \partial f $, property i) above still holds, i.e. the sequence $ \{ x ^ {k} \} $ converges to a solution of $ { \mathop{\rm VI} } ( T,C ) $ when a solution exists and is unbounded otherwise, [a8]. A property of this generalization, not shared by the original proximal point method (a1), is that for certain problems, including linear programming and quadratic programming, the limit of the sequence $ \{ x ^ {k} \} $ is the solution of the problem which minimizes the Bregman distance to the initial iterate $ x ^ {0} $( see [a2], [a10]). This procedure also generates augmented Lagrangian-type methods when $ T $ is the subdifferential of the dual objective of a convex optimization problem. For instance, when $ C $ is the non-negative orthant of $ \mathbf R ^ {n} $ and $ g ( x ) = \sum _ {j = 1 } ^ {n} x _ {j} { \mathop{\rm log} } x _ {j} $, the sequence $ \{ x ^ {k} \} $ coincides with the dual sequence of the Bertsekas exponential multipliers method (see [a1], [a11]).

Finally, some results are available for the proximal point method with Bregman distances under inexact solutions of the subproblems, in the spirit of property iii) above, [a9]. Convergence is guaranteed for the sequence generated by $ 0 \in {\widehat{T} } _ {k} ( x ^ {k + 1 } ) $, with $ {\widehat{T} } _ {k} ( x ) = T ^ {\varepsilon _ {k} } ( x ) + \lambda _ {k} ( \nabla g ( x ) - \nabla g ( x ^ {k} ) ) $, where $ \sum _ {k = 0 } ^ \infty \varepsilon _ {k} < \infty $ and, for any monotone operator $ T $ and any $ \varepsilon \geq 0 $, $ T ^ \varepsilon $ is defined as

$$ T ^ \varepsilon ( x ) = \{ {u \in \mathbf R ^ {n} } : $$

$$ \ {} {\left \langle {u - v,x - y } \right \rangle \geq - \varepsilon \textrm{ for all } y \in \mathbf R ^ {n} , v \in T ( y ) } \} . $$

#### References

[a1] | D. Bertsekas, "Constrained optimization and Lagrange multipliers" , Acad. Press (1982) |

[a2] | A.N. Iusem, "On some properties of generalized proximal point methods for quadratic and linear programming" J. Optimization Th. Appl. , 85 (1995) pp. 593–612 |

[a3] | M.A. Krasnoselskii, "Two observations about the method of successive approximations" Uspekhi Mat. Nauk , 10 (1955) pp. 123–127 (In Russian) |

[a4] | B. Lemaire, "The proximal algorithm" J.P. Penot (ed.) , Internat. Ser. Numer. Math. , 87 , Birkhäuser (1989) pp. 73–87 |

[a5] | J. Moreau, "Proximité et dualité dans un espace hilbertien" Bull. Soc. Math. France , 93 (1965) pp. 273–299 |

[a6] | R.T. Rockafellar, "Augmented Lagrangians and applications of the proximal point algorithm in convex programming" Math. of Oper. Res. , 1 (1976) pp. 97–116 |

[a7] | R.T. Rockafellar, "Monotone operators and the proximal point algorithm" SIAM J. Control Optim. , 14 (1976) pp. 877–898 |

[a8] | R.S. Burachik, A.N. Iusem, "A generalized proximal point algorithm for the variational inequality in a Hilbert space" SIAM J. Optim. (to appear) |

[a9] | R.S. Burachik, A.N. Iusem, B.F. Svaiter, "Enlargement of monotone operators with applications to variational inequalities" Set Valued Anal. (to appear) |

[a10] | A.N. Iusem, "On some properties of generalized proximal point methods for variational inequalities" J. Optim. Th. & Appl. (to appear) |

[a11] | A.N. Iusem, B.F. Svaiter, M. Teboulle, "Entropy-like proximal methods in convex programming" Math. of Oper. Res. , 19 (1994) pp. 790–814 |

**How to Cite This Entry:**

Proximal point methods in mathematical programming.

*Encyclopedia of Mathematics.*URL: http://encyclopediaofmath.org/index.php?title=Proximal_point_methods_in_mathematical_programming&oldid=48336