Latest revision as of 07:21, 13 February 2024
BFGS method
The unconstrained optimization problem is to minimize a real-valued function $f$ of $N$ variables. That is, to find a local minimizer, i.e. a point $x ^ { * }$ such that
\begin{equation} \tag{a1} f ( x ^ { * } ) \leq f ( x ) \text { for all } x \text{ near } x ^ { * }; \end{equation}
$f$ is typically called the cost function or objective function.
A classic approach for doing this is the method of steepest descent (cf. Steepest descent, method of). The iteration, here described in terms of the transition from a current approximation $x _ { c }$ to a local minimizer $x ^ { * }$, to an updated (and hopefully better) approximation $x _ { + }$, is
\begin{equation} \tag{a2} x _ { + } = x _ { c } - \lambda \nabla f ( x _ { c } ). \end{equation}
By elementary calculus, $- \nabla f ( x _ { c } )$ is the direction of most rapid decrease (steepest descent) in $f$ starting from $x _ { c }$. The step length $\lambda$ must be chosen by the algorithm to ensure that $f ( x _ { + } ) < f ( x _ { c } )$ (which holds for all sufficiently small $\lambda$).
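A minimal Python sketch of one step of iteration (a2); the quadratic cost function and the fixed step length here are illustrative choices, not part of the article:

```python
import numpy as np

# Illustrative cost function f(x) = x_0^2 + 10*x_1^2 and its gradient.
def f(x):
    return x[0]**2 + 10.0 * x[1]**2

def grad_f(x):
    return np.array([2.0 * x[0], 20.0 * x[1]])

x_c = np.array([1.0, 1.0])
lam = 0.05                         # a fixed, sufficiently small step length
x_plus = x_c - lam * grad_f(x_c)   # iteration (a2)
assert f(x_plus) < f(x_c)          # decrease holds for small enough lambda
```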
There are several methods for selecting an appropriate $\lambda$, [a3], [a7], for instance the classical Armijo rule, [a1], in which $\lambda = \beta ^ { m }$ for some $\beta \in ( 0,1 )$ and where $m \geq 0$ is the least integer such that the sufficient decrease condition
\begin{equation} \tag{a3} f ( x _ { c } + \lambda d ) \leq f ( x _ { c } ) + \alpha \lambda d ^ { T } \nabla f ( x _ { c } ) \end{equation}
holds. In (a3), $- \nabla f ( x _ { c } )$ is replaced by a general descent direction $d$, a vector satisfying $d ^ { T } \nabla f ( x _ { c } ) < 0$, and a parameter $\alpha$ has been introduced (typically $10 ^ { - 4 }$). One might think that simple decrease ($f ( x _ { c } + \lambda d ) < f ( x _ { c } )$) would suffice, and it usually does in practice. However, the theory requires (a3) to eliminate the possibility of stagnation.
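The Armijo rule can be sketched as a backtracking loop in Python; the parameter values $\alpha = 10 ^ { - 4 }$, $\beta = 1 / 2$ and the quadratic test function are illustrative:

```python
import numpy as np

def armijo_step(f, grad, x_c, d, alpha=1e-4, beta=0.5, m_max=50):
    """Find the least m >= 0 such that lambda = beta**m satisfies
    the sufficient decrease condition (a3)."""
    slope = d @ grad(x_c)        # d^T grad f(x_c); negative for a descent direction
    lam = 1.0
    for _ in range(m_max):
        if f(x_c + lam * d) <= f(x_c) + alpha * lam * slope:
            return x_c + lam * d, lam
        lam *= beta              # lambda = beta^m for m = 0, 1, 2, ...
    raise RuntimeError("line search failed")

# Illustrative use with the steepest descent direction d = -grad f(x_c).
f = lambda x: x[0]**2 + 10.0 * x[1]**2
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
x_c = np.array([1.0, 1.0])
x_plus, lam = armijo_step(f, grad, x_c, -grad(x_c))
```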
The steepest descent iteration with the Armijo rule will, if $f$ is smooth and bounded away from $- \infty$, which is assumed throughout below, produce a sequence $\{ x _ { n } \}$ such that
\begin{equation} \tag{a4} \operatorname { lim } _ { n \rightarrow \infty } \nabla f ( x _ { n } ) = 0. \end{equation}
Hence, any limit point $x ^ { * }$ of the steepest descent sequence satisfies the first-order necessary condition $\nabla f ( x ^ { * } ) = 0$. This does not imply convergence to a local minimizer, and even when the steepest descent iterations do converge, as they often do, the convergence is usually very slow in the terminal phase, when the iterates are near $x ^ { * }$. Scaling the steepest descent direction by the inverse of a symmetric positive-definite scaling matrix $H _ { c }$ can preserve the convergence of $\nabla f$ to zero and dramatically improve the convergence near a minimizer.
Now, if $x ^ { * }$ is a local minimizer that satisfies the standard assumptions, i.e. the Hessian of $f$, $\nabla ^ { 2 } f$, is symmetric, positive definite, and Lipschitz continuous near $x ^ { * }$, then the Newton method,
\begin{equation} \tag{a5} x _ { + } = x _ { c } - ( \nabla ^ { 2 } f ( x _ { c } ) ) ^ { - 1 } \nabla f ( x _ { c } ), \end{equation}
will converge rapidly to the minimizer if the initial iterate $x _ { 0 }$ is sufficiently near $x ^ { * }$, [a3], [a7]. However, when far from $x ^ { * }$, the Hessian need not be positive definite and cannot be used as a scaling matrix.
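A sketch of a single Newton step (a5), solving the linear system rather than forming the inverse Hessian explicitly; the quadratic test function is illustrative, so the step lands exactly on its minimizer:

```python
import numpy as np

# Illustrative quadratic f(x) = x_0^2 + 10*x_1^2: gradient and Hessian.
def grad_f(x):
    return np.array([2.0 * x[0], 20.0 * x[1]])

def hess_f(x):
    return np.diag([2.0, 20.0])

x_c = np.array([1.0, 1.0])
# Newton step (a5): solve hess * step = grad instead of inverting.
x_plus = x_c - np.linalg.solve(hess_f(x_c), grad_f(x_c))
# For a quadratic, one Newton step reaches the exact minimizer (0, 0).
```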
Quasi-Newton methods (cf. also Quasi-Newton method) update an approximation of $\nabla ^ { 2 } f ( x ^ { * } )$ as the iteration progresses. In general, the transition from current approximations $x _ { c }$ and $H _ { c }$ of $x ^ { * }$ and $\nabla ^ { 2 } f ( x ^ { * } )$ to new approximations $x_{+}$ and $H _ { + }$ is given (using a line search paradigm) by:
1) compute a search direction $d = - H _ { c } ^ { - 1 } \nabla f ( x _ { c } )$;
2) find $x _ { + } = x _ { c } + \lambda d$ using a line search to ensure sufficient decrease;
3) use $x _ { c }$, $x_{+}$, and $H _ { c }$ to update $H _ { c }$ and obtain $H _ { + }$. The way in which $H _ { + }$ is computed determines the method.
The BFGS method, [a2], [a4], [a5], [a6], updates $H _ { c }$ with the rank-two formula
\begin{equation} \tag{a6} H _ { + } = H _ { c } + \frac { y y ^ { T } } { y ^ { T } s } - \frac { ( H _ { c } s ) ( H _ { c } s ) ^ { T } } { s ^ { T } H _ { c } s }. \end{equation}
In (a6),
\begin{equation*} s = x _ { + } - x _ { c } , \quad y = \nabla f ( x _ { + } ) - \nabla f ( x _ { c } ). \end{equation*}
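The update (a6) can be sketched as follows; by construction it enforces the secant condition $H _ { + } s = y$, which the assertion checks (the vectors are illustrative, chosen so that $y ^ { T } s > 0$):

```python
import numpy as np

def bfgs_update(H_c, s, y):
    """Rank-two BFGS update (a6) of the Hessian approximation."""
    Hs = H_c @ s
    return (H_c
            + np.outer(y, y) / (y @ s)
            - np.outer(Hs, Hs) / (s @ Hs))

# Illustrative data with y^T s = 9.5 > 0, so the update is well defined.
H_c = np.eye(3)
s = np.array([1.0, 2.0, 3.0])
y = np.array([1.5, 1.0, 2.0])
H_plus = bfgs_update(H_c, s, y)
assert np.allclose(H_plus @ s, y)   # secant condition H_+ s = y
```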
Local and global convergence theory.
The local convergence result, [a8], [a3], [a7], assumes that the standard assumptions hold and that the initial data is sufficiently good (i.e., $x _ { 0 }$ is near $x ^ { * }$ and $H _ { 0 }$ is near $\nabla ^ { 2 } f ( x ^ { * } )$). Under these assumptions the BFGS iterates exist and converge $q$-superlinearly to $x ^ { * }$, i.e.,
\begin{equation} \tag{a7} \operatorname { lim } _ { n \rightarrow \infty } \frac { \left\| x _ { n + 1} - x ^ { * } \right\| } { \left\| x _ { n } - x ^ { * } \right\| } = 0. \end{equation}
By a global convergence theory one means that one does not assume that the initial data is good and one manages the poor data by making certain that $f$ is decreased as the iteration progresses. There are several variants of global convergence theorems for BFGS and related methods, [a9], [a11], [a10], [a12]. All these results make strong assumptions on the function and some require line search methods more complicated than the simple Armijo rule discussed above.
The result from [a9] is the most general and a special case illustrating the idea now follows. One must assume that the set
\begin{equation*} D = \{ x : f ( x ) \leq f ( x _ { 0 } ) \} \end{equation*}
is convex, $f$ is Lipschitz twice continuously differentiable in $D$, and that the Hessian and its inverse are uniformly bounded and positive definite in $D$. This assumption implies that $f$ has a unique minimizer $x ^ { * }$ in $D$ and that the standard assumptions hold. If $H _ { 0 }$ is symmetric and positive definite, the BFGS iterations, using the Armijo rule to enforce (a3), converge $q$-superlinearly to $x ^ { * }$.
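Combining steps 1)–3) with the update (a6) and the Armijo rule gives the following sketch of the full line-search BFGS iteration in Python; the quadratic test problem, the initial $H _ { 0 } = I$, and the stopping tolerance are illustrative choices, and this is not a robust production implementation:

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-8, max_iter=200):
    """Line-search BFGS: steps 1)-3) with update (a6) and the Armijo rule."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))                 # H_0: initial Hessian approximation
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(H, -g)     # step 1): d = -H_c^{-1} grad f(x_c)
        lam = 1.0                      # step 2): Armijo backtracking, beta = 1/2
        while f(x + lam * d) > f(x) + 1e-4 * lam * (d @ g) and lam > 1e-12:
            lam *= 0.5
        s = lam * d
        y = grad(x + s) - g
        x = x + s
        if y @ s > 0:                  # keep H symmetric positive definite
            Hs = H @ s                 # step 3): rank-two update (a6)
            H = H + np.outer(y, y) / (y @ s) - np.outer(Hs, Hs) / (s @ Hs)
    return x

# Illustrative quadratic with minimizer (1, -2).
x_star = bfgs(lambda x: (x[0] - 1)**2 + 10 * (x[1] + 2)**2,
              lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] + 2)]),
              np.array([0.0, 0.0]))
```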
Implementation.
The recursive implementation from [a7], described below, stores the history of the iteration and uses that information recursively to compute the action of $H _ { k } ^{- 1}$ on a vector. This idea was suggested in [a13], [a14], [a17]. Other implementations may be found in [a15], [a3], [a7], and [a16].
One can easily show that
\begin{equation} \tag{a8} H _ { + } ^ { - 1 } = \left( I - \frac { s y ^ { T } } { y ^ { T } s } \right) H _ { c } ^ { - 1 } \left( I - \frac { y s ^ { T } } { y ^ { T } s } \right) + \frac { s s ^ { T } } { y ^ { T } s }. \end{equation}
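A sketch of update (a8), together with a numerical check that it is indeed the inverse of the update (a6); the vectors are illustrative, chosen so that $y ^ { T } s > 0$:

```python
import numpy as np

def bfgs_inverse_update(Hinv_c, s, y):
    """Update (a8) for the inverse Hessian approximation."""
    rho = 1.0 / (y @ s)
    V = np.eye(len(s)) - rho * np.outer(s, y)   # I - s y^T / y^T s
    return V @ Hinv_c @ V.T + rho * np.outer(s, s)

# Illustrative data with y^T s = 9.5 > 0.
s = np.array([1.0, 2.0, 3.0])
y = np.array([1.5, 1.0, 2.0])
H_c = np.eye(3)
# Direct update (a6) of H_c for comparison.
Hs = H_c @ s
H_plus = H_c + np.outer(y, y) / (y @ s) - np.outer(Hs, Hs) / (s @ Hs)
# (a8) applied to H_c^{-1} gives the inverse of (a6) applied to H_c.
Hinv_plus = bfgs_inverse_update(np.linalg.inv(H_c), s, y)
assert np.allclose(Hinv_plus @ H_plus, np.eye(3))
```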
The algorithm $\operatorname{bfgsrec}$ below overwrites a given vector $d$ with $H _ { n}^{ - 1} d$. The storage needed is one vector for $d$ and $2 n$ vectors for the sequences $\{ s _ { k } , y _ { k } \} _ { k = 0 } ^ { n - 1 }$. A method for computing the product of $H _ { 0 } ^ { - 1 }$ and a vector must also be provided.
bfgsrec$( n , \{ s _ { k } \} , \{ y _ { k } \} , H _ { 0 } ^ { - 1 } , d )$:

1. If $n = 0$, $d = H _ { 0 } ^ { - 1 } d$; return.
2. $\alpha = s _ { n - 1 } ^ { T } d / y _ { n - 1 } ^ { T } s _ { n - 1 }$; $d = d - \alpha y _ { n - 1 }$; call bfgsrec$( n - 1 , \{ s _ { k } \} , \{ y _ { k } \} , H _ { 0 } ^ { - 1 } , d )$.
3. $d = d + ( \alpha - y _ { n - 1 } ^ { T } d / y _ { n - 1 } ^ { T } s _ { n - 1 } ) s _ { n - 1 }$.
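A Python sketch of the recursive $\operatorname{bfgsrec}$ algorithm; here $H _ { 0 } ^ { - 1 }$ is applied via a callable (the identity in the illustrative check below), and the result is compared against the explicit inverse update (a8):

```python
import numpy as np

def bfgsrec(n, s_list, y_list, H0inv_apply, d):
    """Recursively compute H_n^{-1} d from the stored pairs {s_k, y_k}."""
    if n == 0:
        return H0inv_apply(d)
    s, y = s_list[n - 1], y_list[n - 1]
    alpha = (s @ d) / (y @ s)
    d = d - alpha * y                       # multiply by I - y s^T / y^T s
    d = bfgsrec(n - 1, s_list, y_list, H0inv_apply, d)
    return d + (alpha - (y @ d) / (y @ s)) * s

# Illustrative check against two explicit applications of update (a8).
s_list = [np.array([1.0, 2.0, 3.0]), np.array([0.5, -1.0, 2.0])]
y_list = [np.array([1.5, 1.0, 2.0]), np.array([1.0, -0.5, 1.5])]
Hinv = np.eye(3)
for s, y in zip(s_list, y_list):
    rho = 1.0 / (y @ s)
    V = np.eye(3) - rho * np.outer(s, y)
    Hinv = V @ Hinv @ V.T + rho * np.outer(s, s)
d = np.array([1.0, 0.0, -1.0])
assert np.allclose(bfgsrec(2, s_list, y_list, lambda v: v, d.copy()), Hinv @ d)
```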
References
[a1] | L. Armijo, "Minimization of functions having Lipschitz-continuous first partial derivatives" Pacific J. Math. , 16 (1966) pp. 1–3 |
[a2] | C.G. Broyden, "A new double-rank minimization algorithm" Notices Amer. Math. Soc. , 16 (1969) pp. 670 |
[a3] | J.E. Dennis, R.B. Schnabel, "Numerical Methods for Nonlinear Equations and Unconstrained Optimization" , Classics in Applied Math. : 16 , SIAM (Soc. Industrial Applied Math.) (1996) |
[a4] | R. Fletcher, "A new approach to variable metric methods" Comput. J. , 13 (1970) pp. 317–322 |
[a5] | D. Goldfarb, "A family of variable metric methods derived by variational means" Math. Comp. , 24 (1970) pp. 23–26 |
[a6] | D.F. Shanno, "Conditioning of quasi-Newton methods for function minimization" Math. Comp. , 24 (1970) pp. 647–657 |
[a7] | C.T. Kelley, "Iterative methods for optimization" , Frontiers in Appl. Math. , 18 , SIAM (Soc. Industrial Applied Math.) (1999) |
[a8] | C.G. Broyden, J.E. Dennis, J.J. Moré, "On the local and superlinear convergence of quasi-Newton methods" J. Inst. Math. Appl. , 12 (1973) pp. 223–246 |
[a9] | R.H. Byrd, J. Nocedal, "A tool for the analysis of quasi-Newton methods with application to unconstrained minimization" SIAM J. Numer. Anal. , 26 (1989) pp. 727–739 |
[a10] | R.H. Byrd, J. Nocedal, Y. Yuan, "Global convergence of a class of quasi-Newton methods on convex problems" SIAM J. Numer. Anal. , 24 (1987) pp. 1171–1190 |
[a11] | M.J.D. Powell, "Some global convergence properties of a variable metric algorithm without exact line searches" R. Cottle (ed.) C. Lemke (ed.) , Nonlinear Programming , Amer. Math. Soc. (1976) pp. 53–72 |
[a12] | J. Werner, "Über die globale konvergenz von Variable-Metric Verfahren mit nichtexakter Schrittweitenbestimmung" Numer. Math. , 31 (1978) pp. 321–334 |
[a13] | K.J. Bathe, A.P. Cimento, "Some practical procedures for the solution of nonlinear finite element equations" Comput. Meth. Appl. Mech. Eng. , 22 (1980) pp. 59–85 |
[a14] | H. Matthies, G. Strang, "The solution of nonlinear finite element equations" Internat. J. Numerical Methods Eng. , 14 (1979) pp. 1613–1626 |
[a15] | R.H. Byrd, J. Nocedal, R.B. Schnabel, "Representation of quasi-Newton matrices and their use in limited memory methods" Math. Progr. , 63 (1994) pp. 129–156 |
[a16] | J.L. Nazareth, "Conjugate gradient methods less dependent on conjugacy" SIAM Review , 28 (1986) pp. 501–512 |
[a17] | J. Nocedal, "Updating quasi-Newton matrices with limited storage" Math. Comp. , 35 (1980) pp. 773–782 |
Broyden-Fletcher-Goldfarb-Shanno method. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Broyden-Fletcher-Goldfarb-Shanno_method&oldid=17420