Quasi-Newton method

From Encyclopedia of Mathematics
The [[Newton method|Newton method]] for solving a system of non-linear equations $F ( x ) = 0$ is defined by the iteration
  
\begin{equation*} x ^ { k + 1 } = x ^ { k } - [ D F ( x ^ { k } ) ] ^ { - 1 } F ( x ^ { k } ), \end{equation*}
  
\begin{equation*} k = 0,1 , \dots , \end{equation*}
 
where $x ^ { 0 } \in \mathbf{R} ^ { n}$ is a starting point and $D F$ denotes the [[Jacobi matrix|Jacobi matrix]] of $F$. In fact, the point $x ^ { k + 1 }$ solves the linearized equation at $x ^ { k }$: $F ( x ^ { k } ) + D F ( x ^ { k } ) ( x - x ^ { k } ) = 0$. The method can be applied to find a local minimizer of a smooth real-valued function $f$ by searching for the stationary points of $f$, i.e. the zeros of $F = D ^ { T } f$. The advantage of this method is that it works extremely well if the starting point $x ^ { 0 }$ is chosen sufficiently close to a local minimizer $x ^ { * }$ with positive-definite Hessian $D ^ { 2 } f ( x ^ { * } ) = D ( D ^ { T } f ( x ^ { * } ) )$. The method even exhibits quadratic convergence in this case, if $D ^ { 2 } f$ is Lipschitz continuous in $x$ (cf. also [[Lipschitz condition|Lipschitz condition]]).
 
Apart from the lack of global convergence of Newton's method, the use of Hessian matrices is a disadvantage, since explicit expressions for second-order derivatives are often hard to obtain. Even if Hessians are available, the solution of the corresponding linear system $D ^ { 2 } f ( x ^ { k } ) \cdot d = - D ^ { T } f ( x ^ { k } )$ for the determination of the displacement $d ^ { k }$ in each step might be numerically expensive.
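As a small illustration of the iteration above, the following sketch applies Newton's method to a $2 \times 2$ system; the system $F$, its Jacobi matrix and the starting point are arbitrary test data, not taken from the article.

```python
# Newton's method for a system F(x) = 0, as defined above, on the
# test system F(x, y) = (x^2 + y^2 - 1, x - y); the 2x2 linear system
# for the Newton step is solved by Cramer's rule.

def newton_2d(F, DF, x0, tol=1e-12, max_iter=50):
    x, y = x0
    for _ in range(max_iter):
        f1, f2 = F(x, y)
        if abs(f1) < tol and abs(f2) < tol:
            break
        (a, b), (c, d) = DF(x, y)          # rows of the Jacobi matrix
        det = a * d - b * c
        # Newton step: solve DF(x^k) * delta = -F(x^k)
        dx = (-f1 * d + b * f2) / det
        dy = (-a * f2 + c * f1) / det
        x, y = x + dx, y + dy
    return x, y

F = lambda x, y: (x * x + y * y - 1.0, x - y)
DF = lambda x, y: ((2.0 * x, 2.0 * y), (1.0, -1.0))
root = newton_2d(F, DF, (1.0, 0.5))        # converges to (1/sqrt(2), 1/sqrt(2))
```

Starting sufficiently close to the root, the residual drops quadratically to machine precision within a handful of iterations.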
  
 
One is led to methods which use derivatives of at most order one, so-called gradient methods (cf. also [[Gradient method|Gradient method]]). A wide class of these methods is included in the following modified Newton algorithm:
 
  
1) Choose a starting point $x ^ { 0 } \in \mathbf{R} ^ { n}$ and a symmetric positive-definite matrix $H _ { 0 }$ (e.g. the identity matrix).
  
2) Put $d ^ { k } = - H _ { k } D ^ { T } f ( x ^ { k } )$.
  
3) Determine a step length $\alpha _ { k } > 0$ by approximate minimization of the function $\alpha \mapsto f ( x ^ { k } + \alpha d ^ { k } )$ and put $x ^ { k + 1 } = x ^ { k } + \alpha _ { k } d ^ { k }$.
  
4) Generate a symmetric positive-definite matrix $H _ { k + 1 }$, put $k = k + 1$ and go to step 2).
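Steps 1)–4) can be sketched as follows. For simplicity, step 4) keeps $H_k$ equal to the identity matrix (a valid symmetric positive-definite choice), so this instance is the plain gradient method; step 3) uses Armijo backtracking as the approximate line minimization, and the test function is an arbitrary illustration.

```python
# Minimal sketch of the modified Newton algorithm above (steps 1-4),
# with H_k = I throughout, i.e. the steepest-descent gradient method.

def modified_newton(f, grad, x0, steps=100):
    x = list(x0)                                  # step 1): x^0, H_0 = I
    for _ in range(steps):
        g = grad(x)
        if all(abs(gi) < 1e-12 for gi in g):      # stationary point reached
            break
        d = [-gi for gi in g]                     # step 2): d^k = -H_k D^T f(x^k)
        gd = sum(gi * di for gi, di in zip(g, d))
        alpha = 1.0                               # step 3): backtracking (Armijo)
        while f([xi + alpha * di for xi, di in zip(x, d)]) > f(x) + 1e-4 * alpha * gd:
            alpha *= 0.5
        x = [xi + alpha * di for xi, di in zip(x, d)]
        # step 4): H_{k+1} = I, k = k + 1
    return x

def quad(x):      # arbitrary test function with minimizer (3, -1)
    return (x[0] - 3.0) ** 2 + 2.0 * (x[1] + 1.0) ** 2

def quad_grad(x):
    return [2.0 * (x[0] - 3.0), 4.0 * (x[1] + 1.0)]

x_min = modified_newton(quad, quad_grad, [0.0, 0.0])
```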
  
The remainder of this article focuses on the choice of the matrices $H _ { k }$.
  
First note that the search direction $d ^ { k }$ is the negative gradient of $f$ with respect to the scalar product induced by $H _ { k } ^{- 1}$ at the point $x ^ { k }$. In fact, each symmetric positive-definite $( n \times n )$-matrix $R$ defines a scalar product $\langle x , y \rangle _ { R } = x ^ { T } R y$ on ${\bf R} ^ { n }$, and the gradient of $f$ with respect to $R$ is the vector which solves $\langle \operatorname { grad } _ { R } f ( x ) , v \rangle _ { R } = D f ( x ) \cdot v$ for all $v \in {\bf R} ^ { n }$. It follows that $\operatorname{grad}_R f ( x ) = R ^ { - 1 } D ^ { T } f ( x )$. If $R$ depends on $x$, then it is called a Riemannian metric or variable metric. Since the search direction in step 2) can obviously be rewritten as
  
\begin{equation*} d ^ { k } = - \operatorname { grad } _ { H _ { k } ^ { - 1 } } f ( x ^ { k } ), \end{equation*}
  
 
methods of the considered type are also called variable metric methods ([[#References|[a4]]]). Due to the relation
 
  
\begin{equation*} \frac { d } { d \alpha } f ( x ^ { k } + \alpha d ^ { k } ) \Big| _ { \alpha = 0 } = D f ( x ^ { k } ) d ^ { k } = - D f ( x ^ { k } ) H _ { k } D ^ { T } f ( x ^ { k } ) < 0, \end{equation*}
  
the vector $d ^ { k }$ is always a descent direction for $f$ if $D f ( x ^ { k } ) \neq 0$ (cf. also [[Descent, method of|Descent, method of]]). In particular, the step length $\alpha _ { k }$ can be chosen to be strictly positive.
  
A quasi-Newton method is generated if in step 4) of the modified Newton algorithm the matrix $H _ { k + 1 }$ satisfies the quasi-Newton condition (or secant equation) $H _ { k + 1 } y ^ { k } = s ^ { k }$, where $y ^ { k } = D ^ { T } f ( x ^ { k + 1 } ) - D ^ { T } f ( x ^ { k } )$ and $s ^ { k } = x ^ { k + 1 } - x ^ { k }$. In order to obtain the matrix $H _ { k + 1 }$ in a numerically efficient way, it is assumed to be a symmetric rank-$1$ or rank-$2$ update of $H _ { k }$:
  
\begin{equation*} H _ { k + 1 } = H _ { k } + \beta _ { k } u ^ { k } ( u ^ { k } ) ^ { T } + \gamma _ { k } v ^ { k } ( v ^ { k } ) ^ { T } \end{equation*}
  
with scalars $\beta _ { k }$, $\gamma _ { k }$ and vectors $u ^ { k }$, $v ^ { k }$. An important class of update formulas is given by the Broyden family [[#References|[a2]]]. Putting $H = H _ { k }$, $H _ { \text{new} } = H _ { k + 1 }$, etc., it can be written in the form
  
\begin{equation*} H _ { \operatorname {new} } = H - \frac { H y y ^ { T } H } { y ^ { T } H y } + \frac { s s ^ { T } } { s ^ { T } y } + \phi \cdot v v ^ { T }, \end{equation*}
  
where $\phi = \phi ^ { k }$ is an additional parameter and
  
\begin{equation*} v = \sqrt { y ^ { T } H y } \left( \frac { s } { s ^ { T } y } - \frac { H y } { y ^ { T } H y } \right). \end{equation*}
  
For $\phi = s ^ { T } y ( s ^ { T } y - y ^ { T } H y ) ^ { - 1 }$, one obtains the symmetric rank-$1$ update SR1 ([[#References|[a1]]], [[#References|[a7]]]), whereas the choices $\phi = 0$ and $\phi = 1$ correspond to the DFP formula ([[#References|[a4]]], [[#References|[a9]]]) and to the BFGS formula ([[#References|[a3]]], [[#References|[a6]]], [[#References|[a11]]], [[#References|[a14]]]; cf. also [[Broyden–Fletcher–Goldfarb–Shanno method|Broyden–Fletcher–Goldfarb–Shanno method]]), respectively. The DFP and BFGS formulas are dual in the following sense. The Sherman–Morrison–Woodbury formula implies that the update for the matrix inverse $B = H ^ { - 1 }$ takes the form
  
\begin{equation*} B _ { \operatorname{new} } = B - \frac { B s s ^ { T } B } { s ^ { T } B s } + \frac { y y ^ { T } } { y ^ { T } s } + \theta \cdot w w ^ { T }, \end{equation*}
  
where $\theta = \theta ^ { k }$ is now the corresponding parameter and
  
\begin{equation*} w = \sqrt { s ^ { T } B s } \left( \frac { y } { y ^ { T } s } - \frac { B s } { s ^ { T } B s } \right). \end{equation*}
  
For given $\phi$, the parameter $\theta$ becomes ([[#References|[a7]]])
  
\begin{equation*} \theta = \frac { \phi - 1 } { \phi - 1 - \phi \mu }, \end{equation*}
  
 
with
 
  
\begin{equation*} \mu = \frac { y ^ { T } H y \cdot s ^ { T } B s } { ( s ^ { T } y ) ^ { 2 } }. \end{equation*}
  
Consequently, the DFP and BFGS updates correspond now to the parameter values $\theta = 1$ and $\theta = 0$, respectively. Here, duality means that the BFGS update for $H$ is obtained from the DFP update for $B$ by interchanging $H$ with $B$ and $s$ with $y$, respectively. The SR1 formula is self-dual in this sense.
  
It is clear that all updates from the Broyden family inherit symmetry of the matrix $H$. Each formula with $\phi \in [ 0,1 ]$, i.e. a formula from the so-called convex class, inherits also positive definiteness of $H$. This holds in particular for the DFP and BFGS updates. The SR1 formula is not in the convex class and might give rise to indefinite or singular matrices $H_{\text{new}}$.
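The Broyden-family update above can be checked numerically; in the sketch below the matrix $H$ and the vectors $s$, $y$ (with $s^T y > 0$) are arbitrary test data, and $n = 2$ for brevity. Every member satisfies the secant equation $H_{\text{new}} y = s$, since $v^T y = 0$, and symmetry is inherited.

```python
# One Broyden-family update
#   H_new = H - H y y^T H / (y^T H y) + s s^T / (s^T y) + phi * v v^T,
# with v as defined above, for n = 2.
import math

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(2)) for i in range(2)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def broyden_update(H, s, y, phi):
    Hy = matvec(H, y)
    yHy = dot(y, Hy)                  # y^T H y
    sy = dot(s, y)                    # s^T y
    v = [math.sqrt(yHy) * (s[i] / sy - Hy[i] / yHy) for i in range(2)]
    return [[H[i][j] - Hy[i] * Hy[j] / yHy + s[i] * s[j] / sy
             + phi * v[i] * v[j] for j in range(2)] for i in range(2)]

H = [[2.0, 0.5], [0.5, 1.0]]          # symmetric positive definite
s, y = [1.0, 2.0], [3.0, 1.0]         # s^T y = 5 > 0
updates = {phi: broyden_update(H, s, y, phi) for phi in (0.0, 0.5, 1.0)}
```

Here $\phi = 0$ and $\phi = 1$ give the DFP and BFGS updates, respectively.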
  
If, for any update formula from the Broyden family, the matrices $H _ { k }$ remain non-singular, then the following properties can be derived (for quadratic $f$): The algorithm terminates at the solution $x ^ { * }$ after at most $n$ iterations, and $B _ { n } = H _ { n } ^ { - 1 } = D ^ { 2 } f ( x ^ { * } )$. The search directions are conjugate, i.e. orthogonal with respect to the scalar product $\langle \cdot , \cdot \rangle _ { D ^{ 2} f ( x ^ { * } )}$, and they are equivalent to the directions from the conjugate-gradient method (cf. [[Conjugate gradients, method of|Conjugate gradients, method of]]; [[#References|[a10]]]) if $H _ { 0 }$ is the identity matrix.
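The finite-termination property can be illustrated numerically. The sketch below runs the $\phi = 1$ (BFGS) member, written in its algebraically equivalent compact inverse-update form, with exact line searches on a quadratic $f ( x ) = \frac { 1 } { 2 } x ^ { T } A x - b ^ { T } x$; the matrix $A$ and vector $b$ are arbitrary test data. After $n = 2$ steps the iterate solves $A x = b$ and $H _ { 2 } = A ^ { - 1 } = D ^ { 2 } f ( x ^ { * } ) ^ { - 1 }$.

```python
# BFGS with exact line searches on f(x) = x^T A x / 2 - b^T x,
# whose gradient is A x - b.  H_0 is the identity matrix.

A = [[2.0, 1.0], [1.0, 3.0]]                       # symmetric positive definite
b = [1.0, 2.0]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def gradient(x):
    Ax = matvec(A, x)
    return [Ax[i] - b[i] for i in range(2)]

x = [0.0, 0.0]
H = [[1.0, 0.0], [0.0, 1.0]]                       # H_0 = I
for _ in range(2):                                 # n = 2 steps suffice
    g = gradient(x)
    d = [-v for v in matvec(H, g)]                 # d^k = -H_k D^T f(x^k)
    alpha = -dot(g, d) / dot(d, matvec(A, d))      # exact line minimization
    s = [alpha * di for di in d]
    x = [xi + si for xi, si in zip(x, s)]
    y = [gn - go for gn, go in zip(gradient(x), g)]
    sy, Hy = dot(s, y), matvec(H, y)
    yHy = dot(y, Hy)
    # compact BFGS update of the inverse-Hessian approximation
    H = [[H[i][j] - (s[i] * Hy[j] + Hy[i] * s[j]) / sy
          + (1.0 + yHy / sy) * s[i] * s[j] / sy
          for j in range(2)] for i in range(2)]

residual = [ri - bi for ri, bi in zip(matvec(A, x), b)]
```

For this example $A ^ { - 1 } = \frac { 1 } { 5 } \left( \begin{smallmatrix} 3 & - 1 \\ - 1 & 2 \end{smallmatrix} \right)$, which the final $H$ reproduces.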
  
If the line searches are carried out exactly, then all methods from the Broyden family generate the same sequence of iterates $( x ^ { k } ) _ { k \in \mathbf{N} }$ ([[#References|[a5]]]). However, implementation of inaccurate line searches leads to extreme differences between the methods. In particular, the BFGS update behaves less sensitively with regard to inexact line searches than the DFP update. Moreover, results in [[#References|[a13]]] indicate that DFP reduces large eigenvalues of $B _ { k }$ much more slowly than BFGS.
  
Nevertheless, for the case that $D ^ { 2 } f ( x ^ { * } )$ is positive definite and that $D ^ { 2 } f$ is locally Lipschitz continuous, all updates from the Broyden family with exact line searches give rise to superlinear convergence, if $x ^ { 0 }$ and $H _ { 0 }$ are chosen sufficiently close to $x ^ { * }$ and $D ^ { 2 } f ( x ^ { * } )$, respectively. The DFP and BFGS methods even converge superlinearly if the Armijo rule is implemented for inexact line searches. For more details on DFP and BFGS see [[#References|[a8]]].
  
 
The Broyden family is contained in the larger Oren–Luenberger class ([[#References|[a12]]]) of quasi-Newton methods. This class includes, in particular, the self-scaling variable metric algorithms (SSVM algorithms), which share most properties of the Broyden family and automatically compensate for poor scaling of the objective function.
 
  
 
====References====
 
<table><TR><TD valign="top">[a1]</TD> <TD valign="top">  C.G. Broyden,  "A class of methods for solving non-linear simultaneous equations"  ''Math. Comput.'' , '''19'''  (1965)  pp. 577–593</TD></TR><TR><TD valign="top">[a2]</TD> <TD valign="top">  C.G. Broyden,  "Quasi-Newton methods and their application to function minimisation"  ''Math. Comput.'' , '''21'''  (1967)  pp. 368–381</TD></TR><TR><TD valign="top">[a3]</TD> <TD valign="top">  C.G. Broyden,  "The convergence of a class of double-rank minimization algorithms; the new algorithm"  ''J. Inst. Math. Appl.'' , '''6'''  (1970)  pp. 222–231</TD></TR><TR><TD valign="top">[a4]</TD> <TD valign="top">  W.C. Davidon,  "Variable metric method for minimisation"  ''SIAM J. Optim.'' , '''1'''  (1991)  pp. 1–17  (Also: Argonne Nat. Lab. Report ANL-5990 (rev.) (1959))</TD></TR><TR><TD valign="top">[a5]</TD> <TD valign="top">  L.C.W. Dixon,  "Quasi-Newton algorithms generate identical points"  ''Math. Progr.'' , '''2'''  (1972)  pp. 383–387</TD></TR><TR><TD valign="top">[a6]</TD> <TD valign="top">  R. Fletcher,  "A new approach for variable metric algorithms"  ''Comput. J.'' , '''13'''  (1970)  pp. 317–322</TD></TR><TR><TD valign="top">[a7]</TD> <TD valign="top">  R. Fletcher,  "Practical methods of optimization" , Wiley  (1986)  (Edition: Second)</TD></TR><TR><TD valign="top">[a8]</TD> <TD valign="top">  R. Fletcher,  "An overview of unconstrained optimization"  ''Dundee Numerical Analysis Report'' , '''NA/149'''  (1993)</TD></TR><TR><TD valign="top">[a9]</TD> <TD valign="top">  R. Fletcher,  M.J.D. Powell,  "A rapidly convergent descent method for minimisation"  ''Comput. J.'' , '''6'''  (1963)  pp. 163–168</TD></TR><TR><TD valign="top">[a10]</TD> <TD valign="top">  R. Fletcher,  C.M. Reeves,  "Function minimization by conjugate gradients"  ''Comput. J.'' , '''7'''  (1964)  pp. 149–154</TD></TR><TR><TD valign="top">[a11]</TD> <TD valign="top">  D. 
Goldfarb,  "A family of variable metric methods derived by variational means"  ''Math. Comput.'' , '''24'''  (1970)  pp. 23–26</TD></TR><TR><TD valign="top">[a12]</TD> <TD valign="top">  S.S. Oren,  D.G. Luenberger,  "Self-scaling variable metric (SSVM) algorithms I"  ''Management Science'' , '''20'''  (1974)  pp. 845–862</TD></TR><TR><TD valign="top">[a13]</TD> <TD valign="top">  M.J.D. Powell,  "How bad are the BFGS and DFP methods when the objective function is quadratic?"  ''Math. Progr.'' , '''34'''  (1987)  pp. 34–47</TD></TR><TR><TD valign="top">[a14]</TD> <TD valign="top">  D.F. Shanno,  "Conditioning of quasi-Newton methods for function minimization"  ''Math. Comput.'' , '''24'''  (1970)  pp. 647–656</TD></TR></table>
+
<table><tr><td valign="top">[a1]</td> <td valign="top">  C.G. Broyden,  "A class of methods for solving non-linear simultaneous equations"  ''Math. Comput.'' , '''19'''  (1965)  pp. 577–593</td></tr><tr><td valign="top">[a2]</td> <td valign="top">  C.G. Broyden,  "Quasi-Newton methods and their application to function minimisation"  ''Math. Comput.'' , '''21'''  (1967)  pp. 368–381</td></tr><tr><td valign="top">[a3]</td> <td valign="top">  C.G. Broyden,  "The convergence of a class of double-rank minimization algorithms; the new algorithm"  ''J. Inst. Math. Appl.'' , '''6'''  (1970)  pp. 222–231</td></tr><tr><td valign="top">[a4]</td> <td valign="top">  W.C. Davidon,  "Variable metric method for minimisation"  ''SIAM J. Optim.'' , '''1'''  (1991)  pp. 1–17  (Also: Argonne Nat. Lab. Report ANL-5990 (rev.) (1959))</td></tr><tr><td valign="top">[a5]</td> <td valign="top">  L.C.W. Dixon,  "Quasi-Newton algorithms generate identical points"  ''Math. Progr.'' , '''2'''  (1972)  pp. 383–387</td></tr><tr><td valign="top">[a6]</td> <td valign="top">  R. Fletcher,  "A new approach for variable metric algorithms"  ''Comput. J.'' , '''13'''  (1970)  pp. 317–322</td></tr><tr><td valign="top">[a7]</td> <td valign="top">  R. Fletcher,  "Practical methods of optimization" , Wiley  (1986)  (Edition: Second)</td></tr><tr><td valign="top">[a8]</td> <td valign="top">  R. Fletcher,  "An overview of unconstrained optimization"  ''Dundee Numerical Analysis Report'' , '''NA/149'''  (1993)</td></tr><tr><td valign="top">[a9]</td> <td valign="top">  R. Fletcher,  M.J.D. Powell,  "A rapidly convergent descent method for minimisation"  ''Comput. J.'' , '''6'''  (1963)  pp. 163–168</td></tr><tr><td valign="top">[a10]</td> <td valign="top">  R. Fletcher,  C.M. Reeves,  "Function minimization by conjugate gradients"  ''Comput. J.'' , '''7'''  (1964)  pp. 149–154</td></tr><tr><td valign="top">[a11]</td> <td valign="top">  D. 
Goldfarb,  "A family of variable metric methods derived by variational means"  ''Math. Comput.'' , '''24'''  (1970)  pp. 23–26</td></tr><tr><td valign="top">[a12]</td> <td valign="top">  S.S. Oren,  D.G. Luenberger,  "Self-scaling variable metric (SSVM) algorithms I"  ''Management Science'' , '''20'''  (1974)  pp. 845–862</td></tr><tr><td valign="top">[a13]</td> <td valign="top">  M.J.D. Powell,  "How bad are the BFGS and DFP methods when the objective function is quadratic?"  ''Math. Progr.'' , '''34'''  (1987)  pp. 34–47</td></tr><tr><td valign="top">[a14]</td> <td valign="top">  D.F. Shanno,  "Conditioning of quasi-Newton methods for function minimization"  ''Math. Comput.'' , '''24'''  (1970)  pp. 647–656</td></tr></table>

Revision as of 16:59, 1 July 2020

The Newton method for solving a system of non-linear equations $F ( x ) = 0$ is defined by the iteration

\begin{equation*} x ^ { k + 1 } = x ^ { k } - [ D F ( x ^ { k } ) ] ^ { - 1 } F ( x ^ { k } ), \end{equation*}

\begin{equation*} k = 0,1 , \dots , \end{equation*}

where $x ^ { 0 } \in \mathbf{R} ^ { n}$ is a starting point and $D F$ denotes the Jacobi matrix of $F$. In fact, the point $x ^ { k + 1 }$ solves the linearized equation at $x ^ { k }$: $F ( x ^ { k } ) + D F ( x ^ { k } ) ( x - x ^ { k } ) = 0$. The method can be applied to find a local minimizer of a smooth real-valued function $f$ by searching for the stationary points of $f$, i.e. the zeros of $F = D ^ { T } f$. The advantage of this method is that it works extremely well if the starting point $x ^ { 0 }$ is chosen sufficiently close to a local minimizer $x ^ { * }$ with positive-definite Hessian $D ^ { 2 } f ( x ^ { * } ) = D ( D ^ { T } f ( x ^ { * } ) )$. In this case the method even exhibits quadratic convergence, provided that $D ^ { 2 } f$ is Lipschitz continuous in $x$ (cf. also Lipschitz condition).
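A minimal sketch of the iteration (the test system and all names here are illustrative, not part of the article):

```python
import numpy as np

def newton(F, DF, x0, tol=1e-10, max_iter=50):
    """Newton iteration x^{k+1} = x^k - [DF(x^k)]^{-1} F(x^k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        # x^{k+1} solves the linearized equation F(x^k) + DF(x^k)(x - x^k) = 0.
        x = x - np.linalg.solve(DF(x), Fx)
    return x

# Illustrative system: the unit circle intersected with the diagonal x_0 = x_1.
F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]])
DF = lambda x: np.array([[2 * x[0], 2 * x[1]], [1.0, -1.0]])
root = newton(F, DF, [1.0, 0.5])   # converges to (1/sqrt(2), 1/sqrt(2))
```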

Apart from the lack of global convergence of Newton's method, the use of Hessian matrices is a disadvantage, since explicit expressions for second-order derivatives are often hard to obtain. Even if Hessians are available, the solution of the corresponding linear system $D ^ { 2 } f ( x ^ { k } ) \cdot d = - D ^ { T } f ( x ^ { k } )$ for the determination of the displacement $d ^ { k }$ in each step might be numerically expensive.

One is led to methods which use derivatives of at most order one, so-called gradient methods (cf. also Gradient method). A wide class of these methods is included in the following modified Newton algorithm:

1) Choose a starting point $x ^ { 0 } \in \mathbf{R} ^ { n}$ and a symmetric positive-definite matrix $H _ { 0 }$ (e.g. the identity matrix).

2) Put $d ^ { k } = - H _ { k } D ^ { T } f ( x ^ { k } )$.

3) Determine a step length $\alpha _ { k } > 0$ by approximate minimization of the function $\alpha \mapsto f ( x ^ { k } + \alpha d ^ { k } )$ and put $x ^ { k + 1 } = x ^ { k } + \alpha _ { k } d ^ { k }$.

4) Generate a symmetric positive-definite matrix $H _ { k+1 } $, put $ k = k + 1$ and go to step 2).
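In code, steps 1)–4) can be sketched as follows. The update hook `h_update` and the smoke test are our illustrative assumptions; returning $H _ { k }$ unchanged reduces the scheme to a steepest descent method.

```python
import numpy as np

def modified_newton(f, grad_f, x0, h_update, max_iter=500, tol=1e-8):
    """Sketch of steps 1)-4); h_update(H, s, y) supplies H_{k+1} (our naming)."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))                       # step 1): H_0 = identity
    g = grad_f(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g                           # step 2): search direction
        alpha = 1.0                          # step 3): Armijo backtracking
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad_f(x_new)
        H = h_update(H, x_new - x, g_new - g)    # step 4)
        x, g = x_new, g_new
    return x

# Steepest descent (trivial update) on a convex quadratic as a smoke test.
f = lambda x: 0.5 * (x[0] ** 2 + 4.0 * x[1] ** 2)
grad_f = lambda x: np.array([x[0], 4.0 * x[1]])
x_min = modified_newton(f, grad_f, [1.0, 1.0], lambda H, s, y: H)
```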

The remainder of this article focuses on the choice of the matrices $H _ { k }$.

First note that the search direction $d ^ { k }$ is the negative gradient of $f$ with respect to the scalar product induced by $H _ { k } ^{- 1}$ at the point $x ^ { k }$. In fact, each symmetric positive-definite $( n \times n )$-matrix $R$ defines a scalar product $\langle x , y \rangle _ { R } = x ^ { T } R y$ on ${\bf R} ^ { n }$, and the gradient of $f$ with respect to $R$ is the vector which solves $\langle \operatorname { grad } _ { R } f ( x ) , v \rangle _ { R } = D f ( x ) \cdot v$ for all $v \in {\bf R} ^ { n }$. It follows that $\operatorname{grad}_R f ( x ) = R ^ { - 1 } D ^ { T } f ( x )$. If $R$ depends on $x$, then it is called a Riemannian metric or variable metric. Since the search direction in step 2) can obviously be rewritten as

\begin{equation*} d ^ { k } = - \operatorname { grad } _ { H _ { k } ^ { - 1 } } f ( x ^ { k } ), \end{equation*}

methods of the considered type are also called variable metric methods ([a4]). Due to the relation

\begin{equation*} \left. \frac { d } { d \alpha } f ( x ^ { k } + \alpha d ^ { k } ) \right| _ { \alpha = 0 } = D f ( x ^ { k } ) d ^ { k } = - D f ( x ^ { k } ) H _ { k } D ^ { T } f ( x ^ { k } ) < 0, \end{equation*}

the vector $d ^ { k }$ is always a descent direction for $f$, if $D f ( x ^ { k } ) \neq 0$ (cf. also Descent, method of). In particular, the step-length $\alpha _ { k }$ can be chosen to be strictly positive.

A quasi-Newton method is generated if in step 4) of the modified Newton algorithm the matrix $H _ { k+1 } $ satisfies the quasi-Newton condition (or secant equation) $H _ { k + 1 } y ^ { k } = s ^ { k }$, where $y ^ { k } = D ^ { T } f ( x ^ { k + 1 } ) - D ^ { T } f ( x ^ { k } )$ and $s ^ { k } = x ^ { k + 1 } - x ^ { k }$. In order to obtain the matrix $H _ { k+1 } $ in a numerically efficient way, it is assumed to be a symmetric rank-$1$ or rank-$2$ update of $H _ { k }$:

\begin{equation*} H _ { k + 1 } = H _ { k } + \beta _ { k } u ^ { k } ( u ^ { k } ) ^ { T } + \gamma _ { k } v ^ { k } ( v ^ { k } ) ^ { T } \end{equation*}

with scalars $\beta _ { k }$, $\gamma _ { k }$ and vectors $u ^ { k }$, $v ^ { k }$. An important class of update formulas is given by the Broyden family [a2]. Putting $H = H _ { k }$, $H _ { \text{new} } = H _ { k + 1 }$, etc., it can be written in the form

\begin{equation*} H _ { \operatorname {new} } = H - \frac { H y y ^ { T } H } { y ^ { T } H y } + \frac { s s ^ { T } } { s ^ { T } y } + \phi \cdot v v ^ { T }, \end{equation*}

where $\phi = \phi ^ { k }$ is an additional parameter and

\begin{equation*} v = \sqrt { y ^ { T } H y } \left( \frac { s } { s ^ { T } y } - \frac { H y } { y ^ { T } H y } \right). \end{equation*}
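For illustration, the update can be written out directly in code. The sketch below (function name and test data are ours) also shows numerically that the quasi-Newton condition $H _ { \text{new} } y = s$ holds for every member of the family, since $v ^ { T } y = 0$.

```python
import numpy as np

def broyden_family_update(H, s, y, phi):
    """H_new = H - H y y^T H / (y^T H y) + s s^T / (s^T y) + phi * v v^T."""
    Hy = H @ y
    yHy = y @ Hy
    sy = s @ y                       # should be positive to preserve definiteness
    v = np.sqrt(yHy) * (s / sy - Hy / yHy)
    return (H - np.outer(Hy, Hy) / yHy
              + np.outer(s, s) / sy
              + phi * np.outer(v, v))

H = np.array([[2.0, 0.5], [0.5, 1.0]])          # symmetric positive definite
s = np.array([1.0, 0.5])
y = np.array([0.3, 0.7])                        # s^T y = 0.65 > 0
H_dfp  = broyden_family_update(H, s, y, 0.0)    # DFP  (phi = 0)
H_bfgs = broyden_family_update(H, s, y, 1.0)    # BFGS (phi = 1)
```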

For $\phi = s ^ { T } y ( s ^ { T } y - y ^ { T } H y ) ^ { - 1 }$, one obtains the symmetric rank-$1$ update SR1 ([a1], [a7]), whereas the choices $\phi = 0$ and $\phi = 1$ correspond to the DFP formula ([a4], [a9]) and to the BFGS formula ([a3], [a6], [a11], [a14]; cf. also Broyden–Fletcher–Goldfarb–Shanno method), respectively. The DFP and BFGS formulas are dual in the following sense. The Sherman–Morrison–Woodbury formula implies that the update for the matrix inverse $B = H ^ { - 1 }$ takes the form

\begin{equation*} B _ { \operatorname{new} } = B - \frac { B s s ^ { T } B } { s ^ { T } B s } + \frac { y y ^ { T } } { y ^ { T } s } + \theta \cdot w w ^ { T }, \end{equation*}

where $\theta = \theta ^ { k }$ is now the corresponding parameter and

\begin{equation*} w = \sqrt { s ^ { T } B s } \left( \frac { y } { y ^ { T } s } - \frac { B s } { s ^ { T } B s } \right). \end{equation*}

For given $\phi$, the parameter $\theta$ becomes ([a7])

\begin{equation*} \theta = \frac { \phi - 1 } { \phi - 1 - \phi \mu }, \end{equation*}

with

\begin{equation*} \mu = \frac { y ^ { T } H y \cdot s ^ { T } B s } { ( s ^ { T } y ) ^ { 2 } }. \end{equation*}

Consequently, the DFP and BFGS updates correspond now to the parameter values $\theta = 1$ and $\theta = 0$, respectively. Here, duality means that the BFGS update for $H$ is obtained from the DFP update for $B$ by interchanging $H$ with $B$ and $s$ with $y$, respectively. The SR1 formula is self-dual in this sense.
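The correspondence between $\phi$ and $\theta$ can be checked numerically. In the sketch below (data and function names are ours), the inverse of the $\phi$-update of $H$ coincides with the $\theta$-update of $B$.

```python
import numpy as np

def h_update(H, s, y, phi):     # Broyden family in H
    Hy = H @ y; yHy = y @ Hy; sy = s @ y
    v = np.sqrt(yHy) * (s / sy - Hy / yHy)
    return H - np.outer(Hy, Hy) / yHy + np.outer(s, s) / sy + phi * np.outer(v, v)

def b_update(B, s, y, theta):   # dual family in B = H^{-1}
    Bs = B @ s; sBs = s @ Bs; ys = y @ s
    w = np.sqrt(sBs) * (y / ys - Bs / sBs)
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys + theta * np.outer(w, w)

H = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.linalg.inv(H)
s = np.array([1.0, 0.5])
y = np.array([0.3, 0.7])
mu = (y @ H @ y) * (s @ B @ s) / (s @ y) ** 2
phi = 0.4                                    # an arbitrary Broyden parameter
theta = (phi - 1.0) / (phi - 1.0 - phi * mu)
H_new = h_update(H, s, y, phi)
B_new = b_update(B, s, y, theta)             # equals the inverse of H_new
```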

It is clear that all updates from the Broyden family inherit symmetry of the matrix $H$. Each formula with $\phi \in [ 0,1 ]$, i.e. a formula from the so-called convex class, also inherits positive definiteness of $H$. This holds in particular for the DFP and BFGS updates. The SR1 formula is not in the convex class and might give rise to indefinite or singular matrices $H_{\text{new}}$.

If, for any update formula from the Broyden family, the matrices $H _ { k }$ remain non-singular, then the following properties can be derived (for quadratic $f$): The algorithm terminates at the solution $x ^ { * }$ after at most $n$ iterations, and $B _ { n } = H _ { n } ^ { - 1 } = D ^ { 2 } f ( x ^ { * } )$. The search directions are conjugate, i.e. orthogonal with respect to the scalar product $\langle \cdot , \cdot \rangle _ { D ^{ 2} f ( x ^ { * } )}$, and they are equivalent to the directions from the conjugate-gradient method (cf. Conjugate gradients, method of; [a10]) if $H _ { 0 }$ is the identity matrix.
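This finite-termination behaviour can be observed directly. The following sketch (problem data chosen by us) runs the BFGS member with exact line searches, which are available in closed form for a quadratic, and recovers both the minimizer and the inverse Hessian after $n = 2$ steps.

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # positive-definite Hessian of f
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b               # f(x) = x^T A x / 2 - b^T x

x = np.zeros(2)
H = np.eye(2)
for k in range(2):                       # n = 2 iterations suffice
    g = grad(x)
    d = -H @ g
    alpha = -(g @ d) / (d @ A @ d)       # exact line search on a quadratic
    s = alpha * d
    x = x + s
    y = grad(x) - g
    Hy = H @ y                           # BFGS update (phi = 1)
    v = np.sqrt(y @ Hy) * (s / (s @ y) - Hy / (y @ Hy))
    H = (H - np.outer(Hy, Hy) / (y @ Hy)
           + np.outer(s, s) / (s @ y) + np.outer(v, v))
```

After the loop, `x` is the exact minimizer and `H` equals $A ^ { - 1 }$, in line with the hereditary property of the family.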

If the line searches are carried out exactly, then all methods from the Broyden family generate the same sequence of iterates $( x ^ { k } ) _ { k \in \mathbf{N} }$ ([a5]). With inexact line searches, however, the methods behave very differently. In particular, the BFGS update is less sensitive to inexact line searches than the DFP update. Moreover, results in [a13] indicate that DFP reduces large eigenvalues of $B _ { k }$ much more slowly than BFGS.

Nevertheless, for the case that $D ^ { 2 } f ( x ^ { * } )$ is positive definite and that $D ^ { 2 } f$ is locally Lipschitz continuous, all updates from the Broyden family with exact line searches give rise to superlinear convergence, if $x ^ { 0 }$ and $H _ { 0 }$ are chosen sufficiently close to $x ^ { * }$ and $D ^ { 2 } f ( x ^ { * } )$, respectively. The DFP and BFGS methods even converge superlinearly if the Armijo rule is implemented for inexact line searches. For more details on DFP and BFGS see [a8].
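As an illustration, a complete BFGS iteration with Armijo backtracking can be sketched as follows. The test function is our choice: it is strictly convex, which guarantees $s ^ { T } y > 0$ in every step, and it has its unique minimizer at the origin.

```python
import numpy as np

def f(x):
    # Strictly convex test function with unique minimizer (0, 0).
    return np.exp(x[0] + x[1]) + np.exp(-x[0]) + np.exp(-x[1])

def grad_f(x):
    e = np.exp(x[0] + x[1])
    return np.array([e - np.exp(-x[0]), e - np.exp(-x[1])])

def bfgs(x0, tol=1e-8, max_iter=200):
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))
    g = grad_f(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g
        alpha = 1.0                          # Armijo backtracking
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        s = alpha * d
        x = x + s
        g_new = grad_f(x)
        y = g_new - g                        # s^T y > 0 by strict convexity
        Hy = H @ y                           # BFGS update (phi = 1)
        v = np.sqrt(y @ Hy) * (s / (s @ y) - Hy / (y @ Hy))
        H = (H - np.outer(Hy, Hy) / (y @ Hy)
               + np.outer(s, s) / (s @ y) + np.outer(v, v))
        g = g_new
    return x

x_opt = bfgs([1.0, 0.5])
```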

The Broyden family is contained in the larger Oren–Luenberger class ([a12]) of quasi-Newton methods. This class includes, in particular, the self-scaling variable metric algorithms (SSVM algorithms), which share most properties of the Broyden family and automatically compensate for poor scaling of the objective function.

References

[a1] C.G. Broyden, "A class of methods for solving non-linear simultaneous equations" Math. Comput. , 19 (1965) pp. 577–593
[a2] C.G. Broyden, "Quasi-Newton methods and their application to function minimisation" Math. Comput. , 21 (1967) pp. 368–381
[a3] C.G. Broyden, "The convergence of a class of double-rank minimization algorithms; the new algorithm" J. Inst. Math. Appl. , 6 (1970) pp. 222–231
[a4] W.C. Davidon, "Variable metric method for minimisation" SIAM J. Optim. , 1 (1991) pp. 1–17 (Also: Argonne Nat. Lab. Report ANL-5990 (rev.) (1959))
[a5] L.C.W. Dixon, "Quasi-Newton algorithms generate identical points" Math. Progr. , 2 (1972) pp. 383–387
[a6] R. Fletcher, "A new approach for variable metric algorithms" Comput. J. , 13 (1970) pp. 317–322
[a7] R. Fletcher, "Practical methods of optimization" , Wiley (1986) (Edition: Second)
[a8] R. Fletcher, "An overview of unconstrained optimization" Dundee Numerical Analysis Report , NA/149 (1993)
[a9] R. Fletcher, M.J.D. Powell, "A rapidly convergent descent method for minimisation" Comput. J. , 6 (1963) pp. 163–168
[a10] R. Fletcher, C.M. Reeves, "Function minimization by conjugate gradients" Comput. J. , 7 (1964) pp. 149–154
[a11] D. Goldfarb, "A family of variable metric methods derived by variational means" Math. Comput. , 24 (1970) pp. 23–26
[a12] S.S. Oren, D.G. Luenberger, "Self-scaling variable metric (SSVM) algorithms I" Management Science , 20 (1974) pp. 845–862
[a13] M.J.D. Powell, "How bad are the BFGS and DFP methods when the objective function is quadratic?" Math. Progr. , 34 (1987) pp. 34–47
[a14] D.F. Shanno, "Conditioning of quasi-Newton methods for function minimization" Math. Comput. , 24 (1970) pp. 647–656
How to Cite This Entry:
Quasi-Newton method. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Quasi-Newton_method&oldid=14733
This article was adapted from an original article by Hubertus Th. Jongen, Oliver Stein (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article