The Newton method for solving a system of non-linear equations is defined by the iteration
where is a starting point and denotes the Jacobi matrix of . In fact, the point solves the linearized equation at : . The method can be applied to find a local minimizer of a smooth real-valued function by searching for the stationary points of , i.e. the zeros of . The advantage of this method is that it works extremely well if the starting point is chosen sufficiently close to a local minimizer with positive-definite Hessian . The method even exhibits quadratic convergence in this case, if depends Lipschitzian on (cf. also Lipschitz condition).
Apart from the lack of global convergence of Newton's method, the use of Hessian matrices is a disadvantage, since explicit expressions for second-order derivatives are often hard to obtain. Even if Hessians are available, the solution of the corresponding linear system for the determination of the displacement in each step might be numerically expensive.
One is led to methods which use derivatives of at most order one, so-called gradient methods (cf. also Gradient method). A wide class of these methods is included in the following modified Newton algorithm:
1) Choose a starting point and a symmetric positive-definite matrix (e.g. the identity matrix).
2) Put .
3) Determine a step length by approximate minimization of the function and put .
4) Generate a symmetric positive-definite matrix , put and go to step 2).
The remainder of this article focuses on the choice of the matrices .
First note that the search direction is the negative gradient of with respect to the scalar product induced by at the point . In fact, each symmetric positive-definite -matrix defines a scalar product on , and the gradient of with respect to is the vector which solves for all . It follows that . If depends on , then it is called a Riemannian metric or variable metric. Since the search direction in step 2 can obviously be rewritten as
methods of the considered type are also called variable metric methods ([a4]). Due to the relation
the vector is always a descent direction for , if (cf. also Descent, method of). In particular, the step-length can be chosen to be strictly positive.
A quasi-Newton method is generated if in step 4) of the modified Newton algorithm the matrix satisfies the quasi-Newton condition (or secant equation) , where and . In order to obtain the matrix in a numerically efficient way, it is assumed to be a symmetric rank- or rank- update of :
with scalars , and vectors , . An important class of update formulas is given by the Broyden family [a2]. Putting , , etc., it can be written in the form
where is an additional parameter and
For , one obtains the symmetric rank- update SR1 ([a1], [a7]), whereas the choices and correspond to the DFP formula ([a4], [a9]) and to the BFGS formula ([a3], [a6], [a11], [a14]; cf. also Broyden–Fletcher–Goldfarb–Shanno method), respectively. The DFP and BFGS formulas are dual in the following sense. The Sherman–Morrison–Woodbury formula implies that the update for the matrix inverse takes the form
where is now the corresponding parameter and
For given , the parameter becomes ([a7])
Consequently, the DFP and BFGS updates correspond now to the parameter values and , respectively. Here, duality means that the BFGS update for is obtained from the DFP update for by interchanging with and with , respectively. The SR1 formula is self-dual in this sense.
It is clear that all updates from the Broyden family inherit symmetry of the matrix . Each formula with , i.e. a formula from the so-called convex class, inherits also positive definiteness of . This holds in particular for the DFP and BFGS updates. The SR1 formula is not in the convex class and might give rise to indefinite or singular matrices .
If, for any update formula from the Broyden family, the matrices remain non-singular, then the following properties can be derived (for quadratic ): The algorithm terminates at the solution after at most iterations, and . The search directions are conjugate, i.e. orthogonal with respect to the scalar product , and they are equivalent to the directions from the conjugate-gradient method (cf. Conjugate gradients, method of; [a10]) if is the identity matrix.
If the line searches are carried out exactly, then all methods from the Broyden family generate the same sequence of iterates ([a5]). However, implementation of inaccurate line searches leads to extreme differences between the methods. In particular, the BFGS update behaves less sensitively with regard to inexact line searches than the DFP update. Moreover, results in [a13] indicate that DFP reduces large eigenvalues of much slower than BFGS.
Nevertheless, for the case that is positive definite and that is locally Lipschitz continuous, all updates from the Broyden family with exact line searches give rise to superlinear convergence, if and are chosen sufficiently close to and , respectively. The DFP and BFGS methods even converge superlinearly if the Armijo rule is implemented for inexact line searches. For more details on DFP and BFGS see [a8].
The Broyden family is contained in the larger Oren–Luenberger class ([a12]) of quasi-Newton methods. This class includes, in particular, the self-scaling variable metric algorithms (SSVM algorithms), which share most properties of the Broyden family and automatically compensate for poor scaling of the objective function.
|[a1]||C.G. Broyden, "A class of methods for solving non-linear simultaneous equations" Math. Comput. , 19 (1965) pp. 577–593|
|[a2]||C.G. Broyden, "Quasi-Newton methods and their application to function minimisation" Math. Comput. , 21 (1967) pp. 368–381|
|[a3]||C.G. Broyden, "The convergence of a class of double-rank minimization algorithms; the new algorithm" J. Inst. Math. Appl. , 6 (1970) pp. 222–231|
|[a4]||W.C. Davidon, "Variable metric method for minimisation" SIAM J. Optim. , 1 (1991) pp. 1–17 (Also: Argonne Nat. Lab. Report ANL-5990 (rev.) (1959))|
|[a5]||L.C.W. Dixon, "Quasi-Newton algorithms generate identical points" Math. Progr. , 2 (1972) pp. 383–387|
|[a6]||R. Fletcher, "A new approach for variable metric algorithms" Comput. J. , 13 (1970) pp. 317–322|
|[a7]||R. Fletcher, "Practical methods of optimization" , Wiley (1986) (Edition: Second)|
|[a8]||R. Fletcher, "An overview of unconstrained optimization" Dundee Numerical Analysis Report , NA/149 (1993)|
|[a9]||R. Fletcher, M.J.D. Powell, "A rapidly convergent descent method for minimisation" Comput. J. , 6 (1963) pp. 163–168|
|[a10]||R. Fletcher, C.M. Reeves, "Function minimization by conjugate gradients" Comput. J. , 7 (1964) pp. 149–154|
|[a11]||D. Goldfarb, "A family of variable metric methods derived by variational means" Math. Comput. , 24 (1970) pp. 23–26|
|[a12]||S.S. Oren, D.G. Luenberger, "Self-scaling variable metric (SSVM) algorithms I" Management Science , 20 (1974) pp. 845–862|
|[a13]||M.J.D. Powell, "How bad are the BFGS and DFP methods when the objective function is quadratic?" Math. Progr. , 34 (1987) pp. 34–47|
|[a14]||D.F. Shanno, "Conditioning of quasi-Newton methods for function minimization" Math. Comput. , 24 (1970) pp. 647–656|
Quasi-Newton method. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Quasi-Newton_method&oldid=14733