# Acceleration methods

The theory and techniques for increasing the speed of convergence in the iterative solution of linear systems $ Ax = b $.
See Acceleration of convergence and the references given there.

The fundamental problem of linear algebra is the solution of finite-dimensional linear algebraic systems

$$ \tag{a1 } Ax = b $$

in which $ A $ is a given $ ( n \times m ) $-matrix. For simplicity, let $ A $ be an $ ( n \times n ) $-matrix with real entries $ a _ {ij } $. The numerical solution of very large systems (e.g., $ n = 100,000 $) is not uncommon today (1996). Commonly, the saving grace of such large systems is their sparseness: most $ a _ {ij } $ are zero (cf. Sparse matrix). The classical direct method of solution, Gauss elimination, has at least three disadvantages: the pivoting operation fills in zeros, the operation count is $ O ( n ^ {3} ) $, and the storage requirements when building a classical triangular factorization of $ A $ increases superlinearly with $ n $.

For these reasons, indirect iterative methods for the solution of (a1) have recently come to the fore. Iterative methods may be roughly broken into two classes: splitting methods and gradient methods. Splitting methods decompose $ A $ according to

$$ \tag{a2 } A = M - N $$

where $ M $ is non-singular (cf. Non-singular matrix). The solution to (a1) is then sought as the fixed point $ x ^ {*} $ of the equation

$$ \tag{a3 } x \rightarrow F ( x ) = M ^ {- 1 } ( Nx + b ) . $$

Accordingly, one picks an initial guess $ x _ {0} $ and forms the iterates

$$ \tag{a4 } x _ {k + 1 } = M ^ {- 1 } ( Nx _ {k} + b ) . $$

Provided that the spectral radius $ \rho ( M ^ {- 1 } N ) $ is less than one, the iteration (a4) converges to the solution $ x ^ {*} $ of (a1) for any initial guess $ x _ {0} $.

Gradient methods, on the other hand, form a variational problem from (a1). For example, minimizing the quadratic form

$$ \tag{a5 } Q ( x ) = { \frac{1}{2} } \left \langle {x, Ax } \right \rangle - \left \langle {b,x } \right \rangle $$

for $ A $ a symmetric positive-definite matrix, is equivalent to the problem (a1).

Just as for the classical direct methods of solution of (a1), the indirect iterative methods in their original formulations are too slow for ever larger matrices $ A $. Therefore the original Jacobi iterative method (cf. Jacobi method)

$$ \tag{a6 } x _ {k + 1 } = D ^ {- 1 } ( b - ( L + U ) x _ {k} ) , $$

where $ A = L + D + U $ has been written in terms of its lower-triangular, diagonal, and upper-triangular parts, has been accelerated by the Gauss–Seidel method

$$ \tag{a7 } x _ {k + 1 } = ( D + L ) ^ {- 1 } ( b - Ux _ {k} ) , $$

which in turn has been accelerated by the successive overrelaxation method

$$ \tag{a8 } x _ {k + 1 } = ( D + \omega L ) ^ {- 1 } ( \omega b - ( ( 1 - \omega ) D - \omega U ) x _ {k} ) . $$

(Cf. also Hyperrelaxation method.) In (a8), $ \omega $ is a free parameter which may be sought optimally. Generally, the error of a splitting iterative method is reduced by each iteration by a factor bounded by $ \rho ( M ^ {- 1 } N ) $.

In like vein, the original gradient iterative method, steepest descent, in which the $ k $-th residual error vector

$$ \tag{a9 } r _ {k} = b - Ax _ {k} , $$

being the negative of the gradient of $ Q ( x _ {k} ) $ hence defines the direction of steepest descent, converges more slowly as one nears the bottom of the quadratic surface (cf. also Steepest descent, method of). The convergence also becomes slow when the condition number

$$ \tag{a10 } \kappa ( A ) = { \frac{\lambda _ { \max } }{\lambda _ { { \mathop{\rm min} } } } } $$

of the matrix $ A $ is large. Here $ \lambda _ { \max } $ and $ \lambda _ { { \mathop{\rm min} } } $ denote the largest and smallest eigenvalues of $ A $ for $ A $ symmetric positive-definite, and may be replaced in (a10) by their absolute values or, more generally, by $ \| A \| $ and $ \| {A ^ {- 1 } } \| ^ {- 1 } $ for general matrices $ A $. Therefore, the steepest descent method has been accelerated by the conjugate-gradient method. Both steepest descent and conjugate gradient may be written as iterations

$$ \tag{a11 } x _ {k + 1 } = x _ {k} + \alpha _ {k} p _ {k} , $$

where $ \alpha _ {k} $ denotes an optimal magnitude of change in the direction $ p _ {k} $. In steepest descent, $ p _ {k} $ is the residual $ r _ {k} $ of (a9) and $ \alpha _ {k} $ is then chosen to minimize $ Q ( x _ {k} + \alpha _ {k} r _ {k} ) $. In conjugate gradient one takes $ p _ {k} = r _ {k} + \beta _ {k} p _ {k - 1 } $, where $ \beta _ {k} $ is chosen so that $ p _ {k} $ is conjugate to $ p _ {k - 1 } $, i.e.,

$$ \tag{a12 } \left \langle {p _ {k} ,Ap _ {k - 1 } } \right \rangle = 0. $$

Then $ \alpha _ {k} $ is chosen to minimize $ Q ( x _ {k} + \alpha _ {k} p _ {k} ) $ as before. Because the conjugate-gradient method may be seen to choose a new dimension each iteration, in theory (i.e., in the absence of roundoff errors) it must converge in $ n $ steps. In practice it usually converges even faster than that.

The above acceleration methods, i.e., those based upon splittings and those based upon gradients, respectively, can be referred to as the classical acceleration methods: relaxation methods and direction methods, respectively. Recently (1996) these methods have been sharpened in a number of ways. Two of the most important sharpenings are Chebyshev acceleration and preconditioning acceleration. Both methods often involve polynomial iterations

$$ \tag{a13 } x _ {k} = p _ {k} ( A ) x _ {0} , $$

where $ p _ {k} $ is some polynomial chosen from some knowledge or estimation of the eigenvalues of $ A $. These methods therefore make use of results and techniques from both approximation theory and spectral theory.

Chebyshev methods depend upon the property of the Chebyshev polynomials of minimizing the maximum error $ \max | {p ( t ) } | $ for $ t $ in some interval $ [ a,b ] $. For example, for the conjugate-gradient method it may be shown that (see [a5])

$$ \tag{a14 } \left \| {x _ {k} - x ^ {*} } \right \| _ {A} \leq { \frac{1}{C _ {m} ( 1 + 2 \eta ) } } \left \| {x _ {0} - x ^ {*} } \right \| _ {A} , $$

where $ C _ {m} $ is the Chebyshev polynomial of the first kind of degree $ k $, where $ \eta = { {\lambda _ { { \mathop{\rm min} } } } / {( \lambda _ { \max } - \lambda _ { { \mathop{\rm min} } } ) } } $, and where $ \| x \| _ {A} = \langle {Ax, x } \rangle ^ { {1 / 2 } } $. A disadvantage of Chebyshev methods in practice is their dependence upon knowledge or good estimates of the eigenvalues of $ A $.

Preconditioning has recently become the most important acceleration method. There are several preconditioning strategies, but the basic idea is to left or right multiply $ A $ in (a1) by some preconditioning matrix so that the transformed system $ {\widetilde{A} } {\widetilde{x} } = {\widetilde{b} } $ is easier to solve. For example, if $ M $ approximates $ A $ in some correct sense of representing $ A $' s main invertibility features, i.e., so that $ M ^ {- 1 } $ approximates $ A ^ {- 1 } $ in some good way, then (a1) is transformed by right preconditioning to the equivalent system

$$ \tag{a15 } [ AM ^ {- 1 } ] [ Mx ] = b. $$

Left preconditioning yields

$$ \tag{a16 } [ M ^ {- 1 } A ] x = [ M ^ {- 1 } b ] . $$

Two-sided preconditioning by multiplicatively splitting $ M $ into $ M = M _ {1} M _ {2} $ becomes

$$ \tag{a17 } [ M _ {1} ^ {- 1 } A M _ {2} ^ {- 1 } ] [ M _ {2} x ] = [ M _ {1} ^ {- 1 } b ] . $$

Because the convergence rate of iterative methods generally depends on the condition number $ \kappa ( A ) $ of (a10), a rule of thumb has been to seek preconditioners $ M $ such that the transformed system $ {\widetilde{A} } {\widetilde{x} } = {\widetilde{b} } $ has lower, hopefully much lower, condition number $ \kappa ( {\widetilde{A} } ) $.

There are roughly three generally successful theories for finding preconditioners: incomplete factorizations, Frobenius-norm methods and relaxation methods. The incomplete factorizations approximate $ A $ by its approximate factorization into lower and upper triangular factors, but insisting on sparse approximations by dropping small elements. Then a conjugate-gradient method or Krylov-space variant of it (e.g., schemes such as ORTHOMIN or GMRES) may converge well. The Frobenius-norm methods, roughly, seek a weight matrix $ W $ such that the trace $ { \mathop{\rm Tr} } ( A ^ \top WA ) $ is minimized. Often, an incomplete Cholesky factorization enters into applications of this method. Relaxation methods have been described above. Often, these are combined (see [a1]) with block Schur complement decompositions of $ A $. Relaxation methods such as (a6), (a7), (a8) are used today chiefly as preconditioners. Even the lowly Jacobi method (a6) can be a practical preconditioner on highly parallel computers when appropriately implemented.

Acceleration methods are also used for the iterative solution of the eigenvalue-eigenvector problem

$$ \tag{a18 } Ax = \lambda x. $$

(a18) may be regarded as a special case of (a1) in which $ A \rightarrow A - \lambda I $ and $ b = 0 $. However, one must approximate the eigenvalue $ \lambda $ as one is approximating the eigenvector $ x $. Both Chebyshev and preconditioning acceleration methods enter into these (e.g., Arnoldi and Krylov) methods for eigenvalue-eigenvector computation. An important preconditioning method somewhat special to these problems is the so-called shift and invert method, using inverted shifts $ ( A - \sigma I ) ^ {- 1 } $ to more widely disperse the eigenvalues (see [a4]).

There are other useful variations on preconditioning for accelerating convergence. Conjugate gradient can be applied as a preconditioner for any of the relaxation schemes. Other methods, such as multigrid schemes, have been successfully used as preconditioners (see [a3]). Preconditioning strategies can be cast in terms of a new theory of anti-eigenvalues $ \mu $ rather than the eigenvalue condition number $ \kappa $ (see [a2] and Anti-eigenvalue). Finding optimal preconditioners for an important specific application often usefully involves specific physical details of that application. For this reason, in practice preconditioning is sometimes labeled "black magic" .

#### References

[a1] | O. Axelsson, "Iterative solution methods" , Cambridge , New York (1994) |

[a2] | K. Gustafson, "Lectures on computational fluid dynamics, mathematical physics, and linear algebra" , Kaigai & World Sci. (1996/7) |

[a3] | W. Hackbusch, "Iterative solution of large sparse systems of equations" , Springer (1994) Zbl 0789.65017 |

[a4] | Y. Saad, "Numerical methods for large eigenvalue problems" , Halsted (1992) |

[a5] | Y. Saad, "Iterative methods for sparse linear systems" , PWS Publishing (1996) |

**How to Cite This Entry:**

Acceleration methods.

*Encyclopedia of Mathematics.*URL: http://encyclopediaofmath.org/index.php?title=Acceleration_methods&oldid=53507