Latest revision as of 08:54, 21 January 2024
complete eigen value problem
The problem of calculating all the eigen values (as distinct from the partial problem of eigen values) of a square, usually real or complex, matrix. Often, one has not only to calculate the eigen values but also to construct a basis composed of eigen vectors, or root (principal) vectors, for the matrix.
The solution to the complete problem of eigen values for a matrix $ A $ amounts theoretically to constructing the characteristic polynomial $ \phi _ {A} $ of this matrix and calculating its roots, real or complex (for this reason the eigen values cannot, in general, be computed by a finite computational process). For each eigen value $ \lambda $, one can determine the corresponding eigen vectors from the homogeneous system of linear equations $ ( A - \lambda E) x = 0 $ ($ E $ is the identity matrix). In calculations over the field of complex numbers, a sufficient condition for the existence of a basis of eigen vectors is simplicity of the spectrum, while a necessary and sufficient condition is that the algebraic multiplicity of each eigen value $ \lambda $ (i.e. its multiplicity as a root of the characteristic polynomial $ \phi _ {A} $) coincides with its geometric multiplicity (i.e. the defect of the matrix $ A - \lambda E $). If it is necessary to calculate the root (principal) vectors of multiplicity (degree) exceeding one, one has to consider homogeneous systems of the form
$$ ( A- \lambda E) ^ {k} x = 0,\ \ k \in \mathbf N ,\ k > 1. $$
According to this scheme, numerical methods for solving the complete problem of eigen values were constructed up to the end of the 1940s. In the 1930s, highly effective algorithms (as regards the number of arithmetic operations) were developed for calculating the coefficients of the characteristic polynomial of a matrix. For example, in Danilevskii's method the computation of the characteristic polynomial of a matrix of order $ n $ involves about $ n ^ {3} $ multiplications [1], [2].
Methods in this group have received the name of direct, or exact, methods because, if carried out in exact arithmetic, they give the exact values of the coefficients of the characteristic polynomial. Their behaviour in real calculations with rounding errors could not be tested for problems of any substantial order until digital computers became available. Such tests were carried out in the 1950s, and as a result direct methods were completely displaced from numerical practice: the calculation of eigen values by these methods is catastrophically unstable, for two major reasons. First, in the majority of exact methods the coefficients of the characteristic polynomial are determined, directly or indirectly, as components of the solution of a system of linear equations whose matrix is built up by columns from the vectors $ v, Av \dots A ^ {n-1} v $, where $ v $ is the initial vector of the method. Such a matrix is usually very poorly conditioned, as is evident in particular from the fact that the lengths of its columns usually differ greatly, and increasingly so as $ n $ grows. Therefore, the coefficients of the characteristic polynomial are in general calculated with very large errors. Secondly, the calculation of the roots of a polynomial from its coefficients is frequently numerically unstable. In that respect, the following example is noteworthy [3]: If in the polynomial
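The poor conditioning of such a matrix of Krylov vectors can be illustrated numerically. In the following sketch (a hypothetical example, not taken from the article), $ A $ is the diagonal matrix with entries $ 1 \dots 10 $ and $ v $ is the vector of ones; the lengths of the columns $ v, Av \dots A ^ {9} v $ are computed in pure Python:

```python
import math

# Hypothetical illustration: column lengths of the matrix
# [v, Av, ..., A^{n-1} v] for A = diag(1, ..., n) and v = (1, ..., 1)^T.
n = 10
diag = list(range(1, n + 1))
col = [1.0] * n                # current Krylov vector, starts at v

norms = []
for _ in range(n):
    norms.append(math.sqrt(sum(c * c for c in col)))
    col = [d * c for d, c in zip(diag, col)]   # next vector: A applied once more

# The first column has length sqrt(10), roughly 3.2, while the last
# exceeds 10^9: the column lengths span about nine orders of magnitude,
# so the matrix is very poorly conditioned.
```

Since the $ k $-th column is dominated by the largest diagonal entry raised to the power $ k $, the disparity in column lengths, and hence the ill-conditioning, grows rapidly with $ n $.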
$$ p( x) = ( x- 1)( x- 2) \dots ( x- 20) = x ^ {20} - 210 x ^ {19} + \dots $$
one alters the coefficient of $ x ^ {19} $ from $ - 210 $ to $ - 210 - 2 ^ {-23} $, the perturbed polynomial $ \widetilde{p} ( x) $ acquires five pairs of complex-conjugate roots; for one of these pairs, the imaginary parts are approximately $ \pm 2.81 $.
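This example can be reproduced with a short computation. The sketch below uses the classical form of the perturbation (the coefficient $ - 210 $ decreased by $ 2 ^ {-23} $); to avoid the very instability under discussion, the polynomial is evaluated in its stable product form rather than from its expanded coefficients, and Newton's method is started near the known perturbed root (the starting point $ 16.7 + 2.8 i $ is an assumption chosen for illustration):

```python
# Wilkinson's example: perturbing the coefficient of x^19 in
# p(x) = (x-1)(x-2)...(x-20) by -2^{-23} creates complex-conjugate roots.
EPS = 2.0 ** -23

def p(z):
    """(z-1)(z-2)...(z-20), evaluated as a product (numerically stable)."""
    v = 1.0 + 0.0j
    for k in range(1, 21):
        v *= z - k
    return v

def newton_perturbed(z, steps=60):
    """Newton iteration for the perturbed polynomial p(z) - EPS * z**19."""
    for _ in range(steps):
        pz = p(z)
        dpz = pz * sum(1.0 / (z - k) for k in range(1, 21))   # p'(z)
        f = pz - EPS * z ** 19
        df = dpz - 19.0 * EPS * z ** 18
        step = f / df
        z -= step
        if abs(step) < 1e-12 * abs(z):
            break
    return z

root = newton_perturbed(16.7 + 2.8j)
# root is close to 16.7307 + 2.8126i, one of the five complex pairs
```

Note that the unperturbed polynomial has the real root $ 20 $ nearby, while the perturbed root has acquired an imaginary part of order $ 1 $ from a coefficient change of order $ 10 ^ {-7} $.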
This sensitivity of the eigen values to perturbations in the coefficients of the characteristic polynomial is usually not accompanied by a comparable sensitivity with respect to the elements of the matrix itself. For example, if this polynomial $ p( x) $ is the characteristic polynomial of a symmetric matrix $ A $, then perturbations of the order of $ 2 ^ {-23} $ in any element in the matrix produce changes at most of the same order in the eigen values.
Modern numerical methods for solving the complete problem of eigen values yield their results without calculating the characteristic polynomial (see Iteration methods). The number of multiplications involved in the best methods is $ O ( kn ^ {3} ) $, where $ n $ is the order of the matrix and $ k $ is a constant independent of $ n $ and having the meaning of the average number of iterations required to calculate one eigen value. The values of $ k $ in the $ \mathop{\rm QR} $ algorithm are usually between 1.5 and 2.
Approximate eigen values (vectors) calculated for an ($ n \times n $)-matrix $ A $ by a method $ M $ based on orthogonal transformations can be interpreted as the exact eigen values (vectors) of a perturbed matrix $ A + F _ {M} $ (this applies to the $ \mathop{\rm QR} $ algorithm for matrices of general form, and to Jacobi's method or methods based on dividing the spectrum for symmetric and Hermitian matrices). Here $ F _ {M} $ is the equivalent perturbation matrix of the method $ M $; it satisfies bounds of the form
$$ \tag{1 } \| F _ {M} \| _ {E} \leq f( n) \| A \| _ {E} \epsilon , $$
where $ \epsilon $ is the relative accuracy of the machine arithmetic, $ \| A \| _ {E} = ( \sum | A _ {ij} | ^ {2} ) ^ {1/2} $ is the Euclidean matrix norm and $ f( n) $ is a function of the form $ Ckn ^ \alpha $. The number $ k $ has been described above, while the exact values of the constant $ C $ and the exponent $ \alpha $ depend on details of the computation such as the method of rounding-off, the use of scalar-product accumulation, etc. The usual value of $ \alpha $ is 2.
The a priori bound (1) allows one to estimate the accuracy in calculating the eigen values (vectors) by the method $ M $. This accuracy depends on the conditioning of the individual eigen values (eigen subspaces).
Let $ \lambda $ be a simple eigen value of a matrix $ A $, let $ x $ be the corresponding normalized eigen vector, and let $ y $ be the normalized eigen vector of the transposed matrix $ A ^ {T} $ for the same eigen value. When the matrix $ A $ is perturbed by a matrix $ F $, the perturbation in the eigen value $ \lambda $ is expressed by the following quantity, accurate up to the second order:
$$ \tag{2 } \Delta \lambda \approx \frac{( y ^ {T} Fx) }{( y ^ {T} x) } $$
and is estimated as
$$ \tag{3 } | \Delta \lambda | \lesssim \frac{\| F \| _ {2} }{| y ^ {T} x | } $$
( $ \| \cdot \| _ {2} $ is the spectral norm). Then the sensitivity of $ \lambda $ to perturbations in the matrix $ A $ is characterized by the number $ s( \lambda ) = | y ^ {T} x | ^ {-1} $, which is called the (individual) condition number of that eigen value. When both vectors $ x $ and $ y $ are real, the number $ s ^ {-1} ( \lambda ) $ has a simple geometrical meaning: It is the cosine of the angle between $ x $ and $ y $, which explains the other name of $ s( \lambda ) $, the distortion coefficient corresponding to $ \lambda $.
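The amplification described by $ s( \lambda ) $ can be checked on a small hypothetical example: the triangular matrix below has $ s( 1) \approx 100 $, and a perturbation of size $ 10 ^ {-6} $ in one entry moves the eigen value $ \lambda = 1 $ by about $ 10 ^ {-4} $, in agreement with formula (2):

```python
import math

# Hypothetical 2x2 example: A is upper triangular with eigenvalues 1 and 2,
# but the large off-diagonal entry makes the right and left eigenvectors
# for lambda = 1 nearly orthogonal.
A = [[1.0, 100.0],
     [0.0, 2.0]]

x = [1.0, 0.0]                 # right eigenvector for lambda = 1 (unit length)
y = [1.0, -100.0]              # left eigenvector: y^T A = 1 * y^T
ny = math.sqrt(y[0] ** 2 + y[1] ** 2)
y = [y[0] / ny, y[1] / ny]

s = 1.0 / abs(y[0] * x[0] + y[1] * x[1])   # condition number s(lambda), ~100

# Perturb the (2,1) entry by e and recompute the eigenvalues exactly
# (quadratic formula for a 2x2 matrix).
e = 1e-6
tr = 1.0 + 2.0
det = 1.0 * 2.0 - 100.0 * e
lam = (tr - math.sqrt(tr * tr - 4.0 * det)) / 2.0   # perturbed eigenvalue near 1

# First-order formula (2): delta lambda ~ y^T F x / y^T x = -100 e = -1e-4
```

The computed perturbed eigen value agrees with the first-order prediction $ 1 - 100 e $ to within terms of order $ e ^ {2} $, and the amplification factor is precisely the distortion coefficient $ s( 1) $.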
If the matrix $ A $ is diagonalizable (i.e. has a basis of eigen vectors), then the conditioning of its eigen values $ \lambda _ {i} $ can be characterized completely. Let $ P $ be a matrix whose columns are eigen vectors corresponding to the $ \lambda _ {i} $ and which has the least condition number among all such matrices. The following theorem applies [4]: All the eigen values of the perturbed matrix $ A + F $ are enclosed in the region of the complex plane that is the union of the circles
$$ \tag{4 } | z - \lambda _ {i} | \leq \mathop{\rm cond} P \| F \| ,\ \ i= 1 \dots n. $$
If this region splits up into connected components, then each of them contains as many eigen values of the perturbed matrix as the number of circles constituting that component (as the norm in (4), one can take the spectral norm, while $ \mathop{\rm cond} P $ can be taken as the spectral condition number).
The number $ \mathop{\rm cond} P $ is called the condition number of the matrix $ A $ in relation to the complete problem of eigen values. Expressed in the spectral norm, it is related to the distortion coefficients as follows:
$$ s( \lambda _ {i} ) \leq \mathop{\rm cond} P \leq \sum_{i=1}^ { n } s( \lambda _ {i} ). $$
The perturbation in an eigen vector $ x $ related to the simple eigen value $ \lambda $ depends on the perturbation of the matrix $ A $ in a more complicated way. It is determined, generally speaking, not only by the distortion coefficient corresponding to $ \lambda $ but also by those for the other eigen values. The sensitivity of $ x $ also increases if there are eigen values close to $ \lambda $. In the limiting case where $ \lambda $ becomes a multiple eigen value, it becomes meaningless to consider the sensitivity of an individual eigen vector, and it is necessary to speak of the sensitivity of an eigen (or invariant) subspace.
Qualitatively, the estimates (3) and (4) mean that the perturbations in the eigen values of a diagonalizable matrix $ A $ are proportional to the size of the perturbation $ F $, while the proportionality factors are represented by the condition numbers, individual or global. If the Jordan form of $ A $ is non-diagonal and the eigen value $ \lambda $ corresponds to an elementary divisor $ ( z - \lambda ) ^ {m} $, then the perturbation of $ \lambda $ in the general case is proportional not to $ \| F \| $, but to $ \| F \| ^ {1/m} $.
The most important particular case of the complete problem of eigen values is to calculate all eigen values (eigen vectors) of a real symmetric or complex Hermitian matrix $ A $. The distortion coefficients for such a matrix are equal to one, and the approximate estimate (3) becomes an exact inequality:
$$ | \Delta \lambda | \leq \| F \| _ {2} . $$
The matrix $ P $ in (4) can be taken as orthogonal or unitary, and therefore the global condition number in the spectral norm is equal to one. No matter what the multiplicity of the points in the spectrum of $ A $, there exists an ordering of the eigen values $ \widetilde \lambda _ {i} $ of the matrix $ A+ F $ such that for all $ i $:
$$ | \widetilde \lambda _ {i} - \lambda _ {i} | \leq \| F \| _ {2} . $$
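This bound is easy to verify directly on a small hypothetical example, using the closed-form eigen values of symmetric $ 2 \times 2 $ matrices:

```python
import math

def sym_eigs(m):
    """Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]], ascending."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    tr, det = a + c, a * c - b * b
    d = math.sqrt(tr * tr - 4.0 * det)
    return [(tr - d) / 2.0, (tr + d) / 2.0]

A = [[2.0, 1.0], [1.0, 2.0]]                   # eigenvalues 1 and 3
F = [[0.01, -0.02], [-0.02, 0.03]]             # symmetric perturbation
AF = [[A[i][j] + F[i][j] for j in range(2)] for i in range(2)]

lam = sym_eigs(A)
lam_t = sym_eigs(AF)
normF = max(abs(v) for v in sym_eigs(F))       # spectral norm of symmetric F

diffs = [abs(lt - l) for lt, l in zip(lam_t, lam)]
# each |perturbed eigenvalue - original eigenvalue| is bounded by ||F||_2
```

With this ordering of the eigen values each difference indeed stays below $ \| F \| _ {2} $, as the theorem for symmetric matrices guarantees.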
Even sharper bounds are possible if not only the matrix $ A $ itself is symmetric (Hermitian) but also the perturbation $ F $ [5].
In addition, there are some a posteriori estimates for the accuracy of the calculated eigen values and eigen vectors. They are most effective for symmetric and Hermitian matrices $ A $.
Let $ \widetilde{x} _ {i} $ be an approximate eigen vector and let $ x _ {i} $ be the exact eigen vector at a simple eigen value $ \lambda _ {i} $ of the matrix $ A $. Both vectors are assumed to be normalized. The best estimate of $ \lambda _ {i} $ that can be obtained by means of the vector $ \widetilde{x} _ {i} $ is given by the value of the Rayleigh functional
$$ \phi ( A, z) = \frac{( Az , z ) }{( z , z ) } $$
corresponding to $ \widetilde{x} _ {i} $, i.e. the number $ \mu _ {i} = \phi ( A, \widetilde{x} _ {i} ) $ (this applies to any matrix $ A $, not only Hermitian ones). The vector $ r _ {i} = A \widetilde{x} _ {i} - \mu _ {i} \widetilde{x} _ {i} $ is called the discrepancy vector (or residual).
Let
$$ \epsilon = \| r _ {i} \| _ {2} ,\ \ a = \min _ {j \neq i } | \lambda _ {j} - \mu _ {i} | . $$
A bound for $ a $ can readily be derived from the calculated eigen values $ \widetilde \lambda _ {j} $. The following apply:
$$ | \lambda _ {i} - \mu _ {i} | \leq \frac{\epsilon ^ {2} }{a} ,\ \ | \sin \angle ( x _ {i} , \widetilde{x} _ {i} ) | \leq \frac{\epsilon}{a} . $$
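These a posteriori bounds can be illustrated on a hypothetical symmetric $ 2 \times 2 $ example, where the exact eigen values ($ 1 $ and $ 3 $) and eigen vectors are known in closed form:

```python
import math

# A posteriori bounds, sketched for a hypothetical symmetric 2x2 example.
A = [[2.0, 1.0], [1.0, 2.0]]        # eigenvalues 1 and 3; for lambda = 3
                                    # the eigenvector is (1, 1)/sqrt(2)

z = [1.0, 0.9]                      # rough approximate eigenvector for lambda = 3
nz = math.sqrt(z[0] ** 2 + z[1] ** 2)
z = [z[0] / nz, z[1] / nz]          # normalize

Az = [A[0][0] * z[0] + A[0][1] * z[1],
      A[1][0] * z[0] + A[1][1] * z[1]]
mu = Az[0] * z[0] + Az[1] * z[1]    # Rayleigh quotient (z has unit length)
r = [Az[0] - mu * z[0], Az[1] - mu * z[1]]     # residual vector
eps = math.sqrt(r[0] ** 2 + r[1] ** 2)
a = abs(1.0 - mu)                   # distance from mu to the rest of the spectrum

x = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]   # exact eigenvector
cos_t = abs(x[0] * z[0] + x[1] * z[1])
sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))

# Bounds to check: |3 - mu| <= eps^2 / a  and  sin_t <= eps / a
```

Note how the error in the eigen value estimate $ \mu $ is quadratic in the residual norm $ \epsilon $, while the error in the eigen vector direction is only linear in it.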
If $ \lambda _ {i} $ is a multiple eigen value or if there is a group of eigen values near to $ \lambda _ {i} $, it is necessary to estimate the over-all perturbation for the whole group and the perturbation for the corresponding invariant subspace [5].
Along with this ordinary complete eigen value problem, it is often also necessary to solve the so-called generalized eigen value problem:
$$ \tag{5 } Ax = \lambda Bx. $$
In the most important case, the matrices $ A $ and $ B $ are symmetric (Hermitian) and one of them is positive definite. The theory and methods for numerically solving (5) are parallel to those for an ordinary symmetric (Hermitian) eigen value problem.
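In the definite case, a standard reduction (sketched below under the assumption that $ B $ is symmetric positive definite; the matrices themselves are hypothetical) uses the Cholesky factorization $ B = LL ^ {T} $: problem (5) becomes the ordinary symmetric problem $ ( L ^ {-1} A L ^ {-T} ) u = \lambda u $ with $ x = L ^ {-T} u $.

```python
import math

# Hypothetical symmetric pair: A symmetric, B symmetric positive definite.
A = [[4.0, 1.0], [1.0, 3.0]]
B = [[2.0, 1.0], [1.0, 2.0]]

# Cholesky factor of B: B = L L^T with L lower triangular.
l11 = math.sqrt(B[0][0])
l21 = B[1][0] / l11
l22 = math.sqrt(B[1][1] - l21 * l21)

# C = L^{-1} A L^{-T} is symmetric; form it via two triangular solves.
# M = L^{-1} A (forward solve L M = A, column by column):
M = [[A[0][j] / l11 for j in range(2)],
     [(A[1][j] - l21 * (A[0][j] / l11)) / l22 for j in range(2)]]
# C = M L^{-T} (solve C L^T = M, row by row):
C = [[0.0, 0.0], [0.0, 0.0]]
for i in range(2):
    C[i][0] = M[i][0] / l11
    C[i][1] = (M[i][1] - l21 * C[i][0]) / l22

# Eigenvalues of the symmetric 2x2 matrix C = the generalized eigenvalues
# of A x = lambda B x (quadratic formula).
tr = C[0][0] + C[1][1]
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
d = math.sqrt(tr * tr - 4.0 * det)
lams = [(tr - d) / 2.0, (tr + d) / 2.0]
```

The same values can be checked against $ \det ( A - \lambda B) = 3 \lambda ^ {2} - 12 \lambda + 11 = 0 $, which gives $ \lambda = 2 \pm 1/ \sqrt 3 $.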
If neither of the matrices $ A $ and $ B $ is definite or if at least one of them is not symmetric, one uses the $ \mathop{\rm QZ} $ algorithm [6], which is a generalization of the $ \mathop{\rm QR} $ algorithm. Here new effects can occur that are not observed in ordinary complete eigen value problems: infinite eigen values and continuous spectra.
Even more complicated features occur in non-linear eigen value problems:
$$ ( A _ {n} \lambda ^ {n} + A _ {n-1} \lambda ^ {n-1} + \dots + A _ {0} ) x = 0. $$
Such problems are usually solved by reducing them to linear ones of a higher order [7].
References
[1] A.M. Danilevskii, Mat. Sb., 2 : 1 (1937) pp. 169–171
[2] D.K. Faddeev, V.N. Faddeeva, "Computational methods of linear algebra", Freeman (1963) (Translated from Russian)
[3] J.H. Wilkinson, "Rounding errors in algebraic processes", Prentice-Hall (1963)
[4] J.H. Wilkinson, "The algebraic eigenvalue problem", Oxford Univ. Press (1969)
[5] B.N. Parlett, "The symmetric eigenvalue problem", Prentice-Hall (1980)
[6] C.B. Moler, G.W. Stewart, "An algorithm for generalized matrix eigenvalue problems" SIAM J. Numer. Anal., 10 (1973) pp. 241–256
[7] V.N. Kublanovskaya, "Numerical methods of linear algebra", Novosibirsk (1980) pp. 37–53 (In Russian)
Comments
In recent years much research has been carried out on the eigen value problem for sparse matrices (cf. Sparse matrix). Because of storage problems, the challenges are even greater here. For the symmetric (Hermitian) case the Lanczos algorithm has proved very successful.
Cf. Iteration methods for a description of the QR algorithm and other iterative algorithms for calculating eigen values.
References
[a1] G.H. Golub, C.F. van Loan, "Matrix computations", North Oxford Acad. (1983)
Complete problem of eigen values. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Complete_problem_of_eigen_values&oldid=55271