A priori and a posteriori bounds in matrix computations
Three sources of error lie behind the lack of accuracy in numerical computations: data errors associated with the physical model, truncation errors when infinite series are truncated to a finite number of terms, and rounding errors resulting from finite machine precision. Therefore, for the equation $f ( a,x ) = 0$ in which $a$ is a parameter, if an error occurs in $a$, the algorithm returns $\widehat{x}$ and not $x$. How much the computed solution $\widehat{x}$ differs from $x$ is an indication of the computational accuracy of the result. Since $x$ is unknown, the norm $\| \widehat{x} - x \|$, taken as an indication of the accuracy, cannot be calculated, and numerical analysts have devised bounds for the latter in terms of the variation $\delta a$ in $a$. These bounds are of two types: a priori bounds and a posteriori bounds. The former are applied prior to the computation and are usually pessimistic, whereas the latter use information extracted from the computation, and are more indicative. In this article, attention is focussed on the most widely available bounds which can be implemented to estimate the accuracy of results and to check the stability of numerical algorithms (cf. also [[Stability of a computational process|Stability of a computational process]]).
==Linear matrix equations $Ax = b$, $A \in \mathbf{R}^{n \times n}$, $\det ( A ) \neq 0$.==
Assuming that the computed solution $\widehat{x}$ is equal to $x + \delta x$, where $x = A^{-1} b$ and $\delta x$ is the error in the solution resulting from a perturbation $\delta A$ in $A$ and $\delta b$ in $b$, the perturbed problem

$$ \tag{a1}
( A + \delta A ) ( x + \delta x ) = b + \delta b
$$
  
 
implies, by cancelling equal terms,

$$ \tag{a2}
\delta x = A^{-1} ( - \delta A \, x - \delta A \, \delta x + \delta b ) .
$$
  
 
It follows that for a consistent matrix-vector norm [[#References|[a1]]]

$$ \tag{a3}
\| \delta x \| \leq \| A^{-1} \delta A \| \, \| x \| + \| A^{-1} \delta A \| \, \| \delta x \| + \| A^{-1} \delta b \| ,
$$
  
which implies that, under the condition $\| A^{-1} \delta A \| < 1$,

$$ \tag{a4}
\frac{\| \delta x \|}{\| x \|} \leq \frac{\| A^{-1} \delta A \| + \frac{\| A^{-1} \delta b \|}{\| x \|}}{1 - \| A^{-1} \delta A \|} .
$$
  
Setting $k ( A ) = \| A^{-1} \| \, \| A \|$, from $\| b \| \leq \| A \| \, \| x \|$ one finds
  
$$ \tag{a5}
\frac{\| \delta x \|}{\| x \|} \leq \frac{k ( A )}{1 - k ( A ) \frac{\| \delta A \|}{\| A \|}} \left( \frac{\| \delta A \|}{\| A \|} + \frac{\| \delta b \|}{\| b \|} \right) ,
$$
  
which measures the relative error in the solution $x$ resulting from the errors $\delta A$ and $\delta b$ in the data. Thus, the factor $k ( A )$, called the condition number of $A$, is the main factor affecting computational accuracy, since by setting $\| \delta A \| \leq \epsilon \| A \|$, $\| \delta b \| \leq \epsilon \| b \|$, where $\epsilon$ is the relative error in the data, it follows that
  
$$ \tag{a6}
k ( A ) = \lim _ {\epsilon \rightarrow 0} \frac{1}{\epsilon} \frac{\| \delta x \|}{\| x \|} ,
$$
  
i.e., representing the sensitivity of the solution to relative changes $\epsilon$ in either $A$ or $b$; that is, ${\| \delta x \|} / {\| x \|} \approx \epsilon \cdot k ( A )$. So, if $k ( A ) = 10^{p}$ and $\epsilon = 5 \cdot 10^{-t}$, where $t$ is the number of digits in the machine mantissa, the number of significant digits in $\widehat{x}$ becomes $t - p$.
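This rule of thumb can be checked numerically. The following is a minimal sketch, assuming NumPy; the Hilbert matrix is used here as a standard ill-conditioned test case chosen for illustration:

```python
import numpy as np

# Rule of thumb t - p: solve A x = b for an ill-conditioned matrix and
# compare the observed relative error with the a priori estimate eps * k(A).
n = 10
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])  # Hilbert
x_true = np.ones(n)
b = A @ x_true

x_hat = np.linalg.solve(A, b)              # computed solution
k = np.linalg.cond(A)                      # condition number k(A), 2-norm
eps = np.finfo(float).eps                  # relative machine precision
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)

print(f"k(A)           = {k:.1e}")         # ~1e13, so p ~ 13
print(f"eps * k(A)     = {eps * k:.1e}")   # a priori estimate: ~p digits lost
print(f"observed error = {rel_err:.1e}")   # roughly t - p digits remain
```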
  
The above analysis lacks accuracy, for the error $\| \delta A \|$ is not equal to $\epsilon \| A \|$: other factors intervene, one of which is the growth factor in the elements associated with the interchange (pivoting) strategy, as well as the size of $\| A \|$; yet it provides an a priori estimate for the expected accuracy. Before the work of the late J. Wilkinson it was thought that rounding errors could ruin the solution completely; he showed that an upper bound exists for $\| \delta A \|$ and that the fear of obtaining poor results is unwarranted [[#References|[a2]]].
  
 
On the contrary, one can also reach total machine accuracy. Consider, for example, the matrix
  
$$
A = \left( \begin{array}{cc}
10^{5} & 0 \\
0 & 10^{-5} \\
\end{array} \right) ,
$$
  
$\| A \| = 10^{5}$, $\| A^{-1} \| = 10^{5}$, and $k ( A ) = 10^{10}$, so one expects to lose all significant digits on a $10$-digit machine, contrary to the fact that for any right-hand side $b$ full accuracy in this situation is reached. This led R. Skeel [[#References|[a3]]] to consider componentwise estimates of $\delta x$. For $| \delta A | \leq \epsilon | A |$, $| \delta b | \leq \epsilon | b |$, where $| \cdot |$ stands for the modulus of the vector (matrix) taken componentwise, one obtains from (a2):
  
$$ \tag{a7}
| \delta x | \leq | A^{-1} \delta A | \, | x | + | A^{-1} \delta A | \, | \delta x | + | A^{-1} \delta b | ,
$$
  
giving for $\rho ( | A^{-1} \delta A | ) < 1$, where $\rho$ is the [[Spectral radius|spectral radius]], that
  
$$ \tag{a8}
| \delta x | \leq \left( I - | A^{-1} \delta A | \right)^{-1} \left( | A^{-1} \delta A | \, | x | + | A^{-1} \delta b | \right) .
$$
  
For relative perturbations $\epsilon$ in $A$ and $b$, this implies that
  
$$ \tag{a9}
\frac{\| \delta x \|}{\| x \|} \leq \frac{2 \epsilon \left\| \, | A^{-1} | \, | A | \, \right\|}{1 - \epsilon \left\| \, | A^{-1} | \, | A | \, \right\|} ,
$$
  
 
giving

$$ \tag{a10}
k ( A ) = \left\| \, | A^{-1} | \, | A | \, \right\| ,
$$
  
which is a better sensitivity criterion. For the above $( 2 \times 2 )$-matrix one has $k ( A ) = 1$, as expected.
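The contrast can be checked directly. A short sketch, assuming NumPy, comparing the two condition numbers in the $\infty$-norm:

```python
import numpy as np

# Classical k(A) = ||A^{-1}|| ||A|| versus the componentwise (Skeel)
# criterion || |A^{-1}| |A| || of (a10), both in the infinity norm.
A = np.diag([1e5, 1e-5])
A_inv = np.linalg.inv(A)

k_classical = np.linalg.norm(A_inv, np.inf) * np.linalg.norm(A, np.inf)
k_skeel = np.linalg.norm(np.abs(A_inv) @ np.abs(A), np.inf)

print(f"classical k(A) = {k_classical:.0e}")  # 1e10: predicts total loss
print(f"Skeel k(A)     = {k_skeel:.0e}")      # 1e0: predicts full accuracy
```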
  
However, a practical bound for the error in $\widehat{x}$ relies on information extracted from the computation, since it shows the true state of affairs. If $r = A \widehat{x} - b$ is the residual error of the computed solution, then by writing $r = A ( x + \delta x ) - b = A \delta x$ one finds $\delta x = A^{-1} r$, hence the a posteriori bound
  
$$ \tag{a11}
\frac{\| \delta x \|}{\| x \|} \leq \| A^{-1} \| \, \frac{\| r \|}{\| x \|} \leq k ( A ) \, \frac{\| r \|}{\| b \|} .
$$
  
This shows that $k ( A )$ intervenes here as well, and that $\| r \|$ can be found very small while $\widehat{x}$ is totally inaccurate as a result of a large $k ( A )$ [[#References|[a4]]]. The above bound can still be improved by noticing that, since $A^{-1}$ is unknown and only an approximate inverse matrix $B$ of $A$ is known, it can be written in the practical form
  
$$ \tag{a12}
\| \delta x \| = \| A^{-1} B^{-1} B r \| = \left\| ( I - ( I - BA ) )^{-1} B r \right\| \leq \frac{\| B r \|}{1 - \| I - BA \|} ,
$$
  
provided that $\| I - BA \| < 1$.
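A small sketch of this a posteriori bound, assuming NumPy and taking for $B$ the computed (hence inexact) inverse:

```python
import numpy as np

# A posteriori bound (a12): bound ||delta x|| from computed quantities only,
# namely the residual r and an approximate inverse B of A.
rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n)) + n * np.eye(n)   # a well-conditioned example
b = rng.standard_normal(n)

x_hat = np.linalg.solve(A, b)    # computed solution
B = np.linalg.inv(A)             # approximate (computed, hence inexact) inverse
r = A @ x_hat - b                # computable residual

t = np.linalg.norm(np.eye(n) - B @ A, np.inf)
if t < 1:                        # hypothesis ||I - BA|| < 1 of (a12)
    print("||delta x|| <=", np.linalg.norm(B @ r, np.inf) / (1 - t))
```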
  
Although scientists have elaborated on various bounds for estimating the accuracy of their results, the situation is not so bleak, for the above bounds are not the end of the story. After all, $\widehat{x}$ can be improved by a technique called the method of iterative refinement, very similar to Newton's method for solving non-linear equations (cf. [[Newton method|Newton method]]). Today it is an available facility in most packages (LINPACK, LAPACK, etc.) and runs as follows:
  
a) compute $r = A \widehat{x} - b$;
  
b) solve $A \delta x = r$;
  
c) $\widehat{x} _ {\textrm{new}} = \widehat{x} - \delta x$, re-do from a). The algorithm converges very rapidly and $\widehat{x} \rightarrow x$; a minimal sketch follows.
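A minimal sketch of steps a)–c), assuming NumPy and SciPy; production implementations (e.g. in LAPACK) typically accumulate the residual in higher precision:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def refine(A, b, iters=3):
    """Iterative refinement, steps a)-c): reuse one LU factorization."""
    lu, piv = lu_factor(A)           # factor A once
    x = lu_solve((lu, piv), b)       # initial computed solution x-hat
    for _ in range(iters):
        r = A @ x - b                # a) residual (ideally in higher precision)
        dx = lu_solve((lu, piv), r)  # b) solve A dx = r with the same factors
        x = x - dx                   # c) corrected solution
    return x
```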
  
Finally, accuracy of $\widehat{x}$ is not the only issue which matters to users. There is also the question of the stability of the algorithms. By the stability of $\widehat{x}$ one means that it satisfies the perturbed problem
  
$$ \tag{a13}
( A + \delta A ) \widehat{x} = b + \delta b ,
$$
  
in which $\delta A$ and $\delta b$ are of the order of the machine-allowed uncertainty. An algorithm which returns $\widehat{x}$ such that its backward errors $\delta A$ and $\delta b$ are larger than what is anticipated in terms of the machine eps is unstable. This is checked using the Oettli–Prager criterion [[#References|[a5]]]
  
$$ \tag{a14}
| A \widehat{x} - b | \leq \Delta A \, | \widehat{x} | + \Delta b ,
$$
  
which has been derived for interval equations $[ A \pm \Delta A ] \widehat{x} = [ b \pm \Delta b ]$ but is also suitable for systems subject to uncertainties of the form $\Delta A = \epsilon | A |$ and $\Delta b = \epsilon | b |$. Thus follows the stability criterion $| r | \leq \epsilon ( | A | \, | \widehat{x} | + | b | )$; a result implemented in most available packages by what is called a performance index. (For instance, in the IMSL it is given by
  
$$ \tag{a15}
p = \max _ {1 \leq i \leq n} \frac{\left| b _ {i} - \sum _ {j = 1} ^ {n} a _ {ij} \widehat{x} _ {j} \right|}{BN + AN \cdot \sum _ {j = 1} ^ {n} | \widehat{x} _ {j} |} ,
$$
  
where $BN = \max _ {1 \leq i \leq n} | b _ {i} |$, $AN = \max _ {1 \leq i,j \leq n} | a _ {ij} |$.) If $p < \textrm{machine eps}$, the solution is stable [[#References|[a4]]]. The backward error can be given by $\delta A = - H \cdot | A | \cdot \operatorname{diag} ( \operatorname{sgn} ( \widehat{x} _ {i} ) )$, $\delta b = H \cdot | b |$, where $H$ is a diagonal matrix with $h _ {ii} = { {r _ {i} } / {( | A | \, | \widehat{x} | + | b | ) _ {i} } }$ and $| h _ {ii} | < \textrm{machine eps}$.
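A sketch of the performance index (a15), assuming NumPy; the helper name performance_index is illustrative only:

```python
import numpy as np

# Performance index (a15); p below machine eps indicates a stable solution.
def performance_index(A, b, x_hat):
    r = b - A @ x_hat          # componentwise residual b_i - sum_j a_ij x_j
    BN = np.max(np.abs(b))     # BN = max_i |b_i|
    AN = np.max(np.abs(A))     # AN = max_ij |a_ij|
    return np.max(np.abs(r)) / (BN + AN * np.sum(np.abs(x_hat)))

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
b = rng.standard_normal(6)
p = performance_index(A, b, np.linalg.solve(A, b))
print(f"p = {p:.1e}, machine eps = {np.finfo(float).eps:.1e}")
```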
  
==Linear equations $Ax = b$, $A \in \mathbf{R}^{m \times n}$.==
Independent of whether the equations have a unique solution (as in the previous section), more than one solution, or no solution (only a least-squares solution), the general solution $x$ is given by
  
$$ \tag{a16}
x = A^{+} b + ( I - A^{+} A ) c ,
$$
  
where $c$ is an arbitrary vector and $A^{+}$ is the Moore–Penrose inverse of $A$, satisfying $A A^{+} A = A$, $A^{+} A A^{+} = A^{+}$, $( A^{+} A )^{T} = A^{+} A$ and $( A A^{+} )^{T} = A A^{+}$. For a consistent set of equations, the residual $Ax - b = ( A A^{+} - I ) b = 0$ is a special case. The minimal least-squares solution becomes, therefore, $x = A^{+} b$; a small numerical check follows.
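The following sketch, assuming NumPy, verifies the four conditions above, with numpy.linalg.pinv standing in for $A^{+}$:

```python
import numpy as np

# Check the four Penrose conditions for A+ and that x = A+ b solves the
# least-squares problem.
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))          # m > n, full column rank here
b = rng.standard_normal(5)
Ap = np.linalg.pinv(A)                   # Moore-Penrose inverse A+

assert np.allclose(A @ Ap @ A, A)        # A A+ A = A
assert np.allclose(Ap @ A @ Ap, Ap)      # A+ A A+ = A+
assert np.allclose((Ap @ A).T, Ap @ A)   # (A+ A)^T = A+ A
assert np.allclose((A @ Ap).T, A @ Ap)   # (A A+)^T = A A+

x = Ap @ b                               # minimal least-squares solution
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
```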
 +
Unlike in the previous section, if $A$ undergoes a perturbation $\epsilon A _ {1}$, in general $( A + \epsilon A _ {1} )^{+}$ cannot be expanded into a Taylor series in $\epsilon$; it does have a Laurent expansion [[#References|[a4]]]. For acute perturbations ($\operatorname{rank} ( A ) = \operatorname{rank} ( A + \epsilon A _ {1} )$), from [[#References|[a6]]] one finds that
  
$$ \tag{a17}
\| ( A + \delta A )^{+} \| _ {2} \leq \frac{\| A^{+} \| _ {2}}{1 - \| A^{+} \| _ {2} \, \| \delta A \| _ {2}}
$$
  
 
if

$$
\| ( A + \delta A )^{+} \| _ {2} = \frac{1}{\sigma _ {r} ( A + \delta A )} , \qquad \| A^{+} \| _ {2} = \frac{1}{\sigma _ {r} ( A )} ,
$$

$$
\sigma _ {i} ( A ) - \sigma _ {1} ( \delta A ) \leq \sigma _ {i} ( A + \delta A ) \leq \sigma _ {i} ( A ) + \sigma _ {1} ( \delta A ) , \quad i = 1, \dots, r ,
$$
  
with $\sigma _ {1} \geq \dots \geq \sigma _ {r}$ the singular values of $A$. P. Wedin also showed that for any $A$ and $\delta A$,
  
$$ \tag{a18}
\frac{\| ( A + \delta A )^{+} - A^{+} \| _ {2}}{\| ( A + \delta A )^{+} \| _ {2}} \leq \mu \, \| A^{+} \| _ {2} \, \| \delta A \| _ {2} ,
$$
  
where $\mu$ has tabulated values for various cases of $\operatorname{rank} ( A )$ in relation to $m$ and $n$. It therefore follows that the bound for the relative error in $A^{+}$ is given by
  
$$ \tag{a19}
\frac{\| ( A + \delta A )^{+} - A^{+} \| _ {2}}{\| A^{+} \| _ {2}} \leq \mu \, \frac{k ( A ) \frac{\| \delta A \| _ {2}}{\| A \| _ {2}}}{1 - k ( A ) \frac{\| \delta A \| _ {2}}{\| A \| _ {2}}} ,
$$
  
where $k ( A ) = \| A \| _ {2} \, \| A^{+} \| _ {2}$ is the spectral condition number, which measures the sensitivity of $A^{+}$ to acute perturbations in $A$. To obtain a bound for ${\| \delta x \|} / {\| x \|}$ one starts with $x + \delta x = ( A + \delta A )^{+} ( b + \delta b )$ to find that
  
$$ \tag{a20}
\delta x = \left[ ( A + \delta A )^{+} - A^{+} \right] b + ( A + \delta A )^{+} \delta b .
$$
  
Using a decomposition theorem [[#References|[a6]]] for $( A + \delta A )^{+} - A^{+}$, one finds that [[#References|[a7]]]
  
$$ \tag{a21}
\frac{\| \delta x \| _ {2}}{\| x \| _ {2}} \leq \widehat{k} \left[ ( 2 + \eta \widehat{k} ) \alpha + \beta \gamma \right] ,
$$
  
 
where

$$ \tag{a22}
\alpha = \frac{\| \delta A \| _ {2}}{\| A \| _ {2}} , \quad \widehat{k} = \frac{k ( A )}{1 - \alpha k ( A )} ,
$$

$$
\eta = \frac{\| r \| _ {2}}{\| A \| _ {2} \, \| x \| _ {2}} , \quad \beta = \frac{\| \delta b \| _ {2}}{\| b \| _ {2}} , \quad \gamma = \frac{\| b \| _ {2}}{\| A \| _ {2} \, \| x \| _ {2}} .
$$
  
Note that the error is dominated by ${\widehat{k}}^{2} ( A )$, unlike the bound in (a19), which is dominated by $\widehat{k} ( A )$ only. For the special case $\operatorname{rank} ( A ) = n < m$ the above expression becomes $\widehat{k} [ ( 1 + \eta \widehat{k} ) \alpha + \beta \gamma ]$. For $\operatorname{rank} ( A ) = m < n$ it becomes $\widehat{k} ( 2 \alpha + \beta )$. Finally, for $\operatorname{rank} ( A ) = m = n$ it reduces to $\widehat{k} ( \alpha + \beta )$, which is the situation of the previous section. It should be noted that the above bounds are valid for acute perturbations and that the situation worsens if $\delta A$ increases the rank of $A$, in which case the accuracy of $A^{+}$ deteriorates further, as
  
$$
\| ( A + \delta A )^{+} \| _ {2} > \frac{1}{\| \delta A \| _ {2}} .
$$
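A small sketch of this rank-increase phenomenon, assuming NumPy:

```python
import numpy as np

# Discontinuity of the pseudo-inverse: when a perturbation dA increases the
# rank of A, ||(A + dA)+||_2 blows up on the scale of 1/||dA||_2.
A = np.array([[1.0, 0.0], [0.0, 0.0]])        # rank(A) = 1
for eps in (1e-4, 1e-8, 1e-12):
    dA = np.array([[0.0, 0.0], [eps, eps]])   # rank(A + dA) = 2
    lhs = np.linalg.norm(np.linalg.pinv(A + dA), 2)
    rhs = 1.0 / np.linalg.norm(dA, 2)
    print(f"{eps:.0e}: ||(A+dA)+||_2 = {lhs:.1e} > 1/||dA||_2 = {rhs:.1e}")
```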
  
Again one can follow an iterative scheme of refinement, similar to the one mentioned in the previous section, to improve on $A^{+}$. For the full-rank situation, with $B$ denoting the current approximation to $A^{+}$, it runs as follows:
  
a) $R = A^{T} - A^{T} A B$;
  
b) $\delta A^{+} = - B B^{T} R$;
  
c) $A^{+} _ {\textrm{new}} = B - \delta A^{+}$. The algorithm converges and $B \rightarrow ( A^{T} A )^{-1} A^{T}$; a sketch is given below.
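A sketch of steps a)–c), assuming NumPy, for a full-column-rank $A$; the crude starting value $B _ {0} = A^{T} / \| A \| _ {2} ^ {2}$ is an illustrative assumption, not from the article:

```python
import numpy as np

# Refinement of an approximate pseudo-inverse B via steps a)-c) above.
rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))                # rank(A) = n = 3
B = A.T / np.linalg.norm(A, 2) ** 2            # assumed starting approximation

for _ in range(500):
    R = A.T - A.T @ A @ B                      # a)
    dAp = -B @ B.T @ R                         # b)
    B = B - dAp                                # c)
    if np.linalg.norm(R, np.inf) < 1e-14:      # stop once converged
        break

print(np.allclose(B, np.linalg.pinv(A)))       # B -> (A^T A)^{-1} A^T
```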
  
==The eigenvalue problem $Ax = \lambda x$, $A \in \mathbf{R}^{n \times n}$.==
Unlike in the previous sections, a perturbation $\delta A$ in $A$ affects both the eigenvalue $\lambda$ and its corresponding eigenvector $x$. One of the earliest a priori bounds for $| \widehat \lambda - \lambda |$, where $\widehat \lambda$ is the eigenvalue of $A + \delta A$, is the Bauer–Fike theorem [[#References|[a8]]]. It states that the eigenvalues of $A + \delta A$ lie in the union of the discs
  
$$ \tag{a23}
M _ {i} = \left\{ z : | z - \lambda _ {i} | \leq \| T^{-1} \| \, \| T \| \, \| \delta A \| \right\} , \quad i = 1, \dots, n ,
$$
  
where $T$ is the modal matrix of $A$ ($T^{-1} A T = \Lambda$), assuming $A$ to be non-defective. The bound follows upon noticing that, since
  
$$
\widehat \lambda I - A - \delta A = ( \widehat \lambda I - A ) \left[ I - ( \widehat \lambda I - A )^{-1} \delta A \right]
$$
  
is singular, one has $\| ( \widehat \lambda I - A )^{-1} \delta A \| \geq 1$. Writing
  
$$
( \widehat \lambda I - A )^{-1} = T ( \widehat \lambda I - \Lambda )^{-1} T^{-1} ,
$$
  
 
one finds that

$$
1 \leq \left\| T ( \widehat \lambda I - \Lambda )^{-1} T^{-1} \delta A \right\| \leq \| T \| \, \| T^{-1} \| \, \| \delta A \| \, \frac{1}{\min _ {i} | \widehat \lambda - \lambda _ {i} |} ,
$$
  
which implies (a23). The condition number of the problem is therefore of the order of $k ( T ) = \| T \| \, \| T^{-1} \|$. Indeed, if the above discs are isolated, then $\min _ {i} | \widehat \lambda - \lambda _ {i} | = | \delta \lambda _ {i} |$ and $| \delta \lambda _ {i} | \leq k ( T ) \| \delta A \|$. However, unless $A$ is a normal matrix ($\| T \| _ {2} = \| T^{-1} \| _ {2} = 1$), the bound in (a23) is known to yield pessimistic results. A more realistic criterion can be obtained. Since $I - ( \widehat \lambda I - A )^{-1} \delta A$ is singular, for a vector $z \neq 0$,
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010206.png" /></td> </tr></table>
+
$$
 +
z \leq  \left | {( {\widehat \lambda  } I - \Lambda ) ^ {- 1 } } \right | \left | {T ^ {- 1 } } \right | \left | {\delta A } \right | \left | T \right | \left | z \right | ,
 +
$$
  
 
implying that [[#References|[a9]]]
 
implying that [[#References|[a9]]]
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010207.png" /></td> <td valign="top" style="width:5%;text-align:right;">(a24)</td></tr></table>
+
$$ \tag{a24 }
 +
{ \mathop{\rm min} } _ { i } \left | { {\widehat \lambda  } - \lambda _ {i} } \right | \leq  \rho ( \left | {T ^ {- 1 } } \right | \left | {\delta A } \right | \left | T \right | ) ,
 +
$$
  
which is tighter than (a23) for non-normal matrices. Moreover, (a24) is scale-invariant under the transformation <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010208.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010209.png" /> is diagonal.
+
which is tighter than (a23) for non-normal matrices. Moreover, (a24) is scale-invariant under the transformation $  A = DBD ^ {- 1 } $,  
 +
where $  D $
 +
is diagonal.
  
Both the Bauer–Fike bound and the above are valid for the whole spectrum; thus, a group of ill-conditioned eigenvalues affects the bound for the well-conditioned ones. This led Wilkinson to introduce his <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010211.png" /> factors, measuring the sensitivity of the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010212.png" />th eigenvalue alone. From first-order perturbation theory one finds, [[#References|[a1]]],
+
Both the Bauer–Fike bound and the above are valid for the whole spectrum; thus, a group of ill-conditioned eigenvalues affects the bound for the well-conditioned ones. This led Wilkinson to introduce his $  {1 / {s _ {i} } } $
 +
factors, measuring the sensitivity of the $  i $
 +
th eigenvalue alone. From first-order perturbation theory one finds, [[#References|[a1]]],
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010213.png" /></td> <td valign="top" style="width:5%;text-align:right;">(a25)</td></tr></table>
+
$$ \tag{a25 }
 +
\delta \lambda _ {i} \approx {
 +
\frac{y ^ {i  ^ {*} } \delta Ax  ^ {i} }{y ^ {i  ^ {*} } x  ^ {i} }
 +
} ,
 +
$$
  
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010214.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010215.png" /> are the eigenvector and eigenrow associated with a simple <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010216.png" />. So, normalizing both of them, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010217.png" /> becomes the Wilkinson factor and is equal to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010218.png" />. However, for a big <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010219.png" /> (a25) does not bound the true shift in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010220.png" />. For this one uses a second Bauer–Fike theorem: If
+
where $  x  ^ {i} $
 +
and $  y ^ {i  ^ {*} } $
 +
are the eigenvector and eigenrow associated with a simple $  \lambda _ {i} $.  
 +
So, normalizing both of them, $  {1 / {| {y ^ {i  ^ {*} } x  ^ {i} } | } } $
 +
becomes the Wilkinson factor and is equal to $  { {| {\delta \lambda _ {i} } | } / {\| {\delta A } \| } } $.  
 +
However, for a big $  \| {\delta A } \| $(
 +
a25) does not bound the true shift in $  \lambda _ {i} $.  
 +
For this one uses a second Bauer–Fike theorem: If
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010221.png" /></td> </tr></table>
+
$$
 +
\delta A \leq  {
 +
\frac{1}{n}
 +
} { \mathop{\rm min} } _ {k \neq i } {
 +
\frac{\left | {\lambda _ {i} - \lambda _ {k} } \right | }{\left \| {E _ {i} } \right \| + \left \| {E _ {k} } \right \| }
 +
} ,
 +
$$
  
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010222.png" /> is the eigenprojection of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010223.png" /> (i.e., <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010224.png" />), then [[#References|[a8]]]
+
where $  E _ {i} $
 +
is the eigenprojection of $  A $(
 +
i.e., $  E _ {i} = x  ^ {i} y ^ {i  ^ {*} } $),  
 +
then [[#References|[a8]]]
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010225.png" /></td> <td valign="top" style="width:5%;text-align:right;">(a26)</td></tr></table>
+
$$ \tag{a26 }
 +
\left | {\delta \lambda _ {i} } \right | \leq  {
 +
\frac{\left \| {E _ {i} } \right \| \left \| {\delta A } \right \| }{1 - \left \| {\delta A } \right \| \sum _ {k \neq i } {
 +
\frac{\left \| {E _ {i} } \right \| + \left \| {E _ {k} } \right \| }{\left | {\lambda _ {i} - \lambda _ {k} } \right | }
 +
} }
 +
} .
 +
$$
  
It follows that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010226.png" /> represents the sensitivity of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010227.png" /> to changes in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010228.png" />. One can also show that [[#References|[a10]]]
+
It follows that $  \| {E _ {i} } \| $
 +
represents the sensitivity of $  \lambda _ {i} $
 +
to changes in $  A $.  
 +
One can also show that [[#References|[a10]]]
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010229.png" /></td> <td valign="top" style="width:5%;text-align:right;">(a27)</td></tr></table>
+
$$ \tag{a27 }
 +
{
 +
\frac{\left \| { {\widehat{x}  }  ^ {i} - x  ^ {i} } \right \| }{\left \| {x  ^ {i} } \right \| }
 +
} \leq  {
 +
\frac \psi { { \mathop{\rm min} } _ {j \neq i } \left | {\lambda _ {i} - \lambda _ {j} } \right | - 2 \psi }
 +
} ,
 +
$$
  
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010230.png" /> bounds the shifts in the eigenvalues. So, the shift in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010231.png" /> is dependent on the separation of the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010232.png" />th eigenvalue from the nearest <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010233.png" />. Therefore, a matrix <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010234.png" /> has an ill-conditioned eigenvalue problem if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010235.png" /> or if the discs containing <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010236.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010237.png" />, intersect as a result of poor separation.
+
where $  \psi $
 +
bounds the shifts in the eigenvalues. So, the shift in $  x  ^ {i} $
 +
is dependent on the separation of the $  i $
 +
th eigenvalue from the nearest $  \lambda _ {j} $.  
 +
Therefore, a matrix $  A $
 +
has an ill-conditioned eigenvalue problem if $  { \mathop{\rm min} } _ {j \neq i }  | {\lambda _ {i} - \lambda _ {j} } | < 2 \psi $
 +
or if the discs containing $  {\widehat \lambda  } _ {i} $,  
 +
$  i = 1 \dots n $,  
 +
intersect as a result of poor separation.
  
To obtain an a posteriori bound for the error in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010238.png" /> in terms of the residual one forms <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010239.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010240.png" /> is an approximate eigenpair. Then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010241.png" />, implying
+
To obtain an a posteriori bound for the error in $  \lambda $
 +
in terms of the residual one forms $  r = A {\widehat{x}  } - {\widehat \lambda  } {\widehat{x}  } $,  
 +
where $  ( {\widehat \lambda  } , {\widehat{x}  } ) $
 +
is an approximate eigenpair. Then $  {\widehat{x}  } = T ( \Lambda - {\widehat \lambda  } I ) ^ {- 1 } T ^ {- 1 } r $,  
 +
implying
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010242.png" /></td> </tr></table>
+
$$
 +
\left \| { {\widehat{x}  } } \right \| \leq  \left \| T \right \| \left \| {T ^ {- 1 } } \right \| \left \| r \right \| {
 +
\frac{1}{ { \mathop{\rm min} } _ { i } \left | { {\widehat \lambda  } - \lambda _ {i} } \right | }
 +
} ,
 +
$$
  
 
and [[#References|[a2]]]
 
and [[#References|[a2]]]
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010243.png" /></td> <td valign="top" style="width:5%;text-align:right;">(a28)</td></tr></table>
+
$$ \tag{a28 }
 +
{ \mathop{\rm min} } _ { i } \left | { {\widehat \lambda  } - \lambda _ {i} } \right | \leq  k ( T ) {
 +
\frac{\left \| r \right \| }{\left \| { {\widehat{x}  } } \right \| }
 +
} .
 +
$$
  
For symmetric matrices <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010244.png" />; thus, these have well-conditioned eigenproblems. For the non-symmetric case a powerful technique of H. Wielandt [[#References|[a1]]] can update <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010245.png" /> by solving iteratively the linear equations
+
For symmetric matrices $  k _ {2} ( T ) = 1 $;  
 +
thus, these have well-conditioned eigenproblems. For the non-symmetric case a powerful technique of H. Wielandt [[#References|[a1]]] can update $  {\widehat{x}  } $
 +
by solving iteratively the linear equations
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010246.png" /></td> <td valign="top" style="width:5%;text-align:right;">(a29)</td></tr></table>
+
$$ \tag{a29 }
 +
( A - {\widehat \lambda  } I ) x ^ {( i + 1 ) } = x ^ {( i ) } , \quad i = 1 \dots n,
 +
$$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010247.png" /></td> </tr></table>
+
$$
 +
x  ^ {1} = {\widehat{x}  } ,
 +
$$
  
and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010248.png" /> very rapidly.
+
and $  x ^ {( i ) } \rightarrow x $
 +
very rapidly.
  
To check the stability of the solution which satisfies <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010249.png" />, i.e., <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010250.png" />, one uses the criterion
+
To check the stability of the solution which satisfies $  ( A + \delta A ) {\widehat{x}  } = {\widehat \lambda  } {\widehat{x}  } $,  
 +
i.e., $  A {\widehat{x}  } - {\widehat \lambda  } {\widehat{x}  } = - \delta A {\widehat{x}  } $,  
 +
one uses the criterion
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010251.png" /></td> <td valign="top" style="width:5%;text-align:right;">(a30)</td></tr></table>
+
$$ \tag{a30 }
 +
\left \| r \right \| = \left \| {A {\widehat{x}  } - {\widehat \lambda  } {\widehat{x}  } } \right \| _ {2} \leq  \epsilon \left \| A \right \| _ {2} \left \| { {\widehat{x}  } } \right \| _ {2} ,
 +
$$
  
which asserts that the system is backward stable and there exists a matrix <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010252.png" /> such that <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010253.png" /> is an exact eigenpair of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010254.png" />. One can also construct the backwards error in the form <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010255.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010256.png" />, satisfying exactly <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010257.png" /> by letting <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010258.png" />. The above theory is valid for semi-simple eigenvalues only.
+
which asserts that the system is backward stable and there exists a matrix $  \delta A = - { {r {\widehat{x}  }  ^ {*} } / {\| { {\widehat{x}  } } \| _ {2}  ^ {2} } } $
 +
such that $  ( {\widehat \lambda  } , {\widehat{x}  } ) $
 +
is an exact eigenpair of $  A + \delta A $.  
 +
One can also construct the backwards error in the form $  \delta A = - H \cdot | A | \cdot { \mathop{\rm diag} } ( { \mathop{\rm sgn} } ( {\widehat{x}  } _ {i} ) ) $,  
 +
where $  | {h _ {ii }  } | \leq  \textrm{ machine  eps  } $,  
 +
satisfying exactly $  ( A + \delta A ) {\widehat{x}  } = {\widehat \lambda  } {\widehat{x}  } $
 +
by letting $  r = H \cdot | A | \cdot | { {\widehat{x}  } } | $.  
 +
The above theory is valid for semi-simple eigenvalues only.
  
The case when <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010259.png" /> is defective (cf. [[Defective value|Defective value]]) is more complicated, since <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010260.png" />, for a perturbation <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010261.png" /> in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010262.png" />, has a Puiseux expansion (cf. [[Puiseux series|Puiseux series]]) in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010263.png" /> in powers of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010264.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010265.png" /> is the multiplicity of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010266.png" />, namely [[#References|[a1]]],
+
The case when $  A $
 +
is defective (cf. [[Defective value|Defective value]]) is more complicated, since $  {\widehat \lambda  } $,  
 +
for a perturbation $  \epsilon A _ {1} $
 +
in $  A $,  
 +
has a Puiseux expansion (cf. [[Puiseux series|Puiseux series]]) in $  \epsilon $
 +
in powers of $  {1 / m } $,  
 +
where $  m $
 +
is the multiplicity of $  \lambda $,  
 +
namely [[#References|[a1]]],
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010267.png" /></td> <td valign="top" style="width:5%;text-align:right;">(a31)</td></tr></table>
+
$$ \tag{a31 }
 +
{\widehat \lambda  } = \lambda + \epsilon ^ { {1 / m } } \lambda _ {1} + \epsilon ^ { {2 / m } } \lambda _ {2} + \dots .
 +
$$
  
For an eigenvalue with index <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010268.png" /> (the index is the size of the largest Jordan block) a bound for the shift in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010269.png" /> is given by [[#References|[a10]]]
+
For an eigenvalue with index $  r $(
 +
the index is the size of the largest Jordan block) a bound for the shift in $  \lambda $
 +
is given by [[#References|[a10]]]
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010270.png" /></td> <td valign="top" style="width:5%;text-align:right;">(a32)</td></tr></table>
+
$$ \tag{a32 }
 +
{ \mathop{\rm min} } _ { i } \left | { {\widehat \lambda  } - \lambda _ {i} } \right | \leq  \max  \{ r \psi,r ^ { {1 / r } } \psi ^ { {1 / r } } \} .
 +
$$
  
 
==Other bounds.==
 
==Other bounds.==
 
Modifications of the above cases are often encountered. For instance, the Sylvester equation
 
Modifications of the above cases are often encountered. For instance, the Sylvester equation
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010271.png" /></td> <td valign="top" style="width:5%;text-align:right;">(a33)</td></tr></table>
+
$$ \tag{a33 }
 +
AX + XB = C
 +
$$
  
and its special case <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010272.png" />, famous in control theory, can be transformed to the case of the first section by considering
+
and its special case $  B = A  ^ {T} $,  
 +
famous in control theory, can be transformed to the case of the first section by considering
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010273.png" /></td> <td valign="top" style="width:5%;text-align:right;">(a34)</td></tr></table>
+
$$ \tag{a34 }
 +
( A \otimes I + I \otimes B  ^ {T} ) { \mathop{\\vec{rm}t} } ( X ) = { \mathop{\\vec{rm}t} } ( C ) ,
 +
$$
  
and a relative perturbation <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010274.png" /> in <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010275.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010276.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010277.png" /> yields <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010278.png" /> with an error [[#References|[a13]]]
+
and a relative perturbation $  \epsilon $
 +
in $  A $,  
 +
$  B $,  
 +
$  C $
 +
yields $  {\widehat{X}  } $
 +
with an error [[#References|[a13]]]
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010279.png" /></td> <td valign="top" style="width:5%;text-align:right;">(a35)</td></tr></table>
+
$$ \tag{a35 }
 +
{
 +
\frac{\left \| {\delta X } \right \| }{\left \| X \right \| }
 +
} \leq  {
 +
\frac{\epsilon \cdot k ( A,B ) }{1 - \epsilon \cdot k ( A,B ) }
 +
} ,
 +
$$
  
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010280.png" /> is the condition number for the problem.
+
where $  k ( A,B ) $
 +
is the condition number for the problem.
  
 
Also, transforming the eigenvalue problem
 
Also, transforming the eigenvalue problem
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010281.png" /></td> <td valign="top" style="width:5%;text-align:right;">(a36)</td></tr></table>
+
$$ \tag{a36 }
 +
( A _ {n} \lambda  ^ {n} + A _ {n - 1 }  \lambda ^ {n - 1 } + \dots + A _ {0} ) x = 0,
 +
$$
  
in which <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010282.png" />, <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010283.png" />, into a generalized eigenvalue problem <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010284.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010285.png" /> is the companion matrix for (a36), while using the decomposition <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010286.png" />, [[#References|[a1]]], one obtains [[#References|[a10]]]
+
in which $  A _ {i} \in \mathbf R ^ {n \times n } $,  
 +
$  i = 0 \dots n - 1 $,  
 +
into a generalized eigenvalue problem $  Cu = \lambda Bu $,  
 +
where $  C $
 +
is the companion matrix for (a36), while using the decomposition $  ( {\widehat \lambda  } B - C ) ^ {- 1 } = P ( {\widehat \lambda  } I - \Lambda ) ^ {- 1 } Q $,  
 +
[[#References|[a1]]], one obtains [[#References|[a10]]]
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010287.png" /></td> <td valign="top" style="width:5%;text-align:right;">(a37)</td></tr></table>
+
$$ \tag{a37 }
 +
\left | { {\widehat \lambda  } - \lambda } \right | \leq  {
 +
\frac{\left \| P \right \| \left \| Q \right \| ( \left | \lambda \right | \left \| {\delta B } \right \| + \left \| {\delta C } \right \| ) }{1 - \left \| P \right \| \left \| Q \right \| \left \| {\delta B } \right \| }
 +
} ,
 +
$$
  
 
which generalizes (a23).
 
which generalizes (a23).
Line 247: Line 681:
 
Many bounds exist for various matrix functions, for instance [[#References|[a1]]]
 
Many bounds exist for various matrix functions, for instance [[#References|[a1]]]
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010288.png" /></td> <td valign="top" style="width:5%;text-align:right;">(a38)</td></tr></table>
+
$$ \tag{a38 }
 +
\left \| {e ^ {A + \delta A } - e  ^ {A} } \right \| \leq  k ( T ) \cdot \left \| W \right \| ,
 +
$$
  
 
where
 
where
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010289.png" /></td> </tr></table>
+
$$
 +
w _ {ij }  = \left [ { {( e ^ {\lambda _ {i} } - e ^ {\lambda _ {j} } ) } / {( \lambda _ {i} - \lambda _ {j} ) } } \right ] \left \langle  {y  ^ {i} , \delta Ax  ^ {j} } \right \rangle .
 +
$$
  
For a general matrix function <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010290.png" /> approximated by another one <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010291.png" /> it can be shown that [[#References|[a14]]]
+
For a general matrix function $  F ( A ) $
 +
approximated by another one $  G ( A ) $
 +
it can be shown that [[#References|[a14]]]
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010292.png" /></td> <td valign="top" style="width:5%;text-align:right;">(a39)</td></tr></table>
+
$$ \tag{a39 }
 +
\left \| {F ( A ) - G ( A ) } \right \| \leq
 +
$$
  
<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010293.png" /></td> </tr></table>
+
$$
 +
\leq 
 +
k ( T )  \max  _ {1 \leq  r \leq  m - 1, 1 \leq  i \leq  p } {
 +
\frac{\left | {f ^ {( r ) } ( \lambda _ {i} ) - g ^ {( r ) } ( \lambda _ {i} ) } \right | }{r! }
 +
} m _ {i} ,
 +
$$
  
where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010294.png" /> and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010295.png" /> are scalar functions applied to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010296.png" /> in the sense that if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010297.png" />, then <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010298.png" />, where <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010299.png" /> is the size of the largest Jordan block associated with <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010300.png" />, and <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010301.png" /> is the <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010302.png" />-th derivative of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/a/a110/a110010/a110010303.png" />.
+
where $  f $
 +
and $  g $
 +
are scalar functions applied to $  \lambda _ {i} $
 +
in the sense that if $  F ( A ) = e  ^ {A} $,  
 +
then $  f ( \lambda ) = e  ^  \lambda  $,  
 +
where $  m _ {i} $
 +
is the size of the largest Jordan block associated with $  f ( \lambda _ {i} ) $,  
 +
and $  f ^ {( r ) } ( \lambda ) $
 +
is the $  r $-
 +
th derivative of $  f ( \lambda ) $.
  
 
====References====
 
====References====
 
<table><TR><TD valign="top">[a1]</TD> <TD valign="top">  A. Deif,  "Advanced matrix theory for scientists and engineers" , Gordon&amp;Breach  (1991)  (Edition: Second)</TD></TR><TR><TD valign="top">[a2]</TD> <TD valign="top">  J. Wilkinson,  "The algebraic eigenvalue problem" , Clarendon Press  (1965)</TD></TR><TR><TD valign="top">[a3]</TD> <TD valign="top">  R. Skeel,  "Scaling for numerical stability in Gaussian elimination"  ''J. Assoc. Comput. Mach.'' , '''26'''  (1979)  pp. 494–526</TD></TR><TR><TD valign="top">[a4]</TD> <TD valign="top">  A. Deif,  "Sensitivity analysis in linear systems" , Springer  (1986)</TD></TR><TR><TD valign="top">[a5]</TD> <TD valign="top">  W. Oettli,  W. Prager,  "Compatibility of approximate solution of linear equations with given error bounds for coefficients and right hand sides"  ''Numer. Math.'' , '''6'''  (1964)  pp. 405–409</TD></TR><TR><TD valign="top">[a6]</TD> <TD valign="top">  P. Wedin,  "Perturbation theory for pseudo-inverses"  ''BIT'' , '''13'''  (1973)  pp. 217–232</TD></TR><TR><TD valign="top">[a7]</TD> <TD valign="top">  C. Lawson,  R. Hanson,  "Solving least-squares problems" , Prentice-Hall  (1974)</TD></TR><TR><TD valign="top">[a8]</TD> <TD valign="top">  F. Bauer,  C. Fike,  "Norms and exclusion theorems"  ''Numer. Math.'' , '''2'''  (1960)  pp. 137–141</TD></TR><TR><TD valign="top">[a9]</TD> <TD valign="top">  A. Deif,  "Realistic a priori and a posteriori bounds for computed eigenvalues"  ''IMA J. Numer. Anal.'' , '''9'''  (1990)  pp. 323–329</TD></TR><TR><TD valign="top">[a10]</TD> <TD valign="top">  A. Deif,  "Rigorous perturbation bounds for eigenvalues and eigenvectors of a matrix"  ''J. Comput. Appl. Math.'' , '''57'''  (1995)  pp. 403–412</TD></TR><TR><TD valign="top">[a11]</TD> <TD valign="top">  G. Stewart,  "Introduction to matrix computations" , Acad. Press  (1973)</TD></TR><TR><TD valign="top">[a12]</TD> <TD valign="top">  G. Stewart,  J. Sun,  "Matrix perturbation theory" , Acad. Press  (1990)</TD></TR><TR><TD valign="top">[a13]</TD> <TD valign="top">  A. Deif,  N. Seif,  S. Hussein,  "Sylvester's equation: accuracy and computational stability"  ''J. Comput. Appl. Math.'' , '''61'''  (1995)  pp. 1–11</TD></TR><TR><TD valign="top">[a14]</TD> <TD valign="top">  G. Golub,  C. van Loan,  "Matrix computations" , John Hopkins Univ. Press  (1983)</TD></TR></table>
 
<table><TR><TD valign="top">[a1]</TD> <TD valign="top">  A. Deif,  "Advanced matrix theory for scientists and engineers" , Gordon&amp;Breach  (1991)  (Edition: Second)</TD></TR><TR><TD valign="top">[a2]</TD> <TD valign="top">  J. Wilkinson,  "The algebraic eigenvalue problem" , Clarendon Press  (1965)</TD></TR><TR><TD valign="top">[a3]</TD> <TD valign="top">  R. Skeel,  "Scaling for numerical stability in Gaussian elimination"  ''J. Assoc. Comput. Mach.'' , '''26'''  (1979)  pp. 494–526</TD></TR><TR><TD valign="top">[a4]</TD> <TD valign="top">  A. Deif,  "Sensitivity analysis in linear systems" , Springer  (1986)</TD></TR><TR><TD valign="top">[a5]</TD> <TD valign="top">  W. Oettli,  W. Prager,  "Compatibility of approximate solution of linear equations with given error bounds for coefficients and right hand sides"  ''Numer. Math.'' , '''6'''  (1964)  pp. 405–409</TD></TR><TR><TD valign="top">[a6]</TD> <TD valign="top">  P. Wedin,  "Perturbation theory for pseudo-inverses"  ''BIT'' , '''13'''  (1973)  pp. 217–232</TD></TR><TR><TD valign="top">[a7]</TD> <TD valign="top">  C. Lawson,  R. Hanson,  "Solving least-squares problems" , Prentice-Hall  (1974)</TD></TR><TR><TD valign="top">[a8]</TD> <TD valign="top">  F. Bauer,  C. Fike,  "Norms and exclusion theorems"  ''Numer. Math.'' , '''2'''  (1960)  pp. 137–141</TD></TR><TR><TD valign="top">[a9]</TD> <TD valign="top">  A. Deif,  "Realistic a priori and a posteriori bounds for computed eigenvalues"  ''IMA J. Numer. Anal.'' , '''9'''  (1990)  pp. 323–329</TD></TR><TR><TD valign="top">[a10]</TD> <TD valign="top">  A. Deif,  "Rigorous perturbation bounds for eigenvalues and eigenvectors of a matrix"  ''J. Comput. Appl. Math.'' , '''57'''  (1995)  pp. 403–412</TD></TR><TR><TD valign="top">[a11]</TD> <TD valign="top">  G. Stewart,  "Introduction to matrix computations" , Acad. Press  (1973)</TD></TR><TR><TD valign="top">[a12]</TD> <TD valign="top">  G. Stewart,  J. Sun,  "Matrix perturbation theory" , Acad. Press  (1990)</TD></TR><TR><TD valign="top">[a13]</TD> <TD valign="top">  A. Deif,  N. Seif,  S. Hussein,  "Sylvester's equation: accuracy and computational stability"  ''J. Comput. Appl. Math.'' , '''61'''  (1995)  pp. 1–11</TD></TR><TR><TD valign="top">[a14]</TD> <TD valign="top">  G. Golub,  C. van Loan,  "Matrix computations" , John Hopkins Univ. Press  (1983)</TD></TR></table>

Latest revision as of 18:47, 5 April 2020


Three sources of errors are behind the lack of accuracy in numerical computations: data errors associated with the physical model, truncation errors when series are truncated to a number of terms, and rounding errors resulting from finite machine precision. Therefore, for the equation $ f ( a,x ) = 0 $ in which $ a $ is a parameter, if an error occurs in $ a $, the algorithm returns $ {\widehat{x} } $ and not $ x $. How much the computed solution $ {\widehat{x} } $ differs from $ x $ is an indication of the computational accuracy of the result. Since $ x $ is unknown, the norm $ \| { {\widehat{x} } - x } \| $, taken as an indication of the accuracy, cannot be calculated, and numerical analysts have devised bounds for the latter in terms of the variation $ \delta a $ in $ a $. These bounds are of two types: a priori bounds and a posteriori bounds. The first are applied prior to computation and are usually poor in nature, whereas the second use information extracted from a computation, and are more indicative. In this article, attention is focussed on the most widely available bounds which can be implemented to estimate the accuracy of results and to check the stability of numerical algorithms (cf. also Stability of a computational process).

==Linear matrix equations $ Ax = b $, $ A \in \mathbf R ^ {n \times n } $, $ { \mathop{\rm det} } ( A ) \neq 0 $.==

Assuming that the computed solution $ {\widehat{x} } $ is equal to $ x + \delta x $, where $ x = A ^ {- 1 } b $ and $ \delta x $ is the error in the solution resulting from a perturbation $ \delta A $ in $ A $ and $ \delta b $ in $ b $, the perturbed problem

$$ \tag{a1 } ( A + \delta A ) ( x + \delta x ) = b + \delta b $$

implies, by cancelling equal terms,

$$ \tag{a2 } \delta x = A ^ {- 1 } ( - \delta Ax - \delta A \delta x + \delta b ) . $$

It follows that for a consistent matrix-vector norm [a1]

$$ \tag{a3 } \left \| {\delta x } \right \| \leq \left \| {A ^ {- 1 } \delta A } \right \| \left \| x \right \| + \left \| {A ^ {- 1 } \delta A } \right \| \left \| {\delta x } \right \| + \left \| {A ^ {- 1 } \delta b } \right \| , $$

which implies that, under the condition $ \| {A ^ {- 1 } \delta A } \| < 1 $,

$$ \tag{a4 } { \frac{\left \| {\delta x } \right \| }{\left \| x \right \| } } \leq { \frac{\left \| {A ^ {- 1 } \delta A } \right \| + { \frac{\left \| {A ^ {- 1 } \delta b } \right \| }{\left \| x \right \| } } }{1 - \left \| {A ^ {- 1 } \delta A } \right \| } } . $$

Setting $ k ( A ) = \| {A ^ {- 1 } } \| \| A \| $, from $ \| b \| \leq \| A \| \| x \| $ one finds

$$ \tag{a5 } { \frac{\left \| {\delta x } \right \| }{\left \| x \right \| } } \leq { \frac{k ( A ) }{1 - k ( A ) { \frac{\left \| {\delta A } \right \| }{\left \| A \right \| } } } } \left ( { \frac{\left \| {\delta A } \right \| }{\left \| A \right \| } } + { \frac{\left \| {\delta b } \right \| }{\left \| b \right \| } } \right ) , $$

which measures the relative error in the solution $ x $ resulting from the errors $ \delta A $ and $ \delta b $ in the data. Thus, the factor $ k ( A ) $, called the condition number of $ A $, is the main factor affecting computational accuracy, since by setting $ \| {\delta A } \| \leq \epsilon \| A \| $, $ \| {\delta b } \| \leq \epsilon \| b \| $, where $ \epsilon $ is the relative error in the data, it follows that

$$ \tag{a6 } k ( A ) = {\lim\limits } _ {\epsilon \rightarrow 0 } { \frac{1}{\epsilon} } { \frac{\left \| {\delta x } \right \| }{\left \| x \right \| } } , $$

i.e., representing the sensitivity of the solution to relative changes $ \epsilon $ in either $ A $ or $ b $, that is, $ { {\| {\delta x } \| } / {\| x \| } } \approx \epsilon \cdot k ( A ) $. So, if $ k ( A ) = 10 ^ {p} $ and $ \epsilon = 5 \cdot 10 ^ {- t } $, where $ t $ is the length of the machine mantissa, the number of significant digits in $ {\widehat{x} } $ becomes $ t - p $.
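
As an illustration of the rule of thumb $ { {\| {\delta x } \| } / {\| x \| } } \approx \epsilon \cdot k ( A ) $, the following minimal numpy sketch (test matrix, seed and $ \epsilon $ are arbitrary choices, not taken from the references) perturbs the data of a random system by a relative $ \epsilon $ and compares the resulting relative error with $ \epsilon \cdot k ( A ) $:

<pre>
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)
x = np.linalg.solve(A, b)

eps = 1e-10                                   # relative data error
dA = eps * np.abs(A) * rng.choice([-1.0, 1.0], size=A.shape)
db = eps * np.abs(b) * rng.choice([-1.0, 1.0], size=b.shape)
x_hat = np.linalg.solve(A + dA, b + db)

k = np.linalg.cond(A)                         # k(A) in the 2-norm
rel_err = np.linalg.norm(x_hat - x) / np.linalg.norm(x)
print(k, rel_err, eps * k)                    # rel_err is of the order eps*k(A)
</pre>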

The above analysis lacks accuracy, for the error in $ \| A \| $ is not equal to $ \epsilon \| A \| $, since other factors intervene, one of which is the growth factor in the elements associated with the interchange strategy as well as the size of $ \| A \| $; yet it provides an a priori estimate for the expected accuracy. Before the late J. Wilkinson it was thought that rounding errors can ruin the solution completely; he showed that an upper bound exists for $ \| {\delta A } \| $ and that fear of obtaining poor results is unwarranted [a2].

On the contrary, one can also reach total machine accuracy. Consider, for example, the matrix

$$ A = \left ( \begin{array}{cc} 10 ^ {5} & 0 \\ 0 &10 ^ {- 5 } \\ \end{array} \right ) , $$

$ \| A \| = 10 ^ {5} $, $ \| {A ^ {- 1 } } \| = 10 ^ {5} $, and $ k ( A ) = 10 ^ {10 } $, so one expects to lose all significant digits on a $ 10 $-digit machine, contrary to the fact that for any right-hand side $ b $, full accuracy in this situation is reached. This led R. Skeel [a3] to consider componentwise estimates of $ \delta x $. For $ | {\delta A } | \leq \epsilon | A | $, $ | {\delta b } | \leq \epsilon | b | $, where $ | \cdot | $ stands for the modulus of the vector taken componentwise, one obtains from (a2):

$$ \tag{a7 } \left | {\delta x } \right | \leq \left | {A ^ {- 1 } \delta A } \right | \left | x \right | + \left | {A ^ {- 1 } \delta A } \right | \left | {\delta x } \right | + \left | {A ^ {- 1 } \delta b } \right | , $$

giving for $ \rho ( | {A ^ {- 1 } \delta A } | ) < 1 $, where $ \rho $ is the spectral radius, that

$$ \tag{a8 } \left | {\delta x } \right | \leq \left ( I - \left | {A ^ {- 1 } \delta A } \right | \right ) ^ {- 1 } \left ( \left | {A ^ {- 1 } \delta A } \right | \left | x \right | + \left | {A ^ {- 1 } \delta b } \right | \right ) . $$

For relative perturbations $ \epsilon $ in $ A $ and $ b $, this implies that

$$ \tag{a9 } { \frac{\left \| {\delta x } \right \| }{\left \| x \right \| } } \leq { \frac{2 \epsilon \left \| { \left | {A ^ {- 1 } } \right | \left | A \right | } \right \| }{1 - \epsilon \left \| { \left | {A ^ {- 1 } } \right | \left | A \right | } \right \| } } , $$

giving

$$ \tag{a10 } k ( A ) = \left \| {\left | {A ^ {- 1 } } \right | \left | A \right | } \right \| , $$

which is a better sensitivity criterion. For the above $ ( 2 \times 2 ) $-matrix one has $ k ( A ) = 1 $, as expected.
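
The criterion (a10) is straightforward to evaluate numerically. A small sketch (the helper name skeel_cond is ours, purely illustrative) computes both condition numbers for the diagonal matrix above:

<pre>
import numpy as np

def skeel_cond(A, p=np.inf):
    # (a10): k(A) = || |A^{-1}| |A| ||
    return np.linalg.norm(np.abs(np.linalg.inv(A)) @ np.abs(A), p)

A = np.diag([1e5, 1e-5])
print(np.linalg.cond(A, np.inf))   # classical k(A) = 1e10
print(skeel_cond(A))               # Skeel k(A) = 1, as expected
</pre>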

However, a practical bound for the error in $ {\widehat{x} } $ relies on information extracted from computation, since it shows the true state of affairs. If $ r = A {\widehat{x} } - b $ is the residual error of the true solution, then by writing $ r = A ( x + \delta x ) - b = A \delta x $ one finds $ \delta x = A ^ {- 1 } r $, hence the a posteriori bound

$$ \tag{a11 } { \frac{\left \| {\delta x } \right \| }{\left \| x \right \| } } \leq \left \| {A ^ {- 1 } } \right \| { \frac{\left \| r \right \| }{\left \| x \right \| } } \leq k ( A ) { \frac{\left \| r \right \| }{\left \| b \right \| } } . $$

This shows that $ k ( A ) $ intervenes also here and that $ \| r \| $ can be found very small, yet $ {\widehat{x} } $ is totally inaccurate as a result of the large $ k ( A ) $, [a4]. The above bound can still be improved by noticing that since $ A ^ {- 1 } $ is unknown and only an approximate inverse matrix $ B $ of $ A $ is known, it can be written in the practical form

$$ \tag{a12 } \left \| {\delta x } \right \| = \left \| {A ^ {- 1 } B ^ {- 1 } Br } \right \| = $$

$$ = \left \| {( I - ( I - BA ) ) ^ {- 1 } Br } \right \| \leq $$

$$ \leq { \frac{\left \| {Br } \right \| }{1 - \left \| {I - BA } \right \| } } , $$

provided that $ \| {I - BA } \| < 1 $.

Although scientists have elaborated on various bounds for estimating the accuracy of their results, the situation is not so bleak, for the above bounds are not the end of the story. After all, $ {\widehat{x} } $ can be ameliorated by a technique called the method of iterative refinement, very similar to Newton's method for solving non-linear equations (cf. Newton method). Today it is an available facility in most packages (Linpack, Lapack, etc.) and runs as follows:

a) compute $ r = A {\widehat{x} } - b $;

b) solve $ A \delta x = r $;

c) $ {\widehat{x} } _ {\textrm{ new } } = {\widehat{x} } - \delta x $, re-do from a). The algorithm converges very rapidly and $ {\widehat{x} } \rightarrow x $.
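
A minimal numpy sketch of steps a)-c) follows; in practice the residual is accumulated in higher precision and the factorization of $ A $ is reused, whereas here each correction simply re-solves, which suffices for illustration:

<pre>
import numpy as np

def refine(A, b, x_hat, steps=3):
    # steps a)-c): r = A x_hat - b, solve A dx = r, subtract
    for _ in range(steps):
        r = A @ x_hat - b
        dx = np.linalg.solve(A, r)
        x_hat = x_hat - dx
    return x_hat

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
b = rng.standard_normal(6)
x0 = np.linalg.solve(A, b) + 1e-4 * rng.standard_normal(6)  # crude start
print(np.linalg.norm(A @ refine(A, b, x0) - b))  # residual near machine eps
</pre>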

Finally, accuracy of $ {\widehat{x} } $ is not the only issue which matters to users. There is also the question of stability of the algorithms. By the stability of $ {\widehat{x} } $ one means that it satisfies the perturbed problem

$$ \tag{a13 } ( A + \delta A ) {\widehat{x} } = b + \delta b, $$

in which $ \delta A $ and $ \delta b $ are of the order of the machine allowed uncertainty. An algorithm which returns $ {\widehat{x} } $ such that its backward errors $ \delta A $ and $ \delta b $ are larger than what is anticipated in terms of the machine eps is unstable. This is checked using the Oettli–Prager criterion [a5]

$$ \tag{a14 } \left | {A {\widehat{x} } - b } \right | \leq \Delta A \left | { {\widehat{x} } } \right | + \Delta b, $$

which has been derived for interval equations $ [ A \pm \Delta A ] {\widehat{x} } = [ b \pm \Delta b ] $ but is also suitable for systems subject to uncertainties of the form $ \Delta A = \epsilon | A | $ and $ \Delta b = \epsilon | b | $. Thus follows the stability criterion $ | r | \leq \epsilon ( | A | | { {\widehat{x} } } | + | b | ) $; a result implemented in most available packages by what is called a performance index. (For instance, in the IMSL it is given by

$$ \tag{a15 } p = \max _ {1 \leq i \leq n } { \frac{\left | {b _ {i} - \sum _ {j = 1 } ^ { n } a _ {ij } {\widehat{x} } _ {j} } \right | }{BN + AN \cdot \sum _ {j = 1 } ^ { n } \left | { {\widehat{x} } _ {j} } \right | } } , $$

where $ BN = \max _ {1 \leq i \leq n } | {b _ {i} } | $, $ AN = \max _ {1 \leq i,j \leq n } | {a _ {ij } } | $.) If $ p < \textrm{ machine eps } $, the solution is stable [a4]. The backward error can be given by $ \delta A = - H \cdot | A | \cdot { \mathop{\rm diag} } ( { \mathop{\rm sgn} } ( {\widehat{x} } _ {i} ) ) $, $ \delta b = H \cdot | b | $, where $ H $ is a diagonal matrix and $ h _ {ii } = { {r _ {i} } / {( | A | | { {\widehat{x} } } | + | b | ) _ {i} } } $, with $ | {h _ {ii } } | < \textrm{ machine eps } $.
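
The performance index (a15) can be coded directly; the following sketch (the function name is ours) checks a computed solution of a random system:

<pre>
import numpy as np

def performance_index(A, b, x_hat):
    # p of (a15): componentwise residual over BN + AN * sum_j |x_hat_j|
    BN = np.max(np.abs(b))
    AN = np.max(np.abs(A))
    return np.max(np.abs(b - A @ x_hat)) / (BN + AN * np.sum(np.abs(x_hat)))

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)
x_hat = np.linalg.solve(A, b)
print(performance_index(A, b, x_hat), np.finfo(float).eps)  # p of order machine eps
</pre>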

==Linear equations $ Ax = b $, $ A \in \mathbf R ^ {m \times n } $.==

Independent of whether the equations have a unique solution (as in the previous section), more than one solution or no solution (only a least-squares solution), the general solution $ x $ is given by

$$ \tag{a16 } x = A ^ {+} b + ( I - A ^ {+} A ) c, $$

where $ c $ is an arbitrary vector and $ A ^ {+} $ is the Moore–Penrose inverse of $ A $, satisfying $ AA ^ {+} A = A $, $ A ^ {+} AA ^ {+} = A ^ {+} $, $ ( A ^ {+} A ) ^ {T} = A ^ {+} A $ and $ ( AA ^ {+} ) ^ {T} = AA ^ {+} $. For a consistent set of equations the residual $ Ax - b = ( AA ^ {+} - I ) b = 0 $ is a special case. The minimal least-squares solution becomes, therefore, $ x = A ^ {+} b $. Unlike in the previous section, if $ A $ undergoes a perturbation $ \epsilon A _ {1} $, in general $ ( A + \epsilon A _ {1} ) ^ {+} $ cannot be expanded into a Taylor series in $ \epsilon $; it does have a Laurent expansion [a4]. For acute perturbations ($ { \mathop{\rm rank} } ( A ) = { \mathop{\rm rank} } ( A + \epsilon A _ {1} ) $), from [a6] one finds that

$$ \tag{a17 } \left \| {( A + \delta A ) ^ {+} } \right \| _ {2} \leq { \frac{\left \| {A ^ {+} } \right \| _ {2} }{1 - \left \| {A ^ {+} } \right \| _ {2} \left \| {\delta A } \right \| _ {2} } } $$

if

$$ \left \| {( A + \delta A ) ^ {+} } \right \| _ {2} = { \frac{1}{\sigma _ {r} ( A + \delta A ) } } , $$

$$ \left \| {A ^ {+} } \right \| _ {2} = { \frac{1}{\sigma _ {r} ( A ) } } , $$

$$ \sigma _ {i} ( A ) - \sigma _ {1} ( \delta A ) \leq \sigma _ {i} ( A + \delta A ) \leq \sigma _ {i} ( A ) + \sigma _ {i} ( \delta A ) , $$

$$ i = 1 \dots r, $$

with $ \sigma _ {1} \geq \dots \geq \sigma _ {r} $ the singular values of $ A $. P. Wedin also showed that for any $ A $ and $ \delta A $,

$$ \tag{a18 } { \frac{\left \| {( A + \delta A ) ^ {+} - A ^ {+} } \right \| }{\left \| {( A + \delta A ) ^ {+} } \right \| _ {2} } } \leq \mu \left \| {A ^ {+} } \right \| _ {2} \left \| {\delta A } \right \| _ {2} , $$

where $ \mu $ has tabulated values for various cases of $ { \mathop{\rm rank} } ( A ) $ in relation to $ m $ and $ n $. It therefore follows that the bound for the relative error in $ A ^ {+} $ is given by

$$ \tag{a19 } { \frac{\left \| {( A + \delta A ) ^ {+} - A ^ {+} } \right \| }{\left \| {A ^ {+} } \right \| _ {2} } } \leq \mu { \frac{k ( A ) { \frac{\left \| {\delta A } \right \| _ {2} }{\left \| A \right \| _ {2} } } }{1 - k ( A ) { \frac{\left \| {\delta A } \right \| _ {2} }{\left \| A \right \| _ {2} } } } } , $$

where $ k ( A ) = \| A \| _ {2} \| {A ^ {+} } \| _ {2} $ is the spectral condition number, which measures the sensitivity of $ A ^ {+} $ to acute perturbations in $ A $. To obtain a bound for $ { {\| {\delta x } \| } / {\| x \| } } $ one starts with $ x + \delta x = ( A + \delta A ) ^ {+} ( b + \delta b ) $ to find that

$$ \tag{a20 } \delta x = \left [ ( A + \delta A ) ^ {+} - A ^ {+} \right ] b + ( A + \delta A ) ^ {+} \delta b. $$

Using a decomposition theorem [a6] for $ ( A + \delta A ) ^ {+} - A ^ {+} $, one finds that [a7]

$$ \tag{a21 } { \frac{\left \| {\delta x } \right \| _ {2} }{\left \| x \right \| _ {2} } } \leq {\widehat{k} } \left [ ( 2 + \eta {\widehat{k} } ) \alpha + \beta \gamma \right ] , $$

where

$$ \tag{a22 } \alpha = { \frac{\left \| {\delta A } \right \| _ {2} }{\left \| A \right \| _ {2} } } , \quad {\widehat{k} } = { \frac{k ( A ) }{1 - \alpha k ( A ) } } , $$

$$ \eta = { \frac{\left \| r \right \| _ {2} }{\left \| A \right \| _ {2} \left \| x \right \| _ {2} } } , \quad \beta = { \frac{\left \| {\delta b } \right \| _ {2} }{\left \| b \right \| _ {2} } } , \quad \gamma = { \frac{\left \| b \right \| _ {2} }{\left \| A \right \| _ {2} \left \| x \right \| _ {2} } } . $$

Note that the error is dominated by $ {\widehat{k} } ^ {2} ( A ) $, unlike the bound in (a19), which is dominated by $ {\widehat{k} } ( A ) $ only. For the special case $ { \mathop{\rm rank} } ( A ) = n < m $ the above expression becomes $ {\widehat{k} } [ ( 1 + \eta {\widehat{k} } ) \alpha + \beta \gamma ] $. For $ { \mathop{\rm rank} } ( A ) = m < n $ it becomes $ {\widehat{k} } ( 2 \alpha + \beta ) $. Finally, for $ { \mathop{\rm rank} } ( A ) = m = n $ it reduces to $ {\widehat{k} } ( \alpha + \beta ) $, which is the situation of the previous section. It should be noted that the above bounds are valid for acute perturbations and that the situation worsens if $ \delta A $ increases the rank of $ A $, in which case the accuracy of $ A ^ {+} $ deteriorates further as

$$ \left \| {( A + \delta A ) ^ {+} } \right \| _ {2} > { \frac{1}{\left \| {\delta A } \right \| _ {2} } } . $$
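
The last point is easily demonstrated numerically: a tiny perturbation that increases the rank makes $ ( A + \delta A ) ^ {+} $ blow up, while $ A ^ {+} $ itself stays moderate (the matrices are chosen purely for illustration):

<pre>
import numpy as np

A = np.diag([1.0, 1e-3, 0.0])                  # rank 2, sigma_r = 1e-3
dA = np.zeros((3, 3)); dA[2, 2] = 1e-10        # non-acute: rank(A + dA) = 3
print(np.linalg.norm(np.linalg.pinv(A), 2))       # 1/sigma_r(A) = 1e3
print(np.linalg.norm(np.linalg.pinv(A + dA), 2))  # 1e10 = 1/||dA||_2
</pre>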

Again one can follow an iterative scheme of refinement similar to the one mentioned in the previous section to improve on $ A ^ {+} $. It runs as follows for the full rank situation:

a) $ R = A ^ {T} - A ^ {T} AB $;

b) $ \delta A ^ {+} = - BB ^ {T} R $;

c) $ A ^ {+} _ {\textrm{ new } } = B - \delta A ^ {+} $. The algorithm converges and $ B \rightarrow ( A ^ {T} A ) ^ {- 1 } A ^ {T} $.
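
A numpy sketch of steps a)-c) for the full-column-rank case (the starting guess and step count are arbitrary choices):

<pre>
import numpy as np

def refine_pinv(A, B, steps=5):
    # steps a)-c); converges to (A^T A)^{-1} A^T for full column rank
    for _ in range(steps):
        R = A.T - A.T @ A @ B
        dAp = -B @ B.T @ R
        B = B - dAp
    return B

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))                              # full column rank
B = np.linalg.pinv(A) + 1e-3 * rng.standard_normal((3, 6))   # crude A^+
print(np.linalg.norm(refine_pinv(A, B) - np.linalg.pinv(A))) # -> small
</pre>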

==The eigenvalue problem $ Ax = \lambda x $, $ A \in \mathbf R ^ {n \times n } $.==

Unlike in the previous sections, a perturbation $ \delta A $ in $ A $ affects both the eigenvalue $ \lambda $ and its corresponding eigenvector $ x $. One of the earliest a priori bounds for $ | { {\widehat \lambda } - \lambda } | $, where $ {\widehat \lambda } $ is the eigenvalue of $ A + \delta A $, is the Bauer–Fike theorem [a8]. It states that the eigenvalues of $ A + \delta A $ lie in the union of the discs

$$ \tag{a23 } M _ {i} = \left \{ z : {\left | {z - \lambda _ {i} } \right | \leq \left \| {T ^ {- 1 } } \right \| \left \| T \right \| \left \| {\delta A } \right \| } \right \} , $$

$$ i = 1 \dots n, $$

where $ T $ is the modal matrix of $ A $ ($ T ^ {- 1 } AT = \Lambda $), assuming $ A $ to be non-defective. The bound follows upon noticing that since

$$ {\widehat \lambda } I - A - \delta A = ( {\widehat \lambda } I - A ) \left [ I - ( {\widehat \lambda } I - A ) ^ {- 1 } \delta A \right ] $$

is singular, one has $ \| {( {\widehat \lambda } I - A ) ^ {- 1 } \delta A } \| \geq 1 $. Writing

$$ ( {\widehat \lambda } I - A ) ^ {- 1 } = T ( {\widehat \lambda } I - \Lambda ) ^ {- 1 } T ^ {- 1 } , $$

one finds that

$$ 1 \leq \left \| {T ( {\widehat \lambda } I - \Lambda ) ^ {- 1 } T ^ {- 1 } \delta A } \right \| \leq $$

$$ \leq \left \| T \right \| \left \| {T ^ {- 1 } } \right \| \left \| {\delta A } \right \| { \frac{1}{ { \mathop{\rm min} } _ { i } \left | { {\widehat \lambda } - \lambda _ {i} } \right | } } , $$

which implies (a23). The condition number of the problem is therefore of the order of $ k ( T ) = \| T \| \| {T ^ {- 1 } } \| $. Indeed, if the above discs are isolated, then $ { \mathop{\rm min} } _ {i} | { {\widehat \lambda } - \lambda _ {i} } | = | {\delta \lambda _ {i} } | $ and $ | {\delta \lambda _ {i} } | \leq k ( T ) \| {\delta A } \| $. However, unless $ A $ is a normal matrix ($ \| T \| _ {2} = \| {T ^ {- 1 } } \| _ {2} = 1 $), the bound in (a23) is known to yield pessimistic results. A more realistic criterion can be obtained. As $ I - ( {\widehat \lambda } I - A ) ^ {- 1 } \delta A $ is singular, then for a vector $ z \neq 0 $,

$$ \left | z \right | \leq \left | {( {\widehat \lambda } I - \Lambda ) ^ {- 1 } } \right | \left | {T ^ {- 1 } } \right | \left | {\delta A } \right | \left | T \right | \left | z \right | , $$

implying that [a9]

$$ \tag{a24 } { \mathop{\rm min} } _ { i } \left | { {\widehat \lambda } - \lambda _ {i} } \right | \leq \rho ( \left | {T ^ {- 1 } } \right | \left | {\delta A } \right | \left | T \right | ) , $$

which is tighter than (a23) for non-normal matrices. Moreover, (a24) is scale-invariant under the transformation $ A = DBD ^ {- 1 } $, where $ D $ is diagonal.
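
The two criteria can be compared numerically. The sketch below builds a non-normal matrix from a prescribed spectrum, applies a small perturbation, and evaluates the true shift, the radius in (a23) and the spectral-radius bound (a24); all choices are illustrative:

<pre>
import numpy as np

rng = np.random.default_rng(5)
lam = np.array([1.0, 2.0, 4.0, 8.0])
T = rng.standard_normal((4, 4))            # non-orthogonal modal matrix
Tinv = np.linalg.inv(T)
A = T @ np.diag(lam) @ Tinv
dA = 1e-6 * rng.standard_normal((4, 4))

mu = np.linalg.eigvals(A + dA)
shift = max(np.min(np.abs(lam - m)) for m in mu)
a23 = np.linalg.cond(T) * np.linalg.norm(dA, 2)    # k(T) ||dA||, (a23)
a24 = np.max(np.abs(np.linalg.eigvals(np.abs(Tinv) @ np.abs(dA) @ np.abs(T))))
print(shift, a24, a23)      # typically shift <= (a24) <= (a23) here
</pre>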

Both the Bauer–Fike bound and the above are valid for the whole spectrum; thus, a group of ill-conditioned eigenvalues affects the bound for the well-conditioned ones. This led Wilkinson to introduce his $ {1 / {s _ {i} } } $ factors, measuring the sensitivity of the $ i $-th eigenvalue alone. From first-order perturbation theory one finds, [a1],

$$ \tag{a25 } \delta \lambda _ {i} \approx { \frac{y ^ {i ^ {*} } \delta Ax ^ {i} }{y ^ {i ^ {*} } x ^ {i} } } , $$

where $ x ^ {i} $ and $ y ^ {i ^ {*} } $ are the eigenvector and eigenrow associated with a simple $ \lambda _ {i} $. So, normalizing both of them, $ {1 / {| {y ^ {i ^ {*} } x ^ {i} } | } } $ becomes the Wilkinson factor and is equal to $ { {| {\delta \lambda _ {i} } | } / {\| {\delta A } \| } } $. However, for a big $ \| {\delta A } \| $, (a25) does not bound the true shift in $ \lambda _ {i} $. For this one uses a second Bauer–Fike theorem: If

$$ \left \| {\delta A } \right \| \leq { \frac{1}{n} } { \mathop{\rm min} } _ {k \neq i } { \frac{\left | {\lambda _ {i} - \lambda _ {k} } \right | }{\left \| {E _ {i} } \right \| + \left \| {E _ {k} } \right \| } } , $$

where $ E _ {i} $ is the eigenprojection of $ A $ (i.e., $ E _ {i} = x ^ {i} y ^ {i ^ {*} } $), then [a8]

$$ \tag{a26 } \left | {\delta \lambda _ {i} } \right | \leq { \frac{\left \| {E _ {i} } \right \| \left \| {\delta A } \right \| }{1 - \left \| {\delta A } \right \| \sum _ {k \neq i } { \frac{\left \| {E _ {i} } \right \| + \left \| {E _ {k} } \right \| }{\left | {\lambda _ {i} - \lambda _ {k} } \right | } } } } . $$

It follows that $ \| {E _ {i} } \| $ represents the sensitivity of $ \lambda _ {i} $ to changes in $ A $. One can also show that [a10]

$$ \tag{a27 } { \frac{\left \| { {\widehat{x} } ^ {i} - x ^ {i} } \right \| }{\left \| {x ^ {i} } \right \| } } \leq { \frac{\psi}{ { \mathop{\rm min} } _ {j \neq i } \left | {\lambda _ {i} - \lambda _ {j} } \right | - 2 \psi } } , $$

where $ \psi $ bounds the shifts in the eigenvalues. So, the shift in $ x ^ {i} $ is dependent on the separation of the $ i $-th eigenvalue from the nearest $ \lambda _ {j} $. Therefore, a matrix $ A $ has an ill-conditioned eigenvalue problem if $ { \mathop{\rm min} } _ {j \neq i } | {\lambda _ {i} - \lambda _ {j} } | < 2 \psi $ or if the discs containing $ {\widehat \lambda } _ {i} $, $ i = 1 \dots n $, intersect as a result of poor separation.

To obtain an a posteriori bound for the error in $ \lambda $ in terms of the residual one forms $ r = A {\widehat{x} } - {\widehat \lambda } {\widehat{x} } $, where $ ( {\widehat \lambda } , {\widehat{x} } ) $ is an approximate eigenpair. Then $ {\widehat{x} } = T ( \Lambda - {\widehat \lambda } I ) ^ {- 1 } T ^ {- 1 } r $, implying

$$ \left \| { {\widehat{x} } } \right \| \leq \left \| T \right \| \left \| {T ^ {- 1 } } \right \| \left \| r \right \| { \frac{1}{ { \mathop{\rm min} } _ { i } \left | { {\widehat \lambda } - \lambda _ {i} } \right | } } , $$

and [a2]

$$ \tag{a28 } { \mathop{\rm min} } _ { i } \left | { {\widehat \lambda } - \lambda _ {i} } \right | \leq k ( T ) { \frac{\left \| r \right \| }{\left \| { {\widehat{x} } } \right \| } } . $$

For symmetric matrices $ k _ {2} ( T ) = 1 $; thus, these have well-conditioned eigenproblems. For the non-symmetric case a powerful technique of H. Wielandt [a1] can update $ {\widehat{x} } $ by solving iteratively the linear equations

$$ \tag{a29 } ( A - {\widehat \lambda } I ) x ^ {( i + 1 ) } = x ^ {( i ) } , \quad i = 1 \dots n, $$

$$ x ^ {( 1 ) } = {\widehat{x} } , $$

and $ x ^ {( i ) } \rightarrow x $ very rapidly.
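
A sketch of the Wielandt iteration (a29) follows; the normalization of each iterate is added here for numerical safety and does not appear in (a29):

<pre>
import numpy as np

def wielandt(A, lam_hat, x0, steps=5):
    # (a29): solve (A - lam_hat*I) x_new = x, then renormalize
    x = x0 / np.linalg.norm(x0)
    M = A - lam_hat * np.eye(A.shape[0])
    for _ in range(steps):
        x = np.linalg.solve(M, x)
        x = x / np.linalg.norm(x)
    return x

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5)).astype(complex)
w = np.linalg.eigvals(A)
x = wielandt(A, w[0] + 1e-5, rng.standard_normal(5).astype(complex))
print(np.linalg.norm(A @ x - w[0] * x))    # small: x approximates the eigenvector
</pre>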

To check the stability of the solution which satisfies $ ( A + \delta A ) {\widehat{x} } = {\widehat \lambda } {\widehat{x} } $, i.e., $ A {\widehat{x} } - {\widehat \lambda } {\widehat{x} } = - \delta A {\widehat{x} } $, one uses the criterion

$$ \tag{a30 } \left \| r \right \| = \left \| {A {\widehat{x} } - {\widehat \lambda } {\widehat{x} } } \right \| _ {2} \leq \epsilon \left \| A \right \| _ {2} \left \| { {\widehat{x} } } \right \| _ {2} , $$

which asserts that the system is backward stable and there exists a matrix $ \delta A = - { {r {\widehat{x} } ^ {*} } / {\| { {\widehat{x} } } \| _ {2} ^ {2} } } $ such that $ ( {\widehat \lambda } , {\widehat{x} } ) $ is an exact eigenpair of $ A + \delta A $. One can also construct the backward error in the form $ \delta A = - H \cdot | A | \cdot { \mathop{\rm diag} } ( { \mathop{\rm sgn} } ( {\widehat{x} } _ {i} ) ) $, where $ | {h _ {ii } } | \leq \textrm{ machine eps } $, satisfying exactly $ ( A + \delta A ) {\widehat{x} } = {\widehat \lambda } {\widehat{x} } $ by letting $ r = H \cdot | A | \cdot | { {\widehat{x} } } | $. The above theory is valid for semi-simple eigenvalues only.
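
The rank-one backward error above is easy to verify in numpy (random test matrix; the eigenpair is taken from np.linalg.eig):

<pre>
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 5))
w, V = np.linalg.eig(A)
lam, x = w[0], V[:, 0]                     # computed eigenpair, ||x||_2 = 1
r = A @ x - lam * x

print(np.linalg.norm(r), np.finfo(float).eps * np.linalg.norm(A, 2))  # (a30)

dA = -np.outer(r, x.conj()) / np.linalg.norm(x) ** 2   # rank-one backward error
print(np.linalg.norm((A + dA) @ x - lam * x))          # ~0: exact eigenpair
</pre>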

The case when $ A $ is defective (cf. Defective value) is more complicated, since $ {\widehat \lambda } $, for a perturbation $ \epsilon A _ {1} $ in $ A $, has a Puiseux expansion (cf. Puiseux series) in powers of $ \epsilon ^ {1 / m } $, where $ m $ is the multiplicity of $ \lambda $, namely [a1],

$$ \tag{a31 } {\widehat \lambda } = \lambda + \epsilon ^ { {1 / m } } \lambda _ {1} + \epsilon ^ { {2 / m } } \lambda _ {2} + \dots . $$

For an eigenvalue with index $ r $ (the index is the size of the largest Jordan block), a bound for the shift in $ \lambda $ is given by [a10]

$$ \tag{a32 } { \mathop{\rm min} } _ { i } \left | { {\widehat \lambda } - \lambda _ {i} } \right | \leq \max \{ r \psi,r ^ { {1 / r } } \psi ^ { {1 / r } } \} . $$
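The fractional powers in (a31) are easy to observe numerically. A sketch using a single nilpotent Jordan block of size $ m = 3 $ (an arbitrary choice):

<pre>
import numpy as np

# A perturbation of size eps in the corner of an m x m Jordan block moves
# the m-fold eigenvalue 0 by about eps**(1/m), as predicted by (a31).
m, eps = 3, 1e-9
J = np.diag(np.ones(m - 1), k=1)       # nilpotent Jordan block, eigenvalue 0
E = np.zeros((m, m))
E[-1, 0] = eps

shift = np.max(np.abs(np.linalg.eigvals(J + E)))
print(shift, eps ** (1.0 / m))         # both are about 10^-3
</pre>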

==Other bounds.==

Modifications of the above cases are often encountered. For instance, the Sylvester equation

$$ \tag{a33 } AX + XB = C $$

and its special case $ B = A ^ {T} $ (the Lyapunov equation), well known in control theory, can be transformed to the case of the first section by considering

$$ \tag{a34 } ( A \otimes I + I \otimes B ^ {T} ) { \mathop{\rm vec} } ( X ) = { \mathop{\rm vec} } ( C ) , $$

and a relative perturbation $ \epsilon $ in $ A $, $ B $, $ C $ yields $ {\widehat{X} } $ with an error [a13]

$$ \tag{a35 } { \frac{\left \| {\delta X } \right \| }{\left \| X \right \| } } \leq { \frac{\epsilon \cdot k ( A,B ) }{1 - \epsilon \cdot k ( A,B ) } } , $$

where $ k ( A,B ) $ is the condition number for the problem.
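A Python sketch of the formulation (a34); since the Kronecker system has dimension $ pq $ for $ X \in \mathbf R ^ {p \times q } $, this is useful for exposition only, the Bartels–Stewart algorithm being the practical method for (a33). In the sketch, vec stacks the rows of $ X $ (NumPy's default flattening order), which is the convention under which (a34) holds:

<pre>
import numpy as np

def sylvester_kron(A, B, C):
    # Solve AX + XB = C via (a34), with vec stacking rows (C-order
    # flattening).  Exposition only: O((pq)^3) work, and solvability
    # requires that A and -B have no common eigenvalue.
    p, q = C.shape
    M = np.kron(A, np.eye(q)) + np.kron(np.eye(p), B.T)
    return np.linalg.solve(M, C.flatten()).reshape(p, q)

A = np.array([[2.0, 1.0], [0.0, 3.0]])   # arbitrary test data
B = np.array([[1.0, 0.0], [1.0, 2.0]])
C = np.ones((2, 2))
X = sylvester_kron(A, B, C)
print(np.allclose(A @ X + X @ B, C))     # True
</pre>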

Also, transforming the eigenvalue problem

$$ \tag{a36 } ( A _ {n} \lambda ^ {n} + A _ {n - 1 } \lambda ^ {n - 1 } + \dots + A _ {0} ) x = 0, $$

in which $ A _ {i} \in \mathbf R ^ {n \times n } $, $ i = 0 \dots n $, into a generalized eigenvalue problem $ Cu = \lambda Bu $, where $ C $ is the companion matrix for (a36), and using the decomposition $ ( {\widehat \lambda } B - C ) ^ {- 1 } = P ( {\widehat \lambda } I - \Lambda ) ^ {- 1 } Q $ [a1], one obtains [a10]

$$ \tag{a37 } \left | { {\widehat \lambda } - \lambda } \right | \leq { \frac{\left \| P \right \| \left \| Q \right \| ( \left | \lambda \right | \left \| {\delta B } \right \| + \left \| {\delta C } \right \| ) }{1 - \left \| P \right \| \left \| Q \right \| \left \| {\delta B } \right \| } } , $$

which generalizes (a23).
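A sketch of the linearization step in Python, under the simplifying assumption that the leading coefficient is invertible, so that the pencil $ Cu = \lambda Bu $ reduces to an ordinary eigenvalue problem for the block companion matrix:

<pre>
import numpy as np

def polyeig(*A):
    # Eigenvalues of (A[k] t^k + ... + A[1] t + A[0]) x = 0, as in (a36),
    # via the block companion matrix; assumes A[k] is invertible.
    k, n = len(A) - 1, A[0].shape[0]
    C = np.zeros((k * n, k * n))
    C[:-n, n:] = np.eye((k - 1) * n)          # identity super-diagonal blocks
    Ak_inv = np.linalg.inv(A[-1])
    for j in range(k):                        # last block row: -A_k^{-1} A_j
        C[-n:, j * n:(j + 1) * n] = -Ak_inv @ A[j]
    return np.linalg.eigvals(C)

A0 = np.array([[2.0, 0.0], [0.0, 3.0]])       # arbitrary quadratic example
A1 = np.array([[0.0, 1.0], [1.0, 0.0]])
print(np.sort_complex(polyeig(A0, A1, np.eye(2))))   # its 4 eigenvalues
</pre>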

Many bounds exist for various matrix functions, for instance [a1]

$$ \tag{a38 } \left \| {e ^ {A + \delta A } - e ^ {A} } \right \| \leq k ( T ) \cdot \left \| W \right \| , $$

where

$$ w _ {ij } = { \frac{e ^ {\lambda _ {i} } - e ^ {\lambda _ {j} } }{\lambda _ {i} - \lambda _ {j} } } \left \langle {y ^ {i} , \delta A x ^ {j} } \right \rangle . $$
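Treating (a38) as a first-order estimate, it can be checked numerically. In the sketch below (arbitrary data), $ y ^ {i} $ and $ x ^ {j} $ are taken to be the left and right eigenvectors of $ A $, and on the diagonal the divided difference is taken in its limiting value $ e ^ {\lambda _ {i} } $:

<pre>
import numpy as np
from scipy.linalg import expm

A = np.array([[0.2, 1.0], [0.0, 0.5]])            # arbitrary diagonalizable A
dA = 1e-6 * np.array([[0.4, 0.1], [0.3, 0.2]])    # small perturbation

w, T = np.linalg.eig(A)
Y = np.linalg.inv(T)                 # row i of Y is the left eigenvector y^i

n = len(w)
W = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        # divided difference of e^t at (lambda_i, lambda_j);
        # on the diagonal it degenerates to the derivative e^{lambda_i}
        dd = np.exp(w[i]) if i == j else (np.exp(w[i]) - np.exp(w[j])) / (w[i] - w[j])
        W[i, j] = dd * (Y[i] @ dA @ T[:, j])

lhs = np.linalg.norm(expm(A + dA) - expm(A), 2)
rhs = np.linalg.cond(T) * np.linalg.norm(W, 2)
print(lhs, rhs)                      # comparable magnitudes for small dA
</pre>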

For a general matrix function $ F ( A ) $ approximated by another one $ G ( A ) $ it can be shown that [a14]

$$ \tag{a39 } \left \| {F ( A ) - G ( A ) } \right \| \leq $$

$$ \leq k ( T ) \max _ {1 \leq r \leq m - 1, 1 \leq i \leq p } { \frac{\left | {f ^ {( r ) } ( \lambda _ {i} ) - g ^ {( r ) } ( \lambda _ {i} ) } \right | }{r! } } m _ {i} , $$

where $ f $ and $ g $ are the scalar functions corresponding to $ F $ and $ G $ (in the sense that if $ F ( A ) = e ^ {A} $, then $ f ( \lambda ) = e ^ \lambda $), $ m _ {i} $ is the size of the largest Jordan block associated with $ \lambda _ {i} $, and $ f ^ {( r ) } ( \lambda ) $ is the $ r $- th derivative of $ f ( \lambda ) $.
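For a diagonalizable matrix all $ m _ {i} = 1 $ and (a39) reduces to $ \| {F ( A ) - G ( A ) } \| \leq k ( T ) { \mathop{\rm max} } _ {i} | {f ( \lambda _ {i} ) - g ( \lambda _ {i} ) } | $. The following sketch checks this for $ F ( A ) = e ^ {A} $ against a degree-three Taylor polynomial $ G ( A ) $, with an arbitrary test matrix and SciPy's expm as the reference value:

<pre>
import numpy as np
from math import factorial
from scipy.linalg import expm

A = np.array([[0.1, 0.5], [0.0, 0.3]])        # arbitrary, distinct eigenvalues

# G(A): degree-3 Taylor polynomial of e^A; g is the same scalar polynomial
G = sum(np.linalg.matrix_power(A, j) / factorial(j) for j in range(4))
g = lambda t: sum(t ** j / factorial(j) for j in range(4))

w, T = np.linalg.eig(A)
bound = np.linalg.cond(T) * max(abs(np.exp(t) - g(t)) for t in w)
actual = np.linalg.norm(expm(A) - G, 2)
print(actual, bound, actual <= bound)         # the bound holds
</pre>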

==References==

[a1] A. Deif, "Advanced matrix theory for scientists and engineers", Gordon & Breach (1991) (Edition: Second)
[a2] J. Wilkinson, "The algebraic eigenvalue problem", Clarendon Press (1965)
[a3] R. Skeel, "Scaling for numerical stability in Gaussian elimination", J. Assoc. Comput. Mach., 26 (1979) pp. 494–526
[a4] A. Deif, "Sensitivity analysis in linear systems", Springer (1986)
[a5] W. Oettli, W. Prager, "Compatibility of approximate solution of linear equations with given error bounds for coefficients and right hand sides", Numer. Math., 6 (1964) pp. 405–409
[a6] P. Wedin, "Perturbation theory for pseudo-inverses", BIT, 13 (1973) pp. 217–232
[a7] C. Lawson, R. Hanson, "Solving least-squares problems", Prentice-Hall (1974)
[a8] F. Bauer, C. Fike, "Norms and exclusion theorems", Numer. Math., 2 (1960) pp. 137–141
[a9] A. Deif, "Realistic a priori and a posteriori bounds for computed eigenvalues", IMA J. Numer. Anal., 9 (1990) pp. 323–329
[a10] A. Deif, "Rigorous perturbation bounds for eigenvalues and eigenvectors of a matrix", J. Comput. Appl. Math., 57 (1995) pp. 403–412
[a11] G. Stewart, "Introduction to matrix computations", Acad. Press (1973)
[a12] G. Stewart, J. Sun, "Matrix perturbation theory", Acad. Press (1990)
[a13] A. Deif, N. Seif, S. Hussein, "Sylvester's equation: accuracy and computational stability", J. Comput. Appl. Math., 61 (1995) pp. 1–11
[a14] G. Golub, C. van Loan, "Matrix computations", Johns Hopkins Univ. Press (1983)
This article was adapted from an original article by A.S. Deif (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.