Difference between revisions of "Bayesian numerical analysis"
(Importing text file) |
m (AUTOMATIC EDIT (latexlist): Replaced 23 formulas out of 26 by TEX code with an average confidence of 2.0 and a minimal confidence of 2.0.) |
||
Line 1: | Line 1: | ||
+ | <!--This article has been texified automatically. Since there was no Nroff source code for this article, | ||
+ | the semi-automatic procedure described at https://encyclopediaofmath.org/wiki/User:Maximilian_Janisch/latexlist | ||
+ | was used. | ||
+ | If the TeX and formula formatting is correct, please remove this message and the {{TEX|semi-auto}} category. | ||
+ | |||
+ | Out of 26 formulas, 23 were replaced by TEX code.--> | ||
+ | |||
+ | {{TEX|semi-auto}}{{TEX|partial}} | ||
The comparison of algorithms and the analysis of numerical problems in a Bayesian setting, cf. also [[Bayesian approach|Bayesian approach]]. | The comparison of algorithms and the analysis of numerical problems in a Bayesian setting, cf. also [[Bayesian approach|Bayesian approach]]. | ||
− | A numerical algorithm is usually developed and applied for inputs sharing some common properties. In a theoretical comparison of algorithms one either selects a class | + | A numerical algorithm is usually developed and applied for inputs sharing some common properties. In a theoretical comparison of algorithms one either selects a class $P$ of inputs and defines the cost and the error of an algorithm in a worst-case sense over the class $P$. Alternatively, in Bayesian numerical analysis, one puts an [[A priori distribution|a priori distribution]] $\mu$ on the inputs and defines the cost and the error of an algorithm in an average-case sense, i.e., via expectations with respect to $\mu$. For instance, if $\epsilon ( p , m )$ denotes the error of a method $m$, applied to an input $p$, then |
− | + | \begin{equation*} \mathcal{E} _ { \operatorname{wor} } ( P , m ) = \operatorname { sup } _ { p \in P } | \epsilon ( p , m ) | \end{equation*} | |
− | is called the worst-case (maximal) error of | + | is called the worst-case (maximal) error of $m$ on $P$ while |
− | + | \begin{equation*} \mathcal{E} _ { \text{avg} } ( \mu , m ) = \int | \epsilon ( p , m ) | d \mu ( p ) \end{equation*} | |
− | is called the average error of | + | is called the average error of $m$ with respect to $\mu$. The notions of optimality of algorithms and complexity of numerical problems are defined analogously in the worst-case setting and the Bayesian setting, see [[Optimization of computational algorithms|Optimization of computational algorithms]] and [[Index transform|Information-based complexity]]. |
− | Properties of the inputs, e.g., smoothness of integrands for numerical integration, are expressed by being a member of | + | Properties of the inputs, e.g., smoothness of integrands for numerical integration, are expressed by being a member of $P$ or being distributed according to $\mu$. For problems with inputs from finite-dimensional spaces, the [[Lebesgue measure|Lebesgue measure]] is often used in the definition of $\mu$, see [[#References|[a1]]] for [[Linear programming|linear programming]] and [[#References|[a7]]] for solving systems of polynomial equations. For problems with inputs from infinite-dimensional spaces, often a Gaussian measure $\mu$ is used. If the inputs are real-valued functions on some domain, imposing an a priori distribution is essentially equivalent to treating inputs as random functions (cf. also [[Random function|Random function]]). |
The Bayesian approach leads to new algorithms and to new insight into numerical problems. A few examples of this are as follows. | The Bayesian approach leads to new algorithms and to new insight into numerical problems. A few examples of this are as follows. | ||
===Multivariate integration.=== | ===Multivariate integration.=== | ||
− | Under very mild assumptions on the measure | + | Under very mild assumptions on the measure $\mu$, the cost of optimal methods does not depend exponentially on the number of variables, see [[#References|[a8]]]. On the other hand, often a [[Curse of dimension|curse of dimension]] occurs in the worst-case setting. |
===Linear programming.=== | ===Linear programming.=== | ||
− | Under natural symmetry properties of | + | Under natural symmetry properties of $\mu$, the average cost of the [[Simplex method|simplex method]] is polynomial, while the worst-case (maximal) cost is exponential, see [[#References|[a1]]]. |
===Global optimization.=== | ===Global optimization.=== | ||
− | Suitable adaptive (active) methods are superior to all non-adaptive (passive) ones in a Bayesian setting, see [[#References|[a2]]]. In contrast, there is no superiority of adaptive methods in the worst-case setting for any convex and symmetric class | + | Suitable adaptive (active) methods are superior to all non-adaptive (passive) ones in a Bayesian setting, see [[#References|[a2]]]. In contrast, there is no superiority of adaptive methods in the worst-case setting for any convex and symmetric class $P$. See also [[Optimization of computational algorithms|Optimization of computational algorithms]]; [[Adaptive quadrature|Adaptive quadrature]]; see [[#References|[a6]]] for similar results. |
===Zero finding.=== | ===Zero finding.=== | ||
− | Bisection is worst-case optimal for many classes | + | Bisection is worst-case optimal for many classes $P$ of (smooth) functions. Thus, the worst-case <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008022.png"/>-complexity is of order $\operatorname { log } ( 1 / \epsilon )$. For measures that are derived from the [[Wiener measure|Wiener measure]] and concentrated on smooth functions, the average-case <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008024.png"/>-complexity is of order $\operatorname { log } \operatorname { log } ( 1 / \epsilon )$. The upper bound is achieved by a hybrid secant-bisection method, see [[#References|[a5]]]. |
For references, see also [[Index transform|Information-based complexity]]. | For references, see also [[Index transform|Information-based complexity]]. | ||
====References==== | ====References==== | ||
− | <table>< | + | <table><tr><td valign="top">[a1]</td> <td valign="top"> K.H. Borgwardt, "The simplex method: A probabilistic approach" , Springer (1987)</td></tr><tr><td valign="top">[a2]</td> <td valign="top"> J.M. Calvin, "Average performance of a class of adaptive algorithms for global optimization" ''Ann. Appl. Probab.'' , '''7''' (1997) pp. 711–730</td></tr><tr><td valign="top">[a3]</td> <td valign="top"> P. Diaconis, "Bayesian numerical analysis" S.S. Gupta (ed.) J.O. Berger (ed.) , ''Statistical Decision Theory and Related Topics IV'' , '''1''' , Springer (1988) pp. 163–175</td></tr><tr><td valign="top">[a4]</td> <td valign="top"> J.B. Kadane, G.W. Wasilkowski, "Average case <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008026.png"/>-complexity in computer science: a Bayesian view" J.M. Bernardo (ed.) , ''Bayesian Statistics'' , North-Holland (1988) pp. 361–374</td></tr><tr><td valign="top">[a5]</td> <td valign="top"> E. Novak, K. Ritter, H. Woźniakowski, "Average case optimality of a hybrid secant-bisection method" ''Math. Comp.'' , '''64''' (1995) pp. 1517–1539</td></tr><tr><td valign="top">[a6]</td> <td valign="top"> K. Ritter, "Average case analysis of numerical problems" , ''Lecture Notes Math.'' , Springer (2000)</td></tr><tr><td valign="top">[a7]</td> <td valign="top"> M. Shub, S. Smale, "Complexity of Bezout's theorem V: polynomial time" ''Theoret. Comput. Sci.'' , '''133''' (1994) pp. 141–164</td></tr><tr><td valign="top">[a8]</td> <td valign="top"> G.W. Wasilkowski, "Average case complexity of multivariate integration and function approximation: an overview" ''J. Complexity'' , '''12''' (1996) pp. 257–272</td></tr></table> |
Revision as of 16:45, 1 July 2020
The comparison of algorithms and the analysis of numerical problems in a Bayesian setting, cf. also Bayesian approach.
A numerical algorithm is usually developed and applied for inputs sharing some common properties. In a theoretical comparison of algorithms one either selects a class $P$ of inputs and defines the cost and the error of an algorithm in a worst-case sense over the class $P$. Alternatively, in Bayesian numerical analysis, one puts an a priori distribution $\mu$ on the inputs and defines the cost and the error of an algorithm in an average-case sense, i.e., via expectations with respect to $\mu$. For instance, if $\epsilon ( p , m )$ denotes the error of a method $m$, applied to an input $p$, then
\begin{equation*} \mathcal{E} _ { \operatorname{wor} } ( P , m ) = \operatorname { sup } _ { p \in P } | \epsilon ( p , m ) | \end{equation*}
is called the worst-case (maximal) error of $m$ on $P$ while
\begin{equation*} \mathcal{E} _ { \text{avg} } ( \mu , m ) = \int | \epsilon ( p , m ) | d \mu ( p ) \end{equation*}
is called the average error of $m$ with respect to $\mu$. The notions of optimality of algorithms and complexity of numerical problems are defined analogously in the worst-case setting and the Bayesian setting, see Optimization of computational algorithms and Information-based complexity.
Properties of the inputs, e.g., smoothness of integrands for numerical integration, are expressed by being a member of $P$ or being distributed according to $\mu$. For problems with inputs from finite-dimensional spaces, the Lebesgue measure is often used in the definition of $\mu$, see [a1] for linear programming and [a7] for solving systems of polynomial equations. For problems with inputs from infinite-dimensional spaces, often a Gaussian measure $\mu$ is used. If the inputs are real-valued functions on some domain, imposing an a priori distribution is essentially equivalent to treating inputs as random functions (cf. also Random function).
The Bayesian approach leads to new algorithms and to new insight into numerical problems. A few examples of this are as follows.
Multivariate integration.
Under very mild assumptions on the measure $\mu$, the cost of optimal methods does not depend exponentially on the number of variables, see [a8]. On the other hand, often a curse of dimension occurs in the worst-case setting.
Linear programming.
Under natural symmetry properties of $\mu$, the average cost of the simplex method is polynomial, while the worst-case (maximal) cost is exponential, see [a1].
Global optimization.
Suitable adaptive (active) methods are superior to all non-adaptive (passive) ones in a Bayesian setting, see [a2]. In contrast, there is no superiority of adaptive methods in the worst-case setting for any convex and symmetric class $P$. See also Optimization of computational algorithms; Adaptive quadrature; see [a6] for similar results.
Zero finding.
Bisection is worst-case optimal for many classes $P$ of (smooth) functions. Thus, the worst-case -complexity is of order $\operatorname { log } ( 1 / \epsilon )$. For measures that are derived from the Wiener measure and concentrated on smooth functions, the average-case -complexity is of order $\operatorname { log } \operatorname { log } ( 1 / \epsilon )$. The upper bound is achieved by a hybrid secant-bisection method, see [a5].
For references, see also Information-based complexity.
References
[a1] | K.H. Borgwardt, "The simplex method: A probabilistic approach" , Springer (1987) |
[a2] | J.M. Calvin, "Average performance of a class of adaptive algorithms for global optimization" Ann. Appl. Probab. , 7 (1997) pp. 711–730 |
[a3] | P. Diaconis, "Bayesian numerical analysis" S.S. Gupta (ed.) J.O. Berger (ed.) , Statistical Decision Theory and Related Topics IV , 1 , Springer (1988) pp. 163–175 |
[a4] | J.B. Kadane, G.W. Wasilkowski, "Average case -complexity in computer science: a Bayesian view" J.M. Bernardo (ed.) , Bayesian Statistics , North-Holland (1988) pp. 361–374 |
[a5] | E. Novak, K. Ritter, H. Woźniakowski, "Average case optimality of a hybrid secant-bisection method" Math. Comp. , 64 (1995) pp. 1517–1539 |
[a6] | K. Ritter, "Average case analysis of numerical problems" , Lecture Notes Math. , Springer (2000) |
[a7] | M. Shub, S. Smale, "Complexity of Bezout's theorem V: polynomial time" Theoret. Comput. Sci. , 133 (1994) pp. 141–164 |
[a8] | G.W. Wasilkowski, "Average case complexity of multivariate integration and function approximation: an overview" J. Complexity , 12 (1996) pp. 257–272 |
Bayesian numerical analysis. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Bayesian_numerical_analysis&oldid=49958