Difference between revisions of "Bayesian numerical analysis"

Revision as of 16:45, 1 July 2020

The comparison of algorithms and the analysis of numerical problems in a Bayesian setting, cf. also Bayesian approach.

A numerical algorithm is usually developed and applied for inputs sharing some common properties. In a theoretical comparison of algorithms one either selects a class $P$ of inputs and defines the cost and the error of an algorithm in a worst-case sense over the class $P$. Alternatively, in Bayesian numerical analysis, one puts an a priori distribution $\mu$ on the inputs and defines the cost and the error of an algorithm in an average-case sense, i.e., via expectations with respect to $\mu$. For instance, if $\epsilon ( p , m )$ denotes the error of a method $m$, applied to an input $p$, then

\begin{equation*} \mathcal{E} _ { \operatorname{wor} } ( P , m ) = \operatorname { sup } _ { p \in P } | \epsilon ( p , m ) | \end{equation*}

is called the worst-case (maximal) error of $m$ on $P$ while

\begin{equation*} \mathcal{E} _ { \text{avg} } ( \mu , m ) = \int | \epsilon ( p , m ) | d \mu ( p ) \end{equation*}

is called the average error of $m$ with respect to $\mu$. The notions of optimality of algorithms and complexity of numerical problems are defined analogously in the worst-case setting and the Bayesian setting, see Optimization of computational algorithms and Information-based complexity.

Properties of the inputs, e.g., smoothness of integrands for numerical integration, are expressed by being a member of $P$ or being distributed according to $\mu$. For problems with inputs from finite-dimensional spaces, the Lebesgue measure is often used in the definition of $\mu$, see [a1] for linear programming and [a7] for solving systems of polynomial equations. For problems with inputs from infinite-dimensional spaces, often a Gaussian measure $\mu$ is used. If the inputs are real-valued functions on some domain, imposing an a priori distribution is essentially equivalent to treating inputs as random functions (cf. also Random function).

The Bayesian approach leads to new algorithms and to new insight into numerical problems. A few examples of this are as follows.

Multivariate integration.

Under very mild assumptions on the measure $\mu$, the cost of optimal methods does not depend exponentially on the number of variables, see [a8]. On the other hand, often a curse of dimension occurs in the worst-case setting.

Linear programming.

Under natural symmetry properties of $\mu$, the average cost of the simplex method is polynomial, while the worst-case (maximal) cost is exponential, see [a1].

Global optimization.

Suitable adaptive (active) methods are superior to all non-adaptive (passive) ones in a Bayesian setting, see [a2]. In contrast, there is no superiority of adaptive methods in the worst-case setting for any convex and symmetric class $P$. See also Optimization of computational algorithms; Adaptive quadrature; see [a6] for similar results.

Zero finding.

Bisection is worst-case optimal for many classes $P$ of (smooth) functions. Thus, the worst-case -complexity is of order $\operatorname { log } ( 1 / \epsilon )$. For measures that are derived from the Wiener measure and concentrated on smooth functions, the average-case -complexity is of order $\operatorname { log } \operatorname { log } ( 1 / \epsilon )$. The upper bound is achieved by a hybrid secant-bisection method, see [a5].

For references, see also Information-based complexity.

References

[a1]	K.H. Borgwardt, "The simplex method: A probabilistic approach" , Springer (1987)
[a2]	J.M. Calvin, "Average performance of a class of adaptive algorithms for global optimization" Ann. Appl. Probab. , 7 (1997) pp. 711–730
[a3]	P. Diaconis, "Bayesian numerical analysis" S.S. Gupta (ed.) J.O. Berger (ed.) , Statistical Decision Theory and Related Topics IV , 1 , Springer (1988) pp. 163–175
[a4]	J.B. Kadane, G.W. Wasilkowski, "Average case -complexity in computer science: a Bayesian view" J.M. Bernardo (ed.) , Bayesian Statistics , North-Holland (1988) pp. 361–374
[a5]	E. Novak, K. Ritter, H. Woźniakowski, "Average case optimality of a hybrid secant-bisection method" Math. Comp. , 64 (1995) pp. 1517–1539
[a6]	K. Ritter, "Average case analysis of numerical problems" , Lecture Notes Math. , Springer (2000)
[a7]	M. Shub, S. Smale, "Complexity of Bezout's theorem V: polynomial time" Theoret. Comput. Sci. , 133 (1994) pp. 141–164
[a8]	G.W. Wasilkowski, "Average case complexity of multivariate integration and function approximation: an overview" J. Complexity , 12 (1996) pp. 257–272

How to Cite This Entry:
Bayesian numerical analysis. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Bayesian_numerical_analysis&oldid=49958

This article was adapted from an original article by Klaus Ritter (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article

@@ Line 1: / Line 1: @@
+<!--This article has been texified automatically. Since there was no Nroff source code for this article,
+the semi-automatic procedure described at https://encyclopediaofmath.org/wiki/User:Maximilian_Janisch/latexlist
+was used.
+If the TeX and formula formatting is correct, please remove this message and the {{TEX|semi-auto}} category.
+Out of 26 formulas, 23 were replaced by TEX code.-->
+{{TEX|semi-auto}}{{TEX|partial}}
 The comparison of algorithms and the analysis of numerical problems in a Bayesian setting, cf. also [[Bayesian approach|Bayesian approach]].
-A numerical algorithm is usually developed and applied for inputs sharing some common properties. In a theoretical comparison of algorithms one either selects a class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b1200801.png" /> of inputs and defines the cost and the error of an algorithm in a worst-case sense over the class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b1200802.png" />. Alternatively, in Bayesian numerical analysis, one puts an [[A priori distribution|a priori distribution]] <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b1200803.png" /> on the inputs and defines the cost and the error of an algorithm in an average-case sense, i.e., via expectations with respect to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b1200804.png" />. For instance, if <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b1200805.png" /> denotes the error of a method <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b1200806.png" />, applied to an input <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b1200807.png" />, then
+A numerical algorithm is usually developed and applied for inputs sharing some common properties. In a theoretical comparison of algorithms one either selects a class $P$ of inputs and defines the cost and the error of an algorithm in a worst-case sense over the class $P$. Alternatively, in Bayesian numerical analysis, one puts an [[A priori distribution|a priori distribution]] $\mu$ on the inputs and defines the cost and the error of an algorithm in an average-case sense, i.e., via expectations with respect to $\mu$. For instance, if $\epsilon ( p , m )$ denotes the error of a method $m$, applied to an input $p$, then
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b1200808.png" /></td> </tr></table>
+\begin{equation*} \mathcal{E} _ { \operatorname{wor} } ( P , m ) = \operatorname { sup } _ { p \in P } | \epsilon ( p , m ) | \end{equation*}
-is called the worst-case (maximal) error of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b1200809.png" /> on <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008010.png" /> while
+is called the worst-case (maximal) error of $m$ on $P$ while
-<table class="eq" style="width:100%;"> <tr><td valign="top" style="width:94%;text-align:center;"><img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008011.png" /></td> </tr></table>
+\begin{equation*} \mathcal{E} _ { \text{avg} } ( \mu , m ) = \int | \epsilon ( p , m ) | d \mu ( p ) \end{equation*}
-is called the average error of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008012.png" /> with respect to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008013.png" />. The notions of optimality of algorithms and complexity of numerical problems are defined analogously in the worst-case setting and the Bayesian setting, see [[Optimization of computational algorithms|Optimization of computational algorithms]] and [[Index transform|Information-based complexity]].
+is called the average error of $m$ with respect to $\mu$. The notions of optimality of algorithms and complexity of numerical problems are defined analogously in the worst-case setting and the Bayesian setting, see [[Optimization of computational algorithms|Optimization of computational algorithms]] and [[Index transform|Information-based complexity]].
-Properties of the inputs, e.g., smoothness of integrands for numerical integration, are expressed by being a member of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008014.png" /> or being distributed according to <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008015.png" />. For problems with inputs from finite-dimensional spaces, the [[Lebesgue measure|Lebesgue measure]] is often used in the definition of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008016.png" />, see [[#References|[a1]]] for [[Linear programming|linear programming]] and [[#References|[a7]]] for solving systems of polynomial equations. For problems with inputs from infinite-dimensional spaces, often a Gaussian measure <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008017.png" /> is used. If the inputs are real-valued functions on some domain, imposing an a priori distribution is essentially equivalent to treating inputs as random functions (cf. also [[Random function|Random function]]).
+Properties of the inputs, e.g., smoothness of integrands for numerical integration, are expressed by being a member of $P$ or being distributed according to $\mu$. For problems with inputs from finite-dimensional spaces, the [[Lebesgue measure|Lebesgue measure]] is often used in the definition of $\mu$, see [[#References|[a1]]] for [[Linear programming|linear programming]] and [[#References|[a7]]] for solving systems of polynomial equations. For problems with inputs from infinite-dimensional spaces, often a Gaussian measure $\mu$ is used. If the inputs are real-valued functions on some domain, imposing an a priori distribution is essentially equivalent to treating inputs as random functions (cf. also [[Random function|Random function]]).
 The Bayesian approach leads to new algorithms and to new insight into numerical problems. A few examples of this are as follows.
 ===Multivariate integration.===
-Under very mild assumptions on the measure <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008018.png" />, the cost of optimal methods does not depend exponentially on the number of variables, see [[#References|[a8]]]. On the other hand, often a [[Curse of dimension|curse of dimension]] occurs in the worst-case setting.
+Under very mild assumptions on the measure $\mu$, the cost of optimal methods does not depend exponentially on the number of variables, see [[#References|[a8]]]. On the other hand, often a [[Curse of dimension|curse of dimension]] occurs in the worst-case setting.
 ===Linear programming.===
-Under natural symmetry properties of <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008019.png" />, the average cost of the [[Simplex method|simplex method]] is polynomial, while the worst-case (maximal) cost is exponential, see [[#References|[a1]]].
+Under natural symmetry properties of $\mu$, the average cost of the [[Simplex method|simplex method]] is polynomial, while the worst-case (maximal) cost is exponential, see [[#References|[a1]]].
 ===Global optimization.===
-Suitable adaptive (active) methods are superior to all non-adaptive (passive) ones in a Bayesian setting, see [[#References|[a2]]]. In contrast, there is no superiority of adaptive methods in the worst-case setting for any convex and symmetric class <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008020.png" />. See also [[Optimization of computational algorithms|Optimization of computational algorithms]]; [[Adaptive quadrature|Adaptive quadrature]]; see [[#References|[a6]]] for similar results.
+Suitable adaptive (active) methods are superior to all non-adaptive (passive) ones in a Bayesian setting, see [[#References|[a2]]]. In contrast, there is no superiority of adaptive methods in the worst-case setting for any convex and symmetric class $P$. See also [[Optimization of computational algorithms|Optimization of computational algorithms]]; [[Adaptive quadrature|Adaptive quadrature]]; see [[#References|[a6]]] for similar results.
 ===Zero finding.===
-Bisection is worst-case optimal for many classes <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008021.png" /> of (smooth) functions. Thus, the worst-case <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008022.png" />-complexity is of order <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008023.png" />. For measures that are derived from the [[Wiener measure|Wiener measure]] and concentrated on smooth functions, the average-case <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008024.png" />-complexity is of order <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008025.png" />. The upper bound is achieved by a hybrid secant-bisection method, see [[#References|[a5]]].
+Bisection is worst-case optimal for many classes $P$ of (smooth) functions. Thus, the worst-case <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008022.png"/>-complexity is of order $\operatorname { log } ( 1 / \epsilon )$. For measures that are derived from the [[Wiener measure|Wiener measure]] and concentrated on smooth functions, the average-case <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008024.png"/>-complexity is of order $\operatorname { log } \operatorname { log } ( 1 / \epsilon )$. The upper bound is achieved by a hybrid secant-bisection method, see [[#References|[a5]]].
 For references, see also [[Index transform|Information-based complexity]].
 ====References====
-<table><TR><TD valign="top">[a1]</TD> <TD valign="top">  K.H. Borgwardt,   "The simplex method: A probabilistic approach" , Springer  (1987)</TD></TR><TR><TD valign="top">[a2]</TD> <TD valign="top">  J.M. Calvin,   "Average performance of a class of adaptive algorithms for global optimization"  ''Ann. Appl. Probab.'' , '''7'''  (1997)  pp. 711–730</TD></TR><TR><TD valign="top">[a3]</TD> <TD valign="top">  P. Diaconis,   "Bayesian numerical analysis"  S.S. Gupta (ed.)  J.O. Berger (ed.) , ''Statistical Decision Theory and Related Topics IV'' , '''1''' , Springer  (1988)  pp. 163–175</TD></TR><TR><TD valign="top">[a4]</TD> <TD valign="top">  J.B. Kadane,   G.W. Wasilkowski,   "Average case <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008026.png" />-complexity in computer science: a Bayesian view"  J.M. Bernardo (ed.) , ''Bayesian Statistics'' , North-Holland  (1988)  pp. 361–374</TD></TR><TR><TD valign="top">[a5]</TD> <TD valign="top">  E. Novak,   K. Ritter,   H. Woźniakowski,   "Average case optimality of a hybrid secant-bisection method"  ''Math. Comp.'' , '''64'''  (1995)  pp. 1517–1539</TD></TR><TR><TD valign="top">[a6]</TD> <TD valign="top">  K. Ritter,   "Average case analysis of numerical problems" , ''Lecture Notes Math.'' , Springer  (2000)</TD></TR><TR><TD valign="top">[a7]</TD> <TD valign="top">  M. Shub,   S. Smale,   "Complexity of Bezout's theorem V: polynomial time"  ''Theoret. Comput. Sci.'' , '''133'''  (1994)  pp. 141–164</TD></TR><TR><TD valign="top">[a8]</TD> <TD valign="top">  G.W. Wasilkowski,   "Average case complexity of multivariate integration and function approximation: an overview"  ''J. Complexity'' , '''12'''  (1996)  pp. 257–272</TD></TR></table>
+<table><tr><td valign="top">[a1]</td> <td valign="top">  K.H. Borgwardt,   "The simplex method: A probabilistic approach" , Springer  (1987)</td></tr><tr><td valign="top">[a2]</td> <td valign="top">  J.M. Calvin,   "Average performance of a class of adaptive algorithms for global optimization"  ''Ann. Appl. Probab.'' , '''7'''  (1997)  pp. 711–730</td></tr><tr><td valign="top">[a3]</td> <td valign="top">  P. Diaconis,   "Bayesian numerical analysis"  S.S. Gupta (ed.)  J.O. Berger (ed.) , ''Statistical Decision Theory and Related Topics IV'' , '''1''' , Springer  (1988)  pp. 163–175</td></tr><tr><td valign="top">[a4]</td> <td valign="top">  J.B. Kadane,   G.W. Wasilkowski,   "Average case <img align="absmiddle" border="0" src="https://www.encyclopediaofmath.org/legacyimages/b/b120/b120080/b12008026.png"/>-complexity in computer science: a Bayesian view"  J.M. Bernardo (ed.) , ''Bayesian Statistics'' , North-Holland  (1988)  pp. 361–374</td></tr><tr><td valign="top">[a5]</td> <td valign="top">  E. Novak,   K. Ritter,   H. Woźniakowski,   "Average case optimality of a hybrid secant-bisection method"  ''Math. Comp.'' , '''64'''  (1995)  pp. 1517–1539</td></tr><tr><td valign="top">[a6]</td> <td valign="top">  K. Ritter,   "Average case analysis of numerical problems" , ''Lecture Notes Math.'' , Springer  (2000)</td></tr><tr><td valign="top">[a7]</td> <td valign="top">  M. Shub,   S. Smale,   "Complexity of Bezout's theorem V: polynomial time"  ''Theoret. Comput. Sci.'' , '''133'''  (1994)  pp. 141–164</td></tr><tr><td valign="top">[a8]</td> <td valign="top">  G.W. Wasilkowski,   "Average case complexity of multivariate integration and function approximation: an overview"  ''J. Complexity'' , '''12'''  (1996)  pp. 257–272</td></tr></table>

Navigation

Tools

Namespaces

Variants

Views

Actions