Multi-extremum problem
An extremum problem having several, or an unknown number of, local extrema.

The problem of finding a global extremum of a function $f(x)$, $x = (x^1, \dots, x^n) \in X \subset \mathbf{R}^n$, $\overline{X}$ compact, has been solved for the basic classes of unimodal functions (first of all for convex and related functions, see Convex programming). For multimodal functions, even for smooth and slowly varying ones, at present (1989) there are no methods which are guaranteed to calculate a global extremum, with the exception of scanning along trajectories which form an everywhere-dense set in the admissible set $X$. In practice, time-consuming scanning is combined with algorithms for finding a local extremum: by scanning and a priori reduction of $f(x)$ (estimates of derivatives, functional equations and inequalities, etc.) one delineates a domain of attraction for each local extremum and the dead zones, where a concrete local algorithm loses effectiveness (for example, in the gradient method the dead zones are neighbourhoods of saddle points and "bottoms of valleys"). The extrema are then estimated or looked for by local methods and compared with each other.

For functions satisfying a Lipschitz condition

$$ ( \exists \, l < \infty ) \ ( \forall \, x , x' \in X ) \qquad | f(x) - f(x') | \leq l \, \| x - x' \| , $$

a more universal method is that of covering (over a non-uniform grid), which is, moreover, easily realized in the multi-dimensional case.
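As an illustration of the covering idea, the following one-dimensional sketch implements a Piyavskii–Shubert-type scheme; the function names, parameters and the test function are illustrative, not part of the article:

```python
import numpy as np

def shubert_minimize(f, a, b, lip, n_iter=100):
    """Covering method over a non-uniform grid (Piyavskii-Shubert type):
    globally minimize f on [a, b] given a Lipschitz constant lip.
    The piecewise-linear minorant built from the samples is minimized
    exactly; its minimizer becomes the next evaluation point."""
    xs, fs = [a, b], [f(a), f(b)]
    for _ in range(n_iter):
        order = np.argsort(xs)
        xs = [xs[i] for i in order]
        fs = [fs[i] for i in order]
        best_gap, best_x = 0.0, None
        for x0, x1, f0, f1 in zip(xs, xs[1:], fs, fs[1:]):
            # The two downward cones from (x0, f0) and (x1, f1) intersect here:
            xm = 0.5 * (x0 + x1) + (f0 - f1) / (2.0 * lip)
            lower = 0.5 * (f0 + f1) - 0.5 * lip * (x1 - x0)
            gap = min(fs) - lower        # possible improvement on this interval
            if gap > best_gap:
                best_gap, best_x = gap, xm
        if best_x is None:               # the minorant is already tight
            break
        xs.append(best_x)
        fs.append(f(best_x))
    i = int(np.argmin(fs))
    return xs[i], fs[i]

# A multimodal test function; |d/dx (sin 3x + 0.1 x)| <= 3.1 on [0, 10].
x_star, f_star = shubert_minimize(lambda x: np.sin(3.0 * x) + 0.1 * x,
                                  0.0, 10.0, lip=3.1)
```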

For approximately finding local extrema close to the global one, empirical "heuristic" methods are used, which may be called quasi-global:

1) Algorithms of the heavy-sphere type (cf. Heavy sphere, method of the), minimization methods for functions depending strongly on a few variables, or methods of "galloping" through a local extremum or of "fitting" domains of attraction.

2) Random running through the values of $ f ( x) $ by a Monte-Carlo method.
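A minimal sketch of such random running, assuming the admissible set is contained in a coordinate box (all names and the test function are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_search(f, lo, hi, n_samples=10_000):
    """Pure random (Monte-Carlo) search: sample the box [lo, hi] uniformly
    (assumed to contain the admissible set X) and keep the smallest value
    of f seen."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    xs = rng.uniform(lo, hi, size=(n_samples, lo.size))
    vals = np.array([f(x) for x in xs])
    i = int(np.argmin(vals))
    return xs[i], vals[i]

# Example: a 2-d multimodal function on the box [-5, 5]^2.
x_best, f_best = random_search(
    lambda x: np.sum(x**2) + 3.0 * np.cos(4.0 * x).sum(),
    lo=[-5.0, -5.0], hi=[5.0, 5.0])
```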

3) Modification of a local method by the introduction of stochastic parameters. For example, the trajectory of a randomized gradient method

$$ \dot{x} = \pm \nabla f(x) + \xi(t) , $$

where $ \xi ( t) $ is a white noise process, can, under certain conditions, move from a point of local to a point of global extremum.
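A numerical sketch of this idea: an Euler discretization of the randomized gradient trajectory for minimization, with an assumed (illustrative) slow decay of the noise amplitude:

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_gradient(grad_f, x0, step=1e-3, noise=0.5, n_steps=50_000):
    """Euler discretization of  dx/dt = -grad f(x) + xi(t)  (minimization).
    The Gaussian noise lets the trajectory climb out of shallow local
    minima; the decay law for its amplitude is an illustrative choice."""
    x = np.asarray(x0, float).copy()
    for k in range(n_steps):
        sigma = noise / np.sqrt(1.0 + step * k)     # assumed noise cooling
        x += -step * grad_f(x) + np.sqrt(step) * sigma * rng.standard_normal(x.shape)
    return x
```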

4) "Hardening" a function by approximating it by a sequence of unimodal functions. The approximation can be carried out with respect to the values of $ f ( x) $ at certain points, by expansion with respect to the parameters in an analytic expression of the function, by majorants or minorants, etc.

5) The search for a local extremum of an averaged function $\overline{f(x)}$ (a sliding average),

$$ \overline{f(x)} = \int_X p(x, \xi) \, f(\xi) \, d\xi , \tag{1} $$

$$ p(x, \xi) \geq 0 , \qquad \int_X p(x, \xi) \, d\xi = 1 . $$

Examples are:

$$ \overline{f(x)} = \frac{1}{2T} \int_{x-T}^{x+T} f(\xi) \, d\xi , \qquad x \in \mathbf{R}^1 , \quad x \pm T \in X ; $$

$$ \overline{f(x)} = \frac{\int_X e^{-(x-\xi)^2} f(\xi) \, d\xi}{\int_X e^{-(x-\xi)^2} \, d\xi} , \qquad x \pm \xi \in X . $$

The physical meaning of this averaging is a "smoothing" of the initial function and a "filtering off" of its oscillating terms.
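A small numerical sketch of the sliding average (1) with the uniform window of the first example, showing this smoothing on an oscillating function; the grid, window and test function are illustrative:

```python
import numpy as np

def sliding_average(f, x_grid, T, n_nodes=201):
    """Sliding average, formula (1), with the uniform window
    p(x, xi) = 1/(2T) on [x - T, x + T], evaluated on a grid."""
    out = np.empty_like(x_grid)
    for i, x in enumerate(x_grid):
        xi = np.linspace(x - T, x + T, n_nodes)
        out[i] = f(xi).mean()        # = (1/2T) * integral of f over the window
    return out

# A slowly varying parabola plus a fast oscillation: a window of one
# oscillation period filters the sin term off, leaving a roughly unimodal
# average whose minimizer starts the outline search.
f = lambda x: (x - 2.0) ** 2 + np.sin(20.0 * x)
x = np.linspace(0.0, 4.0, 400)
f_bar = sliding_average(f, x, T=np.pi / 20.0)
x_outline = x[np.argmin(f_bar)]      # near 2, the minimizer of the smooth part
```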

If $\overline{f(x)} \cong \textrm{const}(x)$ (a rapidly-oscillating function $f(x)$), then before averaging it can be "detected" with the help of a non-linear transformation, for example,

$$ \overline{\overline{f(x)}} = \int_X p(x, \xi) \, | f(\xi) | \, d\xi . \tag{2} $$

As a weight function $ p ( x , \xi ) $ for averaging it is possible to take

$$ p_n(x, \xi) = \frac{| f(x - \xi) |^n}{\int_X | f(x) |^n \, dx} , \qquad n \geq 0 , $$

$$ \mathop{\mathrm{ess}}_{M} \sup_{X} | f(x) | = \lim_{n \rightarrow \infty} \left( \frac{\int_X | f(x) |^n \, dM_x}{\int_X dM_x} \right)^{1/n} . $$

The latter can also be used for obtaining estimates for essential maxima and minima.
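A quick numerical check of this limit, taking $M$ to be the Lebesgue measure on $X = [0, 1]$ (an illustrative choice):

```python
import numpy as np

# f(x) = 1 + 0.5 sin(7x) on [0, 1]:  ess sup |f| = 1.5 (attained near x = pi/14).
x = np.linspace(0.0, 1.0, 100_001)
f = 1.0 + 0.5 * np.sin(7.0 * x)
for n in (1, 10, 100, 1000):
    # (integral of |f|^n dM / integral of dM)^(1/n), discretized on the grid
    print(n, np.mean(np.abs(f) ** n) ** (1.0 / n))   # tends to 1.5
```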

Formulas (1) and (2) can be interpreted as mathematical expectations for functions of a random variable $ \xi $, distributed with probability density $ p ( x , \xi ) $. Therefore the given multi-extremum problem can be treated as a problem in stochastic programming and methods of the latter can be applied to it.

After finding the extremum of an approximating or averaged function (outline search), a detailed search for an extremum of the initial function $ f ( x) $ can be carried out in a neighbourhood of the point found.

Global optimization of individual functions can be carried out by particular specific procedures. For example, by constructing a dynamical system for which the point of global extremum is an asymptotically-stable point of rest.

One of the sources of ideas for new methods of (quasi-) global optimization is the modelling of processes in physical and biological systems.

The cumbersome calculations of the non-local search process can be optimized with respect to a number of indices taking account of limitations on the calculating means, a priori and sequentially accumulated information on the function $ f ( x) $, the probabilistic characteristics of random factors, etc. One of the approaches which has been tried is based on the theory of statistical decisions.

Apart from the search for global extrema, other multi-extremum problems arise, e.g. the determination of the oscillation of a function, or the listing and finding of all local extrema in a given domain.

For polynomials very effective rules for calculating and separating roots of derivatives have been worked out.
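For instance, all local extrema of a polynomial can be listed exactly by computing the real roots of its derivative; a minimal sketch with an illustrative cubic:

```python
import numpy as np
from numpy.polynomial import Polynomial

# p(x) = x^3 - 4x: its local extrema are the real roots of p'.
p = Polynomial([0.0, -4.0, 0.0, 1.0])    # coefficients, lowest degree first
crit = p.deriv().roots()
crit = np.sort(crit[np.isreal(crit)].real)
for c in crit:
    kind = "minimum" if p.deriv(2)(c) > 0 else "maximum"
    print(f"x = {c:+.4f}: local {kind}, p(x) = {p(c):+.4f}")
```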

For extremum points of transcendental functions, considerations of the type of the Rolle theorem or a comparison theorem for the solutions of differential equations may be useful.

The number of stationary points of an analytic function can be estimated using the principle of the argument (cf. Argument, principle of the).

If a function has an infinite sequence of extrema, then in practice several of them are calculated directly and the others are obtained by using an asymptotic expansion (example: the $\Gamma$-function).

The quasi-classical approximation for differential equations can be regarded as the asymptotic expansion of the solutions of an equation with respect to the number of extrema.

For multi-extremum problems in the infinite-dimensional case see Variational calculus in the large. Discrete analogues are given in Integer programming and Discrete programming.

For references see Maximization and minimization of functions.

Comments

For global optimization methods based on the Bayes principle of statistical inference cf. [a1]. Other recent texts specifically aimed at global optimization are [a2]–[a5].

The method of simulated annealing has attracted attention more recently. Within the classification of this article, it belongs to item 3 (modification of a local method by the introduction of stochastic parameters). See [a6].
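A minimal simulated-annealing sketch in the spirit of item 3, using the standard Metropolis acceptance rule; the cooling schedule, step size and all parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulated_annealing(f, x0, n_steps=100_000, step=0.1, t0=1.0):
    """Metropolis-type simulated annealing: an uphill move is accepted with
    probability exp(-delta / T), so the walk can leave local minima while
    the temperature T is still high."""
    x = np.asarray(x0, float).copy()
    fx = f(x)
    best_x, best_f = x.copy(), fx
    for k in range(n_steps):
        temp = t0 / (1.0 + 1e-3 * k)                   # assumed cooling law
        y = x + step * rng.standard_normal(x.shape)    # random neighbour
        fy = f(y)
        if fy < fx or rng.random() < np.exp((fx - fy) / temp):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = x.copy(), fx
    return best_x, best_f
```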

References

[a1] J. Mockus, "Bayesian approach to global optimization", Kluwer (1989)
[a2] P.M. Pardalos, J.B. Rosen, "Constrained global optimization: algorithms and applications", Springer (1987)
[a3] A.H.G. Rinnooy Kan, G.T. Timmer, "Stochastic methods for global optimization", Amer. J. Math. Management Sci., 4 (1984) pp. 7–40
[a4] J. Pinter, J. Szabo, "Global optimization algorithms: theory and some applications", Springer (1986)
[a5] A.A. Zhigljavsky, "Theory of global random search", Kluwer (1990) (Translated from Russian)
[a6] P.J.M. van Laarhoven, "Theoretical and computational aspects of simulated annealing", CWI Tracts, 51, CWI, Amsterdam (1988)
This article was adapted from an original article by Yu.P. Ivanilov and V.V. Okhrimenko (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.