# Multi-extremum problem

An extremum problem having several, or an unknown number of, local extrema.

The problem of finding a global extremum of a function $f ( x)$, $x = ( x ^ {1} \dots x ^ {n} ) \in X \subset \mathbf R ^ {n}$, $\overline{X}\;$ compact, has been solved for the basic classes of unimodal functions (first of all for convex and related functions, see Convex programming). For multimodal functions, even for smooth and slowly varying ones, at present (1989) there are no methods which are guaranteed to calculate a global extremum, with the exception of scanning along trajectories which form an everywhere-dense set in the admissible set $X$. In practice, time-consuming scanning is combined with algorithms for finding a local extremum: by scanning and a priori reduction of $f ( x)$( estimates of derivatives, functional equations and inequalities, etc.) one delineates a domain of attraction for each local extremum and dead zones, where a concrete local algorithm loses effectiveness (for example, in the gradient method the dead zones are neighbourhoods of saddle points and "bottoms of valleys" ). The extrema are then estimated or looked for by local methods and compared with each other.

For functions satisfying a Lipschitz condition

$$( \forall x , x ^ \prime \in X ) \ ( \exists l < \infty ) | f ( x) - f ( x ^ \prime ) | \leq \ l \| x - x ^ \prime \| ,$$

a more universal method is that of covering (over a non-uniform grid), which is, moreover, easily realized in the multi-dimensional case.

For finding local extrema close to global ones in an approximation, empirical "heuristic" methods are used, which may be called quasi-global:

1) Algorithms of the heavy-sphere type (cf. Heavy sphere, method of the), minimization methods for functions depending strongly on a few variables, or methods of "galloping" through a local extremum of "fitting" domains of attraction.

2) Random running through the values of $f ( x)$ by a Monte-Carlo method.

3) Modification of a local method by the introduction of stochastic parameters. For example, the trajectory of a randomized gradient method

$$\dot{x} = \pm \nabla f ( x) + \xi ( t ) ,$$

where $\xi ( t)$ is a white noise process, can, under certain conditions, move from a point of local to a point of global extremum.

4) "Hardening" a function by approximating it by a sequence of unimodal functions. The approximation can be carried out with respect to the values of $f ( x)$ at certain points, by expansion with respect to the parameters in an analytic expression of the function, by majorants or minorants, etc.

5) The search for a local extremum of an averaged function $\overline{ {f ( x) }}\;$( a sliding average),

$$\tag{1 } \overline{ {f ( x) }}\; = \ \int\limits _ { X } p ( x , \xi ) f ( \xi ) d \xi ,$$

$$p ( x , \xi ) \geq 0 ; \ \int\limits _ { X } p ( x , \xi ) d \xi = 1 .$$

Examples are:

$$\overline{ {f ( x) }}\; = \ \frac{1}{2T} \int\limits _ { x- } T ^ { x+ } T f ( x) d \xi ,\ \ x \in \mathbf R ^ {1} ,\ x \pm T \in X ;$$

$$\overline{ {f ( x) }}\; = \frac{\int\limits _ { X } e ^ {- ( x - \xi ) ^ {2} } f ( \xi ) d \xi }{\int\limits _ { X } e ^ {- ( x - \xi ) ^ {2} } d \xi } ,\ x \pm \xi \in X .$$

The physical meaning of this averaging is a "smoothing" of the initial function and a "filtering off" of its oscillating terms.

If $\overline{ {f ( x) }}\; \cong \textrm{ const } ( x)$( a rapidly-oscillating function $f ( x)$), then before averaging it can be "detected" with the help of a non-linear transformation, for example,

$$\tag{2 } {\overline{ {f ( x) }}\; } bar = \ \int\limits _ { X } p ( x , \xi ) | f ( \xi ) | d \xi .$$

As a weight function $p ( x , \xi )$ for averaging it is possible to take

$$p _ {n} ( x , \xi ) = \ \frac{| f ( x - \xi ) | ^ {n} }{\int\limits _ { X } | f ( x) | ^ {n} d x } ,\ \ n \geq 0 ,$$

$$\mathop{\rm ess} _ { M } \sup _ { X } | f( x) | = \lim\limits _ {n\rightarrow \infty } \left ( \frac{\int\limits _ { X } | f( x) | ^ {n} dM _ {x} }{\int\limits _ { X } d M _ {x} } \right ) .$$

The latter can also be used for obtaining estimates for essential maxima and minima.

Formulas (1) and (2) can be interpreted as mathematical expectations for functions of a random variable $\xi$, distributed with probability density $p ( x , \xi )$. Therefore the given multi-extremum problem can be treated as a problem in stochastic programming and methods of the latter can be applied to it.

After finding the extremum of an approximating or averaged function (outline search), a detailed search for an extremum of the initial function $f ( x)$ can be carried out in a neighbourhood of the point found.

Global optimization of individual functions can be carried out by particular specific procedures. For example, by constructing a dynamical system for which the point of global extremum is an asymptotically-stable point of rest.

One of the sources of ideas for new methods of (quasi-) global optimization is the modelling of processes in physical and biological systems.

The cumbersome calculations of the non-local search process can be optimized with respect to a number of indices taking account of limitations on the calculating means, a priori and sequentially accumulated information on the function $f ( x)$, the probabilistic characteristics of random factors, etc. One of the approaches which has been tried is based on the theory of statistical decisions.

Apart from the search of global extrema, other multi-extremum problems arise, e.g. the determination of the oscillation of a function, or the listing and finding of all local extrema in a given domain.

For polynomials very effective rules for calculating and separating roots of derivatives have been worked out.

For extremum points of transcendental functions, considerations of the type of the Rolle theorem or a comparison theorem for the solutions of differential equations may be useful.

The number of stationary points of an analytic function can be estimated using the principle of the argument (cf. Argument, principle of the).

If a function has an infinite sequence of extrema, then in practice several of them are calculated directly and the others are obtained by using an asymptotic expansion (example: the $\Gamma$- function).

The quasi-classical approximation for differential equations can be regarded as the asymptotic expansion of the solutions of an equation with respect to the number of extrema.

For multi-extremum problems in the infinite-dimensional cases see Variational calculus in the large. Discrete analogues are given in Integer programming and Discrete programming.

For references see Maximization and minimization of functions.