# Maximization and minimization of functions

of a finite number of variables

The problem of finding the extrema of a function $f ( x)$, $x= ( x ^ {1} \dots x ^ {n} ) \in X \subseteq \mathbf R ^ {n}$. This means

1) finding $\overline{f}\; = \sup _ {x \in X } f ( x)$ or $\underline{f} = \inf _ {x \in X } f ( x)$;

2) searching maximum or minimum points if $\overline{f}\;$ or $\underline{f}$ are attained on the admissible set (see Maximum and minimum of a function);

3) the construction of maximizing or minimizing sequences $\{ x _ {i} \}$ such that:

$$\lim\limits _ {i \rightarrow \infty } \ f ( x _ {i} ) = \overline{f}\; ,\ \ \lim\limits _ {j \rightarrow \infty } \ f ( x _ {j} ) = \underline{f} ,$$

if $\overline{f}\;$ or $\underline{f}$ is not attained on $X$.

The investigation of the extrema of functions of discrete arguments is referred to as discrete programming or integer programming. Below only maximization and minimization for functions of continuous arguments is explained.

The classical (indirect) methods of maximization and minimization apply only to smooth functions. They use necessary conditions for an extremum in order to locate stationary points. Zeros of the derivatives $\partial ^ \alpha f / \partial x ^ \alpha$, $\alpha = 1 \dots n$, are usually computed in practice by some successive approximation method (see [3]). On the other hand, each problem of solving a finite set of functional equations of the form

$$\phi _ {m} ( x ^ {1} \dots x ^ {n} ) = 0 ,\ \ m \leq n ,$$

can be interpreted as a maximization or minimization problem of some function, for example, of the function

$$f ( x) = \phi _ {1} ^ {2} ( x) + \dots + \phi _ {m} ^ {2} ( x) \rightarrow \min ,$$

and one of the specific methods for maximizing and minimizing can be applied.

Direct methods for maximizing and minimizing functions are based on direct comparison of the values of $f$ at two or more points.

The practical search of extrema uses iterative algorithms of the form:

$$x _ {i+} 1 = \widehat{X} ( i , x _ {i} , x _ {i-} 1 \dots x _ {i-} j ) ,$$

where $i$ is the index of the iteration and $\widehat{X} ( \cdot )$ is some operator. Here it is usually assumed that

a) the algorithm converges in some sense, most often in the sense

$$x _ \infty = \overline{x}\; \ \ ( x _ \infty = \underline{x} ) \ \ \textrm{ or } \ \ f ( x _ \infty ) = \overline{f}\; \ ( f ( x _ \infty ) = \underline{f} ) ;$$

b) the iteration procedure is local, that is, $j \ll i$( $j = o ( i)$ as $i \rightarrow \infty$); the algorithm "remembers" the values of $x$ only for iterations in some neighbourhood of the current position $x _ {i}$. For $j = 0$ a memoryless simple Markov computational process is obtained.

The operator $\widehat{X} ( \cdot )$ may be deterministic, in deterministic methods, or may contain stochastic parameters. In computational practice stochastic methods are often combined with deterministic methods, for example, in the coordinate-wise descent method the direction of descent may be determined randomly. The probabilistic characteristics of the stochastic parameters, in turn, may be changed from iteration to iteration (search with adaptation and "self-learning" random search).

Various deterministic methods are widely used in combination, including sequential and parallel computation of an extremum by several methods, composition of algorithms of the form $\widehat{X} = \widehat{X} _ {2} ( \widehat{X} _ {1} ( \cdot ) )$, etc. For example, the Levenberg–Marquardt method

$$x _ {i+} 1 = \ x _ {i} - ( \alpha _ {i} \nabla \nabla f( x _ {j} ) + \beta _ {i} I ) ^ {-} 1 \nabla f ( x _ {i} ) ,$$

which coincides with the gradient method for $\alpha _ {i} = 0$ and with the Newton method for $\beta _ {i} = 0$.

One-dimensional optimization, that is, maximizing and minimizing a function $f ( x)$, $x \in \mathbf R ^ {1}$, in addition to its interest by itself, is a necessary stage in most applicable methods. Specific one-dimensional methods include, for example, the Fibonacci method; the dichotomy method (division by halves) and the parabola method. Methods for maximizing and minimizing functions in several variables are the gradient method, the method of steepest descent (cf. Steepest descent, method of), the coordinate-wise descent method, the simplex method, the scanning method, the method of conjugate gradients, the method of the heavy sphere (cf. Conjugate gradients, method of; Heavy sphere, method of the), the adjustment method, and others.

The algorithms of most of the methods listed fall into the scheme of the descent (ascent) method:

$$x _ {i+} 1 = x _ {i} \mps \kappa _ {i} y _ {i} ,$$

where $f ( x _ {i+} 1 ) \leq f ( x _ {i} )$ or $f ( x _ {i+} 1 ) \geq f ( x _ {i} )$ for all $i$( the relaxation condition). These are distinguished either by the choice of a descent direction $y _ {i}$ or by the method of moving along the descent vector, defined by the step factor $\kappa$.

Cut methods have been developed for functions whose contours are "valleys with steep slopes" (see Minimization methods for functions depending strongly on a few variables). Ordinary (non-cut) methods when applied here give a zig-zag relaxational path, requiring excessive amounts of machine time to calculate the extremum.

The comparative effectiveness of the methods is estimated by many, even contradictory, criteria. E.g., accuracy of the solution, speed of approach to the solution, reliability of the method, preparation time from the problem to the calculation, convergence of the algorithm, etc. The domain of applicability of each of the approved methods is very limited.

For testing the methods a set of standard test functions, characteristic of different function classes, has been collected (see [1]). Convergence of methods for maximizing and minimizing functions has been extensively studied (see [6], [8]). However, convergence is a property which is neither necessary nor sufficient for the effective termination of a calculation.

All methods above lead to a local extremum if the first approximation belongs to the domain of attraction of this extremum. The detection of a global extremum is guaranteed only for convex and related unimodal functions. The theory of finding a global extremum is still (1989) in the initial phase of development (see Multi-extremum problem). Another area of maximizing and minimizing of functions which is developing is optimization of non-smooth functions (see [4], [13], [16]). In particular, the problem of minimizing the maxima of a family of functions generally leads to non-smooth functions (see Maximin, numerical methods). Apparently, all of the commonly-used methods for optimization have an interesting physical, economic or biological meaning. The corresponding research is only just developing (see [9]) and leads to the creation of new methods (see also Continuous analogues of iteration methods). If the values of the functions being considered are defined statistically with random noise, then one of the methods of stochastic approximation is applied to find extrema. Here one is bordering on the design of experiments.

Experimental methods of maximizing and minimizing functions use the reproduction of various physical processes in the search for extrema. A related area is simulation on analogue computers (see [17]). In spite of the convenience and cheapness of using the simplest automatic optimizers, they do not guarantee high accuracy of calculation.

Graphical methods are suitable only for obtaining rough estimates and for the construction of a first approximation for iterative methods.

If the admissible set is given by functional conditions

$$\phi _ {m} ( x) \leq 0$$

(constraints and restrictions, conditional extremum), then methods of mathematical programming are applied in the search of extrema. This problem can be reduced to a sequence of problems on unconstrained extrema by the use of penalty and barrier functions (see Penalty functions, method of).

#### References

 [1] M. Aoki, "Introduction to optimization techniques" , Macmillan (1971) [2] N.S. Bakhvalov, "Numerical methods: analysis, algebra, ordinary differential equations" , MIR (1977) (Translated from Russian) [3] I.S. Berezin, N.P. Zhidkov, "Computing methods" , Pergamon (1973) (Translated from Russian) [4] F.P. Vasil'ev, "Lectures on methods for solving extremal problems" , Moscow (1974) (In Russian) [5] A.M. Gupal, "Stochastic methods of solving non-smooth extremal problems" , Kiev (1979) (In Russian) [6] V.G. Karmanov, "Mathematical programming" , Moscow (1975) (In Russian) [7] N.N. Moiseev, Yu.P. Ivanilov, E.M. Stolyarova, "Methods of optimization" , Moscow (1978) (In Russian) [8] B.N. Pshenichnyi, Yu.M. Danilin, "Numerical methods in extremal problems" , MIR (1978) (Translated from Russian) [9] B.S. Razymikhin, "Physical models and methods of the theory of equilibrium in programming and economics" , Reidel (1984) (Translated from Russian) [10] L.A. Pastrigin, "Extremal control systems" , Moscow (1974) (In Russian) [11] V.K. Saul'ev, I.I. Samoilova, "Approximation methods for the unconstrained optimization of functions of several variables" J. Soviet Math. , 4 : 6 (1975) pp. 681–705 Itogi Nauk. i Tekhn. Mat. Anal. , 11 (1973) pp. 91–128 [12] D.J. Wilde, "Extremum searching methods" , Prentice-Hall (1965) [13] V.V. Fedorov, "Numerical maximin methods" , Moscow (1979) (In Russian) [14] J.M. Ortega, W.C. Rheinboldt, "Iterative solution of non-linear equations in several variables" , Acad. Press (1970) [15] , The current state of the theory of Operations Research , Moscow (1979) (In Russian) [16] C. Lemarechal (ed.) R. Mifflin (ed.) , Nonsmooth optimization , Pergamon (1978) [17] L.C.W. Dixon (ed.) G.P. Szegö (ed.) , Towards global optimisation , 1–2 , North-Holland (1975–1978) [18] Yu.G. Evtushenko, "Numerical methods for finding global extrema (case of a non-uniform mesh)" USSR Comp. Math. Math. Phys. , 11 : 6 (1971) pp. 38–54 Zh. Vychisl. Mat. i Mat. Fiz. , 11 : 6 (1971) pp. 1390–1403