A method for solving the minimization problem
$$f(x^*)=\min_xf(x),$$
where $f$ is some function of the variables $(x_1,\dotsc,x_n)$. The iterative sequence $\{x^k\}$ of the method of descent is computed by the formula
$$x^{k+1}=x^k+\alpha_kg^k,$$
where $g^k$ is a vector indicating some direction of decrease of $f$ at $x^k$, and $\alpha_k$ is an iterative parameter whose value determines the step-length in the direction $g^k$. If $f$ is a differentiable function and $x^k$ is not an extremal point of it, then the vector $g^k$ must satisfy the inequality
$$(f'(x^k),g^k)<0,\label{*}\tag{*}$$
where $f'(x^k)$ is the gradient of $f$ at $x^k$.
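As a numerical illustration of the condition \eqref{*}: for any non-stationary point, the anti-gradient $g^k=-f'(x^k)$ is a valid descent direction. The quadratic $f$ below is an assumed example, not taken from the article.

```python
# Check the descent condition (*): at a non-stationary point x^k, the
# anti-gradient g^k = -f'(x^k) satisfies (f'(x^k), g^k) < 0.
# The quadratic f is an illustrative assumption.

def f(x):
    return x[0] ** 2 + 4.0 * x[1] ** 2

def grad_f(x):
    # Gradient f'(x) of the quadratic above.
    return [2.0 * x[0], 8.0 * x[1]]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x = [1.0, -2.0]                  # x^k, not an extremal point of f
g = [-c for c in grad_f(x)]      # g^k = -f'(x^k)

assert dot(grad_f(x), g) < 0     # inequality (*) holds
# Hence a small step along g^k decreases f:
assert f([xi + 0.01 * gi for xi, gi in zip(x, g)]) < f(x)
```

Any vector making an obtuse angle with the gradient satisfies \eqref{*}; the anti-gradient is merely the simplest choice.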
If $f$ is a sufficiently smooth function (e.g. twice continuously differentiable) and if the sequence $\{g^k\}$ satisfies inequality \eqref{*}, then there exists a sequence $\{\alpha_k\}$ such that
$$f(x^0)>\dotsb>f(x^k)>\dotsb.$$
Under certain restrictions (see [3]) on the function $f$ and on the method of choosing the parameters $\{\alpha_k\}$ and the vectors $g^k$, the sequence $\{x^k\}$ converges to a solution $x^*$ of the initial problem.
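The existence of suitable step-lengths $\alpha_k$ can be sketched with a backtracking (Armijo-type) rule; the particular $f$ and the constants $10^{-4}$ and $1/2$ below are illustrative assumptions, not prescriptions from the article.

```python
# Sketch of the method of descent with backtracking step-lengths alpha_k.
# f and its gradient are illustrative assumptions.

def f(x):
    return (x[0] - 1.0) ** 2 + 2.0 * x[1] ** 2

def grad_f(x):
    return [2.0 * (x[0] - 1.0), 4.0 * x[1]]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def descend(x, iters=50):
    values = [f(x)]
    for _ in range(iters):
        fp = grad_f(x)
        if dot(fp, fp) < 1e-18:
            break                          # (near-)stationary point reached
        g = [-c for c in fp]               # anti-gradient satisfies (*)
        alpha = 1.0
        # Halve alpha until f decreases sufficiently (Armijo condition).
        while f([xi + alpha * gi for xi, gi in zip(x, g)]) > \
                values[-1] + 1e-4 * alpha * dot(fp, g):
            alpha *= 0.5
        x = [xi + alpha * gi for xi, gi in zip(x, g)]
        values.append(f(x))
    return x, values

x_star, vals = descend([5.0, 5.0])
# The f-values decrease monotonically: f(x^0) > f(x^1) > ...
assert all(a > b for a, b in zip(vals, vals[1:]))
```

For this convex quadratic the iterates reach the minimizer $(1,0)$; in general the scheme only guarantees the monotone decrease of $f$ stated above.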
The gradient method, in which the vectors $\{g^k\}$ are in some way expressed in terms of the vectors $\{f'(x^k)\}$, is a method of descent. One of the most common cases is when
$$g^k=-B(x^k)f'(x^k),$$
where $B(x)$ is a symmetric matrix satisfying
$$m(x,x)\leq(B(y)x,x)\leq M(x,x)$$
for any two vectors $x$ and $y$, with certain constants $M\geq m>0$. Under additional restrictions (see [3]) on $f$ and by a special selection of $\{\alpha_k\}$, the gradient method ensures the convergence of $\{x^k\}$ to a solution $x^*$ of the initial problem at the rate of a geometric progression with ratio $q<1$. A special case of the gradient method is the method of steepest descent (cf. Steepest descent, method of), in which the matrix $B(x)$ is the unit matrix.
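A minimal sketch of the choice $g^k=-B(x^k)f'(x^k)$, assuming for illustration a constant diagonal positive-definite $B$ (so the bound above holds with $m=1/100$, $M=1$); taking $B$ as the unit matrix would recover steepest descent.

```python
# Gradient method g^k = -B(x^k) f'(x^k) with a constant symmetric
# positive-definite B. The badly scaled quadratic and the diagonal B
# are illustrative assumptions.

def grad_f(x):
    # Gradient of f(x) = 50*x1^2 + 0.5*x2^2 (curvatures 100 and 1).
    return [100.0 * x[0], 1.0 * x[1]]

# Diagonal of B: inverse curvatures, so (B y, y) satisfies
# m*(y,y) <= (B y, y) <= M*(y,y) with m = 1/100, M = 1.
B = [1.0 / 100.0, 1.0]

x = [3.0, 4.0]
for _ in range(100):
    fp = grad_f(x)
    g = [-B[i] * fp[i] for i in range(2)]   # g^k = -B f'(x^k)
    x = [x[i] + 1.0 * g[i] for i in range(2)]  # alpha_k = 1

assert abs(x[0]) < 1e-9 and abs(x[1]) < 1e-9  # converged to x* = 0
```

With $B$ equal to the inverse of the (diagonal) Hessian, the poorly conditioned problem is solved in one step, which is the point of scaling the gradient by $B$.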
References
[1] L.V. Kantorovich, G.P. Akilov, "Funktionalanalysis in normierten Räumen", Akademie Verlag (1964) (Translated from Russian)
[2] G. Zoutendijk, "Methods of feasible directions", Elsevier (1970)
[3] B.N. Pshenichnyi, Yu.M. Danilin, "Numerical methods in extremal problems", MIR (1978) (Translated from Russian)
[4] B.T. Polyak, "Gradient methods for the minimization of functionals", USSR Comput. Math. Math. Phys. 3:4 (1963) pp. 864–878; translated from Zh. Vychisl. Mat. i Mat. Fiz. 3:4 (1963) pp. 643–654
See also Coordinate-wise descent method.