Lagrange function

{{MSC|49}}
{{TEX|done}}
[[Category:Analysis]]

A function, related to the method of [[Lagrange multipliers]], that is used to derive necessary conditions for conditional extrema of functions of several variables or, in a wider setting, of functionals. The primary example is that of locating the local maxima (resp. minima) of a function $f$ of $n$ variables $(x_1, \ldots , x_n)=:x$ over the set $\Sigma$ of points $x$ which satisfy the constraints $g_1 (x) = \ldots = g_m (x) = 0$. The method of Lagrange multipliers is particularly useful because the corresponding conditions can often be solved without resorting to explicit formulae describing the points of $\Sigma$ in terms of $n-m$ independent variables. In fact the necessary conditions obtained by means of a Lagrange function often form a closed system of relations, i.e. a system of $n+m$ equations in $n+m$ variables. Lagrange functions are used both in theoretical questions of linear and non-linear programming and in applied problems, where they often provide explicit computational methods.
  
====The method of Lagrange multipliers====
A rather general statement in the finite-dimensional setting is the following.

'''Theorem 1'''
Assume $f, g_1, \ldots, g_m$ are $C^1$ functions of $n$ variables and let $x^\star$ be a local maximum (resp. minimum) point of $f$ subject to the constraints $g_1 (x) = b_1, \ldots, g_m (x) = b_m$: namely, there is a neighborhood $U$ of $x^\star$ such that
\[
f (x^\star) \geq f(y) \quad \big(\mbox{resp. } f (x^\star) \leq f(y)\big)\qquad \mbox{for all } y\in U \mbox{ such that } g_i (x^\star)= g_i (y) = b_i \qquad \forall i\in \{1, \ldots , m\}\, .
\]
Then there exist $\lambda_0^\star, \lambda_1^\star, \ldots, \lambda_m^\star \in \mathbb R$, not all zero, such that, if $F$ denotes the function
\[
F(x, \lambda) = \lambda_0 f(x) + \sum_{i=1}^m \lambda_i (b_i - g_i (x))\, ,
\]
then
\[
\frac{\partial F}{\partial x_j} (x^\star, \lambda^\star) = 0 \qquad \forall j\in \{1, \ldots , n\}
\]
and
\[
\frac{\partial F}{\partial \lambda_i} (x^\star, \lambda^\star) = 0 \qquad \forall i\in \{1, \ldots, m\}\, .
\]
  
$F$ is called the Lagrange function and the numbers $\lambda_0, \lambda_1, \ldots , \lambda_m$ the Lagrange multipliers. The statement takes a more elegant form under the assumption that $b = (b_1, \ldots , b_m)$ is a [[Regular value|regular value]] of the map $g= (g_1, \ldots , g_m)$. More precisely:

'''Theorem 2'''
Let $f$, $g= (g_1, \ldots , g_m)$ and $x^\star$ be as in Theorem 1 and assume in addition that $m<n$ and that the differential of $g$ at $x^\star$ is surjective. Then there exists $\lambda^\star = (\lambda_1^\star, \ldots, \lambda_m^\star) \in \mathbb R^m$ such that $(x^\star, \lambda^\star)$ is a critical point of the function
\begin{equation}\label{e:Lagrange_function}
F(x, \lambda) = f(x) + \sum_{i=1}^m \lambda_i (b_i - g_i (x))\, .
\end{equation}

In fact the statement of Theorem 2 is more common than that of Theorem 1, and it is typically to the slightly less general version \eqref{e:Lagrange_function} that the name "Lagrange function" refers. Observe also that Theorem 2 gives a system of $m+n$ equations in the $m+n$ unknowns $x^\star_1, \ldots , x^\star_n, \lambda^\star_1 , \ldots , \lambda^\star_m$.
  
For most practical problems Theorem 2 is enough. However, it is not difficult to give examples where Theorem 2 does not apply and one has to resort to Theorem 1 (see {{Cite|Sm}} or {{Cite|Bl}}). It is possible to give an interpretation of the Lagrange multipliers $\lambda_i^\star$ that has a definite physical meaning and which can be successfully generalized to constraints in the form of inequalities (see [[Lagrange multipliers|Lagrange multipliers]]).
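
A standard example of this phenomenon (supplied here for illustration): minimize $f(x, y) = x$ subject to $g(x, y) = x^3 - y^2 = 0$. The minimum on the curve $y^2 = x^3$ is attained at the origin, where $\nabla g = (0,0)$, so no $\lambda^\star \in \mathbb R$ can satisfy $\nabla f = \lambda^\star \nabla g$ and Theorem 2 fails; Theorem 1 applies instead, with $\lambda_0^\star = 0$.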
  
In the case when the function $f$ to be optimized is quadratic and the constraints $g_i$ are linear, the resulting system of necessary conditions turns out to be linear. Otherwise the system is in general nonlinear and one has to resort to appropriate computational methods to approximate the solutions (such as the [[Newton method|Newton method]]). The main issue, apart from the computational difficulties of solving a system of non-linear equations, is that of obtaining all solutions satisfying the necessary conditions. The fact that no computational process ensures that one obtains all solutions of the system is one of the main restrictions on the practical applicability of the method of Lagrange multipliers.
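
A minimal computational sketch (added here for illustration; not part of the original article), applying Newton's method to the stationarity system of the Lagrange function for the example $f(x,y) = x+y$, $g(x,y) = x^2+y^2 = 1$ discussed above; the initial guess and tolerance are arbitrary choices, and, as noted above, the iteration finds only the critical point nearest the starting guess, not all of them.

<pre>
# Newton's method for the critical points of
#   F(x, y, lam) = x + y + lam * (1 - x**2 - y**2),
# i.e. for the zeros of grad F (a sketch; f, g and the
# initial guess are illustrative assumptions).
import numpy as np

def residual(z):
    x, y, lam = z
    return np.array([
        1.0 - 2.0 * lam * x,   # dF/dx
        1.0 - 2.0 * lam * y,   # dF/dy
        1.0 - x**2 - y**2,     # dF/dlam = b - g(x, y)
    ])

def jacobian(z):
    x, y, lam = z
    return np.array([
        [-2.0 * lam, 0.0,        -2.0 * x],
        [0.0,        -2.0 * lam, -2.0 * y],
        [-2.0 * x,   -2.0 * y,    0.0],
    ])

z = np.array([1.0, 0.5, 1.0])  # initial guess
for _ in range(50):
    step = np.linalg.solve(jacobian(z), -residual(z))
    z += step
    if np.linalg.norm(step) < 1e-12:
        break

print(z)  # approx (0.7071, 0.7071, 0.7071): the constrained maximum
</pre>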
  
====Constraints in the form of inequalities====
As already mentioned, the method can be extended to optimization problems in which some of the constraints appear as inequalities. Consider in particular the following setting: $x^\star$ is a local extremum of the function $f: \mathbb R^n \to \mathbb R$ among the points $x$ which satisfy the constraints
\begin{equation}\label{e:gen_const}
\left\{
\begin{array}{ll}
g_i (x) \leq b_i  \qquad\qquad & \mbox{ for } 1 \leq i \leq m_1\\
g_i (x) \geq b_i \qquad\qquad & \mbox{ for } m_1+1 \leq i \leq m_2\\
g_i (x) = b_i \qquad\qquad    & \mbox{ for } m_2+1 \leq i \leq m\\
x_j \geq 0 \qquad\qquad &\mbox{ for } 1 \leq j \leq n\, .
\end{array}\right.
\end{equation}
Specifically, let us assume that $x^\star$ is a local maximum, that the functions $f$ and $g_i$ are $C^1$, that $m < n$ and that $b$ is a regular value for the map $g$ (or, more generally, that the differential of $g$ is surjective at the point $x^\star$). If we consider the Lagrange function as in \eqref{e:Lagrange_function} and denote by $J$ the set of indices $j$ for which $x^\star_j > 0$ and by $I$ the set of indices $i$ for which the corresponding inequalities in \eqref{e:gen_const} are strict at $x^\star$, then there is a vector $\lambda^\star \in \mathbb R^m$ which satisfies the following set of conditions:
\begin{equation}\label{e:Lagr_gen}
\left\{
\begin{array}{ll}
\lambda^\star_i \geq 0 \qquad\qquad&\mbox{ for } 1 \leq i \leq m_1\\
\lambda^\star_i \leq 0 \qquad\qquad &\mbox{ for } m_1+1 \leq i \leq m_2\\
\lambda^\star_i = 0 \qquad\qquad &\mbox{ for } i \in I\\
\frac{\partial F}{\partial \lambda_i} (x^\star, \lambda^\star) \geq 0 \qquad\qquad &\mbox{ for } 1\leq i \leq m_1\\
\frac{\partial F}{\partial \lambda_i} (x^\star, \lambda^\star) \leq 0 \qquad\qquad &\mbox{ for } m_1+1 \leq i \leq m_2\\
\frac{\partial F}{\partial \lambda_i} (x^\star, \lambda^\star) = 0 \qquad\qquad &\mbox{ for }  m_2 +1 \leq i \leq m\\
\frac{\partial F}{\partial x_j} (x^\star, \lambda^\star) = 0 \qquad\qquad &\mbox{ for } j\in J\\
\frac{\partial F}{\partial x_j} (x^\star, \lambda^\star) \leq 0 \qquad\qquad &\mbox{ for } j\not\in J\, .
\end{array}
\right.
\end{equation}
Obviously these conditions generalize the ones of Theorem 2. Using the concept of a [[Saddle point|saddle point]] of the function $F$, one can interpret the conditions above as necessary conditions ensuring that
\[
F (x,  \lambda^\star) \leq F (x^\star, \lambda^\star) \leq F (x^\star, \lambda)
\]
for all $x$ and $\lambda$ in a neighborhood, respectively, of $x^\star$ and $\lambda^\star$. In the case when $f$ is concave and each $g_i$ is convex when $\lambda^\star_i>0$ and concave when $\lambda^\star_i <0$, these conditions turn out to be sufficient.
  
The general case considered above can easily be reduced, by a transformation of the dependent and independent variables, to the problem of finding the maximum of a function $f$ under the restrictions $x_i \geq 0$ for all $1\leq i \leq n$, $g_i (x) \geq 0$ for all $1\leq i \leq \bar{m}$ and $g_i (x) =0$ for all $\bar{m}+1 \leq i \leq m$. In this case the conditions \eqref{e:Lagr_gen} take a simpler form.
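
To illustrate (an example added here, not in the original article): maximize $f(x, y) = x + y$ subject to $g_1 (x, y) = x^2 + y^2 \leq 1$ and $x, y \geq 0$ (so $m = m_1 = 1$). At the maximum $x^\star = (1/\sqrt{2}, 1/\sqrt{2})$ both coordinates are positive, so $J = \{1, 2\}$, and the constraint is active, so $I = \emptyset$. The conditions \eqref{e:Lagr_gen} then hold with $\lambda^\star_1 = 1/\sqrt{2}$: indeed $\lambda^\star_1 \geq 0$, $\frac{\partial F}{\partial x_j} (x^\star, \lambda^\star) = 1 - 2 \lambda^\star_1 x^\star_j = 0$ for $j = 1, 2$, and $\frac{\partial F}{\partial \lambda_1} (x^\star, \lambda^\star) = 1 - (x^\star_1)^2 - (x^\star_2)^2 = 0 \geq 0$.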
 
 
====Calculus of variations====
An analogue of the Lagrange function is used in the calculus of variations in considering problems on a conditional extremum of functionals. Here also it is convenient to write the necessary conditions for optimality in a problem on a conditional extremum as necessary conditions for some functional (the analogue of the Lagrange function) constructed by means of Lagrange multipliers. Suppose, for example, that one is given the [[Bolza problem|Bolza problem]]: Find a minimum of the functional
\[
J (x) = \int_{t_1}^{t_2} f (t, x(t), \dot{x} (t))\, dt + g (t_1, x (t_1), t_2, x(t_2))
\]
in the presence of differential constraints of the type
\[
\phi (t, x(t), \dot{x} (t)) =0 \qquad \qquad \forall t\in [t_1, t_2]
\]
and boundary conditions
\[
\psi (t_1, x(t_1), t_2, x(t_2)) = 0\, ,
\]
where the unknown $x$ is a curve $x: [t_1, t_2]\to \mathbb R^n$ and the given functions $f$, $\phi$ and $\psi$ are $f: [t_1, t_2]\times \mathbb R^n \times \mathbb R^n \to \mathbb R$, $\phi: [t_1, t_2] \times \mathbb R^n \times \mathbb R^n \to \mathbb R^m$, for some $m< n$, and $\psi: \mathbb R\times \mathbb R^n \times \mathbb R \times\mathbb R^n \to \mathbb R^p$, for some $p\leq 2n+2$.
  

Under appropriate assumptions a necessary condition for a minimizer is the existence of Lagrange multipliers $\lambda_1, \ldots, \lambda_m : [t_1, t_2]\to \mathbb R$ and $e_1, \ldots, e_p\in \mathbb R$ such that the triple $(x, \lambda, e)$ is a critical point of the functional
\[
I (x, \lambda, e) := \int_{t_1}^{t_2} \left( f (t, x(t), \dot{x} (t)) + \sum_{i=1}^m \lambda_i (t)\, \phi_i (t, x(t), \dot{x} (t))\right)\, dt
+ g (t_1, x(t_1), t_2, x(t_2)) + \sum_{\mu=1}^p e_\mu \psi_\mu (t_1, x(t_1), t_2, x (t_2))\, .
\]
These necessary conditions, which form a closed system of relations for determining an optimal solution and the corresponding Lagrange multipliers, are written in specific formulations as the [[Euler equation|Euler equation]], the [[Weierstrass conditions (for a variational extremum)|Weierstrass conditions (for a variational extremum)]] and the [[Transversality condition|transversality condition]].
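
For orientation (a sketch added here, not part of the original article), stationarity of $I$ with respect to variations of $x$ on $(t_1, t_2)$ yields the Euler equation for the augmented integrand $L = f + \sum_i \lambda_i \phi_i$:
\[
\frac{d}{dt} \frac{\partial L}{\partial \dot{x}_k} (t, x(t), \dot{x}(t)) - \frac{\partial L}{\partial x_k} (t, x(t), \dot{x}(t)) = 0 \qquad \mbox{for } k = 1, \ldots, n\, ,
\]
while stationarity with respect to $\lambda$ returns the constraints $\phi = 0$ and stationarity with respect to the endpoint data yields the transversality conditions.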
  

====Comments====
Since the early 1960s, much effort has gone into the development of a generalized kind of differentiation for extended-real-valued functions on real vector spaces that can serve well in the analysis of problems of optimization. In such problems, which range from models in economics and operations research to variational principles that correspond to partial differential equations, it is often necessary to treat functions that are not differentiable in any traditional two-sided sense and may even have infinite values at some points. See {{Cite|Ro2}}.
  
====References====
{|
|-
|valign="top"|{{Ref|Ak}}|| N.I. Akhiezer, "The calculus of variations", Blaisdell (1962) (Translated from Russian)
|-
|valign="top"|{{Ref|Av}}|| M. Avriel, "Nonlinear programming: analysis and methods", Prentice-Hall (1976)
|-
|valign="top"|{{Ref|Bl}}|| G.A. Bliss, "Lectures on the calculus of variations", Chicago Univ. Press (1947)
|-
|valign="top"|{{Ref|Ce}}|| L. Cesari, "Optimization - Theory and applications", Springer (1983)
|-
|valign="top"|{{Ref|Ha}}|| G.F. Hadley, "Nonlinear and dynamic programming", Addison-Wesley (1964)
|-
|valign="top"|{{Ref|KT}}|| H.W. Kuhn, A.W. Tucker, "Nonlinear programming", ''Proc. 2nd Berkeley Symp. Math. Stat. Probab. (1950)'', Univ. California Press (1951) pp. 481–492
|-
|valign="top"|{{Ref|Ro}}|| R.T. Rockafellar, "Convex analysis", Princeton Univ. Press (1970)
|-
|valign="top"|{{Ref|Ro2}}|| R.T. Rockafellar, "The theory of subgradients and its applications to problems of optimization. Convex and nonconvex functions", Heldermann (1981)
|-
|valign="top"|{{Ref|Sm}}|| V.I. Smirnov, "A course of higher mathematics", '''1''', Addison-Wesley (1964) (Translated from Russian)
|}

How to Cite This Entry:
Lagrange function. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Lagrange_function&oldid=13134
This article was adapted from an original article by I.B. Vapnyarskii (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098.