# Fritz John condition

2010 Mathematics Subject Classification: Primary: 49Kxx [MSN][ZBL]

A necessary condition for local optimality in problems with inequality constraints. It is closely related to the classical problem of minimizing a function $f$ on a constraint set $F=\left\{x\in\mathbb{R}^n\colon h^i(x)=0, i\in P\right\}$. The functions are defined on $\mathbb{R}^n$, the objective $f$ is assumed to be differentiable and the constraints $h^i$, $i\in P$, are continuously differentiable at a feasible point $x^*\in F$ tested for optimality; $P$ is a finite index set. If $f(x^*)\leq f(x)$ for every $x\in F\cap N(x^*)$, where $N(x^*)$ is a neighbourhood of $x^*$, then $x^*$ is said to be a constrained local minimum. At such a point the gradients of the objective function and the constraints are linearly dependent, i.e., there exist multipliers $\lambda_i\in\mathbb{R}$, $i\in\{0\}\cup P$, not all zero, such that $\lambda_0\nabla f(x^*) + \sum_{i\in P}\lambda_i\nabla h^i(x^*)=0$. If the gradients of the constraints are linearly independent, then one can specify $\lambda_0=1$. This result, usually proved by the implicit function theorem, is used to formulate the Lagrange method in calculus.

Now assume that the feasible set is determined by inequality constraints, i.e., consider \begin{equation} \label{eq:1} \begin{array}{ll} \mbox{minimize} & f(x), \\ \mbox{subject to} & x\in F=\left\{x\in\mathbb{R}^n\colon g^i(x)\leq 0,\; i\in P\right\}. \end{array} \end{equation} The Fritz John condition describes local optimality of a feasible point $x^*$ using the gradients of the objective function and the "active" constraints $P(x^*)=\left\{i\in P\colon g^i(x^*)=0\right\}$, [a10]. The basic Fritz John condition is as follows. Consider the problem \eqref{eq:1} where all functions are differentiable at some feasible $x^*$. If $x^*$ is a constrained local minimum, then there exist multipliers $u_i\geq 0$, $i\in\{0\}\cup P(x^*)$, not all zero, such that \begin{equation} u_0\nabla f(x^*) + \sum_{i\in P(x^*)}u_i\nabla g^i(x^*)=0. \end{equation} By Gordan's theorem of the alternative (e.g., [a14]), or the Dubovitskii–Milyutin theorem (e.g., [a7]), the Fritz John condition is equivalent to the inconsistency of the system \begin{equation} \nabla f(x^*)d<0,\quad\nabla g^i(x^*)d<0,\quad i\in P(x^*), \end{equation} which is a "primal" optimality condition. For problems with both equality and inequality constraints, the Fritz John condition requires that the equality constraints be continuously differentiable while the active inequality constraints need only be differentiable. The multipliers that correspond to the equality constraints are unrestricted in sign. This result, referred to as the Mangasarian–Fromovitz condition, does not follow from the Fritz John condition; e.g., [a14], [a15]. Conditions that guarantee that the leading multiplier in the Fritz John condition is positive, in which case it can be set $u_0=1$, are called regularization conditions or constraint qualifications. One of these is that the gradients of active constraints are linearly independent. This condition is typically violated in multi-level and multi-objective optimization problems; e.g., [a4], [a23].

## Convex programs

The Fritz John condition is not sufficient for optimality even for linear programs; e.g., [a3], p. 150, [a4], [a14]. However, a reformulation of it is both necessary and sufficient for optimality of a feasible point $x^*$ for a convex program, i.e., for the problem \eqref{eq:1} when all functions are convex. First, using properties of feasible directions, optimality of $x^*$ is characterized by the inconsistency of the system \begin{equation} \nabla f(x^*)d<0,\quad\nabla g^i(x^*)d<0,\quad i\in P(x^*)\setminus P^=,\quad d\in D(x^*), \end{equation} where $P^=$ is the set of the constraints that are equal to zero on the entire feasible set and $D(x^*)$ is the intersection of the cones of directions of constancy at $x^*$ of all such constraints. This is equivalent to the Fritz John formulation $u_0\nabla f(x^*) + \sum_{i\in P(x^*)\setminus P^=}u_i\nabla g^i(x^*)\in \{D(x^*)\}^+$, where the multipliers are non-negative and not all equal to zero, and $\{D(x^*)\}^+$ is the polar set of $D(x^*)$, [a4]. Moreover, feasibility of $x^*$ guarantees here $u_0=1$. If the gradients of active constraints are linearly independent, or, more generally, if $P^=$ is an empty set (this is known as Slater's condition, cf. Mathematical programming), then the Fritz John condition becomes \begin{equation} \nabla f(x^*) + \sum_{i\in P(x^*)}u_i\nabla g^i(x^*)=0,\quad u_i\geq 0,\quad i\in P(x^*) \end{equation} Consistency of this system is known as the Karush–Kuhn–Tucker conditions.

Alternatively, in convex programming the Fritz John condition is equivalent to the existence of a saddle point of the Lagrangian $L(x,u)=f(x)+\sum_{i\in P\setminus P^=}u_i g^i(x)$. First, define $F^==\left\{x\colon g^i(x)=0,\; i\in P^=\right\}$ and let $c$ be the cardinality of $P\setminus P^=$. Consider the convex program \eqref{eq:1}. Then a point $x^*\in F^=$ is optimal if and only if there exists a $u^*\geq 0$ in $\mathbb{R}^c$ such that $L(x^*,u)\leq L(x^*,u^*)\leq L(x,u^*)$ for every $u\geq 0$ and every $x\in F^=$ (saddle-point characterization of optimality).

This formulation is useful when some functions are not differentiable. Using non-smooth analysis, one can replace derivatives by other objects such as subgradients (see, e.g., [a3]).

## Optimality and stability

Descriptions of optimality often require stability in the convex model \begin{equation} \label{eq:2} \begin{array}{ll} \mbox{minimize}_x & f(x,\theta), \\ \mbox{subject to} & x\in F(\theta)=\left\{x\colon g^i(x,\theta)\leq 0,\; i\in P\right\}; \end{array} \end{equation} all functions are continuous, $\theta\in\mathbb{R}^p$ is considered as a "parameter" , and $f(\cdot,\theta)$, $g^i(\cdot,\theta)\colon\mathbb{R}^n\rightarrow\mathbb{R}$, $i\in P$ are convex for every $\theta$. (The functions need not be convex in $\theta$.) Assume that the set of optimal solutions $x^0=x^0(\theta)$ of the corresponding convex program is non-empty and bounded at some $\theta=\theta^*$. Perturbations of $\theta$ from $\theta^*$ that locally preserve lower semi-continuity of the feasible set mapping $F\colon\theta\rightarrow F(\theta)$ form a "region of stability" at $\theta^*$, denoted by $S(\theta^*)$. Denote the constraints that are equal to zero on $F(\theta)$ by $P^=(\theta)$. Let $F_*^=(\theta)=\{x\colon g^i(x,\theta)\leq 0,\; i\in P^=(\theta^*)\}$, let $c$ be the cardinality of the set $P\setminus P^=(\theta^*)$, and consider the Lagrangian $L^*(x,u;\; \theta)=f(x,\theta)+\sum_{i\in P\setminus P^=(\theta^*)}u_i g^i(x,\theta)$. Then the following condition characterizes local optimality of $\theta^*$, relative to stable perturbations, for the optimal value function $f^0(\theta)=f(x^0(\theta),\theta)$ (cf., e.g., [a23]).

Consider the convex model \eqref{eq:2} around some $\theta^*$. Then $\theta^*$ locally minimizes $f^0(\theta)$ relative to perturbations in $S(\theta^*)$ if and only if there exists a non-negative function $U\colon S(\theta^*)\cap N(\theta^*)\rightarrow\mathbb{R}^c$ such that, whenever $\theta\in S(\theta^*)\cap N(\theta^*)$, \begin{equation} L^*(x^0(\theta^*),u;\;\theta^*)\leq L^*(x^0(\theta^*),U(\theta^*);\;\theta^*)\leq L^*(x,U(\theta);\;\theta), \end{equation} for every non-negative $u\in\mathbb{R}^c$ and every $x\in F_*^=(\theta^*)$ (a characterization of local optimality on a region of stability).

The above claim does not generally hold if the stability requirement is omitted. As an example, consider the convex model $\min_(x)\left\{x_1\theta\colon x_2\leq 0,\;(x_1^2-x_2)\theta^2\leq 0,\;\max\{x_1^2+x_2^2-1,0\}\leq 0\right\}$. Here $f^0(\theta)=0$ for any $\theta$, hence $\theta^*=0$ is a global minimum. The saddle-point condition requires $U_1=U_1(\theta)\geq 0$ such that $x_1\theta + U_1 x_2\geq 0$ for all $x_1^2\leq x_2$. But such a multiplier function does not exist outside the region of stability, e.g., for $\theta <0$, as a sequence on $x_1^2= x_2$ with $x_1\rightarrow +0$ shows.

The necessary conditions for local optimality of parameters, subject to stable perturbations, are simplified under "input constraint qualifications" [a21]. Global optimality of a parameter $\theta^*$ for the convex model can be characterized by a saddle-point condition on the region of cooperation of $\theta^*$. This is the set of all $\theta$ for which $P^=(\theta)\subset P^=(\theta^*)$. No stability is required in this case [a22]. Programs that assume the form of a convex model, after "freezing" some variables, are called partly convex. Every program with twice continuously differentiable functions can be transformed into an equivalent partly convex program (cf., e.g., [a8], [a12]).

## Abstract programs

Various extensions of the basic Fritz John condition have been studied in abstract spaces with finite or infinite number of constraints; e.g., [a2], [a7], [a11], [a13], [a24]. The weakest derivative for which this condition holds appears to be the Gil de Lamadrid and Sova compact derivative [a16] in non-sequential locally convex Hausdorff spaces. A saddle-point condition that characterizes locally optimal parameters relative to their stable perturbations for the abstract convex model $\min_{x}f(x,\theta)$ on the set $x\in F(\theta)=\left\{x\colon g^\tau(x,\theta)\leq 0,\; \tau\in T\right\}$, where $x$ and the parameter $\theta$ belong to some suitable abstract spaces and $T$ is a compact subset of $\mathbb{R}$, can be given in terms of the existence of a finite Borel measure $\xi$ for the Lagrangian $L(x,\xi;\;\theta)=f(x,\theta)+\int_T g^\tau(x,\theta)\xi (dt)$ (cf. e.g. [a1]). The abstract versions of the Fritz John and Karush–Kuhn–Tucker conditions are applicable to a wide range of problems, from optimal control problems in engineering [a5], [a6], [a17] and management [a18], [a20] to the prior selection problem in Bayesian statistical inference [a1]. These conditions are used for identification of optimal solutions, parameters, controls, and states, and for formulations of duality theories and numerical methods. The multipliers describe sensitivity of the optimal value function subject to the right-hand side perturbations and can be interpreted as values of the constraints ( "shadow prices" in linear programming). Some optimization problems can alternatively be studied by calculus of variations which, fundamentally, is the same method; e.g., [a6], [a9], [a18], [a19], [a20], p. 17.

How to Cite This Entry:
Fritz John condition. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Fritz_John_condition&oldid=42177
This article was adapted from an original article by S. Zlobec (originator), which appeared in Encyclopedia of Mathematics - ISBN 1402006098. See original article