7.4. Constraint Qualifications#

7.4.1. Concepts#

Feasible Sequence

  • Feasible point \(\bar{x}\)

  • Sequence \(\{x^k\}\) (e.g., optimization algorithm iterations)

  • Limit \(\lim_{k \rightarrow \infty} \{x^k\} = \bar{x}\)

  • \(x^{k}\) feasible for all \(k \geq K\) where \(K\) is sufficiently large

Limiting Direction \(d\)

\[\begin{equation*} \lim_{k \rightarrow \infty} \frac{x^k - \bar{x}}{|| x^k - \bar{x} || } = d \end{equation*}\]

Recall the ball rolling example.

  • Feasible sequence: path of the ball rolling to rest at local min

  • Limiting direction: tangent to the path and opposite of the direct the ball is rolling

ball_example

In the above picture:

  • Green sequence is always feasible and thus is a feasible sequence

  • Blue sequence becomes feasible starting with \(x^3\) and is a feasible sequence

Theorem 4.8: If \(x^*\) is a solution of (NLP), then all feasible sequences leading to \(x^*\) must satisfy

\[\begin{equation*} \nabla f(x^*)^T d > 0 \end{equation*}\]

where \(d\) is the limiting direction of the feasible sequence.

Ball analogy: can only roll down the hill. Cannot come from downhill direction (and still be feasible).

Big Picture

Constraint qualification act as the link between KKT conditions (which we numerical solve) and limiting directions (Theorem 4.8).

Key questions:

  • Does \(\nabla g\) and \(\nabla h\) properly characterize/identify local solutions?

  • Does \(\nabla h_i(x^*)^T d = 0\) and \(\nabla g_i(x^*)^T d \leq 0\) \(\forall i \in A(x^*)\) capture the limiting directions of the solution?

7.4.2. Linear Constrainted Optimization Problems#

picture

picture

picture

7.4.3. Example#

Consider the following two dimensional optimization problem:

\[\begin{split} \begin{align} \min_{x_1,x_2} \quad & f(x) := x_1 \\ \mathrm{s.t.} \quad & g_1(x) := x_2 \leq x_1^3 \\ & g_2(x) := -x_1^3 \leq x_2 \end{align} \end{split}\]

This is an example from Section 4.3 in Nonlinear Programming by Biegler.

import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt

7.4.3.1. Visualize Feasible Set#

n = 101
x1 = np.linspace(-3,3,n)
plt.figure()

g1 = np.power(x1,3)
g2 = -g1

plt.plot(x1,g1,color="blue",linestyle="-",label="$x_2 \leq x_1^3$")
plt.fill_between(x1,g1,np.min(g1)*np.ones(n),color="blue",alpha=0.5)


plt.plot(x1,g2,color="red",linestyle="-",label="$-x_1^3 \leq x_2$")
plt.fill_between(x1,np.max(g2)*np.ones(n),g2,color="red",alpha=0.5)

plt.xlabel("$x_1$")
plt.ylabel("$x_2$")

plt.grid()
plt.legend(loc="center left")
plt.show()
../../_images/2163ac6bbf2c6b98e853d801fa66afa5d274c55861b9f2a667fc00449aa9df48.png

Discussion

Where on the graph are both constraints satisfied? Choices:

  1. White region

  2. Blue region

  3. Red region

  4. Purple region

7.4.3.2. KKT Conditions#

Rewrite the problem in cannonical form:

\[\begin{split} \begin{align*} \min_{x_1,x_2} \quad & x_1 \\ \mathrm{s.t.} \quad & x_2 - x_1^3 \leq 0 \\ & -x_1^3 - x_2 \leq 0 \end{align*} \end{split}\]

The Lagrangian is defined as:

\[ L(x,u) = f(x) + g(x)^T u \]

Specifically:

\[ L = x_1 + u_1 (x_2 - x_1^3) + u_2 (-x_1^3 - x_2) \]

Set the gradient of the Lagrangian with respect to \(x\) equal to zero:

\[\begin{split} \nabla_x L = \begin{bmatrix} 1 - 3u_1 x_1^2 - 3u_2 x_1^2 \\ 0 + u_1 - u_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{split}\]

Complementary slackness conditions:

  • \((x_2 - x_1^3) u_1 = 0\)

  • \((-x_1^3 - x_2) u_2 = 0\)

Inequality constraints:

  • \(x_2 - x_1^3 \leq 0\)

  • \(-x_1^3 - x_2 \leq 0\)

Non-negativity of Lagrange multipliers:

  • \(u_1 \geq 0\)

  • \(u_2 \geq 0\)

7.4.3.3. Enumerate the Possible Feasible Sets#

  1. Neither constraint is strongly active
    \(u_1 = 0\), \(u_2 = 0\)

    Gradient of the Lagrangian:

    \[\begin{split} \nabla_x L = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \neq \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{split}\]

    Solution is not bounded!

  2. \(g_1\) is active, \(g_2\) is NOT strongly active
    \(u_1 \geq 0\), \(u_2 = 0\)

    Gradient of the Lagrangian:

    \[\begin{split} \nabla_x L = \begin{bmatrix} 1 - 3u_1 x_1^2 \\ 0 + u_1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{split}\]

    From this, \(u_1 = 0\)KKT conditions are not satisfied!

  3. \(g_1\) is NOT strongly active, \(g_2\) is active
    Same problem as case 2.

  4. Both \(g_1\) and \(g_2\) are strongly active
    \(u_1 \geq 0\), \(u_2 \geq 0\)

    Gradient of the Lagrangian:

    \[\begin{split} \nabla_x L = \begin{bmatrix} 1 - 3u_1 x_1^2 - 3u_2 x_1^2 \\ u_1 - u_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{split}\]

    From \(\nabla_{x_2} L = 0\), we get \(u_1 = u_2\).

    Substituting into \(\nabla_{x_1} L = 0\):

    \[ 1 - 3u_1 x_1^2 - 3u_2 x_1^2 = 0 \]

    Solving for \(x_1\):

    \[ x_1 = \sqrt{\frac{1}{3u_1}} \]

    This means we can see from visual inspection that the optimal solution occurs at:

    • \(x_1 \to 0\), \(x_2 \to 0\)

    • \(u_1, u_2 \to \infty\)

    How to handle this case? Need to draw the feasible region and possible limiting directions.

    Feasible region and possible limiting directions

7.4.3.4. Return to Theorem 4.8#

Recall that KKT conditions encode a limiting direction too:

\[\begin{equation*} \nabla f(x^*)^T d = 0 \end{equation*}\]
\[\begin{equation*} \nabla g_i(x^*)^T d \leq 0, \, i \in A(x^*) \end{equation*}\]

Consider the example when both constraints are active.

Gradients of the constraints:

\[\begin{equation*} \frac{\partial g_1}{\partial x} = \begin{bmatrix} -3x_1^2 \\ 1 \end{bmatrix}, \quad \frac{\partial g_2}{\partial x} = \begin{bmatrix} -3x_1^2 \\ -1 \end{bmatrix} \end{equation*}\]

Now we construct the KKT conditions:

\[\begin{equation*} \underbrace{\begin{bmatrix} -3x_1^2 & 1 \\ -3x_1^2 & -1 \end{bmatrix}}_{ \nabla g(x^*)^T} d \leq 0, \, i \in A(x^*) \end{equation*}\]

Recall both constraints are active at the solution \(x_1 = x_2 = 0\):

\[\begin{equation*} \begin{bmatrix} -3x_1^2 & 1 \\ -3x_1^2 & -1 \end{bmatrix} \cdot d = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{equation*}\]

Substituting \(x_1 = x_2 = 0\):

\[\begin{equation*} \begin{bmatrix} 0 & 1 \\ 0 & -1 \end{bmatrix} \cdot d = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{equation*}\]

Thus there are only two solution:

\[\begin{equation*} d_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad d_2 = \begin{bmatrix} -1 \\ 0 \end{bmatrix} \end{equation*}\]

… and all scalar multiples of these.

Does anyone notice anything particular about the gradients of the constraints, especially at the solution \(x_1 = x_2 = 0\)?

Search direction for example

7.4.4. Constraint Qualifications#

Constraint qualifications are the link between KKT conditions and limiting directions (Theorem 4.8).

Main idea: Ensure active constraints are not “too nonlinear” and KKT conditions adequately describe limit directions.

7.4.4.1. Linear Independence Constraint Qualification (LICQ)#

Definition 4.12 Given a local solution \(x^*\) to:

\[\begin{equation*} \min f(x) \quad \text{s.t.} \quad g(x) \leq 0, \, h(x) = 0, \end{equation*}\]

with active set \(\mathcal{A}(x^*)\), LICQ holds if:

\[\begin{equation*} \nabla g_i(x^*), \nabla h_i(x^*) \, \text{for} \, i \in \mathcal{A}(x^*) \end{equation*}\]

are linearly independent.

i.e., the part of the Jacobian corresponding to the active set is full rank.

Does LICQ hold for the example? No.

Lemma 4.13 Consider “the cone”:

\[\begin{equation*} \nabla h_i(x^*)^T d = 0, \quad \nabla g_i(x^*)^T d \leq 0, \, i \in \mathcal{A}(x^*) \end{equation*}\]
  1. The set of limiting directions for all feasible sequences is a subset of “the cone”.

  2. If LICQ holds, “the cone” is equivalent to the set of limiting directions for all feasible sequences.

  3. Consider all limiting directions in “the cone” with \(\|d\| = 1\). When LICQ holds, a feasible sequence \(\{x^k\}\):

    \[\begin{equation*} x^k = x^* + t d + o(t^2) \end{equation*}\]

    can always be constructed to satisfy:

    \[\begin{equation*} h_i(x^k) = t \nabla h(x^*)^T d = 0, \end{equation*}\]
    \[\begin{equation*} g_i(x^k) = t \nabla g_i(x^*)^T d \leq 0, \, i \in \mathcal{A}(x^*) \end{equation*}\]

    for some small positive \(t\) with \(\lim t \to 0\).

What does this mean? When LICQ holds, it is always possible to construct a feasible sequence in the limiting direction. This is important for algorithms.

Theorem 4.14: Consider local solution \(x^*\). If LICQ holds at \(x^*\), then:

  1. The KKT conditions are satisfied.

  2. If:

    • (i) \(f(x)\) and \(g(x)\) are convex,

    • (ii) \(h(x)\) is linear,

    and (iii) the KKT conditions are satisfied,

    then \(x^*\) is a global solution (this implies optimality).

Theorem 4.15 LICQ and Multipliers

Given a point \(x^*\) that satisfies the KKT conditions with multipliers \(u^*\) and \(v^*\), and with active set \(\mathcal{A}(x^*)\). If LICQ holds, then \(u^*\) and \(v^*\) are unique.

Why is this important for interpretation of KKT multipliers? Discuss.

7.4.4.2. Mangasarian-Fromovitz Constraint Qualification (MFCQ)#

What can we say if LICQ does not hold?

Theorem 4.16 Mangasarian-Fromovitz Constraint Qualification (MFCQ)

Given:

  • A local solution \(x^*\)

  • Active set \(\mathcal{A}(x^*)\)

MFCQ is defined by:

  1. Linear independence of equality constraint gradients.

  2. There exists a search direction \(d\) such that:

    \[\begin{equation*} \nabla g_i(x^*)^T d < 0, \quad \nabla h_i(x^*)^T d = 0, \quad i \in \mathcal{A}(x^*). \end{equation*}\]

Key Properties

  1. LICQ \(\implies\) MFCQ.

  2. If MFCQ holds, then multipliers \(u^*\) and \(v^*\) are bounded (but not necessarily unique).

Does MFCQ hold for the example?

No \(d\) satisfies:

\[\begin{equation*} \begin{bmatrix} 0 & 1 \\ 0 & -1 \end{bmatrix} d < 0 \end{equation*}\]

Thus, MFCQ does not hold!

7.4.5. Example Revisited: Solve with Pyomo#

from pyomo.environ import ConcreteModel, Var, Objective, Constraint, SolverFactory, Suffix, minimize, value

## Create concrete Pyomo model
m = ConcreteModel()

## Set up to extract dual variables after model solve.
m.dual = Suffix(direction=Suffix.IMPORT)

## Declare variables with initial values
m.x1 = Var(initialize=1)
m.x2 = Var(initialize=1)

## Declare objective
m.OBJ = Objective(expr=m.x1, sense = minimize)

m.g1 = Constraint(expr=m.x2 <= m.x1**3)

m.g2 = Constraint(expr=-m.x1**3 <= m.x2)

## Specify IPOPT as solver
solver = SolverFactory('ipopt')

## Solve the model
results = solver.solve(m, tee = True)

## Return the solution
print("x1 = ",value(m.x1))
print("x2 = ",value(m.x2))
print("\n")

## Inspect dual variables
m.dual.display()
Ipopt 3.14.16: 

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************

This is Ipopt version 3.14.16, running with linear solver MUMPS 5.7.3.

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        4
Number of nonzeros in Lagrangian Hessian.............:        1

Total number of variables............................:        2
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        2
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        2

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  1.0000000e+00 0.00e+00 7.89e-01  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1  9.7574511e-01 0.00e+00 1.04e-02  -1.0 2.54e-01    -  1.00e+00 1.00e+00f  1
   2  6.6105230e-01 0.00e+00 2.39e-01  -1.7 4.64e+00    -  1.00e+00 3.73e-01f  1
   3  4.6924687e-01 0.00e+00 2.74e-01  -1.7 1.92e-01    -  1.00e+00 1.00e+00h  1
   4  3.2820073e-01 0.00e+00 3.14e-01  -1.7 1.62e-01    -  1.00e+00 8.72e-01h  1
   5  2.6957767e-01 0.00e+00 1.60e-01  -1.7 5.86e-02    -  1.00e+00 1.00e+00f  1
   6  1.8935274e-01 0.00e+00 2.24e+00  -2.5 2.21e-01    -  1.00e+00 3.63e-01f  1
   7  1.3091124e-01 0.00e+00 3.19e-01  -2.5 5.84e-02    -  1.00e+00 1.00e+00h  1
   8  9.6126564e-02 0.00e+00 2.03e+00  -2.5 4.69e-02    -  1.00e+00 7.41e-01h  1
   9  7.1009142e-02 0.00e+00 2.58e-01  -2.5 2.51e-02    -  1.00e+00 1.00e+00f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  10  5.1777995e-02 0.00e+00 1.21e+01  -2.5 3.18e-02    -  1.00e+00 6.04e-01h  1
  11  4.0824512e-02 0.00e+00 1.97e-01  -2.5 1.10e-02    -  1.00e+00 1.00e+00f  1
  12  3.4583313e-02 0.00e+00 5.91e+01  -2.5 1.80e-02    -  1.00e+00 3.47e-01h  2
  13  2.6914569e-02 0.00e+00 1.79e-01  -2.5 7.67e-03    -  1.00e+00 1.00e+00h  1
  14  2.0308733e-02 0.00e+00 2.01e-01  -2.5 6.61e-03    -  1.00e+00 1.00e+00h  1
  15  1.9046691e-02 0.00e+00 3.31e-02  -2.5 1.26e-03    -  1.00e+00 1.00e+00h  1
  16  1.2746908e-02 0.00e+00 5.71e+02  -3.8 1.83e-02    -  1.00e+00 3.44e-01f  1
  17  8.7118644e-03 0.00e+00 3.19e-01  -3.8 4.04e-03    -  1.00e+00 1.00e+00f  1
  18  7.5197334e-03 0.00e+00 1.09e+03  -3.8 3.10e-03    -  1.00e+00 3.85e-01f  2
  19  5.1147446e-03 0.00e+00 3.19e-01  -3.8 2.40e-03    -  1.00e+00 1.00e+00f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  20  3.3178215e-03 0.00e+00 3.78e-01  -3.8 1.80e-03    -  1.00e+00 1.00e+00f  1
  21  2.2893616e-03 0.00e+00 3.40e-01  -3.8 1.03e-03    -  1.00e+00 1.00e+00f  1
  22  1.8283227e-03 0.00e+00 8.34e+04  -3.8 4.48e-03    -  1.00e+00 1.03e-01f  2
  23  4.5313818e-04 0.00e+00 8.82e-01  -3.8 1.38e-03    -  1.00e+00 1.00e+00f  1
  24  3.0653152e-04 0.00e+00 3.16e+05  -5.7 1.47e-04   4.0 2.32e-01 1.00e+00h  1
  25 -5.5285928e-04 0.00e+00 1.00e+00  -5.7 8.59e-04    -  1.00e+00 1.00e+00f  1
  26 -1.1223885e-02 1.40e-06 1.36e+04  -5.7 7.96e-01    -  1.00e+00 1.34e-02f  1
  27 -1.1107786e-02 1.36e-06 1.16e+03  -5.7 3.72e-03    -  1.00e+00 3.12e-02h  6
  28 -7.4290268e-03 4.00e-07 4.64e-01  -5.7 3.68e-03    -  1.00e+00 1.00e+00h  1
  29 -5.8687786e-03 1.92e-07 8.66e+02  -5.7 2.41e-03    -  1.00e+00 6.47e-01h  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  30 -5.8542172e-03 1.91e-07 3.64e+03  -5.7 1.86e-03    -  1.00e+00 7.81e-03f  8
  31 -3.9974633e-03 5.39e-08 3.37e-01  -5.7 1.86e-03    -  1.00e+00 1.00e+00h  1
  32 -2.9534118e-03 1.58e-08 5.23e+02  -5.7 1.12e-03    -  1.00e+00 9.29e-01h  1
  33 -2.9344353e-03 1.53e-08 1.06e+04  -5.7 6.07e-04    -  1.00e+00 3.12e-02f  6
  34 -2.3405157e-03 2.82e-09 1.76e-01  -5.7 5.94e-04    -  1.00e+00 1.00e+00h  1
  35 -2.1660008e-03 1.62e-10 3.83e-02  -5.7 1.75e-04    -  1.00e+00 1.00e+00h  1
  36 -2.1508378e-03 0.00e+00 6.75e-04  -5.7 1.52e-05    -  1.00e+00 1.00e+00h  1
  37 -2.1544258e-03 0.00e+00 6.10e-06  -8.6 3.59e-06    -  1.00e+00 1.00e+00f  1
  38 -2.1544329e-03 0.00e+00 7.78e-12  -9.0 7.05e-09    -  1.00e+00 1.00e+00h  1

Number of Iterations....: 38

                                   (scaled)                 (unscaled)
Objective...............:  -2.1544328718688505e-03   -2.1544328718688505e-03
Dual infeasibility......:   7.7819972688075723e-12    7.7819972688075723e-12
Constraint violation....:   0.0000000000000000e+00    0.0000000000000000e+00
Variable bound violation:   0.0000000000000000e+00    0.0000000000000000e+00
Complementarity.........:   9.0909382315597638e-10    9.0909382315597638e-10
Overall NLP error.......:   2.5317795697139416e-12    9.0909382315597638e-10


Number of objective function evaluations             = 64
Number of objective gradient evaluations             = 39
Number of equality constraint evaluations            = 0
Number of inequality constraint evaluations          = 64
Number of equality constraint Jacobian evaluations   = 0
Number of inequality constraint Jacobian evaluations = 39
Number of Lagrangian Hessian evaluations             = 38
Total seconds in IPOPT                               = 0.009

EXIT: Optimal Solution Found.
x1 =  -0.0021544328718688505
x2 =  5.676436642098907e-25


dual : Direction=IMPORT, Datatype=FLOAT
    Key : Value
     g1 : -35907.30543965533
     g2 : -35907.30543965533

Discussion

  1. Why so many iterations for such a simple problem?

  2. Why are the multipliers so negative?

  3. Are the constraints satisfied?

  4. Why is the solution not exactly \(x_1 = x_2 = 0\)?