6.3. Unconstrained Optimality Conditions#

import matplotlib.pyplot as plt
import numpy as np
from scipy import linalg
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm

6.3.1. Local and Global Solutions#

6.3.2. Necessary Conditions for Optimality#

6.3.3. Sufficient Conditions for Optimality#

6.3.4. Example 2.19 in Biegler (2010)#

Main Idea: Use the necessary and sufficient conditions to classify a candidate solution.
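For reference, the standard unconstrained optimality conditions applied here: a stationary point satisfies \(\nabla f(x^*) = 0\) (first-order necessary condition); if \(x^*\) is a local minimizer, then \(\nabla^2 f(x^*)\) is also positive semidefinite (second-order necessary condition); and if \(\nabla f(x^*) = 0\) with \(\nabla^2 f(x^*)\) positive definite, then \(x^*\) is a strict local minimizer (sufficient condition). The signs reverse for maximizers.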


6.3.4.1. The Test Function#
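For reference, the test function transcribed from the code below (the constants \(a\), \(b\), and \(c\) are defined there):

\[u = x_1 - 0.8, \qquad v = x_2 - \left(a_1 + a_2\, u^2 \sqrt{1-u} - a_3\, u\right)\]

\[\alpha = -b_1 + b_2\, u^2 \sqrt{1+u} + b_3\, u, \qquad \beta = \frac{c_1\, v^2 \,(1 - c_2\, v)}{1 + c_3\, u^2}\]

\[f(x) = \alpha\, e^{-\beta}\]

with \(a = (0.3, 0.6, 0.2)\), \(b = (5, 26, 3)\), and \(c = (40, 1, 10)\).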

## Define Python function
def my_f(x, verbose=False):
    ''' Evaluate the function given above at point x

    Inputs:
        x - vector with 2 elements
        verbose - if True, print intermediate values (default: False)

    Outputs:
        f - function value (scalar)
    '''
    # Constants
    a = np.array([0.3, 0.6, 0.2])
    b = np.array([5, 26, 3])
    c = np.array([40, 1, 10])
    
    # Intermediates. Recall Python indices start at 0
    u = x[0] - 0.8
    s = np.sqrt(1-u)
    s2 = np.sqrt(1+u)
    v = x[1] - (a[0] + a[1]*u**2*s - a[2]*u)
    alpha = -b[0] + b[1]*u**2*s2 + b[2]*u # September 5, 2018: changed 's' to 's2'
    beta = c[0]*v**2*(1-c[1]*v)/(1+c[2]*u**2)
    f = alpha*np.exp(-beta)
    
    if verbose:
        print("##### my_f at x = ",x, "#####")
        print("u = ",u)
        print("sqrt(1-u) = ",s)
        print("sqrt(1+u) = ",s2)
        print("v = ",v)
        print("alpha = ",alpha)
        print("beta = ",beta)
        print("f(x) = ",f)
        print("##### Done. #####\n")
    
    return f

## Define candidate point and check against value reported in book
xtest = np.array([0.7395, 0.3144])
ftest = my_f(xtest)
print("f(x*) = ",my_f(xtest),"\n")

## Make 3D plot to visualize
x1 = np.arange(0.0,1.1,0.05)
x2 = np.arange(0.0,1.0,0.05)

# Create a matrix of all points to sample
X1, X2 = np.meshgrid(x1, x2)
n1 = len(x1)
n2 = len(x2)

# Notice the order: rows correspond to x2 and columns to x1.
# This was wrong in quadratic.ipynb and has been corrected in quadratic_update.ipynb
F = np.zeros([n2, n1])

xtemp = np.zeros(2)

# Evaluate f(x) over grid
for i in range(0,n1):
    xtemp[0] = x1[i]
    for j in range(0,n2):
        xtemp[1] = x2[j]
        F[j,i] = my_f(xtemp)

# Create 3D figure
fig = plt.figure()
ax = fig.add_subplot(projection='3d')

# Plot f(x)
surf = ax.plot_surface(X1, X2, F, linewidth=0,cmap=cm.coolwarm,antialiased=True)

# Add candidate point
ax.scatter(xtest[0],xtest[1],ftest,s=50,color="green",depthshade=True)

# Draw vertical line through stationary point to help visualization
# Maximum value in array
fmax = np.amax(F)
fmin = np.amin(F)
ax.plot([xtest[0], xtest[0]], [xtest[1], xtest[1]], [fmin,fmax],color="green")

plt.show()
f(x*) =  -5.089256907976166 
[Figure: 3D surface plot of f(x) over the grid, with the candidate point marked in green.]

Discussion: Based on the 3D plot, is this function convex?

6.3.4.2. Finite Difference Gradient and Hessian#

We need to calculate \(\nabla f(x)\) and \(\nabla^2 f(x)\) to analyze a given point. Below are functions that implement a central finite difference. Another option is to calculate the derivatives analytically or to use automatic differentiation.
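For reference, the central difference formulas used below, where \(e_i\) is the \(i\)-th unit vector and \(\epsilon\) is the perturbation size:

\[\frac{\partial f}{\partial x_i}(x) \approx \frac{f(x + \epsilon e_i) - f(x - \epsilon e_i)}{2\epsilon}, \qquad \nabla^2 f(x)\, e_i \approx \frac{\nabla f(x + \epsilon e_i) - \nabla f(x - \epsilon e_i)}{2\epsilon}\]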

## Gradient
def my_grad(x, verbose=True):
    '''
    Calculate gradient of function my_f using central difference formula
    
    Inputs:
        x - point at which to evaluate the gradient
        verbose - if True, print intermediate values (default: True)
        
    Outputs:
        grad - gradient (vector)
        
    Assumptions:
        1. my_f is defined
        2. input x has the correct number of elements for my_f
        3. my_f is continuous and differentiable    
    '''
    
    eps = 1E-6
    n = len(x)
    grad = np.zeros(n)

    if(verbose):
        print("***** my_grad at x = ",x,"*****")
    
    for i in range(0,n):
        
        # Create vector of zeros except eps in position i
        e = np.zeros(n)
        e[i] = eps
        
        # Finite difference formula
        my_f_plus = my_f(x + e)
        my_f_minus = my_f(x - e)
        
        # Diagnostics
        if(verbose):
            print("e[",i,"] = ",e)
            print("f(x + e[",i,"]) = ",my_f_plus)
            print("f(x - e[",i,"]) = ",my_f_minus)
        
        
        grad[i] = (my_f_plus - my_f_minus)/(2*eps)
    
    if(verbose):
        print("***** Done. ***** \n")
    
    return grad

grad = my_grad(xtest)
print(grad)
***** my_grad at x =  [0.7395 0.3144] *****
e[ 0 ] =  [1.e-06 0.e+00]
f(x + e[ 0 ]) =  -5.089256904035737
f(x - e[ 0 ]) =  -5.089256911839596
e[ 1 ] =  [0.e+00 1.e-06]
f(x + e[ 1 ]) =  -5.089256892701673
f(x - e[ 1 ]) =  -5.089256922857939
***** Done. ***** 

[0.00390193 0.01507813]

Discussion: According to the book, \(\nabla f(x_{test}) = 0\). Is the above answer reasonable?

Note: Before fixing the mistake in my_f, the gradient was [-0.28182046  0.01506107]. The version we discussed in class had the mistake in my_f.

## Hessian
def my_hes(x):
    '''
    Calculate Hessian of function my_f using central difference formula and my_grad
    
    Inputs:
        x - point at which to evaluate the Hessian
        
    Outputs:
        H - Hessian (matrix)
        
    Assumptions:
        1. my_f and my_grad are defined
        2. input x has the correct number of elements for my_f
        3. my_f is continuous and twice differentiable
        4. No mistakes in my_grad
    '''
    
    eps = 1E-6
    n = len(x)
    H = np.zeros([n,n])
    
    for i in range(0,n):
        # Create vector of zeros except eps in position i
        e = np.zeros(n)
        e[i] = eps
        
        # Evaluate gradient twice
        grad_plus = my_grad(x + e,verbose=False)
        grad_minus = my_grad(x - e,verbose=False)
        
        # Notice we are building the Hessian by column (or row)
        H[:,i] = (grad_plus - grad_minus)/(2*eps)
    
    '''
    Note: This is not very efficient. You can probably think of several 
    performance improvements. We will learn more practical ways to approximate 
    the Hessian for optimization algorithms in the next few lectures.
    '''
    
    return H

H = my_hes(xtest)
print(H)
[[ 76.99973992 108.3413359 ]
 [108.3413359  392.7191905 ]]

Primary Discussion: How does this compare to the answer given in the book? Which elements are very close to the given answer? Which have greater error? Why does this make sense?

Secondary Discussion: What happens if you set eps to different values in my_grad and my_hes? Is H still symmetric?
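As a starting point for the secondary discussion, here is a minimal sketch (assuming H from the cell above is still in scope; H_sym is just an illustrative name) that checks symmetry numerically and symmetrizes the finite difference Hessian:

# Check whether the finite difference Hessian is numerically symmetric
print("Symmetric:", np.allclose(H, H.T))

# A common remedy for small asymmetries is to symmetrize the estimate
H_sym = 0.5 * (H + H.T)
print(H_sym)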

6.3.4.3. Analysis of Optimality Conditions#

Activity

  1. Calculate the eigenvalues of H.

  2. Is \(x_{test}\) a

    • stationary point

    • local maximizer

    • strict global maximizer

    • global maximizer

    • local minimizer

    • strict local minimizer

    • global minimizer

Answer Yes, No, or Possibly for each bullet point.

# Add your solution here
[ 43.39788177+0.j 426.32104865+0.j]
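One way to reproduce the eigenvalue output shown above is with scipy.linalg, which is already imported at the top of the notebook. This is a sketch, not the only possible solution:

# Eigenvalues of the finite difference Hessian at the candidate point
w, v = linalg.eig(H)
print(w)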

6.3.5. Continuous Optimization Algorithms#

  • Gradient-based Algorithms

    • Conjugate gradient methods

    • Sequential quadratic programming methods

    • Sequential linear programming methods

    • Interior point methods

  • Derivative-free optimization

    • Deterministic algorithms

      • Nelder–Mead method (see the sketch after this list)

    • Stochastic algorithms

      • Simulated Annealing

      • Genetic Algorithms
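To close, a minimal sketch of a derivative-free method from the list above: running scipy.optimize.minimize with the Nelder–Mead method on my_f. The starting point is an illustrative assumption, not a value from the book.

from scipy import optimize

# Minimize my_f with the derivative-free Nelder-Mead method,
# starting from an arbitrary point near the candidate solution
res = optimize.minimize(my_f, x0=np.array([0.7, 0.3]), method="Nelder-Mead")
print("x* =", res.x)
print("f(x*) =", res.fun)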