6.6. Descent and Globalization#

import matplotlib.pyplot as plt
import numpy as np

6.6.1. Define Test Function and Derivatives#

Let’s get started by defining a test function we will use throughout the notebook.

Consider a scalar function \(f(x): \mathbb{R} \rightarrow \mathbb{R}\) to allow for easier visualization. Let

\[f(x) = 0.5 (x-1)^4 + (x+1)^3 - 10 x^2 + 5 x\]
\[f'(x) = 6 - 8 x - 3 x^2 + 2 x^3\]
\[f''(x) = -8 - 6 x + 6 x^2 \]
## Define f(x)
f = lambda x : 0.5*(x-1)**4 + (x+1)**3 - 10*x**2 + 5*x

## Define f'(x)
df = lambda x : 6 - 8*x - 3*x**2 + 2*x**3

## Define f''(x)
ddf = lambda x : -8 - 6*x + 6*x**2
plt.figure()
xplt = np.arange(-3,4,0.1)

fplt = f(xplt)
dfplt = df(xplt)
ddfplt = ddf(xplt)

plt.plot(xplt, fplt, label="f(x)", color="b", linestyle="-")
plt.plot(xplt, dfplt, label="f'(x)", color="r", linestyle="--")
plt.plot(xplt, ddfplt, label="f''(x)", color="g", linestyle=":")
plt.xlabel("x")
plt.legend()
plt.grid()
plt.show()
(Figure: plot of f(x), f'(x), and f''(x) for x in [-3, 4].)

6.6.2. Geometric Insights into Newton Steps#

6.6.2.1. Motivation#

We can interpret Newton-type methods for unconstrained optimization as root finding of \(\nabla f(x) = 0\).

At iteration \(k\), we build a quadratic model of \(f(x)\) around \(x^k\) using a Taylor series expansion:

\[f(x^k + p^k) \approx f(x^k) + \nabla f(x^k)^T p^k + \frac{1}{2} (p^k)^T B^k p^k\]

and choose the step \(p^k\) that makes the gradient of this model zero, i.e., \(\nabla f(x^k) + B^k p^k = 0\) (one Newton-type iteration for the root-finding problem \(\nabla f(x) = 0\)). This gives:

\[p^k = -(B^k)^{-1} \nabla f(x^k)\]

The choice of \(B^k\) determines the algorithm classification:

  • Pure Newton Method, \(B^k = \nabla^2 f(x^k)\)

  • Steepest Descent, \(B^k = \frac{1}{\alpha} I\), where the scalar \(\alpha\) (the step length applied to the negative gradient) is sometimes known as the damping factor

  • Levenberg-Marquardt, \(B^k = \nabla^2 f(x^k) + \delta I\), where the scalar \(\delta\) is chosen to ensure \(B^k\) is positive definite.

  • Broyden Methods, where \(B^{k}\) is approximated using the history of gradient evaluations, i.e., \(\nabla f(x^0), \ldots, \nabla f(x^k)\). We will study the SR1 and BFGS formulas in this family of methods.

This section explores how the choice of \(B^k\) affects the shape of the quadratic model and the resulting step.
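Before plotting anything, here is a minimal numerical sketch (not part of the original notebook) of the common step formula with the three choices of \(B^k\) above; the helper name newton_type_step and the default damping/shift values are illustrative choices.

import numpy as np

def newton_type_step(grad, hess, method="newton", alpha=1.0, delta=1e-2):
    """Illustrative helper: compute p = -inv(B) @ grad for different choices of B."""
    n = grad.shape[0]
    if method == "newton":
        B = hess                              # pure Newton: B = Hessian
    elif method == "steepest":
        B = np.eye(n) / alpha                 # steepest descent: B = (1/alpha) I
    elif method == "lm":
        # Levenberg-Marquardt: shift the Hessian just enough to make it positive definite
        shift = max(0.0, delta - np.min(np.linalg.eigvalsh(hess)))
        B = hess + shift * np.eye(n)
    else:
        raise ValueError(method)
    return -np.linalg.solve(B, grad)

# 1x1 example using the test function at x = -3 (gradient and Hessian as arrays)
x_trial = -3.0
g_trial = np.array([df(x_trial)])
H_trial = np.array([[ddf(x_trial)]])
for m in ["newton", "lm", "steepest"]:
    print(m, newton_type_step(g_trial, H_trial, method=m))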

6.6.2.2. Compute and Plot Steps#

Define a function that:

  • Computes the (i) Newton, (ii) Levenberg-Marquardt, and (iii) steepest descent steps for a given starting point \(x_0\)

  • Plots each step and its quadratic model against \(f(x)\) and \(f'(x)\)

def calc_step(x0, epsLM):
    
    # Evaluate f(x0), f'(x0) and f''(x0)
    f0 = f(x0)
    df0 = df(x0)
    ddf0 = ddf(x0)
    print("x0 = ",x0)
    print("f(x0) =",f0)
    print("f'(x0) =",df0)
    print("f''(x0) =",ddf0)
    
    
    ### Calculate steps
    
    # Newton Step: p = -f'(x0) / f''(x0)
    xN = x0 - df0 / ddf0
    
    print("\n### Newton Step ###")
    print("xN = ",xN)
    print("pN = xN - x0 = ",xN - x0)
    f_xN = f(xN)
    print("f(xN) = ",f_xN)
    print("f(xN) - f(x0) = ",f_xN - f0)
    
    # Levenberg-Marquardt Step
    # Recall the eigenvalue of a 1x1 matrix is just that value, so replacing
    # f''(x0) with max(f''(x0), epsLM) makes the 1x1 "Hessian" positive definite
    dffLM = np.amax([ddf0, epsLM])
    xLM = x0 - df0 / dffLM
    
    print("\n### Levenberg-Marquardt Step ###")
    print("xLM = ",xLM)
    print("pLM = xLM - x0 = ",xN - x0)
    f_xLM = f(xLM)
    print("f(xLM) = ",f_xLM)
    print("f(xLM) - f(x0) = ",f_xLM - f0)
    
    
    # Steepest Descent Step (alpha = 1, i.e., B = I)
    xSD = x0 - df0 / 1
    
    print("\n### Steepest Descent Step ###")
    print("xSD = ",xSD)
    print("pSD = xSD - x0 = ",xSD - x0)
    f_xSD = f(xSD)
    print("f(xSD) = ",f_xSD)
    print("f(xSD) - f(x0) = ",f_xSD - f0)
    
    ### Plot Surrogates on x vs f(x)
    
    ### Plot f(x)
    plt.figure()
    plt.scatter(x0,f0,label="$x_0$",color="black")
    plt.plot(xplt, fplt, label="f(x)",color="purple")
    
    ### Plot approximation for Newton's method
    fN = lambda x : f0 + df0*(x - x0) + 0.5*ddf0*(x-x0)**2
    plt.plot(xplt, fN(xplt),label="Newton",linestyle="--",color="red")
    plt.scatter(xN, f(xN),color="red",marker="x")
    
    
    ### Plot approximation for LM
    fLM = lambda x : f0 + df0*(x-x0) + 0.5*dffLM*(x-x0)**2
    plt.plot(xplt, fLM(xplt),label="LM",linestyle="--",color="blue")
    plt.scatter(xLM, f(xLM),color="blue",marker="x")

    ### Plot approximation for SD
    fSD = lambda x : f0 + df0*(x-x0) + 0.5*(x-x0)**2
    plt.plot(xplt, fSD(xplt),label="Steepest",linestyle="--",color="green")
    plt.scatter(xSD, f(xSD),color="green",marker="x")
    
    #plt.plot([x0, xLM],[f0, f(xLM)],label="LM",color="green",marker="o")
    #plt.plot([x0,xSD],[f0,f(xSD)],label="Steepest",color="blue",marker="s")
    
    plt.xlim((-3.5,4.5))
    plt.ylim((-12.5,22.5))
    plt.xlabel("$x$")
    plt.ylabel("$f(x)$")
    plt.legend()
    plt.title("Function and Surrogates")
    plt.grid()
    plt.show()
    
    ### Plot Surrogates on x vs f'(x)
    plt.figure()
    plt.scatter(x0,df0,label="$x_0$",color="black")
    plt.plot(xplt, dfplt, label="f'(x)",color="purple")
    
    
    ### Plot approximation for Newton's method
    dfN = lambda x : df0 + ddf0*(x-x0)
    plt.plot(xplt, dfN(xplt),label="Newton",linestyle="--",color="red")
    plt.scatter(xN, df(xN),color="red",marker="x")
    
    
    ### Plot approximation for LM
    dfLM = lambda x : df0 + dffLM*(x-x0)
    plt.plot(xplt, dfLM(xplt),label="LM",linestyle="--",color="blue")
    plt.scatter(xLM, df(xLM),color="blue",marker="x")

    ### Plot approximation for SD
    dfSD = lambda x : df0 + (x-x0)
    plt.plot(xplt, dfSD(xplt),label="Steepest",linestyle="--",color="green")
    plt.scatter(xSD, df(xSD),color="green",marker="x")
    
    
    plt.xlim((-3.5,4.5))
    plt.ylim((-50,50))
    plt.xlabel("$x$")
    plt.ylabel("$f'(x)$")
    plt.legend()
    plt.title("First Derivative and Surrogates")
    plt.grid()
    plt.show()

6.6.2.3. Consider \(x_0 = -3\)#

calc_step(-3,1E-2)
x0 =  -3
f(x0) = 15.0
f'(x0) = -51
f''(x0) = 64

### Newton Step ###
xN =  -2.203125
pN = xN - x0 =  0.796875
f(xN) =  -8.660857647657394
f(xN) - f(x0) =  -23.660857647657394

### Levenberg-Marquardt Step ###
xLM =  -2.203125
pLM = xLM - x0 =  0.796875
f(xLM) =  -8.660857647657394
f(xLM) - f(x0) =  -23.660857647657394

### Steepest Descent Step ###
xSD =  48.0
pSD = xSD - x0 =  51.0
f(xSD) =  2534689.5
f(xSD) - f(x0) =  2534674.5
(Figures: "Function and Surrogates" and "First Derivative and Surrogates" for \(x_0 = -3\).)

Discussion

  • In this case, do we expect the Newton and LM steps to be the same? Explain.

  • Why does \(f(x)\) increase with a steepest descent (\(B^k = I\)) step? I thought the main idea was that because \(B^k\) is positive definite the step is in a descent direction!

6.6.2.4. Consider \(x_0 = 0\)#

calc_step(0,1E-2)
x0 =  0
f(x0) = 1.5
f'(x0) = 6
f''(x0) = -8

### Newton Step ###
xN =  0.75
pN = xN - x0 =  0.75
f(xN) =  3.486328125
f(xN) - f(x0) =  1.986328125

### Levenberg-Marquardt Step ###
xLM =  -600.0
pLM = xLM - x0 =  -600.0
f(xLM) =  65014556401.5
f(xLM) - f(x0) =  65014556400.0

### Steepest Descent Step ###
xSD =  -6.0
pSD = xSD - x0 =  -6.0
f(xSD) =  685.5
f(xSD) - f(x0) =  684.0
(Figures: "Function and Surrogates" and "First Derivative and Surrogates" for \(x_0 = 0\).)

Discussion

  • Why does \(f(x)\) increase for all of the steps?

  • Explain Newton-type method in the context of root finding for \(\nabla f(x) = 0\) using the second plot.

6.6.2.5. Descent Properties#

(Book excerpt: descent properties.)
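For reference, the key fact behind this discussion is that any positive definite \(B^k\) yields a descent direction:

\[\nabla f(x^k)^T p^k = -\nabla f(x^k)^T (B^k)^{-1} \nabla f(x^k) < 0\]

whenever \(B^k\) (and hence \((B^k)^{-1}\)) is positive definite and \(\nabla f(x^k) \neq 0\). However, a decrease in \(f\) is only guaranteed for a sufficiently small step along \(p^k\); the full step may still increase \(f\), as the steepest descent example above showed. Line searches and trust regions supply this globalization.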

6.6.4. Trust Regions#

Excerpts from Section 3.5 in Biegler (2010).

6.6.4.1. Main Idea and General Algorithm#

(Book excerpts: trust-region main idea and general algorithm, Biegler (2010), Section 3.5.)
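As a rough, runnable illustration of the general algorithm (a sketch only: the parameter values, the acceptance threshold, and the exact 1D subproblem solve are choices made here, not taken from the excerpt), applied to the test function defined above:

def trust_region_1d(x, radius=1.0, radius_max=4.0, eta=0.1, tol=1e-8, max_iter=50):
    """Illustrative 1D trust-region loop using f, df, ddf defined above."""
    for k in range(max_iter):
        g, B = df(x), ddf(x)
        if abs(g) < tol:
            break
        # Exact solve of  min_p  g*p + 0.5*B*p**2  s.t. |p| <= radius (easy in 1D)
        if B > 0 and abs(g / B) <= radius:
            p = -g / B                          # Newton step lies inside the region
        else:
            p = -np.sign(g) * radius            # otherwise the model minimizer is on the boundary
        pred = -(g*p + 0.5*B*p**2)              # predicted reduction (from the model)
        ared = f(x) - f(x + p)                  # actual reduction
        rho = ared / pred
        if rho < 0.25:
            radius *= 0.25                      # poor model agreement: shrink the region
        elif rho > 0.75 and abs(p) == radius:
            radius = min(2*radius, radius_max)  # good agreement at the boundary: expand
        if rho > eta:
            x = x + p                           # accept the step; otherwise stay put
    return x

print(trust_region_1d(-3.0))
print(trust_region_1d(0.0))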

6.6.4.2. Trust Region Variations#

(Book excerpt: trust-region variations, Biegler (2010), Section 3.5.)

\(p^C\): Cauchy step

\(p^N\): Newton step

6.6.4.2.1. Levenberg-Marquardt#

(Book excerpt: Levenberg-Marquardt trust-region variation, Biegler (2010), Section 3.5.)
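To see the trust-region flavor of the Levenberg-Marquardt choice numerically, here is a small sketch on a made-up 2D quadratic model (the gradient and Hessian values below are arbitrary): as \(\delta\) increases, the step gets shorter and rotates toward the steepest descent direction \(-\nabla f\).

import numpy as np

# Hypothetical gradient and (positive definite) Hessian, chosen only for illustration
g = np.array([2.0, -1.0])
H = np.array([[4.0, 1.0], [1.0, 3.0]])

for delta in [0.0, 1.0, 10.0, 100.0]:
    p = -np.linalg.solve(H + delta*np.eye(2), g)                 # LM step with parameter delta
    cos_sd = -g @ p / (np.linalg.norm(g) * np.linalg.norm(p))    # alignment with -g
    print(f"delta = {delta:6.1f}   ||p|| = {np.linalg.norm(p):.4f}   cos(angle to -grad) = {cos_sd:.4f}")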

6.6.4.2.2. Powell Dogleg#

(Book excerpts: Powell dogleg trust-region variation, Biegler (2010), Section 3.5.)
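Finally, a minimal sketch of the dogleg step itself (assuming the Hessian is positive definite so \(p^N\) is well defined; the 2D gradient and Hessian below are made up for illustration):

import numpy as np

def dogleg_step(g, H, radius):
    """Illustrative Powell dogleg step for a positive definite H."""
    pN = -np.linalg.solve(H, g)                     # full Newton step
    if np.linalg.norm(pN) <= radius:
        return pN                                   # Newton step already fits in the region
    pC = -(g @ g) / (g @ H @ g) * g                 # Cauchy step: model minimizer along -g
    if np.linalg.norm(pC) >= radius:
        return -radius * g / np.linalg.norm(g)      # even the Cauchy step hits the boundary
    # Otherwise follow the segment from pC toward pN until it crosses the boundary:
    # solve ||pC + tau*(pN - pC)||^2 = radius^2 for tau in [0, 1]
    d = pN - pC
    a, b, c = d @ d, 2 * (pC @ d), pC @ pC - radius**2
    tau = (-b + np.sqrt(b**2 - 4*a*c)) / (2*a)
    return pC + tau * d

# Hypothetical 2D example
g = np.array([2.0, -1.0])
H = np.array([[4.0, 1.0], [1.0, 3.0]])
for radius in [0.2, 0.8, 2.0]:
    p = dogleg_step(g, H, radius)
    print(f"radius = {radius:.1f}   p = {p}   ||p|| = {np.linalg.norm(p):.4f}")

For the three radii above, the returned step is (in order) the scaled steepest descent step, a point on the dogleg path, and the full Newton step, exercising each branch of the method.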