This notebook contains material from cbe67701-uncertainty-quantification; content is available on Github.
For applications that reduce to the solution of linear equations, the essential computation in the adjoint sensitivity method is to solve two systems of linear equations
\begin{align*}
Ax & = b \\
A^Ty & = c
\end{align*}

The most obvious way to proceed is to solve these two systems of equations independently of each other. The computational burden of computing the adjoint sensitivity is then about the same as computing the forward solution.
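The reason the adjoint solution is useful is a standard identity: if the quantity of interest is a scalar functional $J = c^Tx$ of the forward solution, then

$$J = c^Tx = c^TA^{-1}b = (A^{-T}c)^Tb = y^Tb$$

so the adjoint solution $y$ gives the sensitivity of $J$ to the right-hand side, $\partial J/\partial b = y$, without re-solving the forward problem.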
Alternatively, the solution to these equations can be found by solving the symmetric system
$$\underbrace{\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}}_P \underbrace{\begin{bmatrix} y \\ x \end{bmatrix}}_z = \underbrace{\begin{bmatrix} b \\ c \end{bmatrix}}_f$$

Expanding the block rows recovers the forward equation $Ax = b$ and the adjoint equation $A^Ty = c$, so the single solution vector $z$ contains both $x$ and $y$. Iterative algorithms have been developed that simultaneously solve both systems of equations with potentially less work than solving them individually. Quoting Lu and Darmofal (2003):
- The primal and dual problems are solved simultaneously with essentially the same computational work as solving only one of the problems with the original QMR algorithm.
The purpose of this notebook is to compare the solution times of several iterative algorithms for sparse systems of linear equations. This naive comparison looks at the relative speedup of solving the combined block matrix equation versus solving the forward and adjoint equations independently.
Note that these computations do not exploit the special structure of the joint forward/adjoint problem as suggested in the following references.
Golub, Gene H., Martin Stoll, and Andy Wathen. "Approximation of the scattering amplitude and linear systems." Electron. Trans. Numer. Anal 31.2008 (2008): 178-203. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.439.5358&rep=rep1&type=pdf
Lu, James, and David L. Darmofal. "A quasi-minimal residual method for simultaneous primal-dual solutions and superconvergent functional estimates." SIAM Journal on Scientific Computing 24.5 (2003): 1693-1709. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.9176&rep=rep1&type=pdf
Pierce, Niles A., and Michael B. Giles. "Adjoint recovery of superconvergent functionals from PDE approximations." SIAM Review 42.2 (2000): 247-264. https://epubs.siam.org/doi/pdf/10.1137/S0036144598349423?casa_token=gU8GJSWSZnYAAAAA:jOFJizYd-tUiPVK3jLkOZNRlasVAQ_tasIVaNq8RGOgSJ2MW85CvTnxzW90KnLDin7HW21gnRCQL
Stoll, Martin. Solving linear systems using the adjoint. Diss. Oxford University, 2008. See the motivating examples in Chapter 1, and the extensive discussion in Chapter 5. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.709.8774&rep=rep1&type=pdf
Stoll, Martin, and Andrew J. Wathen. "All-at-once solution of time-dependent PDE-constrained optimization problems." (2010).
import numpy as np
import scipy.sparse
import scipy.sparse.linalg
import matplotlib.pyplot as plt
spmatrix = scipy.sparse.csr_matrix
# toy problem
def toy_problem():
    A = np.array([[1, 1, -1, 0], [2, 1, 0, -1], [0, 0, 1, 0], [0, 0, 0, 1]])
    b = np.array([0, 0, 80, 100]).reshape((4, 1))
    c = np.array([170, 120, -50, -40]).reshape((4, 1))
    return spmatrix(A), b, c
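As a quick sanity check (an illustrative addition, not part of the original notebook), the toy problem can be used to verify the identity $c^Tx = b^Ty$ numerically:

# illustrative check: the forward and adjoint solutions should satisfy c.x == b.y
A, b, c = toy_problem()
x = scipy.sparse.linalg.spsolve(A, b)
y = scipy.sparse.linalg.spsolve(A.T, c)
print(np.dot(c.flatten(), x), np.dot(b.flatten(), y))

The two printed values should agree to within rounding error.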
# random problem
def test_problem(N=25, density=0.2):
    A = scipy.sparse.random(N, N, density=density)
    b = np.random.rand(N, 1)
    c = np.random.rand(N, 1)
    return spmatrix(A), b, c
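Note that a purely random sparse matrix may be singular or poorly conditioned, which complicates timing comparisons for iterative solvers. One possible variant (an illustrative sketch on my part, not used in the comparison below) adds a dominant diagonal to guarantee a nonsingular, better-conditioned test matrix:

# illustrative variant: adding N*I makes the matrix strictly diagonally dominant,
# hence nonsingular, since the random entries lie in [0, 1)
def test_problem_dd(N=25, density=0.2):
    A = scipy.sparse.random(N, N, density=density) + N*scipy.sparse.identity(N)
    b = np.random.rand(N, 1)
    c = np.random.rand(N, 1)
    return spmatrix(A), b, c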
def combined(A, b, c):
    # assemble the symmetric block matrix P = [[0, A], [A.T, 0]] and right-hand side f = [b; c]
    Z = spmatrix(np.zeros(A.shape))
    P = scipy.sparse.vstack(
        (scipy.sparse.hstack((Z, spmatrix(A))),
         scipy.sparse.hstack((spmatrix(A.T), Z))
        )
    )
    P = spmatrix(P)
    xs, ys = A.shape
    #plt.figure(figsize=(xs/40, ys/40))
    #plt.spy(P, ms=1)
    f = scipy.sparse.vstack((spmatrix(b), spmatrix(c))).toarray()
    return P, f
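A quick check (illustrative, not in the original notebook) that the combined system reproduces the separate forward and adjoint solutions for the toy problem:

# illustrative check: the top block of z is the adjoint solution y, the bottom block is x
A, b, c = toy_problem()
P, f = combined(A, b, c)
z = scipy.sparse.linalg.spsolve(P, f)
y, x = z[:A.shape[0]], z[A.shape[0]:]
print(np.allclose(A @ x, b.flatten()), np.allclose(A.T @ y, c.flatten()))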
import time
import warnings
import gc
warnings.filterwarnings("ignore", category=DeprecationWarning)
solvers = [
scipy.sparse.linalg.spsolve,
scipy.sparse.linalg.cg,
#scipy.sparse.linalg.gmres,
#scipy.sparse.linalg.lgmres,
scipy.sparse.linalg.minres,
scipy.sparse.linalg.qmr,
]
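Note that cg assumes a symmetric positive definite matrix and minres assumes a symmetric matrix, so applying them directly to a general nonsymmetric A is not strictly valid; the block matrix P, in contrast, is symmetric (though indefinite), which is part of the motivation for the combined formulation. A quick illustrative check:

# illustrative check: P is symmetric even when A is not
A, b, c = test_problem(50, density=0.2)
P, f = combined(A, b, c)
print("A symmetric:", abs(A - A.T).max() == 0.0)
print("P symmetric:", abs(P - P.T).max() == 0.0)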
def solve_time(A, b, solver, repeats=10):
    # average wall-clock time over repeated solves, with garbage collection disabled
    gc.disable()
    tic = time.time()
    for k in range(repeats):
        solver(A, b)
    toc = time.time()
    gc.enable()
    return (toc - tic)/repeats
def compare_solvers(A, b, c, solvers=solvers, repeats=10):
    print(f"Matrix size = {A.shape}. Average of {repeats} runs.")
    P, f = combined(A, b, c)
    for solver in solvers:
        print(f"\nSolver: {solver.__name__:15s}")
        tf = solve_time(A, b, solver, repeats)
        print(f"\tforward solution: {1000*tf:8.2f} ms")
        ta = solve_time(A.T, c, solver, repeats)
        print(f"\tadjoint solution: {1000*ta:8.2f} ms")
        print(f"\tsum of forward and adjoint: {1000*(tf + ta):8.2f} ms")
        tc = solve_time(P, f, solver, repeats)
        print(f"\tcombined solution: {1000*tc:8.2f} ms")
        print(f"\tspeedup: {(tf + ta)/tc:8.2f}x")
compare_solvers(*test_problem(500, density=0.05))
A speedup of more than 2x suggests that solving the combined problem is actually faster than solving either the forward or the adjoint problem alone. There are a number of possible explanations; a more careful analysis would need to account for factors such as matrix conditioning, the starting guess, and the solution error tolerances.
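One factor that is easy to check (an illustrative sketch, not part of the original comparison): the iterative solvers return an info flag and may stop without converging, in which case the timings above are not directly comparable. A residual check along the following lines would make the comparison more meaningful:

# illustrative convergence and residual check for each solver on the combined system
def check_convergence(A, b, solvers=solvers):
    for solver in solvers:
        result = solver(A, b)
        if isinstance(result, tuple):      # iterative solvers return (x, info)
            x, info = result
        else:                              # spsolve returns only x
            x, info = result, 0
        r = np.linalg.norm(A @ np.asarray(x).flatten() - np.asarray(b).flatten())
        print(f"{solver.__name__:10s} info = {info:3d}  residual = {r:10.3e}")

check_convergence(*combined(*test_problem(500, density=0.05)))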
(Under construction.) A one-dimensional steady-state heat conduction example:
%matplotlib inline
import numpy as np
import scipy.sparse
import scipy.sparse.linalg
import matplotlib.pyplot as plt
def heat_problem():
    # steady-state heat conduction on 0 <= z <= L with Dirichlet boundary conditions
    kappa = 10.0
    L = 10.0
    Nz = 100
    z = np.linspace(0, L, Nz+1)
    dz = z[1] - z[0]

    # source term
    def f(z):
        return 1.0

    # diagonals of the finite difference approximation of -kappa u'' = f
    main = np.zeros(Nz+1)
    upper = np.zeros(Nz)
    lower = np.zeros(Nz)
    main[1:Nz] = 2*kappa/dz/dz
    upper[1:] = -kappa/dz/dz
    lower[:-1] = -kappa/dz/dz

    # boundary conditions u(0) = 1, u(L) = 0
    main[0] = 1.0
    main[-1] = 1.0
    b = np.array([f(z) for z in z])
    b[0] = 1.0
    b[-1] = 0.0

    # weights defining the functional c.u (here the sum of the nodal temperatures)
    c = np.zeros(Nz+1)
    c[:] = 1.0

    A = scipy.sparse.diags(
        diagonals=[main, lower, upper],
        offsets=[0, -1, 1], shape=(Nz+1, Nz+1), format="csr")

    u = scipy.sparse.linalg.spsolve(A, b)
    plt.plot(z, u)

    return A, b, c
A, b, c = heat_problem()
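As a sketch of where this example is headed (my illustration; the notebook is under construction here): with $c$ set to ones, the functional $J = c^Tu$ is the sum of the nodal temperatures, and the adjoint solution gives the same value through $J = b^Tv$ where $A^Tv = c$:

# illustrative: evaluate the functional from the forward and the adjoint solutions
u = scipy.sparse.linalg.spsolve(A, b)
v = scipy.sparse.linalg.spsolve(A.T, c)
print("forward  c.u =", np.dot(c, u))
print("adjoint  b.v =", np.dot(b, v))

The two values should agree to within rounding error.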