This notebook contains material from cbe67701-uncertainty-quantification; content is available on GitHub.

6.4 Adjoint Sensitivity: Notes on Numerical Computation

6.4.1 Notes on Numerical Solution

For applications that can be reduced to the solution of linear equations, the essential computation for the adjoint sensitivity method is the solution of two systems of linear equations

\begin{align*} Ax & = b \\ A^Ty & = c \end{align*}
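Here $x$ is the forward solution and $y$ is the adjoint solution. The adjoint system is useful because, when the quantity of interest is a scalar functional $g = c^Tx$ of the forward solution,

$$g = c^Tx = c^TA^{-1}b = (A^{-T}c)^Tb = y^Tb$$

so $g$ and its sensitivity to perturbations in $b$ can be evaluated from the adjoint solution alone, without re-solving the forward problem for each perturbation.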

The most obvious way to proceed is to solve the two systems of equations independently of each other. The computational burden of computing the adjoint sensitivity is then about the same as that of computing the forward solution.

Alternatively, the solution to these equations can be found by solving the symmetric system

$$\underbrace{\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}}_P \underbrace{\begin{bmatrix} y \\ x \end{bmatrix}}_z = \underbrace{\begin{bmatrix} b \\ c \end{bmatrix}}_f$$
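Note that $P$ is symmetric by construction ($P^T = P$) even when $A$ is not, which is what makes symmetric Krylov solvers such as MINRES applicable to the combined system. It is, however, indefinite: the eigenvalues of $P$ are $\pm\sigma_i$, where $\sigma_i$ are the singular values of $A$.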

Iterative algorithms have been developed that solve both systems simultaneously with potentially less work than solving them individually. Quoting Lu and Darmofal (2003):

  • The primal and dual problems are solved simultaneously with essentially the same computational work as solving only one of the problems with the original QMR algorithm.

The purpose of this notebook is to compare the solution times of several iterative algorithms for sparse systems of linear equations. This naive comparison looks at the relative speedup of solving the combined block matrix equation versus solving the forward and adjoint equations independently.

Note that these computations do not exploit the special structure of the joint forward/adjoint problem as suggested in the following references.

6.4.2 References
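Lu, J., and Darmofal, D. L. (2003). A quasi-minimal residual method for simultaneous primal-dual solutions and superconvergent functional estimates. SIAM Journal on Scientific Computing, 24(5), 1693–1709.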

6.4.3 Comparison of iterative solution algorithms from scipy.sparse.linalg

6.4.3.1 Simple test problems

In [46]:
import numpy as np
import scipy.sparse
import scipy.sparse.linalg
import matplotlib.pyplot as plt

spmatrix = scipy.sparse.csr_matrix    # sparse storage format used throughout

# toy problem
def toy_problem():
    A = np.array([[1, 1, -1, 0], [2, 1, 0, -1], [0, 0, 1, 0 ], [0, 0, 0, 1]])
    b = np.array([0, 0, 80, 100]).reshape((4,1))
    c = np.array([170, 120, -50, -40]).reshape((4,1))
    return spmatrix(A), b, c

# random problem: nonsymmetric sparse random matrix with random right-hand sides
def test_problem(N=25, density=0.2):
    A = scipy.sparse.random(N, N, density=density)
    b = np.random.rand(N, 1)
    c = np.random.rand(N, 1)
    return spmatrix(A), b, c
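Before timing anything, it is worth checking the adjoint identity $c^Tx = y^Tb$ on the toy problem. A minimal sanity check using the functions defined above:

In [ ]:
# verify c^T x = y^T b: the forward and adjoint routes to the same functional value
A, b, c = toy_problem()
x = scipy.sparse.linalg.spsolve(A, b.flatten())    # forward solution
y = scipy.sparse.linalg.spsolve(A.T, c.flatten())  # adjoint solution
print(c.flatten() @ x, b.flatten() @ y)            # the two printed values should agree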

6.4.3.2 Create combined sparse block matrix

In [53]:
def combined(A, b, c):
    # assemble the symmetric block system P z = f with P = [[0, A], [A^T, 0]]
    Z = spmatrix(np.zeros(A.shape))
    P = scipy.sparse.vstack(
            (scipy.sparse.hstack((Z, spmatrix(A))),
             scipy.sparse.hstack((spmatrix(A.T), Z))
        )
    )
    P = spmatrix(P)
    xs, ys = A.shape
    #plt.figure(figsize=(xs/40, ys/40))
    #plt.spy(P, ms=1)
    # stack the right-hand sides: f = [b; c]
    f = scipy.sparse.vstack((spmatrix(b), spmatrix(c))).toarray()
    return P, f
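As a quick sanity check (a sketch using the toy problem above), the solution of the combined system should stack the adjoint solution $y$ on top of the forward solution $x$:

In [ ]:
# solve the combined system and verify both residuals
A, b, c = toy_problem()
P, f = combined(A, b, c)
z = scipy.sparse.linalg.spsolve(P, f.flatten())
n = A.shape[0]
y, x = z[:n], z[n:]                        # z = [y; x] per the block equation
print(np.allclose(A @ x, b.flatten()))     # forward: A x = b
print(np.allclose(A.T @ y, c.flatten()))   # adjoint: A^T y = c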

6.4.3.3 Solver comparisons

In [73]:
import time
import warnings
import gc
warnings.filterwarnings("ignore", category=DeprecationWarning) 

solvers = [
           scipy.sparse.linalg.spsolve,
           scipy.sparse.linalg.cg,
           #scipy.sparse.linalg.gmres,
           #scipy.sparse.linalg.lgmres,
           scipy.sparse.linalg.minres,
           scipy.sparse.linalg.qmr,
]

def solve_time(A, b, solver, repeats=10):
    # average wall-clock time per solve, with garbage collection disabled
    gc.disable()
    tic = time.perf_counter()
    for k in range(repeats):
        solver(A, b)
    toc = time.perf_counter()
    gc.enable()
    return (toc - tic)/repeats

def compare_solvers(A, b, c, solvers=solvers, repeats=10):
    print(f"Matrix size = {A.shape}. Average of {repeats} runs.")
    P, f = combined(A, b, c)
    for solver in solvers:
        print(f"\nSolver: {solver.__name__:15s}")
        tf = solve_time(A, b, solver, repeats)
        print(f"\tforward solution:           {1000*tf:8.2f} ms")
        ta = solve_time(A.T, c, solver, repeats)
        print(f"\tadjoint solution:           {1000*ta:8.2f} ms")
        print(f"\tsum of forward and adjoint: {1000*(tf + ta):8.2f} ms")
        tc = solve_time(P, f, solver, repeats)
        print(f"\tcombined solution:          {1000*tc:8.2f} ms")
        print(f"\tspeedup:                    {(tf + ta)/tc:8.2f}x")
    
compare_solvers(*test_problem(500, density=0.05))
Matrix size = (500, 500). Average of 10 runs.

Solver: spsolve        
	forward solution:              37.20 ms
	adjoint solution:              31.19 ms
	sum of forward and adjoint:    68.39 ms
	combined solution:             85.01 ms
	speedup:                        0.80x

Solver: cg             
	forward solution:             643.98 ms
	adjoint solution:             546.12 ms
	sum of forward and adjoint:  1190.09 ms
	combined solution:            288.91 ms
	speedup:                        4.12x

Solver: minres         
	forward solution:              94.84 ms
	adjoint solution:              94.48 ms
	sum of forward and adjoint:   189.32 ms
	combined solution:            422.99 ms
	speedup:                        0.45x

Solver: qmr            
	forward solution:             974.82 ms
	adjoint solution:             967.60 ms
	sum of forward and adjoint:  1942.43 ms
	combined solution:            577.50 ms
	speedup:                        3.36x

6.4.3.4 Observations

A speedup of more than 2x means the combined solve takes less than half the total time of the two separate solves; since the forward and adjoint times are comparable here, the combined problem is actually solved faster than either individual problem. There are several possible explanations, explored in the iteration-count sketch below:

  • The symmetry of the combined system may suit these Krylov solvers better than the nonsymmetric individual systems.
  • The iterations may converge faster on the combined system.

A more careful analysis is required, accounting for factors such as conditioning, the starting guess, and solution error tolerances. Note also that cg and minres assume a symmetric matrix (and, for cg, a positive definite one), an assumption the combined system satisfies but the nonsymmetric random test matrix does not, so their timings on the separate solves should be treated with caution.
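One way to separate convergence effects from per-iteration cost is to count iterations directly. The following sketch uses the callback argument accepted by the scipy.sparse.linalg iterative solvers (qmr is used here as an example):

In [ ]:
# count Krylov iterations to test the convergence hypotheses above
def count_iterations(solver, A, b):
    count = 0
    def callback(xk):
        nonlocal count
        count += 1
    solver(A, b, callback=callback)
    return count

A, b, c = test_problem(500, density=0.05)
P, f = combined(A, b, c)
n_fwd = count_iterations(scipy.sparse.linalg.qmr, A, b.flatten())
n_adj = count_iterations(scipy.sparse.linalg.qmr, A.T, c.flatten())
n_cmb = count_iterations(scipy.sparse.linalg.qmr, P, f.flatten())
print(f"forward + adjoint iterations: {n_fwd + n_adj}, combined: {n_cmb}")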

6.4.4 PDE-constrained test problem

(Under construction)

\begin{align*} \text{PDE:}\qquad & 0 = \kappa \frac{\partial^2 u}{\partial z^2} + f & 0 < z < L \\ \text{BC1:}\qquad & u(0) = u_0 \\ \text{BC2:}\qquad & u(L) = u_L \\ \end{align*}

with $\kappa = 10$, $L = 10$, boundary values $u_0 = 1$ and $u_L = 0$, and constant source $f(z) = 1$, as implemented below.
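The interior grid points use a standard second-order central difference, so the linear system $Au = b$ assembled below encodes

$$\frac{\kappa}{\Delta z^2}\left(2u_i - u_{i-1} - u_{i+1}\right) = f(z_i), \qquad i = 1, \dots, N_z - 1$$

with the first and last rows replaced by the boundary conditions $u_0 = 1$ and $u_{N_z} = 0$.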

In [32]:
%matplotlib inline

import numpy as np
import scipy.sparse
import scipy.sparse.linalg
import matplotlib.pyplot as plt

def heat_problem():

    # model parameters
    kappa = 10.0    # conductivity
    L = 10.0        # domain length
    Nz = 100        # number of grid intervals

    # uniform grid
    z = np.linspace(0, L, Nz+1)
    dz = z[1] - z[0]

    # constant source term
    def f(z):
        return 1.0

    # diagonals of the central difference approximation to -kappa u''
    main = np.zeros(Nz+1)
    upper = np.zeros(Nz)
    lower = np.zeros(Nz)

    main[1:Nz] = 2*kappa/dz/dz
    upper[1:] = -kappa/dz/dz
    lower[:-1] = -kappa/dz/dz

    # Dirichlet boundary conditions: rows 0 and Nz reduce to u = b
    main[0] = 1.0
    main[-1] = 1.0

    # right-hand side: source in the interior, boundary values at the ends
    b = np.array([f(zk) for zk in z])
    b[0] = 1.0      # u(0) = u_0 = 1
    b[-1] = 0.0     # u(L) = u_L = 0

    # adjoint right-hand side: all ones, so c^T u sums the solution values
    c = np.ones(Nz+1)

    A = scipy.sparse.diags(
        diagonals=[main, lower, upper],
        offsets=[0, -1, 1], shape=(Nz+1, Nz+1), format="csr")

    # solve and plot the forward solution
    u = scipy.sparse.linalg.spsolve(A, b)
    plt.plot(z, u)

    return A, b, c
In [8]:
A, b, c = heat_problem()
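With these pieces in place, a natural next step (sketched here, since this section is still under construction) is to rerun the solver comparison on the heat conduction problem. Note that combined() stacks its right-hand sides as column vectors, so the 1-D arrays returned by heat_problem() are reshaped first:

In [ ]:
# reuse the solver comparison on the PDE test problem
compare_solvers(A, b.reshape(-1, 1), c.reshape(-1, 1))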