This notebook contains material from cbe67701-uncertainty-quantification; content is available on Github.
For applications that reduce to the solution of linear equations, the essential computation in the adjoint sensitivity method is to solve two systems of linear equations
\begin{align*}
Ax & = b \\
A^Ty & = c
\end{align*}

The most obvious way to proceed is to solve these two systems of equations independently of each other. The computational burden of computing the adjoint sensitivity is then about the same as computing the forward solution.
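The reason the adjoint solution is useful is a standard identity: if the quantity of interest is a scalar functional $J = c^Tx$ of the forward solution, then

$$J = c^Tx = c^TA^{-1}b = (A^{-T}c)^Tb = y^Tb$$

so the adjoint solution $y$ gives the sensitivity of $J$ to the right-hand side, $\partial J/\partial b = y$, without re-solving the forward problem.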
Alternatively, the solution to these equations can be found by solving the symmetric system
$$\underbrace{\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}}_P \underbrace{\begin{bmatrix} y \\ x \end{bmatrix}}_z = \underbrace{\begin{bmatrix} b \\ c \end{bmatrix}}_f$$

Expanding the block rows recovers the forward equation $Ax = b$ and the adjoint equation $A^Ty = c$, so the single solution vector $z$ contains both $x$ and $y$. Iterative algorithms have been developed that simultaneously solve both systems of equations with potentially less work than solving them individually. Quoting Lu and Darmofal (2003):
- The primal and dual problems are solved simultaneously with essentially the same computational work as solving only one of the problems with the original QMR algorithm.
The purpose of this notebook is to compare the solution times of several iterative algorithms for sparse systems of linear equations. This naive comparison looks at the relative speedup of solving the combined block matrix equation versus solving the forward and adjoint equations independently.
Note that these computations do not exploit the special structure of the joint forward/adjoint problem as suggested in the following references.
Golub, Gene H., Martin Stoll, and Andy Wathen. "Approximation of the scattering amplitude and linear systems." Electron. Trans. Numer. Anal 31.2008 (2008): 178-203. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.439.5358&rep=rep1&type=pdf
Lu, James, and David L. Darmofal. "A quasi-minimal residual method for simultaneous primal-dual solutions and superconvergent functional estimates." SIAM Journal on Scientific Computing 24.5 (2003): 1693-1709. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.9176&rep=rep1&type=pdf
Pierce, Niles A., and Michael B. Giles. "Adjoint recovery of superconvergent functionals from PDE approximations." SIAM Review 42.2 (2000): 247-264. https://epubs.siam.org/doi/pdf/10.1137/S0036144598349423?casa_token=gU8GJSWSZnYAAAAA:jOFJizYd-tUiPVK3jLkOZNRlasVAQ_tasIVaNq8RGOgSJ2MW85CvTnxzW90KnLDin7HW21gnRCQL
Stoll, Martin. Solving linear systems using the adjoint. Diss. Oxford University, 2008. See the motivating examples in Chapter 1, and the extensive discussion in Chapter 5. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.709.8774&rep=rep1&type=pdf
Stoll, Martin, and Andrew J. Wathen. "All-at-once solution of time-dependent PDE-constrained optimization problems." (2010).
import numpy as np
import scipy.sparse
import scipy.sparse.linalg
import matplotlib.pyplot as plt
spmatrix = scipy.sparse.csr_matrix
# toy problem
def toy_problem():
    A = np.array([[1, 1, -1, 0], [2, 1, 0, -1], [0, 0, 1, 0], [0, 0, 0, 1]])
    b = np.array([0, 0, 80, 100]).reshape((4, 1))
    c = np.array([170, 120, -50, -40]).reshape((4, 1))
    return spmatrix(A), b, c
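As a quick sanity check (an illustrative addition, not part of the original notebook), the toy problem can be used to verify the identity $c^Tx = b^Ty$ numerically:

# illustrative check: the forward and adjoint solutions should satisfy c.x == b.y
A, b, c = toy_problem()
x = scipy.sparse.linalg.spsolve(A, b)
y = scipy.sparse.linalg.spsolve(A.T, c)
print(np.dot(c.flatten(), x), np.dot(b.flatten(), y))

The two printed values should agree to within rounding error.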
# random problem
def test_problem(N=25, density=0.2):
    A = scipy.sparse.random(N, N, density=density)
    b = np.random.rand(N, 1)
    c = np.random.rand(N, 1)
    return spmatrix(A), b, c
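Note that a purely random sparse matrix may be singular or poorly conditioned, which complicates timing comparisons for iterative solvers. One possible variant (an illustrative sketch on my part, not used in the comparison below) adds a dominant diagonal to guarantee a nonsingular, better-conditioned test matrix:

# illustrative variant: adding N*I makes the matrix strictly diagonally dominant,
# hence nonsingular, since the random entries lie in [0, 1)
def test_problem_dd(N=25, density=0.2):
    A = scipy.sparse.random(N, N, density=density) + N*scipy.sparse.identity(N)
    b = np.random.rand(N, 1)
    c = np.random.rand(N, 1)
    return spmatrix(A), b, c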
def combined(A, b, c):
    # assemble the symmetric block matrix P = [[0, A], [A.T, 0]] and right-hand side f = [b; c]
    Z = spmatrix(np.zeros(A.shape))
    P = scipy.sparse.vstack(
        (scipy.sparse.hstack((Z, spmatrix(A))),
         scipy.sparse.hstack((spmatrix(A.T), Z))
        )
    )
    P = spmatrix(P)
    xs, ys = A.shape
    #plt.figure(figsize=(xs/40, ys/40))
    #plt.spy(P, ms=1)
    f = scipy.sparse.vstack((spmatrix(b), spmatrix(c))).toarray()
    return P, f
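A quick check (illustrative, not in the original notebook) that the combined system reproduces the separate forward and adjoint solutions for the toy problem:

# illustrative check: the top block of z is the adjoint solution y, the bottom block is x
A, b, c = toy_problem()
P, f = combined(A, b, c)
z = scipy.sparse.linalg.spsolve(P, f)
y, x = z[:A.shape[0]], z[A.shape[0]:]
print(np.allclose(A @ x, b.flatten()), np.allclose(A.T @ y, c.flatten()))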
import time
import warnings
import gc
warnings.filterwarnings("ignore", category=DeprecationWarning)
solvers = [
scipy.sparse.linalg.spsolve,
scipy.sparse.linalg.cg,
#scipy.sparse.linalg.gmres,
#scipy.sparse.linalg.lgmres,
scipy.sparse.linalg.minres,
scipy.sparse.linalg.qmr,
]
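Note that cg assumes a symmetric positive definite matrix and minres assumes a symmetric matrix, so applying them directly to a general nonsymmetric A is not strictly valid; the block matrix P, in contrast, is symmetric (though indefinite), which is part of the motivation for the combined formulation. A quick illustrative check:

# illustrative check: P is symmetric even when A is not
A, b, c = test_problem(50, density=0.2)
P, f = combined(A, b, c)
print("A symmetric:", abs(A - A.T).max() == 0.0)
print("P symmetric:", abs(P - P.T).max() == 0.0)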
def solve_time(A, b, solver, repeats=10):
    # average wall-clock time over repeated solves, with garbage collection disabled
    gc.disable()
    tic = time.time()
    for k in range(repeats):
        solver(A, b)
    toc = time.time()
    gc.enable()
    return (toc - tic)/repeats
def compare_solvers(A, b, c, solvers=solvers, repeats=10):
    print(f"Matrix size = {A.shape}. Average of {repeats} runs.")
    P, f = combined(A, b, c)
    for solver in solvers:
        print(f"\nSolver: {solver.__name__:15s}")
        tf = solve_time(A, b, solver, repeats)
        print(f"\tforward solution: {1000*tf:8.2f} ms")
        ta = solve_time(A.T, c, solver, repeats)
        print(f"\tadjoint solution: {1000*ta:8.2f} ms")
        print(f"\tsum of forward and adjoint: {1000*(tf + ta):8.2f} ms")
        tc = solve_time(P, f, solver, repeats)
        print(f"\tcombined solution: {1000*tc:8.2f} ms")
        print(f"\tspeedup: {(tf + ta)/tc:8.2f}x")
compare_solvers(*test_problem(500, density=0.05))
A speedup of more than 2x suggests that solving the combined problem is actually faster than solving either the forward or the adjoint problem alone. There are a number of possible explanations; a more careful analysis would need to account for factors such as matrix conditioning, the starting guess, and the solution error tolerances.
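One factor that is easy to check (an illustrative sketch, not part of the original comparison): the iterative solvers return an info flag and may stop without converging, in which case the timings above are not directly comparable. A residual check along the following lines would make the comparison more meaningful:

# illustrative convergence and residual check for each solver on the combined system
def check_convergence(A, b, solvers=solvers):
    for solver in solvers:
        result = solver(A, b)
        if isinstance(result, tuple):      # iterative solvers return (x, info)
            x, info = result
        else:                              # spsolve returns only x
            x, info = result, 0
        r = np.linalg.norm(A @ np.asarray(x).flatten() - np.asarray(b).flatten())
        print(f"{solver.__name__:10s} info = {info:3d}  residual = {r:10.3e}")

check_convergence(*combined(*test_problem(500, density=0.05)))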
(Under construction.) A one-dimensional steady-state heat conduction example:
%matplotlib inline
import numpy as np
import scipy.sparse
import scipy.sparse.linalg
import matplotlib.pyplot as plt
def heat_problem():
    # steady-state heat conduction on 0 <= z <= L with Dirichlet boundary conditions
    kappa = 10.0
    L = 10.0
    Nz = 100
    z = np.linspace(0, L, Nz+1)
    dz = z[1] - z[0]

    # source term
    def f(z):
        return 1.0

    # diagonals of the finite difference approximation of -kappa u'' = f
    main = np.zeros(Nz+1)
    upper = np.zeros(Nz)
    lower = np.zeros(Nz)
    main[1:Nz] = 2*kappa/dz/dz
    upper[1:] = -kappa/dz/dz
    lower[:-1] = -kappa/dz/dz

    # boundary conditions u(0) = 1, u(L) = 0
    main[0] = 1.0
    main[-1] = 1.0
    b = np.array([f(z) for z in z])
    b[0] = 1.0
    b[-1] = 0.0

    # weights defining the functional c.u (here the sum of the nodal temperatures)
    c = np.zeros(Nz+1)
    c[:] = 1.0

    A = scipy.sparse.diags(
        diagonals=[main, lower, upper],
        offsets=[0, -1, 1], shape=(Nz+1, Nz+1), format="csr")

    u = scipy.sparse.linalg.spsolve(A, b)
    plt.plot(z, u)

    return A, b, c
A, b, c = heat_problem()
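As a sketch of where this example is headed (my illustration; the notebook is under construction here): with $c$ set to ones, the functional $J = c^Tu$ is the sum of the nodal temperatures, and the adjoint solution gives the same value through $J = b^Tv$ where $A^Tv = c$:

# illustrative: evaluate the functional from the forward and the adjoint solutions
u = scipy.sparse.linalg.spsolve(A, b)
v = scipy.sparse.linalg.spsolve(A.T, c)
print("forward  c.u =", np.dot(c, u))
print("adjoint  b.v =", np.dot(b, v))

The two values should agree to within rounding error.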