Problem Set 6#
CBE 60258, University of Notre Dame. © Prof. Alexander Dowling, 2023
You may not distribution homework solutions without written permissions from Prof. Alexander Dowling.
Your Name and Email:
Motivation: Nonlinear Regression for Adsorptive Nanoporous Membranes#
Contributors: Elvis A. Eugene and Alexander W. Dowling
Prof. William Phillip has been developing novel adsorptive nanoporous membranes to remove heavy metals from water. He has asked for your help analyzing data to design a water treatment system.
Assume the amount of contaminant (e.g., Pb) that adsorbs to the water treatment membranes can be modeled with the Langmuir isotherm:
where \(c\) is the concentration of contaminant in the bulk fluid (water), \(q\) is the loading of contaminant on the sorbent (membrane), \(K\) is the binding affinity and \(Q\) is the capacity.
Prof. Phillip’s lab has provided us with the following data:
Contaminant Concentration in Bulk (mM) |
Contaminant Loading on Sorbent (mmol/g) |
---|---|
1 |
0.5 |
2.5 |
0.9 |
5 |
1 |
10 |
1.33 |
20 |
1.3 |
40 |
1.4 |
In this case study, you will compare three different regression strategies:
Linear Regression after Transformation
Linear Regression after a Different Transformation
Nonlinear Regression
For each strategy, we will follow the same basic steps:
Set up the problem, including deriving the transformation (if applicable)
Perform the regression calculation
Plot the experimental data (with transformed variables if applicable)
Plot the experimental data versus fitted model (\(q\) vs \(c\))
Graphically inspect the residuals
Compute the covariance matrix for the fitted parameters, transform to the original parameters (if applicable)
# load libraries
import numpy as np
from scipy import optimize, stats, integrate
import matplotlib.pyplot as plt
import pandas as pd
# define the data
c = np.array([1, 2.5, 5, 10, 20, 40]);
q = np.array([0.5, 0.9, 1.0, 1.33, 1.3, 1.4]);
# number of observations
n = len(c);
# number of fitted parameters
p = 2;
Getting Started#
Determine the units of \(K\) and \(Q\). Hint: \(1\) in the denominator of the Langmuir isotherm is dimensionless.
Units for \(K\):
Units for \(Q\):
Before we run any regression analysis, let’s first plot this data. This is a best practice for data science – first visualize.
Paramter estimation using a transformation and linear regression#
Transformation#
We will start our analysis by performing a transformation; this is neccessary to apply linear regression. We start with the Languir isotherm:
With a little bit of algebra, we obtain:
Write down the mathematical steps to go from the original isotherm \(q=~...\) to the transformed isotherm \(c/q =~...~\) on paper. Turn this in on Gradescope.
Linear Regression#
You may be asking yourselves, how is that model linear?!? We just need to define our variables and parameters carefully:
Recall \(x\) is the independent variable, \(y\) is the observed variable, \(\beta_0\) is the intercept, and \(\beta_1\) is the slope. With this transformation, we can compute the regression coefficients:
# Code for parameter estimation using linear regression
x = c;
y = c/q;
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print("slope =",slope)
print("intercept =",intercept)
print("R-squared:",r_value**2)
Alright, we now have estimates for \(\beta_0\) and \(\beta_1\). But we really care about \(K\) and \(Q\). Let’s reverse the transformation.
## Reverse transformation to obtain Q and K and display results
Qlin1 = 1/slope;
Klin1 = slope/intercept;
print("\nK (linear regression) = {0:0.1f} l/mmol".format(Klin1));
print("Q (linear regression) = {0:0.1f} mmol/g_membrane".format(Qlin1));
Plot Fit and Residuals for Transformed Model#
Now we want to visualize the quality of the fit. Because we plan to repeat the regression analysis two more times, we invested in writing functions for the visualization. This means we just need to declare the plotting procedures once and we can reuse them. This is a good programming practice; use functions to maximize code reuse.
# define a function to plot linearized isotherm
def plot_linearized_isotherm(x,y,slope,intercept,title):
'''
function to plot the linearized model and residuals
Arguments:
x: transformed independent variable (float vector)
y: transformed independent variable (float vector)
slope: fitted parameter 1 (float)
intercept: fitted parameter 2 (float)
title: keyword in plot title (string). Either 'lin1' or 'lin2'
Returns:
nothing
'''
assert (title == 'lin1') or (title == 'lin2'), "Argument title must be 'lin1' or 'lin2'"
## evaluate the best fit line
# determine the max value of x
xmax = np.amax(x);
# define range of x to evaluate the model
x_span = np.linspace(0,1.2*xmax,50);
# evaluate the model
y_hat = slope*x_span + intercept
## plot the best fit line
plt.figure()
plt.plot(x,y,'ro',label = 'data');
plt.plot(x_span, y_hat, 'k', label='best fit')
if (title == 'lin1'):
plt.xlabel("c (mM)")
plt.ylabel("c/q (grams membrane / L)")
elif (title == 'lin2'):
plt.xlabel("1/c (l/mmol)")
plt.ylabel("1/q (grams membrane / mmol)")
plt.grid(True)
plt.title('Transformed Linear Langmuir Isotherm')
plt.legend()
plt.show()
## calculate the residuals
r = y - (x*slope + intercept);
## plot the residuals versus concentration
plt.figure()
plt.plot(c,r,'ro')
if (title == 'lin1'):
plt.ylabel('Residual (grams membrane / L)')
elif (title == 'lin2'):
plt.ylabel('Residual (grams membrane / mmol)')
plt.xlabel('Equilibrium concentration (mM)')
plt.grid(True)
plt.title('Transformed Residuals')
plt.show()
# End: define a function to linearized isotherm
# Plot linearized isotherm versus data
# The last argument toggles between the two transformations we'll consider
plot_linearized_isotherm(x,y,slope,intercept,'lin1')
Write a few bullet points to interpret the scatter plot:
Fill in
Fill in
Write a few bullet points interpreting the residual plot:
Fill in
Fill in
# Ignore this code block
# Add your solution here
Plot Fit and Residuals for Original Model#
Next, we will define the second function which plots the fitted Langmuir isotherm without transformation and plots the non-transformed residuals. Notice the function can be directly reused for each regression strategy.
# define a function to plot Langmuir isotherm
def plot_original_isotherm(c,q,K,Q,title):
'''
function to plot fitted Langmuir isotherm and data
Arguments:
c: independent variable i.e. concentration - experimental data [mM] (float vector)
q: membrane loading - experimental data [l/gram membrane] (float vector)
K: fitted parameter [1/mmol] (float scalar)
Q: fitted parameter [mol/gram membrane] (float scalar)
title: name of regression to add to title (string)
Returns:
nothing
'''
## evaluate the model
# determine maximum value of c
cmax = np.amax(c);
# define range of x to evaluate the model
c_span = np.linspace(0,1.2*cmax,50);
# evaluate the model
q_hat = Q*K*c_span / (1 + K*c_span)
# plot the best fit model
plt.figure()
plt.title('Langmuir Isotherm')
plt.plot(c_span,q_hat,'b-',label="Fitted Model")
plt.plot(c,q,'r.',label="Experimental Data")
plt.legend()
plt.grid(True)
plt.xlabel("Concentration (mM)")
plt.ylabel("Loading (mmol / grams membrane)")
plt.show()
## calculate the residuals
r = q - Q*K*c/(1+K*c)
## plot the residuals
plt.figure()
plt.plot(c,r,'ro')
plt.ylabel('Residual (mmol / grams membrane)')
plt.xlabel('Equilibrium concentration (mM)')
plt.grid(True)
plt.title('Residuals from {0} regression'.format(title))
plt.show()
# End: define a function to plot Langmuir isotherm
# Plot Langmuir isotherm
plot_original_isotherm(c,q,Klin1,Qlin1,'first transformed')
Write a few bullet points to interpret the scatter plot:
Fill in
Fill in
Write a few bullet points to interpret the residual plot:
Fill in
Fill in
# Ignore this code block
# Add your solution here
Compute Covariance Matricies#
Next, we will compute the covariance matrix for the fitted parameters \(\beta_0\) and \(\beta_1\).
We will start by computing the variance of the residuals. Recall the formula is:
Here \(n\) is the number of observations, \(p\) is the number of fitted parameters, and \(r_i\) is the residual for observation \(i\). An interesting property of linear regression is that the mean of the residuals is always zero.
We can write the above formula with linear algebra in one line of code:
# compute the residuals (using the transformed variables)
r = y - (x*slope + intercept);
# variance of residuals
var_r = r @ r / (n-p)
print("Variance of residuals =",var_r," (g / L)^2")
Recall from the previous class, stats.linregress
does not directly compute the covariance of the linear regression parameters. Instead, we need to write the regression problem in matrix notation:
Observations: \(\vec{y} = [y_1, y_2, ..., y_n]^T\)
Fitted Parameters: \(\vec{\beta} = [\beta_0, \beta_1, ..., \beta_{m}]^T\)
Data / Feature Matrix:
For our transformed membrane, the feature matrix is:
# feature matrix, a.k.a. data matrix, a.k.a. predictor matrix
X = np.ones((n,2))
X[:,1] = c
print("X=\n",X)
Next, we will compute the covariance matrix of the fitted parameters:
# assemble covariance matrix for regression parameters
cov_b = var_r * np.linalg.inv(X.transpose() @ X)
Finally, we will propagate the covariance matrix of the transformed fitted parameters, \(\mathbf{\Sigma_{\vec{\beta}}}\), to determine the covariance matrix of the desired isotherm parameters, \(\mathbf{\Sigma_{\vec{\theta}}}\).
Recall of model transformation:
We will define the vector of model parameters as \(\vec{\theta} = [K, Q]^T\).
In anticipation of applying the general nonlinear error propagation formula, we will calculate the following partial derivatives:
We can then assemble these gradients into the Jacobian matrix:
# Assemble gradient vectors for K and Q
gradK = np.array([-slope / intercept**2, 1/intercept])
gradQ = np.array([0, -1/slope**2])
print("\nGradient of K:",gradK)
print("Gradient of Q:",gradQ)
# Assemble gradient vector
jac = np.stack((gradK, gradQ));
print("\nJacobian Matrix:\n",jac)
For this problem, calculating the partial derivates was very straightforward; but for more complex expression, we can use a finite difference approximation. We’ll see this later in the case study.
We can now apply the general nonlinear error propagation formula:
# Apply nonlinear error propagation formula
cov_theta_lin1 = jac @ cov_b @ jac.transpose()
print("\nCovariance of Original Model Parameters (K,Q):\n",cov_theta_lin1)
Paramter estimation using an alternate transformation and linear regression#
We start with the Languir isotherm:
With a little bit of algebra, we obtain:
Model Transformation#
Identify \(x\), \(y\), \(\beta_0\), and \(\beta_1\) in the alternate transformed model. Show your work on paper and turn in via Gradescope.
Next, determine the new formulas to compute \(K\) and \(Q\) from the new fitted parameters \(\beta_0\) and \(\beta_1\). Show your work on paper and turn in via Gradescope.
Also, typeset the final answer below:
Trust us, writing this down in the notebook will help.
Linear Regression#
Use stats.linregress
to compute the best fit line. To maximize code reuse, save the results in variables slope
and intercept
.
# Add your solution here
Compute \(K\) and \(Q\) from new model. Store your answers in the variances Klin2
and Qlin2
.