This notebook contains material from cbe67701-uncertainty-quantification; content is available on Github.
Created by Nilay Kumar
The following resources were used for preparing this notebook:
1. Definition of Bayesian Linear Regression Models
Given a dataset $D$,
$D = \{(x_{1},y_{1}), (x_{2},y_{2}), \ldots, (x_{n},y_{n})\}, \; x_{i} \in R^{d}, \; y_{i} \in R$
One can describe a Bayesian linear regression model on this dataset as
$Y_{i} = w^{T} x_{i} + \epsilon _{i}$
where $w$ has a Gaussian prior, $w \sim N(0, \nu I)$, and $\epsilon _{i}$ is Gaussian noise with mean $0$ and variance $\sigma^2$.
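As a quick illustration, the generative model above can be simulated in a few lines (a minimal sketch that is not part of the original notebook; the values of n, d, nu and sigma below are arbitrary):
import numpy as np
# Minimal sketch of the Bayesian linear regression generative model (illustrative sizes and variances)
rng = np.random.default_rng(0)
n, d = 50, 3           # number of observations and input dimension (assumed for illustration)
nu, sigma = 1.0, 0.1   # prior variance of w and noise standard deviation (assumed for illustration)
X = rng.normal(size=(n, d))                # inputs x_i
w = rng.normal(scale=np.sqrt(nu), size=d)  # w ~ N(0, nu * I)
eps = rng.normal(scale=sigma, size=n)      # eps_i ~ N(0, sigma^2)
y = X @ w + eps                            # y_i = w^T x_i + eps_i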
2. $z = x^{T} w$ is a Gaussian Process!
$ \begin{pmatrix} z_{x_{1}} \\ z_{x_{2}} \\ \vdots \\ z_{x_{n}} \end{pmatrix} = $ $ \begin{pmatrix} x_{1}^T w \\ x_{2}^Tw \\ \vdots \\ x_{n}^Tw \end{pmatrix} = $$ \begin{pmatrix} \cdots & x_{1}^T & \cdots \\ \cdots & x_{2}^T & \cdots \\ \vdots \\ \cdots & x_{n}^T & \cdots \end{pmatrix} w $ $= Aw$
Here $A$ is the design matrix of the linear regression. Since $w$, the prior of the Bayesian regression model, is normally distributed, the affine property of the multivariate Gaussian implies that $Aw$ is also Gaussian.
Mean and Covariance of $z = x^{T} w$
$\mu (x) = E(z_{x}) = E(x^{T} w) = x^{T} E(w) = 0$
$K(x,x') = \mathrm{cov}(z_{x},z_{x'}) = E(z_{x}z_{x'}) - E(z_{x})E(z_{x'}) = x^{T}E(ww^{T})x' = \nu\, x^{T}x'$
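This covariance formula can be checked numerically; the sketch below (not part of the original notebook) draws many samples of $w$ from its prior and compares the empirical covariance of $z_{x}$ and $z_{x'}$ with $\nu x^{T}x'$ (the dimension, variance and test points are arbitrary):
import numpy as np
rng = np.random.default_rng(1)
d, nu = 3, 2.0                                        # input dimension and prior variance (illustrative)
x1, x2 = rng.normal(size=d), rng.normal(size=d)       # two arbitrary input points
W = rng.normal(scale=np.sqrt(nu), size=(100000, d))   # samples of w ~ N(0, nu * I)
z1, z2 = W @ x1, W @ x2                               # corresponding samples of z_x and z_x'
print(np.cov(z1, z2)[0, 1])                           # empirical covariance
print(nu * (x1 @ x2))                                 # analytical covariance  nu * x^T x'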
3. Estimating the conditional distribution for inference using a Gaussian Process model
A Gaussian process model can be defined as
$Y_{i} = Z_{x_{i}} + \epsilon _{i}$ , where
$Z \sim N(\mu, K)$, $\epsilon \sim N(0, \sigma ^2 I)$
Hence, $Y \sim N(\mu, K + \sigma ^2 I)$
Let $a = (1,\cdots, l)$ and $b = (l+1,\cdots, n)$ be the indices of the variables in the training and test datasets, respectively.
$ Y = \begin{pmatrix} Y_{a} \\ Y_{b} \\ \end{pmatrix} , $ $ Y_{a} = \begin{pmatrix} Y_{1} \\ \vdots \\ Y_{l} \\ \end{pmatrix} , $ $ Y_{b} = \begin{pmatrix} Y_{l+1} \\ \vdots \\ Y_{n} \\ \end{pmatrix} $
$ C = \begin{pmatrix} C_{aa} & C_{ab} \\ C_{ba} & C_{bb}\\ \end{pmatrix} , $ $C_{ab} = K_{ab}$, $C_{aa} = K_{aa} + \sigma ^2 I$
Here $K$ is the kernel function used to define the covariance matrix. With a little effort, it can be shown that the conditional distribution $P(Y_{a} | Y_{b} = y_{b})$ is also normally distributed:
$P(Y_{a} | Y_{b} = y_{b}) = N(m,D)$, where
$m = \mu _{a} + C_{ab}C_{bb}^{-1}(y_{b} - \mu _{b})$
$D = C_{aa} - C_{ab}C_{bb}^{-1}C_{ba} $
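These two formulas translate directly into numpy; the helper below (a sketch, with the hypothetical name conditional_gaussian) returns the conditional mean and covariance given the blocks of the joint covariance matrix:
import numpy as np
def conditional_gaussian(mu_a, mu_b, C_aa, C_ab, C_bb, y_b):
    """Mean m and covariance D of Y_a | Y_b = y_b for a jointly Gaussian (Y_a, Y_b)."""
    # Use a linear solve instead of forming C_bb^{-1} explicitly
    m = mu_a + C_ab @ np.linalg.solve(C_bb, y_b - mu_b)
    D = C_aa - C_ab @ np.linalg.solve(C_bb, C_ab.T)  # Schur complement (note the minus sign)
    return m, D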
4. Summary of Gaussian Process Regression
$Input: (x_{1},\cdots,x_{l},x_{l+1},\cdots,x_{n})$
$Response:(y_{1},\cdots,y_{l},y_{l+1},\cdots,y_{n})$
$GP: \hat{Z} = (Z_{x_{1}}, \cdots, Z_{x_{n}})$
Model
$Y_{i} = Z_{x_{i}} + \epsilon _{i}$ , where
$Z_{x} \sim GP(\mu, K)$, $\epsilon \sim N(0, \sigma ^2 I)$
Inference
$P(y_{a}|y_{b}) = N(y_{a}|m,D)$
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
The training data are generated from the function $y = \sin(4\pi x) + \sin(7\pi x)$ with additive Gaussian noise.
"""
This section of code generates training data for fitting the Gaussian Process Regression model
Variables declared:
x_end: The length till which x's need to be sampled
num_train: It denotes the number of training points that needs to be extracted within the define interval
sigma_noise: It is the standard deviation of the normal distribution from which random numbers are sampled
"""
# sampling points along x
x_end = 2
num_train = 200
x = np.linspace(start=0, stop=x_end, num=num_train)
# Defining function f(x) = sin(4*pi*x) + sin(7*pi*x)
def f(x):
    return np.sin((4*np.pi)*x) + np.sin((7*np.pi)*x)
# Storing the function evaluations at the sampled x
f_x = f(x)
# Adding noise to the functional evaluations
sigma_noise = 0.4
error_train = np.random.normal(loc=0, scale=sigma_noise, size=num_train)
y_train = f_x + error_train
fig, ax = plt.subplots()
# Plot data points with added noise
sns.scatterplot(x=x, y=y_train, label='training data', ax=ax);
# Plot function.
sns.lineplot(x=x, y=f_x, color='red', label='f(x)', ax=ax);
ax.set(title='Training data for GPR model')
ax.legend(loc='upper right');
ax.set(xlabel='x', ylabel='y')
plt.show()
# Test points extend beyond the training interval [0, x_end], so part of the prediction is extrapolation
num_test = num_train + 300
x_test = np.linspace(start=0, stop=(x_end + 1), num=num_test)
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF
"""
This section of the code rfeshapes the training data so as to be used by scikit-learns's
Gaussion Process Regressor object
Variables:
d: dimensionality of the problem (1 in this case)
"""
d = 1
# Reshaping training dataset
X_train = x.reshape(num_train, d)
# Reshaping test dataset
X_test = x_test.reshape(num_test,d)
For this example problem we choose a squared exponential kernel function defined by
$K(x_{p},x_{q}) = \sigma _{f}\exp\left(-\frac{1}{2l^{2}} \lVert x_{p} - x_{q} \rVert^2\right)$
where $\sigma _{f}$ and $l$ are the hyperparameters of the kernel function.
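A plain numpy version of this kernel might look as follows (a sketch only; in the code below the same role is played by scikit-learn's ConstantKernel * RBF). It expects X_p and X_q as 2-D arrays of shape (n, d):
import numpy as np
def squared_exponential(X_p, X_q, sigma_f=1.0, l=1.0):
    """K(x_p, x_q) = sigma_f * exp(-||x_p - x_q||^2 / (2 l^2)) for all pairs of rows."""
    # Pairwise squared Euclidean distances between rows of X_p and rows of X_q
    sq_dist = np.sum(X_p**2, axis=1)[:, None] + np.sum(X_q**2, axis=1)[None, :] - 2 * (X_p @ X_q.T)
    return sigma_f * np.exp(-0.5 * sq_dist / l**2)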
"""
The part of code is used to describe the type of kernel used and the gaussian process regression model
1. Definition of kernel:
kernel = ConstantKernel(constant_value=sigma_f, constant_value_bounds=(1e-2, 1e2)) \
* RBF(length_scale=l, length_scale_bounds=(1e-2, 1e2))
a. RBF kenel has been used for this problem.
b. Hyperparameter definitions:
(i) sigma_f: defines the amplitude of the kernel function
(ii) l: the locality parameter; used to define how far each point is able to interacts
c. The hyperparameter are best chosen so as to minimise the marginal log likelihood
2. Definition of gp model
gp = GaussianProcessRegressor(kernel=kernel, alpha=sigma_n**2, n_restarts_optimizer=10, )
a. Parameter definitions
(i) sigma_n: It is the value added to the diagonal elements of defined kernel matrix. Larger values mean a
larger value of noise in the data
(ii) The number of restarts of the optimizer for finding the kernel’s parameters which maximize the
log-marginal likelihood
3. Output of the code is the gp model and the predictions on the test data
"""
# Initial values of l, sigma_f and sigma_n need to be provided.
# The other inputs are the training and test datasets.
def gpPrediction(l, sigma_f, sigma_n, X_train, y_train, X_test):
    # Kernel definition
    kernel = ConstantKernel(constant_value=sigma_f, constant_value_bounds=(1e-2, 1e2)) \
        * RBF(length_scale=l, length_scale_bounds=(1e-2, 1e2))
    # GP model
    gp = GaussianProcessRegressor(kernel=kernel, alpha=sigma_n**2, n_restarts_optimizer=10)
    # Fit the gp model
    gp.fit(X_train, y_train)
    # Make predictions on the test set
    y_pred = gp.predict(X_test)
    return y_pred, gp
"""
l_init and sigma_f init are the initial values of the hyperparameters l and sigma_f of the kernel function
"""
l_init = 1
sigma_f_init = 3
sigma_n = 1
y_pred, gp = gpPrediction( l_init, sigma_f_init, sigma_n , X_train, y_train, X_test)
# Generate samples from posterior distribution.
y_hat_samples = gp.sample_y(X_test, n_samples=num_test)
# Compute the mean of the sample.
y_hat = np.apply_over_axes(func=np.mean, a=y_hat_samples, axes=1).squeeze()
# Compute the standard deviation of the sample.
y_hat_sd = np.apply_over_axes(func=np.std, a=y_hat_samples, axes=1).squeeze()
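Alternatively, the posterior mean and standard deviation can be obtained directly from the fitted model without drawing samples, using scikit-learn's return_std option (shown here only as a cross-check of the sampled estimates):
# Direct predictive mean and standard deviation from the fitted GP (alternative to sampling)
y_mean, y_std = gp.predict(X_test, return_std=True)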
"""
This portion of the code is used to visualize the preductions made by gp model on the test datase.
Cridible intervals for the predictions has also been plotted and indicated through the tranparent green corridor
"""
fig, ax = plt.subplots(figsize=(15, 8))
# Plotting the training data.
sns.scatterplot(x=x, y=y_train, label='training data', ax=ax);
# Plot the functional evaluation
sns.lineplot(x=x_test, y=f(x_test), color='red', label='f(x)', ax=ax)
# Plot corridor.
ax.fill_between(x=x_test, y1=(y_hat - 2*y_hat_sd), y2=(y_hat + 2*y_hat_sd), color='green',alpha=0.3, label='Credible Interval')
# Plot prediction.
sns.lineplot(x=x_test, y=y_pred, color='green', label='pred')
# Labeling axes
ax.set(title='Gaussian Process Regression')
ax.legend(loc='lower left');
ax.set(xlabel='x', ylabel='')
plt.show()