This notebook contains material from cbe67701-uncertainty-quantification; content is available on Github.
Created by A. Dowling (adowling@nd.edu)
## import all needed Python libraries here
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.axes3d import Axes3D
import pandas as pd
We will primarily use Python this semester. Here are some getting started tutorials from Prof. Kantor:
These tutorials come from three separate classes; our apologies for any repetition.
Prof. McClarren has provided supplemental examples in both Python and R. Although it is possible to run R on Google Colab, the steps are a little clunky. Where possible, let's find popular Python libraries to replace the R examples.
Markdown cells allow us to write text in notebooks. We can also use LaTeX to typeset equations:

$$A = \pi r^2$$

In a markdown cell, starting a line with...
# creates a title. Please only use this at the top of the notebook.
## creates a section title. We recommend organizing your example into 2 to 5 sections.
### creates a subsection title.
#### creates a subsubsection title.
The package nbpages will automatically add numbers to sections, subsections, etc. when publishing to the HTML website. Please do not start titles with a number. Prof. Dowling will run nbpages.
For example, nbpages will automatically add the following section numbers:
# 1.1
## 1.1.1
### 1.1.1.1
#### 1.1.1.1.1
The extra 1. is because this notebook is for chapter 1, subchapter 1.1.
Recommended tutorial: Markdown in Jupyter Notebook
Please avoid adding HTML code to your markdown cells. HTML code can cause formatting issues with nbpages.
Let's say you want to use a niche Python package. Not a problem. We can add code to automatically detect if a package is available and install it if needed.
## Tip: Please put code like this at the top of your notebook.
# We want all of the module/package installations to start up front
# Packages to interface with your operating system or Colab
import shutil
import sys
import os.path
# Check if Pyomo is available. If not, install it.
if not shutil.which("pyomo"):
    # Tip: "!pip" means run the 'pip' command outside the notebook.
    !pip install -q pyomo
    assert(shutil.which("pyomo"))
else:
    print("Pyomo found! No need to install.")
# Check if Ipopt (solver) is available. If not, install it.
if not (shutil.which("ipopt") or os.path.isfile("ipopt")):
    # Check if we are running on Google Colab
    if "google.colab" in sys.modules:
        !wget -N -q "https://ampl.com/dl/open/ipopt/ipopt-linux64.zip"
        !unzip -o -q ipopt-linux64
    # Otherwise, try conda
    else:
        try:
            !conda install -c conda-forge ipopt
        except:
            pass
else:
    print("Ipopt found! No need to install.")
# Verify Ipopt is now available
assert(shutil.which("ipopt") or os.path.isfile("ipopt"))
# Now import the library Pyomo
from pyomo.environ import *
from pyomo.dae import *
Section 1.4 of McClarren (2018) introduces the advection-diffusion-reaction equation as a model problem:
\begin{equation} \frac{\partial u}{\partial t} + \mathbf{v} \cdot \nabla u = \nabla \cdot \omega \nabla u + R(u), \quad \mathbf{r} \in V, \quad t > 0 \end{equation}

with boundary and initial conditions:
\begin{equation} u(\mathbf{r},t) = g(\mathbf{r},t) \quad \mathbf{r} \in \partial V, \qquad u(\mathbf{r},0) = f(\mathbf{r}) \end{equation}

Here $u(\mathbf{r},t)$ is a physical property such as temperature, concentration, etc., distributed over the spatial domain $V$ and time. $\mathbf{v}$ is the advection velocity, $\omega$ is a diffusion coefficient, and $R(u)$ is a reaction function.
Let's look at heat transfer as a concrete example. In this case $u$ is the temperature $T$ and there is no reaction, thus $R(u) = 0$. The term $\mathbf{v} \cdot \nabla$ is for advection and the term $\nabla \cdot \omega \nabla$ is for diffusion.
The code below was originally developed by Prof. Kantor.
Transport of heat in a solid is described by the familiar thermal diffusion model (no advection)
\begin{align*} \rho C_p\frac{\partial T}{\partial t} & = \nabla\cdot(k\nabla T) \end{align*}

We'll assume the thermal conductivity $k$ is a constant, and define thermal diffusivity in the conventional way
\begin{align*} \alpha & = \frac{k}{\rho C_p} \end{align*}

(Notice $\alpha = \omega$ in the general advection-diffusion-reaction equation.) We will further assume symmetry with respect to all spatial coordinates except $r$, where $r$ extends from $-R$ to $+R$. The boundary conditions are
\begin{align*} T(t,R) & = T_{\infty} & \forall t > 0 \\ \nabla T(t,0) & = 0 & \forall t \geq 0 \end{align*}

where we have assumed symmetry with respect to $r$ and uniform initial conditions $T(0, r) = T_0$ for all $0 \leq r \leq R$. Following standard scaling procedures, we introduce the dimensionless variables
\begin{align*} T' & = \frac{T - T_0}{T_\infty - T_0} \\ r' & = \frac{r}{R} \\ t' & = t \frac{\alpha}{R^2} \end{align*}

Under these conditions the problem reduces to
\begin{align*} \frac{\partial T'}{\partial t'} & = \nabla^2 T' \end{align*}

with auxiliary conditions
\begin{align*} T'(0, r') & = 0 & \forall 0 \leq r' \leq 1\\ T'(t', 1) & = 1 & \forall t' > 0\\ \nabla T'(t', 0) & = 0 & \forall t' \geq 0 \\ \end{align*}

which we can specialize to specific geometries.
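To see why the problem reduces to this dimensionless form, apply the chain rule to the scaled variables: $\frac{\partial T}{\partial t} = (T_\infty - T_0)\frac{\alpha}{R^2}\frac{\partial T'}{\partial t'}$ and $\nabla^2 T = \frac{T_\infty - T_0}{R^2}\nabla'^2 T'$, so the dimensional model becomes

\begin{align*} \rho C_p (T_\infty - T_0)\frac{\alpha}{R^2}\frac{\partial T'}{\partial t'} & = k\,\frac{T_\infty - T_0}{R^2}\nabla'^2 T' \quad \Longrightarrow \quad \frac{\partial T'}{\partial t'} = \frac{k}{\rho C_p\,\alpha}\nabla'^2 T' = \nabla'^2 T' \end{align*}

since $\alpha = k/(\rho C_p)$.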
Suppressing the prime notation, for a slab geometry the model specializes to
\begin{align*} \frac{\partial T}{\partial t} & = \frac{\partial^2 T}{\partial r^2} \end{align*}

with auxiliary conditions
\begin{align*} T(0, r) & = 0 & \forall 0 \leq r \leq 1 \\ T(t, 1) & = 1 & \forall t > 0\\ \frac{\partial T}{\partial r} (t, 0) & = 0 & \forall t \geq 0 \\ \end{align*}

We will not use Pyomo extensively for the book study; when you read the code below, just notice that the equations in the code match the equations above.
# Create a Pyomo model
m = ConcreteModel()
# Define the dimensions "r" (position) and "t" (time)
m.r = ContinuousSet(bounds=(0,1))
m.t = ContinuousSet(bounds=(0,2))
# Create the variable T for temperature
m.T = Var(m.t, m.r)
# Define the derivative variables
m.dTdt = DerivativeVar(m.T, wrt=m.t)
m.dTdr = DerivativeVar(m.T, wrt=m.r)
m.d2Tdr2 = DerivativeVar(m.T, wrt=(m.r, m.r))
# Define the governing partial differential equation
@m.Constraint(m.t, m.r)
def pde(m, t, r):
    # Skip points where the initial or boundary conditions apply instead
    if t == 0:
        return Constraint.Skip
    if r == 0 or r == 1:
        return Constraint.Skip
    return m.dTdt[t,r] == m.d2Tdr2[t,r]
# Define a "dummy" (constant) objective function
m.obj = Objective(expr=1)
# Define the initial condition
m.ic = Constraint(m.r, rule=lambda m, r: m.T[0,r] == 0 if r > 0 and r < 1 else Constraint.Skip)
# Define the first boundary condition
m.bc1 = Constraint(m.t, rule=lambda m, t: m.T[t,1] == 1)
# Define the second boundary condition
m.bc2 = Constraint(m.t, rule=lambda m, t: m.dTdr[t,0] == 0)
# Apply finite difference approximation in the space domain (r)
TransformationFactory('dae.finite_difference').apply_to(m, nfe=50, scheme='FORWARD', wrt=m.r)
# Apply finite difference approximation in the time domain (t)
TransformationFactory('dae.finite_difference').apply_to(m, nfe=50, scheme='FORWARD', wrt=m.t)
# Solve with Ipopt
SolverFactory('ipopt').solve(m, tee=True).write()
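If you prefer to check convergence programmatically instead of reading the Ipopt log, here is a minimal sketch (it re-solves the model m defined above and inspects the results object returned by solve()):
# A minimal sketch: capture the results object from solve() and check
# the termination condition reported by Ipopt
from pyomo.opt import TerminationCondition

results = SolverFactory('ipopt').solve(m)
if results.solver.termination_condition == TerminationCondition.optimal:
    print("Ipopt converged to an optimal (feasible) solution.")
else:
    print("Ipopt termination condition:", results.solver.termination_condition)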
def model_plot(m):
    # Extract the elements of the space and time domains (which were discretized)
    r = sorted(m.r)
    t = sorted(m.t)
    # Define arrays to store the solution in 2D
    rgrid = np.zeros((len(t), len(r)))
    tgrid = np.zeros((len(t), len(r)))
    Tgrid = np.zeros((len(t), len(r)))
    # Extract the solution from the Pyomo model,
    # put into the numpy arrays
    for i in range(0, len(t)):
        for j in range(0, len(r)):
            rgrid[i,j] = r[j]
            tgrid[i,j] = t[i]
            Tgrid[i,j] = m.T[t[i], r[j]].value
    # Create a 3D plot
    fig = plt.figure(figsize=(10,6))
    ax = fig.add_subplot(1, 1, 1, projection='3d')
    ax.set_xlabel('Distance r')
    ax.set_ylabel('Time t')
    ax.set_zlabel('Temperature T')
    p = ax.plot_wireframe(rgrid, tgrid, Tgrid)
# Call the function
model_plot(m)
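As a complement to the wireframe, here is a minimal sketch that plots the temperature profile $T(r)$ at a few selected times, reusing the same discretized model m:
# A minimal sketch: plot T(r) at a few selected times from the
# discretized Pyomo model m
r = sorted(m.r)
t = sorted(m.t)
for tk in [t[1], t[len(t)//10], t[len(t)//4], t[-1]]:
    plt.plot(r, [m.T[tk, rj].value for rj in r], label="t = {0:.2f}".format(tk))
plt.xlabel('Distance r')
plt.ylabel('Temperature T')
plt.legend()
plt.grid(True)
plt.show()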
Inside the folder /notebooks/data/ is the text file Stock_Data.csv. It contains daily closing prices for five market indices over five years:
| Symbol | Name |
| --- | --- |
| GSPC | S&P 500 |
| DJI | Dow Jones Industrial Average |
| IXIC | NASDAQ Composite |
| RUT | Russell 2000 |
| VIX | CBOE Volatility Index |
The following code checks if the data file can be found. If not, it downloads the file from GitHub. We are working on streamlining this process. For simplicity, we recommend directly defining data in Python as lists or arrays for contributed examples, which bypasses the need for this step; a short sketch of that alternative appears after the download code below.
import os, requests, urllib.parse, urllib.request

# GitHub pages url
url = "https://ndcbe.github.io/cbe67701-uncertainty-quantification/"

# relative file paths to download
# this is the only line of code you need to change
file_paths = ['data/Stock_Data.csv']

# loop over all files to download
for file_path in file_paths:
    print("Checking for", file_path)

    # split each file_path into a folder and filename
    stem, filename = os.path.split(file_path)

    # check if the folder name is not empty
    if stem:
        # check if the folder exists
        if not os.path.exists(stem):
            print("\tCreating folder", stem)
            # if the folder does not exist, create it
            os.mkdir(stem)

    # if the file does not exist, create it by downloading from GitHub pages
    if not os.path.isfile(file_path):
        file_url = urllib.parse.urljoin(url, urllib.request.pathname2url(file_path))
        print("\tDownloading", file_url)
        with open(file_path, 'wb') as f:
            f.write(requests.get(file_url).content)
    else:
        print("\tFile found!")
Let's import the data with Pandas:
stock_data = pd.read_csv('./data/Stock_Data.csv')
Now let's look at the first 5 rows of the dataset:
stock_data.head()
We can also calculate summary statistics:
stock_data.describe()
Finally, let's plot the daily closing price for the Dow Jones Industrial Average.
plt.plot(stock_data["DJI"],color="blue")
plt.xlabel("Day")
plt.ylabel("Price [USD]")
plt.grid(True)
plt.show()
Unfortunately, I did not record the calendar date corresponding to the first day in the dataset; otherwise, we could have modified the horizontal axis to show dates.
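For reference, if the start date were known, we could attach calendar dates with pandas. A minimal sketch, assuming a hypothetical start date of 2015-01-02 and business-day spacing:
# A minimal sketch: attach (hypothetical) calendar dates to the series so
# the horizontal axis shows dates instead of day numbers
dates = pd.date_range(start="2015-01-02", periods=len(stock_data), freq="B")
plt.plot(dates, stock_data["DJI"], color="blue")
plt.xlabel("Date")
plt.ylabel("Price [USD]")
plt.grid(True)
plt.show()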