This notebook contains material from cbe67701-uncertainty-quantification; content is available on GitHub.
Created by Christian Villa Santos (cvillas2@nd.edu)
The objective is to illustrate the multivariate normal distribution example provided in Chapter 2 of the textbook Uncertainty Quantification and Predictive Computational Science: A Foundation for Physical Scientists and Engineers.
As the name suggests, a multivariate normal distribution describes several random variables jointly. A univariate normal distribution, with only one variable X1, is defined by its mean and variance, while a multivariate normal distribution is defined by a vector of means and a covariance matrix, which contains the variances on the principal diagonal and the covariances on the off-diagonal.
First, let's define some terms. The joint cumulative distribution function (joint CDF) is the probability that each random variable is less than or equal to a given number. For $k$ random variables it is given by:

$$F(x_1, \ldots, x_k) = P(X_1 \le x_1, \ldots, X_k \le x_k)$$
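Differences of the joint CDF give the probability that each random variable lies within a range. For two variables with joint CDF $F$, this probability is given by:

$$P(a_1 < X_1 \le b_1,\; a_2 < X_2 \le b_2) = F(b_1, b_2) - F(a_1, b_2) - F(b_1, a_2) + F(a_1, a_2)$$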
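The link between CDF differences and the probability of a range can be checked numerically. The sketch below is only an illustration, not the textbook's example: it assumes two variables, reuses the mean and covariance from the code later in this notebook, and introduces a hypothetical helper `empirical_cdf`; the seed and the range limits are arbitrary choices. It compares the inclusion-exclusion combination F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2) with a direct count of the samples inside the range.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
samples = rng.multivariate_normal([1, 2], [[1, 0.35], [0.35, 0.5]], 10**4)


def empirical_cdf(x1, x2):
    """Fraction of samples with X1 <= x1 and X2 <= x2 (hypothetical helper)."""
    return np.mean((samples[:, 0] <= x1) & (samples[:, 1] <= x2))


# Arbitrary illustrative range: a1 < X1 <= b1 and a2 < X2 <= b2
a1, b1 = 0.0, 2.0
a2, b2 = 1.5, 2.5

# Probability of the range from differences of the (empirical) joint CDF
p_from_cdf = (empirical_cdf(b1, b2) - empirical_cdf(a1, b2)
              - empirical_cdf(b1, a2) + empirical_cdf(a1, a2))

# Same probability from counting the samples that fall inside the range
inside = ((samples[:, 0] > a1) & (samples[:, 0] <= b1)
          & (samples[:, 1] > a2) & (samples[:, 1] <= b2))
p_direct = np.mean(inside)

print(p_from_cdf, p_direct)
```

Both numbers are estimates from the same samples, so they should agree up to floating-point rounding.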
Differentiating the joint CDF with respect to each variable gives the joint probability density function (joint PDF).
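Therefore, the joint CDF is the integral of the joint PDF. For a single variable, say X1, the PDF is obtained from the joint PDF $f$ by:

$$f_{X_1}(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_k)\, dx_2 \cdots dx_k$$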
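Integrating the joint PDF over the second through last variables leaves a function of x1 alone, called the marginal probability density function for random variable X1 and written here as $f_{X_1}(x_1)$. The marginal cumulative distribution function for X1 is:

$$F_{X_1}(x_1) = \int_{-\infty}^{x_1} f_{X_1}(t)\, dt$$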
The marginal distribution of one variable can be seen as its probability distribution when information about the other variables is ignored: the marginal of X2 is obtained by integrating out X1, and vice versa.
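Integrating out a variable can be sketched numerically. The example below is a rough illustration under stated assumptions: it uses a bivariate normal with the mean and covariance from the code later in this notebook, a hypothetical helper `joint_pdf`, and a simple Riemann sum on an arbitrarily chosen grid. It integrates the joint PDF over x2 and compares the result with the univariate normal PDF of X1, which is the known marginal of a bivariate normal.

```python
import numpy as np

mean = np.array([1.0, 2.0])
cov = np.array([[1.0, 0.35], [0.35, 0.5]])
cov_inv = np.linalg.inv(cov)
norm_const = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))


def joint_pdf(x1, x2):
    """Bivariate normal density on broadcastable arrays (hypothetical helper)."""
    d1 = x1 - mean[0]
    d2 = x2 - mean[1]
    quad = (cov_inv[0, 0] * d1**2 + 2.0 * cov_inv[0, 1] * d1 * d2
            + cov_inv[1, 1] * d2**2)
    return norm_const * np.exp(-0.5 * quad)


x1 = np.linspace(-3.0, 5.0, 81)
x2 = np.linspace(-6.0, 10.0, 3201)  # wide grid so the tails are negligible
dx2 = x2[1] - x2[0]

# Integrate out x2 with a Riemann sum to get the marginal PDF of X1
marginal_x1 = joint_pdf(x1[:, None], x2[None, :]).sum(axis=1) * dx2

# Known result: the marginal of X1 is normal with mean[0] and variance cov[0, 0]
expected = (np.exp(-0.5 * (x1 - mean[0])**2 / cov[0, 0])
            / np.sqrt(2.0 * np.pi * cov[0, 0]))
max_error = np.max(np.abs(marginal_x1 - expected))
print(max_error)
```

Swapping the roles of x1 and x2 in the sum would give the marginal of X2 in the same way.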
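Going back to the multivariate normal distribution, the PDF of a multivariate normal distribution of $k$ variables with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ is given by:

$$f(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^k\, |\boldsymbol{\Sigma}|}} \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$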
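The multivariate normal density can be evaluated directly with NumPy. The function below is a minimal sketch, not code from the textbook: the name `mvn_pdf` is hypothetical, and it assumes the standard density with normalizing constant sqrt((2π)^k |Σ|) and a symmetric, positive-definite covariance matrix.

```python
import numpy as np


def mvn_pdf(x, mean, cov):
    """Evaluate the k-variate normal PDF at a single point x (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    k = mean.size
    diff = x - mean
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), via a linear solve
    # rather than an explicit matrix inverse
    quad = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** k * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm


# Density at the peak (the mean vector) for the parameters used later in this notebook
peak = mvn_pdf([1, 2], [1, 2], [[1, 0.35], [0.35, 0.5]])
print(peak)
```

At the mean the exponent vanishes, so the peak height is simply one over the normalizing constant.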
A bivariate normal distribution is the special case of the multivariate normal with only two variables. It therefore has five parameters: two means, two standard deviations, and the product-moment correlation between the two variables. It is generally impossible to draw the density for systems with more than two variables.
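The five parameters map directly onto a mean vector and a 2×2 covariance matrix. As a small sketch (the variable names are illustrative, and the covariance values are taken from the code later in this notebook), the standard deviations and the correlation can be recovered from the covariance matrix, and the matrix can be rebuilt from them:

```python
import numpy as np

mean = np.array([1.0, 2.0])                  # the two means
cov = np.array([[1.0, 0.35], [0.35, 0.5]])   # covariance matrix

# Standard deviations are the square roots of the diagonal entries
sigma1, sigma2 = np.sqrt(np.diag(cov))
# Correlation is the covariance scaled by both standard deviations
rho = cov[0, 1] / (sigma1 * sigma2)

# Rebuild the covariance matrix from (sigma1, sigma2, rho)
rebuilt = np.array([[sigma1**2, rho * sigma1 * sigma2],
                    [rho * sigma1 * sigma2, sigma2**2]])

print(sigma1, sigma2, rho)
```

For these values the correlation comes out at approximately 0.495.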
The peak, or centroid, is located at the vector of means. The standard deviations and the correlation dictate the shape of the figure: a circle, an ellipse, or a tilted ellipse.
Here is an illustration of what the joint bivariate normal density function looks like.
Figure 1 in Tacq (2010).
The orthogonal projections of the marginal distributions are illustrated below.
Figure 2 in Tacq (2010).
Figure 4 in Tacq (2010).
All points on one of these ellipses have the same height, that is, the same probability density, which is why the ellipses are called equal-density contours. This holds whether the ellipse is axis-aligned or tilted; a tilt indicates a nonzero correlation between the two variables.
Multivariate distributions capture correlations between variables.
For example, higher correlations between the variables are represented by thinner ellipses. With lower correlations, on the other hand, the ellipse is fatter and contains a larger proportion of the population.
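The effect of correlation on the ellipse shape can be quantified with a short sketch. It relies on a standard fact: the semi-axes of an equal-density ellipse are proportional to the square roots of the eigenvalues of the covariance matrix. The helper `axis_ratio` is hypothetical, and equal standard deviations are assumed by default so that only the correlation changes the shape.

```python
import numpy as np


def axis_ratio(rho, sigma1=1.0, sigma2=1.0):
    """Minor-to-major axis ratio of the equal-density ellipse (illustrative helper)."""
    cov = np.array([[sigma1**2, rho * sigma1 * sigma2],
                    [rho * sigma1 * sigma2, sigma2**2]])
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    eigenvalues = np.linalg.eigvalsh(cov)
    return np.sqrt(eigenvalues[0] / eigenvalues[1])


# A ratio of 1 is a circle; smaller ratios mean thinner ellipses
for rho in (0.0, 0.2, 0.5, 0.9):
    print(rho, axis_ratio(rho))
```

With equal standard deviations the eigenvalues are proportional to 1 - rho and 1 + rho, so the ratio shrinks toward zero as the correlation approaches one.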
The multivariate normal distribution has statistical applications in linear regression analysis and its extension, structural equation models, as well as in discriminant analysis, multivariate analysis of variance, and canonical correlation analysis.
The paper by J. Tacq, listed in the references, provides examples on the topic.
# This code generates 2-D multivariate normal random variable plots
# and marginal distributions.
# Adapted from code provided by Prof. Ryan McClarren
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
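# Number of samples, mean vector, and covariance matrix
N = 10**4
mean = (1, 2)
cov = [[1, 0.35], [0.35, 0.5]]
# Draw N samples; each row is one (X1, X2) pair
samples = np.random.multivariate_normal(mean, cov, N)
maize = "#ffcb05"
blue = "#00274c"
# Scatter plot of the joint samples
plt.plot(samples[:, 0], samples[:, 1], '.', color=blue)
plt.xlabel("X$_1$")
plt.ylabel("X$_2$")
plt.show()
# Histograms of each column approximate the marginal distributions
plt.hist(samples[:, 0], facecolor=maize, edgecolor=blue)
plt.title("Histogram of X$_1$")
plt.show()
plt.hist(samples[:, 1], facecolor=maize, edgecolor=blue)
plt.title("Histogram of X$_2$")
plt.show()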
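As a sanity check on the sampling, the sample mean and sample covariance can be compared with the parameters passed to the generator. This is an optional sketch rather than part of the original example: it uses NumPy's `default_rng` generator with an arbitrary seed so that the numbers are reproducible, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
mean = (1, 2)
cov = [[1, 0.35], [0.35, 0.5]]
samples = rng.multivariate_normal(mean, cov, 10**4)

# Column-wise averages estimate the mean vector
sample_mean = samples.mean(axis=0)
# rowvar=False tells np.cov that each column is a variable
sample_cov = np.cov(samples, rowvar=False)

print(sample_mean)
print(sample_cov)
```

With this many samples, both estimates should land close to the inputs, with the remaining differences due to sampling noise.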
McClarren, R. G. (2018). Probability and Statistics Preliminaries. In Uncertainty Quantification and Predictive Computational Science. Springer, Cham.
Tacq, J. (2010). Multivariate Normal Distribution. International Encyclopedia of Education, 332–338. doi:10.1016/b978-0-08-044894-7.01351-8