This notebook contains material from cbe67701-uncertainty-quantification; content is available on GitHub.

2.1 Multivariate Distributions: Example from Textbook

Created by Christian Villa Santos (cvillas2@nd.edu)

2.1.1 Objective

The objective is to illustrate the multivariate normal distribution example provided in Chapter 2 of the textbook Uncertainty Quantification and Predictive Computational Science: A Foundation for Physical Scientists and Engineers.

2.1.2 Theory

As the name suggests, a multivariate normal distribution describes several random variables jointly. A univariate normal distribution, with a single variable X1, is defined by its mean and variance, while a multivariate normal distribution is defined by a vector of means and a covariance matrix, with the variances on the principal diagonal and the covariances in the off-diagonal entries.
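A minimal sketch of these two objects for two variables, using NumPy. The numerical values are assumptions chosen to match the code cell later in this notebook; the variable names are illustrative only.

```python
import numpy as np

# Assumed example values (they match the code cell later in this notebook).
mean = np.array([1.0, 2.0])            # vector of means: E[X1], E[X2]
cov = np.array([[1.0, 0.35],
                [0.35, 0.5]])          # covariance matrix

variances = np.diag(cov)               # Var(X1), Var(X2) on the principal diagonal
covariance_12 = cov[0, 1]              # Cov(X1, X2) in the off-diagonal entry

# A valid covariance matrix is symmetric and positive semidefinite.
is_symmetric = np.allclose(cov, cov.T)
eigenvalues = np.linalg.eigvalsh(cov)
is_psd = bool(np.all(eigenvalues >= 0))

print("variances:", variances)
print("covariance:", covariance_12)
print("symmetric:", is_symmetric, "positive semidefinite:", is_psd)
```

Checking symmetry and the sign of the eigenvalues is a quick way to catch a mistyped covariance matrix before sampling from it.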

First, let's define some terms. The joint cumulative distribution function (joint CDF) is the probability that each random variable is less than or equal to a given number, and is given by:

image.png
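In symbols, and assuming the notation F for the joint CDF and k for the number of random variables (the textbook's symbols may differ), this definition can be written as:

```latex
F_{\mathbf{X}}(x_1, \dots, x_k) = P\left(X_1 \le x_1,\; X_2 \le x_2,\; \dots,\; X_k \le x_k\right)
```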

Differences of the joint CDF give the probability that each random variable lies within a given range:

image.png

Differentiating the joint CDF with respect to each variable gives the joint probability density function (joint PDF).

image.png
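As a sketch in assumed notation (F for the joint CDF, f for the joint PDF, k variables, and F smooth enough to differentiate), the relationship is the mixed partial derivative:

```latex
f_{\mathbf{X}}(x_1, \dots, x_k) = \frac{\partial^k F_{\mathbf{X}}(x_1, \dots, x_k)}{\partial x_1 \, \partial x_2 \cdots \partial x_k}
```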

Therefore, the joint CDF is the integral of the joint PDF. For a single variable the PDF is given by:

image.png

Integrating the joint PDF over the second through last variables gives a function of x1 called the marginal probability density function of the random variable X1. The marginal cumulative distribution function of X1 is:

image.png
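Written out in assumed notation (f for the joint PDF of k variables, subscript X_1 for the marginal quantities, t as a dummy integration variable), the marginal PDF and marginal CDF of X1 take the standard forms:

```latex
f_{X_1}(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}
               f_{\mathbf{X}}(x_1, x_2, \dots, x_k) \, dx_2 \cdots dx_k ,
\qquad
F_{X_1}(x_1) = \int_{-\infty}^{x_1} f_{X_1}(t) \, dt
```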

In the two-variable case, the marginal distribution of X2 is the probability distribution of X2 when information about X1 is ignored, obtained by integrating out X1, and vice versa.
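A minimal sketch of this idea by sampling rather than by integration. The means, covariance matrix, sample size, and seed below are assumptions chosen for illustration. Dropping the X2 column from joint samples leaves samples from the marginal of X1; for a multivariate normal, that marginal is itself normal with the corresponding mean and diagonal variance, so the sample statistics should land close to those values.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, chosen here for repeatability
mean = np.array([1.0, 2.0])             # assumed example means
cov = np.array([[1.0, 0.35],
                [0.35, 0.5]])           # assumed example covariance matrix

samples = rng.multivariate_normal(mean, cov, size=200_000)

# Marginal of X1: keep the first column and ignore X2 entirely.
x1 = samples[:, 0]
# Marginal of X2: keep the second column and ignore X1.
x2 = samples[:, 1]

print("X1 sample mean, variance:", x1.mean(), x1.var())
print("X2 sample mean, variance:", x2.mean(), x2.var())
```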

Returning to the multivariate normal distribution, the PDF for k variables is given by:

image.png
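For reference, the standard form of this density, in assumed notation (mean vector \boldsymbol{\mu}, covariance matrix \boldsymbol{\Sigma} with determinant |\boldsymbol{\Sigma}|, and \mathbf{x} a vector of k values), is:

```latex
f(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^k \, |\boldsymbol{\Sigma}|}}
\exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}}
\boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)
```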

2.1.3 Graphical Representations

A bivariate normal distribution is the special case of the multivariate normal with only two variables. It has five parameters: two means, two standard deviations, and the product-moment correlation between the two variables. It is generally impossible to draw the density for systems with more than two variables.

The peak, or centroid, is located at the vector of means. The standard deviations and the correlation dictate the shape of the contours: a circle, an ellipse, or a tilted ellipse.
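A minimal sketch of how the five parameters map onto a mean vector and covariance matrix, with a check that the density peaks at the mean. The parameter values, grid ranges, and the helper name `bivariate_normal_pdf` are assumptions for illustration; the helper implements the standard multivariate normal density.

```python
import numpy as np

# Five parameters of a bivariate normal (values assumed for illustration).
mu1, mu2 = 1.0, 2.0
sigma1, sigma2 = 1.0, np.sqrt(0.5)
rho = 0.35 / (sigma1 * sigma2)          # correlation, roughly 0.495

mean = np.array([mu1, mu2])
cov = np.array([[sigma1**2,             rho * sigma1 * sigma2],
                [rho * sigma1 * sigma2, sigma2**2]])

def bivariate_normal_pdf(x, mean, cov):
    """Standard multivariate normal density evaluated at a single point x."""
    k = len(mean)
    diff = np.asarray(x, dtype=float) - mean
    quad = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2 * np.pi) ** k * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

# Evaluate the density on a grid and locate its maximum.
x1_grid = np.linspace(-3.0, 5.0, 161)
x2_grid = np.linspace(-1.0, 5.0, 121)
density = np.array([[bivariate_normal_pdf((a, b), mean, cov) for b in x2_grid]
                    for a in x1_grid])
i, j = np.unravel_index(np.argmax(density), density.shape)
peak = (x1_grid[i], x2_grid[j])
print("grid point with the highest density:", peak)
```

The grids are chosen so that the mean lies exactly on a grid point, which makes the located peak directly comparable to the vector of means.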

Here is an illustration of what the joint bivariate normal density function looks like.

image.png

Figure 1 in Tacq (2010).

The orthogonal projections of the marginal distributions are illustrated below.

image.png

Figure 2 in Tacq (2010).

2.1.3.1 Equal-Density Contours

image.png

Figure 4 in Tacq (2010).

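Every point on a given contour has the same density height, which is why these curves are called equal-density contours. The contours are circles when the variables are uncorrelated and have equal standard deviations, axis-aligned ellipses when the standard deviations differ, and tilted ellipses when the variables are correlated.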
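A minimal numerical sketch of an equal-density contour for a bivariate normal. The mean, covariance matrix, and contour size `c` are assumed values. Points are generated on an ellipse at a constant Mahalanobis distance from the mean (via the Cholesky factor of the covariance matrix), and the joint density is then evaluated at each of them.

```python
import numpy as np

mean = np.array([1.0, 2.0])               # assumed example means
cov = np.array([[1.0, 0.35],
                [0.35, 0.5]])             # assumed example covariance matrix

# Points on one contour: mean + c * L @ (cos t, sin t), with cov = L @ L.T.
L = np.linalg.cholesky(cov)
c = 1.5                                   # contour "radius" (assumed value)
t = np.linspace(0.0, 2.0 * np.pi, 50)
unit_circle = np.vstack([np.cos(t), np.sin(t)])        # shape (2, 50)
contour = mean[:, None] + c * (L @ unit_circle)        # shape (2, 50)

# Squared Mahalanobis distance of each contour point from the mean.
diff = contour - mean[:, None]
d2 = np.sum(diff * np.linalg.solve(cov, diff), axis=0)

# Joint density at each contour point.
norm = 2.0 * np.pi * np.sqrt(np.linalg.det(cov))
density = np.exp(-0.5 * d2) / norm

print("spread of density along the contour:", density.max() - density.min())
```

The printed spread should be zero up to floating-point rounding, since the density depends on the point only through its Mahalanobis distance from the mean.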

2.1.4 Why are multivariate distributions important?

  1. Multivariate distributions capture the correlations between variables.

    For example, stronger correlations between the variables produce thinner ellipses. Conversely, weaker correlations produce fatter ellipses that enclose a larger proportion of the population.

  2. They have statistical applications in linear regression analysis and its extension, structural equation models, as well as in discriminant analysis, multivariate analysis of variance, and canonical correlation analysis.

  3. The paper by J. Tacq, listed in the references, provides examples on the topic.
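The link between correlation and ellipse thickness in item 1 can be checked with a short sketch. It assumes both variables are standardized to unit variance, so the covariance matrix equals the correlation matrix; the helper name `axis_ratio` is illustrative. The contour axes are proportional to the square roots of the eigenvalues of that matrix, so their ratio measures how thin the ellipse is.

```python
import numpy as np

def axis_ratio(rho):
    """Minor-to-major axis ratio of the contours for unit variances and correlation rho."""
    corr = np.array([[1.0, rho],
                     [rho, 1.0]])
    eigenvalues = np.linalg.eigvalsh(corr)     # ascending: 1 - |rho|, 1 + |rho|
    return np.sqrt(eigenvalues[0] / eigenvalues[1])

for rho in (0.0, 0.5, 0.9, 0.99):
    print(f"rho = {rho:4.2f}  axis ratio = {axis_ratio(rho):.3f}")
```

A ratio of 1 corresponds to a circle, and the ratio shrinks toward 0 as the correlation approaches 1, which matches the description of thinner ellipses for stronger correlations.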

2.1.5 Python Code

In [1]:
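# Draw samples from a 2-D (bivariate) normal distribution, then plot the
# joint scatter and the histogram of each marginal distribution.
# Adapted from code provided by Prof. Ryan McClarren
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

N = 10**4                          # number of samples
mean = (1, 2)                      # means of X1 and X2
cov = [[1, 0.35], [0.35, 0.5]]     # covariance matrix
samples = np.random.multivariate_normal(mean, cov, N)   # array of shape (N, 2)
maize = "#ffcb05"
blue = "#00274c"

# Scatter plot of the joint samples
plt.plot(samples[:, 0], samples[:, 1], '.', color=blue)
plt.xlabel("X$_1$")
plt.ylabel("X$_2$")
plt.show()

# Histogram of the marginal samples of X1
plt.hist(samples[:, 0], facecolor=maize, edgecolor=blue)
plt.title("Histogram of X$_1$")
plt.show()

# Histogram of the marginal samples of X2
plt.hist(samples[:, 1], facecolor=maize, edgecolor=blue)
plt.title("Histogram of X$_2$")
plt.show()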

2.1.6 References

McClarren, R. G. (2018). Probability and Statistics Preliminaries. In: Uncertainty Quantification and Predictive Computational Science. Springer, Cham.

Tacq, J. (2010). Multivariate Normal Distribution. International Encyclopedia of Education, 332–338. doi:10.1016/b978-0-08-044894-7.01351-8
