14.1. Correlation, Covariance, and Independence

Further Reading: §7.1 Navidi (2015)

14.1.1. Learning Objectives

After studying this notebook and your lecture notes, you should be able to:

  • Interpret correlation coefficients.

  • Understand the relationship between correlation, covariance, and independence.

  • Determine if two variables are linearly associated.

import numpy as np
import matplotlib.pyplot as plt

14.1.2. Correlation

Correlation Coefficient:

\[\rho_{X,Y} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \textrm{ (random variables) }\]
\[-1 \leq \rho_{X,Y} \leq 1\]

The numerator \(\sigma_{XY}\) is the covariance of \(X\) and \(Y\), also written \(cov(X,Y)\).

Sample Correlation:

\[ r = \frac{1}{n-1} \sum_{i=1}^n \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)\]
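
As a minimal sketch, the sample correlation formula above can be evaluated directly and compared with NumPy's `np.corrcoef`. The helper name `sample_correlation`, the seed, and the simulated data are illustrative assumptions and are not part of the original notebook.

```python
import numpy as np

def sample_correlation(x, y):
    """Sample correlation r from the formula above (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Standardize with sample standard deviations (ddof=1 uses the n-1 denominator)
    zx = (x - x.mean()) / x.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    return np.sum(zx * zy) / (n - 1)

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
x = np.linspace(-3, 3, 100)
y = x + rng.normal(0, 0.5, size=x.size)  # line y = x plus Gaussian noise

r_formula = sample_correlation(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]
print(r_formula, r_numpy)
```

The two values should agree to floating-point precision, since `np.corrcoef` computes the same ratio.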

The correlation coefficient is dimensionless.

It remains unchanged under each of the following operations:

  1. Multiplying a variable by a positive constant.

  2. Adding a constant to a variable.

  3. Interchanging the values of x and y.
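
The three invariance properties listed above can be checked numerically. The sketch below uses simulated data, an arbitrary seed, and a small hypothetical helper `corr`. It also shows that multiplying by a negative constant flips the sign of the correlation, which is why the first property requires a positive constant.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

def corr(a, b):
    """Sample correlation via np.corrcoef (hypothetical convenience wrapper)."""
    return np.corrcoef(a, b)[0, 1]

r_base = corr(x, y)
r_scaled = corr(3.5 * x, y)     # property 1: multiply x by a positive constant
r_shifted = corr(x, y + 10.0)   # property 2: add a constant to y
r_swapped = corr(y, x)          # property 3: interchange x and y
r_negated = corr(-x, y)         # negative constant: sign flips, magnitude is kept

print(r_base, r_scaled, r_shifted, r_swapped, r_negated)
```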

Important Notes:

  • The correlation coefficient measures only linear association.

  • The correlation coefficient can be misleading when outliers are present.

  • Correlation does NOT imply causation.

  • Controlled experiments reduce the risk of confounding.
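
To illustrate the note on outliers, the sketch below builds a deterministic data set (points evenly spaced on the unit circle, an assumption chosen so that the correlation is zero by symmetry) and then appends a single outlier at (10, 10). One extreme point is enough to produce a large correlation coefficient even though the remaining points show no linear association.

```python
import numpy as np

# 20 points evenly spaced on the unit circle: no linear association
theta = np.linspace(0, 2 * np.pi, 20, endpoint=False)
x = np.cos(theta)
y = np.sin(theta)
r_clean = np.corrcoef(x, y)[0, 1]

# Append one outlier far from the rest of the data
x_out = np.append(x, 10.0)
y_out = np.append(y, 10.0)
r_outlier = np.corrcoef(x_out, y_out)[0, 1]

print(f"without outlier: r = {r_clean:.3f}")
print(f"with outlier:    r = {r_outlier:.3f}")
```

Plotting the data before interpreting r is the simplest safeguard against this effect.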

Home Activity

In the following cell, assign one of the letters A-E to each string so that each graph is matched with its correlation coefficient.

# Correlation coefficient of -0.95
ans1 = ''
# Correlation coefficient of -0.50
ans2 = ''
# Correlation coefficient of 0.00
ans3 = ''
# Correlation coefficient of 0.50
ans4 = ''
# Correlation coefficient of 0.95
ans5 = ''

# Add your solution here

Graph A:

nsim = 100  # number of simulated points

# Evenly spaced x values; y follows the line y = x plus Gaussian noise (sd 0.5)
x1 = np.linspace(-3, 3, nsim)
noise1 = np.random.normal(0, 0.5, nsim)
y1 = x1 + noise1

plt.plot(x1, y1, 'bo', markersize=3)
plt.xlim([-3, 3])
plt.ylim([-3, 3])
plt.show()
[Figure: scatter plot for Graph A]

Graph B:

nsim = 100  # number of simulated points

# Evenly spaced x values; y is Gaussian noise (sd 1) generated independently of x
x1 = np.linspace(-2.5, 2.5, nsim)
y1 = np.random.normal(0, 1, nsim)

plt.plot(x1, y1, 'bo', markersize=3)
plt.xlim([-3, 3])
plt.ylim([-3, 3])
plt.show()
[Figure: scatter plot for Graph B]

Graph C:

nsim = 100  # number of simulated points

# Evenly spaced x values; y follows the line y = -x plus Gaussian noise (sd 1.3)
x1 = np.linspace(-3, 3, nsim)
noise1 = np.random.normal(0, 1.3, nsim)
y1 = -x1 + noise1

plt.plot(x1, y1, 'bo', markersize=3)
plt.xlim([-3, 3])
plt.ylim([-3, 3])
plt.show()
[Figure: scatter plot for Graph C]

Graph D:

nsim = 100  # number of simulated points

# Evenly spaced x values; y follows the line y = x plus Gaussian noise (sd 1.3)
x1 = np.linspace(-3, 3, nsim)
noise1 = np.random.normal(0, 1.3, nsim)
y1 = x1 + noise1

plt.plot(x1, y1, 'bo', markersize=3)
plt.xlim([-3, 3])
plt.ylim([-3, 3])
plt.show()
[Figure: scatter plot for Graph D]

Graph E:

nsim = 100  # number of simulated points

# Evenly spaced x values; y follows the line y = -x plus Gaussian noise (sd 0.5)
x1 = np.linspace(-3, 3, nsim)
noise1 = np.random.normal(0, 0.5, nsim)
y1 = -x1 + noise1

plt.plot(x1, y1, 'bo', markersize=3)
plt.xlim([-3, 3])
plt.ylim([-3, 3])
plt.show()
[Figure: scatter plot for Graph E]

14.1.3. Covariance

Measured quantities are often not independent, either because of observation error or because the underlying variables are related.

Covariance measures the linear relationship between two random variables.

\[cov(X,Y) = E[(X-\mu_X)(Y-\mu_Y)] = E[X \cdot Y] - \mu_X \mu_Y\]
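
Both forms of the covariance can be checked numerically by replacing expectations with sample averages. This is a sketch on simulated data: the seed, sample size, and distribution parameters are arbitrary assumptions. Passing `ddof=0` makes `np.cov` divide by n so that it matches the plain averages used here.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed, for reproducibility only
x = rng.normal(5.0, 2.0, size=1000)
y = 0.5 * x + rng.normal(0.0, 1.0, size=1000)

# Definition: average of the product of deviations from the means
cov_def = np.mean((x - x.mean()) * (y - y.mean()))
# Shortcut form: E[XY] - mu_X * mu_Y
cov_short = np.mean(x * y) - x.mean() * y.mean()
# NumPy's estimate with the same 1/n normalization
cov_numpy = np.cov(x, y, ddof=0)[0, 1]

# Dividing by both standard deviations gives the correlation coefficient
rho = cov_def / (x.std() * y.std())

print(cov_def, cov_short, cov_numpy, rho)
```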

14.1.4. Independence

If \(cov(X,Y) = 0\) (equivalently, \(\rho_{X,Y} = 0\)), then \(X\) and \(Y\) are uncorrelated.

If \(X\) and \(Y\) are independent, then \(X\) and \(Y\) are uncorrelated.

It is possible for \(X\) and \(Y\) to be uncorrelated and NOT independent, because correlation captures only linear association.
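
A standard example of uncorrelated but dependent variables, sketched here under the assumption that \(X\) takes the values -1, 0, and 1 with equal probability and \(Y = X^2\): the covariance is zero, yet \(Y\) is completely determined by \(X\). Independence would require \(P(X=0, Y=0) = P(X=0)P(Y=0)\), which fails.

```python
import numpy as np

# Assumed discrete distribution: X uniform on {-1, 0, 1}, and Y = X**2
x_vals = np.array([-1.0, 0.0, 1.0])
p = np.array([1 / 3, 1 / 3, 1 / 3])
y_vals = x_vals**2

# Covariance from cov(X,Y) = E[XY] - mu_X * mu_Y
mu_x = np.sum(p * x_vals)
mu_y = np.sum(p * y_vals)
cov_xy = np.sum(p * x_vals * y_vals) - mu_x * mu_y

# Independence check on the event {X = 0, Y = 0}
p_joint = p[(x_vals == 0) & (y_vals == 0)].sum()
p_x0 = p[x_vals == 0].sum()
p_y0 = p[y_vals == 0].sum()

print("cov(X,Y) =", cov_xy)
print("P(X=0, Y=0) =", p_joint, " P(X=0)P(Y=0) =", p_x0 * p_y0)
```

The joint probability is 1/3 while the product of the marginals is 1/9, so \(X\) and \(Y\) are not independent even though their covariance is zero.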