Correlation, Covariance, and Independence
14.1. Correlation, Covariance, and Independence
Further Reading: §7.1 Navidi (2015)
14.1.1. Learning Objectives
After studying this notebook and your lecture notes, you should be able to:
Interpret correlation coefficients.
Understand the relationship between correlation, covariance, and independence.
Determine if two variables are linearly associated.
import numpy as np
import matplotlib.pyplot as plt
14.1.2. Correlation
Correlation Coefficient:
\[\rho_{X,Y} = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y}\]
The numerator is also known as the covariance, \(cov(X,Y)\).
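Sample Correlation:
\[r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}\]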
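As a minimal sketch, the sample correlation can be computed directly from the deviations of each observation from its sample mean and checked against NumPy's `np.corrcoef`. The arrays `x` and `y` below are made-up values used only for illustration; they do not come from the course data.

```python
import numpy as np

# Hypothetical paired measurements, chosen only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Deviations of each observation from its sample mean
dx = x - x.mean()
dy = y - y.mean()

# Sum of cross-deviations divided by the square root of the
# product of the sums of squared deviations
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# np.corrcoef returns the 2x2 correlation matrix;
# the off-diagonal entry is the sample correlation of x and y
r_numpy = np.corrcoef(x, y)[0, 1]

print("manual:", r_manual)
print("numpy: ", r_numpy)
```

The two values should agree to floating-point precision, and both are close to 1 because the illustrative data lie almost exactly on a line with positive slope.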
The correlation coefficient is dimensionless.
It remains unchanged when:
Multiplying each value of a variable by a positive constant.
Adding a constant to each value of a variable.
Interchanging the values of x and y.
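These invariance properties can be checked numerically. The sketch below uses synthetic data; the seed, the slope 0.8, and the constants 1000 and 273.15 are arbitrary assumptions made for the example. The helper `corr` is a hypothetical convenience wrapper, not part of NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice for reproducibility
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200)

def corr(a, b):
    """Sample correlation coefficient of two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

r = corr(x, y)
r_scaled = corr(1000.0 * x, y)   # multiply x by a positive constant (e.g. a unit change)
r_shifted = corr(x + 273.15, y)  # add a constant to x
r_swapped = corr(y, x)           # interchange the roles of x and y
r_negated = corr(-x, y)          # multiply x by a negative constant

print("original:", r)
print("scaled:  ", r_scaled)
print("shifted: ", r_shifted)
print("swapped: ", r_swapped)
print("negated: ", r_negated)
```

The first four values agree to floating-point precision. The last one has the same magnitude but the opposite sign, which is why the scaling constant must be positive for the coefficient to remain unchanged.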
Important Notes:
The correlation coefficient measures only linear association.
The correlation coefficient can be misleading when outliers are present.
Correlation does NOT imply causation.
Controlled experiments reduce the risk of confounding.
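The first two notes can be illustrated with a short experiment. This is a sketch on synthetic data: the quadratic relationship, the seed, and the outlier at (20, 20) are all assumptions chosen to make the effect easy to see.

```python
import numpy as np

# (a) A perfect but nonlinear relationship on a grid symmetric about zero
x = np.linspace(-3, 3, 101)
y = x**2
r_quadratic = np.corrcoef(x, y)[0, 1]

# (b) Two unrelated samples, before and after appending one extreme point
rng = np.random.default_rng(1)  # fixed seed, arbitrary choice
u = rng.normal(size=50)
v = rng.normal(size=50)
r_clean = np.corrcoef(u, v)[0, 1]

u_out = np.append(u, 20.0)
v_out = np.append(v, 20.0)
r_outlier = np.corrcoef(u_out, v_out)[0, 1]

print("quadratic relationship:", r_quadratic)
print("unrelated samples:     ", r_clean)
print("with one outlier:      ", r_outlier)
```

In case (a), y is completely determined by x, yet the correlation is essentially zero because the relationship is not linear. In case (b), a single extreme point is enough to produce a large correlation between samples that were generated independently.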
Home Activity
In the following cell, assign one of the letters A-E to each string so that every graph is matched with its correlation coefficient.
# Correlation coefficient of -0.95
ans1 = ''
# Correlation coefficient of -0.50
ans2 = ''
# Correlation coefficient of 0.00
ans3 = ''
# Correlation coefficient of 0.50
ans4 = ''
# Correlation coefficient of 0.95
ans5 = ''
# Add your solution here
Graph A:
nsim = 100
# Underlying line y = x on an evenly spaced grid
x1 = np.linspace(-3,3,nsim)
yline1 = x1
# Add Gaussian noise (mean 0, standard deviation 0.5) to the line
var1 = np.random.normal(0,0.5,nsim)
y1 = yline1 + var1
plt.plot(x1,y1,'bo',markersize=3)
plt.xlim([-3,3])
plt.ylim([-3,3])
plt.show()
Graph B:
nsim = 100
# x values on an evenly spaced grid
x1 = np.linspace(-2.5,2.5,nsim)
# y values drawn from a standard normal distribution, generated without reference to x
y1 = np.random.normal(0,1,nsim)
plt.plot(x1,y1,'bo',markersize=3)
plt.xlim([-3,3])
plt.ylim([-3,3])
plt.show()
Graph C:
nsim = 100
# Underlying line y = -x on an evenly spaced grid
x1 = np.linspace(-3,3,nsim)
yline1 = -x1
# Add Gaussian noise (mean 0, standard deviation 1.3) to the line
var1 = np.random.normal(0,1.3,nsim)
y1 = yline1 + var1
plt.plot(x1,y1,'bo',markersize=3)
plt.xlim([-3,3])
plt.ylim([-3,3])
plt.show()
Graph D:
nsim = 100
# Underlying line y = x on an evenly spaced grid
x1 = np.linspace(-3,3,nsim)
yline1 = x1
# Add Gaussian noise (mean 0, standard deviation 1.3) to the line
var1 = np.random.normal(0,1.3,nsim)
y1 = yline1 + var1
plt.plot(x1,y1,'bo',markersize=3)
plt.xlim([-3,3])
plt.ylim([-3,3])
plt.show()
Graph E:
nsim = 100
# Underlying line y = -x on an evenly spaced grid
x1 = np.linspace(-3,3,nsim)
yline1 = -x1
# Add Gaussian noise (mean 0, standard deviation 0.5) to the line
var1 = np.random.normal(0,0.5,nsim)
y1 = yline1 + var1
plt.plot(x1,y1,'bo',markersize=3)
plt.xlim([-3,3])
plt.ylim([-3,3])
plt.show()
14.1.3. Covariance
Measured variables are often not independent, either because of observation error or because the variables themselves are related.
Covariance measures the linear relationship between two random variables.
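As an illustrative sketch, the sample covariance can be computed from its definition and compared with `np.cov`, and dividing it by both sample standard deviations recovers the correlation coefficient. The data are synthetic; the seed, the slope of 3, and the rescaling factor of 100 are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed, arbitrary choice
x = rng.normal(loc=10.0, scale=2.0, size=500)
y = 3.0 * x + rng.normal(scale=1.0, size=500)

# Sample covariance from the definition (divisor n - 1)
n = x.size
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov returns the 2x2 covariance matrix; its default divisor is also n - 1
cov_numpy = np.cov(x, y)[0, 1]

# Dividing by both sample standard deviations gives the correlation coefficient
r_from_cov = cov_numpy / (x.std(ddof=1) * y.std(ddof=1))
r_numpy = np.corrcoef(x, y)[0, 1]

# Unlike correlation, covariance carries units:
# rescaling x by 100 rescales the covariance by 100
cov_rescaled = np.cov(100.0 * x, y)[0, 1]

print("covariance (manual):", cov_manual)
print("covariance (numpy): ", cov_numpy)
print("correlation from covariance:", r_from_cov)
print("correlation (numpy):        ", r_numpy)
print("covariance after rescaling x:", cov_rescaled)
```

The rescaling step shows why the correlation coefficient is usually easier to interpret: the covariance changes with the units of measurement, while the correlation does not.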
14.1.4. Independence
If \(cov(X,Y) = 0\), or equivalently \(\rho_{X,Y} = 0\), then \(X\) and \(Y\) are uncorrelated.
If \(X\) and \(Y\) are independent, then \(X\) and \(Y\) are uncorrelated.
It is possible for \(X\) and \(Y\) to be uncorrelated and NOT independent.
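A small discrete example makes the last point concrete. The joint distribution below is a hypothetical one chosen for illustration: \((X, Y)\) is equally likely to be any of the four points (1, 0), (-1, 0), (0, 1), (0, -1). The sketch computes the covariance exactly from the probabilities and then checks the product rule that independence would require.

```python
import numpy as np

# Hypothetical joint distribution: four equally likely points
points = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
probs = np.full(4, 0.25)

X, Y = points[:, 0], points[:, 1]

# Covariance from the definition: cov(X, Y) = E[XY] - E[X] E[Y]
E_X = np.sum(probs * X)
E_Y = np.sum(probs * Y)
E_XY = np.sum(probs * X * Y)
cov_XY = E_XY - E_X * E_Y

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0)
p_joint = np.sum(probs[(X == 0) & (Y == 0)])
p_product = np.sum(probs[X == 0]) * np.sum(probs[Y == 0])

print("cov(X, Y)       =", cov_XY)
print("P(X=0, Y=0)     =", p_joint)
print("P(X=0) * P(Y=0) =", p_product)
```

The covariance is zero, so \(X\) and \(Y\) are uncorrelated. However, the joint probability of \(X = 0\) and \(Y = 0\) is 0 while the product of the marginal probabilities is 0.25, so the variables are not independent: knowing that \(X = 0\) guarantees that \(Y \neq 0\).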