10.3. Jointly Distributed Random Variables

Further Reading: §2.6 in Navidi (2015)

10.3.1. Learning Objectives

After attending class, completing these activities, asking questions, and studying notes, you should be able to:

  • Determine if two variables are independent or dependent.

  • Understand how to perform statistical analysis on jointly distributed random variables.

import numpy as np
import random
import pandas as pd

10.3.2. Key Equations

Joint Probability Mass Function:

Joint probability is the probability of two events occurring simultaneously.

\[\rho(x,y) = P(X=x \textrm{ and } Y=y)\]
\[\sum_x \sum_y \rho(x,y) = 1\]

Marginal Probability Mass Function:

Marginal probability is the probability of an event occurring regardless of the outcome of another variable.

\[\rho_X(x) = P(X=x) = \sum_y \rho(x,y) \textrm{ (marginalize over y/sum over all y outcomes) }\]
\[\rho_Y(y) = P(Y=y) = \sum_x \rho(x,y) \textrm{ (marginalize over x/sum over all x outcomes) }\]
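
As a minimal sketch of these definitions, the snippet below stores a joint PMF as a 2D NumPy array (rows index the outcomes of X, columns index the outcomes of Y) and recovers both marginals by summing along one axis. The probability values are hypothetical and chosen only for illustration.

```python
import numpy as np

# hypothetical joint PMF: rows are outcomes of X, columns are outcomes of Y
joint = np.array([[0.10, 0.20, 0.10],
                  [0.30, 0.20, 0.10]])

# a valid joint PMF sums to 1 over all (x, y) pairs
total = joint.sum()

# marginal of X: sum over all y outcomes (across columns)
p_X = joint.sum(axis=1)

# marginal of Y: sum over all x outcomes (across rows)
p_Y = joint.sum(axis=0)

print("total =", total)
print("p_X =", p_X)
print("p_Y =", p_Y)
```

Each marginal is itself a valid PMF, so `p_X` and `p_Y` should each sum to 1 as well.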

10.3.3. Example: Dependent Random Variables

Let’s revisit the coin example from the Random Variables section, but now assume the coins are NOT independent:

\[P(A) = 0.6\]
\[P(B | A) = 0.8 \quad \textrm{and} \quad P(B | \neg A) = \frac{0.5 - 0.6 \cdot 0.8}{0.4} = 0.05\]

Thus,

\[P(B | A) \cdot P(A) + P(B | \neg A) \cdot P(\neg A) = P(B)\]
\[0.8 \cdot 0.6 + 0.05 \cdot 0.4 = 0.5\]
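
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the variable names are ours, and the value P(B) = 0.5 is the marginal probability the example is constructed to preserve.

```python
# given probabilities from the example
p_A = 0.6          # P(A)
p_B = 0.5          # P(B), the marginal we want to preserve
p_B_given_A = 0.8  # P(B | A)

# solve the law of total probability for P(B | not A)
p_B_given_notA = (p_B - p_B_given_A * p_A) / (1 - p_A)

# recombine both branches to confirm we recover P(B)
p_B_check = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

print(round(p_B_given_notA, 6))
print(round(p_B_check, 6))
```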

We still have \(P(A) = 0.6\) and \(P(B) = 0.5\) from the original case, but we have introduced a correlation structure.
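
To gauge how strong this correlation structure is, one can build the joint PMF of the two coins from the conditional probabilities and compute the covariance and correlation exactly. The sketch below assumes each coin is coded as 1 when its event occurs and 0 otherwise; the array layout and variable names are illustrative choices, not part of the original example.

```python
import numpy as np

p_A = 0.6
p_B_given_A = 0.8
p_B_given_notA = 0.05

# joint PMF: rows are A = 0, 1; columns are B = 0, 1
joint = np.array([
    [(1 - p_A) * (1 - p_B_given_notA), (1 - p_A) * p_B_given_notA],
    [p_A * (1 - p_B_given_A),          p_A * p_B_given_A],
])

# marginal probabilities that each coin equals 1
p_A1 = joint[1, :].sum()
p_B1 = joint[:, 1].sum()

# for 0/1 variables: E[A] = P(A=1), E[AB] = P(A=1, B=1), Var = p(1 - p)
cov_AB = joint[1, 1] - p_A1 * p_B1
corr_AB = cov_AB / np.sqrt(p_A1 * (1 - p_A1) * p_B1 * (1 - p_B1))

# independence would require the joint PMF to equal the product of marginals
product = np.outer(joint.sum(axis=1), joint.sum(axis=0))

print("cov =", round(cov_AB, 4))
print("corr =", round(corr_AB, 4))
print("joint equals product of marginals?", np.allclose(joint, product))
```

Under these assumptions the exact covariance is 0.18 and the correlation is about 0.73, values that a simulation with enough flips should approach.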

In the code below, we record 1.0 for a head and 0.0 for a tail, so the mean of each column estimates \(P(A)\) or \(P(B)\).

# number of flips
n = 1000

# store results
coin_A = np.zeros(n)
coin_B = np.zeros(n)

for i in range(n):
    # flip coin A: generate a uniformly distributed random number on [0,1),
    # then check whether it is less than 0.6
    coin_A[i] = 1.0*(random.random() < 0.6)
    
    # flip coin B
    if coin_A[i] < 1E-6:
        # coin A for this flip is a tail
        coin_B[i] = 1.0*(random.random() < 0.05)
    else:
        coin_B[i] = 1.0*(random.random() < 0.8)
    
# assemble into pandas dataframe
d = {"A":coin_A, "B":coin_B}
dep_coins = pd.DataFrame(data=d)

# print first few experiments
dep_coins.head()
     A    B
0  0.0  0.0
1  1.0  1.0
2  0.0  0.0
3  1.0  1.0
4  1.0  1.0
# print mean (average)
dep_coins.mean()
A    0.600
B    0.494
dtype: float64
# print covariance
dep_coins.cov()
         A         B
A  0.24024  0.179780
B  0.17978  0.250214
# print correlation
dep_coins.corr()
          A         B
A  1.000000  0.733267
B  0.733267  1.000000
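
One way to probe independence directly from data is to compare the empirical joint PMF with the product of the empirical marginals, since independent variables satisfy \(\rho(x,y) = \rho_X(x)\,\rho_Y(y)\). The sketch below is self-contained: it regenerates dependent coin flips with NumPy's random generator (a substitute for the `random` module used above, with an arbitrary seed) and uses `pd.crosstab` to tabulate the joint frequencies. The names `coins`, `joint_hat`, and `indep` are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n = 1000

# simulate dependent coins (1.0 = event occurs) with the same probabilities
A = (rng.random(n) < 0.6).astype(float)
p_B_cond = np.where(A == 1.0, 0.8, 0.05)
B = (rng.random(n) < p_B_cond).astype(float)
coins = pd.DataFrame({"A": A, "B": B})

# empirical joint PMF: fraction of flips in each (A, B) cell
joint_hat = pd.crosstab(coins["A"], coins["B"], normalize="all")

# empirical marginals
pA_hat = joint_hat.sum(axis=1)
pB_hat = joint_hat.sum(axis=0)

# product of marginals: what the joint PMF would be under independence
indep = pd.DataFrame(np.outer(pA_hat, pB_hat),
                     index=joint_hat.index, columns=joint_hat.columns)

# largest absolute gap between the two tables
max_gap = (joint_hat - indep).abs().max().max()

print(joint_hat)
print(indep)
print("largest gap:", max_gap)
```

For independent coins the gap would be close to zero, up to sampling noise; a gap that is large relative to that noise suggests dependence. This comparison is a heuristic check on simulated data, not a formal hypothesis test.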

Note

Class Discussion: Based on the simulation data, are these coins independent?