{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "6rJXXfntCDIa" }, "source": [ "# Jointly Distributed Random Variables\n", "**Further Reading**: §2.6 in Navidi (2015)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "8wo2zupKCDId" }, "source": [ "## Learning Objectives\n", "\n", "After attending class, completing these activities, asking questions, and studying notes, you should be able to:\n", "* Determine if two variables are independent or dependent.\n", "* Understand how to perform statistical analysis on jointly distributed random variables." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import random\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Key Equations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Joint Probability Mass Function:**\n", "\n", "The joint probability mass function gives the probability that two discrete random variables $X$ and $Y$ take the values $x$ and $y$ simultaneously.\n", "\n", "$$\\rho(x,y) = P(X=x \\textrm{ and } Y=y)$$\n", "\n", "$$\\sum_x \\sum_y \\rho(x,y) = 1$$\n", "\n", "**Marginal Probability Mass Function:**\n", "\n", "The marginal probability mass function gives the probability that one variable takes a given value, regardless of the value of the other variable. It is obtained by summing the joint probability mass function over all values of the other variable.\n", "\n", "$$\\rho_X(x) = P(X=x) = \\sum_y \\rho(x,y) \\textrm{ (marginalize over y/sum over all y outcomes) }$$\n", "\n", "$$\\rho_Y(y) = P(Y=y) = \\sum_x \\rho(x,y) \\textrm{ (marginalize over x/sum over all x outcomes) }$$" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Example: Dependent Random Variables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's revisit the coin example from the [Random Variables](../10/Random-Variables.ipynb) notebook, but now assume the coins are NOT independent. We keep\n", "\n", "$$P(A) = 0.6$$\n", "\n", "and set\n", "\n", "$$P(B | A) = 0.8$$\n", "\n", "To keep $P(B) = 0.5$ as in the original example, we choose $P(B | \\neg A)$ from the law of total probability, using $P(\\neg A) = 1 - 0.6 = 0.4$:\n", "\n", "$$P(B) = P(B | A) \\cdot P(A) + P(B | \\neg A) \\cdot P(\\neg A)$$\n", "\n", "$$P(B | \\neg A) = \\frac{P(B) - P(B | A) \\cdot P(A)}{P(\\neg A)} = \\frac{0.5 - 0.8 \\cdot 0.6}{0.4} = 0.05$$\n", "\n", "As a check,\n", "\n", "$$0.8 \\cdot 0.6 + 0.05 \\cdot 0.4 = 0.48 + 0.02 = 0.5$$\n", "\n", "We still have $P(A) = 0.6$ and $P(B) = 0.5$ from the original case, but we have introduced a correlation structure: the probability of event B now depends on whether event A occurred.\n", "\n", "In the code below, we record 1.0 when an event occurs (event A for coin A, event B for coin B) and 0.0 when it does not, so the mean of each column estimates $P(A)$ or $P(B)$." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B\n", "0 0.0 0.0\n", "1 1.0 1.0\n", "2 0.0 0.0\n", "3 1.0 1.0\n", "4 1.0 1.0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# number of flips\n", "n = 1000\n", "\n", "# store results (1.0 if the event occurred, 0.0 otherwise)\n", "coin_A = np.zeros(n)\n", "coin_B = np.zeros(n)\n", "\n", "for i in range(n):\n", "    # flip coin A: draw a uniformly distributed random number on [0, 1)\n", "    # and record 1.0 (event A) if it is less than P(A) = 0.6\n", "    coin_A[i] = 1.0*(random.random() < 0.6)\n", "\n", "    # flip coin B using the conditional probability given coin A\n", "    if coin_A[i] == 0.0:\n", "        # event A did not occur on this flip, so P(B | not A) = 0.05\n", "        coin_B[i] = 1.0*(random.random() < 0.05)\n", "    else:\n", "        # event A occurred on this flip, so P(B | A) = 0.8\n", "        coin_B[i] = 1.0*(random.random() < 0.8)\n", "\n", "# assemble into a pandas DataFrame\n", "d = {\"A\": coin_A, \"B\": coin_B}\n", "dep_coins = pd.DataFrame(data=d)\n", "\n", "# show the first five flips\n", "dep_coins.head()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "A 0.600\n", "B 0.494\n", "dtype: float64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sample mean of each column (estimates P(A) and P(B))\n", "dep_coins.mean()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B\n", "A 0.24024 0.179780\n", "B 0.17978 0.250214" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# print covariance\n", "dep_coins.cov()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B\n", "A 1.000000 0.733267\n", "B 0.733267 1.000000" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# print correlation matrix (Pearson, the pandas default)\n", "dep_coins.corr()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Class Discussion: Based on the simulation data, are these coins independent?\n", "
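One way to approach this question is to compare the estimated joint PMF with the product of the estimated marginal PMFs. If A and B were independent, the two tables would agree for every outcome pair, up to sampling noise. The minimal sketch below makes two assumptions that do not come from the notebook: it uses NumPy's `default_rng` with an arbitrary seed of 42, and it uses n = 100,000 flips. It re-simulates the dependent coins in vectorized form so that it runs on its own. The names `coins`, `joint`, `p_A`, `p_B`, `product`, and `cov_AB` are illustrative. The marginals are computed as in the Key Equations, by summing the joint PMF over the other variable.

```python
import numpy as np
import pandas as pd

# Assumptions for this sketch: NumPy's default_rng, seed 42, and n = 100_000 flips
rng = np.random.default_rng(seed=42)
n = 100_000

# Coin A: 1.0 if event A occurs (probability 0.6), else 0.0
a = (rng.random(n) < 0.6).astype(float)

# Coin B: P(B | A) = 0.8 and P(B | not A) = 0.05, vectorized with np.where
p_b_given_a = np.where(a == 1.0, 0.8, 0.05)
b = (rng.random(n) < p_b_given_a).astype(float)
coins = pd.DataFrame({'A': a, 'B': b})

# Estimated joint PMF: fraction of flips with each (A, B) outcome pair
joint = pd.crosstab(coins['A'], coins['B'], normalize=True)

# Marginal PMFs: sum the joint PMF over the other variable
p_A = joint.sum(axis=1)  # sum over the B outcomes
p_B = joint.sum(axis=0)  # sum over the A outcomes

# Table the joint PMF would match if A and B were independent
product = pd.DataFrame(np.outer(p_A, p_B), index=joint.index, columns=joint.columns)

# For two 0/1 variables, Cov(A, B) = P(A=1, B=1) - P(A=1) P(B=1)
cov_AB = joint.loc[1.0, 1.0] - p_A.loc[1.0] * p_B.loc[1.0]

print('Estimated joint PMF:')
print(joint.round(3))
print('Product of marginals:')
print(product.round(3))
print('Largest absolute difference:', round(float((joint - product).abs().max().max()), 3))
print('Cov(A, B) from the joint PMF:', round(float(cov_AB), 3))
```

Inside the notebook, you can swap `coins` for the `dep_coins` DataFrame built earlier. With only 1000 flips, expect noisier estimates. For two 0/1 variables, every entry of `joint - product` has the same magnitude, and that magnitude equals the absolute value of `cov_AB`. The `cov_AB` computed here divides by n, whereas `DataFrame.cov()` divides by n - 1, so the two differ slightly even on the same data.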
" ] } ], "metadata": { "colab": { "name": "L13-Intro-Probability-Statistics.ipynb", "provenance": [], "version": "0.3.2" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 4 }