{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "*This notebook contains material from [cbe67701-uncertainty-quantification](https://ndcbe.github.io/cbe67701-uncertainty-quantification);\n", "content is available [on Github](https://github.com/ndcbe/cbe67701-uncertainty-quantification.git).*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "< [3.1 Copulas](https://ndcbe.github.io/cbe67701-uncertainty-quantification/03.01-Contributed-Example.html) | [Contents](toc.html) | [4.0 Local Sensitivity Analysis Based on Derivative Approximations](https://ndcbe.github.io/cbe67701-uncertainty-quantification/04.00-Local-Sensitivity-Analysis-Based-on-Derivative-Approximations.html)
"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "FOk6y691JBW8",
"nbpages": {
"level": 1,
"link": "[3.2 Principal Component Analysis](https://ndcbe.github.io/cbe67701-uncertainty-quantification/03.02-Contributed-Example.html#3.2-Principal-Component-Analysis)",
"section": "3.2 Principal Component Analysis"
}
},
"source": [
"# 3.2 Principal Component Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "8BV1D7qRa07S",
"nbpages": {
"level": 1,
"link": "[3.2 Principal Component Analysis](https://ndcbe.github.io/cbe67701-uncertainty-quantification/03.02-Contributed-Example.html#3.2-Principal-Component-Analysis)",
"section": "3.2 Principal Component Analysis"
}
},
"source": [
"Stephen Adams (sadams22@nd.edu)\n",
"6/18/2020"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "JCQR8b7RbCn3",
"nbpages": {
"level": 1,
"link": "[3.2 Principal Component Analysis](https://ndcbe.github.io/cbe67701-uncertainty-quantification/03.02-Contributed-Example.html#3.2-Principal-Component-Analysis)",
"section": "3.2 Principal Component Analysis"
}
},
"source": [
"The following example shows an application of principal component analysis (PCA), also known as random variable reduction, the Hotelling transform, or proper orthogonal decomposition (see page 76 of the textbook). PCA is commonly used in machine learning to reduce the number of features in a data set, which speeds up downstream algorithms.\n",
"\n",
"This example generates a scree plot, which shows how much of the variance in a data set can be attributed to each principal component (see Fig. 3.14 in the textbook)."
]
},
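{
"cell_type": "markdown",
"metadata": {},
"source": [
"The idea behind a scree plot can be sketched with NumPy alone. The cell below is a minimal illustration under stated assumptions, not the analysis performed later in this notebook: it builds a small synthetic data matrix `X` (a hypothetical stand-in for the quarterback data), standardizes each column, and computes the fraction of total variance carried by each principal component from the singular values. The names `X`, `Z`, and `explained_ratio` are illustrative choices. Plotting `explained_ratio` against the component index would give the scree plot."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Synthetic stand-in data (an assumption, not the quarterback data set):\n",
"# 50 samples of 4 features, the first two of which are strongly correlated.\n",
"rng = np.random.default_rng(0)\n",
"base = rng.normal(size=(50, 2))\n",
"X = np.column_stack([\n",
"    base[:, 0],\n",
"    2.0 * base[:, 0] + 0.1 * rng.normal(size=50),\n",
"    base[:, 1],\n",
"    rng.normal(size=50),\n",
"])\n",
"\n",
"# Standardize each column to zero mean and unit variance.\n",
"Z = (X - X.mean(axis=0)) / X.std(axis=0)\n",
"\n",
"# The squared singular values of Z are proportional to the component variances.\n",
"singular_values = np.linalg.svd(Z, compute_uv=False)\n",
"explained_ratio = singular_values**2 / np.sum(singular_values**2)\n",
"\n",
"# These ratios are the bar heights of a scree plot (largest first).\n",
"print(np.round(explained_ratio, 3))"
]
},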
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"nbpages": {
"level": 1,
"link": "[3.2 Principal Component Analysis](https://ndcbe.github.io/cbe67701-uncertainty-quantification/03.02-Contributed-Example.html#3.2-Principal-Component-Analysis)",
"section": "3.2 Principal Component Analysis"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Checking for data/quarterbacks3.csv\n",
"\tFile found!\n"
]
}
],
"source": [
"## Download data from GitHub\n",
"import os, requests, urllib.parse, urllib.request\n",
"\n",
"# GitHub pages url\n",
"url = \"https://ndcbe.github.io/cbe67701-uncertainty-quantification/\"\n",
"\n",
"# relative file paths to download\n",
"# this is the only line of code you need to change\n",
"file_paths = ['data/quarterbacks3.csv']\n",
"\n",
"# loop over all files to download\n",
"for file_path in file_paths:\n",
" print(\"Checking for\",file_path)\n",
" # split each file_path into a folder and filename\n",
" stem, filename = os.path.split(file_path)\n",
" \n",
" # check if the folder name is not empty\n",
" if stem:\n",
" # check if the folder exists\n",
" if not os.path.exists(stem):\n",
" print(\"\\tCreating folder\",stem)\n",
" # if the folder does not exist, create it\n",
" os.mkdir(stem)\n",
" # if the file does not exist, create it by downloading from GitHub pages\n",
" if not os.path.isfile(file_path):\n",
" file_url = urllib.parse.urljoin(url,\n",
" urllib.request.pathname2url(file_path))\n",
" print(\"\\tDownloading\",file_url)\n",
" with open(file_path, 'wb') as f:\n",
" f.write(requests.get(file_url).content)\n",
" else:\n",
" print(\"\\tFile found!\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "LEoniqyVfpQw",
"nbpages": {
"level": 1,
"link": "[3.2 Principal Component Analysis](https://ndcbe.github.io/cbe67701-uncertainty-quantification/03.02-Contributed-Example.html#3.2-Principal-Component-Analysis)",
"section": "3.2 Principal Component Analysis"
}
},
"source": [
"The data set contains the 2019 season statistics for the starting quarterbacks of all 32 NFL teams. The statistics are from https://www.pro-football-reference.com/years/2019/passing.htm. Quarterback salaries are also included and are from https://www.spotrac.com/nfl/rankings/2019/average/quarterback/."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 153
},
"colab_type": "code",
"id": "JC_sAcaEHPM8",
"nbpages": {
"level": 1,
"link": "[3.2 Principal Component Analysis](https://ndcbe.github.io/cbe67701-uncertainty-quantification/03.02-Contributed-Example.html#3.2-Principal-Component-Analysis)",
"section": "3.2 Principal Component Analysis"
},
"outputId": "91293a06-a022-4d3c-946a-bd92ef0b430b"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Data Player Team Age Games Games Started Wins Losses \\\n",
"0 1 Jared Goff LAR 25 16 16 9 7 \n",
"1 2 Jameis Winston TAM 25 16 16 7 9 \n",
"2 3 Matt Ryan ATL 34 15 15 7 8 \n",
"3 4 Tom Brady NWE 42 16 16 12 4 \n",
"4 5 Carson Wentz PHI 27 16 16 9 7 \n",
"\n",
" Salary Completions ... Yards/Cmp Yards/Game Rating Sacks \\\n",
"0 33500000 394 ... 11.8 289.9 86.5 22 \n",
"1 6337819 380 ... 13.4 319.3 84.3 47 \n",
"2 30000000 408 ... 10.9 297.7 92.1 48 \n",
"3 23000000 373 ... 10.9 253.6 88.0 27 \n",
"4 32000000 388 ... 10.4 252.4 93.1 37 \n",
"\n",
" Sack Yards Net Yards/Attempt Adjusted NY/A Sack% 4th Quarter Comeback \\\n",
"0 170 6.90 6.46 3.4 1.0 \n",
"1 282 7.17 6.15 7.0 2.0 \n",
"2 316 6.25 6.08 7.2 3.0 \n",
"3 185 6.05 6.24 4.2 1.0 \n",
"4 230 5.91 6.26 5.7 2.0 \n",
"\n",
" Game Winning Drive \n",
"0 2 \n",
"1 2 \n",
"2 2 \n",
"3 1 \n",
"4 4 \n",
"\n",
"[5 rows x 31 columns]\n"
]
}
],
"source": [
"# Import all libraries\n",
"import pandas as pd\n",
"import io\n",
"import numpy as np\n",
"from sklearn.decomposition import PCA\n",
"from sklearn import preprocessing\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Put data into array\n",
"qb_data = pd.read_csv('./data/quarterbacks3.csv',delimiter=\"\\t\")\n",
"# Preview\n",
"print(qb_data.head())"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "pqTmFjs7i6eS",
"nbpages": {
"level": 1,
"link": "[3.2 Principal Component Analysis](https://ndcbe.github.io/cbe67701-uncertainty-quantification/03.02-Contributed-Example.html#3.2-Principal-Component-Analysis)",
"section": "3.2 Principal Component Analysis"
}
},
"source": [
"Now perform PCA on the data set.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 295
},
"colab_type": "code",
"id": "g7agRpq0yAXc",
"nbpages": {
"level": 1,
"link": "[3.2 Principal Component Analysis](https://ndcbe.github.io/cbe67701-uncertainty-quantification/03.02-Contributed-Example.html#3.2-Principal-Component-Analysis)",
"section": "3.2 Principal Component Analysis"
},
"outputId": "673ef9ed-4bc0-49f7-b682-0aa6cab0d04e"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:7: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by the scale function.\n",
" import sys\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
}
],
"metadata": {
"colab": {
"authorship_tag": "ABX9TyOu19Zi7O5/wlDu9Kl/3tjP",
"collapsed_sections": [],
"include_colab_link": true,
"name": "03_02_Contributed_Example.ipynb",
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}