{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "KUdg15uf30TU" }, "source": [ "# Summary Statistics\n", "\n", "**Further Reading**: §1.2 in Navidi (2015)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "iCqLvLrU30TW", "tags": [] }, "source": [ "## Learning Objectives\n", "\n", "After studying this notebook and your notes, completing the activities, and asking questions in class, you should be able to:\n", "* Compute descriptive statistics for data using Pandas (Python package)\n", "* Interpret the elements of a covariance matrix\n", "* Explain why some data distributions have a very different median and mean" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Two Types of Data: Numerical and Categorical" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As engineers, you will encounter both **numerical** (quantitative) and **categorical** (qualitative) data.\n", "\n", "Unfortunately, Example 1.8 from the textbook was not included in the data files on the publisher's website. But do not fear! We can recreate the table from a dictionary." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Specimen Torque Failure Location\n", "0 1 165 Weld\n", "1 2 237 Beam\n", "2 3 222 Beam\n", "3 4 255 Beam\n", "4 5 194 Weld\n" ] }, { "data": { "text/html": [ "
\n", " | Specimen | \n", "Torque | \n", "Failure Location | \n", "
---|---|---|---|
0 | \n", "1 | \n", "165 | \n", "Weld | \n", "
1 | \n", "2 | \n", "237 | \n", "Beam | \n", "
2 | \n", "3 | \n", "222 | \n", "Beam | \n", "
3 | \n", "4 | \n", "255 | \n", "Beam | \n", "
4 | \n", "5 | \n", "194 | \n", "Weld | \n", "
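The torque table above can be rebuilt from a plain Python dictionary. The following is a minimal sketch; the variable names `data`, `df`, `numeric_cols`, and `categorical_cols` are illustrative assumptions, not names taken from the original notebook.

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values.
data = {
    "Specimen": [1, 2, 3, 4, 5],
    "Torque": [165, 237, 222, 255, 194],
    "Failure Location": ["Weld", "Beam", "Beam", "Beam", "Weld"],
}
df = pd.DataFrame(data)
print(df)

# Separate the numerical columns from the categorical ones by dtype.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print("numerical:", numeric_cols)
print("categorical:", categorical_cols)
```

Note that `Specimen` is stored as an integer, so pandas treats it as numerical even though it acts as a label here.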
\n", " | PM | \n", "
---|---|
0 | \n", "1.50 | \n", "
1 | \n", "0.87 | \n", "
2 | \n", "1.12 | \n", "
3 | \n", "1.25 | \n", "
4 | \n", "3.46 | \n", "
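As an illustration of the sample mean, the sketch below uses only the five PM values displayed above, not the full data set, so it does not answer the activity that follows. The names `pm_head`, `mean_by_hand`, and `mean_pandas` are assumptions made for this example.

```python
import pandas as pd

# Only the five rows shown in the table above (not the complete data set).
pm_head = pd.Series([1.50, 0.87, 1.12, 1.25, 3.46], name="PM")

# Sample mean from the definition: sum of the observations divided by n.
mean_by_hand = sum(pm_head) / len(pm_head)

# The same quantity using the built-in pandas method.
mean_pandas = pm_head.mean()

print(mean_by_hand, mean_pandas)
```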
Optional Home Activity
\n", " Calculate the sample mean for PM in the high altitude data set. Store your answer (a float) in the variable ans_14b_1.\n", "Class Activity
\n", " If the measured quantity $X$ is velocity with units m/s, then what are the units of variance $s^2$?\n", "\n", " | A | \n", "B | \n", "
---|---|---|
0 | \n", "0 | \n", "4 | \n", "
1 | \n", "0 | \n", "4 | \n", "
2 | \n", "5 | \n", "5 | \n", "
3 | \n", "10 | \n", "6 | \n", "
4 | \n", "10 | \n", "6 | \n", "
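The two columns above share a mean but differ in spread. A short sketch, assuming the frame is named `df_ab` (a name chosen for this example), compares the pandas results with the $n-1$ definition of sample variance:

```python
import pandas as pd

df_ab = pd.DataFrame({"A": [0, 0, 5, 10, 10], "B": [4, 4, 5, 6, 6]})

means = df_ab.mean()
variances = df_ab.var()  # pandas divides by n - 1 by default (ddof=1)
stds = df_ab.std()       # square root of the sample variance

# Sample variance of column A written out from the definition.
n = len(df_ab)
var_a_by_hand = ((df_ab["A"] - means["A"]) ** 2).sum() / (n - 1)

print(means)
print(variances)
print(stds)
```

Column A is spread much more widely around the common mean than column B, which is what the larger variance and standard deviation report.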
Optional Home Activity
\n", " Please take a minute to verify (on paper or in your head) that both data sets do in fact have a mean of 5.\n", "Optional Home Activity
\n", " Calculate the standard deviation (using $n-1$) for the particulate matter example. Store the results in the dictionary ans_14b_2 with keys \"low\" and \"high\". Hint: You'll need to calculate the standard deviation as a float and then save the answers into the dictionary.\n", "Optional Home Activity
\n", " Show that the following formulae are equivalent to the definitions for sample variance and sample standard deviation given above. This is excellent practice for the next exam.\n", "Optional Home Activity
\n", " Calculate the median for the particulate matter example. Store the results in the dictionary ans_14b_3 with keys \"low\" and \"high\". Hint: You'll need to calculate the median as a float and then save the answers into the dictionary.\n", "\n", " | A | \n", "B | \n", "
---|---|---|
0 | \n", "0 | \n", "4 | \n", "
1 | \n", "0 | \n", "4 | \n", "
2 | \n", "5 | \n", "5 | \n", "
3 | \n", "10 | \n", "6 | \n", "
4 | \n", "10 | \n", "6 | \n", "
\n", " | A | \n", "B | \n", "
---|---|---|
0 | \n", "0 | \n", "4 | \n", "
1 | \n", "10 | \n", "6 | \n", "
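The two-row table above is what pandas returns when several values tie for most frequent. A minimal sketch on the same A/B data (the name `df_ab` is an assumption for this example):

```python
import pandas as pd

df_ab = pd.DataFrame({"A": [0, 0, 5, 10, 10], "B": [4, 4, 5, 6, 6]})

medians = df_ab.median()  # middle value of each sorted column
modes = df_ab.mode()      # one row per tied mode, sorted ascending

print(medians)
print(modes)
```

In column A both 0 and 10 appear twice, so `mode()` returns both; the same happens with 4 and 6 in column B.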
Optional Home Activity
\n", " Calculate the mode for the high elevation particulate matter example. Store the results in the float ans_14b_4. Hint: You'll need to either extract the float from the pandas output with key 0 or manually enter your answer.\n", "\n", " | A | \n", "B | \n", "
---|---|---|
0 | \n", "0 | \n", "4 | \n", "
1 | \n", "0 | \n", "4 | \n", "
2 | \n", "5 | \n", "5 | \n", "
3 | \n", "10 | \n", "6 | \n", "
4 | \n", "10 | \n", "6 | \n", "
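Quartiles and other percentiles can be computed with `quantile`. The sketch below uses the A/B data again (with the assumed name `df_ab`). By default pandas interpolates linearly between neighboring data points, which may differ from the hand-calculation rule in a textbook, so the `interpolation` argument is shown as well.

```python
import pandas as pd

df_ab = pd.DataFrame({"A": [0, 0, 5, 10, 10], "B": [4, 4, 5, 6, 6]})

# First, second, and third quartiles of each column.
quartiles = df_ab.quantile([0.25, 0.5, 0.75])
print(quartiles)

# The 40th percentile of A falls between two data points (0 and 5).
p40_linear = df_ab["A"].quantile(0.4)                         # interpolated
p40_lower = df_ab["A"].quantile(0.4, interpolation="lower")   # data point below
p40_higher = df_ab["A"].quantile(0.4, interpolation="higher") # data point above
print(p40_linear, p40_lower, p40_higher)
```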
Optional Home Activity
\n", " Calculate the 80th percentile for the high elevation particulate matter example. Store the result in the float ans_14b_5.\n", "Class Activity
\n", " Print the first and last five elements of the data set.\n", "\n", " | Total | \n", "P1 | \n", "P2 | \n", "P3 | \n", "P4 | \n", "P5 | \n", "
---|---|---|---|---|---|---|
0 | \n", "60.00 | \n", "8.0 | \n", "14.0 | \n", "10.0 | \n", "17.00 | \n", "14.00 | \n", "
1 | \n", "59.75 | \n", "8.0 | \n", "14.0 | \n", "10.0 | \n", "16.75 | \n", "14.00 | \n", "
2 | \n", "59.75 | \n", "8.0 | \n", "14.0 | \n", "10.0 | \n", "16.75 | \n", "14.00 | \n", "
3 | \n", "59.75 | \n", "8.0 | \n", "14.0 | \n", "10.0 | \n", "16.75 | \n", "14.00 | \n", "
4 | \n", "59.25 | \n", "8.0 | \n", "14.0 | \n", "10.0 | \n", "16.50 | \n", "13.75 | \n", "
\n", " | Total | \n", "P1 | \n", "P2 | \n", "P3 | \n", "P4 | \n", "P5 | \n", "
---|---|---|---|---|---|---|
61 | \n", "42.25 | \n", "3.25 | \n", "12.00 | \n", "10.00 | \n", "11.25 | \n", "8.75 | \n", "
62 | \n", "39.00 | \n", "3.25 | \n", "8.50 | \n", "10.00 | \n", "8.25 | \n", "9.25 | \n", "
63 | \n", "22.00 | \n", "2.50 | \n", "5.50 | \n", "5.75 | \n", "6.75 | \n", "2.50 | \n", "
64 | \n", "21.75 | \n", "4.00 | \n", "7.00 | \n", "4.75 | \n", "1.00 | \n", "5.00 | \n", "
65 | \n", "15.50 | \n", "1.25 | \n", "1.75 | \n", "3.75 | \n", "10.25 | \n", "1.50 | \n", "
Class Activity
\n", " Compute the mean, median, mode, standard deviation, and quartiles using pandas.\n", "\n", " | Total | \n", "P1 | \n", "P2 | \n", "P3 | \n", "P4 | \n", "P5 | \n", "
---|---|---|---|---|---|---|
0 | \n", "58.5 | \n", "5.0 | \n", "14.0 | \n", "10.0 | \n", "17.0 | \n", "12.5 | \n", "
\n", " | Total | \n", "P1 | \n", "P2 | \n", "P3 | \n", "P4 | \n", "P5 | \n", "
---|---|---|---|---|---|---|
count | \n", "66.000000 | \n", "66.000000 | \n", "66.000000 | \n", "66.000000 | \n", "66.000000 | \n", "66.000000 | \n", "
mean | \n", "51.261364 | \n", "5.734848 | \n", "12.875000 | \n", "9.435606 | \n", "14.924242 | \n", "11.083333 | \n", "
std | \n", "8.487823 | \n", "1.839146 | \n", "2.156051 | \n", "1.281008 | \n", "3.065932 | \n", "3.025956 | \n", "
min | \n", "15.500000 | \n", "1.250000 | \n", "1.750000 | \n", "3.750000 | \n", "1.000000 | \n", "1.500000 | \n", "
25% | \n", "49.312500 | \n", "5.000000 | \n", "12.312500 | \n", "9.500000 | \n", "13.562500 | \n", "9.937500 | \n", "
50% | \n", "52.500000 | \n", "5.000000 | \n", "14.000000 | \n", "10.000000 | \n", "16.500000 | \n", "12.250000 | \n", "
75% | \n", "56.500000 | \n", "8.000000 | \n", "14.000000 | \n", "10.000000 | \n", "17.000000 | \n", "12.937500 | \n", "
max | \n", "60.000000 | \n", "8.000000 | \n", "14.000000 | \n", "10.000000 | \n", "17.000000 | \n", "14.000000 | \n", "
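Each row of a `describe()` table like the one above corresponds to an individual pandas method. Since the exam data file is not reproduced here, this sketch uses the small A/B data from earlier in the notebook; the names `df_ab` and `summary` are assumptions for this example.

```python
import pandas as pd

df_ab = pd.DataFrame({"A": [0, 0, 5, 10, 10], "B": [4, 4, 5, 6, 6]})

summary = df_ab.describe()
print(summary)

# The same numbers, one statistic at a time.
print(df_ab.mean())          # "mean" row
print(df_ab.std())           # "std" row (n - 1 in the denominator)
print(df_ab.median())        # "50%" row
print(df_ab.quantile(0.25))  # "25%" row
print(df_ab.quantile(0.75))  # "75%" row
```

The mode is not part of `describe()` for numerical columns, so it still needs a separate call to `mode()`.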
\n", " | Total | \n", "P1 | \n", "P2 | \n", "P3 | \n", "P4 | \n", "P5 | \n", "
---|---|---|---|---|---|---|
Total | \n", "72.043138 | \n", "10.143444 | \n", "16.003365 | \n", "7.843051 | \n", "19.822028 | \n", "20.955769 | \n", "
P1 | \n", "10.143444 | \n", "3.382459 | \n", "1.928846 | \n", "0.750932 | \n", "2.007488 | \n", "2.357051 | \n", "
P2 | \n", "16.003365 | \n", "1.928846 | \n", "4.648558 | \n", "1.763942 | \n", "3.881731 | \n", "4.382692 | \n", "
P3 | \n", "7.843051 | \n", "0.750932 | \n", "1.763942 | \n", "1.640982 | \n", "2.086393 | \n", "1.952564 | \n", "
P4 | \n", "19.822028 | \n", "2.007488 | \n", "3.881731 | \n", "2.086393 | \n", "9.399942 | \n", "3.725641 | \n", "
P5 | \n", "20.955769 | \n", "2.357051 | \n", "4.382692 | \n", "1.952564 | \n", "3.725641 | \n", "9.156410 | \n", "
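To read a covariance matrix such as the one above, it helps to reproduce a small one by hand. The sketch below uses the A/B data from earlier in the notebook (the names `df_ab`, `cov`, and `cov_ab_by_hand` are assumptions for this example) and checks that the diagonal holds the sample variances.

```python
import pandas as pd

df_ab = pd.DataFrame({"A": [0, 0, 5, 10, 10], "B": [4, 4, 5, 6, 6]})

cov = df_ab.cov()  # sample covariance matrix, n - 1 in the denominator
print(cov)

# Off-diagonal entry written out from the definition of sample covariance.
n = len(df_ab)
dev_a = df_ab["A"] - df_ab["A"].mean()
dev_b = df_ab["B"] - df_ab["B"].mean()
cov_ab_by_hand = (dev_a * dev_b).sum() / (n - 1)

# Diagonal entries are the sample variances of each column.
print(cov.loc["A", "A"], df_ab["A"].var())
print(cov.loc["B", "B"], df_ab["B"].var())
print(cov_ab_by_hand)
```

The matrix is symmetric because the covariance of A with B equals the covariance of B with A. The same check works on the exam table: the `Total` diagonal entry, 72.043138, is the square of the `std` value 8.487823 reported by `describe()`.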
Class Activity
\n", " What are the units of sample covariance?\n", "Class Activity
\n", " What are the units of sample correlation?\n", "\n", " | Total | \n", "P1 | \n", "P2 | \n", "P3 | \n", "P4 | \n", "P5 | \n", "
---|---|---|---|---|---|---|
Total | \n", "1.000000 | \n", "0.649790 | \n", "0.874492 | \n", "0.721335 | \n", "0.761709 | \n", "0.815915 | \n", "
P1 | \n", "0.649790 | \n", "1.000000 | \n", "0.486432 | \n", "0.318737 | \n", "0.356020 | \n", "0.423536 | \n", "
P2 | \n", "0.874492 | \n", "0.486432 | \n", "1.000000 | \n", "0.638665 | \n", "0.587224 | \n", "0.671768 | \n", "
P3 | \n", "0.721335 | \n", "0.318737 | \n", "0.638665 | \n", "1.000000 | \n", "0.531229 | \n", "0.503722 | \n", "
P4 | \n", "0.761709 | \n", "0.356020 | \n", "0.587224 | \n", "0.531229 | \n", "1.000000 | \n", "0.401583 | \n", "
P5 | \n", "0.815915 | \n", "0.423536 | \n", "0.671768 | \n", "0.503722 | \n", "0.401583 | \n", "1.000000 | \n", "
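Each off-diagonal correlation above can be recovered from the covariance matrix by dividing a covariance by the product of the two standard deviations. A short arithmetic sketch using the `Total` and `P1` entries copied from the covariance table (the variable names are assumptions for this example):

```python
import math

# Values copied from the covariance matrix of the exam data.
cov_total_p1 = 10.143444
var_total = 72.043138
var_p1 = 3.382459

# r = cov(x, y) / (s_x * s_y), where s is the square root of the variance.
r_total_p1 = cov_total_p1 / (math.sqrt(var_total) * math.sqrt(var_p1))
print(round(r_total_p1, 4))  # 0.6498
```

This agrees with the 0.649790 entry for `Total` and `P1` in the correlation matrix, up to the rounding of the inputs.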
Class Activity
\n", " Interpret the correlation matrix for the exam data. Why is the diagonal exactly 1.0? What would a negative correlation mean?\n", "\n", " | Total | \n", "P1 | \n", "P2 | \n", "P3 | \n", "P4 | \n", "P5 | \n", "
---|---|---|---|---|---|---|
Total | \n", "0.020012 | \n", "0.021132 | \n", "0.019052 | \n", "0.013072 | \n", "0.019433 | \n", "0.024947 | \n", "
P1 | \n", "0.021132 | \n", "0.052851 | \n", "0.017222 | \n", "0.009387 | \n", "0.014761 | \n", "0.021045 | \n", "
P2 | \n", "0.019052 | \n", "0.017222 | \n", "0.023717 | \n", "0.012600 | \n", "0.016310 | \n", "0.022361 | \n", "
P3 | \n", "0.013072 | \n", "0.009387 | \n", "0.012600 | \n", "0.016410 | \n", "0.012273 | \n", "0.013947 | \n", "
P4 | \n", "0.019433 | \n", "0.014761 | \n", "0.016310 | \n", "0.012273 | \n", "0.032526 | \n", "0.015654 | \n", "
P5 | \n", "0.024947 | \n", "0.021045 | \n", "0.022361 | \n", "0.013947 | \n", "0.015654 | \n", "0.046716 | \n", "
\n", " | Total | \n", "P1 | \n", "P2 | \n", "P3 | \n", "P4 | \n", "P5 | \n", "
---|---|---|---|---|---|---|
Total | \n", "1.000000 | \n", "0.649790 | \n", "0.874492 | \n", "0.721335 | \n", "0.761709 | \n", "0.815915 | \n", "
P1 | \n", "0.649790 | \n", "1.000000 | \n", "0.486432 | \n", "0.318737 | \n", "0.356020 | \n", "0.423536 | \n", "
P2 | \n", "0.874492 | \n", "0.486432 | \n", "1.000000 | \n", "0.638665 | \n", "0.587224 | \n", "0.671768 | \n", "
P3 | \n", "0.721335 | \n", "0.318737 | \n", "0.638665 | \n", "1.000000 | \n", "0.531229 | \n", "0.503722 | \n", "
P4 | \n", "0.761709 | \n", "0.356020 | \n", "0.587224 | \n", "0.531229 | \n", "1.000000 | \n", "0.401583 | \n", "
P5 | \n", "0.815915 | \n", "0.423536 | \n", "0.671768 | \n", "0.503722 | \n", "0.401583 | \n", "1.000000 | \n", "
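The scaled covariance entries above appear consistent with dividing each column by its maximum score (for example, 72.043138 / 60² ≈ 0.020012 for `Total`). The sketch below checks numerically what such scaling does, using only the `P1` and `P5` values from the last five rows shown earlier and the maxima 8 and 14 from the `describe()` table. The names `tail`, `max_points`, and `scaled` are assumptions for this example.

```python
import pandas as pd

# P1 and P5 scores from the last five rows of the exam data shown earlier.
tail = pd.DataFrame({
    "P1": [3.25, 3.25, 2.50, 4.00, 1.25],
    "P5": [8.75, 9.25, 2.50, 5.00, 1.50],
})

# Divide each column by its maximum possible score (aligned by column name).
max_points = pd.Series({"P1": 8.0, "P5": 14.0})
scaled = tail / max_points

cov_raw, cov_scaled = tail.cov(), scaled.cov()
corr_raw, corr_scaled = tail.corr(), scaled.corr()

# Covariance shrinks by the product of the two scale factors...
print(cov_raw.loc["P1", "P5"] / (8.0 * 14.0), cov_scaled.loc["P1", "P5"])
# ...while the correlation is the same before and after scaling.
print(corr_raw.loc["P1", "P5"], corr_scaled.loc["P1", "P5"])
```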
Class Activity
\n", " Why is the correlation matrix not changed by scaling?\n", "\n", " | Specimen | \n", "Torque | \n", "Failure Location | \n", "
---|---|---|---|
0 | \n", "1 | \n", "165 | \n", "Weld | \n", "
1 | \n", "2 | \n", "237 | \n", "Beam | \n", "
2 | \n", "3 | \n", "222 | \n", "Beam | \n", "
3 | \n", "4 | \n", "255 | \n", "Beam | \n", "
4 | \n", "5 | \n", "194 | \n", "Weld | \n", "
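Categorical columns such as `Failure Location` are usually summarized with counts rather than means. The following is a minimal sketch on the torque table above; the names `df`, `counts`, and `mean_torque` are assumptions for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Specimen": [1, 2, 3, 4, 5],
    "Torque": [165, 237, 222, 255, 194],
    "Failure Location": ["Weld", "Beam", "Beam", "Beam", "Weld"],
})

# Frequency of each category.
counts = df["Failure Location"].value_counts()
print(counts)

# A numerical column can also be summarized within each category.
mean_torque = df.groupby("Failure Location")["Torque"].mean()
print(mean_torque)
```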