9.1. Sampling

9.1.1. Learning Objectives

After studying this notebook and your notes, completing the activities, and asking questions in class, you should be able to:

  • Give at least three examples of how an engineer would use statistics.

  • Define statistical vocabulary such as population, random sample, etc.

  • Explain what statistical independence means.

Note

The home activities in this chapter are optional, and you already completed many of them in Chapter 1. We recommend revisiting them for practice.

9.1.2. Basic Vocabulary and an Example

Here is a central procedure that underpins the entire field of statistics:

  1. Collect a sample of data.

  2. Analyze the data to infer something about a population.

We will make this concrete with an example. Imagine we are researching nutrition and health of college students. We want to know the average caloric intake (calories eaten per day) of the entire undergraduate population at Notre Dame.

Class Activity

Brainstorm at least two challenges with measuring the caloric intake of the entire student population.

Activity Answer:

For the reasons you just listed and more, we decide it is infeasible to accurately record everything that everyone on campus eats. Instead, we decide to randomly select a sample of students and have them keep a food diary. Of course, we’ll have to train them to make accurate measurements, compensate them for their time, and take additional steps to estimate the accuracy of their journal entries.

Using this example, let’s introduce some vocabulary:

  • Population: The entire collection of objects or outcomes about which information is sought.
    Our example: We want to know about the eating habits of the entire ND undergraduate student body.

  • Sample: A subset of the population, containing the objects or outcomes that are actually observed.
    Our example: We ask a small group of ND undergraduates to keep a food journal.

  • Simple Random Sample: A sample chosen at random so that every possible collection of \(n\) population items is equally likely to make up the sample.
    Our example: We randomly select \(n\) undergraduate students by drawing names out of a well-mixed hat.

  • Convenience Sample: A sample obtained in some convenient way rather than by a well-defined random method.
    Our example: We ask a few classes on campus to keep food journals.
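The defining property of a simple random sample is that every possible subset of size \(n\) is equally likely. The sketch below checks this empirically, assuming a tiny hypothetical population of four students with invented names. It relies on Python's `random.sample`, which draws without replacement. The simulation counts how often each of the \(\binom{4}{2} = 6\) possible samples of two students appears.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical population of four students (names invented for illustration).
population = ["Ana", "Ben", "Cho", "Dev"]
n = 2  # sample size

rng = random.Random(0)  # fixed seed so the run is reproducible
trials = 60_000

# random.sample draws n distinct items without replacement; frozenset
# ignores the order of the draw so we can count which subset came up.
counts = Counter(frozenset(rng.sample(population, n)) for _ in range(trials))

# There are C(4, 2) = 6 possible samples; each should appear about 1/6 of the time.
for subset in combinations(population, n):
    freq = counts[frozenset(subset)] / trials
    print(sorted(subset), f"{freq:.3f}")
```

Each printed frequency should land near 0.167. Drawing names from a well-mixed hat is the physical version of the same procedure.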

A fundamental limitation of convenience samples is that they often introduce systematic biases.
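A small simulation can make "systematic bias" concrete. It rests on invented assumptions, and none of the numbers are real ND data. The synthetic population has 6,000 students, and 10% of them are varsity athletes who eat more on average. The convenience sample comes from a hypothetical class made up entirely of athletes. It is compared with a simple random sample of the same size.

```python
import random
import statistics

rng = random.Random(1)

# Synthetic population: 6,000 students, the first 600 are athletes.
# Mean intakes (3000 vs. 2000 kcal/day) are invented for illustration only.
population = []
for i in range(6000):
    athlete = i < 600
    mean_kcal = 3000 if athlete else 2000
    population.append({"athlete": athlete, "kcal": rng.gauss(mean_kcal, 300)})

true_mean = statistics.mean(s["kcal"] for s in population)

# Convenience sample: 30 students from a (hypothetical) class full of athletes.
convenience = [s for s in population if s["athlete"]][:30]
# Simple random sample of the same size from the whole population.
srs = rng.sample(population, 30)

conv_mean = statistics.mean(s["kcal"] for s in convenience)
srs_mean = statistics.mean(s["kcal"] for s in srs)
print(f"population mean:    {true_mean:7.1f}")
print(f"convenience mean:   {conv_mean:7.1f}")  # lands far above the population mean
print(f"random-sample mean: {srs_mean:7.1f}")   # lands near the population mean
```

Collecting more students from the same class would not remove this gap. The error comes from how the students were selected, not from having too few of them.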

Class Activity

In two sentences, describe how asking only a few classes to keep food journals could lead us to misguided conclusions about the entire population of ND undergraduates. How would using a random sample help safeguard against some of these biases?

Activity Answer:

Alright, so we randomly select 30 undergraduates and strap GoPro cameras to their heads. We painstakingly analyze the videos to verify that their food journals are 100% accurate. We then calculate that the average caloric intake of our sample is 2023.2 kilocalories per day.

Is this the true average of the population? In other words, if we somehow measured the caloric intake of the entire ND undergraduate student population, should we expect the average to be exactly 2023.2 kcal/day?

No. Why? Because we observed only a randomly selected sample, not the entire population, we expect to see sampling variation. In other words, if we collected another sample of 30 different randomly selected students, we would expect a slightly different average. We will soon see how probability theory gives us a mathematical framework to quantify this variability.
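Sampling variation is easy to see in simulation. The sketch below assumes a synthetic stand-in for the student body: 8,000 daily intakes drawn from a normal distribution with mean 2100 and standard deviation 400 kcal/day. These numbers are illustrative, not ND data. The code draws five independent simple random samples of 30 students and prints each sample mean next to the population mean.

```python
import random
import statistics

rng = random.Random(2023)

# Synthetic population of daily intakes (kcal/day); NOT real ND data.
population = [rng.gauss(2100, 400) for _ in range(8000)]
mu = statistics.mean(population)  # the "true" population mean

# Five simple random samples of 30 students each.
sample_means = [statistics.mean(rng.sample(population, 30)) for _ in range(5)]

print(f"population mean: {mu:.1f}")
for k, xbar in enumerate(sample_means, start=1):
    print(f"sample {k} mean:  {xbar:.1f}")
```

Every sample mean is close to the population mean, but none matches it exactly, and the five values differ from one another. Quantifying that spread is the job of the probability tools introduced later.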

9.1.3. Conceptual versus Tangible Populations

There are two types of populations:

  • Tangible Population: A population that is finite. When we sample without replacement, the number of items left to choose from decreases by one each time an item is sampled.
    Example: The population of students.

  • Conceptual Population: A population that is infinite, consisting of all the values that could possibly be observed.
    Example: An engineer measures the concentration of a mixture five times. The population is the set of all concentration measurements that could possibly have been made; the five actual measurements are a sample from it.
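The two kinds of population can be contrasted in code under some simplifying assumptions. The tangible population is a hypothetical list of 100 student IDs, and each draw without replacement removes one ID from the pool. The conceptual population is modeled, as an assumption, by a hypothetical function `measure_concentration`. It returns a true value of 5.0 g/L plus Gaussian measurement error, and it can be called any number of times without exhausting anything.

```python
import random

rng = random.Random(7)

# Tangible population: a finite list of hypothetical student IDs.
remaining = list(range(1, 101))  # 100 students
chosen = []
for _ in range(3):
    pick = rng.choice(remaining)
    remaining.remove(pick)  # sampled without replacement: pool shrinks by one
    chosen.append(pick)
    print(f"picked {pick:3d}, {len(remaining)} students remain")

# Conceptual population: every measurement the engineer *could* make.
# Assumed model: true concentration 5.0 g/L plus Gaussian error (sd 0.1).
def measure_concentration():
    return 5.0 + rng.gauss(0, 0.1)

measurements = [measure_concentration() for _ in range(5)]
print([round(m, 3) for m in measurements])
```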

Class Activity

With a partner, for each of the following scenarios, i) define the population and ii) state whether it is conceptual or tangible.

  1. A chemical process is run 15 times, and the yield is measured 15 times.

  2. In a clinical trial to test a new drug that is designed to lower cholesterol, 100 people with high cholesterol levels are recruited to try the new drug.

Class Activity Answer:

9.1.4. Independence

The items in a sample are independent if knowing the values of some of the items does not help to predict the values of the others. Because tangible populations are finite, samples drawn from them without replacement are not strictly independent. For example, if we choose student “John Doe” to keep a food journal, we know that “John Doe” will not be chosen again (assuming everyone has a unique name and we are sampling without replacement). In practice, however, we often treat a simple random sample as independent if it contains less than 5% of the population.
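A short calculation shows why the 5% rule of thumb works. Suppose, hypothetically, that \(K\) of the \(N\) items in a tangible population have some trait. Before anything is drawn, the chance that the second draw has the trait is \(K/N\). If the first draw had the trait and was not replaced, that chance becomes \((K-1)/(N-1)\). The sketch compares these two values for a small and a large population. The helper names `draw_probs` and `treat_as_independent` are illustrative, not from any library.

```python
from fractions import Fraction

def draw_probs(N, K):
    """Population of N items, K with a (hypothetical) trait, sampled without
    replacement. Return P(2nd draw has trait) and P(2nd has trait | 1st did)."""
    p_unconditional = Fraction(K, N)      # same as for the first draw, by symmetry
    p_given_first = Fraction(K - 1, N - 1)
    return p_unconditional, p_given_first

def treat_as_independent(n, N, threshold=0.05):
    # Rule of thumb from the text: the sample is less than 5% of the population.
    return n / N < threshold

for N, K in [(10, 5), (10_000, 5_000)]:
    p, q = draw_probs(N, K)
    print(f"N={N}: P(trait)={float(p):.5f}, P(trait | first had trait)={float(q):.5f}")
# N=10: P(trait)=0.50000, P(trait | first had trait)=0.44444
# N=10000: P(trait)=0.50000, P(trait | first had trait)=0.49995

print(treat_as_independent(30, 8000))  # True: 30/8000 is under 1%
print(treat_as_independent(30, 100))   # False: 30/100 is 30%
```

When the population is large relative to the sample, removing one item barely changes the probabilities. Treating the draws as independent then introduces only a negligible error.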