Do you think the distribution of the weights of 47 passengers might vary from boat to boat, even if they were randomly selected from the same (large) population of American adults?
Section 7.1 Investigation 2.4: The Ethan Allen
On October 2, 2005, the tour boat Ethan Allen capsized on Lake George in upstate New York. All 47 passengers on board were thrown into the lake, and 20 of them drowned. Although there were some claims that a large wave had hit the boat, an investigation by New York State Police (2006) concluded that the boat had been overloaded with heavy passengers. In fact, the average weight of the 47 passengers was found to be 174 pounds, whereas the boat was designed with the assumption that passengers would average 140 pounds.

Ethan Allen tour boat on Lake George
Checkpoint 7.1.1. Conjecture about weight distribution.
Checkpoint 7.1.2. Sketch population distribution.
Data from the Centers for Disease Control and Prevention indicate that weights of American adults in 2005 had a mean of 167 pounds and a standard deviation of 35 pounds. (To convey that these are population values, we will use Greek letters to represent their values, \(\mu = 167\) and \(\sigma = 35\text{.}\)) Use this information to sketch a possible distribution of the weights of the population of adult Americans.
Engineers estimated the maximum weight capacity of passengers that the Ethan Allen could accommodate to be 7500 pounds. If the tour boat company consistently accepted 47 passengers, what we want to know is the probability that the combined weight of the 47 passengers would exceed this capacity.
Checkpoint 7.1.3. Identify observational units and variable.
Checkpoint 7.1.4. Convert total weight to average weight.
So, to see how often the boat was sent out with too much weight, we need to know about the distribution of the average weight of 47 passengers from different boats (samples). Think about a distribution of sample mean weights from different random samples of 47 passengers, repeatedly selected from the population of adult Americans.
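The conversion from a limit on total weight to a limit on average weight can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the investigation.

```python
# Convert the boat's total-weight capacity into an equivalent
# limit on the average passenger weight.
capacity_lbs = 7500   # engineers' estimated capacity (from the text)
n_passengers = 47     # passengers per trip (from the text)

mean_limit_lbs = capacity_lbs / n_passengers
print(round(mean_limit_lbs, 3))  # 159.574
```

The combined weight exceeds 7500 lbs exactly when the sample mean exceeds this value, which is why the rest of the investigation works with averages.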
Checkpoint 7.1.5. Center of sample means distribution.
Checkpoint 7.1.6. Variability of sample means.
Do you think the distribution of sample means would have more variability, less variability, or the same variability as the distribution of weights of individual people?
- More variability
- Less variability
- Same variability
Checkpoint 7.1.7. Compare probabilities.
- LARGER
- SMALLER
- EQUAL
Simulation Analysis.
To investigate this probability, we will generate random samples from a hypothetical population of adults' weights. Open the WeightPopulations.txt data file and pretend each column is a different population of 20,000 adult tourists, each with a population mean weight (\(\mu\)) of 167 lbs and a population standard deviation (\(\sigma\)) of 35 lbs.
Copy the data to the clipboard and then open the Sampling from Finite Population applet:
- Press Clear
- Click inside the Paste population data box and paste the data from the clipboard (or type WeightPopulations.txt and press Use Data).
- Press Use Data
Checkpoint 7.1.8. Describe population distribution.
Describe the shape, mean (\(\mu\)) and standard deviation (\(\sigma\)) of pop1 (shown in the histogram under Population Data). You can also check the Show skewness box.
- Check the Show Sampling Options box. Keep the Number of Samples at 1, specify the Sample Size to be 47, and press the Draw Samples button.
- The weights for this random sample of 47 passengers from this population are displayed in blue and in the Most Recent Sample graph.
Checkpoint 7.1.9. Describe sample distribution.
Describe the shape, center, and variability of this distribution. How does this sample distribution compare to the population distribution?
Most Recent Sample distribution:
Shape:
Mean:
Standard deviation:
Comparison (sample vs. population):
Notice that the mean of this sample (an \(\bar{x}\)) has also been added to the graph in the lower right Sampled Statistics graph.
Checkpoint 7.1.10. Generate second sample.
Press Draw Samples again. [Option: Check the "Vertical layout" box in the applet.] Did you obtain the same sample of 47 weights? Did you obtain the same sample mean? Does either of the two sample means generated thus far equal the population mean?
Checkpoint 7.1.11. Generate 10 samples total.
Continue to press Draw Samples 8 more times (for 10 samples total), and notice the variability in the resulting sample means from sample to sample. Is a pattern beginning to emerge in the Sampled Statistics distribution graph?
Checkpoint 7.1.12. Generate 1000 samples.
Now set the Number of Samples to 990 (for 1,000 total) and press Draw Samples. Describe the shape, center, and variability of the distribution of the sample means (lower right). How do these features compare to the population distribution? Which one of these features has changed the most vs. the population, and how has it changed?
Distribution of sample means (aka "sampling distribution"):
Shape:
Mean:
Standard deviation:
Comparison (sample means vs. population):
Checkpoint 7.1.13. Use simulation for probability.
How can we use your simulated distribution of sample means to decide whether it is surprising that a boat with 47 passengers would exceed the (average) weight limit by chance (random sampling error) alone?
- To count samples, specify the sample mean of interest (159.6) in the Count Samples box, use the pull-down menu to specify whether you want to count samples Greater Than, Less Than, or Beyond (both directions), then press Count.
Checkpoint 7.1.14. Count samples exceeding limit.
To investigate the question posed in Checkpoint 7.1.13, what conclusions can you draw from the count displayed by the applet?
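The applet steps above can also be mimicked in code. The following is a rough sketch, not the applet's actual implementation: it assumes a normally distributed stand-in for pop1 (generated here rather than read from WeightPopulations.txt), and the seed and variable names are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the run is repeatable

MU, SIGMA = 167, 35   # population values from the text
N_POP = 20_000        # population size described for the data file
N = 47                # passengers per boat
LIMIT = 7500 / N      # average-weight limit, about 159.574 lbs
REPS = 1000           # number of random samples to draw

# Hypothetical stand-in for pop1: normally distributed weights.
population = [random.gauss(MU, SIGMA) for _ in range(N_POP)]

# Draw repeated random samples (without replacement) and record each mean.
sample_means = [
    statistics.mean(random.sample(population, N)) for _ in range(REPS)
]

# Proportion of simulated boats whose average weight exceeds the limit.
prop_over = sum(m > LIMIT for m in sample_means) / REPS

print("mean of sample means: ", round(statistics.mean(sample_means), 2))
print("SD of sample means:   ", round(statistics.stdev(sample_means), 2))
print("proportion over limit:", prop_over)
```

The proportion printed at the end plays the same role as the applet's Count Samples result with the Greater Than option.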
Non-normal Population.
To carry out the preceding simulation analysis, we assumed that the population distribution had a normal shape. But what if the population of adult weights has a different, non-normal distribution? Will that change our findings?
Checkpoint 7.1.15. Analyze skewed population.
Now choose the data from the second column (pop2) with the Choose variable pull-down menu on the left side. Describe the shape of this population and what it means for the variable to have this shape in this context. How do the values of \(\mu\) and \(\sigma\) compare to those of pop1 (rounding a bit)?
Shape:
Mean:
Standard deviation:
Comparison to previous population:
Checkpoint 7.1.16. Sample from skewed population.
Generate 1,000 random samples of 47 individuals from this population. Produce a well-labeled sketch of the distribution of sample means below and note the values for the mean and standard deviation of the sampling distribution. How does this distribution compare to the distribution in Checkpoint 7.1.12?
Sketch:
Mean:
SD:
Comparison:
Checkpoint 7.1.17. Probability for skewed population.
Use the applet to approximate the probability of obtaining a sample mean weight of at least 159.574 lbs for a random sample of 47 passengers from population 2. Has this probability changed considerably with the change in the shape of the distribution of the population of weights?
Checkpoint 7.1.18. Compare three populations.
Repeat this analysis using the other population distribution (pop3) in the data file and summarize your observations for the three populations in the table below.
| Distribution of sample means | Shape (Skew) | Mean | Standard deviation |
|---|---|---|---|
| Normal (\(\mu = 167, \sigma = 35\)) | | | |
| Skewed right (\(\mu = 167, \sigma = 35\)) | | | |
| Uniform (\(a = 106.4, b = 227.6\)) | | | |
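A comparison across population shapes can be sketched in code as well. The three populations below are hypothetical stand-ins for pop1, pop2, and pop3, not the actual columns of the data file: the skewed population is assumed to be a shifted gamma distribution chosen to have mean 167 and SD 35, and the uniform population uses the endpoints given in the table.

```python
import random
import statistics

random.seed(2)  # arbitrary seed for repeatability
N_POP, N, REPS = 20_000, 47, 1000

def sampling_distribution(population, n=N, reps=REPS):
    """Return the means of `reps` random samples of size `n`."""
    return [statistics.mean(random.sample(population, n)) for _ in range(reps)]

# Hypothetical populations, each built to have mean near 167 and SD near 35.
populations = {
    "normal": [random.gauss(167, 35) for _ in range(N_POP)],
    # Gamma with shape 4 and scale 17.5 has mean 70 and SD 35;
    # adding 97 shifts the mean to 167 (an assumed shape for pop2).
    "skewed right": [97 + random.gammavariate(4, 17.5) for _ in range(N_POP)],
    "uniform": [random.uniform(106.4, 227.6) for _ in range(N_POP)],
}

results = {}
for name, pop in populations.items():
    means = sampling_distribution(pop)
    results[name] = (statistics.mean(means), statistics.stdev(means))
    print(f"{name:13s} mean={results[name][0]:.2f}  SD={results[name][1]:.2f}")
```

Plotting a histogram of each list of sample means (with any plotting tool) would give the shape column of the table.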
Changing the Sample Size.
Checkpoint 7.1.20. Predict effect of larger sample size.
Prediction: Now consider changing the sample size from 47 to 188 (four times larger). Make a prediction for how the shape, mean, and SD of the distribution of sample means would change (if at all).
Checkpoint 7.1.21. Test prediction with larger sample.
Make this change of sample size from 47 to 188, and generate 1000 random samples (from the uniformly distributed population, pop3). Did the shape of the distribution of sample means change very much? What about the mean of the sample means? What about the SD of the sample means? How accurate were your predictions in Checkpoint 7.1.20?
Discussion: You should see that, as with the Gettysburg Address investigation, the shape of the population is not having much effect on the distribution of the sample means! In fact, you can show that the mean of the distribution of sample means from random samples is always equal to the mean of the population (any discrepancies you find are from not simulating enough random samples) and that, assuming the population is large compared to the size of the sample, the standard deviation of the distribution of sample means is equal to \(\sigma/\sqrt{n}\text{.}\) This standard deviation formula applies when the population size is large (more than 20 times the size of the sample) or infinite (so the randomly selected observations can be considered independent).
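As a quick numerical check of the \(\sigma/\sqrt{n}\) formula, the sketch below evaluates it for the two sample sizes used in this investigation. The function name is illustrative only.

```python
import math

sigma = 35  # population SD from the text

def sd_of_sample_mean(sigma, n):
    """Theoretical SD of the sample mean for a large population."""
    return sigma / math.sqrt(n)

for n in (47, 188):
    print(n, round(sd_of_sample_mean(sigma, n), 3))
```

Because \(n\) appears under a square root, quadrupling the sample size from 47 to 188 cuts the standard deviation of the sample means in half rather than to a quarter.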
Checkpoint 7.1.22. Explain standard deviation formula.
Explain why the formula for the standard deviation of the sample mean (\(\sigma/\sqrt{n}\)) makes intuitive sense (both the \(\sigma\) component and the \(\sqrt{n}\) component).
Key Result: Central Limit Theorem.
If all possible samples of size \(n\) are selected from a large population or an infinite random process with mean \(\mu\) and standard deviation \(\sigma\text{,}\) then the sampling distribution of these sample means will have the following characteristics:
- The mean will be equal to \(\mu\text{.}\)
- The standard deviation will be equal to \(\sigma/\sqrt{n}\) (we can call this \(SD(\bar{x})\) or \(\sigma_{\bar{x}}\)).
- Central Limit Theorem: The shape will be normal if the population distribution is normal, or approximately normal if the sample size is large regardless of the shape of the population distribution.
The convention is to consider the sample size large enough if \(n > 30\text{.}\) However, this rule really depends on the shape of the population. The more non-symmetric the population distribution, the larger the sample size necessary before the distribution of sample means is reasonably modeled by a normal distribution.
The population is considered large if it is more than 20 times the size of the sample.
Discussion: In general, the shape of the distribution of sample means does not depend on the shape of the population distribution, unless you have small sample sizes. So if the population distribution itself follows a normal distribution, then we will always be willing to model the distribution of sample means with a normal distribution. However, we typically don't know the distribution of the population (that's why we need to collect data), but we can make a judgment based on the nature of the variable (e.g., biological characteristics, repeated measurements) or based on the information conveyed to us by the shape of the distribution of the sample data (e.g., normal probability plot).
In this example, a sample size of 47 appears large enough to result in a normal distribution for the distribution of sample means. However, you may have noticed a bit of a right skew when the population was skewed and in such a situation you would want a larger sample size before you were willing to model the distribution of sample means with a normal distribution.
Keep in mind that the results about the mean and standard deviation always hold for random samples: sample means cluster around the population mean and are less variable than individual observations! If the population size is not large, then a "finite population correction factor" can be applied as in Ch. 1.
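For reference, here is a small sketch of how such a correction changes the standard deviation. It assumes the correction meant here is the usual factor \(\sqrt{(N-n)/(N-1)}\) for sampling without replacement from a population of size \(N\); the function name is hypothetical.

```python
import math

def sd_of_sample_mean_finite(sigma, n, pop_size):
    """SD of the sample mean when sampling without replacement
    from a finite population of size `pop_size` (assumed standard
    finite population correction)."""
    fpc = math.sqrt((pop_size - n) / (pop_size - 1))
    return fpc * sigma / math.sqrt(n)

# Population of 20,000 as in the data file; sample of 47 passengers.
print(sd_of_sample_mean_finite(35, 47, 20_000))
```

With a population of 20,000 and a sample of 47, the correction factor is very close to 1, so the corrected value barely differs from \(\sigma/\sqrt{n}\).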
Checkpoint 7.1.23. Standardize the value.
Use the theoretical results for the mean and standard deviation of a sample mean to standardize the value of 159.57 lbs. [Hint: Start with a well-labeled sketch.] How might you interpret this value?
Checkpoint 7.1.24. Apply Central Limit Theorem.
Use the result from the Central Limit Theorem and technology (e.g., the Normal Probability Calculator applet) to estimate the probability of a sample mean weight exceeding 159.57 lbs for a random sample of 47 passengers from a population with mean \(\mu = 167\) lbs and standard deviation \(\sigma = 35\) lbs. [Hint: Shade the area of interest in your sketch in Checkpoint 7.1.23.] How does this estimated probability compare to what you found with repeated sampling from the hypothetical populations?
Checkpoint 7.1.25. Identify concerns.
Identify one concern you might have with this analysis. [Hint: What other assumption, apart from the shape of the population, was made in these simulations that may not be true in this study? Do you think this is a reasonable assumption for this study?]
Study Conclusions.
Assuming the CDC values for the mean and standard deviation of adult Americans' weights, \(\mu = 167\) lbs and \(\sigma = 35\) lbs, we believe that the distribution of sample mean weights will be well modeled by a normal distribution (based both on the not extremely skewed nature of the variable and the moderately large sample size of 47, which is larger than 30). Therefore, the Central Limit Theorem allows us to predict that the distribution of \(\bar{x}\) is approximately normal with mean 167 lbs and standard deviation \(35/\sqrt{47} \approx 5.105\) lbs. From this information, assuming the CDC data is representative of the population of Ethan Allen travelers, we can estimate the probability of obtaining a sample mean of 159.57 lbs or higher to be 0.9264. Therefore, it is not at all surprising that a boat carrying 47 American adults capsized. In fact, the surprising part might be that it didn't happen sooner!
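The normal-model calculation in this conclusion can be reproduced with Python's standard library in place of the Normal Probability Calculator applet. This is a sketch under the same assumptions as the text (\(\mu = 167\), \(\sigma = 35\), \(n = 47\)); the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 167, 35, 47
limit = 7500 / n                 # average-weight limit, about 159.574 lbs

sd_xbar = sigma / math.sqrt(n)   # SD of the sample mean, about 5.105 lbs
z = (limit - mu) / sd_xbar       # standardized value of the limit

# Probability that the sample mean exceeds the limit under the normal model.
p_exceed = 1 - NormalDist(mu, sd_xbar).cdf(limit)

print(round(z, 2), round(p_exceed, 3))
```

The result should land close to the 0.9264 reported above; small differences in the third decimal place come from how the limit and the standardized value are rounded.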
Subsection 7.1.1 Practice Problem 2.4A
Checkpoint 7.1.26. Probability for 20 passengers.
Use the Sampling from Finite Population applet or the Central Limit Theorem to estimate the probability that the sample mean of 20 randomly selected passengers exceeds 159.57 lbs, assuming a normal population with mean 167 lbs and standard deviation 35 lbs.
Checkpoint 7.1.27. Compare probabilities.
Is the probability you found in Checkpoint 7.1.26 larger or smaller than the probability you found for 47 passengers? Explain why your answer makes intuitive sense.
Checkpoint 7.1.28. Uniform population.
Repeat Checkpoint 7.1.26 assuming a uniformly distributed population of weights. How do these two probabilities compare? [Hint: Think about whether it is more appropriate to use the Sampling from Finite Population applet or the CLT to answer this question.]
Checkpoint 7.1.29. Explain limitation.
Explain why the calculation in Checkpoint 7.1.26 does not estimate the probability of the Ethan Allen sinking with 20 passengers.
Subsection 7.1.2 Practice Problem 2.4B
Checkpoint 7.1.30. Finite population.
Use the Sampling from Finite Population applet or the Central Limit Theorem to estimate the probability that the sample mean of 47 randomly selected passengers would exceed 159.57 lbs, assuming that random samples are repeatedly selected from a population of 80,000 individuals with mean 167 lbs and standard deviation 35 lbs. State any assumptions you need to make and support your answer statistically.
Subsection 7.1.3 Practice Problem 2.4C
Checkpoint 7.1.31. Explain averages.
In this investigation, we found that the average of the sample average is near the population average. Explain what each use of the term "average" means in this statement.
