
Section 1.3 Modelling

A model is an artificial representation or simplification of something more complicated, like a toy airplane. Models can be useful for making predictions (e.g., weather models) or for helping us better understand the phenomena under study. For example, what factors affect the severity of a hurricane?

Exercises 1.3.1 Investigation C: Modelling Hurricanes

  • Move the red hurricane symbol to a circle on the map and read what happens.
  • Switch between the Sea Surface Temperature, Moisture, and Wind maps to see the data for each circle.
  • Which circles create the strongest hurricanes? Why?

1. Explain Hurricane Formation.

In your own words, explain how sea surface temperature, moisture, and wind interact to create strong hurricanes or to weaken them.
Solution.
Descriptions will vary. For example: Warm water is essential to fuel the storm. When humid air rises, its moisture condenses into rain and releases heat, helping hurricanes form and strengthen. If the upper-level winds in an area are too strong, hurricanes won’t form. Calmer upper-level winds create an environment where a hurricane is more likely to form.

2. Evaluate Model Accuracy.

Explain how you could evaluate the accuracy of this model of hurricane strength. What would you do next if you decided the model was not appropriate?
Solution.
Opinions will vary. We could compare the predictions to recent data to see how well they match up. If the results don’t match up, we could gather more data in order to improve the model. This could include additional variables we haven’t considered yet.
In this course, you will encounter two main types of models: statistical models and probability/simulation models. For example, the Random Babies simulation in Investigation B allowed you to simulate hypothetical data to estimate the probabilities of different events. As long as the model is valid, we can predict that all four mothers receiving the correct babies would rarely happen in the long run. Most models rely on simplifying assumptions, like babies being returned “completely at random.”
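The Random Babies simulation can be sketched in a few lines of Python. This is a minimal sketch, assuming “completely at random” means every ordering of the four babies is equally likely; the function name and seed are illustrative choices, not part of the original activity. With four babies there are 4! = 24 equally likely orderings and only one gives every mother her own baby, so the long-run proportion should settle near 1/24, about 0.042.

```python
import random


def all_correct_probability(num_trials: int, num_babies: int = 4, seed: int = 1) -> float:
    """Estimate the chance that every mother gets her own baby when the
    babies are returned completely at random (a random ordering)."""
    rng = random.Random(seed)
    babies = list(range(num_babies))
    hits = 0
    for _ in range(num_trials):
        returned = babies[:]      # copy the correct order...
        rng.shuffle(returned)     # ...then return the babies in a random order
        if returned == babies:    # every mother received her own baby
            hits += 1
    return hits / num_trials


if __name__ == "__main__":
    estimate = all_correct_probability(100_000)
    print(f"Estimated P(all four correct) = {estimate:.4f}")
    print(f"Exact value 1/24             = {1 / 24:.4f}")
```

Increasing `num_trials` makes the estimate more stable, which is what “in the long run” means here.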
The graphs below show histograms of the hurricane data with different probability models (red curves).
Histogram of hurricane data with probability model overlay (red curve)
Histogram of hurricane data with alternative probability model overlay (red curve)

3. Compare Graphical Displays.

Describe the difference between a histogram and a dotplot. When might histograms be more useful?
Solution.
Histograms bin the observations rather than trying to represent each individual value. They are more useful for looking at the overall distribution when there is a large number of observations.
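To make “binning” concrete, here is a minimal Python sketch that groups values into equal-width bins and prints one row of stars per bin. The data, bin width, and starting edge are hypothetical choices for illustration only; changing the bin width changes how the histogram looks.

```python
from collections import Counter


def bin_counts(values, bin_width, start):
    """Group values into equal-width bins [start + k*w, start + (k+1)*w)
    and return {left edge of bin: count}, as a histogram does."""
    counts = Counter()
    for v in values:
        k = int((v - start) // bin_width)   # index of the bin that contains v
        counts[start + k * bin_width] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # HYPOTHETICAL yearly counts, made up for illustration only
    data = [3, 4, 4, 5, 5, 5, 6, 6, 7, 8, 9, 11]
    for left, n in bin_counts(data, bin_width=2, start=2).items():
        print(f"[{left}, {left + 2}): {'*' * n}")
```

A dotplot of these 12 values would show 12 separate dots, while the sketch condenses them into 5 bars; with hundreds of observations, that condensing is what keeps the overall shape readable.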

4. Select Best Model.

Which model (red curve) do you consider a better match to the observed data?
  • Left curve
  • Right curve
Solution.
The curve on the left is more symmetric, whereas the curve on the right is skewed to the right. The right-skewed curve appears to match the histogram better.

5. Define Statistical Model.

Conjecture: What does the term “model” mean in this situation? Why would such a model be useful? What assumptions would you need to make?
Hint.
Which do you consider more likely, a year with 13 hurricanes or with 14? How are you deciding? Are you using the data or something beyond the data?
Solution.
By “model,” we mean a simpler representation of the general pattern in the histogram. A mathematical model can then be used to make predictions about future observations, as long as the underlying “process” is not changing. In particular, although we didn’t observe a year with 13 hurricanes, the model predicts that such a year could happen, but rarely.
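The section does not say which probability models the red curves represent. Purely as an illustration, the sketch below assumes a Poisson model for the number of hurricanes per year with a hypothetical mean of 6; neither the Poisson choice nor the mean comes from the hurricane data. It shows how a model assigns a probability to a value never seen in the data, which is how it can rank 13 against 14.

```python
import math

HYPOTHETICAL_MEAN = 6.0  # assumed average hurricanes per year (illustration only)


def poisson_pmf(k: int, mean: float) -> float:
    """P(exactly k events) under a Poisson model with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


if __name__ == "__main__":
    for k in (13, 14):
        print(f"P({k} hurricanes in a year) = {poisson_pmf(k, HYPOTHETICAL_MEAN):.4f}")
```

Both probabilities are small, and 13 comes out more likely than 14. For a Poisson model, P(14)/P(13) equals mean/14, which is less than 1 whenever the mean is below 14: the probabilities keep shrinking further out in the right tail. That judgment comes from the model, not from the observed data alone.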
To get an idea of a statistical model, you will collect some data:
  • Using a measurement tool provided by your instructor, measure the circumference of a tennis ball.

6. Record Your Measurement.

Record your measurement here (to the nearest hundredth of a centimeter): ______ cm

7. Examine Measurement Variation.

Did everyone in class find the same value? What are some possible explanations for different measurements?
Solution.
We suspect not everyone will find exactly the same measurement. Possible explanations include different measurement techniques, different lighting at the time of the measurement, and different tennis balls.
We can think of these measurements as observations from a random process, which we can summarize with a statistical model. If we consider 28.5 cm the “true value,” then we can write our statistical model as
Measurement recorded = 28.5 + random error
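The model above can be turned into a data generator. This minimal Python sketch assumes, hypothetically, that the random error is normally distributed with mean 0 and a standard deviation of 0.3 cm; the text does not specify the error distribution or its size.

```python
import random

TRUE_VALUE = 28.5  # cm, the "true value" used in the text
ERROR_SD = 0.3     # cm, HYPOTHETICAL spread of the random error


def simulate_measurements(n, seed=0):
    """Generate n measurements from: recorded = 28.5 + random error,
    assuming (hypothetically) normal errors with mean 0 and SD ERROR_SD."""
    rng = random.Random(seed)
    return [round(TRUE_VALUE + rng.gauss(0, ERROR_SD), 2) for _ in range(n)]


if __name__ == "__main__":
    print(simulate_measurements(10))
```

Each simulated value is the same true value plus a different random error, which is exactly the structure the model describes.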

8. Estimate Measurement Error.

How could we estimate the likely amount of random error in our class’s measurement process?
Hint.
Solution.
You could compute the standard deviation of the class’s measurements, which measures a typical deviation (prediction error) from the mean.
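As a minimal sketch of that calculation, Python’s standard-library statistics module computes the mean and the sample standard deviation. The eight measurements below are hypothetical values made up for illustration; substitute your class’s data.

```python
import statistics

# HYPOTHETICAL class measurements in cm, made up for illustration
measurements = [28.3, 28.6, 28.5, 28.9, 28.1, 28.4, 28.7, 28.5]

center = statistics.mean(measurements)
spread = statistics.stdev(measurements)  # sample standard deviation (divides by n - 1)
print(f"mean = {center:.2f} cm, SD = {spread:.2f} cm")
```

For these made-up values the mean is 28.50 cm and the SD is about 0.24 cm, so a typical measurement is off from the mean by roughly a quarter of a centimeter.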

Insight 1.3.1.

Much of what scientists do is try to measure, explain, and minimize the amount of random variation.

10. Understand Measurement Model.

What does the term β€œmodel” mean here? Why would such a model be useful? What assumptions need to be made?
Hint.
Solution.
Again, the model is a simpler representation of the measurement process. It would help us make predictions or identify observations that seem very different from what we would expect from typical measurement errors.
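One way to use the model to spot unusual observations is to flag measurements that sit far from the true value relative to the typical error. This is a minimal sketch; the typical error of 0.25 cm and the cutoff of 2 typical errors are hypothetical rule-of-thumb choices, not values given in the text.

```python
TRUE_VALUE = 28.5     # cm, from the measurement model
TYPICAL_ERROR = 0.25  # cm, HYPOTHETICAL estimate of the SD of the random error
CUTOFF = 2            # HYPOTHETICAL rule of thumb: flag values > 2 typical errors away


def flag_unusual(measurements, true_value=TRUE_VALUE,
                 typical_error=TYPICAL_ERROR, cutoff=CUTOFF):
    """Return the measurements farther from true_value than cutoff * typical_error."""
    return [m for m in measurements if abs(m - true_value) > cutoff * typical_error]


if __name__ == "__main__":
    print(flag_unusual([28.4, 28.7, 29.6, 28.5, 27.9]))  # prints [29.6, 27.9]
```

A flagged value is not necessarily wrong; it is simply larger than what typical measurement error would usually produce, so it deserves a second look.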

Subsection 1.3.2 Practice Problem C.A

Researchers often compare data generated from a model to observed data to help validate the model. If the model-generated data reasonably match the observed data, that gives the model builders evidence that they understand the underlying data-generating process.
Open this app to model the tennis ball measurement random process

Checkpoint 1.3.2. Model Sources of Variation.

Suggest and label four possible sources of variation in the tennis ball measurements. Assign each source of variation to one of the pie charts (spinners), and conjecture error magnitudes and probabilities for each source (e.g., right now, Spinner 1 assumes a perfect measurement with probability 0.50, a -0.10 cm error with probability 0.25, and a +0.10 cm error with probability 0.25). Describe your model below.
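The spinner model can also be sketched in code: each spinner picks one error according to its probabilities, and the measurement is 28.5 cm plus the sum of the four errors. Spinner 1 below matches the starting setup described above; Spinners 2-4 are hypothetical placeholders that you would replace with your own sources and conjectured values. This is a sketch of the idea, not the app’s actual implementation.

```python
import random

# Each spinner: list of (error in cm, probability). Spinner 1 matches the
# starting setup in the text; Spinners 2-4 are HYPOTHETICAL placeholders.
SPINNERS = {
    "Spinner 1": [(0.00, 0.50), (-0.10, 0.25), (0.10, 0.25)],
    "Spinner 2": [(0.00, 0.60), (-0.05, 0.20), (0.05, 0.20)],
    "Spinner 3": [(0.00, 0.80), (-0.20, 0.10), (0.20, 0.10)],
    "Spinner 4": [(-0.02, 0.50), (0.02, 0.50)],
}
TRUE_VALUE = 28.5  # cm


def spin(outcomes, rng):
    """Pick one error from a spinner according to its probabilities."""
    errors = [e for e, _ in outcomes]
    weights = [p for _, p in outcomes]
    return rng.choices(errors, weights=weights, k=1)[0]


def simulate_one(rng):
    """Return (simulated measurement, error contributed by each spinner)."""
    errors = {name: spin(outcomes, rng) for name, outcomes in SPINNERS.items()}
    return TRUE_VALUE + sum(errors.values()), errors


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so each run can give different errors
    measurement, errors = simulate_one(rng)
    for name, e in errors.items():
        print(f"{name}: {e:+.2f} cm")
    print(f"Total error: {measurement - TRUE_VALUE:+.2f} cm -> measurement {measurement:.2f} cm")
```

Because every spinner lands at random, calling `simulate_one` repeatedly usually gives different total errors.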

Checkpoint 1.3.3. Simulate One Measurement.

Simulate one measurement by pressing the Simulate data button. What did you find for the total random error across your four sources? Will this be the same every time?

Checkpoint 1.3.4. Compare Simulated to Observed Data.

Generate the same number of measurements as we took in class. Compare the simulated data to the actual data from class (e.g., shape, center, spread). What looks similar and what looks different?
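A minimal sketch of this comparison step: generate as many measurements as your class took, then summarize center, spread, and shape. The spinner values reuse the hypothetical setup (only Spinner 1 comes from the text), and a class size of 30 is an assumed placeholder. To compare, put your class’s measurements in a second list and call `summarize` on both.

```python
import random
import statistics
from collections import Counter

# (error in cm, probability) for each spinner; Spinner 1 is from the text,
# the other three are HYPOTHETICAL placeholders
SPINNERS = [
    [(0.00, 0.50), (-0.10, 0.25), (0.10, 0.25)],
    [(0.00, 0.60), (-0.05, 0.20), (0.05, 0.20)],
    [(0.00, 0.80), (-0.20, 0.10), (0.20, 0.10)],
    [(-0.02, 0.50), (0.02, 0.50)],
]
TRUE_VALUE = 28.5  # cm
CLASS_SIZE = 30    # HYPOTHETICAL: use the number of measurements your class took


def simulate_many(n, seed=42):
    """Simulate n measurements: true value plus one error from each spinner."""
    rng = random.Random(seed)
    results = []
    for _ in range(n):
        total = 0.0
        for outcomes in SPINNERS:
            errors, weights = zip(*outcomes)
            total += rng.choices(errors, weights=weights)[0]
        results.append(round(TRUE_VALUE + total, 2))  # round so equal values group together
    return results


def summarize(values):
    """Center and spread summaries for comparing simulated and class data."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    sim = simulate_many(CLASS_SIZE)
    print(summarize(sim))
    # Text dotplot: one row per distinct value, one star per measurement (shape)
    for value, count in sorted(Counter(sim).items()):
        print(f"{value:6.2f} {'*' * count}")
```

If the simulated spread is noticeably smaller or larger than the class’s, that points to error magnitudes or probabilities worth adjusting in the next checkpoint.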

Checkpoint 1.3.5. Adjust the Model.

Suggest a way to adjust the model that you think will lead to simulated data that better matches the observed data. Briefly justify your choice.

Subsection 1.3.3 Practice Problem C.B

Suppose human body temperatures can be modelled with a normal distribution with mean 98.6°F, and suppose you take repeated measurements of your temperature over several days.

Checkpoint 1.3.6. Identify Believable Distribution.

Which distribution below do you think is more believable? Briefly justify your choice.
Four histograms labeled A, B, C, and D showing different distributions of body temperature measurements centered around 98.6 degrees Fahrenheit

Checkpoint 1.3.7. Evaluate Suspicious Distribution.

Explain why you might be suspicious if someone told you Distribution D was their results.

Checkpoint 1.3.8. Sources of Variation.

Suggest four possible sources of variation in your body temperature measurements.

Checkpoint 1.3.9. Compare Standard Deviations.

Distributions A-C have the same mean (98.6) but different spread or variability. Which of the graphs above has the largest standard deviation? Approximate the value of each standard deviation by interpreting the standard deviation as a “typical” deviation from the mean.

Checkpoint 1.3.10. Evaluate Distribution Assumptions.

The four distributions are all roughly “bell-shaped and symmetric.” Do you think actual repeated body temperature measurements on the same individual will behave this way? Briefly explain your reasoning.

Checkpoint 1.3.11. Estimate Probability.

Suppose I randomly select one of the temperatures from Distribution C. Approximate the probability that the temperature is larger than 99°F. Interpret your probability in context.
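One way to check an estimate read off the graph is to compute the normal-model probability directly. This minimal sketch uses Python’s standard-library `statistics.NormalDist` and assumes Distribution C is normal with mean 98.6°F and a hypothetical standard deviation of 0.4°F; replace the SD with the value you approximated from the graph.

```python
from statistics import NormalDist

MEAN_F = 98.6  # mean body temperature from the text (°F)
SD_C = 0.4     # HYPOTHETICAL SD for Distribution C (°F); use your own estimate

dist_c = NormalDist(mu=MEAN_F, sigma=SD_C)
p_above_99 = 1 - dist_c.cdf(99.0)  # P(temperature > 99°F)
print(f"P(temperature > 99°F) ≈ {p_above_99:.3f}")
```

With this assumed SD, 99°F is exactly one standard deviation above the mean, so the sketch gives a probability near 0.16: in the long run, about 16% of temperatures randomly selected from Distribution C would exceed 99°F.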