
Section 1.1 Investigation A: Hurricanes and Climate Change

One of the concerns with climate change is an increased number of tropical storms (including hurricanes and major hurricanes). In particular, in the "Atlantic Basin," scientists have tracked the number of "named storms" since 1851. According to the National Hurricane Center:
  • Tropical Storm: A tropical cyclone with maximum sustained winds of 39 to 73 mph (34 to 63 knots).
  • Hurricane: A tropical cyclone with maximum sustained winds of 74 mph (64 knots) or higher. In the western North Pacific, hurricanes are called typhoons; similar storms in the Indian Ocean and South Pacific Ocean are called cyclones.
  • Major Hurricane: A tropical cyclone with max sustained winds of 111 mph (96 knots) or higher.
Snippet from NOAA data table showing storm classifications
Figure 1.1.1. Snippet from the NOAA data table
In 2020, scientists were alarmed because there were 14 recorded hurricanes, compared to 6 in 2019.
Graph showing hurricane counts for 2019 and 2020
Figure 1.1.2.

Checkpoint 1.1.3. Calculate Percentage Change.

Calculate the percentage change in the number of hurricanes between these two years.
Hint.
The formula for percentage change is: \(\frac{\text{new value} - \text{old value}}{\text{old value}} \times 100\%\)
Answer.
\(\frac{14-6}{6} \times 100\% \approx 133.3\%\) increase
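The same arithmetic can be checked with a few lines of code. This is only a minimal sketch; the function name `percent_change` is a hypothetical helper introduced here, not part of any course software.

```python
def percent_change(old_value, new_value):
    """Return the percentage change from old_value to new_value."""
    if old_value == 0:
        # The formula divides by the old value, so it is undefined at 0.
        raise ValueError("percentage change is undefined when the old value is 0")
    return (new_value - old_value) / old_value * 100


# Hurricane counts quoted in the text: 6 in 2019 and 14 in 2020.
change = percent_change(old_value=6, new_value=14)
print(f"{change:.1f}% increase")  # prints "133.3% increase"
```

Note that the order of the arguments matters: the old value is both subtracted and used as the divisor.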

Checkpoint 1.1.4. Evaluate the Evidence.

Does this convince you that climate change is leading to an increase in the number of hurricanes (in the Atlantic)? If so, explain why. If not, explain what additional information you would want to know.
Hint.
Consider: Is comparing just two years enough evidence? What about natural year-to-year variation?
Solution.
Just because one year saw a large increase doesn’t necessarily reflect an increasing trend. It’s also hard to know whether this is a "large" increase when we don’t have information on how much this value tends to vary from year to year. We can’t draw any causal conclusions because other things could have changed in that time frame. We also need to keep in mind that "number of hurricanes" is just one possible reflection of "climate change."
Below is a dotplot of the annual number of hurricanes from 1851 to 2024 \((n = 174)\text{.}\)
Dotplot showing the distribution of annual number of hurricanes from 1851 to 2024
Figure 1.1.5. Source: https://www.stormfax.com/huryear.htm

Checkpoint 1.1.6. Interpret the Dotplot.

What does one dot in the above graph represent?
Hint.
Each dot represents data from one observational unit. What is being measured over time here?
Answer.
One dot represents the number of hurricanes in one year.
A year with 14 hurricanes is certainly close to record setting, but we expect there to be some variation from year to year. Below is a timeplot of the number of hurricanes each year.
Timeplot showing the number of hurricanes each year from 1851 to 2024
Figure 1.1.7.

Checkpoint 1.1.8. Compare Graph Types.

What additional information is provided by this graph and why is that helpful?
Hint.
Think about what a timeplot shows that a dotplot doesn’t. How does it arrange the data differently?
Answer.
Now we know which year each value corresponds to and we might see a gradual increasing trend overall since about 1970. We also see that the change from 6 to 14 is a rather large change between years.

Checkpoint 1.1.9. Assess Reported Mean.

The stormfax website reports the mean number of hurricanes per year for 1991-2020 to be 7. Does that appear consistent with the graph? Why do you think they chose that subset of years?
Hint.
Consider what’s special about the 1991-2020 period. Is it the most recent data? Why might recent data be more relevant?
Solution.
This is consistent with the timeplot. If we were to put a horizontal line at a height that goes through the middle of the values, 7 appears to be a reasonable value for that height. Perhaps they wanted to look at more recent data for more direct comparison with current data (opinions may vary).
Oftentimes a mean or average is reported, but with no measure of spread or variability. If all the years from 1991 to 2020 had between 6 and 8 hurricanes, we would react very differently to 14 hurricanes in one year than if those years had between 2 and 15 hurricanes.

Terminology Detour: Standard Deviation.

The most common measure of the variability in a distribution of data is the standard deviation.
\begin{equation*} s = \sqrt{\frac{\sum_{i=1}^n (y_i - \bar{y})^2}{n-1}} \end{equation*}
Here \(y_i\) denotes the individual data values, \(\bar{y}\) their mean, and \(n\) the number of values. We can roughly interpret the standard deviation as the average "deviation" of the data values in the distribution from the mean of the distribution. Another interpretation: If we were to predict 7 as the number of hurricanes in each year from 1991 to 2020, the standard deviation would approximate the average "prediction error" for those years.
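To see how the formula works term by term, here is a minimal sketch in Python. The function names and the `toy_values` list are invented for illustration and are not real hurricane counts. The sketch also computes the plain average of the absolute deviations, to show why the "average deviation" reading is only a rough interpretation.

```python
import math


def sample_standard_deviation(values):
    """Standard deviation with the n - 1 divisor used in the formula above."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_deviations = [(y - mean) ** 2 for y in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))


def mean_absolute_deviation(values):
    """Plain average of the distances |y_i - mean|, for comparison only."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(y - mean) for y in values) / n


# Toy values made up for illustration (NOT real hurricane counts); mean is 5.
toy_values = [2, 4, 4, 6, 9]
sd = sample_standard_deviation(toy_values)   # sqrt(28 / 4) = sqrt(7), about 2.65
mad = mean_absolute_deviation(toy_values)    # 10 / 5 = 2.0
print(f"standard deviation: {sd:.2f}, mean absolute deviation: {mad:.2f}")
```

The two numbers are similar but not equal: squaring gives extra weight to the values farthest from the mean, so the standard deviation is usually somewhat larger than the simple average distance.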
Below is a dotplot of the data from 1991-2020 (\(n = 31\)).
Dotplot showing the distribution of annual number of hurricanes from 1991 to 2020
Figure 1.1.10.

Checkpoint 1.1.11. Compare Subset to Full Dataset.

Conjecture: The mean is actually 7.2 hurricanes for this dataset. Do you think that is larger than, smaller than, or quite similar to the mean for the full dataset? Explain your reasoning.
Hint.
Look at the timeplot. Does the 1991-2020 period appear different from earlier decades?
Solution.
Answers will vary, but one might think the average is a bit higher in this more recent time frame than across the entire data set.

Checkpoint 1.1.12. Estimate Standard Deviation.

Conjecture: Provide a guess of the value of the standard deviation of these 31 values.
Hint.
Look at the dotplot. What’s a typical distance from the mean of 7.2? Most values fall within what range?
Solution.
About half or a little more than half of the values appear to fall between 4 and 10, so maybe a standard deviation of around 3 hurricanes?

Checkpoint 1.1.13. Compare Standard Deviations.

Conjecture: How do you think the standard deviation you guessed in Checkpoint 1.1.12 compares to the standard deviation of the full dataset? Explain your reasoning.
Hint.
Compare the spread in the 1991-2020 dotplot to the spread in the full 1851-2024 dotplot.
Solution.
The largest values (e.g., 14, 15) are in this subset and the spread in the values does appear larger in the more recent years than overall (fewer values in the 3 to 7 range compared to the full dataset?).
We will often use the standard deviation as a "ruler" to help us measure distances of observations from the mean of the distribution.

Checkpoint 1.1.14. Standardize the Value.

If we use a mean of 7.2 and a standard deviation of 3.3 hurricanes, how many standard deviations away from the mean is a value of 14 hurricanes? Above or below the mean?
Hint.
Calculate: \(\frac{14 - 7.2}{3.3}\)
Answer.
\(\frac{14 - 7.2}{3.3} \approx 2.06\) standard deviations above the mean

Terminology Detour: Standardizing.

The general formula for standardizing an observation’s position in the distribution is:
\begin{equation*} \frac{\text{observation value} - \text{mean of distribution}}{\text{standard deviation of distribution}} \end{equation*}
We will often consider a value far from the mean of a distribution if it is more than two standard deviations away.
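The standardizing formula and the two-standard-deviation guideline can be sketched in a few lines of Python. The function name `standardize` is a hypothetical helper; the inputs are the mean (7.2), standard deviation (3.3), and observation (14) quoted in this investigation.

```python
def standardize(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


# Values quoted in the text for 1991-2020: mean 7.2, standard deviation 3.3.
z = standardize(value=14, mean=7.2, sd=3.3)
direction = "above" if z > 0 else "below"
print(f"{abs(z):.2f} standard deviations {direction} the mean")  # 2.06 ... above

# Rule of thumb from the text: more than two standard deviations is "far".
is_far_from_mean = abs(z) > 2
print(is_far_from_mean)  # True
```

The sign of the result carries the direction (positive is above the mean, negative is below), and the absolute value carries the distance.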
In this investigation, you have just touched on one piece of information related to climate change. In fact, scientists are less concerned about the number of storms than about the intensity of the storms and how warming of the ocean surface may be leading to more destructive storms. Looking at a single year in isolation, or even a pair of years, creates a very incomplete picture of trends over time. While we expect some natural variation from year to year, the question for scientists is whether the overall trend being observed is larger than what we can reasonably attribute to natural variation.

Insight 1.1.15. Points to keep in mind.

  • It’s important to determine which variables are most relevant to the research question and whether you can collect the data you need to answer the question.
  • Simple graphs can be very informative, but you should also take care in considering the most meaningful variable representation of what you are studying even before you begin graphing.
  • It is imperative to consider variability and to think about possible sources of variation. Sometimes you may be able to explain and "control for" a source of variation. Often you will have to dig deeper into reasons for unusual observations and whether it is appropriate to remove them from the analysis.
  • The quality of your inferences will depend A LOT on the quality of the data that are collected. Not much can be learned from poorly or improperly collected data or data from a completely different time period.

Subsection 1.1.1 Discussion

When exploring a research question, one of the first steps is to define the variable involved, e.g., the number of hurricanes. This is an example of a quantitative variable, as opposed to a categorical variable like whether the storm has winds of 74 mph or higher. Dotplots are good choices for visualizing a quantitative variable for a small dataset. When looking at a distribution of a single quantitative variable like this, we are often interested in three key features, along with any unusual observations:
  • Center: What would you consider a "typical" value in the distribution?
  • Variability: How clustered together or consistent are the observations? Or are they far apart?
  • Shape: Are some values more common than others? Are the values symmetric about the center?
  • Unusual observations: Are there any observations that don’t follow the overall pattern? Are there any explanations for these values?
To summarize the center of the distribution, we often report the mean (the arithmetic average of all the numerical values in the data set) and/or the median (a middle value such that 50% of the data values are smaller and 50% are larger).
With most investigations we will also provide a follow-up practice problem or two for you to try on your own to assess your understanding of the material.

Subsection 1.1.2 Practice Problem A.A

Open the Descriptive Statistics applet to complete this practice problem.
Screenshot of Descriptive Statistics applet interface
From the ISCAM data files and applets page (Chapter 0), you can view the data in AtlanticStorms.txt. Use your mouse (or ctrl-A) to highlight all four columns and then copy these data to your clipboard. In the Descriptive Statistics applet, clear the existing data (press Clear), click in the Paste data box, and paste the data from your clipboard. Press the Use Data button. [Alternatively, this dataset is listed in the Select data pull-down menu as Atlantic Storms.] Use the Quantitative Variable pull-down menu to select the Number of Hurricanes variable (Hurricanes).
Loading Data in R: You can also load this data directly in R using the code below:

Shape of Distributions.

The shape of a distribution is often classified as symmetric (mirror image on each side of the center) or skewed.
The skewness statistic measures the lack of symmetry in a distribution (whether values above the mean extend further than values below the mean, on average) using a \((y_i-\bar{y})^3\) term. Positive values indicate a skewed right distribution, negative values indicate skewed left, and values near 0 indicate a symmetric distribution.
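To make the role of the cubed term concrete, here is a minimal sketch of one common moment-based skewness statistic. This is an assumption for illustration: the applet may use a different normalization (for example, an \(n-1\) divisor), so its reported values need not match this function exactly, although the sign should tell the same story. The lists are toy values invented for the example, not hurricane counts.

```python
def skewness(values):
    """Moment-based skewness: mean cubed deviation over (mean squared deviation)^1.5.

    Assumption: this normalization is one common choice and may differ
    from the one used by the Descriptive Statistics applet.
    """
    n = len(values)
    mean = sum(values) / n
    second_moment = sum((y - mean) ** 2 for y in values) / n
    third_moment = sum((y - mean) ** 3 for y in values) / n
    if second_moment == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return third_moment / second_moment ** 1.5


# Toy values made up for illustration (not real data).
right_skewed = [1, 2, 2, 3, 10]               # one value far above the rest
left_skewed = [-y for y in right_skewed]      # mirror image of the list above
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed) > 0)   # True: long tail toward larger values
print(skewness(left_skewed) < 0)    # True: long tail toward smaller values
print(skewness(symmetric))          # 0.0
```

Cubing keeps the sign of each deviation, so a few large positive deviations outweigh many small negative ones and push the statistic above zero.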

Checkpoint 1.1.16. Identify Variables.

Based on the graph, would you consider the "number of hurricanes" distribution to be symmetric, skewed to the right, or skewed to the left? Check the Skewness statistic box. Does the value agree with your judgment from the graph?
Hint.
Look at the tails of the distribution. Which side extends further? A positive skewness value indicates right skew, negative indicates left skew, and values near 0 indicate symmetry.
Solution.
The distribution appears to be skewed to the right, with a longer tail extending toward the higher values. The skewness statistic should be positive, confirming this visual assessment.

Mean and Median.

If there are \(n\) numerical values, we refer to them as \(y_1, y_2, \ldots, y_n\text{.}\)
The mean, \(\bar{y}\text{,}\) is the average of all numerical values in the data set:
\begin{equation*} \bar{y} = \frac{\sum_{i=1}^n y_i}{n} \end{equation*}
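The median is a value such that 50% of the data lie below and 50% of the data lie above that value. Once the values are sorted from smallest to largest, the median sits at position \((n+1)/2\text{;}\) when \(n\) is even, this position falls halfway between the two middle values, and the median is usually taken to be their average.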
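The two definitions above can be sketched in Python as follows. The function names and the toy lists are invented for illustration (they are not hurricane counts), and averaging the two middle values for even \(n\) is the usual convention rather than something specific to the applet.

```python
def mean(values):
    """Arithmetic average: sum of the values divided by n."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data, located at position (n + 1) / 2."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: (n + 1) / 2 is a whole number; subtract 1 for 0-based indexing.
        return ordered[(n + 1) // 2 - 1]
    # Even n: (n + 1) / 2 falls between two positions, so average the neighbors.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


# Toy values made up for illustration (not real data).
odd_example = [15, 3, 6, 5, 8]    # sorted: 3, 5, 6, 8, 15
even_example = [8, 3, 6, 5]       # sorted: 3, 5, 6, 8

print(mean(odd_example))     # 7.4
print(median(odd_example))   # 6   (position (5 + 1) / 2 = 3)
print(median(even_example))  # 5.5 (position 2.5, halfway between 5 and 6)
```

In the first toy list, the single large value 15 pulls the mean (7.4) above the median (6), which is the pattern to look for in a right-skewed distribution.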

Checkpoint 1.1.17. Dotplot Analysis.

Explore the mean and median:
  • Check the box next to Guess for the Mean. Move the red line to where you think the mean of the distribution is.
  • Check the box next to Guess for the Median. Move the blue line to where you think the median of the distribution is.
  • Now check Actual for both. Which is larger, the mean or the median?

Checkpoint 1.1.18. Distribution Shape.

Check the box for Guess for the standard deviation. Use your mouse to move one of the edges of the red rectangle to a distance that you think is representative of a "typical distance from the mean" (some values are closer, some are further). Then check the Actual box. How did you do?

Checkpoint 1.1.19. Compare Distributions.

How does the standard deviation of the full dataset compare to the 3.3 value for the years 1991-2020? Summarize what this tells us about the behavior of hurricanes.