
Subsection 9.1 Chapter 2 Summary

In this chapter, you focused on distributions of quantitative data, following up on the earlier discussion in Investigation A. In exploring quantitative data, we start with a graph displaying the distribution, such as a dotplot or a histogram. The most relevant features are shape, center, and variability, as well as deviations from the overall pattern. The appropriate numerical summaries of center include the mean and median; keep in mind that the median is more resistant than the mean to outliers and skewness. The appropriate numerical summaries of variability include the interquartile range (IQR) and standard deviation, with the IQR also more resistant to outliers and skewness. Remember that when examining the variability of a data set, you are interested in the spread of the values that the variable takes (along the horizontal axis). In summarizing data, the five-number summary can be very descriptive, and you were also introduced to the boxplot, a graphical display of the five-number summary.
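As a quick illustration of these summaries, here is a sketch in Python using a small made-up data set (the outlier 30 is included deliberately), showing how the mean and standard deviation react to an outlier while the median and IQR resist it:

```python
import statistics

# Hypothetical data set (made up for illustration), with one large outlier.
data = [2, 3, 4, 5, 5, 6, 7, 8, 30]

mean = statistics.mean(data)      # pulled upward by the outlier
median = statistics.median(data)  # resistant to the outlier
sd = statistics.stdev(data)       # sample standard deviation, inflated by the outlier
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1                     # resistant measure of variability

five_number = (min(data), q1, median, q3, max(data))
print(f"mean = {mean:.2f}, median = {median}, sd = {sd:.2f}, IQR = {iqr}")
print("five-number summary:", five_number)
```

Here the mean (about 7.78) is well above the median (5), and the standard deviation (about 8.54) dwarfs the IQR (4), exactly the pattern you should expect when a distribution has an extreme observation.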
After examining the sample distribution, it may be appropriate to ask whether a sample statistic differs significantly from some claim about the population. For example, you might ask whether the sample mean is far enough from a conjectured population mean to convince you that the discrepancy did not arise “by chance alone.” You explored some simulation models, mostly to convince you that the \(t\)-distribution provides a very convenient approximation to the sampling distribution of the standardized statistic of a sample mean. This allowed us to approximate p-values and calculate confidence intervals. Keep in mind that the logic of statistical significance has not changed: we assume the null hypothesis is true and determine how often we would get a sample statistic at least as extreme as the one observed. You saw that this likelihood is strongly affected not only by the sample size but also by how much variability is inherent in the data. You also learned about prediction intervals, as opposed to confidence intervals, for estimating individual values rather than the population mean.
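A minimal sketch of these calculations in Python, using hypothetical measurements, a made-up null value, and a tabled critical value \(t^* \approx 2.262\) for 9 degrees of freedom at 95% confidence (the data and null hypothesis are assumptions for illustration only):

```python
import math
import statistics

# Hypothetical measurements (made up for illustration); test H0: mu = 12.0.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
mu0 = 12.0
n = len(sample)

xbar = statistics.mean(sample)
s = statistics.stdev(sample)
se = s / math.sqrt(n)
t_stat = (xbar - mu0) / se  # standardized statistic, df = n - 1

# Critical value t* for 95% confidence with df = 9, from a t table.
t_star = 2.262

# Confidence interval: estimates the population mean.
ci = (xbar - t_star * se, xbar + t_star * se)

# Prediction interval: estimates a single new observation, so it is
# wider by the extra sqrt(1 + 1/n) factor.
pi_margin = t_star * s * math.sqrt(1 + 1 / n)
pi = (xbar - pi_margin, xbar + pi_margin)

print(f"t = {t_stat:.3f}")
print(f"95% CI for mu: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"95% PI for a new observation: ({pi[0]:.3f}, {pi[1]:.3f})")
```

With these numbers \(|t| \approx 1.22 < 2.262\), so the sample does not provide convincing evidence against the null value, and the confidence interval contains 12.0. Note how the prediction interval is substantially wider than the confidence interval.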
These \(t\)-procedures do come with technical conditions and should be applied with extreme caution when the sample size is small or the distribution is strongly skewed. In such cases, you could consider transforming the variable to rescale the observations toward a known probability distribution such as the normal distribution. You can also apply the sign test, which looks only at the sign of each value relative to the hypothesized median rather than its numerical magnitude, allowing you to apply the binomial distribution. Other approaches, such as bootstrapping, are very flexible in the choice of statistic.
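A sketch of the sign test using the binomial distribution (the data and hypothesized median are made up for illustration; values tied with the hypothesized median are dropped, as is conventional):

```python
import math

# Hypothetical data (made up for illustration); test H0: population median = 12.0.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
m0 = 12.0

# Drop ties with the hypothesized median, then count positive signs.
diffs = [x - m0 for x in sample if x != m0]
n = len(diffs)
k = sum(1 for d in diffs if d > 0)

def upper_tail(n, k):
    """P(X >= k) for X ~ Binomial(n, 0.5): under H0 each sign is + with prob 1/2."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Two-sided p-value: double the smaller tail probability, capped at 1.
p_value = min(1.0, 2 * min(upper_tail(n, k), 1 - upper_tail(n, k + 1)))
print(f"n = {n}, positive signs = {k}, two-sided p-value = {p_value:.3f}")
```

Here 6 of the 9 non-tied values exceed the hypothesized median, and the binomial calculation gives a two-sided p-value of about 0.51, so the sign test finds no convincing evidence against the claimed median. Because it uses only signs, the test requires no assumption about the shape of the distribution.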