Section 22.5 Chapter 4 Summary

Subsection 22.5.1 Summary

In this chapter, you have again focused on comparing the outcomes for different groups, but for quantitative data. The tools for descriptive statistics are the same as in Chapter 2 (e.g., means, medians, standard deviations, interquartile ranges, dotplots, histograms, and boxplots).
After exploring and comparing the groups descriptively, it may be appropriate to ask whether the differences observed could have arisen "by chance alone." In other words, is the observed difference large enough to convince us that it arose from a genuine difference between the groups rather than from the randomness inherent in the random assignment of the experiment or in the selection of the samples? Analogous to comparing proportions in Chapter 3, we used simulation to approximate how often we would obtain a difference in means at least as extreme as the one observed just by chance (random sampling or random assignment). Then we learned of a "large sample" approach that we can use in either case to approximate the null distribution of the standardized statistic, yielding the two-sample t-test. Keep in mind that the logic of statistical significance hasn’t changed: we assume there is no effect or no population difference and determine how often we would get a sample difference at least as extreme as what was observed. You saw that this chance is strongly affected by the amount of variability within the groups. When reporting on your statistical study, it is very important that you also mention the sample sizes, because they help readers evaluate the power and practical significance of the study results. You were also reminded that no matter how small a p-value is, we cannot draw cause-and-effect conclusions unless the data came from a properly designed randomized experiment.
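As a rough illustration of the simulation logic described above, the following base R sketch shuffles group labels to approximate how often a difference in means at least as extreme as the observed one arises by chance, and then runs the large-sample two-sample t-test on the same data. The data values, the group labels "A" and "B", and the helper name `diff_means` are all invented for this example; it does not use the ISCAM workspace functions.

```r
# Hypothetical data: responses for two groups (values invented for illustration)
set.seed(1)
groupA <- c(12.1, 14.3, 11.8, 15.2, 13.7, 12.9, 14.8, 13.1)
groupB <- c(10.4, 11.9, 12.2, 10.8, 11.1, 12.6, 10.2, 11.5)

response <- c(groupA, groupB)
explanatory <- rep(c("A", "B"), times = c(length(groupA), length(groupB)))

# Hypothetical helper: difference in group means (A minus B)
diff_means <- function(y, g) mean(y[g == "A"]) - mean(y[g == "B"])
observed <- diff_means(response, explanatory)

# Assume no group effect: shuffle the group labels many times and
# recompute the difference in means for each shuffle
reps <- 5000
null_diffs <- replicate(reps, diff_means(response, sample(explanatory)))

# Approximate two-sided p-value: proportion of shuffles at least as extreme
p_sim <- mean(abs(null_diffs) >= abs(observed))

# Large-sample alternative: two-sample t-test without pooling the variances
t_out <- t.test(response ~ explanatory, alternative = "two.sided",
                var.equal = FALSE)

observed
p_sim
t_out$p.value
```

With small groups like these, the simulated and t-based p-values need not agree closely; the comparison is only meant to show that both answer the same "how often by chance alone?" question.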
You will often have the choice between a randomization-based test and the t-procedures. The advantage of the randomization procedures is that they allow us to work with many types of statistics (such as the median and the trimmed mean) for which we do not have theoretical results on the large-sample behavior of the randomization distribution. The advantage of the t-procedures is the ease of calculating confidence intervals. It is always important to check the technical conditions of an inference procedure before applying it (e.g., make sure you really have two independent samples). If the conditions are not met, transforming the data may be one way of rescaling them to be more appropriate for t-procedures. If the data are paired, then you should consider one-sample procedures on the differences. Such pairing is often very useful in increasing the efficiency of the study. These t-procedures are quite robust in calculating p-values and confidence intervals.
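To make the point about paired data concrete, here is a small base R sketch showing that a one-sample t-test on the differences matches the paired t-test, and how much the analysis can change if the pairing is ignored. The `before` and `after` values are invented for illustration.

```r
# Hypothetical paired measurements (values invented for illustration)
before <- c(72, 68, 75, 80, 66, 71, 77, 69, 74, 70)
after  <- c(70, 65, 74, 76, 66, 68, 75, 66, 71, 69)

# One-sample t-procedures applied to the within-pair differences
differences <- before - after
one_sample <- t.test(differences, mu = 0, alternative = "two.sided",
                     conf.level = 0.95)

# The same analysis requested through the paired option
paired <- t.test(before, after, alternative = "two.sided",
                 conf.level = 0.95, paired = TRUE)

# For contrast only: treating the columns as independent samples
# ignores the pairing and is not appropriate for these data
unpaired <- t.test(before, after, var.equal = FALSE)

one_sample$statistic
paired$statistic
unpaired$statistic
```

In this made-up example the paired statistic is much larger in magnitude than the unpaired one, because the pair-to-pair variability is removed when working with the differences.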
Always keep in mind the importance of considering whether there is random sampling and/or random assignment in the study design and how that impacts the scope of conclusions that you can draw from your study.

Subsection 22.5.2 Summary of What You Have Learned in This Chapter

Subsection 22.5.3 Technology Summary

Subsection 22.5.4 Quick Reference to ISCAM Workspace Functions and other R Commands

Procedure Desired                        Function Name (options)
Stacked Dotplots                         iscamdotplot(response, explanatory, names)
Stack data                               stack(data[, c("var1", "var2")])
Parallel Dotplots (stacked data)         iscamdotplot(response, explanatory)
Parallel Histograms (stacked data)       Load the lattice package, then histogram(~response | explanatory, layout=c(1,2))
Parallel Boxplots (stacked data)         boxplot(response~explanatory, horizontal=TRUE)
Numerical Summaries                      iscamsummary(response, explanatory)
Two-sample t-procedures                  t.test(response~explanatory, alt, var.equal=FALSE)
Two-sample t-procedures (summary data)   iscamtwosamplet(x1, s1, n1, x2, s2, n2, hypothesized, alternative, conf.level)
Paired t-test                            t.test(v1, v2, alternative, conf.level, paired=TRUE)
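
As a small sketch of how the stacking and t-test commands fit together, the following example uses only base R functions (`stack`, `tapply`, `t.test`) rather than the ISCAM workspace functions. The data frame and its values are invented, and the column names `var1` and `var2` simply mirror the placeholders in the table.

```r
# Hypothetical unstacked data: one column per group (values invented)
data <- data.frame(var1 = c(5.1, 6.3, 5.8, 7.0, 6.1, 5.5),
                   var2 = c(4.2, 5.0, 4.8, 5.9, 4.4, 5.1))

# stack() returns a data frame with columns `values` (the responses)
# and `ind` (a factor recording which column each value came from)
stacked <- stack(data[, c("var1", "var2")])
response <- stacked$values
explanatory <- stacked$ind

# Group means and standard deviations from the stacked data
group_means <- tapply(response, explanatory, mean)
group_sds <- tapply(response, explanatory, sd)

# Two-sample t-procedures without pooling the variances
result <- t.test(response ~ explanatory, alternative = "two.sided",
                 var.equal = FALSE)

group_means
group_sds
result$conf.int
```

The formula interface `response ~ explanatory` is the reason for stacking first: it expects one column of responses and one column of group labels.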

Subsection 22.5.5 Quick Reference to JMP Commands

Procedure Desired                        Menu; Hot spot
Stacked histograms (unstacked data)      Analyze > Distribution
Stack data                               Tables > Stack
Parallel Dotplots (stacked data)         Analyze > Fit Y by X (response, explanatory)
Parallel Histograms (stacked data)       Analyze > Fit Y by X; Display Options > Histograms
Parallel Boxplots (stacked data)         Analyze > Fit Y by X; Display Options > Boxplots
Numerical Summaries                      Analyze > Fit Y by X; Means and Std Dev
Two-sample t-procedures                  Analyze > Fit Y by X; t Test
Two-sample t-procedures (summary data)   Journal file: Hypothesis Test for Two Means; t-test and Unequal Variances
Paired t-test                            Analyze > Matched Pairs

Subsection 22.5.6 Choice of Procedures for Comparing Two Means

| Study design | Randomized experiment or independent samples | Randomized experiment or independent random samples | Matched pairs design |
| Parameter | Difference in population means or treatment means \((\mu_1 - \mu_2)\) | Difference in population medians or treatment medians \((\tilde\mu_1 - \tilde\mu_2)\) | Mean difference \((\mu_d)\) |
| Null hypothesis | \(H_0: \mu_1 - \mu_2 = 0\) | \(H_0: \tilde\mu_1 - \tilde\mu_2 = 0\) | \(H_0: \mu_d = 0\) |
| Simulation | Randomly reassign response values between groups (see Comparing Groups (Quantitative) applet or Sampling from Two Populations – Quantitative applet) | Randomly reassign response values between groups | Flip a coin to decide whether to interchange the order of the observations in each pair |
| Exact p-value | All possible random assignments | N/A | All possible sign assignments |
| Valid to use t-procedures if | Both sample sizes at least 20 or both population distributions normal | N/A | At least 30 differences or differences are normal |
| Standardized (test) statistic | \(t_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\) | N/A | \(t_0 = \frac{\bar{x}_d - 0}{s_d/\sqrt{n_d}}\) |
| Confidence interval | \((\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\) | N/A | \(\bar{x}_d \pm t_{n_d-1}^* \frac{s_d}{\sqrt{n_d}}\) |
| Prediction interval | N/A | N/A | \(\bar{x}_d \pm t_{n_d-1}^* s_d \sqrt{1 + \frac{1}{n_d}}\) |
| R commands | iscamtwosamplet(xbar1, s1, n1, xbar2, s2, n2); optional: hypothesized difference and alternative ("less", "greater", or "two.sided"); optional: conf.level. With raw data: t.test(resp~exp, alt, var.equal=FALSE) | N/A | t.test(list1, list2, alt, conf.level, paired=TRUE) |
| JMP | Analyze > Fit Y by X > t Test | N/A | Analyze > Matched Pairs |
| TBI applet | Two means | N/A | One mean (differences) |
Note: You can also consider transforming the data before applying normal-based methods.
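
The formulas in the table can be checked numerically. The base R sketch below computes the unpooled two-sample statistic, the paired statistic, the paired confidence and prediction intervals, and a coin-flip (sign-change) simulation by hand, then compares the results with `t.test` where possible. All data values and variable names are invented for illustration, and \(t^*\) is taken as the 97.5th percentile of the t distribution for a 95% level.

```r
# Hypothetical independent samples (values invented for illustration)
x1 <- c(12.1, 14.3, 11.8, 15.2, 13.7, 12.9, 14.8, 13.1)
x2 <- c(10.4, 11.9, 12.2, 10.8, 11.1, 12.6, 10.2, 11.5)

# Two-sample standardized statistic with the unpooled standard error
se <- sqrt(var(x1) / length(x1) + var(x2) / length(x2))
t0 <- (mean(x1) - mean(x2)) / se
welch <- t.test(x1, x2, var.equal = FALSE)  # built-in check of t0

# Hypothetical paired differences (values invented for illustration)
d <- c(2, 3, 1, 4, 0, 3, 2, 3, 3, 1)
n_d <- length(d)
xbar_d <- mean(d)
s_d <- sd(d)

# Paired standardized statistic
t0_paired <- (xbar_d - 0) / (s_d / sqrt(n_d))

# 95% confidence interval for the mean difference and
# 95% prediction interval for a single future difference
t_star <- qt(0.975, df = n_d - 1)
conf_int <- xbar_d + c(-1, 1) * t_star * s_d / sqrt(n_d)
pred_int <- xbar_d + c(-1, 1) * t_star * s_d * sqrt(1 + 1 / n_d)

# Coin-flip simulation: randomly change the sign of each difference,
# which corresponds to interchanging the order within a pair
set.seed(2)
sim_means <- replicate(5000,
                       mean(d * sample(c(-1, 1), n_d, replace = TRUE)))
p_sim <- mean(abs(sim_means) >= abs(xbar_d))

t0
welch$statistic
t0_paired
conf_int
pred_int
p_sim
```

The prediction interval is always wider than the confidence interval because it describes a single future difference rather than the mean difference.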