
Section 21.1 Investigation 4.8: Speed It Up

In this investigation, you will consider an alternative study design for detecting differences between two groups.

Exercises 21.1.1 Independent vs. Paired Designs

Student researchers (Coutin & Heffler, 2021) wanted to know whether listening to up-tempo music causes college students to tend to type faster. To collect their data (number of words typed correctly in one minute), the students planned to use the 60-second Easy-Text typing test (TypingTest.com).
Image showing typing test with music
They recruited 34 college students from groups they were associated with on campus (e.g., athletic teams, musical groups). For the up-tempo music they selected Overture to Candide performed by the London Symphony Orchestra.

Ask a Research Question.

Let \(\mu_{nomusic}\) represent the population mean typing speed without the music and \(\mu_{music}\) the population mean typing speed with the music.

1. State Hypotheses.

State the student researchers’ null and alternative hypotheses in symbols and in words.
Solution.
The students are interested in testing
\begin{gather*} H_0: \mu_{nomusic} - \mu_{music} = 0\\ \text{ (there is no difference in the mean typing speed with and without the music)}\\ H_a: \mu_{nomusic} - \mu_{music} \lt 0 \\ \text{ (the average typing speed is larger when the up-tempo music is playing)} \end{gather*}

Design.

2. Completely Randomized Design.

Describe a completely randomized design for conducting this study.
Design:
Identify the experimental units and variables of interest. Classify the variables as explanatory and response, as well as quantitative or categorical.
Explanatory variable:
Type:
Response variable:
Type:
Solution.
Completely randomized design
Diagram showing completely randomized design for typing study
The explanatory variable (whether or not the music is playing) is categorical and the response variable (typing speed measured in words per minute) is quantitative.
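The random assignment step of a completely randomized design can be sketched in a few lines of Python. This is only an illustration: the participant IDs and the `completely_randomized` helper are hypothetical names, not part of the original study materials.

```python
import random

def completely_randomized(participants, seed=None):
    """Randomly split participants into two equal-sized treatment groups."""
    rng = random.Random(seed)
    shuffled = list(participants)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"music": shuffled[:half], "no_music": shuffled[half:]}

# Hypothetical IDs standing in for the 34 recruited students
participants = [f"P{i:02d}" for i in range(1, 35)]
groups = completely_randomized(participants, seed=1)
print(len(groups["music"]), len(groups["no_music"]))
```

Each participant lands in exactly one group, so each person contributes a single typing speed to the analysis.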

3. Study Precautions.

Identify some precautions you would take in carrying out this design.
Solution.
You would want to make sure the conditions for each participant are as identical as possible (e.g., no distractions, same level of motivation). You would also want to make sure that the timing starts at the same point for everyone and that everyone in the music group listens to the same music at the same volume.
Consider the following results for the number of words per minute by each participant.
Dotplots and summary statistics comparing typing speeds with and without music

4. Examine Distributions.

Do you see much of a difference between the two distributions? What is the observed difference in the sample means? Do you think this difference in sample means will be statistically significant? Explain.
Solution.
There is very little difference between the distributions. The means are only about 3.5 words per minute apart and there is substantial variation, with standard deviations of 10.8 and 16.1 words per minute. With so much overlap between the two distributions and relatively small sample sizes, it is unlikely that this difference will be considered statistically significant.

5. Validity of Two-Sample \(t\)-Test.

Is a two-sample t-test likely valid for these data? Explain.
Solution.
The sample sizes are both below 20 and we do have a few outliers. Apart from the outliers, the shapes are somewhat symmetric and similar to each other. This is a pretty borderline case and we may want to explore additional approaches as well.
The one-sided p-value from a two-sample t-test is 0.2299, failing to provide convincing evidence that the average typing speed is faster with (this) music than the average typing speed without music. But maybe there is a genuine difference and our sample sizes are just too small to detect it.

6. Impact of Sample Size.

Suppose everything else remained the same but the sample sizes had been 34 in each group. How would that impact the p-value? By a lot or by a little? Would the difference now be statistically significant?
Solution.
If everything stays the same but the sample sizes are doubled, then the p-value will be smaller. We could compare the two analyses directly:
Analysis with \(n = 17\)
Results with sample size 17
Analysis with \(n = 34\)
Results with sample size 34
In this case, we would still fail to reject the null hypothesis. Doubling the sample sizes only shrinks the standard deviation of the difference in sample means from \(SD(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{10.8^2}{17} + \frac{16.1^2}{17}} \approx 4.71\) to \(SD(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{10.8^2}{34} + \frac{16.1^2}{34}} \approx 3.33\text{,}\) which is still fairly large compared to the observed difference of about 3.5 words per minute.
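The standard deviation calculation can be checked with a short Python sketch. It assumes the rounded summary statistics quoted in this investigation (standard deviations of 10.8 and 16.1, and an observed difference of about 3.5 words per minute), so the results may differ slightly in the last digit from values computed on the raw data. The function name `se_diff` is just an illustrative choice.

```python
import math

def se_diff(sd1, sd2, n1, n2):
    """Standard deviation of the difference in two independent sample means."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

sd_a, sd_b = 10.8, 16.1   # rounded sample SDs from the text
observed_diff = 3.5       # approximate difference in sample means (wpm)

for n in (17, 34):
    se = se_diff(sd_a, sd_b, n, n)
    t_stat = observed_diff / se   # unpooled two-sample t statistic
    print(f"n = {n}: SE = {se:.2f}, t = {t_stat:.2f}")
```

Doubling the sample size divides the standard deviation by only \(\sqrt{2}\), which is why the test statistic grows so slowly.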

Discussion.

The images below show the null distribution (in blue) for the difference in sample means, and the alternative distribution (in green) for the difference in sample means when the actual difference in population means is 4 wpm. We see quite a bit of overlap in the two distributions so the probability of rejecting the null hypothesis is small.
Sample sizes of \(n = 17\)
Power analysis with sample size 17
Sample sizes of \(n = 34\)
Power analysis with sample size 34
Increasing the sample size helps reduce the "random chance" variation in our statistic, but does not reduce the person-to-person variation in typing speeds. When the person-to-person variation is large, it will still be difficult for us to detect the underlying difference in treatment means even if one truly exists. If the actual difference in typing speeds is 4 words per minute on average, we would need sample sizes on the order of 100 participants in each group to have at least 80% power.
Alternative power analysis distributions
Power analysis with larger sample sizes
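A rough feel for these power calculations can be had from a normal approximation. The sketch below is not the method behind the images above: it assumes a one-sided test at a 5% significance level (the text does not state the level), uses the rounded standard deviations 10.8 and 16.1, and replaces the \(t\) distribution with a normal distribution, so its numbers are only approximate. The function name `approx_power` is hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(delta, sd1, sd2, n, alpha=0.05):
    """Normal-approximation power of a one-sided two-sample test, n per group."""
    se = math.sqrt(sd1**2 / n + sd2**2 / n)    # SD of the difference in means
    z_crit = NormalDist().inv_cdf(1 - alpha)   # rejection cutoff in SE units
    # Probability the standardized statistic exceeds the cutoff when the
    # true difference in population means is delta
    return NormalDist().cdf(delta / se - z_crit)

for n in (17, 34):
    print(n, round(approx_power(4, 10.8, 16.1, n), 2))
```

Passing larger values of `n` shows how slowly power climbs when the person-to-person variation is left untouched.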

Alternative study design.

But there is another way we can improve the design of this study for detecting differences in typing speeds without using any additional people!

7. Modified Design.

How could you modify the experimental design to give you a better chance of detecting a difference in the average typing speeds between the two treatments if one exists? Be sure to explain why you believe this new design will be advantageous in detecting a difference between the two treatments.
Solution.
One suggestion is to have each individual do the same typing test twice, once with music and once without.

8. Role of Randomness.

How will randomness be used in this new study design and why is that important?
Solution.
We could randomize which test condition (with or without music) they use first.
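One way to carry out this randomization is sketched below. It makes the added assumption that the order should be balanced, with half of the participants starting with music and half starting without. Flipping a coin for each person would also be a valid randomization. The IDs and the `randomize_order` helper are hypothetical.

```python
import random

def randomize_order(participants, seed=None):
    """Randomly choose which half of the participants takes the music condition first."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    music_first = set(shuffled[: len(shuffled) // 2])
    return {
        p: ("music", "no_music") if p in music_first else ("no_music", "music")
        for p in participants
    }

# Hypothetical IDs standing in for the 34 recruited students
participants = [f"P{i:02d}" for i in range(1, 35)]
order = randomize_order(participants, seed=2)
print(order["P01"])
```

Because the order is random, any practice or fatigue effect from taking the test a second time should not systematically favor either condition.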

9. Response Variable in New Design.

In your new study design, what response variable will you measure on each individual? Will it be quantitative or categorical?
Solution.
We could measure the difference in words per minute on the two tests, a quantitative variable.

Definition: Paired Design.

In a paired design, rather than splitting the observational units into two distinct groups, they are paired in a way where we expect the observations within a pair to be more similar to each other than to observations in other pairs. This can explain some variation in the response variable.
For example, we can have each person take the typing test both with and without music. We can then compare each individual's two typing speeds, which should be very similar apart from the effect of the music, and thereby account for the variation in typing speeds across individuals.
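The benefit of pairing can be illustrated with simulated data. Everything in the sketch below is hypothetical: the typing speeds are randomly generated (not the students' data), with a large person-to-person spread, a small test-to-test spread, and an assumed true music effect of 4 words per minute.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(0)
n = 17

# HYPOTHETICAL data: each person has their own baseline typing speed
baseline = [rng.gauss(50, 13) for _ in range(n)]
no_music = [b + rng.gauss(0, 3) for b in baseline]
music = [b + 4 + rng.gauss(0, 3) for b in baseline]  # assumed 4 wpm effect

# Paired analysis: one difference per person removes that person's baseline
diffs = [m - nm for m, nm in zip(music, no_music)]
se_paired = stdev(diffs) / math.sqrt(n)
t_paired = mean(diffs) / se_paired

# Treating the same numbers as two independent samples ignores the pairing
se_unpaired = math.sqrt(stdev(music) ** 2 / n + stdev(no_music) ** 2 / n)

print(f"paired SE = {se_paired:.2f}, unpaired SE = {se_unpaired:.2f}")
```

The mean difference is the same either way, but the differences vary far less than the individual speeds, so the paired statistic is measured against much less noise.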

10. Identical Typing Tests?

Should the participants be given an identical typing test both times? Explain.
Solution.
If there is no memory effect or advantage in taking the test a second time (or if whatever advantage there is will be balanced out by randomizing the order of the conditions, as suggested in Exercise 8), then the tests could be identical. Otherwise, the two tests should at least be of the same level of difficulty.

11. Alternative Pairing Strategy.

The students originally wanted to compare music students to athletes. Explain how you could create a paired design using information on whether the participant was a music student or an athlete. Do you think this design will be as effective as the above design? Explain.
Solution.
If we thought music students had similar typing abilities to each other, we could create pairs of music students, giving one student in each pair the typing test with music and the other the test without music, and similarly for the athletes. But it is unlikely that the two students in a pair will be as similar in typing ability as the same participant measured twice.

Subsection 21.1.2 Practice Problem 4.8A

Suppose that a baseball manager wants to study whether a player can run from second base to home plate more quickly by taking a wide angle around third base or a narrow angle. Forty players are available to use as subjects in an experiment.

Checkpoint 21.1.1. Design Comparison.

Suggest a completely randomized design and a paired design for this research question and explain why a paired design is likely to be more effective.

Checkpoint 21.1.2. Pairing by Arrival Time.

Suppose the players arrive for the study at different times. The manager decides to pair the first two arrivers and have them each do a different angle. The manager continues pairing the next two players as they arrive, etc. Is this a paired design? Is the paired design likely to be more effective than the completely randomized design in this case?

Subsection 21.1.3 Practice Problem 4.8B

For each of the following research study designs, indicate whether the data collection plan will result in two independent samples (completely randomized design) or "dependent" samples (matched-pairs design).

Checkpoint 21.1.3. Cow Names Study.

A farmer wants to see whether referring to cows by name increases their milk production. He selects half of his cows at random, gives them names, and frequently calls them by name. The other half of his cows he does not call by name. He then measures the milk production of each cow over one week.
  • Two independent samples (completely randomized design)
  • "Dependent" samples (matched-pairs design)

Checkpoint 21.1.4. Hand vs. Machine Milking.

A farmer wants to know whether hand-milking or machine-milking tends to produce more milk from cows. He examines records of how much milk the cows have produced in the past, and orders them from most to least productive. For the top two milk producers, he randomly assigns one to hand-milking and the other to machine-milking. He does the same for the next two and the next two and so on.
  • Two independent samples (completely randomized design)
  • "Dependent" samples (matched-pairs design)

Checkpoint 21.1.5. Cars: Students vs. Faculty.

You wonder whether students at your school tend to drive newer cars than faculty at your school. You take a random sample of 20 students and a random sample of 20 faculty members, and ask each person how old their car is.
  • Two independent samples (completely randomized design)
  • "Dependent" samples (matched-pairs design)

Checkpoint 21.1.6. Knee Surgery Effectiveness.

To investigate whether knee surgery is effective, you randomly assign half of the subjects to receive knee surgery and the other half to undergo a "placebo" operation.
  • Two independent samples (completely randomized design)
  • "Dependent" samples (matched-pairs design)

Checkpoint 21.1.7. Language Study Program.

To investigate the effectiveness of an online language study program, participants were assigned to enroll in a six-week summer session, after which their language skills were assessed, and then to spend six weeks using an online program (Duolingo), after which their language skills were assessed again.
  • Two independent samples (completely randomized design)
  • "Dependent" samples (matched-pairs design)