Investigation 4.5: Lingering effects of sleep deprivation (cont.)

Section 20.2 Investigation 4.5: Lingering effects of sleep deprivation (cont.)

In this investigation, you will consider a "theory-based" alternative to the exact randomization distribution.

Exercises 20.2.1 Two-sample t-tests

Reconsider the exact randomization distribution of the differences in sample means for the Sleep Deprivation study. Previously you determined simulation-based p-values and the exact p-value. Now we will explore modeling the randomization distribution of the difference in sample means with a probability distribution.


Exact Randomization Distribution


1. Normal Distribution Fit.

Does this distribution appear to be well modeled by a normal distribution?


Solution.

This distribution appears to be very bell-shaped and symmetric.


2. Information Needed for Normal Model.

Suppose you want to use the normal model to approximate the p-value for obtaining a difference in means of 15.92 or larger under the null hypothesis or to compute a confidence interval. What other information do you need to know?


Solution.

You would need to know the variability, that is, the standard deviation of the \(\bar{x}_1 - \bar{x}_2\) values.


Standard Error Estimation.

In Investigation 4.2, we estimated the standard error of the difference in sample means using the sample standard deviations:


\begin{equation*} SE(\bar{X}_1-\bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \end{equation*}


For the sleep deprivation data, the sample standard deviations are:


\(s_{\text{unrestricted}} = 14.73\) ms and \(s_{\text{deprived}} = 12.17\) ms


3. Compute Standard Error.

Use these values to compute the standard error for the difference in sample means.


Standard error =


Compare this to the standard deviation you observed in the applet (or for the exact randomization distribution). In particular, does this formula appear to over- or under-estimate the variability?


Over-estimate
The SE formula actually underestimates the variability because the two groups are not independent in a randomization test.
Under-estimate
Correct! The SE formula underestimates the variability (SE ≈ 5.93 vs. SD ≈ 6.74) because the two groups are not independent in a randomization test.

Solution.

SE = \(\sqrt{14.73^2/10 + 12.17^2/11} \approx 5.93\)


This is reasonably close to the simulated value but slightly underestimates the variability (SD = 6.74).
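To connect Questions 2 and 3, the Python sketch below (variable names are our own) computes the SE from the summary statistics and then uses a normal model centered at zero to approximate the probability of a difference of 15.92 ms or larger, once with the SE and once with the randomization-distribution SD of 6.74 reported above. It uses only the standard library.

```python
import math

# Summary statistics quoted in the text (improvement scores, ms)
s_unres, n_unres = 14.73, 10
s_depr, n_depr = 12.17, 11
observed_diff = 15.92      # xbar_unres - xbar_depr
sd_randomization = 6.74    # SD of the randomization distribution reported above

# Unpooled standard error from Investigation 4.2
se = math.sqrt(s_unres**2 / n_unres + s_depr**2 / n_depr)


def normal_upper_tail(z):
    """P(Z >= z) for a standard normal random variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Normal-model p-values for a difference of 15.92 or larger when H0 centers it at 0
p_with_se = normal_upper_tail(observed_diff / se)
p_with_sd = normal_upper_tail(observed_diff / sd_randomization)

print(f"SE = {se:.2f}")  # SE = 5.93
print(f"normal p-value using SE: {p_with_se:.4f}")
print(f"normal p-value using randomization SD: {p_with_sd:.4f}")
```

Because the SE (about 5.93) is smaller than 6.74, the normal p-value based on the SE comes out noticeably smaller (roughly 0.004 versus roughly 0.009). That is the practical cost of underestimating the variability.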


Discussion.

In Investigation 4.2, we stated that \(SD(\bar{X}_1-\bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\) because, under the assumption that the two samples are independent, variances add. In a randomization test, however, the two samples are not independent: if one group gets all of the large responses, the other group must get all of the small responses, and the difference in group means will be large. Consequently, the standard error formula from Investigation 4.2 will underestimate the shuffle-to-shuffle variation in the statistic. The discrepancy between the two standard deviations is largest when the groups are more different. When we shuffle the responses back into the groups, any value can go into either group, but if there is a genuine difference between the treatments, then a score of 40 would be unlikely to come from, say, the deprived treatment. So how do we estimate the standard deviation of the randomization distribution? It turns out we are less concerned with that. What we really want to be able to do is predict the behavior of the standardized statistic.
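The discussion can be made concrete with a standard finite-population result: when the responses are fixed and every split into groups of sizes \(n_1\) and \(n_2\) is equally likely, the variance of the difference in group means across re-randomizations is \(S^2(1/n_1 + 1/n_2)\), where \(S^2\) is the variance (divisor \(N-1\)) of all \(N\) responses combined. \(S^2\) contains a between-group term proportional to \((\bar{x}_1 - \bar{x}_2)^2\), which is why the discrepancy grows when the groups differ more. The Python sketch below rebuilds \(S^2\) from the summary statistics quoted earlier; variable names are our own.

```python
import math

# Summary statistics from the text (ms)
n1, s1 = 10, 14.73   # unrestricted
n2, s2 = 11, 12.17   # deprived
diff = 15.92         # xbar1 - xbar2
N = n1 + n2

# Sum of squares of all N responses about the combined mean =
# within-group part + between-group part; the between-group part
# simplifies to (n1 * n2 / N) * (xbar1 - xbar2)**2
ss_within = (n1 - 1) * s1**2 + (n2 - 1) * s2**2
ss_between = (n1 * n2 / N) * diff**2
S2 = (ss_within + ss_between) / (N - 1)   # variance of the combined data

# SD of xbar1 - xbar2 over all equally likely re-randomizations
sd_rand = math.sqrt(S2 * (1 / n1 + 1 / n2))
# Unpooled SE from Investigation 4.2, for comparison
se_unpooled = math.sqrt(s1**2 / n1 + s2**2 / n2)

print(f"randomization SD = {sd_rand:.2f}, unpooled SE = {se_unpooled:.2f}")
```

With these numbers the first value should reproduce the 6.74 seen for the exact randomization distribution, while the unpooled formula gives 5.93. Dropping `ss_between` would make the two much closer, which isolates the between-group term as the source of the gap.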


But what about the t-statistic?


4. Calculate t-statistic.

Calculate the two-sample t-statistic from Investigation 4.2 and give a one-sentence interpretation.


\begin{equation*} t = \frac{(\bar{x}_{\text{unres}} - \bar{x}_{\text{depr}}) - 0}{\sqrt{\frac{s_{\text{unres}}^2}{n_{\text{unres}}} + \frac{s_{\text{depr}}^2}{n_{\text{depr}}}}} \end{equation*}


\(t =\)


Interpretation:


Solution.

Standardized statistic \(t = \frac{15.92 - 0}{5.93} \approx 2.685\)


The observed difference in sample means is 2.69 standard errors above the hypothesized difference of zero.
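The arithmetic in this solution can be checked with a short Python snippet. It simply plugs the summary statistics quoted earlier into the t formula above; variable names are our own.

```python
import math

xbar_diff = 15.92        # xbar_unres - xbar_depr (ms)
hypothesized_diff = 0    # H0: no treatment effect
se = math.sqrt(14.73**2 / 10 + 12.17**2 / 11)   # unpooled SE

t = (xbar_diff - hypothesized_diff) / se
print(f"t = {t:.3f}")    # t = 2.685
```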


5. Re-randomization with t-statistic.

Use the Comparing Groups (Quantitative) applet to create a re-randomization null distribution. Change the Statistic pull-down menu to t-statistic and find the p-value using the value in Question 4. Is this the same p-value we found when we used the difference in means as the statistic in Investigation 4.4, Question 4?


Solution.

Example results:


Simulated randomization distribution using t-statistic

The simulated p-value (0.0079) is similar to, but probably not an exact match for, the p-value from Investigation 4.4 (e.g., 0.0090).
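The applet's re-randomization with the t-statistic can be mimicked in a few lines. The Python sketch below shuffles the pooled responses, recomputes the unpooled t-statistic for each shuffle, and reports the proportion of shuffles whose t is at least the observed t. The study's raw data are not reproduced on this page, so the two groups here are hypothetical numbers chosen only for illustration, and the function names `welch_t` and `simulate_t_pvalue` are our own.

```python
import math
import random


def welch_t(g1, g2):
    """Unpooled two-sample t-statistic for H0: mu1 - mu2 = 0."""
    def mean_var(g):
        m = sum(g) / len(g)
        return m, sum((x - m) ** 2 for x in g) / (len(g) - 1)
    m1, v1 = mean_var(g1)
    m2, v2 = mean_var(g2)
    return (m1 - m2) / math.sqrt(v1 / len(g1) + v2 / len(g2))


def simulate_t_pvalue(g1, g2, reps=10_000, seed=1):
    """Re-randomize the pooled responses into groups of the original sizes
    and return the fraction of shuffles with t at least the observed t."""
    rng = random.Random(seed)
    t_obs = welch_t(g1, g2)
    pooled = list(g1) + list(g2)
    n1 = len(g1)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        # small tolerance so floating-point noise does not drop ties
        if welch_t(pooled[:n1], pooled[n1:]) >= t_obs - 1e-12:
            hits += 1
    return hits / reps


# Hypothetical improvement scores (NOT the study's data), for illustration only
group_a = [12.0, 18.5, 9.1, 22.3, 15.0, 11.7]
group_b = [4.2, 10.8, -3.5, 7.9, 1.6, 6.0]
print(f"simulated one-sided p-value: {simulate_t_pvalue(group_a, group_b):.4f}")
```

Changing `seed` changes the answer slightly from run to run, just as repeated applet runs do.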


6. Exact p-value for t-statistic.

If we use code similar to that in Investigation 4.4 to find the exact distribution of these t-statistics, we find that the exact p-value for the t-statistic is 0.00748. Is this similar to the simulation results?


Solution.

The simulated p-value of 0.0079 is very close to the exact p-value of 0.00748.
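For small studies the shuffles can be enumerated instead of sampled, which is one way to obtain an exact p-value like the 0.00748 quoted in Question 6. The Investigation 4.4 code is not shown here, so the Python sketch below is an assumption about the approach rather than a copy of it: it loops over every choice of which responses go to the first group using `itertools.combinations`. For the sleep study's 21 subjects split 10/11 there are \(\binom{21}{10} = 352{,}716\) such splits, which is still feasible. The data below are hypothetical and the function names are our own.

```python
import itertools
import math


def welch_t(g1, g2):
    """Unpooled two-sample t-statistic for H0: mu1 - mu2 = 0."""
    def mean_var(g):
        m = sum(g) / len(g)
        return m, sum((x - m) ** 2 for x in g) / (len(g) - 1)
    m1, v1 = mean_var(g1)
    m2, v2 = mean_var(g2)
    return (m1 - m2) / math.sqrt(v1 / len(g1) + v2 / len(g2))


def exact_t_pvalue(g1, g2):
    """Enumerate every split of the pooled responses into groups of the
    original sizes; return the fraction with t at least the observed t."""
    t_obs = welch_t(g1, g2)
    pooled = list(g1) + list(g2)
    n1 = len(g1)
    hits = total = 0
    for idx in itertools.combinations(range(len(pooled)), n1):
        chosen = set(idx)
        new1 = [pooled[i] for i in idx]
        new2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # small tolerance so floating-point noise does not drop ties
        if welch_t(new1, new2) >= t_obs - 1e-12:
            hits += 1
    return hits / total


# Hypothetical improvement scores (NOT the study's data): C(12, 6) = 924 splits
group_a = [12.0, 18.5, 9.1, 22.3, 15.0, 11.7]
group_b = [4.2, 10.8, -3.5, 7.9, 1.6, 6.0]
print(f"exact one-sided p-value: {exact_t_pvalue(group_a, group_b):.4f}")
```

As a sanity check, for the toy groups [5, 6, 7] and [1, 2, 3] only the observed split (1 of 20) reaches the observed t, so the function returns 0.05.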


Check the box to Overlay the t distribution.

7. Overlay t-distribution.

(a) Does it appear to be a reasonable model for the simulated null distribution?


(b) Does the p-value from the t-distribution appear to reasonably approximate the exact p-value?


(a) Yes, (b) Yes
Correct! The t-distribution provides a reasonable model for the simulated null distribution, and its p-value reasonably approximates the exact p-value.
(a) Yes, (b) No
The t-distribution does provide a reasonable model, and the p-value from the t-distribution also reasonably approximates the exact p-value.
(a) No, (b) Yes
Actually, the t-distribution does provide a reasonable model for the simulated null distribution.
(a) No, (b) No
Actually, the t-distribution provides a reasonable model and its p-value reasonably approximates the exact p-value.

Solution.

The t distribution should provide a reasonable model.


Example results:


Simulated randomization distribution with t-distribution overlay


The t-distribution comes to the rescue again, compensating for the underestimation in the variability, and in most situations (moderate and large sample sizes) gives an adequate approximation to the exact randomization distribution. Note the applet reports 17.56 as the degrees of freedom. This comes from the Welch-Satterthwaite approximation because the exact degrees of freedom are unknown in this case.
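The degrees of freedom and the t-distribution p-value can also be reproduced from the summary statistics. The Python sketch below uses the Welch-Satterthwaite formula \(df = (v_1 + v_2)^2 / \left(v_1^2/(n_1-1) + v_2^2/(n_2-1)\right)\) with \(v_i = s_i^2/n_i\), and integrates the t density numerically (Simpson's rule) so that only the standard library is needed. A statistics package's t CDF would be the usual choice in practice. Because the SDs are rounded to two decimals, the results should land close to, but not necessarily exactly on, the applet's df = 17.56 and p-value of 0.0076. Function and variable names are our own.

```python
import math


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for T ~ t(df) and t >= 0, via Simpson's rule on [0, t]."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3         # P(T >= 0) = 0.5 by symmetry


n1, s1 = 10, 14.73      # unrestricted
n2, s2 = 11, 12.17      # deprived
diff = 15.92            # xbar_unres - xbar_depr (ms)

v1, v2 = s1**2 / n1, s2**2 / n2          # squared standard errors of each mean
se = math.sqrt(v1 + v2)
t = diff / se

# Welch-Satterthwaite approximate degrees of freedom
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

p_one_sided = t_upper_tail(t, df)        # H_a: mu_unres - mu_depr > 0
print(f"t = {t:.3f}, df = {df:.2f}, one-sided p-value = {p_one_sided:.4f}")
```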


We will consider the t distribution to be a reasonable approximation for the randomization distribution of the t statistic as long as either (similar to what you witnessed in Investigation 4.2):


The data in both groups are symmetric and bell-shaped. When this is the case, we have evidence that the “treatment populations” are normally distributed, and then the randomization distribution of the difference in group means will also be approximately normal.

The two sample sizes are large. Sample sizes as small as 5 per group can work, especially if the two groups have similar sample sizes and similar distribution shapes, but conventionally 20 per group is used as the cut-off for applying this approximation. Examine graphs of your data first: if the sample distributions are not symmetric or have unusual observations, you will want larger sample sizes.

In summary, we will often apply the same two-sample t-procedures both to randomized experiments and to independent random samples. The distinction between these two sources of randomness will be most important in drawing your final conclusions (e.g., causation, generalizability). The advantage of the t-procedures over the exact randomization distribution is convenience, especially in calculating a confidence interval.


Confidence Interval.

Using the 95% t-confidence interval \((\bar{x}_{\text{unres}} - \bar{x}_{\text{depr}}) \pm t^* \sqrt{\frac{s_{\text{unres}}^2}{n_{\text{unres}}} + \frac{s_{\text{depr}}^2}{n_{\text{depr}}}}\text{,}\) we find (3.44, 28.40).
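This interval can be reproduced from the summary statistics: find \(t^*\) for 95% confidence using the Welch degrees of freedom, then add and subtract \(t^* \times SE\). In the Python sketch below \(t^*\) is found by bisection on a numerically integrated t tail, a standard-library-only stand-in for a library's inverse t CDF; function and variable names are our own.

```python
import math


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for T ~ t(df) and t >= 0, via Simpson's rule on [0, t]."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3


def t_critical(conf, df):
    """Two-sided critical value t* with P(T >= t*) = (1 - conf) / 2."""
    target = (1 - conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):                  # bisection
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid                     # tail area still too big: t* is larger
        else:
            hi = mid
    return (lo + hi) / 2


n1, s1 = 10, 14.73      # unrestricted
n2, s2 = 11, 12.17      # deprived
diff = 15.92            # xbar_unres - xbar_depr (ms)

v1, v2 = s1**2 / n1, s2**2 / n2
se = math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

t_star = t_critical(0.95, df)
lower, upper = diff - t_star * se, diff + t_star * se
print(f"df = {df:.2f}, t* = {t_star:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With the rounded SDs, \(t^*\) comes out a little above 2.10 and the interval should match (3.44, 28.40) to two decimals.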


8. Interpret Confidence Interval.

Write a one-sentence interpretation of this interval in context, being especially clear how you are defining the parameter for this randomized experiment.


Solution.

We are 95% confident the long-run average improvement scores for those who are not sleep deprived are between 3.44 and 28.40 ms larger than for those who are sleep deprived.


Study Conclusions.

The approximate p-value from the (unpooled, independent samples) two-sample t-test (0.0076, df = 17.56) also provides very strong evidence that the observed difference between the two groups did not arise by chance alone. Therefore, if we were to perform unlimited administrations of the same training and reaction time test under the exact same conditions, we have convincing evidence that the long-run mean of all improvement scores that an individual would have under sleep deprivation is lower, three days later, than the theoretical mean of all improvement scores that an individual would have with unrestricted sleep (i.e., \(\mu_{\text{sleepdeprived}} < \mu_{\text{unrestricted}}\)). We are 95% confident that the long-run mean improvement under the “no restriction” treatment is 3.44 to 28.40 ms larger than the long-run mean improvement under the “sleep deprivation” treatment. However, we still need to worry about what larger population these individuals represent.


Subsection 20.2.2 Practice Problem 4.5

Recall the study on children’s television viewing habits from Practice Problem 2.6B and Practice Problem 4.3B. One school incorporated a new curriculum, the other school did not.


Checkpoint 20.2.1. Improving Randomization.

Could the randomization in this study be improved (in principle, even if difficult in practice)?


Explain:


The following summary statistics pertain to the reports of television watching at the conclusion of the study:


Group (at follow-up)	Sample size	Sample mean (hours/week)	Sample SD (hours/week)
Control group	\(n_1 = 103\)	\(\bar{x}_1 = 14.46\)	\(s_1 = 13.82\)
Intervention group	\(n_2 = 95\)	\(\bar{x}_2 = 8.80\)	\(s_2 = 10.41\)

Checkpoint 20.2.2. State Hypotheses.

The researchers want to decide whether the long-run mean number of hours of television viewing per week is lower after 6 months with the intervention than without the intervention. State the null and alternative hypotheses. If you use any symbols, make sure you clearly define them first.


Checkpoint 20.2.3. Two-Sample t-Test.

Carry out a two-sample t-test. Include your output (indicating how found) and provide a one-sentence interpretation of the p-value in context (make sure you address the statistic, the source of the randomness in the study, and what you mean by “more extreme”).


Checkpoint 20.2.4. Conclusions.

Summarize your conclusion from this test, including a discussion of causation and generalizability.


Checkpoint 20.2.5. Valid Despite Nonnormality?

The population distributions are likely skewed (bounded below by zero, SD similar to mean). Explain why the nonnormality of these distributions does not hinder the validity of using this t-test procedure.

