Section 2.5 Investigation 1.5: Buttered Toast
The folks at MythBusters, a popular television program on the Discovery Channel, wanted to investigate whether a piece of toast that has been buttered on one side is more likely to land butter/messy side down or butter-side up. First, they wanted to design a toast-dropping rig that would have no built-in bias for unbuttered toast to land on either side when dropped. For this pilot study, they labeled the sides "top" and "bottom" for ten unbuttered pieces of toast and counted how many landed "top-down."

Checkpoint 2.5.1. Define Process and Variable.
Checkpoint 2.5.2. Define Parameter.
Identify the parameter of interest in this study (in words).
Checkpoint 2.5.3. Parameter Symbol.
What symbol should we use to represent this parameter?
- \(\hat{p}\)
- This is the symbol for the sample proportion (statistic), not the population parameter.
- \(\pi\)
- Correct! We use \(\pi\) to represent the probability of success in a binomial process.
- \(\mu\)
- This is the symbol for a population mean, not a probability.
- \(p\)
- While p is sometimes used for probability, we use the Greek letter \(\pi\) for the process probability parameter.
Checkpoint 2.5.4. State Null Hypothesis.
Checkpoint 2.5.5. Two-Sided Alternative.
Hint.
You have now stated a two-sided alternative hypothesis (rather than a "one-sided" alternative of strictly less than or strictly greater than). So now we need to decide which values we will consider "or more extreme" in calculating the p-value.
Checkpoint 2.5.6. Calculate One-Sided Probability.
In this pilot study, the toast landed top-down 3 times and top-up 7 times. Use the One Proportion Inference applet to calculate the binomial probability of 3 or fewer successes in 10 attempts, assuming the null hypothesis is true (i.e., P(X ≤ 3) where X is the number landing top-down).
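As a cross-check on the applet, this left-tail probability can also be computed directly from the binomial formula. The following is a minimal Python sketch, independent of the applet; the helper names `binom_pmf` and `binom_cdf` are illustrative and do not come from the text.

```python
from math import comb

def binom_pmf(k, n, pi):
    """P(X = k) for X ~ Binomial(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

def binom_cdf(k, n, pi):
    """P(X <= k): add up the probabilities of 0, 1, ..., k successes."""
    return sum(binom_pmf(j, n, pi) for j in range(k + 1))

# Pilot study: 3 top-down landings in 10 drops, null value pi = 0.5
left_tail = binom_cdf(3, 10, 0.5)
print(f"{left_tail:.4f}")  # 0.1719
```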
Checkpoint 2.5.8. Identify More Extreme Outcomes.
Conjecture: What other outcomes would you consider "as or more surprising" as 3 or fewer successes?
Checkpoint 2.5.9. Calculate Two-Sided P-value.
If we calculate the two-sided p-value as P(X ≤ 3) + P(X ≥ 7), what would you report for the p-value? Would you find this p-value to be convincing evidence in favor of the two-sided alternative hypothesis?
What if our original hypotheses had been \(H_0: \pi = 0.55\) vs. \(H_a: \pi \neq 0.55\text{?}\)
Checkpoint 2.5.10. Alternative Null Hypothesis.
Restate the alternative hypothesis in words. How will the null distribution change?
Checkpoint 2.5.11. Left Tail Probability.
Make this change in the applet and find the binomial probability of 3 or fewer successes.
Checkpoint 2.5.12. Right Tail Comparison.
Now find P(X ≥ 7) when π = 0.55. How does it compare to P(X ≤ 3)? Why does this make sense?
In this text, we will consider outcomes "more extreme" than the observed outcome if the "tail probability" is less than or equal to the tail probability of the observed outcome. So for this null hypothesis (π = 0.55), because P(X ≥ 7) > P(X ≤ 3), we would not consider x = 7 to be "more extreme" than x = 3. Therefore, we would not include x = 7 in the p-value calculation. Notice that this distinction arose once our binomial distribution was not symmetric (π ≠ 0.50).
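The comparison behind this rule can be checked numerically. The sketch below, in Python rather than the applet, computes the two tail probabilities under π = 0.55 and tests whether the upper tail starting at 7 is small enough to count as "more extreme"; the variable names are illustrative.

```python
from math import comb

def binom_pmf(k, n, pi):
    """P(X = k) for X ~ Binomial(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

n, pi0 = 10, 0.55
left_3 = sum(binom_pmf(k, n, pi0) for k in range(0, 4))       # P(X <= 3)
right_7 = sum(binom_pmf(k, n, pi0) for k in range(7, n + 1))  # P(X >= 7)

print(f"P(X <= 3) = {left_3:.4f}")   # 0.1020
print(f"P(X >= 7) = {right_7:.4f}")  # 0.2660
# x = 7 counts as "more extreme" only if its tail probability is no larger
print(right_7 <= left_3)             # False
```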
Checkpoint 2.5.13. Find More Extreme Value.
Find P(X ≥ 8). Is x = 8 considered more extreme than x = 3?
Checkpoint 2.5.14. Two-Sided P-value for π = 0.55.
Calculate the two-sided p-value as P(X ≤ 3) + P(X ≥ 8).
Checkpoint 2.5.15. Verify with Applet.
Verify your result in the applet: Start with 3 and use the "Tails" option, then check the Two-sided box.
There are other ways to define "more extreme" as well, and results will vary across different software packages (see Practice Problem 1.5). But the results will be more similar the more symmetric the null distribution is, and when the null distribution is skewed, adjusting for the asymmetry is generally preferred to going "the same distance" on each side or doubling the one-sided p-value.
Definition: Two-sided p-values.
Two-sided p-values are used with two-sided alternative hypotheses (≠, not equal to). A two-sided p-value considers outcomes in both tails that are at least as rare as the observed result, where "at least as rare" could be defined as having a tail probability no larger than the tail probability of the observed statistic. If the null distribution is symmetric, the different approaches are equivalent and the two-sided p-value will be double the one-sided p-value.
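The tail-probability definition can be written as a short procedure: find the observed tail, then walk inward from the far end of the opposite side, keeping the largest opposite tail whose probability does not exceed the observed tail. The Python sketch below is one reading of that rule, not the applet's actual code; the function name `two_sided_tail_pvalue` and the small tolerance `tol` (so that floating-point ties count as equal) are assumptions made for illustration.

```python
from math import comb

def binom_pmf(k, n, pi):
    """P(X = k) for X ~ Binomial(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

def two_sided_tail_pvalue(x, n, pi, tol=1e-12):
    """Two-sided p-value using the 'smallest tail probability' rule."""
    pmf = [binom_pmf(k, n, pi) for k in range(n + 1)]
    lower = sum(pmf[: x + 1])  # P(X <= x)
    upper = sum(pmf[x:])       # P(X >= x)
    if lower <= upper:
        observed_tail = lower
        far_side = range(n, x, -1)  # walk inward from n toward x
    else:
        observed_tail = upper
        far_side = range(0, x)      # walk inward from 0 toward x
    opposite_tail = 0.0
    running = 0.0
    for k in far_side:
        running += pmf[k]
        if running > observed_tail + tol:
            break                   # this tail is no longer "as rare"
        opposite_tail = running
    return min(1.0, observed_tail + opposite_tail)

# Symmetric null: the result is double the one-sided p-value
print(f"{two_sided_tail_pvalue(3, 10, 0.5):.4f}")  # 0.3438
# Skewed null: the opposite tail starts at 8, not 7
p_skewed = two_sided_tail_pvalue(3, 10, 0.55)
```

With π = 0.5 the function reproduces the doubling shortcut; with π = 0.55 it sums P(X ≤ 3) and P(X ≥ 8), as in the checkpoints above.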
Checkpoint 2.5.17. Evaluate Pilot Study Conclusion.
The MythBusters decided a 3/7 split was "way outside a random sample" and so they needed to build a different toast-dropping rig. Do you agree with their conclusion from the pilot study? Write a short paragraph to the MythBusters justifying your answer. Include appropriate numerical evidence to support your argument.
Solution.
Based on the two-sided p-values, both 0.50 and 0.55 are plausible values for the probability of a piece of toast landing top side down. Because 0.50 is plausible (p-value = 0.3438), these data do not provide convincing evidence that there is a built-in bias with the mechanism and they could have continued to use it.
After building a new rig (that they were more confident had no bias), they used the rig to drop buttered toast from the roof of a building. The MythBusters weren't sure whether the toast would land butter-side down (as in the myth) or whether, like a curved leaf falling from a tree, the toast might try to "right itself" and land with the indented side (from being buttered) down. So again, they wanted to use a two-sided alternative hypothesis, allowing for either possibility to be of interest.

They dropped 48 pieces of toast, with 19 landing butter-side-down.
Checkpoint 2.5.18. Buttered Toast Two-Sided Test.
Use the applet to find the two-sided p-value for testing \(H_0: \pi = 0.5\) vs. \(H_a: \pi \neq 0.50\text{.}\) Do you have strong enough evidence to reject the null hypothesis and conclude π differs from 0.5? Why are you making this decision?
Study Conclusions.
These data (19 out of 48) do not provide convincing evidence against the null hypothesis that toast buttered on one side is equally likely to land butter side up or down (two-sided p-value = 0.1934). In other words, it is plausible that this is a 50/50 process. However, this analysis cannot be used as proof that π = 0.50, as other values could be plausible as well. The sample data allow for many (technically, infinitely many) plausible values of this probability, as you will explore in the next investigation.
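The reported p-value can be reproduced outside the applet. Because the null distribution with π = 0.5 is symmetric, the two-sided p-value is double the left-tail probability P(X ≤ 19). The following is a minimal Python sketch with illustrative variable names.

```python
from math import comb

n, observed = 48, 19

# Under pi = 0.5 every sequence of 48 drops has probability 0.5**48,
# so P(X <= 19) is the number of sequences with at most 19 successes times 0.5**48.
left_tail = sum(comb(n, k) for k in range(observed + 1)) * 0.5**n

p_value = 2 * left_tail  # symmetric null: double the one-sided p-value
print(f"{p_value:.4f}")  # 0.1934
```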
Discussion: Recall that the process we are using to test our hypotheses is to first assume the null hypothesis is true. For this reason, we can never use this process as evidence for the null hypothesis, only lack of evidence against it. Keep in mind the saying "Absence of evidence is not evidence of absence." Also, note that our evidence depends on the form of the alternative hypothesis. A more complete statement is that we do not have evidence in favor of the alternative that we specified, and that our strength of evidence could differ if we started with a different alternative hypothesis.
Technology Detour: Two-sided p-values.
Checkpoint 2.5.20. Two-sided p-values in One Proportion Inference applet.
- Check the two-sided box
- For the "smallest tail probability" approach, set the pull-down menu to "Tails" (finds tail value on other side that is first below the one-sided p-value)
- For the "smallest p-value" approach, set the pull-down menu to "Individual" (sums all probabilities smaller than the observed)
- (These will match when the distribution is symmetric.)
Checkpoint 2.5.21. Two-sided p-values in R.
(Smallest p-value method)
iscambinomtest(observed=19, n=48, hyp=0.5, alt="two.sided")
Checkpoint 2.5.22. Two-sided p-values in JMP.
Note: The default two-sided test under Analyze > Distribution is not an "exact" binomial test.

Subsection 2.5.1 Practice Problem 1.5
There are other approaches for calculating the two-sided p-value. For example, we could consider an outcome k more extreme than the observed, if P(X = k) < P(X = observed).
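This alternative rule compares individual outcome probabilities instead of tail probabilities. The Python sketch below is one possible implementation: it sums P(X = k) over every outcome k that is no more likely than the observed one. Two choices here are assumptions, not taken from the text: ties count as "as extreme" (so the observed outcome itself is included), and a small relative tolerance `rel_tol` absorbs floating-point noise. The function name is illustrative.

```python
from math import comb

def binom_pmf(k, n, pi):
    """P(X = k) for X ~ Binomial(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

def two_sided_individual_pvalue(x, n, pi, rel_tol=1e-7):
    """Sum the probabilities of all outcomes no more likely than x."""
    pmf = [binom_pmf(k, n, pi) for k in range(n + 1)]
    cutoff = pmf[x] * (1 + rel_tol)  # treat near-ties as ties
    return min(1.0, sum(p for p in pmf if p <= cutoff))

# Symmetric null (pi = 0.5): agrees with the tail-probability approach
print(f"{two_sided_individual_pvalue(3, 10, 0.5):.4f}")  # 0.3438
```

Calling the same function with a null value other than 0.5 lets you compare the two approaches when the null distribution is skewed.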
Checkpoint 2.5.24. Explain Alternative Approach.
Explain the distinction in this approach and the one used above.
Checkpoint 2.5.25. Apply Alternative Method.
Return to the pilot study with 3 successes out of n = 10 drops. Based on this new approach, which values above 5 would you consider more extreme than 3 when π = 0.55? Document your justification. What is the resulting two-sided p-value? [Hints: Use As extreme as = in the applet. Confirm your results by changing the pull-down menu from "tails" to "individual."]
Checkpoint 2.5.26. Strange Behavior with Extreme Observations.
Suppose n = 20 and you observed 0 successes. Find the two-sided p-values using the approach suggested in (b) for π = 0.13, π = 0.145, and π = 0.15. According to the two-sided p-values, which of these values are plausible for π at the 10% level of significance? What strange behavior do you observe?
Checkpoint 2.5.27. Why Not Other Methods?
Or we could choose to use x values that are the same number (or more) of standard deviations above the mean or double the one-sided p-value. Why do you think these methods are not recommended in general?
Checkpoint 2.5.28. When to Use One-Sided Tests.
It's interesting that the one-sided p-value for the pilot study is also not statistically significant. If it had been, would it be reasonable for the MythBusters to use the one-sided p-value instead to support their conclusion? Explain.


