
Section 2.6 Investigation 1.6: Kissing the Right Way

In the previous investigation, you learned how to decide whether a hypothesized value of the parameter is plausible based on a two-sided p-value. The two-sided p-value is used when you do not have a prior suspicion or interest in whether the hypothesized value is too large or too small. In fact, in many studies we may not even really have a hypothesized value, but are more interested in using the sample data to estimate the value of the parameter. What are all the plausible values for the parameter?
Most people are right-handed and even the right eye is dominant for most people. Researchers have long believed that late-stage human embryos tend to turn their heads to the right. German bio-psychologist Onur Güntürkün (Nature, 2003) conjectured that this tendency to turn to the right manifests itself in other ways as well, so he studied kissing couples to see whether both people tend to lean their heads to the right more often than to the left (and if so, how strong the tendency is).
He and his researchers observed couples ranging in age from 13 to 70 in public places such as airports, train stations, beaches, and parks in the United States, Germany, and Turkey. The observers were careful not to include couples who were holding objects, such as luggage, that might have affected which direction they turned. We will model the overall decision-making process when kissing as a binomial random process.

Checkpoint 2.6.1. Identify Sample, Variable, and Parameter.

Identify the sample and variable in this study, as well as the parameter of interest.
Sample:
Variable:
Parameter:
Hint.
What group was observed, what was measured, and what long-run proportion are we interested in?
Solution.
Sample: 124 observed kissing couples in public places
Variable: Whether the couple leaned right or left
Parameter: \(\pi\) = the long-run probability (proportion) that a kissing couple leans right

Checkpoint 2.6.2. One-Sided or Two-Sided?

If Dr. Güntürkün wants to investigate whether a majority of kissing couples lean right in the long run, is this a one-sided or a two-sided test?
  • One-sided test
  • Correct! Testing for a "majority" means \(\pi > 0.5\text{,}\) which is a one-sided alternative.
  • Two-sided test
  • Not quite. A "majority" suggests a specific direction (greater than 0.5), so this is one-sided.
Hint.
Does "majority" suggest a specific direction?

Checkpoint 2.6.3. State Hypotheses for Two Conjectures.

Hint.
For each conjecture, we’re testing whether the data support the specific claimed value using two-sided tests.

Checkpoint 2.6.5. Calculate Sample Proportion.

Dr. Güntürkün found 80 out of 124 couples leaned right. Calculate the observed sample proportion \(\hat{p}\text{.}\) Round to three decimal places.
\(\hat{p}\) =
Hint.
Sample proportion = (number of successes)/(sample size)
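A quick arithmetic check in Python (our choice of language; the investigation itself uses an applet):

```python
successes = 80  # couples observed leaning right
n = 124         # couples observed in total

p_hat = successes / n
print(f"p-hat = {p_hat:.3f}")
```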

Checkpoint 2.6.6. Predict Plausible Values.

Based on Dr. Güntürkün’s result, what is your best guess for \(\pi\text{,}\) the long-run probability a kissing couple leans right? Do you think 0.50 will be a plausible value for \(\pi\text{?}\) 2/3? 0.74? Explain your reasoning.
Hint.
The sample proportion provides a point estimate. Consider how far each hypothesized value is from the observed proportion.
Solution.
Best guess: \(\pi \approx 0.645\) (the sample proportion)
0.50 seems too low - likely not plausible
2/3 (0.667) is quite close - likely plausible
0.74 is somewhat higher - might be plausible, but less certain
We could employ a "trial-and-error" approach to determine which values of \(\pi\) appear plausible based on what we observed in the sample. This involves testing different values of \(\pi\) and checking whether the corresponding two-sided p-value is at least as large as some pre-specified cut-off, typically 0.05. (This cut-off is often called the level of significance.) In other words, we will consider \(\pi_0\) a plausible value for \(\pi\) if assuming \(\pi = \pi_0\) does not make our sample statistic look surprising (that is, does not produce a p-value below the cut-off).
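The trial-and-error idea can be sketched in Python (standard library only). The helper `blaker_p_value` is hypothetical, written for this sketch rather than taken from the applet or any library, and it assumes the "smallest tail probability" definition of the two-sided p-value: for each possible count k, take the smaller of P(X ≤ k) and P(X ≥ k), then add up P(X = k) over every count whose smaller tail is no larger than the observed count's. Its values may differ slightly from the applet's.

```python
from itertools import accumulate
from math import comb


def blaker_p_value(x, n, p0):
    """Two-sided p-value for H0: pi = p0, "smallest tail probability" rule.

    Hypothetical helper written for this sketch; not a library function.
    """
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    at_most = list(accumulate(pmf))                   # at_most[k] = P(X <= k)
    at_least = list(accumulate(reversed(pmf)))[::-1]  # at_least[k] = P(X >= k)
    smaller_tail = [min(a, b) for a, b in zip(at_most, at_least)]
    # Counts at least as extreme as x: their smaller tail is no larger than x's.
    # The tiny relative tolerance keeps floating-point ties from being dropped.
    cutoff = smaller_tail[x] * (1 + 1e-9)
    return min(1.0, sum(p for p, t in zip(pmf, smaller_tail) if t <= cutoff))


# Trial values from the previous checkpoint.
for p0 in (0.50, 2 / 3, 0.74):
    print(f"pi0 = {p0:.3f}: two-sided p-value = {blaker_p_value(80, 124, p0):.4f}")
```

Under this definition, 0.50 should give a p-value far below 0.05, 2/3 one far above it, and 0.74 one between 0.01 and 0.05.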

Checkpoint 2.6.7. Identify Plausible Values on Number Line.

Use the One Proportion Inference applet to test different values of \(\pi_0\) (multiples of 0.01). For each value below, determine if observing 80 of 124 successes yields a two-sided p-value greater than 0.05. Check all plausible values (p-value \(\geq\) 0.05):
  • 0.50
  • The p-value for \(\pi_0 = 0.50\) is very small (less than 0.05), so this is not plausible.
  • 0.51
  • The p-value is still less than 0.05.
  • 0.52
  • The p-value is still less than 0.05.
  • 0.53
  • The p-value is still less than 0.05.
  • 0.54
  • The p-value is still less than 0.05.
  • 0.55
  • The p-value is still less than 0.05.
  • 0.56
  • Correct! The p-value is greater than 0.05, making this plausible.
  • 0.645
  • Correct! This is our observed sample proportion, definitely plausible.
  • 0.72
  • Correct! The p-value is greater than 0.05, making this plausible.
  • 0.73
  • The p-value is less than 0.05, so this is not plausible.
Hint.
Use the applet to find the two-sided p-value for each hypothesized value. Plausible values have p-values \(\geq\) 0.05.
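Checking the multiples of 0.01 one at a time is easy to automate. This Python sketch repeats the hypothetical `blaker_p_value` helper (smallest-tail-probability rule) so that it runs on its own, then keeps every \(\pi_0\) from 0.50 to 0.80 whose two-sided p-value is at least 0.05. It is an illustration, not the applet's code.

```python
from itertools import accumulate
from math import comb


def blaker_p_value(x, n, p0):
    """Two-sided p-value, smallest-tail-probability rule (hypothetical helper)."""
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    at_most = list(accumulate(pmf))                   # P(X <= k)
    at_least = list(accumulate(reversed(pmf)))[::-1]  # P(X >= k)
    smaller_tail = [min(a, b) for a, b in zip(at_most, at_least)]
    cutoff = smaller_tail[x] * (1 + 1e-9)             # tolerance for float ties
    return min(1.0, sum(p for p, t in zip(pmf, smaller_tail) if t <= cutoff))


# Multiples of 0.01 from 0.50 to 0.80, rounded to avoid floating-point drift.
grid = [round(0.50 + 0.01 * i, 2) for i in range(31)]
plausible = [p0 for p0 in grid if blaker_p_value(80, 124, p0) >= 0.05]
print("Plausible at the 0.05 level:", plausible)
```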

Checkpoint 2.6.8. Find 95% Confidence Interval.

Use the One Proportion Inference applet to determine the values of \(\pi_0\) such that observing 80 of 124 successes or a result more extreme occurs in at least 5% of samples. [Hints: Use values of \(\pi\) that are multiples of 0.01 until you can find the boundaries where the exact two-sided p-values (using the tails of the binomial distribution) change from below 0.05 to above 0.05. Then feel free to "zoom in" to three decimal places of accuracy if you’d like.] On a number line, indicate which values are rejected (not plausible) and which values are not rejected and therefore considered plausible.
Hint.
Test values systematically: start with values like 0.50, 0.55, 0.60, etc., and find where the p-value crosses 0.05.
Solution.
The 95% confidence interval using the "smallest tail probability" (Blaker) approach is approximately \(0.557\) to \(0.727\text{.}\)
Values below 0.557 and above 0.727 would have two-sided p-values less than 0.05 and would be rejected.
Values between 0.557 and 0.727 would have two-sided p-values greater than 0.05 and are considered plausible.
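The "zoom in" step can be sketched the same way: a finer grid (steps of 0.001) and the same hypothetical `blaker_p_value` helper, reporting the smallest and largest values that are not rejected at the 0.05 level. Because the grid is discrete, the endpoints are only accurate to about 0.001.

```python
from itertools import accumulate
from math import comb


def blaker_p_value(x, n, p0):
    """Two-sided p-value, smallest-tail-probability rule (hypothetical helper)."""
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    at_most = list(accumulate(pmf))                   # P(X <= k)
    at_least = list(accumulate(reversed(pmf)))[::-1]  # P(X >= k)
    smaller_tail = [min(a, b) for a, b in zip(at_most, at_least)]
    cutoff = smaller_tail[x] * (1 + 1e-9)             # tolerance for float ties
    return min(1.0, sum(p for p, t in zip(pmf, smaller_tail) if t <= cutoff))


# "Zoom in": step through pi0 in increments of 0.001 from 0.500 to 0.800.
fine_grid = [round(0.500 + 0.001 * i, 3) for i in range(301)]
plausible = [p0 for p0 in fine_grid if blaker_p_value(80, 124, p0) >= 0.05]
lower, upper = min(plausible), max(plausible)
print(f"Approximate 95% interval: {lower:.3f} to {upper:.3f}")
```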

Definition: Confidence Interval.

A confidence interval (CI) specifies the plausible values of the parameter based on the sample result.
What you found in the previous exercise will be called a "95% confidence interval" as it was derived using the \(1 - 0.95 = 0.05\) cut-off value/significance level.

Checkpoint 2.6.9. Interpret 95% CI.

Interpret the confidence interval from the previous exercise. What are you 95% confident of?
Hint.
A confidence interval gives a range of plausible values for the parameter.
Solution.
We are 95% confident that the true probability that a kissing couple leans right is between 0.557 and 0.727 (or approximately 56% to 73%).

Checkpoint 2.6.10. Find 99% Confidence Interval.

Use the One Proportion Inference applet to determine the 99% confidence interval by using 0.01 rather than 0.05 as the criterion for rejection/plausibility (level of significance). [Hints: You can check the Show sliders box in the applet and use the slider or edit the orange number to change the value of \(\pi_0\text{.}\) Keep in mind that you are changing the conjectured value of \(\pi\text{,}\) not the observed number of successes, which should stay at 80.]
Using 0.01 as the significance level, test different values of \(\pi_0\) to determine which yield a two-sided p-value greater than 0.01. Check all plausible values (p-value \(\geq\) 0.01):
  • 0.49
  • The p-value is less than 0.01, so this is not plausible.
  • 0.50
  • The p-value is less than 0.01, so this is not plausible.
  • 0.51
  • The p-value is less than 0.01, so this is not plausible.
  • 0.52
  • The p-value is less than 0.01, so this is not plausible.
  • 0.53
  • Correct! The p-value is greater than 0.01, making this plausible in the 99% CI.
  • 0.54
  • Correct! The p-value is greater than 0.01, making this plausible.
  • 0.55
  • Correct! The p-value is greater than 0.01, making this plausible.
  • 0.56
  • Correct! The p-value is greater than 0.01, making this plausible.
  • 0.645
  • Correct! This is our observed sample proportion, definitely plausible.
  • 0.72
  • Correct! The p-value is greater than 0.01, making this plausible.
  • 0.73
  • Correct! The p-value is greater than 0.01, making this plausible.
  • 0.74
  • Correct! The p-value is greater than 0.01, making this plausible in the 99% CI.
  • 0.75
  • The p-value is less than 0.01, so this is not plausible.
  • 0.76
  • The p-value is less than 0.01, so this is not plausible.
  • 0.77
  • The p-value is less than 0.01, so this is not plausible.
  • 0.78
  • The p-value is less than 0.01, so this is not plausible.
Hint.
Use the applet to find the two-sided p-value for each hypothesized value. Plausible values have p-values \(\geq\) 0.01.
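Changing the cut-off from 0.05 to 0.01 is a one-number change in a sketch like the following (Python, with the same hypothetical smallest-tail-probability helper repeated so the block is self-contained; not the applet's code):

```python
from itertools import accumulate
from math import comb


def blaker_p_value(x, n, p0):
    """Two-sided p-value, smallest-tail-probability rule (hypothetical helper)."""
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    at_most = list(accumulate(pmf))                   # P(X <= k)
    at_least = list(accumulate(reversed(pmf)))[::-1]  # P(X >= k)
    smaller_tail = [min(a, b) for a, b in zip(at_most, at_least)]
    cutoff = smaller_tail[x] * (1 + 1e-9)             # tolerance for float ties
    return min(1.0, sum(p for p, t in zip(pmf, smaller_tail) if t <= cutoff))


significance = 0.01  # was 0.05 for the 95% interval
grid = [round(0.45 + 0.01 * i, 2) for i in range(41)]  # 0.45, 0.46, ..., 0.85
plausible_99 = [p0 for p0 in grid if blaker_p_value(80, 124, p0) >= significance]
print("Plausible at the 0.01 level:", plausible_99)
```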

Checkpoint 2.6.11. Compare 95% and 99% Confidence Intervals.

Does this "99% confidence interval" include more or fewer values than the 95% confidence interval? Explain why this makes intuitive sense.
Hint.
A higher confidence level means we want to be more certain, so we need a wider interval.
Solution.
The 99% confidence interval is approximately \(0.529\) to \(0.749\text{.}\)
This interval includes more values than the 95% confidence interval (it’s wider).
Explanation: To be more confident (99% vs. 95%) that we’ve captured the true parameter value, we need to include more values, making the interval wider.
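To compare the two intervals numerically, this Python sketch (hypothetical helpers `blaker_p_value` and `plausible_interval`, searching a 0.001 grid from 0.40 to 0.90) finds both sets of plausible values and prints their widths. Since any value with a p-value of at least 0.05 also has a p-value of at least 0.01, the 99% set should contain the 95% set.

```python
from itertools import accumulate
from math import comb


def blaker_p_value(x, n, p0):
    """Two-sided p-value, smallest-tail-probability rule (hypothetical helper)."""
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    at_most = list(accumulate(pmf))                   # P(X <= k)
    at_least = list(accumulate(reversed(pmf)))[::-1]  # P(X >= k)
    smaller_tail = [min(a, b) for a, b in zip(at_most, at_least)]
    cutoff = smaller_tail[x] * (1 + 1e-9)             # tolerance for float ties
    return min(1.0, sum(p for p, t in zip(pmf, smaller_tail) if t <= cutoff))


def plausible_interval(x, n, significance):
    """Smallest and largest pi0 on a 0.001 grid whose p-value is >= significance."""
    grid = [round(0.400 + 0.001 * i, 3) for i in range(501)]  # 0.400 to 0.900
    kept = [p0 for p0 in grid if blaker_p_value(x, n, p0) >= significance]
    return min(kept), max(kept)


lo95, hi95 = plausible_interval(80, 124, 0.05)
lo99, hi99 = plausible_interval(80, 124, 0.01)
print(f"95%: {lo95:.3f} to {hi95:.3f}, width {hi95 - lo95:.3f}")
print(f"99%: {lo99:.3f} to {hi99:.3f}, width {hi99 - lo99:.3f}")
```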

Checkpoint 2.6.12. Compare Intervals.

Identify a value that is captured in the 99% confidence interval but not the 95% confidence interval. Interpret the meaning of this observation. Explain what your analysis reveals about this value as a plausible value of \(\pi\text{.}\)
Hint.
Look at values near the endpoints of the 95% interval.
Solution.
For example, \(\pi = 0.54\) is in the 99% CI (0.529 to 0.749) but not in the 95% CI (0.557 to 0.727).
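This means that if we tested \(H_0: \pi = 0.54\) vs. \(H_a: \pi \neq 0.54\text{,}\) the two-sided p-value would fall between 0.01 and 0.05: we would reject \(H_0\) at the 0.05 level of significance but not at the 0.01 level.
So 0.54 is "somewhat" plausible but not highly plausible.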
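The duality can be checked directly: compute the two-sided p-value for \(\pi_0 = 0.54\) and compare it with both cut-offs. This Python sketch again uses a hypothetical smallest-tail-probability helper, so its p-value may differ slightly from the applet's.

```python
from itertools import accumulate
from math import comb


def blaker_p_value(x, n, p0):
    """Two-sided p-value, smallest-tail-probability rule (hypothetical helper)."""
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    at_most = list(accumulate(pmf))                   # P(X <= k)
    at_least = list(accumulate(reversed(pmf)))[::-1]  # P(X >= k)
    smaller_tail = [min(a, b) for a, b in zip(at_most, at_least)]
    cutoff = smaller_tail[x] * (1 + 1e-9)             # tolerance for float ties
    return min(1.0, sum(p for p, t in zip(pmf, smaller_tail) if t <= cutoff))


p = blaker_p_value(80, 124, 0.54)
print(f"Two-sided p-value for pi0 = 0.54: {p:.4f}")
print("Rejected at 0.05?", p < 0.05, "| Rejected at 0.01?", p < 0.01)
```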

Study Conclusions.

The researchers are assuming they have a representative sample from a binomial random process and want to estimate \(\pi\text{,}\) the underlying probability that a randomly selected kissing couple leans to the right. Based on this sample of 124 observations, we estimate \(\pi\) to be close to \(\hat{p} = 80/124 = 0.645\text{.}\) However, we know there is some sampling variability, so we want to find an interval of values that appear to be plausible values of \(\pi\text{.}\) We do this by finding the values of \(\pi_0\) for which the two-sided p-value (\(H_0: \pi = \pi_0\) vs. \(H_a: \pi \neq \pi_0\)) is greater than 0.05. These are all the values of the parameter such that our sample result is not overly surprising. You should have found this "95% confidence interval," using the "smallest tail probability" approach, to be approximately 0.557 to 0.727 (results from using different software or the applet will differ slightly). Thus, based on these sample results, we are "confident" that the actual value of \(\pi\text{,}\) the probability a random kissing couple leans right, is between 0.56 and 0.73.
A 99% confidence interval for \(\pi\) extends from 0.529 to 0.749 (a smaller lower endpoint and a larger upper endpoint, but a similar midpoint) and therefore includes additional plausible values of the parameter compared to the 95% interval. The 99% confidence interval is wider than the 95% interval because a higher level of confidence requires more "room for error." You will learn other methods for calculating confidence intervals for a binomial process in the next section.
Discussion: In this investigation you have learned a second type of "statistical inference": providing an interval of plausible values for the parameter based on an observed sample statistic. Confidence intervals provide a nice companion to tests of significance and are also very useful by themselves. A test of significance allows you to assess the plausibility of a specific hypothesized value, but if you reject the null hypothesis, the test provides no information about how different the actual parameter is from the hypothesized value. If you fail to reject the null hypothesis, you only know that the tested value is one of many plausible values. A confidence interval provides an estimate (with bounds) of the actual value of the parameter. You will learn some additional methods for finding confidence intervals later in this text, but be aware that some software packages use different methods for finding the "exact" binomial two-sided p-values.

Checkpoint 2.6.13. Alternative CI Method.

Another way to define a binomial confidence interval is to find all the values of \(\pi\) such that P(X \(\geq\) observed) > (1 – confidence level)/2 and P(X \(\leq\) observed) > (1 – confidence level)/2. Use the applet to find the 95% confidence interval using this approach. [Hints: Use one-sided p-values, switching the direction of the tail probability between \(\geq\) and \(\leq\text{;}\) in both directions, include the observed value itself.]
How does the resulting 95% confidence interval compare to the previous one? Which interval is "better"? Explain.
Hint.
This is the Clopper-Pearson method. For 95% confidence, use 0.025 as the cutoff for each tail.
Solution.
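The Clopper-Pearson 95% confidence interval is approximately 0.554 to 0.729, slightly wider than the Blaker interval (0.557 to 0.727). In fact, a Blaker interval always lies within the corresponding Clopper-Pearson interval.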
Neither is definitively "better" - they use different definitions of "more extreme." The Blaker method tends to produce slightly shorter intervals and is increasingly preferred.

Probability Detour – Binomial Confidence Intervals.

Probably the most well-known binomial confidence interval method is the Clopper-Pearson method (Biometrika, 1934). Rather than using two-sided p-values, it considers a value for \(\pi\) plausible as long as each one-sided tail probability, P(X \(\geq\) observed) and P(X \(\leq\) observed), is larger than (1 – confidence level)/2. One advantage of the Clopper-Pearson method is that there is a simple computer algorithm for finding it, rather than needing to check all values as you have done here.
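Software usually computes the Clopper-Pearson endpoints from beta-distribution quantiles; the sketch below swaps in plain bisection instead. Because P(X ≥ observed) increases with \(\pi\) and P(X ≤ observed) decreases with \(\pi\), each endpoint is the single value at which the relevant tail probability equals (1 – confidence level)/2, so it can be found by repeatedly halving a search interval. The function names are hypothetical and this is not the code used by R, JMP, or the applet.

```python
from math import comb


def prob_at_least(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def prob_at_most(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def solve(f, target, increasing, iters=60):
    """Bisection on (0, 1): find p with f(p) == target for a monotone f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # the target lies at larger p
        else:
            hi = mid  # the target lies at smaller p
    return (lo + hi) / 2


def clopper_pearson(x, n, conf=0.95):
    half_alpha = (1 - conf) / 2
    # Lower endpoint: where P(X >= x) has risen to half_alpha (0 if x == 0).
    lower = 0.0 if x == 0 else solve(
        lambda p: prob_at_least(x, n, p), half_alpha, increasing=True)
    # Upper endpoint: where P(X <= x) has fallen to half_alpha (1 if x == n).
    upper = 1.0 if x == n else solve(
        lambda p: prob_at_most(x, n, p), half_alpha, increasing=False)
    return lower, upper


lo95, hi95 = clopper_pearson(80, 124, 0.95)
print(f"95% Clopper-Pearson interval: ({lo95:.3f}, {hi95:.3f})")
```

At the lower endpoint, P(X ≥ 80) is essentially 0.025, and at the upper endpoint P(X ≤ 80) is as well, which is exactly the boundary condition described above.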
The method we first showed you (keeping all values of \(\pi\) whose two-sided p-value, computed from the tail probabilities, is larger than 1 – confidence level) is attributed to Blaker (The Canadian Journal of Statistics, 2000). We could refer to this as the "smallest tail probability" method.
Another approach would be to use the "smallest p-value" approach for the two-sided p-value, switching to using P(X = x) to find values of x more extreme than observed in finding the two-sided p-value. This method, attributed to Sterne (Biometrika, 1954), has the disadvantage that the interval produced can have holes! For example, a value like 0.12 may be in the interval of plausible values, the value 0.13 may not, and the value 0.14 may be again.
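A Python sketch of the Sterne ("smallest p-value") rule, using a hypothetical helper name `sterne_p_value`: the two-sided p-value adds P(X = k) over every count k whose probability is no larger than that of the observed count. It illustrates the definition only and is not taken from any particular software.

```python
from math import comb


def sterne_p_value(x, n, p0):
    """Two-sided p-value, "smallest p-value" (Sterne) rule; hypothetical helper."""
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    # A count is "as or more extreme" if it is no more likely than the observed count.
    cutoff = pmf[x] * (1 + 1e-9)  # tiny tolerance for floating-point ties
    return min(1.0, sum(p for p in pmf if p <= cutoff))


for p0 in (0.50, 0.56, 0.645, 0.72):
    print(f"pi0 = {p0:.3f}: Sterne two-sided p-value = {sterne_p_value(80, 124, p0):.4f}")
```

Because this rule ranks counts by their individual probabilities rather than by tail areas, the set of \(\pi_0\) values it accepts need not form a single unbroken interval, which is the source of the holes described above.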
Many prefer the Blaker method to the holes of the Sterne method, and Blaker’s method is now gaining favor over Clopper-Pearson because the intervals tend to be shorter. We will see some other methods later in this text as well. For now, keep in mind the duality between confidence intervals and tests of significance: The confidence interval is the set of values for which we would fail to reject the null hypothesis in favor of the two-sided alternative. So we can interpret the confidence interval as the set of plausible values for the parameter, in the sense that our observed sample result would not be surprising if any one of them were the true value. Keep in mind that saying a value is plausible is not the same as saying a value is probable. We won’t make probability statements about parameter values in this text.

Checkpoint 2.6.14. Misinterpreting Confidence.

Explain why it is not appropriate to say "There is a 99% probability that between 52.6% and 75.2% of kissing couples lean right when they kiss."
Hint.
The parameter is fixed (not random), but the interval is random.
Solution.
The parameter \(\pi\) is a fixed value - it either is or isn’t in the interval. The 99% refers to the confidence in our method: if we were to repeat this process many times, about 99% of the intervals we construct would contain the true parameter value. We cannot make a probability statement about this specific interval containing the parameter.
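The "repeat this process many times" interpretation can be checked exactly rather than by simulation. The Python sketch below assumes a hypothetical true value \(\pi = 0.65\) (in practice we never know the real \(\pi\)) and, for every possible sample count, asks whether the Clopper-Pearson interval built from that count would capture it; weighting each count by its probability gives the long-run capture rate. For Clopper-Pearson intervals this rate is guaranteed to be at least the nominal confidence level.

```python
from math import comb


def coverage(n, p_true, conf):
    """Exact probability that a Clopper-Pearson interval captures p_true.

    Uses the test/interval duality: the interval built from x successes
    contains p_true exactly when P(X <= x) >= alpha/2 and P(X >= x) >= alpha/2,
    with both probabilities computed under p_true.
    """
    half_alpha = (1 - conf) / 2
    pmf = [comb(n, k) * p_true**k * (1 - p_true) ** (n - k) for k in range(n + 1)]
    total = 0.0
    for x in range(n + 1):
        at_most = sum(pmf[: x + 1])
        at_least = sum(pmf[x:])
        if at_most >= half_alpha and at_least >= half_alpha:
            total += pmf[x]  # samples with this count yield an interval capturing p_true
    return total


# p_true = 0.65 is a hypothetical "true" value chosen only for illustration.
for conf in (0.95, 0.99):
    print(f"{conf:.0%} intervals capture p_true in {coverage(124, 0.65, conf):.2%} of samples")
```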

Checkpoint 2.6.15. Technology Comparison.

Use technology (see Technology Detour below) to determine the 95% and 99% Clopper-Pearson confidence intervals for the probability that a kissing couple leans to the right. Comment on how the 99% confidence interval compares to the 95% interval, examining both midpoints and widths. [Hint: Average the endpoints to find the midpoint; subtract the endpoints to find the width.]
Hint.
Use R’s iscambinomtest function or JMP with the ISCAM Journal file.
Solution.
Using technology (e.g., R with iscambinomtest(80, 124, conf.level=0.95)):
95% CI: approximately 0.554 to 0.729
99% CI: approximately 0.526 to 0.752
The 99% interval is wider (width ≈ 0.226 vs. 0.175 for 95%) but has a similar midpoint (approximately 0.64).
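The midpoint and width arithmetic can be sketched in Python. So that the block stands on its own, it recomputes the Clopper-Pearson endpoints by bisection (hypothetical helper names; R and JMP may compute them differently and round differently) and then averages and subtracts them. It assumes 0 < x < n.

```python
from math import comb


def tail_at_least(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def tail_at_most(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def bisect_p(f, target, increasing, iters=60):
    """Find p in (0, 1) with f(p) == target for a monotone f, by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, conf):
    """Clopper-Pearson endpoints, assuming 0 < x < n."""
    half_alpha = (1 - conf) / 2
    lower = bisect_p(lambda p: tail_at_least(x, n, p), half_alpha, increasing=True)
    upper = bisect_p(lambda p: tail_at_most(x, n, p), half_alpha, increasing=False)
    return lower, upper


summary = {}
for conf in (0.95, 0.99):
    lo, hi = clopper_pearson(80, 124, conf)
    summary[conf] = {"midpoint": (lo + hi) / 2, "width": hi - lo}
    print(f"{conf:.0%}: ({lo:.3f}, {hi:.3f}), midpoint {(lo + hi) / 2:.3f}, width {hi - lo:.3f}")
```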

Technology Detour – Exact Confidence Intervals.

Different software packages use different confidence interval methods, but you shouldn’t see much distinction when the sample size is large.

Checkpoint 2.6.16. Confidence Intervals in R.

In R: Clopper-Pearson (a value of \(\pi\) is rejected when either one-sided tail probability is below (1-confidence)/2)
iscambinomtest(observed, n, conf.level = )
Can enter either sample count or sample proportion for "observed."
Be sure to label the confidence level. You can enter the proportion or the percentage value.
Solution.
For the kissing couples example with 80 successes out of 124 trials:
library(iscam)
iscambinomtest(80, 124, conf.level = 0.95)
This produces a 95% Clopper-Pearson confidence interval of approximately (0.554, 0.729).
For a 99% confidence interval:
iscambinomtest(80, 124, conf.level = 0.99)
This produces a 99% confidence interval of approximately (0.526, 0.752).

Checkpoint 2.6.17. Confidence Intervals in JMP.

In JMP: Clopper-Pearson (a value of \(\pi\) is rejected when either one-sided tail probability is below (1-confidence)/2)
Using the ISCAM Journal file, select Confidence Interval for One Proportion (with Summary Stats).
(The raw data option does not give an exact binomial confidence interval.)
With summarized data, specify the count and sample size, and select the Binomial radio button.
Solution.
For the kissing couples example:
The output will show a 95% Clopper-Pearson confidence interval of approximately (0.554, 0.729).
You can change the confidence level to 99% in the dialog box to obtain the 99% confidence interval of approximately (0.526, 0.752).

Checkpoint 2.6.18. Optional: Compare Methods.

Optional: Compare the Clopper-Pearson 99% interval to the Blaker 99% interval. Which is narrower?
Hint.
The Blaker method typically produces shorter intervals.
Solution.
For the kissing data, the Blaker 99% interval (about 0.529 to 0.749) is narrower than the Clopper-Pearson 99% interval (about 0.526 to 0.752), which illustrates one reason the Blaker method is gaining favor in practice.

Output from R.

The iscambinomtest function includes some interesting graphs.
  • The bottom graph illustrates the 95% confidence interval. The interval is drawn around the observed sample proportion \(\hat{p}\) (it need not be exactly centered there) and displays the two endpoints of the interval of plausible values for the process probability.
  • The top graph shows the distribution assuming the lower value of the confidence interval as the process probability. This is as far left as we can shift that null distribution before the probability at or above the observed number of successes, 80, dips below 0.025.
  • The middle graph shows how far we can move that distribution to the right (largest plausible value of \(\pi\)) before the probability at or below 80 dips below 0.025.
Figure 2.6.19. R output from iscambinomtest: the top and middle panels show the binomial distributions at the interval endpoints, and the bottom panel shows the confidence interval.

Subsection 2.6.1 Practice Problem 1.6

Checkpoint 2.6.20. St. George’s Hospital CI.

Recall the 8 out of 10 statistic for St. George’s Hospital (Investigation 1.4). Based on this result, what is an interval of plausible values for the underlying mortality rate at St. George’s? [Hint: You can use a 95% confidence level if none is stated.] Describe how you found the interval and name the method used. Report the midpoint and width of this interval.

Checkpoint 2.6.21. Larger Sample CI.

Use technology to calculate the 95% confidence interval based on the 71 deaths among 361 patients. Comment on how the width and midpoint of this interval differ from the interval in Checkpoint 2.6.20. Explain why these changes make sense.

Checkpoint 2.6.22. CI and Hypothesis Test Duality.

Based on the interval in Checkpoint 2.6.21, if you were to test \(H_0: \pi = 0.20\) vs. \(H_a: \pi \neq 0.20\text{,}\) would you reject or fail to reject the null hypothesis? Explain how you know the conclusion based on the confidence interval without actually conducting the test.