Section 4.5 Investigation 1.9: Kissing the Right Way (cont.)

The normal approximation can also be used to approximate confidence intervals.

Exercises 4.5.1 Interval of plausible values based on Central Limit Theorem

Recall the study published in Nature that found 64.5% of 124 kissing couples leaned right to kiss (Investigation 1.6). We previously used simulation and the binomial distribution to determine which values were plausible for the underlying probability that a kissing couple leans right (\(\pi\)). In particular, we found 0.5 and 0.74 were not plausible values for \(\pi\) but 0.6667 was. The 95% Clopper-Pearson Binomial confidence interval was (0.554, 0.729). Now you will consider applying the normal model as another method for producing confidence intervals for this parameter.

1. Check CLT Validity Condition.

With a sample size of 124 kissing couples, does the Central Limit Theorem predict the normal probability distribution will be a reasonable model for the distribution of the sample proportion?
Hint.
Check whether you have at least 10 successes and at least 10 failures in your sample.
Solution.
The condition is \(n\pi \geq 10\) and \(n(1-\pi) \geq 10\text{.}\) Since we don’t have a hypothesized value for \(\pi\text{,}\) use \(\hat{p}\) in its place:
\(n\hat{p} = 124(0.645) \approx 80 > 10\)
\(n(1-\hat{p}) = 124(0.355) \approx 44 > 10\)
Yes! Both conditions are satisfied, so the CLT validity condition is met.
When you do not have a particular value to be tested for the process probability, it’s reasonable to use the sample proportion in checking the sample size condition for the CLT. (This is equivalent to making sure there are at least 10 successes and at least 10 failures in the sample.)
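The success/failure check can be sketched in a few lines of Python. This is only an illustration, not part of the ISCAM tools: the function name `clt_condition_met` and its `threshold` argument are hypothetical, and the count of 80 successes is inferred from 64.5% of 124.

```python
def clt_condition_met(successes, n, threshold=10):
    """Return True when the sample has at least `threshold` successes and failures."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

# Kissing study: about 80 of 124 couples leaned right (0.645 * 124 is roughly 80).
print(clt_condition_met(80, 124))  # True
```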

2. Describe Distribution of Sample Proportion.

Do you have enough information to describe and sketch the distribution of the sample proportion as predicted by the Central Limit Theorem? Explain.
Hint.
According to the CLT, what are the mean and standard deviation of the distribution of sample proportions?
Solution.
No, you do not have enough information to sketch the distribution predicted by the CLT. We need \(\text{SD}(\hat{p}) = \sqrt{\pi(1-\pi)/n}\text{,}\) but we don’t know \(\pi\text{.}\)

3. Estimate Standard Deviation - Method.

Suggest one method for estimating the standard deviation of this distribution of sample proportions based on the observed sample data.
Hint.
What value could you substitute for \(\pi\) in the formula \(\sqrt{\pi(1-\pi)/n}\text{?}\)
Solution.
Method: Use \(\text{SE}(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}\)

4. Estimate Standard Deviation - Calculation.

Calculate this estimate:
Solution.
Estimate: \(\text{SE}(\hat{p}) = \sqrt{0.645(0.355)/124} = \sqrt{0.001847} \approx 0.043\)

Definition.

The standard error of the sample proportion, \(SE(\hat{p})\text{,}\) is an estimate for the standard deviation of \(\hat{p}\) (i.e., \(SD(\hat{p})\)) based on the sample data, found by substituting the sample proportion for \(\pi\text{:}\)
\begin{equation*} SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \end{equation*}
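As a quick numerical check of the definition, the standard error for the kissing study can be computed directly. This is a minimal sketch; the helper name `standard_error` is an assumption made for illustration.

```python
import math

def standard_error(p_hat, n):
    """Estimated SD of the sample proportion, with p_hat substituted for pi."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

se = standard_error(0.645, 124)
print(round(se, 3))  # 0.043
```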

5. Find Largest Plausible Distance.

Now consider calculating a 95% confidence interval for the process probability \(\pi\) based on the observed sample proportion \(\hat{p}\text{.}\) What’s the largest distance you expect to see a sample proportion \(\hat{p}\) fall away from the underlying process probability \(\pi\text{?}\) [Hint: Assuming a normal distribution … 95% …]
Largest plausible distance =
Hint.
For a normal distribution, approximately 95% of observations fall within how many standard deviations of the mean? Compute a numerical value for the distance.
Solution.
Max plausible distance = \(2 \times 0.043 = 0.086\) (95% of the time)

6. Calculate Confidence Interval.

Use the distance from the previous question and the observed sample proportion of \(\hat{p} = 0.645\) to determine an interval of plausible values for \(\pi\text{,}\) the probability that a kissing couple leans to the right.
Hint.
Add and subtract the distance from Question 5 to the sample proportion.
(Lower bound: , Upper bound: )
Solution.
\(\pi\) should fall within 2 standard deviations of the observed sample proportion. So \(\pi\) is at least \(0.645 - 2(0.043) = 0.559\) and at most \(0.645 + 2(0.043) = 0.731\)
I’m 95% confident that the probability a kissing couple leans right is between 0.559 and 0.731.
An approximate 95% confidence interval for the process probability based on the normal distribution would be \(\hat{p} \pm 2\sqrt{\hat{p}(1-\hat{p})/n}\text{.}\) That is, this interval extends two standard deviations on each side of the sample proportion. We know that for the normal distribution roughly 95% of observations (here sample proportions) fall within 2 SDs of the mean (here the unknown process probability), so this method will "capture" the process probability for roughly 95% of samples.
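The "captures the process probability for roughly 95% of samples" claim can be explored with a small simulation. The sketch below is an illustration under stated assumptions: it pretends the process probability is 0.645 (in practice \(\pi\) is unknown), and the function names, number of repetitions, and seed are all hypothetical choices.

```python
import math
import random

def two_se_interval(p_hat, n):
    """Approximate 95% interval: p_hat plus or minus 2 standard errors."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - 2 * se, p_hat + 2 * se

def coverage_rate(pi, n, reps=10_000, seed=1):
    """Fraction of simulated samples whose interval captures the assumed pi."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Simulate n independent couples, each leaning right with probability pi.
        successes = sum(rng.random() < pi for _ in range(n))
        lower, upper = two_se_interval(successes / n, n)
        hits += lower <= pi <= upper
    return hits / reps

print(two_se_interval(0.645, 124))      # roughly (0.559, 0.731)
print(coverage_rate(pi=0.645, n=124))   # should land near 0.95
```

The simulated rate will vary slightly with the seed and with the assumed value of \(\pi\text{.}\)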
However, we should admit that the multiplier of 2 is a bit of a simplification. So how do we find a more precise value of the multiplier to use, including for values other than 95%?

Definition.

The \((100 \times C)\)% critical value, \(z^*\text{,}\) is the \(z\)-score value such that \(P(-z^* \leq Z \leq z^*) = C\) where \(C\) is any specified probability value, and \(Z\) represents a normal distribution with mean 0 and SD 1.
Note: We use the symbol \(z^*\) to distinguish this value, found based on the confidence level, from \(z_0\text{,}\) the observed \(z\)-score for the data.

Technology Detour: Finding Percentiles from the Standard Normal Distribution.

Use technology to more precisely determine the number of standard deviations that capture the middle 95% of the normal distribution with mean = 0 and standard deviation = 1. [Hint: In other words, how many standard deviations do you need to go on each side of zero to capture the middle 95% of the standard normal distribution?] Keep in mind that the \(z\)-value corresponding to probability \(C\) in the middle of the distribution also corresponds to having probability \((1-C)/2\) in each tail.
7. Find the 95% Critical Value.
Find the critical value for 95% confidence. Choose one set of instructions below by clicking on a hint.
Critical value for 95% confidence: \(z^* =\)
Hint 1. Normal Probability Calculator
  • You can leave the mean set to zero and the standard deviation to 1 and the variable is "z-scores."
  • Check the box next to the first \(<\) sign and specify the probability value to correspond to the lower tail probability of interest: \((1-C)/2\text{.}\) Press Enter/Return and it will display the negative \(z^*\)-value.
For 95% confidence, use probability 0.025 in the lower tail (or 0.975 for upper tail).
Hint 2. R/Sage Instructions
The iscaminvnorm function takes the following inputs:
  • prob1 = probability of interest
  • mean = the mean of the normal distribution (default = 0)
  • sd = the standard deviation of the normal distribution (default = 1)
  • direction = whether the probability of interest was in the lower tail ("below"), upper tail ("above"), both tails ("outside"), or in the middle of the distribution ("between")
Hint 3. JMP Instructions
In JMP using the Distribution Calculator in the ISCAM Journal File:
  1. From the Distribution pull-down menu, select Normal (at the top of the list)
  2. Specify the values for the mean (0) and standard deviation (1)
  3. Change the Type of Calculation to Input probability and calculate quantiles
  4. Select Central Probability and specify the confidence level (e.g., 0.95)
  5. Press Enter/Return.
Solution.
\(z^* = 1.96\)
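For readers without the applet, R, or JMP at hand, the same critical value can be found with Python's standard library. This is a sketch rather than one of the course's supported tools; `critical_value` is a hypothetical helper name, while `statistics.NormalDist.inv_cdf` is the standard-library inverse normal CDF.

```python
from statistics import NormalDist

def critical_value(confidence):
    """z* such that P(-z* <= Z <= z*) equals `confidence` for standard normal Z."""
    upper_tail = (1 - confidence) / 2
    # The middle `confidence` leaves (1 - confidence)/2 in each tail.
    return NormalDist(mu=0, sigma=1).inv_cdf(1 - upper_tail)

print(round(critical_value(0.95), 2))  # 1.96
```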

8. Compare to 90% Critical Value.

Find the critical value for a 90% confidence interval. Is it larger or smaller than with 95% confidence? Why does this make sense?
Hint.
Use technology to find the critical value for 90% confidence. Think about what happens to the interval width when you require less confidence.
  • Smaller
  • The same
  • Larger
Solution.
\(z^*(90) = 1.645 < 1.96\)
This makes sense because we are only capturing the middle 90% so don’t have to extend as far from the middle.

One Sample \(z\)-Confidence Interval (Wald Interval) for \(\pi\).

When we have at least 10 successes and at least 10 failures in the sample, an approximate confidence interval for \(\pi\) is given by:
\begin{equation*} \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \end{equation*}
where \(z^*\) corresponds to the confidence level.
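The interval formula translates directly into code. The following is a minimal sketch, assuming the validity condition has already been checked; the function name `wald_interval` is chosen here for illustration and is not an ISCAM function.

```python
import math
from statistics import NormalDist

def wald_interval(p_hat, n, confidence=0.95):
    """One-sample z-interval: p_hat +/- z* * sqrt(p_hat(1 - p_hat)/n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = wald_interval(0.645, 124, confidence=0.95)
print(round(lower, 3), round(upper, 3))  # 0.561 0.729
```

Because this uses the unrounded critical value (about 1.96) instead of 2, the result is slightly narrower than the (0.559, 0.731) interval found earlier.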

9. Identify Midpoint and Width.

Based on this formula, what (expression) is the midpoint of the confidence interval? What (expression) is the width of the interval?
Hint.
The interval is \(\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}\text{.}\) What’s the center? What’s the distance from the lower bound to the upper bound?
Solution.
Midpoint: \(\hat{p}\)
Width: \(2 \times z^* \sqrt{\hat{p}(1-\hat{p})/n}\)

10. Effect of Confidence Level on Width.

How does increasing the confidence level affect the width of the interval?
Hint.
What happens to \(z^*\) when the confidence level increases?
  • The width does not change
  • Increases the width
  • Decreases the width
Solution.
Increasing the confidence level \(\Rightarrow\) larger \(z^*\) \(\Rightarrow\) wider interval

11. Effect of Sample Size on Width.

How does increasing the sample size affect the width of the interval?
Hint.
Look at the formula for the margin of error. What happens to \(\sqrt{\hat{p}(1-\hat{p})/n}\) as \(n\) increases?
  • The width does not change
  • Increases the width
  • Decreases the width
Solution.
Increasing \(n\) \(\Rightarrow\) smaller width (\(n\) is in the denominator)
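Both effects on the width can be seen numerically. The sketch below holds \(\hat{p}\) fixed at 0.645 purely for illustration (a new sample would give a different \(\hat{p}\)); `interval_width` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def interval_width(p_hat, n, confidence):
    """Full width of the z-interval: 2 * z* * sqrt(p_hat(1 - p_hat)/n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z_star * math.sqrt(p_hat * (1 - p_hat) / n)

# Higher confidence widens the interval; larger n narrows it.
for confidence in (0.90, 0.95, 0.99):
    print(confidence, round(interval_width(0.645, 124, confidence), 3))
for n in (124, 496, 1984):  # each sample size is 4 times the previous one
    print(n, round(interval_width(0.645, n, 0.95), 3))
```

Since \(n\) sits under a square root, quadrupling the sample size halves the width when \(\hat{p}\) and the confidence level stay the same.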

Definition.

The half-width of a confidence interval is also referred to as the margin of error.
So the above interval formula is of the common form: statistic \(\pm\) margin of error, where margin of error = critical value \(\times\) standard error of statistic. Here, margin of error = \(z^* \sqrt{\hat{p}(1-\hat{p})/n}\text{.}\)
Although 95% is the most common confidence level, a few other confidence levels and their corresponding critical values are shown in the table below.
Confidence level | 90% | 95% | 99% | 99.9%
Critical value \(z^*\) | 1.645 | 1.960 | 2.576 | 3.291

12. Calculate 90% Confidence Interval.

Determine the 90% \(z\)-confidence interval for the probability that a kissing couple leans to the right.
Hint.
Use the critical value from the table (1.645) or technology to find the 90% interval.
(Lower bound: , Upper bound: )
Solution.
If the confidence level is 90%, then use \(z^* = 1.645\)
\(0.645 - 1.645 \times 0.043 = 0.5743\)
\(0.645 + 1.645 \times 0.043 = 0.7157\)
I am 90% confident that between 57.43% and 71.57% of kissing couples turn to the right.

13. Compare Interval Midpoints.

How does the midpoint of the 90% confidence interval compare to the midpoint of the 95% confidence interval?
The 90% interval midpoint is:
  • Larger: Not quite. Both intervals are centered at the same sample proportion.
  • Smaller: Not quite. Both intervals are centered at the same sample proportion.
  • Equal: Correct! Both intervals have the same midpoint (the sample proportion).
Solution.
Midpoint = \((0.5743 + 0.7157)/2 = 0.645\) (same as with the 95% confidence interval)
Both intervals are centered at the sample proportion \(\hat{p}\text{.}\)

14. Compare Interval Widths.

How does the width of the 90% confidence interval compare to the width of the 95% confidence interval?
The 90% interval width is:
  • Larger: Not quite. Lower confidence level means narrower interval.
  • Smaller: Correct! The 90% interval is narrower because we use a smaller critical value (1.645 vs 1.96).
  • Equal: Not quite. Different confidence levels produce different widths.
Solution.
Width = \(0.7157 - 0.5743 = 0.141\) (smaller than the 0.169 width for the 95% confidence interval)
\(2(1.645)(0.043) = 0.141\)
[Figure: Comparison of 90% and 95% confidence intervals showing different widths with same midpoint]

Technology Detour: One Proportion z-Confidence Intervals.

15. One Proportion z-interval with Technology.
Use technology (Applet, R, or JMP) to confirm your calculation of the 90% one proportion \(z\)-interval for the kissing study (\(n\) = 124, 80 successes). Choose one set of instructions below by clicking on a hint.
(Lower bound: , Upper bound: )
Hint 1. R/Sage Instructions
Use the iscamonepropztest function to calculate the z-confidence interval. The function takes the following inputs:
If you don’t specify a hypothesized value and alternative, be sure to label the confidence level.
Hint 2. JMP Instructions
The output will show the 90% \(z\)-confidence interval.
Hint 3. Theory-Based Inference Applet Instructions
The applet will display the 90% \(z\)-confidence interval.
Solution.
Example output from R:
[Figure: R output showing z-confidence interval for kissing study]
Example output from JMP:
[Figure: JMP output showing z-confidence interval for kissing study]
Example output from Theory-Based Inference applet:
[Figure: Theory-Based Inference applet output showing z-confidence interval for kissing study]

16. Compare to Exact Binomial Intervals.

How do the widths of the \(z\)-intervals compare to the Exact Binomial Confidence Intervals reported by R?
Hint.
Calculate the widths of both the \(z\)-intervals and the exact binomial intervals for comparison.
Solution.
The binomial confidence intervals are a bit wider (and shifted slightly to the left).
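To make the comparison concrete without R, the exact (Clopper-Pearson) interval can be approximated by searching for the values of \(\pi\) that put probability \((1-C)/2\) in each binomial tail. This is a rough sketch using bisection, not the algorithm R uses internally; all function names are hypothetical, and 80 successes out of 124 is inferred from the reported 64.5%.

```python
import math

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def solve_increasing(f, target, tol=1e-10):
    """Bisection on [0, 1] for f(p) = target, where f is increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, confidence=0.95):
    """Exact binomial interval: each tail probability equals (1 - confidence)/2."""
    alpha = 1 - confidence
    # Lower bound: P(X >= x | p) rises with p; find where it reaches alpha/2.
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: P(X <= x | p) = alpha/2, i.e. P(X > x | p) = 1 - alpha/2.
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper

lower, upper = clopper_pearson(80, 124)
print(round(lower, 3), round(upper, 3))
```

The printed bounds should be close to the 95% Clopper-Pearson interval of (0.554, 0.729) quoted at the start of this section, which can then be set beside the corresponding \(z\)-interval.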

Sample Size Determination.

Suppose you are planning your own study about kissing couples. Before you collect the data, you know you would like the margin of error to be at most 3 percentage points and that you will use a 95% confidence level. Use this information to determine the sample size necessary for your study.
[This is a very common question asked of statisticians. Think about how to determine this using the \(z\)-interval formula. What information do you know? What information are you looking for?]
17. Approach 1: Using Previous Study.
Use the sample proportion found in the original study as an estimate for the unknown value of \(\pi\text{.}\)
Hint.
Set the margin of error equal to 0.03 and solve for \(n\text{:}\) \(z^* \sqrt{\pi(1-\pi)/n} = 0.03\)
\(n =\)
Solution.
\(z^* \sqrt{\hat{p}(1-\hat{p})/n} = 0.03\)
Use the previous \(\hat{p} = 0.645\text{.}\)
Solving for \(n\text{,}\) we would get \((1.96^2)(0.645)(0.355)/(0.03^2) = 977.4\) or \(978\)
18. Approach 2: Conservative Estimate.
Without a preliminary study, you can use 0.5 as an estimate of this probability.
Hint.
Set the margin of error equal to 0.03 and solve for \(n\) using \(\pi = 0.5\text{:}\) \(z^* \sqrt{0.5(0.5)/n} = 0.03\)
\(n =\)
Solution.
\(z^* \sqrt{\hat{p}(1-\hat{p})/n} = 0.03\)
Using 0.50 as the estimate:
Solving for \(n\text{,}\) we would get \((1.96^2)(0.5)(0.5)/(0.03^2) = 1067.1\) or \(1068\text{.}\)
19. Compare Sample Size Approaches.
How does the sample size required differ based on how you estimate \(\pi\) in the calculation? Why is Approach 2 considered a "conservative" approach?
Hint.
Compare the sample sizes from the two approaches. When is \(\pi(1-\pi)\) largest?
Solution.
The value of \(n\) increases for probabilities closer to 0.50. If the process probability is not 0.50, the recommended \(n\) value will result in a smaller margin of error than requested.
We can solve for \(n\) by inverting the formula for the margin of error, assuming a value for \(\pi\text{.}\) The largest value for the margin of error occurs when \(\pi = 0.50\text{,}\) so that value can be used for a conservative estimate of the necessary sample size. You should always round your value up to the next integer to also ensure the margin of error won’t exceed the specification. Notice that this calculation is much more difficult with a binomial confidence interval which does not have a simple formula to manipulate.
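Inverting the margin-of-error formula gives \(n = (z^*)^2 \pi(1-\pi)/m^2\text{,}\) where \(m\) is the desired margin of error. The sketch below wraps that calculation; the function name `required_sample_size` and the parameter name `pi_guess` are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def required_sample_size(margin, confidence=0.95, pi_guess=0.5):
    """Smallest n whose z-interval margin of error is at most `margin`."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z_star**2 * pi_guess * (1 - pi_guess) / margin**2
    return math.ceil(n)  # always round up so the margin is not exceeded

print(required_sample_size(0.03, pi_guess=0.645))  # 978
print(required_sample_size(0.03))                  # 1068 (conservative, pi = 0.5)
```

The code uses the unrounded critical value rather than 1.96, so the intermediate values differ slightly from the hand calculations, but both round up to the same sample sizes.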

Subsection 4.5.2 Practice Problem 1.9A

Recall from Practice Problem 1.8B that a student wanted to assess whether her dog Muffin tends to chase her blue ball and her red ball equally often when they are rolled at the same time. The student rolled both balls a total of 96 times, each time keeping track of which ball Muffin chased. The student found that Muffin chased the blue ball 52 times and the red ball 44 times. We arbitrarily decided to treat the blue ball as "success."

Checkpoint 4.5.6. Check Validity of \(z\)-procedures.

Is using theory-based \(z\)-procedures valid in this study? How are you deciding?

Checkpoint 4.5.7. Calculate 95% Confidence Interval.

Checkpoint 4.5.8. Compare Margins of Error.

Report the margin of error of your interval. Then determine how the margin of error would change with a 99% confidence level, and compare the two.

Checkpoint 4.5.9. Determine Required Sample Size.

What would be the necessary sample size if we wanted a margin of error of 0.01 for a confidence level of 95%? Explain how you are finding this.

Subsection 4.5.3 Practice Problem 1.9B

Checkpoint 4.5.10. Understanding Confidence Intervals.

In an actual study, how do you know whether your interval actually contains the value of the unknown parameter?

Checkpoint 4.5.11. Standard Deviation vs Standard Error.

What is the distinction between standard deviation and standard error?
Reconsider Muffin from Practice Problem 1.8B and suppose our confidence interval is (0.2689, 0.4811). For each statement below, determine whether it is a valid or invalid interpretation of this confidence interval.

Checkpoint 4.5.12. Statement 1.

You are 95% confident that the interval (0.2689, 0.4811) contains the sample proportion of blue balls chased by Muffin.
  • Valid: Incorrect. We know the sample proportion exactly; there’s no need for a confidence interval for it. The confidence interval is for the parameter (the true probability), not the sample proportion.
  • Invalid: Correct! We know the sample proportion exactly; there’s no need for a confidence interval for it. The confidence interval is for the parameter (the true probability), not the sample proportion.

Checkpoint 4.5.13. Statement 2.

There is a 95% chance that the interval (0.2689, 0.4811) captures the probability of Muffin chasing the blue ball.
  • Valid: Incorrect. This makes it sound like the parameter is random, when it’s actually fixed. The interval is what’s random. Once we’ve constructed this specific interval, it either contains the parameter or it doesn’t.
  • Invalid: Correct! This makes it sound like the parameter is random, when it’s actually fixed. The interval is what’s random. Once we’ve constructed this specific interval, it either contains the parameter or it doesn’t.

Checkpoint 4.5.14. Statement 3.

95% of the time the interval (0.2689, 0.4811) contains the probability of Muffin chasing the blue ball.
  • Valid: Incorrect. This specific interval is fixed; it doesn’t change. The correct interpretation is about the procedure (95% of intervals constructed this way contain the parameter), not this specific interval.
  • Invalid: Correct! This specific interval is fixed; it doesn’t change. The correct interpretation is about the procedure (95% of intervals constructed this way contain the parameter), not this specific interval.

Checkpoint 4.5.15. Statement 4.

In the long run, 95% of sample proportions fall in between 0.2689 and 0.4811.
  • Valid: Incorrect. This confuses the confidence interval for the parameter with the distribution of sample proportions. The confidence interval tells us about plausible values for the parameter, not where sample proportions fall.
  • Invalid: Correct! This confuses the confidence interval for the parameter with the distribution of sample proportions. The confidence interval tells us about plausible values for the parameter, not where sample proportions fall.

Checkpoint 4.5.16. Statement 5.

If the null hypothesis is true, there is a 95% chance the interval contains the parameter.
  • Valid: Incorrect. Confidence intervals don’t depend on whether a null hypothesis is true or not. The confidence level comes from the sampling distribution, not from any hypothesis.
  • Invalid: Correct! Confidence intervals don’t depend on whether a null hypothesis is true or not. The confidence level comes from the sampling distribution, not from any hypothesis.

Checkpoint 4.5.17. Statement 6.

I am 95% confident that Muffin chases the blue ball between 27% and 48% of the time.
  • Valid: Correct! This correctly expresses confidence about the parameter (the underlying probability/long-run proportion). This is a proper interpretation of a confidence interval.
  • Invalid: Incorrect. This is actually a valid interpretation because it correctly expresses confidence about the parameter (the underlying probability/long-run proportion).