Exercises 4.5.1 Interval of plausible values based on Central Limit Theorem
Recall the study published in Nature that found 64.5% of 124 kissing couples leaned right to kiss (Investigation 1.6). We previously used simulation and the binomial distribution to determine which values were plausible for the underlying probability that a kissing couple leans right (\(\pi\)). In particular, we found 0.5 and 0.74 were not plausible values for \(\pi\) but 0.6667 was. The 95% Clopper-Pearson Binomial confidence interval was (0.554, 0.729). Now you will consider applying the normal model as another method for producing confidence intervals for this parameter.
With a sample size of 124 kissing couples, does the Central Limit Theorem predict the normal probability distribution will be a reasonable model for the distribution of the sample proportion?
When you do not have a particular value to be tested for the process probability, it's reasonable to use the sample proportion in checking the sample size condition for the CLT. (This is equivalent to making sure there are at least 10 successes and at least 10 failures in the sample.)
No, you do not have enough information to sketch the distribution predicted by the CLT. We need \(\text{SD}(\hat{p}) = \sqrt{\pi(1-\pi)/n}\text{,}\) but we don't know \(\pi\text{.}\)
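The sample size check described above can be sketched in a few lines of Python. This is only an illustration: the variable names are made up here, and the counts are reconstructed from the reported 64.5% of 124 couples.

```python
n = 124
p_hat = 0.645  # observed sample proportion from the kissing study

# Expected counts when the sample proportion stands in for pi
successes = n * p_hat        # roughly 80 couples leaned right
failures = n * (1 - p_hat)   # roughly 44 couples leaned left

# Rule of thumb from the text: at least 10 successes and at least 10 failures
condition_met = successes >= 10 and failures >= 10
print(round(successes), round(failures), condition_met)
```

Both counts are well above 10, so the normal model looks reasonable for these data.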
The standard error of the sample proportion, \(SE(\hat{p})\text{,}\) is an estimate for the standard deviation of \(\hat{p}\) (i.e., \(SD(\hat{p})\)) based on the sample data, found by substituting the sample proportion for \(\pi\text{:}\) \(SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}\text{.}\)
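As a quick numerical check, the standard error for the kissing study can be computed directly. This is a minimal Python sketch; the variable names are chosen here for illustration.

```python
import math

n = 124
p_hat = 0.645  # observed sample proportion

# Standard error: the SD formula with p_hat substituted for the unknown pi
se = math.sqrt(p_hat * (1 - p_hat) / n)
print(round(se, 3))
```

The result rounds to 0.043, the value used in the interval calculations for this study.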
Now consider calculating a 95% confidence interval for the process probability \(\pi\) based on the observed sample proportion \(\hat{p}\text{.}\) What's the largest distance you expect to see a sample proportion \(\hat{p}\) fall away from the underlying process probability \(\pi\text{?}\) [Hint: Assuming a normal distribution … 95% …]
For a normal distribution, approximately 95% of observations fall within how many standard deviations of the mean? Compute a numerical value for the distance.
Use the distance from the previous question and the observed sample proportion of \(\hat{p} = 0.645\) to determine an interval of plausible values for \(\pi\text{,}\) the probability that a kissing couple leans to the right.
\(\pi\) should fall within 2 standard deviations of the observed sample proportion. So \(\pi\) is at least \(0.645 - 2(0.043) = 0.559\) and at most \(0.645 + 2(0.043) = 0.731\text{.}\)
An approximate 95% confidence interval for the process probability based on the normal distribution would be \(\hat{p} \pm 2\sqrt{\hat{p}(1-\hat{p})/n}\text{.}\) That is, this interval extends two standard deviations on each side of the sample proportion. We know that for the normal distribution roughly 95% of observations (here sample proportions) fall within 2 SDs of the mean (here the unknown process probability), so this method will "capture" the process probability for roughly 95% of samples.
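One way to see what "captures the process probability for roughly 95% of samples" means is to simulate many samples from a process with a known probability and count how often the interval contains it. The Python sketch below assumes \(\pi = 0.6667\) (one of the plausible values mentioned earlier) purely for illustration; the function names and the seed are arbitrary choices, not part of the investigation.

```python
import math
import random

def two_se_interval(successes, n):
    """Return (lower, upper) for p_hat +/- 2 * SE(p_hat)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - 2 * se, p_hat + 2 * se

def estimate_coverage(pi, n, reps, seed=1):
    """Proportion of simulated samples whose interval contains pi."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    captured = 0
    for _ in range(reps):
        # Simulate n independent couples, each leaning right with probability pi
        successes = sum(rng.random() < pi for _ in range(n))
        lower, upper = two_se_interval(successes, n)
        if lower <= pi <= upper:
            captured += 1
    return captured / reps

# Assumed process probability, for illustration only
coverage = estimate_coverage(pi=0.6667, n=124, reps=10_000)
print(coverage)
```

The printed proportion should land near 0.95, though not exactly on it, because the normal model is only an approximation and the simulation has its own random variation.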
However, we should admit that the multiplier of 2 is a bit of a simplification. So how do we find a more precise value of the multiplier to use, including for values other than 95%?
The \((100 \times C)\)% critical value, \(z^*\text{,}\) is the \(z\)-score value such that \(P(-z^* \leq Z \leq z^*) = C\) where \(C\) is any specified probability value, and \(Z\) represents a normal distribution with mean 0 and SD 1.
Note: We use the symbol \(z^*\) to distinguish this value, found based on the confidence level, from \(z_0\text{,}\) the observed \(z\)-score for the data.
Technology Detour: Finding Percentiles from the Standard Normal Distribution.
Use technology to more precisely determine the number of standard deviations that capture the middle 95% of the normal distribution with mean = 0 and standard deviation = 1. [Hint: In other words, how many standard deviations do you need to go on each side of zero to capture the middle 95% of the standard normal distribution?] Keep in mind that the \(z\)-value corresponding to probability \(C\) in the middle of the distribution also corresponds to having probability \((1-C)/2\) in each tail.
Check the box next to the first \(<\) sign and specify the probability value to correspond to the lower tail probability of interest: \((1-C)/2\text{.}\) Press Enter/Return and it will display the negative \(z^*\)-value.
direction = whether the probability of interest was in the lower tail ("below"), upper tail ("above"), both tails ("outside"), or in the middle of the distribution ("between")
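If you are working outside the tools listed in this detour, the same critical value can be found with Python's standard library. This is a sketch, not the course's own technology instructions; `critical_value` is a helper name invented here.

```python
from statistics import NormalDist

def critical_value(confidence):
    """z* such that P(-z* <= Z <= z*) = confidence for a standard normal Z."""
    lower_tail = (1 - confidence) / 2  # probability in each tail
    # inv_cdf returns the (negative) z-score with that lower-tail probability
    return -NormalDist(mu=0, sigma=1).inv_cdf(lower_tail)

for c in (0.90, 0.95, 0.99):
    print(c, round(critical_value(c), 3))
```

For 95% confidence this gives a multiplier of about 1.96 rather than 2, and higher confidence levels require larger multipliers.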
The interval is \(\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}\text{.}\) What's the center? What's the distance from the lower bound to the upper bound?
So the above interval formula is of the common form: statistic \(\pm\) margin of error, where margin of error = critical value \(\times\) standard error of statistic. Here, margin of error = \(z^* \sqrt{\hat{p}(1-\hat{p})/n}\text{.}\)
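The "statistic \(\pm\) margin of error" form translates directly into a short calculation. The Python sketch below applies it to the kissing study; the function name `one_proportion_z_interval` is an illustrative choice.

```python
import math
from statistics import NormalDist

def one_proportion_z_interval(p_hat, n, confidence=0.95):
    """Return (lower, upper, margin) for p_hat +/- z* * SE(p_hat)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    se = math.sqrt(p_hat * (1 - p_hat) / n)                  # standard error
    margin = z_star * se                                     # margin of error
    return p_hat - margin, p_hat + margin, margin

lower, upper, margin = one_proportion_z_interval(p_hat=0.645, n=124)
print(round(lower, 3), round(upper, 3), round(margin, 3))
```

The interval is centered at the sample proportion and its full width is twice the margin of error. For these data it comes out close to the Clopper-Pearson interval of (0.554, 0.729) reported at the start of this investigation.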
Technology Detour: One Proportion z-Confidence Intervals.
15. One Proportion z-interval with Technology.
Use technology (Applet, R, or JMP) to confirm your calculation of the one proportion z-test for the Halloween study (\(n\) = 284, 135 successes, hypothesized = 0.5, two-sided test).
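For readers without access to the listed tools, the same check can be done by hand in Python. This sketch assumes the usual one-proportion z-test with no continuity correction, so output from the Applet, R, or JMP may differ slightly if those tools apply one; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 284
successes = 135
pi_0 = 0.5  # hypothesized process probability

p_hat = successes / n
# Under the null hypothesis, the SD of p_hat uses the hypothesized value
sd_null = math.sqrt(pi_0 * (1 - pi_0) / n)
z_0 = (p_hat - pi_0) / sd_null

# Two-sided p-value: probability in both tails beyond |z_0|
p_value = 2 * NormalDist().cdf(-abs(z_0))
print(round(p_hat, 4), round(z_0, 2), round(p_value, 3))
```

Note that the test uses the hypothesized value 0.5 in the standard deviation, whereas the confidence interval uses \(\hat{p}\) in the standard error.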
Suppose you are planning your own study about kissing couples. Before you collect the data, you know you would like the margin of error to be at most 3 percentage points and that you will use a 95% confidence level. Use this information to determine the sample size necessary for your study.
[This is a very common question asked of statisticians. Think about how to determine this using the \(z\)-interval formula. What information do you know? What information are you looking for?]
The value of \(n\) increases for probabilities closer to 0.50. If the process probability is not 0.50, the recommended \(n\) value will result in a smaller margin of error than requested.
We can solve for \(n\) by inverting the formula for the margin of error, assuming a value for \(\pi\text{.}\) The largest value for the margin of error occurs when \(\pi = 0.50\text{,}\) so that value can be used for a conservative estimate of the necessary sample size. You should always round your value up to the next integer to also ensure the margin of error won't exceed the specification. Notice that this calculation is much more difficult with a binomial confidence interval which does not have a simple formula to manipulate.
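Setting \(z^* \sqrt{\pi(1-\pi)/n} \leq m\) and solving for \(n\) gives \(n \geq (z^*/m)^2 \, \pi(1-\pi)\text{.}\) The Python sketch below carries out that calculation for a margin of error of 3 percentage points. The function name and its `pi_guess` parameter are illustrative, and the second call uses the kissing study's 0.645 only as an example of a non-conservative guess.

```python
import math
from statistics import NormalDist

def required_sample_size(margin, confidence=0.95, pi_guess=0.5):
    """Smallest n whose z-interval margin of error is at most `margin`."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_exact = (z_star / margin) ** 2 * pi_guess * (1 - pi_guess)
    return math.ceil(n_exact)  # always round up to the next integer

# Conservative plan: assume pi = 0.50, where the margin of error is largest
print(required_sample_size(0.03))

# Less conservative plan: assume pi is near the earlier estimate of 0.645
print(required_sample_size(0.03, pi_guess=0.645))
```

The conservative plan calls for 1068 couples. Guessing a probability farther from 0.50 lowers the requirement, but the margin of error is then only guaranteed if that guess is close to the truth.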
Recall from Practice Problem 1.8B that a student wanted to assess whether her dog Muffin tends to chase her blue ball and her red ball equally often when they are rolled at the same time. The student rolled both balls a total of 96 times, each time keeping track of which ball Muffin chased. The student found that Muffin chased the blue ball 52 times and the red ball 44 times. We arbitrarily decided to treat the blue ball as "success."
Reconsider Muffin from Practice Problem 1.8B and suppose our confidence interval is (0.2689, 0.4811). For each statement below, determine whether it is a valid or invalid interpretation of this confidence interval.
Invalid. We know the sample proportion exactly; there's no need for a confidence interval for it. The confidence interval is for the parameter (the true probability), not the sample proportion.
Invalid. This makes it sound like the parameter is random, when it's actually fixed. The interval is what's random. Once we've constructed this specific interval, it either contains the parameter or it doesn't.
Invalid. This specific interval is fixed; it doesn't change. The correct interpretation is about the procedure (95% of intervals constructed this way contain the parameter), not this specific interval.
Invalid. This confuses the confidence interval for the parameter with the distribution of sample proportions. The confidence interval tells us about plausible values for the parameter, not where sample proportions fall.
Invalid. Confidence intervals don't depend on whether a null hypothesis is true or not. The confidence level comes from the sampling distribution, not from any hypothesis.
Valid. This correctly expresses confidence about the parameter (the underlying probability/long-run proportion). This is a proper interpretation of a confidence interval.