The null hypothesis is that the wording of the question has no effect. If
\(\pi_{allow}\) is the probability that someone says "yes" to the allow question and
\(\pi_{forbid}\) is the probability that someone says "no" to the forbid question, then:
\(H_0: \pi_{allow} - \pi_{forbid} = 0\)
and based on the prior research:
\(H_a: \pi_{allow} - \pi_{forbid} < 0\)
Fisher's Exact Test indicates how often, under the null hypothesis, we expect to see 8 or fewer successes in the allow group (equivalently, 12 or more successes in the forbid group). If we define X to be the number in the allow group in favor of the speeches, X follows a hypergeometric distribution with N = 25 (total students), M = 20 (total successes), and n = 11 (size of the allow group). The p-value is \(P(X \le 8)\).
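\(P(X \le 8) = P(X = 0) + P(X = 1) + \cdots + P(X = 8)\)
\(= \dfrac{\binom{20}{0}\binom{5}{11} + \binom{20}{1}\binom{5}{10} + \cdots + \binom{20}{8}\binom{5}{3}}{\binom{25}{11}} \approx 0.378\)
The terms for X = 0 through X = 5 are zero, because they would require choosing more than 5 of the 5 failures.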
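The hypergeometric sum above can be checked numerically. The following is a minimal sketch using only Python's standard library; the function name `hypergeom_cdf` and its argument names are illustrative choices, not part of the original solution.

```python
from math import comb

def hypergeom_cdf(k_max, N, M, n):
    """P(X <= k_max), where X counts successes among n items drawn
    without replacement from N items, M of which are successes.
    (Hypothetical helper written for this example.)"""
    total = comb(N, n)
    # math.comb(a, b) returns 0 when b > a, so impossible terms
    # such as C(5, 11) drop out of the sum automatically.
    favorable = sum(
        comb(M, k) * comb(N - M, n - k)
        for k in range(0, min(k_max, n) + 1)
    )
    return favorable / total

# Values from the problem: 25 students, 20 successes overall,
# 11 students in the allow group, 8 observed successes there.
p_value = hypergeom_cdf(8, N=25, M=20, n=11)
print(round(p_value, 4))  # 0.3783
```

Only the terms for X = 6, 7, and 8 contribute: 38760 + 387600 + 1259700 = 1686060 favorable assignments out of C(25,11) = 4457400.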
Note that it would NOT be valid to use the normal approximation here, because we do not have at least 5 failures in each group (the normal approximation gives a p-value of 0.2149, or 0.384 with the continuity correction).
Thus, if the word choice in the question had no effect, we would get experimental results at least this extreme in about 38% of random assignments. Such results are not surprising, so the data do not provide evidence that the wording of the question had an effect on these students. We therefore cannot conclude that the word choice made a difference (though if the p-value had been small, a cause-and-effect conclusion would have been valid because this was a randomized experiment). Furthermore, we should be hesitant to generalize these results beyond introductory statistics students at this school: we do not know how these students were selected, nor whether they are representative of other college students. Perhaps this is a private school, or its students tend to be more liberal than those at other schools.
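The "about 38% of random assignments" interpretation can also be illustrated by simulation: keep the 20 successes and 5 failures fixed, repeatedly re-randomize which 11 students land in the allow group, and record how often that group has 8 or fewer successes. This is a rough sketch, not part of the original solution; the function name, the number of repetitions, and the seed are arbitrary assumptions.

```python
import random

def simulate_p_value(reps=20000, seed=1):
    """Estimate P(X <= 8) by re-randomizing group assignments.
    (Hypothetical helper; reps and seed are arbitrary choices.)"""
    rng = random.Random(seed)
    # 1 = success, 0 = failure; 20 successes and 5 failures in total.
    responses = [1] * 20 + [0] * 5
    at_least_as_extreme = 0
    for _ in range(reps):
        rng.shuffle(responses)
        # The first 11 shuffled responses play the role of the allow group.
        allow_successes = sum(responses[:11])
        if allow_successes <= 8:
            at_least_as_extreme += 1
    return at_least_as_extreme / reps

estimate = simulate_p_value()
print(estimate)
```

With enough repetitions, the estimate should be close to the exact Fisher's Exact Test p-value, though it will vary slightly with the seed.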