
Section 24.5: Section 5.1 Summary

This section introduced the chi-squared test in three related settings: comparing proportions across populations or treatments, comparing entire categorical distributions across populations or treatments, and testing for an association between two categorical variables measured on a single random sample.
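The mechanics are the same in all three settings. First, compute the count expected in each cell if the null hypothesis were true (row total × column total ÷ grand total). Then compare these expected counts with the observed counts and summarize the discrepancies with the chi-squared statistic.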
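As a minimal sketch of these mechanics, the Python below computes each expected count as (row total × column total) / grand total and then adds up \((O-E)^2/E\) over the cells. The function names expected_counts and chi_squared_statistic and the 2 × 3 table of counts are hypothetical, made up for illustration.

```python
def expected_counts(observed):
    """Expected count for each cell under the null model:
    (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_statistic(observed):
    """Sum of (O - E)^2 / E over every cell of the two-way table."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 3 table: rows are groups, columns are response categories.
observed = [[20, 30, 50],
            [30, 30, 40]]
print(expected_counts(observed))                  # [[25.0, 30.0, 45.0], [25.0, 30.0, 45.0]]
print(round(chi_squared_statistic(observed), 4))  # 3.1111
```

In this made-up table each group has 100 observations, so both rows share the same expected counts. The statistic of about 3.11 is the sum of the six cell contributions.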

Summary of Chi-squared Tests.

Test of homogeneity: Data are independent random samples from multiple populations/processes or from a randomized experiment with multiple treatments.
\(H_0\text{:}\) group distributions are the same (or \(p_1 = p_2 = \cdots = p_I\) in the binary-response case).
\(H_a\text{:}\) at least one group distribution differs.
Test of association: Data are one random sample cross-classified by two categorical variables.
\(H_0\text{:}\) no association between variable 1 and variable 2 in the population.
\(H_a\text{:}\) there is an association.
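Test statistic: \(\chi^2 = \sum \frac{(O-E)^2}{E}\), summed over all cells of the two-way table, where \(O\) and \(E\) are the observed and expected counts
Degrees of freedom: \((r-1)(c-1)\), where \(r\) and \(c\) are the numbers of rows and columns in the table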
Technical conditions: at least 80% of expected counts are at least 5, and all expected counts are at least 1.
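The degrees of freedom, the technical conditions, and the p-value can also be sketched in Python. This is a minimal sketch assuming integer degrees of freedom. Instead of a statistics library (in practice a routine such as scipy.stats.chi2.sf would normally be used), it swaps in a closed-form recurrence for the upper tail of the chi-squared distribution, that is, the regularized upper incomplete gamma function. This is not necessarily how any particular software package computes it. The function names and the 2 × 3 table of counts are hypothetical.

```python
import math


def expected_counts(observed):
    """(row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def conditions_met(expected):
    """At least 80% of expected counts are >= 5 and every expected count is >= 1."""
    cells = [e for row in expected for e in row]
    share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
    return share_at_least_5 >= 0.8 and min(cells) >= 1


def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-squared distribution with integer df >= 1.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) with y = x / 2,
    starting from Q(1, y) = exp(-y) for even df or Q(1/2, y) = erfc(sqrt(y)) for odd df.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    a, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while a < df / 2.0:
        # Add the next term, computed in log space to avoid overflow.
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


observed = [[20, 30, 50],
            [30, 30, 40]]  # hypothetical counts
expected = expected_counts(observed)
stat = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1)
print(df, conditions_met(expected))          # 2 True
print(round(chi2_upper_tail(stat, df), 4))   # 0.2111
```

With 2 degrees of freedom the upper tail reduces to \(e^{-\chi^2/2}\), so this hypothetical statistic of about 3.11 gives a p-value of about 0.21. All six expected counts are at least 25, so the technical conditions hold.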

Technology Notes.

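R: Build a matrix of counts and pass it to chisq.test(). The returned object's $expected and $residuals components hold the expected counts and the Pearson residuals \((O-E)/\sqrt{E}\), which are useful for diagnostics. For a 2×2 table, chisq.test() applies Yates's continuity correction by default; set correct = FALSE to get the uncorrected statistic summarized above.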
Minitab: Stat > Tables > Chi-Square Test for Association with raw or summarized data.
JMP: Analyze > Fit Y by X with nominal variables (add a frequency column if the data are already tallied).
Applet: Paste raw data or a two-way table into the Analyzing Two-way Tables applet, then inspect the table, the \(\chi^2\) statistic, and the cell contributions.
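When the test is significant, examine the individual cell contributions \((O-E)^2/E\) to see which cells differ most from their expected counts. The sign of \(O-E\) shows the direction of each difference, which helps describe the nature of the association. The data collection method still determines which conclusions are justified. Cause-and-effect conclusions require random assignment to treatments, and generalizing to a larger population requires random sampling.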
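The sketch below uses Python and a hypothetical 2 × 3 table. For each cell, it computes the contribution \((O-E)^2/E\) alongside the Pearson residual \((O-E)/\sqrt{E}\), which is the quantity R reports as $residuals. Each contribution is the square of its residual, so the residual's sign shows whether a cell had more or fewer observations than expected. The function name cell_diagnostics is made up for illustration.

```python
import math


def cell_diagnostics(observed):
    """Return (contributions, residuals) tables for a two-way table of counts.

    contribution = (O - E)^2 / E and Pearson residual = (O - E) / sqrt(E),
    so each contribution equals the square of the matching residual.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    contributions, residuals = [], []
    for obs_row, r in zip(observed, row_totals):
        contrib_row, resid_row = [], []
        for o, c in zip(obs_row, col_totals):
            e = r * c / n  # expected count under the null model
            contrib_row.append((o - e) ** 2 / e)
            resid_row.append((o - e) / math.sqrt(e))
        contributions.append(contrib_row)
        residuals.append(resid_row)
    return contributions, residuals


observed = [[20, 30, 50],
            [30, 30, 40]]  # hypothetical counts
contributions, residuals = cell_diagnostics(observed)

# Rank cells by contribution to see which ones drive the chi-squared statistic.
ranked = sorted(((contributions[i][j], i, j)
                 for i in range(len(observed))
                 for j in range(len(observed[0]))), reverse=True)
for value, i, j in ranked:
    print(f"row {i}, col {j}: contribution {value:.3f}, residual {residuals[i][j]:+.3f}")
```

In this made-up table, the first response category contributes the most, with the first row below its expected count (residual -1) and the second row above it (residual +1). The middle category contributes nothing because its observed and expected counts match.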