
Section 24.1 Investigation 5.1: Dr. Spock’s Trial

The Study.

The well-known pediatrician and child development author Dr. Benjamin Spock was also an anti-Vietnam War activist. In 1968 he was put on trial and convicted on charges of conspiracy to violate the Selective Service Act (encouraging young men to avoid the draft). The case was tried by Judge Ford in Boston’s Federal courthouse. A peculiar aspect of this case was that his jury contained no women.
A lawyer writing about the case the following year in the Chicago Law Review said, "Of all defendants at such trials, Dr. Spock, who had given wise and welcome advice on child-bearing to millions of mothers, would have liked women on his jury" (see Zeisel, 1969). Opinion polls also showed that women were generally more opposed to the Vietnam War than men.
In the Boston District Court, jurors are selected in three stages. First, the Clerk of the Court is supposed to select 300 names at random from the City Directory. In Dr. Spock’s trial, this sample included only 102 women, even though 53% of the eligible jurors in the district were female. Next, the judge selects 30 or more of these 300 names to constitute the venire (the panel of potential jurors). Judge Ford chose 100 potential jurors out of these 300 people, and his choices included only 9 women. Finally, 12 actual jurors are selected after interrogation by both the prosecutor and the defense counsel. Only one potential female juror came before the court, and she was dismissed by the prosecution.
In filing his appeal, Spock’s lawyers argued that Judge Ford had a history of venires in which women were systematically underrepresented. They compared the gender breakdown of this judge’s venires with the venires of six other judges in the same Boston court from a recent sample of court cases. Records revealed the following data:
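| | Judge 1 | Judge 2 | Judge 3 | Judge 4 | Judge 5 | Judge 6 | Judge 7 | Total |
|---|---|---|---|---|---|---|---|---|
| Women on jury list | 119 | 197 | 118 | 77 | 30 | 149 | 86 | 776 |
| Men on jury list | 235 | 533 | 287 | 149 | 81 | 403 | 511 | 2199 |
| Total | 354 | 730 | 405 | 226 | 111 | 552 | 597 | 2975 |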
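The sample proportion of women on each judge's list can be computed directly from the counts in the table. The following is a minimal sketch in plain Python; the variable names are illustrative and are not part of the original investigation.

```python
# Counts copied from the table above (Judges 1 through 7).
women = [119, 197, 118, 77, 30, 149, 86]
men = [235, 533, 287, 149, 81, 403, 511]

# Sample proportion of women on each judge's list.
proportions = [w / (w + m) for w, m in zip(women, men)]

for judge, p_hat in enumerate(proportions, start=1):
    print(f"Judge {judge}: {p_hat:.3f}")
```

Printing the proportions side by side is a quick numerical companion to the segmented bar graph or mosaic plot.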

Descriptive Statistics.

1. Compare Sample Proportions.
(a) Calculate the proportion of women on the jury list for each judge. Also create a segmented bar graph or mosaic plot to compare these distributions. How do the judges compare?
2. State Hypotheses.
(b) Let \(p_i\) represent the long-run probability of judge \(i\) selecting a female for the jury list. State a null and an alternative hypothesis for testing whether these data provide reason to doubt that the probability of women on jury lists is the same for all seven judges.
Note: The null hypothesis states only that the probabilities are equal; it does not specify a particular value for this common probability. The alternative states that at least one probability differs.

Standardized (Test) Statistic.

3. Propose a Statistic.
(c) Suggest a standardized statistic (formula based on your observed sample data) for assessing the strength of evidence against the null hypothesis. Write your statistic as a formula or rule for obtaining one number that takes into account information relevant to comparing all seven groups.
4. Pairwise Differences.
(d) One possibility is to compare all sample proportions to each other by looking at all pairwise differences. How many such pairs are there? Could we simply sum these differences?

Definition: Mean Group Difference.

One possible statistic for measuring how much the sample proportions differ from each other is the Mean Group Difference, which finds the absolute value of each pairwise difference, sums these values, and divides by the number of differences.
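As an illustration of this definition, the sketch below computes the Mean Group Difference for the seven observed sample proportions. It assumes the counts from the data table; the names `pairs` and `mean_group_diff` are hypothetical and chosen only for this example.

```python
from itertools import combinations

# Women and total counts for Judges 1 through 7, from the data table.
women = [119, 197, 118, 77, 30, 149, 86]
totals = [354, 730, 405, 226, 111, 552, 597]
proportions = [w / n for w, n in zip(women, totals)]

# Every unordered pair of judges, then the average absolute difference.
pairs = list(combinations(proportions, 2))
mean_group_diff = sum(abs(a - b) for a, b in pairs) / len(pairs)

print(f"pairs: {len(pairs)}, mean group difference: {mean_group_diff:.4f}")
```

Taking absolute values before summing matters: without them, positive and negative differences could cancel.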
5. Interpret Statistic Values.
(e) What types of values do you expect this statistic to have when the null hypothesis is true? What about if the null hypothesis is false?
Although we could consider using the Mean Group Difference statistic here, a more common statistic is the chi-squared statistic. Rather than looking at all differences among the groups, it measures how far each cell count falls from its expected value under the null hypothesis and then sums these discrepancies across all cells.
6. Estimate Common Probability.
(f) Assuming the null hypothesis is true and each judge has the same probability of a female juror in his pool, suggest an estimate for this common probability.
7. Expected Counts for Judge 1.
(g) Judge 1 had 354 jurors on the list. If the long-run proportion of women were 0.261, how many would you expect to be female? How many would you expect to be male?
Recall: An expected value does not need to be an integer.
8. Expected Counts for Judge 2.
(h) How many jurors on Judge 2’s list would you expect to be female if his long-run proportion were also 0.261? How many would you expect to be male?
9. Complete Expected Counts Table.
(i) Enter your expected counts below the observed counts in the following table.
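| | Judge 1 | Judge 2 | Judge 3 | Judge 4 | Judge 5 | Judge 6 | Judge 7 |
|---|---|---|---|---|---|---|---|
| Women (Observed) | 119 | 197 | 118 | 77 | 30 | 149 | 86 |
| Women (Expected) | | | 105.71 | 58.99 | 28.97 | 144.07 | 155.82 |
| Men (Observed) | 235 | 533 | 287 | 149 | 81 | 403 | 511 |
| Men (Expected) | | | 299.30 | 167.01 | 82.03 | 407.93 | 441.18 |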
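The expected counts already filled in follow one rule: multiply a judge's total by the common proportion of women (or of men). The sketch below reproduces the Judge 3 column under the assumption that the text's rounded value 0.261 comes from the pooled totals 776 and 2975. Small differences in the last digit are possible depending on how the proportion is rounded.

```python
# Pooled proportion of women across all seven judges (totals from the data table).
pooled = 776 / 2975
common_p = round(pooled, 3)  # the text works with this value rounded to 0.261

# Judge 3 had 405 people on the list.
n_judge3 = 405
expected_women = n_judge3 * common_p
expected_men = n_judge3 * (1 - common_p)

print(f"common proportion: {common_p}")
print(f"Judge 3 expected women: {expected_women:.3f}, expected men: {expected_men:.3f}")
```

The two expected counts always add up to the judge's total, which is a handy check when filling in the remaining cells.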
10. Observed vs. Expected.
(j) Are the observed counts equal to the expected counts in each cell? Is it possible that the long-run probability of a female juror is the same for each judge and that the observed differences are due to random chance alone?
Chi-squared Test Statistic.
The chi-squared test statistic is used to compare observed and expected counts in a two-way table:
\begin{equation*} \chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \end{equation*}
This standardized statistic computes a discrepancy for each cell and sums across all cells.
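The formula can be sketched as a short function that derives each expected count from the row and column totals and then accumulates the cell-by-cell terms. This is one possible implementation, not the applet's code. It uses the unrounded pooled proportion, so results can differ slightly from hand calculations based on 0.261.

```python
def chi_squared(observed):
    """Chi-squared statistic for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under the null: row total * column total / grand total.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat


# Observed counts from the data table: women in the first row, men in the second.
table = [
    [119, 197, 118, 77, 30, 149, 86],
    [235, 533, 287, 149, 81, 403, 511],
]
observed_stat = chi_squared(table)
print(f"chi-squared statistic: {observed_stat:.2f}")
```

Because every term is a squared difference divided by a positive expected count, the statistic can never be negative.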
11. Compute Chi-squared Statistic.
(k) Calculate this standardized statistic for the two-way table above. Fill in the missing terms for the first two judges and then sum all terms.
12. Direction of Evidence.
(l) What types of chi-squared values (large, small, positive, negative) constitute evidence against the null hypothesis of equal long-run probabilities? Explain.

Null Distribution.

To approximate a p-value, examine how the standardized statistic varies under the null hypothesis of equal probabilities by simulating many random samples under that model.
13. Simulation Plan.
(m) Outline the steps you would use to generate random data for each judge under the null hypothesis that the probability of a juror being female is the same for each judge.
14. Use Applet.
(n) Use the Analyzing Two-Way Tables - Samples applet: ChiSqSample.
  • In the data window, type: Spock.txt.
  • Press Use Data, then press Use Table.
  • Check Show Sampling Options.
  • Use 1000 for Number of Samples and press Draw Samples.
  • Use the pull-down to switch the statistic to the chi-squared value.
15. Describe Null Distribution.
(o) Describe the shape, center, and variability of the simulated null distribution.
16. Empirical p-value and Conclusion.
(p) Based on your simulation, determine the proportion of simulated values that are as large as or larger than your value from part (k). Does this empirical p-value provide convincing evidence that the observed discrepancy is larger than expected by chance? What do you conclude about whether the seven judges had the same long-run probability of selecting a female juror?
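For readers without access to the applet, a comparable simulation can be sketched in plain Python. The sketch assumes each judge's list is generated as independent draws with the pooled proportion of women as the common probability; the applet's exact mechanism may differ. The seed and the helper names are arbitrary choices for this example.

```python
import random

women = [119, 197, 118, 77, 30, 149, 86]
totals = [354, 730, 405, 226, 111, 552, 597]
common_p = sum(women) / sum(totals)  # pooled estimate used as the null probability


def chi_squared(women_counts, totals):
    """Chi-squared statistic for a 2 x k table of women/men counts."""
    pooled = sum(women_counts) / sum(totals)
    stat = 0.0
    for w, n in zip(women_counts, totals):
        for obs, exp in ((w, n * pooled), (n - w, n * (1 - pooled))):
            stat += (obs - exp) ** 2 / exp
    return stat


def simulate_once(rng):
    """Draw one simulated table under the null and return its statistic."""
    sim_women = [sum(rng.random() < common_p for _ in range(n)) for n in totals]
    return chi_squared(sim_women, totals)


rng = random.Random(1)  # fixed seed so the run is reproducible
observed_stat = chi_squared(women, totals)
null_stats = [simulate_once(rng) for _ in range(1000)]

# Empirical p-value: share of simulated statistics at least as large as observed.
p_value = sum(s >= observed_stat for s in null_stats) / len(null_stats)
print(f"observed: {observed_stat:.2f}, empirical p-value: {p_value:.3f}")
```

With only 1000 repetitions, an empirical p-value of 0 should be reported as "less than 1 in 1000" rather than as exactly zero.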

Mathematical Model.

17. Normal Model Check.
(q) Does the normal distribution appear to adequately predict the behavior of the simulated null distribution of the chi-squared statistic?
18. Chi-squared Model Check.
(r) Check the applet option to overlay a chi-squared distribution. Does this model appear to adequately predict the simulated null distribution?
The chi-squared distribution is skewed right and provides a reasonable model for this statistic for large sample sizes. We typically use this model when all expected counts are at least 1 and at least 80% of expected counts are at least 5.
When comparing several population proportions, the chi-squared degrees of freedom are \(c-1\text{,}\) where \(c\) is the number of explanatory variable categories.
Technology Detour - Chi-squared Probabilities.
  • Applet: Use the Chi-squared Probability Calculator and specify degrees of freedom, observed value, and direction.
  • R: Use iscamchisqprob(xval, df), where xval is the observed statistic and df is degrees of freedom.
  • Minitab: Graph > Probability Distribution Plot (View Probability), choose Chi-Square, set df and right-tail cutoff.
  • JMP: Distribution Calculator, choose Chi Square, set df, and compute X > Qa for the observed value.
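If none of these tools is at hand, the upper-tail probability can also be computed from a closed form that holds when the degrees of freedom are even, as they are here (\(c - 1 = 6\) for seven judges). The function below is a sketch under that assumption and is not a substitute for the tools listed above; the name `chi2_upper_tail` is hypothetical.

```python
import math


def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-squared variable X with positive, even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    # For df = 2m: exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )


# Check against the commonly tabulated 5% critical value for df = 6.
print(f"{chi2_upper_tail(12.592, 6):.4f}")
```

To get the model-based p-value for this study, pass your statistic from part (k) as `x` with `df=6`.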
19. Compare p-values.
(s) How does this model-based p-value compare to the empirical p-value from part (p)?
Discussion: If the null hypothesis is rejected, the conclusion is that at least one population proportion differs from the rest, but the test itself does not identify exactly which one(s). To learn more, inspect the component terms in the chi-squared sum.
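One way to inspect the component terms is to list each cell's contribution together with the sign of observed minus expected, then sort from largest to smallest. The sketch below assumes the counts from the data table and the unrounded pooled proportion; the tuple layout is an arbitrary choice for this example.

```python
women = [119, 197, 118, 77, 30, 149, 86]
men = [235, 533, 287, 149, 81, 403, 511]
totals = [w + m for w, m in zip(women, men)]
pooled = sum(women) / sum(totals)

# Each entry: (chi-squared term, judge number, cell label, observed - expected).
contributions = []
for judge, (w, m, n) in enumerate(zip(women, men, totals), start=1):
    for label, obs, exp in (("women", w, n * pooled), ("men", m, n * (1 - pooled))):
        contributions.append(((obs - exp) ** 2 / exp, judge, label, obs - exp))

contributions.sort(reverse=True)  # largest contributions first

for term, judge, label, diff in contributions[:3]:
    direction = "above" if diff > 0 else "below"
    print(f"Judge {judge}, {label}: term {term:.2f}, observed {direction} expected")
```

The sign of the difference is what indicates whether a judge's list had more or fewer women than the equal-probability model predicts.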
20. Largest Chi-squared Components.
(t) Return to the sum you calculated in part (k). Which cell comparisons provide the largest standardized discrepancies between observed and expected counts?
21. Direction of Largest Discrepancies.
(u) For the cells identified in part (t), which is larger: observed or expected counts? Explain the implications.
22. Identify Judge in Spock Case.
(v) Which judge do you believe tried Dr. Spock’s case? Explain.

Study Conclusions.

In this study, we modeled the jury-panel selections as independent binomial processes and compared seven sample proportions at once. One judge clearly stood out. If the judges truly had the same long-run probability of selecting women, differences as large as those observed would be extremely unlikely by chance alone. Thus the data provide strong evidence that the long-run probability of a juror being female was not the same across all seven judges. The largest contributions to the chi-squared statistic came from Judge 7, with substantially fewer women and more men than expected. This was Judge Ford, assigned to Dr. Spock’s case.
Because these data are observational and not generated by a true random mechanism, a cause-and-effect conclusion is not warranted. Still, the p-value quantifies how surprising these outcomes would be under the equal-probability model.

Technical Conditions.

The chi-squared distribution approximates the sampling distribution of the chi-squared statistic when data arise from independent binomial random variables. This approximation is generally considered valid when all expected counts are at least 1 and at least 80% of expected counts are at least 5.
The data should come from independent random samples or from a randomized comparative experiment. This procedure is often called a chi-squared test of homogeneity.
A key advantage of this test is that it provides a single overall p-value across all group comparisons, helping control the overall Type I error rate.

23. Practice Problem 5.1.

Why not just use two-sample \(z\)-procedures to compare pairs of proportions?
(a) How many two-group comparisons are there among 7 judges?
(b) If the significance level is 0.05, what is the Type I error probability for any one comparison?
(c) What about the probability of at least one Type I error among all these comparisons: larger or smaller than 0.05? Explain.
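A rough numerical exploration of parts (a) through (c) is sketched below. It assumes, purely for illustration, that the pairwise comparisons are independent. They are not, because each judge appears in several pairs, so the result is only an approximation of how quickly the chance of at least one Type I error grows.

```python
from math import comb

n_groups = 7
alpha = 0.05

# Number of distinct two-group comparisons among the judges.
n_pairs = comb(n_groups, 2)

# Chance of at least one Type I error across all comparisons,
# ASSUMING independent comparisons (a simplification, not true here).
familywise = 1 - (1 - alpha) ** n_pairs

print(f"comparisons: {n_pairs}, approximate familywise error: {familywise:.3f}")
```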

Technology Detour - Simulating Randomization Test for Two-Way Tables.
