Literary Digest was a well-respected political magazine founded in 1890. Using sampling, it correctly predicted the presidential outcomes from 1916 through 1932. In 1936, it conducted the most extensive public opinion poll in history up to that date. The Digest mailed questionnaires (postcards) to over 10 million people (about one-fourth of voters), hand-addressing more than a quarter million postcards per day. It obtained the names and addresses from club rosters, city directories, and (mostly) vehicle registration lists and telephone books.
Based on almost 2.4 million responses (2,376,523), the Literary Digest predicted that 54% of voters would vote for Republican Alf Landon (then governor of Kansas) in the upcoming presidential election, and only 41% would vote for Democrat Franklin Roosevelt (the incumbent).
Identify the variable and population of interest, the sample, and the sampling frame in this study. Also define the parameter and statistic, in words and symbols, and indicate any values that you know.
The 0.57 represents the proportion among just Landon and Roosevelt (the two main candidates), while 0.54 includes other candidates like William Lemke (Union Party) in the denominator.
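The arithmetic behind the 0.57 is short enough to check directly. This sketch uses only the 54% and 41% reported above and drops everyone else (Lemke and other candidates) from the denominator:

```python
# Overall shares reported by the Literary Digest (from the text)
landon = 0.54
roosevelt = 0.41

# Two-candidate share: only Landon + Roosevelt in the denominator,
# so Lemke and other candidates are excluded
landon_two_way = landon / (landon + roosevelt)

print(round(landon_two_way, 2))  # 0.57
```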
Have you ever heard of Alf Landon? He lost. By a landslide. Incumbent Democrat Franklin Roosevelt won the election, carrying 60.8% of the popular vote to Landon's 36.5%.
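To see how far off the Digest was on the same two-candidate scale, here is a minimal sketch that converts the actual 60.8% and 36.5% into two-candidate shares (mirroring the 0.57 calculation) and compares Landon's actual share with the Digest's:

```python
# Actual 1936 popular-vote percentages (from the text)
roosevelt_actual = 60.8
landon_actual = 36.5
two_candidate_total = roosevelt_actual + landon_actual

# Two-candidate shares of the actual vote
roosevelt_two_way = roosevelt_actual / two_candidate_total
landon_two_way_actual = landon_actual / two_candidate_total

# The Digest's two-candidate Landon share, from its 54% and 41%
digest_landon_two_way = 0.54 / (0.54 + 0.41)
error = digest_landon_two_way - landon_two_way_actual

print(f"Roosevelt two-way share: {roosevelt_two_way:.0%}")  # 62%
print(f"Digest overestimate of Landon: {error:.1%}")        # 19.3%
```

On this scale the Digest overstated Landon's share by roughly 19 percentage points.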
Give two plausible explanations for why the Literary Digest prediction was so far off. In particular, discuss the direction of the bias: why was this sampling method vulnerable to producing an overestimate of the parameter?
Still, a 24% response rate is much higher than that of current polls, and this was the Digest's largest poll yet. Shouldn't this have improved their prediction? How are current polls, with smaller response rates and smaller sample sizes, able to be more accurate? One suggested reason for the 1936 error was that the Digest had consistently been getting more Republican voters responding to its straw vote. Does the data support this argument?
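One way to think about the sample-size question is to compare the chance (random) error a sample this large would have with the error the Digest actually made. The sketch below makes two loud assumptions: it treats "over 10 million" as exactly 10 million mailed (so the true response rate is somewhat lower), and it applies the simple-random-sample standard error formula sqrt(p(1 - p)/n), which the Digest respondents did not satisfy. That formula only measures chance variation; it says nothing about bias from the sampling frame or from who chose to respond.

```python
import math

mailed = 10_000_000   # assumption: "over 10 million" treated as exactly 10 million
responses = 2_376_523
p_hat = 0.54          # Digest's predicted Landon share
actual = 0.365        # Landon's actual share of the popular vote

response_rate = responses / mailed

# Standard error IF the respondents were a simple random sample
se = math.sqrt(p_hat * (1 - p_hat) / responses)

# The error the Digest actually made
error = p_hat - actual

print(f"response rate ~ {response_rate:.1%}")  # 23.8%
print(f"standard error ~ {se:.5f}")           # 0.00032
print(f"actual error = {error:.3f}, roughly {error / se:.0f} standard errors")
```

A huge sample shrinks the random error to a fraction of a percentage point, but the actual miss is hundreds of standard errors wide, which is the signature of bias rather than bad luck.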
Open the LitDigest1936.xlsx file in Excel or Google Sheets. This contains the raw counts for the three main candidates (including William Lemke, Union Party), as well as an overall total number of straw votes cast in each state.
Set up a formula for determining the number of respondents to the 1936 poll who said they voted in 1932 for the Republican, Democratic, or Socialist/Other candidate. What proportion of these voted for the Republican candidate?
Now find the actual 1932 election results. What proportion of voters voted for the Republican candidate (Hoover) among the three major candidates? Did the Literary Digest poll have too many or too few Republican voters? How could we account for this in our estimates of the vote counts?
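The spreadsheet formula amounts to summing the three 1932-vote columns over all state rows and dividing the Republican total by that sum. A minimal sketch, using made-up toy counts (NOT the Digest data) and assuming each state row can be read as a (Republican, Democratic, Socialist/Other) triple:

```python
# Toy per-state counts for illustration only (NOT the Digest data).
# Each tuple: (republican_1932, democrat_1932, socialist_other_1932)
rows = [(120, 80, 5), (300, 310, 10), (45, 60, 2)]

def republican_share(rows):
    """Return (number of respondents reporting one of the three 1932 votes,
    proportion of them reporting a Republican vote)."""
    republican = sum(rep for rep, _, _ in rows)
    total = sum(rep + dem + other for rep, dem, other in rows)
    return total, republican / total

total, share = republican_share(rows)
print(total, round(share, 3))  # 932 0.499
```

In Excel or Google Sheets the same idea is a column SUM for each 1932 category, then the Republican sum divided by the sum of all three.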
The proportion who claimed to vote Republican in the survey (0.507) is noticeably larger than the proportion that actually voted Republican (0.399). This supports the suspicion that the survey overrepresented Republicans compared to the voting population.
In the 1936 poll, about 51% of respondents said they voted Republican in 1932, but only about 40% of actual 1932 voters voted Republican, a ratio (actual to sample) of 0.78. So the Digest's sample appears to overrepresent Republican voters. Similarly, people who said they voted Democrat or Other in the 1932 election were underrepresented in the 1936 poll (ratios of 1.197 and 2.23, respectively). (Lohr and Brick, 2017, also estimated the ratio for non-voters and missing responses to be 1.1275 for Democrats and 0.871 for Republicans.)
This means we want to lower the number of Republican votes in the Digest poll by multiplying by 0.78, to adjust for the overrepresentation in the sample, and increase the number of Democrat votes in the poll by multiplying by 1.197 and so forth.
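Each ratio is the group's proportion in the actual electorate divided by its proportion in the sample, and a count is adjusted by multiplying it by that ratio. A small sketch with the rounded 1932 Republican proportions from the text; with these rounded inputs the ratio comes out near 0.787, slightly different from the 0.78 used above, presumably because of rounding. The count of 1,000 respondents is hypothetical.

```python
def weight(population_prop, sample_prop):
    """Reweighting ratio: > 1 means the group was underrepresented in the
    sample (scale its counts up); < 1 means it was overrepresented
    (scale its counts down)."""
    return population_prop / sample_prop

# 1932 Republican share: actual electorate vs. Digest respondents
w_republican = weight(0.399, 0.507)
print(round(w_republican, 3))  # 0.787

# Adjusting a hypothetical count of 1,000 Republican-in-1932 respondents
print(round(1000 * w_republican))  # 787
```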
Start with the Landon voters in the Digest poll. Using their breakdown by how they voted in 1932, adjust the counts with the ratios above. What is the adjusted total number of Landon votes?
Set up a formula using these weights and the breakdown of the planned Landon voters by 1932 vote (columns D-I) to find a new count. Use 0.871 for non-voters and missing responses.
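The adjusted count is a weighted sum (in a spreadsheet, a SUMPRODUCT of the count columns and a row of weights), and the two-candidate share follows from the two adjusted totals. A minimal sketch under stated assumptions: the four group names are hypothetical (the spreadsheet's columns D-I may split these groups further, in which case each column gets its own key and weight), 0.871 is read as applying to Landon voters and 1.1275 to Roosevelt voters, and the counts are toy numbers, NOT the Digest data.

```python
# Weights from the text. Assumption: the non-voter/missing weight is 0.871
# for Landon voters and 1.1275 for Roosevelt voters.
SHARED = {"republican": 0.78, "democrat": 1.197, "other": 2.23}
LANDON_WEIGHTS = {**SHARED, "nonvoter_missing": 0.871}
ROOSEVELT_WEIGHTS = {**SHARED, "nonvoter_missing": 1.1275}

def adjusted_count(counts, weights):
    """Weighted total: each 1932-vote group's count times its weight."""
    return sum(count * weights[group] for group, count in counts.items())

# Toy 1932-vote breakdowns (NOT the Digest data), keyed like the weights
landon_counts = {"republican": 600, "democrat": 100, "other": 10, "nonvoter_missing": 90}
roosevelt_counts = {"republican": 50, "democrat": 400, "other": 20, "nonvoter_missing": 60}

landon_adj = adjusted_count(landon_counts, LANDON_WEIGHTS)
roosevelt_adj = adjusted_count(roosevelt_counts, ROOSEVELT_WEIGHTS)

# Roosevelt's share of the two-candidate vote, before and after weighting
raw_r = sum(roosevelt_counts.values())
raw_l = sum(landon_counts.values())
raw_share = raw_r / (raw_r + raw_l)
adj_share = roosevelt_adj / (roosevelt_adj + landon_adj)

print(f"Roosevelt share: raw {raw_share:.1%}, adjusted {adj_share:.1%}")
```

Because Landon's supporters are disproportionately 1932 Republicans (weighted down) and Roosevelt's are disproportionately 1932 Democrats (weighted up), the adjusted Roosevelt share rises relative to the raw share.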
Between these two candidates, what is the adjusted percentage voting for Roosevelt? Is this larger or smaller than the original Literary Digest prediction?
Although this is still much lower than Roosevelt's actual vote share between these two candidates (62%), you can see how this process tries to account for the sampling bias in the original poll. The process still relies on many assumptions (e.g., that respondents accurately reported their 1932 vote, and that non-respondents to the LD survey had the same relationship between their 1932 and 1936 votes as the respondents). But if you apply this technique to the individual states, 10 states change from Landon to Roosevelt, including California and New York, and Roosevelt is predicted to win 26 states (276 electoral votes) rather than 16 states (161 electoral votes).
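Whether 276 electoral votes is enough to win depends on the majority threshold. The 1936 Electoral College had 531 votes (a fact not stated in the text above), so a candidate needed 266. A quick check of both state-level predictions:

```python
TOTAL_ELECTORAL_VOTES = 531                   # 48 states in 1936
MAJORITY = TOTAL_ELECTORAL_VOTES // 2 + 1     # 266 needed to win

# Roosevelt's predicted electoral votes (from the text)
predictions = {"unadjusted Digest": 161, "after state-level reweighting": 276}

for label, roosevelt_ev in predictions.items():
    verdict = "wins" if roosevelt_ev >= MAJORITY else "loses"
    print(f"{label}: Roosevelt {roosevelt_ev} EV, {verdict}")
```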
So using this one additional piece of information would at least have predicted the correct outcome, though it still greatly underestimates the margin of victory. More complicated weighting schemes make further adjustments based on the size of the error in the 1928 LD poll, or use a regression model based on the previous two elections. The Literary Digest also collected, but did not publish, data on the postmarks of returned ballots, and could have adjusted for rural/urban differences or county-level demographics from the 1930 census.
In this case, the method using data from 1928 and 1932 seems to work best, because its state-by-state proportions are at least centered around the national proportion, rather than most of the states being too low.
There were two main issues with the Literary Digest poll. First, the sampling frame did not include all members of the population of interest; in particular, it failed to include poorer people (who at that time were likely to be Democrats). Second, the voluntary response nature of the poll meant the surveyors were more likely to hear from those unhappy with the status quo (the incumbent), from those with more time on their hands to complete such surveys (e.g., retired people), and even from those more willing to pay for a stamp. Both issues probably led to an overrepresentation of Republicans. Bad sampling frames and voluntary response bias are perhaps the most common sources of sampling bias. By the way, a fledgling pollster of the time, George Gallup, actually bet that he would predict the percentages more accurately. Not only did he correctly predict the Digest's result with only 3,000 respondents, he also correctly predicted a Roosevelt victory! The problems with the Literary Digest poll were evident in earlier elections (it overpredicted the popular vote for the winner in 1924 and 1928, but those results were so one-sided that the bias didn't matter), and even though the Digest could have weighted its results, it chose not to and let readers "draw their own conclusions." But Lohr and Brick (2017) point out that the largest issue was probably failing to accurately assess the uncertainty in their estimates.
Lusinchi, D. (2012). "'President' Landon and the 1936 Literary Digest Poll: Were Automobile and Telephone Owners to Blame?" Social Science History, 36:23-54.
In the mid-1980s, Dr. Shere Hite, actress and writer, and subject of a 2023 documentary, undertook a survey of women's attitudes toward relationships, love, and sex by distributing 100,000 questionnaires in women's groups. Of the 4,500 who returned the questionnaire, 96% said that they gave more emotional support than they received from their husbands or boyfriends. An ABC News/Washington Post poll surveyed a random sample of 767 women, finding that 44% claimed to give more emotional support than they received.
An article published in the June 6, 2006 issue of the journal Pediatrics describes a survey on the topic of college students intentionally injuring themselves. Researchers invited 8,300 undergraduate and graduate students at Cornell University and Princeton University to participate in the survey. A total of 2,875 students responded, with 17% of them saying that they had purposefully injured themselves. Suppose we are interested in the proportion of all college students who have purposefully injured themselves.
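The basic quantities in this study can be computed from the numbers given. Because 17% is a rounded percentage, the implied count of respondents reporting self-injury is only approximate.

```python
invited = 8300
responded = 2875
p_hat = 0.17  # sample proportion reporting self-injury (rounded in the article summary)

response_rate = responded / invited
approx_count = round(p_hat * responded)  # approximate, since 17% is rounded

print(f"response rate: {response_rate:.1%}")                  # 34.6%
print(f"respondents reporting self-injury: about {approx_count}")  # 489
```

As with the Literary Digest, roughly two-thirds of those invited did not respond, which is worth keeping in mind when judging whether the 17% generalizes.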
Do you think it is likely that this sample is representative of the population of all college students in the world? What about all college students in the U.S.? Explain.
For which of the following variables would you suspect this sample would be representative of the population of all U.S. college students? Justify your answer.
Notice the State Unknown row. If we don't know the state, how do we know the electoral count for those states? Based on the values given in that row, what do you think the counts for Landon and Roosevelt for individuals with unknown states actually were?