Example 3.2: Unbanked Households

Section 16.2 Example 3.2: Unbanked Households

Try these questions yourself before using the solutions that follow to check your answers.

Some U.S. households rely strictly on cash for the majority of their financial transactions. They use cash for many household expenses (e.g., utilities), which can be very inconvenient (and an increasing number of businesses no longer accept cash!), and they often must pay higher fees to cash checks they receive. They are not able to build credit (e.g., for borrowing money, renting) or earn interest. These are known as "unbanked" or "underbanked" households. The Federal Deposit Insurance Corporation (FDIC) conducts a survey of the U.S. every two years "to obtain information about unbanked and underbanked households as part of an effort to bring them into the economic mainstream." On the 2021 FDIC survey, 67,899 respondents were asked:

If the respondent answered "No," the household is considered "unbanked." In the 2021 survey, 447 of 31,653 White households (the householder identifies as White alone and not Hispanic or Latino) answered No, and 342 of 4,758 Black households (the householder identifies as Black or African American alone and not Hispanic or Latino) answered No.

fdic2021.txt

Aside

Checkpoint 16.2.1. Test of Significance.

Carry out a test of significance to determine whether the difference in these sample proportions is convincing evidence of a difference in the population proportions. What conclusion does the test allow you to draw about the population proportions who are unbanked?

Solution.

The response variable, whether a household is considered unbanked, is categorical and binary. Suppose we define "success" as "unbanked." Then we can let $\pi_{White} - \pi_{Black}$ represent the difference in the population proportions that would be classified as unbanked.

$H_0: \pi_{White} - \pi_{Black} = 0$ (there is no difference in the population proportions)

$H_a: \pi_{White} - \pi_{Black} \neq 0$ (there is a difference)

Note: The research question was phrased in terms of "whether there is a difference" so we will use a two-sided test.

The FDIC survey, like other government surveys, aims to gather a representative sample. Strategies such as weighting and combining results across surveys can also be used. In this case, we will treat the White and Black households as independent random samples from their respective populations.

The sample sizes are large, so the normal distribution would be a reasonable model for the sampling distribution of the difference in sample proportions. Using this normal distribution model and assuming the null hypothesis is true, we would expect the distribution of the difference in sample proportions to be centered at zero, and our estimate of the standard deviation of this distribution is

$\sqrt{0.0217(1-0.0217)(1/31653+1/4758)} \approx 0.0023$

where $0.0217 \approx (447+342)/(31653+4758)$ is the pooled estimate of the proportion of successes.

With this standard error, the observed difference in the sample proportions ($0.0141 - 0.0719 \approx -0.058$) is more than 25 standard errors below the hypothesized difference of zero. The probability of observing a standardized statistic at least this extreme (in either direction) by independent random sampling alone is essentially zero.
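
The calculation above can be reproduced with a short Python sketch. The function name `two_proportion_z_test` is our own label for illustration, not something from the FDIC materials, and the p-value comes from the standard normal model described in the text.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference in proportions, using the pooled SE."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion of successes, assuming the null hypothesis is true.
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: area in both tails beyond |z|.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return p1 - p2, se, z, p_value


# White households: 447 of 31,653 unbanked; Black households: 342 of 4,758.
diff, se, z, p_value = two_proportion_z_test(447, 31653, 342, 4758)
print(f"difference = {diff:.4f}, SE = {se:.4f}, z = {z:.2f}, p-value = {p_value:.3g}")
```

The standardized statistic is negative because the difference is taken as White minus Black; only its size matters for the two-sided p-value.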

With such a small p-value (e.g., less than 0.01), we reject the null hypothesis. If the population proportions were equal, there would be a very small probability of random sampling alone producing sample proportions at least as far apart as those found in the FDIC survey. Therefore, we have strong evidence that the population proportion who are unbanked is not the same among Black and White households. A 95% confidence interval for the difference in the population proportions (White minus Black) is (-0.0652, -0.0503). So we are 95% confident that the percentage of White households that are unbanked is about 5 to 6.5 percentage points lower than the percentage of Black households that are unbanked in the U.S. This is not a large difference in a practical sense, but we don’t believe a difference in sample proportions this large could have arisen by random sampling alone (in particular due to the large sample sizes). Note that the imbalance in the two sample sizes does not negatively impact our comparison of the two proportions.
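
As a check on the interval, here is a minimal Python sketch. It assumes the usual large-sample interval, which uses the unpooled standard error (each sample proportion in its own term) rather than the pooled one used for the test. The function name `diff_proportion_ci` is a hypothetical label.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Critical value from the standard normal distribution (about 1.96 for 95%).
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# White minus Black, using the counts from the 2021 survey.
lower, upper = diff_proportion_ci(447, 31653, 342, 4758)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

Both endpoints are negative, which matches the conclusion that the unbanked proportion is lower among White households.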

Checkpoint 16.2.2. Compare Black and Hispanic Households.

Similarly, we can compare the Hispanic households (the householder identifies as Hispanic or Latino, regardless of race) to the Black households in 2021.

Race/Ethnicity	$n$	proportion unbanked
Black	4,758	0.072
Hispanic	4,953	0.060

How do you think the p-value for comparing these two proportions will compare to what you found in Checkpoint 1? Justify your reasoning.

Solution.

Because the combined sample size for these two groups is much smaller and the two sample proportions are much closer together, we expect a larger p-value. The p-value for the two-sample z-test does increase to 0.0161, but the result is still statistically significant at the 0.05 level even though the difference is now only about 1 percentage point.
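
A rough version of this comparison can be sketched in Python from the table alone. One caveat: the table reports proportions rounded to three decimal places rather than counts, so this sketch should land near the reported p-value, not reproduce it exactly. The function name is a hypothetical label.

```python
from math import sqrt
from statistics import NormalDist


def z_test_from_proportions(p1, n1, p2, n2):
    """Two-sided pooled z-test starting from sample proportions and sample sizes."""
    # Rebuild the pooled proportion from the (rounded) sample proportions.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# Black minus Hispanic, using the rounded table values.
z, p_value = z_test_from_proportions(0.072, 4758, 0.060, 4953)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```

The standardized statistic is far smaller than in Checkpoint 1, so the p-value is no longer essentially zero, although it stays below 0.05.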

But these differences could change depending on the income level of the households. Below are the same race/ethnicity groups among those households in the "less than $15,000" income level.

Less than $15,000 Income Level.

Race/Ethnicity	$n$	proportion unbanked
White	2,449	0.084
Black	901	0.194
Hispanic	609	0.159

Checkpoint 16.2.3. Interpret Value.

Interpret the 0.159 value in context.

Solution.

For households that have a household income below $15,000 and a Hispanic householder, about 15.9% of such households in this sample were unbanked. This is a much higher rate than we saw overall.

Checkpoint 16.2.4. Compare Low Income Groups.

How do you think the p-value for comparing these White and Black households will compare to what you found in Checkpoint 1? What about comparing the Black and Hispanic households? Justify your reasoning.

Solution.

The difference between the White and Black sample proportions is a bit larger (about 11 percentage points), but the sample sizes are also much smaller. The smaller sample sizes will increase the p-value. The z-value is indeed smaller ($z$ = 8.90, compared with more than 25 before), but still highly significant.

What about a higher income level?

$50,000-$75,000 Income Level.

Race/Ethnicity	$n$	proportion unbanked
White	5,920	0.0060
Black	819	0.0195
Hispanic	909	0.0319

Checkpoint 16.2.5. Impact of Higher Income.

How did the overall proportion that are unbanked change with the higher income level for the household? Is this what you expected? How will this impact the p-value?

Solution.

In the higher income range, the proportion unbanked is much smaller. If most of the respondents are in this higher range, this pulls down the overall percentage. The p-value is not necessarily impacted much by the lower proportions; the key is the difference in proportions relative to the standard error. Even though the difference is likely smaller when comparing smaller proportions, the standard error will also decrease because the overall proportion is closer to 0 than to 0.50.
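
To see both effects at once, the following Python sketch compares White and Black households at the two income levels, using the rounded proportions from the tables (so the results are approximate). The helper name `pooled_z` is a hypothetical label, and the "$50,000-$75,000" description of the higher income group is taken from the wording of Checkpoint 16.2.7.

```python
from math import sqrt


def pooled_z(p1, n1, p2, n2):
    """Return (difference, pooled standard error, z) for two sample proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    diff = p1 - p2
    return diff, se, diff / se


# White minus Black, using the rounded proportions from the two income tables.
diff_low, se_low, z_low = pooled_z(0.084, 2449, 0.194, 901)       # less than $15,000
diff_high, se_high, z_high = pooled_z(0.0060, 5920, 0.0195, 819)  # $50,000-$75,000

for label, d, se, z in [("low income", diff_low, se_low, z_low),
                        ("higher income", diff_high, se_high, z_high)]:
    print(f"{label}: difference = {d:.4f}, SE = {se:.4f}, z = {z:.2f}")
```

In the higher income group both the difference and the standard error shrink, so the standardized statistic falls by much less than the drop in the raw difference alone would suggest.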

Checkpoint 16.2.6. Differences Across Income Levels.

Among White, Hispanic, and Black households, does the difference in the proportion that are unbanked persist across income levels?

Solution.

In the two income groups we examined, we do see a consistently higher rate of unbanked households among Black and Hispanic households compared to White households. We also notice that the unbanked rates for Black and Hispanic households are similar, and the direction of the difference even reverses between the two income categories.

Checkpoint 16.2.7. Alternative Comparison.

The difference between the Black and Hispanic households in the $50,000-$75,000 group is less than 2 percentage points, which seems rather small. Suggest another way to compare these two values.

Solution.

This is where a relative risk calculation might be more meaningful, as the Hispanic proportion is about 1.6 times as large as the Black proportion (0.0319/0.0195).
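
The two summaries can be put side by side in a few lines of Python. The variable names are our own, and the proportions are the rounded values from the higher income table.

```python
# Proportions unbanked in the higher income group, as reported in the table.
p_hispanic = 0.0319
p_black = 0.0195

# Absolute comparison: difference in proportions, in percentage points.
difference_points = (p_hispanic - p_black) * 100

# Relative comparison: ratio of the two proportions (relative risk).
relative_risk = p_hispanic / p_black

print(f"difference = {difference_points:.2f} percentage points")
print(f"relative risk = {relative_risk:.2f}")
```

A difference of just over 1 percentage point looks small, while the ratio shows that the Hispanic rate is more than one and a half times the Black rate in this income group.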

Checkpoint 16.2.8. Impact of Missing Data.

One issue that arises with questions about income is that many households don’t want to disclose this information on a survey. In particular, 33,912 of the respondents in this survey did not provide this information. How does this impact our comparisons in Checkpoint 4 and Checkpoint 5? How does this impact the generalizability of the results of this study? Explain your reasoning.

Solution.

The missing values do decrease the sample sizes quite a bit, which we have seen directly impacts the size of the p-value. But perhaps even more problematic, we don’t know whether those who don’t answer the income question are systematically different on any of the variables, like financial transactions, than those who do answer the question. This can cause the sample proportions to not accurately reflect the population proportions.

Checkpoint 16.2.9. Data Imputation.

One method to deal with missing observations is to assume they are "missing at random" and to use "data imputation" to fill in the missing values. For example, a value can be inferred using other information provided on the survey (e.g., education, job status) or using information from similar households in the dataset (see CPS Technical Paper 77). Provide one reason for and one reason against data imputation in this context.

Solution.

A key advantage of imputation is that you do not have to lose a large proportion of your sample. However, it is difficult to check the assumptions behind imputation, and because imputed values mimic data you have already collected, they can falsely make the sample appear more homogeneous and less variable.
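
The concern about reduced variability can be illustrated with a toy Python sketch. The income values below are invented purely for illustration and are not FDIC data. Filling each gap with the mean of the observed values is a deliberately simple stand-in for imputation in general, not the procedure described in CPS Technical Paper 77.

```python
from statistics import mean, stdev

# Hypothetical household incomes in thousands of dollars; None marks a nonresponse.
# These numbers are made up for illustration only.
reported = [12, 18, 25, None, 40, None, 55, 70, None, 95]

# Option 1: drop the missing values and work with a smaller sample.
observed = [x for x in reported if x is not None]

# Option 2: fill each missing value with the mean of the observed values.
fill_value = mean(observed)
imputed = [fill_value if x is None else x for x in reported]

sd_observed = stdev(observed)
sd_imputed = stdev(imputed)

print(f"n = {len(observed)} observed, SD = {sd_observed:.1f}")
print(f"n = {len(imputed)} after imputation, SD = {sd_imputed:.1f}")
```

The imputed data set keeps all ten households, but every filled-in value sits exactly at the center, so the standard deviation shrinks even though no new information was collected.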
