Section 28.1 Chapter 5 Summary

In this chapter, you explored procedures for analyzing two or more groups and for analyzing relationships. As in earlier chapters, you saw that different study designs can lead to the same mathematical computations, but you should still consider the study design when drawing your final conclusions. In particular, you should think about which variables were controlled by the researchers and which were not.
A very common procedure to use with two-way tables is the chi-squared procedure. You saw that, in the case of two groups and a binary response, it is equivalent to the two-sided two-sample \(z\)-test. The chi-squared procedure is a large-sample procedure; Fisher’s Exact Test can be used with smaller sample sizes. A parallel procedure for comparing several means is Analysis of Variance (ANOVA), which models the ratio of the variability between groups to the variability within groups using the \(F\) distribution.
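As a rough illustration of the link between the chi-squared statistic and the two-sample \(z\)-statistic, the following R sketch uses a made-up 2x2 table (the counts and the names `counts`, `phat`, and `pooled` are hypothetical, not from the chapter). It turns off the continuity correction so that the two statistics can be compared directly.

```r
# Hypothetical 2x2 table of counts (made up for illustration):
# rows are the two groups, columns are the two outcomes
counts <- matrix(c(30, 20,
                   18, 32),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group = c("A", "B"),
                                 outcome = c("success", "failure")))

# Chi-squared test without the continuity correction
chi <- chisq.test(counts, correct = FALSE)

# Two-sample z-statistic using the pooled proportion of successes
n <- rowSums(counts)                          # group sizes
phat <- counts[, "success"] / n               # conditional proportions
pooled <- sum(counts[, "success"]) / sum(n)   # pooled proportion
z <- unname((phat[1] - phat[2]) /
              sqrt(pooled * (1 - pooled) * (1 / n[1] + 1 / n[2])))

chi$statistic   # chi-squared statistic
z^2             # the squared z-statistic equals the chi-squared statistic
chi$expected    # expected counts, for checking the validity condition

# Fisher's Exact Test on the same table, for smaller sample sizes
fisher_p <- fisher.test(counts)$p.value
```

The two-sided \(p\)-value from the \(z\)-test, `2 * pnorm(-abs(z))`, also agrees with `chi$p.value` here.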
Regression is a very important field of study and your next course in statistics will probably be entirely about regression models. We scratched the surface here by looking at the appropriate numerical and graphical summaries for analyzing the relationship between two quantitative variables (scatterplots and correlation coefficient) and least squares regression models for linear relationships. In deciding whether you have a statistically significant relationship, the \(t\)-statistic uses the common standardized statistic form that you saw in earlier chapters, \((\text{observed} - \text{hypothesized})/\text{standard error}\text{,}\) where the standard error of the sample slope depends on the sample size, the variability in the \(x\) values, and the vertical spread about the regression line.
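To make the standardized statistic concrete, here is a minimal R sketch on made-up data (the values of `x` and `y` are hypothetical). It pulls the sample slope and its standard error from `summary(lm(...))`, forms the \(t\)-statistic by hand, and rebuilds the standard error from the spread about the line and the variability in the \(x\) values.

```r
# Hypothetical data (made up for illustration)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9)

fit <- lm(y ~ x)
coefs <- summary(fit)$coefficients   # estimate, std. error, t value, p-value

b1 <- coefs["x", "Estimate"]         # sample slope
se_b1 <- coefs["x", "Std. Error"]    # standard error of the sample slope

# Standardized statistic: (observed - hypothesized) / standard error
t_stat <- (b1 - 0) / se_b1

# Standard error rebuilt from its ingredients:
# s measures the vertical spread about the line, and
# sum((x - mean(x))^2) grows with the sample size and the spread of x
s <- summary(fit)$sigma
se_by_hand <- s / sqrt(sum((x - mean(x))^2))

# Two-sided p-value from the t distribution with n - 2 df
p_value <- 2 * pt(abs(t_stat), df = length(x) - 2, lower.tail = FALSE)
```

A larger sample, more spread in the \(x\) values, or less scatter about the line each shrink `se_by_hand` and so enlarge the \(t\)-statistic for the same slope.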
These calculations are complex to carry out by hand, but our focus is on knowing when to use which procedure, how to check the validity of the procedure, and how to interpret the resulting output. Learning to effectively interpret and communicate statistical results is at least as important a skill as learning which computer menus to use!

Subsection 28.1.1 Technology Summary

In this chapter, you learned how to use software to perform chi-squared tests, ANOVA, and inference for regression. You also learned how to create scatterplots, including coded scatterplots, and calculate correlation coefficients. You used applets to explore properties of \(F\)-statistics, regression lines, and regression coefficients. Though we hope these have given you some memorable visual images, you will generally use software to carry out specific analyses.

Subsection 28.1.2 Choice of Procedures for Comparing Several Populations, Exploring Relationships

| Setting | \(EV\text{:}\) Categorical, \(RV\text{:}\) Categorical | \(EV\text{:}\) Categorical, \(RV\text{:}\) Quantitative | \(EV\text{:}\) Quantitative, \(RV\text{:}\) Quantitative |
|---|---|---|---|
| Graphical summary | Segmented bar graph | Stacked boxplots | Scatterplot |
| Numerical summary | Conditional proportions | Group means and standard deviations | Correlation coefficient |
| Procedure name | Chi-squared | ANOVA | Regression |
| Valid to use theory-based procedure if | All expected counts \(> 1\text{,}\) at least 80% of expected counts \(> 5\); data are simple random samples or independently chosen random samples or random assignment | Independent SRSs or random assignment; populations normal (probability plots of samples); variances are equal (\(s_{\max}/s_{\min} \lt 2\)) | Linear relationship (residuals vs. \(x\)); independent observations (e.g., random samples or randomization); normality of response at each \(x\) value (probability plot/histogram of residuals); equal variance of \(y\) at each \(x\) value (residuals vs. \(x\)) |
| Null hypothesis | \(H_0\text{:}\) response variable distributions are the same, or \(H_0\text{:}\) no association between response variable and explanatory variable | \(H_0: \mu_1 = \cdots = \mu_I\text{,}\) or \(H_0\text{:}\) treatment means are equal | \(H_0: \beta_1 = 0\) (no relationship between response variable and explanatory variable) |
| Test statistic | \(\chi^2 = \sum \frac{(O-E)^2}{E}\) | \(F = MST/MSE\) | \(t_0 = \frac{\hat{\beta}_1 - \text{hypothesized value}}{SE_{\hat{\beta}_1}}\) |
| Distribution for \(p\)-value | Chi-squared with \((c-1)(r-1)\) df | \(F\) with \(I-1, n-I\) df | \(t\) with \(n-2\) df |
| R | chisq.test | summary(aov(response~x)) | summary(lm(response~x)) |
| Minitab | Stat > Tables > Chi-square Test for Association | Stat > ANOVA > One-way | Stat > Regression > Regression > Fit Regression Model |
| JMP | Analyze > Fit Y by X | Analyze > Fit Y by X | Analyze > Fit Y by X |
| Applet | Analyzing Two-way Tables | Comparing Groups (Quantitative) | Analyzing Two Quantitative Variables |
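The ANOVA column of the table can be traced through by hand. The following R sketch uses made-up responses for three groups (the data and the names `response`, `group`, and `n_groups` are hypothetical) to compute \(F = MST/MSE\text{,}\) the \(p\)-value from the \(F\) distribution with \(I-1\) and \(n-I\) df, and the \(s_{\max}/s_{\min}\) check, and then compares the result with `summary(aov(...))`.

```r
# Hypothetical responses for three groups (made up for illustration)
response <- c(12, 15, 14, 11, 13,
              18, 17, 20, 16, 19,
              14, 16, 15, 13, 17)
group <- factor(rep(c("g1", "g2", "g3"), each = 5))

n <- length(response)          # total sample size
n_groups <- nlevels(group)     # number of groups (I in the table)
group_means <- tapply(response, group, mean)
group_sizes <- tapply(response, group, length)

# Between-group (MST) and within-group (MSE) mean squares
MST <- sum(group_sizes * (group_means - mean(response))^2) / (n_groups - 1)
MSE <- sum((response - group_means[group])^2) / (n - n_groups)
F_stat <- MST / MSE

# Upper tail probability from the F distribution
p_value <- pf(F_stat, n_groups - 1, n - n_groups, lower.tail = FALSE)

# Equal-variance check: ratio of largest to smallest group SD
group_sds <- tapply(response, group, sd)
sd_ratio <- max(group_sds) / min(group_sds)

# Same analysis with the built-in function
anova_table <- summary(aov(response ~ group))[[1]]
anova_table
```

The first row of `anova_table` reports the same \(F\) value and \(p\)-value as the hand computation.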

Subsection 28.1.3 Quick Reference to ISCAM R Workspace Functions and Other R Commands

| Procedure Desired | Function Name (Options) |
|---|---|
| Normal Probability Plot | qqnorm(data) |
| Probability Plot | qqplot(data from distribution, your data) |
| Probabilities from Chi-squared distribution | iscamchisqprob(xval, df) returns upper tail probability |
| Chi-squared Test | chisq.test(table(data)); chisq.test(table(data))$expected; $residuals |
| One-way ANOVA | summary(aov(response~explanatory)) |
| Probabilities from F distribution | pf(x, df1, df2, lower.tail=FALSE) |
| Scatterplot | plot(explanatory, response) |
| Correlation coefficient | cor(x, y) |
| Least Squares Regression Line | lm(response~explanatory); lm(response~explanatory)$residuals |
| Coded Scatterplot | plot(response~explanatory, col=groups) |
| Inference for regression | summary(lm(response~explanatory)) |
| Prediction intervals | predict(lm(response~explanatory), newdata=data.frame(explanatory=value), interval="prediction") or interval="confidence" |
| Superimpose y = x line | lines(response, response) |
| Superimpose regression line | abline(lm(response~explanatory)) |
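The `predict` entry is the one most likely to trip you up, because `newdata` must be a data frame whose column name matches the explanatory variable in the model formula. A minimal sketch, assuming a made-up data frame `dat` and a hypothetical new value of 4.5:

```r
# Hypothetical data (made up for illustration)
dat <- data.frame(explanatory = c(1, 2, 3, 4, 5, 6, 7, 8),
                  response = c(2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9))

fit <- lm(response ~ explanatory, data = dat)

# The column name in newdata must match the explanatory variable name
new_x <- data.frame(explanatory = 4.5)

# Interval for an individual response at explanatory = 4.5
pred_int <- predict(fit, newdata = new_x, interval = "prediction")

# Interval for the mean response at explanatory = 4.5
conf_int <- predict(fit, newdata = new_x, interval = "confidence")

pred_int   # columns: fit, lwr, upr
conf_int   # same fitted value, narrower interval
```

Both calls use the default 95% level; the `level` argument of `predict` changes it.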

Subsection 28.1.4 Quick Reference to Minitab Commands

| Procedure Desired | Menu |
|---|---|
| Probability Plot | Graph > Probability Plot |
| Probabilities from Chi-Square distribution | Graph > Probability Distribution Plot |
| Chi-square Test | Stat > Tables > Chi-Square Test for Association; Stat > Tables > Cross Tabulation and Chi-Square |
| One-way ANOVA | Stat > ANOVA > One-way |
| Probabilities from F distribution | Graph > Probability Distribution Plot |
| Scatterplot | Graph > Scatterplot |
| Correlation coefficient | Stat > Basic Statistics > Correlation |
| Least Squares Regression Line | Stat > Regression > Fitted Line Plot; Storage: Residuals |
| Coded Scatterplot | Graph > Scatterplot, With Groups |
| Inference for Regression | Stat > Regression > Regression > Fit Regression Model |
| Prediction Intervals | After running the model: Stat > Regression > Regression > Predict |
| Superimpose y = x line | Right click, Add > Calculated Line |
| Superimpose regression line | Stat > Regression > Fitted Line Plot; right click on scatterplot, Add > Regression Fit |

Subsection 28.1.5 Quick Reference to JMP Commands

| Procedure Desired | Menu; Hot spot |
|---|---|
| Normal Probability Plot | Analyze > Distribution; Normal quantile plot |
| Probability Plot | Analyze > Distribution; Continuous Fit |
| Probabilities from Chi-squared distribution | Distribution Calculator |
| Chi-squared Test | Analyze > Fit Y by X |
| One-way ANOVA | Analyze > Fit Y by X; Means/ANOVA/Pooled t |
| Probabilities from F distribution | Distribution Calculator |
| Scatterplot | Analyze > Fit Y by X |
| Correlation coefficient | Analyze > Multivariate Methods > Multivariate |
| Least Squares Regression Line | Analyze > Fit Y by X; Fit Line |
| Coded Scatterplot | Analyze > Fit Y by X; By |
| Inference for regression | Analyze > Fit Y by X; Fit Line |
| Prediction intervals | Analyze > Fit Y by X; Fit Line; Linear Fit > Mean Confidence Interval Formula or Individual Confidence Interval Formula |
| Superimpose y = x line | Analyze > Fit Y by X; Fit Special |
| Superimpose regression line | Analyze > Fit Y by X; Fit Line |