Section 28.1 Chapter 5 Summary
In this chapter, you explored procedures for analyzing two or more groups and for analyzing relationships. As in earlier chapters, you saw that different study designs can lead to the same mathematical computations, but you should still consider the study design when drawing your final conclusions. In particular, you should think about which variables were controlled by the researchers and which were not.
A very common procedure to use with two-way tables is the chi-squared procedure. You saw that this reduces to the two-sided two-sample \(z\)-test in the case of two groups with a binary response. The chi-squared procedure is a large-sample procedure; Fisher’s Exact Test can be used with smaller sample sizes. A parallel procedure for comparing several means is Analysis of Variance, which models the ratio of the variability between groups to the variability within groups using the \(F\) distribution.
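The link between the chi-squared test and the two-sample \(z\)-test can be checked numerically. The following is a minimal sketch in R, assuming a made-up 2×2 table of counts (the numbers and the labels `A`, `B`, `success`, and `failure` are hypothetical, not data from the chapter). Note that `chisq.test` applies a continuity correction to 2×2 tables by default, so `correct = FALSE` is needed for the statistics to agree.

```r
# Hypothetical 2x2 table of counts: rows = groups, columns = outcome
counts <- matrix(c(30, 20,
                   18, 32),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group = c("A", "B"),
                                 outcome = c("success", "failure")))

# Chi-squared test without the continuity correction
chisq_result <- chisq.test(counts, correct = FALSE)

# Two-sample z-statistic using the pooled proportion of successes
n <- rowSums(counts)
p_hat <- counts[, "success"] / n
p_pool <- sum(counts[, "success"]) / sum(n)
z <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))

# The squared z-statistic should match the chi-squared statistic
z_squared <- unname(z^2)
chi_squared <- unname(chisq_result$statistic)
print(c(z_squared = z_squared, chi_squared = chi_squared))

# The two-sided z p-value should match the chi-squared p-value
p_z <- unname(2 * pnorm(abs(z), lower.tail = FALSE))
p_chi <- chisq_result$p.value
print(c(p_z = p_z, p_chi = p_chi))

# Fisher's Exact Test is the usual alternative when counts are small
fisher_p <- fisher.test(counts)$p.value
```

With counts this large, the Fisher p-value will typically be close to, but not identical to, the chi-squared p-value.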
Regression is a very important field of study and your next course in statistics will probably be entirely about regression models. We scratched the surface here by looking at the appropriate numerical and graphical summaries for analyzing the relationship between two quantitative variables (scatterplots and correlation coefficient) and least squares regression models for linear relationships. In deciding whether you have a statistically significant relationship, the \(t\)-statistic uses the common standardized statistic form that you saw in earlier chapters \((\text{observed} - \text{hypothesized})/\text{standard error}\text{,}\) where the standard error of the sample slopes depends on the sample size, the variability in the \(x\) values and the vertical spread about the regression line.
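To make the standardized form concrete, here is a rough sketch in R that computes the standard error of the slope and the \(t\)-statistic by hand and compares them with the `summary(lm(...))` output. The data are simulated purely for illustration, and the hand formula \(SE = s / (s_x \sqrt{n-1})\) is the usual one for simple linear regression, where \(s\) is the residual standard error and \(s_x\) is the standard deviation of the \(x\) values.

```r
# Hypothetical data, simulated only for illustration
set.seed(1)
x <- runif(25, min = 0, max = 10)
y <- 3 + 0.5 * x + rnorm(25, sd = 2)

fit <- lm(y ~ x)
n <- length(x)

# Residual standard error: the vertical spread about the regression line
s <- sqrt(sum(residuals(fit)^2) / (n - 2))

# Standard error of the slope: depends on s, the spread of x, and n
se_slope <- s / (sd(x) * sqrt(n - 1))

# t-statistic: (observed - hypothesized) / standard error, hypothesized slope = 0
b1 <- unname(coef(fit)["x"])
t_stat <- (b1 - 0) / se_slope
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# Compare with the software output for the slope row
slope_row <- coef(summary(fit))["x", ]
print(rbind(by_hand = c(se_slope, t_stat, p_value),
            software = slope_row[c("Std. Error", "t value", "Pr(>|t|)")]))
```

The formula shows why a larger sample, more spread in the \(x\) values, or less scatter about the line all shrink the standard error.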
These calculations are complex to carry out by hand, but our focus is on knowing when to use which procedure, how to check the validity of the procedure, and how to interpret the resulting output. Learning to effectively interpret and communicate statistical results is at least as important a skill as learning which computer menus to use!
Subsection 28.1.1 Technology Summary
In this chapter, you learned how to use software to perform chi-squared tests, ANOVA, and inference for regression. You also learned how to create scatterplots, including coded scatterplots, and calculate correlation coefficients. You used applets to explore properties of \(F\)-statistics, regression lines, and regression coefficients. Though we hope these have given you some memorable visual images, you will generally use software to carry out specific analyses.
Subsection 28.1.2 Choice of Procedures for Comparing Several Populations, Exploring Relationships
| Setting | \(EV\text{:}\) Categorical; \(RV\text{:}\) Categorical | \(EV\text{:}\) Categorical; \(RV\text{:}\) Quantitative | \(EV\text{:}\) Quantitative; \(RV\text{:}\) Quantitative |
|---|---|---|---|
| Graphical summary | Segmented bar graph | Stacked boxplots | Scatterplot |
| Numerical summary | Conditional proportions | Group means and standard deviations | Correlation coefficient |
| Procedure name | Chi-squared | ANOVA | Regression |
| Valid to use theory-based procedure if | Data are simple random samples or independently chosen random samples or random assignment | Independent SRSs or random assignment; populations normal (probability plots of samples); variances are equal (\(s_{\max}/s_{\min} \lt 2\)) | Linear relationship (residuals vs. \(x\)); independent observations (e.g., random samples or randomization); normality of response at each \(x\) value (probability plot/histogram of residuals) |
| Null hypothesis | \(H_0\text{:}\) Response variable distributions are the same or \(H_0\text{:}\) no association between response variable and explanatory variable | \(H_0: \mu_1 = \cdots = \mu_I\) or \(H_0\text{:}\) treatment means are equal | \(H_0: \beta_1 = 0\) (no relationship between response variable and explanatory variable) |
| Test statistic | \(\chi^2 = \sum \frac{(O-E)^2}{E}\) | \(F = MST/MSE\) | \(t_0 = \frac{\hat{\beta}_1 - \text{hypothesized value}}{SE(\hat{\beta}_1)}\text{,}\) where \(\hat{\beta}_1\) is the sample slope |
| Distribution for \(p\)-value | Chi-squared with \((c-1)(r-1)\) df | \(F\) with \(I-1, n-I\) df | \(t\) with \(n-2\) df |
| R | chisq.test | summary(aov(response~x)) | summary(lm(response~x)) |
| Minitab | Stat > Tables > Chi-square Test for Association | Stat > ANOVA > One-way | Stat > Regression > Regression > Fit Regression Model |
| JMP | Analyze > Fit Y by X | Analyze > Fit Y by X | Analyze > Fit Y by X |
| Applet | Analyzing Two-way Tables | Comparing Groups (Quantitative) | Analyzing Two Quantitative Variables |
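As one illustration of the ANOVA column in the table above, the sketch below computes \(F = MST/MSE\) by hand, finds the p-value from the \(F\) distribution with \(I-1\) and \(n-I\) degrees of freedom, and checks the \(s_{\max}/s_{\min}\) guideline. The three groups and their values are simulated and hypothetical, and `num_groups` plays the role of \(I\).

```r
# Hypothetical data: three groups, simulated only for illustration
set.seed(2)
group <- factor(rep(c("A", "B", "C"), times = c(8, 10, 9)))
response <- rnorm(length(group), mean = c(10, 12, 11)[as.integer(group)], sd = 2)

n <- length(response)
num_groups <- nlevels(group)   # I in the table's notation

# Equal-variance guideline from the table: s_max / s_min < 2
group_sds <- tapply(response, group, sd)
sd_ratio <- max(group_sds) / min(group_sds)

# Mean square for treatments (between groups) and for error (within groups)
group_means <- tapply(response, group, mean)
group_sizes <- tapply(response, group, length)
MST <- sum(group_sizes * (group_means - mean(response))^2) / (num_groups - 1)
MSE <- sum((group_sizes - 1) * group_sds^2) / (n - num_groups)

# F-statistic and its upper-tail p-value
F_stat <- MST / MSE
p_value <- pf(F_stat, df1 = num_groups - 1, df2 = n - num_groups,
              lower.tail = FALSE)

# Compare with the software output
aov_table <- summary(aov(response ~ group))[[1]]
print(c(F_by_hand = F_stat, F_software = aov_table[1, "F value"]))
print(c(p_by_hand = p_value, p_software = aov_table[1, "Pr(>F)"]))
```

The same pattern (compute the statistic, then look up the tail probability with the matching degrees of freedom) applies to the chi-squared and regression columns.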
Subsection 28.1.3 Quick Reference to ISCAM R Workspace Functions and Other R Commands
| Procedure Desired | Function Name (Options) |
|---|---|
| Normal Probability Plot | qqnorm(data) |
| Probability Plot | qqplot(data from distribution, your data) |
| Probabilities from Chi-squared distribution | iscamchisqprob(xval, df) returns upper tail probability |
| Chi-squared Test | chisq.test(table(data)); chisq.test(table(data))$expected; chisq.test(table(data))$residuals |
| One-way ANOVA | summary(aov(response~explanatory)) |
| Probabilities from F distribution | pf(x, df1, df2, lower.tail=FALSE) |
| Scatterplot | plot(explanatory, response) |
| Correlation coefficient | cor(x, y) |
| Least Squares Regression Line | lm(response~explanatory); lm(response~explanatory)$residuals |
| Coded Scatterplot | plot(response~explanatory, col=groups) |
| Inference for regression | summary(lm(response~explanatory)) |
| Prediction intervals | predict(lm(response~explanatory), newdata=data.frame(explanatory=value), interval="prediction") or the same call with interval="confidence" |
| Superimpose y = x line | lines(response, response) |
| Superimpose regression line | abline(lm(response~explanatory)) |
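Several of the commands in this table can be combined in a short session. The sketch below uses simulated, hypothetical data and shows the `predict` call with both interval types. The column name in `newdata` must match the explanatory variable's name in the model formula. The last line uses base R's `pchisq` with `lower.tail = FALSE` to get an upper-tail chi-squared probability; this is offered as a stand-in when the ISCAM workspace is not loaded, on the assumption that it corresponds to what the table describes for `iscamchisqprob`.

```r
# Hypothetical data, simulated only for illustration
set.seed(3)
explanatory <- runif(30, min = 0, max = 10)
response <- 2 + 1.5 * explanatory + rnorm(30, sd = 1)

fit <- lm(response ~ explanatory)

# Scatterplot with the least squares line superimposed
plot(explanatory, response)
abline(fit)

# Intervals at a new x value of 5 (an arbitrary choice)
new_x <- data.frame(explanatory = 5)
conf_int <- predict(fit, newdata = new_x, interval = "confidence")
pred_int <- predict(fit, newdata = new_x, interval = "prediction")
print(rbind(confidence = conf_int[1, ], prediction = pred_int[1, ]))

# Upper-tail probability from a chi-squared distribution in base R
upper_tail <- pchisq(5.77, df = 1, lower.tail = FALSE)
```

Both intervals are centered at the same fitted value, but the prediction interval is wider because it also accounts for the variability of an individual response about the line.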
Subsection 28.1.4 Quick Reference to Minitab Commands
| Procedure Desired | Menu |
|---|---|
| Probability Plot | Graph > Probability Plot |
| Probabilities from Chi-Square distribution | Graph > Probability Distribution Plot |
| Chi-square Test | Stat > Tables > Chi-Square Test for Association; Stat > Tables > Cross Tabulation and Chi-Square |
| One-way ANOVA | Stat > ANOVA > One-way |
| Probabilities from F distribution | Graph > Probability Distribution Plot |
| Scatterplot | Graph > Scatterplot |
| Correlation coefficient | Stat > Basic Statistics > Correlation |
| Least Squares Regression Line | Stat > Regression > Fitted Line Plot; Storage: Residuals |
| Coded Scatterplot | Graph > Scatterplot, With Groups |
| Inference for Regression | Stat > Regression > Regression > Fit Regression Model |
| Prediction Intervals | After running the model: Stat > Regression > Regression > Predict |
| Superimpose y = x line | right click, Add > Calculated Line |
| Superimpose regression line | Stat > Regression > Fitted Line Plot; right click on scatterplot, Add > Regression Fit |
Subsection 28.1.5 Quick Reference to JMP Commands
| Procedure Desired | Menu; Hot spot |
|---|---|
| Normal Probability Plot | Analyze > Distribution; Normal quantile plot |
| Probability Plot | Analyze > Distribution; Continuous Fit |
| Probabilities from Chi-squared distribution | Distribution Calculator |
| Chi-squared Test | Analyze > Fit Y by X |
| One-way ANOVA | Analyze > Fit Y by X; Means/ANOVA/Pooled t |
| Probabilities from F distribution | Distribution Calculator |
| Scatterplot | Analyze > Fit Y by X |
| Correlation coefficient | Analyze > Multivariate Methods > Multivariate |
| Least Squares Regression Line | Analyze > Fit Y by X; Fit Line |
| Coded Scatterplot | Analyze > Fit Y by X; By |
| Inference for regression | Analyze > Fit Y by X; Fit Line |
| Prediction intervals | Analyze > Fit Y by X; Fit Line; Linear Fit > Mean Confidence Interval Formula or Individual Confidence Interval Formula |
| Superimpose y = x line | Analyze > Fit Y by X; Fit Special |
| Superimpose regression line | Analyze > Fit Y by X; Fit Line |
