Section 10.6 Conclusion and Next Steps

We could keep adding variables that we think make a difference. How many variables we end up using depends partly on our ability to think of new variables to measure, and partly on what we want to use the model for.
The model we have developed now has two explanatory variables: one that can take any positive value, and one that has two levels. We now have what could be considered a very respectable r-squared value, so we could easily leave well enough alone. That is not to say our model is perfect, however. The graphs we have prepared suggest that the 'Members' effect is actually different if the away team is from interstate rather than from Victoria: the crowd does not increase with additional combined membership as quickly with an interstate away team, which is in line with what we might expect intuitively.
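A slope for 'Members' that differs between interstate and Victorian away teams is what a linear model calls an interaction. The following is only a minimal sketch of how such an effect could be checked with lm(). The data are synthetic and invented for illustration, not the AFL data used in this chapter, and the object names games, fit.add and fit.int are assumptions; only MCG, Members and away.inter come from the text.

```r
# Synthetic data only: these numbers are invented for illustration and are
# NOT the AFL attendance data used in this chapter.
set.seed(1)
n <- 40
Members <- runif(n, 10, 80)               # combined membership (made-up scale)
away.inter <- rep(c(0, 1), each = n / 2)  # 1 = interstate away team, 0 = Victorian
# Build attendance so that the Members slope is smaller for interstate teams
MCG <- 10 + 0.8 * Members - 0.3 * Members * away.inter + rnorm(n, sd = 3)
games <- data.frame(MCG, Members, away.inter)

# Additive model: one Members slope shared by both groups
fit.add <- lm(MCG ~ Members + away.inter, data = games)
# Interaction model: the Members slope is allowed to differ between groups
fit.int <- lm(MCG ~ Members * away.inter, data = games)

# The Members:away.inter coefficient estimates the change in slope
print(coef(fit.int))
# F-test comparing the two nested models
print(anova(fit.add, fit.int))
```

If the interaction term is clearly non-zero, the additive model is forcing a single slope onto two groups that behave differently. Whether the extra term is worth keeping depends on what the model is for.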
One thing we didn’t mention was the actual prediction equation that one might construct from the output of lm(). It is actually very simple and just uses the estimates/B-weights from the output:
MCG = (21.257 * Members) ( 4.545 * away.inter)
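As a rough sketch of how an equation of this shape relates to the lm() output, the code below fits a two-term model and then computes one prediction by hand from the estimates, comparing it with predict(). The data are synthetic and invented for illustration, so the estimates will not match the values above. Fitting without an intercept (the "- 1" in the formula) is an assumption made only so that the fitted equation has the same two terms, and the names games, fit and new.game are hypothetical.

```r
# Synthetic data only: invented values, NOT the chapter's AFL data set.
set.seed(2)
n <- 30
Members <- runif(n, 10, 80)                 # combined membership (made-up scale)
away.inter <- rep(c(0, 1), length.out = n)  # 1 = interstate away team
MCG <- 0.9 * Members - 5 * away.inter + rnorm(n, sd = 4)
games <- data.frame(MCG, Members, away.inter)

# The "- 1" drops the intercept so the fitted equation has two terms only
# (an assumption for illustration, not necessarily how the chapter's model was fitted)
fit <- lm(MCG ~ Members + away.inter - 1, data = games)
b <- coef(fit)

# A hypothetical new game: combined membership of 50, interstate away team
new.game <- data.frame(Members = 50, away.inter = 1)

# Prediction written out by hand from the estimates ...
by.hand <- b["Members"] * 50 + b["away.inter"] * 1
# ... and the same prediction from predict()
by.predict <- predict(fit, newdata = new.game)

print(all.equal(unname(by.hand), unname(by.predict)))
```

The point of the comparison is that predict() does nothing more than multiply each estimate by the corresponding value and add up the results.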
This equation would let us predict the attendance of any game with a good degree of accuracy, assuming that we knew the combined fan base and whether the away team was interstate. Interestingly, statisticians are rarely interested in using prediction equations like the one above: they are generally more interested in just knowing whether a predictor is important or unimportant. Also, one must be careful when using the slopes/B-weights obtained from a linear regression of a single sample, because they are likely to change if another sample is analyzed, simply because of random sampling variation.
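To see how much the estimates can move from one sample to the next, one option is to draw many samples from a known population and refit the model each time. The sketch below does this with a synthetic population that is invented for illustration and is not the AFL data. The sample size of 40, the 200 repetitions, and the names population and fit.sample are all arbitrary assumptions.

```r
# Synthetic "population" of games: invented values, NOT the AFL data.
set.seed(3)
N <- 500
Members <- runif(N, 10, 80)
away.inter <- rep(c(0, 1), length.out = N)
MCG <- 0.9 * Members - 5 * away.inter + rnorm(N, sd = 6)
population <- data.frame(MCG, Members, away.inter)

# Fit the same model to one random sample of 40 games and return its coefficients
fit.sample <- function() {
  rows <- sample(N, 40)
  coef(lm(MCG ~ Members + away.inter, data = population[rows, ]))
}

# Repeat for 200 samples; each column of 'estimates' holds one sample's coefficients
estimates <- replicate(200, fit.sample())

# Smallest and largest value of each estimate across the samples
print(apply(estimates, 1, range))
# Standard deviation of each estimate across the samples
print(apply(estimates, 1, sd))
```

With real data we usually have only one sample, so this spread cannot be observed directly. The standard errors reported by summary() and the intervals from confint() are estimates of it.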
The material we have covered is really only a taste of multiple regression and linear modeling. On the one hand, there are a number of additional factors that may be considered before deciding on a final model. On the other hand, there are a great number of techniques that may be used in specialized circumstances. For example, in trying to model attendance at the MCG, we have seen that the standard model fits the data some of the time but not others, depending on the selection of the explanatory variables.
In general, a simple model is a good model, and will keep us from thinking that we are better than we really are. However, there are times when we will want to find as many explanatory variables as possible. Contrast the needs of a manager trying to forecast sales to set inventory with those of an engineer or scientist trying to select parameters for further experimentation. In the first case, the manager needs to avoid a falsely precise estimate, which could lead her to be overconfident in the forecast and order either too much stock or too little. The manager wants to be conservative about deciding that particular variables make a difference to the variable being predicted. On the other hand, the experimenter wants to find as many variables as possible for future research, so is prepared to be optimistic about whether different parameters affect the variables of interest.