Section 6.2 Key Descriptive Statistics

We have already encountered several descriptive statistics in previous chapters, but for the sake of practice here they are again, this time with more detailed definitions:
  • The mean (technically the arithmetic mean), a measure of central tendency that is calculated by adding together all of the observations and dividing by the number of observations.
  • The median, another measure of central tendency, but one that cannot be directly calculated. Instead, you make a sorted list of all of the observations in the sample, then go halfway up that list. Whatever the value of the observation is at the halfway point, that is the median. (With an even number of observations there is no single halfway point, so the usual convention is to average the two middle values.)
  • The range, which is a measure of "dispersion" (how spread out a bunch of numbers in a sample are), calculated by subtracting the lowest value from the highest value.
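As a quick check on these three definitions, here is a minimal Python sketch that applies them to the family ages from the previous chapter (43, 42, 12, 8, and 5). The variable names are illustrative assumptions, and only Python's standard `statistics` module is used, as a cross-check on the hand calculation.

```python
import statistics

# Family ages from the previous chapter (Dad, Mom, Sis, Bro, Dog).
ages = [43, 42, 12, 8, 5]

# Mean: add up all of the observations and divide by how many there are.
mean_age = sum(ages) / len(ages)

# Median: sort the observations and take the value at the halfway point.
# (This simple indexing assumes an odd number of observations.)
sorted_ages = sorted(ages)
median_age = sorted_ages[len(sorted_ages) // 2]

# Range: highest value minus lowest value.
age_range = max(ages) - min(ages)

print("mean:", mean_age)      # 22.0
print("median:", median_age)  # 12
print("range:", age_range)    # 38

# Cross-check the hand-rolled versions against the standard library.
library_mean = statistics.mean(ages)
library_median = statistics.median(ages)
```

Notice how far apart the mean (22) and the median (12) are for this small sample: the two adult ages pull the mean upward, while the median only cares about which value sits in the middle of the sorted list.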
To this list we should add three more that you will run into in a variety of situations:
  • The mode, another measure of central tendency. The mode is the value that occurs most often in a sample of data. Like the median, the mode cannot be directly calculated. You just have to count up how many of each number there are and then pick the category that has the most.
  • The variance, a measure of dispersion. Like the range, the variance describes how spread out a sample of numbers is. Unlike the range, though, which just uses two numbers to calculate dispersion, the variance is obtained from all of the numbers through a simple calculation that compares each number to the mean. If you remember the ages of the family members from the previous chapter and the mean age of 22, you will be able to make sense out of the following table:
Table 6.2.1.
WHO       AGE   AGE-MEAN        (AGE-MEAN)²
Dad       43    43-22=21        21*21=441
Mom       42    42-22=20        20*20=400
Sis       12    12-22=-10       -10*-10=100
Bro       8     8-22=-14        -14*-14=196
Dog       5     5-22=-17        -17*-17=289
Total:                          1426
Total/4:                        356.5
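The arithmetic in Table 6.2.1 can be reproduced in a few lines of Python. This is only a sketch: the dictionary and variable names are assumptions made for illustration, and the final line uses the standard library's `statistics.variance`, which also divides by one less than the number of observations.

```python
import statistics

# Ages as listed in Table 6.2.1.
family = {"Dad": 43, "Mom": 42, "Sis": 12, "Bro": 8, "Dog": 5}

mean_age = sum(family.values()) / len(family)  # 110 / 5 = 22

# One row per family member: the deviation from the mean, then its square.
squared_deviations = []
for who, age in family.items():
    deviation = age - mean_age
    squared = deviation ** 2
    squared_deviations.append(squared)
    print(f"{who}: {age} - {mean_age:g} = {deviation:g}, squared = {squared:g}")

# Bottom rows of the table: the total, then the total divided by 4.
total = sum(squared_deviations)
variance = total / (len(family) - 1)

print("Total:", total)        # 1426.0
print("Total/4:", variance)   # 356.5

# Cross-check: statistics.variance computes the same "divide by n - 1" variance.
library_variance = statistics.variance(list(family.values()))
```

Squaring is what keeps the negative deviations (Sis, Bro, and Dog) from cancelling out the positive ones; without it, the deviations from the mean would simply add up to zero.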
Table 6.2.1 shows the calculation of the variance, which begins by obtaining the "deviations" from the mean and then "squaring" them (multiplying each one by itself) to take care of the negative deviations (for example, -14 from the mean for Bro). We add up all of the squared deviations and then divide by one less than the number of observations to get a kind of "average squared deviation." Note that it was not a mistake to divide by 4 instead of 5; the reasons for this will become clear later in the book when we examine the concept of degrees of freedom. This result is the variance, a very useful mathematical concept that appears all over the place in statistics. While it is mathematically useful, it is not too nice to look at. For instance, in this example we are looking at 356.5 squared years of deviation from the mean. Who measures anything in squared years? Squared feet maybe, but that’s a different discussion. So, to address this weirdness, statisticians have also provided us with:
  • The standard deviation, another measure of dispersion, and a cousin to the variance. The standard deviation is simply the square root of the variance, which puts us back in regular units like "years." In the example above, the standard deviation would be about 18.88 years (rounding to two decimal places, which is plenty in this case).
  • Standard deviation has a complicated formula, but you can think of it loosely as "the typical distance from the mean." In the example above, the typical distance from the mean is about 18.88 years.
  • Intuitively, you can think about both the standard deviation and the variance as measuring how spread out the data is. When these numbers are 0 or very small compared to the mean, the data is not very spread out.
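To round out the three additional statistics, the following Python sketch takes the square root of the variance to get the standard deviation for the family ages, compares it with a literal average of the distances from the mean, and then finds a mode by counting. The family ages are all different, so the mode is shown on a small made-up sample; that sample and all variable names are assumptions for illustration only.

```python
import math
import statistics
from collections import Counter

# Family ages used in the variance example (Dad, Mom, Sis, Bro, Dog).
ages = [43, 42, 12, 8, 5]
mean_age = sum(ages) / len(ages)

# Standard deviation: the square root of the variance (divide-by-n-1 version).
variance = statistics.variance(ages)
std_dev = math.sqrt(variance)
print("standard deviation:", round(std_dev, 2))  # 18.88

# statistics.stdev performs both steps in a single call.
library_std_dev = statistics.stdev(ages)

# A literal "average distance from the mean" for comparison. It lands in the
# same neighborhood, which is why the intuition is useful, but it is not
# the same number as the standard deviation.
average_distance = sum(abs(age - mean_age) for age in ages) / len(ages)
print("average distance:", average_distance)     # 16.4

# Mode: count how many times each value occurs and pick the most common one.
# HYPOTHETICAL sample, not data from the book.
hypothetical_sample = [3, 5, 5, 7, 5, 3]
counts = Counter(hypothetical_sample)
mode_value, mode_count = counts.most_common(1)[0]
print("mode:", mode_value, "occurs", mode_count, "times")  # 5 occurs 3 times
```

Because every family age occurs exactly once, counting them would produce a five-way tie, which is a reminder that the mode is most informative when values repeat.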