Interpreting the Standard Deviation

The standard deviation is “interpreted” with statements about the proportions of the data that fall within 1, 2, or 3 standard deviations of the mean.

Chebychev’s rule applies to any set of data. It is summarized in the following table:

| interval | z-score | guaranteed proportion of the data |
|---|---|---|
| mean – s to mean + s | –1 to 1 | 0% to 100% (could be anything) |
| mean – 2s to mean + 2s | –2 to 2 | 75% to 100% (at least 75%) |
| mean – 3s to mean + 3s | –3 to 3 | 89% to 100% (at least 89%) |
| mean – 4s to mean + 4s | –4 to 4 | 94% to 100% (at least 94%) |
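
The guaranteed proportions in the table appear to follow the usual form of Chebychev's bound, 1 – 1/k² for an interval of k standard deviations on either side of the mean. The text above does not state this formula, so the sketch below is an illustration under that assumption; the function name `chebyshev_lower_bound` is made up for this example.

```python
def chebyshev_lower_bound(k):
    """Smallest proportion of any data set lying within k standard
    deviations of the mean, assuming the bound 1 - 1/k**2."""
    if k <= 1:
        # For k <= 1 the formula gives 0 or less, so nothing is guaranteed.
        return 0.0
    return 1 - 1 / k**2


# Print the guaranteed proportion for each row of the table.
for k in (1, 2, 3, 4):
    print(f"within {k}s of the mean: at least {chebyshev_lower_bound(k):.1%}")
```

Under this formula the bounds for k = 3 and k = 4 are 8/9 and 15/16, which the table rounds to 89% and 94%.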

For instance, consider the data set {0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2}. The mean is 1, and the standard deviation turns out to be s = 0.97. So the only observations that are within 1 standard deviation of the mean are the two 1's in the middle, and they comprise only 10% of the data. However, all of the data are within 2s of the mean.

For a more extreme example, consider the data set {0, 0, ... , 0, 0, 1, 1, 2, 2, ... , 2, 2}, where there are 99 zeros and 99 twos. The mean is 1, and the standard deviation turns out to be s = 0.9975. So the only observations that are within 1 standard deviation of the mean are again the two 1's in the middle, and now they comprise only 1% of the data. However, again, all of the data are within 2s of the mean.
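
Both examples can be checked directly. The sketch below uses Python's standard `statistics` module and assumes s is the sample standard deviation (dividing by n – 1), which matches the value s = 0.97 quoted for the first data set. The helper name `proportion_within` is hypothetical.

```python
from statistics import mean, stdev


def proportion_within(data, k):
    """Fraction of the observations within k standard deviations of the mean."""
    m = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    inside = [x for x in data if abs(x - m) <= k * s]
    return len(inside) / len(data)


# First example: nine 0's, two 1's, nine 2's.
small = [0] * 9 + [1] * 2 + [2] * 9
# Second example: 99 zeros, two 1's, 99 twos.
large = [0] * 99 + [1] * 2 + [2] * 99

for name, data in (("small", small), ("large", large)):
    print(name, "mean =", mean(data), "s =", round(stdev(data), 4))
    print("  within 1s:", proportion_within(data, 1))
    print("  within 2s:", proportion_within(data, 2))
```

In both data sets the 0's and 2's sit exactly 1 unit from the mean, just outside one standard deviation, which is why only the two 1's are counted.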

So we see that it is possible to have a very small proportion of a data set within 1 standard deviation of the mean. However, Chebychev's rule says that it is impossible for less than 75% of the data to be within 2s of the mean, and it is impossible for less than 89% of the data to be within 3s of the mean.

The empirical rule applies to sets of data that have approximately “bell-shaped” histograms.

| interval | z-score | approximate proportion of the data |
|---|---|---|
| mean – s to mean + s | –1 to 1 | approx 68% |
| mean – 2s to mean + 2s | –2 to 2 | approx 95% |
| mean – 3s to mean + 3s | –3 to 3 | approx 99.7% (nearly all) |
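
The percentages in the empirical rule agree with the areas under a normal (bell-shaped) curve. The text does not say where the numbers come from, so treat that connection as an assumption of this sketch. For a normal distribution, the proportion within k standard deviations of the mean is erf(k/√2), which the standard `math` module can evaluate. The function name `normal_proportion_within` is made up for this example.

```python
import math


def normal_proportion_within(k):
    """Proportion of a normal distribution within k standard deviations
    of its mean, computed as erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


# Compare with the approximate values 68%, 95%, and 99.7% in the table.
for k in (1, 2, 3):
    print(f"within {k}s of the mean: about {normal_proportion_within(k):.2%}")
```

Real data with a roughly bell-shaped histogram will only come close to these values, which is why the rule is stated with "approx".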

So the empirical rule says that this picture is typical for data with an approximately bell-shaped histogram:

Below is a histogram that represents a set of 100 measurements. The mean of the data is approximately 23, and the standard deviation is approximately 7. Notice that the histogram is approximately bell-shaped.

In this picture, we estimate that about 70% of the data are within 1 standard deviation of the mean (between 16 and 30), about 95% are within 2 standard deviations of the mean (between 9 and 37), and all, or almost all, of the data are within 3 standard deviations of the mean (between 2 and 44). These estimates are consistent with the empirical rule.

Here’s a histogram for another set of data, in which the mean is approximately 71, and the standard deviation is approximately 10. Notice that the histogram is again approximately bell-shaped (though not as nicely as the one above).

About 68% of the data are within 1 standard deviation of the mean (between 61 and 81), about 93% are within 2 standard deviations of the mean (between 51 and 91), and (this time) all of the data are within 3 standard deviations of the mean (between 41 and 101). Again, these estimates are consistent with the empirical rule.
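
The interval endpoints quoted for the two histograms are just the mean plus or minus 1, 2, or 3 standard deviations. A minimal sketch of that arithmetic, using the approximate means and standard deviations given in the text (the helper name `sd_intervals` is hypothetical):

```python
def sd_intervals(mean, s, ks=(1, 2, 3)):
    """Map each k to the interval (mean - k*s, mean + k*s)."""
    return {k: (mean - k * s, mean + k * s) for k in ks}


# First histogram: mean about 23, standard deviation about 7.
print(sd_intervals(23, 7))
# Second histogram: mean about 71, standard deviation about 10.
print(sd_intervals(71, 10))
```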

Remember that the empirical rule applies only to data sets with approximately symmetric, bell-shaped histograms. For any other set of data, including data for which the histogram shape is unknown, we have to rely on Chebychev's rule.
