Standard deviation (SD) is a measure of variability that underpins much of inferential statistics; it describes how far data points typically lie from the mean. In a nutshell, if respondents give widely differing answers, the dataset will have a high standard deviation. A high standard deviation may suggest that different populations are being sampled, which has important implications for research methodology.
Suppose you are conducting research on how much Fundamentalist Christians and atheists love Jesus. You gather a sample from each group and ask, “How much do you love Jesus?” (on a 7-point scale). The responses look like this:
- Atheists: 1, 2, 1, 1, 1, 1, 1, 2, 2, 2
- Fundamentalist Christians: 6, 6, 6, 7, 6, 7, 7, 7, 7, 7
After getting responses back from both samples, you realize that many responses cluster around 1 (“Not at all”) and many cluster around 7 (“Quite a bit”). The overall mean score for this question is 4.00 (“I somewhat love Jesus”), but no one in your sample responded with 3, 4, or 5!
In the absence of any other information, the sample mean is generally the best single estimate of the population mean. However, reporting only the mean in our circumstance would give the misleading impression that the average person in our sample responded with “4”, when in fact no one gave that answer. The figure would be technically correct, but it would ignore the huge variability within the sample. One way to describe this variability is to report the standard deviation (SD = 2.71).
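As a rough check on the figures above, the overall mean and standard deviation can be recomputed from the listed responses. This is a minimal sketch that assumes the reported value is the sample standard deviation (dividing by n - 1); the variable names are illustrative only.

```python
import math

# Responses copied from the example (7-point scale).
atheists = [1, 2, 1, 1, 1, 1, 1, 2, 2, 2]
christians = [6, 6, 6, 7, 6, 7, 7, 7, 7, 7]
responses = atheists + christians

n = len(responses)
mean = sum(responses) / n

# Sample standard deviation: sum the squared deviations from the mean,
# divide by n - 1, then take the square root.
sum_sq_dev = sum((x - mean) ** 2 for x in responses)
sample_sd = math.sqrt(sum_sq_dev / (n - 1))

print(f"mean = {mean:.2f}, SD = {sample_sd:.2f}")  # mean = 4.00, SD = 2.71
```

Dividing by n instead of n - 1 (the population formula) would give a slightly smaller value, about 2.65, so the reported 2.71 is consistent with the sample formula.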
A researcher who sees such a high standard deviation may decide to investigate sub-samples (e.g., atheists and Fundamentalist Christians) rather than the sample as a whole. They may then examine the individual means of the sub-samples to obtain a better representation of the data. In this case, the atheists have Mean = 1.40, SD = 0.52, and the Fundamentalist Christians have Mean = 6.60, SD = 0.52. These values provide a more nuanced answer to the question “How much do you love Jesus?” because group identity may affect responses to this question.
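The sub-sample figures can be reproduced the same way. The sketch below uses Python's standard `statistics` module, whose `stdev` function computes the sample standard deviation; the dictionary layout and the `summary` name are assumptions made for illustration.

```python
import statistics

# Responses for each sub-sample, copied from the example.
groups = {
    "Atheists": [1, 2, 1, 1, 1, 1, 1, 2, 2, 2],
    "Fundamentalist Christians": [6, 6, 6, 7, 6, 7, 7, 7, 7, 7],
}

summary = {}
for name, scores in groups.items():
    # statistics.stdev uses the n - 1 (sample) denominator.
    summary[name] = (statistics.mean(scores), statistics.stdev(scores))
    group_mean, group_sd = summary[name]
    print(f"{name}: Mean = {group_mean:.2f}, SD = {group_sd:.2f}")
```

Both groups show the same small spread around very different means, which is what the single overall mean of 4.00 hides.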
Standard deviation is used in parametric statistical tests such as t-tests, ANOVAs, ANCOVAs, and regression, where it helps gauge whether an observed effect is larger than would be expected from chance variation alone.
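To illustrate how standard deviation enters such a test, the following sketch computes a two-sample t statistic for the example data by hand. It uses Welch's form (no equal-variance assumption), which is a choice made here for illustration and not an analysis described above; `welch_t` is a hypothetical helper name, and the scale responses are treated as interval data, as the reported means already do.

```python
import math
import statistics

atheists = [1, 2, 1, 1, 1, 1, 1, 2, 2, 2]
christians = [6, 6, 6, 7, 6, 7, 7, 7, 7, 7]


def welch_t(a, b):
    """Return the t statistic for two independent samples (Welch's form)."""
    # Each group's variance (its SD squared) divided by its size gives
    # that group's contribution to the standard error of the difference.
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


t = welch_t(christians, atheists)
print(f"t = {t:.2f}")
```

The difference between the group means (5.2 points) is divided by a standard error built from the two standard deviations. Because both SDs are small relative to that difference, the resulting t statistic is large.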