When dealing with large amounts of data, it is useful to be able to summarize data sets with just a few numbers. The two most important measures are central tendency and dispersion. The dispersion measures the level of spread or variability in a data set, and is sometimes referred to as the scale. The most common measures of dispersion are the range, the interquartile range, the variance, and the standard deviation.
Let's assume you are doing an anxiety study using the Fear of Negative Evaluation scale. In this experiment, you collect the following 11 scores:
10, 12, 16, 16, 39, 8, 13, 15, 20, 20, 18
As a first step, let's sort the data:
8, 10, 12, 13, 15, 16, 16, 18, 20, 20, 39
We can now compute the measures of dispersion:
For the variance and standard deviation, first subtract the mean (187 / 11 = 17) from each sorted score:
-9, -7, -5, -4, -2, -1, -1, 1, 3, 3, 22
Squaring each difference gives:
81, 49, 25, 16, 4, 1, 1, 1, 9, 9, 484
Adding up all of the squares (680) and dividing by N-1 (11-1=10) gives a variance of 68. Finally, taking the square root of this value gives a standard deviation of approximately 8.25.
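The arithmetic above can be checked with a few lines of Python. This is only a sketch of the same steps; the variable names are chosen for illustration, and the standard library's statistics module is used purely as a cross-check.

```python
import math
import statistics

# The 11 FNE scores from the example, in the order they were collected.
scores = [10, 12, 16, 16, 39, 8, 13, 15, 20, 20, 18]

n = len(scores)
mean = sum(scores) / n                           # 187 / 11 = 17.0

# Deviation of each sorted score from the mean, then the squared deviations.
deviations = [x - mean for x in sorted(scores)]
squared = [d ** 2 for d in deviations]

sum_of_squares = sum(squared)                    # 680.0
variance = sum_of_squares / (n - 1)              # sample variance, divides by N-1
std_dev = math.sqrt(variance)

# Cross-check against the standard library's sample variance and deviation.
assert math.isclose(variance, statistics.variance(scores))
assert math.isclose(std_dev, statistics.stdev(scores))

print(f"mean={mean}, variance={variance}, std_dev={std_dev:.2f}")
```

Dividing by N-1 rather than N gives the sample variance, which matches the calculation in the text.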
This data set has one value that is much higher than the others (the 39). This is known as an outlier, and may be an indication that something has gone wrong with the data collection. In fact, the maximum score on the FNE scale is 30, so this is probably a data entry error. Notice how this value has a significant impact on the range and standard deviation, but does not impact the interquartile range. In other words, the interquartile range is robust to the presence of outliers.
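One way to see this robustness is to recompute the measures after replacing the suspect value. The sketch below makes two assumptions that are not in the example: the true score is unknown, so 29 is a purely hypothetical in-range replacement, and the quartiles use the default ("exclusive") method of Python's statistics.quantiles, which is only one of several quartile conventions. The helper name spread_summary is also made up for illustration.

```python
import statistics

def spread_summary(data):
    """Return the range, interquartile range, and sample standard deviation."""
    q1, _median, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    return {
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "std_dev": statistics.stdev(data),
    }

# Original scores, including the out-of-range value 39.
with_outlier = [10, 12, 16, 16, 39, 8, 13, 15, 20, 20, 18]

# Hypothetical correction: 29 is an assumed stand-in, not the real score.
corrected = [29 if x == 39 else x for x in with_outlier]

before = spread_summary(with_outlier)
after = spread_summary(corrected)

for name in ("range", "iqr", "std_dev"):
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f}")
```

Under these assumptions the range and standard deviation both shrink once the 39 is replaced, while the interquartile range stays the same, because the quartiles depend only on the middle of the sorted data.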