Dispersion

When dealing with large amounts of data, it is useful to be able to summarize a data set with just a few numbers. The two most important kinds of summary measure are central tendency and dispersion. Dispersion measures the spread or variability in a data set, and is sometimes referred to as the scale. The most common measures of dispersion are as follows:

Range
The range is the difference between the highest and lowest values in a data set.
Interquartile range
The interquartile range is the difference between the upper quartile and the lower quartile of a data set. To find the quartiles, sort the data and divide it into four equal-sized groups. The values at the borders between the groups are the quartiles. The advantage of this measure is that it ignores the highest and lowest quarters of the data, so it is not sensitive to outliers in the data set.
Standard deviation
The standard deviation measures how far values typically lie from the mean of the data set. Specifically, it is the square root of the average squared distance from the mean. In psychology, N-1 is generally used as the denominator instead of N, because this gives an unbiased estimate of the population variance.
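
The three definitions above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation. The function names are hypothetical, and the quartiles are taken from the default ("exclusive") convention of the standard library's statistics.quantiles, which is only one of several conventions in use and may differ slightly from others on some data sets.

```python
import statistics


def value_range(data):
    """Difference between the highest and lowest values."""
    return max(data) - min(data)


def interquartile_range(data):
    """Upper quartile minus lower quartile.

    Assumption: quartiles follow the default "exclusive" method of
    statistics.quantiles; other conventions can give other cut points.
    """
    lower, _median, upper = statistics.quantiles(data, n=4)
    return upper - lower


def sample_standard_deviation(data):
    """Square root of the summed squared deviations divided by N-1."""
    return statistics.stdev(data)
```

Note that statistics.stdev uses the N-1 denominator, while statistics.pstdev divides by N.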

Example

Let's assume you are doing an anxiety study using the Fear of Negative Evaluation scale. In this experiment, you collect the following 11 scores:

10, 12, 16, 16, 39, 8, 13, 15, 20, 20, 18

As a first step, let's sort the data:

8, 10, 12, 13, 15, 16, 16, 18, 20, 20, 39

We can now compute the measures of dispersion:

Range
The maximum value (39) minus the minimum value (8) is 31.
Interquartile range
The quartiles are the 3rd, 6th, and 9th values in the sorted list: 12, 16, and 20. As you can see, they divide the remaining values into four groups of size 2. The upper quartile (20) minus the lower quartile (12) is 8.
Standard deviation
The mean of the data set is 17. The difference between each value and the mean is:
-9, -7, -5, -4, -2, -1, -1, 1, 3, 3, 22
Squaring each difference gives:
81, 49, 25, 16, 4, 1, 1, 1, 9, 9, 484
Adding up all of the squares (680) and dividing by N-1 (11-1=10) gives a variance of 68. Finally, taking the square root of this value gives a standard deviation of approximately 8.25.
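
The hand calculation above can be reproduced step by step in Python. The sketch below follows the same order of operations as the text. The variable names are illustrative, and the quartile positions rely on the assumption that N+1 is divisible by 4, which holds for these 11 scores but not for data sets in general.

```python
import math

scores = [10, 12, 16, 16, 39, 8, 13, 15, 20, 20, 18]
ordered = sorted(scores)
n = len(ordered)

# Range: maximum minus minimum.
score_range = ordered[-1] - ordered[0]

# Assumption: n + 1 is divisible by 4 (true for n = 11), so the quartiles
# fall exactly on the 3rd, 6th, and 9th sorted values.
step = (n + 1) // 4
lower_quartile = ordered[step - 1]
upper_quartile = ordered[3 * step - 1]
iqr = upper_quartile - lower_quartile

# Standard deviation: mean, deviations, squares, divide by N-1, square root.
mean = sum(ordered) / n
deviations = [x - mean for x in ordered]
squares = [d ** 2 for d in deviations]
variance = sum(squares) / (n - 1)
sd = math.sqrt(variance)

print(score_range, iqr, mean, variance, round(sd, 2))
# prints: 31 8 17.0 68.0 8.25
```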

This data set has one value that is much higher than the others (the 39). This is known as an outlier, and may be an indication that something has gone wrong with the data collection. In fact, the maximum score on the FNE scale is 30, so this is probably a data entry error. Notice how this value has a significant impact on the range and standard deviation, but does not impact the interquartile range. In other words, the interquartile range is robust to the presence of outliers.
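
One way to see this robustness is to compare the three measures before and after the suspicious value is changed. In the sketch below, the 39 is replaced by 30 (the scale maximum) purely for illustration. The true score is unknown, so the corrected value is hypothetical. The quartiles again use the default convention of statistics.quantiles, which is an assumption.

```python
import statistics

recorded = [10, 12, 16, 16, 39, 8, 13, 15, 20, 20, 18]

# Hypothetical correction, for illustration only: cap the 39 at 30.
capped = [30 if x == 39 else x for x in recorded]


def summarize(data):
    """Return the range, interquartile range, and N-1 standard deviation."""
    lower, _median, upper = statistics.quantiles(data, n=4)
    return {
        "range": max(data) - min(data),
        "iqr": upper - lower,
        "sd": statistics.stdev(data),
    }


before = summarize(recorded)
after = summarize(capped)
for name in before:
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f}")
```

With these inputs the range and the standard deviation both shrink, while the interquartile range stays at 8.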
