When dealing with large amounts of data, it is useful to be able to summarize a data set
with just a few numbers. The two most important kinds of summary measure are *central tendency*
and *dispersion*. Dispersion is the level of spread or variability in a data set, and is sometimes
referred to as the scale.
The most common measures of dispersion are as follows:

- Range
- The range is the difference between the highest and lowest values in a data set.
- Interquartile range
- The interquartile range is the difference between the upper quartile and the lower quartile in a set. To find the quartiles, sort the data and divide it into four equal-sized groups. The values at the borders between the groups are the quartiles. The advantage of this measure is that it is not sensitive to outliers in the data set.
- Standard deviation
- The standard deviation measures how far values are from the mean of the data set. In particular, it is the square root of the average of the squared distances from the mean. Generally in psychology, N-1 is used as the denominator rather than N, as this gives an unbiased estimate of the population variance.
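
The three definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `dispersion_summary` and the toy data are made up for this example, and it assumes Python 3.8 or later for `statistics.quantiles`. Note that software packages use several different conventions for locating quartiles, so other tools may report slightly different interquartile ranges for the same data.

```python
import statistics


def dispersion_summary(values):
    """Return the range, interquartile range and sample standard deviation.

    Hypothetical helper written for this example.
    """
    ordered = sorted(values)

    # Range: highest value minus lowest value.
    value_range = ordered[-1] - ordered[0]

    # Quartiles: with n=4, statistics.quantiles returns [Q1, median, Q3].
    # Its default method is only one of several quartile conventions.
    q1, _median, q3 = statistics.quantiles(ordered, n=4)

    # Sample standard deviation: statistics.stdev divides by N-1.
    return {
        "range": value_range,
        "iqr": q3 - q1,
        "sd": statistics.stdev(ordered),
    }


# Illustrative data only (not taken from any study).
toy = [1, 2, 3, 4, 5, 6, 7]
summary = dispersion_summary(toy)
print(summary)
```

To divide by N instead of N-1, `statistics.pstdev` could be swapped in for `statistics.stdev`.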

Let's assume you are doing an anxiety study using the Fear of Negative Evaluation scale. In this experiment, you collect the following 11 scores:

10, 12, 16, 16, 39, 8, 13, 15, 20, 20, 18

As a first step, let's sort the data:

8, 10, 12, 13, 15, 16, 16, 18, 20, 20, 39

We can now compute the measures of dispersion:

- Range
- The maximum value (39) minus the minimum value (8) is 31.
- Interquartile range
- The quartile values are 12, 16 and 20 (the 3rd, 6th and 9th values in the sorted list above). As you can see, they divide the remaining values into four groups of size 2. The upper quartile (20) minus the lower quartile (12) is 8.
- Standard deviation
- The mean of the data set is 17. The difference between each value and the mean is:

-9, -7, -5, -4, -2, -1, -1, 1, 3, 3, 22

Squaring each difference gives:

81, 49, 25, 16, 4, 1, 1, 1, 9, 9, 484

Adding up all of the squares (680) and dividing by N-1 (11-1=10) gives a variance of 68. Finally, taking the square root of this value gives a standard deviation of approximately 8.25.
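
The arithmetic in this worked example can be checked with a short Python sketch that follows the same steps by hand. The variable names are chosen for this illustration, and the quartile positions are hard-coded for a data set of exactly 11 values, so this is not a general-purpose quartile routine.

```python
import math

scores = [10, 12, 16, 16, 39, 8, 13, 15, 20, 20, 18]
ordered = sorted(scores)
n = len(ordered)

# Range: maximum minus minimum.
score_range = ordered[-1] - ordered[0]

# With 11 sorted values, the 3rd and 9th values (indices 2 and 8)
# are the lower and upper quartiles. Valid for n == 11 only.
lower_quartile = ordered[2]
upper_quartile = ordered[8]
iqr = upper_quartile - lower_quartile

# Standard deviation, step by step.
mean = sum(ordered) / n
deviations = [x - mean for x in ordered]
squared = [d ** 2 for d in deviations]
variance = sum(squared) / (n - 1)  # N-1 denominator
sd = math.sqrt(variance)

print(score_range, iqr, variance, round(sd, 2))
# prints: 31 8 68.0 8.25
```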

This data set has one value that is much higher than the others (the 39).
This is known as an outlier, and may be an indication that something has gone wrong
with the data collection. In fact, the maximum score on the FNE scale is 30, so this
is probably a data entry error.
Notice how this value has a significant impact on the
range and standard deviation, but does not impact the interquartile range:
the quartiles would be the same no matter how large the highest score was.
In other words, the interquartile range is *robust* to the presence of outliers.
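
One way to see this robustness is to recompute the three measures after replacing the suspect score with a plausible one. The sketch below caps the 39 at 30, the scale maximum mentioned above. This is purely a hypothetical correction for illustration, since the true score is unknown, and the helper name `spread` is made up for this example. It assumes Python 3.8 or later for `statistics.quantiles`.

```python
import statistics


def spread(values):
    """Return (range, interquartile range, sample standard deviation)."""
    ordered = sorted(values)
    q1, _median, q3 = statistics.quantiles(ordered, n=4)
    return ordered[-1] - ordered[0], q3 - q1, statistics.stdev(ordered)


recorded = [10, 12, 16, 16, 39, 8, 13, 15, 20, 20, 18]

# Hypothetical correction: cap every score at the scale maximum of 30.
capped = [min(x, 30) for x in recorded]

for label, data in (("recorded", recorded), ("capped", capped)):
    r, iqr, sd = spread(data)
    print(f"{label}: range={r}, IQR={iqr}, SD={sd:.2f}")
```

The range and standard deviation both drop once the extreme value is capped, while the interquartile range is the same for both versions of the data.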
