Dispersion

When dealing with large amounts of data, it is useful to be able to summarize data sets with just a few numbers. The two most important kinds of summary are central tendency and dispersion. Dispersion measures the amount of spread or variability in a data set, and is sometimes referred to as the scale. The most common measures of dispersion are as follows:

Range: The range is the difference between the highest and lowest values in a data set.
Interquartile range: The interquartile range is the difference between the upper quartile and the lower quartile of a data set. To find the quartiles, sort the data and divide it into four equal-sized groups. The values at the borders between the groups are the quartiles. The advantage of this measure is that it is not sensitive to outliers in the data set.
Standard deviation: The standard deviation measures how far values typically lie from the mean of the data set. Specifically, it is the square root of the average squared distance from the mean (the variance). In psychology, N-1 is generally used as the denominator rather than N, as this makes the sample variance an unbiased estimator of the population variance.
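
For reference, the three definitions above can be written compactly as follows. The notation is an assumption introduced here rather than taken from the text: x_1, ..., x_N are the values in the data set, \bar{x} is their mean, Q_1 and Q_3 are the lower and upper quartiles, and s is the standard deviation computed with the N-1 denominator.

```latex
\begin{align*}
\text{range} &= \max_i x_i - \min_i x_i \\
\text{IQR} &= Q_3 - Q_1 \\
s &= \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2},
\qquad \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i
\end{align*}
```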

Example

Let's assume you are doing an anxiety study using the Fear of Negative Evaluation scale. In this experiment, you collect the following 11 scores:

10, 12, 16, 16, 39, 8, 13, 15, 20, 20, 18

As a first step, let's sort the data:

8, 10, 12, 13, 15, 16, 16, 18, 20, 20, 39

We can now compute the measures of dispersion:

Range

The maximum value (39) minus the minimum value (8) is 31.
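
As a quick check, the same calculation can be sketched in Python. The variable names are illustrative only.

```python
# Scores from the example above
scores = [10, 12, 16, 16, 39, 8, 13, 15, 20, 20, 18]

# Range: highest value minus lowest value
data_range = max(scores) - min(scores)
print(data_range)  # 31
```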

Interquartile range

The quartile values are 12, 16 and 20, which are the 3rd, 6th and 9th values in the sorted list above. As you can see, they divide the rest of the data set into four groups of size 2. The upper quartile (20) minus the lower quartile (12) is 8.
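
The quartiles can also be found programmatically. The following is a minimal sketch, assuming Python 3.8 or later, using the standard library's `statistics.quantiles`. Its default "exclusive" method happens to pick the same quartile positions as the approach described here for these 11 scores. Other quartile conventions exist and may give slightly different values.

```python
import statistics

scores = [10, 12, 16, 16, 39, 8, 13, 15, 20, 20, 18]

# quantiles() sorts the data itself; n=4 asks for the three quartile cut points.
# The default method ("exclusive") places the quartiles at positions
# (N + 1) / 4, (N + 1) / 2 and 3 * (N + 1) / 4 in the sorted list.
lower_q, median, upper_q = statistics.quantiles(scores, n=4)

# Interquartile range: upper quartile minus lower quartile
iqr = upper_q - lower_q
print(lower_q, median, upper_q)  # 12.0 16.0 20.0
print(iqr)                       # 8.0
```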

Standard deviation

The mean of the data set is 17. The difference between each value and the mean is:

-9, -7, -5, -4, -2, -1, -1, 1, 3, 3, 22

Squaring each difference gives:

81, 49, 25, 16, 4, 1, 1, 1, 9, 9, 484

Adding up all of the squares (680) and dividing by N-1 (11-1=10) gives a variance of 68. Finally, taking the square root of this value gives a standard deviation of approximately 8.25.
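
The steps above can be reproduced with a short Python sketch. The variable names are illustrative, and the final line compares the hand-computed result with the standard library's `statistics.stdev`, which also uses the N-1 denominator.

```python
import math
import statistics

scores = [10, 12, 16, 16, 39, 8, 13, 15, 20, 20, 18]

n = len(scores)
mean = sum(scores) / n                       # 17.0

# Squared distance of each score from the mean
squared_diffs = [(x - mean) ** 2 for x in scores]

# Sample variance: divide by N - 1 rather than N
variance = sum(squared_diffs) / (n - 1)      # 68.0

# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)
print(round(std_dev, 2))                     # 8.25

# Cross-check against the standard library's sample standard deviation
print(math.isclose(std_dev, statistics.stdev(scores)))  # True
```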

This data set has one value that is much higher than the others (the 39). This is known as an outlier, and may be an indication that something has gone wrong with the data collection. In fact, the maximum score on the FNE scale is 30, so this is probably a data entry error. Notice how this value has a significant impact on the range and standard deviation, but does not impact the interquartile range. In other words, the interquartile range is robust to the presence of outliers.
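
To illustrate this robustness, the sketch below recomputes the three measures after replacing the 39 with 30, the maximum possible FNE score. This replacement is purely hypothetical, since the true value is unknown, and the `dispersion` helper is a name made up for this example.

```python
import statistics

def dispersion(values):
    """Return (range, interquartile range, sample standard deviation)."""
    lower_q, _, upper_q = statistics.quantiles(values, n=4)
    return (max(values) - min(values), upper_q - lower_q, statistics.stdev(values))

scores = [10, 12, 16, 16, 39, 8, 13, 15, 20, 20, 18]

# Hypothetical correction: cap every score at the scale maximum of 30
capped_scores = [min(x, 30) for x in scores]

original = dispersion(scores)
capped = dispersion(capped_scores)

for label, (rng, iqr, sd) in [("original", original), ("capped", capped)]:
    print(f"{label}: range={rng}, IQR={iqr}, SD={sd:.2f}")
```

Under this assumption the range drops from 31 to 22 and the standard deviation from about 8.25 to about 5.98, while the interquartile range stays at 8.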