AI-Therapy creates online self-help programs using the latest evidence-based treatments, such as cognitive behavioural therapy.

A common task in statistics is to examine the relationship between two variables. For example, suppose you have collected anxiety and depression scores from a sample of people. You will probably want to know how the two are related: do people with high anxiety also tend to have high depressive symptoms?

One way to observe relationships is to inspect a *scatter plot*. In a scatter plot,
each observation is represented by a dot whose location is determined by its measurements.
For example, a person's depression score may be plotted on the x-axis and their anxiety
score on the y-axis. If enough data are available, a visual inspection of the scatter plot
can reveal patterns in the underlying data. For example, if the points approximate a straight
line, there is a *linear relationship* between the variables. Furthermore, if the points
rise as you move from left to right along the x-axis, there is a *positive relationship*
between the variables, and if they fall, there is a *negative relationship*.

One way to quantify the linear relationship between two variables is with their *covariance*,
which measures the degree to which the two variables vary in the same direction. Unfortunately,
a covariance value alone is difficult to interpret since it depends on the scale of the
variables: measuring one variable in different units changes the covariance. Therefore, a
*correlation coefficient* is typically more useful since it is a *standardized* measure
(the Pearson coefficient, for instance, is the covariance divided by the product of the two
standard deviations). This means that correlation coefficients can be compared even when
the data sets have different scales.
The value of a correlation coefficient can be interpreted as follows:

| Value | Interpretation |
|---|---|
| -1 | A perfect negative relationship: all points fall on a line with a negative slope. |
| 0 | No linear relationship. |
| +1 | A perfect positive relationship: all points fall on a line with a positive slope. |
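
To make the difference between covariance and correlation concrete, here is a minimal sketch in plain Python. The scores are made up for illustration, and the helper names `covariance` and `pearson` are assumptions of this example rather than part of any particular library. It rescales one variable by a factor of 10 and compares how the two measures respond.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical scores, invented purely for illustration.
anxiety = [2, 4, 5, 7, 9]
depression = [1, 3, 4, 8, 9]

# The same depression scores expressed on a scale 10 times larger.
depression_x10 = [10 * d for d in depression]

cov_original = covariance(anxiety, depression)
cov_rescaled = covariance(anxiety, depression_x10)
r_original = pearson(anxiety, depression)
r_rescaled = pearson(anxiety, depression_x10)

print("covariance: ", cov_original, "->", cov_rescaled)
print("correlation:", r_original, "->", r_rescaled)
```

Rescaling multiplies the covariance by 10, while the correlation coefficient stays the same (up to floating-point rounding), which is why the latter can be compared across data sets.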

By convention, a coefficient of ±0.1 is considered a small effect, ±0.3 a medium effect, and ±0.5 a large effect. Here are the most common correlation coefficients:

- **Pearson correlation coefficient**: This is the most common of the correlation coefficients. One disadvantage of the Pearson correlation coefficient is that it can be sensitive to the presence of outliers in the data set.
- **Spearman's rank correlation coefficient**: This is a non-parametric measure, so it is a more robust way to measure the relationship between two variables when there are outliers in the data set.
- **Kendall's tau coefficient**: Another non-parametric approach. Kendall's tau is appropriate when there is a small number of possible values (e.g. a Likert scale) with many tied responses.
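
The three coefficients can be compared on a small example. The following is an illustrative sketch in plain Python, not a reference implementation: the data are invented, the function names are assumptions of this example, Spearman's coefficient is computed as the Pearson correlation of average ranks, and Kendall's tau uses the tau-b variant, which adjusts for ties. In practice a statistics package would normally be used instead.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def kendall_tau_b(xs, ys):
    """Kendall's tau-b, computed by comparing every pair of observations."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = xs[i] - xs[j], ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: not counted
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt(
        (untied + tied_x_only) * (untied + tied_y_only)
    )


# Hypothetical data: y increases steadily with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_clean = [2, 3, 5, 6, 8, 9, 11, 12]
# Same data, but the last observation is an extreme outlier.
y_outlier = [2, 3, 5, 6, 8, 9, 11, 60]

pearson_clean = pearson(x, y_clean)
pearson_outlier = pearson(x, y_outlier)
spearman_outlier = spearman(x, y_outlier)
kendall_outlier = kendall_tau_b(x, y_outlier)

# Hypothetical Likert-style responses (1 to 5) with many ties.
likert_a = [1, 2, 2, 3, 3, 4, 5, 5]
likert_b = [1, 1, 2, 3, 4, 4, 4, 5]
kendall_likert = kendall_tau_b(likert_a, likert_b)

print("Pearson without / with outlier:", pearson_clean, pearson_outlier)
print("Spearman with outlier:", spearman_outlier)
print("Kendall tau-b with outlier:", kendall_outlier)
print("Kendall tau-b on tied Likert data:", kendall_likert)
```

In this example the single outlier pulls the Pearson coefficient well below its value on the clean data, whereas the two rank-based coefficients are unaffected because the ordering of the observations has not changed.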
