A common task in statistics is to examine the relationship between two variables. For example, suppose you have collected anxiety and depression scores from a sample of people. You will likely be interested in how the two are related: do people with high anxiety also tend to have more depressive symptoms?
One way to examine a relationship is to inspect a scatterplot. In a scatterplot, each observation (here, each person) is represented by a dot whose position is determined by its two measurements: for instance, a person's depression score may be plotted on the x-axis and their anxiety score on the y-axis. With enough data, visual inspection of the scatterplot can reveal patterns in the underlying data. If the points roughly follow a straight line, the variables have a linear relationship. If the points trend upward as you move from left to right along the x-axis, the relationship is positive; if they trend downward, it is negative.
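One way to quantify the linear relationship between two variables is their covariance, which measures the degree to which the two variables vary together: it is positive when they tend to move in the same direction and negative when they tend to move in opposite directions. Unfortunately, a covariance value alone is difficult to interpret because it depends on the units (scale) of the variables; multiplying one variable by 10, for example, multiplies the covariance by 10. A correlation coefficient is therefore usually more useful, because it is a standardized measure. Pearson's correlation coefficient, for instance, divides the covariance by the product of the two standard deviations, so its value always lies between -1 and +1. As a result, correlation coefficients can be compared even when the data sets are measured on different scales. The value of a correlation coefficient can be interpreted as follows: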
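To make the difference between covariance and correlation concrete, here is a minimal sketch in plain Python. The anxiety and depression scores are made-up illustrative data, and the helper names `covariance` and `pearson_r` are hypothetical. The sketch uses the sample covariance (dividing by n - 1) and Pearson's definition of correlation, then shows that rescaling one variable changes the covariance but not the correlation.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))  # covariance of a variable with itself is its variance
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical anxiety and depression scores for five people (made-up data).
anxiety = [2, 4, 5, 7, 9]
depression = [1, 3, 6, 6, 10]

# Rescale anxiety, e.g. as if it were reported on a scale ten times larger.
anxiety_x10 = [10 * a for a in anxiety]

# The covariance grows tenfold, while the correlation stays the same.
print(covariance(anxiety, depression), covariance(anxiety_x10, depression))
print(pearson_r(anxiety, depression), pearson_r(anxiety_x10, depression))
```

Here the covariance jumps from about 8.9 to about 89 after rescaling, while the correlation stays at roughly 0.96 in both cases. In Python 3.10 and later, the standard library's `statistics.covariance` and `statistics.correlation` functions compute the same sample quantities.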
| Correlation coefficient | Interpretation |
|---|---|
| -1 | A perfect negative relationship: all points fall on a line with a negative slope. |
| 0 | No linear relationship. |
| +1 | A perfect positive relationship: all points fall on a line with a positive slope. |
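The three reference values in the table can be checked numerically. The following is a minimal sketch with made-up data and a hypothetical helper `pearson_r`. It computes Pearson's r for points on a line with positive slope, points on a line with negative slope, and points on a symmetric parabola. The last case illustrates that r = 0 means no *linear* relationship, not no relationship at all: y is completely determined by x, yet the correlation is zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

xs = [-2, -1, 0, 1, 2]

r_pos = pearson_r(xs, [3 * x + 1 for x in xs])   # all points on a line with positive slope
r_neg = pearson_r(xs, [-2 * x + 5 for x in xs])  # all points on a line with negative slope
r_none = pearson_r(xs, [x * x for x in xs])      # curved (parabolic) pattern, no linear trend

print(r_pos, r_neg, r_none)
```

The slopes and intercepts are arbitrary; any line with a positive slope gives r = +1, and any line with a negative slope gives r = -1.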
By convention, a correlation of about ±0.1 is considered a small effect, ±0.3 a medium effect, and ±0.5 a large effect. Here are the most common correlation coefficients: