Regression is closely related to correlation. The main difference is that correlation measures the strength of the relationship between two variables and treats both variables equally, whereas regression aims to predict the value of one variable from the value of the other. In other words, regression distinguishes between the two variables: the independent variable is the one whose value is used to estimate the value of the dependent variable. Viewed another way, regression indicates the degree to which the variation in one variable can be explained by the variation in another variable.
The most common method of fitting a line to a set of data points is called least squares. This technique finds the line that minimizes the sum of the squared vertical distances (the residuals) between the line and the data points. After fitting a line, it is important to quantify how closely the line fits the data points. R-squared, or R², is a value between 0 and 1 that indicates how well the regression line fits the data. A value near 1 means that most of the variance in the dependent variable can be explained by changes in the independent variable, while a value near 0 means the line explains very little of it.
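As an illustration, the following is a minimal Python sketch of a least-squares fit of a straight line, followed by the R-squared calculation. The function names fit_line and r_squared and the five data points are invented for this example and are not taken from any particular library or dataset. The sketch assumes a single independent variable and a line with an intercept.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Return the fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    # Residual sum of squares: squared vertical distances from the line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: squared distances from the mean of ys.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up sample data: xs is the independent variable, ys the dependent one.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} r_squared={r2:.4f}")
```

For these sample points the fitted slope is about 1.99 and the intercept about 0.05, and R-squared comes out very close to 1, because the points lie almost exactly on a straight line. In practice a library routine would normally be used instead of hand-written sums.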