The normal distribution

The normal distribution, also known as the Gaussian distribution or bell curve, is widely used in science. It is a probability distribution that has the following properties:

  • The distribution has a single peak located at the mean. Furthermore, the mean, median and mode are all equal, so they also correspond to the location of the highest point in a normal distribution.
  • The distribution is symmetric. In other words, it has no skew.
  • It is parametric, meaning that the distribution is completely characterized by two parameters: the mean and standard deviation.
  • The normal distribution is "light tailed". This means that as you move away from the mean, the probability density falls off very quickly: about 95% of values lie within two standard deviations of the mean, and about 99.7% within three.
  • One of the reasons that the normal distribution is used so widely is that it arises naturally in many real-world situations. For more information, see the central limit theorem.
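
The properties listed above can be checked numerically. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the mean of 100 and standard deviation of 15 are arbitrary values chosen only for illustration.

```python
from statistics import NormalDist

# Illustrative parameters (an assumption, not taken from the text).
dist = NormalDist(mu=100, sigma=15)

# Symmetry: the median (50th percentile) coincides with the mean.
median = dist.inv_cdf(0.5)

# The density is highest at the mean and equal at mirrored points.
peak = dist.pdf(dist.mean)
left, right = dist.pdf(dist.mean - 10), dist.pdf(dist.mean + 10)

def tail_probability(k):
    """Probability of landing more than k standard deviations from the mean."""
    return 2 * (1 - dist.cdf(dist.mean + k * dist.stdev))

print(f"median = {median:.2f}, mean = {dist.mean:.2f}")
print(f"pdf at mean = {peak:.5f}, at mean-10 = {left:.5f}, at mean+10 = {right:.5f}")
for k in (1, 2, 3, 4):
    print(f"beyond {k} sd: {tail_probability(k):.6f}")
```

The tail probabilities shrink from roughly 32% at one standard deviation to well under 0.01% at four, which is what "light tailed" means in practice.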

Tests for normality

As mentioned above, the normal distribution is used very widely. In fact, in some situations it is used even when it shouldn't be. For example, some statistical procedures rely on the assumption that the underlying data are normally distributed. Several methods are available for checking whether a given data set appears to be normally distributed:

The first step when you have new data is to explore it using graphs. A histogram groups the data into equal-width bins and shows how many values fall into each one. One can visually compare the shape of the histogram with the characteristic bell shape of a normal distribution. While not a formal test, it can quickly give you a good idea of whether your data are approximately normal.
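
As a rough illustration of the histogram comparison, the sketch below bins a synthetic sample and prints each bin's count next to the count a fitted normal distribution would predict. The data, the seed, the number of bins, and the helper name `histogram` are all assumptions made for the example; in practice a plotting library would normally draw the histogram.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(0)  # fixed seed so the sketch is reproducible
data = [random.gauss(50, 5) for _ in range(1000)]  # synthetic sample

def histogram(values, n_bins=10):
    """Count values in equal-width bins; return (bin_edges, counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        # Clamp the index so the maximum value lands in the last bin.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return edges, counts

edges, counts = histogram(data)

# Counts predicted by a normal distribution with the sample mean and sd.
fitted = NormalDist(mean(data), stdev(data))
expected = [
    len(data) * (fitted.cdf(b) - fitted.cdf(a))
    for a, b in zip(edges, edges[1:])
]

for a, b, c, e in zip(edges, edges[1:], counts, expected):
    bar = "#" * (c // 10)
    print(f"{a:6.1f} to {b:6.1f} | {bar:<30} {c:4d} (normal: {e:6.1f})")
```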
Q-Q plot
A Q-Q (quantile-quantile) plot is another way to visually examine your data. It plots the quantiles of the actual data against the quantiles of a theoretically perfect normal distribution, together with a reference line that shows where the points would fall if the two matched exactly. If the real data points are close to the line, it is an indication that the data are approximately normal.
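
The points of a Q-Q plot can be computed without a plotting library. The sketch below pairs each sorted observation with the matching quantile of a normal distribution fitted to the sample, then summarizes how straight the resulting points are with a correlation coefficient. The helper names `qq_points` and `pearson_r`, the plotting positions `(i + 0.5) / n`, and the synthetic samples are assumptions for illustration; other plotting-position conventions exist.

```python
import random
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Pair each sorted observation with its theoretical normal quantile."""
    n = len(values)
    fitted = NormalDist(mean(values), stdev(values))
    # Positions (i + 0.5) / n avoid the undefined 0% and 100% quantiles.
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(values)))

def pearson_r(pairs):
    """Correlation between the two coordinates of the Q-Q points."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(1)
normal_sample = [random.gauss(0, 1) for _ in range(200)]
skewed_sample = [random.expovariate(1.0) for _ in range(200)]

r_normal = pearson_r(qq_points(normal_sample))
r_skewed = pearson_r(qq_points(skewed_sample))
print(f"normal sample:      r = {r_normal:.4f}")
print(f"exponential sample: r = {r_skewed:.4f}")
```

A correlation very close to 1 corresponds to points hugging the reference line; the skewed sample bends away from it and scores lower.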
Skewness is a measure of asymmetry of a probability distribution. The normal distribution is symmetric, meaning it has a skew of 0. A strong positive or negative skew is an indication that the data may not be normally distributed.
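
Sample skewness can be computed directly from its definition, the third central moment divided by the cube of the standard deviation. The sketch below uses the simple population-moment form with no small-sample bias correction; the function name `skewness` and the synthetic samples are assumptions for illustration.

```python
import random
from statistics import mean

def skewness(values):
    """Third central moment divided by the standard deviation cubed."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # variance (population form)
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

random.seed(2)
symmetric = [random.gauss(0, 1) for _ in range(5000)]
right_skewed = [random.expovariate(1.0) for _ in range(5000)]

print(f"normal sample:      {skewness(symmetric):+.2f}")
print(f"exponential sample: {skewness(right_skewed):+.2f}")
```

The normal sample should give a value near 0, while the exponential sample, whose theoretical skewness is 2, gives a clearly positive value.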
Kurtosis is a measure of how heavy the tails of a distribution are; it is often described, less precisely, as "peakedness". A normal distribution has an excess kurtosis of 0. Therefore, this value can be used to compare an arbitrary distribution with a normal distribution. A positive excess kurtosis means the distribution has heavier tails than a normal distribution, often with a sharper peak, and a negative excess kurtosis means that a distribution has lighter tails, often with a flatter peak.
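
Excess kurtosis can be computed as the fourth central moment divided by the squared variance, minus 3. The sketch below compares three synthetic samples: a normal sample, a uniform sample (lighter tails than normal, theoretical excess kurtosis of -1.2), and a Laplace sample (heavier tails, theoretical excess kurtosis of 3). The function name and the choice of samples are assumptions for illustration, and no small-sample bias correction is applied.

```python
import random
from statistics import mean

def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (normal = 0)."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0

random.seed(3)
n = 5000
normal_sample = [random.gauss(0, 1) for _ in range(n)]
uniform_sample = [random.uniform(-1, 1) for _ in range(n)]
# The difference of two independent exponentials follows a Laplace distribution.
laplace_sample = [random.expovariate(1.0) - random.expovariate(1.0) for _ in range(n)]

for name, sample in [("normal", normal_sample),
                     ("uniform", uniform_sample),
                     ("Laplace", laplace_sample)]:
    print(f"{name:8s} excess kurtosis: {excess_kurtosis(sample):+.2f}")
```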
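The Kolmogorov-Smirnov test is a general statistical test to determine if a set of numbers is likely to have been drawn from a given distribution. It compares the empirical cumulative distribution function of the data with the cumulative distribution function of the reference distribution, in this case a normal distribution.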
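
The Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical cumulative distribution function of the data and the cumulative distribution function of the reference distribution. The sketch below computes it by hand and compares it with the usual large-sample 5% critical value of about 1.36 / sqrt(n). That critical value assumes the reference distribution is fixed in advance; if its mean and standard deviation are estimated from the same data, a corrected version such as the Lilliefors test is needed. The function name and samples are assumptions for illustration, and a library routine such as `scipy.stats.kstest` would normally be used to obtain a p-value.

```python
import random
from statistics import NormalDist

def ks_statistic(values, dist):
    """Largest gap between the empirical CDF and the CDF of `dist`."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

random.seed(4)
n = 500
reference = NormalDist(0, 1)  # hypothesized distribution, fixed in advance
normal_sample = [random.gauss(0, 1) for _ in range(n)]
uniform_sample = [random.uniform(-3, 3) for _ in range(n)]

critical = 1.36 / n ** 0.5  # approximate 5% critical value for large n

for name, sample in [("normal", normal_sample), ("uniform", uniform_sample)]:
    d = ks_statistic(sample, reference)
    verdict = "reject" if d > critical else "do not reject"
    print(f"{name:8s} D = {d:.4f} (critical {critical:.4f}): {verdict}")
```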
The Shapiro–Wilk test is another statistical test for normality that is generally considered to be more powerful than the Kolmogorov-Smirnov test.
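
The Shapiro-Wilk test is impractical to compute by hand, so the sketch below assumes NumPy and SciPy are installed and uses `scipy.stats.shapiro`. The synthetic samples, the seed, and the 0.05 significance threshold are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)  # fixed seed for reproducibility
normal_sample = rng.normal(loc=10, scale=2, size=200)
skewed_sample = rng.exponential(scale=2, size=200)

results = {}
for name, sample in [("normal", normal_sample), ("exponential", skewed_sample)]:
    # W close to 1 is consistent with normality; a small p-value rejects it.
    statistic, p_value = stats.shapiro(sample)
    results[name] = (statistic, p_value)
    verdict = "reject normality" if p_value < 0.05 else "no evidence against normality"
    print(f"{name:12s} W = {statistic:.4f}, p = {p_value:.3g}: {verdict}")
```

A non-significant result does not prove that the data are normal; it only means the test found no evidence against normality at the chosen threshold.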
Warning: If you have a large number of samples, the results of the statistical tests above can be misleading. This is because even a minor (and inconsequential) deviation from normality can result in rejecting the null hypothesis that the data are normally distributed. Therefore, if you have a large data set, it is best to inspect the histogram and Q-Q plot and rely on your judgement.
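
The effect described in this warning can be reproduced with synthetic data. The sketch below draws a small and a very large sample from a Student's t distribution with 30 degrees of freedom, which is close to a normal distribution but has slightly heavier tails. It uses `scipy.stats.normaltest` (the D'Agostino-Pearson test) as a stand-in, not one of the tests named above, because SciPy's documentation notes that Shapiro-Wilk p-values may be inaccurate for more than 5000 observations. The distribution, sample sizes, and seed are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)  # fixed seed for reproducibility

# Student's t with 30 degrees of freedom: nearly normal, slightly heavy tails.
small = rng.standard_t(df=30, size=100)
large = rng.standard_t(df=30, size=200_000)

p_values = {}
for name, sample in [("small", small), ("large", large)]:
    statistic, p_value = stats.normaltest(sample)
    p_values[name] = p_value
    print(f"{name} sample, n = {sample.size:>7}: p = {p_value:.3g}")
```

With the large sample the test rejects normality decisively, even though the deviation is the same mild one present in the small sample and would rarely matter in practice.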