The normal distribution, also known as the Gaussian distribution or bell curve, is widely used
in science. It is a probability distribution that has the following properties:
The distribution has a single peak located at the mean. Furthermore, the
mean, median and mode are all equal, so they also correspond to the location of
the highest point in a normal distribution.
The distribution is symmetric. In other words, it has no skew.
It is parametric, meaning that the distribution is completely characterized by two parameters:
the mean and the standard deviation.
The normal distribution is "light tailed". This means that as you move away from the mean,
the probability density falls off very quickly.
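To make the "light tailed" property concrete, the following minimal sketch computes the probability of landing more than k standard deviations from the mean. It uses Python's standard-library `statistics.NormalDist`; the helper name `prob_beyond` and the chosen values of k are illustrative assumptions, not part of any standard API.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0.0, sigma=1.0)

def prob_beyond(k: float) -> float:
    """Probability of falling more than k standard deviations from the mean.

    Hypothetical helper for illustration; it relies on the symmetry of the
    normal distribution to double the upper-tail probability.
    """
    return 2.0 * (1.0 - z.cdf(k))

tail_probs = {k: prob_beyond(k) for k in (1, 2, 3, 4, 5)}
for k, p in tail_probs.items():
    print(f"more than {k} standard deviations from the mean: {p:.2e}")
```

Roughly 32% of the probability lies beyond one standard deviation, about 4.6% beyond two, and about 0.27% beyond three, so each extra standard deviation shrinks the tail by a growing factor.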
One of the reasons that the normal distribution is used so widely is that it
arises naturally in many real world situations. For more information, see the
central limit theorem.
Tests for normality
As mentioned above, the normal distribution is used very widely. In fact, in some
situations it is used even when it shouldn't be. For example, some statistical
procedures rely on the assumption that the underlying data are normally distributed.
There are several tests available to determine if a given data set appears to be
normally distributed.
The first step when you have new data is to explore it
using graphs. A histogram groups the data into equal-sized bins. You can visually
compare the shape of the histogram with the characteristic bell shape of a normal
distribution. While not a formal test, this can quickly give you a
good idea of whether your data are approximately normal.
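As a rough illustration of the histogram check, here is a small sketch that bins a sample into equal-width bins and prints a text histogram. It uses only the standard library; the function name `text_histogram`, the bin count, and the simulated sample are assumptions made for this example. In practice a plotting library would normally be used instead.

```python
import random

def text_histogram(data, n_bins=10, width=40):
    """Return (counts, lines): counts per equal-width bin and text bars."""
    lo, hi = min(data), max(data)
    bin_width = (hi - lo) / n_bins or 1.0  # guard: all values identical
    counts = [0] * n_bins
    for x in data:
        # Clamp the index so the maximum value falls into the last bin.
        idx = min(int((x - lo) / bin_width), n_bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left_edge = lo + i * bin_width
        bar = "#" * round(width * c / peak)
        lines.append(f"{left_edge:8.2f} | {bar} {c}")
    return counts, lines

rng = random.Random(0)  # fixed seed so the example is repeatable
sample = [rng.gauss(50.0, 10.0) for _ in range(2000)]  # simulated data
counts, lines = text_histogram(sample)
print("\n".join(lines))
```

For approximately normal data the bars should rise to a single peak near the middle and fall away roughly symmetrically on both sides.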
A Q-Q plot is another way to visually examine your data.
It plots the sorted data values against the quantiles they would be expected to have
under a theoretically perfect normal distribution, usually with a reference line. If the real data points are
close to the line, it is an indication that the data are approximately normal.
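The points of a Q-Q plot can be computed without any plotting library. The sketch below pairs each sorted observation with the matching quantile of a normal distribution fitted to the sample, and then summarizes how straight the resulting point cloud is with a correlation coefficient. The plotting positions `(i + 0.5) / n` are one common convention among several, and the names `qq_points` and `pearson_r` are hypothetical helpers for this example.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Pair each sorted observation with the matching normal quantile."""
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    ordered = sorted(data)
    # Plotting positions (i + 0.5) / n avoid the undefined quantiles at 0 and 1.
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

def pearson_r(pairs):
    """Pearson correlation of the (theoretical, observed) pairs."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed so the example is repeatable
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]

r_normal = pearson_r(qq_points(normal_sample))
r_skewed = pearson_r(qq_points(skewed_sample))
print(f"normal sample: r = {r_normal:.4f}")
print(f"skewed sample: r = {r_skewed:.4f}")
```

Because the theoretical quantiles come from a normal distribution with the sample's own mean and standard deviation, the reference line is simply y = x. A correlation very close to 1 means the points hug that line; the skewed sample should score visibly lower.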
Skewness is a measure of asymmetry of a probability distribution.
The normal distribution is symmetric, meaning it has a skew of 0. A strong positive
or negative skew is an indication that the data may not be normally distributed.
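A minimal sketch of a skewness calculation follows. It uses the simple moment-based definition (third central moment divided by the 1.5 power of the second) without any small-sample bias correction; statistical packages often apply a correction, so their values may differ slightly. The function name `sample_skewness` and the simulated samples are assumptions for illustration.

```python
import random
from statistics import fmean

def sample_skewness(data):
    """Moment-based skewness: m3 / m2**1.5 (no small-sample bias correction)."""
    n = len(data)
    m = fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

rng = random.Random(2)  # fixed seed so the example is repeatable
symmetric = [rng.gauss(0.0, 1.0) for _ in range(5000)]
right_skewed = [rng.expovariate(1.0) for _ in range(5000)]

skew_symmetric = sample_skewness(symmetric)
skew_right = sample_skewness(right_skewed)
print(f"normal sample:      skewness = {skew_symmetric:+.3f}")
print(f"exponential sample: skewness = {skew_right:+.3f}")
```

The normal sample should give a value near 0, while the exponential sample, whose theoretical skewness is 2, should give a clearly positive value.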
Kurtosis is often described as a measure of the "peakedness" of a distribution,
although its value is driven mainly by the weight of the tails.
A normal distribution has an excess kurtosis of 0. Therefore, this value
can be used to compare an arbitrary distribution with a normal distribution.
A positive excess kurtosis means the distribution has heavier tails than a normal
distribution (usually with a sharper peak), and a negative excess kurtosis means that
a distribution has lighter tails (usually with a wider, flatter peak).
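The sketch below computes excess kurtosis from the moment-based definition (fourth central moment divided by the squared second central moment, minus 3), again without bias correction. It compares three simulated samples: normal, uniform (lighter tails than normal, theoretical excess kurtosis of -1.2) and Laplace (heavier tails than normal, theoretical excess kurtosis of 3). The function name and the samples are assumptions for illustration; the Laplace sample is generated as the difference of two independent exponential draws.

```python
import random
from statistics import fmean

def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (no bias correction)."""
    n = len(data)
    m = fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - m) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0

rng = random.Random(4)  # fixed seed so the example is repeatable
n = 5000
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
uniform_sample = [rng.uniform(-1.0, 1.0) for _ in range(n)]
# The difference of two independent Exp(1) draws follows a Laplace distribution.
laplace_sample = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]

k_normal = excess_kurtosis(normal_sample)
k_uniform = excess_kurtosis(uniform_sample)
k_laplace = excess_kurtosis(laplace_sample)
print(f"normal:  {k_normal:+.2f}")
print(f"uniform: {k_uniform:+.2f}")
print(f"Laplace: {k_laplace:+.2f}")
```

The sign follows tail weight: the light-tailed uniform sample comes out negative and the heavy-tailed Laplace sample comes out positive, with the normal sample close to 0.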
The Kolmogorov-Smirnov test is a general statistical
test to determine if a set of numbers is likely to have been drawn from a given
distribution.
The Shapiro–Wilk test is another statistical test for
normality that is generally considered to be more powerful than the Kolmogorov-Smirnov test.
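The Kolmogorov-Smirnov statistic is simple enough to sketch with the standard library: it is the largest vertical gap between the empirical cumulative distribution of the data and the cumulative distribution being tested. The example below assumes the reference distribution is fully specified in advance. If its mean and standard deviation are instead estimated from the same data, the usual critical values are too lenient. The threshold `1.36 / sqrt(n)` is the common large-sample approximation for a 5% significance level, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def ks_statistic(data, dist):
    """Largest gap between the empirical CDF and the CDF of `dist`."""
    ordered = sorted(data)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered):
        f = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def rejects_at_5_percent(data, dist):
    """Large-sample rule of thumb: reject when D > 1.36 / sqrt(n)."""
    return ks_statistic(data, dist) > 1.36 / math.sqrt(len(data))

rng = random.Random(3)  # fixed seed so the example is repeatable
reference = NormalDist(0.0, 1.0)  # fully specified, not fitted to the data
from_normal = [rng.gauss(0.0, 1.0) for _ in range(1000)]
from_uniform = [rng.uniform(-3.0, 3.0) for _ in range(1000)]

d_normal = ks_statistic(from_normal, reference)
d_uniform = ks_statistic(from_uniform, reference)
print(f"normal data:  D = {d_normal:.4f}")
print(f"uniform data: D = {d_uniform:.4f}")
print("uniform data rejected:", rejects_at_5_percent(from_uniform, reference))
```

The Shapiro–Wilk test is not sketched here because it depends on tabulated coefficients; statistical libraries generally provide an implementation.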
Warning: If you have a large number of samples, the results of the
formal statistical tests can be misleading. This is because even a minor (and inconsequential)
deviation from normality can result in rejecting the null hypothesis.
Therefore, if you have a large data set it is best to observe the histogram and Q-Q plot
and rely on your judgement.
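The effect of sample size can be illustrated with a deliberately idealized experiment. The sketch below builds an evenly spaced set of normal quantiles, bends them with a small cubic term so that the data are only slightly non-normal, and applies a Kolmogorov-Smirnov check against the standard normal distribution at two sample sizes. The distortion of 0.02, the sample sizes, the large-sample 5% threshold `1.36 / sqrt(n)` and all function names are assumptions chosen for illustration. Using evenly spaced quantiles instead of random draws removes sampling noise, which real data would add.

```python
import math
from statistics import NormalDist

STANDARD = NormalDist(0.0, 1.0)

def ks_statistic(ordered, dist):
    """Largest gap between the empirical CDF of sorted data and dist's CDF."""
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered):
        f = dist.cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def idealized_sample(n, distortion=0.02):
    """Evenly spaced normal quantiles bent by a small cubic term.

    The transformation is increasing, so the result is already sorted.
    """
    zs = (STANDARD.inv_cdf((i + 0.5) / n) for i in range(n))
    return [z + distortion * z ** 3 for z in zs]

results = {}
for n in (200, 100_000):
    d = ks_statistic(idealized_sample(n), STANDARD)
    critical = 1.36 / math.sqrt(n)  # approximate 5% threshold for large n
    results[n] = (d, critical, d > critical)
    print(f"n = {n:>7}: D = {d:.4f}, threshold = {critical:.4f}, "
          f"reject = {d > critical}")
```

The shape of the data is identical in both runs. Only the threshold shrinks as n grows, which is why the same small deviation goes unnoticed at the smaller sample size and is flagged at the larger one.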