Statistics - Central limit theorem (CLT)

1 - About

The central limit theorem (CLT) establishes that when many independent random variables are summed, the distribution of their (suitably normalized) sum tends toward a normal distribution (informally, a “bell curve”), even if the original variables themselves are not normally distributed.
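Stated formally (the classical Lindeberg–Lévy formulation, not taken from the original text): for i.i.d. variables <math>X_1, \dots, X_n</math> with mean <math>\mu</math> and finite variance <math>\sigma^2</math>, the standardized sample mean converges in distribution to a standard normal.

```latex
% Classical (Lindeberg–Levy) CLT: X_1, ..., X_n i.i.d. with mean \mu
% and finite variance \sigma^2; \bar{X}_n denotes the sample mean.
\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \;\xrightarrow{\;d\;}\; \mathcal{N}(0,\,1)
\qquad \text{as } n \to \infty
```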

The theorem is a key concept in probability theory because it implies that probabilistic and statistical methods that work for normal distributions can be applicable to many problems involving other types of distributions.

The central limit theorem began in 1733 when de Moivre approximated binomial probabilities using the integral of <math>exp(-x^2)</math>. The central limit theorem achieved its final form around 1935 in papers by Feller, Lévy, and Cramér.

The central limit theorem is a fundamental component of inferential statistics.


3 - Example

3.1 - Galton board

Every ball is randomly pushed left or right at each peg it hits. Each ball's final bin is the sum of many independent pushes, so the distribution of balls across the bins approximates a bell curve.
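A minimal sketch of a Galton board in Python (the function name and parameters are illustrative, not from the original): each ball's final bin is the count of "right" pushes over a fixed number of peg rows, which is binomially distributed and, by the CLT, approximately normal for many rows.

```python
import random
import statistics

def galton_board(n_rows, n_balls, seed=0):
    """Simulate a Galton board: each ball is pushed left (0) or right (1)
    at each of n_rows pegs; its final bin is the number of right pushes."""
    rng = random.Random(seed)
    return [sum(rng.choice((0, 1)) for _ in range(n_rows))
            for _ in range(n_balls)]

bins = galton_board(n_rows=20, n_balls=5000)
# Bin positions follow Binomial(20, 0.5), which the CLT approximates by
# Normal(mean = 20 * 0.5 = 10, sd = sqrt(20 * 0.25) ~= 2.24).
print(statistics.mean(bins), statistics.stdev(bins))
```

Plotting a histogram of `bins` would show the familiar bell shape emerging from purely left/right coin flips.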

4 - Implementation

4.1 - Random Sample

The central limit theorem says that the averages of several samples drawn from the same population (i.e. a sampling distribution of the mean) will, provided the rules below are met, be distributed according to the normal distribution.


The population itself does not have to be normally distributed; as long as we take many samples of large enough size (N > 30), the sampling distribution of the mean will be approximately normal.


  • Each sample must contain a large number of observations (N > 30)
  • Each observation must be randomly generated (no relationships or dependencies between observations)
  • Under these conditions, the distribution of sample means is approximately normal (not negatively or positively skewed, not uniform)
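The rules above can be sketched in Python. This example (the function name and parameter values are illustrative assumptions) draws many independent samples of size 40 from an exponential population, which is heavily right-skewed, and checks that the sample means still cluster normally around the population mean.

```python
import random
import statistics

def sampling_distribution(n_samples, sample_size, seed=0):
    """Draw n_samples independent samples from a non-normal population
    (Exponential with rate 1) and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.mean(rng.expovariate(1.0) for _ in range(sample_size))
            for _ in range(n_samples)]

means = sampling_distribution(n_samples=2000, sample_size=40)
# Population: Exponential(1) has mean 1 and sd 1, and is strongly skewed.
# CLT: the sample means are approximately Normal(1, 1 / sqrt(40) ~= 0.158).
print(statistics.mean(means), statistics.stdev(means))
```

Shrinking `sample_size` below about 30 makes the skew of the underlying population visible again in the histogram of means, which is why the N > 30 rule of thumb is used.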

5 - Documentation / Reference

data_mining/central_limit_theorem.txt · Last modified: 2018/04/24 17:10 by gerardnico