Statistics - Sampling Distribution


1 - About

A sampling distribution is the distribution of a statistic (for example the mean) estimated from many different samples of the same size drawn from the same population.

Note the term: this is a sampling distribution, not a sample distribution. A sample distribution is the distribution of the values inside a single sample.

It makes it possible to make probability judgements about samples, for example how likely a given sample mean is.

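Because of the central limit theorem, the sampling distribution of the mean is approximately normal when the sample size is large enough, whatever the shape of the population (provided its variance is finite). This is why sampling distributions are fundamental to inferential statistics: they allow probabilistic predictions about outcomes.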
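As an illustration of such a probability judgement, the sketch below uses the normal approximation to estimate how likely it is that a sample mean exceeds a threshold. The population parameters (mean 50, standard deviation 30), the sample size of 25 and the threshold of 56 are hypothetical values chosen only for this example. JavaScript has no built-in error function, so `erf` is approximated here with the Abramowitz and Stegun formula 7.1.26. The standard deviation of the sampling distribution of the mean (the standard error) is the population standard deviation divided by the square root of the sample size.

```javascript
// Error function via the Abramowitz and Stegun approximation 7.1.26
// (absolute error below about 1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Cumulative distribution function of a normal distribution.
function normalCdf(x, mean, standardDeviation) {
  return 0.5 * (1 + erf((x - mean) / (standardDeviation * Math.SQRT2)));
}

// Hypothetical population parameters, chosen only for illustration.
const populationMean = 50;
const populationSd = 30;
const sampleSize = 25;

// Standard error of the mean: sigma / sqrt(n).
const standardError = populationSd / Math.sqrt(sampleSize);

// Normal approximation of P(sample mean > threshold).
const threshold = 56;
const probabilityAbove = 1 - normalCdf(threshold, populationMean, standardError);

console.log("standard error:", standardError);
console.log("P(sample mean > " + threshold + "):", probabilityAbove.toFixed(4));
```

With these assumed values the standard error is 6, the threshold sits one standard error above the population mean, and the estimated probability is about 0.159.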

3 - Demonstration

The code below shows that a sampling distribution built from the means of many samples drawn from the same population has an approximately normal (bell) shape, even though the population itself is uniformly distributed.

  • Creating a population of uniformly distributed random values
// Population: 10000 integers drawn uniformly from 0 to 99
const population_n = 10000;
const population_max = 100;
const population_data = [];
 
for (let i = 0; i < population_n; i++) {
  const random_value = Math.floor(Math.random() * population_max);
  population_data.push(random_value);
}
 
// histogram() is a charting helper that is not defined in this snippet
histogram({ selector: "population", data: population_data });
  • Sampling the population 1000 times with a sample size of 20, calculating the mean of each sample and adding it to the sampling distribution
// Sampling distribution of the mean
const sample_distribution_data = [];
const sample_distribution_n = 1000;
const sample_n = 20;
for (let j = 0; j < sample_distribution_n; j++) {
  const sample_data = [];
  for (let i = 0; i < sample_n; i++) {
    // Pick a random index over the whole population (sampling with replacement)
    const population_random_index = Math.floor(
      Math.random() * population_data.length
    );
    sample_data.push(population_data[population_random_index]);
  }
  // d3.mean comes from the D3 library
  sample_distribution_data.push(d3.mean(sample_data));
}
histogram({ selector: "sample", data: sample_distribution_data });
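The histograms give a visual impression only. As a rough numeric check of the same experiment, the sketch below repeats it without the `histogram` helper or D3 and compares the spread of the sample means with the standard error predicted by the central limit theorem (population standard deviation divided by the square root of the sample size). The helper functions `mean` and `standardDeviation` and the camel-case variable names are introduced here for illustration and are not part of the original demonstration.

```javascript
// Arithmetic mean of an array of numbers.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Population standard deviation (divides by N, not N - 1).
function standardDeviation(values) {
  const m = mean(values);
  return Math.sqrt(mean(values.map((v) => (v - m) ** 2)));
}

// Same setup as the demonstration: 10000 uniform integers from 0 to 99.
const populationSize = 10000;
const populationMax = 100;
const population = Array.from({ length: populationSize }, () =>
  Math.floor(Math.random() * populationMax)
);

// Draw 1000 samples of size 20 (with replacement) and keep each sample mean.
const sampleCount = 1000;
const sampleSize = 20;
const sampleMeans = [];
for (let j = 0; j < sampleCount; j++) {
  const sample = [];
  for (let i = 0; i < sampleSize; i++) {
    sample.push(population[Math.floor(Math.random() * populationSize)]);
  }
  sampleMeans.push(mean(sample));
}

const populationMean = mean(population);
// Standard error predicted by the central limit theorem: sigma / sqrt(n).
const expectedStandardError = standardDeviation(population) / Math.sqrt(sampleSize);
// Spread actually observed among the sample means.
const observedStandardError = standardDeviation(sampleMeans);
// Share of sample means within one standard error of the population mean.
const withinOne =
  sampleMeans.filter((m) => Math.abs(m - populationMean) <= expectedStandardError)
    .length / sampleCount;

console.log("population mean:", populationMean.toFixed(2));
console.log("mean of sample means:", mean(sampleMeans).toFixed(2));
console.log("expected standard error:", expectedStandardError.toFixed(2));
console.log("observed standard error:", observedStandardError.toFixed(2));
console.log("share within one standard error:", withinOne.toFixed(3));
```

The exact figures change on every run because the data is random. For integers drawn uniformly from 0 to 99, the theoretical population mean is 49.5 and the standard deviation is about 28.87, so the expected standard error for samples of size 20 is about 6.45. For a normal distribution, roughly 68% of the values fall within one standard deviation of the mean, so the last figure should land near 0.68.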

4 - Documentation / Reference