Statistics

1 - About

Statistics is a scientific discipline devoted to the study of data.

Statistics is the art of extracting information from data.

From Data to Information to Knowledge.

There are three kinds of lies:

  • lies,
  • damned lies,
  • and statistics.

Mark Twain, autobiography, 1904

Facts are stubborn things, but statistics are more pliable. Author unknown (possibly Mark Twain)

2 - Concept

Statisticians refer to the entire group being studied as a population.

Each member of the population is called a unit.

A statistician studying a population is interested in collecting information about different characteristics of its units. Those characteristics are called variables.

Most of the time, it is extremely difficult or very costly to collect all the information about a population. Because of this, it is common to use a smaller, representative group from the population called a sample.

In statistics, a number that describes the whole population (such as its actual size or its mean) is called a parameter.

The number of units in the sample, or any other number that describes the individuals in the sample (such as their length, weight, or age), is called a statistic. In general, each statistic is an estimate of a parameter whose value is not known exactly.

In general, the difference between the true value of a parameter and the statistic obtained from a sample is called sampling error.
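To make these terms concrete, here is a minimal Python sketch. It assumes a hypothetical population of 10,000 units whose only variable is a weight in kilograms, generated at random purely for illustration. The population mean plays the role of the parameter, the mean of a random sample of 100 units is the statistic, and their difference is the sampling error.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population: one variable (weight in kg) for each of 10,000 units.
population = [rng.gauss(70, 10) for _ in range(10_000)]

# Parameter: a number describing the whole population, here its mean weight.
parameter = statistics.mean(population)

# Sample: 100 units drawn at random, without replacement, from the population.
sample = rng.sample(population, 100)

# Statistic: the same quantity computed on the sample; it estimates the parameter.
statistic = statistics.mean(sample)

# Sampling error: how far the statistic lands from the true parameter.
sampling_error = statistic - parameter

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
print(f"sampling error:              {sampling_error:+.2f}")
```

In practice only the sample is available, so the parameter and the sampling error are unknown; the simulation simply makes them visible.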

For example, when estimating the number of tortoises in a region, the sample could have been chosen in an area where large numbers of tortoises tend to congregate (perhaps near a food or water source). If this sample were used to estimate the number of tortoises in all locations, it might lead to a population estimate that is too high. This type of systematic error in sampling is called bias.
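The following sketch simulates this situation under invented assumptions: a survey area of 200 hypothetical sites, 20 of which are near water and hold far more tortoises. Repeating the survey many times separates the two kinds of error: estimates from randomly chosen sites scatter around the true total (sampling error), while estimates from sites near water are consistently too high (bias). The helpers `estimate_total` and `average_estimate` are illustrative names only.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical survey area of 200 sites; the first 20 are near water, where
# tortoises congregate. All counts are invented for illustration.
N_SITES = 200
N_NEAR_WATER = 20
NEAR_WATER = list(range(N_NEAR_WATER))
counts = [rng.randint(15, 25) if site < N_NEAR_WATER else rng.randint(0, 4)
          for site in range(N_SITES)]
true_total = sum(counts)  # the parameter: the actual number of tortoises

def estimate_total(sites):
    """Scale the mean count of the surveyed sites up to all sites."""
    return statistics.mean(counts[s] for s in sites) * N_SITES

def average_estimate(site_pool, n_repeats=2_000, sample_size=10):
    """Average the estimate over many surveys drawn from site_pool."""
    return statistics.mean(
        estimate_total(rng.sample(site_pool, sample_size))
        for _ in range(n_repeats)
    )

biased = average_estimate(NEAR_WATER)              # only sites near water
unbiased = average_estimate(list(range(N_SITES)))  # every site equally likely

print(f"true total:            {true_total}")
print(f"average, random sites: {unbiased:.0f}")
print(f"average, near water:   {biased:.0f}")
```

Averaging over many repetitions is what tells the two apart: random sampling error tends to cancel out, whereas bias does not, however many samples are taken.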

4 - Type

4.1 - Descriptive

Descriptive statistics: procedures used to summarize, organize, and simplify data.

E.g., the median describes the data at hand but can't be generalized beyond it.
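As a sketch, Python's standard `statistics` module computes common descriptive statistics directly. The ages below are made-up values; the summary describes only these ten numbers and says nothing, by itself, about any larger population.

```python
import statistics

# Hypothetical ages of ten people, made up for illustration.
ages = [23, 25, 25, 29, 31, 34, 38, 41, 52, 67]

# Summarize the data: its size, range, centre and spread.
summary = {
    "n": len(ages),
    "min": min(ages),
    "median": statistics.median(ages),
    "mean": statistics.mean(ages),
    "stdev": statistics.stdev(ages),  # sample standard deviation
    "max": max(ages),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

Here the median (32.5) sits below the mean (36.5) because a single large value (67) pulls the mean upward; this is one reason the median is often preferred to describe skewed data.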

4.2 - Inferential

Inferential statistics: procedures that allow for generalizations about population parameters based on sample statistics.

E.g., a t-test enables inferences about a population beyond the observed data.
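A one-sample t-test asks whether a population mean differs from a hypothesized value `mu0`. The sketch below computes only the t statistic, t = (mean - mu0) / (s / sqrt(n)), where s is the sample standard deviation, with n - 1 degrees of freedom. The measurements, `mu0 = 5.0` and the helper `one_sample_t` are hypothetical.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for H0: the population mean equals mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    standard_error = s / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1

# Hypothetical measurements and hypothesized population mean.
measurements = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.3]
t, df = one_sample_t(measurements, mu0=5.0)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

Here t is about 3.64. With 7 degrees of freedom, |t| above about 2.365 is significant at the 5% level (two-sided), so the data suggest that the population mean differs from 5.0. SciPy's `scipy.stats.ttest_1samp(sample, popmean)` performs the same test and also returns the p-value.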

4.3 - Parametric

5 - Method

6 - Approach

Approach to Statistics

6.1 - Frequentist

P(D|H) 

Probability of seeing this data, given the (null) hypothesis
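For example, under the hypothesis H that a coin is fair, the probability of seeing data at least as extreme as 9 heads in 10 flips follows from the binomial distribution. This tail probability is the one-sided p-value that a frequentist test reports. The coin, the counts and the helper `p_at_least` are hypothetical.

```python
from math import comb

def p_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): how likely data this extreme is under H."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical data D: 9 heads in 10 flips. Hypothesis H: the coin is fair (p = 0.5).
p_value = p_at_least(9, 10)
print(p_value)  # 0.0107421875, i.e. 11/1024
```

A small value means that data like these would be rare if H were true. It is not the probability that H is true.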

6.2 - Bayesian

P(H|D)

Probability of the hypothesis, given this data
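Bayes' theorem turns P(D|H) into P(H|D) by weighting it with a prior belief P(H): P(H|D) = P(D|H) P(H) / P(D). The sketch below uses a hypothetical coin example: 9 heads observed in 10 flips, and two candidate hypotheses, a fair coin and a coin biased towards heads with p = 0.8, each given a prior of 0.5. These hypotheses and priors are assumptions, and the posterior changes if they change.

```python
from math import comb

def likelihood(heads, flips, p):
    """P(D|H): binomial probability of exactly `heads` heads in `flips` flips."""
    return comb(flips, heads) * p**heads * (1 - p)**(flips - heads)

# Hypothetical hypotheses: the probability of heads under each one.
hypotheses = {"fair": 0.5, "biased": 0.8}
# Assumed prior beliefs P(H), chosen before seeing the data; they sum to 1.
priors = {"fair": 0.5, "biased": 0.5}

heads, flips = 9, 10  # hypothetical observed data D

# Bayes' theorem: P(H|D) = P(D|H) * P(H) / P(D),
# where P(D) sums P(D|H) * P(H) over all hypotheses considered.
joint = {h: likelihood(heads, flips, p) * priors[h] for h, p in hypotheses.items()}
evidence = sum(joint.values())
posterior = {h: j / evidence for h, j in joint.items()}

for h, prob in posterior.items():
    print(f"P({h} | D) = {prob:.3f}")
```

Under these assumptions the posterior probability of the biased-coin hypothesis is about 0.965, and that of the fair coin about 0.035.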

7 - Data Analysis Techniques

Data analysis techniques such as:

8 - Type of study

Statisticians and researchers use two main types of study to draw conclusions about the relationships between variables.

  • An observational study is one in which a researcher observes the subjects in the real world without manipulating them. A longitudinal study is a long-term observational study in which the same group of subjects is observed over long periods of time.
  • An experiment is an effort to establish cause-and-effect relationships, in which the researcher imposes a treatment on a group of subjects.
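In an experiment, the researcher decides who receives the treatment. One common design, the randomized experiment, assigns subjects to a treatment group and a control group at random, so that on average the groups differ only by the treatment. A minimal sketch, with hypothetical subject IDs and a hypothetical `randomize` helper:

```python
import random

def randomize(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

subjects = [f"subject-{i}" for i in range(1, 11)]  # hypothetical subject IDs
treatment, control = randomize(subjects, seed=7)
print("treatment:", treatment)
print("control:  ", control)
```

An observational study has no such assignment step: subjects end up in a group by their own circumstances, which is why it cannot establish cause and effect as directly.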

9 - Statistician

Hans Rosling: Life Expectancy: How to lie with statistics? Do we get older in Sweden than in Burundi? No.

10 - Documentation / Reference

data_mining/statistics.txt · Last modified: 2017/08/31 17:37 by gerardnico