Statistics - (Probability|Sampling) Distribution

1 - About

A probability distribution is a mathematical description of a random phenomenon in terms of the probabilities of events.

Many distributions are normal, but not all of them. A histogram can help identify the type of distribution.

A box plot is a good summary of a distribution.
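
A minimal sketch of both summaries in base R, using simulated data (the sample and its parameters are assumptions for illustration). hist() and boxplot.stats() are called without drawing, so the numbers behind each plot are visible.

```r
set.seed(1)
# Simulated right-skewed sample; an exponential is clearly not normal
x <- rexp(1000, rate = 1)

# Histogram: the counts per bin reveal the shape of the distribution
h <- hist(x, breaks = 20, plot = FALSE)
print(h$counts)

# Box plot: lower whisker, lower hinge, median, upper hinge, upper whisker
b <- boxplot.stats(x)
print(b$stats)

# To draw the plots instead of printing the numbers:
# hist(x, breaks = 20); boxplot(x, horizontal = TRUE)
```

For this sample the bin counts fall off steadily from left to right and the upper whisker is much longer than the lower one, which is what a right-skewed distribution looks like in both summaries.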

3 - Discrete / Continuous

3.1 - Discrete

There are two representations of a discrete distribution:

  • the Bayesian representation: a discrete distribution maps discrete values to probabilities such that the probabilities add up to 1.
  • the frequentist representation: an infinite list of values such that, as n gets larger, sampling n elements from the list and counting the frequency of each value approximates the Bayesian representation of the distribution.
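
The two representations can be compared numerically. The sketch below assumes a made-up distribution over three values (the names and probabilities are illustrative only), samples from it with base R's sample(), and checks that the relative frequencies approach the probability table as n grows.

```r
# Hypothetical discrete distribution: values mapped to probabilities (sums to 1)
values <- c("a", "b", "c")
probs  <- c(a = 0.5, b = 0.3, c = 0.2)
stopifnot(isTRUE(all.equal(sum(probs), 1)))

set.seed(42)  # fixed seed so the run is reproducible

# Relative frequency of each value in a sample of size n
relative_freq <- function(n) {
  draws  <- sample(values, size = n, replace = TRUE, prob = probs)
  counts <- table(factor(draws, levels = values))
  as.numeric(counts) / n
}

# Largest gap between the observed frequencies and the probability table
max_error <- function(n) max(abs(relative_freq(n) - probs))

for (n in c(100, 10000, 1000000)) {
  cat(sprintf("n = %7d  max error = %.4f\n", n, max_error(n)))
}
```

The printed error should shrink as n increases, which is the sense in which the frequency counts approximate the probability table.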

3.2 - Continuous

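Standard continuous distributions include the Gaussian, beta, and uniform distributions. (The binomial distribution, often listed alongside them, is discrete.)

Some pairs of distributions have convenient algebraic properties: a prior is called a conjugate prior for a likelihood when the resulting posterior belongs to the same family as the prior. For example, a uniform prior, which is a Beta(1, 1) distribution, combined with a binomial likelihood results in a beta posterior.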
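
As a sketch of the conjugacy example: after observing k successes in n binomial trials, a uniform prior on the success probability p gives the posterior Beta(1 + k, 1 + n - k). The counts below (k = 7, n = 10) are made up; the code compares a brute-force, grid-normalised "prior times likelihood" with dbeta().

```r
# Hypothetical data: k successes out of n trials
n <- 10
k <- 7

# Grid of candidate values for the success probability p (bin midpoints)
step <- 0.001
p <- seq(0.0005, 0.9995, by = step)

prior      <- dunif(p, min = 0, max = 1)      # uniform prior, i.e. Beta(1, 1)
likelihood <- dbinom(k, size = n, prob = p)   # binomial likelihood of the data

# Posterior by brute force: prior * likelihood, normalised to integrate to 1
unnormalised   <- prior * likelihood
grid_posterior <- unnormalised / sum(unnormalised * step)

# Closed-form posterior given by conjugacy
beta_posterior <- dbeta(p, shape1 = 1 + k, shape2 = 1 + n - k)

max_gap <- max(abs(grid_posterior - beta_posterior))
cat("largest difference between the two curves:", signif(max_gap, 3), "\n")
```

The difference is only numerical integration error, so the two curves coincide for practical purposes.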

4 - Function

A distribution can be specified by supplying:

5 - Characteristics

  • Mode: for a discrete random variable, the value with highest probability (the location at which the probability mass function has its peak); for a continuous random variable, the location at which the probability density function has its peak.
  • Support: the smallest closed set whose complement has probability zero.
  • Head: the range of values where the pmf or pdf is relatively high.
  • Tail: the complement of the head within the support; the large set of values where the pmf or pdf is relatively low.
  • Expected value or mean: the weighted average of the possible values, using their probabilities as their weights; or the continuous analog thereof.
  • Median: the value such that the set of values less than the median has a probability of one-half.
  • Variance (dispersion, mean square): the second moment of the pmf or pdf about the mean; an important measure of the dispersion of the distribution.
  • Standard deviation: the square root of the variance, and hence another measure of dispersion.
  • Symmetry: a property of some distributions in which the portion of the distribution to the left of a specific value is a mirror image of the portion to its right.
  • Skewness: a measure of the extent to which a pmf or pdf “leans” to one side of its mean.
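
Several of these characteristics can be computed directly for a small discrete distribution. The sketch below uses a made-up pmf on the values 0 to 4. Two choices in it are assumptions rather than the only option: the median is taken as the smallest value whose cumulative probability reaches one-half, and skewness is the standardised third central moment.

```r
# Hypothetical pmf on the values 0..4 (probabilities sum to 1)
x   <- 0:4
pmf <- c(0.40, 0.30, 0.15, 0.10, 0.05)
stopifnot(isTRUE(all.equal(sum(pmf), 1)))

mode_value <- x[which.max(pmf)]               # value with the highest probability
support    <- x[pmf > 0]                      # values with non-zero probability
mean_value <- sum(x * pmf)                    # probability-weighted average
variance   <- sum((x - mean_value)^2 * pmf)   # second moment about the mean
std_dev    <- sqrt(variance)                  # square root of the variance

# Median: smallest value whose cumulative probability reaches one-half
median_value <- x[which(cumsum(pmf) >= 0.5)[1]]

# Skewness: standardised third central moment (positive = longer right tail)
skewness <- sum((x - mean_value)^3 * pmf) / std_dev^3

print(c(mode = mode_value, mean = mean_value, median = median_value,
        variance = variance, sd = std_dev, skewness = skewness))
```

Here the mode (0), median (1) and mean (1.1) differ, and the skewness is positive, which matches a pmf whose tail extends to the right.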

6 - Type

7 - Management

7.1 - Comparison

A Q-Q (quantile-quantile) plot compares two distributions by plotting their quantiles against each other.

Example with ggplot stat_qq

library(ggplot2)
# Q-Q points of TOTAL_TIME_SEC, one colour per PRESENTATION_NAME
ggplot(res_succes, aes(sample = TOTAL_TIME_SEC, colour = factor(PRESENTATION_NAME))) +
  geom_point(stat = "qq", size = 0.75)
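
For intuition about what a Q-Q plot draws, the sketch below uses base R's qqplot() on two simulated samples (both are assumptions for illustration, not the res_succes data). It pairs the empirical quantiles of the two samples; points close to the line y = x suggest similar distributions.

```r
set.seed(7)
# Two simulated samples of the same size, drawn from the same normal distribution
a <- rnorm(500, mean = 10, sd = 2)
b <- rnorm(500, mean = 10, sd = 2)

# plot.it = FALSE returns the paired quantiles instead of drawing them
qq <- qqplot(a, b, plot.it = FALSE)

# With equal sample sizes the pairs are simply the two sorted samples
stopifnot(isTRUE(all.equal(qq$x, sort(a))), isTRUE(all.equal(qq$y, sort(b))))

# A correlation near 1 means the points fall close to a straight line
# (same shape); lying on y = x also requires the same location and scale
cat("correlation of paired quantiles:", round(cor(qq$x, qq$y), 3), "\n")

# To draw it: qqplot(a, b); abline(0, 1)
```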

8 - Documentation / Reference

data_mining/distribution.txt · Last modified: 2017/11/16 22:40 by gerardnico