Statistics - (Probability|Sampling) Distribution
1 - About
A box plot is a good summary of a distribution.
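A box plot draws what is usually called the five-number summary (minimum, first quartile, median, third quartile, maximum). As a minimal sketch, assuming a small hypothetical sample and Python's standard `statistics` module, these numbers can be computed as follows. Quartile conventions differ between tools, so the exact quartile values may not match a given plotting library.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a sample of at least two values."""
    ordered = sorted(values)
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, q2, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, q2, q3, ordered[-1]

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data, for illustration only
print(five_number_summary(sample))
```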
2 - Articles Related
3 - Discrete / Continuous
3.1 - Discrete
There are two representations of a discrete distribution:
- the Bayesian representation: a discrete distribution maps a set of discrete values to probabilities such that the probabilities add up to 1.
- the frequentist representation: an infinite list of samples such that, as the number of samples n gets larger, sampling from the collection and counting the relative frequency of each element approximates the Bayesian representation of the distribution.
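The link between the two representations can be illustrated with a small simulation. The sketch below assumes a hypothetical three-value distribution and a fixed random seed. It draws n samples and compares the relative frequencies with the probabilities.

```python
import random
from collections import Counter

# Hypothetical discrete distribution: values mapped to probabilities summing to 1.
distribution = {"a": 0.5, "b": 0.3, "c": 0.2}

def empirical_frequencies(dist, n, seed=0):
    """Draw n samples from dist and return the relative frequency of each value."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    values = list(dist)
    weights = [dist[v] for v in values]
    samples = rng.choices(values, weights=weights, k=n)
    counts = Counter(samples)
    return {v: counts[v] / n for v in values}

# The frequencies should move closer to the probabilities as n grows.
for n in (100, 10_000, 100_000):
    print(n, empirical_frequencies(distribution, n))
```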
3.2 - Continuous
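Standard continuous distributions include the Gaussian, beta, and uniform distributions. (The binomial distribution, often listed alongside them, is discrete.)
Some standard distributions combine with convenient algebraic properties: a prior whose posterior stays in the same family for a given likelihood is called a conjugate prior. For example, a uniform prior, which is the same as a Beta(1, 1) prior, combined with a binomial likelihood results in a beta posterior.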
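For the beta prior and binomial likelihood pair, the conjugate update reduces to adding the observed counts to the prior parameters. The sketch below assumes hypothetical data of 7 successes and 3 failures; the function names are illustrative only.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data under a Beta(alpha, beta) prior."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

prior = (1, 1)  # Beta(1, 1) is the uniform distribution on [0, 1]
posterior = beta_binomial_update(*prior, successes=7, failures=3)  # hypothetical data
print(posterior)              # (8, 4)
print(beta_mean(*posterior))  # posterior mean of the success probability
```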
4 - Function
A distribution can be specified by supplying:
- a valid probability mass function (pmf) or probability density function (pdf)
- a valid cumulative distribution function or survival function
- a valid hazard function
- a valid characteristic function
- a rule for constructing a new random variable from other random variables whose joint probability distribution is known.
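These specifications are different views of the same distribution. As a sketch, assuming an exponential distribution with a hypothetical rate of 2.0, the functions below show how the pdf, cumulative distribution function, survival function, and hazard function relate to one another.

```python
import math

RATE = 2.0  # hypothetical rate parameter of an exponential distribution

def pdf(x, rate=RATE):
    """Probability density function."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def cdf(x, rate=RATE):
    """Cumulative distribution function: P(X <= x)."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def survival(x, rate=RATE):
    """Survival function: P(X > x) = 1 - cdf(x)."""
    return 1.0 - cdf(x, rate)

def hazard(x, rate=RATE):
    """Hazard function: pdf(x) / survival(x)."""
    return pdf(x, rate) / survival(x, rate)

for x in (0.0, 0.5, 3.0):
    print(x, pdf(x), cdf(x), survival(x), hazard(x))
```

For the exponential distribution the hazard is constant and equal to the rate, which is why the last column stays at about 2.0.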
5 - Characteristics
- Mode: for a discrete random variable, the value with highest probability (the location at which the probability mass function has its peak); for a continuous random variable, the location at which the probability density function has its peak.
- Support: the smallest closed set whose complement has probability zero.
- Tail: the complement of the head (the range of values where the pmf or pdf is relatively high) within the support; the large set of values where the pmf or pdf is relatively low.
- Expected value or mean: the weighted average of the possible values, using their probabilities as their weights; or the continuous analog thereof.
- Median: the value such that the set of values less than the median, and the set of values greater than the median, each have a probability no greater than one-half.
- Variance (<math>\sigma^2</math>): the second moment of the pmf or pdf about the mean; an important measure of the dispersion of the distribution.
- Standard deviation: the square root of the variance, and hence another measure of dispersion.
- Symmetry: a property of some distributions in which the portion of the distribution to the left of a specific value is a mirror image of the portion to its right.
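Several of these characteristics can be computed directly from a pmf. The sketch below assumes a hypothetical pmf that is symmetric about 2. For the median it uses one common convention: the smallest value whose cumulative probability reaches one-half.

```python
import math

# Hypothetical pmf, symmetric about 2 (for illustration only).
pmf = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.2, 4: 0.1}

def mode(pmf):
    """Value with the highest probability."""
    return max(pmf, key=pmf.get)

def support(pmf):
    """Values with non-zero probability."""
    return sorted(x for x, p in pmf.items() if p > 0)

def mean(pmf):
    """Probability-weighted average of the values."""
    return sum(x * p for x, p in pmf.items())

def median(pmf):
    """Smallest value whose cumulative probability reaches one-half."""
    cumulative = 0.0
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= 0.5:
            return x
    return max(pmf)  # fallback in case rounding keeps the total just under 0.5

def variance(pmf):
    """Second moment about the mean."""
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

def standard_deviation(pmf):
    """Square root of the variance."""
    return math.sqrt(variance(pmf))

print(mode(pmf), median(pmf), mean(pmf), variance(pmf), standard_deviation(pmf))
```

Because this pmf is symmetric, the mode, median, and mean coincide.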
6 - Type
- Zipf / Pareto / Yule (govern the frequencies of different terms in a document, or of web site visits)
- Gamma distribution (long tail → latency distribution?)
7 - Management
7.1 - Comparison
A Q-Q (quantile-quantile) plot compares two distributions by plotting their quantiles against each other.
Example with ggplot stat_qq
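Independently of ggplot, the points of a Q-Q plot can be sketched in plain Python. The example assumes a hypothetical sample and compares it with a standard normal distribution. The plotting positions `(i + 0.5) / n` are one common choice. This is not a reproduction of what `stat_qq` computes internally.

```python
import random
from statistics import NormalDist

def normal_qq_points(sample):
    """Pair each ordered sample value with the matching standard normal quantile."""
    ordered = sorted(sample)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions strictly between 0 and 1 (an assumed convention).
    probabilities = [(i + 0.5) / n for i in range(n)]
    theoretical = [standard_normal.inv_cdf(p) for p in probabilities]
    return list(zip(theoretical, ordered))

rng = random.Random(0)  # fixed seed so the run is reproducible
sample = [rng.gauss(10, 2) for _ in range(200)]  # hypothetical data
points = normal_qq_points(sample)
print(points[:3])
```

If the sample follows a normal distribution, the points fall close to a straight line when plotted.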