Statistics - Correlation (Coefficient analysis)
1 - About
Correlation is a statistical analysis used to measure and describe the relationship between two variables.
The correlation coefficient is a statistic that ranges from -1 to +1. The thresholds between the extremes are rules of thumb and apply to the absolute value of the coefficient.
- +1 is a perfect positive correlation. If the score goes up on one variable, the score goes up on the other.
- > 0.8 is a strong correlation
- > 0.4 is a high correlation
- > 0.2 is a correlation
- < 0.2 is a weak correlation
- < 0.1 is a negligible correlation
- 0 is no linear correlation. Independent variables have a correlation of 0, but a correlation of 0 does not by itself imply independence.
- -1 is a perfect negative correlation. If the score goes up on one variable, the score goes down on the other.
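The range of the coefficient can be illustrated with a minimal sketch in plain Python. The helper name `pearson_r` and the sample data are assumptions made for illustration; they do not come from the original article. The third data set shows a coefficient of 0 for a variable that is fully determined by the other, because the relationship is not linear.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of the deviations from each mean.
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp_xy / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]                       # illustrative data
r_pos = pearson_r(x, [2, 4, 6, 8, 10])    # y rises with x
r_neg = pearson_r(x, [10, 8, 6, 4, 2])    # y falls as x rises
r_zero = pearson_r(x, [4, 1, 0, 1, 4])    # y = (x - 3)^2, no linear trend

print(r_pos, r_neg, r_zero)
```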
Correlation is used:
- mainly to describe the relationship between variables
2 - Articles Related
3 - Assumptions
4 - Correlation does not imply causation
Correlation does not imply causation but correlations are useful because they can be used to assess:
5 - Type
There are several types of correlation coefficients, one for each combination of variable types:
- Pearson product-moment correlation coefficient (r) (when both variables are continuous)
- Point-biserial correlation (when one variable is continuous and one is dichotomous)
- Phi coefficient (when both variables are dichotomous)
- Spearman rank correlation (when both variables are ordinal)
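The four coefficients above are closely related, which the following sketch tries to show in plain Python. It relies on two standard identities: the point-biserial and phi coefficients are numerically equal to Pearson's r when the dichotomous variables are coded 0/1, and Spearman's coefficient is Pearson's r computed on ranks. All function names, variable names, and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp_xy / math.sqrt(ss_x * ss_y)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Continuous vs dichotomous (0/1 coded): point-biserial correlation.
hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 1, 0, 1, 1]
r_pb = pearson_r(hours, passed)

# Dichotomous vs dichotomous (both 0/1 coded): phi coefficient.
group_a = [1, 1, 1, 0, 0, 0, 1, 0]
group_b = [1, 1, 0, 0, 0, 1, 1, 0]
phi = pearson_r(group_a, group_b)

# Monotonic but non-linear data: Spearman reaches 1, Pearson does not.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
rho = spearman_rho(x, y)
r_linear = pearson_r(x, y)

print(r_pb, phi, rho, r_linear)
```

In practice a statistics library would normally be used instead of hand-written helpers; the sketch only makes the relationship between the coefficients visible.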
6 - Venn diagrams
A Venn diagram can represent the correlation between two variables X and Y.
The diagram shows:
- All the variance in X
- All the variance in Y
- The overlap (the covariance). The overlap can also be expressed as the sum of squares for the model.
The degree to which X and Y correlate is represented by the degree to which these two variance circles overlap. The overlap is the systematic variance in Y that is explained by X. Expressed as a proportion of the total variance in Y, it equals the squared correlation coefficient (r²).
The correlation approaches:
- one for a high degree of overlap
- zero for no overlap
The residual is the unexplained variance in Y. Some of the variance in Y is explained by the model. Some of it is unexplained: that is the residual.
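The split between explained and unexplained variance can be checked numerically. The sketch below assumes the model is a simple least-squares line of Y on X, which the text does not state explicitly. It shows that the total sum of squares in Y splits into a model part and a residual part, and that the model share equals the squared correlation. The function name `variance_partition` and the data are illustrative only.

```python
import math

def variance_partition(x, y):
    """Split the variation in y into model and residual sums of squares,
    using a simple least-squares regression line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_total = sum((b - mean_y) ** 2 for b in y)       # all the variation in Y
    slope = sp_xy / ss_x
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * a for a in x]
    # Variation in Y reproduced by the fitted line (the overlap).
    ss_model = sum((p - mean_y) ** 2 for p in predicted)
    # Variation in Y left unexplained (the residual).
    ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))
    r = sp_xy / math.sqrt(ss_x * ss_total)
    return ss_total, ss_model, ss_residual, r

x = [1, 2, 3, 4, 5]   # illustrative data
y = [2, 4, 5, 4, 5]
ss_total, ss_model, ss_residual, r = variance_partition(x, y)

print("explained share:", ss_model / ss_total)
print("r squared:      ", r ** 2)
print("residual share: ", ss_residual / ss_total)
```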