Statistics - (F-Statistic|F-test|F-ratio)

1 - About

The test statistic of an ANOVA under null hypothesis significance testing (NHST) is the F statistic, also called F-test or F-ratio.

It's what you observe in the numerator relative to what you would expect just due to chance in the denominator.

In a regression setting, the F statistic compares the model with its predictors against the model we would obtain if we dropped all the predictors.

A lot of people refer to it as an F-ratio because it's the variance between the groups relative to the variance within the groups.

With the F-ratio, an ANOVA tells you whether:

  • there is an effect overall
  • there is a significant difference somewhere between the groups (but not where)

The F statistic follows one of a family of F-distributions; which one depends on the two degrees of freedom (numerator and denominator).

The F statistic tests the null hypothesis that none of the predictors has any effect. Rejecting that null means concluding that *some* predictor has an effect, not that *all* of them do.

3 - Formulas

<MATH> \begin{array}{rrl} \text{F-ratio (F-test)} & = & \frac{\href{variance}{variance}\text{ between the groups}}{\href{variance}{variance}\text{ within the groups}} \\ & = & \frac{\text{systematic variance}}{\text{unsystematic variance}} \\ & = & \frac{\text{good variance}}{\text{bad variance}} \\ \end{array} </MATH>

The variance between the groups, or across the groups, is a good variance because it is created by our independent variable (the experimental treatment).

The variance within the groups measures how much the scores of individuals differ inside one group. We don't know why they differ: that variance is unsystematic, it's just what we would expect due to chance.

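A ratio close to one is what chance alone would produce. With reasonably large samples, a ratio of two or three is usually enough to get a significant effect with a p-value less than 0.05, but the exact cut-off depends on the degrees of freedom of the numerator and of the denominator.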
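To get a feel for the ratios that chance alone produces, the following is a small simulation sketch using only the Python standard library. Everything in it is an assumption made for illustration: the helper `f_ratio`, the design (3 groups of 5 subjects, all drawn from the same normal distribution, so there is no real effect) and the number of simulations. The value 3.89 is the 5% critical value for 2 and 12 degrees of freedom as found in standard F tables.

```python
import random
from statistics import mean

def f_ratio(groups):
    """F = MS between / MS within, assuming equal group sizes."""
    n_groups = len(groups)
    n = len(groups[0])
    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)  # valid because the groups have equal size
    ss_a = n * sum((m - grand_mean) ** 2 for m in group_means)
    ss_sa = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    return (ss_a / (n_groups - 1)) / (ss_sa / (n_groups * (n - 1)))

# Hypothetical design: no real effect, every score comes from the same population
N_GROUPS, N_PER_GROUP, N_SIMS = 3, 5, 2000
rng = random.Random(0)  # fixed seed so the run is reproducible

f_values = [
    f_ratio([[rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
             for _ in range(N_GROUPS)])
    for _ in range(N_SIMS)
]

CRITICAL_F = 3.89  # 5% critical value for df = (2, 12), from standard F tables
share_above_2 = sum(f > 2 for f in f_values) / N_SIMS
share_above_critical = sum(f > CRITICAL_F for f in f_values) / N_SIMS

print("mean F under the null:", round(mean(f_values), 2))
print("share of F > 2:", share_above_2)
print("share of F > 3.89:", share_above_critical)
```

In a design this small, a ratio of two is exceeded by chance much more often than 5% of the time, which is why the ratio has to be read against the degrees of freedom.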

<MATH> \begin{array}{rrl} \text{F-ratio (F-test)} & = & \frac{\href{variance}{variance}\text{ between the groups}}{\href{variance}{variance}\text{ within the groups}} \\ & = & \frac{{\href{variance}{\text{Mean Square (MS)}}}_{Between}}{{\href{variance}{\text{Mean Square (MS)}}}_{Within}} \\ & = & \frac{{\href{variance}{\text{MS}}}_{A}}{{\href{variance}{\text{MS}}}_{S/A}} \\ \end{array} </MATH>

where:

  • A is the independent variable (the manipulation)
  • S/A is the error term. It reads "S within A", that is, subjects within groups

Mean square (MS) is variance

<MATH> \begin{array}{rrl} \text{F-ratio (F-test)} & = & \frac{{\href{variance}{\text{MS}}}_{A}}{{\href{variance}{\text{MS}}}_{S/A}} \\ & = & \frac{\text{Sum of Squares (SS)}_A}{\href{degree_of_freedom}{df}_A} \cdot \frac{\href{degree_of_freedom}{df}_{S/A}}{\text{Sum of Squares (SS)}_{S/A}} \\ & = & \frac{SS_A}{\href{degree_of_freedom}{df}_A} \cdot \frac{\href{degree_of_freedom}{df}_{S/A}}{SS_{S/A}} \\ \end{array} </MATH>

where:

  • <math>SS_A</math> will compare each group mean to the grand mean to get the variance across groups.
  • <math>SS_{S/A}</math> will look at each individual within a group and see how much they differ from their group mean.
  • The <math>\href{degree_of_freedom}{df}_A</math> is the number of groups minus one.
  • The <math>\href{degree_of_freedom}{df}_{S/A}</math> is the number of groups multiplied by (the number of subjects in a group minus one).
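The chain of equalities above can be sketched in a few lines of Python. This is a minimal illustration assuming equal group sizes; the function name `f_ratio_from_ss` and the input numbers are hypothetical and not part of the original page.

```python
def f_ratio_from_ss(ss_a, ss_sa, n_groups, n_per_group):
    """Return F = MS_A / MS_S/A for a one-way design with equal group sizes."""
    df_a = n_groups - 1                    # number of groups minus one
    df_sa = n_groups * (n_per_group - 1)   # groups * (subjects per group - 1)
    ms_a = ss_a / df_a                     # mean square (variance) between groups
    ms_sa = ss_sa / df_sa                  # mean square (variance) within groups
    return ms_a / ms_sa

# Hypothetical sums of squares for 3 groups of 3 subjects
f_value = f_ratio_from_ss(ss_a=24.0, ss_sa=6.0, n_groups=3, n_per_group=3)
print(f_value)  # 12.0
```

Here MS_A is 24 / 2 = 12 and MS_S/A is 6 / 6 = 1, so the ratio is 12.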

<MATH> \begin{array}{rrl} SS_A & = & n \sum_{j=1}^{N}(Y_j - Y_T)^2 \end{array} </MATH>

where:

  • <math>N</math> is the number of groups
  • <math>n</math> is the number of subjects in each group (if the groups are very large, we have to take that into account)

  • <math>Y_j</math> is a group mean
  • <math>Y_T</math> is the grand mean
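As a sketch of the formula above, the sum of squares between the groups could be computed as follows. The function name `ss_between` and the scores are hypothetical; the code assumes that every group has the same number of subjects, as the formula does.

```python
from statistics import mean

def ss_between(groups):
    """SS_A = n * sum over groups of (group mean - grand mean)^2."""
    n = len(groups[0])                                   # subjects per group
    group_means = [mean(g) for g in groups]              # Y_j
    grand_mean = mean([y for g in groups for y in g])    # Y_T
    return n * sum((m - grand_mean) ** 2 for m in group_means)

# Hypothetical scores: 3 groups of 3 subjects
scores = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [5.0, 6.0, 7.0]]
ss_a = ss_between(scores)
print(ss_a)  # 24.0
```

The group means are 2, 4 and 6 and the grand mean is 4, so SS_A = 3 × (4 + 0 + 4) = 24.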

<MATH> \begin{array}{rrl} SS_{S/A} & = & \sum_{j=1}^{N}\sum_{i=1}^{n}(Y_{ij} - Y_j)^2 \end{array} </MATH>

where:

  • <math>N</math> is the number of groups
  • <math>n</math> is the number of subjects in each group
  • <math>Y_{ij}</math> is an individual score
  • <math>Y_j</math> is a group mean
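The within-groups sum of squares can be sketched the same way. The names `ss_within` and `ss_total` and the scores are hypothetical. The sketch also computes the total sum of squares around the grand mean: a standard result that this page does not state is that the total splits into the between part and the within part, so the difference between the two printed values is the sum of squares between the groups.

```python
from statistics import mean

def ss_within(groups):
    """SS_S/A = sum over groups and subjects of (score - group mean)^2."""
    total = 0.0
    for g in groups:
        group_mean = mean(g)                             # Y_j
        total += sum((y - group_mean) ** 2 for y in g)   # deviations inside the group
    return total

def ss_total(groups):
    """Sum of squared deviations of every score from the grand mean."""
    all_scores = [y for g in groups for y in g]
    grand_mean = mean(all_scores)                        # Y_T
    return sum((y - grand_mean) ** 2 for y in all_scores)

# Hypothetical scores: 3 groups of 3 subjects
scores = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [5.0, 6.0, 7.0]]
ss_sa = ss_within(scores)
ss_t = ss_total(scores)
print(ss_sa)  # 6.0
print(ss_t)   # 30.0
```

Each group contributes 1 + 0 + 1 = 2 to the within part, and 30 - 6 = 24 is left for the between part.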
data_mining/f_statistic.txt · Last modified: 2017/11/17 10:56 by gerardnico