Statistics - (dependent|paired sample) t-test

1 - About

A dependent t-test is appropriate when:

  • we have the same people measured twice.
  • the same subjects are being compared (e.g. a pre/post design)
  • or two samples are matched at the level of individual subjects (allowing for a difference score to be calculated)

The idea is that one measure is dependent on the other: they are related.

Is the difference between means a significant difference, or is it just due to chance because of sampling error?

If the mean of these difference scores is significantly different from zero, we have a significant change.

3 - Assumption

The distribution of the difference scores is approximately normal.

4 - Calculation

4.1 - Analysis

A thorough analysis will include:

  • the t-value
  • the p-value
  • an effect size (Cohen's d)
  • a confidence interval

4.2 - Difference score

The same subjects or cases are measured twice. We can calculate a difference score for each individual subject.

<MATH> \begin{array}{rrl} \text{Difference Score} & = & \href{raw_score}{X}_1 - \href{raw_score}{X}_2 \\ \end{array} </MATH>

where:

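  • <math>\href{raw_score}{X}_1</math> is the subject's score on the first measurement (or in group 1 for matched samples)
  • <math>\href{raw_score}{X}_2</math> is the same subject's score on the second measurement (or the matched subject's score in group 2)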
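
As a minimal sketch of this step, the difference scores can be computed pair by pair. The data and the function name `difference_scores` are hypothetical and only serve as an illustration; the two lists are assumed to list the subjects in the same order.

```python
# Hypothetical scores: the same five subjects measured twice.
x1 = [12, 15, 10, 17, 13]  # first measurement
x2 = [10, 12, 9, 14, 11]   # second measurement


def difference_scores(first, second):
    """Return X1 - X2 for each subject (both lists in the same subject order)."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    return [a - b for a, b in zip(first, second)]


diffs = difference_scores(x1, x2)
print(diffs)
```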

4.2.1 - t-value

See t-value for mean <MATH> \begin{array}{rrl} \href{t-value#mean}{\text{t-value}} & = & \frac{\href{mean}{\text{Mean of the Difference Scores}}}{\href{Standard_Error#mean}{\text{Standard Error of the Difference Scores}}} \\ \end{array} </MATH>
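
The formula above can be sketched with the Python standard library. The data and the function name `paired_t_value` are hypothetical; the standard error is assumed to be the sample standard deviation of the difference scores divided by the square root of the number of pairs.

```python
import math
import statistics

# Hypothetical scores: the same five subjects measured twice.
x1 = [12, 15, 10, 17, 13]
x2 = [10, 12, 9, 14, 11]


def paired_t_value(first, second):
    """t = mean of the difference scores / standard error of the difference scores."""
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample SD (n - 1 in the denominator)
    se_diff = sd_diff / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean_diff / se_diff


t_value = paired_t_value(x1, x2)
print(round(t_value, 2))
```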

4.2.2 - p-value

The p-value will be based on:

  • the above t-value and which t-distribution we're in (determined by the degrees of freedom, N - 1 for N pairs of scores)
  • whether we're doing a non-directional (two-tailed) or directional (one-tailed) test.
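
To illustrate how both points enter the calculation, here is a rough sketch that approximates the p-value by integrating the density of the t-distribution numerically (Simpson's rule). The function names are hypothetical, and the directional branch assumes the effect lies in the predicted direction. In practice a statistics library would normally be used instead of hand-rolled integration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_p_value(t_value, df, two_sided=True, steps=10_000):
    """Approximate p-value; steps must be even for Simpson's rule."""
    upper = abs(t_value)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3    # P(0 <= T <= |t|)
    one_tail = 0.5 - central_area   # P(T >= |t|)
    # Non-directional test: both tails. Directional test: one tail only.
    return 2 * one_tail if two_sided else one_tail


# Hypothetical example: t-value of 5.88 from 5 pairs, so df = 5 - 1 = 4.
p_two_sided = t_p_value(5.88, df=4)
p_one_sided = t_p_value(5.88, df=4, two_sided=False)
print(p_two_sided, p_one_sided)
```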

4.2.3 - Effect size

The most appropriate and the most common estimate of effect size is Cohen's d.

Because null hypothesis significance testing (NHST) is biased by sample size, we should supplement the analysis with an estimate of effect size.

Note that the effect size is calculated differently than in regression.

Cohen's d is an intuitive measure that tells us, in standard deviation units, how much:

  • one measurement differs from another (in a dependent t-test)
  • one mean differs from another (in an independent t-test)

<MATH> \begin{array}{rrl} \text{Cohen's d} & = & \frac{\href{mean}{\text{Mean of the Difference Scores}}}{\href{Standard Deviation}{\text{Standard deviation of the Difference Scores}}} \\ \end{array} </MATH>

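As you can remark, the denominator is the standard deviation of the difference scores, not the standard error used for the t-value.

Why? Because the standard error shrinks as the sample size grows, whereas the standard deviation does not depend on the sample size.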

A Cohen's d of 1 means that:

  • scores went up by a whole standard deviation.
  • it's a strong effect.

By convention, 0.8 is also a strong effect.
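
A minimal sketch of Cohen's d for a dependent design, using hypothetical data and a hypothetical function name `cohens_d_paired`. The denominator here is the sample standard deviation of the difference scores.

```python
import statistics

# Hypothetical scores: the same five subjects measured twice.
x1 = [12, 15, 10, 17, 13]
x2 = [10, 12, 9, 14, 11]


def cohens_d_paired(first, second):
    """Mean of the difference scores divided by their standard deviation."""
    diffs = [a - b for a, b in zip(first, second)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


d = cohens_d_paired(x1, x2)
print(round(d, 2))
```

With this made-up data d is well above 1, so the scores differ by more than one standard deviation of the difference scores.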

4.2.4 - Confidence Interval

We can also get an interval estimate around the mean of the difference scores rather than just a point estimate.

We take the mean of the difference scores and put an upper bound and a lower bound around it. It's the same method as for sample means or regression coefficients.

<MATH> \begin{array}{rrlcl} \text{Upper bound} & = & \href{Mean}{\text{Mean of the difference scores}} & + & \href{#t-value}{\text{t-value}} \times \href{Standard_Error}{\text{Standard Error}} \\ \text{Lower bound} & = & \href{Mean}{\text{Mean of the difference scores}} & - & \href{#t-value}{\text{t-value}} \times \href{Standard_Error}{\text{Standard Error}} \end{array} </MATH>

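The t-value used here is the critical value of the t-distribution, not the t-value calculated from the data. Its exact value depends on:

  • how confident we want to be (for example, a 95% confidence interval versus a 90% confidence interval)
  • which sampling distribution of t we use (there is a whole family of t-distributions), which depends on the number of subjects in the sample (degrees of freedom = N - 1).

When the interval does not include zero, the difference is significant in terms of null hypothesis significance testing (for a non-directional test at the matching alpha level, for example 0.05 for a 95% interval).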
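
The bounds can be sketched as follows. The data and the function name `paired_confidence_interval` are hypothetical, and the critical t-value is passed in as a parameter: 2.776 is the value usually tabulated for a two-sided 95% interval with 4 degrees of freedom.

```python
import math
import statistics

# Hypothetical scores: the same five subjects measured twice.
x1 = [12, 15, 10, 17, 13]
x2 = [10, 12, 9, 14, 11]


def paired_confidence_interval(first, second, critical_t):
    """Return (lower, upper) bounds around the mean of the difference scores."""
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    se_diff = statistics.stdev(diffs) / math.sqrt(len(diffs))
    margin = critical_t * se_diff
    return mean_diff - margin, mean_diff + margin


# 5 pairs give df = 4; critical t for 95% confidence taken from a t-table.
lower, upper = paired_confidence_interval(x1, x2, critical_t=2.776)
print(round(lower, 2), round(upper, 2))
print("interval excludes zero:", lower > 0 or upper < 0)
```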

4.3 - Simulation

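Build a sampling distribution of the mean difference by simulation.

Pseudo Code: Loop until the plotted distribution is smooth and approximately normal

  • Take the two samples
  • Shuffle the observations between the two samples (for paired data, swap the two scores within each subject at random)
  • Calculate and plot the mean of the difference scores

Once the distribution is built, calculate the probability of observing a mean difference at least as extreme as the one in the actual data.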
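
The pseudo code above can be sketched as a small permutation test. This is one possible reading, not a definitive implementation: it assumes that shuffling means swapping the two scores of a subject at random (which flips the sign of that subject's difference score), it uses a fixed number of iterations instead of looping until the plot looks normal, and it collects the simulated means in a list instead of plotting them. The data, function name and parameters are hypothetical.

```python
import random

# Hypothetical scores: the same five subjects measured twice.
x1 = [12, 15, 10, 17, 13]
x2 = [10, 12, 9, 14, 11]


def paired_permutation_test(first, second, n_iter=10_000, seed=0):
    """Return (observed mean difference, simulated means, two-sided p-value)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = [a - b for a, b in zip(first, second)]
    observed = sum(diffs) / len(diffs)
    simulated_means = []
    for _ in range(n_iter):
        # Swapping a subject's two scores flips the sign of its difference.
        shuffled = [d if rng.random() < 0.5 else -d for d in diffs]
        simulated_means.append(sum(shuffled) / len(shuffled))
    # Share of simulated means at least as extreme as the observed one.
    extreme = sum(1 for m in simulated_means if abs(m) >= abs(observed))
    return observed, simulated_means, extreme / n_iter


observed, simulated_means, p_value = paired_permutation_test(x1, x2)
print(observed, p_value)
```

The list `simulated_means` can be passed to any plotting tool to draw the histogram of the simulated sampling distribution.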

5 - Documentation / Reference

data_mining/dependent_t-test.txt · Last modified: 2015/09/21 10:04 by gerardnico