Statistics - Confounding (factor|variable) - (Confound|Confounder)

1 - About

In statistics, a confounding variable (also confounding factor, a confound, or confounder) is an extraneous variable in a statistical model that correlates (directly or inversely) with both the dependent variable and the independent variable.

A spurious relationship is a perceived relationship between an independent variable and a dependent variable that has been estimated incorrectly because the estimate fails to account for a confounding factor.

3 - How do you fix or even detect confounding?

One way is to randomize the confounding element so that it’s effect is not influencing the element under investigation.

4 - Example

4.1 - Pet Store debacle

Sun (J2EE) vs Microsoft (ASP) performance on the Java Pet Store. The confounding in the Pet Store comparison was that it was impossible to really compare the two. They were different systems, had:

  • different URLs,
  • different form elements,
  • different backend databases ….

The comparison claimed to measure X number of “users”, but didn’t cover single page execution, database configurations used, system level tunings. Of course none of that is useful because even that just confounds shit even further.

For the Pet Store experiment to be meaningful it should have tried to keep every thing it could the same except for the minimum of different elements. The same:

  • databases,
  • operating systems,
  • file layouts,
  • forms,
  • HTML tags,
  • logic flow,
  • everything possible.

The only different element should have been the ASP vs. JSP and the Servlets vs. VB hackery.

5 - Documentation / Reference

data_mining/confounding.txt · Last modified: 2015/08/24 10:34 by gerardnico