Data Mining - Dimensionality (number of variables, parameters) (P)

1 - About

Not to be confused with d, the model size. You may have 1000 attributes (p=1000) in your sample, but after feature selection, for instance, your model may use only a handful (d=5).

In physics and mathematics, the dimension of a mathematical space (or object) is informally defined as the minimum number of coordinates needed to specify any point within it (i.e. the number of variables needed to define an outcome).

2 - Article Related

3 - Curse

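In high dimensions, it's really difficult to stay local: data points become sparse, so a neighborhood that contains enough points to be useful has to reach far away from the point of interest.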
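A minimal sketch of why locality breaks down, under an assumption that is not in the original text: the data are spread uniformly over the unit hypercube [0, 1]^p. In that case, a sub-cube that captures a fraction r of the points needs an edge of length r^(1/p). The function name `edge_length` is hypothetical and only serves the illustration.

```python
def edge_length(fraction: float, p: int) -> float:
    """Edge length of a sub-cube holding `fraction` of the points,
    assuming the points are uniform on the unit hypercube [0, 1]^p."""
    return fraction ** (1.0 / p)


# Capture 10% of the data in 1, 2, 10 and 100 dimensions.
for p in (1, 2, 10, 100):
    print(f"p={p:>3}  edge length needed = {edge_length(0.10, p):.3f}")
```

With p=1 the neighborhood covers 10% of the range of the single variable. With p=10 it already has to cover roughly 80% of the range of every variable, which is hardly "local" anymore.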

See this interactive app in R Shiny on the Curse of Dimensionality.

Circle example: a circle inscribed in a square fills up most of the area of the square. With a radius of 1 (and therefore a square of side 2), it takes up exactly <math>\pi</math> out of 4, which is about 78.5%. In three dimensions we have a sphere inscribed in a cube, and the ratio of sphere volume to cube volume is smaller: <math>\frac{4\pi}{3}</math> out of a total of 8, which is just over 52%. The ratio keeps shrinking as the dimension grows.
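The circle and sphere figures can be reproduced, and extended to higher dimensions, with the standard formula for the volume of a unit ball in p dimensions, <math>\pi^{p/2} / \Gamma(p/2 + 1)</math>. The sketch below assumes a ball of radius 1 inscribed in the cube [-1, 1]^p. The function name `ball_to_cube_ratio` is hypothetical.

```python
import math


def ball_to_cube_ratio(p: int) -> float:
    """Volume of the unit p-ball divided by the volume of the cube [-1, 1]^p."""
    ball_volume = math.pi ** (p / 2) / math.gamma(p / 2 + 1)
    cube_volume = 2.0 ** p
    return ball_volume / cube_volume


# p=2 is the circle in the square, p=3 the sphere in the cube.
for p in (2, 3, 5, 10):
    print(f"p={p:>2}  ratio = {ball_to_cube_ratio(p):.4f}")
```

For p=2 and p=3 this gives π/4 and π/6, matching the two percentages above. By p=10 the ball occupies well under 1% of the cube: almost all of the volume sits in the corners, far from the center.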

4 - Documentation / Reference

data_mining/dimension.txt · Last modified: 2016/02/29 11:37 by gerardnico