Data Mining - Decision boundary Visualization

1 - About

Classifiers create boundaries in instance space. Different classifiers have different biases. You can explore them by visualizing the classification boundaries.

In Weka, boundary visualization is restricted to numeric attributes and to 2D plots.

3 - Example

  • Logistic Regression produces a linear boundary with a gradual transition from one color to the other. Logistic regression is a sophisticated way of choosing a linear decision boundary for classification.
  • Support Vector Machine: the resulting plot has no areas of pure color.
  • Random Forest: the boundary has a checkered pattern with slightly fuzzy edges.
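Plots like these are produced by classifying every point of a fine 2D grid and coloring it by the predicted class. A minimal sketch of that idea, using a hypothetical 1-NN classifier and made-up training points in place of a Weka scheme:

```python
# Boundary visualization sketch: classify every cell of a 2D grid and
# record the predicted class, as a boundary plot does with colors.
# The 1-NN classifier and the two training points are illustrative only.

def predict_1nn(train, point):
    """Class label of the nearest training instance (squared distance)."""
    def dist2(t):
        (px, py), _ = t
        return (px - point[0]) ** 2 + (py - point[1]) ** 2
    return min(train, key=dist2)[1]

def boundary_grid(train, steps=20, lo=0.0, hi=1.0):
    """steps x steps grid of predicted class labels over [lo, hi]^2."""
    cell = (hi - lo) / (steps - 1)
    return [[predict_1nn(train, (lo + col * cell, lo + row * cell))
             for col in range(steps)]
            for row in range(steps)]

train = [((0.2, 0.2), 'a'), ((0.8, 0.8), 'b')]
grid = boundary_grid(train)
for row in grid:
    print(''.join(row))   # an 'a' region and a 'b' region split by x + y = 1
```

Replacing the grid's text labels with colors, and the toy classifier with a real one, gives exactly the kind of picture described above.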
Algorithm                 Boundary Shape
Logistic Regression       strictly linear
kNN                       piecewise linear
Support Vector Machine    piecewise linear
Decision Tree             non-linear

The kNN decision boundary in any localized region of instance space is linear, determined by the nearest neighbors of the different classes in that region. Because the relevant neighbors change as you move around instance space, the overall boundary is a set of linear segments joined together.
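This can be checked numerically: wherever the same pair of neighbors compete, the 1-NN boundary lies exactly on their perpendicular bisector, and a different bisector takes over elsewhere. A sketch with made-up points:

```python
# Hypothetical 1-NN example: three training points, two classes. The a/b
# boundary follows the bisector of A and B near y = 0 (the line x = 0.5)
# and the bisector of A and C near y = 1 (the line x + y = 1).

def nn_class(train, p):
    return min(train,
               key=lambda t: (t[0][0] - p[0]) ** 2 + (t[0][1] - p[1]) ** 2)[1]

train = [((0.0, 0.0), 'a'),   # A
         ((1.0, 0.0), 'b'),   # B
         ((1.0, 1.0), 'b')]   # C

def boundary_x(y, steps=10000):
    """Smallest x in [0, 1] classified 'b' at height y (left-to-right scan)."""
    for i in range(steps + 1):
        x = i / steps
        if nn_class(train, (x, y)) == 'b':
            return x

print(boundary_x(0.0))  # ~0.5: on the bisector of A and B
print(boundary_x(0.9))  # ~0.1: on the bisector of A and C (x = 1 - y)
```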

Support vector machines also produce piecewise linear boundaries.

C4.5 produces decision trees, which create non-linear boundaries in instance space.
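The non-linearity is easy to verify: a tree's axis-parallel threshold tests can label three collinear points a, b, a, which no single linear boundary can do. A toy tree with made-up thresholds:

```python
# A depth-2 decision tree with axis-parallel splits, of the kind C4.5
# builds on numeric attributes. Thresholds are hypothetical.

def tree_classify(x, y):
    if x < 0.5:
        return 'a'
    elif y < 0.75:
        return 'b'
    else:
        return 'a'

# Three collinear points (on the line y = x) get the labels a, b, a:
labels = [tree_classify(p, p) for p in (0.25, 0.6, 0.9)]
print(labels)  # ['a', 'b', 'a'] -- impossible for any single linear boundary
```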

Logistic regression is a sophisticated way of producing a good linear decision boundary, which is necessarily simple and therefore less likely to overfit.
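As a sketch of where such a boundary comes from, here is a minimal logistic regression trained by stochastic gradient descent on made-up, linearly separable data; the learned boundary is the single line w·x + b = 0 (this illustrates the idea, not Weka's actual optimizer):

```python
import math

def train_logreg(data, lr=0.5, epochs=2000):
    """Fit weights w and bias b by SGD on the logistic (cross-entropy) loss."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            p = 1.0 / (1.0 + math.exp(-(w[0] * x1 + w[1] * x2 + b)))
            g = p - y                 # gradient of the loss w.r.t. the logit
            w[0] -= lr * g * x1
            w[1] -= lr * g * x2
            b -= lr * g
    return w, b

# Hypothetical separable data: class is determined by the sign of x1.
data = [((-1.0, 0.5), 0), ((-2.0, 1.0), 0), ((1.0, -0.5), 1), ((2.0, 0.0), 1)]
w, b = train_logreg(data)

# The decision boundary is the line w[0]*x1 + w[1]*x2 + b = 0.
predict = lambda x: 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
print([predict(x) for x, _ in data])  # matches the training labels
```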

Support vector machines also produce piecewise linear boundaries, but are resilient against overfitting because the boundary depends on only a small number of support vectors.

The Logistic classifier (and also meta.ClassificationViaRegression) calculates a linear decision boundary.

The boosting algorithm AdaBoostM1 produces a checkered pattern with crisp boundaries.

4 - Documentation / Reference

data_mining/boundary.txt · Last modified: 2014/02/10 09:29 by gerardnico