Data Mining - Ensemble Learning (meta set)

1 - About

Ensemble learning combines multiple models into a single ensemble that is then used for prediction.

An ensemble makes a committee (collective intelligence) decision across different classifier algorithms.

Having different classifiers (also known as experts or base learners) with different perspectives and letting them vote is often a very effective and robust way of making good decisions.
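Why voting helps can be sketched with a small calculation. The function below is only an illustration under an idealized assumption that is not stated in the text above: every classifier is correct with the same probability p, and the classifiers err independently of each other. The name majority_vote_accuracy is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n voters is correct.

    Assumption: each voter is right with probability p, independently of
    the others. n must be odd so that ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    needed = n // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1)
    )


for n in (1, 3, 11):
    print(n, round(majority_vote_accuracy(0.7, n), 3))
```

With p = 0.7, a single classifier is right 70% of the time, while three independent voters already reach 0.784. Real classifiers trained on the same data are never fully independent, so the actual gain is smaller, which is why the methods below work to make the learners diverse.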

  • Bagging randomizes the training data: it obtains classifiers from different samples of the training set.
  • Random forests randomize the algorithm. How an algorithm is randomized depends on the algorithm; here it is the decision tree. A decision tree normally selects the best attribute to split on. This step is randomized by not necessarily selecting the very best attribute, but taking a few of the best options and picking randomly among them. (Breiman's random forests use a closely related scheme: each split considers only a random subset of the attributes and takes the best of those.) Generally, if you randomize decision trees and bag the result, you get better performance.
  • Boosting forces new classifiers to focus on the errors produced by earlier ones.
  • Stacking uses a “meta learner” to combine the predictions of “base learners.”
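The bagging item above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the base learner is a one-feature decision stump chosen for brevity, and the names train_stump, bagging and vote as well as the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def train_stump(data):
    """Fit a one-feature threshold classifier (a decision stump).

    data is a list of (x, label) pairs with labels 0 or 1. Every observed x
    is tried as a threshold, in both directions, and the rule with the
    fewest training errors is kept.
    """
    best = None
    for threshold, _ in data:
        for above in (0, 1):  # label predicted when x > threshold
            errors = sum(
                (above if x > threshold else 1 - above) != y for x, y in data
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    _, threshold, above = best
    return lambda x: above if x > threshold else 1 - above


def bagging(data, n_models, rng):
    """Train n_models stumps, each on a bootstrap sample of data."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # draw with replacement
        models.append(train_stump(sample))
    return models


def vote(models, x):
    """Return the label predicted by most models."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


rng = random.Random(0)
# Toy data: the label is 1 when x > 5, plus one noisy point at x = 2.
data = [(x, int(x > 5)) for x in range(10)] + [(2, 1)]
models = bagging(data, n_models=25, rng=rng)
print(vote(models, 1), vote(models, 8))
```

Each stump sees a slightly different training set, so the learned thresholds differ, and the vote smooths out the effect of the noisy point.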

Diversity helps, especially when the learners (models) are unstable, that is, when small changes in the training data can produce large changes in the learned model.

Create diversity by

  • Bagging: resampling the training set
  • Random forests: alternative branches in decision trees
  • Boosting: focus on where the existing model makes errors
  • Stacking: combine results using another learner (instead of voting)
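How boosting focuses on errors can be sketched with instance weights. The example below follows the AdaBoost scheme, which is one common boosting algorithm; the text above does not name a specific one, so treat that choice, the stump base learner, the function names and the toy data as assumptions.

```python
import math


def weighted_stump(xs, ys, w):
    """Return (error, threshold, sign) of the stump with least weighted error.

    The stump predicts `sign` when x > threshold and `-sign` otherwise.
    """
    best = None
    for t in xs:
        for sign in (1, -1):
            err = sum(
                wi
                for x, y, wi in zip(xs, ys, w)
                if (sign if x > t else -sign) != y
            )
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best


def adaboost(xs, ys, rounds):
    """Sketch of AdaBoost; returns a list of (alpha, threshold, sign)."""
    n = len(xs)
    w = [1.0 / n] * n  # start with uniform instance weights
    ensemble = []
    for _ in range(rounds):
        err, t, sign = weighted_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # voting weight of this stump
        ensemble.append((alpha, t, sign))
        # Raise the weight of misclassified instances, lower the others.
        w = [
            wi * math.exp(-alpha * y * (sign if x > t else -sign))
            for x, y, wi in zip(xs, ys, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps; returns +1 or -1."""
    score = sum(a * (s if x > t else -s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump can fit: +1 only in the middle of the range.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [-1, -1, -1, 1, 1, 1, 1, -1, -1, -1]
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

On this toy data the best single stump still misclassifies three of the ten points. Because each round reweights the instances the previous stumps got wrong, the three-stump weighted vote classifies all ten training points correctly.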

3 - Advantage / Disadvantage

The output of an ensemble is hard to analyze, because the decision is spread across many models, but its predictive performance is usually very good.

4 - Documentation / Reference

data_mining/ensemble.txt · Last modified: 2013/10/13 15:24 by gerardnico