Data Mining - (Boosting|Gradient Boosting|Boosting trees)


About

Boosting forces new classifiers to focus on the errors produced by earlier ones. It works by aggressively reducing the training error.

Gradient Boosting is an algorithm based on an ensemble of decision trees, similar to random forests.

Instead of building trees from different random subsets, boosted trees take the error from the previous tree and use it to improve the next one.

It's an iterative algorithm. The idea is that you create a model and then look at the instances that are misclassified (the hard ones to classify). You put extra weight on those instances to build the training set for the next model in the iteration. This encourages the new model to become an “expert” for the instances that were misclassified by earlier models (a small numeric sketch of this reweighting step follows the list below).

Iterative: new models are influenced by performance of previously built ones

  • extra weight for instances that are misclassified (“hard” ones)
  • encourage new model to become an “expert” for instances misclassified by earlier models
  • Intuitive justification: committee members should complement each other’s expertise
  • Uses voting (or, for regression, averaging) but weights models according to their performance
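To make the reweighting concrete, here is a minimal, self-contained sketch of one AdaBoost.M1-style weight update. It is an illustration only, not Weka's implementation; the toy labels, predictions and the class name are made up.

  import java.util.Arrays;

  public class ReweightSketch {
      public static void main(String[] args) {
          int[] actual    = { 1, 0, 1, 1, 0 };      // true class labels (toy data)
          int[] predicted = { 1, 0, 0, 1, 1 };      // predictions of the current model
          double[] w = { 0.2, 0.2, 0.2, 0.2, 0.2 }; // current instance weights (sum to 1)

          // Weighted error of the current model
          double err = 0.0;
          for (int i = 0; i < w.length; i++)
              if (actual[i] != predicted[i]) err += w[i];

          // AdaBoost.M1: multiply the weight of correctly classified instances by
          // err / (1 - err), then renormalize; relatively, the misclassified
          // ("hard") instances gain weight for the next round.
          double beta = err / (1.0 - err);
          double sum = 0.0;
          for (int i = 0; i < w.length; i++) {
              if (actual[i] == predicted[i]) w[i] *= beta;
              sum += w[i];
          }
          for (int i = 0; i < w.length; i++) w[i] /= sum;

          // In the final vote, this model is weighted by log(1 / beta).
          double modelWeight = Math.log(1.0 / beta);
          System.out.println(Arrays.toString(w) + "  model vote weight = " + modelWeight);
      }
  }

Running the sketch shows the misclassified instances ending up with larger weights (0.25) than the correctly classified ones (about 0.167), which is exactly the "expert on hard instances" effect described above.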

If there is no structure to the features or if you have a limited amount of time to spend on a problem, one should definitely consider boosted trees.

Boosting is sequential (not parallel).

Property

Boosting seems not to overfit (why boosting doesn't overfit).

Boosting

DecisionStump

In Weka: meta > AdaBoostM1. It's a standard and very good implementation. By default, AdaBoostM1 uses DecisionStump as its base learner.
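A minimal sketch of running AdaBoostM1 with its defaults from the Weka Java API (the same classifier is available in the Explorer GUI under meta > AdaBoostM1); the dataset path and class name are placeholders:

  import java.util.Random;
  import weka.classifiers.Evaluation;
  import weka.classifiers.meta.AdaBoostM1;
  import weka.core.Instances;
  import weka.core.converters.ConverterUtils.DataSource;

  public class AdaBoostExample {
      public static void main(String[] args) throws Exception {
          Instances data = DataSource.read("diabetes.arff");   // path is an assumption
          data.setClassIndex(data.numAttributes() - 1);

          // Defaults: DecisionStump as base learner, 10 boosting iterations
          AdaBoostM1 boost = new AdaBoostM1();

          Evaluation eval = new Evaluation(data);
          eval.crossValidateModel(boost, data, 10, new Random(1));
          System.out.println("Accuracy: " + eval.pctCorrect() + " %");
      }
  }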

Baseline

Boosting the baseline algorithm (No Rule) will produce the same classifier for practically any subset of the data. Combining these identical classifiers would give the same result as the baseline by itself.
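A short sketch of this claim, assuming "No Rule" refers to Weka's ZeroR baseline (the dataset path is again a placeholder): boosting ZeroR should report the same cross-validated accuracy as ZeroR on its own, because every boosted copy predicts the majority class.

  import java.util.Random;
  import weka.classifiers.Evaluation;
  import weka.classifiers.meta.AdaBoostM1;
  import weka.classifiers.rules.ZeroR;
  import weka.core.Instances;
  import weka.core.converters.ConverterUtils.DataSource;

  public class BoostedBaseline {
      public static void main(String[] args) throws Exception {
          Instances data = DataSource.read("diabetes.arff");
          data.setClassIndex(data.numAttributes() - 1);

          // Baseline on its own
          Evaluation base = new Evaluation(data);
          base.crossValidateModel(new ZeroR(), data, 10, new Random(1));

          // Baseline wrapped in AdaBoostM1
          AdaBoostM1 boost = new AdaBoostM1();
          boost.setClassifier(new ZeroR());
          Evaluation boosted = new Evaluation(data);
          boosted.crossValidateModel(boost, data, 10, new Random(1));

          System.out.println("ZeroR alone   : " + base.pctCorrect() + " %");
          System.out.println("Boosted ZeroR : " + boosted.pctCorrect() + " %");
      }
  }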

Performance

The default configuration of AdaBoostM1 is 10 boosting iterations using the DecisionStump classifier.

Performance might improve if:

  • 100 iterations were used instead. See below.
  • you kept to 10 iterations but used J48 instead of DecisionStump (see the configuration sketch after this list).
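Both variations can be set from the Weka Java API; this fragment is a drop-in change to the AdaBoostM1 object created in the earlier snippet (it can equally be done in the Explorer GUI):

  AdaBoostM1 boost = new AdaBoostM1();

  // (a) more boosting iterations with the default DecisionStump
  boost.setNumIterations(100);

  // (b) or keep 10 iterations but use a full decision tree (J48) as base learner
  // boost.setNumIterations(10);
  // boost.setClassifier(new weka.classifiers.trees.J48());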

Number of iterations

With boosting, the accuracy generally improves, up to an asymptote, as the number of iterations increases.

Example with AdaBoostM1 on the diabetes dataset.

AdaBoostM1 accuracy by number of boosting iterations:

numIterations   Accuracy (%)
1               71.875
10              74.349
20              75.2604
30              74.7396
40              74.7396
50              74.349
60              75.3906
70              75.1302
80              74.4792
90              74.8698
100             75.3906
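A sketch of how such numbers could be produced, assuming the diabetes.arff file shipped with Weka and 10-fold cross-validation (the exact evaluation setup behind the table is not stated):

  import java.util.Random;
  import weka.classifiers.Evaluation;
  import weka.classifiers.meta.AdaBoostM1;
  import weka.core.Instances;
  import weka.core.converters.ConverterUtils.DataSource;

  public class IterationSweep {
      public static void main(String[] args) throws Exception {
          Instances data = DataSource.read("diabetes.arff");
          data.setClassIndex(data.numAttributes() - 1);

          // Vary the number of boosting iterations and record accuracy
          for (int n = 10; n <= 100; n += 10) {
              AdaBoostM1 boost = new AdaBoostM1();
              boost.setNumIterations(n);
              Evaluation eval = new Evaluation(data);
              eval.crossValidateModel(boost, data, 10, new Random(1));
              System.out.printf("%3d iterations -> %.4f %% correct%n", n, eval.pctCorrect());
          }
      }
  }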
