Data Mining - (Boosting|Gradient Boosting|Boosting trees)
1 - About
Boosting forces new classifiers to focus on the errors produced by earlier ones: it works by aggressively reducing the training error.
Instead of building trees from different random subsets, boosted trees take the error from the previous tree and use it to improve the next one.
It is an iterative algorithm. The idea is that you create a model and then look at the instances that are misclassified (the hard ones to classify). You put extra weight on those instances to build the training set for the next model in the iteration. This encourages the new model to become an “expert” on the instances that earlier models misclassified.
Iterative: new models are influenced by the performance of previously built ones:
- extra weight for instances that are misclassified (“hard” ones)
- encourage new model to become an “expert” for instances misclassified by earlier models
- Intuitive justification: committee members should complement each other’s expertise
- Uses voting (or, for regression, averaging) but weights models according to their performance
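The reweight-and-vote loop described above can be sketched in plain Python. This is a minimal, self-contained illustration of AdaBoost-style boosting with decision stumps on a single numeric feature (the function names, toy data, and tie-breaking are invented for the example); it is not Weka's implementation:

```python
import math

def fit_stump(xs, ys, w):
    """Pick the weighted decision stump (threshold + polarity) on a
    single numeric feature that minimizes the weighted training error."""
    best = None
    for t in sorted(set(xs)):
        for pol in (1, -1):
            # The stump predicts `pol` when x >= t, otherwise `-pol`.
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if (pol if xi >= t else -pol) != yi)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best

def adaboost(xs, ys, rounds=10):
    n = len(xs)
    w = [1.0 / n] * n              # start with uniform instance weights
    ensemble = []                  # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, t, pol = fit_stump(xs, ys, w)
        if err >= 0.5:             # no better than chance: stop boosting
            break
        err = max(err, 1e-12)      # guard against log/division by zero
        alpha = 0.5 * math.log((1 - err) / err)   # model (vote) weight
        ensemble.append((alpha, t, pol))
        # Extra weight for the misclassified ("hard") instances, less
        # weight for the ones the new model already gets right.
        w = [wi * math.exp(-alpha * yi * (pol if xi >= t else -pol))
             for xi, yi, wi in zip(xs, ys, w)]
        s = sum(w)
        w = [wi / s for wi in w]   # renormalize to a distribution
    return ensemble

def predict(ensemble, x):
    # Performance-weighted vote of all committee members.
    score = sum(a * (pol if x >= t else -pol) for a, t, pol in ensemble)
    return 1 if score >= 0 else -1
```

On a toy set such as `xs = [1..8]` with labels `+,+,+,-,-,-,+,+`, no single stump can be accurate, but after a few boosting rounds the weighted vote classifies every training instance correctly.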
If there is no structure to the features or if you have a limited amount of time to spend on a problem, one should definitely consider boosted trees.
Boosting is sequential (not parallel).
2 - Articles Related
3 - Property
4 - Boosting
4.1 - DecisionStump
In Weka: meta > AdaBoostM1. It is a standard and very good implementation. By default, AdaBoostM1 uses DecisionStump as its base learner.
4.2 - Baseline
Boosting the baseline algorithm (No Rule) will produce the same classifier for practically any subset of the data. Combining these identical classifiers would give the same result as the baseline by itself.
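A minimal sketch of why (the learner name and toy data are invented for illustration): a baseline learner ignores the instance weights entirely, so every boosting round reproduces the same model and the committee vote adds nothing over the baseline:

```python
from collections import Counter

def zero_r(ys, weights):
    """Baseline 'no rule' learner: always predicts the majority class.
    It ignores the instance weights, so reweighting cannot steer it."""
    return Counter(ys).most_common(1)[0][0]

ys = [1, 1, 1, 1, -1, -1]             # toy labels; majority class is 1
weights = [1.0 / len(ys)] * len(ys)   # uniform starting weights

# Ten "boosting" rounds produce ten identical committee members ...
committee = [zero_r(ys, weights) for _ in range(10)]
assert len(set(committee)) == 1
# ... so any (weighted) vote over them equals the baseline by itself.
```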
5 - Performance
The default configuration of AdaBoostM1 is 10 boosting iterations using the DecisionStump classifier.
Performance might improve if:
- 100 iterations were used instead. See below.
- you kept to 10 iterations but used J48 instead of DecisionStump.
6 - Number of iterations
With boosting, the accuracy generally improves, up to an asymptote, as the number of iterations increases.
Example with AdaBoostM1 on the diabetes dataset.