Machine learning - Bootstrap aggregating (bagging)

1 - About

Bootstrap aggregating (bagging) is an ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression.

It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method.

Bagging:

  1. Produce several different training sets of the same size by sampling the original data with replacement,
  2. then build a model for each training set using the same machine learning scheme,
  3. and combine the predictions by voting for a nominal target or averaging for a numeric target.
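The three steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation; the `stump` learner and the toy data set are made up for the example.

```python
import random
import statistics

def bagging_fit(train_set, learner, n_models, seed=0):
    """Steps 1-2: draw bootstrap samples of the same size, with
    replacement, and fit one model per sample with the same scheme."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = rng.choices(train_set, k=len(train_set))  # with replacement
        models.append(learner(sample))
    return models

def bagging_predict(models, x, numeric=False):
    """Step 3: combine predictions -- majority vote for a nominal
    target, average for a numeric one."""
    preds = [m(x) for m in models]
    return statistics.fmean(preds) if numeric else statistics.mode(preds)

# Toy "learner": a one-threshold stump fitted on a bootstrap sample.
def stump(sample):
    threshold = statistics.fmean(x for x, _ in sample)
    return lambda x: 1 if x > threshold else 0

data = [(i, 1 if i > 5 else 0) for i in range(10)]
models = bagging_fit(data, stump, n_models=25)
print(bagging_predict(models, 9))  # majority vote over 25 stumps
```

Each stump sees a slightly different bootstrap sample, so its threshold differs; voting averages out those differences, which is the variance reduction bagging aims for.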

Bagging can be parallelized.
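It parallelizes easily because each model is fitted on its own bootstrap sample, with no dependence on the other fits. A minimal sketch using the standard library (the `mean_model` "learner" is a stand-in; note that in CPython, threads only speed this up if the learner releases the GIL, so a process pool is the usual choice for CPU-bound pure-Python learners):

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

def fit_one(seed, train_set, learner):
    """Draw one bootstrap sample and fit one model on it."""
    rng = random.Random(seed)  # per-task RNG: no state shared between fits
    sample = rng.choices(train_set, k=len(train_set))
    return learner(sample)

def bagging_fit_parallel(train_set, learner, n_models, workers=4):
    # Each fit is independent of the others, so the loop maps cleanly
    # onto a pool of workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: fit_one(s, train_set, learner),
                             range(n_models)))

data = list(range(100))
mean_model = lambda sample: statistics.fmean(sample)  # stand-in "learner"
models = bagging_fit_parallel(data, mean_model, n_models=8)
print(len(models))  # 8
```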

3 - Advantages / Disadvantages

Bagging is well suited to “unstable” learning schemes, i.e. those where a small change in the training data can produce a big change in the model.

Example: a decision tree is a very unstable scheme, whereas Naïve Bayes and instance-based learning are not (in Naïve Bayes, for instance, all attributes contribute independently).

4 - Replacement

In bagging, you sample the set “with replacement”, which means that the same instance may appear more than once in a sample.
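A quick illustration of sampling with replacement using the standard library (the data values are made up). Because each draw can pick any instance again, a bootstrap sample typically contains duplicates, and on average only about 63% (1 − 1/e) of the distinct instances appear in it:

```python
import random

rng = random.Random(42)
data = ["a", "b", "c", "d", "e"]
# A bootstrap sample is the same size as the original set but drawn
# with replacement, so an instance can appear more than once.
sample = rng.choices(data, k=len(data))
print(sample)
```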

5 - Implementation

5.1 - Weka

In the Weka classifier chooser: meta > Bagging (class weka.classifiers.meta.Bagging)

6 - Documentation / Reference

data_mining/bagging.txt · Last modified: 2017/11/16 23:05 by 162.158.59.230