Data Mining - Random forest

1 - About

Random forest (or random forests) is a trademarked term for an ensemble classifier that consists of many decision trees and outputs the class that is the mode (the most frequent value) of the classes predicted by the individual trees.
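The "mode of the classes" rule can be illustrated with a minimal sketch. The function name majority_vote and the example votes are hypothetical and only serve the illustration; they do not come from any particular library.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the most common class label among the trees' predictions."""
    # most_common(1) returns [(label, count)] for the most frequent label;
    # on a tie, the label encountered first wins.
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical predictions from five trees for a single instance
votes = ["yes", "no", "yes", "yes", "no"]
print(majority_vote(votes))  # yes
```

Three of the five trees vote "yes", so the forest outputs "yes".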

Random forests are collections of trees, all slightly different.

A random forest randomizes the tree-building algorithm itself, not only the training data. How you randomize depends on the algorithm. For C4.5: don't always pick the best option, pick randomly from the k best options.
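As a rough sketch of "pick randomly from the k best options", assume each candidate attribute already has a split-quality score (for C4.5 this would be something like the gain ratio). The function name pick_split_attribute, the attribute names and the scores below are made up for illustration.

```python
import random

def pick_split_attribute(scores, k, rng):
    """Pick one attribute at random from the k highest-scoring candidates.

    scores: dict mapping attribute name -> split quality (higher is better)
    """
    # Rank attributes from best to worst and keep only the top k
    ranked = sorted(scores, key=scores.get, reverse=True)
    return rng.choice(ranked[:k])

# Hypothetical split scores for four attributes
scores = {"outlook": 0.25, "humidity": 0.15, "windy": 0.05, "temperature": 0.03}
rng = random.Random(0)  # seeded so the run is repeatable

# With k=1 the choice is the usual deterministic one
print(pick_split_attribute(scores, k=1, rng=rng))  # outlook

# With k=3 the result is one of outlook, humidity or windy
print(pick_split_attribute(scores, k=3, rng=rng))
```

Because each tree makes different random choices at its split points, the trees end up slightly different from one another even when they see similar data.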

It generally improves on the decisions of a single decision tree.

Unlike single decision trees, which are likely to suffer from high variance or high bias, random forests use averaging to find a natural balance between the two extremes.
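The variance side of this argument can be checked with a small simulation. This is only a sketch under a strong assumption: each "tree" is replaced by an independent noisy estimate of a true value of 0. Real trees in a forest are correlated, so the reduction in practice is smaller than what this toy setup shows. All names here are hypothetical.

```python
import random
import statistics

rng = random.Random(42)  # seeded so the run is repeatable

def noisy_estimate():
    """One high-variance estimate of a true value of 0.0 (stand-in for a single tree)."""
    return rng.gauss(0.0, 1.0)

def averaged_estimate(n_models):
    """Average of n_models independent noisy estimates (stand-in for a forest)."""
    return statistics.mean([noisy_estimate() for _ in range(n_models)])

# Repeat both experiments many times and compare the spread of the results
single = [noisy_estimate() for _ in range(2000)]
forest = [averaged_estimate(25) for _ in range(2000)]

var_single = statistics.variance(single)
var_forest = statistics.variance(forest)
print(var_single > var_forest)  # True
```

The averaged estimate varies far less from run to run than a single estimate does, which is the effect the averaging step relies on.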

A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

Each decision tree is built from a random subset of the training data.
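One common way to draw such a random subset is a bootstrap sample: rows are drawn at random with replacement until the sample is as large as the original data. The text above does not say which sampling scheme is meant, so bootstrap sampling is an assumption here, and the names bootstrap_sample and rows are hypothetical.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random, with replacement."""
    # Some rows will appear more than once, others not at all
    return [rng.choice(rows) for _ in rows]

# Hypothetical training set, represented by ten row identifiers
rows = list(range(10))
rng = random.Random(1)  # seeded so the run is repeatable

# One sample per tree; each tree would then be trained on its own sample
samples = [bootstrap_sample(rows, rng) for _ in range(3)]
for sample in samples:
    print(sorted(sample))
```

Each sample repeats some rows and leaves others out, so every tree sees a slightly different view of the training data.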

3 - Interpretation

Diagnostic charts from random forests are much easier to understand than those from logistic regression.

4 - Implementation

4.1 - Weka

trees>RandomForest (options: number of trees (default 10), maximum depth of the trees, number of attributes)

5 - Documentation / Reference

data_mining/random_forest.txt · Last modified: 2017/09/13 16:15 by gerardnico