Data Mining - Random forest

Thomas Bayes

About

Random forest (or random forests) is a trademark term for an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by individual trees.

Random forests are collections of trees, all slightly different.

It randomize the algorithm, not the training data. How you randomize depends on the algorithm, for c4.5: don’t pick the best, pick randomly from the k best options

It generally improves decision trees decisions.

Unlike single decision trees which are likely to suffer from high variance or high Bias Random Forests use averaging to find a natural balance between the two extremes.

A random forest is a meta estimator that fits a number of classifical decision trees on various sub-samples of the dataset and use averaging to improve the predictive accuracy and control over-fitting.

Each decision tree is constructed by using a Random subset of the training data.

Interpretation

Diagnostics charts from random forests are much easier to understand than what comes from logistic regression

Implementation

Weka

trees>RandomForests (options: number of trees (default 10), maximum depth of trees, number of attributes)

Documentation / Reference





Discover More
Adaboost Accuracy By Numiterator Boosting
Data Mining - (Boosting|Gradient Boosting|Boosting trees)

Boosting forces new classifiers to focus on the errors produced by earlier ones. boosting works by aggressively reducing the training error Gradient Boosting is an algorithm based on an ensemble of decision...
Data Mining Algorithm
Data Mining - Algorithms

An is a mathematical procedure for solving a specific kind of problem. For some data mining functions, you can choose among several algorithms. Algorithm Function Type Description Decision...
Thomas Bayes
Data Mining - Decision Tree (DT) Algorithm

Desicion Tree (DT) are supervised Classification algorithms. They are: easy to interpret (due to the tree structure) a boolean function (If each decision is binary ie false or true) Decision trees...
Thomas Bayes
Data Mining - Decision boundary Visualization

Classifiers create boundaries in instance space. Different classifiers have different biases. You can explore them by visualizing the classification boundaries. Logistic Regression method produces...
Thomas Bayes
Data Mining - Ensemble Learning (meta set)

Combining multiple models into ensemble in order to produce an ensemble for learning. (Committee| Collective Intelligence) decision of different classifier algorithms. Having different classifiers (also...



Share this page:
Follow us:
Task Runner