Data Mining - Decision Tree (DT) Algorithm
Table of Contents
1 - About
Decision trees extract predictive information in the form of human-understandable tree rules. The Decision Tree is an algorithm useful for many classification problems that can explain the model's logic with human-readable "If… Then…" rules.

Decision trees are:
- easy to interpret (due to the tree structure)
- reliable and robust
- simple to implement
- able to work on categorical attributes
- able to handle many attributes (i.e., cases where p is large and n is small)

Each decision node in the tree can be seen as a test on a feature.
2 - Articles Related
3 - Algorithm
The creation of a tree is a quest for:
- purity (a pure node contains only "yes" or only "no" examples)
- the smallest tree

At each level, choose the attribute that produces the "purest" child nodes (i.e., the attribute with the highest information gain).
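The attribute-selection step above can be sketched in plain Python: information gain is the drop in Shannon entropy obtained by splitting the labels on an attribute (the toy data below is illustrative, not from the article).

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels; 0 means a pure node."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def information_gain(labels, attribute_values):
    """Entropy reduction obtained by splitting `labels` on an attribute.

    `attribute_values[i]` is the attribute value of example i.
    """
    n = len(labels)
    groups = {}
    for y, v in zip(labels, attribute_values):
        groups.setdefault(v, []).append(y)
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Toy data: splitting on sex separates the classes perfectly,
# so it yields a higher gain than splitting on an uninformative attribute.
survived = ["yes", "yes", "no", "no"]
sex      = ["female", "female", "male", "male"]
noise    = ["a", "b", "a", "b"]
print(information_gain(survived, sex))    # 1.0 (pure children)
print(information_gain(survived, noise))  # 0.0 (no discrimination)
```

The tree builder would call `information_gain` once per candidate attribute at each node and split on the winner.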
4 - Overfitting
Decision Trees are prone to overfitting:
- Pruning can help: remove or aggregate sub-trees that provide little discriminatory power
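A minimal sketch of the "aggregate sub-trees with little discriminatory power" idea, assuming a tree represented as nested dicts (a hypothetical encoding; production pruning usually compares validation error or uses cost-complexity pruning instead of this exact-match rule):

```python
def prune(node):
    """Recursively collapse any subtree whose leaves all predict the same
    class: such a split adds no discriminatory power."""
    if not isinstance(node, dict):       # leaf: a bare class label
        return node
    node["branches"] = {v: prune(sub) for v, sub in node["branches"].items()}
    children = list(node["branches"].values())
    if all(not isinstance(c, dict) for c in children) and len(set(children)) == 1:
        return children[0]               # aggregate the split into a single leaf
    return node

# Hypothetical tree where the inner age split is useless: both branches say "no".
tree = {"attr": "sex", "branches": {
    "female": "yes",
    "male": {"attr": "age<5", "branches": {"yes": "no", "no": "no"}},
}}
print(prune(tree))
# → {'attr': 'sex', 'branches': {'female': 'yes', 'male': 'no'}}
```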
5 - Library
- FFTrees - Create, visualize, and test fast-and-frugal decision trees (FFTs). FFTs are very simple decision trees for binary classification problems. FFTs can be preferable to more complex algorithms because they are easy to communicate, require very little information, and are robust against overfitting.
6 - Example
6.1 - Titanic (Survive Yes or No)
if Ticket Class = "1" then
    if Sex = "female" then Survive = "yes"
    if Sex = "male" and Age < 5 then Survive = "yes"
if Ticket Class = "2" then
    if Sex = "female" then Survive = "yes"
    if Sex = "male" then Survive = "no"
if Ticket Class = "3" then
    if Sex = "male" then Survive = "no"
    if Sex = "female" then
        if Age < 4 then Survive = "yes"
        if Age >= 4 then Survive = "no"
Every path from the root to a leaf is a rule.
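The rule set above maps directly onto nested if/else code, which makes the "every path is a rule" point concrete. This sketch assumes the second rule block (which repeats Ticket Class "1" in the source) refers to second class, and it fills the unstated first-class "male, age >= 5" case with "no":

```python
def predict_survival(ticket_class, sex, age):
    """Each nested if/else path mirrors one root-to-leaf rule of the tree."""
    if ticket_class == 1:
        if sex == "female":
            return "yes"
        return "yes" if age < 5 else "no"   # assumed "no" for older males
    if ticket_class == 2:
        return "yes" if sex == "female" else "no"
    # third class
    if sex == "male":
        return "no"
    return "yes" if age < 4 else "no"

print(predict_survival(1, "female", 30))  # yes
print(predict_survival(3, "male", 30))    # no
```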
7 - Type
7.1 - Univariate
A single attribute is tested at each node.
7.2 - Multivariate
A combination of attributes is tested at each node.
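The two node-test types can be contrasted in a one-line sketch each; the feature names and coefficients below are hypothetical:

```python
# Hypothetical feature vector for one passenger.
x = {"age": 3.0, "fare": 7.25}

# Univariate node test: a single attribute is compared to a threshold.
univariate = x["age"] < 4

# Multivariate (oblique) node test: a linear combination of attributes
# is compared to a threshold.
multivariate = 0.5 * x["age"] + 0.1 * x["fare"] < 3.0

print(univariate, multivariate)  # True True
```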