Data Mining - Decision Tree (DT) Algorithm
1 - About
Decision trees extract predictive information in the form of human-understandable tree rules. The Decision Tree is an algorithm useful for many classification problems that can help explain a model's logic through human-readable "If… Then…" rules.

Decision trees are:
- easy to interpret (due to the tree structure)
- reliable and robust
- simple to implement
- able to work on categorical attributes
- able to handle many attributes (large p, small n cases)

Each decision (internal node) in the tree can be seen as a test on a feature.
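A fitted tree is just nested If… Then… rules, so it can be represented directly as a data structure. A minimal hand-built sketch (the weather features and labels are a hypothetical toy example, not from the text):

```python
# A decision tree stored as nested dicts: each internal node tests one
# feature, each leaf is a plain class label. (Toy tree, for illustration.)
tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {"feature": "humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rainy": {"feature": "windy",
                  "branches": {True: "no", False: "yes"}},
    },
}

def predict(node, example):
    """Walk the tree, following the branch matching each tested feature,
    until a leaf (a plain label) is reached."""
    while isinstance(node, dict):
        node = node["branches"][example[node["feature"]]]
    return node

print(predict(tree, {"outlook": "sunny", "humidity": "normal"}))  # yes
```

Reading any root-to-leaf path of this structure aloud gives one "If… Then…" rule, which is what makes the model easy to interpret.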
2 - Articles Related
3 - Algorithm
The creation of a tree is a quest for:
- purity (pure nodes, i.e. nodes that contain only one class: only "yes" or only "no")
- the smallest tree

At each level, choose the attribute that produces the purest child nodes (i.e. the attribute with the highest information gain).
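Information gain is the entropy of the parent node minus the weighted entropy of the child nodes a split produces. A minimal sketch (the toy labels are invented for illustration):

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(labels, groups):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder

# Toy parent node: 4 "yes" / 4 "no". A split into two pure children
# gains a full bit; the algorithm would prefer this attribute.
parent = ["yes"] * 4 + ["no"] * 4
pure_split = [["yes"] * 4, ["no"] * 4]
print(information_gain(parent, pure_split))  # 1.0
```

A split that leaves the children as mixed as the parent would score a gain near 0, so the purity quest and the information-gain criterion are the same idea.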
4 - Overfitting
Decision Trees are prone to overfitting:
- Pruning can help: remove or aggregate sub-trees that provide little discriminatory power
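One simple form of this idea can be sketched in a few lines: a bottom-up pass that collapses any subtree whose branches all predict the same label, since such a split provides no discriminatory power. This is only a sketch (real pruning typically uses a validation set or cost-complexity criteria, which are not shown here); it assumes the nested-dict tree representation with labels at the leaves:

```python
def prune(node):
    """Bottom-up pass: collapse any subtree whose branches all
    predict the same label into a single leaf."""
    if not isinstance(node, dict):
        return node  # already a leaf
    node["branches"] = {v: prune(c) for v, c in node["branches"].items()}
    children = list(node["branches"].values())
    if all(not isinstance(c, dict) for c in children) and len(set(children)) == 1:
        return children[0]  # the split adds no discriminatory power
    return node

# "age" splits the node, but both branches predict "no": the subtree collapses.
redundant = {"feature": "age", "branches": {"young": "no", "old": "no"}}
print(prune(redundant))  # no
```

A subtree whose branches disagree is left untouched, so only useless structure is removed.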
5 - Example
5.1 - Titanic (Survive Yes or No)
if Ticket Class = "1" then
    if Sex = "female" then Survive = "yes"
    if Sex = "male" and Age < 5 then Survive = "yes"
if Ticket Class = "2" then
    if Sex = "female" then Survive = "yes"
    if Sex = "male" then Survive = "no"
if Ticket Class = "3" then
    if Sex = "male" then Survive = "no"
    if Sex = "female" then
        if Age < 4 then Survive = "yes"
        if Age >= 4 then Survive = "no"
Every path from the root to a leaf is a rule.
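The rule set above (reading the second block as Ticket Class = "2", and with unmatched cases defaulting to "no") can be transcribed directly into a function, one branch per path:

```python
def survive(ticket_class, sex, age):
    """Titanic rules from the example above, one branch per root-to-leaf path.
    Paths not listed in the rules default to "no"."""
    if ticket_class == 1:
        return "yes" if sex == "female" or age < 5 else "no"
    if ticket_class == 2:
        return "yes" if sex == "female" else "no"
    # ticket_class == 3
    if sex == "male":
        return "no"
    return "yes" if age < 4 else "no"

print(survive(3, "female", 2))  # yes
```

This one-to-one mapping between paths and branches is exactly what "every path from the root is a rule" means.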
6 - Type
6.1 - Univariate
Single tests at the nodes
6.2 - Multivariate
Compound tests at the nodes
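The difference between the two node types can be sketched as follows (the weights and thresholds are arbitrary values chosen for illustration):

```python
# Univariate test: a single feature compared against a threshold.
def univariate_test(x):
    return x[0] <= 2.5

# Multivariate (compound) test: a linear combination of several
# features compared against a threshold.
def multivariate_test(x, w=(0.6, 0.4), t=2.5):
    return sum(wi * xi for wi, xi in zip(w, x)) <= t

point = (2.0, 4.0)
print(univariate_test(point))    # True:  only x[0] = 2.0 is inspected
print(multivariate_test(point))  # False: 0.6*2.0 + 0.4*4.0 = 2.8 > 2.5
```

Univariate tests split the feature space with axis-parallel boundaries; compound tests allow oblique boundaries at the cost of interpretability.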