Machine Learning - (One|Simple) Rule - (One Level Decision Tree)

1 - About

One Rule (OneR) is a simple method based on a 1-level decision tree, described in 1993 by Rob Holte (Alberta, Canada).

Simple rules often outperform far more complex methods because some datasets are:

  • really simple
  • so small/noisy/complex that nothing can be learned from them

3 - Implementation

3.1 - Basic

  • One branch for each value
  • Each branch assigns most frequent class
  • Error rate: proportion of instances that don’t belong to the majority class of their corresponding branch
  • Choose attribute with smallest error rate

For each attribute,
   For each value of the attribute, make a rule as follows:
       count how often each class appears
       find the most frequent class
       make the rule assign that class to this attribute-value
   Calculate the error rate of this attribute’s rules
Choose the attribute with the smallest error rate

Example of output for the weather data set

outlook:
    if sunny	-> no
    if overcast	-> yes
    if rainy	-> yes

With this one-level decision tree, 10 of the 14 instances are classified correctly.
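
The procedure above can be sketched in Python as follows. This is a minimal illustration, not Weka's implementation: the function name one_r, the constants ATTRIBUTES and ROWS, and the tie-breaking choice (keep the first attribute when error counts are equal) are assumptions made for this example. The rows are the nominal weather data set as commonly distributed; the source page does not list them.

```python
from collections import Counter, defaultdict

# Nominal weather data (assumed contents): outlook, temperature, humidity, windy, play.
ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]
ROWS = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def one_r(rows, attributes):
    """Return (attribute, rules, errors) for the attribute with the fewest errors.

    The class label is assumed to be the last element of each row.
    """
    best = None
    for index, attribute in enumerate(attributes):
        # Count how often each class appears for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[index]][row[-1]] += 1
        # Each value (branch) predicts its most frequent class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors: instances that are not in the majority class of their branch.
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        # Assumption: strict "<" keeps the first attribute when error counts tie.
        if best is None or errors < best[2]:
            best = (attribute, rules, errors)
    return best


attribute, rules, errors = one_r(ROWS, ATTRIBUTES)
print(f"{attribute}:")
for value, predicted in rules.items():
    print(f"    if {value}\t-> {predicted}")
print(f"{len(ROWS) - errors} of {len(ROWS)} training instances correct")
```

On these rows, outlook and humidity both make 4 errors out of 14, so the selected attribute depends on how ties are broken; with the choice made here the sketch selects outlook.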

3.2 - Other

Algorithm to choose the best rule

For each attribute:
  For each value of that attribute, create a rule:
      1. count how often each class appears
      2. find the most frequent class, c
      3. make a rule "if attribute=value then class=c"
  Calculate the error rate of this attribute's rules
Pick the attribute whose rules produce the lowest error rate

4 - One Rule vs Baseline

OneR always outperforms (or, at worst, equals) the baseline (ZeroR) when evaluated on the training data. However, evaluating on the training data does not reflect performance on independent test data.
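
A small calculation shows why this holds on the training data: ZeroR predicts one majority class for the whole data set, while OneR predicts the majority class separately inside each branch, so it can never get fewer training instances right. The sketch below uses the per-branch class counts of the outlook attribute in the nominal weather data; the variable names are illustrative only.

```python
from collections import Counter

# Class counts of "play" for each outlook value in the nominal weather data.
branch_counts = {
    "sunny": Counter(yes=2, no=3),
    "overcast": Counter(yes=4),
    "rainy": Counter(yes=3, no=2),
}

# Overall class distribution: the sum of the per-branch counts.
total = sum(branch_counts.values(), Counter())
n = sum(total.values())

# ZeroR: always predict the overall majority class.
zero_r_correct = max(total.values())
# OneR: predict the majority class within each branch.
one_r_correct = sum(max(c.values()) for c in branch_counts.values())

print(f"ZeroR: {zero_r_correct}/{n} correct on the training data")
print(f"OneR:  {one_r_correct}/{n} correct on the training data")
```

The sum of the per-branch maxima is always at least as large as the maximum of the overall class counts, which is the reason OneR cannot fall below ZeroR on the data it was trained on.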

ZeroR sometimes outperforms OneR on independent test data: if the target distribution is skewed or limited data is available, predicting the majority class can yield better results than basing a rule on a single attribute. This happens with the nominal weather dataset.

5 - minBucket Size

The “minBucket size” parameter of Weka's OneR classifier limits the complexity of the rules in order to avoid overfitting (default: 6).

With a “minBucket size” of 1, the accuracy on the training set is very high (92.99%); it decreases as the “minBucket size” increases.

Evaluating with 10-fold cross-validation limits this effect and gives a more stable accuracy across the “minBucket size” values.

minBucket size   Cross-validation accuracy (%)   Training set accuracy (%)   Number of conditions generated
 1               47.66                           92.99                       106
 2               48.13                           71.5                         31
 3               59.81                           68.22                        14
 4               59.35                           66.36                        10
 5               57.94                           63.55                         8
 6               57.94                           63.08                         8
 7               58.41                           62.14                         6
 8               56.07                           61.68                         6
 9               57.48                           60.75                         4
10               57.94                           59.34                         4
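
The gap between training accuracy and cross-validation accuracy is a simple indicator of overfitting. The sketch below only recomputes that gap from the figures reported above; the names RESULTS and gaps are illustrative and no new measurements are involved.

```python
# (minBucket size, cross-validation accuracy %, training set accuracy %, conditions)
RESULTS = [
    (1, 47.66, 92.99, 106),
    (2, 48.13, 71.5, 31),
    (3, 59.81, 68.22, 14),
    (4, 59.35, 66.36, 10),
    (5, 57.94, 63.55, 8),
    (6, 57.94, 63.08, 8),
    (7, 58.41, 62.14, 6),
    (8, 56.07, 61.68, 6),
    (9, 57.48, 60.75, 4),
    (10, 57.94, 59.34, 4),
]

# Gap in percentage points between training and cross-validation accuracy.
gaps = {size: round(train - cv, 2) for size, cv, train, _ in RESULTS}

for size, cv, train, conditions in RESULTS:
    print(f"minBucket={size:2d}  gap={gaps[size]:5.2f}  conditions={conditions}")

# Setting with the highest cross-validation accuracy.
best_size, best_cv, _, _ = max(RESULTS, key=lambda r: r[1])
print(f"highest cross-validation accuracy: {best_cv} at minBucket size {best_size}")
```

The gap shrinks from about 45 points at a “minBucket size” of 1 to under 2 points at 10, in step with the number of generated conditions falling from 106 to 4.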
data_mining/one_rule.txt · Last modified: 2017/10/20 11:00 by gerardnico