Data Mining - (Decision) Rule
1 - About
Some forms of predictive data mining generate rules: conditions that imply a given outcome.
Rules are if-then expressions; they explain the decisions that lead to a prediction.
They can be produced from a decision tree or discovered directly by an association algorithm (association rule mining).
For example, a rule might specify that a person:
- who has a bachelor's degree (an attribute)
- and lives in a certain neighbourhood (another attribute)
is likely to have an income greater than the regional average.
ODM basket analysis, for example, produces rules whose antecedent and consequent contain product IDs.
2 - Articles Related
3 - Type
3.1 - Classification
A classification rule predicts the value of a given attribute:
If outlook = sunny and humidity = high then play = no
3.2 - Association
An association rule predicts the value of an arbitrary attribute (or combination of attributes):
If temperature = cool then humidity = normal
If humidity = normal and windy = false then play = yes
If outlook = sunny and play = no then humidity = high
If windy = false and play = no then outlook = sunny and humidity = high
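Such rules can be checked mechanically against an instance. A minimal sketch in Python, assuming each rule is stored as an (antecedent, consequent) pair of attribute-value dicts (the weather attributes are the illustrative ones from the examples above):

```python
# Each rule is a pair (antecedent, consequent) of attribute-value dicts.
rules = [
    ({"temperature": "cool"}, {"humidity": "normal"}),
    ({"humidity": "normal", "windy": "false"}, {"play": "yes"}),
    ({"outlook": "sunny", "play": "no"}, {"humidity": "high"}),
]

def matches(conditions, instance):
    """True if every attribute-value condition holds in the instance."""
    return all(instance.get(a) == v for a, v in conditions.items())

# A hypothetical instance; the first two rules fire on it.
instance = {"temperature": "cool", "humidity": "normal",
            "windy": "false", "play": "yes"}

for antecedent, consequent in rules:
    if matches(antecedent, instance):
        print(antecedent, "=>", consequent,
              "consequent holds:", matches(consequent, instance))
```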
4 - Set of Rules
A single rule is easy to interpret, but a complex set of rules probably isn't.
A sequential covering algorithm produces sets of rules with complex conditions; such sets are hard to interpret.
5 - Algorithm
Strategies for Learning Each Rule:
- General-to-Specific:
- Start with an empty rule
- Add constraints to eliminate negative examples
- Stop when only positives are covered
- Specific-to-General:
- Start with a rule that identifies a single random instance
- Remove constraints in order to cover more positives
- Stop when further generalization results in covering negatives
If more than one rule is triggered (a conflict):
- Choose the "most specific" rule
- Use domain knowledge to order rules by priority
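The "most specific rule" strategy above can be sketched as follows; representing a rule as an (antecedent, consequent) pair of dicts and measuring specificity by the number of antecedent conditions are assumptions of this sketch:

```python
def matches(conditions, instance):
    """True if every attribute-value condition holds in the instance."""
    return all(instance.get(a) == v for a, v in conditions.items())

def resolve_conflict(rules, instance):
    """Among all triggered rules, return the one with the most conditions."""
    triggered = [r for r in rules if matches(r[0], instance)]
    if not triggered:
        return None
    return max(triggered, key=lambda r: len(r[0]))

# Two rules that both fire on the same instance (illustrative data):
rules = [
    ({"outlook": "sunny"}, {"play": "yes"}),
    ({"outlook": "sunny", "humidity": "high"}, {"play": "no"}),
]
instance = {"outlook": "sunny", "humidity": "high"}
print(resolve_conflict(rules, instance))  # the more specific second rule wins
```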
5.1 - General-to-Specific
Learning Rules by Sequential Covering (src: Alvarez)
Initialize the ruleset R to the empty set
for each class C {
    while D is nonempty {
        Construct one rule r that correctly classifies some instances in D
            that belong to class C and does not incorrectly classify any
            non-C instances  # See below "Finding next rule for class C"
        Add rule r to ruleset R
        Remove from D all instances correctly classified by r
    }
}
return the ruleset R
Sequential Covering: Finding next rule for class C
Initialize A as the set of all attributes over D
while r incorrectly classifies some non-C instances of D {
    write r as antecedent(r) => C
    for each attribute-value pair (a=v), where a belongs to A and v is a
        value of a, compute the accuracy of the rule
            antecedent(r) and (a=v) => C
    let (a*=v*) be the attribute-value pair of maximum accuracy over D;
        in case of a tie, choose the pair that covers the greatest number
        of instances of D
    update r by adding (a*=v*) to its antecedent:
        r = ( antecedent(r) and (a*=v*) ) => C
    remove the attribute a* from the set A:  A = A - {a*}
}
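The two pseudocode fragments above can be combined into a runnable sketch. The dataset shape (a list of attribute-value dicts with a "class" key) and the toy weather data are assumptions of this sketch, not part of the original; one deviation is that the outer loop stops once every instance of the current class is covered, which is what makes the pseudocode's "while D is nonempty" terminate per class.

```python
def covers(antecedent, instance):
    """True if every attribute-value condition of the antecedent holds."""
    return all(instance.get(a) == v for a, v in antecedent.items())

def find_rule(data, cls, attributes):
    """Grow one rule general-to-specific until it covers no non-cls
    instance (the inner loop: "Finding next rule for class C")."""
    antecedent, remaining = {}, set(attributes)
    while any(covers(antecedent, x) and x["class"] != cls for x in data):
        best, best_score = None, (-1.0, -1)  # (accuracy, coverage)
        for a in remaining:
            for v in {x[a] for x in data}:
                candidate = dict(antecedent, **{a: v})
                covered = [x for x in data if covers(candidate, x)]
                if not covered:
                    continue
                accuracy = sum(x["class"] == cls for x in covered) / len(covered)
                # maximum accuracy; ties broken by number of covered instances
                if (accuracy, len(covered)) > best_score:
                    best, best_score = (a, v), (accuracy, len(covered))
        if best is None:          # no attribute left to specialize on
            break
        antecedent[best[0]] = best[1]
        remaining.discard(best[0])
    return antecedent

def sequential_covering(data, attributes):
    """Outer loop: learn rules per class, removing covered instances."""
    ruleset = []
    for cls in {x["class"] for x in data}:
        d = list(data)
        while any(x["class"] == cls for x in d):
            antecedent = find_rule(d, cls, attributes)
            covered = [x for x in d
                       if covers(antecedent, x) and x["class"] == cls]
            if not covered:       # safeguard against an unproductive rule
                break
            ruleset.append((antecedent, cls))
            d = [x for x in d if x not in covered]
    return ruleset

# Toy weather data (illustrative, not from the source)
data = [
    {"outlook": "sunny",    "windy": "false", "class": "no"},
    {"outlook": "sunny",    "windy": "true",  "class": "no"},
    {"outlook": "overcast", "windy": "false", "class": "yes"},
    {"outlook": "rainy",    "windy": "false", "class": "yes"},
    {"outlook": "rainy",    "windy": "true",  "class": "no"},
]
print(sequential_covering(data, ["outlook", "windy"]))
```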
6 - Properties
6.1 - Support
Rules have an associated support: the percentage of the population that satisfies the rule.
6.2 - Confidence
Confidence is the proportion of the occurrences of the antecedent that also result in the consequent, e.g. for the rule {A, B} ⇒ C: how many times do we get C when we have A and B?
6.3 - Lift
Lift indicates the strength of a rule over the random co-occurrence of the antecedent and the consequent (lift = confidence / support of the consequent; a lift greater than 1 indicates a positive association).
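The three measures can be computed directly from transaction counts. A minimal sketch for the rule {A, B} ⇒ C over a hypothetical list of market-basket transactions (the item names and data are illustrative):

```python
# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"B", "C"},
    {"A", "C"},
]

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent, consequent = {"A", "B"}, {"C"}

sup = support(antecedent | consequent)  # P(A, B, C)
conf = sup / support(antecedent)        # P(C | A, B)
lift = conf / support(consequent)       # confidence vs. random co-occurrence

print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
# → support=0.40 confidence=0.67 lift=0.83
```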