Data Mining Classification: Alternative Techniques
From Chapter 5 in Introduction to Data Mining by Tan, Steinbach, Kumar
Rule-Based Classifier
Classify records by using a collection of "if…then…" rules.
Rule: (Condition) → y, where Condition is a conjunction of attribute tests and y is the class label.
LHS: rule antecedent or condition
RHS: rule consequent
Examples of classification rules:
(Blood Type = Warm) ∧ (Lay Eggs = Yes) → Birds
(Taxable Income < 50K) ∧ (Refund = Yes) → Evade = No
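To make the rule anatomy concrete, here is a minimal sketch (not from the book) that encodes a rule as a (condition, class) pair, with the antecedent as a dict of equality tests; the covers helper and attribute names are illustrative assumptions.

```python
# Sketch: an if-then rule as (condition, class label). The condition is a
# conjunction of equality tests, held as {attribute: required value}.

def covers(condition, record):
    """True if the record satisfies every attribute test in the condition."""
    return all(record.get(attr) == value for attr, value in condition.items())

rule = ({"Blood Type": "Warm", "Lay Eggs": "Yes"}, "Birds")

hawk = {"Blood Type": "Warm", "Lay Eggs": "Yes", "Can Fly": "Yes"}
condition, label = rule
if covers(condition, hawk):
    print(label)  # -> Birds
```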
Rule-based Classifier (Example)
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Application of Rule-Based Classifier
A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
The rule R1 covers a hawk → Birds
The rule R3 covers a grizzly bear → Mammals
Rule Coverage and Accuracy
Coverage of a rule: the fraction of records that satisfy the antecedent of the rule.
Accuracy of a rule: of the records that satisfy the antecedent, the fraction that also satisfy the consequent.
Example (on the 10-record tax dataset used in the text): (Status = Single) → No
Coverage = 40%, Accuracy = 50%
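In symbols, writing D for the full record set, A for the records that satisfy the antecedent, and A ∩ y for those among them that also have class y, the two measures are:

```latex
\mathrm{Coverage}(r) = \frac{|A|}{|D|},
\qquad
\mathrm{Accuracy}(r) = \frac{|A \cap y|}{|A|}
```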
How Does a Rule-Based Classifier Work?
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
A lemur triggers rule R3, so it is classified as a mammal
A turtle triggers both R4 and R5
A dogfish shark triggers none of the rules
Characteristics of Rule-Based Classifier
Mutually exclusive rules: the classifier contains mutually exclusive rules if the rules are independent of each other, so every record is covered by at most one rule.
Exhaustive rules: the classifier has exhaustive coverage if it accounts for every possible combination of attribute values, so every record is covered by at least one rule.
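As a quick empirical check of both properties, a sketch reusing the illustrative dict-based rule encoding from earlier (not part of the book):

```python
# Sketch: test whether a rule set is mutually exclusive and exhaustive
# on a given dataset, by counting how many rules cover each record.

def covers(condition, record):
    return all(record.get(attr) == value for attr, value in condition.items())

def check_rule_set(rules, records):
    hits = [sum(covers(cond, rec) for cond, _ in rules) for rec in records]
    mutually_exclusive = all(h <= 1 for h in hits)  # at most one rule per record
    exhaustive = all(h >= 1 for h in hits)          # at least one rule per record
    return mutually_exclusive, exhaustive
```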
From Decision Trees To Rules
Rules are mutually exclusive and exhaustive
The rule set contains as much information as the tree
Rules Can Be Simplified
Initial rule: (Refund = No) ∧ (Status = Married) → No
Simplified rule: (Status = Married) → No
Effect of Rule Simplification
Rules are no longer mutually exclusive: a record may trigger more than one rule. Solution: an ordered rule set, or an unordered rule set with a voting scheme.
Rules are no longer exhaustive: a record may not trigger any rule. Solution: use a default class.
Ordered Rule Set
Rules are rank ordered according to their priority; an ordered rule set is known as a decision list.
When a test record is presented to the classifier, it is assigned the class label of the highest-ranked rule it triggers. If no rule fires, it is assigned the default class. A minimal sketch follows the rule list below.
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
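A minimal decision-list sketch for the vertebrate rules above; the dict encoding, covers helper, and default label are illustrative assumptions, not the book's code.

```python
# Sketch of an ordered rule set (decision list): return the class of the
# highest-ranked rule that fires, else the default class.

def covers(condition, record):
    return all(record.get(attr) == value for attr, value in condition.items())

RULES = [  # ordered by priority, R1 first
    ({"Give Birth": "no", "Can Fly": "yes"},       "Birds"),
    ({"Give Birth": "no", "Live in Water": "yes"}, "Fishes"),
    ({"Give Birth": "yes", "Blood Type": "warm"},  "Mammals"),
    ({"Give Birth": "no", "Can Fly": "no"},        "Reptiles"),
    ({"Live in Water": "sometimes"},               "Amphibians"),
]

def classify(record, rules=RULES, default="Unknown"):
    for condition, label in rules:
        if covers(condition, record):
            return label
    return default

# The turtle triggered both R4 and R5 earlier; ordering resolves the conflict:
turtle = {"Give Birth": "no", "Can Fly": "no", "Live in Water": "sometimes"}
print(classify(turtle))  # -> Reptiles (R4 outranks R5)
```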
Building Classification Rules
Direct method: extract rules directly from data. Examples: RIPPER, CN2, Holte's 1R.
Indirect method: extract rules from other classification models (e.g., decision trees, neural networks). Example: C4.5rules.
Direct Method: Sequential Covering
1. Start from an empty rule.
2. Grow a rule using the Learn-One-Rule function.
3. Remove training records covered by the rule.
4. Repeat steps 2 and 3 until a stopping criterion is met.
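A skeleton of this loop as a sketch; learn_one_rule and rule_covers are placeholders (assumptions) for the Learn-One-Rule function and rule-matching logic.

```python
# Skeleton of sequential covering for one target class.

def sequential_covering(records, target_class, learn_one_rule, rule_covers):
    rules = []
    remaining = list(records)
    while any(r["class"] == target_class for r in remaining):
        rule = learn_one_rule(remaining, target_class)   # step 2: grow one rule
        if rule is None:
            break                                        # no acceptable rule found
        covered = [r for r in remaining if rule_covers(rule, r)]
        if not covered:
            break                                        # degenerate rule; avoid looping forever
        rules.append(rule)
        remaining = [r for r in remaining                # step 3: remove covered records
                     if not rule_covers(rule, r)]
    return rules
```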
Example of Sequential Covering
A rule is desirable if it covers most of the positive examples and none or very few of the negative examples.
Example of Sequential Covering (continued)
Aspects of Sequential Covering
The Learn-One-Rule function extracts a classification rule that covers many of the positive examples and few or none of the negative examples. Finding such a rule is computationally expensive given the exponential size of the search space, so the function grows the rule in a greedy fashion.
The other aspects are:
Instance elimination
Rule evaluation
Stopping criterion
Rule pruning
Rule Growing
Two common strategies: general-to-specific (start from an empty rule and greedily add conjuncts) and specific-to-general (start from a seed positive example and greedily remove conjuncts).
Instance Elimination
Why do we need to eliminate instances? Otherwise, the next rule learned would be identical to the previous rule.
Why do we remove positive instances? To ensure that the next rule is different.
Why do we remove negative instances? To prevent underestimating the accuracy of the next rule. Compare rules R2 and R3 in the accompanying diagram.
Rule Evaluation
Metrics:
Accuracy = n_c / n
Laplace = (n_c + 1) / (n + k)
m-estimate = (n_c + k·p) / (n + k)
where n is the number of instances covered by the rule, n_c is the number of instances covered by the rule that belong to the rule's class, k is the number of classes, and p is the prior probability of the class.
Example: rule r1 covers 50 positive examples and 5 negative examples; rule r2 covers 2 positive examples and 0 negative examples.
Laplace and m-estimate are equivalent if p = 1/k.
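A worked computation of these metrics for r1 and r2, assuming k = 2 classes (a sketch, not textbook code):

```python
# Evaluate r1 (50 positive, 5 negative) and r2 (2 positive, 0 negative).

def accuracy(nc, n):
    return nc / n

def laplace(nc, n, k):
    return (nc + 1) / (n + k)

def m_estimate(nc, n, k, p):
    return (nc + k * p) / (n + k)  # equals laplace(nc, n, k) when p = 1/k

k = 2
for name, pos, neg in [("r1", 50, 5), ("r2", 2, 0)]:
    n, nc = pos + neg, pos
    print(name, f"accuracy={accuracy(nc, n):.3f}", f"laplace={laplace(nc, n, k):.3f}")
# r1 accuracy=0.909 laplace=0.895
# r2 accuracy=1.000 laplace=0.750
```

Although r2 has perfect raw accuracy, the Laplace estimate ranks r1 higher because r2 covers too few examples to be trusted.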
Stopping Criterion and Rule Pruning
Stopping criterion: compute the gain from the new rule; if the gain is not significant, discard the rule.
Rule pruning: similar to post-pruning of decision trees. Reduced error pruning: remove one of the conjuncts in the rule, compare the error rate on a validation set before and after removing it, and prune the conjunct if the error improves.
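A sketch of reduced error pruning for a single rule, reusing the illustrative dict encoding (the error_rate helper and greedy loop are assumptions, not the book's exact procedure):

```python
# Greedily drop the conjunct whose removal most improves validation error.

def covers(condition, record):
    return all(record.get(attr) == value for attr, value in condition.items())

def error_rate(condition, label, validation):
    covered = [r for r in validation if covers(condition, r)]
    if not covered:
        return 1.0  # a rule that covers nothing is useless
    return sum(r["class"] != label for r in covered) / len(covered)

def reduced_error_prune(condition, label, validation):
    pruned = dict(condition)
    while len(pruned) > 1:
        base = error_rate(pruned, label, validation)
        candidates = [{a: v for a, v in pruned.items() if a != drop}
                      for drop in pruned]
        best = min(candidates, key=lambda c: error_rate(c, label, validation))
        if error_rate(best, label, validation) >= base:
            break      # no conjunct removal improves the error; stop
        pruned = best  # error improved -> prune that conjunct
    return pruned
```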
Summary of Direct Method
1. Grow a single rule.
2. Remove instances covered by the rule.
3. Prune the rule (if necessary).
4. Add the rule to the current rule set.
5. Repeat.
Direct Method: RIPPER
For a 2-class problem, choose one of the classes as the positive class and the other as the negative class: learn rules for the positive class; the negative class will be the default class.
For a multi-class problem: order the classes according to increasing class prevalence (the fraction of instances that belong to a particular class); learn the rule set for the smallest class first, treating the rest as the negative class; repeat with the next smallest class as the positive class.
Direct Method: RIPPER
Growing a rule: start from an empty rule; add conjuncts as long as they improve FOIL's information gain; stop when the rule no longer covers negative examples.
Prune the rule immediately using incremental reduced error pruning. Measure for pruning: v = (p − n) / (p + n), where p is the number of positive examples covered by the rule in the validation set and n is the number of negative examples covered by the rule in the validation set. Pruning method: delete any final sequence of conditions that maximizes v.
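For reference, FOIL's information gain compares the rule before and after a conjunct is added; a sketch of the standard formulation, where R0 covers p0 positives and n0 negatives and the extended rule R1 covers p1 and n1:

```python
# FOIL's information gain for extending rule R0 (p0 pos, n0 neg covered)
# to rule R1 (p1 pos, n1 neg covered); larger is better.
from math import log2

def foil_gain(p0, n0, p1, n1):
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# A conjunct that keeps most positives while shedding negatives scores well:
print(foil_gain(p0=100, n0=400, p1=90, n1=10))  # ~195.3
```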
Direct Method: RIPPER
Building a rule set: use the sequential covering algorithm. Find the best rule that covers the current set of positive examples, then eliminate both the positive and the negative examples covered by the rule. Each time a rule is added to the rule set, compute the new description length; stop adding rules when the new description length is d bits longer than the smallest description length obtained so far.
Direct Method: RIPPER
Optimize the rule set: for each rule r in the rule set R, consider two alternatives: a replacement rule r* grown from scratch, and a revised rule r′ obtained by adding conjuncts to r. Compare the rule set containing r against the rule sets containing r* and r′, and keep whichever minimizes the description length (MDL principle). Repeat rule generation and rule optimization for the remaining positive examples.
Indirect Methods
Rules can be extracted from a decision tree, one per leaf, and then simplified. Looking at rules 2, 3, and 5 from the example tree, they can be simplified into:
R2′: (Q = yes) → +
R3: (P = yes) ∧ (R = no) → +
Indirect Method: C4.5rules
Extract rules from an unpruned decision tree. For each rule r: A → y, consider an alternative rule r′: A′ → y, where A′ is obtained by removing one of the conjuncts in A. Compare the error rate of r against the error rates of all the r′ alternatives; prune if one of the r′ rules has a lower error rate.
Repeat until the generalization error can no longer be improved.
Indirect Method: C4.5rules
Instead of ordering individual rules, order subsets of rules (class ordering). Each subset is a collection of rules with the same rule consequent (class). Compute the description length of each subset: Description length = L(error) + g × L(model), where g is a parameter that accounts for the presence of redundant attributes in a rule set (default value 0.5).
Advantages of Rule-Based Classifiers
As highly expressive as decision trees
Easy to interpret
Easy to generate
Can classify new instances rapidly
Performance comparable to decision trees