
Data Mining Classification: Alternative Techniques

From Chapter 5 in Introduction to Data Mining by Tan, Steinbach, Kumar

Rule-Based Classifier

Classify records by using a collection of "if…then…" rules.
Rule: (Condition) → y, where Condition is a conjunction of attribute tests and y is the class label.
LHS: rule antecedent or condition
RHS: rule consequent
Examples of classification rules:
(Blood Type = Warm) ∧ (Lay Eggs = Yes) → Birds
(Taxable Income < 50K) ∧ (Refund = Yes) → Evade = No

Rule-based Classifier (Example)

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Application of Rule-Based Classifier

A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

The rule R1 covers a hawk => Bird

The rule R3 covers the grizzly bear => Mammal
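A minimal sketch of this coverage check, assuming rules are stored as (antecedent, class-label) pairs and instances as attribute dictionaries; the representation is illustrative, not from the slides:

# Each rule is a pair: (antecedent as {attribute: required value}, class label).
RULES = [
    ({"Give Birth": "no",  "Can Fly": "yes"},       "Birds"),       # R1
    ({"Give Birth": "no",  "Live in Water": "yes"}, "Fishes"),      # R2
    ({"Give Birth": "yes", "Blood Type": "warm"},   "Mammals"),     # R3
    ({"Give Birth": "no",  "Can Fly": "no"},        "Reptiles"),    # R4
    ({"Live in Water": "sometimes"},                "Amphibians"),  # R5
]

def covers(antecedent, instance):
    # A rule covers an instance if every attribute test in the antecedent is satisfied.
    return all(instance.get(attr) == value for attr, value in antecedent.items())

hawk = {"Give Birth": "no", "Can Fly": "yes", "Live in Water": "no", "Blood Type": "warm"}
bear = {"Give Birth": "yes", "Can Fly": "no", "Live in Water": "no", "Blood Type": "warm"}

print([label for ant, label in RULES if covers(ant, hawk)])  # ['Birds']   -> R1 covers the hawk
print([label for ant, label in RULES if covers(ant, bear)])  # ['Mammals'] -> R3 covers the grizzly bear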

Rule Coverage and Accuracy

Coverage of a rule: the fraction of records that satisfy the antecedent of the rule.
Accuracy of a rule: the fraction of records satisfying the antecedent that also satisfy the consequent (i.e., accuracy is measured over the covered records).

(Status = Single) → No
Coverage = 40%, Accuracy = 50%
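The data table behind these numbers is not reproduced in the transcript, so the sketch below uses a small hypothetical set of ten records, chosen so that the rule above has exactly 40% coverage and 50% accuracy:

# Hypothetical records: (Status, Class). Four are Single; two of those have Class = "No".
records = [
    ("Single", "No"), ("Single", "No"), ("Single", "Yes"), ("Single", "Yes"),
    ("Married", "No"), ("Married", "No"), ("Married", "No"),
    ("Divorced", "Yes"), ("Divorced", "No"), ("Married", "No"),
]

covered = [r for r in records if r[0] == "Single"]   # records satisfying the antecedent
correct = [r for r in covered if r[1] == "No"]       # covered records also satisfying the consequent

coverage = len(covered) / len(records)   # 4 / 10 = 0.40
accuracy = len(correct) / len(covered)   # 2 / 4  = 0.50
print(f"Coverage = {coverage:.0%}, Accuracy = {accuracy:.0%}")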

How Does a Rule-Based Classifier Work?

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

A lemur triggers rule R3, so it is classified as a mammal

A turtle triggers both R4 and R5

A dogfish shark triggers none of the rules

Characteristics of Rule-Based Classifier

Mutually exclusive rules: the classifier contains mutually exclusive rules if the rules are independent of each other; every record is covered by at most one rule.
Exhaustive rules: the classifier has exhaustive coverage if it accounts for every possible combination of attribute values; each record is covered by at least one rule.

From Decision Trees To Rules

Rules are mutually exclusive and exhaustive

Rule set contains as much information as the tree
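The tree on this slide appears only as a figure, so the sketch below uses a small hypothetical tree (nested tuples) to show how each root-to-leaf path becomes one rule; the attribute names and splits are placeholders:

# A hypothetical decision tree: leaves are class labels, internal nodes are
# (attribute, {attribute_value: subtree, ...}).
tree = ("Refund", {
    "Yes": "No",                                   # leaf: class "No"
    "No": ("Status", {
        "Married": "No",
        "Single": ("Income", {"<80K": "No", ">=80K": "Yes"}),
        "Divorced": ("Income", {"<80K": "No", ">=80K": "Yes"}),
    }),
})

def tree_to_rules(node, conditions=()):
    # Each root-to-leaf path becomes one rule: (list of conditions, class label).
    if isinstance(node, str):                      # reached a leaf
        return [(list(conditions), node)]
    attribute, branches = node
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, conditions + ((attribute, value),)))
    return rules

for antecedent, label in tree_to_rules(tree):
    print(" AND ".join(f"{a}={v}" for a, v in antecedent), "->", label)
# The resulting rules are mutually exclusive and exhaustive, mirroring the tree.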

Rules Can Be Simplified

Initial Rule: (Refund = No) ∧ (Status = Married) → No
Simplified Rule: (Status = Married) → No

Effect of Rule Simplification

Rules are no longer mutually exclusive: a record may trigger more than one rule. Solution? Use an ordered rule set, or an unordered rule set with a voting scheme.
Rules are no longer exhaustive: a record may not trigger any rule. Solution? Use a default class.

Ordered Rule Set

Rules are rank ordered according to their priority; an ordered rule set is known as a decision list.
When a test record is presented to the classifier, it is assigned the class label of the highest-ranked rule it triggers; if none of the rules fire, it is assigned the default class.

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
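A minimal sketch of the decision-list behavior described above, reusing the (antecedent, class-label) rule representation from the earlier sketch; the choice of default class here is an arbitrary placeholder:

RULES = [
    ({"Give Birth": "no",  "Can Fly": "yes"},       "Birds"),       # R1
    ({"Give Birth": "no",  "Live in Water": "yes"}, "Fishes"),      # R2
    ({"Give Birth": "yes", "Blood Type": "warm"},   "Mammals"),     # R3
    ({"Give Birth": "no",  "Can Fly": "no"},        "Reptiles"),    # R4
    ({"Live in Water": "sometimes"},                "Amphibians"),  # R5
]

def classify(rules, instance, default="Amphibians"):
    # Return the class of the highest-ranked rule the instance triggers;
    # rules are assumed to be listed in priority order (R1 first).
    for antecedent, label in rules:
        if all(instance.get(a) == v for a, v in antecedent.items()):
            return label
    return default                   # no rule fired: fall back to the default class

turtle = {"Give Birth": "no", "Can Fly": "no", "Live in Water": "sometimes", "Blood Type": "cold"}
shark  = {"Give Birth": "yes", "Can Fly": "no", "Live in Water": "yes", "Blood Type": "cold"}

print(classify(RULES, turtle))   # R4 outranks R5, so the turtle is classified as Reptiles
print(classify(RULES, shark))    # the dogfish shark triggers no rule, so the default class is returned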

Building Classification Rules

Direct method: extract rules directly from the data (e.g., RIPPER, CN2, Holte's 1R).
Indirect method: extract rules from other classification models, such as decision trees or neural networks (e.g., C4.5rules).

Direct Method: Sequential Covering

1. Start from an empty rule.
2. Grow a rule using the Learn-One-Rule function.
3. Remove the training records covered by the rule.
4. Repeat steps (2) and (3) until the stopping criterion is met.
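A skeleton of this loop, assuming a learn_one_rule helper is supplied by the caller (its greedy internals are omitted, and returning None stands in for the stopping criterion):

def covers(antecedent, instance):
    return all(instance.get(a) == v for a, v in antecedent.items())

def sequential_covering(records, target_class, learn_one_rule):
    # records: list of (attribute_dict, class_label) pairs.
    rules, remaining = [], list(records)
    while any(label == target_class for _, label in remaining):
        antecedent = learn_one_rule(remaining, target_class)      # step 2: grow one rule
        if antecedent is None:                                    # stopping criterion (simplified)
            break
        rules.append((antecedent, target_class))
        remaining = [(x, y) for x, y in remaining                 # step 3: drop covered records
                     if not covers(antecedent, x)]
    return rules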

Example of Sequential Covering

A rule is desirable if it covers most of the positive examples and none or very few of the negative examples.

Example of Sequential Covering…

Aspects of Sequential Covering

The Learn-One-Rule function extracts a classification rule that covers many of the positive examples and none or few of the negative examples. Finding such a rule is computationally expensive given the exponential size of the search space, so the function grows rules in a greedy fashion.
Other aspects of sequential covering: Instance Elimination, Rule Evaluation, Stopping Criterion, Rule Pruning.

Rule Growing

Two common strategies: grow the rule from general to specific (start with an empty rule and add conjuncts) or from specific to general (start from a specific positive example and remove conjuncts).

Instance Elimination

Why do we need to eliminate instances? Otherwise, the next rule learned is identical to the previous rule.
Why do we remove positive instances? To ensure that the next rule is different.
Why do we remove negative instances? To prevent underestimating the accuracy of the rule. Compare rules R2 and R3 in the diagram.

Rule Evaluation

Metrics: Accuracy, Laplace, m-estimate.
Accuracy = nc / n
Laplace = (nc + 1) / (n + k)
m-estimate = (nc + k·p) / (n + k)
where n is the number of instances covered by the rule, nc is the number of covered instances that belong to the class predicted by the rule, k is the number of classes, and p is the prior probability of that class.
Example: rule r1 covers 50 positive examples and 5 negative examples; rule r2 covers 2 positive examples and 0 negative examples.
Laplace and m-estimate are equivalent if p = 1/k.
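Plugging r1 and r2 into the formulas above (assuming two classes, so k = 2 and p = 1/k = 0.5) shows why the Laplace and m-estimate measures penalize rules with very small coverage even when their accuracy is perfect:

def accuracy(nc, n):         return nc / n
def laplace(nc, n, k):       return (nc + 1) / (n + k)
def m_estimate(nc, n, k, p): return (nc + k * p) / (n + k)

k, p = 2, 0.5   # two classes, uniform prior (p = 1/k, so Laplace and m-estimate coincide)
for name, pos, neg in [("r1", 50, 5), ("r2", 2, 0)]:
    n, nc = pos + neg, pos
    print(name, round(accuracy(nc, n), 3), round(laplace(nc, n, k), 3), round(m_estimate(nc, n, k, p), 3))
# r1: accuracy 0.909, Laplace 0.895, m-estimate 0.895
# r2: accuracy 1.0,   Laplace 0.75,  m-estimate 0.75  -> r2 looks perfect by accuracy but is penalized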

Stopping Criterion and Rule Pruning

Stopping criterion: compute the gain; if the gain is not significant, discard the new rule.
Rule pruning: similar to post-pruning of decision trees. Reduced error pruning: remove one of the conjuncts in the rule and compare the error rate on the validation set before and after pruning; if the error improves, prune the conjunct.

Summary of Direct Method

Grow a single rule.
Remove the training instances covered by the rule.
Prune the rule (if necessary).
Add the rule to the current rule set.
Repeat.

Direct Method: RIPPER

For a 2-class problem, choose one of the classes as the positive class and the other as the negative class. Learn rules for the positive class; the negative class will be the default class.
For a multi-class problem, order the classes according to increasing class prevalence (the fraction of instances that belong to a particular class). Learn the rule set for the smallest class first, treating the rest as the negative class, then repeat with the next smallest class as the positive class.
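A small illustration of the class-ordering step for the multi-class case, using a hypothetical list of class labels:

from collections import Counter

labels = ["Mammals"] * 50 + ["Birds"] * 20 + ["Fishes"] * 25 + ["Amphibians"] * 5   # hypothetical
counts = Counter(labels)
order = sorted(counts, key=counts.get)   # increasing class prevalence
print(order)                             # ['Amphibians', 'Birds', 'Fishes', 'Mammals']
# Rules are learned for 'Amphibians' first (all other classes as negatives), then 'Birds',
# then 'Fishes'; the largest class ('Mammals') is left as the default class.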

Direct Method: RIPPER

Growing a rule: start from an empty rule and add conjuncts as long as they improve FOIL's information gain; stop when the rule no longer covers negative examples.
Prune the rule immediately using incremental reduced error pruning. Measure for pruning: v = (p - n) / (p + n), where p is the number of positive examples covered by the rule in the validation set and n is the number of negative examples covered by the rule in the validation set.
Pruning method: delete any final sequence of conditions that maximizes v.
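A sketch of this pruning step, assuming a rule is a list of (attribute, value) conditions and the validation set is a list of (instance, label) pairs; it evaluates every prefix of the condition list and keeps the one with the largest v:

def v_metric(conditions, validation, positive_class):
    # v = (p - n) / (p + n), computed on the validation set.
    covered = [y for x, y in validation
               if all(x.get(attr) == val for attr, val in conditions)]
    p = sum(1 for y in covered if y == positive_class)
    n = len(covered) - p
    return (p - n) / (p + n) if covered else float("-inf")

def prune_rule(conditions, validation, positive_class):
    # Keep the prefix of the condition list with the largest v, i.e. delete the
    # final sequence of conditions that maximizes v (at least one condition is kept).
    best_len = max(range(1, len(conditions) + 1),
                   key=lambda k: v_metric(conditions[:k], validation, positive_class))
    return conditions[:best_len]

# Hypothetical usage:
# prune_rule([("Give Birth", "no"), ("Can Fly", "no"), ("Blood Type", "cold")],
#            validation_pairs, positive_class="Reptiles")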

Direct Method: RIPPER

Building a rule set: use the sequential covering algorithm. Find the best rule that covers the current set of positive examples, then eliminate both the positive and negative examples covered by the rule.
Each time a rule is added to the rule set, compute the new description length; stop adding new rules when the new description length is d bits longer than the smallest description length obtained so far.

Direct Method: RIPPER

Optimize the rule set: for each rule r in the rule set R, consider two alternative rules: a replacement rule (r*), grown from scratch, and a revised rule (r'), obtained by adding conjuncts to extend r. Compare the rule set containing r against the rule sets containing r* and r', and choose the one that minimizes the description length (MDL principle).
Repeat rule generation and rule optimization for the remaining positive examples.

Indirect Methods

Look at rules 2, 3, and 5. They can be simplified into:
R2': (Q = yes) → +
R3: (P = yes) ∧ (R = no) → +

Indirect Method: C4.5rules

Extract rules from an unpruned decision tree.
For each rule r: A → y, consider an alternative rule r': A' → y, where A' is obtained by removing one of the conjuncts in A. Compare the error rate of r against all of the r' candidates and prune if one of them has a lower error rate. Repeat until the generalization error can no longer be improved.
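A simplified sketch of this conjunct-removal loop; C4.5rules actually uses a pessimistic error estimate, but here a plain error rate on a labelled data set stands in for it:

def error_rate(conditions, data, label):
    # Fraction of records covered by the rule whose class differs from the rule's consequent.
    covered = [y for x, y in data if all(x.get(a) == v for a, v in conditions)]
    return 1.0 if not covered else sum(1 for y in covered if y != label) / len(covered)

def simplify_rule(conditions, data, label):
    # Greedily drop one conjunct at a time as long as some shorter rule has a lower error rate.
    conditions = list(conditions)
    while len(conditions) > 1:
        current = error_rate(conditions, data, label)
        candidates = [conditions[:i] + conditions[i + 1:] for i in range(len(conditions))]
        best = min(candidates, key=lambda c: error_rate(c, data, label))
        if error_rate(best, data, label) < current:
            conditions = best        # pruning improved the error rate: keep the shorter rule
        else:
            break                    # no improvement: stop
    return conditions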

Indirect Method: C4.5rules

Instead of ordering the rules, order subsets of rules (class ordering). Each subset is a collection of rules with the same rule consequent (class).
Compute the description length of each subset: Description length = L(error) + g·L(model), where g is a parameter that takes into account the presence of redundant attributes in a rule set (default value = 0.5).

Advantages of Rule-Based Classifiers

As highly expressive as decision trees
Easy to interpret
Easy to generate
Can classify new instances rapidly
Performance comparable to decision trees