Lecture 6b – Association Pattern Mining

Uploaded On 2023-09-25





Presentation Transcript

1. Lecture 6b – Association Pattern Mining
Dr. Sampath Jayarathna
Old Dominion University
CS 495/595 Introduction to Data Mining
Credit for some of the slides in this lecture goes to Xun Luo and Shun Liang.

2. Introduction
- Apriori-like Algorithms
  - Generate-and-Test
  - Cost Bottleneck
- FP-Tree and FP-Growth Algorithm
  - FP-Tree: Frequent Pattern Tree
  - FP-Growth: Mining frequent patterns with FP-Tree

3. Apriori-like Algorithms
- Algorithm
  - Anti-monotone heuristic: if any length-k pattern is not frequent in the database, its length-(k+1) super-patterns can never be frequent.
  - Generate the candidate set.
  - Test the candidate set.
- Two non-trivial costs (the bottleneck):
  - Candidate sets are huge. They are already pruned, but they can still grow exponentially with the stage number k.
  - The database is scanned repeatedly, and the candidate set is tested against it by pattern matching.
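The generate-and-test loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecture's code: the function name `apriori` and the five toy transactions are made up for the example.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Level-wise generate-and-test. Returns {frozenset(itemset): support count}."""
    tsets = [frozenset(t) for t in transactions]

    # Level 1: count single items with one database scan.
    counts = {}
    for t in tsets:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 1
    while frequent:
        # Generate: join frequent k-itemsets into (k+1)-item candidates.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k + 1:
                # Prune (anti-monotone): every k-subset must itself be frequent.
                if all(frozenset(sub) in frequent for sub in combinations(union, k)):
                    candidates.add(union)
        # Test: one more full database scan per level.
        counts = {c: sum(1 for t in tsets if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy data, not taken from the lecture.
toy = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
apriori_patterns = apriori(toy, min_count=3)
for itemset, count in sorted(apriori_patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Each pass of the `while` loop costs one candidate-generation step and one scan of the whole database, which is the bottleneck the slide refers to.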

4. FP-Tree and FP-Growth Algorithm
- FP-Tree: Frequent Pattern Tree
  - Compact representation of the DB without losing the information needed for frequent-pattern mining.
  - Easy to traverse; the patterns associated with a given item can be found quickly.
  - Well ordered by item frequency.
- FP-Growth Algorithm
  - Start mining from length-1 patterns.
  - Recursively do the following:
    - Construct the pattern's conditional FP-tree.
    - Concatenate the patterns from the conditional FP-tree with the suffix.
  - Divide-and-conquer mining technique.

5. FP-Tree Definition
Three components:
- One root, labeled “null”
- A set of item prefix subtrees
- A frequent-item header table
[Figure: example FP-tree with its header table (columns: item, head of node-links; items f, c, a, b, m, p)]

6. FP-Tree Definition (cont.)
Each node in the item prefix subtree consists of three fields:
- item-name
- node-link
- count
Each entry in the frequent-item header table consists of two fields:
- item-name
- head of node-link
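One way to picture these fields is as two small Python records. This is only a sketch: the class names `FPNode` and `HeaderEntry` are hypothetical, and the `parent` and `children` fields are assumptions added for tree navigation; the slide itself lists only item-name, node-link, and count.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class FPNode:
    item: Optional[str]                       # item-name (None stands for the "null" root)
    count: int = 0                            # number of transactions sharing the path to this node
    node_link: Optional["FPNode"] = None      # next node in the tree with the same item-name
    parent: Optional["FPNode"] = None         # assumption: lets us walk up a prefix path
    children: Dict[str, "FPNode"] = field(default_factory=dict)  # assumption: child lookup by item


@dataclass(eq=False)
class HeaderEntry:
    item: str                                 # item-name
    head: Optional[FPNode] = None             # head of node-link chain for this item


# Tiny usage example: a root with a single child "f" seen once.
root = FPNode(item=None)
f = FPNode(item="f", count=1, parent=root)
root.children["f"] = f
header = {"f": HeaderEntry(item="f", head=f)}
print(header["f"].head.item, header["f"].head.count)
```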

7. Example 1: FP-Tree Construction
The transaction database used (minimum support threshold = 3):

TID   Items Bought
100   f, a, c, d, g, i, m, p
200   a, b, c, f, l, m, o
300   b, f, h, j, o
400   b, c, k, s, p
500   a, f, c, e, l, p, m, n

8. Example 1 (cont.)
First scan: count and sort
- Count the frequency of each item.
- Collect the length-1 frequent items, then sort them in descending order of support into L, the frequent item list.
L = {(f:4), (c:4), (a:3), (b:3), (m:3), (p:3)}

TID   Items Bought               (Ordered) Frequent Items
100   f, a, c, d, g, i, m, p     f, c, a, m, p
200   a, b, c, f, l, m, o        f, c, a, b, m
300   b, f, h, j, o              f, b
400   b, c, k, s, p              c, b, p
500   a, f, c, e, l, p, m, n     f, c, a, m, p
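The first scan can be reproduced with a short script. This is a sketch rather than the lecture's code. The slide does not say how ties in support are broken, so the constant `TIE_ORDER` below simply copies the order shown in L; that choice is an assumption.

```python
from collections import Counter

transactions = {
    100: ["f", "a", "c", "d", "g", "i", "m", "p"],
    200: ["a", "b", "c", "f", "l", "m", "o"],
    300: ["b", "f", "h", "j", "o"],
    400: ["b", "c", "k", "s", "p"],
    500: ["a", "f", "c", "e", "l", "p", "m", "n"],
}
MIN_COUNT = 3
TIE_ORDER = "fcabmp"  # assumption: tie-breaking copied from the slide's list L

# Count in how many transactions each item occurs.
counts = Counter(item for items in transactions.values() for item in set(items))

# Keep the frequent items and sort them by descending support.
frequent = {item: n for item, n in counts.items() if n >= MIN_COUNT}
L = sorted(frequent.items(), key=lambda kv: (-kv[1], TIE_ORDER.index(kv[0])))

# Rewrite every transaction with only its frequent items, in the order of L.
rank = {item: pos for pos, (item, _) in enumerate(L)}
ordered = {
    tid: sorted((i for i in set(items) if i in rank), key=rank.get)
    for tid, items in transactions.items()
}

print(L)
for tid, items in ordered.items():
    print(tid, items)
```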

9. Example 1 (cont.)
Second scan: create the tree and header table
- Create the root and label it “null”.
- For each transaction Trans:
  - Select the frequent items in Trans and sort them in the order of L.
  - Increase node counts or create new nodes: if a prefix node already exists, increase its count by 1; if it does not exist, create it and set its count to 1.
- Build the item header table:
  - Nodes with the same item-name are linked in sequence via node-links.
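The second scan could look roughly like the sketch below. The names `Node`, `build_fp_tree`, and `render` are hypothetical, and the `tails` dictionary is a helper assumed here so that a new node can be appended to its node-link chain without walking the chain.

```python
class Node:
    def __init__(self, item, parent=None):
        self.item = item          # item-name (None for the "null" root)
        self.count = 0
        self.parent = parent
        self.children = {}        # item-name -> child Node
        self.node_link = None     # next node with the same item-name


def build_fp_tree(ordered_transactions):
    """Insert each ordered transaction as a path; return (root, header table)."""
    root = Node(None)
    header = {}   # item-name -> head of node-links
    tails = {}    # item-name -> last node in the chain (helper, an assumption)
    for items in ordered_transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                # No prefix node yet: create it and hook it into the node-links.
                child = Node(item, parent=node)
                node.children[item] = child
                if item in tails:
                    tails[item].node_link = child
                else:
                    header[item] = child
                tails[item] = child
            child.count += 1      # new nodes go from 0 to 1, existing ones gain 1
            node = child
    return root, header


def render(node, depth=0):
    """Return the tree as indented text lines, one node per line."""
    label = "root" if node.item is None else f"{node.item}:{node.count}"
    lines = ["  " * depth + label]
    for child in node.children.values():
        lines.extend(render(child, depth + 1))
    return lines


# Ordered frequent items of transactions 100 to 500 from Example 1.
ordered = [list("fcamp"), list("fcabm"), list("fb"), list("cbp"), list("fcamp")]
root, header = build_fp_tree(ordered)
print("\n".join(render(root)))
```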

10. Example 1 (cont.)
The building process of the tree

Create root:
root

After trans 1 (f,c,a,m,p):
root
  f:1
    c:1
      a:1
        m:1
          p:1

After trans 2 (f,c,a,b,m):
root
  f:2
    c:2
      a:2
        m:1
          p:1
        b:1
          m:1

After trans 3 (f,b):
root
  f:3
    c:2
      a:2
        m:1
          p:1
        b:1
          m:1
    b:1

TID   (Ordered) Frequent Items
100   f, c, a, m, p
200   f, c, a, b, m
300   f, b
400   c, b, p
500   f, c, a, m, p

11. Example 1 (cont.)
The building process of the tree (cont.)

After trans 4 (c,b,p):
root
  f:3
    c:2
      a:2
        m:1
          p:1
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1

After trans 5 (f,c,a,m,p):
root
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1

TID   (Ordered) Frequent Items
100   f, c, a, m, p
200   f, c, a, b, m
300   f, b
400   c, b, p
500   f, c, a, m, p

12. Example 1 (cont.)
Build the item header table

Header table (item, head of node-links): f, c, a, b, m, p

root
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1

13. FP-Tree Properties
- Completeness
  - Each transaction that contains a frequent pattern is mapped to a path in the tree.
  - Prefix sharing does not cause path ambiguity, because only a path that starts from the root represents a transaction.
- Compactness
  - The number of nodes is bounded by the overall number of occurrences of frequent items.
  - The height of the tree is bounded by the maximal number of frequent items in any transaction.
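The two compactness bounds can be checked on Example 1 with a throwaway script. The sketch below uses a plain nested dictionary as a stand-in for the FP-tree (counts and node-links are left out, since only the shape matters here); the helper names are hypothetical.

```python
# Ordered frequent items of the five transactions in Example 1.
ordered = [list("fcamp"), list("fcabm"), list("fb"), list("cbp"), list("fcamp")]

# Build a prefix tree as nested dicts: each key is an item, each value its subtree.
trie = {}
for items in ordered:
    node = trie
    for item in items:
        node = node.setdefault(item, {})


def node_count(node):
    """Number of nodes below this one (the root itself is not counted)."""
    return sum(1 + node_count(child) for child in node.values())


def height(node):
    """Length of the longest root-to-leaf path, in nodes below the root."""
    return max((1 + height(child) for child in node.values()), default=0)


nodes = node_count(trie)
occurrences = sum(len(items) for items in ordered)   # all frequent-item occurrences
tree_height = height(trie)
longest = max(len(items) for items in ordered)       # most frequent items in one transaction

print(nodes, "<=", occurrences)
print(tree_height, "<=", longest)
```

For Example 1 this gives 11 nodes against 20 frequent-item occurrences, and a height of 5 against a longest ordered transaction of 5 items.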

14. FP-Tree Properties (cont.)
- Traversal friendly (for the mining task)
  - For any frequent item ai, all the possible frequent patterns that contain ai can be obtained by following ai’s node-links.
  - This property is important for divide-and-conquer: it ensures the soundness and completeness of the problem reduction.

15. Example 1: Frequent Patterns from FP-tree
Start from the bottom of the header table: node p
- Two paths
- p’s conditional pattern base: {(f, c, a, m:2), (c, b:1)}
- p’s conditional FP-tree: only one branch, (c:3)
Minimum support threshold = 3
[Figure: FP-tree with header table]
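The step for node p can be sketched in code: follow the item's node-links, climb from each node to the root to collect its prefix path, then count the items in the resulting conditional pattern base. All names below (`Node`, `build`, `conditional_pattern_base`, `conditional_frequent_items`) are hypothetical, and the parent pointer is an assumption that makes the upward walk possible.

```python
from collections import Counter


class Node:
    def __init__(self, item, parent=None):
        self.item, self.parent = item, parent
        self.count, self.children, self.node_link = 0, {}, None


def build(ordered_transactions):
    """Build the FP-tree; return (root, header) with header[item] = head of node-links."""
    root, header, tail = Node(None), {}, {}
    for items in ordered_transactions:
        node = root
        for item in items:
            if item not in node.children:
                child = node.children[item] = Node(item, node)
                if item in tail:
                    tail[item].node_link = child
                else:
                    header[item] = child
                tail[item] = child
            node = node.children[item]
            node.count += 1
    return root, header


def conditional_pattern_base(header, item):
    """List of (prefix path, count), one entry per node reached via node-links."""
    base = []
    node = header.get(item)
    while node is not None:
        path, up = [], node.parent
        while up is not None and up.item is not None:   # stop at the "null" root
            path.append(up.item)
            up = up.parent
        if path:                                        # skip empty prefix paths
            base.append((tuple(reversed(path)), node.count))
        node = node.node_link
    return base


def conditional_frequent_items(base, min_count):
    """Items that stay frequent inside a conditional pattern base."""
    counts = Counter()
    for path, n in base:
        for it in path:
            counts[it] += n
    return {it: c for it, c in counts.items() if c >= min_count}


ordered = [list("fcamp"), list("fcabm"), list("fb"), list("cbp"), list("fcamp")]
root, header = build(ordered)
p_base = conditional_pattern_base(header, "p")
print(p_base)
print(conditional_frequent_items(p_base, min_count=3))
```

Only c survives the threshold in p's base, which is why p's conditional FP-tree has the single branch (c:3).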

16. Example 1 (cont.)
Continue with node m
- Two paths
- m’s conditional pattern base: {(f, c, a:2), (f, c, a, b:1)}
- m’s conditional FP-tree: (f:3, c:3, a:3)
Minimum support threshold = 3
[Figure: FP-tree with header table]

17. Example 1 (cont.)
Continue with node b
- Three paths
- b’s conditional pattern base: {(f, c, a:1), (f:1), (c:1)}
- b’s conditional FP-tree: Φ
Minimum support threshold = 3
[Figure: FP-tree with header table]

18. Example 1 (cont.)
Continue with node a
- One path
- a’s conditional pattern base: {(f, c:3)}
- a’s conditional FP-tree: {(f:3, c:3)}
Minimum support threshold = 3
[Figure: FP-tree with header table]

19. Example 1 (cont.)
Continue with node c
- Two paths
- c’s conditional pattern base: {(f:3)}
- c’s conditional FP-tree: {(f:3)}
Minimum support threshold = 3
[Figure: FP-tree with header table]

20. Example 1 (cont.)
Continue with node f
- One path
- f’s conditional pattern base: Φ
- f’s conditional FP-tree: Φ
Minimum support threshold = 3
[Figure: FP-tree with header table]

21. Example 1 (cont.)
Final results:

item   conditional pattern base          conditional FP-tree    frequent patterns
p      {(f, c, a, m:2), (c, b:1)}        {(c:3)}                cp:3, p:3
m      {(f, c, a:2), (f, c, a, b:1)}     {(f:3, c:3, a:3)}      m:3, am:3, cm:3, fm:3, cam:3, fam:3, fcm:3, fcam:3
b      {(f, c, a:1), (f:1), (c:1)}       Φ                      b:3
a      {(f, c:3)}                        {(f:3, c:3)}           fca:3, fa:3, ca:3, a:3
c      {(f:3)}                           {(f:3)}                c:4, fc:3
f      Φ                                 Φ                      f:4

L = {(f:4), (c:4), (a:3), (b:3), (m:3), (p:3)}
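The table above can be cross-checked with a compact pattern-growth sketch. Note what it is and is not: it recurses on conditional pattern bases held as plain lists instead of building conditional FP-trees, so it is a tree-free simplification of FP-Growth, not the algorithm exactly as presented. The function name `pattern_growth` is hypothetical, and ties in support are broken alphabetically, which changes the item order but not the set of patterns found.

```python
from collections import Counter


def pattern_growth(weighted, min_count, suffix=()):
    """Yield (frozenset(pattern), support) for every frequent pattern.

    weighted: list of (items, count) pairs, i.e. a (conditional) pattern base.
    suffix:   items already fixed by the enclosing recursion levels.
    """
    # Count item support inside this (conditional) database.
    counts = Counter()
    for items, n in weighted:
        for item in set(items):
            counts[item] += n
    order = sorted((i for i, c in counts.items() if c >= min_count),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}

    # Keep only frequent items, sorted by descending support.
    ordered = [(sorted((i for i in set(items) if i in rank), key=rank.get), n)
               for items, n in weighted]

    # Process items from least to most frequent, as in the header table walk.
    for item in reversed(order):
        pattern = (item,) + suffix
        yield frozenset(pattern), counts[item]
        # Conditional pattern base: the prefix before `item` in each transaction.
        base = [(t[:t.index(item)], n) for t, n in ordered if item in t]
        base = [(t, n) for t, n in base if t]
        yield from pattern_growth(base, min_count, pattern)


transactions = [
    ["f", "a", "c", "d", "g", "i", "m", "p"],
    ["a", "b", "c", "f", "l", "m", "o"],
    ["b", "f", "h", "j", "o"],
    ["b", "c", "k", "s", "p"],
    ["a", "f", "c", "e", "l", "p", "m", "n"],
]
patterns = dict(pattern_growth([(t, 1) for t in transactions], min_count=3))
for itemset, support in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), support)
```

On Example 1 this produces the same 18 frequent patterns as the final results table.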

22. FP-Growth: Python implementation

    # Requires pandas and mlxtend (pip install pandas mlxtend)
    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import fpgrowth

    dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
               ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
               ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
               ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
               ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

    # One-hot encode the transactions into a boolean DataFrame (one column per item)
    te = TransactionEncoder()
    te_ary = te.fit(dataset).transform(dataset)
    df = pd.DataFrame(te_ary, columns=te.columns_)
    print(df)

    # Itemsets that occur in at least 60% of the transactions;
    # use_colnames=True reports item names instead of column indices
    print(fpgrowth(df, min_support=0.6, use_colnames=True))

23. FP-Growth: Python implementation

    # Install first: pip install pyfpgrowth (for example at the Anaconda prompt)
    import pyfpgrowth

    transactions = [[1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3],
                    [2, 3], [1, 3], [1, 2, 3, 5], [1, 2, 3]]

    # Frequent patterns with a minimum support count of 2
    patterns = pyfpgrowth.find_frequent_patterns(transactions, 2)
    # Association rules with a minimum confidence of 0.7
    rules = pyfpgrowth.generate_association_rules(patterns, 0.7)
    print(rules)
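To see what the confidence threshold of 0.7 means, the sketch below computes confidence(A → B) = support(A ∪ B) / support(A) by hand for the same nine transactions. It is not how pyfpgrowth works internally and does not reproduce its output format: supports are found by brute-force enumeration, which is only practical for tiny examples, and all variable names are assumptions.

```python
from collections import Counter
from itertools import combinations

transactions = [[1, 2, 5], [2, 4], [2, 3], [1, 2, 4], [1, 3],
                [2, 3], [1, 3], [1, 2, 3, 5], [1, 2, 3]]
MIN_COUNT = 2     # minimum support count
MIN_CONF = 0.7    # minimum confidence

# Brute-force support counting: enumerate every sub-itemset of every transaction.
support = Counter()
for t in transactions:
    items = sorted(set(t))
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            support[combo] += 1
frequent = {k: v for k, v in support.items() if v >= MIN_COUNT}

# A rule A -> B is kept when support(A and B together) / support(A) >= MIN_CONF.
rules = []
for itemset, count in frequent.items():
    for r in range(1, len(itemset)):
        for antecedent in combinations(itemset, r):
            # Every subset of a frequent itemset is frequent, so the key exists.
            confidence = count / frequent[antecedent]
            if confidence >= MIN_CONF:
                consequent = tuple(i for i in itemset if i not in antecedent)
                rules.append((antecedent, consequent, confidence))

for antecedent, consequent, confidence in sorted(rules):
    print(antecedent, "->", consequent, round(confidence, 2))
```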

24. Activity 11
For the following market basket, generate the frequent itemsets for a minimum support count of 2 using FP-Growth. Show both the tree and the table of your final results, including the conditional pattern base, conditional FP-tree, and frequent patterns.