Association rule discovery (Market basket analysis)
MIS2502
Data Analytics
Adapted from Tan, Steinbach, and Kumar (2004). Introduction to Data Mining.
http://www-users.cs.umn.edu/~kumar/dmbook/

What is Association Mining?
Discovering interesting relationships between variables in large databases
  (http://en.wikipedia.org/wiki/Association_rule_learning)
Find out which items predict the occurrence of other items
Also known as “affinity analysis” or “market basket” analysis

Examples of Association Mining
Market basket analysis/affinity analysis
  What products are bought together?
  Where to place items on grocery store shelves?
Amazon’s recommendation engine
  “People who bought this product also bought…”
Telephone calling patterns
  Who do a set of people tend to call most often?
Social network analysis
  Determine who you “may know”

Market-Basket Analysis
Large set of items
  e.g., things sold in a supermarket
Large set of baskets
  e.g., things one customer buys in one visit
Supermarket chains keep terabytes of this data
  Informs store layout
  Suggests “tie-ins”
    Place spaghetti sauce in the pasta aisle
    Put diapers on sale and raise the price of beer

Market-Basket Transactions

Basket  Items
1       Bread, Milk
2       Bread, Diapers, Beer, Eggs
3       Milk, Diapers, Beer, Coke
4       Bread, Milk, Diapers, Beer
5       Bread, Milk, Diapers, Coke

Association Rules from these transactions
X → Y (antecedent → consequent)
  {Diapers} → {Beer},
  {Milk, Bread} → {Diapers},
  {Beer, Bread} → {Milk},
  {Bread} → {Milk, Diapers}

Core idea: The itemset
Itemset: A group of items of interest
  {Milk, Beer, Diapers}
This itemset is a “3 itemset” because it contains…3 items!
An association rule expresses related itemsets
  X → Y, where X and Y are two itemsets
  {Milk, Diapers} → {Beer} means “when you have milk and diapers, you also have beer”
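
As a rough sketch of these ideas in Python, baskets can be stored as sets so that "this basket contains the itemset" becomes a subset test. The names BASKETS, itemset, rule, and baskets_containing are illustrative choices, not something taken from the slides.

```python
# Illustrative sketch: the five baskets from the slides, keyed by basket number.
BASKETS = {
    1: {"Bread", "Milk"},
    2: {"Bread", "Diapers", "Beer", "Eggs"},
    3: {"Milk", "Diapers", "Beer", "Coke"},
    4: {"Bread", "Milk", "Diapers", "Beer"},
    5: {"Bread", "Milk", "Diapers", "Coke"},
}

# An itemset is a set of items; its size k makes it a "k-itemset".
itemset = frozenset({"Milk", "Beer", "Diapers"})
k = len(itemset)

# A rule X -> Y can be held as a pair (antecedent, consequent) of itemsets.
rule = (frozenset({"Milk", "Diapers"}), frozenset({"Beer"}))


def baskets_containing(items, baskets=BASKETS):
    """Return the numbers of the baskets that contain every item in `items`."""
    return sorted(num for num, contents in baskets.items() if items <= contents)


print(k)                                      # 3
print(baskets_containing(itemset))            # [3, 4]
print(baskets_containing(rule[0] | rule[1]))  # [3, 4]
```

Using frozenset keeps itemsets hashable, which is convenient if they are later used as dictionary keys.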

Support
Support count (σ)
  Frequency of occurrence of an itemset
  σ({Milk, Beer, Diapers}) = 2 (i.e., it’s in baskets 3 and 4)
Support (s)
  Fraction of transactions that contain all itemsets in the relationship X → Y
  s({Milk, Diapers, Beer}) = 2/5 = 0.4
You can calculate support for both X and Y separately
  With X = {Milk, Diapers} and Y = {Beer}: Support for X = 3/5 = 0.6; Support for Y = 3/5 = 0.6
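
The support definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming baskets are stored as sets; the function names support_count and support are hypothetical, and Fraction is used only so the results print as the same fractions shown on the slide.

```python
from fractions import Fraction

# The five baskets from the slides (basket 1 first).
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]


def support_count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return Fraction(support_count(itemset, baskets), len(baskets))


print(support_count({"Milk", "Beer", "Diapers"}, BASKETS))  # 2
print(support({"Milk", "Diapers", "Beer"}, BASKETS))        # 2/5
print(support({"Milk", "Diapers"}, BASKETS))                # 3/5
print(support({"Beer"}, BASKETS))                           # 3/5
```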

Confidence
Confidence is the strength of the association
  Measures how often items in Y appear in transactions that contain X
  c(X → Y) = σ(X ∪ Y) / σ(X)
  c({Milk, Diapers} → {Beer}) = σ({Milk, Diapers, Beer}) / σ({Milk, Diapers}) = 2/3 = 0.67
This says that 67% of the time, when a basket contains milk and diapers, it also contains beer!
c must be between 0 and 1
  1 is a complete association
  0 is no association
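
Confidence can be sketched the same way: count the baskets that contain X, then see what share of them also contain Y. The function name confidence and the choice to raise an error when X never occurs are assumptions made for this illustration.

```python
# The five baskets from the slides (basket 1 first).
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]


def confidence(x, y, baskets):
    """Share of the baskets containing X that also contain Y."""
    x, y = set(x), set(y)
    count_x = sum(1 for basket in baskets if x <= basket)
    count_xy = sum(1 for basket in baskets if (x | y) <= basket)
    if count_x == 0:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    return count_xy / count_x


c = confidence({"Milk", "Diapers"}, {"Beer"}, BASKETS)
print(round(c, 2))  # 0.67
```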

Some sample rules

Association Rule             Support (s)   Confidence (c)
{Milk, Diapers} → {Beer}     2/5 = 0.4     2/3 = 0.67
{Milk, Beer} → {Diapers}     2/5 = 0.4     2/2 = 1.0
{Diapers, Beer} → {Milk}     2/5 = 0.4     2/3 = 0.67
{Beer} → {Milk, Diapers}     2/5 = 0.4     2/3 = 0.67
{Diapers} → {Milk, Beer}     2/5 = 0.4     2/4 = 0.5
{Milk} → {Diapers, Beer}     2/5 = 0.4     2/4 = 0.5

All the above rules are binary partitions of the same itemset: {Milk, Diapers, Beer}
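
One way to reproduce these six rules is to enumerate every split of the itemset into a non-empty X and a non-empty Y. The sketch below assumes that approach; binary_partitions and count are hypothetical helper names.

```python
from itertools import combinations

# The five baskets from the slides (basket 1 first).
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if set(itemset) <= basket)


def binary_partitions(itemset):
    """Yield every (X, Y) split of `itemset` with both sides non-empty."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for x in combinations(items, size):
            y = tuple(item for item in items if item not in x)
            yield x, y


rules = []
for x, y in binary_partitions({"Milk", "Diapers", "Beer"}):
    s = count(x + y, BASKETS) / len(BASKETS)       # same for every split
    c = count(x + y, BASKETS) / count(x, BASKETS)  # depends on X
    rules.append((x, y, s, c))
    print(f"{x} -> {y}: s = {s:.2f}, c = {c:.2f}")
```

A 3-itemset yields 2^3 − 2 = 6 such splits, matching the six rows of the table.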

But don’t blindly follow the numbers
Rules originating from the same itemset (X → Y) will have identical support
  Since the total elements (X and Y) are always the same
But they can have different confidence
  Depending on what is contained in X and Y
High confidence suggests a strong association
  But this can be deceptive
  Consider {Bread} → {Diapers}
  Support for the total itemset is 0.6 (3/5)
  And confidence is 0.75 (3/4), which is pretty high
  But is this just because both are frequently occurring items (s = 0.8 each)?
  You’d almost expect them to show up in the same baskets by chance

Lift
Takes into account how co-occurrence differs from what is expected by chance
  i.e., if items were selected independently from one another
Based on the support metric
  Lift(X → Y) = s(X ∪ Y) / (s(X) × s(Y))
  Numerator: support for total itemset X and Y
  Denominator: support for X times support for Y

Lift Example
What’s the lift of the association rule {Milk, Diapers} → {Beer}?
So X = {Milk, Diapers} and Y = {Beer}
  s({Milk, Diapers, Beer}) = 2/5 = 0.4
  s({Milk, Diapers}) = 3/5 = 0.6
  s({Beer}) = 3/5 = 0.6
So Lift = 0.4 / (0.6 × 0.6) = 0.4 / 0.36 = 1.11
When Lift > 1, the occurrence of X and Y together is more likely than what you would expect by chance
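
The lift calculation can be checked with a short Python sketch. It assumes lift is the support of X and Y together divided by the product of their separate supports, as described on the Lift slide; the function names are illustrative.

```python
from fractions import Fraction

# The five baskets from the slides (basket 1 first).
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= basket)
    return Fraction(hits, len(baskets))


def lift(x, y, baskets):
    """Support of X and Y together, divided by s(X) * s(Y)."""
    x, y = set(x), set(y)
    return support(x | y, baskets) / (support(x, baskets) * support(y, baskets))


value = lift({"Milk", "Diapers"}, {"Beer"}, BASKETS)
print(value, round(float(value), 2))  # 10/9 1.11
```

The same function gives 15/16 (about 0.94) for {Bread} → {Diapers}, which supports the earlier warning that a confidence of 0.75 can look stronger than it is.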

Another example

                    Checking Account
Savings Account     No      Yes     Total
No                  500     3500    4000
Yes                 1000    5000    6000
Total                               10000

Are people more likely to have a checking account if they have a savings account?
  Support ({Savings} → {Checking}) = 5000/10000 = 0.5
  Support ({Savings}) = 6000/10000 = 0.6
  Support ({Checking}) = (3500 + 5000)/10000 = 8500/10000 = 0.85
  Confidence ({Savings} → {Checking}) = 5000/6000 = 0.83
  Lift ({Savings} → {Checking}) = 0.5 / (0.6 × 0.85) = 0.5 / 0.51 = 0.98
Answer: No
  In fact, it’s slightly less than what you’d expect by chance!
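
The same measures can be computed directly from the counts in the table. This is a small sketch with variable names chosen for illustration; the only inputs are the four cell counts and the grand total from the slide.

```python
# Counts read from the contingency table on the slide.
n_total = 10_000
n_savings = 1_000 + 5_000    # savings = Yes row
n_checking = 3_500 + 5_000   # checking = Yes column
n_both = 5_000               # savings = Yes and checking = Yes

s_both = n_both / n_total
s_savings = n_savings / n_total
s_checking = n_checking / n_total

confidence = n_both / n_savings
lift = s_both / (s_savings * s_checking)

print(round(confidence, 2))  # 0.83
print(round(lift, 2))        # 0.98
```

A lift just under 1 is what backs the "No" answer: having a savings account makes a checking account very slightly less likely than the overall rate of 0.85.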

Selecting the rules
We know how to calculate the measures for each rule
  Support
  Confidence
  Lift
Then we set up thresholds for the minimum rule strength we want to accept
  For support, called minsup
  For confidence, called minconf
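
As a sketch of how such thresholds might be applied, the code below filters the rules built from {Milk, Diapers, Beer}. The threshold values (minsup = 2/5, minconf = 3/5) are assumptions picked for illustration; the slides do not specify any.

```python
from fractions import Fraction
from itertools import combinations

# The five baskets from the slides (basket 1 first).
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]
MINSUP = Fraction(2, 5)   # assumed threshold, for illustration only
MINCONF = Fraction(3, 5)  # assumed threshold, for illustration only


def count(itemset):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in BASKETS if set(itemset) <= basket)


def candidate_rules(itemset):
    """Yield every (X, Y) split of `itemset` with both sides non-empty."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for x in combinations(items, size):
            yield x, tuple(item for item in items if item not in x)


accepted = []
for x, y in candidate_rules({"Milk", "Diapers", "Beer"}):
    s = Fraction(count(x + y), len(BASKETS))
    c = Fraction(count(x + y), count(x))
    if s >= MINSUP and c >= MINCONF:
        accepted.append((x, y))

print(len(accepted))  # 4
```

With these assumed thresholds, the two rules with confidence 0.5 are dropped and the other four are kept.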

But this can be overwhelming
Imagine all the combinations possible in your local grocery store
  Every product matched with every combination of other products
  Tens of thousands of possible rule combinations
So where do you start?
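
To get a feel for how fast the number of candidate rules grows, here is a small counting sketch. The slides give no formula. The one used here comes from a standard counting argument: each of d items goes into X, into Y, or into neither, and the cases where X or Y is empty are subtracted. The function names are hypothetical.

```python
from itertools import combinations


def rule_count_formula(d):
    """Number of rules X -> Y over d items, with X and Y non-empty and disjoint."""
    return 3**d - 2 ** (d + 1) + 1


def rule_count_by_enumeration(d):
    """Count the same rules by listing every X and every Y (small d only)."""
    items = list(range(d))
    total = 0
    for x_size in range(1, d):
        for x in combinations(items, x_size):
            rest = [item for item in items if item not in x]
            for y_size in range(1, len(rest) + 1):
                total += sum(1 for _ in combinations(rest, y_size))
    return total


print(rule_count_by_enumeration(6), rule_count_formula(6))  # 602 602
print(rule_count_formula(10))                               # 57002
```

Under this count, ten products are already enough to reach tens of thousands of possible rules, so a real store catalogue is far beyond exhaustive inspection.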

Once you are confident in a rule, take action
{Milk, Diapers} → {Beer}