
Association - PowerPoint Presentation

marina-yarberry
Uploaded On 2015-10-10



Presentation Transcript

Slide1

Association rule discovery (Market basket analysis)

MIS2502 Data Analytics

Adapted from Tan, Steinbach, and Kumar (2004). Introduction to Data Mining. http://www-users.cs.umn.edu/~kumar/dmbook/

Slide2

What is Association Mining?

Discovering interesting relationships between variables in large databases (http://en.wikipedia.org/wiki/Association_rule_learning)

Find out which items predict the occurrence of other items

Also known as “affinity analysis” or “market basket” analysis

Slide3

Examples of Association Mining

Market basket analysis/affinity analysis
  What products are bought together?
  Where to place items on grocery store shelves?

Amazon’s recommendation engine
  “People who bought this product also bought…”

Telephone calling patterns
  Who does a set of people tend to call most often?

Social network analysis
  Determine who you “may know”

Slide4

Market-Basket Analysis

Large set of items
  e.g., things sold in a supermarket

Large set of baskets
  e.g., things one customer buys in one visit

Supermarket chains keep terabytes of this data
  Informs store layout
  Suggests “tie-ins”
    Place spaghetti sauce in the pasta aisle
    Put diapers on sale and raise the price of beer

Slide5

Market-Basket Transactions

Basket  Items
1       Bread, Milk
2       Bread, Diapers, Beer, Eggs
3       Milk, Diapers, Beer, Coke
4       Bread, Milk, Diapers, Beer
5       Bread, Milk, Diapers, Coke

Association Rules from these transactions

X → Y (antecedent → consequent)

{Diapers} → {Beer},
{Milk, Bread} → {Diapers},
{Beer, Bread} → {Milk},
{Bread} → {Milk, Diapers}

Slide6

Core idea: The itemset

Itemset: A group of items of interest
  {Milk, Beer, Diapers}

This itemset is a “3-itemset” because it contains…3 items!

An association rule expresses related itemsets
  X → Y, where X and Y are two itemsets
  {Milk, Diapers} → {Beer} means “when you have milk and diapers, you also have beer”

Basket  Items
1       Bread, Milk
2       Bread, Diapers, Beer, Eggs
3       Milk, Diapers, Beer, Coke
4       Bread, Milk, Diapers, Beer
5       Bread, Milk, Diapers, Coke

Slide7

Support

Support count (σ)
  Frequency of occurrence of an itemset
  σ({Milk, Beer, Diapers}) = 2 (i.e., it’s in baskets 3 and 4)

Support (s)
  Fraction of transactions that contain all the items in both itemsets of the relationship X → Y
  s({Milk, Diapers, Beer}) = 2/5 = 0.4

You can calculate support for both X and Y separately
  With X = {Milk, Diapers} and Y = {Beer}: support for X = 3/5 = 0.6; support for Y = 3/5 = 0.6

Basket  Items
1       Bread, Milk
2       Bread, Diapers, Beer, Eggs
3       Milk, Diapers, Beer, Coke
4       Bread, Milk, Diapers, Beer
5       Bread, Milk, Diapers, Coke
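The two support measures can be checked with a few lines of Python. This is only a sketch over the five baskets from the table; the names BASKETS, support_count, and support are illustrative and do not come from the original slides.

```python
# The five baskets from the slide, one set of items per transaction.
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]


def support_count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for basket in baskets if wanted <= basket)


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return support_count(itemset, baskets) / len(baskets)


print(support_count({"Milk", "Diapers", "Beer"}, BASKETS))  # 2
print(support({"Milk", "Diapers", "Beer"}, BASKETS))        # 0.4
print(support({"Milk", "Diapers"}, BASKETS))                # 0.6
print(support({"Beer"}, BASKETS))                           # 0.6
```

The subset test `wanted <= basket` is what "contains all items" means here: a basket counts only if nothing from the itemset is missing.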

Slide8

Confidence

Confidence is the strength of the association

Measures how often items in Y appear in transactions that contain X

c(X → Y) = s(X → Y) / s(X)

c({Milk, Diapers} → {Beer}) = 0.4 / 0.6 = 0.67

Basket  Items
1       Bread, Milk
2       Bread, Diapers, Beer, Eggs
3       Milk, Diapers, Beer, Coke
4       Bread, Milk, Diapers, Beer
5       Bread, Milk, Diapers, Coke

This says that 67% of the time, when you have milk and diapers in the itemset, you also have beer!
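The 67% figure can be reproduced with a short Python sketch. It assumes the five baskets above and divides the number of baskets containing both X and Y by the number containing X; the function name confidence is illustrative.

```python
# The five baskets from the slide, one set of items per transaction.
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]


def confidence(antecedent, consequent, baskets):
    """Share of the baskets containing X that also contain Y."""
    x = set(antecedent)
    both = x | set(consequent)
    n_x = sum(1 for basket in baskets if x <= basket)
    n_both = sum(1 for basket in baskets if both <= basket)
    if n_x == 0:
        raise ValueError("antecedent never occurs, confidence is undefined")
    return n_both / n_x


c = confidence({"Milk", "Diapers"}, {"Beer"}, BASKETS)
print(round(c, 2))  # 0.67
```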

c must be between 0 and 1
  1 is a complete association
  0 is no association

Slide9

Some sample rules

Association Rule            Support (s)   Confidence (c)
{Milk, Diapers} → {Beer}    2/5 = 0.4     2/3 = 0.67
{Milk, Beer} → {Diapers}    2/5 = 0.4     2/2 = 1.0
{Diapers, Beer} → {Milk}    2/5 = 0.4     2/3 = 0.67
{Beer} → {Milk, Diapers}    2/5 = 0.4     2/3 = 0.67
{Diapers} → {Milk, Beer}    2/5 = 0.4     2/4 = 0.5
{Milk} → {Diapers, Beer}    2/5 = 0.4     2/4 = 0.5

Basket  Items
1       Bread, Milk
2       Bread, Diapers, Beer, Eggs
3       Milk, Diapers, Beer, Coke
4       Bread, Milk, Diapers, Beer
5       Bread, Milk, Diapers, Coke
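The six rules in the table can also be generated rather than written out by hand. The sketch below splits {Milk, Diapers, Beer} into every pair of non-empty sides and computes support and confidence for each split. The helper names count and binary_partitions are illustrative, not from the slides.

```python
from itertools import combinations

# The five baskets from the slide, one set of items per transaction.
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]


def count(itemset):
    """Number of baskets containing every item in `itemset`."""
    return sum(1 for basket in BASKETS if itemset <= basket)


def binary_partitions(itemset):
    """Yield every (X, Y) split of `itemset` with both sides non-empty."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            x = frozenset(antecedent)
            yield x, frozenset(items) - x


results = {}
for x, y in binary_partitions({"Milk", "Diapers", "Beer"}):
    s = count(x | y) / len(BASKETS)  # identical for every split
    c = count(x | y) / count(x)      # depends on the antecedent
    results[(x, y)] = (s, c)
    print(f"{sorted(x)} -> {sorted(y)}  s={s:.2f}  c={c:.2f}")
```

A 3-itemset yields 2^3 - 2 = 6 splits, which matches the six rows of the table.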

All the above rules are binary partitions of the same itemset: {Milk, Diapers, Beer}

Slide10

But don’t blindly follow the numbers

Rules originating from the same itemset (X ∪ Y) will have identical support
  Since the total elements (X ∪ Y) are always the same

But they can have different confidence
  Depending on what is contained in X and Y

High confidence suggests a strong association
  But this can be deceptive
  Consider {Bread} → {Diapers}
  Support for the total itemset is 0.6 (3/5)
  And confidence is 0.75 (3/4), which is pretty high
  But is this just because both are frequently occurring items (s = 0.8)?
  You’d almost expect them to show up in the same baskets by chance

Slide11

Lift

Takes into account how co-occurrence differs from what is expected by chance
  i.e., if items were selected independently from one another

Based on the support metric

Lift(X → Y) = s(X → Y) / (s(X) × s(Y))
  Numerator: support for the total itemset, X and Y together
  Denominator: support for X times support for Y

Slide12

Lift Example

What’s the lift of the association rule {Milk, Diapers} → {Beer}?

So X = {Milk, Diapers} and Y = {Beer}

s({Milk, Diapers, Beer}) = 2/5 = 0.4
s({Milk, Diapers}) = 3/5 = 0.6
s({Beer}) = 3/5 = 0.6

So Lift = 0.4 / (0.6 × 0.6) = 0.4 / 0.36 = 1.11

Basket  Items
1       Bread, Milk
2       Bread, Diapers, Beer, Eggs
3       Milk, Diapers, Beer, Coke
4       Bread, Milk, Diapers, Beer
5       Bread, Milk, Diapers, Coke
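As a quick check of the arithmetic, here is a minimal Python sketch of lift as joint support divided by the product of the individual supports. It also runs the {Bread} → {Diapers} rule from Slide 10 for comparison. The function names are illustrative, and the sketch assumes both X and Y occur at least once.

```python
# The five baskets from the slide, one set of items per transaction.
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for basket in baskets if wanted <= basket) / len(baskets)


def lift(antecedent, consequent, baskets):
    """Joint support of X and Y relative to what independence would predict."""
    x, y = set(antecedent), set(consequent)
    return support(x | y, baskets) / (support(x, baskets) * support(y, baskets))


print(round(lift({"Milk", "Diapers"}, {"Beer"}, BASKETS), 2))  # 1.11
print(round(lift({"Bread"}, {"Diapers"}, BASKETS), 2))         # 0.94
```

The second result illustrates the warning from Slide 10: {Bread} → {Diapers} has a confidence of 0.75, yet its lift is below 1.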

When Lift > 1, the occurrence of X and Y together is more likely than what you would expect by chance

Slide13

Another example

                   Checking Account
Savings Account    No      Yes     Total
No                 500     3500    4000
Yes                1000    5000    6000
Total                              10000

Are people more likely to have a checking account if they have a savings account?

Support ({Savings} → {Checking}) = 5000/10000 = 0.5
Support ({Savings}) = 6000/10000 = 0.6
Support ({Checking}) = 8500/10000 = 0.85
Confidence ({Savings} → {Checking}) = 5000/6000 = 0.83
Lift ({Savings} → {Checking}) = 0.5 / (0.6 × 0.85) = 0.98
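The same measures can be computed directly from the four cell counts of the table. The sketch below does that in Python; the variable names are illustrative and the counts are copied from the slide.

```python
# Cell counts from the contingency table: (has savings, has checking) -> customers.
counts = {
    (False, False): 500,
    (False, True): 3500,
    (True, False): 1000,
    (True, True): 5000,
}

total = sum(counts.values())                       # 10000 customers
n_savings = counts[(True, False)] + counts[(True, True)]
n_checking = counts[(False, True)] + counts[(True, True)]
n_both = counts[(True, True)]

s_both = n_both / total
s_savings = n_savings / total
s_checking = n_checking / total

confidence = n_both / n_savings
lift = s_both / (s_savings * s_checking)

print(round(confidence, 2))  # 0.83
print(round(lift, 2))        # 0.98
```

Confidence is high only because 85% of all customers have a checking account anyway; lift divides that base rate out.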

Answer: No

In fact, it’s slightly less than what you’d expect by chance!

Slide14

Selecting the rules

We know how to calculate the measures for each rule
  Support
  Confidence
  Lift

Then we set up thresholds for the minimum rule strength we want to accept
  For support, called minsup
  For confidence, called minconf

Slide15

But this can be overwhelming

Imagine all the combinations possible in your local grocery store

Every product matched with every combination of other products

Tens of thousands of possible rule combinations
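To get a feel for how quickly the number of candidate rules grows, the sketch below counts every rule X → Y with non-empty, non-overlapping X and Y over d distinct items. Each item is in X, in Y, or in neither, which gives 3^d assignments; removing those with an empty X or an empty Y leaves 3^d - 2^(d+1) + 1. The function names are illustrative.

```python
from math import comb


def count_rules_formula(d):
    """Closed-form count of rules X -> Y over d distinct items."""
    return 3**d - 2**(d + 1) + 1


def count_rules_by_itemset(d):
    """Same count, summed itemset by itemset.

    There are comb(d, k) itemsets of size k, and each one splits into
    2**k - 2 rules (every subset as antecedent except the empty set
    and the full itemset).
    """
    return sum(comb(d, k) * (2**k - 2) for k in range(2, d + 1))


print(count_rules_formula(6))   # 602, for the six products in the example baskets
print(count_rules_formula(10))  # 57002
```

Ten products already give tens of thousands of possible rules, and the count roughly triples with each additional product.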

So where do you start?

Slide16

Once you are confident in a rule, take action

{Milk, Diapers} → {Beer}
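One way to decide which rules deserve that confidence is to apply the minsup and minconf thresholds from Slide 14 to every candidate rule. The brute-force sketch below does this for the five example baskets. The threshold values 0.4 and 0.6 are assumptions chosen for illustration, not values from the slides, and exhaustive search like this is only practical for very small item sets.

```python
from itertools import combinations

# The five example baskets, one set of items per transaction.
BASKETS = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Coke"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Coke"},
]
MINSUP = 0.4   # assumed threshold, not from the slides
MINCONF = 0.6  # assumed threshold, not from the slides


def count(itemset):
    """Number of baskets containing every item in `itemset`."""
    return sum(1 for basket in BASKETS if itemset <= basket)


def mine_rules(minsup, minconf):
    """Return (X, Y, support, confidence, lift) for every rule passing both thresholds."""
    n = len(BASKETS)
    items = sorted(set().union(*BASKETS))
    rules = []
    for k in range(2, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            n_xy = count(itemset)
            if n_xy == 0 or n_xy / n < minsup:
                continue  # every rule from this itemset shares its support
            for size in range(1, k):
                for antecedent in combinations(combo, size):
                    x = frozenset(antecedent)
                    y = itemset - x
                    conf = n_xy / count(x)
                    if conf >= minconf:
                        lift = conf / (count(y) / n)
                        rules.append((x, y, n_xy / n, conf, lift))
    return rules


RULES = mine_rules(MINSUP, MINCONF)
for x, y, s, c, lift_value in sorted(RULES, key=lambda rule: -rule[4]):
    print(f"{sorted(x)} -> {sorted(y)}  s={s:.2f}  c={c:.2f}  lift={lift_value:.2f}")
```

With these assumed thresholds, {Milk, Diapers} → {Beer} is among the rules that survive, with a lift above 1. Sorting by lift puts the rules that beat chance by the widest margin first.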