Presentation Transcript

Slide1

CS 124/LINGUIST 180: From Languages to Information

Dan Jurafsky, Stanford University

Recommender Systems & Collaborative Filtering

Slides adapted from Jure Leskovec

Slide2

Recommender Systems

Customer X:
Buys a CD of Mozart
Buys a CD of Haydn

Customer Y:
Does a search on Mozart
The recommender system suggests Haydn, based on data collected about customer X


Slide3

Recommendations

Examples of items: products, web sites, blogs, news items, …

[Diagram: items reach users via two routes, Search and Recommendations]

Slide4

From Scarcity to Abundance

Shelf space is a scarce commodity for traditional retailers
Also: TV networks, movie theaters, …
The web enables near-zero-cost dissemination of information about products
From scarcity to abundance
More choice necessitates better filters
Recommendation engines
How Into Thin Air made Touching the Void a bestseller: http://www.wired.com/wired/archive/12.10/tail.html


Slide5

Sidenote: The Long Tail

Source: Chris Anderson (2004)

Slide6

Physical vs. Online

Read http://www.wired.com/wired/archive/12.10/tail.html to learn more!

Slide7

Types of Recommendations

Editorial and hand curated
List of favorites
Lists of "essential" items
Simple aggregates
Top 10, Most Popular, Recent Uploads
Tailored to individual users (today's class)
Amazon, Netflix, …

Slide8

Formal Model

X = set of Customers
S = set of Items
Utility function u: X × S → R
R = set of ratings
R is a totally ordered set
e.g., 0–5 stars, or a real number in [0,1]

Slide9

Utility Matrix

[Utility matrix: rows are users Alice, Bob, Carol, David; columns are movies Avatar, LOTR, Matrix, Pirates; only a few cells contain ratings]

Slide10

Key Problems

(1) Gathering "known" ratings for the matrix
How to collect the data in the utility matrix
(2) Extrapolating unknown ratings from known ones
Mainly interested in high unknown ratings
We are not interested in knowing what you don't like, but what you like
(3) Evaluating extrapolation methods
How to measure the success/performance of recommendation methods

Slide11

(1) Gathering Ratings

Explicit
Ask people to rate items
Doesn't work well in practice; people can't be bothered
Crowdsourcing: pay people to label items
Implicit
Learn ratings from user actions
E.g., purchase implies a high rating
What about low ratings?

Slide12

(2) Extrapolating Utilities

Key problem: the utility matrix U is sparse
Most people have not rated most items
Cold start:
New items have no ratings
New users have no history
Three approaches to recommender systems:
Content-based (this lecture)
Collaborative filtering (this lecture)
Latent factor based

Slide13

Content-based Recommender Systems


Slide14

Content-based Recommendations

Main idea: recommend to customer x items similar to previous items rated highly by x
Example: movie recommendations
Recommend movies with the same actor(s), director, genre, …
Websites, blogs, news
Recommend other sites with "similar" content

Slide15

Plan of Action

[Diagram: from items the user likes (red, circles, triangles), build item profiles, then build a user profile; match the user profile against item profiles to recommend]

Slide16

Item Profiles

For each item, create an item profile
Profile is a set (vector) of features
Movies: author, genre, director, actors, year, …
Text: set of "important" words in the document
How to pick important features?
TF-IDF (term frequency × inverse document frequency)
Term … Feature
Document … Item
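To make TF-IDF concrete, here is a minimal Python sketch of building a text item profile; the function name and the doc_freq/n_docs inputs are illustrative, not from the slides:

```python
import math
from collections import Counter

def tfidf_profile(doc_tokens, doc_freq, n_docs, k=10):
    """Return the k highest-scoring words of a document as its item profile.

    doc_tokens: list of words in this document (the "item")
    doc_freq:   dict mapping word -> number of documents containing it
    n_docs:     total number of documents in the collection
    """
    tf = Counter(doc_tokens)  # term frequency within this document
    scores = {w: tf[w] * math.log(n_docs / doc_freq[w]) for w in tf}
    return dict(sorted(scores.items(), key=lambda kv: -kv[1])[:k])
```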

Slide17

This works if everything is 1 or 0 (indicator features). But what if we want real or ordinal features too?

Content-based Item Profiles

Features (columns): Melissa McCarthy, Johnny Depp, Actor A, Actor B, …, Pirate Genre, Spy Genre, Comic Genre

Movie X: 0 1 1 0 1 1 0 1
Movie Y: 1 1 0 1 0 1 1 0

Slide18

Maybe we want a scaling factor α between binary and numeric features

Content-based Item Profiles

Features (columns): Melissa McCarthy, Johnny Depp, Actor A, Actor B, …, Pirate Genre, Spy Genre, Comic Genre, plus Avg Rating

Movie X: 0 1 1 0 1 1 0 1 | 3
Movie Y: 1 1 0 1 0 1 1 0 | 4

Slide19

Maybe there is a scaling factor α between binary and numeric features
Or maybe α = 1

Movie X: 0 1 1 0 1 1 0 1 | 3α
Movie Y: 1 1 0 1 0 1 1 0 | 4α

Cosine(Movie X, Movie Y) = (2 + 12α²) / ( √(5 + 9α²) · √(5 + 16α²) )
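As a quick sanity check of the formula, a small sketch using the two profile vectors above (the printed value assumes α = 1):

```python
import numpy as np

alpha = 1.0  # scaling factor between binary and numeric features

# Eight binary features followed by the alpha-scaled average rating.
movie_x = np.array([0, 1, 1, 0, 1, 1, 0, 1, 3 * alpha])
movie_y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 4 * alpha])

cos = movie_x @ movie_y / (np.linalg.norm(movie_x) * np.linalg.norm(movie_y))
print(round(cos, 3))  # 0.816 when alpha = 1; a larger alpha lets the rating dominate
```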

Slide20

User Profiles

We want a vector with the same components/dimensions as the items
Components could be 1s representing user purchases
Or arbitrary numbers from a rating
The user profile is an aggregate of the rated items' profiles:
Average (weighted?) of rated item profiles

Slide21

Sample user profile

Items are movies
The utility matrix has a 1 if the user has seen the movie
20% of the movies user U has seen have Melissa McCarthy
U["Melissa McCarthy"] = 0.2

Features (columns): Melissa McCarthy, Actor A, Actor B, …
User U: 0.2 0.005 0 0 …
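A small sketch of how that number falls out of averaging: with 0/1 features, the mean over the movies a user has seen is exactly the fraction above. The five item profiles here are made up for illustration:

```python
import numpy as np

# One row per movie user U has seen; column 0 is the "Melissa McCarthy" feature.
seen_item_profiles = np.array([
    [1, 0, 1],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
])

# Averaging 0/1 profiles gives, per feature, the fraction of seen movies with it.
user_profile = seen_item_profiles.mean(axis=0)
print(user_profile)  # [0.2 0.2 0.6] -> 1 of the 5 seen movies has Melissa McCarthy
```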

Slide22

Prediction

User and item vectors have the same components/dimensions!
So just recommend the items whose vectors are most similar to the user vector!
Given user profile x and item profile i, estimate:

u(x, i) = cos(x, i) = (x · i) / (‖x‖ · ‖i‖)
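Putting the pieces together, a minimal content-based recommender just ranks items by cosine to the user profile; item_profiles is an assumed dict mapping item name to profile vector:

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def recommend(user_profile, item_profiles, k=5):
    """Return the k items whose profile vectors are most similar to the user's."""
    scored = [(cosine(user_profile, profile), name)
              for name, profile in item_profiles.items()]
    return [name for _, name in sorted(scored, reverse=True)[:k]]
```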

Slide23

Pros: Content-based Approach

+: No need for data on other users
No cold-start or sparsity problems
+: Able to recommend to users with unique tastes
+: Able to recommend new & unpopular items
No first-rater problem
+: Able to provide explanations
Can explain a recommendation by listing the content features that caused the item to be recommended

Slide24

Cons: Content-based Approach

–: Finding the appropriate features is hard
E.g., images, movies, music
–: Recommendations for new users
How to build a user profile?
–: Overspecialization
Never recommends items outside the user's content profile
People might have multiple interests
Unable to exploit quality judgments of other users

Slide25

Collaborative Filtering

Harnessing quality judgments of other users

Slide26

Collaborative Filtering, Version 1: "User-User" Collaborative Filtering

Consider user x
Find a set N of other users whose ratings are "similar" to x's ratings
Estimate x's ratings based on the ratings of the users in N

Slide27

Finding Similar Users

Let rx be the vector of user x's ratings

Jaccard similarity measure: sim(x, y) = |rx ∩ ry| / |rx ∪ ry|, treating rx and ry as sets of rated items
Problem: ignores the values of the ratings

Cosine similarity measure: sim(x, y) = cos(rx, ry) = (rx · ry) / (‖rx‖ · ‖ry‖)
Problem: treats missing ratings as "negative"

Example:
rx = [*, _, _, *, ***]
ry = [*, _, **, **, _]

rx, ry as sets: rx = {1, 4, 5}, ry = {1, 3, 4}
rx, ry as points: rx = (1, 0, 0, 1, 3), ry = (1, 0, 2, 2, 0)
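Both measures on the example vectors above, as a sketch:

```python
import numpy as np

rx = np.array([1, 0, 0, 1, 3])  # *  _  _   *   ***
ry = np.array([1, 0, 2, 2, 0])  # *  _  **  **  _

# Jaccard: compare the sets of rated items, ignoring the rating values.
sx, sy = set(np.flatnonzero(rx)), set(np.flatnonzero(ry))
print(len(sx & sy) / len(sx | sy))  # 2/4 = 0.5

# Cosine: uses the values, but an unrated 0 acts like a very low rating.
print(rx @ ry / (np.linalg.norm(rx) * np.linalg.norm(ry)))  # ~0.302
```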

Slide28

Utility Matrix

Intuitively we want: sim(A, B) > sim(A, C)

Jaccard similarity: 1/5 < 2/4, so Jaccard gets it backwards
Cosine similarity: 0.386 > 0.322, so cosine gets it right (barely)
But cosine considers missing ratings as "negative"

Slide29

Utility Matrix

Problem with cosine: a 0 acts like a negative review
C really loves SW
A hates SW
B just hasn't seen it

Another problem: we'd like to normalize for raters
D rated everything the same; not very useful

Slide30

Modified Utility Matrix: subtract the means of each row

Now a 0 means no information
And negative ratings mean that viewers with opposite opinions will have vectors in opposite directions!

Slide31

Modified Utility Matrix: subtract the means of each row

On the mean-centered matrix, compute Cos(A, B) and Cos(A, C):
Now A and C are (correctly) much further apart than A and B
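A sketch of the whole trick in Python. The matrix values are assumed for illustration (the slide's table did not survive the transcript): A hates SW, B hasn't seen it, C loves it; mean-centering then makes the cosines behave as intended:

```python
import numpy as np

# Users x movies (0 = unrated); the last three columns are the SW movies.
R = np.array([
    [4, 0, 0, 5, 1, 0, 0],   # A: hates SW1 (rating 1)
    [5, 5, 4, 0, 0, 0, 0],   # B: hasn't seen SW at all
    [0, 0, 0, 2, 4, 5, 0],   # C: loves SW
], dtype=float)

def center(row):
    """Subtract the row mean over rated entries only; unrated cells stay 0."""
    out = np.zeros_like(row)
    rated = row != 0
    out[rated] = row[rated] - row[rated].mean()
    return out

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

A, B, C = (center(r) for r in R)
print(round(cosine(A, B), 2))  # ~0.09:  A and B weakly similar
print(round(cosine(A, C), 2))  # ~-0.56: A and C point in opposite directions
```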

Slide32

Cosine after subtracting mean

It turns out to be the same as the Pearson correlation coefficient!
Cosine similarity is correlation when the data is centered at 0
Terminological note: subtracting the mean is zero-centering, not normalizing (normalizing is dividing by a norm to turn something into a probability), but the textbook (and common usage) sometimes overloads the term "normalize"

Slide33

Finding Similar Users

Let rx be the vector of user x's ratings

Cosine similarity measure: sim(x, y) = cos(rx, ry) = (rx · ry) / (‖rx‖ · ‖ry‖)
Problem: treats missing ratings as "negative"

Pearson correlation coefficient:
Sxy = items rated by both users x and y
r̄x, r̄y = average rating of x, of y

sim(x, y) = Σ_{s∈Sxy} (rxs − r̄x)(rys − r̄y) / ( √(Σ_{s∈Sxy} (rxs − r̄x)²) · √(Σ_{s∈Sxy} (rys − r̄y)²) )

Example:
rx = [*, _, _, *, ***]
ry = [*, _, **, **, _]
rx, ry as points: rx = (1, 0, 0, 1, 3), ry = (1, 0, 2, 2, 0)

Slide34

Rating Predictions

From similarity metric to recommendations:
Let rx be the vector of user x's ratings
Let N be the set of the k users most similar to x who have rated item i
Prediction for item i of user x:

Simple average: rxi = (1/k) · Σ_{y∈N} ryi
Better, weighting by similarity: rxi = ( Σ_{y∈N} sxy · ryi ) / ( Σ_{y∈N} sxy )

Shorthand: sxy = sim(x, y)
Many other tricks are possible…
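A direct sketch of the similarity-weighted prediction; R is a users × items matrix with 0 for unrated, and sim is any precomputed user-user similarity matrix (e.g., Pearson):

```python
import numpy as np

def predict(x, i, R, sim, k=5):
    """Predict user x's rating of item i from the k most similar users who rated i."""
    raters = [y for y in range(R.shape[0]) if y != x and R[y, i] != 0]
    N = sorted(raters, key=lambda y: sim[x, y], reverse=True)[:k]
    num = sum(sim[x, y] * R[y, i] for y in N)
    den = sum(sim[x, y] for y in N)
    return num / den if den > 0 else 0.0
```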

Slide35

Collaborative Filtering, Version 2: Item-Item Collaborative Filtering

So far: user-user collaborative filtering
Alternate view that often works better: item-item
For item i, find other similar items
Estimate the rating for item i based on the ratings for similar items
Can use the same similarity metrics and prediction functions as in the user-user model:

rxi = ( Σ_{j∈N(i;x)} sij · rxj ) / ( Σ_{j∈N(i;x)} sij )

sij … similarity of items i and j
rxj … rating of user x on item j
N(i;x) … set of items rated by x that are similar to i

Slide36

Item-Item CF (|N|=2)

Movies × users utility matrix (rows = movies 1–6, columns = users 1–12):

          u1  u2  u3  u4  u5  u6  u7  u8  u9 u10 u11 u12
movie 1:   1   .   3   .   .   5   .   .   5   .   4   .
movie 2:   .   .   5   4   .   .   4   .   .   2   1   3
movie 3:   2   4   .   1   2   .   3   .   4   3   5   .
movie 4:   .   2   4   .   5   .   .   4   .   .   2   .
movie 5:   .   .   4   3   4   2   .   .   .   .   2   5
movie 6:   1   .   3   .   3   .   .   2   .   .   4   .

. = unknown rating; ratings are between 1 and 5

Slide37

Item-Item CF (|N|=2)

Goal: estimate the rating of movie 1 by user 5
(Same matrix as above, with the (movie 1, user 5) cell marked "?")

Slide38

Item-Item CF (|N|=2)

Neighbor selection: identify movies similar to movie 1 that were rated by user 5

Here we use Pearson correlation as the similarity:
1) Subtract the mean rating mi from each movie i
   m1 = (1 + 3 + 5 + 5 + 4) / 5 = 3.6
   row 1: [-2.6, 0, -0.6, 0, 0, 1.4, 0, 0, 1.4, 0, 0.4, 0]
2) Compute cosine similarities between rows

sim(1, m) for m = 1…6: 1.00, -0.18, 0.41, -0.10, -0.31, 0.59

Slide39

Item-Item CF (|N|=2)

Compute the similarity weights: s1,3 = 0.41, s1,6 = 0.59
(Movies 3 and 6 are the two most similar movies to movie 1 that user 5 has rated)

Slide40

Item-Item CF (|N|=2)

Predict by taking a weighted average:
r1,5 = (0.41·2 + 0.59·3) / (0.41 + 0.59) = 2.6

In general: rxi = ( Σ_{j∈N(i;x)} sij · rxj ) / ( Σ_{j∈N(i;x)} sij )
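The entire worked example fits in a short script; this sketch reproduces the sim(1, m) row and the 2.6 prediction from the matrix above:

```python
import numpy as np

# Movies (rows 1-6) x users (columns 1-12), 0 = unknown rating.
R = np.array([
    [1, 0, 3, 0, 0, 5, 0, 0, 5, 0, 4, 0],
    [0, 0, 5, 4, 0, 0, 4, 0, 0, 2, 1, 3],
    [2, 4, 0, 1, 2, 0, 3, 0, 4, 3, 5, 0],
    [0, 2, 4, 0, 5, 0, 0, 4, 0, 0, 2, 0],
    [0, 0, 4, 3, 4, 2, 0, 0, 0, 0, 2, 5],
    [1, 0, 3, 0, 3, 0, 0, 2, 0, 0, 4, 0],
], dtype=float)

def center(row):
    """Subtract the row mean over rated entries only; unrated cells stay 0."""
    out = np.zeros_like(row)
    rated = row != 0
    out[rated] = row[rated] - row[rated].mean()
    return out

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Pearson similarity of movie 1 to every movie = cosine of mean-centered rows.
centered = np.array([center(r) for r in R])
sims = np.array([cosine(centered[0], centered[m]) for m in range(6)])
print(np.round(sims, 2))  # [ 1.   -0.18  0.41 -0.1  -0.31  0.59]

# Predict user 5's rating of movie 1 from the |N| = 2 most similar movies
# that user 5 has rated (movies 3 and 6, with ratings 2 and 3).
user = 4  # user 5, zero-based
N = sorted((m for m in range(1, 6) if R[m, user] != 0),
           key=lambda m: sims[m], reverse=True)[:2]
pred = sum(sims[m] * R[m, user] for m in N) / sum(sims[m] for m in N)
print(round(pred, 1))  # 2.6
```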

Slide41

Item-Item vs. User-User

In practice, item-item often works better than user-user
Why? Items are simpler; users have multiple tastes

Slide42

Simplified item-item for our homework

First, assume you've converted all the values to +1 (like), 0 (no rating), or -1 (dislike)

(Starting from the movies × users utility matrix of the item-item example above)

Slide43

Simplified item-item for our homework

First, assume you've converted all the values to +1 (like), 0 (no rating), or -1 (dislike):

          u1  u2  u3  u4  u5  u6  u7  u8  u9 u10 u11 u12
movie 1:  -1   .   1   .   .   1   .   .   1   .   1   .
movie 2:   .   .   1   1   .   .   1   .   .  -1  -1   1
movie 3:  -1   1   .  -1  -1   .   1   .   1   1   1   .
movie 4:   .  -1   1   .   1   .   .   1   .   .  -1   .
movie 5:   .   .   1   1   1  -1   .   .   .   .  -1   1
movie 6:  -1   .   1   .   1   .   .  -1   .   .   1   .

Slide44

Simplified item-item for our tiny PA6 dataset

Assume you've binarized, i.e., converted all the values to +1 (like), 0 (no rating), or -1 (dislike)
For this binary case, some tricks that the TAs recommend:
Don't mean-center users; just keep the raw +1, 0, -1
Don't normalize (i.e., don't divide the dot product by the sum)
i.e., instead of this: rxi = ( Σ_{j∈N(i;x)} sij · rxj ) / ( Σ_{j∈N(i;x)} sij )
Just do this: rxi = Σ_{j∈N(i;x)} sij · rxj
Don't use Pearson correlation to compute sij; just use cosine

sij … similarity of items i and j
rxj … rating of user x on item j
N(i;x) … set of items rated by x

Slide45

Simplified item-item for our tiny PA6 dataset

1. Binarize, i.e., convert all values to +1 (like), 0 (no rating), or -1 (dislike)
2. The user x gives you (say) ratings for 2 movies m1 and m2
3. For each movie i in the dataset, compute

   rxi = Σ_{j∈{m1,m2}} sij · rxj

   where sij is the cosine between the vectors for movies i and j,
   and rxj is the rating of user x on movie j
4. Recommend the movie i with the max rxi
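A compact sketch of this recipe. The ≥ 3 "like" threshold is an assumption for illustration (PA6 defines its own binarization), and user_ratings is the user's +1/0/-1 vector over all movies:

```python
import numpy as np

def binarize(R):
    """Map raw ratings to +1 (like), -1 (dislike), 0 (no rating)."""
    B = np.zeros_like(R, dtype=float)
    B[R >= 3] = 1.0                # "like" threshold: an assumption, not PA6's spec
    B[(R > 0) & (R < 3)] = -1.0
    return B

def recommend(B, user_ratings):
    """Simplified item-item: score movies by sum_j s_ij * r_xj, no denominator."""
    norms = np.linalg.norm(B, axis=1)
    norms[norms == 0] = 1.0                  # guard against all-zero rows
    S = (B @ B.T) / np.outer(norms, norms)   # cosine similarity between movie rows
    scores = S @ user_ratings
    scores[user_ratings != 0] = -np.inf      # don't recommend already-rated movies
    return int(np.argmax(scores))            # index of the best unseen movie
```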

Slide46

Pros/Cons of Collaborative Filtering

+ Works for any kind of item
No feature selection needed
– Cold start: need enough users in the system to find a match
– Sparsity: the user/ratings matrix is sparse
Hard to find users who have rated the same items
– First rater: cannot recommend an item that has not previously been rated
New items, esoteric items
– Popularity bias: cannot recommend items to someone with unique taste
Tends to recommend popular items

Slide47

Hybrid Methods

Implement two or more different recommenders and combine their predictions
Perhaps using a linear model
Add content-based methods to collaborative filtering:
Item profiles to deal with the new-item problem
Demographics to deal with the new-user problem

Slide48

Evaluation

[Movies × users utility matrix with known ratings]

Slide49

Evaluation

[The same matrix with some known ratings withheld and marked "?": these form the test data set]

Slide50

Evaluating Predictions

Compare predictions with known ratings

Root-mean-square error (RMSE):
RMSE = √( (1/|T|) · Σ_{(x,i)∈T} (r̂xi − rxi)² )
where r̂xi is the predicted rating, rxi is the true rating of x on i, and T is the set of test pairs

Rank correlation: Spearman's correlation between the system's and the user's complete rankings
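RMSE as a sketch over parallel arrays of predicted and true ratings for the held-out test pairs:

```python
import numpy as np

def rmse(predicted, actual):
    """Root-mean-square error over the held-out (user, item) test pairs."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

print(rmse([3.5, 4.0, 2.0], [4, 4, 1]))  # ~0.645
```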

Slide51

Problems with Error Measures

A narrow focus on accuracy sometimes misses the point:
Prediction diversity
Prediction context
Order of predictions
In practice, we only care about predicting high ratings:
RMSE might penalize a method that does well for high ratings and badly for others

Slide52

There’s No Data like Mo’ Data

Leverage all the data
Simple methods on large data do best
Add more data
e.g., add IMDB data on genres
More data beats better algorithms

Slide53

Famous Historical Example: The Netflix Prize

Training data:
100 million ratings, 480,000 users, 17,770 movies
6 years of data: 2000–2005
Test data:
Last few ratings of each user (2.8 million)
Evaluation criterion: root-mean-square error (RMSE)
Netflix Cinematch RMSE: 0.9514
A dumb baseline does really well: for user u and movie m, take the average of
(a) the average rating given by u on all rated movies, and
(b) the average of the ratings for movie m by all users who rated that movie
Competition:
2,700+ teams
$1 million prize for a 10% improvement on Cinematch
The BellKor system won in 2009. It combined many factors:
Overall deviations of users/movies
Regional effects
Local collaborative filtering patterns
Temporal biases
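That baseline is literally two averages; a sketch, assuming R is a users × movies matrix with 0 meaning unrated:

```python
import numpy as np

def baseline(R, u, m):
    """Average of user u's mean rating and movie m's mean rating."""
    user_mean = R[u][R[u] != 0].mean()          # average rating u gives
    movie_mean = R[:, m][R[:, m] != 0].mean()   # average rating m receives
    return (user_mean + movie_mean) / 2
```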

Slide54

Summary on Recommendation Systems

The Long Tail

Content-based Systems

Collaborative Filtering

Latent Factors