
CS 124/LINGUIST 180: From Languages to Information
Dan Jurafsky, Stanford University

Recommender Systems & Collaborative Filtering
Slides adapted from Jure Leskovec

Recommender Systems
- Customer X:
  - Buys a CD of Mozart
  - Buys a CD of Haydn
- Customer Y:
  - Does a search on Mozart
  - The recommender system suggests Haydn, from the data collected about customer X

Recommendations
- Examples: products, web sites, blogs, news items, …
[Diagram: a large pool of items reaches the user through two channels, search and recommendations]

From Scarcity to Abundance
- Shelf space is a scarce commodity for traditional retailers
  - Also: TV networks, movie theaters, …
- The web enables near-zero-cost dissemination of information about products
- From scarcity to abundance

The Long Tail
[Figure: the long-tail distribution of item popularity. Source: Chris Anderson (2004)]

More choice requires: Recommendation engines!

We all need to understand how they work!
- How Into Thin Air made Touching the Void a bestseller: http://www.wired.com/wired/archive/12.10/tail.html
- Societal problems in the news: https://www.theguardian.com/technology/2018/feb/02/how-youtubes-algorithm-distorts-truth

Types of Recommendations
- Editorial and hand curated: lists of favorites, lists of "essential" items
- Simple aggregates: Top 10, Most Popular, Recent Uploads
- Tailored to individual users: Amazon, Netflix, … (today's class)

Formal Model
- X = set of Customers
- S = set of Items
- Utility function u: X × S → R
  - R = set of ratings
  - R is a totally ordered set, e.g., 0-5 stars, or a real number in [0,1]

Utility Matrix
[Matrix of star ratings: rows = users Alice, Bob, Carol, David; columns = movies Avatar, LOTR, Matrix, Pirates]

Key Problems
(1) Gathering "known" ratings for the matrix
- How to collect the data in the utility matrix
(2) Extrapolating unknown ratings from known ones
- Mainly interested in high unknown ratings: we are not interested in knowing what you don't like, but what you like
(3) Evaluating extrapolation methods
- How to measure the success/performance of recommendation methods

(1) Gathering Ratings
- Explicit: ask people to rate items
  - Doesn't work well in practice, people can't be bothered
  - Crowdsourcing: pay people to label items
- Implicit: learn ratings from user actions
  - E.g., a purchase implies a high rating

(2) Extrapolating Utilities
- Key problem: the utility matrix U is sparse
  - Most people have not rated most items
  - Cold start: new items have no ratings; new users have no history
- Three approaches to recommender systems:
  - Content-based (this lecture!)
  - Collaborative filtering (this lecture!)
  - Latent factor based (CS246!)

Content-based Recommender Systems

Content-based Recommendations
- Main idea: recommend to customer x items similar to previous items rated highly by x
- Example: movie recommendations
  - Recommend movies with the same actor(s), director, genre, …
- Websites, blogs, news
  - Recommend other sites with "similar" content

Plan of Action
[Diagram: from the items the user likes, build item profiles; aggregate them into a user profile; match the user profile against item profiles to recommend new items]

Item Profiles
- For each item, create an item profile
- A profile is a set (vector) of features
  - Movies: author, genre, director, actors, year, …
  - Text: set of "important" words in the document
- How to pick important features?
  - TF-IDF (term frequency * inverse document frequency)
  - Term … feature; document … item
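As a concrete illustration of the TF-IDF recipe (a minimal sketch with made-up toy documents, not code from the course):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score each word in each document by TF * IDF."""
    n_docs = len(docs)
    # document frequency: how many documents contain each word
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({w: (c / len(doc)) * math.log(n_docs / df[w])
                       for w, c in tf.items()})
    return scores

docs = [["pirate", "ship", "treasure"], ["spy", "ship"], ["pirate", "spy"]]
print(tf_idf(docs)[0])  # "treasure" scores highest for the first doc
```

The item profile then keeps only the top-scoring words as features.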

Content-based Item Profiles
If everything is 1 or 0 (indicator features):

  Feature:   McCarthy  Depp  Actor A  Actor B  …  Pirate  Spy  Comic
  Movie X       0       1       1        0     1     1     0     1
  Movie Y       1       1       0        1     0     1     1     0

But what if we want to have real or ordinal features too?

Content-based Item Profiles
Maybe we want a scaling factor α between binary and numeric features:

  Feature:   McCarthy  Depp  Actor A  Actor B  …  Pirate  Spy  Comic  Avg Rating
  Movie X       0       1       1        0     1     1     0     1        3
  Movie Y       1       1       0        1     0     1     1     0        4

Content-based Item Profiles
Maybe we want a scaling factor α between binary and numeric features. Or maybe α = 1:

  Feature:   McCarthy  Depp  Actor A  Actor B  …  Pirate  Spy  Comic  Avg Rating
  Movie X       0       1       1        0     1     1     0     1        3α
  Movie Y       1       1       0        1     0     1     1     0        4α

Cosine(Movie X, Movie Y) = (x · y) / (‖x‖ ‖y‖) = (2 + 12α²) / (√(5 + 9α²) · √(5 + 16α²)) ≈ 0.82 for α = 1
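A small sketch of this computation, with the α scaling applied to the average-rating component; the indicator vectors are the Movie X / Movie Y rows from the table, and the function name is mine:

```python
import numpy as np

def scaled_cosine(x_bin, y_bin, x_rating, y_rating, alpha=1.0):
    """Cosine between item profiles whose last component is a numeric
    average rating, scaled by alpha relative to the binary features."""
    x = np.append(x_bin, alpha * x_rating)
    y = np.append(y_bin, alpha * y_rating)
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

movie_x = np.array([0, 1, 1, 0, 1, 1, 0, 1])  # actor/genre indicators
movie_y = np.array([1, 1, 0, 1, 0, 1, 1, 0])
print(scaled_cosine(movie_x, movie_y, 3, 4, alpha=1.0))  # ~0.82
print(scaled_cosine(movie_x, movie_y, 3, 4, alpha=0.1))  # ratings matter less
```

Larger α makes the numeric rating dominate the similarity; α is a knob you have to choose.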

User Profiles
- Want a vector with the same components/dimensions as items
  - Could be 1s representing user purchases
  - Or arbitrary numbers from a rating
- The user profile is an aggregate of the item profiles: the (weighted?) average of the rated item profiles

Sample user profile
- Items are movies; the utility matrix has a 1 if the user has seen the movie
- 20% of the movies user U has seen have Melissa McCarthy
- So U["Melissa McCarthy"] = 0.2

  Feature:   Melissa McCarthy  Actor A  Actor B  …
  User U          0.2           .005       0     0
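A minimal sketch of building such a profile, assuming item profiles are rows of a 0/1 feature matrix (names are mine):

```python
import numpy as np

def user_profile(item_profiles, seen):
    """Average the profiles of the items the user has seen.
    item_profiles: (n_items, n_features) 0/1 matrix; seen: boolean mask."""
    return item_profiles[seen].mean(axis=0)

# Toy check: first feature = "Melissa McCarthy", appearing in 1 of the
# 5 movies the user has seen, so that component comes out 0.2.
items = np.array([[1, 0],
                  [0, 1],
                  [0, 0],
                  [0, 1],
                  [0, 0]])
seen = np.array([True, True, True, True, True])
print(user_profile(items, seen))  # [0.2 0.4]
```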

Prediction
- Users and items have the same dimensions!
- So just recommend the items whose vectors are most similar to the user vector!
- Given user profile x and item profile i, estimate u(x, i) = cos(x, i) = (x · i) / (‖x‖ ‖i‖)

  Feature:   Melissa McCarthy  Actor A  Actor B  …
  User U          0.2           .005       0     0
  Movie X          0              1        1     0  …
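A sketch of the resulting recommender; the small epsilon guarding against all-zero vectors is my addition:

```python
import numpy as np

def top_k(user_vec, item_profiles, k=5):
    """Score every item by u(x, i) = cos(x, i) and return the indices
    of the k items most similar to the user profile."""
    norms = (np.linalg.norm(item_profiles, axis=1)
             * np.linalg.norm(user_vec)) + 1e-12  # avoid divide-by-zero
    scores = item_profiles @ user_vec / norms
    return np.argsort(-scores)[:k]
```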

Pros: Content-based Approach
+ No need for data on other users: no cold-start or sparsity problems
+ Able to recommend to users with unique tastes
+ Able to recommend new & unpopular items: no first-rater problem
+ Able to provide explanations: just list the content features that caused an item to be recommended

Cons: Content-based Approach
- Finding the appropriate features is hard (e.g., images, movies, music)
- Recommendations for new users: how to build a user profile?
- Overspecialization
  - Never recommends items outside the user's content profile
  - People might have multiple interests
  - Unable to exploit the quality judgments of other users

Collaborative Filtering
Harnessing the quality judgments of other users

Collaborative Filtering
Version 1: "User-User" Collaborative Filtering
- Consider user x
- Find a set N of other users whose ratings are "similar" to x's ratings
- Estimate x's ratings based on the ratings of the users in N

Finding Similar Users
- Let rx be the vector of user x's ratings
- Cosine similarity measure: sim(x, y) = cos(rx, ry) = (rx · ry) / (‖rx‖ ‖ry‖)
- Problem: treats missing ratings as "negative." What do I mean?

  rx = [*, _, _, *, ***]  →  as a point: rx = {1, 0, 0, 1, 3}
  ry = [*, _, **, **, _]  →  as a point: ry = {1, 0, 2, 2, 0}

Utility Matrix
- Intuitively we want: sim(A, B) > sim(A, C)
- Cosine similarity: yes, 0.386 > 0.322
- But it only barely works: it considers missing ratings as "negative"

Utility Matrix
- Problem with cosine: a 0 acts like a negative review
  - C really loves SW; A hates SW; B just hasn't seen it
- Another problem: we'd like to normalize for raters
  - D rated everything the same; not very useful

Modified Utility Matrix: subtract the mean of each row
- Now a 0 means no information
- And negative ratings mean that viewers with opposite ratings will have vectors pointing in opposite directions!

Modified Utility Matrix: subtract the mean of each row
- Recompute Cos(A, B) and Cos(A, C) on the mean-centered rows
- Now A and C are (correctly) much further apart than A and B

Fun fact: cosine after subtracting the mean turns out to be the same as the Pearson correlation coefficient!
- Cosine similarity is correlation when the data is centered at 0
- Terminological note: subtracting the mean is zero-centering, not normalizing (normalizing means dividing by a norm, e.g., to turn values into a probability distribution), but the textbook (and common usage) sometimes overloads the term "normalize"
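A quick numeric check of the fun fact on made-up ratings: the zero-centered cosine and numpy's Pearson correlation agree.

```python
import numpy as np

def centered_cosine(rx, ry):
    """Zero-center each vector, then take the cosine: this is exactly
    the Pearson correlation of rx and ry."""
    x, y = rx - rx.mean(), ry - ry.mean()
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

a = np.array([4.0, 5.0, 1.0, 3.0])
b = np.array([5.0, 4.0, 2.0, 3.0])
print(centered_cosine(a, b))    # 0.8316...
print(np.corrcoef(a, b)[0, 1])  # identical
```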

Finding Similar Users
- Let rx be the vector of user x's ratings
- Cosine similarity measure: sim(x, y) = cos(rx, ry) = (rx · ry) / (‖rx‖ ‖ry‖)
  - Problem: treats missing ratings as "negative"
- Pearson correlation coefficient:
  - Sxy = items rated by both users x and y
  - r̄_x, r̄_y = average ratings of x and y
  - sim(x, y) = [ Σ_{s ∈ Sxy} (r_xs − r̄_x)(r_ys − r̄_y) ] / [ √(Σ_{s ∈ Sxy} (r_xs − r̄_x)²) · √(Σ_{s ∈ Sxy} (r_ys − r̄_y)²) ]

  rx = [*, _, _, *, ***]  →  as a point: rx = {1, 0, 0, 1, 3}
  ry = [*, _, **, **, _]  →  as a point: ry = {1, 0, 2, 2, 0}

Rating Predictions
From a similarity metric to recommendations:
- Let rx be the vector of user x's ratings
- Let N be the set of the k users most similar to x who have rated item i
- Prediction for item i of user x:
  - Rate i as the mean of what the k people like me rated i:
      r_xi = (1/k) Σ_{y ∈ N} r_yi
  - Even better: rate i as the mean weighted by their similarity to me:
      r_xi = Σ_{y ∈ N} s_xy · r_yi / Σ_{y ∈ N} s_xy
    (shorthand: s_xy = sim(x, y))
  - Many other tricks possible…
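A sketch of the similarity-weighted prediction, assuming a ratings matrix with np.nan for missing entries and a precomputed user-user similarity matrix (names are mine):

```python
import numpy as np

def predict_rating(R, sims, x, i, k=5):
    """Similarity-weighted average of item i's ratings from the k users
    most similar to x. R: (n_users, n_items), np.nan = missing;
    sims: (n_users, n_users) user-user similarities.
    Assumes the top-k similarities are positive."""
    raters = np.where(~np.isnan(R[:, i]))[0]
    raters = raters[raters != x]
    top = raters[np.argsort(-sims[x, raters])][:k]  # k nearest raters of i
    w = sims[x, top]
    return w @ R[top, i] / w.sum()
```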

Collaborative Filtering Version 2: Item-Item Collaborative Filtering
- So far: user-user collaborative filtering
- An alternate view that often works better: item-item
  - For item i, find other similar items
  - Estimate the rating for item i based on the ratings for similar items
  - Can use the same similarity metrics and prediction functions as in the user-user model
- "Rate i as the mean of my ratings for other items, weighted by their similarity to i":
    r_xi = Σ_{j ∈ N(i;x)} s_ij · r_xj / Σ_{j ∈ N(i;x)} s_ij
  - s_ij … similarity of items i and j
  - r_xj … rating of user x on item j
  - N(i;x) … set of items rated by x similar to i

Item-Item CF (|N|=2)
The utility matrix (rows = movies 1-6, columns = users 1-12; . = unknown rating, numbers = ratings between 1 and 5):

  user:    1  2  3  4  5  6  7  8  9 10 11 12
  movie 1: 1  .  3  .  .  5  .  .  5  .  4  .
  movie 2: .  .  5  4  .  .  4  .  .  2  1  3
  movie 3: 2  4  .  1  2  .  3  .  4  3  5  .
  movie 4: .  2  4  .  5  .  .  4  .  .  2  .
  movie 5: .  .  4  3  4  2  .  .  .  .  2  5
  movie 6: 1  .  3  .  3  .  .  2  .  .  4  .

Item-Item CF (|N|=2)
Estimate the rating of movie 1 by user 5: the (movie 1, user 5) cell of the matrix above, marked "?".

Item-Item CF (|N|=2)
Neighbor selection: identify movies similar to movie 1 that were rated by user 5.
Here we use Pearson correlation as the similarity:
1) Subtract the mean rating m_i from each movie i
     m_1 = (1+3+5+5+4)/5 = 3.6
     row 1: [-2.6, 0, -0.6, 0, 0, 1.4, 0, 0, 1.4, 0, 0.4, 0]
2) Compute cosine similarities between rows
     sim(1, m) for m = 1…6: 1.00, -0.18, 0.41, -0.10, -0.31, 0.59

Item-Item CF (|N|=2)
Compute the similarity weights for the two most similar movies that user 5 has rated: s_1,3 = 0.41, s_1,6 = 0.59

Item-Item CF (|N|=2)
Predict by taking the weighted average:
  r_1,5 = (0.41·2 + 0.59·3) / (0.41 + 0.59) = 2.6
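The whole worked example fits in a few lines of numpy; this sketch (not course code) reproduces the similarities and the 2.6 prediction from the slides above:

```python
import numpy as np

n = np.nan
# Rows = movies 1-6, columns = users 1-12 (n = unknown rating)
R = np.array([[1, n, 3, n, n, 5, n, n, 5, n, 4, n],
              [n, n, 5, 4, n, n, 4, n, n, 2, 1, 3],
              [2, 4, n, 1, 2, n, 3, n, 4, 3, 5, n],
              [n, 2, 4, n, 5, n, n, 4, n, n, 2, n],
              [n, n, 4, 3, 4, 2, n, n, n, n, 2, 5],
              [1, n, 3, n, 3, n, n, 2, n, n, 4, n]])

def centered(row):
    """Subtract the movie's mean over rated entries; unrated -> 0."""
    return np.nan_to_num(row - np.nanmean(row))

def sim(ri, rj):  # Pearson = cosine of the mean-centered rows
    a, b = centered(ri), centered(rj)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

sims = np.array([sim(R[0], R[m]) for m in range(6)])
print(np.round(sims, 2))  # [ 1.  -0.18  0.41 -0.1  -0.31  0.59]

user = 4                                            # user 5, 0-indexed
rated = [m for m in range(1, 6) if not np.isnan(R[m, user])]
top2 = sorted(rated, key=lambda m: -sims[m])[:2]    # movies 6 and 3
w, r = sims[top2], R[top2, user]
print(round(w @ r / w.sum(), 1))                    # 2.6
```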

Item-Item vs. User-User
- In practice, item-item often works better than user-user
- Why? Items are simpler; users have multiple tastes (people are more complex than objects)

Simplified item-item for our homework
Start from the raw 1-5 utility matrix above. First, assume you've converted all the values to +1 (like), 0 (no rating), or -1 (dislike).

Simplified item-item for our homework
After converting all the values to +1 (like), 0 (no rating), or -1 (dislike); here ratings of 1-2 became -1 and ratings of 3-5 became +1:

  user:     1  2  3  4  5  6  7  8  9 10 11 12
  movie 1: -1  0  1  0  0  1  0  0  1  0  1  0
  movie 2:  0  0  1  1  0  0  1  0  0 -1 -1  1
  movie 3: -1  1  0 -1 -1  0  1  0  1  1  1  0
  movie 4:  0 -1  1  0  1  0  0  1  0  0 -1  0
  movie 5:  0  0  1  1  1 -1  0  0  0  0 -1  1
  movie 6: -1  0  1  0  1  0  0 -1  0  0  1  0

Simplified item-item for our tiny PA6 dataset
Assume you've binarized, i.e., converted all the values to +1 (like), 0 (no rating), or -1 (dislike).
For this binary case, some tricks that the TAs recommend:
- Don't mean-center users; just keep the raw +1, 0, -1
- Don't normalize (i.e., don't divide the weighted sum by the sum of the similarities), i.e., instead of this:
    r_xi = Σ_{j ∈ N(i;x)} s_ij · r_xj / Σ_{j ∈ N(i;x)} s_ij
  just do this:
    r_xi = Σ_{j ∈ N(i;x)} s_ij · r_xj
- Don't use Pearson correlation to compute s_ij; just use cosine
  - s_ij … similarity of items i and j
  - r_xj … rating of user x on item j
  - N(i;x) … set of items rated by x

Simplified item-item for our tiny PA6 dataset
1. Binarize, i.e., convert all values to +1 (like), 0 (no rating), or -1 (dislike)
2. The user x gives you (say) ratings for 2 movies m1 and m2
3. For each movie i in the dataset, compute
     r_xi = Σ_j s_ij · r_xj
   where s_ij is the cosine between the vectors for movies i and j, and r_xj is the rating of user x on movie j
4. Recommend the movie i with max r_xi
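A sketch of this recipe; the like/dislike threshold of 3 and all names here are assumptions for illustration, not the actual PA6 starter-code API:

```python
import numpy as np

def binarize(R):
    """Map raw ratings to +1 (like), -1 (dislike), 0 (no rating).
    The threshold of 3 is an assumption; use whatever PA6 specifies."""
    B = np.where(R >= 3, 1.0, -1.0)
    B[np.isnan(R)] = 0.0
    return B

def cosine(u, v):
    d = np.linalg.norm(u) * np.linalg.norm(v)
    return u @ v / d if d > 0 else 0.0

def recommend(B, user_ratings):
    """user_ratings: {movie_index: +1 or -1}. Score each unseen movie
    with the un-normalized sum r_xi = sum_j s_ij * r_xj."""
    scores = {i: sum(cosine(B[i], B[j]) * r
                     for j, r in user_ratings.items())
              for i in range(len(B)) if i not in user_ratings}
    return max(scores, key=scores.get)
```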

Pros/Cons of Collaborative Filtering
+ Works for any kind of item: no feature selection needed
- Cold start: need enough users in the system to find a match
- Sparsity: the user/ratings matrix is sparse; hard to find users who have rated the same items
- First rater: cannot recommend an item that has not previously been rated (new items, esoteric items)
- Popularity bias: cannot recommend items to someone with unique taste; tends to recommend popular items

Hybrid Methods
- Implement two or more different recommenders and combine their predictions
  - Perhaps using a linear model, as in the sketch below
- Add content-based methods to collaborative filtering
  - Item profiles to deal with the new-item problem
  - Demographics to deal with the new-user problem
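The linear blend can be as simple as this sketch, where the component recommenders and weights are placeholders:

```python
def hybrid_predict(user, item, recommenders, weights):
    """Linear blend of several recommenders' predicted ratings.
    recommenders: list of (user, item) -> predicted-rating functions;
    weights could be fit by regression on held-out ratings."""
    return sum(w * rec(user, item) for w, rec in zip(weights, recommenders))
```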

Evaluation
[A utility matrix of known ratings: rows = users, columns = movies]

Evaluation
[The same matrix with some known ratings withheld, shown as "?": the withheld entries form the test data set]

Evaluating Predictions
- Compare predictions with known ratings
- Root-mean-square error (RMSE):
    RMSE = √( (1/|T|) Σ_{(x,i) ∈ T} (r̂_xi − r_xi)² )
  where T is the test set, r̂_xi is the predicted rating, and r_xi is the true rating of x on i
- Rank correlation: Spearman's correlation between the system's and the user's complete rankings
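RMSE in a few lines of numpy, on made-up test ratings:

```python
import numpy as np

def rmse(predicted, true):
    """Root-mean-square error over the held-out test ratings."""
    predicted, true = np.asarray(predicted), np.asarray(true)
    return np.sqrt(np.mean((predicted - true) ** 2))

print(rmse([3.5, 2.0, 4.8], [4, 2, 5]))  # ~0.31
```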

Problems with Error Measures
- A narrow focus on accuracy sometimes misses the point
  - Prediction diversity
  - Prediction context
- In practice, we care only about predicting high ratings:
  - RMSE might penalize a method that does well for high ratings and badly for others

There's No Data like More Data
- Leverage all the data
- Simple methods on large data do best
- Add more data: e.g., add IMDB data on genres
- More data beats better algorithms

Famous Historical Example: The Netflix Prize
- Training data: 100 million ratings, 480,000 users, 17,770 movies; 6 years of data (2000-2005)
- Test data: the last few ratings of each user (2.8 million)
- Evaluation criterion: root-mean-square error (RMSE)
  - Netflix's Cinematch RMSE: 0.9514
- A dumb baseline does really well: for user u and movie m, take the average of
  - the average rating given by u on all rated movies, and
  - the average of the ratings for movie m by all users who rated that movie
  (see the sketch below)
- Competition: 2700+ teams; $1 million prize for a 10% improvement on Cinematch
- The BellKor system won in 2009; it combined many factors:
  - Overall deviations of users/movies
  - Regional effects
  - Local collaborative filtering patterns
  - Temporal biases
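The dumb baseline as a one-line sketch, assuming a users × movies array with np.nan for missing ratings (not Netflix's actual code, of course):

```python
import numpy as np

def dumb_baseline(R, u, m):
    """Average of user u's mean rating and movie m's mean rating."""
    return (np.nanmean(R[u, :]) + np.nanmean(R[:, m])) / 2
```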

Summary on Recommendation Systems
- The Long Tail
- Content-based systems
- Collaborative filtering

State of the Art in Collaborative Filtering
- Dimensionality reduction (SVD) techniques give the best performance
- But it's not clear whether they are necessary at YouTube scale…

Open problem: ethical and societal implications
- Filter bubbles:
"I realized really fast that YouTube's recommendation was putting people into filter bubbles," Chaslot said. "There was no way out. If a person was into Flat Earth conspiracies, it was bad for watch-time to recommend anti-Flat Earth videos, so it won't even recommend them."

YouTube announcement, 3 weeks ago:
"We'll begin reducing recommendations of borderline content and content that could misinform users in harmful ways—such as videos promoting a phony miracle cure for a serious illness, claiming the earth is flat, or making blatantly false claims about historic events like 9/11."