Recommender Systems and Collaborative Filtering

PowerPoint presentation, uploaded by eliza on 2023-10-29.

Presentation Transcript

1. Recommender Systems and Collaborative Filtering
Introduction to Recommender Systems

2. Recommender systems: The task
Customer W plays an Ella Fitzgerald song. What should we recommend next?
(Photo: Thomas Quella, Wikimedia Commons. Slides adapted from Jure Leskovec.)

3. Recommendations
Items: products, web sites, blogs, news items, …
Two ways users find items: search and recommendations.
Slides adapted from Jure Leskovec, CS246, and J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets.

4. Types of Recommendations
- Editorial and hand curated: lists of favorites, lists of "essential" items
- Simple aggregates: Top 10, Most Popular, Recent Uploads
- Tailored to individual users: Amazon, Netflix, Apple Music, … (today's class)

5. Why study this? Knowing how personalized recommendations work is relevant for building practical news or product recommenders.

6. Relevant for understanding how misinformation spreads
Headlines:
- "QAnon Supporters And Anti-Vaxxers Are Spreading a Hoax That Bill Gates Created the Coronavirus. It has no basis in reality, but that hasn't slowed its spread across Facebook and Twitter."
- The Guardian: "Las Vegas survivors furious as YouTube promotes clips calling shooting a hoax"
- "'Fiction is outperforming reality': how YouTube's algorithm distorts truth"
- "How YouTube Drives People to the Internet's Darkest Corners: Google's video site often recommends divisive or misleading material, despite recent changes designed to fix the problems"

7. Formal Model
- X = set of Users
- S = set of Items
- Utility function u: X × S → R
- R = set of ratings; R is a totally ordered set (e.g., 1-5 stars, or a real number in [0, 1])

8. Utility Matrix
Rows are users (Anita, Beyonce, Calvin, David); columns are items (Harry Potter, Twilight, Star Wars). Most cells are empty: most user-item ratings are unknown.

9. Key Problems
1) Gathering "known" ratings for the matrix: how to collect the data in the utility matrix
2) Extrapolating unknown ratings from known ones: we are mainly interested in high unknown ratings, i.e., not in what you don't like but in what you like
3) Evaluating extrapolation methods: how to measure the performance of recommendation methods

10. (1) Gathering Ratings
- Explicit: ask people to rate items. Doesn't work well in practice; people can't be bothered. Crowdsourcing: pay people to label items.
- Implicit: learn ratings from user actions. E.g., a purchase (or watching a video, or reading an article) implies a high rating.

11. (2) Extrapolating Utilities
Key problem: the utility matrix U is sparse; most people have not rated most items.
The "Cold Start" problem: new items have no ratings, and new users have no history.

12. (2) Extrapolating Utilities
Three approaches to recommender systems:
- Content-based (this lecture!)
- Collaborative filtering (this lecture!)
- Latent factor (neural embedding) based

13. Content-based vs. Collaborative Filtering
Customer W plays Ella Fitzgerald. What should we recommend next? Suggest Louis Armstrong.
- Content-based: use a database of item features. Ella Fitzgerald: Jazz, mid-20th century, vocal legend, famous duets, … Louis Armstrong: Jazz, mid-20th century, vocal legend, famous duets, … The features match, so suggest Louis Armstrong.
- Collaborative filtering: Customer D plays both Ella Fitzgerald and Louis Armstrong; Customer W plays Ella Fitzgerald, so suggest what the similar customer D also plays: Louis Armstrong.
(Photos: Thomas Quella, Wikimedia Commons; Paul Stafford for www.travelmag.com, https://www.flickr.com/photos/113306963@N05/33886542421)

14. Recommender Systems and Collaborative Filtering
Introduction to Recommender Systems

15. Recommender Systems and Collaborative Filtering
Content-based Recommender Systems

16. Content-based Recommendations
Main idea: recommend to customer x items similar to previous items rated highly by x.
- Movie recommendations: recommend movies with the same actor(s), director, genre, …
- Websites, blogs, news: recommend other sites with similar types of words

17. Plan of Action
(Diagram: from the items a user likes, build item profiles; aggregate them into a user profile; match the user profile against item profiles to recommend new items.)

18. Item Profiles
- For each item, create an item profile: a set (vector) of features
- Movies: genre, director, actors, year, …
- Text: the set of "important" words in the document
- How to pick important features? TF-IDF (term frequency * inverse document frequency). For example, use all words whose tf-idf exceeds a threshold, normalized for document length.
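The TF-IDF selection step above can be sketched as follows. This is a minimal sketch over a hypothetical toy corpus (the word lists and the 0.05 threshold are made up); a real system would use a proper tokenizer and a tuned threshold.

```python
import math

def tf_idf(docs):
    """tf-idf per word per document: tf = count / doc length,
    idf = log10(N / df), where df = number of docs containing the word."""
    n = len(docs)
    df = {}
    for doc in docs:
        for w in set(doc):
            df[w] = df.get(w, 0) + 1
    scores = []
    for doc in docs:
        tf = {w: doc.count(w) / len(doc) for w in set(doc)}
        scores.append({w: tf[w] * math.log10(n / df[w]) for w in tf})
    return scores

# Toy corpus: a word appearing in every document gets idf = 0,
# so any positive tf-idf threshold drops it from the profile.
docs = [["jazz", "vocal", "jazz", "song"],
        ["jazz", "trumpet", "song"],
        ["rock", "guitar", "song"]]
profiles = [{w: s for w, s in d.items() if s > 0.05} for d in tf_idf(docs)]
```

Note how "song", which occurs in all three documents, is excluded from every profile, while rarer, more discriminative words survive.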

19. Content-based Item Profiles
But what if we want to have real or ordinal features too?
           McCarthy  Depp  ActorA  ActorB  ...  Pirate  Spy  Comic
Movie X       0       1      1       0      1     1      0     1
Movie Y       1       1      0       1      0     1      1     0

20. Content-based Item Profiles
For example, "average rating". Maybe we want a scaling factor α between binary and numeric features.
           McCarthy  Depp  ActorA  ActorB  ...  Pirate  Spy  Comic  AvgRating
Movie X       0       1      1       0      1     1      0     1        3
Movie Y       1       1      0       1      0     1      1     0        4

21. Content-based Item Profiles
Scaling factor α between binary and numeric features:
           McCarthy  Depp  ActorA  ActorB  ...  Pirate  Spy  Comic  AvgRating
Movie X       0       1      1       0      1     1      0     1        3α
Movie Y       1       1      0       1      0     1      1     0        4α
Cosine(Movie X, Movie Y): α = 1: 0.82; α = 2: 0.94; α = 0.5: 0.69
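The effect of α can be checked numerically. The sketch below assumes the binary feature vectors shown in the table (column order is a reconstruction from the slide); the α = 1 and α = 2 cosines come out to the slide's 0.82 and 0.94, and larger α gives the numeric feature more weight.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def with_rating(binary_feats, avg_rating, alpha):
    # Append the numeric feature, scaled by alpha, to the binary features.
    return binary_feats + [alpha * avg_rating]

x_bin = [0, 1, 1, 0, 1, 1, 0, 1]   # Movie X binary features (assumed order)
y_bin = [1, 1, 0, 1, 0, 1, 1, 0]   # Movie Y binary features (assumed order)
sim_a1 = cosine(with_rating(x_bin, 3, 1), with_rating(y_bin, 4, 1))
sim_a2 = cosine(with_rating(x_bin, 3, 2), with_rating(y_bin, 4, 2))
```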

22. User Profiles
- We want a vector with the same components/dimensions as the items
- Entries could be 1s representing user purchases, or arbitrary numbers from a rating
- The user profile is an aggregate of item profiles: e.g., a weighted average of the rated item profiles

23. Sample user profile
- Items are movies; the utility matrix has a 1 if the user has seen the movie
- 20% of the movies user U has seen feature Melissa McCarthy, so U["Melissa McCarthy"] = 0.2
User U:  McCarthy = 0.2, ActorA = 0.005, ActorB = 0, …
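Building that profile is just averaging the item vectors of the movies the user has seen. A sketch with hypothetical two-feature item profiles, where feature 0 is "Melissa McCarthy":

```python
def user_profile(seen, item_profiles):
    """Average the item-profile vectors of the movies the user has seen."""
    dims = len(next(iter(item_profiles.values())))
    prof = [0.0] * dims
    for m in seen:
        for d in range(dims):
            prof[d] += item_profiles[m][d]
    return [v / len(seen) for v in prof]

# User U has seen 5 movies; exactly one features Melissa McCarthy
# (feature 0), so the profile gets 1/5 = 0.2 in that dimension.
items = {"m1": [1, 0], "m2": [0, 1], "m3": [0, 0], "m4": [0, 1], "m5": [0, 0]}
u_profile = user_profile(["m1", "m2", "m3", "m4", "m5"], items)
```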

24. Prediction
Users and items have the same dimensions! So just recommend the items whose vectors are most similar to the user vector.
Given user profile x and item profile i, estimate u(x, i) = cos(x, i) = (x · i) / (‖x‖ ‖i‖)
E.g., user x = (0.2, 0.005, 0, 0, …) against movie i = (0, 1, 1, 0, …)
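The prediction step then amounts to ranking items by cosine similarity against the user vector. A sketch with made-up item names and profiles:

```python
import math

def cos(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def recommend(user_vec, item_profiles, k=1):
    """Return the k items whose profiles are most similar to the user vector."""
    ranked = sorted(item_profiles, key=lambda m: cos(user_vec, item_profiles[m]),
                    reverse=True)
    return ranked[:k]

user = [0.9, 0.1]    # this user leans heavily toward feature 0
items = {"mccarthy_movie": [1, 0], "other_movie": [0, 1]}
top = recommend(user, items)
```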

25. Pros: Content-based Approach
+ No need for data on other users: no user-sparsity problems
+ Able to recommend to users with unique tastes
+ Able to recommend new & unpopular items: no first-rater problem
+ Able to provide explanations: just list the content features that caused an item to be recommended

26. Cons: Content-based Approach
– Finding the appropriate features is hard (e.g., for images, movies, music)
– Recommendations for new users: how do we build a user profile?
– Overspecialization: never recommends items outside the user's content profile; people might have multiple interests; unable to exploit the quality judgments of other users

27. Recommender Systems and Collaborative Filtering
Content-based Recommender Systems

28. Recommender Systems and Collaborative Filtering
Collaborative Filtering: User-User

29. Collaborative filtering
Instead of using the content features of items to determine what to recommend, find similar users and recommend items that they like!

30. Collaborative Filtering
Version 1: "User-User" Collaborative Filtering
- Consider user x and an unrated item i
- Find a set N of other users whose ratings are "similar" to x's ratings
- Estimate x's rating for i based on the ratings for i of the users in N

31. Collaborative filtering
Find similar users and recommend items that they like:
- Represent each user by their row in the utility matrix (ratings for Harry Potter, Twilight, Star Wars, …)
- Two users are similar if their vectors are similar!

32. Finding Similar Users
Let rx be the vector of user x's ratings.
Cosine similarity measure: sim(x, y) = cos(rx, ry) = (rx · ry) / (‖rx‖ ‖ry‖)
Example (stars; _ = no rating, treated as 0):
rx = [*, _, _, *, ***] → rx = {1, 0, 0, 1, 3}
ry = [*, _, **, **, _] → ry = {1, 0, 2, 2, 0}
Problem: this representation leads to unintuitive results.
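As a quick numeric check of that claim, here is the raw cosine on the slide's example, with missing ratings filled in as 0:

```python
import math

def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (
        math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

rx = [1, 0, 0, 1, 3]   # [*, _, _, *, ***] with _ treated as 0
ry = [1, 0, 2, 2, 0]   # [*, _, **, **, _] with _ treated as 0
raw_sim = cosine(rx, ry)
# The 0s behave like strong negative opinions: "never rated" and
# "rated very low" are indistinguishable in this representation.
```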

33. Problems with raw utility matrix cosine
Intuitively we want sim(A, B) > sim(A, C) (rows A, B, C of the Harry Potter / Twilight / Star Wars utility matrix):
sim(A, B) = 0.380
sim(A, C) = 0.322
Yes, 0.380 > 0.322, but it only barely works…

34. Problem with raw cosine
Problem with cosine: C really loves SW, A hates SW, but B just hasn't seen it; treating missing ratings as 0 makes B look like another SW-hater.
Another problem: we'd like to normalize the raters. D rated everything the same, which is not very useful.

35. Mean-Centered Utility Matrix: subtract the mean of each row
Now a 0 means no information, and negative ratings mean that viewers with opposite ratings will have vectors in opposite directions!

36. Modified Utility Matrix: subtract the mean of each row
Cos(A, B) and Cos(A, C) on the mean-centered rows: now A and C are (correctly) far further apart than A and B.

37. Terminological Note
Subtracting the mean is mean-centering, not normalizing (normalizing is dividing by a norm, e.g., to turn something into a probability), but the textbook (and common usage) sometimes overloads the term "normalize".

38. Finding similar users with overlapping-item mean-centering
Let rx be the vector of user x's ratings:
rx = [*, _, _, *, ***] → rx = {1, 0, 0, 1, 3}
ry = [*, _, **, **, _] → ry = {1, 0, 2, 2, 0}
Mean-centering: for each user x, let r̄x be the mean of rx (ignoring missing values):
r̄x = (1 + 1 + 3)/3 = 5/3
r̄y = (1 + 2 + 2)/3 = 5/3
Subtract this average from each of their ratings (but do nothing to the "missing values"; they stay "null"):
mean-centered rx = {-2/3, 0, 0, -2/3, 4/3}
mean-centered ry = {-2/3, 0, 1/3, 1/3, 0}
One new idea: keep only the items they both rated (unlike two slides ago):
rx = {-2/3, -2/3}, ry = {-2/3, 1/3}
Now compute the cosine between the user vectors: cos([-2/3, -2/3], [-2/3, 1/3])

39. Mean-centered overlapping-item cosine similarity
Let rx be the vector of user x's ratings, and r̄x be its mean (ignoring missing values).
Instead of the basic cosine similarity measure sim(x, y) = cos(rx, ry) = (rx · ry) / (‖rx‖ ‖ry‖),
use the mean-centered overlapping-item cosine similarity, where Sxy = the set of items rated by both users x and y:
sim(x, y) = Σ_{s ∈ Sxy} (r_xs − r̄x)(r_ys − r̄y) / ( sqrt(Σ_{s ∈ Sxy} (r_xs − r̄x)²) · sqrt(Σ_{s ∈ Sxy} (r_ys − r̄y)²) )
(A variant of the Pearson correlation.)
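This similarity can be implemented directly, using None to mark a missing rating. On the running example from the previous slide it gives cos([-2/3, -2/3], [-2/3, 1/3]) = 1/√10 ≈ 0.316:

```python
import math

def sim(rx, ry):
    """Mean-centered overlapping-item cosine similarity (None = missing)."""
    def mean(r):
        rated = [v for v in r if v is not None]
        return sum(rated) / len(rated)
    mx, my = mean(rx), mean(ry)
    # Keep only the items rated by BOTH users, then subtract each user's mean.
    pairs = [(x - mx, y - my) for x, y in zip(rx, ry)
             if x is not None and y is not None]
    num = sum(a * b for a, b in pairs)
    den = (math.sqrt(sum(a * a for a, _ in pairs)) *
           math.sqrt(sum(b * b for _, b in pairs)))
    return num / den if den else 0.0

rx = [1, None, None, 1, 3]   # [*, _, _, *, ***]
ry = [1, None, 2, 2, None]   # [*, _, **, **, _]
s = sim(rx, ry)              # cos([-2/3, -2/3], [-2/3, 1/3])
```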

40. Rating Predictions
From a similarity metric to recommendations for an unrated item i:
- Let rx be the vector of user x's ratings
- Let N be the set of the k users most similar to x who have rated item i
Prediction for user x on item i (shorthand: s_xy = sim(x, y)):
- Rate i as the mean of what the k people like me rated i: r_xi = (1/k) Σ_{y ∈ N} r_yi
- Even better: rate i as the mean weighted by their similarity to me: r_xi = Σ_{y ∈ N} s_xy · r_yi / Σ_{y ∈ N} s_xy
- Many other tricks possible…
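The similarity-weighted prediction can be sketched as follows. The user names and similarity values are hypothetical; in practice the similarities would come from the metric on the previous slide.

```python
def predict(x, item, ratings, sims, k=2):
    """r_xi = sum_y s_xy * r_yi / sum_y s_xy, over the k raters of `item`
    most similar to user x."""
    raters = [y for y in ratings if y != x and item in ratings[y]]
    top = sorted(raters, key=lambda y: sims[(x, y)], reverse=True)[:k]
    num = sum(sims[(x, y)] * ratings[y][item] for y in top)
    den = sum(sims[(x, y)] for y in top)
    return num / den if den else None

# Hypothetical precomputed similarities of user "x" to three neighbors.
sims = {("x", "a"): 0.9, ("x", "b"): 0.5, ("x", "c"): 0.1}
ratings = {"a": {"i": 4}, "b": {"i": 2}, "c": {"i": 5}}
pred = predict("x", "i", ratings, sims, k=2)   # uses raters a and b only
```

Note that the prediction (0.9·4 + 0.5·2) / 1.4 ≈ 3.29 sits closer to the most similar neighbor's rating.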

41. Recommender Systems and Collaborative Filtering
Collaborative Filtering: User-User

42. Recommender Systems and Collaborative Filtering
Collaborative Filtering: Item-Item

43. Collaborative Filtering Version 2: Item-Item Collaborative Filtering
So far: user-user collaborative filtering. An alternate view that often works better: item-item.
- For item i, find other similar items
- Estimate the rating for item i based on the ratings for those similar items
- Can use the same similarity metrics and prediction functions as in the user-user model
"Rate i as the mean of my ratings for other items, weighted by their similarity to i":
r_xi = Σ_j s_ij · r_xj / Σ_j s_ij, summing over j ∈ N(i; x)
where N(i; x) = the set of items rated by x and similar to i, s_ij = the similarity of items i and j, and r_xj = the rating of user x on item j.

44. Item-Item CF (|N|=2)
Utility matrix: rows = movies 1-6, columns = users 1-12; "." = unknown rating, digits = ratings between 1 and 5.
movie 1:  1  .  3  .  .  5  .  .  5  .  4  .
movie 2:  .  .  5  4  .  .  4  .  .  2  1  3
movie 3:  2  4  .  1  2  .  3  .  4  3  5  .
movie 4:  .  2  4  .  5  .  .  4  .  .  2  .
movie 5:  .  .  4  3  4  2  .  .  .  .  2  5
movie 6:  1  .  3  .  3  .  .  2  .  .  4  .

45. Item-Item CF (|N|=2)
Estimate the rating of movie 1 by user 5 (the "?" cell):
movie 1:  1  .  3  .  ?  5  .  .  5  .  4  .

46. Item-Item CF (|N|=2)
Neighbor selection: identify movies similar to movie 1 that were rated by user 5.
Here we use mean-centered item-overlap cosine as the similarity:
1) Subtract the mean rating m_i of each movie i from its row
2) Compute the (item-overlapping) cosine similarities between rows
This fills in the row of similarities sim(1, m) to movie 1 (with sim(1, 1) = 1.00).

47. Item-Item CF (|N|=2)
Subtract the mean rating m_i of each movie i from its row, e.g., m_1 = (1 + 3 + 5 + 5 + 4)/5 = 18/5.
(Showing the computation only for movies 3 and 6.)

48. Item-Item CF (|N|=2)
Mean-centered rows for movies 1, 3, and 6:
movie 1:  -13/5   .    -3/5   .     ?    7/5   .     .    7/5   .    2/5   .
movie 3:  -1      1    .     -2    -1    .     0     .    1     0    2     .
movie 6:  -8/5    .    2/5    .    2/5   .     .   -3/5   .     .    7/5   .

49. Compute Cosine Similarity
For rows 1 and 3, both have values for users 1, 9, and 11.
For rows 1 and 6, both have values for users 1, 3, and 11.

50. Item-Item CF (|N|=2)
Compute the similarity weights: s_1,3 = 0.658, s_1,6 = 0.768.
(We compute s_1,2, s_1,4, s_1,5 too; let's assume those are smaller.)

51. Item-Item CF (|N|=2)
Approximate the rating by a weighted mean. Predict by taking the weighted average (user 5 rated movie 3 as 2 and movie 6 as 3):
r_1,5 = (0.658·2 + 0.768·3) / (0.658 + 0.768) = 2.54
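The whole worked example (slides 44-51) can be reproduced in a few lines. The matrix below is the one reconstructed from the slides; the computed values match the slide's 0.658, 0.768, and 2.54.

```python
import math

# Rows = movies 1..6, columns = users 1..12, None = unknown rating.
# We estimate movie 1's rating by user 5 (cell M[0][4]).
M = [
    [1,    None, 3,    None, None, 5,    None, None, 5,    None, 4,    None],
    [None, None, 5,    4,    None, None, 4,    None, None, 2,    1,    3   ],
    [2,    4,    None, 1,    2,    None, 3,    None, 4,    3,    5,    None],
    [None, 2,    4,    None, 5,    None, None, 4,    None, None, 2,    None],
    [None, None, 4,    3,    4,    2,    None, None, None, None, 2,    5   ],
    [1,    None, 3,    None, 3,    None, None, 2,    None, None, 4,    None],
]

def item_sim(a, b):
    """Mean-centered item-overlap cosine between two movie rows."""
    ma = sum(v for v in a if v is not None) / sum(v is not None for v in a)
    mb = sum(v for v in b if v is not None) / sum(v is not None for v in b)
    pairs = [(x - ma, y - mb) for x, y in zip(a, b)
             if x is not None and y is not None]
    num = sum(x * y for x, y in pairs)
    den = (math.sqrt(sum(x * x for x, _ in pairs)) *
           math.sqrt(sum(y * y for _, y in pairs)))
    return num / den if den else 0.0

# Similarities of movie 1 to its two nearest neighbors, movies 3 and 6:
s13 = item_sim(M[0], M[2])   # slide: 0.658
s16 = item_sim(M[0], M[5])   # slide: 0.768

# Weighted average of user 5's ratings for movies 3 and 6:
r15 = (s13 * M[2][4] + s16 * M[5][4]) / (s13 + s16)   # slide: 2.54
```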

52. Item-Item vs. User-User
In practice, item-item often works better than user-user. Why? Items are simpler; users have multiple tastes. (People are more complex than objects.)

53. Pros/Cons of Collaborative Filtering
+ Works for any kind of item: no feature selection needed
– Cold start: need enough users in the system to find a match
– Sparsity: the user/ratings matrix is sparse; it is hard to find users that have rated the same items
– First rater: cannot recommend an item that has not been previously rated
– Popularity bias: cannot recommend items to someone with unique taste; tends to recommend popular items
– Ethical and social issues: can lead to filter bubbles and radicalization spirals

54. Recommender Systems and Collaborative Filtering
Collaborative Filtering: Item-Item

55. Recommender Systems and Collaborative Filtering
Simplified item-item similarity computation for our tiny PA6 dataset

56. Simplified item-item for our tiny PA6 dataset
First, assume you've converted all the values to +1 (like), 0 (no rating), or -1 (dislike).
Starting point, the raw utility matrix (rows = movies 1-6, columns = users 1-12):
movie 1:  1  .  3  .  .  5  .  .  5  .  4  .
movie 2:  .  .  5  4  .  .  4  .  .  2  1  3
movie 3:  2  4  .  1  2  .  3  .  4  3  5  .
movie 4:  .  2  4  .  5  .  .  4  .  .  2  .
movie 5:  .  .  4  3  4  2  .  .  .  .  2  5
movie 6:  1  .  3  .  3  .  .  2  .  .  4  .

57. Simplified item-item for our tiny PA6 dataset
The same matrix after converting all the values to +1 (like), 0 (no rating), or -1 (dislike):
movie 1:  -1   0   1   0   0   1   0   0   1   0   1   0
movie 2:   0   0   1   1   0   0   1   0   0  -1  -1   1
movie 3:  -1   1   0  -1  -1   0   1   0   1   1   1   0
movie 4:   0  -1   1   0   1   0   0   1   0   0  -1   0
movie 5:   0   0   1   1   1  -1   0   0   0   0  -1   1
movie 6:  -1   0   1   0   1   0   0  -1   0   0   1   0

58. Simplified item-item for our tiny PA6 dataset
Assume you've binarized, i.e., converted all the values to +1 (like), 0 (no rating), or -1 (dislike).
For this binary case, some tricks that the TAs recommend:
- Don't mean-center users; just keep the raw +1, 0, -1
- Don't normalize (i.e., don't divide the weighted sum by the sum of the similarities): instead of r_xi = Σ_j s_ij · r_xj / Σ_j s_ij, just do r_xi = Σ_j s_ij · r_xj (summing over j ∈ N(i; x))
- Don't use mean-centered item-overlap cosine to compute s_ij; just use plain cosine
where s_ij = the similarity of items i and j, r_xj = the rating of user x on item j, and N(i; x) = the set of items rated by x.

59. Simplified item-item for our tiny PA6 dataset
1. Binarize, i.e., convert all values to +1 (like), 0 (no rating), or -1 (dislike)
2. The user x gives you (say) ratings for 2 movies m1 and m2 (r_xj = the rating of user x on item j)
3. For each movie i in the dataset, compute r_xi = Σ_j s_ij · r_xj, where s_ij = the cosine between the vectors for movies i and j
4. Recommend the movie i with the maximum r_xi
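The four steps above can be sketched as follows. The three-movie binarized dataset is made up for illustration; the PA6 dataset would be plugged in the same way.

```python
import math

def cosine(u, v):
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / den if den else 0.0

def recommend(user_ratings, binarized):
    """Simplified item-item: score each unrated movie i as
    r_xi = sum_j s_ij * r_xj (raw cosine, no mean-centering, no division)."""
    best, best_score = None, -float("inf")
    for i, row_i in binarized.items():
        if i in user_ratings:          # only recommend unrated movies
            continue
        score = sum(cosine(row_i, binarized[j]) * r
                    for j, r in user_ratings.items())
        if score > best_score:
            best, best_score = i, score
    return best

# Tiny hypothetical binarized dataset (+1 like, 0 no rating, -1 dislike).
binarized = {
    "m1": [1, 1, 0, -1],
    "m2": [1, 1, 1, -1],   # very similar to m1
    "m3": [-1, -1, 0, 1],  # roughly opposite to m1
}
rec = recommend({"m1": 1}, binarized)   # the user liked m1
```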

60. Recommender Systems and Collaborative Filtering
Simplified item-item similarity computation for our tiny PA6 dataset

61. Recommender Systems and Collaborative Filtering
Evaluation and Implications

62. YouTube's Recommendation Algorithm
- Represent each video and each user as an embedding
- Train a huge neural net classifier (a softmax over millions of possible videos) to predict the next video the user will watch
- Input features: the user's watch history (video ids), the user's recent queries (word embeddings), and the date, popularity, and virality of the video
- The embeddings for videos and users are learned during training
Covington, Adams, Sargin. 2016. Deep Neural Networks for YouTube Recommendations.

63. Evaluation
(A movies × users utility matrix of known ratings.)

64. Evaluation
(The same matrix with some known ratings withheld, shown as "?": the test data set.)

65. Evaluating Predictions
Compare predictions with known ratings:
- Root-mean-square error (RMSE): sqrt( (1/N) Σ_{(x,i)} (r̂_xi − r_xi)² ), where r̂_xi is the predicted and r_xi the true rating of x on i
- Rank correlation: Spearman's correlation between the system's and the user's complete rankings
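RMSE over a held-out test set is a one-liner. The ratings below are hypothetical:

```python
import math

def rmse(pred, truth):
    """Root-mean-square error over the test pairs (lower is better)."""
    errs = [(pred[k] - truth[k]) ** 2 for k in truth]
    return math.sqrt(sum(errs) / len(errs))

# Hypothetical held-out (user, movie) ratings vs. system predictions:
truth = {("u1", "m1"): 4, ("u1", "m2"): 2, ("u2", "m1"): 5}
pred  = {("u1", "m1"): 3.5, ("u1", "m2"): 2, ("u2", "m1"): 4}
err = rmse(pred, truth)
```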

66. But is predicting watching the right loss function?

67. What could go wrong? Ethical and societal implications in recommendation engines
Milano, Silvia, Mariarosaria Taddeo, and Luciano Floridi. "Recommender systems and their ethical challenges." AI & SOCIETY 35, no. 4 (2020): 957-967.
- Spread of misinformation and propaganda
- Filter bubbles
- Inappropriate or unethical content
- Opacity
- Violating user privacy

68. What could go wrong? Ethical and societal implications
Propaganda campaigns: the Russian Internet Research Agency (IRA) attack on the United States, 2013-2018: computational propaganda on YouTube, Facebook, and Instagram to misinform and polarize US voters. One goal: induce African American and Mexican American voters to boycott elections.
Howard, Ganesh, Liotsiou. 2019. The IRA, Social Media, and Political Polarization in the United States, 2012-2018.

69. Ethical and societal implications: Filter bubbles
"I realized really fast that YouTube's recommendation was putting people into filter bubbles," Chaslot said. "There was no way out. If a person was into Flat Earth conspiracies, it was bad for watch-time to recommend anti-Flat Earth videos, so it won't even recommend them."
From "How YouTube Drives People to the Internet's Darkest Corners: Google's video site often recommends divisive or misleading material, despite recent changes designed to fix the problems"

70. “The question before us is the ethics of leading people down hateful rabbit holes full of misinformation and lies at scale just because it works to increase the time people spend on the site – and it does work” – Zeynep Tufekci

71. Open research questions
What would algorithms look like that could recommend but also account for these social costs?

72. Recommender Systems and Collaborative Filtering
Evaluation and Implications