Making recommendations David Sheth
Making Recommendations
  Why Recommend
  How to generate recommendations
  How to present recommendations
Why Recommend
  Help people find what they want
  Can lead to more sales
  Can lead to higher satisfaction / fewer returns
  Leads to repeat visits
How to generate recommendations
  Gather the data
  Choose some recommenders to start with
  Generate recommendations
  Evaluate the recommenders
Gather the data
  Explicit ratings
    Users rating things they have seen/read/purchased
    Users liking/tweeting/pinning/posting (binary data)
  Implicit feedback
    Users clicking to view things
    Purchases
    Examination duration time
    Save to favorites
    Print
Sample data
  MovieLens
    Many research papers are based on the MovieLens dataset: 10M ratings, 100K tags, 10K movies, 72K users
    http://grouplens.org/datasets/movielens/
    Search for "movielens dataset"
  Book Crossing dataset
    1M ratings, 271K books, 278K users
    http://www.informatik.uni-freiburg.de/~cziegler/BX/
    Search for "book crossing dataset"
Choose some recommenders (1/4): Non Personalized
  People who bought this also bought that
  Doesn’t matter what else we know about the person
  Spaghetti -> Sauce
Choose some recommenders (2/4): Content based
  Based on the attributes of an item
  Example for a book:
    Pride and Prejudice: historical fiction, early 1800s, set in England, female protagonist, comic tone, female author
    The Fellowship of the Ring: fantasy fiction, male protagonist, set in the past, male author
  Requires some way to assign attributes to items. This can be a hard problem
  Most prominent example: Pandora. Built on the Music Genome Project: up to 450 attributes per song
  As a user rates or buys items, the recommendation engine learns which attributes the user likes and dislikes
Choose some recommenders (3/4): User-User
  Find people who have purchased or rated highly items similar to what our target user has purchased or rated highly
  See if they have purchased or rated highly anything the target user doesn’t already own
  Recommend the thing that the target user doesn’t own
  Example:
             item 1  item 2  item 3  item 4
    Bob         4       2       5       5
    Sally       2       5       1       4
    Jim         3       5       2       1
    Lucy        5       2       4     ????
Choose some recommenders (4/4): Item-Item
  Find items that are rated in a similar manner: these items are considered “similar”
  Find items that are similar to the items a user has already bought or rated highly
  Recommend those items
  Example:
             item 1  item 2  item 3  item 4
    Harry       4       2       5       5
    John        2       5       1       4
    Lily        3       5       2       1
    Fred        5       2      ???      5
Choosing prior to generating recommendations
  Look at the data you have
  Look at your budget
  Look at the size of your data
  Look at how quickly you need to provide recommendations
Process the data (1/4): Non Personalized
  Percent of people that bought A that also bought B
  You will find bananas in most people’s carts, so bananas end up as the recommended item for pretty much everything
Process the data (1/4): Non Personalized
  We want to calculate the specific influence of A, i.e. how much more likely a person is to buy B given that they bought A
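One common way to measure that "specific influence" is lift: the share of A-buyers who also bought B, divided by the share of all shoppers who bought B. The sketch below is only an illustration of that idea, not necessarily the formula used in the talk; the carts and the function names (support, confidence, lift) are made up for the example.

```python
# Hypothetical shopping carts, invented purely for illustration.
carts = [
    {"bananas", "spaghetti", "sauce"},
    {"bananas", "spaghetti", "sauce"},
    {"bananas", "milk"},
    {"bananas", "bread"},
    {"bananas", "spaghetti"},
    {"bananas", "spaghetti"},
    {"milk", "bread"},
    {"bananas", "sauce"},
]


def support(items):
    """Fraction of carts that contain every item in `items`."""
    return sum(1 for cart in carts if items <= cart) / len(carts)


def confidence(a, b):
    """Percent of people that bought A that also bought B."""
    return support({a, b}) / support({a})


def lift(a, b):
    """How much more likely B is among A-buyers than among all shoppers."""
    return confidence(a, b) / support({b})


for other in ("bananas", "sauce"):
    print(other, confidence("spaghetti", other), lift("spaghetti", other))
```

With these made-up carts, bananas win on the plain percentage because almost everyone buys them, while sauce gets the higher lift for spaghetti buyers.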
Process the data (2/4): Content based
  Want to look at all the attributes a user likes, and then compare that with the attributes of all the potential items
  Consider the simplest case: we have determined that the user likes computer languages with 0.7 and Austin with 0.3
  Possible meetups:
    Austin Java (languages 0.8, Austin 0.2)
    Dallas Java (languages 1.0, Austin 0.0)
    Austin outdoors (languages 0.0, Austin 1.0)
  Can plot these as vectors
Process the data (2/4): Content based
  Cosine similarity
  In 2 dimensions, the smaller the angle between two vectors, the bigger the cosine
Process the data (2/4): Content based
  If there are more attributes, we just treat them as vectors that have more dimensions, i.e. we have 3, 10, or 100 dimensional vectors
  Cosine similarity between high dimensional vectors?
    (Dot product of the user vector and the item vector) / (length of the user vector * length of the item vector)
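As a rough sketch of that formula, the snippet below scores the three meetups from the earlier slide against the user vector (languages 0.7, Austin 0.3). The function name cosine_similarity and the dictionary layout are choices made for this example only; the same function works unchanged for vectors with more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    length_u = math.sqrt(sum(a * a for a in u))
    length_v = math.sqrt(sum(b * b for b in v))
    if length_u == 0 or length_v == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (length_u * length_v)


# (languages, Austin) weights taken from the slide.
user = (0.7, 0.3)
meetups = {
    "Austin Java": (0.8, 0.2),
    "Dallas Java": (1.0, 0.0),
    "Austin outdoors": (0.0, 1.0),
}

# Rank meetups from most to least similar to the user's tastes.
ranked = sorted(
    meetups,
    key=lambda name: cosine_similarity(user, meetups[name]),
    reverse=True,
)
for name in ranked:
    print(name, round(cosine_similarity(user, meetups[name]), 3))
```

Austin Java points in nearly the same direction as the user vector, so it ranks first, followed by Dallas Java and then Austin outdoors.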
Process the data (2/4): Content based
  Lenskit: an open source recommendation tool
    From academia (University of Minnesota)
    Lets you add your own version of any part of the recommendation engine
    Comes with data structures that let you do recommendation based calculations
Process the data (3/4): User-User
  Find people who have purchased or rated highly items similar to what our target user has purchased or rated highly
  See if they have purchased or rated highly anything the target user doesn’t already own
  Recommend the thing that the target user doesn’t own
  Example:
             item 1  item 2  item 3  item 4
    Bob         4       2       5       5
    Sally       2       5       1       4
    Jim         3       5       2       1
    Lucy        5       2       4     ????
Process the data (3/4): User-User
  Determine how similar other users are to you
    Can use cosine similarity again
    Others are possible, e.g. Pearson similarity. Test and see what works best for your data/budget
  Select the closest n% or the closest n
  Select all the items from the similar users
  Calculate the average preference for each item among all the similar users
    Or a weighted average: the more similar the person is to the user, the greater the weight
  Return the items with the highest average
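The steps above can be sketched in a few lines, using the Bob/Sally/Jim/Lucy ratings from the example slide. This is a minimal illustration, assuming cosine similarity over co-rated items, the closest 2 users as the neighborhood, and a similarity-weighted average; the function names are invented for the example and a library such as Mahout or Lenskit would normally do this work.

```python
import math

# Ratings from the example slide; Lucy has not rated item 4.
ratings = {
    "Bob":   {"item 1": 4, "item 2": 2, "item 3": 5, "item 4": 5},
    "Sally": {"item 1": 2, "item 2": 5, "item 3": 1, "item 4": 4},
    "Jim":   {"item 1": 3, "item 2": 5, "item 3": 2, "item 4": 1},
    "Lucy":  {"item 1": 5, "item 2": 2, "item 3": 4},
}


def cosine(a, b):
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    length_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    length_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (length_a * length_b)


def recommend(target, n_neighbors=2):
    """Return (neighbors, [(item, predicted rating), ...]) for `target`."""
    sims = {
        user: cosine(ratings[target], user_ratings)
        for user, user_ratings in ratings.items()
        if user != target
    }
    # Select the closest n users.
    neighbors = sorted(sims, key=sims.get, reverse=True)[:n_neighbors]
    # Candidate items: anything a neighbor rated that the target has not.
    candidates = {i for u in neighbors for i in ratings[u]} - set(ratings[target])
    scores = {}
    for item in candidates:
        raters = [u for u in neighbors if item in ratings[u]]
        total_weight = sum(sims[u] for u in raters)
        if total_weight > 0:
            # Weighted average: more similar users count for more.
            scores[item] = sum(sims[u] * ratings[u][item] for u in raters) / total_weight
    return neighbors, sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


neighbors, predictions = recommend("Lucy")
print(neighbors, predictions)
```

Because all ratings here are positive, raw cosine similarity makes every user look fairly similar; mean-centering the ratings or using Pearson similarity, as the slide suggests, tends to separate users more sharply.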
Process the data (3/4): User-User
  Ideal case: a class with
    A constructor which takes
      The rating data
      The algorithm for finding similarities
      The algorithm for selecting the closest (n% or nearest N)
    A method which takes a user and returns a list of recommendations
  Where can we find such a class?
Process the data (3/4): User-User
  Mahout
    From Apache
    Provides machine learning libraries, focused on clustering, classification, and recommendations
    Provides Hadoop versions of some of the algorithms
    Recommendation is to stay with the non-Hadoop versions if your data is small enough
Process the data (4/4): Item-Item
  Find items that are rated in a similar manner: these items are considered “similar”
  Find items that are similar to the items a user has already bought or rated highly
  Recommend those items
  Example:
             item 1  item 2  item 3  item 4
    Harry       4       2       5       5
    John        2       5       1       4
    Lily        3       5       2       1
    Fred        5       2      ???      5
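To make the item-item idea concrete, the sketch below fills in Fred's missing rating for item 3 from the example table. It assumes cosine similarity between item rating columns and a similarity-weighted average of Fred's own ratings; both are illustrative choices, and the function names are invented for the example.

```python
import math

# Ratings from the example slide; Fred has not rated item 3.
ratings = {
    "Harry": {"item 1": 4, "item 2": 2, "item 3": 5, "item 4": 5},
    "John":  {"item 1": 2, "item 2": 5, "item 3": 1, "item 4": 4},
    "Lily":  {"item 1": 3, "item 2": 5, "item 3": 2, "item 4": 1},
    "Fred":  {"item 1": 5, "item 2": 2, "item 4": 5},
}


def item_similarity(item_a, item_b):
    """Cosine similarity between two items, over users who rated both."""
    users = [u for u, r in ratings.items() if item_a in r and item_b in r]
    if not users:
        return 0.0
    dot = sum(ratings[u][item_a] * ratings[u][item_b] for u in users)
    length_a = math.sqrt(sum(ratings[u][item_a] ** 2 for u in users))
    length_b = math.sqrt(sum(ratings[u][item_b] ** 2 for u in users))
    return dot / (length_a * length_b)


def predict(user, target_item):
    """Weighted average of the user's own ratings, weighted by item similarity."""
    rated = ratings[user]
    sims = {item: item_similarity(target_item, item) for item in rated}
    total_weight = sum(sims.values())
    if total_weight == 0:
        return None  # nothing comparable to base a prediction on
    return sum(sims[item] * rated[item] for item in rated) / total_weight


prediction = predict("Fred", "item 3")
print(round(prediction, 2))
```

Item 3 is rated most like items 1 and 4, both of which Fred rated 5, so the predicted rating comes out high.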
Process the data (4/4): Item-Item
  Can use Mahout for this as well
  What if the data is too big?
Process the data (4/4): Item-Item
  Step                                  Job
  Convert items to index                itemIdIndex
  Convert ratings to vector per user    toUserVectors
  Build item vectors                    toItemVectors
  Calculate item similarity             RowSimilarityJob
  Calculate item/user ratings           Partial Multiply
  Extract the top N recommendations     AggregateAndRecommend
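The same sequence of steps can be followed in memory on a toy dataset, which may help when reading the job names. The sketch below is not Mahout code and does not use any Mahout API; it assumes binary purchase data and uses a simple co-occurrence count as the item similarity, and every name and data value in it is invented for illustration.

```python
from collections import defaultdict

# Hypothetical (user, item) purchases.
purchases = [
    ("alice", "book"), ("alice", "lamp"),
    ("bob", "book"), ("bob", "lamp"), ("bob", "desk"),
    ("carol", "lamp"), ("carol", "desk"),
    ("dave", "book"),
]

# Step 1: convert items to an index.
item_index = {item: idx for idx, item in enumerate(sorted({i for _, i in purchases}))}
index_item = {idx: item for item, idx in item_index.items()}

# Step 2: convert ratings to one vector per user (sparse dict: item index -> value).
user_vectors = defaultdict(dict)
for user, item in purchases:
    user_vectors[user][item_index[item]] = 1.0

# Step 3: build item vectors (the transpose: item index -> {user: value}).
item_vectors = defaultdict(dict)
for user, vector in user_vectors.items():
    for idx, value in vector.items():
        item_vectors[idx][user] = value


# Step 4: calculate item similarity, here the number of users who bought both.
def cooccurrence(a, b):
    return sum(1 for user in item_vectors[a] if user in item_vectors[b])


similarity = {
    a: {b: cooccurrence(a, b) for b in item_vectors if b != a}
    for a in item_vectors
}


# Steps 5 and 6: multiply similarities by the user's vector, sum, take the top N.
def recommend(user, top_n=2):
    owned = user_vectors[user]
    scores = defaultdict(float)
    for owned_idx, preference in owned.items():
        for other_idx, sim in similarity[owned_idx].items():
            if other_idx not in owned:
                scores[other_idx] += sim * preference
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [index_item[idx] for idx, _ in ranked[:top_n]]


print(recommend("dave"))
print(recommend("alice"))
```

Each step only needs the output of the previous one, which is what makes the pipeline easy to split into separate distributed jobs when the data no longer fits on one machine.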
Analyze the data
  Both Lenskit and Mahout let you hold back part of the ratings, and then check how well the recommender recovers the held-back data
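The hold-back idea can be sketched without either library. The example below is an assumption-laden illustration, not the Lenskit or Mahout evaluator: it holds back every fifth rating, trains a deliberately simple item-average predictor on the rest, and scores it with root mean squared error (RMSE). Any recommender that exposes a predict(user, item) function could be plugged in the same way.

```python
import math

# (user, item, rating) triples; the values reuse the earlier example table.
ratings = [
    ("Bob", "item 1", 4), ("Bob", "item 2", 2), ("Bob", "item 3", 5), ("Bob", "item 4", 5),
    ("Sally", "item 1", 2), ("Sally", "item 2", 5), ("Sally", "item 3", 1), ("Sally", "item 4", 4),
    ("Jim", "item 1", 3), ("Jim", "item 2", 5), ("Jim", "item 3", 2), ("Jim", "item 4", 1),
]


def split(rows, every=5):
    """Hold back every `every`-th rating as test data (deterministic for the demo)."""
    train, test = [], []
    for n, row in enumerate(rows):
        (test if n % every == every - 1 else train).append(row)
    return train, test


def item_mean_predictor(train):
    """Baseline recommender: predict each item's average training rating."""
    totals, counts = {}, {}
    for _, item, rating in train:
        totals[item] = totals.get(item, 0) + rating
        counts[item] = counts.get(item, 0) + 1
    global_mean = sum(r for _, _, r in train) / len(train)

    def predict(user, item):
        if item in totals:
            return totals[item] / counts[item]
        return global_mean  # fall back when the item was never seen in training

    return predict


def rmse(predict, test):
    """Root mean squared error between predictions and held-back ratings."""
    squared = [(predict(user, item) - rating) ** 2 for user, item, rating in test]
    return math.sqrt(sum(squared) / len(squared))


train, test = split(ratings)
score = rmse(item_mean_predictor(train), test)
print(test, score)
```

In practice the split is usually random and repeated several times, and the same held-back set is used to compare every candidate recommender.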
Choosing after generating recommendations
  Check the evaluation score
  Can also evaluate based on the variety of recommendations: you can score this, and blend it with the evaluation score
  Performance (to see if it meets your budget)
  Runtime characteristics (online/offline)
  Usual findings:
    Item-Item is more stable than User-User
    Item-Item generally provides better recommendations
    Non Personalized is fast
    Content based depends on the data
Present the data
  No statistical terms
  Visualizations work well
    Histograms: bar for 1 star, bar for 2 stars, bar for 3 stars, bar for 4 stars
    Tables
  Complicated charts do not work well, e.g. a graph based on similarity to the user
  Helpful to explain “why” if you can do it in a non-statistical way
Summary
Other items of interest
  Other recommenders
    Dimensionality reduction via Singular Value Decomposition (SVD)
    Slope recommenders
  Making the data better
    Use rating above user rating
    Tossing out newer users to avoid fraud
    Other fraud detection mechanisms
Resources
  Mahout in Action
  Lenskit documentation: designed for students to pick up and use
  Mahout documentation: assumes some recommendation background
  Explaining Collaborative Filtering Recommendations, Jonathan L. Herlocker, Joseph A. Konstan, and John Riedl. CSCW '00: Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, 2000
  Introduction to Recommender Systems (Coursera): may be offered again
  ACM Conference Series on Recommender Systems