
Presentation Transcript

Making recommendations David Sheth

Making Recommendations
Why Recommend
How to generate recommendations
How to present recommendations

Why Recommend
Help people find what they want
Can lead to more sales
Can lead to higher satisfaction / fewer returns
Leads to repeat visits

How to generate recommendations
Gather the data
Choose some recommenders to start with
Generate recommendations
Evaluate the recommenders

Gather the data
Explicit ratings:
Users rating things they have seen/read/purchased
Users liking/tweeting/pinning/posting (binary data)
Implicit feedback:
Users clicking to view things
Purchases
Examination duration time
Save to favorites
Print

Sample data
MovieLens: many research papers are based on the MovieLens dataset (10M ratings, 100K tags, 10K movies, 72K users)
http://grouplens.org/datasets/movielens/ (search for "movielens dataset")
Book-Crossing dataset: 1M ratings, 271K books, 278K users
http://www.informatik.uni-freiburg.de/~cziegler/BX/ (search for "book crossing dataset")

Choose some recommenders (1/4): Non Personalized
People who bought this also bought that
Doesn't matter what else we know about the person
Example: Spaghetti -> Sauce

Choose some recommenders (2/4): Content based
Based on the attributes of an item. Examples for books:
Pride and Prejudice: historical fiction, early 1800s, set in England, female protagonist, comic tone, female author
The Fellowship of the Ring: fantasy fiction, male protagonist, set in the past, male author
Requires some way to assign attributes to items; this can be a hard problem
Most prominent example: Pandora, built on the Music Genome Project (up to 450 attributes per song)
As a user rates or buys items, the recommendation engine learns which attributes the user likes and dislikes

Choose some recommenders (3/4): User-User
Find people who have purchased or rated highly items similar to what our target user has purchased or rated highly
See if they have purchased or rated highly anything the target user doesn't already own
Recommend the things that user doesn't own
Example:
        item 1  item 2  item 3  item 4
Bob       4       2       5       5
Sally     2       5       1       4
Jim       3       5       2       1
Lucy      5       2       4       ?

Choose some recommenders (4/4): Item-Item
Find items that are rated in a similar manner: these items are considered "similar"
Find items that are similar to the items a user has already bought or rated highly
Recommend those items
Example:
        item 1  item 2  item 3  item 4
Harry     4       2       5       5
John      2       5       1       4
Lily      3       5       2       1
Fred      5       2       ?       5

Choosing prior to generating recommendations
Look at the data you have
Look at your budget
Look at the size of your data
Look at how quickly you need to provide recommendations

Process the data (1/4): Non Personalized
Naive approach: the percent of people that bought A who also bought B, i.e. (customers who bought both A and B) / (customers who bought A).
Problem: you will find bananas in most people's carts, so bananas will be the recommended item for pretty much everything.

Process the data (1/4): Non Personalized
Instead, calculate the specific influence of A: how much more likely does buying A make it that a person will buy B?
One standard way to express this is the lift: lift(A, B) = P(B | A) / P(B). A lift above 1 means buying A genuinely raises the chance of buying B; bananas, with a high base rate, score near 1 for everything.
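The lift calculation above can be sketched over a handful of transactions. The basket contents below are hypothetical, chosen to reproduce the slides' spaghetti/sauce/bananas point; they are not from the deck.

```python
# Hypothetical purchase baskets (one set of items per transaction).
baskets = [
    {"spaghetti", "sauce", "bananas"},
    {"spaghetti", "sauce"},
    {"bananas", "milk"},
    {"spaghetti", "bananas", "sauce"},
    {"milk", "bananas"},
]

def lift(a, b, baskets):
    """How much more likely is B given A, relative to B's base rate?
    lift(A, B) = P(B | A) / P(B)."""
    n = len(baskets)
    n_a = sum(1 for t in baskets if a in t)
    n_b = sum(1 for t in baskets if b in t)
    n_ab = sum(1 for t in baskets if a in t and b in t)
    return (n_ab / n_a) / (n_b / n)

# Bananas appear in most carts, so P(bananas | spaghetti) is high --
# but lift shows sauce is the genuinely associated item.
print(lift("spaghetti", "sauce", baskets))    # > 1: real association
print(lift("spaghetti", "bananas", baskets))  # < 1: just a popular item
```

Note how the raw conditional probability P(bananas | spaghetti) would be 2/3 here, yet the lift correctly demotes bananas below sauce.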

Process the Data (2/4): Content based
Want to look at all the attributes a user likes, and then compare those with the attributes of all the potential items.
Consider the simplest case: we have determined that the user likes computer languages with weight 0.7 and Austin with weight 0.3.
Possible meetups:
Austin Java (languages 0.8, Austin 0.2)
Dallas Java (languages 1.0, Austin 0.0)
Austin Outdoors (languages 0.0, Austin 1.0)
Can plot these as vectors.

Process the Data (2/4): Content based
[Chart: the user profile and the three meetups plotted as 2-D vectors, with "languages" and "Austin" as the axes]

Process the data (2/4): Content based
Cosine similarity: in 2 dimensions, the smaller the angle between two vectors, the bigger the cosine.

Process the data (2/4): Content based
If there are more attributes, we just treat them as vectors with more dimensions, i.e. 3, 10, or 100 dimensional vectors.
Cosine similarity between high dimensional vectors:
cos(u, v) = (dot product of user vector and item vector) / (norm of user vector * norm of item vector)
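The cosine formula above can be applied directly to the meetup example from the earlier slide. This is a minimal sketch (the attribute keys `languages`/`austin` are just illustrative names for the two dimensions):

```python
import math

# User profile and meetup attribute vectors from the slides:
# dimensions are (computer languages, Austin).
user = {"languages": 0.7, "austin": 0.3}
meetups = {
    "Austin Java":     {"languages": 0.8, "austin": 0.2},
    "Dallas Java":     {"languages": 1.0, "austin": 0.0},
    "Austin Outdoors": {"languages": 0.0, "austin": 1.0},
}

def cosine(u, v):
    """Cosine similarity: dot product over the product of the vector norms."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

# Rank the meetups by similarity to the user profile.
ranked = sorted(meetups, key=lambda m: cosine(user, meetups[m]), reverse=True)
print(ranked)  # ['Austin Java', 'Dallas Java', 'Austin Outdoors']
```

Because the function only touches shared keys for the dot product, the same code works unchanged for 100-dimensional attribute vectors.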

Process the data (2/4): Content based
Lenskit: an open source recommendation tool from academia (University of Minnesota)
Lets you add your own version of any part of the recommendation engine
Comes with data structures that let you do recommendation-based calculations

Process the data (3/4): User-User
Find people who have purchased or rated highly items similar to what our target user has purchased or rated highly
See if they have purchased or rated highly anything the target user doesn't already own
Recommend the things that user doesn't own
Example:
        item 1  item 2  item 3  item 4
Bob       4       2       5       5
Sally     2       5       1       4
James     3       5       2       1
Lucy      5       2       4       ?

Process the data (3/4): User-User
Determine how similar other users are to the target user:
Can use cosine similarity again
Others are possible, e.g. Pearson correlation; test and see what works best for your data/budget
Select the closest n% or the closest n users
Collect the items rated by those similar users
Calculate the average preference for each item among the similar users
Or a weighted average: the more similar a person is to the target user, the greater the weight
Return the items with the highest average

Process the data (3/4): User-User
Ideal case: a class with a constructor which takes
The rating data
The algorithm for finding similarities
The algorithm for selecting the closest (n% or nearest n)
and a method which takes a user and returns a list of recommendations.
Where can we find such a class?

Process the data (3/4): User-User
Mahout, from Apache
Provides machine learning libraries focused on clustering, classification, and recommendations
Provides Hadoop versions of some of the algorithms
Recommendation is to stay with the non-Hadoop versions if your data is small enough

Process the data (4/4): Item-Item
Find items that are rated in a similar manner: these items are considered "similar"
Find items that are similar to the items a user has already bought or rated highly
Recommend those items
Example:
        item 1  item 2  item 3  item 4
Harry     4       2       5       5
John      2       5       1       4
Lily      3       5       2       1
Fred      5       2       ?       5

Process the data (4/4): Item-Item
Can use Mahout for this as well. What if the data is too big?

Process the data (4/4): Item-Item
Steps:
Convert items to an index
Convert ratings to a vector per user
Build item vectors
Calculate item similarity
Calculate item/user ratings
Extract the top N recommendations
Corresponding Mahout Hadoop jobs: itemIdIndex, toUserVectors, toItemVectors, RowSimilarityJob, Partial Multiply, AggregateAndRecommend
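The distributed pipeline above mirrors what a small in-memory item-item recommender does. A minimal single-machine sketch of the same steps (transpose to item vectors, compute item similarity, blend the user's own ratings), using the Harry/John/Fred table:

```python
import math
from collections import defaultdict

# Toy ratings from the slides; Fred has not rated item3.
ratings = {
    "Harry": {"item1": 4, "item2": 2, "item3": 5, "item4": 5},
    "John":  {"item1": 2, "item2": 5, "item3": 1, "item4": 4},
    "Lily":  {"item1": 3, "item2": 5, "item3": 2, "item4": 1},
    "Fred":  {"item1": 5, "item2": 2, "item4": 5},
}

def item_vectors(ratings):
    """Transpose user->item ratings into item->user rating columns
    (the 'build item vectors' step)."""
    cols = defaultdict(dict)
    for user, prefs in ratings.items():
        for item, r in prefs.items():
            cols[item][user] = r
    return cols

def cosine(u, v):
    common = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in common)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(user, item, ratings):
    """Predict a rating as an item-similarity-weighted average of the
    user's own ratings on other items."""
    cols = item_vectors(ratings)
    num = den = 0.0
    for other, r in ratings[user].items():
        sim = cosine(cols[item], cols[other])
        num += sim * r
        den += sim
    return num / den if den else 0.0

print(round(predict("Fred", "item3", ratings), 2))
```

Note that the prediction leans on Fred's high ratings for item1 and item4, whose rating columns resemble item3's, which is exactly the intuition behind the RowSimilarityJob / partial-multiply stages.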

Analyze the data
Both Lenskit and Mahout let you hold back part of the ratings, then check whether the recommender recovers the held-out data.

Choosing after generating recommendations
Check the evaluation score
Can also evaluate based on the variety of recommendations: you can score this, and blend it with the evaluation score
Performance (to see if it meets your budget)
Runtime characteristics (online/offline)
Usual findings:
Item-Item is more stable than User-User
Item-Item generally provides better recommendations
Non Personalized is fast
Content based depends on the data

Present the data
No statistical terms
Visualizations work well
Histograms: a bar for 1 star, a bar for 2 stars, and so on
Tables
Complicated charts do not work well, e.g. a graph based on similarity to the user
Helpful to explain "why" if you can do it in a non-statistical way

Summary

Other items of interest
Other recommenders:
Dimensionality reduction via Singular Value Decomposition (SVD)
Slope One recommenders
Making the data better:
Use rating above user rating
Tossing out newer users to avoid fraud
Other fraud detection mechanisms

Resources
Mahout in Action
Lenskit documentation: designed for students to pick up and use
Mahout documentation: assumes some recommendation background
"Explaining Collaborative Filtering Recommendations," Jonathan L. Herlocker, Joseph A. Konstan, and John Riedl. CSCW '00: Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, 2000
Introduction to Recommender Systems (Coursera): may be offered again
ACM Conference Series on Recommender Systems