Presentation Transcript

Slide 1

A Monte Carlo Algorithm for Cold Start Recommendation

Authors: Yu Rong, Xiao Wen, Hong Cheng
World Wide Web Conference 2014
Presented by: Priagung Khusumanegara

Slide 2

Table of Contents

Problems
Preliminary Concepts
Random Walk on Bipartite Graph
Monte Carlo Algorithm
Experiments and Results
Conclusions

Slide 3

Problems

Data sparsity problem
Arises from the phenomenon that users in general rate only a limited number of items.
Cold-start problem
Arises when a new entity enters the system for the first time. It usually involves two kinds of entities:
Items
Users

Slide 4

Introduction (Cont’d)

Random walk on bipartite graph of users and items
It simulates the preference propagation among users.
Monte Carlo algorithm
It estimates the similarity between users. It takes a pre-computation approach and thus can efficiently compute the user similarity given any new user for rating prediction.

Slide 5

Preliminary Concepts

Problem Definition
A user set U = {u_1, ..., u_m}
An item set I = {i_1, ..., i_n}
The set of items rated by user u is denoted as I(u)
The set of users who have rated item i is denoted as U(i)
The rating of user u on item i is denoted by r_{u,i}
A matrix R of size m × n represents the ratings between the user set U and the item set I

Slide 6

Preliminary Concepts (Cont’d)

Definition 1. (Rating Prediction)
User-based collaborative filtering uses the similarity between users to make the rating prediction:

r̂_{u,i} = r̄_u + ( Σ_{v ∈ U(i)} s(u, v) · (r_{v,i} − r̄_v) ) / ( Σ_{v ∈ U(i)} |s(u, v)| )

where:
r̂_{u,i} = predicted rating of user u on item i
r̄_u = average rating of user u in R (with r̄, the average of all available ratings in matrix R, used when u has no observed ratings)
s(u, v) = similarity degree of user u to user v
r_{v,i} = rating of user v on item i
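The prediction rule of Definition 1 can be sketched in Python. This is a minimal illustration, not the authors' code; the zero-means-unrated convention and the fall-back to the global average are assumptions made for the sketch:

```python
import numpy as np

def predict_rating(R, sim, u, i):
    """User-based CF prediction: the target user's mean rating plus a
    similarity-weighted average of other users' mean-centered ratings.
    R is an m x n rating matrix (0 = not rated); sim is user u's
    similarity vector over all users."""
    rated_u = R[u] > 0
    # Average rating of u; fall back to the global average of all
    # available ratings when u has rated nothing (cold start).
    r_bar_u = R[u][rated_u].mean() if rated_u.any() else R[R > 0].mean()
    num, den = 0.0, 0.0
    for v in range(R.shape[0]):
        if v == u or R[v, i] == 0:
            continue  # only users who actually rated item i contribute
        r_bar_v = R[v][R[v] > 0].mean()
        num += sim[v] * (R[v, i] - r_bar_v)
        den += abs(sim[v])
    return r_bar_u + num / den if den > 0 else r_bar_u

R = np.array([[5.0, 0.0, 3.0],
              [4.0, 2.0, 0.0],
              [0.0, 1.0, 4.0]])
sim = np.array([0.0, 0.6, 0.4])
pred = predict_rating(R, sim, u=0, i=1)  # 4 + (0.6*(-1) + 0.4*(-1.5)) / 1.0 = 2.8
```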

Slide 7

Preliminary Concepts (Cont’d)

Definition 2. (Similarity Estimation)
Given the rating matrix R and the set of items I(u) rated by a target user u, estimate the similarity vector s(u) of u from matrix R.
How to make predictions for such a cold-start user, who has only limited rating information, remains a big challenge.

Slide 8

Preliminary Concepts (Cont’d)

Challenges and Intuitions
The essence of the cold-start problem is data sparsity.
The key issue in addressing the cold-start problem is making the most of the available rating information. Instead of considering only the users who have rated items in common with the target user, we can utilize more data to estimate similarity based on preference propagation.

Slide 9

Preliminary Concepts (Cont’d)

Slide 10

Random Walk on Bipartite Graph

What is a bipartite graph?
A graph whose vertices can be divided into two disjoint sets U and I such that every edge connects a vertex in U to one in I.
There exists an edge between user u and item i if u has rated i (i.e., r_{u,i} is observed).

Slide 11

Random Walk Construction on Bipartite Graph

The random walk starts from, and ends at, user nodes in U, in order to obtain the stationary distribution over the user set.
They construct an even-length random walk process X on the bipartite graph, moving from user to user, to estimate the similarity vector for any single user.
The random walk process X is composed of two types of walks:
Type 1: from a user to an item
Type 2: from an item to another user

Slide 12

Random Walk Construction

Type 1 (from user u to item i):
p(i | u) = r_{u,i} / Σ_{j ∈ I(u)} r_{u,j}, where i ∈ I(u)
Type 2 (from item i to user v):
p(v | i) = r_{v,i} / Σ_{w ∈ U(i)} r_{w,i}, where v ∈ U(i)
Transition probability between two users, based on a length-2 walk:
p(v | u) = Σ_{i ∈ I(u)} p(i | u) · p(v | i)
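The two walk types above compose into a user-to-user transition matrix via matrix products. A toy sketch (the rating-proportional transition weights are an assumption for illustration; it also assumes every user and every item has at least one rating, so no row or column sum is zero):

```python
import numpy as np

# Toy rating matrix: rows = users, columns = items, 0 = not rated.
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 2.0],
              [0.0, 1.0, 4.0]])

# Type 1: user -> item, proportional to that user's ratings.
P_ui = R / R.sum(axis=1, keepdims=True)

# Type 2: item -> user, proportional to the ratings on that item.
P_iu = (R / R.sum(axis=0, keepdims=True)).T

# Length-2 user -> user transition matrix: user -> item -> user.
P = P_ui @ P_iu

# Each row of P is a probability distribution over users.
assert np.allclose(P.sum(axis=1), 1.0)
```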

Slide 13

Random Walk on Bipartite Graph

Based on these definitions of transition probability, the transition matrix between users can be denoted as P, with P(u, v) = p(v | u).
The random walk process X is defined as follows:
X begins with the target user u, i.e., X_0 = u.
At each step, the random walk terminates with probability c, or makes a transition between two users according to the matrix P with probability 1 − c.
Each step consists of two types of transitions: a type 1 walk from a user to an item, and a type 2 walk from the item to another user.

Slide 14

Monte Carlo Algorithm

Online Monte Carlo algorithm
For when the number of target users is small.
Estimates the similarity vector for each target user directly.
Monte Carlo algorithm with pre-computation
For when the number of target users is very large, so that per-user random walk simulation is very time-consuming.
Builds a model that can estimate any target user's similarity vector.

Slide 15

Online Monte Carlo Algorithm

The transition matrix of X is P, and q is the initial distribution determined by the target user u, because X always starts from u. (1)
The similarity vector for target user u is defined as the stationary distribution π of X. (2)
According to (1) and (2), we have π^T = c · q^T · (I − (1 − c) P)^{−1}. (3)
Algorithm 1. Simulate N runs of the random walk process X starting from the target user u. Evaluate π_v as the fraction of random walks which end at user v.
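Algorithm 1 can be sketched as follows. This is an illustrative implementation under the definitions above, not the authors' code; the function name, the toy transition matrix, and the run count are assumptions:

```python
import random
import numpy as np

def online_mc_similarity(P, start, c=0.8, n_runs=5000, seed=0):
    """Algorithm 1 sketch: run n_runs random walks from the target
    user. Each step terminates with probability c, otherwise moves
    user -> user according to P. The similarity to user v is the
    fraction of walks that end at v."""
    rng = random.Random(seed)
    m = P.shape[0]
    ends = np.zeros(m)
    for _ in range(n_runs):
        u = start
        while rng.random() >= c:  # continue with probability 1 - c
            u = rng.choices(range(m), weights=P[u])[0]
        ends[u] += 1
    return ends / n_runs

# Toy user-to-user transition matrix (rows sum to 1).
P = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.4, 0.6, 0.0]])
s = online_mc_similarity(P, start=0)
# s approximates pi^T = c * q^T (I - (1-c)P)^-1 with q = e_start.
```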

Slide 16

Monte Carlo Algorithm with Pre-Computation

Partition the transition matrix as
P = [ p_{uu}  p_1^T ; p_2  P̂ ]  (4)
where:
p_1 is the transition vector from the target user u to the training users,
p_2 is the transition vector from the training users to the target user u,
p_{uu} is the transition probability from the target user u to himself,
P̂ is the transition matrix of the training user set.
Set p_2 = 0 and p_{uu} = 0, by approximation, to avoid jumping back to the target user during a random walk.
This approximation allows us to separate the training users from the target user.
It is also reasonable in the sense that we do not need the similarity of the target user to himself for rating prediction.

Slide 17

Monte Carlo Algorithm with Pre-Computation

From (3), we can obtain the stationary distribution π:
π^T = c · q^T · (I − (1 − c) P)^{−1}, where I is an (m + 1) × (m + 1) identity matrix.
According to (4), we can get a closed form of π:
π^T = c · [ 1, (1 − c) · p_1^T · (I − (1 − c) P̂)^{−1} ], where I is an m × m identity matrix.

Slide 18

Monte Carlo Algorithm with Pre-Computation

The stationary distribution can be written as:
π^T = c · [ 1, (1 − c) · p_1^T · Q ], with Q = (I − (1 − c) P̂)^{−1}.
Since the first component of π is the similarity of the target user to himself, we only need the last m components of π, which correspond to the target user's similarities to the training users.
How do we estimate Q?

Slide 19

Monte Carlo Algorithm with Pre-Computation

For all training users, Q = (I − (1 − c) P̂)^{−1} = Σ_{k=0}^{∞} (1 − c)^k · P̂^k.
The element Q(w, v) of matrix Q can therefore be regarded as the average number of times that the random walk visits user v, given that this random walk starts at user w.
Thus we can propose an estimator of Q based on the complete paths of the random walk.

Slide 20

Monte Carlo Algorithm with Pre-Computation

Algorithm 2. MC Complete Path Algorithm.
Pre-computation stage: simulate the random walk exactly N times from each training user. For any pair of training users (w, v), evaluate Q̂(w, v) as the average number of visits to user v, given that the random walk starts from w.
Similarity estimation stage: for any target user u, calculate the transition vector p_1 by enumerating all the paths from u to the training users. Then estimate the similarity vector as ŝ^T = c · (1 − c) · p_1^T · Q̂.
Rating prediction stage: for the target user u, estimate the rating on item i by plugging the estimated similarities ŝ into the prediction formula of Definition 1.
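The pre-computation and similarity estimation stages of Algorithm 2 can be sketched as follows (an illustrative implementation, not the authors' code; the visit-count matrix Q_hat plays the role of (I − (1 − c) P̂)^{−1}, and the toy two-user matrix is an assumption):

```python
import random
import numpy as np

def precompute_Q(P_train, c=0.8, n_runs=2000, seed=1):
    """Pre-computation stage: from each training user w, simulate
    n_runs walks and record the average number of visits to each
    user v (the walk's starting position counts as a visit)."""
    rng = random.Random(seed)
    m = P_train.shape[0]
    Q_hat = np.zeros((m, m))
    for w in range(m):
        for _ in range(n_runs):
            u = w
            Q_hat[w, u] += 1              # visit at step 0
            while rng.random() >= c:      # continue with prob. 1 - c
                u = rng.choices(range(m), weights=P_train[u])[0]
                Q_hat[w, u] += 1
    return Q_hat / n_runs

def estimate_similarity(p1, Q_hat, c=0.8):
    """Similarity estimation stage: combine the target user's
    transition vector p1 into the training set with the
    pre-computed visit counts."""
    return c * (1 - c) * p1 @ Q_hat

P_train = np.array([[0.0, 1.0],
                    [1.0, 0.0]])
Q_hat = precompute_Q(P_train)
s = estimate_similarity(np.array([0.5, 0.5]), Q_hat)
```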

Slide 21

Theoretical Analysis

The key issue in Algorithm 2 is how many rounds to simulate for each training user in the pre-computation stage to guarantee the estimation accuracy:
Expectation and variance of the estimator
Estimation of the number of simulation rounds N

Slide 22

Theoretical Analysis (Expectation and Variance of the Estimator)

Let X_1, ..., X_N be independent random walks, each distributed as the process X started from training user w. The estimator produced by Algorithm 2 can be written as
Q̂(w, v) = (1 / N) · Σ_{t=1}^{N} V_t(v),
where V_t(v) denotes the number of visits that walk X_t makes to user v.

Slide 23

Theoretical Analysis (Expectation and Variance of the Estimator)

Assuming that all the V_t(v)'s are independent, we obtain that the estimator is unbiased, E[Q̂(w, v)] = Q(w, v), and that its variance shrinks in proportion to 1 / N.

Slide 24

Theoretical Analysis (Estimation of Simulation Round N)

Theorem 1: If we run a sufficiently large number N of rounds of the random walk for each training user (N depending on ε, δ, c, and s_v), then with probability 1 − δ the estimator ŝ_v output by Algorithm 2 satisfies |ŝ_v − s_v| ≤ ε · s_v,
where c is the restart probability, s_v is the target user's similarity to training user v, and z_{1−δ/2} is the (1 − δ/2)-quantile of the standard normal distribution appearing in the expression for N.

Slide 25

Theoretical Analysis (Estimation of Simulation Round N)

Proof. Consider the confidence interval for s_v defined as [ŝ_v − ε · s_v, ŝ_v + ε · s_v]. Since ŝ_v is a sum of a large number of terms, the standardized random variable (ŝ_v − s_v) / √Var(ŝ_v) has approximately a standard normal distribution. From this they deduce the coverage probability of the interval, which results in the required number of rounds N.

Slide 26

Extensions

Parallel Implementation
The simulations are independent of each other, so the pre-computation algorithm is easy to parallelize in a shared-memory environment.
Example: with k processors and m training users, the training users are evenly distributed across the processors, and each processor is assigned m / k training users for the Monte Carlo simulation.

Slide 27

Extensions (Cont’d)

Dynamic Updates
A common scenario is that new users, along with their ratings, are added to the system; they can serve as additional training users.
Instead of re-computing the model from scratch, they treat each new training user as a target user and compute its stationary distribution based on the original model to approximate the new model.
For a small number of new training users, this approximation works well.

Slide 28

Experiments

Real-world data sets:
MovieLens-1M
Epinions
BookCrossing
Amazon
Yahoo! Music
Note: the ratings in the first four data sets are real numbers in the range [1, 5], while the ratings in Yahoo! Music are integers in the range [1, 100].
[Table: statistics of the data sets]

Slide 29

Experiments (Cont’d)

Experimental Configuration
They perform the tests using 4-fold cross validation to reduce the influence of sampling.
For each test user, his ratings are split into two parts:
Observed items
Held-out items
The ratings of the observed items are used to predict the ratings of the held-out items.

Slide 30

Experiments (Cont’d)

They use a split ratio of 10 : 90, i.e., for any test user, they use 10% of the items rated by the user to predict the ratings of the remaining 90% of items.
The test users are divided into three subsets based on the number of their observed items.

Slide 31

Experiments (Cont’d)

Evaluation metric
Mean Absolute Error (MAE): a quantity that measures how close predictions are to the eventual outcomes:
MAE = (1 / |T|) · Σ_{(u,i) ∈ T} |r_{u,i} − r̂_{u,i}|
where r_{u,i} and r̂_{u,i} are the true rating and the predicted rating of item i by user u, respectively, and T is the set of test ratings.
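As a concrete check of the metric, a minimal MAE computation (illustrative code, not from the paper):

```python
def mean_absolute_error(true_ratings, predicted_ratings):
    """MAE: mean of |true - predicted| over all test ratings."""
    pairs = list(zip(true_ratings, predicted_ratings))
    return sum(abs(r - p) for r, p in pairs) / len(pairs)

mae = mean_absolute_error([4.0, 2.0, 5.0], [3.5, 2.5, 5.0])
# (0.5 + 0.5 + 0.0) / 3 = 1/3
```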

Slide 32

Experimental Results (Rating Prediction)

Note: CDTF cannot terminate within 48 hours on Amazon.
Compared methods:
MCCP: the Monte Carlo Complete Path algorithm (restart probability c = 0.8 and the number of simulations N set according to their theoretical analysis).
SVD++: a latent factor model which combines the matrix factorization technique with implicit feedback from the users.
CDTF: a generalized Cross-Domain Triadic Factorization over the triadic relation user-item-domain.
LFL: a latent feature log-linear model for the dyadic prediction task.

Slide 33

Experimental Results (Impact of Parameters)

Restart probability c
Using the MovieLens data set, with N = m, they vary c from 0.1 to 1.0 linearly.
When c increases from 0.1 to 0.9, the MAE becomes lower, which means better recommendation performance.
This result indicates that in reality, the influence of preference propagation on rating prediction decays quickly within a small number of hops.
When c = 1.0, the MAE increases again, indicating worse performance: with c = 1.0 there is no random walk between training users to propagate preference among them.

Slide 34

Experimental Results (Impact of Parameters)

The number of simulations N
With c = 0.8, they vary the number of simulations linearly.
Their method already performs well with a small number of simulation rounds.
More simulation rounds yield only a small improvement in prediction accuracy.

Slide 35

Experimental Results (Scalability Test)

The experiment is conducted on a Windows server with an Intel Xeon 2.4 GHz CPU and 384 GB of memory.
The algorithms are implemented in Matlab and C++.
For the testing process, they randomly select 2000 users in the test set, and set the split ratio to 5 : 95 to divide each user's ratings into observed items and held-out items.

Slide 36

Experimental Results (Handling Dynamic Updates)

To simulate the situation where new training users are added to the system, they divide the training set of the ML-1M data into 10 parts.
Parts 1-5 serve in turn as the new training set, with the remaining parts as the original training set.
The new data ratio, defined as the percentage of new training users, varies from 11.1% to 100%.
Incremental: treat each new training user as a target user and compute its stationary distribution based on the original model to approximate the new model.
Re-computing: combine the original training set and the new training set and re-run the random walk simulation.

Slide 37

Experimental Results (Handling Dynamic Updates)

The MAE increases very little under the incremental computing approach, which demonstrates that the incremental approach achieves almost the same result as the re-computing approach.
The MAE of the incremental approach does increase slightly as the new data ratio grows. This is because the incremental approach treats the new training users as having no connections with each other, and thus ignores the preference propagation among them.
The prediction performance of the re-computing approach is not affected by the varying ratio of new training data, as it combines the original and new training sets (always equivalent to the full 10 parts of training users) for model training.

Slide 38

Conclusion

To overcome the data sparsity issue, they designed a random walk process on the bipartite graph to model the preference propagation among users.
They proposed a Monte Carlo algorithm which can be efficiently applied for rating prediction on any new user.
The nature of Monte Carlo simulation enables a parallel implementation of the algorithm.