Slide 1
A Monte Carlo Algorithm for Cold Start Recommendation
Authors: Yu Rong, Xiao Wen, Hong Cheng
World Wide Web Conference (WWW) 2014
Presented by: Priagung Khusumanegara

Slide 2
Table of Contents
Problems
Preliminary Concepts
Random Walk on Bipartite Graph
Monte Carlo Algorithm
Experiments and Results
Conclusions

Slide 3
Problems
Data sparsity problem
Arises from the phenomenon that users in general rate only a limited number of items.
Cold-start problem
Occurs when a new entity enters the system for the first time.
It usually involves two kinds of entities:
Items
Users
Slide 4
Introduction (Cont'd)
Random walk on the bipartite graph of users and items
It simulates the preference propagation among users.
Monte Carlo algorithm
It estimates the similarity between users. It takes a pre-computation approach and thus can efficiently compute the user similarity for any new user for rating prediction.

Slide 5
Preliminary Concepts
Problem Definition
A user set $U = \{u_1, \dots, u_m\}$
An item set $I = \{i_1, \dots, i_n\}$
The set of items rated by user $u$ is denoted as $I_u$
The set of users who have rated item $i$ is denoted as $U_i$
The rating of user $u$ on item $i$ is denoted by $r_{u,i}$
A rating matrix $R$ of size $m \times n$ represents the ratings between the user set $U$ and the item set $I$

Slide 6
Preliminary Concepts (Cont'd)
Definition 1. (Rating Prediction)
User-based collaborative filtering uses the similarity between users to make the rating prediction:

$\hat{r}_{u_0,i} = \bar{r} + \frac{\sum_{v \in U_i} s(u_0,v)\,(r_{v,i} - \bar{r}_v)}{\sum_{v \in U_i} |s(u_0,v)|}$

$\hat{r}_{u_0,i}$ = predicted rating
$\bar{r}_v$ = average rating of user $v$ in $R$
$\bar{r}$ = average of all available ratings in matrix $R$
$s(u_0,v)$ = similarity degree of user $v$ to the target user $u_0$
$r_{v,i}$ = rating of user $v$ on item $i$
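As an illustration of Definition 1, the prediction rule can be sketched in Python. This is a minimal sketch, not the authors' code: it assumes a dense NumPy rating matrix with 0 marking missing ratings, a precomputed similarity vector, and the global-mean baseline shown above; the function and variable names are mine.

```python
import numpy as np

def predict_rating(R, sim, item):
    """Predict the target user's rating on `item` (Definition 1 sketch).

    R    : (m, n) rating matrix of the training users, 0 = missing.
    sim  : length-m similarity vector s(u0, v) for the target user.
    """
    global_mean = R[R > 0].mean()          # average of all ratings in R
    raters = np.nonzero(R[:, item])[0]     # users v in U_i who rated the item
    if len(raters) == 0 or np.abs(sim[raters]).sum() == 0:
        return global_mean                 # fall back to the global mean
    # each neighbor's average rating over the items that neighbor rated
    user_means = np.array([R[v][R[v] > 0].mean() for v in raters])
    w = sim[raters]
    deviation = (w * (R[raters, item] - user_means)).sum() / np.abs(w).sum()
    return global_mean + deviation
```

The fallback to the global mean handles items that no neighbor has rated, which is exactly the sparse regime the cold-start setting cares about.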
Slide 7
Preliminary Concepts (Cont'd)
Definition 2. (Similarity Estimation)
Given the rating matrix $R$ and the set of items $I_{u_0}$ rated by a target user $u_0$, estimate the similarity vector $s$ of $u_0$ from matrix $R$.
How to make predictions for such a cold-start user, who has limited rating information, remains a big challenge.
Slide 8
Preliminary Concepts (Cont'd)
Challenges and Intuitions
The essence of the cold-start problem is data sparsity.
The key issue in addressing the cold-start problem is how to make the most of the available rating information. Instead of considering only the users who have rated items in common with the target user, we can utilize more data to estimate similarity based on preference propagation.

Slide 9
Preliminary Concepts (Cont'd)
[Figure]

Slide 10
Random Walk on Bipartite Graph
What is a bipartite graph?
A graph whose vertices can be divided into two disjoint sets $U$ and $I$ such that every edge connects a vertex in $U$ to one in $I$.
There exists an edge between user $u$ and item $i$ if $r_{u,i} > 0$.
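The edge rule above ("an edge exists iff the rating is positive") translates directly into code. A small sketch, with names of my own choosing:

```python
import numpy as np

def bipartite_edges(R):
    """User-to-item edges of the bipartite graph: an edge (u, i) exists
    iff the rating r_{u,i} > 0 in the rating matrix R."""
    users, items = np.nonzero(R)  # indices of all non-zero ratings
    return list(zip(users.tolist(), items.tolist()))
```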
Slide 11
Random Walk Construction on Bipartite Graph
The random walk starts from and ends at user nodes in $U$, in order to obtain a stationary distribution over the user set.
They construct an even-length random walk process on the bipartite graph from user to user to estimate the similarity vector for any single user.
The random walk process is comprised of two types of walks on the graph:
Type 1: from a user to an item
Type 2: from an item to another user

Slide 12
Random Walk Construction
Type 1 (from user $u$ to item $i$):
$p(i \mid u) = \frac{r_{u,i}}{\sum_{j \in I_u} r_{u,j}}$, where $i \in I_u$
Type 2 (from item $i$ to user $v$):
$p(v \mid i) = \frac{r_{v,i}}{\sum_{w \in U_i} r_{w,i}}$, where $v \in U_i$
Transition probability between two users based on a length-2 walk:
$p(v \mid u) = \sum_{i \in I_u \cap I_v} p(i \mid u)\, p(v \mid i)$
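The two walk types compose into a single user-to-user transition matrix. A small sketch, assuming the rating-weighted type-1 and type-2 probabilities above and that every user and every item has at least one rating (the helper name is mine):

```python
import numpy as np

def user_transition_matrix(R):
    """Length-2 (user -> item -> user) transition matrix between users,
    built from rating-weighted type-1 and type-2 steps.
    Assumes every row and column of R has at least one positive rating."""
    R = np.asarray(R, dtype=float)
    P1 = R / R.sum(axis=1, keepdims=True)      # type 1: p(i|u), rows are users
    P2 = (R / R.sum(axis=0, keepdims=True)).T  # type 2: p(v|i), rows are items
    return P1 @ P2                             # p(v|u) = sum_i p(i|u) p(v|i)
```

Each row of the result is a probability distribution over users, which is what the walk on the user side requires.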
Slide 13
Random Walk on Bipartite Graph
Based on these definitions of transition probability, the transition matrix between users can be denoted as $P$, where $P_{u,v} = p(v \mid u)$.
The random walk process can be defined as follows:
It begins at the target user $u_0$, i.e., $X_0 = u_0$.
At each step, the random walk terminates with probability $\alpha$, or makes a transition between two users according to the matrix $P$ with probability $1 - \alpha$.
Each step contains two types of transitions:
A type 1 walk from a user to an item
A type 2 walk from the item to another user
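The terminating walk described above can be sketched as follows; the function name and the list-of-lists representation of $P$ are assumptions for illustration:

```python
import random

def random_walk(u0, P, alpha, rng=random):
    """Simulate one run of the terminating user-to-user random walk:
    start at u0; before each step terminate with probability alpha,
    otherwise move to the next user sampled from row u of P."""
    u = u0
    while rng.random() >= alpha:      # continue with probability 1 - alpha
        r, acc = rng.random(), 0.0
        for v, p in enumerate(P[u]):  # inverse-CDF sampling from row P[u]
            acc += p
            if r < acc:
                u = v
                break
    return u
```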
Slide 14
Monte Carlo Algorithm
Online Monte Carlo algorithm
Used when the number of target users is small.
It estimates the similarity vector for each target user directly by simulation.
Monte Carlo algorithm with pre-computation
Used when the number of target users is very large, so that per-user random walk simulation would be very time-consuming.
It builds a model offline that can estimate any target user's similarity vector.
Slide 15
Online Monte Carlo Algorithm
In the random walk process, with transition matrix $P$ and termination probability $\alpha$:

(1) The distribution over users after $t$ steps is $q^T P^t$, where $q$ is the initial distribution determined by the target user $u_0$ (all mass on $u_0$), because the walk always starts from $u_0$.

(2) The similarity vector for the target user is the distribution of the user at which the terminating walk stops:
$s^T = \alpha \sum_{t=0}^{\infty} (1-\alpha)^t\, q^T P^t$

(3) According to (1) and (2), we have:
$s^T = \alpha\, q^T \big(I - (1-\alpha)P\big)^{-1}$

Algorithm 1. Simulate $N$ runs of the random walk process starting from the target user $u_0$. Evaluate $\hat{s}_v$ as the fraction of random walks which end at user $v$.
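Algorithm 1 can then be sketched as repeated terminating walks. This is a minimal, unoptimized illustration under the same assumptions as before (list-of-lists $P$; names are mine):

```python
import random

def online_mc_similarity(u0, P, alpha, N, seed=0):
    """Algorithm 1 sketch: run N terminating random walks from the target
    user u0 and estimate s_v as the fraction of walks that end at user v."""
    rng = random.Random(seed)
    m = len(P)
    counts = [0] * m
    for _ in range(N):
        u = u0
        while rng.random() >= alpha:      # continue with probability 1 - alpha
            r, acc = rng.random(), 0.0
            for v, p in enumerate(P[u]):  # sample the next user from row u
                acc += p
                if r < acc:
                    u = v
                    break
        counts[u] += 1                    # the walk terminated at user u
    return [c / N for c in counts]
```

For a two-user chain with $\alpha = 0.5$, the closed form $s^T = \alpha\,q^T(I-(1-\alpha)P)^{-1}$ gives $s = (2/3,\ 1/3)$, which the simulation approaches as $N$ grows.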
Slide 16
Monte Carlo Algorithm with Pre-Computation
Partition the transition matrix with respect to the target user and the $m$ training users:

(4) $P = \begin{pmatrix} p_{00} & p_0^T \\ p_1 & \tilde{P} \end{pmatrix}$

$p_0$ is the transition vector from the target user to the training users, $p_1$ is the transition vector from the training users to the target user, $p_{00}$ is the transition probability from the target user to himself, and $\tilde{P}$ is the transition matrix of the training user set.
Set $p_1 = 0$ by approximation, to avoid jumping back to the target user in a random walk.
This approximation allows us to separate the training users from the target user.
It is also reasonable in the sense that we do not need the similarity of the target user to himself for rating prediction.

Slide 17
Monte Carlo Algorithm with Pre-Computation
From (3), we can obtain the stationary distribution:
$s^T = \alpha\, q^T \big(I - (1-\alpha)P\big)^{-1}$
where $I$ is an $(m+1) \times (m+1)$ identity matrix.
According to (4), with $p_1 = 0$, we can get a closed form of $s$ whose training-user block only involves
$\big(I - (1-\alpha)\tilde{P}\big)^{-1}$
where $I$ is an $m \times m$ identity matrix.

Slide 18
Monte Carlo Algorithm with Pre-Computation
The stationary distribution can then be written in terms of this training-user block. Since the first component of $s$ is the similarity of the target user to himself, we only need the last $m$ components of $s$, which correspond to the target user's similarity to the training users.
How can we estimate $Q = \big(I - (1-\alpha)\tilde{P}\big)^{-1}$?

Slide 19
Monte Carlo Algorithm with Pre-Computation
For all training users $u, v$, the element $q_{u,v}$ of the matrix $Q$ can be regarded as the average number of times that the random walk visits user $v$, given that this random walk starts at user $u$. Denote
$Q = \sum_{t=0}^{\infty} (1-\alpha)^t\, \tilde{P}^t$, so $Q = \big(I - (1-\alpha)\tilde{P}\big)^{-1}$.
Thus we can propose an estimator based on the complete path of the random walk.

Slide 20
Monte Carlo Algorithm with Pre-Computation
Algorithm 2. MC Complete Path (MCCP) Algorithm.
Pre-computation stage: Simulate the random walk exactly $N$ times from each training user. For any pair of users $u, v$, evaluate $\hat{q}_{u,v}$ as the average number of visits to user $v$ given that the random walk starts from $u$; the result is denoted as $\hat{Q}$.
Similarity estimation stage: For any target user $u_0$, calculate the transition vector $p_0$ by enumerating all the paths from $u_0$ to the training users. Then estimate the similarity vector from $p_0$ and $\hat{Q}$.
Rating prediction stage: For the target user $u_0$, predict the rating on item $i$ using the estimated similarity vector (Definition 1).
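The pre-computation stage can be sketched as below. Two assumptions of mine: the list-of-lists representation of $\tilde{P}$, and the scaling of $Q$ as the average number of visits per walk (counting the starting user as a visit at step 0), which matches $Q = (I-(1-\alpha)\tilde{P})^{-1}$:

```python
import random

def mccp_precompute(P, alpha, N, seed=0):
    """Pre-computation stage of Algorithm 2 (MC Complete Path) sketch:
    from each training user u, run N terminating random walks and record
    the average number of visits to every user v (the complete path),
    estimating Q = (I - (1 - alpha) P)^{-1} entrywise."""
    rng = random.Random(seed)
    m = len(P)
    Q_hat = [[0.0] * m for _ in range(m)]
    for u in range(m):
        for _ in range(N):
            x = u
            Q_hat[u][x] += 1.0                # the walk visits u at step 0
            while rng.random() >= alpha:      # continue with prob. 1 - alpha
                r, acc = rng.random(), 0.0
                for v, p in enumerate(P[x]):  # sample the next user from row x
                    acc += p
                    if r < acc:
                        x = v
                        break
                Q_hat[u][x] += 1.0            # count the visit at this step
        for v in range(m):
            Q_hat[u][v] /= N
    return Q_hat
```

For the two-user chain with $\alpha = 0.5$, the exact matrix is $Q = \begin{pmatrix} 4/3 & 2/3 \\ 2/3 & 4/3 \end{pmatrix}$, so the estimates should cluster around those values.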
Slide 21
Theoretical Analysis
The key issue in Algorithm 2 is how many rounds of simulation are needed for each training user in the pre-computation stage to guarantee the estimation accuracy:
Expectation and variance of the estimator $\hat{q}_{u,v}$
Estimation of the number of simulation rounds $N$

Slide 22
Theoretical Analysis
(Expectation and Variance of $\hat{q}_{u,v}$)
Let $X_1, \dots, X_N$ be independent random walks distributed as the random walk process starting from training user $u$.
The estimator produced by Algorithm 2 can be written as the average number of visits to $v$ over the $N$ walks.
For walk $k$ and step $t$, define the indicator $Z_{k,t} = \mathbb{1}[X_k \text{ visits } v \text{ at step } t]$.
Then the estimator can be rewritten as
$\hat{q}_{u,v} = \frac{1}{N} \sum_{k=1}^{N} \sum_{t \geq 0} Z_{k,t}$

Slide 23
Theoretical Analysis
(Expectation and Variance of $\hat{q}_{u,v}$)
Assuming that all the indicator variables are independent, we obtain that the estimator is unbiased, $\mathbb{E}[\hat{q}_{u,v}] = q_{u,v}$, together with an expression for its variance.

Slide 24
Theoretical Analysis
(Estimation of Simulation Round $N$)
Theorem 1: If we run a sufficiently large number $N$ of rounds of the random walk for each training user, then with probability $1 - \delta$ the estimator output by Algorithm 2 stays within the desired error bound, where $\alpha$ is the restart probability, $s_v$ is the target user's similarity to training user $v$, and $z_{1-\delta/2}$ is the $(1-\delta/2)$-quantile of the standard normal distribution.

Slide 25
Theoretical Analysis
(Estimation of Simulation Round $N$)
Proof. Consider the confidence interval for $q_{u,v}$ built from the estimator. Since $\hat{q}_{u,v}$ is a sum of a large number of terms, the standardized variable $\frac{\hat{q}_{u,v} - \mathbb{E}[\hat{q}_{u,v}]}{\sqrt{\mathrm{Var}[\hat{q}_{u,v}]}}$ has approximately a standard normal distribution. From this they deduce the confidence interval, which results in the required number of simulation rounds $N$.

Slide 26
Extensions
Parallel Implementation
Simulations are independent of each other, so the pre-computation algorithm is easy to parallelize in a shared-memory environment.
Example: given $c$ processors and $m$ training users, the training users are evenly distributed across the processors, and each processor is assigned about $m/c$ training users for the Monte Carlo simulation.
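The partitioning idea can be sketched structurally as follows. This is only an illustration, not the authors' implementation: Python threads stand in for the shared-memory workers, and `simulate_chunk` is a hypothetical placeholder for the per-user Monte Carlo simulation.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(users, num_workers):
    """Evenly distribute the training users across the workers (round-robin)."""
    return [users[w::num_workers] for w in range(num_workers)]

def simulate_chunk(chunk):
    """Hypothetical stand-in for running the Monte Carlo simulation
    for one chunk of training users."""
    return {u: "simulated" for u in chunk}

def parallel_precompute(users, num_workers=4):
    """Run the per-chunk simulations concurrently and merge the results."""
    results = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for part in pool.map(simulate_chunk, partition(users, num_workers)):
            results.update(part)
    return results
```

Because each walk only reads the shared transition matrix and writes to its own user's row, the workers need no synchronization beyond the final merge.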
Slide 27
Extensions (Cont'd)
Dynamic Updates
A common scenario is that new users, along with their ratings, are added to the system; they can serve as additional training users.
Instead of re-computing the model from scratch, they treat a new training user as a target user and compute its stationary distribution based on the original model to approximate the new model.
For a small number of new training users, this approximation works well.

Slide 28
Experiments
Real-world data sets:
MovieLens-1M
Epinions
BookCrossing
Amazon
Yahoo! Music
Note: The ratings in the first four data sets are real numbers in the range [1, 5], while the ratings in Yahoo! Music are integers in the range [1, 100].
[Table: statistics of the data sets]

Slide 29
Experiments (Cont'd)
Experimental Configuration
They perform the tests using 4-fold cross-validation to reduce the influence of sampling.
For each test user, his ratings are split into two parts:
Observed items
Held-out items
The ratings of the observed items are used to predict the ratings of the held-out items.

Slide 30
Experiments (Cont'd)
They use a split ratio of 10 : 90, i.e., for any test user, they use 10% of the items rated by the user to predict the ratings of the remaining 90% of items.
The test users are divided into three subsets based on the number of their observed items.

Slide 31
Experiments (Cont'd)
Evaluation metric
Mean Absolute Error (MAE): a quantity used to measure how close predictions are to the eventual outcomes:
$\mathrm{MAE} = \frac{1}{|T|} \sum_{(u,i) \in T} \left| r_{u,i} - \hat{r}_{u,i} \right|$
where $r_{u,i}$ and $\hat{r}_{u,i}$ are the true rating and the predicted rating of item $i$ by user $u$, respectively, and $T$ is the set of held-out ratings.
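The MAE metric is straightforward to compute; a minimal sketch (the function name is mine):

```python
def mean_absolute_error(true_ratings, predicted_ratings):
    """MAE: the average absolute difference between true and predicted ratings."""
    assert len(true_ratings) == len(predicted_ratings) and true_ratings
    return sum(abs(t - p)
               for t, p in zip(true_ratings, predicted_ratings)) / len(true_ratings)
```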
Slide 32
Experimental Results (Rating Prediction)
Note: CDTF cannot terminate within 48 hours on Amazon.
MCCP: Monte Carlo Complete Path algorithm (restart probability $\alpha = 0.8$, and the number of simulations $N$ set according to their theoretical analysis).
SVD++: A latent factor model which combines the matrix factorization technique with implicit feedback from the users.
CDTF: A generalized Cross-Domain Triadic Factorization over the triadic relation user-item-domain.
LFL: A latent feature log-linear model for the dyadic prediction task.

Slide 33
Experimental Results (Impact of Parameter)
Restart probability $\alpha$:
Using the MovieLens data set, with $N = m$, they vary $\alpha$ from 0.1 to 1.0 linearly.
When $\alpha$ increases from 0.1 to 0.9, the MAE becomes lower, which means a better recommendation performance.
This result indicates that, in reality, the influence of preference propagation for rating prediction decays quickly within a small number of hops.
When $\alpha = 1.0$, the MAE increases again, indicating worse performance, because with $\alpha = 1.0$ there is no random walk simulation between training users to propagate the preference among them.

Slide 34
Experimental Results (Impact of Parameter)
The number of simulations $N$:
With $\alpha = 0.8$, they vary the number of simulations linearly.
Their method performs well even with a small number of simulation rounds.
More simulation rounds yield only a little improvement in the prediction accuracy.

Slide 35
Experimental Results (Scalability Test)
The experiment is conducted on a Windows server with an Intel Xeon 2.4 GHz CPU and 384 GB of memory.
The algorithms are implemented in Matlab and C++.
For the testing process, they randomly select 2000 users in the test set, and set the split ratio to 5 : 95 to divide the ratings into observed items and held-out items.

Slide 36
Experimental Results (Handling Dynamic Updates)
To simulate the situation where new training users are added to the system, they divide the training set of the ML-1M data into 10 parts.
Parts 1-5 are used as the new training set respectively, and the remainder as the original training set.
The new data ratio, defined as the percentage of new training users, varies from 11.1% to 100%.
Incremental: treat a new training user as a target user and compute its stationary distribution based on the original model to approximate the new model.
Re-computing: combine the original training set and the new training set and re-run the random walk simulation.

Slide 37
Experimental Results (Handling Dynamic Updates)
The MAE increases very little using the incremental computing approach, which demonstrates that the incremental approach can achieve almost the same result as the re-computing approach.
The MAE of the incremental computing approach increases slightly as the new data ratio grows. This is because the incremental approach simply treats the new training users as having no connections with each other, and thus ignores the preference propagation among them.
The prediction performance of the re-computing approach is not affected by the varying ratio of new training data, as it combines the original and new training sets (always equivalent to the 10 parts of training users) for model training.

Slide 38
Conclusion
To overcome the data sparsity issue, they designed a random walk process on the bipartite graph to model the preference propagation among users.
They proposed a Monte Carlo algorithm which can be efficiently applied for rating prediction on any new user.
The nature of Monte Carlo simulation enables a parallel implementation of the algorithm.