Slide1
BePI: Fast and Memory-Efficient Method for Billion-Scale Random Walk with Restart
Jinhong Jung, Namyong Park, Lee Sael, U Kang
May 17
Slide2
Outline
Introduction
Proposed Method
Experiment
Conclusion
Slide3
Introduction – Random Walk with Restart
Random Walk with Restart (RWR) measures the relevance between nodes in a graph
Used in many graph-based data mining applications:
Recommendation: friends, movies, documents
Anomaly detection: spammers, trolls, fraud
Question answering systems
Subgraph matching
Random Walk with Restart is an important tool for graph analysis!
Slide4
Random Walk with Restart (1)
Measures node relevance scores using a random surfer
The surfer starts at query node $s$ on a graph and takes one of two actions at each step:
Random walk: moves to one of the current node's neighbors with probability $1 - c$
Restart: jumps back to query node $s$ with probability $c$
Slide5
Random Walk with Restart (2)
Computes the stationary probability that the surfer stays at each node
Example (query node: 2, restarting probability = 0.2):
Node | RWR score (relevance with node 2)
1 | 0.19
2 | 0.32
3 | 0.29
4 | 0.10
5 | 0.10
[Figure: example graph with nodes 1 to 5, each labeled with its RWR score]
Slide6
Problem Definition – RWR (1)
Given: adjacency matrix $A$, query node $s$, and restart probability $c$
Find: RWR score vector $\mathbf{r}$ w.r.t. the query node $s$
Input:
$\tilde{A}$: row-normalized adjacency matrix
$\mathbf{q}$: query vector ($s$-th unit vector)
$c$: restart probability
Output:
$\mathbf{r}$: RWR score vector with regard to query node $s$
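Slide7
Problem Definition – RWR (2)
Computing RWR is equivalent to solving a linear system
The RWR score vector satisfies $\mathbf{r} = (1 - c)\tilde{A}^{\top}\mathbf{r} + c\mathbf{q}$, which rearranges to $H\mathbf{r} = c\mathbf{q}$ with $H = I - (1 - c)\tilde{A}^{\top}$
Given the row-normalized adjacency matrix $\tilde{A}$, the query vector $\mathbf{q}$, and the restart probability $c$, solve the linear system $H\mathbf{r} = c\mathbf{q}$ to obtain $\mathbf{r}$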
Slide8
Existing methods for RWR
Iterative methods
Iteratively update RWR scores until convergence, e.g., power iteration
No preprocessing phase
Query phase (repetitive cost): given $\mathbf{q}$, repeat the update rule $\mathbf{r} \leftarrow (1 - c)\tilde{A}^{\top}\mathbf{r} + c\mathbf{q}$ until convergence
Preprocessing methods
Compute RWR scores directly from precomputed data, e.g., inversion
Preprocessing phase (one time): compute $H^{-1}$, where $H = I - (1 - c)\tilde{A}^{\top}$
Query phase (repetitive cost): given $\mathbf{q}$, compute $\mathbf{r} = cH^{-1}\mathbf{q}$
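The two families of methods can be contrasted on a toy graph. The following is a minimal sketch in plain Python, assuming a dense adjacency list-of-lists and a restart probability named `c`; the function names are hypothetical, and the direct solver uses Gaussian elimination as a small-scale stand-in for the matrix inversion mentioned on the slide.

```python
def row_normalize(adj):
    """Divide each row by its sum; all-zero rows (deadends) stay zero."""
    out = []
    for row in adj:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def rwr_power_iteration(adj, seed, c=0.2, tol=1e-12, max_iter=10_000):
    """Iterate r <- (1 - c) * A~^T r + c * q until the L1 change drops below tol."""
    n = len(adj)
    a = row_normalize(adj)
    q = [1.0 if i == seed else 0.0 for i in range(n)]
    r = q[:]
    for _ in range(max_iter):
        new_r = [
            (1 - c) * sum(a[j][i] * r[j] for j in range(n)) + c * q[i]
            for i in range(n)
        ]
        if sum(abs(x - y) for x, y in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r


def rwr_direct(adj, seed, c=0.2):
    """Solve (I - (1 - c) * A~^T) r = c * q by Gaussian elimination with pivoting."""
    n = len(adj)
    a = row_normalize(adj)
    # Augmented matrix [H | c * q], with H[i][j] = delta_ij - (1 - c) * A~[j][i].
    m = [
        [(1.0 if i == j else 0.0) - (1 - c) * a[j][i] for j in range(n)]
        + [c if i == seed else 0.0]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= f * m[col][k]
    # Back substitution on the upper-triangular system.
    r = [0.0] * n
    for i in range(n - 1, -1, -1):
        r[i] = (m[i][n] - sum(m[i][k] * r[k] for k in range(i + 1, n))) / m[i][i]
    return r
```

Both functions should return the same score vector up to the iteration tolerance. The iterative version repeats the whole loop for every query node, while a preprocessing method pays the factorization or inversion cost once and reuses it across queries.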
Slide9
Challenges
Q. How can we compute RWR scores quickly on very large graphs?
Iterative methods
Pros: scale to very large graphs, since they do not need preprocessed data
Cons: slow RWR computation, since all iterations must be repeated for each query node
Preprocessing methods
Pros: fast RWR computation, since scores are computed directly from precomputed results
Cons: cannot handle very large graphs, due to the heavy computation cost and memory consumption of matrix inversion
Challenge: How to devise a fast and scalable algorithm for computing RWR scores on very large graphs?
Slide11
Proposed Method
BePI (Best of Preprocessing and Iterative approaches)
A fast and scalable method that takes advantage of both preprocessing and iterative approaches
Key Ideas
Idea 1) Exploit graph characteristics to adopt a preprocessing approach for fast query speed
Idea 2) Incorporate an iterative method into the preprocessing approach to increase scalability
Idea 3) Optimize the performance of the iterative method to accelerate RWR computation
Slide13
Proposed Method – Idea 1
Exploit graph characteristics to adopt a preprocessing approach for fast query speed
Reorder node ids to permute the matrix $H$ of the RWR linear system based on deadend and hub-and-spoke structures
Apply block elimination as a preprocessing approach
Deadend
A deadend is a node having no out-going edges, e.g., a file or an image in web-document networks
Non-deadends get low ids
Deadends get high ids
[Figure: original matrix vs. reordered matrix (axes: source, destination); after reordering, nodes are grouped into non-deadends followed by deadends]
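The deadend reordering above can be sketched as follows. This is a minimal illustration, assuming the graph is given as a list of directed `(source, destination)` pairs over node ids `0..n-1`; the function names `reorder_deadends` and `relabel` are hypothetical.

```python
def reorder_deadends(edges, n):
    """Return (perm, m) where perm[old_id] = new_id and m is the number of non-deadends.

    Non-deadends keep their relative order and receive ids 0..m-1;
    deadends (nodes with no out-going edges) receive ids m..n-1.
    """
    has_out = [False] * n
    for src, _dst in edges:
        has_out[src] = True
    non_deadends = [v for v in range(n) if has_out[v]]
    deadends = [v for v in range(n) if not has_out[v]]
    perm = [0] * n
    for new_id, old_id in enumerate(non_deadends + deadends):
        perm[old_id] = new_id
    return perm, len(non_deadends)


def relabel(edges, perm):
    """Apply the permutation to both endpoints of every edge."""
    return [(perm[s], perm[d]) for s, d in edges]
```

After relabeling, every edge has a source id below `m`, so the rows of the adjacency matrix that belong to deadends are all zero and sit together at the bottom.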
Slide14
Proposed Method – Idea 1
Reorder node ids to permute the matrix $H$ based on deadend and hub-and-spoke structures
The entries of $H$ are concentrated by reordering nodes based on the hub-and-spoke structure [Kang et al., `11]
Hub-and-spoke
Hubs are high-degree nodes; spokes are low-degree nodes
Real-world graphs have few hubs and a majority of spokes
Spokes get low ids
Hubs get high ids
[Figure: original matrix vs. reordered matrix, with nodes grouped into spokes followed by hubs]
Slide15
Proposed Method – Idea 1
Combine deadend and hub-and-spoke reordering
Apply deadend reordering first, then hub-and-spoke reordering on the non-deadend part of the matrix
[Figure: the matrix after deadend reordering and after hub-and-spoke reordering]
$H_{11}$, the spoke-to-spoke block, is a block diagonal matrix!
Slide16
Proposed Method – Idea 1
Apply block elimination as a preprocessing approach
Block elimination solves the reordered linear system using $S = H_{22} - H_{21}H_{11}^{-1}H_{12}$, the Schur complement of $H_{11}$
Precompute the blue-colored matrices to make RWR computation fast!
Details: see Lemma 1
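The block elimination step can be written out as a short derivation. This is a sketch under stated assumptions: the nodes are ordered as spokes (block 1), hubs (block 2), and deadends (block 3), and the block names $r_i$ and $q_i$ for the partitioned score and query vectors are notation introduced here. The third block column is $[0; 0; I]$ because deadends have no out-going edges, so their columns in $\tilde{A}^{\top}$ are zero.

```latex
\begin{bmatrix} H_{11} & H_{12} & 0 \\ H_{21} & H_{22} & 0 \\ H_{31} & H_{32} & I \end{bmatrix}
\begin{bmatrix} r_1 \\ r_2 \\ r_3 \end{bmatrix}
= c \begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix}

% Row 1: express r_1 in terms of r_2
r_1 = H_{11}^{-1}\left(c\,q_1 - H_{12}\,r_2\right)

% Row 2: substitute r_1, which leaves a system on the Schur complement
S\,r_2 = c\,q_2 - H_{21}H_{11}^{-1}\left(c\,q_1\right),
\qquad S = H_{22} - H_{21}H_{11}^{-1}H_{12}

% Row 3: deadend scores follow directly
r_3 = c\,q_3 - H_{31}\,r_1 - H_{32}\,r_2
```

Only the system on $S$ requires real work at query time: $H_{11}^{-1}$ is cheap because $H_{11}$ is block diagonal, and $r_3$ needs only matrix-vector products.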
Slide18
Proposed Method – Idea 2
Incorporate an iterative method into the preprocessing approach to increase scalability
Computing $H_{11}^{-1}$ is trivial since $H_{11}$ is block diagonal
But inverting the Schur complement $S$ is impractical in very large graphs
The number of hubs, which is the dimension of $S$, exceeds 1 million in large graphs, e.g., 10 million hubs in the Twitter network
Slide19
Proposed Method – Idea 2
Incorporate an iterative method into the preprocessing approach
Solution: solve the linear system on the Schur complement $S$ using an iterative linear solver [Saad et al., `86]
Linear solvers obtain the solution without inverting $S$
Introducing the linear solver increases the scalability of RWR computation!
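To illustrate what "obtaining the solution without inverting the matrix" means, here is a minimal sketch of an iterative linear solver in plain Python. It uses Jacobi iteration as a simple stand-in, not the solver cited on the slide, and it assumes a small dense, strictly diagonally dominant matrix; the name `jacobi_solve` is hypothetical.

```python
def jacobi_solve(a, b, tol=1e-12, max_iter=10_000):
    """Solve a x = b by Jacobi iteration without ever forming the inverse of a.

    Assumes a is strictly diagonally dominant, which guarantees convergence.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        # Each sweep needs only the entries of a, i.e. one matrix-vector pass.
        new_x = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
        if max(abs(u - v) for u, v in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    raise RuntimeError("Jacobi iteration did not converge")
```

Each sweep touches every non-zero of the matrix once, so memory stays proportional to the number of non-zeros, whereas an explicit inverse is generally dense.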
Slide21
Proposed Method – Idea 3
Optimize the performance of the iterative method to accelerate RWR computation
The running time of linear solvers is $O(T|S|)$
$|S|$: number of non-zeros of $S$
$T$: number of iterations
Optimization 1) How to decrease $|S|$? Control the hub selection ratio in the hub-and-spoke method
Optimization 2) How to decrease $T$? Exploit a preconditioner
Slide22
Proposed Method – Idea 3, Optimization 1
Hub-and-spoke reordering method
Slide23
Hub-and-spoke reordering [Kang et al., `11]
In each iteration, select hubs according to a hub selection ratio $k$
Disconnect the hubs and assign node ids to the hubs and spokes
Repeat the above on the GCC (giant connected component)
[Figure: example of hubs being selected in each iteration; legend: selected hub, spoke, node in GCC]
The number of hubs changes with the hub selection ratio $k$
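The reordering loop can be sketched as follows. This is a simplified illustration rather than the method of [Kang et al., `11]: it assumes an undirected view of the graph, selects `ceil(k * n)` highest-degree nodes per iteration (an assumed reading of the hub selection ratio), and gives any nodes left in the final small GCC the remaining middle ids. The function names are hypothetical.

```python
import math
from collections import defaultdict


def connected_components(nodes, nbrs):
    """Return the connected components (as sets) of the subgraph induced by nodes."""
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            for w in nbrs[u]:
                if w in nodes and w not in seen:
                    seen.add(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def hub_spoke_order(edges, n, k):
    """Return perm with perm[old_id] = new_id; spokes get low ids, hubs get high ids."""
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    remaining = set(range(n))
    perm = [None] * n
    low, high = 0, n - 1
    hubs_per_iter = max(1, math.ceil(k * n))
    while len(remaining) > hubs_per_iter:
        # Degrees are counted inside the current remaining subgraph only.
        by_degree = sorted(remaining, key=lambda v: (-len(nbrs[v] & remaining), v))
        for hub in by_degree[:hubs_per_iter]:
            perm[hub] = high
            high -= 1
            remaining.discard(hub)
        # Everything outside the largest component becomes a spoke.
        components = connected_components(remaining, nbrs)
        components.sort(key=lambda comp: (-len(comp), min(comp)))
        for comp in components[1:]:
            for v in sorted(comp):
                perm[v] = low
                low += 1
            remaining -= comp
    for v in sorted(remaining):
        perm[v] = low
        low += 1
    return perm
```

A larger `k` removes more hubs per iteration, which breaks the graph into spoke components faster and yields more hubs overall.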
Slide25
Proposed Method – Idea 3
Optimization 1) Reduce the number of non-zeros of $S$
The number of hubs depends on the hub selection ratio $k$
The number of non-zeros of the sub-matrices in $S$ changes, so the number of non-zeros of $S$ changes ($S = H_{22} - H_{21}H_{11}^{-1}H_{12}$)
If $k$ increases, then the number of hubs increases:
the number of non-zeros of $H_{22}$ increases
the number of non-zeros of $H_{21}H_{11}^{-1}H_{12}$ decreases a lot!
Thus, the number of non-zeros of $S$ decreases!
Slide26
Proposed Method – Idea 3
Optimization 1) Reduce the number of non-zeros of $S$
Pick a hub selection ratio $k$ that minimizes $|S|$, the number of non-zeros of $S$
The efficiency of the iterative method on $S$ is improved, since its running time is $O(T|S|)$, where $T$ is the number of iterations
Space efficiency for $S$ is also improved!
No loss of accuracy!
The selected ratio provides good performance on large-scale graphs
Slide27
Proposed Method – Idea 3
Optimization 2) Exploit a preconditioner for the linear system on $S$
Make the iterative method converge faster
Exploit an incomplete LU decomposition, $S \approx \tilde{L}\tilde{U}$, as the preconditioner
The decomposition is fast, and the sparsity pattern of $S$ is preserved
Implicit preconditioned system: both sides of the linear system on $S$ are multiplied by $\tilde{U}^{-1}\tilde{L}^{-1}$
Preconditioned iterative solvers [Saad `93] solve the implicit preconditioned system without matrix inversion
Slide29
Experimental Questions
Q1. (Space) How much memory space does BePI require for its preprocessed results?
Q2. (Prep. Time) How long does the preprocessing phase of BePI take?
Q3. (Query Time) How quickly does BePI respond to an RWR query?
Q4. (Scalability) How well does BePI scale up?
Slide30
Experimental Settings
Machine: single workstation with 512GB memory
Datasets: large-scale real-world graph data
[Table: datasets, with the number of nodes and the number of edges of each graph]
Various domains of graphs: social, web, vote, …
500K to 2B edges in graphs
Slide31
Q1. Space Efficiency
How much memory space does BePI require for its preprocessed results?
[Figure: memory space comparison; the legend marks the proposed method]
BePI requires up to 130× less memory space than other preprocessing methods!
Only BePI preprocesses all datasets.
Slide32
Q2. Preprocessing Time
How long does the preprocessing phase of BePI take?
BePI is significantly faster than other methods in terms of preprocessing time!
Slide33
Q3. Query Time
How quickly does BePI respond to an RWR query?
BePI is up to 9× faster than its competitors in terms of query speed!
Slide34
Q4. Scalability of BePI
How well does BePI scale up?
Processes larger graphs than other preprocessing methods
Shows the fastest RWR computation speed among the compared methods
Provides near-linear scalability in terms of time and memory usage
BePI shows the best performance in terms of scalability and running time!
Slide36
Conclusion
BePI (Best of Preprocessing and Iterative approaches)
Idea 1) Exploit graph characteristics for a preprocessing method
Idea 2) Incorporate an iterative method into the preprocessing method
Idea 3) Optimize the performance of the iterative method
Main Results
Fast and scalable computation for RWR in large-scale graphs
Requires 130× less memory space and processes larger graphs than other preprocessing methods
Computes RWR scores faster than other existing methods
Slide37
Thank you!
Codes & datasets: http://datalab.snu.ac.kr/bepi