BePI: Fast and Memory-Efficient Method for Billion-Scale Random Walk with Restart

Uploaded On 2021-01-27



Presentation Transcript

Slide1

BePI: Fast and Memory-Efficient Method for Billion-Scale Random Walk with Restart

Jinhong Jung, Namyong Park, Lee Sael, U Kang

May 17

Slide2

Outline

Introduction
Proposed Method
Experiment
Conclusion

Slide3

Introduction: Random Walk with Restart

Measures the relevance between nodes in a graph
Used in many data mining applications based on graphs
  Recommendation: friends, movies, documents
  Anomaly detection: spammers, trolls, frauds
  Question answering systems
  Subgraph matching

Random Walk with Restart is an important tool for graph analysis!

Slide4

Random Walk with Restart (1)

Measures node relevance scores using a random surfer
The surfer starts at query node $s$ on a graph
  Random walk: moves to one of its neighbors with prob. $1 - c$
  Restart: jumps back to query node $s$ with prob. $c$

[Figure labels: Random walk (with prob. $1 - c$); Restart (with prob. $c$)]
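The surfer's behavior can be simulated directly. The sketch below is a minimal Monte Carlo estimate, not the method of the talk: the example graph, the function name `simulate_rwr`, and the choice to restart whenever the surfer reaches a node without out-going edges are all assumptions made for illustration.

```python
import random

def simulate_rwr(out_neighbors, s, c, steps, seed=0):
    """Estimate RWR scores as the fraction of steps the surfer spends at each node."""
    rng = random.Random(seed)
    visits = {u: 0 for u in out_neighbors}
    node = s
    for _ in range(steps):
        visits[node] += 1
        # Restart with prob. c; also restart when stuck at a deadend (assumption).
        if rng.random() < c or not out_neighbors[node]:
            node = s
        else:
            # Random walk: move to a uniformly chosen out-neighbor.
            node = rng.choice(out_neighbors[node])
    return {u: visits[u] / steps for u in visits}

# Hypothetical 5-node directed graph: node -> list of out-neighbors.
graph = {1: [2, 3], 2: [1, 3], 3: [2, 4], 4: [5], 5: [2]}
scores = simulate_rwr(graph, s=2, c=0.2, steps=100_000)
for node in sorted(scores):
    print(node, round(scores[node], 3))
```

The estimates are noisy and converge slowly, which is one reason RWR is normally computed by linear algebra rather than by simulation.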

 

 

 

Slide5

Random Walk with Restart (2)

Computes the stationary probability that the surfer stays at each node

Example: query node 2, restarting probability = 0.2

Node    RWR score (relevance with node 2)
1       0.19
2       0.32
3       0.29
4       0.10
5       0.10

Slide6

Problem Definition – RWR (1)

Given: adjacency matrix $A$, query node $s$, and restart probability $c$
Find: RWR score vector $r$ w.r.t. the query node $s$

Input:
  $\tilde{A}$: row-normalized adjacency matrix
  $q$: query vector ($s$-th unit vector)
  $c$: restart probability
Output:
  $r$: RWR score vector with regard to query node $s$

Slide7

Problem Definition – RWR (2)

Computing RWR is equivalent to solving a linear system
  Given $\tilde{A}$, $q$, and $c$, solve the linear system $Hr = cq$, where $H = I - (1 - c)\tilde{A}^{\top}$, to obtain $r$
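The equivalence follows from rearranging the RWR recursion. A short derivation, assuming the usual formulation in which $\tilde{A}$ is the row-normalized adjacency matrix, $q$ the query vector, $c$ the restart probability, and $r$ the score vector (the name $H$ for the system matrix is a notational choice):

```latex
\begin{align*}
r &= (1-c)\,\tilde{A}^{\top} r + c\,q \\
r - (1-c)\,\tilde{A}^{\top} r &= c\,q \\
\bigl(I - (1-c)\,\tilde{A}^{\top}\bigr)\, r &= c\,q \\
H r &= c\,q, \qquad H = I - (1-c)\,\tilde{A}^{\top}
\end{align*}
```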

 

Slide8

Existing methods for RWR

Iterative Methods
  Iteratively update RWR scores until convergence
    e.g., Power iteration
  No preprocessing phase
  Query phase (repetitive cost)
    Given $q$, repeat the update rule $r \leftarrow (1 - c)\tilde{A}^{\top} r + cq$ until convergence

Preprocessing Methods
  Compute RWR scores directly from precomputed data
    e.g., Inversion
  Preprocessing phase (one time)
    Compute $H^{-1}$
  Query phase (repetitive cost)
    Given $q$, compute $r = cH^{-1}q$
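The two families can be compared on a toy graph. The following is a minimal sketch in pure Python, assuming the system $Hr = cq$ with $H = I - (1-c)\tilde{A}^{\top}$; the graph, the function names, and the use of Gaussian elimination as a stand-in for explicit inversion are illustrative assumptions.

```python
def row_normalize(adj):
    """Divide each row by its sum (all-zero rows are left as zeros)."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

def power_iteration(a_tilde, s, c, tol=1e-10, max_iter=1000):
    """Iterative method: repeat r <- (1-c) * A~^T r + c * q until convergence."""
    n = len(a_tilde)
    q = [1.0 if i == s else 0.0 for i in range(n)]
    r = q[:]
    for _ in range(max_iter):
        new_r = [
            (1 - c) * sum(a_tilde[j][i] * r[j] for j in range(n)) + c * q[i]
            for i in range(n)
        ]
        if sum(abs(x - y) for x, y in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r

def solve_direct(a_tilde, s, c):
    """Direct method: solve H r = c q by Gaussian elimination with partial pivoting."""
    n = len(a_tilde)
    # Augmented matrix [H | c q] with H[i][j] = delta_ij - (1-c) * A~[j][i].
    m = [
        [(1.0 if i == j else 0.0) - (1 - c) * a_tilde[j][i] for j in range(n)]
        + [c if i == s else 0.0]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= f * m[col][k]
    r = [0.0] * n
    for i in range(n - 1, -1, -1):
        r[i] = (m[i][n] - sum(m[i][k] * r[k] for k in range(i + 1, n))) / m[i][i]
    return r

# Hypothetical 5-node graph without deadends (0-based ids), query node 1.
adj = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]
a_tilde = row_normalize(adj)
r_iter = power_iteration(a_tilde, s=1, c=0.2)
r_direct = solve_direct(a_tilde, s=1, c=0.2)
```

Both routines return the same scores up to the tolerance. The difference is in cost: the iterative routine repeats all its iterations for every query, whereas a real preprocessing method would factor or invert $H$ once and reuse it.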

 

Slide9

Challenges

Q. How can we compute RWR scores quickly on very large graphs?

Iterative methods
  Pros: scale to very large graphs
    Do not need preprocessed data
  Cons: slow RWR computation speed
    The whole set of iterations needs to be repeated for each query node

Preprocessing methods
  Pros: fast RWR computation speed
    Directly compute the scores from precomputed results
  Cons: cannot handle very large graphs
    Heavy computation cost and memory consumption due to matrix inversion

Challenge: How to devise a fast and scalable algorithm for computing RWR scores on very large graphs?

Slide10

Outline

Introduction
Proposed Method
Experiment
Conclusion

Slide11

Proposed Method

BePI (Best of Preprocessing and Iterative approaches)
  A fast and scalable method that takes advantage of both preprocessing and iterative approaches

Key Ideas
  Idea 1) Exploit graph characteristics to adopt a preprocessing approach for fast query speed
  Idea 2) Incorporate an iterative method into the preprocessing approach to increase the scalability
  Idea 3) Optimize the performance of the iterative method to accelerate RWR computation speed

Slide12

Proposed Method

BePI (Best of Preprocessing and Iterative approaches)
  A fast and scalable method that takes advantage of both preprocessing and iterative approaches

Key Ideas
  Idea 1) Exploit graph characteristics to adopt a preprocessing approach for fast query speed
  Idea 2) Incorporate an iterative method into the preprocessing approach to increase the scalability
  Idea 3) Optimize the performance of the iterative method to accelerate the RWR computation speed

Slide13

Proposed Method – Idea 1

Exploit graph characteristics to adopt a preprocessing approach for fast query speed
  Reorder node ids to permute $H$ based on deadend and hub-and-spoke structures
  Apply block elimination as a preprocessing approach

Deadend
  A deadend is a node having no out-going edges
    e.g., a file or image in web-document networks
  Deadends get high ids
  Non-deadends get low ids

[Figure: original matrix vs. reordered matrix (axes: source, destination), with non-deadend and deadend blocks]
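The deadend rule above can be sketched in a few lines. This is an illustrative version only: the edge list and the function name `reorder_deadends` are hypothetical, and nodes keep their relative order within each group, which the slide does not specify.

```python
def reorder_deadends(edges, n):
    """Return (perm, count) where perm[old_id] = new_id.

    Non-deadends get the low ids and deadends (nodes with no out-going
    edges) get the high ids; relative order within each group is kept.
    """
    has_out = [False] * n
    for src, _dst in edges:
        has_out[src] = True
    non_deadends = [u for u in range(n) if has_out[u]]
    deadends = [u for u in range(n) if not has_out[u]]
    perm = [0] * n
    for new_id, old_id in enumerate(non_deadends + deadends):
        perm[old_id] = new_id
    return perm, len(non_deadends)

# Hypothetical directed graph (source, destination); nodes 1 and 4 are deadends.
edges = [(0, 1), (0, 2), (2, 3), (3, 0), (3, 4)]
perm, n_non_deadends = reorder_deadends(edges, n=5)
new_edges = [(perm[u], perm[v]) for u, v in edges]
print(perm)       # [0, 3, 1, 2, 4]
print(new_edges)  # every source id is now below n_non_deadends
```

After relabeling, every edge starts at a low id, so all non-zero rows of the adjacency matrix sit in the top block and the deadend rows at the bottom are empty.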

Slide14

Proposed Method – Idea 1

Reorder node ids to permute $H$ based on deadend and hub-and-spoke structures
  The entries of $H_{nn}$ are concentrated by reordering nodes based on hub-and-spoke structure [Kang et al., '11]

Hub-and-spoke
  Hubs are high degree nodes, spokes are low degree nodes
  Few hubs, and a majority of spokes, in real-world graphs
  Hubs get high ids
  Spokes get low ids

[Figure: original matrix vs. reordered matrix, with spoke and hub blocks]

Slide15

Proposed Method – Idea 1

Combine deadend and hub & spoke reordering

[Figure: deadend reordering, followed by hub & spoke reordering on the non-deadend part]
$H_{11}$ is a block diagonal matrix!

Slide16

Proposed Method – Idea 1

Apply block elimination as a preprocessing approach
  Block elimination (details: see Lemma 1)
  $S = H_{22} - H_{21}H_{11}^{-1}H_{12}$, the Schur complement of $H_{11}$

Precompute the blue-colored matrices to make RWR computation fast!
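To make the role of the Schur complement concrete, here is block elimination on a generic $2 \times 2$ block system. The right-hand sides $b_1$, $b_2$ are placeholders, the block names $H_{11}, H_{12}, H_{21}, H_{22}$ are assumed notation for the reordered matrix, and the deadend rows are left out for brevity, so this is a sketch of the technique rather than the exact statement of Lemma 1.

```latex
\begin{align*}
\begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix}
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
&= \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} \\
S &= H_{22} - H_{21} H_{11}^{-1} H_{12} \\
S\, r_2 &= b_2 - H_{21} H_{11}^{-1} b_1 \\
r_1 &= H_{11}^{-1} \bigl( b_1 - H_{12}\, r_2 \bigr)
\end{align*}
```

The first block row gives $r_1$ in terms of $r_2$; substituting it into the second block row leaves a smaller system in $r_2$ alone, whose matrix is $S$.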

Slide17

Proposed Method

BePI (Best of Preprocessing and Iterative approaches)
  A fast and scalable method that takes advantage of both preprocessing and iterative methods

Key Ideas
  Idea 1) Exploit graph characteristics to adopt a preprocessing approach
  Idea 2) Incorporate an iterative method into the preprocessing approach to increase the scalability
  Idea 3) Optimize the performance of the iterative method to accelerate RWR computation speed

Slide18

Proposed Method – Idea 2

Incorporate an iterative method into the preprocessing approach to increase the scalability
  Computing $H_{11}^{-1}$ is trivial since $H_{11}$ is block diagonal
  But inverting $S$ is impractical in very large graphs
    # of hubs > 1 million in large graphs
      e.g., 10 million hubs in the Twitter network

Slide19

Proposed Method – Idea 2

Incorporate an iterative method into the preprocessing approach
  Solution. Solve the linear system on $S$ using an iterative linear solver [Saad et al., '86]
    Linear solvers obtain the solution without inverting $S$

Introducing the linear solver increases the scalability of RWR computation!
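The point that a linear system can be solved without forming an inverse can be shown with any iterative solver. The sketch below deliberately swaps in the simple Jacobi iteration rather than the solver cited on the slide, and the 3 x 3 matrix is a hypothetical stand-in for the Schur complement system; Jacobi is only guaranteed to converge here because the matrix is strictly diagonally dominant.

```python
def jacobi_solve(a, b, tol=1e-10, max_iter=10_000):
    """Solve a x = b iteratively, using only the entries of a (no inverse).

    Assumes a is strictly diagonally dominant so that Jacobi converges.
    Returns the solution and the number of iterations used.
    """
    n = len(a)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        new_x = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
        if max(abs(p - q) for p, q in zip(new_x, x)) < tol:
            return new_x, it
        x = new_x
    return x, max_iter

# Hypothetical small system standing in for the system on S.
s_mat = [[1.0, -0.3, 0.0], [-0.2, 1.0, -0.4], [0.0, -0.5, 1.0]]
b = [0.2, 0.0, 0.1]
r2, iterations = jacobi_solve(s_mat, b)
residual = max(
    abs(sum(s_mat[i][j] * r2[j] for j in range(3)) - b[i]) for i in range(3)
)
print(iterations, residual)
```

Each iteration touches only the stored non-zeros, so memory stays proportional to the size of the sparse matrix, whereas an explicit inverse is generally dense.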

Slide20

Proposed Method

BePI (Best of Preprocessing and Iterative approaches)
  A fast and scalable method that takes advantage of both preprocessing and iterative methods

Key Ideas
  Idea 1) Exploit graph characteristics to adopt a preprocessing approach
  Idea 2) Incorporate an iterative method into the preprocessing approach to increase the scalability
  Idea 3) Optimize the performance of the iterative method to accelerate RWR computation speed

Slide21

Proposed Method – Idea 3

Optimize the performance of the iterative method to accelerate RWR computation speed
  The running time of linear solvers is $O(T|S|)$
    $|S|$: number of non-zeros of $S$
    $T$: number of iterations
  Optimization 1) How to decrease $|S|$?
    Control the hub selection ratio in the hub & spoke method
  Optimization 2) How to decrease $T$?
    Exploit a preconditioner

Slide22

Proposed Method – Idea 3 – Optimization 1

Hub-and-spoke reordering method

Slide23

Hub-and-spoke reordering [Kang et al., '11]

For each iteration, select hubs with a hub selection ratio $k$
Disconnect the hubs and assign node ids for hubs & spokes
Repeat the above in the GCC (Giant Connected Component)

[Figure: example run selecting hubs in each iteration; legend: selected hub, spoke, node in GCC]
According to the hub selection ratio $k$, the # of hubs changes
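The three steps above can be sketched as follows. This is a simplified illustration, not the exact procedure of [Kang et al., '11]: the graph is treated as undirected, the number of hubs per iteration is assumed to be `ceil(k * n)`, ties are broken by node id, and the function names are hypothetical.

```python
import math

def connected_components(adj, nodes):
    """Connected components of the subgraph induced by `nodes` (a set)."""
    seen, comps = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps

def hub_and_spoke_order(adj, k):
    """adj: dict node -> set of neighbors. Returns dict old_id -> new_id."""
    n = len(adj)
    num_hubs = max(1, math.ceil(k * n))  # hubs removed per iteration (assumption)
    remaining = set(adj)
    new_id = {}
    low, high = 0, n - 1
    while remaining:
        if len(remaining) <= num_hubs:
            # Too small to split further: hand out the ids that are left.
            for u in sorted(remaining):
                new_id[u] = low
                low += 1
            break
        degree = {u: len(adj[u] & remaining) for u in remaining}
        hubs = sorted(remaining, key=lambda u: (-degree[u], u))[:num_hubs]
        for u in hubs:  # hubs get high ids
            new_id[u] = high
            high -= 1
        remaining -= set(hubs)  # disconnect the hubs
        components = connected_components(adj, remaining)
        gcc = max(components, key=len)
        for comp in components:
            if comp is not gcc:
                for u in sorted(comp):  # spokes get low ids
                    new_id[u] = low
                    low += 1
        remaining = gcc  # repeat inside the giant connected component
    return new_id

# Hypothetical graph: node 0 is a hub; 4-5-6 forms a tail.
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0, 5}, 5: {4, 6}, 6: {5}}
order = hub_and_spoke_order(adj, k=0.1)
print(order)
```

A larger `k` removes more hubs per round, so the process stops after fewer rounds and more nodes end up labeled as hubs.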

Slide24

Proposed Method – Idea 3 – Optimization 1

Hub-and-spoke reordering method

Slide25

Proposed Method – Idea 3

Optimization 1) Reduce the number of non-zeros of $S$
  According to the hub selection ratio $k$, the # of hubs is different
    # of non-zeros of the sub-matrices in $H$ changes
    # of non-zeros of $S$ changes ($S = H_{22} - H_{21}H_{11}^{-1}H_{12}$)
  If $k$ increases, then the # of hubs increases
    $|H_{22}|$ increases
    $|H_{21}H_{11}^{-1}H_{12}|$ decreases a lot!
    Thus, $|S|$ decreases!

Slide26

Proposed Method – Idea 3

Optimization 1) Reduce the number of non-zeros of $S$
  Pick a hub selection ratio $k$ that minimizes $|S|$
    Efficiency of the iterative method on $S$ is improved!
      $O(T|S|)$, where $T$ is the # of iterations
    Space efficiency for $S$ is also improved!
    No loss of accuracy!
  A suitably chosen $k$ provides good performance in large-scale graphs

Slide27

Proposed Method – Idea 3

Optimization 2) Exploit a preconditioner for the linear system on $S$
  Make the iterative method converge faster
  Exploit incomplete LU decomposition as the preconditioner
    Fast decomposition, and the sparsity pattern of $S$ is preserved
  Implicit preconditioned system
    Preconditioned iterative solvers [Saad '93] solve the implicit preconditioned system without matrix inversion
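The effect of an incomplete LU preconditioner can be seen on a toy system. The sketch below is only an illustration under stated assumptions: it uses ILU(0) (zero fill-in), it uses the plain Richardson iteration as a stand-in for the preconditioned solvers cited on the slide, and the 4 x 4 matrix is hypothetical rather than a real Schur complement.

```python
def ilu0(a):
    """Incomplete LU with zero fill-in.

    Only positions where `a` is non-zero are updated, so the factors keep
    the sparsity pattern of `a`. Returns L (unit diagonal, strictly lower
    part) and U (upper part) packed into one matrix.
    """
    n = len(a)
    lu = [row[:] for row in a]
    for i in range(1, n):
        for k in range(i):
            if a[i][k] == 0.0:
                continue
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                if a[i][j] != 0.0:
                    lu[i][j] -= lu[i][k] * lu[k][j]
    return lu

def apply_ilu(lu, r):
    """Return (L U)^{-1} r using forward and backward substitution only."""
    n = len(lu)
    y = r[:]
    for i in range(n):
        y[i] -= sum(lu[i][j] * y[j] for j in range(i))
    for i in range(n - 1, -1, -1):
        y[i] = (y[i] - sum(lu[i][j] * y[j] for j in range(i + 1, n))) / lu[i][i]
    return y

def iterate(a, b, precond=None, tol=1e-10, max_iter=10_000):
    """Richardson iteration x <- x + M^{-1}(b - a x); M = I if precond is None."""
    n = len(a)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        res = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in res) < tol:
            return x, it - 1
        step = apply_ilu(precond, res) if precond is not None else res
        x = [xi + si for xi, si in zip(x, step)]
    return x, max_iter

# Hypothetical sparse, diagonally dominant system.
a = [
    [1.0, -0.4, 0.0, -0.4],
    [-0.4, 1.0, -0.4, 0.0],
    [0.0, -0.4, 1.0, -0.4],
    [-0.4, 0.0, -0.4, 1.0],
]
b = [0.2, 0.0, 0.0, 0.0]
x_plain, plain_iters = iterate(a, b)
x_pre, pre_iters = iterate(a, b, precond=ilu0(a))
print(plain_iters, pre_iters)
```

The preconditioner is applied through two triangular solves, so no matrix is ever inverted explicitly, and the preconditioned run needs noticeably fewer iterations on this example.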

 

Slide28

Outline

Introduction
Proposed Method
Experiment
Conclusion

Slide29

Experimental Questions

Q1. (Space) How much memory space does BePI require for its preprocessed results?
Q2. (Prep. Time) How long does the preprocessing phase of BePI take?
Q3. (Query Time) How quickly does BePI respond to an RWR query?
Q4. (Scalability) How well does BePI scale up?

Slide30

Experimental Settings

Machine: single workstation with 512GB memory
Datasets: large-scale real-world graph data
  Various domains of graphs: social, web, vote, …
  500K to 2B edges in graphs

[Table: dataset statistics, listing the number of nodes and the number of edges per graph]

Slide31

Q1. Space Efficiency

How much memory space does BePI require for its preprocessed results?

[Figure: memory space of preprocessed results per dataset; BePI is the proposed method]
BePI requires up to 130× less memory space than other preprocessing methods!
Only BePI preprocesses all datasets.

Slide32

Q2. Preprocessing Time

How long does the preprocessing phase of BePI take?

BePI is significantly faster than other methods in terms of preprocessing time!

Slide33

Q3. Query Time

How quickly does BePI respond to an RWR query?

BePI is up to 9× faster than other competitors in terms of query speed!

Slide34

Q4. Scalability of BePI

How well does BePI scale up?
  Processes larger graphs than other preprocessing methods
  Shows the fastest RWR computation speed among the compared methods
  Provides near linear scalability in terms of time and memory usage

BePI shows the best performance in terms of scalability and running time!

Slide35

Outline

Introduction
Proposed Method
Experiment
Conclusion

Slide36

Conclusion

BePI (Best of Preprocessing and Iterative approaches)
  Idea 1) Exploit graph characteristics for a prep. method
  Idea 2) Incorporate an iterative method into the prep. method
  Idea 3) Optimize the performance of the iterative method

Main Results
  Fast and scalable computation for RWR in large-scale graphs
  Requires 130× less memory space & processes larger graphs than other preprocessing methods
  Computes RWR scores faster than other existing methods

Slide37

Thank you!

Codes & datasets: http://datalab.snu.ac.kr/bepi