Uploaded On 2017-01-13



Presentation Transcript

Slide1

X-Stream: Edge-Centric Graph Processing using Streaming Partitions

Amitabha Roy, Ivo Mihailovic, Willy Zwaenepoel

Slide2

Graphs

Interesting information is encoded as graphs

[Figure: graphs + algorithms such as HyperANF, Pagerank, ALS, …]

Slide3

Big Graphs

Large graphs are a subset of the big data problem

Billions of vertices and edges, hundreds of gigabytes

Normally tackled on large clusters: Pregel, Giraph, Graphlab …

Complexity, power consumption …

Can we do large graphs on a single machine?

Slide4

X-Stream

Process large graphs on a single machine

1U server = 64 GB RAM + 2 x 200 GB SSD + 3 x 3 TB drive

Slide5

Approach

Problem: graph traversal = random access

Random access is inefficient for storage:

Disk (500X slower)
SSD (20X slower)
RAM (2X slower)

Solution: X-Stream makes graph accesses sequential

Slide6

Contributions

Edge-centric scatter gather model

Streaming partitions

Slide7

Standard Scatter Gather

Edge-centric scatter gather is based on standard scatter gather

Popular graph processing model: Pregel [Google, SIGMOD 2010] … Powergraph [OSDI 2012]

Slide8

Standard Scatter Gather

State stored in vertices

Vertex operations:

Scatter updates along outgoing edges
Gather updates from incoming edges

[Figure: a vertex V scattering along its outgoing edges; a vertex V gathering from its incoming edges]

Slide9

Standard Scatter Gather

[Figure: BFS on an example graph with vertices 1 to 8]

Slide10

Vertex-Centric Scatter Gather

Iterates over vertices

Scatter:

for each vertex v
    if v has update
        for each edge e from v
            scatter update along e

Standard scatter gather is vertex-centric

Does not work well with storage

Slide11

Vertex-Centric Scatter Gather

[Figure: BFS on the example graph with vertices 1 to 8]

Vertex array V = 1, 2, 3, 4, 5, 6, 7, 8, with a lookup index into the edge list:

SOURCE  DEST
1       3
1       5
2       7
2       4
3       2
3       8
4       3
4       7
4       8
5       6
6       1
8       5
8       6

Slide12

Transformation

Vertex-Centric Scatter:

for each vertex v
    if v has update
        for each edge e from v
            scatter update along e

Edge-Centric Scatter:

for each edge e
    if e.src has update
        scatter update along e
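The two scatter loops on this slide can be sketched in Python as follows. This is a minimal illustration, not X-Stream's code: the function names are hypothetical, the edge list is the one from the talk's example graph, and "has update" is modelled as a plain set of vertex ids.

```python
from collections import defaultdict

# Edge list of the example graph from the slides, as (source, destination).
EDGES = [(1, 3), (1, 5), (2, 7), (2, 4), (3, 2), (3, 8), (4, 3),
         (4, 7), (4, 8), (5, 6), (6, 1), (8, 5), (8, 6)]


def vertex_centric_scatter(edges, has_update):
    """Loop over vertices; needs a lookup index from vertex to its edges."""
    index = defaultdict(list)  # per-vertex lookups are random accesses
    for src, dst in edges:
        index[src].append(dst)
    updates = []
    for v in sorted(has_update):
        for dst in index[v]:
            updates.append((v, dst))
    return updates


def edge_centric_scatter(edges, has_update):
    """Loop over edges in stored order; no index, clustering, or sorting."""
    return [(src, dst) for src, dst in edges if src in has_update]
```

Both functions emit the same set of updates; the edge-centric version touches every edge once, in whatever order the list is stored, which is what allows the edge list to be read sequentially.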

Slide13

Edge-Centric Scatter Gather

[Figure: BFS on the example graph with vertices 1 to 8]

Vertex array V = 1, 2, 3, 4, 5, 6, 7, 8 and the edge list:

SOURCE  DEST
1       3
1       5
2       7
2       4
3       2
3       8
4       3
4       7
4       8
5       6
6       1
8       5
8       6

Slide14

SOURCE  DEST   =   SOURCE  DEST
1       3          1       3
1       5          8       6
2       7          5       6
2       4          2       4
3       2          3       2
3       8          4       7
4       3          4       3
4       7          3       8
4       8          4       8
5       6          2       7
6       1          6       1
8       5          8       5
8       6          1       5

No index

No clustering

No sorting

Slide15

Tradeoff

Edge-centric Scatter-Gather:

 

Vertex-centric Scatter-Gather:

 

Sequential Access Bandwidth >> Random Access Bandwidth

Few scatter gather iterations for real world graphs

Well connected, variety of datasets covered in the paper

Slide16

Contributions

Edge-centric scatter gather model

Streaming partitions

Slide17

Streaming Partitions

Problem: still have random access to vertex set

V = 1, 2, 3, 4, 5, 6, 7, 8

Solution: partition the graph into streaming partitions

Slide18

Streaming Partitions

A streaming partition is:

A subset of the vertices that fits in RAM
All edges whose source vertex is in that subset

No requirement on quality of the partition
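A minimal Python sketch of this definition, assuming the vertex subsets are chosen up front (here the two subsets used in the talk's example, vertices 1 to 4 and 5 to 8) and that edges arrive as an unordered list of (source, destination) pairs. The name `build_streaming_partitions` is hypothetical and not part of X-Stream's API.

```python
# Edge list of the example graph from the slides, as (source, destination).
EDGES = [(1, 3), (1, 5), (2, 7), (2, 4), (3, 2), (3, 8), (4, 3),
         (4, 7), (4, 8), (5, 6), (6, 1), (8, 5), (8, 6)]


def build_streaming_partitions(edges, vertex_sets):
    """Append each edge to the partition that owns its source vertex.

    One pass over the edges; edges inside a partition stay unsorted.
    """
    owner = {v: i for i, vs in enumerate(vertex_sets) for v in vs}
    partitions = [[] for _ in vertex_sets]
    for src, dst in edges:
        partitions[owner[src]].append((src, dst))
    return partitions


parts = build_streaming_partitions(EDGES, [{1, 2, 3, 4}, {5, 6, 7, 8}])
# parts[0] holds the 9 edges whose source is in {1, 2, 3, 4};
# parts[1] holds the 4 edges whose source is in {5, 6, 7, 8}.
```

Destinations may fall in any partition; only the source vertex decides where an edge is stored, which is why no partition quality is required.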

Slide19

Partitioning the Graph

V1 = 1, 2, 3, 4 (subset of vertices), with its edges:

SOURCE  DEST
1       5
4       7
2       7
4       3
4       8
3       8
2       4
1       3
3       2

V2 = 5, 6, 7, 8 (subset of vertices), with its edges:

SOURCE  DEST
5       6
8       6
8       5
6       1

Slide20

Random Accesses for Free

[Figure: vertex subset V1 = 1, 2, 3, 4 next to the edge list (SOURCE, DEST) of its streaming partition, as on the previous slide]

Slide21

Generalization

[Figure: vertex subset V1 = 1, 2, 3, 4 held in fast storage; the edge list (SOURCE, DEST) of its streaming partition held in slow storage]

Applies to any two level memory hierarchy

Slide22

Generally Applicable

[Figure: pairs of slow and fast storage: Disk and RAM, OR SSD and RAM, OR RAM and CPU Cache]

Slide23

Parallelism

Simple parallelism:

State is stored in vertex
Streaming partitions have disjoint vertices
Can process streaming partitions in parallel

Slide24

Gathering Updates

[Figure: updates X and Y scattered from the edges pass through a shuffler to reach the vertices that gather them]

Shuffler:

Minimize random access for large number of partitions

Multi-round copying akin to merge sort but cheaper
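One way to read "multi-round copying" is as a radix-style shuffle: each round copies the updates into a small, fixed number of output buffers, and the rounds repeat until every update sits in the buffer of its target partition. The sketch below illustrates that idea under this assumption; it is not X-Stream's shuffler, and `multi_round_shuffle`, `fanout`, and `partition_of` are hypothetical names.

```python
def multi_round_shuffle(updates, num_partitions, fanout, partition_of):
    """Route updates to partitions using rounds of at most `fanout` buckets.

    Each round only appends to `fanout` buffers at a time, so writes stay
    close to sequential even when num_partitions is large.
    Assumes fanout >= 2.
    """
    # Smallest number of rounds with fanout ** rounds >= num_partitions.
    rounds = 1
    while fanout ** rounds < num_partitions:
        rounds += 1

    chunks = [list(updates)]
    # Split on the most significant base-`fanout` digit first.
    for r in range(rounds - 1, -1, -1):
        divisor = fanout ** r
        next_chunks = []
        for chunk in chunks:
            buckets = [[] for _ in range(fanout)]
            for u in chunk:
                digit = (partition_of(u) // divisor) % fanout
                buckets[digit].append(u)
            next_chunks.extend(buckets)
        chunks = next_chunks

    # chunks[p] now holds exactly the updates destined for partition p.
    return chunks[:num_partitions]
```

With 100 partitions and a fanout of 10, two rounds suffice. Unlike a merge sort, no ordering is established inside a partition, which matches the slide's "akin to merge sort but cheaper".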

[Figure labels: Partition 1 to Partition 100]

Slide25

Performance

Focus on SSD results in this talk

Similar results with in-memory graphs

Slide26

Baseline

Graphchi [OSDI 2012]

First to show that graph processing on a single machine:

Is viable
Is competitive

Also targets larger sequential bandwidth of SSD and Disk

Slide27

Different Approaches

Fundamentally different approaches to same goal

Graphchi uses “shards”: partitions edges into sorted shards

X-Stream uses sequential scans: partitions edges into unsorted streaming partitions

Slide28

Baseline to Graphchi

Replicated OSDI 2012 experiments on our SSD

Graphchi: Input -> Create shards -> Shards -> Run Algorithm -> Answer

X-Stream: Input -> Run Algorithm -> Answer

Slide29

X-Stream Speedup over Graphchi

Mean Speedup = 2.3

Slide30

Baseline to Graphchi

Replicated OSDI 2012 experiments on our SSD

Graphchi: Input -> Create shards -> Shards -> Run Algorithm -> Answer

X-Stream: Input -> Run Algorithm -> Answer

Slide31

X-Stream Speedup over Graphchi (+ sharding)

Mean Speedup: Prev = 2.3, Now = 3.7

Slide32

Preprocessing Impact

X-Stream returns answers before Graphchi finishes sharding

Slide33

Sequential Access Bandwidth

Graphchi shard: all vertices and edges must fit in memory

X-Stream partition: only vertices must fit in memory

More Graphchi shards than X-Stream partitions

Makes access more random for Graphchi
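A back-of-the-envelope Python model of why the shard count exceeds the partition count, taking the two memory constraints on this slide at face value. The sizes in the example are made up for illustration, and the formulas are a crude simplification: neither system is claimed to size its shards or partitions exactly this way.

```python
import math

GIB = 1024 ** 3


def xstream_partitions(vertex_bytes, ram_bytes):
    """Only the vertex state of a partition has to fit in memory."""
    return math.ceil(vertex_bytes / ram_bytes)


def graphchi_shards(vertex_bytes, edge_bytes, ram_bytes):
    """Vertices and edges of a shard both have to fit in memory."""
    return math.ceil((vertex_bytes + edge_bytes) / ram_bytes)


# Hypothetical graph: 2 GiB of vertex state, 30 GiB of edges, 4 GiB of RAM.
print(xstream_partitions(2 * GIB, 4 * GIB))          # 1
print(graphchi_shards(2 * GIB, 30 * GIB, 4 * GIB))   # 8
```

Because edge data is usually much larger than vertex data, the second count grows with the number of edges while the first does not.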

Slide34

SSD Read Bandwidth (Pagerank on Twitter)

Slide35

SSD Write Bandwidth (Pagerank on Twitter)

Slide36

Disk Transfers (Pagerank on Twitter)

Metric         X-Stream      Graphchi
Data moved     224 GB        322 GB
Time taken     398 seconds   2613 seconds
Transfer rate  578 MB/s      126 MB/s
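The transfer rates in the table follow from dividing data moved by time taken. A short Python check, assuming 1 GB = 1024 MB (an assumption; the slide does not state the unit convention):

```python
MB_PER_GB = 1024  # assumed binary units


def transfer_rate_mb_per_s(data_gb, seconds):
    """Average transfer rate in MB/s for data_gb moved in the given time."""
    return data_gb * MB_PER_GB / seconds


xstream_rate = transfer_rate_mb_per_s(224, 398)     # about 576 MB/s
graphchi_rate = transfer_rate_mb_per_s(322, 2613)   # about 126 MB/s
```

The X-Stream figure comes out near 576 MB/s rather than the 578 MB/s on the slide; the small gap is presumably due to rounding in the reported 224 GB and 398 seconds.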

SSD can sustain reads = 667 MB/s, writes = 576 MB/s

X-Stream uses all available bandwidth from the storage device

Slide37

Scaling up

16 GB RAM: 8 Million V, 128 Million E, 8 sec

400 GB SSD: 256 Million V, 4 Billion E, 33 mins

6 TB Disk: 4 Billion V, 64 Billion E, 26 hours

Slide38

Conclusion

Big graphs

X-Stream

Good Performance

RAM, SSD, Disk

Edge-centric processing + Streaming Partitions = Sequential Access

Download from http://labos.epfl.ch/xstream

Slide39

BACKUP

Slide40

API Restrictions

Updates must be commutative

Cannot access all edges from a vertex in single step

Slide41

Applications

X-Stream can solve a variety of problems

BFS, SSSP, Weakly connected components, Strongly connected components, Maximal independent sets, Minimum cost spanning trees, Belief propagation, Alternating least squares, Pagerank, Betweenness centrality, Triangle counting, Approximate neighborhood function, Conductance, K-Cores

Q. Average distance between people on a social network?

A. Use approximate neighborhood function.

Slide42

Edge-centric Scatter Gather

Real world graphs have low diameter

[Figure: two graphs on vertices 1 to 8. A well connected graph: D=3, BFS in 3 steps (most real-world graphs). A chain 1, 2, 3, 4, 5, 6, 7, 8: D=7, BFS in 7 steps]
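The step counts can be reproduced with a small edge-centric BFS in Python. This is an illustrative sketch with hypothetical names: it assumes the well connected graph is the example edge list used elsewhere in the talk, that BFS starts at vertex 1, and that the chain is directed from 1 to 8. Each step is one full pass over the edge list (scatter) followed by applying the resulting updates (gather).

```python
# Example edge list from the talk, as (source, destination).
EXAMPLE = [(1, 3), (1, 5), (2, 7), (2, 4), (3, 2), (3, 8), (4, 3),
           (4, 7), (4, 8), (5, 6), (6, 1), (8, 5), (8, 6)]
# A chain 1 -> 2 -> ... -> 8.
CHAIN = [(i, i + 1) for i in range(1, 8)]


def edge_centric_bfs(edges, root):
    """Return (levels, steps); steps counts passes that reached new vertices."""
    level = {root: 0}
    steps = 0
    while True:
        # Scatter: stream every edge; emit an update if its source was
        # reached in the previous step.
        updates = [(dst, level[src] + 1) for src, dst in edges
                   if level.get(src) == steps]
        # Gather: apply updates to vertices that were not reached before.
        changed = False
        for dst, lvl in updates:
            if dst not in level:
                level[dst] = lvl
                changed = True
        if not changed:
            return level, steps
        steps += 1


print(edge_centric_bfs(EXAMPLE, 1)[1])  # 3
print(edge_centric_bfs(CHAIN, 1)[1])    # 7
```

Every step streams the whole edge list, so the number of steps multiplies the amount of edge data read; a low diameter keeps that multiplier small.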

Slide43

X-Stream Main Memory Performance

Slide44

Runtime impact of Graphchi Sharding

Slide45

Pre-processing Overhead

Low overhead for producing streaming partitions

Strictly cheaper than sorting edges by source vertex