Slide1
X-Stream: Edge-Centric Graph Processing using Streaming Partitions
Amitabha Roy, Ivo Mihailovic, Willy Zwaenepoel
Slide2
Graphs
[Figure: graphs + algorithms: HyperANF, Pagerank, ALS, …]
Interesting information is encoded as graphs
Slide3
Big Graphs
Large graphs are a subset of the big data problem
Billions of vertices and edges, hundreds of gigabytes
Normally tackled on large clusters
  Pregel, Giraph, Graphlab …
  Complexity, power consumption …
Can we do large graphs on a single machine?
Slide4
X-Stream
Process large graphs on a single machine
1U server = 64 GB RAM + 2 x 200 GB SSD + 3 x 3 TB drive
Slide5
Approach
Problem: Graph traversal = random access
Random access is inefficient for storage
  Disk (500X slower)
  SSD (20X slower)
  RAM (2X slower)
Solution: X-Stream makes graph accesses sequential
Slide6
Contributions
Edge-centric scatter gather model
Streaming partitions
Slide7
Standard Scatter Gather
Edge-centric scatter gather based on standard scatter gather
Popular graph processing model
  Pregel [Google, SIGMOD 2010] …
  Powergraph [OSDI 2012]
Slide8
Standard Scatter Gather
State stored in verticesVertex operationsScatter updates along outgoing edgesGather updates from incoming edges
8
V
V
Scatter
GatherSlide9
Standard Scatter Gather
[Figure: BFS on an example graph with vertices 1 to 8]
Slide10
Vertex-Centric Scatter Gather
Iterates over vertices
Scatter:
  for each vertex v
    if v has update
      for each edge e from v
        scatter update along e
Standard scatter gather is vertex-centric
Does not work well with storage
Slide11
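The vertex-centric scatter loop from the preceding slide can be sketched in Python roughly as follows. This is only an illustration, not X-Stream or Pregel code: the function name `vertex_centric_bfs`, the `has_update` set and the dictionary-based lookup index are assumptions made for the example, and the edge list is the one used in the deck's BFS figures.

```python
from collections import defaultdict

# Example edge list (source, destination) from the deck's BFS illustration.
EDGES = [(1, 3), (1, 5), (2, 7), (2, 4), (3, 2), (3, 8), (4, 3),
         (4, 7), (4, 8), (5, 6), (6, 1), (8, 5), (8, 6)]


def vertex_centric_bfs(edges, vertices, root):
    """Hypothetical helper: BFS levels computed by iterating over vertices."""
    # Lookup index: source vertex -> destinations of its outgoing edges.
    index = defaultdict(list)
    for src, dst in edges:
        index[src].append(dst)

    level = {root: 0}        # state stored in vertices
    has_update = {root}      # vertices updated in the previous round
    while has_update:
        new_updates = set()
        for v in vertices:                    # for each vertex v
            if v in has_update:               # if v has update
                # Index lookup per vertex: the random access that is
                # costly when the edges live on disk or SSD.
                for dst in index[v]:          # for each edge e from v
                    if dst not in level:      # gather: keep the first level seen
                        level[dst] = level[v] + 1
                        new_updates.add(dst)
        has_update = new_updates
    return level


if __name__ == "__main__":
    print(vertex_centric_bfs(EDGES, range(1, 9), root=1))
```

The sketch needs the index (and therefore edges grouped by source) before it can start, which is the preprocessing the edge-centric model avoids.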
Vertex-Centric Scatter Gather
[Figure: BFS on the example graph; each vertex in V = 1..8 uses a lookup index into the edge list below]
SOURCE  DEST
1       3
1       5
2       7
2       4
3       2
3       8
4       3
4       7
4       8
5       6
6       1
8       5
8       6
Slide12
Transformation
Vertex-Centric Scatter:
  for each vertex v
    if v has update
      for each edge e from v
        scatter update along e
Edge-Centric Scatter:
  for each edge e
    if e.src has update
      scatter update along e
Slide13
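The edge-centric loop from the Transformation slide can be sketched in the same way. Again this is a minimal illustration under assumptions, not the X-Stream implementation: `edge_centric_bfs` is a made-up name, updates are kept in a plain Python list, and "e.src has update" is modelled as "the source reached its BFS level in the previous round".

```python
# Example edge list (source, destination) from the deck's BFS illustration.
EDGES = [(1, 3), (1, 5), (2, 7), (2, 4), (3, 2), (3, 8), (4, 3),
         (4, 7), (4, 8), (5, 6), (6, 1), (8, 5), (8, 6)]


def edge_centric_bfs(edges, root):
    """Hypothetical helper: BFS levels computed by streaming the edge list."""
    level = {root: 0}     # state stored in vertices
    current = 0           # level assigned in the previous round
    while True:
        # Scatter: one sequential pass over ALL edges, with no index.
        updates = []
        for src, dst in edges:                 # for each edge e
            if level.get(src) == current:      # if e.src has update
                updates.append((dst, current + 1))
        # Gather: apply the updates to their destination vertices.
        changed = False
        for dst, new_level in updates:
            if dst not in level:
                level[dst] = new_level
                changed = True
        if not changed:
            return level
        current += 1


if __name__ == "__main__":
    print(edge_centric_bfs(EDGES, root=1))
    # Edge order does not matter, so no sorting or index is needed.
    print(edge_centric_bfs(list(reversed(EDGES)), root=1))
```

Every round reads the whole edge list, including edges whose source has no update. That wasted work is the price paid for purely sequential access.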
Edge-Centric Scatter Gather
[Figure: BFS on the example graph with vertex set V = 1..8; the edge list (SOURCE, DEST) holds the same 13 edges as in the vertex-centric example]
Slide14
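[Figure: the edge list sorted by source (left) is equivalent to the same edges in arbitrary order (right)]
SOURCE  DEST   =   SOURCE  DEST
1       3          1       3
1       5          8       6
2       7          5       6
2       4          2       4
3       2          3       2
3       8          4       7
4       3          4       3
4       7          3       8
4       8          4       8
5       6          2       7
6       1          6       1
8       5          8       5
8       6          1       5
No index
No clustering
No sorting
Slide15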
Tradeoff
Edge-centric Scatter-Gather: [formula not recovered]
Vertex-centric Scatter-Gather: [formula not recovered]
Sequential Access Bandwidth >> Random Access Bandwidth
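The tradeoff can be illustrated with a back-of-the-envelope calculation. The cost model below is an assumption made for this sketch, not something recovered from the slide: vertex-centric time is taken as edge data divided by random access bandwidth, and edge-centric time as the number of scatter iterations times edge data divided by sequential bandwidth. The slowdown factors are the ones quoted on the Approach slide; the function name is hypothetical.

```python
# Slowdown of random access relative to sequential access, as quoted
# on the "Approach" slide of this deck.
RANDOM_SLOWDOWN = {"Disk": 500, "SSD": 20, "RAM": 2}


def edge_centric_relative_cost(medium, scatter_iterations):
    """Edge-centric time divided by vertex-centric time (ASSUMED model).

    Assumed, not taken from the slide:
      vertex-centric ~ edge_data / random_bandwidth
      edge-centric   ~ scatter_iterations * edge_data / sequential_bandwidth
    With sequential_bandwidth = slowdown * random_bandwidth the edge data
    cancels, leaving scatter_iterations / slowdown.
    """
    return scatter_iterations / RANDOM_SLOWDOWN[medium]


if __name__ == "__main__":
    for medium in ("Disk", "SSD"):
        # Values below 1 mean the edge-centric pass is cheaper in this model.
        print(medium, edge_centric_relative_cost(medium, scatter_iterations=7))
```

Under this crude model, streaming wins as long as the number of scatter iterations stays below the random access slowdown of the storage device.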
Few scatter gather iterations for real world graphs
Well connected, variety of datasets covered in the paper
Slide16
Contributions
Edge-centric scatter gather model
Streaming partitions
Slide17
Streaming Partitions
Problem: still have random access to vertex set
[Figure: vertex set V = 1, 2, 3, 4, 5, 6, 7, 8]
Solution: partition the graph into streaming partitions
Slide18
Streaming Partitions
A streaming partition is
  A subset of the vertices that fits in RAM
  All edges whose source vertex is in that subset
No requirement on quality of the partition
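A minimal sketch of the definition above, under stated assumptions: vertices are numbered from 1, each partition takes a contiguous range of vertex IDs, and `vertices_per_partition` stands in for "fits in RAM". The function name and the dictionary layout are hypothetical. Because there is no requirement on partition quality, any other assignment of vertices to subsets would do equally well.

```python
# The deck's example edges, in the arbitrary (unsorted) order shown earlier.
EDGES = [(1, 3), (8, 6), (5, 6), (2, 4), (3, 2), (4, 7), (4, 3),
         (3, 8), (4, 8), (2, 7), (6, 1), (8, 5), (1, 5)]


def make_streaming_partitions(edges, num_vertices, vertices_per_partition):
    """Hypothetical helper: build streaming partitions from an edge list."""
    num_partitions = -(-num_vertices // vertices_per_partition)  # ceiling division
    partitions = [{"vertices": [], "edges": []} for _ in range(num_partitions)]

    # Each vertex belongs to exactly one subset (contiguous ID ranges here).
    for v in range(1, num_vertices + 1):
        partitions[(v - 1) // vertices_per_partition]["vertices"].append(v)

    # One sequential pass: an edge goes to the partition of its SOURCE vertex.
    for src, dst in edges:
        partitions[(src - 1) // vertices_per_partition]["edges"].append((src, dst))
    return partitions


if __name__ == "__main__":
    for part in make_streaming_partitions(EDGES, num_vertices=8,
                                          vertices_per_partition=4):
        print(part["vertices"], part["edges"])
```

Building the partitions takes a single pass over the edges and never compares or sorts them.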
Slide19
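Partitioning the Graph
[Figure: the vertex set is split into two subsets of vertices, V1 and V2, each with the edges whose source lies in it]
V1 = 1, 2, 3, 4
SOURCE  DEST
1       5
4       7
2       7
4       3
4       8
3       8
2       4
1       3
3       2
V2 = 5, 6, 7, 8
SOURCE  DEST
5       6
8       6
8       5
6       1
Slide20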
Random Accesses for Free
[Figure: vertex subset V1 = 1, 2, 3, 4 next to the edge list (SOURCE, DEST) of the nine edges whose source is in V1]
Slide21
Generalization
[Figure: vertex subset V1 = 1, 2, 3, 4 in fast storage; its edge list (SOURCE, DEST) in slow storage]
Applies to any two level memory hierarchy
Slide22
Generally Applicable
[Figure: fast storage / slow storage pairs: RAM / Disk OR RAM / SSD OR CPU Cache / RAM]
Slide23
Parallelism
Simple Parallelism
  State is stored in vertex
  Streaming partitions have disjoint vertices
  Can process streaming partitions in parallel
Slide24
Gathering Updates
[Figure: Edges and Vertices produce updates X and Y, which pass through the Shuffler to reach the Vertices that gather them]
Shuffler
Minimize random access for large number of partitions
Multi-round copying akin to merge sort but cheaper
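One way to picture a multi-round shuffle is sketched below. It is an illustrative guess at the idea, not the X-Stream shuffler: the function name, the `fanout` parameter and the range-splitting scheme are all assumptions. Each round makes one sequential pass over every bucket and copies its updates into at most `fanout` sub-buckets, so no round writes to more than a few output streams at a time. Unlike merge sort, nothing is ever compared or ordered within a bucket.

```python
def shuffle(updates, num_partitions, partition_of, fanout=2):
    """Hypothetical sketch: group updates by destination partition in rounds.

    `partition_of(update)` returns a partition id in range(num_partitions).
    `fanout` must be at least 2.
    """
    # Each bucket covers the half-open range [lo, hi) of partition ids.
    buckets = [(0, num_partitions, list(updates))]
    while any(hi - lo > 1 for lo, hi, _ in buckets):
        next_buckets = []
        for lo, hi, items in buckets:
            if hi - lo <= 1:                    # already a single partition
                next_buckets.append((lo, hi, items))
                continue
            step = -(-(hi - lo) // fanout)      # ceiling division
            subs = [[] for _ in range(fanout)]
            for update in items:                # one sequential pass per bucket
                subs[(partition_of(update) - lo) // step].append(update)
            for i, sub in enumerate(subs):
                sub_lo = lo + i * step
                if sub_lo < hi:
                    next_buckets.append((sub_lo, min(hi, sub_lo + step), sub))
        buckets = next_buckets
    return [items for _, _, items in buckets]


if __name__ == "__main__":
    # (destination vertex, new BFS level); partitions hold 2 vertices each.
    updates = [(3, 1), (5, 1), (6, 2), (2, 2), (8, 2)]
    print(shuffle(updates, num_partitions=4,
                  partition_of=lambda u: (u[0] - 1) // 2))
```

After the shuffle, each partition can gather its own updates with its vertices in fast storage, independently of the other partitions.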
[Figure labels: Partition 1 … Partition 100]
Slide25
Performance
Focus on SSD results in this talk
Similar results with in-memory graphs
Slide26
Baseline
Graphchi [OSDI 2012]
First to show that graph processing on a single machine
  Is viable
  Is competitive
Also targets larger sequential bandwidth of SSD and Disk
Slide27
Different Approaches
Fundamentally different approaches to same goal
Graphchi uses “shards”
  Partitions edges into sorted shards
X-Stream uses sequential scans
  Partitions edges into unsorted streaming partitions
Slide28
Baseline to Graphchi
Replicated OSDI 2012 experiments on our SSD
Graphchi: Input → Create shards → Shards → Run Algorithm → Answer
X-Stream: Input → Run Algorithm → Answer
Slide29
X-Stream Speedup over Graphchi
Mean Speedup = 2.3
Slide30
Baseline to Graphchi
Replicated OSDI 2012 experiments on our SSD
Graphchi: Input → Create shards → Shards → Run Algorithm → Answer
X-Stream: Input → Run Algorithm → Answer
Slide31
X-Stream Speedup over Graphchi (+ sharding)
Mean Speedup: Prev = 2.3, Now = 3.7
Slide32
Preprocessing Impact
X-Stream returns answers before Graphchi finishes sharding
Slide33
Sequential Access Bandwidth
Graphchi shard
  All vertices and edges must fit in memory
X-Stream partition
  Only vertices must fit in memory
More Graphchi shards than X-Stream partitions
Makes access more random for Graphchi
Slide34
SSD Read Bandwidth (Pagerank on Twitter)
Slide35
SSD Write Bandwidth (Pagerank on Twitter)
Slide36
Disk Transfers (Pagerank on Twitter)
Metric          X-Stream       Graphchi
Data moved      224 GB         322 GB
Time taken      398 seconds    2613 seconds
Transfer rate   578 MB/s       126 MB/s
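The transfer rates in the table follow from the other two rows. A quick check, assuming 1 GB = 1024 MB (the helper name is made up for this sketch); the recomputed X-Stream figure comes out a little below 578 MB/s, presumably because the inputs in the table are rounded:

```python
def transfer_rate_mb_per_s(data_gb, seconds, mb_per_gb=1024):
    """Average transfer rate; the 1024 MB per GB conversion is an assumption."""
    return data_gb * mb_per_gb / seconds


xstream_rate = transfer_rate_mb_per_s(224, 398)     # roughly 576 MB/s
graphchi_rate = transfer_rate_mb_per_s(322, 2613)   # roughly 126 MB/s

if __name__ == "__main__":
    print(f"X-Stream: {xstream_rate:.0f} MB/s")
    print(f"Graphchi: {graphchi_rate:.0f} MB/s")
    print(f"Runtime ratio: {2613 / 398:.1f}x")
```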
SSD can sustain reads = 667 MB/s, writes = 576 MB/s
X-Stream uses all available bandwidth from the storage device
Slide37
Scaling up
16 GB RAM: 8 Million V, 128 Million E, 8 sec
400 GB SSD: 256 Million V, 4 Billion E, 33 mins
6 TB Disk: 4 Billion V, 64 Billion E, 26 hours
Slide38
Conclusion
Big graphs
X-Stream
Good Performance
RAM, SSD, Disk
Edge-centric processing + Streaming Partitions = Sequential Access
Download from http://labos.epfl.ch/xstream
Slide39
BACKUP
Slide40
API Restrictions
Updates must be commutative
Cannot access all edges from a vertex in single step
Slide41
Applications
X-Stream can solve a variety of problems
BFS, SSSP, Weakly connected components, Strongly connected components, Maximal independent sets, Minimum cost spanning trees, Belief propagation, Alternating least squares, Pagerank, Betweenness centrality, Triangle counting, Approximate neighborhood function, Conductance, K-Cores
Q. Average distance between people on a social network?
A. Use approximate neighborhood function.
Slide42
Edge-centric Scatter Gather
Real world graphs have low diameter
[Figure: two example graphs on vertices 1 to 8]
D=3, BFS in 3 steps: most real-world graphs
D=7, BFS in 7 steps
Slide43
X-Stream Main Memory Performance
Slide44
Runtime impact of Graphchi Sharding
Slide45
Pre-processing Overhead
Low overhead for producing streaming partitions
Strictly cheaper than sorting edges by source vertex