Zechao Shang (1), Feifei Li (2), Jeffrey Xu Yu (1), Zhiwei Zhang (3), Hong Cheng (1); (1) The Chinese University of Hong Kong, (2) University of Utah, (3) Hong Kong Baptist University
Graph Analytics Through Fine-Grained Parallelism

Zechao Shang (1), Feifei Li (2), Jeffrey Xu Yu (1), Zhiwei Zhang (3), Hong Cheng (1)
(1) The Chinese University of Hong Kong
(2) University of Utah
(3) Hong Kong Baptist University
Background: In-Memory Graph Analytics
- In memory, not on disk
- Graph analytics, not other workloads (not OLTP)
- Main target: computing time
Background: Vertex-Centric Computing
- A graph G = (V, E): V is the vertex set, E is the edge set, and (u, v) denotes an edge from u to v
- A user-defined function (UDF) runs on each vertex:

  for iteration i = 1 to iter
      for vertex v
          execute UDF(v)
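The vertex-centric loop above can be sketched in a few lines of Python. This is only an illustration of the execution model, not the paper's API; the function names `run` and `count_visits`, and the adjacency-list representation, are assumptions made for the example.

```python
# Minimal vertex-centric execution loop (illustrative sketch).
# The graph is an adjacency list: vertex -> list of out-neighbors.
def run(graph, udf, state, iterations):
    for i in range(iterations):          # for iteration i = 1 to iter
        for v in graph:                  # for vertex v
            udf(v, graph, state)         # execute UDF(v)
    return state

# Toy UDF: each vertex counts how often it was executed.
def count_visits(v, graph, state):
    state[v] = state.get(v, 0) + 1

graph = {0: [1], 1: [2], 2: [0]}
state = run(graph, count_visits, {}, iterations=3)
```

Any vertex program (PageRank, shortest paths, coloring) fits this shape by swapping in a different UDF.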
One Example: PageRank
- Assign an initial PageRank to every vertex
- Update each vertex's rank in every iteration
- Terminate at a fixed point
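The three steps above can be sketched as a small PageRank in the vertex-centric style. The damping factor 0.85 and the tolerance are conventional assumptions; the slide's exact update rule is not shown in the transcript.

```python
# Illustrative PageRank (sketch; damping d = 0.85 is an assumption).
def pagerank(graph, iterations=50, d=0.85, tol=1e-10):
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}            # initial PageRank
    for _ in range(iterations):                   # for each iteration
        new = {v: (1.0 - d) / n for v in graph}
        for u, neighbors in graph.items():
            if neighbors:
                share = d * rank[u] / len(neighbors)
                for v in neighbors:
                    new[v] += share
            else:  # dangling vertex: spread its rank uniformly
                for v in graph:
                    new[v] += d * rank[u] / n
        delta = max(abs(new[v] - rank[v]) for v in graph)
        rank = new
        if delta < tol:                           # termination at fixed point
            break
    return rank

graph = {0: [1], 1: [2], 2: [0]}
r = pagerank(graph)
```

On a symmetric 3-cycle the ranks converge to 1/3 each, so the fixed point is reached after the first iteration.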
Coarse-grained parallelism (figure)
Fine-grained parallelism (figure)
Pros and Cons (1)

| Problem            | Cause             | Solution (coarse) | Solution (fine)   |
|--------------------|-------------------|-------------------|-------------------|
| Straggler          | Slow worker       | No easy solution  | Dynamic rebalance |
| Workload imbalance | Dynamic imbalance | No easy solution  | Dynamic rebalance |
Pros and Cons (2)

| Problem   | Cause                                            | Solution (coarse) | Solution (fine) |
|-----------|--------------------------------------------------|-------------------|-----------------|
| Overheads | Data shuffling; storage for intermediate results | No solution       | Not applicable  |
Pros and Cons (3): Model Flexibility
- Can in-place updates simulate immutable states? Yes
- Can immutable states simulate in-place updates? Generally no
Coarse-Grained vs. Fine-Grained Parallelism
- People believed a coarse-grained shared-nothing architecture is better for large-scale data processing systems
- We seek the potential of fine-grained shared-memory parallelism for graph analytics:
  - The availability of huge memory, and of RDMA
  - The complexity of graph analytical jobs
The Data Consistency Problem
- The gray vertex writes two values
- One is read by the green vertex, the other is not
- This may lead to incorrect results and/or corrupted data
The Data Consistency Problem
- The gold standard of data consistency? Serializability theory
- The concurrent execution is equivalent to a serial execution
- A serializability controller (concurrency controller) governs the execution and ensures correctness
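A minimal illustration of why a controller is needed: the interleaving below loses one of two increments, so it is not equivalent to any serial execution. (The interleaving is simulated deterministically in straight-line code rather than with real threads, to keep the example reproducible.)

```python
# Lost update: two "vertex transactions" T1 and T2 both increment x.
# Any serial execution (T1 then T2, or T2 then T1) yields x == 2.
x = 0
t1_read = x       # T1 reads x = 0
t2_read = x       # T2 reads x = 0, before T1 writes
x = t1_read + 1   # T1 writes 1
x = t2_read + 1   # T2 writes 1, overwriting T1's update
# x == 1: unreachable by any serial order, i.e., an incorrect result
```

A serializability controller would either delay T2's read until T1 commits, or abort one of the two transactions.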
HSync: Fine-Grained Parallelism + Data Consistency
System setting:
- In-memory graph analytics
- Multi-core (distributed in the future)
- Static and exclusively owned data
- Main target: as fast as possible

System components:
- Shared mutable state
- Controlled data consistency
- User-implemented UDF
- Customizable job scheduler
- HSync: fine-grained parallelism
HSync: An Example
- Read and write shared state
- The "vertex transaction":
  - Atomic: all or nothing
  - Isolation
  - No durability

  for iteration i = 1 to iter
      for vertex v
          transaction { execute UDF(v) }
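The vertex-transaction semantics (all or nothing, isolated, no durability) can be sketched with a context manager. This is a deliberately coarse stand-in: a single global lock provides isolation where HSync uses a fine-grained scheduler, and the name `vertex_transaction` is invented for the example.

```python
import threading
from contextlib import contextmanager

_lock = threading.Lock()  # coarse isolation stand-in for HSync's scheduler

@contextmanager
def vertex_transaction(state):
    # All or nothing: snapshot the state, restore it if the UDF fails.
    # Isolation: one lock serializes transactions (illustration only).
    # No durability: nothing is ever written to stable storage.
    with _lock:
        snapshot = dict(state)
        try:
            yield state
        except Exception:
            state.clear()
            state.update(snapshot)
            raise

state = {"a": 1}
try:
    with vertex_transaction(state) as s:
        s["a"] = 2
        raise RuntimeError("simulated UDF failure")
except RuntimeError:
    pass
# The failed transaction left no trace: state is back to {"a": 1}
```

A successful transaction commits its writes in place; a failed one rolls the shared state back, which is exactly the "all or nothing" property the slide names.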
HSync: System Features
- Strong semantics
  - Equivalent to a serial execution
  - No asynchrony, no nondeterminism
  - Always guarantees consistent results
- Flexible model
  - Can simulate asynchronous execution, BSP, Blogel, prioritized scheduling, etc.
- Fast information diffusion
  - Intra-iteration prioritized scheduling/pruning
  - Even asymptotically faster algorithms (stochastic gradient descent vs. gradient descent)
HSync: A Hybrid Scheduler
- Why are existing schedulers not working?
- Conflict: two transactions access the same data
- (figure: conflict behavior as a function of vertex degree)
HSync: Conflict Rate (Really!) Matters
- High conflict rate: locking-based schedulers
- Low conflict rate: optimistic schedulers
HSync: A Hybrid Scheduler
- Handle high- and low-conflict-rate transactions via different schedulers
- Split the high- and low-conflict-rate zones:
  - High: 2PL (two-phase locking), for BIG vertices
  - Low: OCC (optimistic concurrency control), for SMALL vertices
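The routing idea can be sketched in a few lines: vertices whose degree exceeds a threshold are BIG and go to the pessimistic scheduler, the rest are SMALL and go to the optimistic one. The threshold value and the function names are assumptions for illustration, not values from the paper.

```python
# Sketch of HSync's BIG/SMALL routing by vertex degree.
# THRESHOLD is a tunable knob, not a value taken from the paper.
THRESHOLD = 100

def classify(degree, threshold=THRESHOLD):
    return "BIG" if degree > threshold else "SMALL"

def schedule(txn_degree):
    # BIG transactions conflict often  -> pessimistic locking (2PL);
    # SMALL ones rarely conflict       -> optimistic validation (OCC).
    return "2PL" if classify(txn_degree) == "BIG" else "OCC"
```

High-degree vertices are touched by many transactions and therefore conflict often, which is why they get the locking path.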
HSync: BIG Vertices
- Two-phase locking
- Deadlock prevention based on the vertex order
- Spin-locks optimized for main memory
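Deadlock prevention by ordering can be sketched as follows: every transaction acquires its vertex locks in ascending vertex-ID order, so no wait cycle can form. `threading.Lock` stands in for the main-memory spin-locks the slide mentions, and `lock_for`/`acquire_all` are illustrative names.

```python
import threading

# Lazily created per-vertex locks (stand-in for per-vertex spin-locks).
locks = {}

def lock_for(v):
    return locks.setdefault(v, threading.Lock())

def acquire_all(vertices):
    # Global acquisition order = vertex ID; duplicates are dropped.
    # Because every transaction follows the same order, a cycle of
    # waiting transactions is impossible, so no deadlock can occur.
    ordered = sorted(set(vertices))
    for v in ordered:
        lock_for(v).acquire()
    return ordered

def release_all(ordered):
    for v in reversed(ordered):
        lock_for(v).release()

held = acquire_all([3, 1, 2])   # acquired as 1, 2, 3
release_all(held)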
HSync: SMALL Vertices
- On read/write: keep the records
- On validation: verify reads and writes
  - Besides the traditional verifications, we also try the locks used by 2PL
  - If there is any indication of conflict, abort
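The OCC path can be sketched with explicit read and write sets. Version counters stand in for HSync's validation checks (including its probe of the 2PL locks, which is not modeled here); the class and field names are invented for the example.

```python
# OCC sketch for SMALL vertices: record reads/writes, then validate.
# A transaction aborts if any value it read changed since the read.
class OCCTxn:
    def __init__(self, store, versions):
        self.store, self.versions = store, versions
        self.read_set = {}    # key -> version observed at read time
        self.write_set = {}   # key -> buffered new value

    def read(self, k):
        self.read_set[k] = self.versions.get(k, 0)
        return self.write_set.get(k, self.store.get(k))  # read-your-writes

    def write(self, k, v):
        self.write_set[k] = v   # buffered until commit

    def commit(self):
        # Validation: abort on any indication of conflict.
        for k, seen in self.read_set.items():
            if self.versions.get(k, 0) != seen:
                return False    # abort; buffered writes are discarded
        for k, v in self.write_set.items():
            self.store[k] = v
            self.versions[k] = self.versions.get(k, 0) + 1
        return True

store, versions = {"x": 1}, {}
t = OCCTxn(store, versions)
t.write("x", t.read("x") + 1)
ok = t.commit()   # no concurrent change, so validation succeeds
```

If another transaction bumps a version between a read and the commit, validation fails and the transaction aborts without touching the store, which is the behavior the slide describes.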
HSync: The Hybrid
- SMALL vertices never abort BIG ones
- SMALL vertices block BIG ones only for a very short period of time
- BIG ones always succeed
- And we always have serializable executions
HSync: Experimental Studies
HSync: Experiments
- Large graphs
- Two workloads: read-most (RM) and read-write (RW)
HSync: Throughput of Schedulers
(figure: x-axis is the number of cores, y-axis is throughput in 10^6)
HSync: BSP, Async and HSync
(figures: PageRank, single-source shortest path, graph coloring, and alternating least squares for matrix factorization; x-axis is time, y-axis is quality: in (a-b, e-h) lower is better, in (c-d) higher is better)
HSync: The Real Battle
- HSync: a c4.8xlarge instance with 36 cores and 60 GB memory
- PowerGraph and PowerLyra: a 16-node m3.2xlarge (8 cores, 30 GB memory) cluster
- GraphChi: an r3.8xlarge instance with 32 cores, 244 GB memory and a 320 GB SSD
- Wall-clock time, normalized against HSync's time; WCC stands for weakly connected component; "m" marks memory limit exceeded
Thanks
Poster #32
HSync: Other Database System Considerations
- Values on edges?
  - Variation 1: locks on the edges
  - Variation 2: lock both vertices of the edge
- Durability? No need; a checkpoint method handles fault recovery
- The choice of the threshold (the boundary between BIG and SMALL)?
  - User choice, or
  - Based on a conflict-rate heat map
HSync: Why Others Are Not Working (cont.)
- Timestamp (TS) ordering
  - Assigns timestamps to transactions before they start, and rejects operations based on the TS relationship
  - A small-degree vertex transaction could easily abort a big-degree one
- Multi-version concurrency control (MVCC) and snapshot isolation (SI)
  - Only work when there is a large proportion of read-only transactions
  - But in our problem, all transactions are read-write
- Chromatic scheduler
  - Requires a large number of colors
HSync: A Hybrid Approach
- Transactions are marked as BIG or SMALL depending on the vertex degree
- All following calls are automatically routed by the system
HSync: Synthetic Graphs
(figures: varying V, varying A, varying D)
HSync: Abort Rates and Threshold
(figure: x-axis is the threshold, right y-axis is throughput in 10^6)
HSync: Scalability of Schedulers
(figure: x-axis is the number of cores, y-axis is scalability, normalized as the ratio to the throughput using 1 core)