Graph Analytics Through Fine-Grained Parallelism

Uploaded on 2017-04-28

Zechao Shang (1), Feifei Li (2), Jeffrey Xu Yu (1), Zhiwei Zhang (3), Hong Cheng (1). (1) The Chinese University of Hong Kong; (2) University of Utah; (3) Hong Kong Baptist University.


Presentation Transcript

Slide1

Graph Analytics Through Fine-Grained Parallelism

Zechao Shang (1), Feifei Li (2), Jeffrey Xu Yu (1), Zhiwei Zhang (3), Hong Cheng (1)

(1) The Chinese University of Hong Kong
(2) University of Utah
(3) Hong Kong Baptist University

Slide2

Background

In-memory graph analytics:
- in-memory, not disk
- analytics, not OLTP
- main target: computing time, not other metrics

Slide3

Background

Vertex-centric computing

A graph G = (V, E):
- V is the vertex set
- E is the edge set; (u, v) denotes an edge from u to v

User-defined function (UDF) for each vertex:

    for iteration i = 1 to iter
        for vertex v
            execute UDF(v)

Slide4
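The loop above can be sketched in Python. This is a minimal, single-threaded illustration; the adjacency-list layout and the UDF signature are assumptions for the sketch, not any system's actual API:

```python
# Minimal vertex-centric execution loop (illustrative sketch).
# graph: adjacency list mapping vertex -> list of out-neighbors.
# state: shared per-vertex values that the UDF updates in place.

def run_vertex_centric(graph, state, udf, iterations):
    for _ in range(iterations):
        for v in graph:           # for vertex v ...
            udf(v, graph, state)  # ... execute UDF(v)
    return state

# Example UDF: each vertex records its out-degree.
def degree_udf(v, graph, state):
    state[v] = len(graph[v])

graph = {0: [1, 2], 1: [2], 2: []}
state = run_vertex_centric(graph, {v: 0 for v in graph}, degree_udf, 1)
# state == {0: 2, 1: 1, 2: 0}
```

Parallel engines run the inner loop concurrently across workers; that is exactly where the consistency questions discussed later arise.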

One Example: PageRank

- Initialize the PageRank values
- For each iteration, update each vertex from its neighbors
- Termination at a fix-point

Slide5
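As a concrete instance, PageRank fits this model directly. The sketch below is plain Python; the 0.85 damping factor and the fix-point tolerance are conventional choices, not values from the slides:

```python
# PageRank over an adjacency list (illustrative sketch).

def pagerank(graph, damping=0.85, tol=1e-6, max_iter=100):
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}              # initial PageRank
    for _ in range(max_iter):
        new_rank = {v: (1.0 - damping) / n for v in graph}
        for u in graph:
            if graph[u]:                            # spread rank along out-edges
                share = damping * rank[u] / len(graph[u])
                for v in graph[u]:
                    new_rank[v] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in graph)
        rank = new_rank
        if delta < tol:                             # termination at fix-point
            break
    return rank

ranks = pagerank({0: [1], 1: [0], 2: [0, 1]})       # vertex 2 has no in-edges
```

With every vertex having out-edges, the ranks stay a probability distribution, and the vertex with no in-links ends up with the minimum rank (1 - damping) / n.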

Coarse-grained parallelism

 Slide6

Fine-grained parallelism

 Slide7

Pros and Cons (1)

Problem: straggler / workload imbalance
Cause: slow worker / dynamic imbalance
Solution (coarse-grained): no easy solution
Solution (fine-grained): dynamic rebalance

Slide8

Pros and Cons (2)

Problem: overheads
Cause: data shuffling; storage for intermediate results
Solution (coarse-grained): no solution
Solution (fine-grained): not applicable

Slide9

Pros and Cons (3): model flexibility

Can in-place updates simulate immutable states? Yes.
Can immutable states simulate in-place updates? Generally no.

Slide10

Coarse-grained vs. fine-grained parallelism

People have believed that a coarse-grained shared-nothing architecture is better for large-scale data processing systems.

We seek the potential of fine-grained shared memory for graph analytics, motivated by:
- the availability of huge memory (and RDMA)
- the complexity of graph analytical jobs

Slide11

The data consistency problem

The gray vertex writes two values; one is read by the green vertex, the other is not.

This may lead to incorrect results and/or corrupted data.

Slide12

The data consistency problem

What is the "gold standard" of data consistency? Serializability theory: the concurrent execution is equivalent to a serial execution.

A serializability controller (concurrency controller) governs the execution and ensures correctness.

Slide13

HSync: Fine-Grained Parallelism + Data Consistency

Slide14

HSync: fine-grained parallelism

System setting:
- in-memory graph analytics
- multi-core (distributed in the future)
- static and exclusively owned data
- main target: as fast as possible

System components:
- shared mutable state
- controlled data consistency
- user-implemented UDF
- customizable job scheduler

Slide15

HSync: an example

Read and write shared state.

The "vertex transaction":
- all or nothing (atomic)
- isolation
- no durability

    for iteration i = 1 to iter
        for vertex v
            execute UDF(v) as a transaction

Slide16

HSync: system features

Strong semantics:
- equivalent to a serial execution
- no asynchrony, no nondeterminism
- always guarantees consistent results

Flexible model:
- can simulate asynchronous execution, BSP, Blogel, prioritized scheduling, etc.

Fast information diffusion:
- intra-iteration prioritized scheduling/pruning
- even asymptotically faster algorithms (stochastic gradient descent vs. gradient descent)

Slide17

HSync: a hybrid scheduler

Why are existing schedulers not working?

Conflict: two transactions access the same data.

[Figure: conflict behavior, x-axis labeled "degree"]

Slide18

HSync: conflict rate (really!) matters

High conflict rate → locking-based schedulers
Low conflict rate → optimistic schedulers

Slide19

HSync: a hybrid scheduler

Handle high- and low-conflict-rate transactions via different schedulers: split the high and low conflict rate zones.

- high conflict rate → 2PL (two-phase locking): the BIG vertices
- low conflict rate → OCC (optimistic concurrency control): the SMALL vertices

Slide20

HSync: BIG vertices

Two-phase locking:
- deadlock prevention, based on the vertex order
- spin-locks, optimized for main memory

Slide21
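The 2PL scheme for BIG vertices can be sketched as follows, assuming Python threads. The SpinLock here busy-waits on a stand-in flag (a real main-memory engine would use an atomic compare-and-swap), and deadlock is prevented by always acquiring locks in increasing vertex-ID order:

```python
import threading

class SpinLock:
    """Busy-waiting lock for short critical sections (sketch)."""
    def __init__(self):
        self._flag = threading.Lock()       # stand-in for an atomic flag
    def acquire(self):
        while not self._flag.acquire(blocking=False):
            pass                            # spin instead of sleeping
    def release(self):
        self._flag.release()

def run_big_transaction(touched, locks, body):
    """Two-phase locking for a BIG-vertex transaction."""
    order = sorted(touched)                 # global vertex order -> no deadlock
    for v in order:                         # growing phase: lock everything
        locks[v].acquire()
    try:
        return body()                       # run the vertex UDF
    finally:
        for v in order:                     # shrinking phase: release at the end
            locks[v].release()

locks = {v: SpinLock() for v in range(4)}
result = run_big_transaction({2, 0}, locks, lambda: "committed")
```

Because every transaction sorts its lock requests the same way, no two transactions can each hold a lock the other is spinning on.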

HSync: SMALL vertices

On read/write: keep the records.

On validation:
- verify the reads and writes
- besides the traditional verifications, also probe the locks used by 2PL
- if there is any indication of conflict, abort

Slide22
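The record-and-validate scheme can be sketched as version-based OCC. This is a minimal single-threaded illustration; a real implementation would validate and install writes atomically and, as the slide notes, would additionally probe the 2PL locks, which this sketch omits:

```python
# Optimistic concurrency control for a SMALL-vertex transaction (sketch).
# store: vertex -> value; versions: vertex -> version counter.

def occ_execute(store, versions, txn):
    """Run txn with a tracking read function; commit only if every
    vertex read is still at the version that was observed."""
    read_set = {}                        # vertex -> version seen at read time

    def read(v):
        read_set[v] = versions[v]
        return store[v]

    writes = txn(read)                   # txn returns its write set

    for v, seen in read_set.items():     # validation phase
        if versions[v] != seen:          # any indication of conflict -> abort
            return False
    for v, value in writes.items():      # install writes, bump versions
        store[v] = value
        versions[v] += 1
    return True

store, versions = {0: 1, 1: 2}, {0: 0, 1: 0}
committed = occ_execute(store, versions, lambda read: {1: read(0) + read(1)})
# committed is True and store[1] is now 3
```

If a concurrent writer bumps a version between the read and the validation, the transaction aborts and can simply be retried.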

HSync: the hybrid

- SMALL vertices never abort the BIG ones
- SMALL vertices block BIG ones only for a very short period of time
- BIG ones always succeed
- and we always have serializable executions

Slide23

HSync: experimental studies

Slide24

HSync: experiments

Large graphs; two workloads:
- read-most (RM)
- read-write (RW)

Slide25

HSync: throughput of schedulers

x-axis: number of cores; y-axis: throughput (in 10^6).

Slide26

HSync: BSP, Async and HSync

PageRank, single-source shortest path, graph coloring, and alternating least squares for matrix factorization. x-axis: time; y-axis: quality (in (a-b, e-h) lower is better; in (c-d) higher is better).

Slide27

HSync: the real battle

- HSync: a c4.8xlarge instance with 36 cores and 60GB memory
- PowerGraph and PowerLyra: a 16-node m3.2xlarge (8 cores, 30GB memory) cluster
- GraphChi: an r3.8xlarge instance with 32 cores, 244GB memory, and a 320GB SSD

Wall-clock time; WCC stands for weakly connected components; "m" marks memory limit exceeded. Times are normalized against HSync's time.

Slide28

Thanks! Poster #32

Slide29

HSync: other database system considerations

Values on edges?
- Variation 1: locks on the edges
- Variation 2: lock both vertices on the edge

Durability?
- no need; a checkpoint method handles fault recovery

The choice of threshold (the boundary between BIG and SMALL)?
- user choice, or
- based on a conflict rate heat map

Slide30

HSync: why others are not working (cont.)

Timestamp (TS) ordering:
- assigns timestamps to transactions before they start and rejects operations based on TS relationships
- a small-degree vertex transaction could easily abort a big-degree one

Multi-version concurrency control (MVCC) and snapshot isolation (SI):
- only work where there is a large proportion of read-only transactions
- but in our problem, all transactions are read-write

Chromatic scheduler:
- requires a large number of colors

Slide31

HSync: a hybrid approach

Transactions are marked as BIG or SMALL depending on vertex degree; all following calls are automatically routed by the system.

Slide32
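The routing step can be sketched as a degree-threshold dispatch; the threshold value below is an arbitrary placeholder (the slides note it can be user-chosen or derived from a conflict rate heat map):

```python
# Route each vertex transaction to a scheduler by degree (sketch).

DEGREE_THRESHOLD = 100   # placeholder value, not from the slides

def route(degree, threshold=DEGREE_THRESHOLD):
    """BIG (high-degree, conflict-prone) vertices go to 2PL;
    SMALL (low-degree) vertices go to OCC."""
    return "2PL" if degree > threshold else "OCC"

print(route(5000))  # high-degree hub -> "2PL"
print(route(3))     # ordinary vertex -> "OCC"
```

Because the classification depends only on the static degree, it can be computed once up front and every subsequent read/write of that vertex is handled by the chosen scheduler.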

HSync: synthetic graphs

[Figures: varying V, varying A, varying D]

Slide33

HSync: abort rates and threshold

x-axis: the threshold; right y-axis: throughput (in 10^6).

Slide34

HSync: scalability of schedulers

x-axis: number of cores; y-axis: scalability (normalized as the ratio to the throughput using 1 core).