HiPS: Hierarchical Parameter Synchronization



HiPS: Hierarchical Parameter Synchronization in Large-Scale Distributed Machine Learning. Jinkun Geng, Dan Li, Yang Cheng, Shuai Wang, and Junfeng Li. ACM SIGCOMM Workshop.



Presentation Transcript

Slide1

HiPS: Hierarchical Parameter Synchronization in Large-Scale Distributed Machine Learning

Jinkun Geng, Dan Li, Yang Cheng, Shuai Wang, and Junfeng Li

Slide2

Net for AI

ACM SIGCOMM Workshop on NetAI

Slide3

Background

Distributed Machine Learning

Computation

Communication

Slide4

Background

Strong Computation Power (GPU & TPU)

Slide5

Background

Communication Challenge

TCP: High Latency & Low Throughput, Kernel Overheads, etc.

RDMA: Promising Alternative to TCP

Slide6

Background

An MNIST Benchmark with 1 Million Parameters

Slide7

Background

RoCE/RDMA: multi-vendor ecosystem

Many Problems in Fat-Tree based Deployment

Slide8

Background

Fat-Tree based Deployment

PFC pause frame storm [SIGCOMM'15, '16, NS-3 Simulation]

Resilient RoCE: Performance Sacrifice [Chelsio-Tech]

Synchronization Performance

Slide9

Background

Fat-Tree based Deployment

PFC pause frame storm [SIGCOMM'15, '16]

Resilient RoCE: Performance Sacrifice

Server-Centric Networks

Slide10

Background

Fat-Tree based Deployment

Synchronization Performance

Hierarchical Synchronization

Slide11

Background

Server-Centric Networks

Fewer hops lead to fewer PFC pause frames

Servers prevent the cascading effect of PFC pause frames

Slide12

Background

Synchronization Algorithm

PS-based

Mesh-based

Ring-based

Slide13

Background

Synchronization Algorithm

PS-based (Pull+Push)
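A minimal sketch of one PS-based round, assuming the parameter server simply averages the gradients that workers push and that every worker then pulls the same result. The function name `ps_sync` and the averaging rule are illustrative assumptions, not taken from the slides.

```python
def ps_sync(worker_grads):
    """One PS-style round: workers push, the server averages, workers pull.

    worker_grads: list of equal-length lists of floats, one per worker.
    Returns the list of per-worker copies after the pull.
    """
    n = len(worker_grads)
    dim = len(worker_grads[0])

    # Push: the server accumulates the gradient sent by every worker.
    total = [0.0] * dim
    for grad in worker_grads:
        for i, g in enumerate(grad):
            total[i] += g

    # Server-side aggregation (assumed here to be a plain average).
    averaged = [t / n for t in total]

    # Pull: every worker fetches its own copy of the aggregated result.
    return [list(averaged) for _ in range(n)]
```

In this pattern all traffic converges on the server, which is why its link tends to become the bottleneck as the number of workers grows.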

Slide14

Background

Synchronization Algorithm

Mesh-based (Diffuse+Collect)
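One plausible reading of Diffuse+Collect, sketched under assumptions: the gradient is split into one chunk per worker, each worker sends chunk j directly to worker j (diffuse), worker j sums what it receives, and then every worker fetches the reduced chunks from their owners (collect). The function name `mesh_sync` and the chunk-ownership rule are hypothetical.

```python
def mesh_sync(worker_grads):
    """Mesh-style (P2P) sum over equal-length gradient lists.

    Assumes the gradient length is divisible by the number of workers.
    """
    n = len(worker_grads)
    dim = len(worker_grads[0])
    assert dim % n == 0, "sketch assumes evenly sized chunks"
    chunk = dim // n

    # Diffuse: worker j receives chunk j from every worker and sums it.
    reduced = []
    for j in range(n):
        lo, hi = j * chunk, (j + 1) * chunk
        reduced.append([sum(w[i] for w in worker_grads) for i in range(lo, hi)])

    # Collect: every worker fetches all reduced chunks from their owners.
    full = [x for part in reduced for x in part]
    return [list(full) for _ in range(n)]
```

Compared with the PS pattern, the aggregation load is spread over all workers, at the cost of every worker talking to every other worker.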

Slide15

Background

Synchronization Algorithm

Ring-based (Scatter+Gather)

Slide16

Background

Synchronization Algorithm

Ring-based (Scatter+Gather)
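The two ring phases can be sketched as a small single-process simulation of the standard ring allreduce: a scatter phase in which partial sums travel around the ring, followed by a gather phase in which the finished chunks are circulated. The function name `ring_allreduce` and the chunk-indexing scheme are illustrative choices, not details from the slides.

```python
def ring_allreduce(worker_grads):
    """Simulate ring allreduce (sum) over equal-length gradient lists.

    Assumes the gradient length is divisible by the number of workers.
    """
    n = len(worker_grads)
    dim = len(worker_grads[0])
    assert dim % n == 0, "sketch assumes evenly sized chunks"
    chunk = dim // n
    # bufs[r][c] is chunk c as currently held by worker r.
    bufs = [[list(w[c * chunk:(c + 1) * chunk]) for c in range(n)]
            for w in worker_grads]

    # Scatter: in step s, worker r sends chunk (r - s) % n to worker r + 1,
    # which adds it to its own copy. Sends are snapshotted first so that all
    # workers act on the state from the start of the step.
    for s in range(n - 1):
        sends = [(r, (r - s) % n, list(bufs[r][(r - s) % n])) for r in range(n)]
        for r, c, data in sends:
            dst = (r + 1) % n
            bufs[dst][c] = [a + b for a, b in zip(bufs[dst][c], data)]

    # Worker r now holds the fully reduced chunk (r + 1) % n.
    # Gather: in step s, worker r forwards chunk (r + 1 - s) % n to worker
    # r + 1, which overwrites its stale copy.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n, list(bufs[r][(r + 1 - s) % n]))
                 for r in range(n)]
        for r, c, data in sends:
            bufs[(r + 1) % n][c] = data

    return [[x for part in b for x in part] for b in bufs]
```

Each worker only ever talks to its ring successor, and each phase takes n - 1 steps, so the number of steps grows linearly with the number of workers.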

Slide17

HiPS Design

Map Logic View and Physical Structure

Flexible (Topology-Aware)

Hierarchical (Efficient)

Slide18

HiPS Design

HiPS in BCube

Slide19

HiPS Design

HiPS in BCube

Slide20

HiPS Design

HiPS in BCube

Slide21

HiPS Design

HiPS in BCube (Server <01>)

Slide22

HiPS Design

HiPS in BCube
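The transcript does not preserve the diagrams for these slides, so the following is only a generic sketch of the hierarchical idea, not the authors' algorithm: servers are identified by multi-digit addresses (such as <01>), and a sum is completed one address digit at a time, with each stage involving only the servers that differ in that digit. The function name `hierarchical_sum` and the full-copy aggregation within each group are assumptions made for brevity.

```python
def hierarchical_sum(values):
    """Stage-by-stage sum over servers addressed by digit tuples.

    values: dict mapping an address tuple, e.g. (0, 1), to a list of numbers.
    Returns a dict with the same keys; every server ends with the global sum.
    """
    state = {addr: list(v) for addr, v in values.items()}
    digits = len(next(iter(state)))

    for pos in range(digits):
        # Group servers whose addresses differ only in digit `pos`.
        groups = {}
        for addr in state:
            key = addr[:pos] + addr[pos + 1:]
            groups.setdefault(key, []).append(addr)

        # Within each group, every member ends up with the group's sum.
        new_state = {}
        for members in groups.values():
            total = [sum(col) for col in zip(*(state[a] for a in members))]
            for a in members:
                new_state[a] = list(total)
        state = new_state

    return state
```

With two-digit addresses over three values per digit, the first stage sums within groups of three and the second stage combines those partial sums, so no stage ever involves all nine servers at once.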

Slide23

HiPS Design

HiPS in Torus

Slide24

Theoretical Evaluation

Slide25

Theoretical Evaluation

Slide26

Theoretical Evaluation

Slide27

Future Work

Conduct Further Comparative Study

Integrate HiPS into DML systems

Slide28

Simulation Evaluation

GST Comparison with RDMA in Torus

GST Comparison with RDMA in BCube

NS-3 Simulation with VGG Workload

BCube: GST reduced by 37.5%∼61.9%

Torus: GST reduced by 49.6%∼66.4%

Slide29

Testbed Evaluation

System Instance of HiPS: BML

Add an OP in TensorFlow

9 Servers, each equipped with 2 RNICs (BCube(3,1))

MNIST and VGG19 as benchmarks

Ring Allreduce in Ring and Mesh-based (P2P) Sync in Fat-Tree as Baseline
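The testbed numbers line up with the standard BCube(n, k) definition: n^(k+1) servers, each with k+1 ports, where two servers share a level-i switch when their addresses differ only in digit i. A small sketch of that addressing follows; the helper names and the convention that the rightmost digit is level 0 are illustrative assumptions.

```python
from itertools import product


def bcube_servers(n, k):
    """All server addresses in BCube(n, k) as (k+1)-digit base-n tuples."""
    return list(product(range(n), repeat=k + 1))


def bcube_neighbors(server, n):
    """One-hop neighbors of a server, grouped by switch level.

    The rightmost digit is treated as level 0 (an assumed convention).
    """
    levels = {}
    for level in range(len(server)):
        pos = len(server) - 1 - level
        levels[level] = [
            server[:pos] + (d,) + server[pos + 1:]
            for d in range(n)
            if d != server[pos]
        ]
    return levels
```

For BCube(3,1), `bcube_servers(3, 1)` yields 9 addresses and each server has neighbors on exactly two levels, which is consistent with 9 servers carrying 2 RNICs each.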

Slide30

Testbed Evaluation

Slide31

Testbed Evaluation

18.7%~56.4%

Slide32

Ongoing Work

Conduct Further Comparative Study

Optimize HiPS in DML systems

More Cases of Network for AI

Slide33

Thanks!

NASP Research Group

https://nasp.cs.tsinghua.edu.cn/