Presentation Transcript

Slide 1

Network Requirements for Resource Disaggregation

Peter Gao (Berkeley), Akshay Narayan (MIT), Sagar Karandikar (Berkeley), Joao Carreira (Berkeley), Sangjin Han (Berkeley), Rachit Agarwal (Cornell), Sylvia Ratnasamy (Berkeley), Scott Shenker (Berkeley/ICSI)

Slide 2

Disaggregated Datacenters

Current datacenter: server-centric. Future datacenter: disaggregated?

[Diagram: server-centric racks of complete servers vs. a disaggregated rack of CPU, GPU, memory, and storage blades attached directly to the datacenter network]

Existing efforts: HP (The Machine), Intel (RSD), Facebook, Huawei (NUWA), SeaMicro, Berkeley (FireBox)

Slide 3

Disaggregation Benefits (Architecture Community)

- Overcome the memory capacity wall
- Higher resource density
- Simpler hardware design
- Relaxed power and capacity scaling

Slide 4

Network is the key

[Diagram: in a server-centric design, resources communicate over internal interconnects (QPI, SMI, PCI-e); in a disaggregated design, they communicate over the datacenter network]

Existing prototypes use specialized hardware, such as silicon photonics or PCI-e. Do we need specialized hardware?

Slide 5

- What end-to-end latency and bandwidth must the network provide for legacy apps?
- Do existing transport protocols meet these requirements?
- Do existing OS network stacks meet these requirements?
- Can commodity network hardware meet these requirements?

[Diagram: the end-to-end path Application → OS → Transport → NIC → Switch → NIC → Transport → OS → Remote Resource]

Findings: commodity hardware solutions may be sufficient; the current OS and network stack are not, though feasible solutions exist. We evaluate worst-case performance degradation.

Slide 6

Assumptions

- Limited cache-coherence domain
- Small amount of local cache (how much?)
- Page-level remote memory access
- Block-level distributed data placement
- Scale: rack-scale? Datacenter-scale?

[Diagram: CPU, memory, and storage blades attached to the datacenter network]

Slide 7

Methodology: Workload Driven

- 10 workloads on 8 applications
- ~125 GB input data
- 5 m3.2xlarge EC2 nodes, Virtual Private Cloud enabled
- Goal: derive latency and bandwidth requirements

Workloads: Wordcount, Sort, Pagerank, Collaborative Filtering (batch processing); Key-value Store, SQL, Streaming (interactive)

Applications: Spark, Hadoop, Timely Dataflow, Graphlab, Memcached, HERD, Spark SQL, Spark Streaming

Slide 8

Disaggregated Datacenter Emulator

- Partition each machine's memory into local and remote, both backed by the machine's own RAM
- Local RAM is free to access; emulated remote RAM is reached via a special swap device that handles page faults
- The swap device injects latency and bandwidth constraints: delay = latency + request size / bandwidth (sketched below)
- Akin to a dedicated link between CPU and memory
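To make the delay model concrete, here is a minimal sketch in Python. This is not the authors' implementation; the function names are hypothetical, and the 3 us / 40 Gbps parameters are illustrative points from the slides' sweep.

```python
import time

# Emulated link parameters (illustrative points from the slides' sweep):
LATENCY_S = 3e-6          # 3 us one-way latency
BANDWIDTH_BPS = 40e9      # 40 Gbps dedicated link
PAGE_SIZE_BYTES = 4096    # page-granularity remote memory access

def remote_access_delay(request_bytes: int,
                        latency_s: float = LATENCY_S,
                        bandwidth_bps: float = BANDWIDTH_BPS) -> float:
    """Delay model used by the emulated swap device:
    delay = latency + request size / bandwidth."""
    return latency_s + (request_bytes * 8) / bandwidth_bps

def handle_page_fault(page_bytes: int = PAGE_SIZE_BYTES) -> None:
    """On a fault to 'remote' memory, stall the faulting access by the
    modeled link delay before serving it from local RAM."""
    time.sleep(remote_access_delay(page_bytes))

# A 4 KB page over a 3 us / 40 Gbps link stalls for ~3.82 us:
print(f"{remote_access_delay(4096) * 1e6:.2f} us")
```

For a 4 KB page, the bandwidth term contributes under 1 us at 40 Gbps, so latency, not bandwidth, dominates the per-page access cost in this model.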

Slide 9

Latency and Bandwidth Requirements

[Plots: per-application performance degradation for remote-memory latencies of 1 us, 5 us, and 10 us, each at 10, 40, and 100 Gbps; the 5% degradation level is marked. Note: delay = latency + request size / bandwidth.]

~3 us latency / 40 Gbps bandwidth is enough, ignoring queueing delay.

Slide 10

Understanding Performance Degradation

[Plot: degradation vs. application memory bandwidth for Spark Streaming Wordcount, Memcached YCSB, Graphlab CF, Hadoop Sort, Hadoop Wordcount, Timely Pagerank, HERD YCSB, SparkSQL BDB, Spark Sort, and Spark Wordcount]

Performance degradation is correlated with application memory bandwidth.
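As an illustration of how such a claim can be checked, the sketch below computes a Pearson correlation over per-application pairs. The numbers are placeholders, not the paper's measurements.

```python
from statistics import correlation  # Pearson by default; Python 3.10+

# Hypothetical per-application measurements (placeholders, NOT the paper's
# data): memory bandwidth in GB/s and degradation in percent.
mem_bw_gb_s = [1.2, 2.5, 4.0, 6.8, 9.5]
degradation_pct = [2.0, 4.5, 7.0, 13.0, 20.0]

# A coefficient near +1 supports the slide's claim that
# memory-bandwidth-hungry applications degrade the most.
print(f"Pearson r = {correlation(mem_bw_gb_s, degradation_pct):.2f}")
```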

Slide 11

[Diagram: Application → OS → Transport → NIC → Switch → NIC → Transport → OS → Remote Resource]

Requirements so far: 3 us end-to-end latency, 40 Gbps dedicated link (no queueing delay).

Slide 12

Transport Simulation Setting

[Pipeline: special swap-device instrumentation → flow trace → network simulator → flow completion time distribution]

Conclusion: we need new transport protocols.
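To illustrate the pipeline's last two stages, here is a minimal sketch, not the authors' simulator: it replays a flow trace over a single FIFO link so that queueing delay shows up in the flow completion time (FCT) distribution. The trace format, names, and parameters are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    start_s: float   # time the flow (remote memory request) is issued
    size_bytes: int  # bytes to transfer

def ideal_fct(flow: Flow, latency_s: float = 3e-6,
              bandwidth_bps: float = 40e9) -> float:
    """FCT on an idealized dedicated link: latency + size / bandwidth."""
    return latency_s + flow.size_bytes * 8 / bandwidth_bps

def fct_distribution(trace: list[Flow]) -> list[float]:
    """Serialize flows over one shared FIFO link to expose queueing delay;
    a real simulator would model the full fabric and transport protocol."""
    fcts, link_free_at = [], 0.0
    for f in sorted(trace, key=lambda fl: fl.start_s):
        start = max(f.start_s, link_free_at)   # wait for the link
        finish = start + ideal_fct(f)          # then transmit
        link_free_at = finish
        fcts.append(finish - f.start_s)
    return fcts

trace = [Flow(0.0, 4096), Flow(1e-6, 4096), Flow(1e-6, 65536)]
print(sorted(fct_distribution(trace)))
```

Even this toy model shows FCTs inflating once flows contend for a link: that is the queueing delay the dedicated-link assumption hides, and one reason efficient low-latency transports (see slide 20) matter.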

Slide 13

Application Performance Degradation

[Plots: degradation on a 40 Gbps network and a 100 Gbps network, each comparing the no-queueing baseline against rack-scale and datacenter-scale deployments with queueing delay; the ~5% degradation level is marked]

With a 100 Gbps network: datacenter scale suffices for some apps, rack scale for others.

Slide 14

[Diagram: the end-to-end path with the transport layer highlighted]

The 40 Gbps dedicated link is replaced by an efficient transport over a 100 Gbps network; the 3 us end-to-end latency requirement remains.

Slide 15

Is 100 Gbps / 3 us achievable?

Slide 16

Feasibility of end-to-end latency within a rack

[Diagram: latency budget from Application to Remote Resource against the 3 us target: propagation 0.32 us, transmission 0.8 us, switching, data copying 2 us, OS 1.9 us]

*Numbers estimated optimistically based on existing hardware.
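As a sanity check on two of these components, transmission and propagation delay can be computed from first principles. The sketch below uses generic formulas with assumed parameters (a 4 KB page, ~5 ns/m signal propagation), not the authors' measurements.

```python
# First-principles estimates for two components of the latency budget.
# Assumed parameters (not from the paper): a 4 KB page and ~5 ns/m
# signal propagation in fiber/copper within a rack.
PAGE_BITS = 4096 * 8
PROP_NS_PER_M = 5.0

def transmission_us(line_rate_gbps: float, bits: int = PAGE_BITS) -> float:
    """Serialization delay: bits / line rate (Gbps -> bits per us)."""
    return bits / (line_rate_gbps * 1e3)

def propagation_us(distance_m: float) -> float:
    """Signal propagation delay over a cable of the given length."""
    return distance_m * PROP_NS_PER_M / 1e3

print(f"4 KB at 40 Gbps:  {transmission_us(40):.2f} us")   # ~0.82 us
print(f"4 KB at 100 Gbps: {transmission_us(100):.2f} us")  # ~0.33 us
print(f"2 m in-rack hop:  {propagation_us(2):.3f} us")     # ~0.010 us
```

The slide's 0.8 us transmission figure is consistent with sending a 4 KB page at 40 Gbps, which is one reason faster links and cut-through switching (next slides) shrink the budget.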

Slide 17

Feasibility of end-to-end latency within a rack

[Diagram: same budget, with a cut-through switch reducing switching delay to 0.48 us]

*Numbers estimated optimistically based on existing hardware.

Slide 18

Feasibility of end-to-end latency within a rack

[Diagram: same budget, additionally with CPU-NIC integration bringing NIC-side overhead to 1 us]

*Numbers estimated optimistically based on existing hardware.

Slide 19

Feasibility of end-to-end latency within a rack

[Diagram: with a cut-through switch (0.48 us), CPU-NIC integration (1 us), and RDMA cutting the data-copying and OS overheads, the budget fits within the 3 us target]

*Numbers estimated optimistically based on existing hardware.

Is it feasible to meet the target across the datacenter?

Slide 20

[Diagram: the end-to-end path Application → OS → Transport → NIC → Switch → NIC → Transport → OS → Remote Resource, annotated with the technology needed at each layer]

Meeting the requirements (3 us end-to-end latency; an efficient transport over a 100 Gbps network in place of a 40 Gbps dedicated link):

- Efficient transport: pFabric (SIGCOMM '13), pHost (CoNEXT '15)
- 100 Gbps networks and links: available
- Kernel bypass: RDMA is common
- CPU-NIC integration: coming soon
- Cut-through switches: common?

Slide 21

What's next?

Please refer to our paper for evaluations on improving application performance in disaggregated datacenters.

- Application design
- Rethinking the OS stack: storage, network stack
- Failure models
- Network fabric design

Slide 22

Thank You!

Peter X. Gao, Akshay Narayan, Sagar Karandikar, Joao Carreira, Sangjin Han, Sylvia Ratnasamy, Scott Shenker, Rachit Agarwal