Scaling Up/Out PowerPoint Presentation, PPT - DocSlides

2017-08-31

Slide1

Scaling Up/Out

Physical Design: Challenges and Opportunities

Guojie Luo (1, 3), gluo@pku.edu.cn; Wentai Zhang (1); Jiaxi Zhang (1); Jason Cong (2, 3), cong@cs.ucla.edu
1 School of EECS, Peking University
2 University of California, Los Angeles
3 PKU-UCLA Joint Research Institute

Slide2

- Capacity scaling enables more functionality in FPGA-based accelerators
- Scalable and customizable EDA tools/flows are needed
- We propose a distributed EDA framework, aiming at
  - scaling up the design size that can be processed efficiently
  - scaling out the number of machines on which existing EDA tools execute

Motivation

Slide3

GNU Radio compatible [T. Wang, et al., SIGARCH CAN’14]

GRT: a Reconfigurable SDR Platform

[Figure: GRT transmit chain (Scrambler, FEC Encoder, Interleaver, Constellation mapper, IFFT) with software (SW) and hardware (HW) versions of each block, the SW and HW sides connected over PCIe]

Slide4

Deep convolutional neural networks [C. Zhang, et al., FPGA’15]

CNN: FPGA Design Optimization

Slide5

FPGA acceleration by asynchronous parallelization [W. Zhang, et al., SPIE’15]

Medical Imaging: Simultaneous Reconstruction and Segmentation

Original: SSIM 0.9827, PSNR 29.3264 dB, MSE 76.5319
Proposed: SSIM 0.9816, PSNR 30.7908 dB, MSE 54.6263

[Figure: comparison of CPU, 8-thread CPU, GPU, and FPGA implementations]
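The slide reports MSE, PSNR, and SSIM but does not state how they were computed. Below is a minimal sketch of the first two. It assumes 8-bit images (peak value 255) flattened into equal-length pixel lists, and the function names are hypothetical. SSIM, which compares local means, variances, and covariances, is left out for brevity.

```python
import math


def mse(reference, test):
    """Mean squared error between two equal-length pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit pixels."""
    error = mse(reference, test)
    if error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / error)


ref = [10, 20, 30, 40]
rec = [12, 18, 32, 38]
error = mse(ref, rec)     # every pixel is off by 2, so MSE = 4.0
quality = psnr(ref, rec)
```

PSNR falls as MSE grows. This is consistent with the slide's numbers: the proposed result has both the lower MSE and the higher PSNR.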

Slide6

- Distributed design import from Vivado
- Customized EDA algorithms in Spark
- Strategy to write consistent design data

Framework Overview

[Figure: Spark runs customized EDA flows and customized distributed EDA algorithms over several server nodes, each hosting existing EDA platforms that provide design data + architecture information]

Slide7

Existing frameworks in academia for customized FPGA EDA tools

Related (but not distributed) Frameworks

Framework    Import & export of design data through
RapidSmith   XDL
Torc         XDL, EDIF
Tincr        EDIF, TCL

Slide8

- Examples
  - L. Wu. “Accelerating Physical Design Flow in Laker with TCL Applications and Third Party Tool Integration.” SNUG Taiwan 2015.
  - J. Friesen. “An approach for better debuggability of TCL-driven EDA methodologies.” CDNLive Silicon Valley 2015.
- Design extraction

TCL, the de facto Standard Language

Slide9

- HMFlow [C. Lavin, et al., FCCM’11]
  - uses RapidSmith [C. Lavin, et al., FPT’10] as its backend
- Tincr [B. White, B. Nelson, ReConFig’14]
  - “its performance may make it better suited to more modest circuit modifications”

Data Extraction Overhead

Raw TCL commands

Slide10

An external wrapper loads the design, instead of implementing design loading as a map function

Import Design from Vivado

[Figure: Spark and an external wrapper drive four Vivado instances, Inst[0] to Inst[3], each with its own wrapper (wrp), exchanging an input file and an output file]
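One way to read the "external wrapper" idea: the driver starts each tool instance once, keeps it alive, and sends it commands over a pipe, rather than relaunching the tool inside every map call. The sketch below assumes a tool that reads one command per line on stdin and answers with one line on stdout. A real Vivado session (for example, `vivado -mode tcl`) prints prompts and multi-line output that a production wrapper would have to parse. `WrappedInstance`, the stand-in child script, and the command strings are hypothetical.

```python
import subprocess
import sys


class WrappedInstance:
    """Keep one long-running tool process alive and talk to it line by line."""

    def __init__(self, cmd):
        self.proc = subprocess.Popen(
            cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
        )

    def send(self, line):
        # Hand one command to the tool without waiting for it to finish.
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()

    def receive(self):
        # Block until the tool answers with one line.
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        # Closing stdin lets the tool exit; return its exit status.
        self.proc.stdin.close()
        status = self.proc.wait()
        self.proc.stdout.close()
        return status


# Stand-in for a Tcl-driven tool: acknowledges every command it receives.
FAKE_TOOL = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    print('done: ' + line.strip(), flush=True)\n"
)

instances = [WrappedInstance([sys.executable, "-c", FAKE_TOOL]) for _ in range(4)]
for i, inst in enumerate(instances):
    inst.send(f"open_checkpoint design_{i}.dcp")  # all four start working concurrently
replies = [inst.receive() for inst in instances]
statuses = [inst.close() for inst in instances]
```

Splitting `send` from `receive` lets every instance load its design at the same time. The driver only waits when it collects the replies.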

Slide11

- Update, run, kill, and replicate
  - run: execute existing steps in Vivado
- Prepare data for the design read in the next iteration

Export Design and Update Vivado Instance

[Figure: Vivado instances on server nodes holding design data versions v0.1, v0.2, and v0.3, illustrating the write, kill, run, and replicate operations]
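The diagram can be read as a small protocol for keeping replicated design data consistent:

1. A write updates one instance.
2. Stale instances are killed.
3. The remaining instance runs the next step.
4. The up-to-date state is replicated back onto every slot.

The toy model below encodes that reading. The class, its methods, and the rule that the newest version wins are assumptions for illustration. In practice, replication would go through a process checkpoint/restore mechanism such as CRIU rather than through object copies.

```python
from dataclasses import dataclass


def version_key(version):
    # "v0.12" -> (0, 12), so versions compare numerically rather than as strings.
    return tuple(int(part) for part in version.lstrip("v").split("."))


@dataclass
class Instance:
    """Hypothetical handle for one Vivado process and the design data it holds."""
    node: str
    version: str


class InstancePool:
    def __init__(self, nodes, per_node, version="v0.1"):
        self.nodes = list(nodes)
        self.per_node = per_node
        self.instances = [Instance(n, version) for n in self.nodes for _ in range(per_node)]

    def write(self, index, new_version):
        # Update the design data in a single instance; all others become stale.
        self.instances[index].version = new_version

    def kill_stale(self):
        # Drop every instance that does not hold the newest version.
        newest = max((i.version for i in self.instances), key=version_key)
        self.instances = [i for i in self.instances if i.version == newest]

    def run(self, new_version):
        # Execute an existing Vivado step on each surviving instance.
        for inst in self.instances:
            inst.version = new_version

    def replicate(self):
        # Clone the surviving state until each node again hosts per_node instances.
        version = self.instances[0].version
        self.instances = [Instance(n, version) for n in self.nodes for _ in range(self.per_node)]

    def versions(self):
        return sorted({i.version for i in self.instances}, key=version_key)


pool = InstancePool(["node0", "node1"], per_node=2)   # four instances at v0.1
pool.write(0, "v0.2")
pool.kill_stale()                                     # only the v0.2 instance survives
survivors = len(pool.instances)
pool.run("v0.3")
pool.replicate()                                      # four instances at v0.3 again
```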

Slide12

- Checkpoint/Restore in Userspace (CRIU), https://www.criu.org
  - relies on the /proc file system: file descriptors, pipe parameters, memory maps, etc.
  - checkpoint step 1: collect the process tree and freeze it
  - checkpoint step 2: collect resources and dump them
- Example: a Vivado instance with a 3M-cell design open
  - Checkpoint time: ~40 s, dumping 13 GB of data for restore
  - Network transfer: ~130 s at 1000 Mbps; could be faster with high-end network storage
  - Restore time: ~10 s

Process Replication
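The example's numbers can be sanity-checked with simple arithmetic. Assuming decimal units (1 GB = 8000 Mb), moving the 13 GB dump over a 1000 Mbps link takes at least 13 × 8000 / 1000 = 104 s. The measured ~130 s therefore corresponds to roughly 80% link utilization, and one full replication costs about 40 + 130 + 10 = 180 s. The helper below is a hypothetical back-of-the-envelope estimator, not part of CRIU.

```python
def replication_time(dump_gb, link_mbps, checkpoint_s, restore_s, efficiency=1.0):
    """Return (transfer_s, total_s) for checkpoint -> network copy -> restore.

    Assumes decimal units (1 GB = 8000 Mb) and a link that sustains
    `efficiency` times its nominal bandwidth.
    """
    transfer_s = dump_gb * 8000 / (link_mbps * efficiency)
    return transfer_s, checkpoint_s + transfer_s + restore_s


ideal_transfer, ideal_total = replication_time(13, 1000, 40, 10)
measured_efficiency = ideal_transfer / 130          # measured transfer was ~130 s
_, measured_total = replication_time(13, 1000, 40, 10, measured_efficiency)
```

Faster network storage shrinks only the transfer term. Checkpoint and restore still add about 50 s per replica.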

Slide13

design            #cells   description
SLAM              87K      spherical coordinates algorithm for SLAM
bitcoin_miner     222K     dual-core version of the bitcoin FPGA miner
gaussianblur_d1   466K     one of the pipelined loops for 3D Gaussian convolution
gaussianblur      3068K    pipelined 3D Gaussian convolution

Accelerating TCL Parser

design            TCL read time (min)   decr.   memory (GB)   incr.
SLAM              1.0                   4.0×    6.4           3.2×
bitcoin_miner     2.1                   3.9×    8.0           3.2×
gaussianblur_d1   6.9                   2.9×    11.0          3.0×
gaussianblur      34.0                  4.9×    57.0          2.5×
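The summary credits this speedup to parallelizing TCL parsing. Below is a minimal sketch of the chunk, parse, and merge structure. It assumes the extracted design data arrives as text lines of the form `<cell> <site>`; this format is hypothetical, since the slides do not show theirs. A thread pool keeps the sketch short, but CPython threads do not execute Python bytecode in parallel, so a real speedup would need `ProcessPoolExecutor` or separate processes. Holding every chunk and its partial result at once is also one plausible contributor to the higher memory use in the table.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_chunk(lines):
    """Parse lines of the assumed form '<cell> <site>' into a dict."""
    placement = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        cell, site = line.split(maxsplit=1)
        placement[cell] = site
    return placement


def parallel_parse(text, workers=4):
    lines = text.splitlines()
    chunk = max(1, -(-len(lines) // workers))       # ceiling division
    chunks = [lines[i:i + chunk] for i in range(0, len(lines), chunk)]
    merged = {}
    # Swap in ProcessPoolExecutor for real CPU parallelism in CPython.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(parse_chunk, chunks):
            merged.update(partial)
    return merged


report = "\n".join(f"cell_{i} SLICE_X{i}Y0" for i in range(10))
placement = parallel_parse(report, workers=3)
```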

Slide14

- Operations
  - {<k1,v1>} = {<k0,v0>}.map(foo)
  - {<k,v>}.reduce(bar)
  - …

Distributed Compute Engine
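The two operations on the slide can be modeled in a few lines of plain Python to show their semantics. `map` applies `foo` to every key-value pair independently, and the reduction combines values with `bar`. This sketch reduces per key, which is what Spark's `reduceByKey` does; Spark's plain `reduce` instead folds the whole dataset into a single value. The function names and the cell-to-tile example are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def map_pairs(pairs, foo):
    # {<k1,v1>} = {<k0,v0>}.map(foo): each pair is transformed independently,
    # so the work can be split across machines without coordination.
    return [foo(k, v) for k, v in pairs]


def reduce_by_key(pairs, bar):
    # Group values by key, then fold each group with the associative function bar.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return {k: reduce(bar, vs) for k, vs in groups.items()}


# Hypothetical example: per-cell wirelength contributions summed per placement tile.
tile_of = {"c1": "t0", "c2": "t1", "c3": "t0"}
cells = [("c1", 5), ("c2", 7), ("c3", 2)]
per_tile = reduce_by_key(map_pairs(cells, lambda cell, wl: (tile_of[cell], wl)),
                         lambda a, b: a + b)
```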

Slide15

Wrap the point tool in a stateless function

Use Existing Point Tool as a Map Function

[Figure: in a Spark map, a wrapper running in a local thread creates the input file, invokes the point tool, and collects the output file]
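Below is a minimal sketch of wrapping a point tool as a stateless map function. Each call writes its record to a fresh input file, runs the tool, reads the output file, and cleans up. No state survives between calls, so any worker can process any record. The calling convention (the tool receives the input and output paths as its last two arguments) and the stand-in tool are assumptions. Under Spark, the same function could be passed to `rdd.map`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def make_point_tool_map(tool_cmd):
    """Return a stateless function record -> tool output (hypothetical wrapper)."""

    def run_point_tool(record):
        # The temporary directory is deleted on exit, leaving no state behind.
        with tempfile.TemporaryDirectory() as workdir:
            in_path = Path(workdir) / "input.txt"
            out_path = Path(workdir) / "output.txt"
            in_path.write_text(record)                                   # create input file
            subprocess.run(tool_cmd + [str(in_path), str(out_path)], check=True)
            return out_path.read_text()                                  # collect output file

    return run_point_tool


# Stand-in point tool: upper-cases its input file into its output file.
FAKE_POINT_TOOL = [
    sys.executable, "-c",
    "import sys, pathlib; p = pathlib.Path; "
    "p(sys.argv[2]).write_text(p(sys.argv[1]).read_text().upper())",
]

results = list(map(make_point_tool_map(FAKE_POINT_TOOL), ["tile_0", "tile_1"]))
```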

Slide16

- Toy algorithm: distributed detailed placement
  - manual partition into non-conflicting regions
  - N×N partition: degree of parallelism = N

Demonstrational Example
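The slide does not say how the N×N tiles are scheduled to obtain N-way parallelism. One schedule that yields exactly N assumes that tiles in the same row or column may conflict and groups them cyclically along "diagonals". Batch k takes tile (i, (i + k) mod N) for every row i. The N tiles in a batch then share no row or column, and the N batches together cover all N² tiles. The function below is a hypothetical illustration of that assumption.

```python
def diagonal_batches(n):
    """Split the N x N tile grid into N batches of N mutually non-conflicting tiles.

    Assumption: tiles sharing a row or a column may conflict, so each batch
    holds exactly one tile per row and per column. Batches run one after
    another; the tiles inside a batch can run in parallel (e.g., one map task each).
    """
    return [[(row, (row + k) % n) for row in range(n)] for k in range(n)]


batches = diagonal_batches(3)
# [[(0, 0), (1, 1), (2, 2)], [(0, 1), (1, 2), (2, 0)], [(0, 2), (1, 0), (2, 1)]]
```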

DP tile
Sliding window in the map operator
- wirelength-driven: Δwirelength invariant
- delay-driven: may need slack distribution before DP
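Inside each DP tile, the sliding window can be realized with the common local-reordering move from detailed placement. A window of `width` adjacent cells slides along a row; for each position, every permutation of those cells is tried over the same slots, and the one with the smallest half-perimeter wirelength (HPWL) is kept. This is an assumed stand-in, since the slides do not specify their DP move. The sketch assumes equal-width cells and recomputes HPWL over all nets for clarity. A practical version would evaluate only the Δwirelength of nets touching the window.

```python
from itertools import permutations


def hpwl(nets, pos):
    """Half-perimeter wirelength; nets are lists of cell names, pos maps cell -> (x, y)."""
    total = 0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def window_reorder(row, pos, nets, width=3):
    """Reorder cells of one row (left to right), updating `pos`; return the new order."""
    row = list(row)
    for start in range(len(row) - width + 1):
        window = row[start:start + width]
        slots = sorted(pos[c] for c in window)      # locations the window occupies
        best, best_cost = window, hpwl(nets, pos)
        for perm in permutations(window):
            trial = dict(pos)
            trial.update(zip(perm, slots))
            cost = hpwl(nets, trial)
            if cost < best_cost:
                best, best_cost = list(perm), cost
        pos.update(zip(best, slots))
        row[start:start + width] = best
    return row


# Movable cells a, b, c sit at x = 0, 1, 2; q and r are fixed pins outside the row.
positions = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "q": (3, 0), "r": (-1, 0)}
nets = [["a", "q"], ["c", "r"]]
before = hpwl(nets, positions)          # 3 + 3 = 6
order = window_reorder(["a", "b", "c"], positions, nets)
after = hpwl(nets, positions)           # 1 + 1 = 2
```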

Slide17

- Re-architecting EDA platforms by adopting new infrastructure from the big data ecosystem
  - Some components are also essential for EDA flows: virtualization, storage, databases, cluster services, etc.
- A good attempt for academia and education
  - Conventional CMOS design will remain challenging, and the market is unlikely to shrink
  - Even with slowing adoption of new technology nodes, many traditional EDA problems are still not solved adequately
  - Maintain the pipeline of new talent into traditional EDA areas
- A framework for adopting university research in industry
  - High-quality tools and algorithms from recent ISPD/ICCAD contests
  - Leave the decision of trying new ideas to customers instead of EDA vendors

Challenges & Opportunities

Some ideas are inspired by the 2014 CCC report on Extreme Scale Design Automation

Slide18

- Standard interfaces
  - Very challenging! There are already tons of data formats
    - new devices, new technologies, new design rules, etc.
  - In contrast, HTTP was documented in 1991 and stabilized in 1999
  - Ideal: stable for core EDA algorithms (synthesis, P&R)
  - Extendable
    - for multiple objectives: timing, power, congestion, etc.
    - upward and downward, for system-level design and DFM, respectively
  - How does the big data community handle unstructured data?
- Flow composition
  - Lower the barrier for universities to innovate in design flows in addition to point tools
  - Enable design companies to adopt “open-source” flows

Challenges & Opportunities

Slide19

- Last but not least, scalable EDA algorithms
  - Complicated hardware: cluster + multi-core + GPU (+FPGA?)
  - Take advantage of the progress in the big data ecosystem, e.g., Spark and Tachyon
    - cons: slower than the traditional message-passing model
    - pros: better at handling node failure and data replication; lower the barrier to writing correct distributed algorithms; scalable in the long run

Challenges & Opportunities

Slide20

- A strong need for fast and/or customized EDA tools
- A proposed distributed framework on top of commercial EDA platforms
  - TCL, the de facto standard, is slow in data extraction
  - Speed up TCL parsing by parallelization
  - Capability to integrate existing point tools
- Challenges and opportunities
  - Exploration of such a framework for research and education

Summary

Slide21

Thank you!

Slide22

Spark

sc = new SparkContext
f = sc.textFile("…")
f.filter(…).count()
...

[Figure: Spark architecture. Your program talks to the Spark client (app master: RDD graph, scheduler, block tracker, shuffle tracker), which works with the cluster manager and Spark workers (task threads, block manager) that read from HDFS, HBase, …]

Slide23

A lightweight virtualization technique that enables replicating two instances of Vivado on the same physical machine

Linux Container

[Figure: virtual machine vs. container. VM stack: apps 1 to 3 with their libs on guest OSes, a hypervisor, the host OS, and infrastructure. Container stack: apps 1 to 3 with their libs on the Docker engine, the host OS, and infrastructure]

