Learning-Based Prediction of Embedded Memory Timing Failure - PowerPoint Presentation

Uploaded On 2017-09-19


Presentation Transcript

Slide1

Learning-Based Prediction of Embedded Memory Timing Failures During Initial Floorplan Design

Wei-Ting J. Chan, Kun Young Chung, Andrew B. Kahng, Nancy D. MacDonald and Siddhartha Nath

Slide2

Outline

Motivation

Previous Work

Our Work

Multiphysics Analysis

Modeling Methodology

Results

Conclusions

Slide3

Early Prediction of Slack Failure in SRAMs

Timing closure is time-consuming and complex at advanced nodes, which significantly increases turnaround time

Multiphysics effects (IR drop, thermal, etc.) affect timing closure

Floorplanning with SRAMs is complicated

  Creates placement and routing blockages

  Makes timing unpredictable at the post-P&R stage

Early prediction of post-P&R slack can reduce design cost and turnaround time

Post-P&R timing estimation at the floorplan stage is challenging due to many factors

  Wire delay must be estimated without information on spatial embedding

  Gate delay must be estimated without information on buffering

No tool predicts post-P&R slack at an early design stage

Slide4

Single vs. Multiple Physics

Multiphysics STA: performing STA with more than one “physics”

Examples of multiple physics: IR, thermal, reliability, crosstalk, etc.

Design teams can achieve more accurate timing results by closing multiphysics analysis loops

But multiphysics results are non-trivial to predict in early stages
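One way to picture "closing a multiphysics analysis loop" is as a fixed-point iteration between timing analysis and IR-drop analysis. The sketch below is illustrative only: `timing_analysis` and `ir_analysis` are hypothetical stand-ins with made-up linear models, not interfaces of any STA or IR-drop tool, and every constant is arbitrary.

```python
def timing_analysis(ir_drop_mv):
    """Hypothetical stand-in for an STA run: slack (ps) shrinks as IR drop grows."""
    return 120.0 - 1.5 * ir_drop_mv


def ir_analysis(slack_ps):
    """Hypothetical stand-in for an IR-drop run driven by the latest timing result."""
    return 20.0 + 0.02 * (120.0 - slack_ps)


def close_loop(max_loops=10, tol_ps=0.1):
    """Alternate the two analyses until slack changes by less than tol_ps."""
    ir_drop = 0.0
    slack = timing_analysis(ir_drop)  # baseline without IR drop
    history = [slack]
    for _ in range(max_loops):
        ir_drop = ir_analysis(slack)          # IR drop seen under current timing
        new_slack = timing_analysis(ir_drop)  # timing re-run with that IR drop
        history.append(new_slack)
        if abs(new_slack - slack) < tol_ps:
            break
        slack = new_slack
    return history


history = close_loop()
print(history)  # slack after each pass; the first entry is the no-IR baseline
```

In this toy model the slack settles after a few passes; in a real flow each pass is a full tool run, which is why predicting the converged value early is attractive.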

[Figure: SRAM slack (ps) versus implementation index for SRAM #1 and SRAM #5 under No IR, Static IR, and Dynamic IR (1st through 4th loop) analyses, with slack differences of 25ps and 29ps annotated]

Slide5

Challenge: Sensitivity of Slack to Spacing between Memories

The spacing (channel width) between memories is varied in steps of 10μm

The difference in slack can be larger than 300ps at a spacing of 10μm due to congestion, buffer placement, etc.

Slack values vary in a highly nonobvious and/or noisy manner as the spacing is changed

[Figure: floorplan with blockages, the sram_spacing channel, and the placement region for standard cells]

Slide6

Challenge: Sensitivity of IR Drop Map to Power Pad Locations

Distribution density and location choices of power pads affect the IR drop map

In (a), the IR map has very few IR drop hotspots for uniformly placed pads

In (b) and (c), the IR maps have more hotspots due to fewer power pads

[Figure: IR drop maps for three power pad configurations, panels (a), (b) and (c)]

Slide7

Challenge: Abstraction of P&R Stages and Tool Noise

Modeling must comprehend multiple stages of physical design

Our approach: an approximate function f to estimate the combined effects of netlist, constraints, placement, clock network synthesis, routing, extraction and timing

Slack (w/, w/o IR) = f(netlist, constraints, floorplan parameters)

f = ???

[Figure: design-flow diagram with stages Gate Netlist, Constraints, Floorplan/Powerplan, Placement, Clock network synthesis, Routing, Extraction/Timing, and Signoff (Extraction, Timing, Verification); labels mark the Modeling Scope, the Costly Iteration, and the output Slack (w/, w/o IR)]

Slide8

Previous Work

Post-P&R timing prediction from netlist

  adoption of physical synthesis [Alpert07]

  analytical buffered delay or wire models [Alpert06] [Jones94] [Vujkovic12]

  detection of congestion during synthesis [Clarke11]

  models using regression on existing synthesized designs [Karchmer12]

Thermal-aware delay model at floorplan [Kim12]

Closed-form SRAM latency model w.r.t. process variation [Yaldiz09]

P&R outcome prediction with machine learning

  Defect classification using SVMs [Huang10]

  Nonlinear ML models for CTS skew [Kahng13]

None of the above works addresses how to avoid suboptimal decisions at the floorplanning stage

Slide9

Our Work

First to propose a modeling methodology to predict post-P&R slack values at endpoints on SRAMs at the floorplan stage

Extend our methodology to predict multiphysics slack values of SRAMs at the floorplan stage

Enables early filtering and improvement of floorplans that would lead to timing failures at the post-layout and signoff stages

A new implementation of the Boosting technique based on SVMs as weak learners, with a weighting strategy for negative slack outcomes to avoid critical timing failures

Slide10

Multiphysics Analysis Flow

We consider IR drop (RedHawk) and crosstalk (PTSI) in our work

Not explored in this work: other multiphysics effects such as thermal and reliability will be explored in the future

[Figure: flow linking Timing Analysis (PTSI; files .sdc, .db, .v, .spef; Timing Windows per Pin, .timing) and IR Analysis (RedHawk; files .lib, .def, .spef, .tech; IR Drop per Instance, .tcl), plus a box for Temp, Reliability, Other Physics]

Slide11

Floorplanning and SRAM Placement

Floorplans are parameterized by core width and height, SRAM spacings, surrounding space, and widths of routing channels

[Figure: parameterized floorplan showing SRAMs, buffer screens, and blockages (which emulate SRAMs), annotated with core_h, core_w, blockage_h, blockage_w, screen_w, sram_spacing, sram_h and sram_w]

Slide12

PDN Design

We also parameterize PDN stripe pitches and stripe widths

[Figure: PDN layout with SRAMs, power pads, VDD and GND, labeled as follows]

Power ring: V = M9, H = M10 (width = 2µm)

Top mesh: V = M9, H = M10

Secondary (local) mesh: M6

Power rail: M2

Signal routing: M1, M2, M3, M4, M5, M6, M7, M8

SRAM: from M1 to ...

Slide13

Parameter Selection

Three categories of parameters

  Netlist structure

  Floorplan parameters

  Layout constraints

Sensitivity analysis

  Independent sweeping of each parameter

  Combined effects of parameters using variance inflation factor (VIF)
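For reference, the variance inflation factor of a parameter is 1/(1 - R²), where R² comes from regressing that parameter on all the others; values near 1 suggest little collinearity. The following is a minimal pure-Python sketch of that standard definition, not the authors' implementation. The sample rows (aspect ratio, utilization, SRAM spacing in μm) are invented for illustration.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(xs, y):
    """R^2 of an ordinary least-squares fit of y on the columns of xs (with intercept)."""
    rows = [[1.0] + list(r) for r in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
                 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(samples):
    """VIF per column: 1 / (1 - R^2) from regressing that column on the others."""
    out = []
    for j in range(len(samples[0])):
        y = [row[j] for row in samples]
        xs = [[v for k, v in enumerate(row) if k != j] for row in samples]
        out.append(1.0 / (1.0 - r_squared(xs, y)))
    return out


# Hypothetical sweep points: (aspect ratio, utilization, SRAM spacing in um)
samples = [
    (0.8, 0.40, 10.0), (0.9, 0.50, 24.0), (1.0, 0.45, 14.0),
    (1.1, 0.70, 20.0), (1.2, 0.60, 12.0), (1.0, 0.55, 18.0),
]
print(vif(samples))
```

A parameter with a large VIF is largely explained by the others, so sweeping it independently adds little information.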

Parameter | Range of Value(s)
Aspect ratio | 0.8~1.2
Utilization (std cells) | 40%~70%
PDN stripe width | 0.5~3.5μm
PDN stripe pitch | 40μm
SRAM spacing (channel width) | ~24μm
Buffer screen width | ~16μm
Routing metal layers | 7, 8
Memory placement | {Face-to-face, face-to-back}
Clock period | THEIA = 3.0~4.0ns; nova = 3.2~4.2ns; artificial = 2.0ns
Max transition | 200~280ps
Max fanout | 8~10
Threshold voltage mixes | {LVT}, {LVT, RVT}, {RVT}
Clock buffer sizes | {X32}, {X32, X24}, {X32, X24, X16}
NDRs on clock nets | 1W1S, 2W2S, 3W3S, 3W2S, 2W3S

Slide14

List of Parameters

Parameter | Description | Type | Per-memory?
N1 | Max delay across all timing paths at the post-synthesis stage | Netlist | Yes
N2 | Area of cells in the intersection of startpoint fanout and endpoint fanin cones of max-delay incident path | Netlist | Yes
N3 | Number of stages in the max-delay incident path | Netlist | Yes
N4, N5, N6 | Max, min and average product of #transitive fanin and #transitive fanout endpoints | Netlist | Yes
 | Width and height of memory | Netlist | Yes
FP1 | Aspect ratio of floorplan | Floorplan |
FP2 | Standard cell utilization | Floorplan | No
FP3, FP4 | PDN stripe width and pitch | Floorplan |
FP5 | Size of buffer screen around memories | Floorplan |
FP6 | Area of blockage (%) relative to floorplan area | Floorplan |
FP7, FP8 | Lower-left placement coordinates of memories | Floorplan | Yes
FP9, FP10 | Width, height of channels for memories | Floorplan | Yes
FP11 | #memory pins per channel | Floorplan | Yes
C1 | Sum of width and spacing of top-three routing layers after applying non-default rules (NDRs) | |
C2 | % cells that are LVT | Constraint |
C3, C4 | Max fanout of any instance in data and clock paths | Constraint |
C5, C6 | Max transition time of any instance in data and clock paths | Constraint |
 | Delay of the largest buffer expressed as FO4 delay | Constraint |
 | Clock period used for P&R expressed as FO4 delay | Constraint |
 | Ratio of clock periods used during synthesis and P&R | Constraint |

Slide15

Modeling Techniques and Flow

Inputs: parameters from the sequential graph of the netlist; parameters from floorplan context and constraints

Ground truth: slack reports from P&R and multiphysics STA

Models

  ANN with 1 input, 2 hidden, and 1 output layer

  SVM with RBF kernel

  LASSO with L1 regularization

  Boosting with SVM as weak learner

Combine using weights

Save model and exit

Slide16

Boosting with SVM

[Figure: input parameters (netlist, floorplan context, constraints) and P&R/multiphysics slack reports feed multiple SVM weak learners, whose outputs are summed (∑) to give the Boosting-predicted output]

Slide17

Experimental Setup and Testcases

Standard cells: 28nm FDSOI foundry technology

SRAMs: 28nm FDSOI foundry SRAMs

Synthesis: Design Compiler

P&R: IC Compiler

STA: PrimeTime SI (PTSI)

IR drop analysis: Apache RedHawk

Netlist | Clock Period (ns) | #Std Cells | #SRAMs | Logic Area (μm2) | SRAM Area (μm2)
THEIA v0 | | 147274 | | 157416 | 347252
THEIA v1 | 2.7 | 146505 | | 157068 | 40027
THEIA v2 | 3 | 146914 | 6 | 157012 | 48032
THEIA v3 | | 146243 | 8 | 156212 | 64043
THEIA v4 | | 146606 | | 155991 | 80054
nova | | 66031 | | 68970 | 25117
artificial | | 201015 | | 213075 | 14925

Slide18

A General “Tic-tac-toe” Floorplan

A floorplan is divided into an array of “tic-tac-toe” blocks

Three types of blocks are defined: memory, blockage, and standard cells

This enables generality and parameterizability, allows systematic exploration of a discrete design space, and captures how designers tend to floorplan their blocks
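To make the "discrete design space" concrete, the grid can be enumerated by assigning one of the three block types to each cell. The sketch below is only an illustration: the 3x3 grid size is an assumption suggested by the "tic-tac-toe" name, and the function name and the constraint of a fixed memory count are hypothetical choices made for this example.

```python
from itertools import product

BLOCK_TYPES = ("memory", "blockage", "stdcell")


def enumerate_floorplans(rows=3, cols=3, n_memories=2):
    """Yield every grid that contains exactly n_memories memory blocks."""
    for cells in product(BLOCK_TYPES, repeat=rows * cols):
        if cells.count("memory") == n_memories:
            # Reshape the flat assignment into a list of rows.
            yield [list(cells[r * cols:(r + 1) * cols]) for r in range(rows)]


candidates = list(enumerate_floorplans())
print(len(candidates))
for row in candidates[0]:
    print(row)
```

Even this small grid yields thousands of candidate floorplans, which is the kind of space a fast slack predictor could screen before running P&R.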

[Figure: tic-tac-toe floorplan grid with legend Memory, Blockage, STD cells]

Slide19

Example: Memory Placements

Implementation examples of tic-tac-toe

Implementation of cross / L / T-shaped floorplans

[Figure: example layouts with SRAM blocks]

Slide20

Simple-Minded Modeling Yields Large Errors

No apparent correlation between post-P&R and post-synthesis slack values

Modeling with only netlist parameters: worst-case error = 358ps; average error = 42ps

Technique | Worst-Case Error (ps) | Average Error (ps)
LASSO | 565 |
SVM (linear) | 412 |
SVM (w/ RBF kernel) | 358 | 42

Slide21

Post-P&R Slack Prediction

[Figure: error of slack prediction (ns) versus actual slack (ns)]

Errors in data points with negative slack are penalized more to avoid critical timing failures
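A simple way to express "penalized more" is an error metric in which samples with negative actual slack carry a larger weight. The sketch below is an assumption about the general idea, not the weighting strategy used in this work; the function name, the weight of 3.0 and the sample values are all hypothetical.

```python
def weighted_abs_error(actual_ps, predicted_ps, negative_weight=3.0):
    """Weighted mean absolute error; negative-slack samples count negative_weight times."""
    total = 0.0
    weight_sum = 0.0
    for a, p in zip(actual_ps, predicted_ps):
        w = negative_weight if a < 0 else 1.0
        total += w * abs(a - p)
        weight_sum += w
    return total / weight_sum


actual = [50.0, -20.0, 10.0, -5.0]      # hypothetical post-P&R slacks (ps)
predicted = [40.0, 0.0, 12.0, -10.0]    # hypothetical model outputs (ps)

print(weighted_abs_error(actual, predicted, negative_weight=1.0))  # plain MAE
print(weighted_abs_error(actual, predicted))                       # negative slack emphasized
```

A learner tuned against the weighted metric is pushed to fit failing endpoints more closely, at the cost of some accuracy on endpoints that pass comfortably.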

Worst error = 224ps

Average error = 4ps

Slide22

Multiphysics Slack Prediction

Annotate per-cell IR-drop from RedHawk in PTSI

Worst error = 253ps

Average error = 9ps

Slide23

Modeling Fidelity

False negatives = 3%: pessimistic predictions in which we provide guidance to change a floorplan when no change is actually required

False positives = 4%: our model incorrectly deems a floorplan to be good

[Figure: confusion matrix of actual versus predicted pass/fail outcomes, with the false-positive and false-negative cells marked and the counts 584 and 384 shown]

Positive slack data points:

Precision: tp/(tp + fp) = 93.3%

Recall: tp/(tp + fn) = 95.0%

Negative slack data points:

Precision: tn/(tn + fn) = 92.5%

Recall: tn/(tn + fp) = 90.1%
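The precision and recall figures follow from a 2x2 confusion matrix. As a rough check, the sketch below assumes that 584 and 384 are the true-positive and true-negative counts, and uses fp = 42 and fn = 31, which do not appear in the transcript; they are back-solved here from the reported percentages and should be read as an assumption.

```python
def fidelity(tp, fp, tn, fn):
    """Precision/recall for both classes, plus error shares of all data points."""
    total = tp + fp + tn + fn
    return {
        "precision_pos": tp / (tp + fp),
        "recall_pos": tp / (tp + fn),
        "precision_neg": tn / (tn + fn),
        "recall_neg": tn / (tn + fp),
        "false_positive_share": fp / total,
        "false_negative_share": fn / total,
    }


# tp and tn are taken from the slide; fp and fn are assumed (back-solved).
metrics = fidelity(tp=584, fp=42, tn=384, fn=31)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

With these counts the six values round to the percentages quoted on the slide, which supports reading "positive" as a floorplan that passes timing.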

Slide24

Conclusions

Early stage timing failure prediction and timing closure with multiphysics analyses are important

We present a machine learning-based methodology for the early stage timing failure prediction problem

  Worst-case error = 224ps (w/o multiphysics)

  Worst-case error = 253ps (w/ multiphysics)

We present a new implementation of Boosting based on SVMs as weak learners

Our ongoing work includes

  Applying our methodology to product/test engineering data from an SoC company

  Predicting defectivity in silicon and providing floorplan guidance to avoid such defectivity

Slide25

Acknowledgments

Work supported by Samsung Electronics

We thank P. Agrawal (ANSYS) and J.-A. Desroses (ST Microelectronics) for their help with setup and enablement of iterative DVD analysis and signoff timing flow

Slide26

Thank You!

Slide27

Backup