Slide1
Learning-Based Prediction of Embedded Memory Timing Failures During Initial Floorplan Design
Wei-Ting J. Chan, Kun Young Chung, Andrew B. Kahng, Nancy D. MacDonald and Siddhartha Nath
Slide2
Outline
Motivation
Previous Work
Our Work
Multiphysics Analysis
Modeling Methodology
Results
Conclusions
Slide3
Early Prediction of Slack Failure in SRAMs
Timing closure is time-consuming and complex at advanced nodes, which significantly increases turnaround time
Multiphysics effects (IR drop, thermal, etc.) affect timing closure
Floorplanning with SRAMs is complicated
  Creates placement and routing blockages
  Makes timing unpredictable at the post-P&R stage
Early prediction of post-P&R slack can reduce design cost and turnaround time
Post-P&R timing estimation at the floorplan stage is challenging due to many factors
  Wire delay must be estimated without information on spatial embedding
  Gate delay must be estimated without information on buffering
No tool predicts post-P&R slack at an early design stage
Slide4
Single vs. Multiple Physics
Multiphysics STA: performing STA with more than one “physics”
Examples of multiple physics: IR, thermal, reliability, crosstalk, etc.
Design teams can achieve more accurate timing results by closing multiphysics analysis loops
But multiphysics results are non-trivial to predict in early stages
[Figure: SRAM slack (ps) versus implementation index for SRAM #1 and SRAM #5; series: No IR, Static IR, Dynamic IR (1st, 2nd, 3rd and 4th loop); annotations: 25ps, 29ps]
Slide5
Challenge: Sensitivity of Slack to Spacing between Memories
The spacing (channel width) between memories is varied in steps of 10μm
The difference in slack can be larger than 300ps at a spacing of 10μm due to congestion, buffer placement, etc.
Slack values vary in a highly nonobvious and/or noisy manner as the spacing is changed
[Figure: floorplan with three blockages, the sram_spacing channel, and the placement region for standard cells; items numbered 1 to 5]
Slide6
Challenge: Sensitivity of IR Drop Map to Power Pad Locations
Distribution density and location choices of power pads affect the IR drop map
In (a), the IR map has very few IR drop hotspots for uniformly placed pads
In (b) and (c), the IR maps have more hotspots due to fewer power pads
[Figure: IR drop maps (a), (b) and (c)]
Slide7
Challenge: Abstraction of P&R Stages and Tool Noise
Modeling must comprehend multiple stages of physical design
Our approach: an approximate function f to estimate the combined effects of netlist, constraints, placement, clock network synthesis, routing, extraction and timing
Slack (w/, w/o IR) = f(netlist, constraints, floorplan parameters)
f = ???
[Figure: design flow with Gate Netlist, Constraints, Floorplan/Powerplan, Placement, Clock network synthesis, Routing, Extraction/Timing, and Signoff (Extraction, Timing, Verification), producing Slack (w/, w/o IR); labels: Modeling Scope, Costly Iteration]
Slide8
Previous Work
Post-P&R timing prediction from netlist
  adoption of physical synthesis [Alpert07]
  analytical buffered delay or wire models [Alpert06] [Jones94] [Vujkovic12]
  detection of congestion during synthesis [Clarke11]
  models using regression on existing synthesized designs [Karchmer12]
Thermal-aware delay model at floorplan [Kim12]
Closed-form SRAM latency model w.r.t. process variation [Yaldiz09]
P&R outcome prediction with machine learning
  Defect classification using SVMs [Huang10]
  Nonlinear ML models for CTS skew [Kahng13]
None of the above works addresses how to avoid suboptimal decisions at the floorplanning stage
Slide9
Our Work
First to propose a modeling methodology to predict post-P&R slack values at endpoints on SRAMs at the floorplan stage
Extend our methodology to predict multiphysics slack values of SRAMs at the floorplan stage
Enables early filtering and improvement of floorplans that would lead to timing failures at the post-layout and signoff stages
A new implementation of Boosting technique based on SVMs as weak learners and a weighting strategy for negative slack outcomes to avoid critical timing failures
Slide10
Not Explored in This Work
We consider IR drop (RedHawk) and crosstalk (PTSI) in our work
Other multiphysics effects such as thermal and reliability will be explored in the future
Multiphysics Analysis Flow
[Figure: analysis flow. Timing Analysis (PTSI): .sdc, .db, .v, .spef; Timing Windows per Pin (.timing). IR Analysis (RedHawk): .lib, .def, .spef, .tech; IR Drop per Instance (.tcl). Temp, Reliability, Other Physics.]
Slide11
Floorplanning and SRAM Placement
Floorplans are parameterized by core width and height, SRAM spacings, surrounding space, and widths of routing channels
[Figure: parameterized floorplan showing SRAMs, buffer screens, and blockages (emulate SRAMs); labeled dimensions: core_w, core_h, blockage_w, blockage_h, screen_w, hc, vc, sram_spacing, sram_w, sram_h]
Slide12
PDN Design
We also parameterize PDN stripe pitches and stripe widths
Power ring: V = M9, H = M10 (width = 2µm)
Top mesh: V = M9, H = M10
Secondary mesh: M6
Power rail: M2
SRAM: from M1 to M4
[Figure: metal stack with power pads (VDD, GND) and SRAMs. M1, M2, M3, M4, M5, M6, M7, M8: signal routing; M6: local meshes; M9, M10: top mesh; M9, M10: power rings]
Slide13
Parameter Selection
Three categories of parameters
  Netlist structure
  Floorplan parameters
  Layout constraints
Sensitivity analysis
  Independent sweeping of each parameter
  Combined effects of parameters using variance inflation factor (VIF)
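The VIF check named above can be illustrated with a short, hedged sketch. It relies on the standard identity that the VIF of parameter j equals the j-th diagonal entry of the inverse of the parameters' correlation matrix. The function names and the sample sweep values are hypothetical and are not taken from the slides; a common rule of thumb treats a VIF above roughly 5 to 10 as a sign of strong collinearity.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length parameter columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v in col] for col, m in zip(columns, means)]
    norms = [sqrt(sum(v * v for v in col)) for col in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical sweep samples (illustrative values only, not from the slides).
aspect_ratio = [0.8, 0.9, 1.0, 1.1, 1.2, 1.0]
utilization = [40, 70, 55, 45, 60, 50]
stripe_pitch = [7, 40, 20, 12, 30, 15]
names = ["aspect_ratio", "utilization", "stripe_pitch"]
for name, value in zip(names, vif([aspect_ratio, utilization, stripe_pitch])):
    print(f"{name}: VIF = {value:.2f}")
```

A parameter with a high VIF is largely explained by the others, so sweeping it independently adds little information to the model.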
Parameter | Range of Value(s)
Aspect ratio | 0.8~1.2
Utilization (std cells) | 40%~70%
PDN stripe width | 0.5~3.5μm
PDN stripe pitch | 7~40μm
SRAM spacing (channel width) | 6~24μm
Buffer screen width | 10~16μm
Routing metal layers | 7, 8
Memory placement | {Face-to-face, face-to-back}
Clock period | THEIA = 3.0~4.0ns; nova = 3.2~4.2ns; artificial = 2.0ns
Max transition | 200~280ps
Max fanout | 8~10
Threshold voltage mixes | {LVT}, {LVT, RVT}, {RVT}
Clock buffer sizes | {X32}, {X32, X24}, {X32, X24, X16}
NDRs on clock nets | 1W1S, 2W2S, 3W3S, 3W2S, 2W3S
Slide14
List of Parameters
Parameter | Description | Type | Per-memory?
N1 | Max delay across all timing paths at the post-synthesis stage | Netlist | Yes
N2 | Area of cells in the intersection of startpoint fanout and endpoint fanin cones of max-delay incident path | Netlist | Yes
N3 | Number of stages in the max-delay incident path | Netlist | Yes
N4, N5, N6 | Max, min and average product of #transitive fanin and #transitive fanout endpoints | Netlist | Yes
N7 | Width and height of memory | Netlist | Yes
FP1 | Aspect ratio of floorplan | Floorplan | No
FP2 | Standard cell utilization | Floorplan | No
FP3, FP4 | PDN stripe width and pitch | Floorplan | No
FP5 | Size of buffer screen around memories | Floorplan | No
FP6 | Area of blockage (%) relative to floorplan area | Floorplan | No
FP7, FP8 | Lower-left placement coordinates of memories | Floorplan | Yes
FP9, FP10 | Width, height of channels for memories | Floorplan | Yes
FP11 | #memory pins per channel | Floorplan | Yes
C1 | Sum of width and spacing of top-three routing layers after applying non-default rules (NDRs) | Constraint | No
C2 | % cells that are LVT | Constraint | No
C3, C4 | Max fanout of any instance in data and clock paths | Constraint | No
C5, C6 | Max transition time of any instance in data and clock paths | Constraint | No
C7 | Delay of the largest buffer expressed as FO4 delay | Constraint | No
C8 | Clock period used for P&R expressed as FO4 delay | Constraint | No
C9 | Ratio of clock periods used during synthesis and P&R | Constraint | No
Slide15
Modeling Techniques and Flow
[Figure: modeling flow. Inputs: parameters from sequential graph of netlist; parameters from floorplan context, constraints. Ground truth: slack reports from P&R, multiphysics STA. Models: ANN with 1 input, 2 hidden, 1 output layer; SVM with RBF kernel; LASSO with L1 regularization; Boosting with SVM as weak learner. Final steps: combine using weights; save model and exit.]
Slide16
Boosting with SVM
[Figure: input parameters (netlist, floorplan context, constraints) and P&R, multiphysics slack reports feed k SVM weak learners; labels W1, W2, ..., Wk and β1, β2, β3, ..., βk; a summation node (∑) combines the weak learners]
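The general idea of boosting, in which weak learners are fitted in sequence and combined as a weighted sum, can be sketched as below. This is a simplified stand-in and not the authors' implementation: the weak learner is a one-split regression stump instead of an SVM, each round fits the current residuals (squared-loss boosting), and a single fixed weight `beta` replaces the per-learner weights β1 to βk. The function names and the spacing/slack sample values are hypothetical.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression stump on one feature (stand-in weak learner)."""
    best = None
    # Every distinct value except the largest is a valid split threshold.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def boost(xs, ys, rounds=20, beta=0.5):
    """Stagewise boosting: each weak learner fits the current residuals."""
    base = mean(ys)
    learners = []
    predictions = [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        learners.append((beta, stump))
        predictions = [p + beta * stump(x) for x, p in zip(xs, predictions)]
    # Final model: base value plus the weighted sum of weak learner outputs.
    return lambda x: base + sum(b * h(x) for b, h in learners)


# Hypothetical data: SRAM spacing (um) versus post-P&R slack (ps).
spacing_um = [6, 8, 10, 12, 14, 16]
slack_ps = [-120, -100, -90, 30, 40, 55]
model = boost(spacing_um, slack_ps)
print([round(model(x), 1) for x in spacing_um])
```

Swapping the stump for an SVM regressor and adding per-sample weights would move this sketch closer to the scheme on the slide, but those details are not given in the source.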
Boosting-predicted output
Slide17
Experimental Setup and Testcases
Standard cells: 28nm FDSOI foundry technology
SRAMs: 28nm FDSOI foundry SRAMs
Synthesis: Design Compiler
P&R: IC Compiler
STA: PrimeTime SI (PTSI)
IR drop analysis: APACHE RedHawk
Netlist | Clock Period (ns) | #Std Cells | #SRAMs | Logic Area (μm²) | SRAM Area (μm²)
THEIA v0 | 3 | 147274 | 40 | 157416 | 347252
THEIA v1 | 2.7 | 146505 | 5 | 157068 | 40027
THEIA v2 | 3 | 146914 | 6 | 157012 | 48032
THEIA v3 | 3 | 146243 | 8 | 156212 | 64043
THEIA v4 | 3 | 146606 | 10 | 155991 | 80054
nova | 2 | 66031 | 5 | 68970 | 25117
artificial | 2 | 201015 | 6 | 213075 | 14925
Slide18
A General “Tic-tac-toe” Floorplan
A floorplan is divided into an array of “tic-tac-toe” blocks
Three types of blocks are defined: memory, blockage, and standard cells
This enables generality and parameterizability, enables systematic exploration of a discrete design space, and captures how designers tend to floorplan their blocks
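The discrete design space described above can be enumerated directly. The sketch below assumes a 3x3 grid (the slide says only "tic-tac-toe", so the grid size is an assumption) and uses hypothetical names for the three block types. With three choices for each of nine blocks there are 3^9 = 19683 candidate floorplans before any filtering.

```python
from itertools import product

# Hypothetical labels for the three block types named on the slide.
BLOCK_TYPES = ("memory", "blockage", "std_cells")


def enumerate_floorplans(rows=3, cols=3):
    """Yield every assignment of a block type to each grid position."""
    for assignment in product(BLOCK_TYPES, repeat=rows * cols):
        yield [assignment[r * cols:(r + 1) * cols] for r in range(rows)]


def count_memories(grid):
    """Number of grid positions assigned to memory."""
    return sum(cell == "memory" for row in grid for cell in row)


all_grids = list(enumerate_floorplans())
# Example filter: keep only floorplans with exactly two memory blocks.
two_memory_grids = [g for g in all_grids if count_memories(g) == 2]
print(len(all_grids), len(two_memory_grids))
```

In practice a filter like the one shown would encode design rules, such as the number of SRAMs in the netlist, before any floorplan is sent to P&R.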
[Figure legend: Memory, Blockage, STD cells]
Slide19
Example: Memory Placements
Implementation examples of tic-tac-toe
Implementation of cross / L / T-shaped floorplans
[Figure label: SRAM]
Slide20
Simple-Minded Modeling Yields Large Errors
No apparent correlation between post-P&R and post-synthesis slack values
Modeling with only netlist parameters
  Worst-case error = 358ps; average error = 42ps
Technique | Worst-Case Error (ps) | Average Error (ps)
LASSO | 565 | 87
SVM (linear) | 412 | 55
SVM (w/ RBF kernel) | 358 | 42
Slide21
Post-P&R Slack Prediction
[Figure: error of slack prediction (ns) versus actual slack (ns)]
Errors in data points with negative slack are penalized more to avoid critical timing failures
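One simple way to penalize negative-slack points more heavily is to give them a larger sample weight in the error metric, as in the sketch below. The weight value of 3.0, the function names, and the sample data are assumptions for illustration; the slides do not state the actual weighting strategy.

```python
def sample_weights(actual_slacks_ps, negative_weight=3.0):
    """Weight negative-slack points more heavily (weight value is an assumption)."""
    return [negative_weight if s < 0 else 1.0 for s in actual_slacks_ps]


def weighted_mean_abs_error(actual, predicted, weights):
    """Weighted mean absolute error, in the same units as the inputs."""
    total = sum(w * abs(a - p) for a, p, w in zip(actual, predicted, weights))
    return total / sum(weights)


# Hypothetical slack values in ps.
actual = [-50.0, 100.0]
predicted = [-40.0, 80.0]
weights = sample_weights(actual)
print(weighted_mean_abs_error(actual, predicted, weights))
```

With these weights, a 10ps miss on a failing endpoint counts as much as a 30ps miss on a passing one, which pushes a trained model toward accuracy where timing failures occur.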
Worst error = 224ps
Average error = 4ps
Slide22
Multiphysics Slack Prediction
Annotate per-cell IR-drop from RedHawk in PTSI
Worst error = 253ps
Average error = 9ps
Slide23
Modeling Fidelity
False negatives = 3%
  Pessimistic predictions, in which we give guidance to change a floorplan although no change is actually required
False positives = 4%
  Our model incorrectly deems a floorplan to be good
Confusion matrix (rows: predicted; columns: actual)
Predicted | Actual Pass | Actual Fail
Pass | 584 | 42 (false positives)
Fail | 31 (false negatives) | 384
Positive slack data points:
Precision: tp/(tp+fp) = 93.3%
Recall: tp/(tp+fn) = 95.0%
Negative slack data points:
Precision: tn/(tn+fn) = 92.5%
Recall: tn/(tn+fp) = 90.1%
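As a worked check, the reported percentages can be reproduced from the four counts on this slide. The assignment tp = 584, fp = 42, tn = 384, fn = 31 is inferred from the reported percentages and is not stated explicitly in the source. For the negative-slack class, precision divides by tn + fn and recall by tn + fp.

```python
# Counts from the slide; the tp/fp/tn/fn assignment is inferred, not stated.
tp, fp, tn, fn = 584, 42, 384, 31
total = tp + fp + tn + fn


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


metrics = {
    "positive_precision": pct(tp, tp + fp),
    "positive_recall": pct(tp, tp + fn),
    "negative_precision": pct(tn, tn + fn),
    "negative_recall": pct(tn, tn + fp),
    "false_negative_rate": pct(fn, total),
    "false_positive_rate": pct(fp, total),
}
for name, value in metrics.items():
    print(f"{name}: {value}%")
```

The last two entries divide by all 1041 data points, which matches the 3% and 4% figures quoted at the top of the slide.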
Slide24
Conclusions
Early-stage timing failure prediction and timing closure with multiphysics analyses are important
We present a machine learning-based methodology for the early-stage timing failure prediction problem
  Worst-case error = 224ps (w/o multiphysics)
  Worst-case error = 253ps (w/ multiphysics)
We present a new implementation of Boosting based on SVMs as weak learners
Our ongoing work includes
  Applying our methodology to product/test engineering data from an SoC company
  Predicting defectivity in silicon and providing floorplan guidance to avoid such defectivity
Slide25
Acknowledgments
Work supported by Samsung Electronics
We thank P. Agrawal (ANSYS) and J.-A. Desroses (ST Microelectronics) for their help with setup and enablement of iterative DVD analysis and signoff timing flow
Slide26
Thank You!
Slide27
Backup