Optimizing Stochastic Circuits for Accuracy-Energy Tradeoffs. Armin Alaghi (3), Wei-Ting J. Chan (1), John P. Hayes (3), Andrew B. Kahng (1,2), and Jiajia Li (1). UC San Diego, (1) ECE and (2) CSE Depts.
Slide1
Optimizing Stochastic Circuits for Accuracy-Energy Tradeoffs
Armin Alaghi (3), Wei-Ting J. Chan (1), John P. Hayes (3), Andrew B. Kahng (1,2), and Jiajia Li (1)
UC San Diego, (1) ECE and (2) CSE Depts.; (3) University of Michigan, EECS Dept.
Slide2
Outline
- Background and Previous Work
- Problem Statement in SC Physical Design
- Modeling Approach
- Optimization Approach
- Conclusions
Slide3
Motivation: Low Power Challenge
- Low-power design is a grand challenge
- Mobile devices must operate at extremely low power even as the performance requirements of applications grow
- Voltage scaling has slowed down in recent years
- Possible solution: employ new design paradigms to overcome these challenges and achieve performance improvements
[Figure (source: ITRS): 4 W mobile platform power requirement; 1 W SOC power requirement; slow performance improvement due to the power limit and slow voltage scaling]
Slide4
New Paradigm: Stochastic Computing (SC)
- Stochastic computing (SC) is a design paradigm that has recently gained attention due to its low power and error tolerance
- Random bit-streams are used to represent operands
- Complex arithmetic operations are implemented by simple logic circuits
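As a rough illustration of the multiplication example on this slide, the sketch below multiplies two bit-streams with a bitwise AND, which is the standard SC multiplier for independent streams. The helper names (`to_stream`, `value`, `sc_multiply`), the hand-picked 8-bit streams, and the stream length 4096 are assumptions made for illustration and are not taken from the slides.

```python
import random

def to_stream(p, n, rng):
    """Encode probability p as n random bits, each 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def value(stream):
    """Decode a bit-stream: the fraction of 1s."""
    return sum(stream) / len(stream)

def sc_multiply(a, b):
    """Multiply two stochastic numbers with one AND gate per bit pair."""
    return [x & y for x, y in zip(a, b)]

# Hand-picked 8-bit streams (hypothetical) that reproduce 4/8 x 6/8 = 3/8.
x1 = [0, 1, 0, 1, 0, 1, 0, 1]  # four 1s -> 4/8
x2 = [1, 0, 0, 1, 1, 1, 1, 1]  # six 1s  -> 6/8
z = sc_multiply(x1, x2)
print(value(x1), value(x2), value(z))

# With random streams the product is only approximate; longer streams
# reduce the random fluctuation.
rng = random.Random(0)
n = 4096
estimate = value(sc_multiply(to_stream(4 / 8, n, rng), to_stream(6 / 8, n, rng)))
print(estimate)
```

The hand-picked streams give exactly 3/8 only because their 1s overlap in three positions. For randomly generated streams the result fluctuates around 3/8.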
[Figure: stochastic multiplier, Z = X1 × X2, with input streams 4/8 and 6/8 and output 3/8 = 4/8 × 6/8]
Slide5
Error Tolerance, Precision, and Accuracy
- Inaccurate computation may occur
- Number to represent: 5/16
  - Stochastic: 0010 0001 0101 0010
  - Binary: 0.0101
- Bit-stream length grows exponentially with precision
- Redundant representation provides error tolerance
[Figure label: Correct = 3/8]
Slide6
Area, Computation Efficiency, and Delay
- SC: smaller area, longer computation latency, and shorter critical path
[Figure: stochastic multiplier vs. conventional binary multiplier, with the critical path marked]
Slide7
Application Context of SC
- Stochastic representation is similar to analog “pulse-mode” signals, as well as neural signals
- A stochastic computing circuit performs cheap pre-processing and saves resources
- Low-cost pre-processing between two domains
Slide8
Summary of Advantages/Disadvantages
Advantages
- Low-complexity circuits (allow massive parallelism)
- Error tolerance
- Robustness to voltage scaling (explored and improved in this work)
Disadvantages
- Long computation time
- Limited precision
- Expensive conversion circuits and storage elements
Slide9
Outline
- Background and Previous Work
- Problem Statement in SC Physical Design
- Modeling Approach
- Optimization Approach
- Conclusions
Slide10
Challenges, Problems, and Our Contributions
Challenges of stochastic computing (SC) design:
- The current digital design flow does not comprehend the tradeoff between accuracy and power in SC
- Physical implementation of SC circuits has not been well explored
Problems:
- How can error be estimated efficiently when exhaustive simulation is not feasible?
- Given a synthesized SC circuit, what is the physical implementation recipe?
Our contributions:
- We introduce the delay matching problem in SC
- We reduce the computation error by balancing delay paths
- We propose a Markov chain model for error estimation
Slide11
Stochastic Computing: Scope of Study
Design Metrics
- Energy
- Accuracy (a new model is proposed in this work)
- Circuit area
Design Parameters
- Computation latency (N)
- Frequency scaling (f)
- Voltage scaling (V)
- Netlist implementation (a new optimization is proposed in this work)
[Figure legend: metrics covered in this work]
Slide12
Outline
- Background and Previous Work
- Problem Statement in SC Physical Design
- Modeling Approach
- Optimization Approach
- Conclusions
Slide13
Balance of Path Delay Matters
Three scenarios of signal transitions:
- (A) Ideal: stable states of logic values are captured
- (B) Balanced delay: all the transitions arrive at the same time
- (C) Unbalanced delay: glitches or delayed transitions cause extra errors
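The three scenarios can be mimicked with a deliberately simplified model, sketched below. It assumes a single AND gate whose inputs x1 and x0 change once per cycle, and it assumes that an input whose path delay exceeds the sample time is still seen at its previous-cycle value when the output is captured. The function names, the bit patterns, and the delays in picoseconds are hypothetical. Glitches are not modeled, only delayed transitions.

```python
def captured_output(x1, x0, d1, d0, t_sample, initial=0):
    """Output bits captured at t_sample in each cycle.

    An input whose path delay is larger than t_sample has not arrived yet,
    so the gate still sees that input's value from the previous cycle.
    """
    out = []
    prev1 = prev0 = initial
    for b1, b0 in zip(x1, x0):
        seen1 = b1 if d1 <= t_sample else prev1
        seen0 = b0 if d0 <= t_sample else prev0
        out.append(seen1 & seen0)
        prev1, prev0 = b1, b0
    return out

def mismatches(got, ideal):
    """Number of cycles in which the captured bit differs from the ideal bit."""
    return sum(g != i for g, i in zip(got, ideal))

x1 = [1, 0, 1, 1, 0, 1, 0, 0]
x0 = [1, 1, 0, 1, 1, 0, 1, 0]
ideal = [a & b for a, b in zip(x1, x0)]

# (B) Balanced: both transitions arrive before the sample edge.
balanced = captured_output(x1, x0, d1=40, d0=40, t_sample=100)
# (C) Unbalanced: the x0 path is too slow, so stale x0 values are captured.
unbalanced = captured_output(x1, x0, d1=40, d0=120, t_sample=100)

print(mismatches(balanced, ideal), mismatches(unbalanced, ideal))
```

In this toy setup the balanced case reproduces the ideal output in every cycle, while the unbalanced case captures a mix of new and stale inputs in some cycles.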
[Figure: waveforms of x1, x0 and z against the sample clock. (A) Ideal: correct; (B) Balanced: correct; (C) Unbalanced: error]
Slide14
Markov Chain for Error Prediction
- Markov chains (MC) have previously been used to model sequential SC circuits
- We augment the states of the behavior model with delay-induced transition errors
  - Errors induced by glitches and delayed transitions
- Transition probabilities are trained on a small set of simulation results
- The stationary probability distribution is obtained by solving the Markov chain
  - C1, D1, G1 decide the expected output value
  - Used for error estimation
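A minimal sketch of the two steps named above, estimating transition probabilities from a short simulation trace and then solving for the stationary distribution, might look as follows. The state set (C0 alongside C1, D1, G1), the trace, and the reading of C1, D1 and G1 as the states in which the output is 1 are assumptions made for illustration. Power iteration is used here as one simple way to solve the chain, and the slides do not say which solver the authors use.

```python
from collections import Counter

def estimate_transitions(trace, states):
    """Estimate P[s][t] by counting consecutive state pairs in a trace."""
    counts = Counter(zip(trace, trace[1:]))
    P = {}
    for s in states:
        total = sum(counts[(s, t)] for t in states)
        # A state never left in the trace gets a self-loop so its row sums to 1.
        P[s] = {t: (counts[(s, t)] / total if total else float(t == s))
                for t in states}
    return P

def stationary(P, states, iters=1000):
    """Approximate the stationary distribution by repeated multiplication."""
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iters):
        pi = {t: sum(pi[s] * P[s][t] for s in states) for t in states}
    return pi

# Hypothetical states and trace; a real trace would come from simulation.
states = ["C0", "C1", "D1", "G1"]
trace = ["C0", "C1", "C1", "D1", "C0", "C0", "C1", "G1",
         "C1", "C0", "C1", "C1", "C0", "D1", "C1", "C0"]

P = estimate_transitions(trace, states)
pi = stationary(P, states)

# Assumption: the output is 1 in states C1, D1 and G1.
expected_output = pi["C1"] + pi["D1"] + pi["G1"]
print(pi, expected_output)
```

Power iteration converges here because the estimated chain is irreducible and has self-loops. A chain trained on too short a trace may not satisfy this, and would need a direct linear solve or more training data.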
[Figure annotation: the previous SC behavior model contains only correct states]
Slide15
Result: Markov Chain for Error Prediction
Model is accurate for larger errorsThe model is less accurate when error is smallPrecise prediction for high error magnitudeOn-going work: to improve the accuracy for small errorsSlide16
Outcome of Accuracy Model Study
Before our work:
- The SC behavior model is based on pre-layout simulation
- The SC behavior model did not consider the cell delay and wire delay contributed by physical implementation
Our work:
- Augment the SC behavior model by considering delayed transitions and glitches contributed by physical implementation
- Optimize the physical implementation by balancing the timing paths
[Figure labels: Correct; Correct; Error; balanced delays]
Slide17
Outline
- Background and Previous Work
- Problem Statement in SC Physical Design
- Modeling Approach
- Optimization Approach
- Conclusions
Slide18
Challenges of SC Physical Implementation
- The clock is fast to compensate for long computation latency
- Launch and capture flip-flops may be far apart in a huge array of SC circuits
- Paths are unbalanced due to circuit structures and variations
- Previous analysis shows that delay balance matters
- Timing is more critical when DVFS lowers the supply voltage
[Figure: SC sub-circuits (signals x1, x0, z) between an analog front-end circuit or random number generator and a converter to the binary number system. Annotations: faster clock to compensate for long latency; Path 1 (long); Path 2 (short); long physical distance in a huge array]
Slide19
Post-P&R Optimization for SC Circuits
- Problem statement: given an SC circuit and a range of supply voltages, we seek an implementation that minimizes error across the voltages
- Observation: transition errors increase at lower voltages due to path delay mismatch
- Approach: ILP-based retiming after P&R by a commercial tool
- Optimization constraints:
  - Number of buffers / wires inserted to compensate for shorter paths
  - Bounded delay variation across voltages
  - Buffer power penalty
- Objective: minimize path delay differences, which improves accuracy
- Side note: similar to multi-corner multi-mode (MCMM) CTS skew optimization
  - Skew <-> path delay differences
  - MCMM <-> delays are evaluated at multiple supply voltages
  - Power penalty <-> number of buffers inserted
Slide20
ILP Formulation for Buffer Insertion
Minimize U (the max normalized delay delta), where (1) defines the normalized delay difference
- Quantities annotated in (1): max path delay at the highest voltage; path delay at V_k; max delay at V_k after optimization
Subject to
- (2) relates the optimized path delay to the original delay
- (3) uses a binary variable denoting buffer insertion
- (4) involves an empirical parameter
- (5) involves an empirical parameter, the buffer leakage power, and the circuit leakage power
Notes
- (1) U normalizes the delay mismatch across voltages, because the ranges of delays differ for each V_k
- (2)(3) The inserted delay is decided by the binary variable (whether or not to insert a buffer on a net) and the cell delay at V_k
- (4) Excludes solutions with too many buffers inserted
- (5) Limits the leakage power penalty
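To give a feel for the kind of problem this formulation describes, the sketch below searches exhaustively over binary buffer choices on a toy instance. It is not the authors' ILP: the normalization used here, (max delay at V_k minus path delay at V_k) divided by the max delay at V_k, is an assumed stand-in for (1), and the path delays, the buffer delays, and the buffer limit are hypothetical numbers. Only the buffer-count limit in the spirit of (4) is modeled. A leakage bound like (5) could be added as another filter in the same loop.

```python
from itertools import product

def max_normalized_delta(delays):
    """delays[p][k] is the delay of path p at voltage k.

    Returns the largest (max_k - d_pk) / max_k over all paths and voltages,
    an assumed normalization, not necessarily the one in the slides.
    """
    worst = 0.0
    for k in range(len(delays[0])):
        column = [d[k] for d in delays]
        longest = max(column)
        worst = max(worst, max((longest - d) / longest for d in column))
    return worst

def best_insertion(paths, buf_delay, max_buffers):
    """Try every 0/1 buffer assignment and keep the smallest max delta."""
    best = None
    for choice in product((0, 1), repeat=len(paths)):
        if sum(choice) > max_buffers:  # too many buffers, skip
            continue
        optimized = [[d + x * b for d, b in zip(path, buf_delay)]
                     for path, x in zip(paths, choice)]
        u = max_normalized_delta(optimized)
        if best is None or u < best[0]:
            best = (u, choice)
    return best

# Hypothetical path delays in ps at two supply voltages (higher voltage first).
paths = [[100, 160], [60, 100], [95, 150]]
buf_delay = [35, 55]  # hypothetical buffer delay at each voltage
best = best_insertion(paths, buf_delay, max_buffers=1)
print(best)
```

Exhaustive search scales as 2^(number of paths), which is why a real flow would use an ILP solver. The toy version only shows how one buffer on the short path shrinks the worst normalized mismatch at both voltages.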
Slide21
Heuristics for Buffer Choices
- Heuristic 1: various buffer/wire types to compensate for delay between voltages
  - We provide buffer candidates with different delay sensitivities to voltage scaling
  - We provide wire detour options for a wider voltage sensitivity range
- Heuristic 2: prune the buffer candidates to speed up the MILP
  - Candidates are pruned within sub-regions of the tradeoff space by choosing the cells with the lowest leakage in each region
[Figure panels: without pruning; with pruning; wire detouring]
Slide22
Result: Improved Accuracy by Balancing Paths
[Figure: path delays and average errors for three flows: STRAUSS (UMich) + conventional P&R (ICC); ReSC (UMN) + conventional P&R (ICC); ReSC (UMN) + proposed P&R optimization. Annotations: lower error; less inter-path delay skew]
Slide23
Result: Improved Input Delay Window
- Safe timing window: timing margin between clock edge and input delay
- Before optimization: a small input delay variation will cause errors
- After optimization: safe timing window = half of the clock cycle
- Clock period = 150 ps
[Figure: original and optimized delay distributions, each with its safe window marked]
Slide24
Result: Improved Energy Cost by Balancing Paths
- Improved accuracy = less voltage scaling needed = higher energy efficiency
- The conventional P&R flow (ICC) fails to meet the accuracy constraint when VDD is low
- Our proposed P&R optimization reduces delay mismatch at lower voltages and leads to lower energy cost for the same accuracy
Slide25
MC Model: Improved Simulation Runtime
- The proposed Markov chain model is verified on four different SC application circuits
[Figure legend: green = new MC model; blue = exhaustive simulation]
Circuit      #Cycles (Ex.)   #Cycles (MC)
GammaCorr    1024            10
PolySmall    256             10
Neuron       100             10
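The cycle counts listed above translate into reduction factors with simple arithmetic, sketched below. The dictionary layout is an illustrative choice. The factors are in simulated cycles, which need not equal the wall-clock speedup.

```python
# (exhaustive cycles, Markov chain model cycles) for each listed circuit
cycles = {"GammaCorr": (1024, 10), "PolySmall": (256, 10), "Neuron": (100, 10)}

# Ratio of exhaustive-simulation cycles to MC-model cycles
reduction = {name: ex / mc for name, (ex, mc) in cycles.items()}

for name, factor in reduction.items():
    print(f"{name}: {factor:.1f}x fewer simulated cycles")
```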
[Figure annotation: fewer simulation cycles]
Slide26
Result: Gamma Correction
- Testcase: gamma correction
- Both SC and conventional circuits are signed off at 1.0 V
- SC still generates a recognizable image at 0.6 V
- Energy saving of SC = 66%
Slide27
Outline
- Background and Previous Work
- Problem Statement in SC Physical Design
- Modeling Approach
- Optimization Approach
- Conclusions
Slide28
Conclusions
- We identify the impact of delay-induced errors and propose a Markov chain-based model for error estimation
- We propose a new physical implementation approach that improves the energy-accuracy tradeoff
- The experimental results show significant energy benefits over previous work
Future work
- Markov chain model improvement
- Comprehensive tradeoff recipe for performance, accuracy, and energy
Slide29
Thank you!