Optimizing Stochastic Circuits for Accuracy-Energy Tradeoffs

Uploaded On 2018-11-18



Presentation Transcript

Slide1

Optimizing Stochastic Circuits for Accuracy-Energy Tradeoffs

Armin Alaghi (3), Wei-Ting J. Chan (1), John P. Hayes (3), Andrew B. Kahng (1,2) and Jiajia Li (1)
(1) ECE and (2) CSE Depts., UC San Diego; (3) EECS Dept., University of Michigan

Slide2

Outline

- Background and Previous Work
- Problem Statement in SC Physical Design
- Modeling Approach
- Optimization Approach
- Conclusions

Slide3

Motivation: Low Power Challenge

- Low power design is a grand challenge
- Mobile devices must operate with extremely low power as the performance requirements of applications grow
- Voltage scaling has slowed down in recent years
- Possible solution: employ new design paradigms to overcome these challenges and achieve performance improvements

[Figure: power and performance trends. Annotations: 4 W mobile platform power requirement; 1 W SoC power requirement; slow performance improvement due to power limit and slow voltage scaling. Source: ITRS]

Slide4

New Paradigm: Stochastic Computing (SC)

- Stochastic computing (SC) is a design paradigm that has gained attention recently due to its low power and error tolerance
- Random bit streams are used to represent operands
- Complex arithmetic operations are implemented by simple logic circuits

[Figure: stochastic multiplier, Z = X1 × X2. Inputs X1 = 4/8 and X2 = 6/8 give Z = 3/8 = 4/8 × 6/8]
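The multiplication on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the common unipolar encoding in which a stream's value is its fraction of 1s and an AND gate acts as the multiplier. The transcript does not name the gate, and the 8-bit patterns and helper names (`to_stream`, `value`, `sc_multiply`) are hypothetical.

```python
import random

def to_stream(p, n, rng):
    """Encode probability p as n random bits, each 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def value(stream):
    """Value represented by a stream: the fraction of 1s."""
    return sum(stream) / len(stream)

def sc_multiply(x1, x2):
    """Bitwise AND; for independent streams P(z=1) = P(x1=1) * P(x2=1)."""
    return [a & b for a, b in zip(x1, x2)]

# Hand-picked 8-bit streams (hypothetical patterns) for 4/8 and 6/8.
x1_8 = [0, 1, 1, 0, 1, 0, 1, 0]   # four 1s -> 4/8
x2_8 = [1, 0, 1, 1, 1, 0, 1, 1]   # six 1s  -> 6/8
z8 = sc_multiply(x1_8, x2_8)
print(value(z8))                   # 0.375, i.e. 3/8

# Longer random streams: the product is only approximate.
rng = random.Random(0)
n = 4096
z = sc_multiply(to_stream(4 / 8, n, rng), to_stream(6 / 8, n, rng))
print(value(z))                    # close to 3/8, not exact
```

The hand-picked streams happen to give exactly 3/8. Random streams only approximate the product, which is the accuracy side of the tradeoff the talk studies.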

Slide5

Error Tolerance, Precision, and Accuracy

- Inaccurate computation may occur
- Number to represent: 5/16
  - Stochastic: 0010 0001 0101 0010
  - Binary: 0.0101
- Bit-stream length grows exponentially with precision
- Redundant representation provides error tolerance

[Figure annotation: correct = 3/8]

Slide6

Area, Computation Efficiency, and Delay

[Figure: stochastic multiplier vs. conventional binary multiplier, with the critical path marked]

SC: smaller area, longer computation latency, and shorter critical path

Slide7

Application Context of SC

- Stochastic representation is similar to analog “pulse-mode” signals, as well as neural signals
- A stochastic computing circuit performs cheap pre-processing and saves resources
- Low-cost preprocessing between two domains

Slide8

Summary of Advantages/Disadvantages

Advantages
- Low-complexity circuits (allows massive parallelism)
- Error tolerance
- Robustness to voltage scaling (explored and improved in this work)

Disadvantages
- Long computation time
- Limited precision
- Expensive conversion circuits and storage elements

Slide9

Outline

- Background and Previous Work
- Problem Statement in SC Physical Design
- Modeling Approach
- Optimization Approach
- Conclusions

Slide10

Challenges, Problems, and Our Contributions

Challenges of stochastic computing (SC) design:
- The current digital design flow does not comprehend the tradeoff between accuracy and power in SC
- Physical implementation of SC circuits has not been well explored

Problems:
- What is an efficient way to estimate error when exhaustive simulation is not feasible?
- Given a synthesized SC circuit, what is the physical implementation recipe?

Our contributions:
- We introduce the delay matching problem in SC
- We reduce the computation error by balancing delay paths
- We propose a Markov chain model for error estimation

Slide11

Stochastic Computing: Scope of Study

Design Metrics
- Energy
- Accuracy (a new model is proposed in this work)
- Circuit area

Design Parameters
- Computation latency (N)
- Frequency scaling (f)
- Voltage scaling (V)
- Netlist implementation (a new optimization is proposed in this work)

[Figure annotation: metrics covered in this work]

Slide12

Outline

- Background and Previous Work
- Problem Statement in SC Physical Design
- Modeling Approach
- Optimization Approach
- Conclusions

Slide13

Balance of Path Delay Matters

Three scenarios of signal transitions:
- (A) Ideal: stable states of logic values are captured
- (B) Balanced delay: all the transitions arrive at the same time
- (C) Unbalanced delay: extra errors are caused by glitches or delayed transitions

[Figure: waveforms of inputs x1, x0 and output z against the sample clock. (A) Ideal: correct. (B) Balanced: correct. (C) Unbalanced: error]
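One way to see why balance matters is a toy capture model, sketched below. It assumes a flip-flop sees an input's new bit only if the transition arrives within the clock period, and otherwise still sees the previous bit. It also assumes two correlated input streams driven by one shared random source, for which an AND gate computes the minimum of the two values. Neither assumption comes from the slides. The delay values are hypothetical, the 150 ps period is taken from the input-delay-window result later in the talk, and glitches are not modeled.

```python
import random

def capture(bits, delay, period):
    """Bits seen at the capture flip-flop over cycles 1..n.

    On-time input (delay <= period): the current bit is captured.
    Late input: the previous cycle's bit is captured instead.
    """
    return bits[1:] if delay <= period else bits[:-1]

def and_value(x1, x0, d1, d0, period):
    """Average output of an AND gate fed by the captured inputs."""
    c1 = capture(x1, d1, period)
    c0 = capture(x0, d0, period)
    z = [a & b for a, b in zip(c1, c0)]
    return sum(z) / len(z)

rng = random.Random(1)
n = 20000
a, b = 0.5, 0.75
r = [rng.random() for _ in range(n + 1)]   # shared random source
x1 = [int(v < a) for v in r]               # correlated streams:
x0 = [int(v < b) for v in r]               # AND gives min(a, b)

period = 150                                          # ps
on_time = and_value(x1, x0, 120, 120, period)         # (A)-like
balanced_late = and_value(x1, x0, 170, 170, period)   # (B)-like
unbalanced = and_value(x1, x0, 120, 170, period)      # (C)-like

print(on_time, balanced_late, unbalanced)
```

In this sketch both balanced cases stay near min(a, b) = 0.5, because the two inputs remain aligned even when both are late. The unbalanced case drifts toward a × b = 0.375, because the streams are captured one cycle apart and lose their correlation.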

Slide14

Markov Chain for Error Prediction

- Markov chains (MC) have previously been used to model sequential SC circuits
- We augment the states of the behavior model with states for delay-induced transition errors
  - Errors induced by glitches and delayed transitions
- Transition probabilities are trained on a small set of simulation results
- The stationary probability distribution is obtained by solving the Markov chain
  - C1, D1, G1 decide the expected output value
  - Used for error estimation
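The flow described above (train transition probabilities from a short simulation, then solve for the stationary distribution) can be sketched as follows. This is an illustration under stated assumptions, not the authors' model. The slide names only C1, D1 and G1, so the counterpart states C0, D0 and G0 are assumed, the state trace is invented, and power iteration stands in for whatever solver the authors used.

```python
from collections import Counter

# Assumed state set: correct (C), delayed (D), glitch (G), output 0 or 1.
STATES = ["C0", "C1", "D0", "D1", "G0", "G1"]

def estimate_transitions(trace):
    """Row-normalized transition counts from a simulated state trace."""
    counts = Counter(zip(trace, trace[1:]))
    matrix = {}
    for s in STATES:
        total = sum(counts[(s, t)] for t in STATES)
        # A state with no observed exits gets a uniform row.
        matrix[s] = {
            t: (counts[(s, t)] / total if total else 1 / len(STATES))
            for t in STATES
        }
    return matrix

def stationary(matrix, iters=1000):
    """Stationary distribution by power iteration (pi <- pi * P)."""
    pi = {s: 1 / len(STATES) for s in STATES}
    for _ in range(iters):
        pi = {t: sum(pi[s] * matrix[s][t] for s in STATES) for t in STATES}
    return pi

# Hypothetical short trace standing in for a small simulation run.
trace = ("C0 C1 C1 D1 C0 C0 G1 C1 C0 D0 "
         "C1 C1 C0 G0 C0 C1 D1 C1 C0 C0").split()

P = estimate_transitions(trace)
pi = stationary(P)

# States whose output bit is 1 determine the expected output value.
expected_output = pi["C1"] + pi["D1"] + pi["G1"]
print(round(expected_output, 4))
```

Comparing `expected_output` with the value an error-free circuit would produce gives the error estimate. Power iteration converges here because the invented trace yields a chain in which every state can reach every other and some states loop to themselves.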

[Figure annotation: only correct states exist in the previous SC behavior model]

Slide15

Result: Markov Chain for Error Prediction

- The model is accurate for larger errors
- The model is less accurate when the error is small
- Precise prediction for high error magnitude
- Ongoing work: improve the accuracy for small errors

Slide16

Outcome of Accuracy Model Study

Before our work:
- The SC behavior model is based on pre-layout simulation
- The SC behavior model did not consider the cell delay and wire delay contributed by physical implementation

Our work:
- Augment the SC behavior model by considering delayed transitions and glitches contributed by physical implementation
- Optimize the physical implementation by balancing the timing paths

[Figure annotations: correct; correct; error; balanced delays]

Slide17

Outline

- Background and Previous Work
- Problem Statement in SC Physical Design
- Modeling Approach
- Optimization Approach
- Conclusions

Slide18

Challenges of SC Physical Implementation

- The clock is fast to compensate for long computation latency
- Launch and capture flip-flops may be far apart in a huge array of SC circuits
- Unbalanced paths arise from circuit structures and variations
  - Previous analysis shows that delay balance matters
- Timing is more critical when DVFS lowers the supply voltage

[Figure: an array of SC sub-circuits (signals x1, x0, z) between an analog front-end circuit or random number generator and a converter to the binary number system. Annotations: faster clock to compensate for long latency; Path 1 (long); Path 2 (short); long physical distance in a huge array]

Slide19

Post-P&R Optimization for SC Circuits

- Problem statement: given an SC circuit and a range of supply voltages, we seek an implementation that minimizes error across the voltages
- Observation: transition errors increase at lower voltages due to path delay mismatch
- Approach: ILP-based retiming after P&R by a commercial tool
- Optimization constraints:
  - Number of buffers / wires inserted to compensate for shorter paths
  - Bounded delay variation across voltages
  - Buffer power penalty
- Objective: minimize path delay differences
  - Improves accuracy
- Side note: similar to multi-corner multi-mode (MCMM) CTS skew optimization
  - Skew <-> path delay differences
  - MCMM <-> delays are evaluated at multiple supply voltages
  - Power penalty <-> number of buffers inserted

Slide20

ILP Formulation for Buffer Insertion

Minimize U (max normalized delay delta)

where (1) defines the normalized delay difference in terms of:
- Max path delay at the highest voltage
- Path delay at V_k
- Max delay at V_k after optimization

Subject to
- (2) optimized path delay, expressed from the original delay
- (3) a binary number denoting buffer insertion
- (4) a bound set by an empirical parameter
- (5) a bound set by an empirical parameter, the buffer leakage power, and the circuit leakage power

Notes
- (1) U normalizes the delay mismatch across voltages, because the ranges of delays are different for each V_k
- (2)(3) The inserted delay is decided by the binary variable (whether or not to insert a buffer on a net) and the cell delay at V_k
- (4) Excludes solutions with too many buffers inserted
- (5) Limits the leakage power penalty
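The equations themselves did not survive in this transcript, so the sketch below is not the authors' formulation. It is a hedged toy version of the same idea: choose a binary buffer decision per path to minimize the worst normalized delay spread over all voltages, under a buffer-count cap in the spirit of (4) and a leakage cap in the spirit of (5). The normalization used (spread divided by the maximum delay at that voltage), the single buffer type, all numbers, and the exhaustive search standing in for an ILP solver are assumptions.

```python
from itertools import product

def optimize_buffers(path_delay, buf_delay, max_buffers,
                     buf_leak, circuit_leak, beta):
    """Exhaustively pick b[p] in {0, 1} per path to minimize U.

    path_delay[p][k]: delay of path p at voltage k (ps)
    buf_delay[k]:     delay of the candidate buffer at voltage k (ps)
    Returns (U, b) for the best feasible assignment, or None.
    """
    n_paths, n_volt = len(path_delay), len(buf_delay)
    best = None
    for b in product((0, 1), repeat=n_paths):
        if sum(b) > max_buffers:                      # cf. constraint (4)
            continue
        if sum(b) * buf_leak > beta * circuit_leak:   # cf. constraint (5)
            continue
        u = 0.0
        for k in range(n_volt):
            d = [path_delay[p][k] + b[p] * buf_delay[k]
                 for p in range(n_paths)]
            # Assumed normalization: spread relative to the max delay.
            u = max(u, (max(d) - min(d)) / max(d))
        if best is None or u < best[0]:
            best = (u, b)
    return best

# Hypothetical delays (ps) for three paths at two supply voltages.
path_delay = [[100, 160], [60, 100], [95, 150]]
buf_delay = [35, 55]

u, b = optimize_buffers(path_delay, buf_delay, max_buffers=2,
                        buf_leak=1.0, circuit_leak=50.0, beta=0.05)
print(u, b)   # 0.0625 (0, 1, 0): buffer only the short middle path
```

Exhaustive search is only workable for a handful of paths. A real flow would hand the same objective and constraints to an ILP or MILP solver, as the slides describe.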

 Slide21

Heuristics for Buffer Choices

Heuristic 1: various buffer/wire types to compensate for delay between voltages
- We provide buffer candidates with different delay sensitivity to voltage scaling
- We provide wire detour options to provide a wider voltage sensitivity range

Heuristic 2: pruning buffers in the candidates to speed up MILP
- Solutions are pruned within sub-regions of the tradeoff space by choosing the cells with the lowest leakage in each region

[Figure panels: without pruning; with pruning; wire detouring]

Slide22

Result: Improved Accuracy by Balancing Paths

[Figure: path delays and average errors for three flows: STRAUSS (UMich) + conventional P&R (ICC); ReSC (UMN) + conventional P&R (ICC); ReSC (UMN) + proposed P&R optimization. Annotations: lower error; less inter-path delay skew]

Slide23

Result: Improved Input Delay Window

- Safe timing window: timing margin between clock edge and input delay
- Before optimization: a small input delay variation will cause errors
- After optimization: safe timing window = half of the clock cycle
- Clock period = 150 ps

[Figure: original and optimized delay distributions, each with its safe window marked]

Slide24

Result: Improved Energy Cost by Balancing Paths

- Improved accuracy = less voltage scaling needed = higher energy efficiency
- The conventional P&R flow (ICC) fails to meet the accuracy constraint when VDD is low
- Our proposed P&R optimization reduces delay mismatch at lower voltages and leads to lower energy cost for the same accuracy

Slide25

MC Model: Improved Simulation Runtime

- The proposed Markov chain model is verified on four different SC application circuits
- Fewer simulation cycles

[Figure legend: green = new MC model; blue = exhaustive simulation]

Circuit      #Cycles (Ex.)   #Cycles (MC)
GammaCorr    1024            10
PolySmall    256             10
Neuron       100             10

Slide26

Result: Gamma Correction

- Testcase: gamma correction
- Both SC and conventional circuits are signed off at 1.0 V
- SC still generates a recognizable image at 0.6 V
- Energy saving of SC = 66%

Slide27

Outline

- Background and Previous Work
- Problem Statement in SC Physical Design
- Modeling Approach
- Optimization Approach
- Conclusions

Slide28

Conclusions

- We identify the impact of delay-induced errors and propose a Markov chain-based model for error estimation
- We propose a new physical implementation approach that improves the energy-accuracy tradeoff
- The experimental results show significant energy and accuracy benefits over previous work

Future work
- Markov chain model improvement
- Comprehensive tradeoff recipe for performance, accuracy, and energy

Slide29

Thank you!