Efficient IP Design flow for Low-Power - PowerPoint Presentation

Uploaded On 2017-04-04


Presentation Transcript

Slide1

Efficient IP Design flow for Low-Power

High-Level Synthesis Quick & Accurate Power Analysis and Optimization Flow

JAN.20.2014

Asher Berkovitz

Yaniv Fais

Slide2

Authors Contact Details

Asher Berkovitz

Asher.Berkovitz@freescale.com

+972- 09-9522511

Yaniv Fais

Yaniv.Fais@freescale.com

+972- 09-9522179

Freescale Semiconductor Israel

Herzelia, Shenkar 3

Slide3

Outline

Challenges

High Level Synthesis flow

Power Efficiency

Problems at RTL

Proposed VSIM++ Flow

Analysis

Optimization

Results on Networking Algorithm (Non-Abstract Version)

Conclusions

Slide4

Challenges

IP blocks for networking applications must meet tight power budgets while also meeting aggressive performance requirements.

Changes to the micro-architecture and to other high-abstraction modeling choices can deliver the largest overall power benefits.

It is hard to measure power accurately at higher abstraction levels.

Accurate power measurement at signoff comes late in the design process, when high-level changes are no longer possible.

Slide5

High Level Synthesis design Flow

Diagram blocks: Algorithms Definition; Macro-Architecture Definition; Bit-exact SystemC® Model; HLS; Commands (uArch); Cell library (.lib); RTL; RTL2GDSII (“Normal” flow)

Macro-architecture definition:

Based on an accelerator base class

Uses unified modules (FIFOs, interfaces, etc.)

SystemC® Model:

Architecture evaluation and RTL generation

Accurate data path description according to macro-architecture

Design to meet processing requirements

HLS:

Builds pipelined data path and control logic

Considers real timings during RTL generation

Explore implementation tradeoffs

HLS: SystemC® to RTL, quick explore (Timing/Area)

Slide6

Power Dissipation

Static power: approximately test-independent

Dynamic power: highly dependent on the application (signal transitions)

Signal transitions can be divided into:

Functional change

Glitch (a signal change that is not captured by a sequential element)

Glitches are not visible in RTL simulation and can contribute ~20% of power dissipation

Slide7

Fast & Accurate power analysis flow (VSIM)

Quick Physical Design (PD) flow:

Timing violations allowed

DRC violations allowed

Less than 100% RTL to GL equivalence

A customized test bench enables cycle-accurate Gate Level Simulation

Power analysis is performed using the gate level netlist and the parasitics file.

Power analysis results are mapped back to the RTL netlist.

Diagram blocks: Quick PD flow; RTL DB; Power Analysis; GLV simulation; Test bench generation; Mapping GL 2 RTL

Slide8

Test Bench Generation

Based on the RTL to GL mapping, force RTL values on the GLV simulation

Advantages:

Timing violations are tolerated: force the RTL value on the key point

Short run time: simulate a selected window; force the correct value @ time point X

[Diagram: DFF (D/Q) chains with GL delay for logic cones (SDF). Std’ test bench: timing violation, values are a bit “off”. “VSIM” test bench: correct values forced, GL & SDF.]

Slide9

Gate level results mapping to RTL netlist

RTL netlist:

reg [1:0] cond;
reg [1:0] count;
always @(posedge clk)
  if (cond == 2'b11)
    count <= count + 1;

GL netlist (diagram labels): Cond_0, Cond_1, count_0, count_1, Clock Gate (values shown: 26, 29)

Map RTL 2 GL

For each unmapped GL instance:

Divide the power between drive/load key points

Assign GL key point power to RTL key point

The power of each RTL hierarchy is the sum of the power assigned to its key points
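
The mapping steps listed here can be sketched in C++ as follows. This is a minimal illustration, not the authors' tool: the `GlInstance` record, both function names, the equal split between drive/load key points, and the `top.block.signal` naming of RTL key points are all assumptions, since the slides do not specify them.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record for one gate-level (GL) instance.
struct GlInstance {
    std::string name;                     // GL instance name
    double power;                         // power reported by GL power analysis
    bool isKeyPoint;                      // true if it maps to an RTL key point
    std::vector<std::string> neighbours;  // drive/load key points (unmapped only)
};

// Step 1: fold the power of unmapped instances into the GL key points.
// Assumption: equal split between the drive/load key points. An unmapped
// instance with no listed neighbours is ignored in this sketch.
std::map<std::string, double> keyPointPower(const std::vector<GlInstance>& gl) {
    std::map<std::string, double> power;
    for (const GlInstance& g : gl) {
        assert(g.power >= 0.0);
        if (g.isKeyPoint) {
            power[g.name] += g.power;
        } else if (!g.neighbours.empty()) {
            const double share = g.power / static_cast<double>(g.neighbours.size());
            for (const std::string& kp : g.neighbours) power[kp] += share;
        }
    }
    return power;
}

// Steps 2 and 3: assign GL key point power to the RTL key point, then sum per
// RTL hierarchy. Assumption: RTL names look like "top.block.signal" and the
// hierarchy is everything before the last '.'.
std::map<std::string, double> hierarchyPower(
        const std::map<std::string, double>& glKeyPointPower,
        const std::map<std::string, std::string>& glToRtl) {
    std::map<std::string, double> power;
    for (const auto& entry : glKeyPointPower) {
        const auto it = glToRtl.find(entry.first);
        if (it == glToRtl.end()) continue;  // no RTL counterpart known
        const std::string& rtlName = it->second;
        const std::size_t dot = rtlName.rfind('.');
        const std::string hier =
            (dot == std::string::npos) ? "" : rtlName.substr(0, dot);
        power[hier] += entry.second;
    }
    return power;
}
```

For example, two key points `count_0` and `count_1` with power 10 each, plus one unmapped cell with power 4 that touches both, give 12 per key point. If both map to the hypothetical RTL key point `top.ctr.count`, the hierarchy `top.ctr` is assigned 24.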

[Diagram: numeric annotations on the RTL and GL netlists]

Slide10

Mapping results to high-level language (VSIM++)

By annotating the generated RTL with C++ class names, variable names, and file names/line numbers, we can map power consumption from the accurate gate level back to the C++ source.

This capability allows us to:

Analyze and fix clock gating

Redesign “power hungry” resources

Consider different architectures
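
As an illustration of the name annotation, the sketch below recovers the C++ variable name and line number from an RTL register name such as `count_Ln124`. It assumes the `_Ln<number>` suffix seen in this slide's example is the whole convention and that the bare register name (without a bit range) is passed in. The `SourceRef` type and `parseAnnotatedName` function are hypothetical and are not part of the VSIM++ flow.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result: the C++ variable and the source line it came from.
struct SourceRef {
    std::string variable;  // C++ variable name
    int line;              // source line number, or -1 if not annotated
};

// Split "name_Ln<digits>" into {"name", digits}. Names without a well-formed
// suffix are returned unchanged with line == -1.
SourceRef parseAnnotatedName(const std::string& rtlName) {
    const std::string tag = "_Ln";
    const std::size_t pos = rtlName.rfind(tag);
    if (pos == std::string::npos) return {rtlName, -1};
    const std::size_t first = pos + tag.size();
    const std::size_t digits = rtlName.size() - first;
    if (digits == 0 || digits > 9) return {rtlName, -1};  // keep within int range
    int line = 0;
    for (std::size_t i = first; i < rtlName.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(rtlName[i]);
        if (!std::isdigit(c)) return {rtlName, -1};
        line = line * 10 + (c - '0');
    }
    assert(line >= 0);  // at most nine decimal digits, so no overflow
    return {rtlName.substr(0, pos), line};
}
```

With this sketch, `parseAnnotatedName("count_Ln124")` yields the variable `count` and line 124, so power assigned to that RTL register can be reported against the corresponding C++ source line.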

RTL netlist:

reg [1:0] my_var_Ln123;
reg [1:0] count_Ln124;
always @(posedge clk)
  if (my_var_Ln123 == 2'b11)
    count_Ln124 <= count_Ln124 + 1;

C++ code (source lines 121 to 127):

void process() {
  …
  while (true) {
    if (my_var == 3) count++;
    …
  }
}

Slide11

Example problem identified

The tool automatically inserts “clock gating” enabler code in the RTL:

always @(posedge clk)
  if (en)
    data[511:0] <= new_data;

[Diagram: C++ process condition, HLS, and DFFs with signals clk, en, new_data, data]

At gate level, the enable is implemented as data logic rather than as a gated clock, because of timing violations
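
To see why this matters for a wide register such as `data[511:0]`, the sketch below counts how many flop clock pins are clocked per cycle in the two implementations. It is a rough activity proxy, not a power model. The function names, the `ClockActivity` type, and the efficiency definition used (the fraction of flop clock cycles that an ideal clock gate would suppress) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical activity counts for one register bank.
struct ClockActivity {
    std::uint64_t gatedClock;  // flop clock cycles when the clock is gated by en
    std::uint64_t dataLogic;   // flop clock cycles when en is built as data logic
};

// With a gated clock, the flops are clocked only in cycles where en is high.
// With the enable as data logic, every flop is clocked in every cycle and
// simply reloads its old value when en is low.
ClockActivity countClockCycles(const std::vector<bool>& enablePerCycle,
                               std::uint64_t widthBits) {
    assert(widthBits > 0);
    std::uint64_t enabledCycles = 0;
    for (bool en : enablePerCycle) {
        if (en) ++enabledCycles;
    }
    const std::uint64_t totalCycles = enablePerCycle.size();
    ClockActivity a;
    a.gatedClock = widthBits * enabledCycles;
    a.dataLogic = widthBits * totalCycles;
    return a;
}

// Assumed definition: share of flop clock cycles removed by clock gating.
double gatingEfficiency(const ClockActivity& a) {
    if (a.dataLogic == 0) return 0.0;
    return 1.0 - static_cast<double>(a.gatedClock) /
                 static_cast<double>(a.dataLogic);
}
```

For a 512-bit register whose enable is high in 2 of 4 cycles, the gated-clock version clocks 1024 flop cycles against 2048 for the data-logic version, which is an efficiency of 0.5 under the assumed definition.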

Solution: simplify the clock gating enablers to meet timing constraints

Slide12

Clock gating enabler simplification

Original clock gating scheme:

Complicated enable logic

Synthesized to an inefficient enabler

[Diagram, original scheme: Hash Key DFFs (clk, en, new_data, data) with Header DFFs and Process control DFFs]

Simplified clock gating scheme:

Enable synthesized without changes

Leading to high clock gating efficiency

[Diagram, simplified scheme: Hash Key DFFs (clk, en, new_data, data) with Process control DFFs]

Slide13

Conclusions

Use High Level Synthesis for IP Design

Quick and easy to explore architecture alternatives

Quick front-end flow including verification

Power analysis:

Measure power on system level scenario

Quick (doesn’t require full physical design flow convergence)

Accurate (done on gate-level)

Analysis and Optimization in high-level design (C++)

Manual clock gating enable setting reduced dynamic power consumption by 19.4%

Early in the design cycle: easy to change the IP architecture!

Slide14

Backup

Slide15

Accuracy

Measured using a similar methodology on a different design

Silicon (Si) measurement compared to full T/O gate level data

Test                                                 Dynamic power accuracy
Single core Fast Fourier Transform                   -7.59%
Single core Fast Fourier Transform, no memory miss   -8.40%
Dual core Fast Fourier Transform                     7.57%