Optimizing Power @ Design Time

Presentation Transcript


Optimizing Power @ Design Time: Circuits

Dejan Marković
Borivoje Nikolić

Chapter Outline

Optimization framework for energy-delay trade-off
Dynamic power optimization
  Multiple supply voltages
  Transistor sizing
  Technology mapping
Static power optimization
  Multiple thresholds
  Transistor stacking

Energy/Power Optimization Strategy

For a given function and activity, an optimal operating point can be derived in the energy-performance space.
The time at which the optimization is done depends on the activity profile.
Different optimizations apply to active and static power.

Optimization time, by activity profile (for active and static power):
  Fixed activity: design time
  Variable activity: run time
  No activity (standby): sleep

Energy-Delay Optimization and Trade-off

Maximize throughput for a given energy, or
minimize energy for a given throughput.

[Figure: energy/op versus delay; the unoptimized design sits inside a trade-off space bounded by Emin, Emax, Dmin, and Dmax]

Other important metrics: area, reliability, reusability

The Design Abstraction Stack

A very rich set of design parameters to consider! It helps to consider options in relation to their abstraction layer:

System/Application: choice of algorithm
Software: amount of concurrency
(Micro-)Architecture: parallel versus pipelined, general purpose versus application specific
Logic/RT: logic family, standard cell versus custom
Circuit (this chapter): sizing, supply, thresholds
Device: bulk versus SOI

Optimization Can/Must Span Multiple Levels

[Figure: optimization levels Architecture, Micro-Architecture, and Circuit (Logic & FFs)]

Design optimization combines top-down and bottom-up: "meet-in-the-middle"

Energy-Delay Optimization

[Figure: energy/op versus delay curves for topology A and topology B, together with the globally optimal energy-delay curve for a given function]

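Some Optimization Observations

Energy-delay sensitivity with respect to a tuning variable A, evaluated at the design point A = A0:

$$S_A = \left.\frac{\partial E/\partial A}{\partial D/\partial A}\right|_{A=A_0}$$

[Figure: energy versus delay; the curves f(A0, B) and f(A, B0) pass through the design point (A0, B0) at delay D0, with slopes S_A and S_B]

[Ref: V. Stojanovic, ESSCIRC'02]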

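Finding the Optimal Energy-Delay Curve

$$\Delta E = S_A \cdot (-\Delta D) + S_B \cdot \Delta D = (S_B - S_A)\,\Delta D$$

Speeding up by ΔD with variable A and slowing down by ΔD with variable B leaves the total delay unchanged. Unless S_A = S_B, one direction of this trade lowers the energy. On the optimal curve, all sensitivities must therefore be equal.

[Figure: energy versus delay around the design point (A0, B0) at delay D0, showing the curves f(A0, B), f(A, B0), and f(A1, B) and the delay step ΔD]

Pareto-optimal: the best that can be achieved, i.e., no metric can be improved without worsening at least one other metric.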
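The following is a minimal numeric sketch of the equal-sensitivity condition. It assumes a hypothetical two-variable model, E(A, B) = A + 2B and D(A, B) = 1/A + 1/B, and neither function comes from the chapter. Sensitivities S = (∂E/∂x)/(∂D/∂x) are estimated by central differences. The minimum-energy point at fixed delay is then found by brute force along the iso-delay curve.

```python
def energy(a, b):
    # Hypothetical energy model: each tuning variable costs energy linearly
    return a + 2.0 * b


def delay(a, b):
    # Hypothetical delay model: increasing either variable speeds up the circuit
    return 1.0 / a + 1.0 / b


def sensitivity(var, a, b, h=1e-6):
    """S_x = (dE/dx) / (dD/dx), estimated with central finite differences."""
    da, db = (h, 0.0) if var == "a" else (0.0, h)
    de = (energy(a + da, b + db) - energy(a - da, b - db)) / (2 * h)
    dd = (delay(a + da, b + db) - delay(a - da, b - db)) / (2 * h)
    return de / dd


def min_energy_at_delay(d_target, steps=100_000):
    """Brute-force search along the iso-delay curve 1/a + 1/b = d_target."""
    best = None
    for k in range(1, steps):
        b = 1.0 / d_target + k * (10.0 / steps)  # b must exceed 1/d_target
        a = 1.0 / (d_target - 1.0 / b)           # choose a so the delay stays fixed
        e = energy(a, b)
        if best is None or e < best[0]:
            best = (e, a, b)
    return best


# Start from A = B = 1: delay 2, energy 3, unequal sensitivities
print(sensitivity("a", 1.0, 1.0), sensitivity("b", 1.0, 1.0))  # about -1 and -2
e_opt, a_opt, b_opt = min_energy_at_delay(delay(1.0, 1.0))
print(e_opt)  # about 2.914, lower than 3 at the same delay
print(sensitivity("a", a_opt, b_opt), sensitivity("b", a_opt, b_opt))  # nearly equal
```

The search moves delay budget away from the variable with the steeper energy cost until the two sensitivities match, which is the condition stated above.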

Reducing Active Energy @ Design Time

Reducing voltages
  Lowering the supply voltage (VDD) at the expense of clock speed
  Lowering the logic swing (Vswing)
Reducing transistor sizes (CL)
  Slows down logic
Reducing activity (α)
  Reducing switching activity through transformations
  Reducing glitching by balancing logic
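Each bullet targets one factor of the usual expression for active (switching) energy per cycle, E = α·CL·Vswing·VDD. Here α is the probability of a 0→1 output transition per cycle. This is a minimal sketch. The load, activity, and voltages below are illustrative assumptions, not values from the chapter.

```python
def active_energy(alpha, c_load, v_swing, v_dd):
    """Switching energy per clock cycle in joules: E = alpha * C_L * V_swing * V_DD."""
    return alpha * c_load * v_swing * v_dd


# Illustrative baseline (assumed): activity 0.1, 10 fF load, 1.2 V full-swing supply
base = active_energy(alpha=0.1, c_load=10e-15, v_swing=1.2, v_dd=1.2)

scenarios = {
    "halve VDD (full swing)": active_energy(0.1, 10e-15, 0.6, 0.6),
    "quarter swing, same VDD": active_energy(0.1, 10e-15, 0.3, 1.2),
    "halve C_L (downsizing)": active_energy(0.1, 5e-15, 1.2, 1.2),
    "halve activity": active_energy(0.05, 10e-15, 1.2, 1.2),
}
for name, e in scenarios.items():
    print(f"{name}: {e / base:.2f} x baseline")
```

With full-swing logic, lowering VDD also lowers Vswing, so the supply acts quadratically. That is why it is the most effective knob, paid for in clock speed.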

Observation

Downsizing and/or lowering the supply on the critical path lowers the operating frequency
Downsizing non-critical paths reduces energy for free, but
  narrows the path delay distribution
  increases the impact of variations, which degrades robustness

[Figure: number of paths versus path delay tp(path) relative to the target delay, before and after downsizing the non-critical paths]

Circuit Optimization Framework

$$\min\; E(V_{DD}, V_{TH}, W) \quad \text{subject to} \quad D(V_{DD}, V_{TH}, W) \le D_{con}$$

Constraints:

$$V_{DD,\min} < V_{DD} < V_{DD,\max}, \qquad V_{TH,\min} < V_{TH} < V_{TH,\max}, \qquad W_{\min} < W$$

Reference case: Dmin sizing at VDD,max and VTH,ref

[Figure: energy/op versus delay for topology A and topology B, with the reference case at Dmin]

[Ref: V. Stojanovic, ESSCIRC'02]

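Optimization Framework: Generic Network

[Figure: gate in stage i (supply VDD,i, input capacitance Ci, self-loading γCi) drives the wire capacitance Cw and the input capacitance Ci+1 of stage i+1 (supply VDD,i+1)]

Gate in stage i loaded by fanout (stage i+1)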

Alpha-power Based Delay Model

Fit parameters: Von, αd, Kd, γ
VDDref = 1.2 V, 90 nm technology

[Figure, left: delay tp (ps) versus fanout (Ci+1/Ci), simulation versus model; τnom = 6 ps, γ = 1.35]
[Figure, right: normalized FO4 delay versus VDD/VDDref, simulation versus model; Von = 0.37 V, αd = 1.53]
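The slide lists only the fitted parameters. The sketch below assumes the alpha-power form tp = τ(VDD)·(fanout + γ) with τ(VDD) ∝ VDD/(VDD − Von)^αd, following Sakurai and Newton. The form is normalized so that τ = τnom = 6 ps at VDDref = 1.2 V, which absorbs Kd. The functional form is our assumption; the numbers are the fitted values quoted above.

```python
V_ON = 0.37      # V, fitted
ALPHA_D = 1.53   # velocity-saturation index, fitted
GAMMA = 1.35     # parasitic (self-loading) to input capacitance ratio, fitted
TAU_NOM = 6e-12  # s, nominal time constant at VDD_REF
VDD_REF = 1.2    # V


def tau(vdd):
    """Assumed alpha-power scaling: tau proportional to VDD / (VDD - V_on)**alpha_d."""
    return TAU_NOM * (vdd / VDD_REF) * ((VDD_REF - V_ON) / (vdd - V_ON)) ** ALPHA_D


def gate_delay(fanout, vdd=VDD_REF):
    """Assumed gate delay t_p = tau(VDD) * (fanout + gamma), fanout = C_{i+1}/C_i."""
    return tau(vdd) * (fanout + GAMMA)


fo4 = gate_delay(4.0)
print(f"FO4 delay at 1.2 V: {fo4 * 1e12:.1f} ps")                     # 32.1 ps
print(f"FO4(0.6 V) / FO4(1.2 V): {gate_delay(4.0, 0.6) / fo4:.2f}")   # about 3.56
```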

Combined with Logical Effort Formulation

For complex gates:
  Parasitic delay pi: depends on gate topology
  Electrical effort fi ≈ Si+1/Si
  Logical effort gi: depends on gate topology
  Effective fanout hi = fi·gi

[Ref: I. Sutherland, Morgan-Kaufmann'99]
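In the standard logical-effort formulation, each stage contributes τ·(hi + pi) to the path delay. With g = 1 and p = γ this reduces to the inverter model τ·(f + γ). The sketch below assumes the usual textbook NAND2 values, g = 4/3 and p = 2·pinv, for a PMOS/NMOS mobility ratio of 2. Taking pinv = γ = 1.35 is also our assumption.

```python
from dataclasses import dataclass

P_INV = 1.35  # inverter parasitic delay in units of tau (assumed equal to gamma)


@dataclass
class Stage:
    g: float  # logical effort (1 for an inverter)
    p: float  # parasitic delay, units of tau
    f: float  # electrical effort C_{i+1} / C_i

    @property
    def h(self) -> float:
        """Effective fanout h_i = f_i * g_i."""
        return self.f * self.g


def path_delay(stages):
    """Path delay in units of tau: sum of (h_i + p_i) over the stages."""
    return sum(s.h + s.p for s in stages)


# NAND2 with electrical effort 3, driving an inverter with fanout 4
path = [Stage(g=4 / 3, p=2 * P_INV, f=3.0), Stage(g=1.0, p=P_INV, f=4.0)]
print(round(path_delay(path), 2))  # 12.05
```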

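Dynamic Energy

[Figure: the generic network; gate i (supply VDD,i, input capacitance Ci, self-loading γCi) drives Cw and the input capacitance Ci+1 of stage i+1 (supply VDD,i+1)]

Ei = energy consumed by logic gate i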

Optimizing Return on Investment (ROI)

Depends on the sensitivity (∂E/∂D)
  Gate sizing: sensitivity → ∞ for equal h (Dmin), because ∂D/∂W vanishes at the minimum-delay sizing
  Supply voltage: sensitivity is maximal at VDD(max) (Dmin)
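The supply sensitivity can be written in closed form. The derivation assumes switching energy E ∝ VDD², delay D ∝ VDD/(VDD − Von)^αd, and a fixed threshold; these modeling choices are ours. Differentiating both gives:

$$S_{V_{DD}} = \frac{\partial E/\partial V_{DD}}{\partial D/\partial V_{DD}} = -\frac{2E}{D}\cdot\frac{1 - x}{\alpha_d - 1 + x}, \qquad x = \frac{V_{on}}{V_{DD}}$$

Both 2E/D and (1 − x)/(αd − 1 + x) grow with VDD, so |S_VDD| is largest at VDD(max). The sketch checks the closed form against a finite-difference estimate, using Von and αd from the 90 nm fit.

```python
V_ON, ALPHA_D = 0.37, 1.53  # fitted parameters of the 90 nm delay model


def energy(vdd, c=1.0):
    return c * vdd ** 2                         # switching energy, arbitrary units


def delay(vdd, k=1.0):
    return k * vdd / (vdd - V_ON) ** ALPHA_D    # alpha-power delay, arbitrary units


def s_vdd_analytic(vdd):
    """Closed-form supply sensitivity dE/dD with respect to VDD."""
    x = V_ON / vdd
    return -2.0 * energy(vdd) / delay(vdd) * (1 - x) / (ALPHA_D - 1 + x)


def s_vdd_numeric(vdd, h=1e-6):
    """Same quantity from central finite differences."""
    de = (energy(vdd + h) - energy(vdd - h)) / (2 * h)
    dd = (delay(vdd + h) - delay(vdd - h)) / (2 * h)
    return de / dd


for v in (0.6, 0.9, 1.2):
    print(v, round(s_vdd_analytic(v), 5), round(s_vdd_numeric(v), 5))
```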

Example: Inverter Chain

[Figure: chain of N inverters with sizes S1 = 1, S2, S3, …, SN]

Properties of the inverter chain:
  Single-path topology
  Energy increases geometrically from input to output

Goal: find the optimal sizing S = [S1, S2, …, SN], supply voltage, and buffering strategy to achieve the best energy-delay trade-off

Inverter Chain: Gate Sizing

[Figure: effective fanout h per stage (stages 1 to 7) for dinc = 0%, 1%, 10%, 30%, and 50%; nominal versus optimized]

Variable taper achieves minimum energy
Reduce the number of stages at large dinc

[Ref: Ma, JSSC'94]
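The sketch below performs energy-constrained sizing of a 3-inverter chain. Its assumptions are not taken from the slides: S1 = 1, a fixed load of 64, a stage delay of τ·(fanout + γ) with γ = 1.35, and energy counted as switched capacitance. A brute-force grid stands in for the chapter's sensitivity-based optimizer. Relaxing the delay by dinc lets the last stages shrink, so the effective fanout grows toward the output; this is the variable taper. The stage count is fixed at three, so the sketch does not show the reduction in the number of stages at large dinc.

```python
GAMMA = 1.35   # parasitic self-loading (assumed; borrowed from the 90 nm fit)
C_LOAD = 64.0  # output load in units of the first inverter's input capacitance (assumed)


def chain_delay(s2, s3):
    """Delay of a 3-inverter chain with S1 = 1, in units of tau: sum of (fanout + gamma)."""
    fanouts = (s2, s3 / s2, C_LOAD / s3)
    return sum(f + GAMMA for f in fanouts)


def chain_energy(s2, s3):
    """Switched capacitance (alpha = 1, VDD = 1): stage i drives gamma*S_i + S_{i+1}."""
    sizes = (1.0, s2, s3, C_LOAD)
    return sum(GAMMA * sizes[i] + sizes[i + 1] for i in range(3))


def min_energy_sizing(d_inc):
    """Grid search over S2, S3 for minimum energy with delay <= (1 + d_inc) * D_min."""
    d_max = (1.0 + d_inc) * chain_delay(4.0, 16.0)  # equal fanout of 4 gives D_min here
    best = None
    for i in range(141):                 # S2 from 1.0 to 8.0
        s2 = 1.0 + 0.05 * i
        for j in range(351):             # S3 from 1.0 to 36.0
            s3 = 1.0 + 0.1 * j
            if chain_delay(s2, s3) <= d_max:
                e = chain_energy(s2, s3)
                if best is None or e < best[0]:
                    best = (e, s2, s3)
    return best


e_ref = chain_energy(4.0, 16.0)
for d_inc in (0.1, 0.3):
    e, s2, s3 = min_energy_sizing(d_inc)
    fanouts = [round(f, 2) for f in (s2, s3 / s2, C_LOAD / s3)]
    print(f"d_inc = {d_inc:.0%}: energy {e / e_ref:.2f} x reference, fanouts {fanouts}")
```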

Inverter Chain: VDD Optimization

[Figure: per-stage VDD/VDDnom (stages 1 to 7) for dinc = 0%, 1%, 10%, 30%, and 50%; nominal versus optimized]

Per-stage VDD scaling reduces the energy of the final load first
Variable taper is achieved by voltage scaling

Inverter Chain: Optimization Results

[Figure, left: energy reduction (%) versus dinc (%) for the curves labeled cVDD, S, gVDD, and 2VDD]
[Figure, right: normalized sensitivity versus dinc (%) for the same parameters]

The parameter with the largest sensitivity has the largest potential for energy reduction
Two discrete supplies mimic per-stage VDD

Example: Kogge-Stone Tree Adder

Tree adder
  Long wires
  Re-convergent paths
  Multiple active outputs

[Ref: P. Kogge, Trans. Comp'73]
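The sketch below shows the Kogge-Stone parallel-prefix carry computation at the bit level. It models logic behavior only, with no sizing, wiring, or energy. Each of the log2(N) levels combines (generate, propagate) pairs that are 1, 2, 4, … bit positions apart. The spans, and therefore the wires, double at every level. Most intermediate nodes feed two consumers, which creates the re-convergent paths. The function and its return values are our own illustration.

```python
def kogge_stone_add(a, b, width=32):
    """Add two unsigned integers with a Kogge-Stone parallel-prefix carry network.

    Returns (sum mod 2**width, carry_out, prefix_levels, prefix_operators).
    """
    mask = (1 << width) - 1
    a, b = a & mask, b & mask
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # bit propagate
    G, P = g[:], p[:]
    dist, levels, ops = 1, 0, 0
    while dist < width:
        G_next, P_next = G[:], P[:]
        for i in range(dist, width):
            # Prefix operator: merge the group ending at bit i with the one `dist` bits lower
            G_next[i] = G[i] | (P[i] & G[i - dist])
            P_next[i] = P[i] & P[i - dist]
            ops += 1
        G, P = G_next, P_next
        dist *= 2
        levels += 1
    # G[i] is now the carry out of bit i (carry-in = 0)
    s = p[0]
    for i in range(1, width):
        s |= (p[i] ^ G[i - 1]) << i
    return s, G[width - 1], levels, ops


print(kogge_stone_add(123456789, 987654321))  # (1111111110, 0, 5, 129)
```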

Tree Adder: Sizing vs. Dual-VDD Optimization

Reference design (D = Dmin): all paths are critical
[Figure: the tree adder for the reference design, after sizing at dinc = 10%, and with dual VDD at dinc = 10%]

Internal energy ⇒ S more effective than VDD
  S: E (-54%), 2VDD: E (-27%) at dinc = 10%

Tree Adder: Multi-dimensional Search

Can get pretty close to the optimum with only 2 variables
Reaching the minimum delay (maximum speed) is very expensive

[Figure: Energy/Eref versus Delay/Dmin for the reference design and for joint optimization over (S, VDD), (VDD, VTH), (S, VTH), and (S, VDD, VTH)]

Multiple Supply Voltages

Block-level supply assignment
  Higher-throughput/lower-latency functions are implemented in higher VDD
  Slower functions are implemented with lower VDD
  This leads to so-called "voltage islands" with separate supply grids
  Level conversion performed at block boundaries
Multiple supplies inside a block
  Non-critical paths moved to lower supply voltage
  Level conversion within the block
  Physical design challenging

Using Three VDD's

[Figure: power reduction ratio versus V2 (V) and V3 (V), for V1 = 1.5 V and VTH = 0.3 V]

[Ref: T. Kuroda, ICCAD'02]
© IEEE 2002

Optimum Number of VDD's

[Figure: optimum supply ratios (V2/V1, V3/V1, V4/V1) and resulting power ratios (P2/P1, P3/P1, P4/P1) versus V1 (V), for the supply sets {V1, V2}, {V1, V2, V3}, and {V1, V2, V3, V4}]

The more VDD's, the lower the power, but the effect saturates
The power reduction effect decreases with scaling of VDD
The optimum V2/V1 is around 0.7

[Ref: M. Hamada, CICC'01]
© IEEE 2001

Lessons: Multiple Supply Voltages

Two supply voltages per block are optimal
The optimal ratio between the supply voltages is 0.7
Level conversion is performed on the voltage boundary, using a level-converting flip-flop (LCFF)
  An option is to use an asynchronous level converter
  More sensitive to coupling and supply noise

Distributing Multiple Supply Voltages

[Figure: VDDH and VDDL circuits (inputs i1, i2; outputs o1, o2) supplied from VDDH, VDDL, and VSS rails; conventional layout versus shared N-well]

[Figure: conventional dual-supply layout with N-well isolation between VDDH and VDDL circuits: (a) dedicated rows, alternating VDDH rows and VDDL rows; (b) dedicated regions, a VDDH region and a VDDL region]

Shared N-Well

[Figure: VDDH and VDDL circuits sharing one N-well (rails VDDH, VDDL, VSS); (a) floor plan image]

[Shimazaki et al, ISSCC'03]

Example: Multiple Supplies in a Block

"Clustered voltage scaling" (CVS)
Lower VDD portion is shared

[Figure: conventional design versus CVS structure, showing flip-flops (FF), the critical path, and level-shifting flip-flops (F/F)]

[Ref: M. Takahashi, ISSCC'98]
© IEEE 1998
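The following is a minimal sketch of the clustered-voltage-scaling rule as we read this slide. Gates are visited from the flip-flop inputs backward. A gate may move to VDDL only if all of its fanouts are already VDDL gates or level-shifting flip-flops, and only if timing is still met. Level conversion therefore never occurs inside the combinational block. The netlist, unit delays, and the 1.5x slowdown at VDDL are hypothetical.

```python
# Hypothetical netlist: gate -> (delay at VDDH, fanouts); "FF" is a level-shifting flip-flop input
NETLIST = {
    "g1": (1.0, ["g2", "g3"]),
    "g2": (1.0, ["g4"]),
    "g3": (1.0, ["FF"]),
    "g4": (1.0, ["FF"]),
    "g5": (1.0, ["FF"]),
    "g6": (1.0, ["g4"]),
}
LOW_VDD_SLOWDOWN = 1.5  # assumed delay multiplier for a gate moved to VDDL


def topo_order(netlist):
    """Drivers before loads (reverse DFS post-order)."""
    order, seen = [], set()

    def visit(g):
        if g == "FF" or g in seen:
            return
        seen.add(g)
        for f in netlist[g][1]:
            visit(f)
        order.append(g)  # appended only after all of its fanouts

    for g in netlist:
        visit(g)
    return order[::-1]


def worst_delay(netlist, low):
    """Longest path delay from launching to capturing flip-flops."""
    arrival = {}
    for g in topo_order(netlist):
        d = netlist[g][0] * (LOW_VDD_SLOWDOWN if g in low else 1.0)
        fanin = [arrival[h] for h in arrival if g in netlist[h][1]]
        arrival[g] = max(fanin, default=0.0) + d
    return max(arrival.values())


def clustered_voltage_scaling(netlist, t_clk):
    """Move gates to VDDL from the outputs backward, keeping VDDL gates clustered."""
    low = set()
    for g in reversed(topo_order(netlist)):
        fanouts_low = all(f == "FF" or f in low for f in netlist[g][1])
        if fanouts_low and worst_delay(netlist, low | {g}) <= t_clk:
            low.add(g)
    return low


low = clustered_voltage_scaling(NETLIST, t_clk=3.0)
print(sorted(low))  # ['g3', 'g5']
```

Gate g6 has enough slack to run at VDDL, but it drives the VDDH gate g4, so it stays at VDDH. Giving up that slack is the cost of confining level conversion to the flip-flops.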

Level Converting Flip-Flops (LCFFs)

Pulsed half-latch versus master-slave LCFFs
  Smaller number of MOSFETs and less clock loading
  Faster level conversion using the half-latch structure
  Shorter D-Q path from the pulsed circuit

[Figure: master-slave LCFF and pulsed half-latch LCFF]

[Ref: F. Ishihara, ISLPED'03]
© IEEE 2003

Dynamic Realization of Pulsed LCFF

Pulsed precharge LCFF (PPR)
  Fast level conversion by precharge mechanism
  Suppressed charge/discharge toggle by conditional capture
  Short D-Q path

[Figure: pulsed precharge latch]

[Ref: F. Ishihara, ISLPED'03]
© IEEE 2003

Case Study: ALU for 64-bit μProcessor

[Figure: ALU block diagram with gp gen., carry gen., partial sum, sum sel. (2:1 MUX), 5:1 MUX, logical unit, two 9:1 MUXes, and clock gen.; inputs ain, bin, ain0, and clk; signals carry, s0/s1, sum, and sumb (long loop-back bus, 0.5 pF) through INV1 and INV2; each block is marked as a VDDH or a VDDL circuit]

[Ref: Y. Shimazaki, ISSCC'03]
© IEEE 2003

Low-Swing Bus and Level Converter

[Figure: domino level converter (9:1 MUX) with precharge (pc) and keeper; signals sum, sumb, ain0, and sel (VDDH); inverters INV1 and INV2; VDDH and VDDL supplies]

INV2 is placed near the 9:1 MUX to increase noise immunity
Level conversion is done by a domino 9:1 MUX

[Ref: Y. Shimazaki, ISSCC'03]
© IEEE 2003

Measured Results: Energy and Delay

[Figure: energy (pJ) versus TCYCLE (ns) at room temperature, single-supply versus shared-well design (VDDH = 1.8 V), with 1.16 GHz marked]
  VDDL = 1.4 V: energy -25.3%, delay +2.8%
  VDDL = 1.2 V: energy -33.3%, delay +8.3%

[Ref: Y. Shimazaki, ISSCC'03]
© IEEE 2003

Practical Transistor Sizing

Continuous sizing of transistors is only an option in custom design
In ASIC design flows, the options are set by the available library
  Discrete sizing options are made possible in the standard-cell design methodology by providing multiple options for the same cell
  Leads to larger libraries (> 800 cells)
  Easily integrated into technology mapping

Technology Mapping

Larger gates reduce capacitance, but are slower

[Figure: example logic network with inputs a, b, c, d and output f; one path has slack = 1]

Technology Mapping

Example: 4-input AND
(a) Implemented using a 4-input NAND + INV
(b) Implemented using a 2-input NAND + 2-input NOR

Gate type | Area (cell units) | Input cap. (fF) | Library 1, High-Speed: average delay (ps) | Library 2, Low-Power: average delay (ps)
INV       | 3 | 1.8 | 7.0 + 3.8 CL  | 12.0 + 6.0 CL
NAND2     | 4 | 2.0 | 10.3 + 5.3 CL | 16.3 + 8.8 CL
NAND4     | 5 | 2.0 | 13.6 + 5.8 CL | 22.7 + 10.2 CL
NOR2      | 3 | 2.2 | 10.7 + 5.4 CL | 16.7 + 8.9 CL

(Delay formulas: CL in fF; numbers calibrated for 90 nm)

Technology Mapping – Example

4-input AND    | (a) NAND4 + INV | (b) NAND2 + NOR2
Area           | 8               | 11
HS: delay (ps) | 31.0 + 3.8 CL   | 32.7 + 5.4 CL
LP: delay (ps) | 53.1 + 6.0 CL   | 52.4 + 8.9 CL
Sw energy (fF) | 0.1 + 0.06 CL   | 0.83 + 0.06 CL

Area
  4-input more compact than 2-input (2 gates vs. 3 gates)
Timing
  Both implementations are 2-stage realizations
  2nd-stage INV (a) is a better driver than NOR2 (b)
  For more complex blocks, simpler gates will show better performance
Energy
  Internal switching increases energy in the 2-input case
  Low-power library has worse delay, but lower leakage (see later)
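The composite rows above can be re-derived from the single-gate library table. Each two-stage delay is the first gate's delay into the second gate's input capacitance, plus the second gate's delay into CL. The switching-energy row matches if it is read as activity-weighted switched capacitance (in fF) with uniformly distributed, temporally independent inputs. That reading is our assumption; the slide does not state it.

```python
# From the library table: area, input cap (fF), (intercept, slope) of delay in ps vs C_L in fF
LIB = {
    #        area  c_in  high-speed     low-power
    "INV":   (3,   1.8,  (7.0, 3.8),   (12.0, 6.0)),
    "NAND2": (4,   2.0,  (10.3, 5.3),  (16.3, 8.8)),
    "NAND4": (5,   2.0,  (13.6, 5.8),  (22.7, 10.2)),
    "NOR2":  (3,   2.2,  (10.7, 5.4),  (16.7, 8.9)),
}
HS, LP = 2, 3  # tuple index of each library's delay formula


def two_stage_delay(first, second, lib):
    """(intercept, slope): first gate drives one input of second gate, which drives C_L."""
    a1, b1 = LIB[first][lib]
    a2, b2 = LIB[second][lib]
    return (a1 + b1 * LIB[second][1] + a2, b2)


def p01(p_one):
    """Probability of a 0->1 transition per cycle for a temporally independent signal."""
    return (1.0 - p_one) * p_one


# Uniform random inputs: NAND4 output is 1 w.p. 15/16, NAND2 output w.p. 3/4, AND4 output w.p. 1/16
designs = {
    "(a) NAND4 + INV": {
        "area": LIB["NAND4"][0] + LIB["INV"][0],
        "hs": two_stage_delay("NAND4", "INV", HS),
        "lp": two_stage_delay("NAND4", "INV", LP),
        "sw": (p01(15 / 16) * LIB["INV"][1], p01(1 / 16)),
    },
    "(b) NAND2 + NOR2": {
        "area": 2 * LIB["NAND2"][0] + LIB["NOR2"][0],
        "hs": two_stage_delay("NAND2", "NOR2", HS),
        "lp": two_stage_delay("NAND2", "NOR2", LP),
        "sw": (2 * p01(3 / 4) * LIB["NOR2"][1], p01(1 / 16)),
    },
}
for name, d in designs.items():
    print(name, d["area"], "HS %.2f + %.1f CL" % d["hs"],
          "LP %.2f + %.1f CL" % d["lp"], "Sw %.3f + %.3f CL" % d["sw"])
```

The printed values (31.04, 32.66, 53.06, 52.36, 0.105, 0.825, 0.059) round to the entries in the comparison table.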

Gate-Level Tradeoffs for Power

Technology mapping
  Gate selection
  Sizing
  Pin assignment
Logical optimizations
  Factoring
  Restructuring
  Buffer insertion/deletion
  Don't-care optimization

Logic Restructuring

Logic restructuring to minimize spurious transitions
Buffer insertion for path balancing

[Figure: example network annotated with logic values and time steps 1, 2, 3]
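The slide's figure did not survive extraction. As a stand-in, this sketch uses our own example, not the slide's: a 4-input XOR (parity) function built as a chain and as a balanced tree, with unit gate delays. When all four inputs rise together the parity does not change. The chain output still glitches 0→1→0 because its inputs arrive at different times, while the balanced tree makes no spurious transitions.

```python
# Each node: (input_x, input_y); every gate is a 2-input XOR. Nodes are in topological order.
CHAIN = {"n1": ("a", "b"), "n2": ("n1", "c"), "f": ("n2", "d")}
TREE = {"m1": ("a", "b"), "m2": ("c", "d"), "f": ("m1", "m2")}


def settle(net, inputs):
    """Steady-state node values for the given input vector."""
    vals = dict(inputs)
    for node, (x, y) in net.items():
        vals[node] = vals[x] ^ vals[y]
    return vals


def count_transitions(net, old_inputs, new_inputs, steps=10):
    """Unit-delay simulation: a gate's output at t+1 is computed from its inputs at t."""
    vals = settle(net, old_inputs)
    vals.update(new_inputs)                   # all inputs switch together at t = 0
    transitions = {node: 0 for node in net}
    for _ in range(steps):
        nxt = dict(vals)
        for node, (x, y) in net.items():
            nxt[node] = vals[x] ^ vals[y]
            if nxt[node] != vals[node]:
                transitions[node] += 1
        vals = nxt
    return transitions


old = {"a": 0, "b": 0, "c": 0, "d": 0}
new = {"a": 1, "b": 1, "c": 1, "d": 1}
print("chain:", count_transitions(CHAIN, old, new))  # {'n1': 0, 'n2': 1, 'f': 2}
print("tree: ", count_transitions(TREE, old, new))   # {'m1': 0, 'm2': 0, 'f': 0}
```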

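Algebraic Transformations

Idea: modify the network to reduce capacitance
Caveat: this may increase activity!

Input probabilities: pa = 0.1; pb = 0.5; pc = 0.5

[Figure: two implementations of f from inputs a, b, c: one with internal nodes p1 = 0.05 and p2 = 0.05 and output p3 = 0.075; the other with internal node p4 = 0.75 and output p5 = 0.075]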
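The node probabilities on the slide are consistent with f = ab + ac (nodes p1, p2, p3) and its factored form f = a(b + c) (nodes p4, p5). That identification is our inference. The sketch computes each node's signal probability exactly by enumerating the inputs. It then computes the 0→1 transition probability p(1 − p), assuming temporally independent inputs.

```python
from itertools import product

P = {"a": 0.1, "b": 0.5, "c": 0.5}  # input signal probabilities from the slide


def prob_one(fn):
    """Exact P(fn = 1) by enumerating the independent inputs a, b, c."""
    total = 0.0
    for bits in product((0, 1), repeat=3):
        w = 1.0
        for name, v in zip("abc", bits):
            w *= P[name] if v else 1.0 - P[name]
        if fn(*bits):
            total += w
    return total


def activity(p):
    """0->1 transition probability for a temporally independent signal."""
    return p * (1.0 - p)


net1 = {"p1": lambda a, b, c: a & b,                 # f = a*b + a*c
        "p2": lambda a, b, c: a & c,
        "p3": lambda a, b, c: (a & b) | (a & c)}
net2 = {"p4": lambda a, b, c: b | c,                 # f = a*(b + c)
        "p5": lambda a, b, c: a & (b | c)}

for label, net in (("a*b + a*c", net1), ("a*(b + c)", net2)):
    probs = {n: prob_one(fn) for n, fn in net.items()}
    print(label, {n: round(p, 3) for n, p in probs.items()},
          "activity:", {n: round(activity(p), 4) for n, p in probs.items()})
```

The factored form saves a gate, which reduces capacitance. However, its single internal node p4 has an activity of 0.1875, about twice the combined 0.095 of p1 and p2. This is the caveat stated on the slide.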

Lessons from Circuit Optimization

Joint optimization over multiple design parameters is possible using a sensitivity-based optimization framework
  Equal marginal costs ⇔ energy-efficient design
Peak performance is VERY power inefficient
  About 70% energy reduction for 20% delay penalty
Additional variables for higher energy efficiency
  Two supply voltages are in general sufficient; 3 or more supply voltages offer only a small advantage
Choice between sizing and supply voltage parameters depends on circuit topology
But … leakage not considered so far

Considering Leakage @ Design Time

Considering leakage as well as dynamic power is essential in sub-100 nm technologies
Leakage is not necessarily a bad thing
  Increased leakage leads to improved performance, allowing for lower supply voltages
  Again a trade-off issue …

Leakage – Not Necessarily a Bad Thing

Optimal designs have high leakage (ELk/ESw ≈ 0.5)
Must adapt to process and activity variations

Topology     | Inv | Add | Dec
(ELk/ESw)opt | 0.8 | 0.5 | 0.2

[Figure: normalized energy Enorm versus Estatic/Edynamic (log scale, 10^-2 to 10^1); operating points labeled Version 1 and Version 2, annotated with (Vth,ref - 180 mV, 0.81 VDD,max) and (Vth,ref - 140 mV, 0.52 VDD,max)]

[Ref: D. Markovic, JSSC'04]
© IEEE 2004

Refining the Optimization Model

Switching energy
Leakage energy
  with I0(Y): normalized leakage current with inputs in state Y
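The switching and leakage energy expressions did not survive extraction. The sketch below assumes common forms rather than the chapter's exact equations. Switching energy is ESw = α·C·VDD². Leakage energy is ELk = Ileak·VDD·Tcycle, with subthreshold current Ileak = W·I0(Y)·10^((−VTH + λd·VDD)/S). The DIBL coefficient λd = 0.1 is a placeholder. The swing S = 84.4 mV/decade is chosen only so that a 150 mV threshold reduction reproduces the factor of about 60 this chapter quotes for a single NMOS transistor.

```python
SWING = 0.0844  # V/decade; assumed so that a 150 mV VTH shift gives about 60x
DIBL = 0.1      # drain-induced barrier lowering coefficient (placeholder value)


def switching_energy(alpha, c_load, vdd):
    """E_Sw = alpha * C * VDD**2, in joules."""
    return alpha * c_load * vdd ** 2


def leakage_energy(width, i0_state, vth, vdd, t_cycle):
    """E_Lk = I_leak * VDD * T_cycle, with the assumed subthreshold current
    I_leak = W * I0(Y) * 10**((-VTH + DIBL * VDD) / SWING).
    i0_state plays the role of I0(Y), the normalized leakage for input state Y."""
    i_leak = width * i0_state * 10 ** ((-vth + DIBL * vdd) / SWING)
    return i_leak * vdd * t_cycle


# Relative leakage energy for a 150 mV lower threshold (all other arguments equal)
ratio = leakage_energy(1.0, 1.0, 0.25, 1.0, 1e-9) / leakage_energy(1.0, 1.0, 0.40, 1.0, 1e-9)
print(round(ratio))  # 60
```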

Reducing Leakage @ Design Time

Using longer transistors
  Limited benefit
  Increase in active current
Using higher thresholds
  Channel doping
  Stacked devices
  Body biasing
Reducing the voltage!!

Longer Channels

10% longer gates reduce leakage by 50%
  Increases switching power by 18% with W/L = const.
Doubling L reduces leakage by 5x
  Impacts performance
  Attractive when W does not have to be increased (e.g., memory)

[Figure: normalized switching energy and normalized leakage power versus transistor length (100 to 200 nm), 90 nm CMOS]

Using Multiple Thresholds

There is no need for level conversion
Dual thresholds can be added to standard design flows
  High-VTh and low-VTh libraries are standard in sub-0.18 μm processes
  For example, one can synthesize using only high-VTh cells and then swap in low-VTh cells in place to improve timing
  Second-VTh insertion can be combined with resizing
Only two thresholds are needed per block
  Using more than two yields small improvements
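The synthesize-high-then-swap flow described above can be sketched as a greedy loop. Every cell starts at high VTH. While the worst path misses the target, the cell on that path with the largest delay gain per unit of added leakage is swapped to low VTH. The cell delays, leakages, and path list are hypothetical. A real flow would use static timing analysis on the full netlist rather than an explicit path list.

```python
# Hypothetical cells: name -> (delay at high VTH, delay at low VTH, leakage high, leakage low)
GATES = {
    "g1": (10.0, 8.0, 1.0, 5.0),
    "g2": (12.0, 9.6, 1.0, 5.0),
    "g3": (8.0, 6.4, 1.0, 5.0),
    "g4": (15.0, 12.0, 1.0, 5.0),
    "g5": (9.0, 7.2, 1.0, 5.0),
}
PATHS = [["g1", "g2", "g3"], ["g1", "g4"], ["g5", "g3"]]  # hypothetical timing paths


def path_delay(path, low):
    return sum(GATES[g][1] if g in low else GATES[g][0] for g in path)


def leakage(low):
    return sum(GATES[g][3] if g in low else GATES[g][2] for g in GATES)


def dual_vth_swap(target):
    """Start all-high-VTH; swap cells on the worst path to low VTH until timing is met."""
    low = set()
    while True:
        crit = max(PATHS, key=lambda p: path_delay(p, low))
        if path_delay(crit, low) <= target:
            return low
        candidates = [g for g in crit if g not in low]
        if not candidates:
            raise ValueError("target not reachable even with all low-VTH cells")
        # largest delay gain per unit of added leakage
        best = max(candidates,
                   key=lambda g: (GATES[g][0] - GATES[g][1]) / (GATES[g][3] - GATES[g][2]))
        low.add(best)


low = dual_vth_swap(target=27.0)
print(sorted(low), leakage(low), leakage(set()), leakage(set(GATES)))
# ['g1', 'g2'] 13.0 5.0 25.0
```

Only the cells on the critical path are swapped, so leakage ends well below the all-low-VTH design while timing is met.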

Three VTH's

[Figure: leakage reduction ratio versus VTH.2 (V) and VTH.3 (V), for VDD = 1.5 V and VTH.1 = 0.3 V]

Impact of a third threshold is very limited

[Ref: T. Kuroda, ICCAD'02]
© IEEE 2002

Using Multiple Thresholds

Cell-by-cell VTH assignment (not at block level)
Achieves all-low-VTH performance with a substantial reduction in leakage

[Figure: logic between flip-flops (FF) built from a mix of low-VTH and high-VTH cells]

[Ref: S. Date, SLPE'94]

Dual-VT Domino

Low-threshold transistors are used only in critical paths

[Figure: dual-VT domino circuit with P1, Inv1, Inv2, Inv3 and signals Dn, Dn+1, Clkn, Clkn+1; shaded transistors are low threshold]

Multiple Thresholds and Design Methodology

Easily introduced in the standard-cell design methodology by extending cell libraries with cells of different thresholds
  Selection of cells during technology mapping
  No impact on dynamic power
  No interface issues (as was the case with multiple VDD's)
Impact: can reduce leakage power substantially

Dual-VTH Design for High-Performance Design

              | High-VTH only | Low-VTH only | Dual VTH
Total slack   | -53 ps        | 0 ps         | 0 ps
Dynamic power | 3.2 mW        | 3.3 mW       | 3.2 mW
Static power  | 914 nW        | 3873 nW      | 1519 nW

All designs synthesized automatically using Synopsys flows

[Courtesy: Synopsys, Toshiba, 2004]

Example: High- vs. Low-Threshold Libraries

[Figure: leakage power (nW) for selected combinational tests, 130 nm CMOS]

[Courtesy: Synopsys 2004]

Complex Gates Increase Ion/Ioff Ratio

Ion and Ioff of a single NMOS versus a stack of 10 NMOS transistors
Transistors in the stack are sized up to give similar drive

[Figure, left: Ioff (nA) versus VDD (V), no stack versus stack (90 nm technology)]
[Figure, right: Ion (μA) versus VDD (V), no stack versus stack (90 nm technology)]

Complex Gates Increase Ion/Ioff Ratio

Stacking transistors suppresses submicron effects
  Reduced velocity saturation
  Reduced DIBL effect
  Allows for operation at lower thresholds

[Figure: Ion/Ioff ratio (x10^5) versus VDD (V), stack versus no stack; the stack is higher by a factor of 10 (90 nm technology)]

Complex Gates Increase Ion/Ioff Ratio

Example: 4-input NAND, fan-in (2) versus fan-in (4) implementation
With transistors sized for similar performance:
  Leakage of fan-in (2) = leakage of fan-in (4) × 3 (averaged over all possible input patterns)

[Figure: leakage current (nA) versus input pattern (0 to 15) for the fan-in (2) and fan-in (4) implementations]

Example: 32-bit Kogge-Stone Adder

[Figure: percentage of input vectors versus standby leakage current (μA), with a factor of 18 marked]

Reducing the threshold by 150 mV increases the leakage of a single NMOS transistor by a factor of 60

[Ref: S. Narendra, ISLPED'01]
© Springer 2001

Summary

Circuit optimization can lead to substantial energy reduction at limited performance loss
Energy-delay plots are the perfect mechanism for analyzing energy-delay trade-offs
Well-defined optimization problem over W, VDD, and VTH parameters
Increasingly better support by today's CAD flows
Observe: leakage is not necessarily bad, if appropriately managed

References

Books:
A. Bellaouar, M.I. Elmasry, Low-Power Digital VLSI Design: Circuits and Systems, Kluwer Academic Publishers, 1st Ed, 1995.
D. Chinnery, K. Keutzer, Closing the Gap Between ASIC and Custom, Springer, 2002.
D. Chinnery, K. Keutzer, Closing the Power Gap Between ASIC and Custom, Springer, 2007.
J. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed, Prentice Hall, 2003.
I. Sutherland, B. Sproul, D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan-Kaufmann, 1st Ed, 1999.

Articles:
R.W. Brodersen, M.A. Horowitz, D. Markovic, B. Nikolic, V. Stojanovic, “Methods for True Power Minimization,” Int. Conf. on Computer-Aided Design (ICCAD), pp. 35-42, Nov. 2002.
S. Date, N. Shibata, S. Mutoh, J. Yamada, “1V 30MHz Memory-Macrocell-Circuit Technology with a 0.5μm Multi-Threshold CMOS,” Proceedings of the 1994 Symposium on Low Power Electronics, San Diego, CA, pp. 90-91, Oct. 1994.
M. Hamada, Y. Ootaguro, T. Kuroda, “Utilizing Surplus Timing for Power Reduction,” IEEE Custom Integrated Circuits Conf. (CICC), pp. 89-92, Sept. 2001.
F. Ishihara, F. Sheikh, B. Nikolic, “Level Conversion for Dual-Supply Systems,” Int. Conf. Low Power Electronics and Design (ISLPED), pp. 164-167, Aug. 2003.
P.M. Kogge, H.S. Stone, “A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations,” IEEE Trans. Comput., vol. C-22, no. 8, pp. 786-793, Aug. 1973.
T. Kuroda, “Optimization and Control of VDD and VTH for Low-Power, High-Speed CMOS Design,” Proceedings ICCAD 2002, pp. , San Jose, Nov. 2002.

Articles (cont.):
H.C. Lin, L.W. Linholm, “An Optimized Output Stage for MOS Integrated Circuits,” IEEE J. Solid-State Circuits, vol. SC-10, no. 2, pp. 106-109, Apr. 1975.
S. Ma, P. Franzon, “Energy Control and Accurate Delay Estimation in the Design of CMOS Buffers,” IEEE J. Solid-State Circuits, vol. 29, no. 9, pp. 1150-1153, Sept. 1994.
D. Markovic, V. Stojanovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Methods for True Energy-Performance Optimization,” IEEE J. Solid-State Circuits, vol. 39, no. 8, pp. 1282-1293, Aug. 2004.
MathWorks, http://www.mathworks.com
S. Narendra, S. Borkar, V. De, D. Antoniadis, A. Chandrakasan, “Scaling of Stack Effect and Its Applications for Leakage Reduction,” Int. Conf. Low Power Electronics and Design (ISLPED), pp. 195-200, Aug. 2001.
T. Sakurai, R. Newton, “Alpha-Power Law MOSFET Model and Its Applications to CMOS Inverter Delay and Other Formulas,” IEEE J. Solid-State Circuits, vol. 25, no. 2, pp. 584-594, Apr. 1990.
Y. Shimazaki, R. Zlatanovici, B. Nikolic, “A Shared-Well Dual-Supply-Voltage 64-bit ALU,” Int. Solid-State Circuits Conf. (ISSCC), pp. 104-105, Feb. 2003.
V. Stojanovic, D. Markovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Energy-Delay Tradeoffs in Combinational Logic Using Gate Sizing and Supply Voltage Optimization,” European Solid-State Circuits Conf. (ESSCIRC), pp. 211-214, Sept. 2002.
M. Takahashi et al., “A 60mW MPEG Video Codec Using Clustered Voltage Scaling with Variable Supply-Voltage Scheme,” IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 36-37, Feb. 1998.