Ultra-Low Power/Voltage Design

Uploaded On 2015-09-21



Presentation Transcript

Slide1

Ultra-Low Power/Voltage Design

Slide2

Chapter Outline

Rationale

Lower Bounds on Computational Energy

Subthreshold Logic

Moderate Inversion as a Trade-off

Revisiting Logic Gate Topologies

Summary

Slide3

Rationale

Continued increase of computational density must be combined with a decrease in energy per operation (EOP).

Further scaling of the supply voltage is essential to accomplish that.

The only other option is to keep on reducing activity.

Some key questions:

How far can the supply voltage be scaled?

What is the minimum energy per operation that can be obtained theoretically and practically?

What to do about the threshold voltage and leakage?

How to practically design circuits that approach the minimum energy bounds?

Slide4

Opportunities for Ultra-Low Voltage

A number of applications are emerging that do not need high performance, only extremely low power dissipation

Examples:

Standby operation for mobile components

Implanted electronics and artificial senses

Smart objects, fabrics and e-textiles

Need power levels below 1 mW (even μW in certain cases)

Slide5

Minimum Operational Voltage of Inverter

Swanson, Meindl (April 1972)

Further extended in Meindl (Oct 2000)

Limitation: the magnitude of the gain at the midpoint must exceed 1

$C_{ox}$: gate capacitance

$C_d$: diffusion capacitance

$n$: slope factor

For ideal MOSFET (60 mV/decade slope, $n = 1$):

$V_{DD}(\min) = 2\ln(2)\,kT/q \approx 1.39\,kT/q$, or about 36 mV at 300 K

[Ref: R. Swanson, JSSC’72; J. Meindl, JSSC’00]

© IEEE 1972

Slide6

Subthreshold Modeling of CMOS Inverter

From Chapter 2:

(DIBL can be ignored at low voltages)

with

Slide7

Subthreshold DC model of CMOS Inverter

Assume NMOS and PMOS are fully symmetrical and all voltages are normalized to the thermal voltage $\Phi_T = kT/q$ ($x_i = V_i/\Phi_T$; $x_o = V_o/\Phi_T$; $x_D = V_{DD}/\Phi_T$)

The VTC of the inverter for NMOS and PMOS in subthreshold can be derived:

[Ref: E. Vittoz, CRC’05]

with

so that

and

For $|A_{Vmax}| = 1$: $x_D = 2\ln(n+1)$

Slide8

Results from Analytical Model

[Figure: subthreshold inverter, minimum supply voltage $x_D$ (1 to 7) for a given maximum gain ($A_{max}$ = 1, 2, 4, 10) as a function of the slope factor $n$ (1 to 2)]

[Figure: normalized VTC for $n = 1.5$ as a function of $V_{DD}$ ($x_D$)]

[Ref: E. Vittoz, CRC’05]

$x_{D\min} = 2\ln(2.5) = 1.83$ for $n = 1.5$

$x_D = 4$ is sufficient for reliable operation
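The minimum-supply numbers quoted above can be reproduced with a few lines of Python. This is only a sketch: it assumes the maximum gain has the form $|A_{Vmax}| = (e^{x_D/2} - 1)/n$, which is chosen because it reduces to the unity-gain condition $x_D = 2\ln(n+1)$ given earlier; the transcript does not preserve the gain equation itself. The function names and the 300 K default are illustrative.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temperature_k=300.0):
    """Thermal voltage Phi_T = kT/q in volts."""
    return BOLTZMANN * temperature_k / ELECTRON_CHARGE


def min_normalized_supply(n, a_max=1.0):
    """Smallest x_D = V_DD / Phi_T that reaches a maximum gain a_max.

    Assumed gain model (not taken from the slides):
    |A_Vmax| = (exp(x_D / 2) - 1) / n, solved for x_D.
    """
    return 2.0 * math.log(n * a_max + 1.0)


x_dmin = min_normalized_supply(n=1.5)           # unity gain, n = 1.5
v_ddmin = x_dmin * thermal_voltage(300.0)       # back to volts
print(f"x_Dmin = {x_dmin:.2f}, V_DDmin = {v_ddmin * 1e3:.1f} mV")

for gain in (1, 2, 4, 10):
    print(gain, round(min_normalized_supply(1.5, gain), 2))
```

Under this assumed model, higher gain targets push the required $x_D$ up only logarithmically, which is in line with the statement that $x_D = 4$ leaves comfortable margin.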

[Figure curves: $x_D$ = 1, 2, 4, 6, 8 for $n = 1.5$; axes $x_i$ and $x_o$, both from 0 to 8]

Slide9

Confirmed by simulation (at 90 nm)

Observe: non-symmetry of the VTC increases $V_{DD\min}$

For $n = 1.5$, $V_{DD\min} = 1.83\,\Phi_T = 48$ mV

[Figure: minimum operational supply voltage $V_{DD\min}$ (mV) versus pn-ratio]

Slide10

Also Holds for More Complex Gates

Degradation due to asymmetry

[Figure: minimum operational supply voltage of a 2-input NOR versus pn-ratio]

Slide11

Minimum Energy per Operation

Moving one electron over $V_{DD\min}$:

$E_{\min} = QV_{DD}/2 = q \cdot 2\ln(2)\,(kT/q)/2 = kT\ln(2)$

Also called the Von Neumann-Landauer-Shannon bound

At room temperature (300 K): $E_{\min} = 0.29 \times 10^{-20}$ J

Minimum-sized CMOS inverter at 90 nm operating at 1 V:

$E = CV_{DD}^2 = 0.8 \times 10^{-15}$ J, or 5 orders of magnitude larger!
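As a quick numerical check of the two energies quoted above, the following Python sketch evaluates $kT\ln(2)$ at 300 K and compares it with the 0.8 fJ inverter figure from the slide. The constant and function names are illustrative; only the 300 K temperature and the 0.8 × 10⁻¹⁵ J value come from the text.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def landauer_bound(temperature_k):
    """Minimum switching energy kT*ln(2) in joules."""
    return BOLTZMANN * temperature_k * math.log(2)


e_min = landauer_bound(300.0)
e_inverter = 0.8e-15  # J, 90 nm minimum-sized inverter at 1 V (from the slide)
orders = math.log10(e_inverter / e_min)

print(f"E_min      = {e_min:.3e} J")
print(f"E_inverter = {e_inverter:.3e} J")
print(f"gap        = {orders:.1f} orders of magnitude")
```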

Predicted by von Neumann: $kT\ln(2)$ [J. von Neumann, Theory of Self-Reproducing Automata, 1966]

How close can one get?

[Ref: J. Von Neumann, Ill’66]

Slide12

Propagation Delay of Subthreshold Inverter

Normalizing $t_p$ to $\tau_0 = C\Phi_T/I_0$:

(for $V_{DD} \gg \Phi_T$)

[Figure: comparison between curve-fitted model and simulations (FO4, 90 nm); $t_p$ (nsec, 0 to 120) versus $x_D$ (3 to 10); fitted parameters $\tau_0 = 338$, $n = 1.36$]

Slide13

Dynamic Behavior

Also: short-circuit current can be ignored if the input rise time is smaller than $\tau_0$, or if the slopes at inputs and outputs are balanced

[Figure: transient response for input rise times $t_r = 0$, $0.5\tau_0$, $\tau_0$ and $2\tau_0$; voltage (normalized to $4\Phi_T$) versus time (normalized to $\tau_0$)]

[Figure: $t_p$ (normalized to $\tau_0$) as a function of $t_{rise}$, for $x_D = 4$]

Slide14

Power Dissipation of Subthreshold Inverter

$P_{dyn} = CV_{DD}^2 f$ (nothing new)

Short-circuit power can be ignored (< 1%) for well-proportioned circuits and $x_D \geq 4$

[Figure: $I_{Stat}/I_0$ (0.5 to 1.3) versus $x_D$ (1 to 10) for $n = 1.5$; annotations: circuit fails, logic levels degenerate]

Leakage current is equal to $I_0$ for $x_D \geq 4$ (ignores DIBL)

It increases for smaller values of $x_D$ due to degeneration of the logic levels

Slide15

Power-Delay Product and Energy-Delay

[Figures: power-delay product (pdp) and energy-delay (ed) versus $x_D$ (3 to 10), with curves for activity $\alpha$ between 0.01 and 1]

For low activity ($\alpha \ll 1$), a large $x_D$ is advantageous!

Slide16

Energy for a Given Throughput

Most important question: assuming $1/T = \alpha/2t_p$, what minimizes the energy for a given task?

[Figure: energy ($10^1$ to $10^4$) versus $x_D$ (3 to 12)]

Energy is minimized by keeping $\alpha$ as high as possible and having computation occupy most of the time: use the minimum voltage that meets $T$

If $\alpha$ must be low because of topology (< 0.05), there exists an optimum voltage that minimizes the energy
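The trade-off described above can be explored numerically. The sketch below is a toy model under stated assumptions, not the model behind the slide's plot: energy per operation is taken as dynamic energy $CV_{DD}^2$ plus leakage energy $I_0 V_DD T$ with $T = 2t_p/\alpha$, everything is normalized to $C\Phi_T^2$, leakage is $I_0$ (valid for $x_D \geq 4$ per the earlier slide), and the normalized delay $t_p/\tau_0 = x_D e^{-x_D/n}$ is an assumed form because the transcript does not preserve the delay equation.

```python
import math


def normalized_energy(x_d, alpha, n=1.5):
    """Energy per operation in units of C * Phi_T**2 (toy model).

    dynamic : C * V_DD**2          -> x_d**2
    leakage : I_0 * V_DD * T       -> x_d * (T / tau_0)
    with T = 2 * t_p / alpha and the ASSUMED delay t_p / tau_0 = x_d * exp(-x_d / n).
    """
    cycle_time = 2.0 * x_d * math.exp(-x_d / n) / alpha
    return x_d ** 2 + x_d * cycle_time


def optimum_supply(alpha, n=1.5, x_min=4.0, x_max=20.0, step=0.01):
    """Grid search for the x_D in [x_min, x_max] with the lowest energy."""
    steps = int(round((x_max - x_min) / step))
    candidates = [x_min + i * step for i in range(steps + 1)]
    return min(candidates, key=lambda x: normalized_energy(x, alpha, n))


for alpha in (1.0, 0.1, 0.01):
    x_opt = optimum_supply(alpha)
    print(f"alpha = {alpha}: optimum x_D = {x_opt:.2f}")
```

In this toy model the optimum sits at the lowest allowed voltage when activity is high, and an interior optimum appears only when activity is low, which mirrors the two statements on the slide.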

[Figure annotations: curves for $\alpha$ = 1, 0.1, 0.05, 0.01, 0.005, 0.001; dynamic power dominates]

Slide17

Example: Energy-Aware FFT

 

[Ref: A. Wang, ISSCC’04]

Architecture scales gracefully from 128- to 1024-point lengths, and supports 8b and 16b precision.

© IEEE 2004

Slide18

FFT Energy-Performance Curves

The optimal $V_{DD}$ for the 1024-point, 16b FFT is estimated from switching and leakage models for a 0.18 μm process.

[Figure: optimal ($V_{DD}$, $V_{TH}$); threshold voltage ($V_{TH}$) versus supply voltage ($V_{DD}$)]

[Ref: A. Wang, ISSCC’04]

© IEEE 2004

Slide19

SubThreshold FFT

0.18 μm CMOS process

$V_{DD}$ = 180 mV to 900 mV

$f_{clock}$ = 164 Hz to 6 MHz

At 0.35 V: energy = 155 nJ/FFT; $f_{clock}$ = 10 kHz; W = 0.6 μW

[Figure: chip, 2.1 mm × 2.6 mm, with blocks: data memory, twiddle ROMs, butterfly datapath, control logic]

[Figure: clock frequency versus $V_{DD}$ (mV)]

[Figure: energy (nJ) versus $V_{DD}$ (mV) for the 1024-point, 16-bit FFT, measured and estimated]

[Ref: A. Wang, ISSCC’04]

© IEEE 2004

Slide20

Challenges in Sub-Threshold Design

Obviously only for very low-speed design

Analysis so far only for symmetrical gates: the minimum operation voltage increases for non-symmetrical structures

Careful selection and sizing of logic structures is necessary

Data dependencies may cause gates to fail

Process variations further confound the problem

Registers and memory are a major concern

Slide21

Logic Sizing Considerations

CMOS in subthreshold is “ratioed logic”

Careful sizing of transistors is necessary to ensure adequate logic levels

[Figure: inverter with a minimum-sized $W_n$, inputs 0 and 1; labels: drive current, leakage current; $W_p$(max) (max size) and $W_p$(min) (min size) bound the operational region; 180 nm CMOS]

[Ref: A. Wang, ISSCC’04]

© IEEE 2004

Slide22

Logic Sizing Considerations

Impact of Process Variations

Inverter sizing analysis and minimum supply voltage analysis must be performed at the process corners.

Variations raise the minimum voltage the circuit can be run at.

[Figure: operational region; curves labeled $W_p$(max), SF corner and $W_p$(min), FS corner]

[Ref: A. Wang, ISSCC’04]

© IEEE 2004

Slide23

The Impact of Data Dependencies

[Figure: schematics of two XOR gates, XOR1 and XOR2, with inputs A and B and output Z]

[Figure: voltage level at Z (mV, 0 to 100) versus time (0 to 4m) for each gate, for the input vectors A=1 B=0, A=0 B=1, A=0 B=0 and A=1 B=1]

[Ref: A. Wang, ISSCC’04]

© IEEE 2004

Slide24

The Impact of Data Dependencies

XOR1 (A=1, B=0, Z=1; idle current and drive current shown): leakage through the parallel devices causes XOR1 to fail at 100 mV.

XOR2 (A=1, B=0, Z=1; idle current, drive current and weak drive current shown): a balanced number of devices reduces the effects of leakage and process variations.

Solid sub-threshold design requires symmetry for all input vectors

[Ref: A. Wang, ISSCC’04]

© IEEE 2004

Slide25

The Sub-Threshold (Low Voltage) Memory Challenge

Obstacles that limit functionality at low voltage:

SNM (static noise margin)

Write margin

Read current / bit-line leakage

Soft errors

Erratic behavior

Read SNM is the worst challenge

[Figure: read SNM and hold SNM for a sub-$V_T$ 6T cell at 300 mV]

Variation aggravates the situation

Slide26

Solutions to Enable Sub-$V_{TH}$ Memory

Standard 6T way of doing business won’t work

Voltage scaling versus transistor sizing

Current depends exponentially on voltages in sub-threshold

Use voltages (not sizing) to combat problems

New bitcells

Buffer output to remove Read SNM

Lower BL leakage

Complemented with architectural strategies

ECC, interleaving, SRAM refresh, redundancy

Slide27

Sub-threshold SRAM Cell

[Ref: B. Calhoun, ISSCC’06]

Buffered read allows separate Read, Write ports

Removing Read SNM allows operation at lower $V_{DD}$ with the same stability at corners

[Figure: bitcell schematic with nodes WL_WR, BL, BLB, Q, QB, VVDD (floating), RBL and RWL]

VVDD floats during write access, but feedback restores ‘1’ to $V_{DD}$

[Figure annotations: QB=1, RBL=1, 0, QBB held near 1 by leakage; QB=0, RBL=1, 0, QBB=1, leakage reduced by stack]

Buffer reduces BL leakage: allows 256 cells/BL instead of 16 cells/BL

Higher integration reduces area of peripheral circuits

© IEEE 2006

Slide28

Sub-threshold SRAM

Sub-$V_{TH}$ operation demonstrated in 65nm memory chip

Chip functions without error to below 400 mV, holds without error to < 250 mV:

At 400 mV, 3.28 μW and 475 kHz at 27 °C

Reads to 320 mV (27 °C) and 360 mV (85 °C)

Write to 380 mV (27 °C) and 350 mV (85 °C)

[Figure: 256kb SRAM array with 32kb block]

[Ref: B. Calhoun, ISSCC’06]

Slide29

Example: Sub-Threshold Microprocessor

Processor for sensor network applications

Simple 8-bit architecture to optimize energy efficiency

3.5 pJ per instruction at 350 mV and 354 kHz operation

10X less energy than previously reported

11 nW at 160 mV (300 mV RBB)

41-year operation on 1 g Li-ion battery
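To put the energy-per-instruction figure in terms of power, a small Python sketch follows. It assumes one instruction completes per clock cycle, which the slide does not state; the function name and the `instructions_per_cycle` parameter are illustrative.

```python
def average_power(energy_per_instruction_j, clock_hz, instructions_per_cycle=1.0):
    """Average active power in watts: energy per instruction times instruction rate."""
    return energy_per_instruction_j * clock_hz * instructions_per_cycle


# 3.5 pJ per instruction at 354 kHz (values from the slide),
# ASSUMING one instruction per clock cycle.
active_power_w = average_power(3.5e-12, 354e3)
print(f"active power = {active_power_w * 1e6:.2f} uW")
```

If the processor needs several cycles per instruction, the result scales down by that factor through `instructions_per_cycle`.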

[Ref: S. Hanson, JSSC’07]

© IEEE 2007

Slide30

Prototype Implementation

[Figure: chip layout with 7 processors; labeled blocks: 6 subliminal processors, large solar cell, solar cell for processor, solar cell for adders, solar cell for discretes, level converter array, discrete adders, discrete cells / xtors, processor memories, test memories, test module]

[Courtesy: D. Blaauw, Univ. Michigan]

Slide31

Is Sub-threshold the Way to Go?

Achieves lowest possible energy dissipation

But … at a dramatic cost in performance

[Figure: $t_p$ (μs, 0 to 3.5) versus $V_{DD}$ (V, 0 to 1), 130 nm CMOS]

Slide32

In Addition: Huge Timing Variance

[Figure: normalized timing variance (%, 0 to 80) versus $V_{DD}$ (V, 0 to 1)]

Normalized timing variance increases dramatically with $V_{DD}$ reduction

Design for yield means huge overhead at low voltages:

Worst-case design at 300 mV: > 200% overkill

Slide33

Increased Sensitivity to Variations

Subthreshold circuits operate at low $I_{on}/I_{off}$ ratios, from about 1000 to less than 10 (at $x_D = 4$)

Small variations in device parameters can have a large impact, and threaten the circuit operation

[Figure: $I_{on}$ over $I_{off}$ ($10^0$ to $10^3$) versus $x_D$ (1 to 10)]

Slide34

ONE SOLUTION: Back Off A Bit …

The performance cost of minimum energy is exponentially high.

Operating slightly above the threshold voltage improves performance dramatically while having small impact on energy

The Challenge: Modeling in the Moderate Inversion region

[Figure: optimal E-D trade-off curve, energy versus delay]

Slide35

Modeling Over All Regions of Interest

The EKV model covers strong, moderate and weak inversion regions

Inversion Coefficient IC measures the degree of saturation

with $k$ a fit factor and $I_S$ the specific current

and is related directly to $V_{DD}$

[Ref: C. Enz, Analog’95]

Slide36

Relationship between $V_{DD}$ and IC

[Figure: $V_{DD}$ (0 to 1) versus IC ($10^{-3}$ to $10^{2}$)]

Threshold changes move curves up or down

IC = 1 equals $V_{DD} \approx V_{TH}$
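The link between supply voltage and inversion coefficient can be sketched with the standard EKV interpolation, $IC = \ln^2\!\big(1 + e^{(V_{GS} - V_{TH})/(2n\Phi_T)}\big)$. The Python sketch below assumes this is the relation the slides refer to, takes $V_{GS} = V_{DD}$ (gate driven to the rail), and uses illustrative values $n = 1.5$ and $V_{TH} = 0.3$ V that are not taken from the source.

```python
import math

PHI_T = 0.02585  # thermal voltage kT/q near 300 K, in volts


def inversion_coefficient(v_dd, v_th, n=1.5, phi_t=PHI_T):
    """EKV interpolation: IC = ln^2(1 + exp((V_DD - V_TH) / (2 n Phi_T)))."""
    return math.log1p(math.exp((v_dd - v_th) / (2.0 * n * phi_t))) ** 2


def supply_for_ic(ic, v_th, n=1.5, phi_t=PHI_T):
    """Inverse relation: V_DD = V_TH + 2 n Phi_T ln(exp(sqrt(IC)) - 1)."""
    return v_th + 2.0 * n * phi_t * math.log(math.expm1(math.sqrt(ic)))


V_TH = 0.3  # volts, illustrative only
for ic in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"IC = {ic:>6}: V_DD = {supply_for_ic(ic, V_TH):.3f} V")
```

With these assumptions IC = 1 lands a few tens of millivolts above $V_{TH}$, and changing `V_TH` shifts every value by the same amount, consistent with the remark that threshold changes move the curves up or down.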

[Figure annotations: 90 nm CMOS; weak, moderate and strong inversion]

Slide37

Provides Good Match over Most of the Range

[Figure: normalized $t_p$ ($10^{-2}$ to $10^{3}$) versus IC ($10^{-2}$ to $10^{2}$), model and simulation, from weak inversion to strong inversion]

Largest deviations in strong inversion: velocity saturation is not well handled by the simple model

Slide38

Modeling Energy

[Figure: EOP [J] ($10^{-16}$ to $10^{-15}$) versus IC ($10^{-3}$ to $10^{2}$) for $\alpha$ = 1, 0.2, 0.02, 0.002]

Slide39

High Activity Scenario

[Figure: contours of equal energy and equal performance, with axes $V_{DD}$ and $V_{TH}$; the IC = 1 line and the minimum-energy point are marked (90 nm, $\alpha$ = 0.02)]

Slide40

Low Activity Scenario

[Figure: contours of equal energy and equal performance, with axes $V_{DD}$ and $V_{TH}$; the IC = 1 line and the minimum-energy point are marked (90 nm, $\alpha$ = 0.002)]

Slide41

Example: Adder

Simple full-adder using NAND & INV only

Slide42
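For reference, a full adder built from NAND and INV gates only can be modeled in a few lines of Python. This is one possible gate-level realization, given as a sketch; the schematic actually used in the adder example is not recoverable from the transcript, and the function names are illustrative.

```python
from itertools import product


def nand(a, b):
    """2-input NAND on 0/1 values."""
    return 1 - (a & b)


def inv(a):
    """Inverter on a 0/1 value."""
    return 1 - a


def xor_from_nand_inv(a, b):
    """a XOR b = NAND(NAND(a, INV(b)), NAND(INV(a), b))."""
    return nand(nand(a, inv(b)), nand(inv(a), b))


def full_adder(a, b, carry_in):
    """Return (sum, carry_out) using only NAND and INV gates."""
    a_xor_b = xor_from_nand_inv(a, b)
    total = xor_from_nand_inv(a_xor_b, carry_in)
    # carry_out = a*b + carry_in*(a XOR b), written as NAND of NANDs
    carry_out = nand(nand(a, b), nand(a_xor_b, carry_in))
    return total, carry_out


for a, b, c in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, c)
    print(a, b, c, "->", s, cout)
```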

[Figure: energy ($10^{-3}$ to $10^{0}$) versus delay ($10^{0}$ to $10^{3}$), optimizing over size, $V_{DD}$ and $V_{TH}$ (full range), for $\alpha$ = 0.1, 0.01, 0.001; the (min delay, max energy) point is marked]

[Figure: $V_{TH}$ and $V_{DD}$ versus IC ($10^{-1}$ to $10^{2}$)]

Delay and energy normalized to minimum delay and corresponding maximum energy

Significant energy savings within strong inversion

Relatively little energy savings going from moderate to weak

Higher potential for energy savings when activity is lower

[Ref: C. Marcu, UCB’06]

Slide43

Sensitivity to Parameter Variations

[Ref: C. Marcu, UCB’06]

Slide44

Moving the Minimum Energy Point

Having the minimum energy point in the sub-threshold region is unfortunate

Sub-threshold energy savings are small and expensive

Further technology scaling not offering much relief

Remember the stack effect …

Can it be moved upwards?

Or equivalently… Can we lower the threshold?

[Figure: energy [fJ] ($10^{-3}$ to $10^{1}$) versus delay [ps] ($10^{0}$ to $10^{4}$) for the 90nm, 65nm, 45nm, 32nm and 22nm nodes, with the energy limit marked]

Slide45

Complex versus Simple Gates

Example (from Chapter 4): Fan-in(2) versus Fan-in(4)

Complex gates improve the $I_{on}/I_{off}$ ratio!

Slide46

Moving the Minimum Energy Point

[Figure: curves for stack2, stack4 and stack6; axis labels $V_{TH}$ and $V_{DD}$]

Slide47

Complex versus Simple Gates

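[Figure: energy ($10^{-18}$ to $10^{-14}$) versus delay ($10^{-10}$ to $10^{-8}$) for Nand4 and NaNo2, for $\alpha$ = 0.1 and $\alpha$ = 0.001; annotated operating points: $V_{DD}$ = 1 V, $V_{TH}$ = 0.1 V; $V_{DD}$ = 0.14 V, $V_{TH}$ = 0.25 V; $V_{DD}$ = 0.1 V, $V_{TH}$ = 0.22 V; $V_{DD}$ = 0.34 V, $V_{TH}$ = 0.43 V; $V_{DD}$ = 0.29 V, $V_{TH}$ = 0.38 V]

Slide48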

Controlling Leakage in PTL

[Figure: pass transistor network between drivers and receivers]

No leakage through the logic path

No $V_{DD}$ and GND connections in the logic path

Leverage complexity

Confine leakage to well-defined and controllable paths

[Ref: L. Alarcon, Jolpe’07]

Slide49

Sense-Amplifier Based Pass-Transistor Logic (SAPTL)

Pass transistor network

Leakage path confined to root node driver and sense amplifier

Sense amplifier to recover delay and voltage swing

[Figure labels: root node driver, stack, data inputs, timing control, sense amplifier, S, S, outputs]

[Ref: L. Alarcon, Jolpe’07]

Slide50

Sense-Amplifier Based Pass-Transistor Logic (SAPTL)

[Figure labels, pass-transistor stack: Root Input, A, B, S, S, P, 0, to sense amp; sense amplifier: A, B, S, S, Out, Out, CK]

Outputs pre-charged to $V_{DD}$ during low CK cycle (pre-conditioning subsequent logic module)

Latch retains value even after inputs are pulled low

Low voltage operation (300 mV)

Current steering

Works with very low $I_{on}/I_{off}$

Regular and balanced

(Programmable)

[Ref: L. Alarcon, Jolpe’07]

Slide51

Energy-Delay Trade-off

90nm CMOS; $V_{DD}$: 300 mV to 1 V; $V_{TH}$ 300 mV

[Figure: energy (fJ, 1 to 1K) versus delay (FO4 @ 1V, 1 to 100K) for static CMOS, SAPTL and TG-CMOS; annotated points: $V_{DD}$ = 450 mV SAPTL, $V_{DD}$ = 300 mV TG-CMOS, $V_{DD}$ = 900 mV SAPTL, $V_{DD}$ = 400 mV static CMOS, $V_{DD}$ = 1 V TG-CMOS, $V_{DD}$ = 550 mV static CMOS]

$V_{DD}$ scaling still works!

Sweet-spot: < 10 fJ, > 2.5k FO4

[Ref: L. Alarcon, Jolpe’07]

Slide52

Summary

To continue scaling, a reduction in energy per operation is necessary

This is complicated by the perceived lower limit on the supply voltage

Design techniques such as circuits operating in weak or moderate inversion, combined with innovative logic styles, are essential if voltage scaling is to continue

Ultimately the deterministic Boolean model of computation may have to be abandoned.

Slide53

References

Books and Book Chapters

E. Vittoz, “Weak Inversion for Ultimate Low-Power Logic,” in C. Piguet, Ed., Low-Power Electronics Design, Ch. 16, CRC Press, 2005.

A. Wang, A. Chandrakasan, Sub-Threshold Design for Ultra Low-Power Systems, Springer, 2006.

Articles

L. Alarcon, T.T. Liu, M. Pierson, J. Rabaey, “Exploring Very Low-Energy Logic: A Case Study,” Journal of Low Power Electronics, Vol. 3, No. 3, December 2007.

B. Calhoun and A. Chandrakasan, “A 256kb Sub-threshold SRAM in 65nm CMOS,” Digest of Technical Papers, ISSCC 2006, pp. 2592-2601, San Francisco, Feb. 2006.

J. Chen et al., “An Ultra-Low-Power Memory with a Subthreshold Power Supply Voltage,” IEEE Journal of Solid State Circuits, Vol. 41, No. 10, pp. 2344-2353, Oct. 2006.

C. Enz, F. Krummenacher, and E. Vittoz, “An Analytical MOS Transistor Model Valid in All Regions of Operation and Dedicated to Low Voltage and Low-Current Applications,” Analog Integrated Circuits and Signal Proc., vol. 8, pp. 83-114, July 1995.

S. Hanson et al., “Exploring Variability and Performance in a Sub-200-mV Processor,” Journal of Solid State Circuits, Vol. 43, No. 4, pp. 881-891, April 2008.

R. Landauer, “Irreversibility and heat generation in the computing process,” IBM Journal Res. Develop, 5:183-191, 1961.

C. Marcu, M. Mark, and J. Richmond, “Energy-Performance Optimization Considerations in All Regions of MOSFET Operation with Emphasis on IC=1,” Project Report EE241, UC Berkeley, Spring 2006.

J.D. Meindl, J. Davis, “The fundamental limit on binary switching energy for tera scale integration (TSI),” IEEE Journal of Solid-State Circuits, Volume 35, Issue 10, pp. 1515-1516, Oct. 2000.

M. Seok et al., “The Phoenix Processor: A 30 pW Platform for Sensor Applications,” Proceedings VLSI Symposium, Honolulu, June 2008.

Slide54

References (cont'd)

R. Swanson and J. Meindl, “Ion-Implanted Complementary MOS Transistors in Low-Voltage Circuits,” IEEE J. Solid State Circuits, vol. SC-7, pp. 146-153, April 1972.

E. Vittoz and J. Fellrath, “CMOS Analog Integrated Circuits based on Weak-Inversion Operation,” IEEE J. Solid State Circuits, vol. SC-12, pp. 224-231, June 1977.

J. von Neumann, “Theory of Self-Reproducing Automata,” in A.W. Burks, Ed., Univ. Illinois Press, Urbana, 1966.

A. Wang, A. Chandrakasan, “A 180mV FFT Processor Using Subthreshold Circuit Techniques,” Digest of Technical Papers, ISSCC 2004, pp. 292-293, San Francisco, Feb. 2004.

K. Yano et al., “A 3.8 ns CMOS 16 × 16 Multiplier using Complementary Pass-Transistor Logic,” IEEE Journal of Solid State Circuits, vol. SC-25, No. 2, pp. 388-395, April 1990.