Ultra-Low Power/Voltage Design
Chapter Outline
Rationale
Lower Bounds on Computational Energy
Subthreshold Logic
Moderate Inversion as a Trade-off
Revisiting Logic Gate Topologies
Summary
Rationale
Continued increase of computational density must be combined with a decrease in energy per operation (EOP)
Further scaling of the supply voltage is essential to accomplish this
The only other option is to keep reducing activity
Some key questions:
How far can the supply voltage be scaled?
What is the minimum energy per operation that can be obtained theoretically and practically?
What should be done about the threshold voltage and leakage?
How can circuits that approach the minimum energy bounds be designed in practice?
Opportunities for Ultra-Low Voltage
A number of emerging applications do not need high performance, only extremely low power dissipation
Examples:
Standby operation for mobile components
Implanted electronics and artificial senses
Smart objects, fabrics and e-textiles
Need power levels below 1 mW (even μW in certain cases)
Minimum Operational Voltage of Inverter
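Swanson, Meindl (April 1972)
Further extended in Meindl (Oct 2000)
Limitation: the magnitude of the gain at the midpoint must be at least 1
$V_{DD,min} = 2\frac{kT}{q}\ln\left(2 + \frac{C_d}{C_{ox}}\right) = 2\frac{kT}{q}\ln(1+n)$
$C_{ox}$: gate capacitance
$C_d$: depletion capacitance
n: slope factor, $n = 1 + C_d/C_{ox}$
For an ideal MOSFET (60 mV/decade slope, $C_d/C_{ox} \to 0$):
$V_{DD,min} = 2\ln(2)\frac{kT}{q} \approx 36$ mV at 300 K
[Ref: R. Swanson, JSSC’72; J. Meindl, JSSC’00]
© IEEE 1972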
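The Swanson-Meindl bound is easy to evaluate numerically. The short Python sketch below (function names are illustrative, not from the source) computes $V_{DD,min} = 2(kT/q)\ln(2 + C_d/C_{ox})$ for a given capacitance ratio and temperature, using the exact SI values of k and q.

```python
import math

# Exact SI (2019) values
K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def thermal_voltage(temp_k: float) -> float:
    """Thermal voltage kT/q in volts."""
    return K_B * temp_k / Q_E


def vdd_min(cd_over_cox: float, temp_k: float = 300.0) -> float:
    """Minimum inverter supply voltage 2 (kT/q) ln(2 + Cd/Cox), in volts."""
    return 2.0 * thermal_voltage(temp_k) * math.log(2.0 + cd_over_cox)


if __name__ == "__main__":
    # Ideal MOSFET (Cd/Cox = 0, n = 1) versus an assumed n = 1.5 device
    for ratio in (0.0, 0.5):
        print(f"Cd/Cox = {ratio}: {vdd_min(ratio) * 1e3:.1f} mV at 300 K")
```

Run as a script, it prints about 35.8 mV for the ideal device and 47.4 mV for $C_d/C_{ox} = 0.5$ (n = 1.5); the bound scales linearly with absolute temperature.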
Subthreshold Modeling of CMOS Inverter
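From Chapter 2 (DIBL can be ignored at low voltages):
$I_D = I_0\, e^{V_{GS}/(n\Phi_T)}\left(1 - e^{-V_{DS}/\Phi_T}\right)$
with $\Phi_T = kT/q$ and $I_0$ the off-current ($V_{GS} = 0$, $V_{DS} \gg \Phi_T$), which is proportional to $e^{-V_{TH}/(n\Phi_T)}$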
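A minimal sketch of the subthreshold model $I_D = I_0 e^{V_{GS}/(n\Phi_T)}(1 - e^{-V_{DS}/\Phi_T})$, assuming $I_0$ denotes the drain current at $V_{GS} = 0$ with $V_{DS} \gg \Phi_T$ and ignoring DIBL as stated; the helper names are hypothetical.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def subthreshold_id(vgs: float, vds: float, i0: float, n: float,
                    temp_k: float = 300.0) -> float:
    """Subthreshold drain current in amperes, DIBL ignored.

    i0 is the off-current (V_GS = 0, V_DS >> kT/q); n is the slope factor.
    """
    phi_t = K_B * temp_k / Q_E
    return i0 * math.exp(vgs / (n * phi_t)) * (1.0 - math.exp(-vds / phi_t))


def swing_mv_per_decade(n: float, temp_k: float = 300.0) -> float:
    """Gate-voltage change per decade of current, n (kT/q) ln(10), in mV."""
    phi_t = K_B * temp_k / Q_E
    return n * phi_t * math.log(10.0) * 1e3
```

With n = 1 the swing evaluates to about 59.5 mV/decade at 300 K, the ideal 60 mV/decade slope; n = 1.5 stretches it to about 89 mV/decade.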
Subthreshold DC model of CMOS Inverter
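Assume NMOS and PMOS are fully symmetrical and all voltages are normalized to the thermal voltage $\Phi_T = kT/q$ ($x_i = V_i/\Phi_T$; $x_o = V_o/\Phi_T$; $x_D = V_{DD}/\Phi_T$)
Equating the NMOS and PMOS subthreshold currents, the VTC of the inverter can be derived:
$e^{(2x_i - x_D)/n} = \dfrac{1 - e^{x_o - x_D}}{1 - e^{-x_o}}$
[Ref: E. Vittoz, CRC’05]
The gain is largest at the midpoint $x_i = x_o = x_D/2$, where
$A_{Vmax} = -\dfrac{e^{x_D/2} - 1}{n}$
so that, for $|A_{Vmax}| = 1$:
$x_D = 2\ln(n+1)$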
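For the symmetric inverter, equating the NMOS and PMOS subthreshold currents gives $e^{(2x_i - x_D)/n} = (1 - e^{x_o - x_D})/(1 - e^{-x_o})$, a quadratic in $y = e^{x_o}$. The sketch below (hypothetical function names) solves it for $x_o$, differentiates numerically at the midpoint, and checks the result against the closed-form midpoint gain $(e^{x_D/2} - 1)/n$, which equals 1 exactly at $x_D = 2\ln(n+1)$.

```python
import math


def vtc_output(x_i: float, x_d: float, n: float) -> float:
    """Normalized output x_o of a symmetric subthreshold inverter.

    Solves exp((2 x_i - x_D)/n) = (1 - exp(x_o - x_D)) / (1 - exp(-x_o)),
    rewritten as exp(-x_D) y^2 + (G - 1) y - G = 0 with y = exp(x_o).
    """
    g = math.exp((2.0 * x_i - x_d) / n)
    a = math.exp(-x_d)
    s = math.sqrt((1.0 - g) ** 2 + 4.0 * a * g)
    if g <= 1.0:
        y = ((1.0 - g) + s) / (2.0 * a)
    else:
        # Algebraically identical positive root that avoids cancellation for G > 1
        y = 2.0 * g / (s + g - 1.0)
    return math.log(y)


def midpoint_gain(x_d: float, n: float, h: float = 1e-6) -> float:
    """Numerical dx_o/dx_i at x_i = x_D/2 (central difference)."""
    xm = x_d / 2.0
    return (vtc_output(xm + h, x_d, n) - vtc_output(xm - h, x_d, n)) / (2.0 * h)


def max_gain_closed_form(x_d: float, n: float) -> float:
    """|A_Vmax| = (exp(x_D/2) - 1) / n."""
    return (math.exp(x_d / 2.0) - 1.0) / n
```

The second root formula is the same expression rationalized, chosen only to avoid subtractive cancellation when $x_i$ lies above the midpoint.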
Results from Analytical Model
[Figure: Minimum supply voltage $x_D$ for a given maximum gain ($A_{max}$ = 1, 2, 4, 10) as a function of the slope factor n (1 to 2)]
[Figure: Subthreshold inverter, normalized VTC ($x_o$ versus $x_i$) for n = 1.5 as a function of $V_{DD}$ ($x_D$ = 1, 2, 4, 6, 8)]
[Ref: E. Vittoz, CRC’05]
$x_{Dmin} = 2\ln(2.5) = 1.83$ for n = 1.5
$x_D = 4$ is sufficient for reliable operation
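Inverting the midpoint gain of the symmetric subthreshold inverter, $|A_{Vmax}| = (e^{x_D/2} - 1)/n$, gives the minimum-supply curves directly: $x_{D,min} = 2\ln(1 + n|A_{Vmax}|)$. A small sketch, assuming $\Phi_T$ at 300 K for the conversion to millivolts (function names hypothetical):

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def xd_min_for_gain(n: float, a_max: float) -> float:
    """Smallest normalized supply x_D giving |A_Vmax| = a_max: 2 ln(1 + n a_max)."""
    return 2.0 * math.log(1.0 + n * a_max)


def vdd_min_mv(n: float, a_max: float, temp_k: float = 300.0) -> float:
    """Same bound in millivolts, using V_DD = x_D kT/q."""
    return xd_min_for_gain(n, a_max) * K_B * temp_k / Q_E * 1e3


if __name__ == "__main__":
    # One row per gain target, for n = 1.5
    for a_max in (1, 2, 4, 10):
        print(a_max, round(xd_min_for_gain(1.5, a_max), 2), round(vdd_min_mv(1.5, a_max), 1))
```

For n = 1.5 this gives $x_D \approx 1.83$ (about 47 mV) for unity gain and about 3.89 for a gain of 4, just under the $x_D = 4$ quoted for reliable operation.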
Confirmed by simulation (at 90 nm)
Observe: non-symmetry of the VTC increases $V_{DDmin}$
For n = 1.5, $V_{DDmin} = 1.83\,\Phi_T \approx 48$ mV
[Figure: Minimum operational supply voltage $V_{DDmin}$ (mV) versus pn-ratio]
Also Holds for More Complex Gates
Degradation due to asymmetry
[Figure: Minimum operational supply voltage (2-input NOR) versus pn-ratio]
Minimum Energy per Operation
Moving one electron over $V_{DDmin}$:
$E_{min} = \frac{qV_{DDmin}}{2} = \frac{q}{2}\cdot 2\ln(2)\frac{kT}{q} = kT\ln(2)$
Also called the von Neumann-Landauer-Shannon bound
At room temperature (300 K): $E_{min} = 0.29 \times 10^{-20}$ J
Minimum-sized CMOS inverter at 90 nm operating at 1 V: $E = CV_{DD}^2 = 0.8 \times 10^{-15}$ J, or 5 orders of magnitude larger!
Predicted by von Neumann: kT ln(2) [Theory of Self-Reproducing Automata, 1966]
How close can one get?
[Ref: J. von Neumann, Ill’66]
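The bound and the gap quoted for the 90 nm inverter can be reproduced in a few lines; the sketch uses the exact SI Boltzmann constant and takes the 0.8 fJ inverter energy from the slide (function names hypothetical).

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def landauer_energy(temp_k: float) -> float:
    """kT ln(2): minimum energy per binary operation, in joules."""
    return K_B * temp_k * math.log(2.0)


def orders_above_limit(energy_j: float, temp_k: float = 300.0) -> float:
    """Number of decades an actual switching energy sits above kT ln(2)."""
    return math.log10(energy_j / landauer_energy(temp_k))


if __name__ == "__main__":
    print(f"kT ln2 at 300 K: {landauer_energy(300.0):.3e} J")
    # 90 nm minimum inverter at 1 V: E = C V_DD^2 = 0.8e-15 J (value from the slide)
    print(f"decades above limit: {orders_above_limit(0.8e-15):.2f}")
```

This gives $kT\ln 2 \approx 2.87 \times 10^{-21}$ J at 300 K and a gap of roughly 5.4 decades for the 90 nm inverter.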
Propagation Delay of Subthreshold Inverter
Normalizing $t_p$ to $t_0 = C\Phi_T/I_0$ gives $t_p/t_0$ as a function of $x_D$ and n (for $V_{DD} \gg \Phi_T$)
Comparison between curve-fitted model and simulations (FO4, 90 nm)
[Figure: $t_p$ (ns) versus $x_D$ (3 to 10); fit parameters $t_0$ = 338, n = 1.36]
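The slide's fitted delay expression is not reproduced here. As an assumption, a first-order estimate charges the load through $V_{DD}/2$ with the on-current $I_0 e^{x_D/n}$, which gives $t_p/t_0 = (x_D/2)e^{-x_D/n}$; the sketch evaluates this assumed form (function name hypothetical) with the fitted n = 1.36.

```python
import math


def tp_over_t0(x_d: float, n: float) -> float:
    """First-order normalized delay t_p / t_0 (ASSUMED model, not the slide's fit).

    Assumes t_p ~ C (V_DD / 2) / I_on with I_on = I_0 exp(x_D / n),
    so t_p / t_0 = (x_D / 2) exp(-x_D / n) with t_0 = C Phi_T / I_0.
    """
    return 0.5 * x_d * math.exp(-x_d / n)


if __name__ == "__main__":
    # Delay falls steeply as the supply rises above a few Phi_T
    for x_d in (3, 4, 6, 8, 10):
        print(x_d, f"{tp_over_t0(x_d, 1.36):.4f}")
```

The exponential term dominates: each additional $n\Phi_T$ of supply cuts the delay by nearly a factor of e (the linear $x_D$ prefactor softens this slightly), which is why subthreshold speed is so sensitive to $V_{DD}$.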
Dynamic Behavior
Also: short-circuit current is negligible if the input rise time is smaller than $t_0$, or if the slopes at inputs and outputs are balanced
[Figure: Transient response, voltage (normalized to $4\Phi_T$) versus time (normalized to $t_0$), for $t_r$ = 0, $0.5t_0$, $t_0$ and $2t_0$]
[Figure: $t_p$ as a function of $t_{rise}$, both normalized to $t_0$, for $x_D$ = 4]
Power Dissipation of Subthreshold Inverter
$P_{dyn} = CV_{DD}^2 f$ (nothing new)
Short-circuit power can be ignored (< 1%) for well-proportioned circuits and $x_D \geq 4$
Leakage current equals $I_0$ for $x_D \geq 4$ (ignoring DIBL)
It increases for smaller values of $x_D$ due to degeneration of the logic levels
[Figure: Static current $I_{stat}/I_0$ versus $x_D$ for n = 1.5; at low $x_D$ the logic levels degenerate and eventually the circuit fails]
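A minimal sketch of this power budget, assuming the simple forms above ($P_{dyn} = CV_{DD}^2 f$, static current equal to $I_0$) and rejecting inputs with $x_D < 4$, where the static current rises above $I_0$; the function name and example values are hypothetical.

```python
K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def inverter_power(c_load: float, vdd: float, freq: float, i0: float,
                   temp_k: float = 300.0) -> tuple[float, float]:
    """Return (P_dyn, P_stat) in watts for a subthreshold inverter.

    P_dyn = C V_DD^2 f; P_stat = V_DD I_0, valid only for x_D >= 4 (DIBL ignored).
    """
    x_d = vdd / (K_B * temp_k / Q_E)
    if x_d < 4.0:
        raise ValueError(f"x_D = {x_d:.2f} < 4: logic levels degrade, I_stat > I_0")
    return c_load * vdd ** 2 * freq, vdd * i0
```

For example, a hypothetical 1 fF load at 0.2 V and 1 MHz with $I_0$ = 1 nA gives $P_{dyn}$ = 40 pW and $P_{stat}$ = 200 pW, so leakage dominates at this low switching rate.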
Power-Delay Product and Energy-Delay
[Figure: Power-delay product (pdp) and energy-delay product (ed) versus $x_D$ (3 to 10) for activity factors α = 1, 0.5, 0.25, 0.1, 0.05 and 0.01]
For low activity (α << 1), a large $x_D$ is advantageous!
Energy for a Given Throughput
Most important question: assuming $1/T = \alpha/(2t_p)$, what minimizes the energy for a given task?
[Figure: Energy (log scale, 10 to $10^4$) versus $x_D$ (3 to 12) for α = 1, 0.1, 0.05, 0.01, 0.005 and 0.001; annotated region: dynamic power dominates]
Energy is minimized by keeping α as high as possible and having computation occupy most of the time: use the minimum voltage that meets T
If α must be low because of topology (< 0.05), there exists an optimum voltage that minimizes the energy
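The existence of an optimum voltage at low activity can be illustrated with a toy model. The sketch below assumes a first-order delay $t_p/t_0 = (x_D/2)e^{-x_D/n}$ (an assumption, not the slide's fitted expression), takes $1/T = \alpha/(2t_p)$ from the slide, and charges each operation one $CV_{DD}^2$ switching event plus leakage $I_0 V_{DD}$ over the whole period T; energies are normalized to $C\Phi_T^2$, and the function names are hypothetical.

```python
import math


def energy_per_op(x_d: float, n: float, alpha: float) -> float:
    """Normalized energy E / (C Phi_T^2) of one operation in a toy model.

    ASSUMPTIONS: t_p / t_0 = (x_D / 2) exp(-x_D / n) (first-order delay),
    1/T = alpha / (2 t_p), one C V_DD^2 switching event per operation,
    and leakage I_0 V_DD flowing for the whole period T.
    """
    tp = 0.5 * x_d * math.exp(-x_d / n)  # t_p / t_0
    period = 2.0 * tp / alpha            # T / t_0
    e_dyn = x_d ** 2                     # C V_DD^2 / (C Phi_T^2)
    e_leak = x_d * period                # I_0 V_DD T / (C Phi_T^2), since t_0 = C Phi_T / I_0
    return e_dyn + e_leak


def optimum_xd(n: float, alpha: float, lo: float = 3.0, hi: float = 14.0,
               steps: int = 1100) -> float:
    """Grid search (step (hi - lo) / steps) for the x_D minimizing energy_per_op."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda x: energy_per_op(x, n, alpha))


if __name__ == "__main__":
    for alpha in (1.0, 0.1, 0.01, 0.001):
        print(alpha, round(optimum_xd(1.5, alpha), 2))
```

In this toy model the minimum stays at the lowest swept $x_D$ for α = 1 and α = 0.1, and moves to about $x_D \approx 7.5$ for α = 0.01 and about 12 for α = 0.001, reproducing the slide's qualitative trend; the exact activity at which the optimum appears depends on the assumed delay model and need not match the slide's 0.05.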
Example: Energy-Aware FFT
[Ref: A. Wang, ISSCC’04]
Architecture scales gracefully from 128- to 1024-point lengths, and supports 8b and 16b precision.
© IEEE 2004
FFT Energy-Performance Curves
The optimal $V_{DD}$ for the 1024-point, 16b FFT is estimated from switching and leakage models for a 0.18 μm process.
[Figure: Optimal ($V_{DD}$, $V_{TH}$) point, plotted as threshold voltage ($V_{TH}$) versus supply voltage ($V_{DD}$)]
[Ref: A. Wang, ISSCC’04]
© IEEE 2004
Sub-Threshold FFT
0.18 μm CMOS process
$V_{DD}$ = 180 mV to 900 mV
$f_{clock}$ = 164 Hz to 6 MHz
At 0.35 V: energy = 155 nJ/FFT, $f_{clock}$ = 10 kHz, power = 0.6 μW
[Figure: Die photo (2.1 mm × 2.6 mm) showing data memory, twiddle ROMs, butterfly datapath and control logic]
[Figure: Clock frequency versus $V_{DD}$ (mV)]
[Figure: Energy (nJ) versus $V_{DD}$ (mV) for the 1024-point, 16 bit FFT, measured and estimated]
[Ref: A. Wang, ISSCC’04]
© IEEE 2004
Challenges in Sub-Threshold Design
Obviously suitable only for very-low-speed designs
Analysis so far covers only symmetrical gates; the minimum operating voltage increases for non-symmetrical structures
Careful selection and sizing of logic structures is necessary
Data dependencies may cause gates to fail
Process variations further confound the problem
Registers and memory are a major concern
Logic Sizing Considerations
[Figure: Inverter with a minimum-sized $W_n$; drive current and leakage current versus $W_p$, with the operational region between $W_p$(min) and $W_p$(max)]
CMOS in subthreshold is “ratioed logic”
Careful sizing of transistors is necessary to ensure adequate logic levels
[Ref: A. Wang, ISSCC’04]
180 nm CMOS
© IEEE 2004
Logic Sizing Considerations
Impact of Process Variations
[Figure: $W_p$(max) set at the SF corner and $W_p$(min) at the FS corner, with the resulting operational region]
Inverter sizing analysis and minimum supply voltage analysis must be performed at the process corners.
Variations raise the minimum voltage at which the circuit can be run.
[Ref: A. Wang, ISSCC’04]
© IEEE 2004
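The sizing window can be illustrated with a toy model. Assuming a per-unit-width on-current $I_0 e^{x_D/n}$ and off-current $I_0$, plus a hypothetical margin factor by which each ON device must out-drive the opposing OFF device, the allowed $W_p/W_n$ range follows directly; process corners enter as different ratios $I_{0n}/I_{0p}$ (the corner values used below are illustrative, not from the source).

```python
import math
from typing import Optional, Sequence, Tuple

Window = Optional[Tuple[float, float]]


def wp_window(x_d: float, n: float, margin: float, i0n_over_i0p: float) -> Window:
    """Allowed W_p / W_n range for a minimum-W_n inverter, or None if empty.

    ASSUMED model: per-unit-width on-current I_0 exp(x_D / n) and off-current I_0;
    each ON device must out-drive the opposing OFF device by `margin`.
    """
    on_off = math.exp(x_d / n)
    lo = margin * i0n_over_i0p / on_off   # PMOS drive must beat NMOS leakage
    hi = i0n_over_i0p * on_off / margin   # NMOS drive must beat PMOS leakage
    return (lo, hi) if lo <= hi else None


def corner_window(x_d: float, n: float, margin: float,
                  corner_ratios: Sequence[float]) -> Window:
    """Intersect the windows over process corners (each given as I_0n / I_0p)."""
    windows = [wp_window(x_d, n, margin, r) for r in corner_ratios]
    if any(w is None for w in windows):
        return None
    lo = max(w[0] for w in windows)
    hi = min(w[1] for w in windows)
    return (lo, hi) if lo <= hi else None
```

In this model the SF corner (smallest $I_{0n}/I_{0p}$) sets $W_p$(max) and the FS corner sets $W_p$(min). With hypothetical corner ratios of 0.3, 1 and 3 and a margin of 10, the window is closed at $x_D$ = 4 but open at $x_D$ = 8, illustrating how variations raise the minimum supply voltage.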
The Impact of Data Dependencies
[Figure: Two XOR implementations, XOR1 and XOR2 (inputs A and B, output Z), and the voltage level at Z (mV, 0 to 100) versus time for the input combinations A=1 B=0, A=0 B=1, A=0 B=0 and A=1 B=1]
[Ref: A. Wang, ISSCC’04]
© IEEE 2004
The Impact of Data Dependencies
[Figure: XOR1 and XOR2 with A=1, B=0 (Z=1), showing the drive current into node Z against the idle (leakage) current; XOR2 is labeled with a weak drive current]
XOR1: leakage through the parallel devices causes XOR1 to fail at 100 mV
XOR2: a balanced number of devices reduces the effects of leakage and process variations
Solid sub-threshold design requires symmetry for all input vectors
[Ref: A. Wang, ISSCC’04]
© IEEE 2004
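The failure mechanism can be sketched with a node held high by one ON device against several OFF devices in parallel. The model below is an assumption, not the actual XOR1/XOR2 netlists: all devices are identical and symmetric, follow the subthreshold current model with $I_{on}/I_{off} = e^{x_D/n}$, and the count of leaking parallel devices is a hypothetical parameter. The high level is found by bisection on the current balance.

```python
import math


def high_level_fraction(x_d: float, n: float, m_leakers: int, iters: int = 200) -> float:
    """Normalized high output level V_Z / V_DD for a node pulled up by one ON device
    against m_leakers OFF devices in parallel (ASSUMED symmetric subthreshold model).

    Solves r (1 - exp(-(x_D - x))) = m (1 - exp(-x)) with r = I_on / I_off = exp(x_D / n).
    """
    r = math.exp(x_d / n)

    def net_current(x: float) -> float:
        # Pull-up current minus total leakage, normalized to I_off; decreasing in x
        return r * (1.0 - math.exp(-(x_d - x))) - m_leakers * (1.0 - math.exp(-x))

    lo, hi = 0.0, x_d  # net_current(lo) > 0 and net_current(hi) < 0 for m_leakers >= 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if net_current(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / x_d
```

At $x_D \approx 3.9$ (about 100 mV at room temperature) the high level sags noticeably as more leaking devices are added, while at $x_D$ = 8 it stays close to $V_{DD}$; balancing the number of devices on both sides of the node is what keeps the levels symmetric for every input vector.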
The Sub-Threshold (Low Voltage) Memory Challenge
Obstacles that limit functionality at low voltage:
SNM
Write margin
Read current / bit-line leakage
Soft errors
Erratic behavior
Read SNM is the worst challenge
[Figure: Read SNM and hold SNM for a sub-$V_T$ 6T cell at 300 mV]
Variation aggravates the situation
Solutions to Enable Sub-$V_{TH}$ Memory
The standard 6T way of doing business won’t work
Voltage scaling versus transistor sizing
Current depends exponentially on voltages in sub-threshold
Use voltages (not sizing) to combat problems
New bitcells
Buffer output to remove Read SNM
Lower BL leakage
Complemented with architectural strategies
ECC, interleaving, SRAM refresh, redundancy
Sub-threshold SRAM Cell
[Ref: B. Calhoun, ISSCC’06]
Buffered read allows separate Read and Write ports
Removing Read SNM allows operation at lower $V_{DD}$ with the same stability at corners
[Figure: Bitcell with write port (WL_WR, BL, BLB), storage nodes Q and QB, floating virtual supply VVDD, and a buffered read port (RWL, RBL)]
VVDD floats during write access, but feedback restores ‘1’ to $V_{DD}$
Read-port leakage cases (RBL=1): with QB=1, QBB is held near 1 by leakage; with QB=0, QBB=1 and leakage is reduced by the stack
Buffer reduces BL leakage: allows 256 cells/BL instead of 16 cells/BL
Higher integration reduces the area of peripheral circuits
© IEEE 2006
Sub-threshold SRAM Chip
Functions without error to below 400 mV, holds without error to < 250 mV:
At 400 mV, 3.28 μW and 475 kHz at 27 °C
Reads to 320 mV (27 °C) and 360 mV (85 °C)
Writes to 380 mV (27 °C) and 350 mV (85 °C)
[Figure: 256kb SRAM array built from 32kb blocks]
[Ref: B. Calhoun, ISSCC’06]
Sub-$V_{TH}$ operation demonstrated in a 65 nm memory chip
Example: Sub-Threshold Microprocessor
Processor for sensor network applications
Simple 8-bit architecture to optimize energy efficiency
3.5 pJ per instruction at 350 mV and 354 kHz operation
10X less energy than previously reported
11 nW at 160 mV (300 mV RBB)
41-year operation on a 1 g Li-ion battery
[Ref: S. Hanson, JSSC’07]
© IEEE 2007
Prototype Implementation
[Figure: Chip layout with 7 processors: 6 subliminal processors, large solar cell, solar cells for the adders, the processor and the discretes, level converter array, discrete adders, processor memories, test memories, discrete cells/transistors and test module]
[Courtesy: D. Blaauw, Univ. Michigan]
Is Sub-threshold the Way to Go?
Achieves the lowest possible energy dissipation
But … at a dramatic cost in performance
[Figure: $t_p$ (μs) versus $V_{DD}$ (0 to 1 V), 130 nm CMOS]
In Addition: Huge Timing Variance
[Figure: Normalized timing variance σ/μ (%) versus $V_{DD}$ (0 to 1 V)]
Normalized timing variance increases dramatically with $V_{DD}$ reduction
Design for yield means huge overhead at low voltages:
Worst-case design at 300 mV: > 200% overkill
Increased Sensitivity to Variations
Subthreshold circuits operate at low $I_{on}/I_{off}$ ratios, from about 1000 to less than 10 (at $x_D$ = 4)
Small variations in device parameters can have a large impact and threaten circuit operation
[Figure: $I_{on}/I_{off}$ (log scale, 1 to $10^3$) versus $x_D$ (1 to 10)]
ONE SOLUTION: Back Off A Bit …
The performance cost of minimum energy is exponentially high.
Operating slightly above the threshold voltage improves performance dramatically while having a small impact on energy
The challenge: modeling in the moderate inversion region
[Figure: Optimal energy-delay trade-off curve (energy versus delay)]
Modeling Over All Regions of Interest
The EKV model covers the strong, moderate and weak inversion regions
The inversion coefficient IC measures the degree of inversion, with k a fit factor and $I_S$ the specific current, and is related directly to $V_{DD}$
[Ref: C. Enz, Analog’95]
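The fitted expression on the slide (with fit factor k) is not reproduced here. As a hedged stand-in, the standard EKV interpolation relates the inversion coefficient to the gate overdrive, $IC = \ln^2\left(1 + e^{(V_{GS} - V_{TH})/(2n\Phi_T)}\right)$; the sketch evaluates it and classifies the region using the common convention IC < 0.1 weak, IC > 10 strong (function names hypothetical).

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def inversion_coefficient(v_ov: float, n: float, temp_k: float = 300.0) -> float:
    """EKV-style IC = ln^2(1 + exp(V_ov / (2 n Phi_T))), with V_ov = V_GS - V_TH.

    The slide's fitted version also includes a fit factor k, not reproduced here.
    """
    phi_t = K_B * temp_k / Q_E
    return math.log1p(math.exp(v_ov / (2.0 * n * phi_t))) ** 2


def region(ic: float) -> str:
    """Common convention: weak below 0.1, strong above 10, moderate in between."""
    if ic < 0.1:
        return "weak"
    if ic > 10.0:
        return "strong"
    return "moderate"
```

For an inverter's ON device $V_{GS} = V_{DD}$, so the overdrive is $V_{DD} - V_{TH}$; with n = 1.5 the sketch puts IC = 1 about 42 mV above $V_{TH}$, that is, at $V_{DD} \approx V_{TH}$.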
Relationship between $V_{DD}$ and IC
[Figure: $V_{DD}$ (0 to 1 V) versus IC ($10^{-3}$ to $10^2$) in 90 nm CMOS, spanning the weak, moderate and strong inversion regions]
Threshold changes move the curves up or down
IC = 1 corresponds to $V_{DD} \approx V_{TH}$
Provides Good Match over Most of the Range
[Figure: Normalized $t_p$ versus IC ($10^{-2}$ to $10^2$), model versus simulation, from weak to strong inversion]
Largest deviations in strong inversion: velocity saturation is not well handled by the simple model
Modeling Energy
[Figure: Energy per operation EOP (J, $10^{-16}$ to $10^{-15}$) versus IC ($10^{-3}$ to $10^2$) for α = 1, 0.2, 0.02 and 0.002]
High Activity Scenario
[Figure: Contours of equal energy and equal performance in the ($V_{DD}$, $V_{TH}$) plane, with the IC = 1 line and the minimum-energy point marked (90 nm, α = 0.02)]
Low Activity Scenario
[Figure: Contours of equal energy and equal performance in the ($V_{DD}$, $V_{TH}$) plane, with the IC = 1 line and the minimum-energy point marked (90 nm, α = 0.002)]
Example: Adder
Simple full adder using NAND and INV gates only
[Figure: Normalized energy versus normalized delay when optimizing over size, $V_{DD}$ and $V_{TH}$ (full range), for α = 0.1, 0.01 and 0.001, starting from the (min delay, max energy) point; IC values from $10^{-1}$ to $10^2$ are marked along the curves, and arrows indicate increasing $V_{TH}$ and decreasing $V_{DD}$]
Delay and energy are normalized to the minimum delay and the corresponding maximum energy
Significant energy savings within strong inversion
Relatively little energy savings going from moderate to weak inversion
Higher potential for energy savings when activity is lower
[Ref: C. Marcu, UCB’06]
Sensitivity to Parameter Variations
[Ref: C. Marcu, UCB’06]
Moving the Minimum Energy Point
Having the minimum energy point in the sub-threshold region is unfortunate
Sub-threshold energy savings are small and expensive
Further technology scaling does not offer much relief
Remember the stack effect …
Can it be moved upwards?
Or equivalently… can we lower the threshold?
[Figure: Energy (fJ) versus delay (ps) for 90 nm, 65 nm, 45 nm, 32 nm and 22 nm technologies, approaching an energy limit]
Complex versus Simple Gates
Example (from Chapter 4): Fan-in(2) versus Fan-in(4)
Complex gates improve the $I_{on}/I_{off}$ ratio!
Moving the Minimum Energy Point
[Figure: Minimum-energy point in the ($V_{DD}$, $V_{TH}$) plane for stack depths of 2, 4 and 6 (stack2, stack4, stack6)]
Complex versus Simple Gates
[Figure: Energy ($10^{-18}$ to $10^{-14}$) versus delay ($10^{-10}$ to $10^{-8}$) for Nand4 and NaNo2 at α = 0.1 and α = 0.001. Annotated operating points: $V_{DD}$ = 1 V, $V_{TH}$ = 0.1 V; $V_{DD}$ = 0.14 V, $V_{TH}$ = 0.25 V; $V_{DD}$ = 0.1 V, $V_{TH}$ = 0.22 V; $V_{DD}$ = 0.34 V, $V_{TH}$ = 0.43 V; $V_{DD}$ = 0.29 V, $V_{TH}$ = 0.38 V]
Controlling Leakage in PTL
[Figure: Pass-transistor network between drivers and receivers]
No leakage through the logic path
No $V_{DD}$ and GND connections in the logic path
Leverage complexity
Confine leakage to well-defined and controllable paths
[Ref: L. Alarcon, JOLPE’07]
Sense-Amplifier Based Pass-Transistor Logic (SAPTL)
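Pass-transistor network
Leakage path confined to the root node driver and the sense amplifier
Sense amplifier to recover delay and voltage swing
[Ref: L. Alarcon, JOLPE’07]
[Figure: SAPTL block diagram: a root node driver feeds a pass-transistor stack steered by the data inputs and complementary select signals (S, $\overline{S}$); a sense amplifier with timing control produces the outputs]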
Sense-Amplifier Based Pass-Transistor Logic (SAPTL)
[Figure: SAPTL circuit detail: root input and inputs A, B with complementary selects drive the pass-transistor tree toward the sense amplifier, which is clocked by CK and produces complementary outputs Out and $\overline{Out}$]
Outputs pre-charged to $V_{DD}$ during the low CK cycle (pre-conditioning the subsequent logic module)
Latch retains its value even after the inputs are pulled low
Low-voltage operation (300 mV)
Current steering
Works with very low $I_{on}/I_{off}$
Regular and balanced (programmable)
[Ref: L. Alarcon, JOLPE’07]
Energy-Delay Trade-off
[Figure: Energy (fJ, 1 to 1K) versus delay (FO4 @ 1 V, 1 to 100K) for static CMOS, SAPTL and TG-CMOS in 90 nm CMOS, $V_{DD}$ = 300 mV to 1 V, $V_{TH}$ = 300 mV. Labeled points: SAPTL at $V_{DD}$ = 450 mV and 900 mV; TG-CMOS at 300 mV and 1 V; static CMOS at 400 mV and 550 mV]
$V_{DD}$ scaling still works!
Sweet spot: < 10 fJ, > 2.5K FO4
[Ref: L. Alarcon, JOLPE’07]
Summary
To continue scaling, a reduction in energy per operation is necessary
This is complicated by the perceived lower limit on the supply voltage
Design techniques such as circuits operating in weak or moderate inversion, combined with innovative logic styles, are essential if voltage scaling is to continue
Ultimately, the deterministic Boolean model of computation may have to be abandoned.
References
Books and Book Chapters
E. Vittoz, “Weak Inversion for Ultimate Low-Power Logic,” in C. Piguet, Ed., Low-Power Electronics Design, Ch. 16, CRC Press, 2005.
A. Wang and A. Chandrakasan, Sub-Threshold Design for Ultra Low-Power Systems, Springer, 2006.
Articles
L. Alarcon, T.T. Liu, M. Pierson and J. Rabaey, “Exploring Very Low-Energy Logic: A Case Study,” Journal of Low Power Electronics, vol. 3, no. 3, Dec. 2007.
B. Calhoun and A. Chandrakasan, “A 256kb Sub-threshold SRAM in 65nm CMOS,” Digest of Technical Papers, ISSCC 2006, pp. 2592-2601, San Francisco, Feb. 2006.
J. Chen et al., “An Ultra-Low-Power Memory with a Subthreshold Power Supply Voltage,” IEEE J. Solid-State Circuits, vol. 41, no. 10, pp. 2344-2353, Oct. 2006.
C. Enz, F. Krummenacher and E. Vittoz, “An Analytical MOS Transistor Model Valid in All Regions of Operation and Dedicated to Low-Voltage and Low-Current Applications,” Analog Integrated Circuits and Signal Processing, vol. 8, pp. 83-114, July 1995.
S. Hanson et al., “Exploring Variability and Performance in a Sub-200-mV Processor,” IEEE J. Solid-State Circuits, vol. 43, no. 4, pp. 881-891, April 2008.
R. Landauer, “Irreversibility and Heat Generation in the Computing Process,” IBM J. Res. Develop., vol. 5, pp. 183-191, 1961.
C. Marcu, M. Mark and J. Richmond, “Energy-Performance Optimization Considerations in All Regions of MOSFET Operation with Emphasis on IC=1,” Project Report EE241, UC Berkeley, Spring 2006.
J.D. Meindl and J. Davis, “The Fundamental Limit on Binary Switching Energy for Tera Scale Integration (TSI),” IEEE J. Solid-State Circuits, vol. 35, no. 10, pp. 1515-1516, Oct. 2000.
M. Seok et al., “The Phoenix Processor: A 30 pW Platform for Sensor Applications,” Proceedings VLSI Symposium, Honolulu, June 2008.
R. Swanson and J. Meindl, “Ion-Implanted Complementary MOS Transistors in Low-Voltage Circuits,” IEEE J. Solid-State Circuits, vol. SC-7, pp. 146-153, April 1972.
E. Vittoz and J. Fellrath, “CMOS Analog Integrated Circuits Based on Weak-Inversion Operation,” IEEE J. Solid-State Circuits, vol. SC-12, pp. 224-231, June 1977.
J. von Neumann, Theory of Self-Reproducing Automata, A.W. Burks, Ed., Univ. Illinois Press, Urbana, 1966.
A. Wang and A. Chandrakasan, “A 180mV FFT Processor Using Subthreshold Circuit Techniques,” Digest of Technical Papers, ISSCC 2004, pp. 292-293, San Francisco, Feb. 2004.
K. Yano et al., “A 3.8 ns CMOS 16 × 16 Multiplier Using Complementary Pass-Transistor Logic,” IEEE J. Solid-State Circuits, vol. SC-25, no. 2, pp. 388-395, April 1990.