Slide1
Optimizing Power @ Design Time
Circuits
Dejan Marković
Borivoje Nikolić
Slide2
Chapter Outline
Optimization framework for energy-delay trade-off
Dynamic power optimization
Multiple supply voltages
Transistor sizing
Technology mapping
Static power optimization
Multiple thresholds
Transistor stacking
Slide3
Energy/Power Optimization Strategy
For given function and activity, an optimal operation point can be derived in the energy-performance space
Time of optimization depends upon activity profile
Different optimizations apply to active and static power
[Table: time of optimization by activity profile, for both active and static power]
Fixed Activity: Design time
Variable Activity: Run time
No Activity (Standby): Sleep
Slide4
Energy-Delay Optimization and Trade-off
Maximize throughput for given energy, or minimize energy for given throughput
[Figure: trade-off space in the energy/op versus delay plane, bounded by Emin, Emax, Dmin, and Dmax, with an unoptimized design point marked]
Other important metrics: Area, Reliability, Reusability
Slide5
The Design Abstraction Stack
A very rich set of design parameters to consider!
It helps to consider options in relation to their abstraction layer
System/Application: choice of algorithm
Software: amount of concurrency
(Micro-)Architecture: parallel versus pipelined, general purpose versus application specific
Logic/RT: logic family, standard cell versus custom
Circuit: sizing, supply, thresholds
Device: bulk versus SOI
[Figure: the stack, with the layers addressed in this chapter marked "This Chapter"]
Slide6
Optimization Can/Must Span Multiple Levels
Levels: Architecture, Micro-Architecture, Circuit (Logic & FFs)
Design optimization combines top-down and bottom-up: “meet-in-the-middle”
Slide7
Energy-Delay Optimization
Globally optimal energy-delay curve for a given function
[Figure: energy/op versus delay curves for topology A and topology B, shown separately and combined into the globally optimal curve]
Slide8
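Some Optimization Observations
Energy-Delay Sensitivities
$$S_A = \left.\frac{\partial E/\partial A}{\partial D/\partial A}\right|_{A=A_0}$$
[Figure: energy versus delay, with curves f(A0,B) and f(A,B0) passing through the point (A0,B0) at delay D0; SA and SB are marked at that point]
[Ref: V. Stojanovic, ESSCIRC’02]
Slide9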
Finding the Optimal Energy-Delay Curve
$$\Delta E = S_A \cdot (-\Delta D) + S_B \cdot \Delta D$$
On the optimal curve, all sensitivities must be equal
[Figure: energy versus delay, with curves f(A0,B), f(A,B0), and f(A1,B); the point (A0,B0) at delay D0 and a delay step ∆D are marked]
Pareto-optimal: the best that can be achieved without disadvantaging at least one metric.
Slide10
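The equal-sensitivity condition from "Finding the Optimal Energy-Delay Curve" can be checked numerically. The following is a minimal sketch on a made-up two-parameter model: the functions `delay` and `energy`, their coefficients, and the scan range are all assumptions chosen for illustration, not taken from the slides. It scans A, solves the delay constraint for B, and then evaluates S_A and S_B at the lowest-energy point.

```python
def delay(a, b):
    # Hypothetical model: each parameter speeds up its own part of the path.
    return 4.0 / a + 1.0 / b

def energy(a, b):
    # Hypothetical model: energy grows linearly with both parameters.
    return a + b

def sensitivities(a, b, h=1e-6):
    """Return (S_A, S_B) with S_X = (dE/dX) / (dD/dX), by central differences."""
    s_a = (energy(a + h, b) - energy(a - h, b)) / (delay(a + h, b) - delay(a - h, b))
    s_b = (energy(a, b + h) - energy(a, b - h)) / (delay(a, b + h) - delay(a, b - h))
    return s_a, s_b

def min_energy_point(d_target, a_max=20.0, steps=16000):
    """Scan A, solve the delay constraint for B, keep the lowest-energy pair."""
    best = None
    a_low = 4.0 / d_target  # at or below this value no B can meet the delay
    for i in range(1, steps + 1):
        a = a_low + (a_max - a_low) * i / steps
        b = 1.0 / (d_target - 4.0 / a)
        if best is None or energy(a, b) < best[0]:
            best = (energy(a, b), a, b)
    return best

e_opt, a_opt, b_opt = min_energy_point(d_target=1.0)
s_a, s_b = sensitivities(a_opt, b_opt)
print(f"A={a_opt:.2f} B={b_opt:.2f} E={e_opt:.2f} S_A={s_a:.2f} S_B={s_b:.2f}")
```

At the minimum-energy point for a fixed delay the two sensitivities come out equal (up to grid resolution); at any other point on the same delay contour they differ, and energy can be saved by trading one parameter against the other.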
Reducing Active Energy @ Design Time
Reducing voltages
Lowering the supply voltage (VDD) at the expense of clock speed
Lowering the logic swing (Vswing)
Reducing transistor sizes (CL)
Slows down logic
Reducing activity (α)
Reducing switching activity through transformations
Reducing glitching by balancing logic
Slide11
Observation
Downsizing and/or lowering the supply on the critical path lowers the operating frequency
Downsizing non-critical paths reduces energy for free, but
Narrows down the path delay distribution
Increases impact of variations, impacts robustness
[Figure: two histograms of # of paths versus path delay tp(path), each with the target delay marked]
Slide12
Circuit Optimization Framework
$$\text{minimize } \mathrm{Energy}(V_{DD}, V_{TH}, W) \quad \text{subject to } \mathrm{Delay}(V_{DD}, V_{TH}, W) \le D_{con}$$
Constraints
$$V_{DD}^{min} < V_{DD} < V_{DD}^{max}, \qquad V_{TH}^{min} < V_{TH} < V_{TH}^{max}, \qquad W^{min} < W$$
Reference case: Dmin sizing @ VDDmax, VTHref
[Figure: energy/op versus delay for topology A and topology B]
[Ref: V. Stojanovic, ESSCIRC’02]
Slide13
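The constrained problem stated in "Circuit Optimization Framework" can be illustrated with a brute-force search. This is only a sketch under stated assumptions: VTH is held fixed, the circuit is a hypothetical unit driver feeding one sizable gate that drives a fixed load, and the delay and energy expressions, bounds, and load value are invented for the example.

```python
V_ON, ALPHA = 0.37, 1.53       # assumed alpha-power fit values
VDD_MAX, VDD_MIN = 1.2, 0.6    # assumed supply bounds (V)
W_MIN, C_LOAD = 1.0, 16.0      # assumed minimum size and load, in unit-gate capacitances

def delay(vdd, w):
    # Voltage dependence normalized to 1 at VDD_MAX (alpha-power form).
    volt = (vdd / (vdd - V_ON) ** ALPHA) / (VDD_MAX / (VDD_MAX - V_ON) ** ALPHA)
    # Unit driver -> gate of size w -> fixed load.
    return volt * (w + C_LOAD / w)

def energy(vdd, w):
    # Switched capacitance times VDD squared (parasitics ignored).
    return (w + C_LOAD) * vdd ** 2

def optimize(d_con, n=200):
    """Grid search: minimize energy subject to delay <= d_con and the box constraints."""
    best = None
    for i in range(n + 1):
        vdd = VDD_MIN + (VDD_MAX - VDD_MIN) * i / n
        for j in range(n + 1):
            w = W_MIN + (C_LOAD - W_MIN) * j / n
            if delay(vdd, w) <= d_con:
                cand = (energy(vdd, w), vdd, w)
                if best is None or cand < best:
                    best = cand
    return best

# Reference case: minimum-delay sizing (w = sqrt(C_LOAD)) at the maximum supply.
d_min = delay(VDD_MAX, 4.0)
e_ref = energy(VDD_MAX, 4.0)
e_opt, vdd_opt, w_opt = optimize(d_con=1.2 * d_min)
print(f"VDD={vdd_opt:.3f} W={w_opt:.2f} E/Eref={e_opt / e_ref:.2f}")
```

A real flow would use a proper optimizer and a calibrated model; the grid search is used here only because it makes the objective and the constraint explicit.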
Optimization Framework: Generic Network
[Figure: gate in stage i (supply VDD,i, input capacitance Ci, parasitic capacitance γCi, wire capacitance Cw) loaded by fanout (stage i+1, supply VDD,i+1, input capacitance Ci+1)]
Slide14
Alpha-power based Delay Model
Fit parameters: Von, αd, Kd, γ
VDDref = 1.2 V, technology 90 nm
[Figure, 90 nm technology: delay tp (ps) versus fanout (Ci+1/Ci), simulation and model; τnom = 6 ps, γ = 1.35]
[Figure, 90 nm technology: normalized FO4 delay versus VDD/VDDref, simulation and model; Von = 0.37 V, αd = 1.53]
Slide15
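The supply-voltage dependence of the alpha-power delay model can be sketched from the fit values quoted on the "Alpha-power based Delay Model" slide (Von = 0.37 V, αd = 1.53, VDDref = 1.2 V). The slide's own equation did not survive extraction, so the functional form below, delay proportional to VDD / (VDD − Von)^αd, is an assumption based on the standard alpha-power law; the function name is hypothetical.

```python
V_ON = 0.37      # V, fit value quoted on the slide
ALPHA_D = 1.53   # fit value quoted on the slide
VDD_REF = 1.2    # V, reference supply quoted on the slide

def normalized_delay(vdd_ratio):
    """Delay at VDD = vdd_ratio * VDD_REF, relative to the delay at VDD_REF.

    Assumes delay is proportional to VDD / (VDD - V_ON) ** ALPHA_D.
    """
    def shape(vdd):
        return vdd / (vdd - V_ON) ** ALPHA_D
    return shape(vdd_ratio * VDD_REF) / shape(VDD_REF)

for ratio in (1.0, 0.9, 0.8, 0.7, 0.6, 0.5):
    print(f"VDD/VDDref = {ratio:.1f}: delay x{normalized_delay(ratio):.2f}")
```

Under this assumed form, halving the supply slows the gate by roughly 3.5x, which is in line with the 0 to 4 range of the normalized FO4 delay axis in the figure.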
Combined with Logical Effort Formulation
For Complex Gates
Parasitic delay pi – depends upon gate topology
Electrical effort fi ≈ Si+1/Si
Logical effort gi – depends upon gate topology
Effective fanout hi = fi gi
[Ref: I. Sutherland, Morgan-Kaufmann’99]
Slide16
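As a rough illustration of the quantities listed under "Combined with Logical Effort Formulation", the sketch below computes the effective fanout h = f·g and a stage delay of p + h in units of the technology time constant, which is the usual logical-effort expression. The gate table uses the textbook logical-effort values for an inverter, NAND2, and NOR2; the example path and its sizes are made up.

```python
from fractions import Fraction

# (logical effort g, parasitic delay p): standard logical-effort textbook values.
GATES = {
    "inv": (Fraction(1), 1),
    "nand2": (Fraction(4, 3), 2),
    "nor2": (Fraction(5, 3), 2),
}

def stage_delay(gate, c_in, c_out):
    """Normalized stage delay p + h, with h = f * g and f = c_out / c_in."""
    g, p = GATES[gate]
    f = Fraction(c_out) / Fraction(c_in)   # electrical effort
    h = f * g                              # effective fanout
    return p + h

def path_delay(stages, c_load):
    """stages: list of (gate, input capacitance); the last stage drives c_load."""
    total = Fraction(0)
    for k, (gate, c_in) in enumerate(stages):
        c_out = stages[k + 1][1] if k + 1 < len(stages) else c_load
        total += stage_delay(gate, c_in, c_out)
    return total

# Hypothetical path: NAND2 (input cap 1) -> inverter (input cap 3) -> load of 12.
example = [("nand2", 1), ("inv", 3)]
total = path_delay(example, c_load=12)
print(total)
```

Both stages in this example see the same effective fanout (h = 4), which is the equal-effort condition that gives minimum delay for a fixed number of stages.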
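Dynamic Energy
[Equation: Ei = energy consumed by logic gate i]
[Figure: gate in stage i (supply VDD,i, input capacitance Ci, parasitic capacitance γCi, wire capacitance Cw) loaded by stage i+1 (supply VDD,i+1, input capacitance Ci+1)]
Slide17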
Optimizing Return on Investment (ROI)
Depends on Sensitivity (∂E/∂D)
Gate Sizing: sensitivity tends to ∞ for equal h (Dmin)
Supply Voltage: sensitivity is max at VDD(max) (Dmin)
Slide18
Example: Inverter Chain
Properties of inverter chain
Single path topology
Energy increases geometrically from input to output
[Figure: inverter chain with sizes S1 = 1, S2, S3, …, SN]
Goal
Find optimal sizing S = [S1, S2, …, SN], supply voltage, and buffering strategy to achieve the best energy-delay tradeoff
Slide19
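The inverter-chain example can be made concrete with a small sketch. It assumes the simplest possible model: stage delay equals the stage fanout (parasitics ignored), and energy is proportional to the total switched capacitance at a single supply. The load value and the tapered sizing are hypothetical numbers picked for illustration.

```python
def min_delay_sizing(n_stages, c_load, s1=1.0):
    """Equal-fanout sizing, which minimizes delay for a fixed number of stages."""
    h = (c_load / s1) ** (1.0 / n_stages)
    return h, [s1 * h ** i for i in range(n_stages)]

def chain_delay(sizes, c_load):
    """Sum of stage fanouts S(i+1)/S(i); the last stage drives c_load."""
    loads = sizes[1:] + [c_load]
    return sum(nxt / cur for cur, nxt in zip(sizes, loads))

def chain_cap(sizes, c_load):
    """Total switched capacitance (proportional to energy at a single supply)."""
    return sum(sizes) + c_load

C_LOAD = 64.0
h, nominal = min_delay_sizing(3, C_LOAD)   # sizes grow geometrically: 1, h, h^2
tapered = [1.0, 3.6, 12.0]                 # hypothetical downsized later stages

d_ratio = chain_delay(tapered, C_LOAD) / chain_delay(nominal, C_LOAD)
e_ratio = chain_cap(tapered, C_LOAD) / chain_cap(nominal, C_LOAD)
print(f"h = {h:.2f}, delay x{d_ratio:.3f}, energy x{e_ratio:.3f}")
```

In this toy case, shrinking the last stages costs about 2% in delay and saves about 5% of the switched capacitance; most of the capacitance sits near the output, which is why the largest stages are the first candidates for downsizing.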
Inverter Chain: Gate Sizing
Variable taper achieves minimum energy
Reduce number of stages at large dinc
[Figure: effective fanout h versus stage (1 to 7) for dinc = 0%, 1%, 10%, 30%, 50%; nominal (nom) and optimal (opt) curves marked]
[Ref: Ma, JSSC’94]
Slide20
Inverter Chain: VDD Optimization
VDD reduces energy of the final load first
Variable taper achieved by voltage scaling
[Figure: VDD/VDDnom versus stage (1 to 7) for dinc = 0%, 1%, 10%, 30%, 50%; nominal (nom) and optimal (opt) curves marked]
Slide21
Inverter Chain: Optimization Results
Parameter with the largest sensitivity has the largest potential for energy reduction
Two discrete supplies mimic per-stage VDD
[Figure: energy reduction (%) versus dinc (%), and normalized sensitivity versus dinc (%), with curves labeled cVDD, S, gVDD, and 2VDD]
Slide22
Example: Kogge-Stone Tree Adder
Tree adder
Long wires
Re-convergent paths
Multiple active outputs
[Ref: P. Kogge, Trans. Comp’73]
Slide23
Tree Adder: Sizing vs. Dual-VDD Optimization
Reference design: all paths are critical
Internal energy ⇒ S more effective than VDD
S: E(-54%), 2Vdd: E(-27%) at dinc = 10%
[Figure: three energy maps of the adder: reference (D = Dmin), sizing (E −54%, dinc = 10%), and 2Vdd (E −27%, dinc = 10%)]
Slide24
Tree Adder: Multi-dimensional Search
Can get pretty close to optimum with only 2 variables
Reaching the minimum delay (maximum speed) is very expensive
[Figure: Energy/Eref versus Delay/Dmin, showing the reference design and the curves for optimization over (S, VDD), (VDD, VTH), (S, VTH), and (S, VDD, VTH)]
Slide25
Multiple Supply Voltages
Block-level supply assignment
Higher throughput/lower latency functions are implemented in higher VDD
Slower functions are implemented with lower VDD
This leads to so-called “voltage islands” with separate supply grids
Level conversion performed at block boundaries
Multiple supplies inside a block
Non-critical paths moved to lower supply voltage
Level conversion within the block
Physical design challenging
Slide26
Using Three VDD’s
V1 = 1.5 V, VTH = 0.3 V
[Figure: power reduction ratio as a function of V2 (V) and V3 (V), shown as a two-dimensional map and as a surface plot]
[Ref: T. Kuroda, ICCAD’02]
© IEEE 2002
Slide27
Optimum Number of VDD’s
[Figure: three panels for the supply sets {V1, V2}, {V1, V2, V3}, and {V1, V2, V3, V4}; each plots the VDD ratios (V2/V1, V3/V1, V4/V1) and the power ratio (P2/P1, P3/P1, P4/P1) versus V1 (V)]
The more VDD’s the less power, but the effect saturates
Power reduction effect decreases with scaling of VDD
Optimum V2/V1 is around 0.7
[Ref: M. Hamada, CICC’01]
© IEEE 2001
Slide28
Lessons: Multiple Supply Voltages
Two supply voltages per block are optimal
Optimal ratio between the supply voltages is 0.7
Level conversion is performed on the voltage boundary, using a level-converting flip-flop (LCFF)
An option is to use an asynchronous level converter
More sensitive to coupling and supply noise
Slide29
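To get a feel for the 0.7 supply ratio quoted in "Lessons: Multiple Supply Voltages", the sketch below estimates the switching-energy ratio of a dual-supply block relative to a single-supply one. It assumes switching energy proportional to C·VDD², ignores level-converter overhead and leakage, and treats the fraction of capacitance moved to the low supply as a free parameter; the function name and the sample fractions are hypothetical.

```python
def dual_supply_energy_ratio(low_fraction, vdd_ratio=0.7):
    """Switching energy relative to an all-high-supply design.

    low_fraction: share of the switched capacitance moved to the low supply.
    vdd_ratio:    low supply divided by high supply.
    Assumes energy proportional to C * VDD**2; level-converter overhead is ignored.
    """
    if not 0.0 <= low_fraction <= 1.0:
        raise ValueError("low_fraction must be between 0 and 1")
    return (1.0 - low_fraction) + low_fraction * vdd_ratio ** 2

for frac in (0.25, 0.5, 0.75):
    print(f"{frac:.0%} of capacitance at low VDD: energy x{dual_supply_energy_ratio(frac):.3f}")
```

Gates on the low supply consume about half the switching energy (0.7² = 0.49), so the achievable saving depends almost entirely on how much of the design has enough slack to be moved.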
Distributing Multiple Supply Voltages
[Figure: conventional versus shared N-well arrangement of a VDDH circuit and a VDDL circuit, each with VDDH, VDDL, and VSS rails and signals i1, o1, i2, o2]
Slide30
Conventional
[Figure: VDDH circuit and VDDL circuit separated by N-well isolation, with VDDH, VDDL, and VSS rails]
(a) Dedicated row: VDDH rows and VDDL rows
(b) Dedicated region: VDDH region and VDDL region
Slide31
Shared N-Well
[Figure: VDDH circuit and VDDL circuit in a shared N-well, with VDDH, VDDL, and VSS rails]
(a) Floor plan image: VDDL circuit and VDDH circuit
[Shimazaki et al, ISSCC’03]
Slide32
Example: Multiple Supplies in a Block
“Clustered voltage scaling”
Lower VDD portion is shared
[Figure: conventional design versus CVS structure; flip-flop-bounded logic with the critical path and the level-shifting F/Fs marked]
[Ref: M. Takahashi, ISSCC’98]
© IEEE 1998
Slide33
Level Converting Flip-Flops (LCFFs)
Pulsed Half-Latch versus Master-Slave LCFFs
Smaller # of MOSFETs / clock loading
Faster level conversion using half-latch structure
Shorter D-Q path from pulsed circuit
[Figure: master-slave and pulsed half-latch LCFF schematics]
[Ref: F. Ishihara, ISLPED’03]
© IEEE 2003
Slide34
Dynamic Realization of Pulsed LCFF
Pulsed precharge LCFF (PPR)
Fast level conversion by precharge mechanism
Suppressed charge/discharge toggle by conditional capture
Short D-Q path
[Figure: pulsed precharge latch schematic]
[Ref: F. Ishihara, ISLPED’03]
© IEEE 2003
Slide35
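Case Study: ALU for 64-bit μProcessor
[Figure: ALU block diagram with 9:1 MUX and 5:1 MUX operand selection (ain, bin, ain0), carry generation, gp generation, partial sum, logical unit, sum selector, 2:1 MUX, clock generation, and the long loop-back bus sumb (0.5 pF) driven through INV1 and INV2; signals carry, s0/s1, sum, and clk are labeled, and each block is marked as a VDDH circuit or a VDDL circuit]
[Ref: Y. Shimazaki, ISSCC’03]
© IEEE 2003
Slide36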
Low-Swing Bus and Level Converter
INV2 is placed near 9:1 MUX to increase noise immunity
Level conversion is done by a domino 9:1 MUX
[Figure: sum driven onto the sumb bus through INV1 and INV2 into the domino level converter (9:1 MUX) with keeper and precharge (pc); inputs ain0 and sel (VDDH); VDDH and VDDL domains marked]
[Ref: Y. Shimazaki, ISSCC’03]
© IEEE 2003
Slide37
Measured Results: Energy and Delay
[Figure: energy (pJ) versus TCYCLE (ns) at room temperature for the single-supply and shared-well (VDDH = 1.8 V) designs; the 1.16 GHz point is marked]
VDDL = 1.4 V: Energy −25.3%, Delay +2.8%
VDDL = 1.2 V: Energy −33.3%, Delay +8.3%
[Ref: Y. Shimazaki, ISSCC’03]
© IEEE 2003
Slide38
Practical Transistor Sizing
Continuous sizing of transistors only an option in custom design
In ASIC design flows, options set by available library
Discrete sizing options made possible in standard-cell design methodology by providing multiple options for the same cell
Leads to larger libraries (> 800 cells)
Easily integrated into technology mapping
Slide39
Technology Mapping
Larger gates reduce capacitance, but are slower
[Figure: example logic network with signals a, b, c, d, and f, annotated with slack = 1]
Slide40
Technology Mapping
Example: 4-input AND
(a) Implemented using 4-input NAND + INV
(b) Implemented using 2-input NAND + 2-input NOR
Gate type | Area (cell unit) | Input cap. (fF) | Average delay (ps), Library 1: High-Speed | Average delay (ps), Library 2: Low-Power
INV | 3 | 1.8 | 7.0 + 3.8 CL | 12.0 + 6.0 CL
NAND2 | 4 | 2.0 | 10.3 + 5.3 CL | 16.3 + 8.8 CL
NAND4 | 5 | 2.0 | 13.6 + 5.8 CL | 22.7 + 10.2 CL
NOR2 | 3 | 2.2 | 10.7 + 5.4 CL | 16.7 + 8.9 CL
(delay formula: CL in fF)
(numbers calibrated for 90 nm)
Slide41
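The cell library listed for the 4-input AND example is enough to work out the area and delay of both mappings by hand. The sketch below does that arithmetic; the dictionary layout and function names are assumptions for illustration, while the numbers are the library values from the table. Each first-stage gate is assumed to drive exactly one input of the second-stage gate.

```python
# cell: area, input capacitance (fF), and delay = intercept + slope * CL (ps, CL in fF)
LIB = {
    "INV":   {"area": 3, "cin": 1.8, "hs": (7.0, 3.8),  "lp": (12.0, 6.0)},
    "NAND2": {"area": 4, "cin": 2.0, "hs": (10.3, 5.3), "lp": (16.3, 8.8)},
    "NAND4": {"area": 5, "cin": 2.0, "hs": (13.6, 5.8), "lp": (22.7, 10.2)},
    "NOR2":  {"area": 3, "cin": 2.2, "hs": (10.7, 5.4), "lp": (16.7, 8.9)},
}

def two_stage_delay(first, second, lib):
    """Return (intercept, slope) of the path delay through first -> second -> CL."""
    d0, k0 = LIB[first][lib]
    d1, k1 = LIB[second][lib]
    first_stage = d0 + k0 * LIB[second]["cin"]   # first gate loaded by one input of the second
    return first_stage + d1, k1

def area(cells):
    return sum(LIB[c]["area"] for c in cells)

mapping_a = ["NAND4", "INV"]            # (a) NAND4 + INV
mapping_b = ["NAND2", "NAND2", "NOR2"]  # (b) two NAND2 + NOR2

for name, cells in (("a", mapping_a), ("b", mapping_b)):
    hs = two_stage_delay(cells[0], cells[-1], "hs")
    lp = two_stage_delay(cells[0], cells[-1], "lp")
    print(f"({name}) area={area(cells)} "
          f"HS={hs[0]:.1f}+{hs[1]}CL LP={lp[0]:.1f}+{lp[1]}CL")
```

The load-dependent slope is set entirely by the output gate, which is why the inverter-terminated mapping (a) drives large loads better than the NOR2-terminated mapping (b).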
Technology Mapping – Example
4-input AND | (a) NAND4 + INV | (b) NAND2 + NOR2
Area | 8 | 11
HS: Delay (ps) | 31.0 + 3.8 CL | 32.7 + 5.4 CL
LP: Delay (ps) | 53.1 + 6.0 CL | 52.4 + 8.9 CL
Sw Energy (fF) | 0.1 + 0.06 CL | 0.83 + 0.06 CL
Area
4-input more compact than 2-input (2 gates vs. 3 gates)
Timing
both implementations are 2-stage realizations
2nd stage INV (a) is better driver than NOR2 (b)
For more complex blocks, simpler gates will show better performance
Energy
Internal switching increases energy in the 2-input case
Low-power library has worse delay, but lower leakage (see later)
Slide42
Gate-Level Tradeoffs for Power
Technology mapping
Gate selection
Sizing
Pin assignment
Logical Optimizations
Factoring
Restructuring
Buffer insertion/deletion
Don’t care optimization
Slide43
Logic Restructuring
Logic restructuring to minimize spurious transitions
Buffer insertion for path balancing
[Figure: example gate networks annotated with signal values (0/1) and signal arrival order 1, 2, 3]
Slide44
Algebraic Transformations
Idea: Modify network to reduce capacitance
Caveat: This may increase activity!
pa = 0.1; pb = 0.5; pc = 0.5
[Figure: two implementations of f from inputs a, b, c, with node probabilities p1 = 0.05, p2 = 0.05, p3 = 0.075, p4 = 0.75, p5 = 0.075]
Slide45
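The node probabilities listed under "Algebraic Transformations" can be reproduced by enumerating input combinations. The sketch below assumes the two networks are f = a·b + a·c and its factored form f = a·(b + c); this is inferred from the listed probabilities rather than stated in the text. It also assumes independent inputs and uses p·(1 − p) as the 0→1 transition probability of a node.

```python
from itertools import product

P_IN = {"a": 0.1, "b": 0.5, "c": 0.5}   # input signal probabilities from the slide

def prob(fn):
    """Probability that fn(a, b, c) is 1, assuming independent inputs."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        weight = 1.0
        for bit, name in zip((a, b, c), ("a", "b", "c")):
            weight *= P_IN[name] if bit else 1.0 - P_IN[name]
        if fn(a, b, c):
            total += weight
    return total

def activity(p):
    """0 -> 1 transition probability of a node with signal probability p."""
    return p * (1.0 - p)

# Assumed unfactored network: f = a.b + a.c
p1 = prob(lambda a, b, c: a and b)
p2 = prob(lambda a, b, c: a and c)
p3 = prob(lambda a, b, c: (a and b) or (a and c))
# Assumed factored network: f = a.(b + c)
p4 = prob(lambda a, b, c: b or c)
p5 = prob(lambda a, b, c: a and (b or c))

act_unfactored = sum(activity(p) for p in (p1, p2, p3))
act_factored = sum(activity(p) for p in (p4, p5))
print(p1, p2, p3, p4, p5)
print(act_unfactored, act_factored)
```

Under these assumptions the factored form has one node fewer, but its internal OR node switches far more often, so the summed activity goes up. This is the caveat the slide warns about.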
Lessons from Circuit Optimization
Joint optimization over multiple design parameters possible using sensitivity-based optimization framework
Equal marginal costs ⇔ Energy-efficient design
Peak performance is VERY power inefficient
About 70% energy reduction for 20% delay penalty
Additional variables for higher energy-efficiency
Two supply voltages in general sufficient; 3 or more supply voltages only offer small advantage
Choice between sizing and supply voltage parameters depends upon circuit topology
But … leakage not considered so far
Slide46
Considering Leakage @ Design Time
Considering leakage as well as dynamic power is essential in sub-100 nm technologies
Leakage is not essentially a bad thing
Increased leakage leads to improved performance, allowing for lower supply voltages
Again a trade-off issue …
Slide47
Leakage – Not Necessarily a Bad Thing
Optimal designs have high leakage (ELk/ESw ≈ 0.5)
Must adapt to process and activity variations
Topology | Inv | Add | Dec
(ELk/ESw)opt | 0.8 | 0.5 | 0.2
[Figure: normalized energy Enorm versus Estatic/Edynamic (10^-2 to 10^1) for two designs, Version 1 and Version 2, labeled VTHref − 180 mV, 0.81 VDDmax and VTHref − 140 mV, 0.52 VDDmax]
[Ref: D. Markovic, JSSC’04]
© IEEE 2004
Slide48
Refining the Optimization Model
Switching energy
Leakage energy
with I0(Ψ): normalized leakage current with inputs in state Ψ
Slide49
Reducing Leakage @ Design Time
Using longer transistors
Limited benefit
Increase in active current
Using higher thresholds
Channel doping
Stacked devices
Body biasing
Reducing the voltage!!
Slide50
Longer Channels
10% longer gates reduce leakage by 50%
Increases switching power by 18% with W/L = const.
Doubling L reduces leakage by 5x
Impacts performance
Attractive when W does not have to be increased (e.g. memory)
[Figure, 90 nm CMOS: normalized switching energy and normalized leakage power versus transistor length (100 to 200 nm)]
Slide51
Using Multiple Thresholds
There is no need for level conversion
Dual thresholds can be added to standard design flows
High-VTh and Low-VTh libraries are a standard in sub-0.18 μm processes
For example: synthesize using only high-VTh cells, then swap in low-VTh cells in place to improve timing.
Second VTh insertion can be combined with resizing
Only two thresholds are needed per block
Using more than two yields small improvements
Slide52
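The flow described under "Using Multiple Thresholds" (start from high-VTh cells only, then swap in low-VTh cells where timing requires it) can be sketched as a greedy loop. Everything here is a toy model: the gate data, path list, target delay, and the selection rule (largest delay gain per unit of added leakage on the current worst path) are assumptions for illustration and do not describe any particular CAD tool.

```python
def assign_low_vth(gates, paths, target):
    """Greedy in-place swap: return the set of gates moved to low VTh.

    gates: name -> (delay_high, delay_low, leak_high, leak_low)
    paths: list of gate-name lists; every path must meet the target delay.
    """
    low = set()

    def gate_delay(g):
        return gates[g][1] if g in low else gates[g][0]

    def path_delay(p):
        return sum(gate_delay(g) for g in p)

    while True:
        worst = max(paths, key=path_delay)
        if path_delay(worst) <= target:
            return low
        candidates = [g for g in worst if g not in low]
        if not candidates:
            raise ValueError("target not reachable even with all low-VTh cells")
        # Pick the swap with the best delay gain per unit of added leakage.
        best = max(
            candidates,
            key=lambda g: (gates[g][0] - gates[g][1]) / (gates[g][3] - gates[g][2]),
        )
        low.add(best)

# Hypothetical data: delays in ps, leakage in arbitrary units.
GATES = {name: (10.0, 8.0, 1.0, 4.0) for name in ("g1", "g2", "g3", "g4")}
PATHS = [["g1", "g2", "g3"], ["g3", "g4"]]

swapped = assign_low_vth(GATES, PATHS, target=26.0)
leakage = sum(GATES[g][3] if g in swapped else GATES[g][2] for g in GATES)
print(sorted(swapped), leakage)
```

Only the gates on the failing path are swapped, so total leakage stays well below that of an all-low-VTh implementation while the timing target is still met.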
Three VTH’s
VDD = 1.5 V, VTH.1 = 0.3 V
[Figure: leakage reduction ratio as a function of VTH.2 (V) and VTH.3 (V), shown as a two-dimensional map and as a surface plot]
Impact of third threshold very limited
[Ref: T. Kuroda, ICCAD’02]
© IEEE 2002
Slide53
Using Multiple Thresholds
Cell-by-cell VTH assignment (not at block level)
Achieves all-low-VTH performance with substantial reduction in leakage
[Figure: flip-flop-bounded logic with low-VTH and high-VTH cells marked]
[Ref: S. Date, SLPE’94]
Slide54
Dual-VT Domino
Low-threshold transistors used only in critical paths
Shaded transistors are low threshold
[Figure: domino stages with clocks Clkn and Clkn+1, data nodes Dn and Dn+1, precharge device P1, and inverters Inv1, Inv2, Inv3]
Slide55
Multiple Thresholds and Design Methodology
Easily introduced in standard cell design methodology by extending cell libraries with cells with different thresholds
Selection of cells during technology mapping
No impact on dynamic power
No interface issues (as was the case with multiple VDD’s)
Impact: Can reduce leakage power substantially
Slide56
Dual-VTH Design for High-Performance Design
Metric | High-VTH Only | Low-VTH Only | Dual VTH
Total Slack | -53 psec | 0 psec | 0 psec
Dynamic Power | 3.2 mW | 3.3 mW | 3.2 mW
Static Power | 914 nW | 3873 nW | 1519 nW
All designs synthesized automatically using Synopsys Flows
[Courtesy: Synopsys, Toshiba, 2004]
Slide57
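The dual-VTH comparison table can be summarized with two ratios. The short sketch below only does arithmetic on the static-power figures quoted in that table; the variable names are made up.

```python
# Static power (nW) from the dual-VTH comparison table.
STATIC_NW = {"high_only": 914.0, "low_only": 3873.0, "dual": 1519.0}

# Leakage saved by the dual-VTH design relative to the all-low-VTH design,
# both of which meet timing (0 ps slack).
saving_vs_low = 1.0 - STATIC_NW["dual"] / STATIC_NW["low_only"]

# Leakage cost relative to the all-high-VTH design, which misses timing (-53 ps slack).
cost_vs_high = STATIC_NW["dual"] / STATIC_NW["high_only"]

print(f"saving vs low-VTH only: {saving_vs_low:.1%}")
print(f"leakage vs high-VTH only: x{cost_vs_high:.2f}")
```

The dual-VTH design matches the timing of the all-low-VTH design while cutting static power by roughly 60%.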
Example: High- vs. Low-Threshold Libraries
[Figure: leakage power (nW) for selected combinational tests, 130 nm CMOS]
[Courtesy: Synopsys 2004]
Slide58
Complex Gates Increase Ion/Ioff Ratio
Ion and Ioff of single NMOS versus stack of 10 NMOS transistors
Transistors in stack are sized up to give similar drive
[Figure, 90 nm technology: Ioff (nA) versus VDD (V) and Ion (μA) versus VDD (V), each for no stack and stack]
Slide59
Complex Gates Increase Ion/Ioff Ratio
Stacking transistors suppresses submicron effects
Reduced velocity saturation
Reduced DIBL effect
Allows for operation at lower thresholds
[Figure, 90 nm technology: Ion/Ioff ratio (×10^5) versus VDD (V) for stack and no stack, annotated "Factor 10!"]
Slide60
Complex Gates Increase Ion/Ioff Ratio
Example: 4-input NAND
Fan-in (2) versus Fan-in (4)
With transistors sized for similar performance:
Leakage of Fan-in(2) = Leakage of Fan-in(4) x 3
(Averaged over all possible input patterns)
[Figure: leakage current (nA) versus input pattern for Fan-in (2) and Fan-in (4)]
Slide61
Example: 32 bit Kogge-Stone Adder
[Figure: % of input vectors versus standby leakage current (μA), annotated "factor 18"]
Reducing the threshold by 150 mV increases leakage of single NMOS transistor by factor 60
[Ref: S. Narendra, ISLPED’01]
© Springer 2001
Slide62
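The statement that lowering the threshold by 150 mV raises single-transistor leakage by a factor of 60 implies a particular subthreshold swing. The sketch below backs it out, assuming the simplest exponential model (leakage proportional to 10^(−VTH/S), with S in mV per decade) and ignoring DIBL and other second-order effects; the function names are hypothetical.

```python
import math

def implied_swing_mv_per_decade(delta_vth_mv, leakage_factor):
    """Subthreshold swing S implied by a leakage change, assuming I ~ 10**(-VTH / S)."""
    return delta_vth_mv / math.log10(leakage_factor)

def leakage_factor(delta_vth_mv, swing_mv_per_decade):
    """Leakage increase caused by lowering VTH by delta_vth_mv under the same model."""
    return 10.0 ** (delta_vth_mv / swing_mv_per_decade)

swing = implied_swing_mv_per_decade(150.0, 60.0)
print(f"implied swing: {swing:.1f} mV/decade")
print(f"leakage factor for a 100 mV reduction: x{leakage_factor(100.0, swing):.1f}")
```

Under this model the quoted numbers correspond to a swing of roughly 84 mV/decade, so each further 84 mV of threshold reduction costs another 10x in single-device leakage.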
Summary
Circuit optimization can lead to substantial energy reduction at limited performance loss
Energy-delay plots are the perfect mechanism for analyzing energy-delay trade-offs.
Well-defined optimization problem over W, VDD and VTH parameters
Increasingly better support by today’s CAD flows
Observe: leakage is not necessarily bad – if appropriately managed.
Slide63
References
Books:
A. Bellaouar, M.I. Elmasry, Low-Power Digital VLSI Design Circuits and Systems, Kluwer Academic Publishers, 1st Ed, 1995.
D. Chinnery, K. Keutzer, Closing the Gap Between ASIC and Custom, Springer, 2002.
D. Chinnery, K. Keutzer, Closing the Power Gap Between ASIC and Custom, Springer, 2007.
J. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed, Prentice Hall 2003.
I. Sutherland, B. Sproull, D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan-Kaufmann, 1st Ed, 1999.
Articles:
R.W. Brodersen, M.A. Horowitz, D. Markovic, B. Nikolic, V. Stojanovic, “Methods for True Power Minimization,” Int. Conf. on Computer-Aided Design (ICCAD), pp. 35-42, Nov. 2002.
S. Date, N. Shibata, S. Mutoh, and J. Yamada, “1V 30MHz Memory-Macrocell-Circuit Technology with a 0.5 μm Multi-Threshold CMOS,” Proceedings of the 1994 Symposium on Low Power Electronics, San Diego, CA, pp. 90-91, Oct. 1994.
M. Hamada, Y. Ootaguro, T. Kuroda, “Utilizing Surplus Timing for Power Reduction,” IEEE Custom Integrated Circuits Conf., (CICC), pp. 89-92, Sept. 2001.
F. Ishihara, F. Sheikh, B. Nikolic, “Level conversion for dual-supply systems,” Int. Conf. Low Power Electronics and Design, (ISLPED), pp. 164-167, Aug. 2003.
P.M. Kogge and H.S. Stone, “A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations,” IEEE Trans. Comput., vol. C-22, no. 8, pp. 786-793, Aug 1973.
T. Kuroda, “Optimization and control of VDD and VTH for low-power, high-speed CMOS design,” Proceedings ICCAD 2002, pp. , San Jose, Nov. 2002.
Slide64
References
Articles (cont.):
H.C. Lin and L.W. Linholm, “An Optimized Output Stage for MOS Integrated Circuits,” IEEE J. Solid-State Circuits, vol. SC-10, no. 2, pp. 106-109, Apr. 1975.
S. Ma and P. Franzon, “Energy Control and Accurate Delay Estimation in the Design of CMOS Buffers,” IEEE J. Solid-State Circuits, vol. 29, no. 9, pp. 1150-1153, Sept. 1994.
D. Markovic, V. Stojanovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Methods for True Energy-Performance Optimization,” IEEE Journal of Solid-State Circuits, vol. 39, no. 8, pp. 1282-1293, Aug. 2004.
MathWorks, http://www.mathworks.com
S. Narendra, S. Borkar, V. De, D. Antoniadis, A. Chandrakasan, “Scaling of stack effect and its applications for leakage reduction,” Int. Conf. Low Power Electronics and Design, (ISLPED), pp. 195-200, Aug. 2001.
T. Sakurai and R. Newton, “Alpha-Power Law MOSFET Model and its Applications to CMOS Inverter Delay and Other Formulas,” IEEE J. Solid-State Circuits, vol. 25, no. 2, pp. 584-594, Apr. 1990.
Y. Shimazaki, R. Zlatanovici, B. Nikolic, “A shared-well dual-supply-voltage 64-bit ALU,” Int. Conf. Solid-State Circuits, (ISSCC), pp. 104-105, Feb. 2003.
V. Stojanovic, D. Markovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Energy-Delay Tradeoffs in Combinational Logic using Gate Sizing and Supply Voltage Optimization,” European Solid-State Circuits Conf., (ESSCIRC), pp. 211-214, Sept. 2002.
M. Takahashi et al., “A 60mW MPEG video codec using clustered voltage scaling with variable supply-voltage scheme,” IEEE Int. Solid-State Circuits Conf., (ISSCC), pp. 36-37, Feb. 1998.