Slide1
Predicting Performance Impact of DVFS for Realistic Memory Systems
Rustam Miftakhutdinov, Eiman Ebrahimi, Yale N. Patt
Slide2
Dynamic Voltage/Frequency Scaling
[Figure: image labeled with voltage (V) and frequency (f). Image source: intel.com]
Slide3
Impact of Frequency Scaling
[Figure: time, power, and energy versus frequency, with f_opt marked on the frequency axis]
Slide4
Impact of Frequency Scaling
[Figure: power and time versus frequency, with f_o marked on the frequency axis]
Slide5
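Prediction Overview
[Figure: energy per instruction plotted over instructions (0 to 300K) and frequency, with f_opt marked. Below it, a flow of three panels: time vs. frequency and power vs. frequency, each marked at f_o, are multiplied (×) to give energy vs. frequency, marked at f_opt. The label "our work" appears in the flow.]
Slide6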
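The flow on the Prediction Overview slide (predicted time × predicted power = predicted energy, then pick f_opt) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name, the tuple layout, and the sample numbers are all hypothetical.

```python
def pick_energy_optimal_frequency(candidates):
    """Return (frequency, energy) for the candidate with the lowest energy.

    candidates: list of (frequency_hz, predicted_time_s, predicted_power_w).
    The tuple layout is an assumption made for this sketch.
    """
    if not candidates:
        raise ValueError("need at least one candidate frequency")
    best = None
    for freq, time_s, power_w in candidates:
        energy = time_s * power_w  # energy = time x power
        if best is None or energy < best[1]:
            best = (freq, energy)
    return best


# Hypothetical predictions for three frequencies (made-up numbers).
predictions = [
    (1.0e9, 3.0, 10.0),
    (2.0e9, 1.6, 15.0),
    (3.0e9, 1.2, 25.0),
]
f_opt, e_opt = pick_energy_optimal_frequency(predictions)
```

In this made-up data the middle frequency wins: running slower costs too much time, running faster costs too much power.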
Outline
Intro to performance prediction ✓
Why realistic memory systems?
Variable memory latency
Prefetching
Slide7
Why Realistic Memory System?
Slide8
Prior Work
Stall time
Leading loads (2010): S. Eyerman et al.; G. Keramidas et al.; B. Rountree
Evaluated with a constant access latency memory system
Slide9
Energy Savings
Gmean of relative savings for 13 memory-intensive SPEC 2006 benchmarks.
Baseline: most energy-efficient static frequency for SPEC 2006*
Slide10
Energy Savings
Gmean of relative savings for 13 memory-intensive SPEC 2006 benchmarks.
Baseline: most energy-efficient static frequency for SPEC 2006*
Slide11
Outline
Intro to performance prediction ✓
Why realistic memory systems? ✓
Variable memory latency
Prefetching
Slide12
Execution Example
[Figure: timeline of chip activity and memory requests over time, with segments labeled A-E and 1-4]
Slide13
T = T_memory + T_compute
T_memory: independent of frequency
T_compute: proportional to cycle time
Slide14
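As a rough illustration of the decomposition T = T_memory + T_compute, the sketch below predicts execution time at a new cycle time by keeping T_memory fixed and scaling T_compute with the cycle time. The function and parameter names are hypothetical, and the numbers are made up.

```python
def predict_time_linear(t_total, t_memory, cycle_time_old, cycle_time_new):
    """Predict execution time at a new cycle time under the linear model.

    t_total:        measured execution time T at the old cycle time
    t_memory:       measured T_memory (assumed independent of frequency)
    cycle_time_old: cycle time t_o at which T was measured
    cycle_time_new: cycle time t to predict for
    """
    if cycle_time_old <= 0 or cycle_time_new <= 0:
        raise ValueError("cycle times must be positive")
    t_compute = t_total - t_memory  # T = T_memory + T_compute
    # T_compute is proportional to cycle time; T_memory does not change.
    return t_memory + t_compute * (cycle_time_new / cycle_time_old)


# Made-up example: T = 10 with T_memory = 4, measured at cycle time 1.0.
slower = predict_time_linear(10.0, 4.0, 1.0, 2.0)  # half the frequency
faster = predict_time_linear(10.0, 4.0, 1.0, 0.5)  # double the frequency
```

Halving the frequency doubles only the compute part (6 becomes 12), so the predicted time is 16 rather than 20; doubling the frequency gives 7 rather than 5.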
Linear Model
[Figure: execution time T versus cycle time t, showing T_memory and T_compute, with t_o marked on the cycle time axis]
Slide15
Measuring T_memory
[Figure: timeline of chip activity and memory requests over time]
Slide16
Measuring T_memory
[Figure: timeline of chip activity and memory requests over time]
Slide17
Causes of Request Dependences
[Figure, two panels. Pointer Chasing: a chain of "next" pointers. Finite Chip Resources: an instruction window containing two misses.]
Slide18
Measuring T_memory
[Figure: timeline of chip activity and memory requests over time]
Slide19
Critical Path Algorithm
For each memory request, which starts at T_start and ends at T_end:
1. At T_start: record T_start and the current T_memory, written T_memory(T_start).
2. At T_end: compute path = T_memory(T_start) + (T_end - T_start), that is, the old critical path plus the request latency.
3. Set T_memory = max(T_memory, path). The new T_memory is the length of the critical path.
Slide20
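The three steps of the critical path algorithm can be sketched in software as below. This is only an illustration under stated assumptions: requests are given as hypothetical (T_start, T_end) pairs in arbitrary time units, and when one request ends at the same instant another starts, the end is processed first so that a dependent request sees the updated T_memory.

```python
def critical_path_t_memory(requests):
    """Estimate T_memory as the critical path through memory requests.

    requests: iterable of (t_start, t_end) pairs with t_end > t_start.
    """
    requests = list(requests)
    events = []
    for req_id, (t_start, t_end) in enumerate(requests):
        if t_end <= t_start:
            raise ValueError("each request needs t_end > t_start")
        events.append((t_start, 1, req_id))  # 1 = start event
        events.append((t_end, 0, req_id))    # 0 = end event, sorts first on ties
    events.sort()

    t_memory = 0
    recorded = {}  # per request: T_memory as it was at T_start
    for _, is_start, req_id in events:
        t_start, t_end = requests[req_id]
        if is_start:
            # Step 1: record T_memory at T_start.
            recorded[req_id] = t_memory
        else:
            # Step 2: old critical path plus this request's latency.
            path = recorded.pop(req_id) + (t_end - t_start)
            # Step 3: keep the longest path seen so far.
            t_memory = max(t_memory, path)
    return t_memory


# Two overlapping requests followed by a later one (made-up timings).
example = [(0, 100), (50, 150), (300, 400)]
t_mem = critical_path_t_memory(example)  # 200
```

In the example, the two overlapping requests contribute 100 time units between them, and the later request adds another 100 on top of that path.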
Linear Model
[Figure: execution time T versus cycle time t, showing T_memory and T_compute, with t_o marked on the cycle time axis]
Slide21
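Linear Model
[Figure: execution time T versus cycle time t, showing T_memory and T_compute, with t_o marked. The model feeds the prediction flow: time vs. cycle time (marked at t_o, with T_m), then time vs. frequency and power vs. frequency (each marked at f_o), multiplied (×) to give energy vs. frequency, marked at f_opt.]
Slide22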
Critical Path: Variable Access Latency
[Figure: timeline of chip activity and memory requests over time]
Leading Loads: Constant Access Latency
[Figure: timeline of chip activity and memory requests over time]
Slide23
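For contrast with the critical path approach, here is a hedged sketch of a leading-loads estimate. The slides do not spell out the method, so this assumes the usual description from the 2010 work cited earlier: a leading load is a memory request issued while no other request is outstanding, and T_memory is the sum of leading-load latencies. Names and timings are hypothetical.

```python
def leading_loads_t_memory(requests):
    """Estimate T_memory as the summed latency of leading loads.

    requests: iterable of (t_start, t_end) pairs.
    Assumed definition: a request is a leading load if no earlier
    request is still outstanding when it starts.
    """
    t_memory = 0
    busy_until = None  # latest completion time among requests seen so far
    for t_start, t_end in sorted(requests):
        if busy_until is None or t_start >= busy_until:
            t_memory += t_end - t_start  # leading load: count its latency
        busy_until = t_end if busy_until is None else max(busy_until, t_end)
    return t_memory


# Made-up example with variable latency: the second request overlaps the
# first but takes much longer, and the third starts only when it finishes.
example = [(0, 100), (10, 300), (300, 400)]
estimate = leading_loads_t_memory(example)  # 200
```

Only the first and third requests count as leading loads here, so the estimate is 200 even though requests are outstanding for all 400 time units. With a constant access latency the overlapped request could not outlast the leading load by this much, which is one way to see why variable latency matters for this estimator.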
Leading Loads
[Figure: execution time T versus cycle time t, showing T_memory and T_compute, with t_o marked and a line labeled "leading loads"]
Slide24
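Leading Loads
[Figure: execution time T versus cycle time t, showing T_memory and T_compute, with t_o marked and a line labeled "leading loads". The model feeds the prediction flow: time vs. cycle time (marked at t_o, with T_m), then time vs. frequency and power vs. frequency (each marked at f_o), multiplied (×) to give energy vs. frequency, marked at f_opt.]
Slide25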
Energy Savings
Gmean of relative savings for 13 memory-intensive SPEC 2006 benchmarks.
Baseline: most energy-efficient static frequency for SPEC 2006*
Slide26
Outline
Intro to performance prediction ✓
Why realistic memory systems? ✓
Variable memory latency ✓
Prefetching
Slide27
Streaming Workload
[Figure: four timelines of chip activity and memory requests over time. Panels: Prefetcher OFF; Prefetcher ON; Prefetcher ON, frequency +200 MHz; Prefetcher ON, frequency +500 MHz.]
Slide28
Limited Bandwidth Model
[Figure: execution time T versus cycle time t, showing T_demand, T_compute, T_memory^min, and t_crossover]
Slide29
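One plausible reading of the Limited Bandwidth Model figure is that execution time follows a linear model (T_demand plus a compute part that scales with cycle time) until it hits a floor T_memory^min set by memory bandwidth, with t_crossover marking where the two meet. The sketch below encodes that reading only; the formula, names, and numbers are assumptions, not taken from the slides.

```python
def predict_time_limited_bandwidth(t_demand, t_compute_old, t_memory_min,
                                   cycle_time_old, cycle_time_new):
    """Predict execution time under an assumed limited-bandwidth model.

    Assumed form: T(t) = max(T_memory_min, T_demand + T_compute * t / t_o).
    """
    linear = t_demand + t_compute_old * (cycle_time_new / cycle_time_old)
    return max(t_memory_min, linear)


def crossover_cycle_time(t_demand, t_compute_old, t_memory_min, cycle_time_old):
    """Cycle time at which the linear part equals the bandwidth floor."""
    return cycle_time_old * (t_memory_min - t_demand) / t_compute_old


# Made-up numbers: T_demand = 2, T_compute = 6 at cycle time 1.0, floor = 5.
t_cross = crossover_cycle_time(2.0, 6.0, 5.0, 1.0)                  # 0.5
at_fast = predict_time_limited_bandwidth(2.0, 6.0, 5.0, 1.0, 0.25)  # 5.0
at_base = predict_time_limited_bandwidth(2.0, 6.0, 5.0, 1.0, 1.0)   # 8.0
```

Under this reading, shortening the cycle time below t_crossover buys no further speedup because the workload is already bandwidth bound.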
Energy Savings
Gmean of relative savings for 13 memory-intensive SPEC 2006 benchmarks.
Baseline: most energy-efficient static frequency for SPEC 2006*
Slide30
Recap
Intro to performance prediction ✓
Why realistic memory systems? ✓
Variable memory latency ✓
Prefetching ✓
Slide31
Final Thought
Performance predictors need realistic evaluation