Caching for Bursts (C-Burst): Let Hard Disks Sleep Well and Work Energetically
Feng Chen and Xiaodong Zhang
Dept. of Computer Science and Engineering
The Ohio State University
Power Management in Hard Disks
Power management is a requirement in computer system design:
- Maintenance cost
- Reliability & durability
- Environmental effects
- Battery life
The hard disk drive is a big energy consumer for I/O-intensive jobs; e.g., hard disk drives account for 86% of the total energy consumption in EMC Symmetrix 3000 storage systems. As multi-core CPUs become more energy efficient, disks are less so.
Standard Power Management
Dynamic Power Management (DPM):
- When the disk is idle, spin it down to save energy
- When a request arrives, spin it up to service the request
Frequently spinning a disk up and down incurs a substantial penalty: high latency and energy cost. Disk energy consumption is highly dependent on the pattern of disk accesses (periodically sleeping and working).
Ideal Access Patterns for Power Saving
Ideal disk power-saving condition: requests to the disk form a periodic and bursty pattern, so the hard disk can sleep well and work energetically.
[Figure: disk accesses clustered in bursts along the time axis, separated by long disk sleep intervals]
Increasing the burstiness of disk accesses is the key to saving disk energy.
Buffer Caches Affect Disk Access Patterns
Buffer caches in DRAM are part of the hard disk service path:
- Disk data are cached in main memory (the buffer cache)
- Hits in the buffer cache are fast and avoid disk accesses
The buffer cache is able to filter and change disk access streams.
[Figure: Applications send requests to the Buffer Cache (prefetching, caching), which issues disk accesses to the Hard Disk]
Existing Solution for Bursts in Disks
Forming bursty disk accesses with prefetching:
- Predict the data that are likely to be accessed in the future
- Preload the to-be-used data into memory
- Directly condense disk accesses into a sequence of I/O bursts
- Both energy efficiency and performance can be improved
Limitations:
- Prefetching and caching share the same buffer space; energy-unaware replacement can easily destroy the burst patterns created by prefetching
- Aggressive prefetching shrinks the available caching space and demands highly effective caching
An energy-aware caching policy can effectively complement prefetching, but no prior work addresses this.
Reference: Papathanasiou and Scott, USENIX '04
Caching Can Easily Affect Access Patterns
An example:
[Figure: the original disk accesses are scattered over time; prefetching organizes them into bursty disk accesses; blocks a, b, and c in the buffer cache are then evicted by energy-unaware caching]
Unfortunately, these blocks will be re-accessed in a non-bursty way, so the disk still cannot sleep well.
Solely relying on prefetching is sub-optimal; an energy-aware caching policy is needed to create bursty disk accesses.
Caching Policy Is Designed for Locality
Standard caching policies:
- Identify the data that are unlikely to be accessed in the future
- Evict the not-to-be-used data from memory
They are performance-oriented and energy-unaware:
- Most are locality-based algorithms, e.g., LRU (CLOCK), LIRS (CLOCK-Pro)
- Designed to reduce the number of disk accesses
- No consideration of creating bursty disk access patterns
C-Burst (Caching for Bursts): our objectives for effective buffer caching are
- To create bursty disk accesses for disk energy saving
- To retain performance (high hit ratios in the buffer cache)
Outline
- Motivation
- Scheme Design
  - History-based C-Burst (HC-Burst)
  - Prediction-based C-Burst (PC-Burst)
  - Memory Regions
  - Performance Loss Control
- Performance Evaluation
  - Programming
  - Multimedia
  - Multi-role Server
- Conclusion
Restructuring Buffer Caches
The buffer cache is segmented into two regions:
Priority region (PR):
- Hot blocks are managed using an LRU-based scheme
- Blocks with strong locality are protected in the PR
- Overwhelming memory misses can be avoided, retaining performance
Energy-aware region (EAR), our focus in this talk:
- Cached blocks are managed using the C-Burst schemes
- Non-burst-accessed blocks are kept here
- A re-accessed block (strong locality) is promoted into the PR
The region sizes are dynamically adjusted, so both performance and energy saving are considered.
[Figure: the buffer cache, formerly one LRU pool, is split into a Priority Region (LRU) and an Energy-Aware Region (C-Burst)]
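The two-region structure above can be sketched as follows. This is a minimal illustrative model, not the kernel implementation; the class name, the dict-based bookkeeping, and the 50/50 default split are assumptions, and the EAR eviction here is a plain FIFO placeholder where the real scheme applies C-Burst.

```python
from collections import OrderedDict

class TwoRegionCache:
    """Sketch of a PR/EAR split buffer cache (illustrative, not the kernel code)."""
    def __init__(self, capacity, pr_fraction=0.5):
        self.pr_cap = int(capacity * pr_fraction)   # priority region capacity
        self.ear_cap = capacity - self.pr_cap       # energy-aware region capacity
        self.pr = OrderedDict()                     # hot blocks, LRU order
        self.ear = OrderedDict()                    # cold blocks, C-Burst managed

    def access(self, block):
        if block in self.pr:                        # hit in PR: refresh LRU position
            self.pr.move_to_end(block)
        elif block in self.ear:                     # re-access in EAR: promote to PR
            del self.ear[block]
            self._insert_pr(block)
        else:                                       # miss: new blocks enter the EAR
            if len(self.ear) >= self.ear_cap:
                self.ear.popitem(last=False)        # placeholder for C-Burst victim choice
            self.ear[block] = True

    def _insert_pr(self, block):
        if len(self.pr) >= self.pr_cap:             # PR full: demote coldest block to EAR
            cold, _ = self.pr.popitem(last=False)
            self.ear[cold] = True
        self.pr[block] = True
```

Victims are always reclaimed from the EAR, so blocks with strong locality in the PR are never evicted directly.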
History-based C-Burst (HC-Burst)
Distinguish different streams of disk accesses:
- Multiple tasks run simultaneously in practice
- History records can help us distinguish them
Various tasks feature very different access patterns:
- Bursty: e.g., grep, CVS
- Non-bursty: e.g., make, mplayer
The accesses reaching the hard disk are a mixture of bursty and non-bursty accesses. In aggregate, the disk access pattern is determined by the most non-bursty one.
Basic Idea of History-based C-Burst (HC-Burst)
[Figure: grep issues bursty accesses (blocks A through L) with long disk idle intervals; make issues non-bursty accesses (blocks a through g) with short disk idle periods; the aggregate stream (grep + make) is non-bursty]
The idea: cache the blocks being accessed in a non-bursty pattern, to reshape disk accesses into a bursty pattern.
History-based C-Burst (HC-Burst)
Identifying an application's access pattern: the I/O Context (IOC)
- An application's access history must be tracked across many runs, since some tasks' lifetimes are very short
- Each task is associated with an IOC to track its data access history
- IOCs are maintained in a hash table for quick access
- Each IOC is identified by an I/O context ID (IOC ID): a hash value of the executable's path or the kernel thread's name
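The IOC lookup can be sketched as below. The hash function and the table layout are illustrative assumptions; the slide only specifies that the IOC ID is a hash of the executable path (or kernel-thread name) and that IOCs live in a hash table.

```python
def ioc_id(name: str) -> int:
    """Hash an executable path or kernel-thread name into an IOC ID."""
    h = 0
    for ch in name.encode():
        h = (h * 31 + ch) & 0xFFFFFFFF   # simple 32-bit rolling hash (assumed)
    return h

# IOCs are kept in a hash table keyed by IOC ID for quick lookup
ioc_table = {}

def lookup_ioc(name: str) -> dict:
    key = ioc_id(name)
    if key not in ioc_table:
        ioc_table[key] = {"name": name, "history": []}   # fresh access history
    return ioc_table[key]
```

Because the ID is derived from the executable path rather than the process ID, the history survives across the many short runs of the same task.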
History-based C-Burst (HC-Burst)
Epoch: an application's access pattern may change over time
- Execution is broken into epochs, say T seconds each; too small or too large a T is undesirable
- Our choice: T = (disk time-out threshold) / 2
  - No disk spin-down happens during one epoch with disk accesses
  - The distribution of disk accesses within one epoch can be ignored
[Figure: grep's accesses (A through L) and make's accesses (a through g) divided into epochs along the time axis]
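The epoch bookkeeping can be sketched as follows; the 10-second spin-down time-out is an assumed example value, not one given in the talk.

```python
DISK_TIMEOUT_SEC = 10.0           # assumed disk spin-down time-out threshold
EPOCH_SEC = DISK_TIMEOUT_SEC / 2  # T = (disk time-out threshold) / 2

def epoch_of(timestamp_sec: float) -> int:
    """Map an access timestamp to its epoch number."""
    return int(timestamp_sec // EPOCH_SEC)

# Since T is half the time-out, two accesses within the same epoch are
# always closer together than the time-out, so no spin-down fires between them.
```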
History-based C-Burst (HC-Burst)
Block group: blocks accessed during one epoch by the same IOC are grouped into a block group
- Each block group is identified by an IOC ID and an epoch time
- The size of a block group indicates the burstiness of one application's data access pattern: the larger a block group is, the more bursty the disk accesses are
[Figure: grep's accesses (A through L) and make's accesses (a through g) divided into per-epoch block groups along the time axis]
History-based C-Burst (HC-Burst)
HC-Burst replacement policy: two types of blocks should be evicted
- Data blocks that are unlikely to be re-accessed: blocks with weak locality (e.g., LRU blocks)
- Data blocks that can be re-accessed with little energy: blocks accessed in a bursty pattern
Victim block group: the largest block group
- Blocks that are frequently accessed would have been promoted into the PR, so a large block group often holds infrequently accessed blocks
- Blocks that are accessed in a bursty pattern stay in a large block group, so a large block group holds blocks being accessed in bursts
History-based C-Burst (HC-Burst)
Multi-level queues of block groups:
- 32-level queues of block groups
- A block group of N blocks stays in the queue at level floor(log2(N))
- Block groups in one queue are linked in the order of their epoch times
- Block groups may move upwards/downwards if their number of blocks changes
- The victim block group is always the LRU block group on the topmost queue with valid block groups (i.e., least recently used and most bursty)
[Figure: queue levels 0 through 31; e.g., a grep block group with 1024 blocks (epoch ID 10) sits at level 10 and drops to level 9 when a block is promoted to the PR and its size falls to 1023, while a make block group with 1023 blocks (epoch 8) rises to level 10 when a demoted block brings it to 1024; the horizontal axis orders block groups by epoch time from least to most recently used, the vertical axis from least to most bursty; the victim block group is at the LRU plus most-bursty corner]
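The queue placement and victim selection above can be sketched as follows; the list-of-lists layout and the group fields are illustrative assumptions.

```python
import math

NUM_LEVELS = 32
# queues[level] holds block groups in epoch-time (LRU-first) order
queues = [[] for _ in range(NUM_LEVELS)]

def level_of(num_blocks: int) -> int:
    """A block group of N blocks stays in the queue at level floor(log2(N))."""
    return min(int(math.log2(num_blocks)), NUM_LEVELS - 1)

def insert_group(group):
    queues[level_of(group["blocks"])].append(group)

def pick_victim():
    """Victim: the least-recently-used group on the highest non-empty level
    (most bursty first, then oldest epoch first)."""
    for level in range(NUM_LEVELS - 1, -1, -1):
        if queues[level]:
            return queues[level].pop(0)
    return None
```

When a block is promoted to the PR or demoted to the EAR, the affected group would be re-inserted at its recomputed level, which is how groups move between queues.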
Prediction-based C-Burst (PC-Burst)
Main idea: certain disk access events are known and can be predicted.
[Figure: along the time axis, deterministic disk accesses bound the idle periods; block A's predicted re-access falls in a short disk idle interval close to a deterministic access, while block B's falls in a long disk idle interval]
Evicting a block that will be re-accessed within a short interval, close to a deterministic disk access, costs little energy. With deterministic disk accesses and block re-access times known, PC-Burst selectively evicts blocks to be accessed within short intervals and holds blocks to be accessed within long idle intervals. Holding block B avoids breaking a long idle interval.
Prediction-based C-Burst (PC-Burst)
Prediction of deterministic disk accesses:
- Many well-predictable periodic disk accesses exist in systems: timer-controlled OS events (e.g., pdflush) and multimedia workloads with a steady consumption rate
- In practice, "constant" disk accesses may fluctuate sometimes: system dynamics may affect disk I/O occasionally, and real I/O pattern changes may happen over time
- Challenge: how to accommodate occasional system dynamics while responding quickly to real access pattern shifts
Prediction-based C-Burst (PC-Burst)
Prediction of deterministic disk accesses: track each task's access history
- Each task is given credits in the range [-32, 32]
- Feedback-based prediction: compare the observed interval with the predicted interval
  - Wrong prediction: reduce the task's credits
  - Correct prediction: increase the task's credits
- A task with credits below 0 is considered unpredictable
- Repeated mispredictions increase the credit charge exponentially, so occasional system dynamics charge a task's credits only slightly, while a real pattern change quickly drains them
[Figure: credit scale from -32 through 0 to 32; correct predictions move credits up, wrong predictions move them down]
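The credit mechanism can be sketched as below. The slide fixes only the [-32, 32] range and the exponential penalty growth; the +1 reward, the doubling penalty, and the reset on a correct prediction are illustrative assumptions.

```python
CREDIT_MIN, CREDIT_MAX = -32, 32

class TaskCredit:
    def __init__(self):
        self.credits = 0
        self.penalty = 1          # charge doubles on repeated mispredictions

    def feedback(self, correct: bool):
        if correct:               # correct prediction: small reward, reset charge
            self.credits = min(self.credits + 1, CREDIT_MAX)
            self.penalty = 1
        else:                     # wrong prediction: exponentially growing charge
            self.credits = max(self.credits - self.penalty, CREDIT_MIN)
            self.penalty = min(self.penalty * 2, CREDIT_MAX - CREDIT_MIN)

    def predictable(self) -> bool:
        return self.credits >= 0  # tasks with credits below 0 are unpredictable
```

An isolated misprediction costs one credit, but a run of mispredictions (a real pattern change) costs 1, 2, 4, ... credits, quickly pushing the task into the unpredictable range.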
Prediction-based C-Burst (PC-Burst)
Prediction of block re-access time:
- Efficient data structure: a block table, indexed by logical block number (LBN), maintains 4 epoch times for all blocks, including non-resident blocks
- Conservative prediction: only blocks with constant access intervals (e.g., epoch times 10, 20, 30) are believed predictable
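The conservative prediction over a block's epoch history can be sketched as follows; requiring the gaps to be exactly equal is an illustrative reading of "constant access intervals".

```python
def predict_next_epoch(epochs):
    """Given the last 4 access epoch times of a block (oldest first), predict
    its next access epoch only if the intervals are constant (conservative)."""
    if len(epochs) < 4:
        return None                      # not enough history to trust
    gaps = [b - a for a, b in zip(epochs, epochs[1:])]
    if len(set(gaps)) != 1:              # intervals fluctuate: unpredictable
        return None
    return epochs[-1] + gaps[0]
```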
Prediction-based C-Burst (PC-Burst)
Multi-level queues of block groups in PC-Burst: 32 levels of queues, with two queues per level
- Prediction block group (PBG): blocks predicted to be accessed in the same future epoch
- History block group (HBG): blocks accessed in the same past epoch
- Reference points (RP): recent deterministic disk accesses
Victim block group:
- The PBG on the top level, in the shortest interval, closest to an RP, to be accessed in the furthest future
- If no PBG is found, search the HBG queue at the same level
[Figure: queue levels 0 through 31; within a level, block groups are ordered from shortest to longest interval; the victim block group is the top-level PBG in the shortest interval]
Performance Loss Control
Why is there performance loss? Memory misses increase under the energy-oriented caching policy.
How to control performance loss: the basic rule is to control the size of the energy-aware region.
Estimating performance loss:
- A ghost buffer (GB) simulates the LRU replacement policy
- The increase in memory misses (M): blocks not found in the EAR but found in the GB
- The average memory miss penalty (P): the observed average I/O latency
- Performance loss: L = M x P
Automatically tuning the energy-aware region size:
- L < tolerable performance loss: enlarge the EAR
- L > tolerable performance loss: shrink the EAR
[Figure: main memory is split into the priority region, the energy-aware region, and a ghost buffer (LRU); on a memory miss the ghost buffer is checked, and a hit there counts toward the performance loss; when the loss exceeds the tolerable rate the EAR shrinks, and when it is below the rate the EAR grows]
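The L = M x P tuning rule can be sketched as a simple control step; the function shape, the fixed step size, and the min/max bounds are illustrative assumptions.

```python
def tune_ear_size(ghost_hits, avg_miss_penalty_ms, tolerable_loss_ms,
                  ear_size, step, ear_min, ear_max):
    """Adjust the energy-aware region size from the estimated loss L = M * P.

    ghost_hits: misses in the EAR that hit the ghost (LRU) buffer, i.e. M
    avg_miss_penalty_ms: observed average I/O latency, i.e. P
    """
    loss = ghost_hits * avg_miss_penalty_ms          # L = M * P
    if loss > tolerable_loss_ms:                     # too much loss: shrink EAR
        ear_size = max(ear_size - step, ear_min)
    else:                                            # within budget: enlarge EAR
        ear_size = min(ear_size + step, ear_max)
    return ear_size
```

Because M counts only the misses that plain LRU would have avoided, the loss estimate charges the energy-aware policy exactly for the extra misses it introduces.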
Performance Evaluation
Implementation:
- Linux kernel 2.6.21.5
- 5,500 lines of code in the buffer cache management and generic block layer
Experimental setup:
- Intel Pentium 4, 3.0 GHz
- 1024 MB memory
- Western Digital 160 GB 7200 RPM hard disk drive
- RedHat Linux WS4 with the Linux 2.6.21.5 kernel
- Ext3 file system
Performance Evaluation
Methodology:
- Workloads run on the experiment machine
- Disk activities are collected in the kernel on the experiment machine
- Disk events are sent via netconsole to a monitoring machine
- Disk energy consumption is calculated offline from the collected log of disk events, using disk power models
[Figure: the experiment machine sends the disk activity log over a Gigabit LAN to the monitoring machine]
Performance Evaluation
Emulated disk models: Hitachi DK23DA laptop disk and IBM UltraStar 36Z15 SCSI disk

                   Hitachi DK23DA     IBM UltraStar 36Z15
  Capacity         30 GB              18.4 GB
  Cache            2 MB               4 MB
  RPM              4200               15000
  Bandwidth        35 MB/sec          53 MB/sec
  Active power     2 watt             13.5 watt
  Idle power       1.6 watt           10.2 watt
  Standby power    0.15 watt          2.5 watt
  Spin-up          1.6 sec / 5 J      10.9 sec / 135 J
  Spin-down        2.3 sec / 2.94 J   1.5 sec / 13 J
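The offline energy accounting over the logged disk events can be sketched as below, using the Hitachi DK23DA numbers from the table above; the (state, seconds) event-log format is an illustrative assumption, not the actual log layout.

```python
# Hitachi DK23DA power model (numbers from the table above)
ACTIVE_W, IDLE_W, STANDBY_W = 2.0, 1.6, 0.15
SPINUP_J, SPINDOWN_J = 5.0, 2.94

def disk_energy(intervals):
    """Sum energy in joules over (state, seconds) intervals from an event log.
    'spinup'/'spindown' entries count a fixed energy cost per transition."""
    power = {"active": ACTIVE_W, "idle": IDLE_W, "standby": STANDBY_W}
    transition = {"spinup": SPINUP_J, "spindown": SPINDOWN_J}
    total_j = 0.0
    for state, seconds in intervals:
        if state in power:
            total_j += power[state] * seconds   # steady-state: P * t
        else:
            total_j += transition[state]        # fixed cost per spin-up/down
    return total_j
```

This also shows why burstiness pays off: every extra spin-up/spin-down pair on the DK23DA costs about 8 J, which only amortizes if the standby period it enables is long enough.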
Performance Evaluation
Eight applications: 3 with bursty data accesses, 5 with non-bursty data accesses.
Three case studies: programming, multimedia processing, and multi-role servers.

  Name       Description           MB/epoch    Requests/epoch
  Make       Linux kernel builder  1.98        119.7
  Vim        Text editor           0.006       0.395
  Mpg123     MP3 player            0.15        3.69
  Transcode  Video converter       3.2-6.5     10.9-19.1
  TPC-H      Database query #17    7.3         476.7
  Grep       Textual search tool   102.2       10186.6
  Scp        Remote copy tool      51.5-53.8   135-139
  CVS        Version control tool  19.9        1705.7
Performance Evaluation
Case study I: programming
- Applications: grep, make, and vim
- Grep: bursty workload; make and vim: non-bursty workloads
- The C-Burst schemes protect the data set of make from being evicted by grep
- Disk idle intervals are effectively extended
Performance Evaluation
Case study I: programming (results)
- Over 30% energy saving
- Without C-Burst, nearly 0% of disk idle intervals exceed 16 sec; with C-Burst, over 50% do
Performance Evaluation
Case study II: multimedia processing
- Applications: transcode, mpg123, and scp
- Mpg123: its disk accesses serve as the deterministic accesses
- Scp: bursty disk accesses; transcode: non-bursty accesses
- PC-Burst achieves better performance than HC-Burst: it can accommodate deeper prefetching by using caching space efficiently
- Around 30% energy saving; over 70% of disk idle intervals exceed 16 sec
Performance Evaluation
Case study III: multi-role server
- Applications: TPC-H query #17 and CVS
- TPC-H: non-bursty disk accesses (very random I/O); CVS: bursty disk accesses
- The data set of TPC-H is protected in memory, so its performance is significantly improved (reduced I/O latency), even though disk idle intervals do not improve
- Over 35% energy saving
Our Contributions
- We design a set of comprehensive energy-aware caching policies, called C-Burst, which leverages the filtering effect of the buffer cache to manipulate disk accesses
- Our scheme does not rely on complicated disk power models and requires no disk specification data, which means high compatibility with different hardware
- Our scheme does not assume any specific disk hardware, such as multi-speed disks, so it benefits existing hardware
- Our scheme provides flexible performance guarantees to avoid unacceptable performance degradation
- Our scheme is fully implemented in the Linux kernel 2.6.21.5, and experiments under realistic scenarios show up to 35% energy saving with minimal performance loss
Conclusion
- Energy efficiency is a critical issue in computer system design
- Increasing disk access burstiness is the key to achieving disk energy conservation
- Leveraging the filtering effect of the buffer cache can effectively shape disk accesses into an expected pattern
- The HC-Burst scheme distinguishes the different access patterns of tasks and creates a bursty stream of disk accesses
- The PC-Burst scheme further predicts blocks' re-access times and manipulates the timing of future disk accesses
- Our implementation of the C-Burst schemes in the Linux kernel 2.6.21.5 and our experiments show that C-Burst achieves up to 35% energy saving with minimal performance loss
References
[USENIX04] A. E. Papathanasiou and M. L. Scott. Energy efficient prefetching and caching. In Proc. of USENIX '04.
[EMC99] EMC Symmetrix 3000 and 5000 enterprise storage systems product description guide. http://www.emc.com, 1999.
Memory Regions
The buffer cache is segmented into two regions:
Priority region (PR):
- Hot blocks are managed using an LRU-based scheme
- Blocks with strong locality are protected
- Overwhelming memory misses can be avoided
Energy-aware region (EAR):
- Cold blocks are managed using the C-Burst schemes
- Victim blocks are always reclaimed from the EAR
- A re-accessed block is promoted into the PR; accordingly, the coldest block in the PR is demoted
The region sizes are tuned online, so both performance and energy saving are considered.
[Figure: the buffer cache is split into the priority region and the energy-aware region; a new block is inserted into the EAR, a cold block is evicted from it, a re-accessed cold block is promoted to the PR, and a hot block is demoted from the PR]
Motivation
Limitations:
- An energy-unaware caching policy can significantly affect the periodic bursty patterns created by prefetching; improperly evicting a block may easily break a long disk idle interval
- Aggressive prefetching shrinks the available caching space and demands highly effective caching; effective prefetching needs a large volume of memory, which raises memory contention with caching
- An energy-aware caching policy can effectively complement prefetching: when prefetching works unsatisfactorily, caching can help by carefully selecting blocks for eviction
Motivation
Prefetching:
- Predict the data that are likely to be accessed in the future
- Preload the to-be-used data into memory
- Directly condense disk accesses into a sequence of I/O bursts
- Both energy efficiency and performance can be improved
Caching:
- Identify the data that are unlikely to be accessed in the future
- Evict the not-to-be-used data from memory
- Traditional caching policies are performance-oriented: designed to reduce the number of disk accesses, with no consideration of creating bursty disk access patterns
Prediction-based C-Burst (PC-Burst)
Prediction of deterministic disk accesses:
- Track each task's access history and give each task credits in the range [-32, 32]
- Feedback-based prediction: compare the observed interval with the predicted interval
  - If a prediction proves wrong, reduce the task's credits
  - If a prediction proves right, increase the task's credits
- A task with credits below 0 is unpredictable
- Repeated mispredictions increase the credit charge exponentially: occasional system dynamics charge a task's credits only slightly, while a real pattern change quickly drains them