Slide1
The Tail at Store: A Revelation from Millions of Hours of Disk and SSD Deployments
Mingzhe Hao, Andrew A. Chien, Haryadi S. Gunawi
Gokul Soundararajan, Deepak Kenchammana-Hosekote
Slide2
Fast Storage Devices
Source: http://www.extremetech.com/extreme/211087-intel-micron-reveal-xpoint-a-new-memory-architecture-that-claims-to-outclass-both-ddr4-and-nand
15K RPM, ~20K RPM
Slide3
Fast? Not always.
Disk/SSD performance failures? Yes.
Why?
Disks: weak disk head, bad packaging, missing screws, broken/old fans, too many disks per box, firmware bugs, bad sector remapping. Bandwidth drops by 80%, and seconds of delay are introduced.
SSD/Flash: firmware bugs, GC, …; 4-100x slowdown.
Limplock: Understanding the Impact of Limpware on Scale-Out Cloud Systems [SoCC '13, HotCloud '13]
Slide4
Mingzhe, these anecdotes are great, but…
Really serious problems?
How often?
Transient or permanent?
Great questions! ... Hmmm...
Slide5
The Tail at Store
Study of over 450,000 disks and 4,000 SSDs
In production (customer deployments)
Deployed as RAID groups
87 days on average
Total: 857 million disk hours and 7 million SSD hours
The largest study of storage performance variability
Slide6
Mingzhe, these anecdotes are great, but…
Really serious problems?
How often?
Transient or permanent?
Great questions! We HAVE the data!
Slide7
Outline
Methodology
  Dataset
  Major metrics
Slowdown characterizations
Temporal & spatial analysis
Workload analysis
Conclusion
Slide8
[Figure: a RAID group of data drives (D … D) plus parity drives P and Q]
Metric | Disk | SSD
#RAID groups | 38,029 | 572
#Data drives per group | 3-26 | 3-22
#Data drives | 458,482 | 4,069
Total drive hours | 857,183,442 | 7,481,055
Total RAID hours | 72,046,373 | 1,072,690
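The table's totals allow a quick consistency check, sketched below in Python. Dividing data drives by RAID groups gives about 12 drives per disk group and about 7 per SSD group. Dividing drive hours by RAID hours lands near the same figures. That is what one would expect if every RAID hour covers one hour of each member drive, but this is our assumption, not a statement from the slides. The `DATASET` layout and the `derived_stats` helper are hypothetical.

```python
# Derived averages from the dataset table. These ratios are computed
# here for illustration only; the slides do not report them directly.
DATASET = {
    "disk": {"raid_groups": 38_029, "data_drives": 458_482,
             "drive_hours": 857_183_442, "raid_hours": 72_046_373},
    "ssd": {"raid_groups": 572, "data_drives": 4_069,
            "drive_hours": 7_481_055, "raid_hours": 1_072_690},
}


def derived_stats(row):
    """Average data drives per RAID group, and drive hours per RAID hour."""
    return {
        "drives_per_group": row["data_drives"] / row["raid_groups"],
        "drive_hours_per_raid_hour": row["drive_hours"] / row["raid_hours"],
    }


for kind, row in DATASET.items():
    stats = derived_stats(row)
    print(kind, {name: round(value, 2) for name, value in stats.items()})
```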
Slide9
Metrics
$L_i$ = hourly average I/O latency of drive $i$ in a RAID group of $N$ drives
$L_{\text{median}} = \operatorname{median}(L_1, \dots, L_N)$
Slowdown: $S_i = L_i / L_{\text{median}}$
Slow drive hour: $S_i \ge 2$ (at least 2x slower than the median drive)
Example (one RAID group, hourly averages):
Latency (ms): 10, 9, 22, 8, 25 (so $L_{\text{median}}$ = 10 ms)
Slowdown: 1.0x, 0.9x, 2.2x, 0.8x, 2.5x
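A minimal Python sketch of these definitions for one RAID group and one hour, reproducing the slide's example. The function names are hypothetical. Note that `statistics.median` averages the two middle values when N is even; the slides do not say how even-sized groups are handled.

```python
from statistics import median

SLOW_THRESHOLD = 2.0  # a "slow drive hour" has S_i >= 2x


def slowdowns(latencies_ms):
    """S_i = L_i / L_median for the drives of one RAID group in one hour.

    latencies_ms[i] is drive i's hourly average I/O latency L_i (0-based here,
    whereas the slide numbers drives from 1).
    """
    l_median = median(latencies_ms)
    return [latency / l_median for latency in latencies_ms]


def slow_drives(latencies_ms, threshold=SLOW_THRESHOLD):
    """0-based indices of drives whose slowdown is at least `threshold`."""
    return [i for i, s in enumerate(slowdowns(latencies_ms)) if s >= threshold]


hour = [10, 9, 22, 8, 25]  # example latencies (ms) from the slide
print(slowdowns(hour))     # [1.0, 0.9, 2.2, 0.8, 2.5]
print(slow_drives(hour))   # [2, 4]
```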
Slide10
Longest tail: $T = \max(S_1, \dots, S_N)$ per hour
Latency of a full-stripe I/O follows the longest tail.
“Tail hour”: $T \ge 2$, i.e., at least one drive is ≥ 2x slower.
[Figure: full-stripe workload on a RAID group with slowdowns 1.0x, 0.9x, 2.2x, 0.8x, 2.5x; the 2.2x and 2.5x drives are marked "Slow!"]
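The longest tail and the tail-hour test can be sketched the same way. `full_stripe_latency` is an assumed, simplified model: it ignores parity computation, queuing, and caching, and only shows why a full-stripe I/O follows its slowest drive. All names are hypothetical.

```python
from statistics import median


def longest_tail(latencies_ms):
    """T = max(S_1, ..., S_N) for one RAID group in one hour, with S_i = L_i / L_median."""
    l_median = median(latencies_ms)
    return max(latency / l_median for latency in latencies_ms)


def is_tail_hour(latencies_ms, threshold=2.0):
    """A "tail hour" has T >= 2x, i.e. at least one drive is >= 2x slower."""
    return longest_tail(latencies_ms) >= threshold


def full_stripe_latency(latencies_ms):
    """Simplified model: a full-stripe I/O finishes when its slowest drive does."""
    return max(latencies_ms)


hour = [10, 9, 22, 8, 25]
print(longest_tail(hour), is_tail_hour(hour), full_stripe_latency(hour))  # 2.5 True 25
```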
Slide11
Outline
Methodology
Slowdown characterizations
Temporal & spatial analysis
Workload analysis
Conclusion
Slide12
Slowdown distribution (DISK)
$S_i = L_i / L_{\text{median}}$, one value per drive per hour:
1pm: 1.0 0.9 2.1 1.2
2pm: 1.1 0.8 1.7 1.5
3pm: 0.9 1.0 2.1 1.2
4pm: ... ... ... ...
…
800+ million slowdown values
[Figure: distribution of slowdown values, ranging from "Good" (low slowdown) to "Bad!" (high tail)]
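Building the distribution amounts to pooling every per-drive, per-hour value and asking what share lies at or above a threshold. The sketch below does that for the three example rows. At fleet scale, the same computation would run over the 800+ million values. The function names are hypothetical.

```python
def flatten(per_hour):
    """One slowdown value per drive per hour, pooled across all hours."""
    return [s for hour in per_hour for s in hour]


def fraction_at_least(values, threshold):
    """Empirical tail of the distribution: share of values >= threshold."""
    return sum(1 for v in values if v >= threshold) / len(values)


# Rows from the slide: 1pm, 2pm, 3pm (one column per drive).
per_hour = [
    [1.0, 0.9, 2.1, 1.2],
    [1.1, 0.8, 1.7, 1.5],
    [0.9, 1.0, 2.1, 1.2],
]
values = flatten(per_hour)
print(len(values))                      # 12
print(fraction_at_least(values, 2.0))   # 2 of 12 values
print(fraction_at_least(values, 1.5))   # 4 of 12 values
```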
Slide13
Slowdown distribution (DISK)
$S_i$ = 2x at the 99.8th percentile: in 1000 drive hours, 2 hours are ≥ 2x slower.
$S_i$ = 1.5x at the 99.3rd percentile: in 1000 drive hours, 7 hours are ≥ 1.5x slower.
$S_i$ = 1.1x, 1.2x … (more analysis possible)
≥2x: 0.2%
≥1.5x: 0.7%
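The "in 1000 drive hours" phrasing converts directly from the percentile: the share at or above the pth percentile is (100 - p)%. A small helper, whose name is hypothetical, makes the arithmetic explicit. The same conversion gives the SSD and RAID-tail counts on the following slides.

```python
def hours_per_thousand(percentile):
    """Hours per 1000 at or above the value found at `percentile`.

    Example: if S_i = 2x sits at the 99.8th percentile, then
    100 - 99.8 = 0.2% of drive hours are at least 2x slow,
    i.e. about 2 in every 1000.
    """
    return round((100.0 - percentile) / 100.0 * 1000)


for label, pct in [("disk >= 2x", 99.8), ("disk >= 1.5x", 99.3)]:
    print(label, hours_per_thousand(pct))
# disk >= 2x 2
# disk >= 1.5x 7
```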
Slide14
SSD slowdown distribution is worse
Slowdown distribution (SSD)
$S_i$ = 2x at the 99.4th percentile: in 1000 drive hours, 6 hours are ≥ 2x slower.
$S_i$ = 1.5x at the 98.7th percentile: in 1000 drive hours, 13 hours are ≥ 1.5x slower.
≥2x: 0.6%
≥1.5x: 1.3%
Slide15
Longest tail: $T = \max(S_1, \dots, S_N)$ per hour
72 million $T$ values for disk RAID, and 1 million $T$ values for SSD RAID
[Figure: RAID group with slowdowns 1.0x, 0.9x, 2.2x, 0.8x, 2.5x]
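There is one $T$ value per RAID group per hour, so the counts above line up with the "Total RAID hours" row of the dataset table (about 72 million and 1 million). Below is a sketch of the grouping step. It assumes a hypothetical flat record layout of per-drive slowdowns; all names are illustrative.

```python
from collections import defaultdict


def tail_values(records):
    """One T value per (RAID group, hour): the maximum per-drive slowdown.

    records: iterable of hypothetical (raid_id, hour, drive_id, slowdown) tuples.
    """
    tails = defaultdict(float)  # missing keys start at 0.0; slowdowns are positive
    for raid_id, hour, _drive_id, slowdown in records:
        key = (raid_id, hour)
        tails[key] = max(tails[key], slowdown)
    return dict(tails)


records = [
    ("raid-a", 0, "d1", 1.0), ("raid-a", 0, "d2", 2.5),
    ("raid-a", 1, "d1", 0.9), ("raid-a", 1, "d2", 1.1),
    ("raid-b", 0, "d1", 1.2),
]
print(tail_values(records))
# {('raid-a', 0): 2.5, ('raid-a', 1): 1.1, ('raid-b', 0): 1.2}
```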
Slide16
$T$ is much worse than $S_i$
Tail distribution (Disk)
$T$ = 2x at the 98.5th percentile: in 1000 RAID hours, 15 hours have at least one ≥ 2x slower disk.
$T$ = 1.5x at the 95.4th percentile: in 1000 RAID hours, 46 hours have at least one ≥ 1.5x slower disk.
≥2x: 1.5%
≥1.5x: 4.6%
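A toy model helps show why $T$ is much worse than $S_i$. Suppose each drive is slow independently with probability $p$. Then the chance that at least one of $N$ drives is slow in a given hour is $1 - (1 - p)^N$, which grows roughly $N$-fold for small $p$. The independence assumption is ours and is for illustration only. Real slowdowns need not be independent, so this does not model the measured data. The function name is hypothetical.

```python
def prob_tail_hour(p_slow, n_drives):
    """P(at least one of n drives is slow in a given hour).

    Assumes each drive is slow independently with probability p_slow,
    a simplifying assumption the slides do not make. For small
    n_drives * p_slow this is roughly n_drives * p_slow.
    """
    return 1.0 - (1.0 - p_slow) ** n_drives


# Per-drive rate taken from the disk slide: about 0.2% of drive hours are >= 2x.
for n in (1, 5, 10, 20):
    print(n, round(prob_tail_hour(0.002, n), 4))
```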
Slide17
Tail distribution (SSD)
$T$ = 2x at the 97.8th percentile: in 1000 RAID hours, 22 hours have at least one ≥ 2x slower SSD.
$T$ = 1.5x at the 95.2nd percentile: in 1000 RAID hours, 48 hours have at least one ≥ 1.5x slower SSD.
≥2x: 2.2%
≥1.5x: 4.8%
Slide18
Slowdown is not uncommon!
Performance stability is hard to achieve at the 95th to 99.9th percentile!
Millions of slow disk hours and tens of thousands of slow SSD hours
Metric | DISK | SSD
2x tail hours (%) | 1.54% | 2.23%
1.5x tail hours (%) | 4.56% | 4.83%
2x slow hours (%) | 0.22% | 0.58%
1.5x slow hours (%) | 0.69% | 1.27%
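The "millions" and "tens of thousands" follow from multiplying the slow-hour percentages by the total drive hours on the dataset slide. The percentages are rounded, so the results are approximate. The names below are hypothetical.

```python
TOTAL_DRIVE_HOURS = {"disk": 857_183_442, "ssd": 7_481_055}  # from the dataset slide
SLOW_HOUR_PCT = {  # rounded percentages from the table above
    "disk": {"2x": 0.22, "1.5x": 0.69},
    "ssd": {"2x": 0.58, "1.5x": 1.27},
}


def approx_slow_hours(kind, level):
    """Approximate absolute slow drive hours from a rounded percentage."""
    return round(TOTAL_DRIVE_HOURS[kind] * SLOW_HOUR_PCT[kind][level] / 100)


print(approx_slow_hours("disk", "2x"))  # about 1.9 million
print(approx_slow_hours("ssd", "2x"))   # about 43 thousand
```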
Slide19
Outline
Methodology
Slowdown characterizations
Temporal & spatial analysis
Workload analysis
Conclusion
Slide20
Slowdown Interval
Q: How long can a slowdown last? (slowdown interval)
[Figure: timeline of slow (≥ 2x) drive hours, with example slowdown intervals of 1 hr and 3 hrs]
40% of disk slowdown intervals last ≥ 2 hrs.
12% of disk slowdown intervals last ≥ 10 hrs.
Many slowdowns happen in consecutive hours.
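A slowdown interval can be sketched as a maximal run of consecutive slow hours for one drive. That definition is our reading of the slide's timeline, not a quote from it. The example reproduces the 1-hour and 3-hour intervals in the figure. The function names are hypothetical.

```python
def slowdown_intervals(slow_flags):
    """Lengths (hours) of runs of consecutive slow hours for one drive.

    slow_flags[h] is True when the drive's slowdown S_i >= 2x in hour h.
    """
    intervals, run = [], 0
    for is_slow in slow_flags:
        if is_slow:
            run += 1
        elif run:
            intervals.append(run)
            run = 0
    if run:  # close a run that reaches the end of the window
        intervals.append(run)
    return intervals


def share_at_least(intervals, hours):
    """Fraction of slowdown intervals lasting at least `hours`."""
    return sum(1 for n in intervals if n >= hours) / len(intervals)


flags = [False, True, False, False, True, True, True, False]
print(slowdown_intervals(flags))                     # [1, 3]
print(share_at_least(slowdown_intervals(flags), 2))  # 0.5
```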
Slide21
Slowdown Inter-arrival
Q: What are the inter-arrival times between slow drive hours?
[Figure: timeline of slow (≥ 2x) drive hours separated by gaps of 1 hr and 3 hrs]
90% of disk slowdown inter-arrivals are within 24 hours.
85% of SSD slowdown inter-arrivals are within 24 hours.
A slow drive hour is a good indicator of further slowdowns.
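Inter-arrivals can be sketched as the gaps between successive slow hours of the same drive. Counting consecutive slow hours (a gap of 1) is an assumption here, and so are the names. The sketch only shows how a "within 24 hours" share is read off the gaps.

```python
def inter_arrivals(slow_hours):
    """Gaps (hours) between successive slow hours of one drive.

    slow_hours: sorted hour indices at which the drive had S_i >= 2x.
    """
    return [later - earlier for earlier, later in zip(slow_hours, slow_hours[1:])]


def share_within(gaps, limit_hours=24):
    """Fraction of inter-arrival gaps no longer than `limit_hours`."""
    return sum(1 for g in gaps if g <= limit_hours) / len(gaps)


gaps = inter_arrivals([5, 6, 9, 40, 41])
print(gaps)                 # [1, 3, 31, 1]
print(share_within(gaps))   # 0.75
```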
Slide22
Slow Drive Population
Q: How many drives have experienced at least one slowdown within the dataset time range (87 days)?
25% of disks have seen a ≥ 2x slowdown.
29% of SSDs have seen a ≥ 2x slowdown.
A large portion of drives have experienced slowdowns, so replacement is not a solution.
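The population question counts drives rather than hours. A drive is counted once if any of its hours in the window reached the threshold. Below is a sketch with hypothetical data and a hypothetical function name.

```python
def share_of_drives_ever_slow(slowdowns_by_drive, threshold=2.0):
    """Fraction of drives with at least one hour where S_i >= threshold."""
    ever_slow = [any(s >= threshold for s in hourly)
                 for hourly in slowdowns_by_drive.values()]
    return sum(ever_slow) / len(ever_slow)


fleet = {  # hypothetical hourly slowdowns for four drives
    "d1": [1.0, 1.1, 0.9],
    "d2": [1.0, 2.3, 1.0],
    "d3": [0.8, 1.9, 1.2],
    "d4": [1.0, 1.0, 1.0],
}
print(share_of_drives_ever_slow(fleet))  # 0.25
```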
Slide23
Outline
Methodology
Slowdown characterizations
Temporal & spatial analysis
Workload analysis
  I/O rate and size imbalance
Conclusion
Slide24
Q: Why is a drive slow?
I/O rate imbalance? (RI)
I/O size imbalance? (ZI)
[Figure: two ≥ 2x slower drives, one with rate imbalance RI = 4x and one with size imbalance ZI = 10x]
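The slides do not spell out how RI and ZI are computed. One plausible sketch assumes they are median-normalized like the slowdown metric. With hypothetical inputs, it reproduces the 4x and 10x figures.

```python
from statistics import median


def imbalance(per_drive_values):
    """Ratio of each drive's value to the RAID-group median for one hour.

    Assumed definitions, mirroring S_i = L_i / L_median:
      RI (rate imbalance) uses each drive's I/O count in the hour;
      ZI (size imbalance) uses each drive's average I/O size in the hour.
    """
    m = median(per_drive_values)
    return [v / m for v in per_drive_values]


io_counts = [100, 100, 400, 100, 100]  # hypothetical I/Os per drive per hour
avg_io_kb = [8, 8, 8, 80, 8]           # hypothetical average I/O size (KB)
print(imbalance(io_counts))  # RI: [1.0, 1.0, 4.0, 1.0, 1.0]
print(imbalance(avg_io_kb))  # ZI: [1.0, 1.0, 1.0, 10.0, 1.0]
```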
Slide25
Rate Imbalance
5% of slow hours have ≥ 2x more I/Os.
I/O rate imbalance is not a primary cause!
[Figure: example drives with RI = 3x and RI = 1x]
Slide26
Size Imbalance
2% of slow hours have ≥ 2x larger I/Os.
I/O size imbalance is not a primary cause!
[Figure: example drives with ZI = 10x and ZI = 1x]
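Both percentages are conditional: of the slow drive hours, what share also shows a ≥ 2x rate (or size) imbalance? If imbalance were a primary cause, one would expect that share to be large, whereas 5% and 2% are small. Below is a sketch over hypothetical records; the field names and function name are illustrative.

```python
def share_with_imbalance(drive_hours, attr, threshold=2.0, slow_at=2.0):
    """Among slow drive hours, the fraction whose imbalance `attr` is >= threshold.

    drive_hours: hypothetical dicts with keys "slowdown", "ri" and "zi".
    """
    slow = [h for h in drive_hours if h["slowdown"] >= slow_at]
    if not slow:
        return 0.0
    return sum(1 for h in slow if h[attr] >= threshold) / len(slow)


drive_hours = [
    {"slowdown": 2.2, "ri": 3.0, "zi": 1.0},
    {"slowdown": 2.5, "ri": 1.0, "zi": 1.0},
    {"slowdown": 3.0, "ri": 1.1, "zi": 0.9},
    {"slowdown": 2.1, "ri": 0.9, "zi": 1.2},
    {"slowdown": 1.0, "ri": 4.0, "zi": 10.0},  # not a slow hour, so ignored
]
print(share_with_imbalance(drive_hours, "ri"))  # 0.25
print(share_with_imbalance(drive_hours, "zi"))  # 0.0
```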
Slide27
Other correlations: disk age
Older disks are more unstable.
[Figure: performance instability by disk age, with the direction of increasing instability marked "worse"]
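Slide28
Vendor matters
Other correlations: flash cells and vendors
Performance instability: SLC < MLC
[Figure: performance instability by flash cell type and by vendor, each with the direction of increasing instability marked "worse"]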
Slide29
Other findings
No correlation with time of day (00:00-24:00)
Nightly background events are not a factor.
No explicit drive events around slow hours
Slowdown is a “silent” fault.
Slow drive replacement rate is low:
Unplug: 4-8% (within 24 hours after a slowdown)
Replug: 89-100% of unplugged drives are replugged
Slide30
Conclusion
Drive performance variability is real.
Stable performance at the 95th-99.9th percentile is hard to achieve.
Rate and size imbalance are not a primary factor.
“Silent” events: internal complexities?
Tail drives in RAID: a slow drive affects the performance of the entire RAID group (20 drives/RAID is common).
Need tail tolerance at the low-level RAID layer.
Slide31
Thank you!
Questions?
http://ucare.cs.uchicago.edu
https://ceres.uchicago.edu