Coerced Cache Eviction and Discreet-Mode Journaling

Uploaded On 2020-08-04


Dealing with Misbehaving Disks. Abhishek Rajimwale, Vijay Chidambaram, Deepak Ramamurthi, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau. Data Domain Inc.; University of Wisconsin-Madison.



Presentation Transcript

Slide1

Coerced Cache Eviction and Discreet-Mode Journaling: Dealing with Misbehaving Disks

Abhishek Rajimwale*, Vijay Chidambaram, Deepak Ramamurthi, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau

* Data Domain Inc.

University of Wisconsin-Madison

Slide2

Disks are not perfect

DSN 11, 7/7/11

Expanding disk fault model:
  Latent Sector Errors [Bairavasundaram SIGMETRICS 07]: RAID-6
  Block Corruption [Bairavasundaram FAST 08]: Checksums
  The disk cache: always trusted so far

[Figure: disk surface and disk cache]

Slide3

Disk Caches

Disk cache improves performance, but at the risk of data loss
Order of writes issued by the file system: A, B, C
Disks reorder writes during destaging: B, A, C
File systems flush the disk cache to ensure correct ordering of writes: A, flush, B, flush, C

[Figure: a write to disk, with the disk cache in front of the disk surface]
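The "A, flush, B, flush, C" pattern can be illustrated from user space. This is a minimal sketch, assuming a regular file stands in for the disk and os.fsync() stands in for the flush a file system would issue; the names write_fully and write_in_order are hypothetical and do not come from the talk.

```python
import os
import tempfile


def write_fully(fd, data):
    """Write all of data, looping in case os.write() accepts only part of it."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def write_in_order(path, blocks):
    """Write blocks one at a time, requesting a flush after each one."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for block in blocks:
            write_fully(fd, block)
            # Ordering point: ask for the data to reach stable storage
            # before the next block is issued. A misbehaving disk may
            # acknowledge this request without actually destaging.
            os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "ordered.bin")
        write_in_order(target, [b"A" * 512, b"B" * 512, b"C" * 512])
        print(os.path.getsize(target))
```

The sketch only shows where the ordering points sit; whether the data is durable at each point still depends on the disk honoring the flush.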

Slide4

Problem: Flushing doesn’t work

Disks can fail to flush data upon request
One reason: bugs
  Errors in the storage stack [Bairavasundaram FAST 08]
  Improper propagation of error codes [Bairavasundaram FAST 08]
  Inadequate failure policies [Prabhakaran SOSP 05]
  Bugs in the firmware [Ghemawat SOSP 03]

Slide5

Disks can lie!

Misbehaving disks ignore or delay flush requests
  Increases the risk of data loss
  File systems are usually blamed for such loss

Slide6

Disks can lie!

F_FULLFSYNC
From the fcntl man page in Mac OS X:
"Does the same thing as fsync(2) then asks the drive to flush all buffered data to the permanent storage device (arg is ignored). This is currently implemented on HFS, MS-DOS (FAT), and Universal Disk Format (UDF) file systems. The operation may take quite a while to complete. Certain FireWire drives have also been known to ignore the request to flush their buffered data."

Evidence from industry experts: Microsoft, Seagate
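For readers who have not used it, the following is a minimal sketch of how an application might request the F_FULLFSYNC behavior quoted above. It assumes Python's fcntl module on a Unix-like system, where the F_FULLFSYNC constant is exposed on macOS only. The helper name full_sync and the fallback to os.fsync() are illustrative choices, not part of the talk.

```python
import fcntl
import os
import tempfile


def full_sync(fd):
    """Request the strongest flush available and report which call was used."""
    full_fsync = getattr(fcntl, "F_FULLFSYNC", None)  # present on macOS only
    if full_fsync is not None:
        try:
            fcntl.fcntl(fd, full_fsync)
            return "F_FULLFSYNC"
        except OSError:
            pass  # the file system does not support it; fall back to fsync
    os.fsync(fd)
    return "fsync"


if __name__ == "__main__":
    with tempfile.TemporaryFile() as handle:
        handle.write(b"payload")
        handle.flush()  # push Python's own buffer to the OS first
        print(full_sync(handle.fileno()))
```

Even when the call succeeds, the man page's warning still applies: the drive itself may ignore the request.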

Slide7

Ordering points are essential

All modern file systems depend on ordering points
  Journaling file systems (ext3, ext4): data before the commit block
  Copy-on-write file systems (ZFS): data before the uber-block
If ordering points are not enforced:
  Data corruption
  Inconsistent file system

Slide8

Summary

We present Coerced Cache Eviction (CCE)
  Write extra data into the cache to evict target blocks
We show how to characterize the caches of 9 SATA disk drives
  Examine the wide range of caching policies
We implement CCE in ext3
  Well-known journaling file system
CCE provides stronger enforcement of ordering points
  At acceptable overheads

Slide9

Outline

Motivation
Background
Coerced Cache Eviction
Cache Fingerprinting
Discreet Mode Journaling
Evaluation
Conclusion

Slide10

File System Background

Consider deleting a file:
  Removing its directory entry
  Freeing the space occupied by the file and its metadata
Journaling file system:
  Makes sure all changes get to disk or none do
  Groups writes into transactions
  Writes everything to a log first
  Checkpoints to disk later

Slide11

File System Background

Ext3 file system
  Semi-modern journaling file system
  Well known, well understood
Variants of journaling
  Data journaling mode: everything (data, metadata) goes to the log first
  Ordered journaling mode: only metadata is logged

Slide12

Data Journaling

[Figure: data journaling. Labels: Memory; Disk Surface with Journal and Fixed locations; blocks D, D, D, C, M, M, B]

Slide13

Data Journaling

[Figure: data journaling with the disk cache. Labels: Memory; Disk Cache; Disk Surface with Journal and Fixed locations; blocks D, D, D, C, M, M, B]

Slide14

Outline

Motivation
Background
Coerced Cache Eviction
Cache Fingerprinting
Discreet Mode Journaling
Evaluation
Conclusion

Slide15

Coerced Cache Eviction

Ensures that the cache has been truly flushed
Key idea: extra writes to flush the disk cache
Desired order of writes: A, B, C
With CCE:
  Write A
  Write to flush zone
  Write B
  Write to flush zone
  Write C
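The write sequence above can be tried against a toy model. This is a sketch under stated assumptions, not the authors' implementation: the LyingDisk class is hypothetical, models a write-back cache with FIFO replacement that silently ignores flush requests, and the flush zone is assumed to contain as many blocks as the cache holds.

```python
from collections import OrderedDict


class LyingDisk:
    """Toy disk: fixed-size FIFO write-back cache; flush requests are ignored."""

    def __init__(self, cache_blocks):
        self.cache_blocks = cache_blocks
        self.cache = OrderedDict()  # address -> data, oldest entry first
        self.surface = {}           # durable contents
        self.destage_order = []     # order in which addresses became durable

    def write(self, addr, data):
        self.cache.pop(addr, None)  # a rewrite moves the block to the tail
        self.cache[addr] = data
        while len(self.cache) > self.cache_blocks:
            old_addr, old_data = self.cache.popitem(last=False)  # evict oldest
            self.surface[old_addr] = old_data
            self.destage_order.append(old_addr)

    def flush(self):
        pass  # the misbehavior: acknowledge without destaging anything


def write_with_cce(disk, targets, flush_zone):
    """Follow every target write with writes to the whole flush zone."""
    for addr, data in targets:
        disk.write(addr, data)
        for zone_addr in flush_zone:
            disk.write(zone_addr, b"F")  # extra writes coerce the eviction


if __name__ == "__main__":
    disk = LyingDisk(cache_blocks=4)
    zone = [("F", i) for i in range(4)]
    write_with_cce(disk, [("A", b"a"), ("B", b"b"), ("C", b"c")], zone)
    print([a for a in disk.destage_order if a in ("A", "B", "C")])
```

In this model the targets reach the surface in the desired order even though flush() does nothing. Real caches need not be FIFO, which is why the talk goes on to fingerprint them.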

Slide16

Coerced Cache Eviction

[Figure: coerced cache eviction. Labels: Memory; Disk Cache; Disk Surface with Journal, Fixed locations and Flush Zone; blocks D, D, D, C, M, M, B and flush-zone blocks F]

Slide17

Coerced Cache Eviction

Desired properties:
  High probability of flushing target blocks
  Low performance overhead
Need to understand the disk cache to design the flush workload

Slide18

Outline

Motivation
Background
Coerced Cache Eviction
Cache Fingerprinting
Discreet Mode Journaling
Evaluation
Conclusion

Slide19

Cache Fingerprinting

Manufacturers don’t expose details about disk caches
Disk caches can vary in:
  Read/write partition size
  Number of segments
  Replacement policy
Poorly characterized in literature

[Figure: disk cache]

Slide20

Cache Fingerprinting

Flush micro-benchmark:
  Write target block
  Write varied flush workload – measure cost
  fsync()
  Read target – infer eviction
Micro-benchmark is repeated
  Probability of eviction is calculated
Vary in each workload:
  Number of writes
  Amount of data in each write
  Sequential/random writes
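The repeat-and-count structure of the micro-benchmark can be sketched against a simulated cache. Everything here is an assumption made for illustration: the cache is modeled as a full set of slots with random replacement, only the number of writes is varied, and timing is omitted. A real run would write to the disk itself and infer eviction by reading the target back.

```python
import random


def run_trial(cache_slots, num_writes, rng):
    """One benchmark round: place the target, issue the flush workload,
    then report whether the target was evicted."""
    cache = ["target"] + [None] * (cache_slots - 1)
    for i in range(num_writes):
        victim = rng.randrange(cache_slots)  # random replacement (assumed)
        cache[victim] = ("flush", i)
    return "target" not in cache


def eviction_probability(cache_slots, num_writes, trials=2000, seed=0):
    """Repeat the round and return the fraction of rounds that evicted."""
    rng = random.Random(seed)
    evicted = sum(run_trial(cache_slots, num_writes, rng) for _ in range(trials))
    return evicted / trials


def fingerprint(cache_slots, write_counts, trials=2000):
    """Map each workload size to its estimated eviction probability."""
    return {n: eviction_probability(cache_slots, n, trials) for n in write_counts}


if __name__ == "__main__":
    for writes, prob in fingerprint(16, [0, 8, 16, 64, 256]).items():
        print(writes, round(prob, 2))
```

Sweeping a second axis, such as the amount of data per write, would turn this one-dimensional result into the kind of grid the fingerprints plot.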

Slide21

Cache Fingerprinting

Eviction fingerprint
  Probability of eviction is visually shown
  Darker region indicates higher probability

[Figure: eviction fingerprint. Legend (eviction probability): 90–100%, 70–90%, 50–70%, 30–50%, 10–30%, 0–10%]

Slide22

Cache Fingerprinting

Performance fingerprint
  Time taken to write flush workload
  Darker region indicates more time

[Figure: performance fingerprint. Legend (flush latency): 500+ ms, 100–500 ms, 50–100 ms, 10–50 ms, 0–10 ms]

Slide23

Cache Fingerprinting

Selecting a flush workload:
Combine information from both fingerprints
  High probability of eviction: dark region in eviction fingerprint
  Low performance cost: light region in performance fingerprint
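One way to combine the two fingerprints is to keep only the workloads that meet an eviction threshold and then take the cheapest of them. This is a sketch of that selection rule. The function name, the 0.9 threshold and the sample numbers are hypothetical, and the talk does not state the exact rule the authors used.

```python
def select_flush_workload(eviction, latency_ms, min_probability=0.9):
    """Return the cheapest workload whose eviction probability is high enough.

    eviction and latency_ms map a workload key, for example
    (num_writes, write_kb), to its measured eviction probability and
    flush latency. Returns None if no workload qualifies.
    """
    candidates = [
        workload
        for workload, probability in eviction.items()
        if probability >= min_probability and workload in latency_ms
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda workload: latency_ms[workload])


if __name__ == "__main__":
    # Made-up measurements, keyed by (num_writes, write_kb).
    eviction = {(1, 64): 0.20, (128, 64): 0.95, (256, 64): 1.00}
    latency_ms = {(1, 64): 1.0, (128, 64): 40.0, (256, 64): 90.0}
    print(select_flush_workload(eviction, latency_ms))
```

Raising min_probability trades performance for reliability, which mirrors the dark-region and light-region trade-off described above.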

Slide24

Cache Fingerprinting

Manufacturer       Cache (MB)   Capacity (GB)
Hitachi            8            80
Hitachi            32           1024
Samsung            8            250
Samsung            16           250
Western Digital    16           320
Western Digital    64           800
Seagate            8            250
Seagate            16           320
Seagate            32           750

Slide25

Cache Fingerprinting

Sequential writes may be ineffective at flushing
  Regardless of the size of the write
A number of random writes are required

[Figure: eviction fingerprint. Legend (eviction probability): 90–100%, 70–90%, 50–70%, 30–50%, 10–30%, 0–10%]

Slide26

Cache Fingerprinting

Vertical stripes indicate that the cache is segmented
Each write, regardless of size, is sent to one segment

[Figure: eviction fingerprint. Legend (eviction probability): 90–100%, 70–90%, 50–70%, 30–50%, 10–30%, 0–10%]

Slide27

Cache Fingerprinting

Cache behavior of disks from the same manufacturer is qualitatively similar across their different models

[Figure: eviction fingerprint. Legend (eviction probability): 90–100%, 70–90%, 50–70%, 30–50%, 10–30%, 0–10%]

Slide28

Cache Fingerprinting

It’s not all good news, however:
  Some caches appear to use random replacement policies
  For such caches, we cannot evict blocks with 100% certainty
  A large number of random writes are required to get high eviction probability
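A simple model shows why certainty is out of reach under random replacement. Assume, purely for illustration, a cache of k equally likely slots where every write replaces one slot chosen uniformly at random. The target survives n writes with probability (1 - 1/k)^n, so the eviction probability 1 - (1 - 1/k)^n approaches 1 but never reaches it. The function names below are hypothetical.

```python
import math


def p_evicted(num_writes, cache_slots):
    """Eviction probability after num_writes uniformly random replacements."""
    return 1.0 - (1.0 - 1.0 / cache_slots) ** num_writes


def writes_needed(target_probability, cache_slots):
    """Smallest number of writes whose eviction probability reaches the target."""
    if not 0.0 <= target_probability < 1.0:
        raise ValueError("probability must be in [0, 1); certainty is unreachable")
    if cache_slots == 1:
        return 1 if target_probability > 0 else 0
    survive_one = 1.0 - 1.0 / cache_slots  # chance the target survives one write
    return math.ceil(math.log(1.0 - target_probability) / math.log(survive_one))


if __name__ == "__main__":
    for target in (0.5, 0.9, 0.99):
        print(target, writes_needed(target, cache_slots=64))
```

Under this model the required number of writes grows with the logarithm of the tolerated miss probability and roughly linearly with the number of slots.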

Slide29

Cache Fingerprinting - Results

Drive                    Number of writes   Total data (MB)   Eviction probability (%)   Time (s)
Hitachi 8 MB             1                  2.38              100                        0.05
Hitachi 32 MB            1                  11                100                        0.087
Seagate 8 MB             256                31                100                        0.87
Seagate 16 MB            128                17                100                        0.342
Seagate 64 MB            128                37                100                        0.396
Samsung 8 MB             128                49                ~90                        1.328
Samsung 16 MB            256                128               ~90                        2.872
Western Digital 16 MB    1792               19                ~90                        5.107
Western Digital 64 MB    256                1                 100                        7.705

Slide30

Outline

Motivation
Background
Coerced Cache Eviction
Cache Fingerprinting
Discreet Mode Journaling
Evaluation
Conclusion

Slide31

Discreet Mode Journaling

Incorporating CCE into ext3:
  Fingerprint the disk to find the optimal flush workload
  Create a flush zone of suitable size
  Modify ext3 to issue flush zone writes: one at each ordering point
  # of CCE operations = # of ordering points
Can be used with any disk, as long as the disk is fingerprinted first

Slide32

Outline

Motivation
Background
Coerced Cache Eviction
Cache Fingerprinting
Discreet Mode Journaling
Evaluation
Conclusion

Slide33

Evaluation

Goal:
  CCE provides higher reliability
  At what cost? Is it practical to use?
Experimental setup:
  File system: ext3
  Disk: Hitachi 8 MB
  Journaling mode: data journaling (see paper for ordered journaling results)
  Operating system: Linux 2.6.13, Linux 2.6.23

Slide34

Evaluation

What we compare:
Regular journaling with disk cache turned off
  “Safe” but slow
  Disk might not obey command to turn off cache!
Regular journaling with disk cache turned on
  Unsafe but fast
Discreet mode journaling
  Midway option: safe, but at a cost

Slide35

Evaluation

Benchmarks:
  OpenSSH: copy, untar, configure, make
  Postmark: simulates a mail server; single threaded
  Filebench Webserver: I/O intensive
  Filebench Varmail: multithreaded postmark

Slide36

Evaluation – OpenSSH

Data Journaling Mode

Slide37

Evaluation – Postmark

Data Journaling Mode

Slide38

Evaluation – Filebench Webserver

Data Journaling Mode

Slide39

Evaluation – Filebench Varmail

Data Journaling Mode

Slide40

Evaluation – Filebench Varmail

Workload writes a small amount of data and calls fsync() repeatedly
  Each fsync() causes 3 CCEs
A number of optimizations:
  Incorporate group commit in varmail
  Improves throughput for all modes
  We use a few other techniques as well (see paper)
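A back-of-the-envelope calculation shows why group commit helps an fsync()-heavy workload. The figure of 3 CCEs per fsync() comes from the slide. The sketch assumes that a grouped commit serving several fsync() calls still costs 3 CCEs, which the talk does not state explicitly; the function name is hypothetical.

```python
import math

CCES_PER_COMMIT = 3  # from the slide: each fsync() causes 3 CCEs


def cce_count(num_fsyncs, group_size=1):
    """CCE operations issued when group_size fsync() calls share one commit."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return CCES_PER_COMMIT * math.ceil(num_fsyncs / group_size)


if __name__ == "__main__":
    print(cce_count(1000))                # no grouping
    print(cce_count(1000, group_size=8))  # eight fsync() calls per commit
```

Under this assumption the number of CCE operations falls in proportion to the group size, since the cost of each ordering point is shared by every fsync() in the group.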

Slide41

Evaluation – Filebench Varmail

[Figure: Varmail results, showing original performance and performance with optimizations]

Slide42

Summary

Coerced Cache Eviction (CCE):
  Run file systems reliably on top of misbehaving disks
  Characterization of 9 SATA disk caches through fingerprints
Discreet Mode Journaling:
  Implementation of CCE for the ext3 file system
  Acceptable performance on 3 workloads
    Only if the cache doesn’t use random replacement
  High overhead for apps that call fsync() frequently

Slide43

Conclusion

Trust in disks is weakening:
  Latent sector errors
  Block corruption
  Cache flushing
Cloud computing systems:
  Virtualized hardware
  Large software stack
Can such hardware be trusted? Will coercion be more widely used?

Slide44

Thank you!

Advanced Systems Lab (ADSL)
University of Wisconsin-Madison
http://www.cs.wisc.edu/adsl