Austere Flash Caching with Deduplication and Compression


Presentation Transcript

1. Austere Flash Caching with Deduplication and Compression
Qiuping Wang*, Jinhong Li*, Wen Xia#, Erik Kruus^, Biplob Debnath^, Patrick P. C. Lee*
*The Chinese University of Hong Kong (CUHK)
#Harbin Institute of Technology, Shenzhen
^NEC Labs

2. Flash Caching
- Flash-based solid-state drives (SSDs)
  - Faster than hard disk drives (HDDs)
  - Better reliability
  - Limited capacity and endurance
- Flash caching
  - Accelerates HDD storage by caching frequently accessed blocks in flash

3. Deduplication and Compression
- Both reduce storage and I/O overheads
- Deduplication (coarse-grained)
  - Operates in units of chunks (fixed- or variable-size)
  - Computes a fingerprint (e.g., SHA-1) from each chunk's content
  - References identical logical chunks (same FP) to a single physical copy
- Compression (fine-grained)
  - Operates in units of bytes
  - Transforms chunks into fewer bytes

4. Deduplicated and Compressed Flash Cache
- LBA: chunk address in HDD; FP: chunk fingerprint
- CA: chunk address in flash cache (after deduplication and compression)
[Architecture diagram: read/write I/O on fixed-size chunks passes through chunking, then deduplication and compression. RAM holds an LBA-index (LBA -> FP), an FP-index (FP -> CA, length), and a dirty list; the SSD stores variable-size compressed chunks (after deduplication); misses go to the HDD.]

5. Memory Amplification for Indexing
- Example: 512-GiB flash cache with a 4-TiB HDD working set (32-KiB chunks)
- Conventional flash cache
  - LBA (8 B) -> CA (8 B), one entry per cached chunk
  - Memory overhead: 256 MiB
- Deduplicated and compressed flash cache
  - LBA-index: LBA (8 B) -> FP (20 B), 3.5 GiB
  - FP-index: FP (20 B) -> CA (8 B) + length (4 B), 512 MiB
  - Memory amplification: 16x (and it can be even higher)

6. Related Work
- Nitro [Li et al., ATC'14]
  - First work to study deduplication and compression in flash caching
  - Manages compressed data in Write-Evict Units (WEUs)
- CacheDedup [Li et al., FAST'16]
  - Proposes dedup-aware algorithms for flash caching to improve hit ratios
- Both suffer from memory amplification!

7. Our Contribution
- AustereCache: a deduplicated and compressed flash cache with austere, memory-efficient management
  - Bucketization
    - No memory overhead for address mappings
    - Hashes chunks to storage locations
  - Fixed-size compressed data management
    - No in-memory tracking of chunks' compressed lengths
  - Bucket-based cache replacement
    - Cache replacement performed per bucket
    - Count-Min Sketch [Cormode 2005] for low-memory reference counting
- Extensive trace-driven evaluation and prototype experiments

8. Bucketization
- Main idea
  - Use hashing to partition the index and cache space
    - (RAM) LBA-index and FP-index
    - (SSD) metadata region and data region
  - Store only partial keys (hash prefixes) in memory, for memory savings
- Layout
  - Hash entries into equal-sized buckets
  - Each bucket has a fixed number of slots

9. (RAM) LBA-index and FP-index
- Locate buckets with hash suffixes
- Match slots with hash prefixes
- Each slot in the FP-index corresponds to a storage location in flash
[Diagram: each LBA-index slot holds an LBA-hash prefix, an FP hash, and a flag; each FP-index slot holds an FP-hash prefix and a flag.]

10. (SSD) Metadata and Data Regions
- Each slot's entry in the metadata region holds the full FP and a list of full LBAs
  - Used for validation against hash-prefix collisions
- Cached chunks reside in the data region

11. Fixed-size Compressed Data Management
- Main idea
  - Slice and pad each compressed chunk into fixed-size subchunks
  - Example: a 32-KiB chunk compressed to 20 KiB is sliced and padded into 8-KiB subchunks
- Advantages
  - Compatible with bucketization: each subchunk is stored in one slot
  - Allows per-chunk management for cache replacement

12. Fixed-size Compressed Data Management
- Layout
  - One chunk occupies multiple consecutive slots
  - No additional memory is needed for compressed lengths
[Diagram: FP-index slots in RAM hold an FP-hash prefix and a flag; on the SSD, each metadata-region entry stores the full FP, the list of LBAs, and the length, while the data region stores the chunk's subchunks.]

13. Bucket-based Cache Replacement
- Main idea
  - Perform cache replacement in each bucket independently
  - Eliminates priority-based structures for caching decisions
- Combine recency and deduplication
  - LBA-index: least-recently-used policy
  - FP-index: least-referenced policy
    - Weighted reference counting based on recency in LBAs

14. Sketch-based Reference Counting
- Complete reference counting has high memory overhead
  - One counter for every FP-hash
- Count-Min Sketch [Cormode 2005]
  - Fixed memory usage with provable error bounds
  - A w x h counter array; each update increments one counter per row
  - count = minimum counter indexed by (i, Hi(FP-hash)) over all rows i

15. Evaluation
- AustereCache is implemented as a user-space block device
  - ~4.5K lines of C++ code on Linux
- Traces
  - FIU traces: WebVM, Homes, Mail
  - Synthetic traces: varying I/O dedup ratio and write-to-read ratio
    - I/O dedup ratio: fraction of duplicate written chunks among all written chunks
- Schemes
  - AustereCache: AC-D, AC-DC
  - CacheDedup: CD-LRU-D, CD-ARC-D, CD-ARC-DC

16. Memory Overhead
- AC-D uses 69.9-94.9% and 70.4-94.7% less memory than CD-LRU-D and CD-ARC-D, respectively, across all traces
- AC-DC uses 87.0-97.0% less memory than CD-ARC-DC

17. Read Hit Ratios
- AC-D achieves up to 39.2% higher read hit ratio than CD-LRU-D, and a similar read hit ratio to CD-ARC-D
- AC-DC achieves up to 30.7% higher read hit ratio than CD-ARC-DC

18. Write Reduction Ratios
- AC-D is comparable to CD-LRU-D and CD-ARC-D
- AC-DC is slightly lower (by 7.7-14.5%) than CD-ARC-DC
  - Due to padding in compressed data management

19. Throughput
- AC-DC has the highest throughput
  - Due to its high write reduction ratio and high read hit ratio
- AC-D has slightly lower throughput than CD-ARC-D
  - AC-D must access the metadata region during indexing

20. CPU Overhead and Multi-threading
- Latency (32-KiB chunk write)
  - HDD: 5,997 µs; SSD: 85 µs
  - AustereCache: 31.2 µs (of which fingerprinting takes 15.5 µs)
  - Latency is hidden via multi-threading
- Multi-threading speedup (write-to-read ratio 7:3)
  - 50% I/O dedup ratio: 2.08x
  - 80% I/O dedup ratio: 2.51x
  - A higher I/O dedup ratio implies less I/O to flash, hence more computation savings via multi-threading

21. Conclusion
- AustereCache achieves memory efficiency in deduplicated and compressed flash caching via
  - Bucketization
  - Fixed-size compressed data management
  - Bucket-based cache replacement
- Source code: http://adslab.cse.cuhk.edu.hk/software/austerecache

22. Thank You! Q & A