OpenEC: Toward Unified and Configurable Erasure Coding Management in Distributed Storage Systems


Xiaolu Li (1), Runhui Li (1), Patrick P. C. Lee (1), Yuchong Hu (2)
(1) The Chinese University of Hong Kong; (2) Huazhong University of Science and Technology



Presentation Transcript

1. OpenEC: Toward Unified and Configurable Erasure Coding Management in Distributed Storage Systems
Xiaolu Li (1), Runhui Li (1), Patrick P. C. Lee (1), Yuchong Hu (2)
(1) The Chinese University of Hong Kong
(2) Huazhong University of Science and Technology
USENIX FAST 2019

2. Introduction
- Fault tolerance for distributed storage is critical
  - Availability: data remains accessible under failures
  - Durability: no data loss even under failures
- Erasure coding is a promising redundancy technique
  - Minimum data redundancy via “data encoding”
  - Higher reliability than replication at the same storage redundancy
  - Reportedly deployed in Google, Azure, Facebook
    - e.g., Azure reduces redundancy from 3x (replication) to 1.33x (erasure coding), saving PBs of storage

3. Erasure Coding
- Divide file data into k data blocks
- Encode the k data blocks into n-k parity blocks
- Distribute the n erasure-coded blocks (a coding group) to n nodes
- Fault tolerance: any k out of n blocks can recover the file data
[Figure: example with (n, k) = (4, 2). A file is divided into A, B, C, D and encoded into A+C, B+D, A+D, B+C+D; the resulting blocks are distributed across the nodes.]
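The "divide" step above can be sketched as follows. This is a minimal illustration, not code from OpenEC: the function name `divideIntoDataBlocks`, the `Block` alias, and the choice to zero-pad the last block are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Block = std::vector<std::uint8_t>;

// Split `file` into exactly k equal-size data blocks.
// Assumption for this sketch: the tail of the last block is zero-padded.
std::vector<Block> divideIntoDataBlocks(const std::vector<std::uint8_t>& file,
                                        std::size_t k) {
    assert(k > 0);
    const std::size_t blockSize = (file.size() + k - 1) / k;  // ceil(size / k)
    std::vector<Block> blocks(k, Block(blockSize, 0));
    for (std::size_t i = 0; i < file.size(); ++i) {
        // Byte i lands in block i / blockSize at offset i % blockSize.
        blocks[i / blockSize][i % blockSize] = file[i];
    }
    return blocks;
}
```

For instance, a 5-byte file with k = 2 would yield two 3-byte blocks, the second ending in one padding byte. Encoding then operates on these k blocks to produce the n-k parity blocks.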

4. Erasure Coding
- Reed-Solomon (RS) codes are widely deployed
  - Storage-optimal
  - Generality for n and k
  - Drawback: high repair penalty
- New erasure coding solutions
  - Repair-optimal erasure codes
    - e.g., regenerating codes [TIT’10]; locally repairable codes (LRCs) [ATC’12, PVLDB’13]; double regenerating codes (DRC) [TOS’17]
  - Repair-efficient algorithms
    - e.g., Partial-parallel-repair (PPR) [Eurosys’16]; Repair pipelining [ATC’17]

5. Challenge
- Deploying new erasure coding solutions in distributed storage systems (DSSs) is a daunting task
  - Re-engineering of DSS workflows (e.g., read/write paths)
  - Hard to generalize for different DSSs
- Our past experience: over 4K lines-of-code change to HDFS-RAID for adding DRC [TOS’17]
- Review of six DSSs with erasure coding support
  - HDFS-RAID, Hadoop 3.0 HDFS, QFS, Tahoe-LAFS, Ceph and Swift

6. Limitations of Current DSSs
- Hard to add advanced erasure codes
  - Existing DSSs only provide interfaces for basic encoding/decoding operations
  - Most DSSs do not support sub-packetization (e.g., regenerating codes)
- Hard to configure the workflows and placement of coding operations
  - Repair in fetch-and-compute manner
  - Repair pipelining [ATC’17] cannot be readily realized
[Figure: two diagrams of a requestor R and nodes N1-N4 connected through a network.]

7. Our Contributions
OpenEC: a unified and configurable framework for erasure coding management
- Propose ECDAG, a directed-acyclic-graph abstraction for realizing general erasure coding solutions
  - Decoupling erasure coding management from DSS workflows
- Prototype OpenEC on HDFS-RAID, Hadoop 3 HDFS, and QFS
  - Minimal code changes
- Extensive experiments on local and Amazon EC2 clusters

8. ECDAG
- (n, k) code
  - Data blocks: b_0, …, b_{k-1}
  - Parity blocks: b_k, …, b_{n-1}
  - Virtual blocks: b_i for i ≥ n
- An ECDAG is a directed acyclic graph that defines either an encoding or a decoding operation
  - Vertex v_i: block b_i in a coding group
  - Edge e_{i,j}: block b_i is an input to the linear combination of b_j
  - Each edge is associated with a coding coefficient
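One way to picture these definitions is a small adjacency structure keyed by the output vertex. The sketch below is purely illustrative: `InEdge`, `SimpleEcdag`, and `blockKind` are hypothetical names, and nothing here is taken from the OpenEC source.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One incoming edge e_{i,j}: block b_i is an input to the combination of b_j.
struct InEdge {
    int from;  // index i of the input block
    int coef;  // coding coefficient attached to the edge
};

// Hypothetical container: for each vertex j, the edges that produce b_j.
struct SimpleEcdag {
    std::map<int, std::vector<InEdge>> inputs;

    void addEdge(int from, int to, int coef) {
        inputs[to].push_back({from, coef});
    }

    // A vertex without incoming edges is read as-is rather than computed.
    bool isLeaf(int v) const { return inputs.find(v) == inputs.end(); }
};

// Classify block index i of an (n, k) code: data, parity, or virtual.
std::string blockKind(int i, int n, int k) {
    assert(i >= 0 && 0 < k && k <= n);
    if (i < k) return "data";
    if (i < n) return "parity";
    return "virtual";
}
```

With (n, k) = (5, 4), indices 0-3 would be data blocks, index 4 the parity block, and any index from 5 upward a virtual block that exists only as an intermediate result.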

9. ECDAG
ECDAGs for a (5,4) code:
[Figure: three ECDAGs. Encoding: vertices 0, 1, 2, 3, 4. Decoding: vertices 1, 2, 3, 4, 0. Partial-parallel repair (PPR) [Mitra, EuroSys’16]: vertices 1, 2, 3, 4, 5, 6, 0.]
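The three graphs can be mimicked with plain XOR, under the assumption that the (5,4) code is a single-parity code whose coefficients are all 1. The pairing used for the PPR variant (vertex 5 from blocks 1 and 2, vertex 6 from blocks 3 and 4) is also an assumption, since the transcript only preserves the vertex labels. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Block = std::vector<std::uint8_t>;

// XOR equally sized blocks: a linear combination with all coefficients 1.
Block xorBlocks(const std::vector<Block>& in) {
    assert(!in.empty());
    Block out(in[0].size(), 0);
    for (const Block& b : in) {
        assert(b.size() == out.size());
        for (std::size_t i = 0; i < out.size(); ++i) out[i] ^= b[i];
    }
    return out;
}

// Encoding: vertices 0..3 feed vertex 4.
Block encodeParity(const std::vector<Block>& data) { return xorBlocks(data); }

// Decoding: vertices 1..4 feed vertex 0 in a single step.
Block decodeDirect(const Block& b1, const Block& b2,
                   const Block& b3, const Block& b4) {
    return xorBlocks({b1, b2, b3, b4});
}

// PPR-style decoding: two partial results (virtual vertices 5 and 6)
// are computed first and then merged into vertex 0.
Block decodePartialParallel(const Block& b1, const Block& b2,
                            const Block& b3, const Block& b4) {
    Block v5 = xorBlocks({b1, b2});
    Block v6 = xorBlocks({b3, b4});
    return xorBlocks({v5, v6});
}
```

Both decoding paths return the same block 0; they differ only in where the intermediate combinations can be placed, which is what the graph shape expresses.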

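10. ECDAG
- ECDAGs for regenerating codes [Dimakis, TIT’10] with sub-packetization
  - w: sub-packetization level (number of sub-blocks per block)
  - e.g., n=4, k=2, w=2
[Figure: layout of sub-blocks 0-7 across blocks b0-b3 (b0: 0, 1; b1: 2, 3; b2: 4, 5; b3: 6, 7), with an encoding ECDAG over vertices 0-7 and a decoding ECDAG over vertices 6, 3, 4, 5, 2, 7, 8, 9, 10, 0, 1.]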
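The layout in this example suggests a simple index mapping between blocks and sub-blocks: block b holds sub-blocks b*w through b*w + w - 1. The helpers below sketch that mapping; their names are hypothetical and the consecutive numbering is inferred from the n=4, k=2, w=2 layout shown on the slide.

```cpp
#include <cassert>
#include <vector>

// Block that owns sub-block index s when every block has w sub-blocks.
int blockOfSubBlock(int s, int w) {
    assert(s >= 0 && w > 0);
    return s / w;
}

// Sub-block indices that make up block b, in order.
std::vector<int> subBlocksOfBlock(int b, int w) {
    assert(b >= 0 && w > 0);
    std::vector<int> ids;
    for (int j = 0; j < w; ++j) ids.push_back(b * w + j);
    return ids;
}
```

With w = 2, block 3 maps to sub-blocks 6 and 7, and sub-block 5 belongs to block 2, matching the layout above.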

11. ECDAG Primitives
Construction of an ECDAG:
- Join: describes a linear combination
- BindX: co-locates coding operations at the same level (i.e., x-direction)
- BindY: co-locates coding operations across levels (i.e., y-direction)

12. ECDAG Primitives
Encoding of (6,4) RS code

    ECDAG* ecdag = new ECDAG();
    ecdag->Join(4, {0,1,2,3}, {1,1,1,1});
    ecdag->Join(5, {0,1,2,3}, {1,2,4,8});
    int vidx = ecdag->BindX({4,5});
    ecdag->BindY(vidx, 0);

[Figure: the ECDAG after the two Join calls (vertices 0-5), after BindX (vertices 0-6), and after BindY (vertices 0-6).]
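To make the three primitives concrete, here is a hypothetical stand-in for the `ECDAG` class that only records what each call declares. It is not the OpenEC implementation: the storage layout, the accessor names `boundX` and `colocatedWith`, and the rule that BindX numbers its new virtual vertex as "largest index seen so far plus one" are assumptions chosen so that the slide's example yields vertex 6, as in its figure.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Illustrative sketch only; records the graph, performs no coding.
class ECDAG {
public:
    // Join: `parent` is the linear combination of `children` with `coefs`.
    void Join(int parent, const std::vector<int>& children,
              const std::vector<int>& coefs) {
        assert(children.size() == coefs.size());
        joins_[parent] = {children, coefs};
        maxIdx_ = std::max(maxIdx_, parent);
        for (int c : children) maxIdx_ = std::max(maxIdx_, c);
    }

    // BindX: group already-joined vertices under one new virtual vertex so
    // they are computed together; returns the index of that vertex.
    int BindX(const std::vector<int>& idxs) {
        assert(!idxs.empty());
        for (int v : idxs) assert(joins_.count(v) == 1);
        int vidx = ++maxIdx_;  // assumed numbering rule
        bindX_[vidx] = idxs;
        return vidx;
    }

    // BindY: compute vertex `pidx` at the location of vertex `cidx`.
    void BindY(int pidx, int cidx) { bindY_[pidx] = cidx; }

    const std::vector<int>& boundX(int vidx) const { return bindX_.at(vidx); }
    int colocatedWith(int v) const {
        auto it = bindY_.find(v);
        return it == bindY_.end() ? -1 : it->second;
    }

private:
    struct Combination { std::vector<int> children; std::vector<int> coefs; };
    std::map<int, Combination> joins_;
    std::map<int, std::vector<int>> bindX_;
    std::map<int, int> bindY_;
    int maxIdx_ = -1;
};

// The slide's (6,4) RS encoding example, replayed against the sketch.
int buildSlideExample(ECDAG& ecdag) {
    ecdag.Join(4, {0, 1, 2, 3}, {1, 1, 1, 1});
    ecdag.Join(5, {0, 1, 2, 3}, {1, 2, 4, 8});
    int vidx = ecdag.BindX({4, 5});  // new virtual vertex
    ecdag.BindY(vidx, 0);            // place it where block 0 resides
    return vidx;
}
```

Calling `buildSlideExample` on an empty `ECDAG` returns the virtual vertex created by BindX, which groups parity vertices 4 and 5 and is bound to the location of block 0.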

13. ECDAG Primitives
Decoding via repair pipelining [Li, ATC’17]:
e.g., recovering the missing block 0 for (6, 4) RS code

    ECDAG* ecdag = new ECDAG();
    ecdag->Join(7, {1,2}, {1,1});
    ecdag->BindY(7, 2);
    ecdag->Join(8, {7,3}, {1,1});
    ecdag->BindY(8, 3);
    ecdag->Join(9, {8,4}, {1,1});
    ecdag->BindY(9, 4);
    ecdag->Join(0, {9}, {1});

[Figure: pipelined decoding ECDAG with vertices 1, 2, 7, 3, 8, 4, 9, 0.]
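The chain of Join calls above can be simulated hop by hop. The sketch assumes every coefficient is 1, so each combination is a plain XOR and block 4 is taken to be the XOR of data blocks 0-3 (the first parity in the slide 12 example). The names `combine` and `pipelinedRepair` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Block = std::vector<std::uint8_t>;

// XOR the local block into the partial result received from the previous hop.
Block combine(const Block& partial, const Block& local) {
    assert(partial.size() == local.size());
    Block out(partial);
    for (std::size_t i = 0; i < out.size(); ++i) out[i] ^= local[i];
    return out;
}

// helpers = {b1, b2, b3, b4}. Each helper after the first adds its own block
// to the running result and forwards a single block to the next hop.
Block pipelinedRepair(const std::vector<Block>& helpers) {
    assert(!helpers.empty());
    Block partial = helpers[0];
    for (std::size_t h = 1; h < helpers.size(); ++h) {
        partial = combine(partial, helpers[h]);  // vertices 7, 8, 9 in turn
    }
    return partial;  // vertex 0: the repaired block
}
```

In this arrangement each hop forwards one block, whereas a fetch-and-compute repair would have the requestor receive all four helper blocks itself.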

14. Erasure Coding Interfaces

    class ECBase {
      int n, k, w;
      vector<int> ecoefs;
     public:
      // constructing encoding ECDAGs
      ECDAG* Encode();
      // constructing decoding ECDAGs
      ECDAG* Decode(vector<int> from, vector<int> to);
      // organizing blocks in groups (e.g., racks)
      vector<vector<int>> Place();
    };

15. OpenEC Design
- Controller:
  - Manages EC metadata
  - Parses ECDAGs and assigns tasks to agents
  - Controls block placement
  - Coordinates repair
- Agent:
  - Performs coding operations
- OECClient:
  - Interfaces between applications and storage
[Figure: OpenEC deployment on HDFS]

16. OpenEC Design
- Basic operations:
  - Writes
  - Online encoding
  - Offline encoding
  - Normal reads
  - Degraded reads
  - Full-node recovery
- Tasks:
  - Load: loads an input block
  - Fetch: retrieves blocks from other agents
  - Compute: computes a new block
  - Persist: returns a block

17. Parsing an ECDAG
- Online encoding for (6,4) RS code
  - On the write path
  - Performed by client C

Vertices | Nodes | Tasks
v0 | C | Load b0
v1 | C | Load b1
v2 | C | Load b2
v3 | C | Load b3
v6 | C | Compute b4 from {b0, b1, b2, b3} with coding coefficients {1,1,1,1}; Compute b5 from {b0, b1, b2, b3} with coding coefficients {1,2,4,8}
v4 | C | -
v5 | C | -
- | C | Persist b0; Persist b1; Persist b2; Persist b3; Persist b4; Persist b5

[Figure: ECDAG with vertices 0-6]

18. Parsing an ECDAG
- Offline encoding for (6,4) RS code
  - Blocks 0-3 are in nodes 0-3
  - Performed by different nodes

Vertices | Nodes | Tasks
v0 | N0 | Load b0
v1 | N1 | Load b1
v2 | N2 | Load b2
v3 | N3 | Load b3
v6 | N0 | Fetch b1 from N1; Fetch b2 from N2; Fetch b3 from N3; Compute b4 from {b0, b1, b2, b3} with coding coefficients {1,1,1,1}; Compute b5 from {b0, b1, b2, b3} with coding coefficients {1,2,4,8}
v4 | N4 | Fetch b4 from N0; Persist b4
v5 | N5 | Fetch b5 from N0; Persist b5

[Figure: ECDAG with vertices 0-6]
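The offline-encoding table can be reproduced by a small planner that emits the four task types in order. This is a simplified sketch for this one scenario, not OpenEC's parser: it assumes data block i sits on node i, all parity blocks are computed on a single node, and parity block j is stored on node j. The `Task` layout and the names `planOfflineEncoding` and `countTasks` are hypothetical.

```cpp
#include <cassert>
#include <vector>

enum class TaskType { Load, Fetch, Compute, Persist };

struct Task {
    TaskType type;
    int node;   // node that runs the task
    int block;  // block the task handles or produces
    int peer;   // source node for Fetch, -1 otherwise
};

// Plan offline encoding for an (n, k) code under the assumptions above.
std::vector<Task> planOfflineEncoding(int n, int k, int computeNode) {
    assert(0 < k && k < n && 0 <= computeNode && computeNode < k);
    std::vector<Task> tasks;
    // Each data node loads its own block.
    for (int i = 0; i < k; ++i) tasks.push_back({TaskType::Load, i, i, -1});
    // The compute node fetches every data block it does not hold.
    for (int i = 0; i < k; ++i) {
        if (i != computeNode) tasks.push_back({TaskType::Fetch, computeNode, i, i});
    }
    // All parity blocks are computed in one place (the BindX/BindY effect).
    for (int j = k; j < n; ++j) tasks.push_back({TaskType::Compute, computeNode, j, -1});
    // Each parity node fetches its block from the compute node and persists it.
    for (int j = k; j < n; ++j) {
        tasks.push_back({TaskType::Fetch, j, j, computeNode});
        tasks.push_back({TaskType::Persist, j, j, -1});
    }
    return tasks;
}

// Count tasks of a given type assigned to a given node.
int countTasks(const std::vector<Task>& tasks, TaskType type, int node) {
    int count = 0;
    for (const Task& t : tasks) {
        if (t.type == type && t.node == node) ++count;
    }
    return count;
}
```

For n = 6, k = 4 and compute node 0, the plan contains three Fetch tasks and two Compute tasks on N0, plus one Fetch and one Persist on each of N4 and N5, which mirrors the rows of the table.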

19. Automated Optimizations
- Automated BindX and BindY
  - Examines subgraph structures and calls BindX and BindY automatically
- Hierarchy awareness
[Figure: example ECDAGs, one labeled "Pipelining"]

20. OpenEC Implementation
- Middleware layer (7000+ lines-of-code)
  - Coding operations in units of packets
  - Intel ISA-L for erasure coding
  - Redis for communications
- Integration with existing distributed storage systems
  - HDFS-RAID
  - Hadoop 3.0 HDFS
  - QFS (see technical report)
- Each integration only makes ≤ 450 lines-of-code changes
  - Changes include: (1) interfacing with systems, (2) block placement

21. Experiments
- Local cluster
  - 16 machines
  - Quad-core 3.4 GHz Intel CPU
  - 16 GiB RAM
  - 10 Gb/s network
- Amazon EC2
  - Up to 30 instances
  - m5.xlarge instances
  - 10 Gb/s network

22. Basic Operations in Local Cluster
- OpenEC preserves original HDFS performance
- OpenEC achieves much faster offline encoding than HDFS-RAID with a simpler workflow
[Figures: comparisons with HDFS-3; comparisons with HDFS-RAID]

23. Comparisons with Native Coding (without I/O)
- ECDAG coding computations are slower than ISA-L
  - 29-38% lower in encoding; 0.6-3.15% lower in decoding
- Remains much faster than I/O; limited overhead overall
[Figures: encoding; decoding]

24. Support of Erasure Coding Designs
- Comparisons with six state-of-the-art erasure coding designs
- OpenEC’s performance conforms to the theoretical gains in network-bound environments
[Figures: LRC; regenerating codes; repair algorithms; DRC]

25. Automated Optimizations
- Automated ECDAG customization for a hierarchical topology
- Up to 82% repair throughput gain

26. Scalability in Amazon EC2
- OpenEC scales well with number of instances
[Figures: online encoding; offline encoding]

27. Conclusions
- OpenEC is a unified and configurable framework for flexible erasure coding management
- Future work:
  - Integration with more systems (e.g., Ceph, Swift)
  - Combined with software-defined storage for better configurability
- Source code: http://adslab.cse.cuhk.edu.hk/software/openec

28. Backup

29. Question
How to construct decoding ECDAGs for different combinations of lost blocks?
The Decode() function should construct different decoding ECDAGs for two cases:
- Decoding one lost block: uses any repair-efficient approach
- Decoding multiple lost blocks: picks the first k available blocks
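The two cases above amount to a small dispatch on the number of lost blocks. The sketch below shows only that selection logic; it does not build an ECDAG, and the names `DecodeStrategy`, `chooseStrategy`, and `pickFirstK` are hypothetical rather than part of the `ECBase` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class DecodeStrategy { RepairEfficient, FirstKAvailable };

// One lost block: a repair-efficient approach may be used.
// Several lost blocks: fall back to decoding from the first k available blocks.
DecodeStrategy chooseStrategy(const std::vector<int>& lost) {
    assert(!lost.empty());
    return lost.size() == 1 ? DecodeStrategy::RepairEfficient
                            : DecodeStrategy::FirstKAvailable;
}

// Return the first k entries of `available` (block indices still readable).
std::vector<int> pickFirstK(const std::vector<int>& available, std::size_t k) {
    assert(available.size() >= k);
    return std::vector<int>(available.begin(),
                            available.begin() + static_cast<std::ptrdiff_t>(k));
}
```

For a (6,4) code that lost blocks 0 and 1, the available list would be 2, 3, 4, 5 and all four would serve as decoding inputs; with a single lost block, the strategy defers to whichever repair-efficient ECDAG the code defines.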

30. Question
What happens if there is a failure during repair?
We assume that OpenEC restarts the repair process by connecting to the new set of available nodes.

31. Question
What types of codes are supported or not supported?
- Supported: linear codes (e.g., RS codes, regenerating codes, LRC)
- Not supported:
  - Non-linear codes
  - Sector-disk codes

32. Question
Performance of automated BindX and BindY?

33. Question
Performance in QFS
[Figures: single client; multiple clients]