Randomized Algorithms for Dynamic Storage Load-Balancing

Presentation Transcript

1. Randomized Algorithms for Dynamic Storage Load-Balancing
By Liang Liu, Lance Fortnow, Jin Li, Yating Wang, Jun Xu
Presented by Grant Everett

2. Background & Motivation
- Minimize the cost of storing data
- Evenly balance capacity across distances
- Load balancing is tricky when failures or disruptions occur
- Let's distribute data evenly in a timely manner

3. Problems and Existing Solutions
Problems:
- Blocks being added concurrently
- Spreading blocks across the system
- The "Diversity Requirement": no single cell can take more than one block, and no two cells that take in blocks can be in the same relative location
Existing solutions:
- Erasure coding
- Strategic distribution
What can we do when these solutions aren't enough?

4. High-Level Novel Solution
- A randomized algorithm where every dispatcher (compute node) balances storage
- Sample a "blocks-to-disk" assignment
- Use a precomputed distribution
- Once mostly balanced, operate in the cruising stage
- Balance the system such that every node is evenly utilized

5. System Environment and Risk Control Measures
- Hardware failure zones (rows)
- Software update/failure zones (columns)
- Erasure correction coding: a better way of obtaining fault tolerance

6. Cloud Matrix
- An M x N cloud matrix; many different matrices for different purposes
- One cell is storage space that can be perfectly balanced
- A server can add blocks to any disks it owns
- Communication between servers only goes through the top-of-rack (TOR) switch
- If a single rack is unbalanced, rebalancing it is low overhead
- Lastly, assume all cells are the same size; the error from this assumption is shown to be negligible
- The load matrix may not be evenly distributed!
- Figures: load matrix and capacity matrix

7. Cloud Matrix Computation and Reduction
The following equations help explain why the homogeneous-capacity assumption is appropriate:
- Take the maximum-capacity cell
- Define a new capacity
- Modify the load to account for the new capacity
- Reduction
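
The equations themselves were figures on the slide; a plausible reconstruction of the three steps (the symbols C_ij for cell capacity, L_ij for cell load, and C_max are my assumptions, not copied from the paper) is:

$$C_{\max} = \max_{i,j} C_{ij}, \qquad \tilde{C}_{ij} = C_{\max}, \qquad \tilde{L}_{ij} = L_{ij} + (C_{\max} - C_{ij})$$

Padding every cell's load by its capacity deficit reduces the heterogeneous instance to a homogeneous one: balancing the padded loads on identical capacities C_max balances the original system.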

8. Precomputed Distribution Matrix
- A single server precomputes a set of distribution matrices
- Compute a compensating matrix
- It must be decomposable
- The minimum must be reached for all cells to have identical load
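
A minimal sketch of what computing a compensating matrix might look like (equalizing every cell up to the current maximum load is my assumption about the target; it is the choice that adds the fewest blocks, matching the "minimum" remark above):

```python
import numpy as np

def compensating_matrix(load: np.ndarray) -> np.ndarray:
    """Blocks to add to each cell so that every cell reaches the same
    load. Raising all cells to the current maximum is the target that
    needs the fewest compensating blocks."""
    return load.max() - load

# Example: a 2 x 3 load matrix (blocks per cell).
L = np.array([[5, 3, 7],
              [6, 7, 4]])
R = compensating_matrix(L)          # [[2, 4, 0], [1, 0, 3]]
assert ((L + R) == L.max()).all()   # all cells now hold identical load
```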

9. Simplified Lemma

10. This Computation Is Hard
- One algorithm: the Birkhoff–von Neumann decomposition
- It has a high run time
- Take this and change the input to make the runtime slightly faster
- Done by compressing C; compressing is NP-complete
- Use first fit decreasing (FFD)
- A faster method is useful when dealing with hundreds of machines
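
FFD is a standard bin-packing heuristic; a minimal sketch follows (how the paper maps the compression of C onto bin-packing items is not in the transcript, so the item/bin framing here is generic):

```python
def first_fit_decreasing(items, capacity):
    """Pack items into the fewest bins of the given capacity: sort
    items largest-first, place each into the first bin with room,
    opening a new bin when none fits."""
    bins = []                        # each bin: [remaining, contents]
    for item in sorted(items, reverse=True):
        for b in bins:
            if b[0] >= item:
                b[0] -= item
                b[1].append(item)
                break
        else:
            bins.append([capacity - item, [item]])
    return [contents for _, contents in bins]

print(first_fit_decreasing([6, 5, 5, 4, 3, 2, 2], 10))
# -> [[6, 4], [5, 5], [3, 2, 2]]
```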

11. Load Balancing Stage
- The load-balancing algorithm itself is simple; the complexity comes from precomputing the set of matrices
- Sample a matrix based on its probability
- The matrices are stored in SRAM/DRAM for fast reads
- The precomputed matrices are produced by a server and sent to the dispatchers
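
A minimal sketch of the per-block dispatch step (the permutation-matrix form and weights follow the Birkhoff–von Neumann framing on slide 10; choosing uniformly among the sampled matrix's cells stands in for however the paper assigns dispatchers to cells, and all names are illustrative):

```python
import random

def dispatch(matrices, weights):
    """Sample one distribution matrix with probability proportional to
    its weight, then return a cell (row, col) that the sampled
    permutation matrix selects."""
    P = random.choices(matrices, weights=weights, k=1)[0]
    ones = [(i, j) for i, r in enumerate(P) for j, v in enumerate(r) if v]
    return random.choice(ones)   # rack/zone coordinates of the target cell

# Two 2 x 2 permutation matrices, weighted 3:1.
mats = [[[1, 0], [0, 1]],
        [[0, 1], [1, 0]]]
print(dispatch(mats, [0.75, 0.25]))
```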

12. Cruising Stage
- Once mostly balanced, distribute data blocks evenly until the load becomes unbalanced again (due to disruptions)
- Keeps balance by never filling the same cell twice until every cell has been accessed once
- Uses sweeping to guarantee balance: a P_i matrix with offset (x, y), one cell per row/column
- By sweeping we visit every cell and assign the same number of blocks to each
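
A minimal sketch of one sweep (the cyclic diagonal offset schedule is my assumption about how the (x, y) offsets advance; it assumes N <= M, as in the evaluation's 60 x 20 matrix):

```python
def sweep(M, N):
    """Visit every cell of an M x N matrix exactly once, in rounds of N
    cells that each touch every column once and N distinct rows
    (a diagonal shifted by the round's offset x)."""
    for x in range(M):                  # offset for this round
        for y in range(N):
            yield ((x + y) % M, y)      # one cell per column

cells = list(sweep(3, 2))
assert len(set(cells)) == 3 * 2         # each cell exactly once per sweep
print(cells)
```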

13. Evaluation Setup
- M=60 x N=20 cloud matrix
- 10 storage servers per cell
- 40 8-TB disks per server
- K = 14 + 4 = 18 coded blocks
- 214 MB per block
- 1.5 x 10^7 blocks per cell
- Top: average load ratio calculation
- Bottom: D is the difference between the average and the most loaded cell
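
As a sanity check on the last figure (assuming decimal units throughout): each cell holds 10 servers x 40 disks x 8 TB = 3,200 TB, and

$$\frac{3{,}200\ \text{TB}}{214\ \text{MB/block}} = \frac{3.2 \times 10^{9}\ \text{MB}}{214\ \text{MB/block}} \approx 1.5 \times 10^{7}\ \text{blocks per cell}$$

which matches the slide.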

14. Near Perfect Load Balance?
Yes

15. Comparison with Uniform Random
- If D starts high, uniform random keeps it high while the randomized solution quickly reduces it
- This makes sense because uniform placement simply preserves the current imbalance

16. Two Breakdown Scenarios
- Test hardware failure of an entire row: 20 cells affected
- Test software update/failure of an entire column: 60 cells affected

17. Takeaway
- Non-novel idea: once balance is achieved, staying balanced requires less work
- Novel: precomputing the distribution
- The technique could speed up many similar processes, e.g., using a strongly-connected-component heuristic to distribute the nodes of a social graph
- A decomposable matrix means the problem can be split into subproblems: modularity

18. One Question
With the small difference between sweeping and non-sweeping, is sweeping really necessary?
- Yes: over the very long term (years rather than 100 days), sweeping keeps utilization near perfect. This is useful for datacenters that keep data for long periods and are constantly in use.
- No?

19. One Counterpoint
While this system is incredibly well designed and useful, I think any large dataset where data locality matters would not be able to use it. A good example would be a graph store with a few strongly connected components. I understand this isn't what the system was designed for, but could some simple redesign make it useful for that case? Maybe a heuristic-based distribution matrix?

20. Notes from Reviews
- What about dispatcher problems? This is discussed in the evaluation.
- Problems from strong assumptions? The distribution takes a while to compute and must be done offline.
- Testing vs. practicality? The authors did have Microsoft resources.

21. RackOut
By Stanko Novakovic, Alexandros Daglis, Edouard Bugnion, Babak Falsafi, Boris Grot
Presented by Ben Miller

22. Motivation and Background
What's the problem?

23. Our Needs and Our Solutions
- Store more data for growing applications
- Provide a good user experience through fast response times
- Support more and more users
The solution: distributed key-value stores!
- A distributed hash table distributes keys and looks them up quickly
- Data is stored in memory
- Data is replicated across multiple servers
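
A minimal sketch of the hash-based lookup such a store performs (the modular hashing scheme and the micro-shard naming, which anticipates slide 25, are illustrative assumptions rather than RackOut's exact functions):

```python
import hashlib

def micro_shard(key: str, num_shards: int) -> int:
    """Map a key to a micro-shard with a uniform hash; each micro-shard
    lives in the memory of exactly one server."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# 8 micro-shards spread across 4 servers (shard id -> owning server).
shard_to_server = [0, 1, 2, 3, 0, 1, 2, 3]
print(shard_to_server[micro_shard("user:1234", 8)])
```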

24. Why Is This a Problem?
- Skew: some keys are much more popular than the average key
  - Makes keeping latency down hard
  - Causes load imbalance
- The data is not accessed uniformly, and it is difficult to know the distribution of accesses
- Server utilization is kept low
- Replication is tricky
  - Most implementations use static replication
  - Lots of overhead: memory usage, consistency, load monitoring

25. Where Does Skew Come From?
- The exponent α (alpha) of the dataset's power-law distribution
- The number of data items comprising the dataset
- The function used to distribute data items to micro-shards
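
A small experiment showing how these three factors drive shard skew (the parameter values and the simple modulus used as the distribution function are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def shard_skew(alpha, num_items, num_shards, num_requests=200_000):
    """Max shard load over mean shard load when key popularity follows
    a power law and keys are placed on shards by a simple modulus."""
    keys = rng.zipf(alpha, size=num_requests) % num_items  # power-law popularity
    shards = keys % num_shards                             # the distribution function
    loads = np.bincount(shards, minlength=num_shards)
    return loads.max() / loads.mean()

# A steeper power law concentrates traffic on fewer keys, raising skew.
for alpha in (1.1, 1.5, 2.0):
    print(f"alpha={alpha}: shard skew = {shard_skew(alpha, 10_000, 64):.2f}")
```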

26. RackOut
A new way to handle the load imbalance caused by skew

27. Idea – Reduce the Number of Nodes
We need to make better nodes:
- Buy better servers ($$$)
- Aggregate current ones
RDMA (Remote Direct Memory Access):
- One-sided operations
- Creates rack-scale memory pooling
- Combines servers into a larger logical unit

28. Core Idea – Scale Up Nodes

29. Core Idea – Scale Up Nodes

30. Unified Memory to the Rescue
- Any server can read a micro-shard, but only the server assigned to it can write (CREW: concurrent read, exclusive write)
- We can now perform better load balancing
- The rack can now expose a super-shard made up of all micro-shards within the rack
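
A minimal sketch of CREW request routing within a rack (the server and shard layout, and the uniform choice among readers, are illustrative assumptions):

```python
import random

class Rack:
    """Route requests under CREW (concurrent read, exclusive write):
    reads of any micro-shard in the rack can be served by any rack
    server via one-sided remote reads of the pooled memory, while
    writes must go to the shard's owning server."""

    def __init__(self, servers, shard_owner):
        self.servers = servers          # server ids in this rack
        self.shard_owner = shard_owner  # micro-shard id -> owning server

    def route(self, shard, op):
        if op == "write":
            return self.shard_owner[shard]   # exclusive writer
        return random.choice(self.servers)   # any server may read

rack = Rack(servers=[0, 1, 2, 3], shard_owner={7: 2})
print(rack.route(7, "read"), rack.route(7, "write"))
```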

31. A Note on Availability and Durability
- Replication is still relied on to handle failures
- Racks already have single points of failure
- Replication must therefore be done across racks

32. Modeling and Experimental Data
And math

33. Modeling Load Balancing: Ideal Speedup
Definitions:
- Rack/server skew is the ratio of traffic on the most popular rack/server to the average traffic per rack/server
- shard skew_1 is the shard skew in the regular scale-out deployment
- compute_1 is the computation power of the server serving the hottest shard
- compute_GF is the computation power that can serve the hottest super-shard, i.e., GF x compute_1
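
One way to combine these definitions into an ideal-speedup expression (this derivation is my reconstruction from the definitions above, not necessarily the paper's equation): with N servers grouped into super-nodes of GF servers each, throughput is limited by the hottest unit, so

$$\frac{T_{GF}}{T_{1}} = \frac{(N/GF)\,\mathrm{compute}_{GF}/\mathrm{skew}_{GF}}{N\,\mathrm{compute}_{1}/\mathrm{skew}_{1}} = \frac{\mathrm{skew}_{1}}{\mathrm{skew}_{GF}}$$

using compute_GF = GF x compute_1. Pooling helps exactly insofar as skew across super-shards is lower than skew across individual shards.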

34. Simulation of Ideal Speedup

35. Modeling Queuing for RackOut
- Requests for each micro-shard p follow a Poisson arrival process (the rate expression was a figure on the slide)
- There are three types of requests, which have different completion times
- On any given server i, the power-law distribution of the keys, the hash function, the RackOut GF, and the read/write mix together determine the per-server arrival rate for each request type t, λ_it
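
A minimal sketch of how λ_it might be assembled from those ingredients (the three types are assumed to be local read, remote read, and write, and the even split of reads across the rack's GF servers is my assumption; the paper's exact model may differ):

```python
import numpy as np

def per_server_rates(shard_rates, shard_owner, gf, read_frac):
    """Per-server arrival rates lambda_it for three request types.

    Illustrative assumptions: writes go only to a shard's owner (CREW);
    reads are spread evenly over the gf servers of the owner's rack, so
    1/gf of them are local to the owner and the rest arrive as
    one-sided remote reads at the other rack servers."""
    n = max(shard_owner) + 1
    rates = {t: np.zeros(n) for t in ("local_read", "remote_read", "write")}
    for lam, owner in zip(shard_rates, shard_owner):
        first = (owner // gf) * gf              # first server of the rack
        rates["write"][owner] += lam * (1 - read_frac)
        rates["local_read"][owner] += lam * read_frac / gf
        for s in range(first, first + gf):
            if s != owner:
                rates["remote_read"][s] += lam * read_frac / gf
    return rates

# 4 servers in racks of gf=2; the micro-shard on server 0 is hot.
for t, v in per_server_rates([100, 10, 10, 10], [0, 1, 2, 3],
                             gf=2, read_frac=0.95).items():
    print(t, v)
```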

36. Queuing Simulation Results

37. Modeling Faster Remote Reads
- Scale-Out NUMA (soNUMA) is a new system that is faster than RDMA
- soNUMA performs remote reads at latencies within 4x of local DRAM

38. Experimental vs. Model

39. Thoughts and Discussion

40. Key Ideas
- "All problems in computer science can be solved by another level of indirection" - David Wheeler
- There is such a thing as too much scale
- We can and should create mathematical models of systems

41. Discussion
- How realistic is this evaluation?
- Is it OK to assume that the data distribution follows a power law?
- Fault tolerance? Replication?
- Does this still work if the data is well balanced?
- Is there a way to balance write operations?
- How does this fare against other systems?
- What are the security implications of memory pooling?