Presentation Transcript

Slide 1

Limplock: Understanding the Impact of Limpware on Scale-out Cloud Systems

Thanh Do*, Mingzhe Hao, Tanakorn Leesatapornwongsa, Tiratat Patana-anake, and Haryadi S. Gunawi

Slide 2

Hardware fails

Growing complexity of …
- Technology scaling
- Manufacturing
- Design logic
- Usage
- Operating environment
… makes HW fail differently:
- Complete fail-stop
- Fail-partial
- Corruption
- Performance degradation?
The first three enjoy a rich literature; performance degradation does not.

Slide 3

The 1st anecdote

“… 1Gb NIC card on a machine that suddenly starts transmitting at 1 kbps, this slow machine caused a chain reaction upstream in such a way that the performance of entire workload for a 100 node cluster was crawling at a snail's pace, effectively making the system unavailable for all practical purposes.” – Borthakur of Facebook

Degraded NIC! (1,000,000x slower)
Cascading impact!

Slide 4

Cases of Degraded HW

“In 2011, one of the DDN 9900 units had 4 servers having high wait times on I/O for a certain set of disk LUNs. The maximum wait time was 103 seconds. This was left uncorrected for 50 days.” – Kasick of CMU, Harms of Argonne

“The disk attempts to re-read each block multiple times before responding.” – Baptist of Cleversafe

“On Intrepid, we had a bad batch of optical transceivers with an extremely high error rate. That results in an effective throughput of 1-2 Kbps.” – Harms of Argonne

Many others: “Yes, we've seen that in production.”

Slide 5

Limpware

Does HW degrade? Yes.
Limpware: hardware whose performance degrades significantly compared to its specification.

Is this a destructive failure mode? Yes.
Cascading failures, no “fail in place.”

There is no systematic analysis of its impact.

Slide 6

Study Summary

- 56 experiments that benchmark 5 systems: Hadoop, HDFS, ZooKeeper, Cassandra, HBase
- 22 protocols
- 8 hours under normal scenarios, 207 hours under limpware scenarios
- Unearthed many limpware-intolerant designs

Our finding: a single piece of limpware (e.g., one NIC) causes severe impact on a whole cluster.

Slide 7

Outline

- Introduction
- System analysis
- Limplock
- Limpware-Tolerant Systems
- Conclusion

Slide 8

Anecdotal impacts

“The performance of a 100 node cluster was crawling at a snail's pace” – Facebook

But … why?

Slide 9

System analysis

Goals:
- Measure system-level impacts
- Find design flaws

Methodology:
- Target cloud systems (e.g., HDFS, Hadoop, ZooKeeper)
- Inject load + limpware, e.g., slow a NIC to 1 Mbps, 0.1 Mbps, etc. (see the sketch after this list)
- White-box analysis (internal probes)
- Find design flaws
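As a rough illustration of the injection step, here is a minimal sketch of slowing a NIC on a Linux test node with the standard tc tool; the device name, rates, and the use of a token-bucket filter are assumptions for illustration, not necessarily the harness the authors used.

```python
# Hypothetical limpware injector: throttle a NIC's egress bandwidth with Linux tc.
# Assumes a Linux host, root privileges, and the iproute2 "tc" tool; the device
# name and rates below are illustrative.
import subprocess

def limp_nic(device: str, rate: str) -> None:
    """Cap egress bandwidth of `device` to `rate` (e.g. "1mbit", "100kbit")."""
    subprocess.run(
        ["tc", "qdisc", "replace", "dev", device, "root",
         "tbf", "rate", rate, "burst", "32kbit", "latency", "400ms"],
        check=True)

def heal_nic(device: str) -> None:
    """Remove the throttle, restoring the NIC to full speed."""
    subprocess.run(["tc", "qdisc", "del", "dev", device, "root"], check=True)

if __name__ == "__main__":
    limp_nic("eth0", "1mbit")     # degrade the NIC to ~1 Mbps
    # ... run the benchmark workload against the cluster ...
    heal_nic("eth0")
```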

Slide 10

Example

Run a distributed protocol, e.g., a 3-node write in HDFS.
Measure slowdowns under: no failure, a crash, and a degraded NIC.

[Chart: execution slowdown of the workload with a 10 Mbps, 1 Mbps, and 0.1 Mbps NIC, ranging from roughly 10x to 1000x slower than normal.]
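Why the slowdown tracks the NIC: a pipelined 3-node write moves data no faster than its slowest hop. Below is a naive back-of-the-envelope model; the healthy end-to-end throughput and block size are assumptions for illustration, not measurements from the study.

```python
# Naive model of a pipelined 3-node HDFS block write: end-to-end throughput is
# capped by the slowest hop in the pipeline. Numbers are illustrative only.
BLOCK_MB = 64          # one HDFS block
HEALTHY_MBPS = 100     # assumed effective healthy throughput (disk/software bound)

def write_seconds(hop_mbps):
    return BLOCK_MB * 8 / min(hop_mbps)   # MB -> megabits, divided by the bottleneck

baseline = write_seconds([HEALTHY_MBPS] * 3)
for limp in (10, 1, 0.1):                 # one degraded NIC in the pipeline
    t = write_seconds([HEALTHY_MBPS, limp, HEALTHY_MBPS])
    print(f"{limp:>4} Mbps NIC -> ~{t / baseline:,.0f}x slower")
```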

Slide 11

Outline

- Introduction
- System analysis
  - Hadoop case study
- Limplock
- Limpware-Tolerant Cloud Systems
- Conclusion

Slide 12

Hadoop Spec. Exec.

Is Hadoop tail-tolerant? Why is speculative execution not triggered?

Consider a degraded NIC on a map node:
- Task M2’s speed = M1 and M3 (input data is local!)
- But all reducers are slow
- Straggler: slow vs. others of the same job
- No straggler detected! (see the toy detector below)

Flaws:
- Task-level straggler detection
- Single point of failure!

[Diagram: wordcount on Hadoop; mappers M1, M2, M3 feed the reducers, and the degraded NIC sits on M2's node, so every reducer's shuffle is slowed by roughly an order of magnitude.]
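A toy version of per-job, per-phase straggler detection (in the spirit of Hadoop's speculative-execution heuristic, not its actual code) makes the blind spot concrete: the map task on the limping node progresses like its peers because its input is local, and the reducers are all slow together, so nothing stands out within the job.

```python
# Toy straggler check: a task is flagged only if it is much slower than the
# average of the other tasks in the same job and phase. Not Hadoop's real code.
def stragglers(progress_rates, threshold=0.5):
    """Indices of tasks whose progress rate is < threshold * mean of their peers."""
    flagged = []
    for i, rate in enumerate(progress_rates):
        peers = [p for j, p in enumerate(progress_rates) if j != i]
        if peers and rate < threshold * (sum(peers) / len(peers)):
            flagged.append(i)
    return flagged

# Map phase: M2 sits on the node with the degraded NIC, but its input is local,
# so its map-side progress looks like everyone else's.
print(stragglers([1.0, 1.0, 1.0]))      # [] -> no map straggler detected

# Reduce phase: every reducer must shuffle M2's output over the slow NIC, so all
# reducers are slowed by the same amount, and relative comparison flags none of them.
print(stragglers([0.01, 0.01, 0.01]))   # [] -> no reduce straggler either
```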

Slide 13

Cascading failures

- A degraded NIC → degraded tasks (degraded tasks are slower by orders of magnitude)
- Slow tasks use up slots → degraded node
  - Default: 2 mappers and 2 reducers per node
  - If all slots are used → the node is “unavailable”
- All nodes in limp mode → degraded cluster (a toy model of this cascade follows below)

[Diagram: a healthy node's map and reduce slots (M, M, R, R) go into limp mode once their tasks communicate with the node with the slow NIC.]
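A small simulation of that cascade under assumed parameters (30 nodes, 4 slots each, and a ~1000x slowdown for any task that touches the limping node); these numbers are illustrative, not the paper's experiment.

```python
# Toy model of slot exhaustion: each node has 4 task slots (Hadoop's default of
# 2 map + 2 reduce). Any task that moves data through the limping node runs
# ~1000x longer, so such tasks accumulate until they pin every slot in the cluster.
import random

NODES, SLOTS, ROUNDS, LIMP_FACTOR = 30, 4, 300, 1000
LIMP_NODE = 0
busy_until = [[0] * SLOTS for _ in range(NODES)]   # per-slot task completion time

random.seed(0)
for t in range(ROUNDS):
    for node in range(NODES):
        for slot in range(SLOTS):
            if busy_until[node][slot] <= t:        # slot is free: schedule a task
                shuffle_peer = random.randrange(NODES)
                limping = LIMP_NODE in (node, shuffle_peer)
                busy_until[node][slot] = t + (LIMP_FACTOR if limping else 1)

stuck = sum(end > ROUNDS for slots in busy_until for end in slots)
print(f"{stuck}/{NODES * SLOTS} slots held by limplocked tasks after {ROUNDS} rounds")
```

In this toy run, essentially every slot in the cluster ends up held by a task waiting on the one slow NIC, which is the node-level and then cluster-level limplock formalized on the later slides.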

Slide 14

Cluster collapse

Macrobenchmark: Facebook workload
- 30-node cluster
- One node with a degraded NIC (0.1 Mbps)

Cluster collapse: throughput drops from 172 jobs/hour to 1 job/hour. Why?

Slide 15

Fail-stop tolerant, but not limpware tolerant (no failover recovery).

Slide 16

Outline

- Introduction
- System analysis
- Formalizing the problem: Limplock
  - Definitions and causes
- Limpware-Tolerant Systems
- Conclusion

Slide 17

Limplock

Definition: the system progresses slowly due to limpware and is not capable of failing over to healthy components (i.e., the system is “locked” in limping mode).

3 levels of limplock:
- Operation
- Node
- Cluster

Slide 18

Limplock Levels

- Operation limplock: an operation involving limpware is “locked” in limping mode; no failover.
- Node limplock: operations that must be served by this node experience limplock, even though the operations themselves do not involve limpware.
- Cluster limplock: the whole cluster is in limplock due to limpware.

Slide 19

Causes of Limplock

Operation limplock:
- Single point of failure
  - Hadoop: a slow map task
  - HBase: the “gateway”

[Diagram: mappers M1, M2, M3 feeding the reducers.]

Slide 20

Causes of Limplock

Operation limplock:
- Single point of failure
- Coarse-grained timeout (more in the paper)

[Chart: slowdown of a 512 MB write to HDFS, on a 1x to 100x scale.]

Reason: no timeout is triggered. HDFS uses a coarse-grained timeout of 60 seconds on every 64 KB packet, so a connection could limp at almost 1 KB/s without being detected.
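The arithmetic behind “almost 1 KB/s”: if the timeout is re-armed for every 64 KB packet, a sender only has to deliver 64 KB within each 60-second window to stay alive. A quick sketch of that reasoning (not HDFS code):

```python
# Coarse-grained timeout: with a ~60 s timeout armed per 64 KB packet, the slowest
# rate that never trips it is roughly 64 KB / 60 s, i.e. just over 1 KB/s.
PACKET_KB, TIMEOUT_S = 64, 60

def avoids_timeout(throughput_kb_per_s: float) -> bool:
    """True if each 64 KB packet arrives before the 60 s timeout at this rate."""
    return PACKET_KB / throughput_kb_per_s < TIMEOUT_S

print(f"limp floor: ~{PACKET_KB / TIMEOUT_S:.2f} KB/s")   # ~1.07 KB/s
print(avoids_timeout(2.0), avoids_timeout(0.5))           # True, False
```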

Slide 21

Causes of Limplock

Operation limplock:
- Single point of failure
- Coarse-grained timeout
- …

Node limplock:
- Bounded multi-purpose thread pool

[Diagram: the master serves both in-memory metadata reads and metadata writes from the same bounded pool; chart of in-memory reads becoming > 100x slower than normal, on a 1x to 100x scale.]

Resource exhaustion by a limplocked operation: in-memory metadata reads are blocked.
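A sketch of the bounded multi-purpose pool problem using Python's standard thread pool (HDFS's RPC handler threads are structured differently, so this is only an analogy): once every worker is held by a limplocked write, even in-memory reads that need no I/O sit in the queue behind them.

```python
# Bounded multi-purpose pool: fast in-memory reads and slow (limplocked) writes
# share the same fixed set of workers, so the slow writes starve the fast reads.
import time
from concurrent.futures import ThreadPoolExecutor

def limplocked_write():
    time.sleep(5)              # stands in for a write stuck behind a degraded NIC
    return "write done"

def in_memory_read():
    return "read done"         # microseconds of work, no network or disk needed

pool = ThreadPoolExecutor(max_workers=4)      # bounded pool shared by all request types
for _ in range(4):
    pool.submit(limplocked_write)             # limplocked writes occupy every worker

start = time.time()
fast = pool.submit(in_memory_read)            # queued behind the slow writes
print(fast.result(), f"after {time.time() - start:.1f}s")   # ~5 s instead of ~0 s
```

Giving each request class its own pool, or reserving workers for cheap operations, is one way to break this particular coupling.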

Slide 22

Causes of Limplock

Operation limplock:
- Single point of failure
- Coarse-grained timeout
- …

Node limplock:
- Bounded multi-purpose thread pool
- Bounded multi-purpose queue

[Diagram: messages of different types sharing one bounded queue.]

Slide 23

Causes of Limplock

Operation limplock:
- Single point of failure
- Coarse-grained timeout
- …

Node limplock:
- Bounded multi-purpose thread pool
- Bounded multi-purpose queue
- Unbounded thread pool/queue
  - Example: a backlogged queue at the ZooKeeper leader (sketched below)
  - Node limplock at the leader because garbage collection works hard
  - Quorum write: 10x slowdown under stress load

[Diagram: a ZooKeeper leader and its followers handling a client quorum write under stress load; chart of quorum-write slowdown on a 1x to 10x scale, for stress loads of 20 and 600 seconds.]
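A sketch of the unbounded-queue case (an illustrative model, not ZooKeeper's implementation): the queue feeding the follower behind the degraded NIC grows without limit, and in a JVM that growing backlog also means a growing heap and heavier garbage collection, which is what drags down every quorum write the leader handles.

```python
# Unbounded queue in front of a slow consumer: the backlog grows without limit,
# so memory use and queueing delay grow with it. Rates below are illustrative.
from collections import deque

ARRIVALS_PER_SEC = 1000   # proposals the leader enqueues for the follower
DRAIN_PER_SEC = 10        # what the follower behind the degraded NIC can absorb

backlog = deque()
for second in range(60):
    backlog.extend([None] * ARRIVALS_PER_SEC)             # producer never blocks
    for _ in range(min(DRAIN_PER_SEC, len(backlog))):     # consumer limps along
        backlog.popleft()

print(f"backlog after 60 s: {len(backlog)} messages "
      f"(~{len(backlog) / DRAIN_PER_SEC:.0f} s behind and still growing)")
```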

Slide 24

Causes of Limplock

Operation limplock:
- Single point of failure
- Coarse-grained timeout
- …

Node limplock:
- Bounded multi-purpose thread pool
- Bounded multi-purpose queue
- Unbounded thread pool/queue

Cluster limplock:
- All nodes in limplock (examples: resource exhaustion in Hadoop, HDFS regeneration)
- Master limplock in a master-slave architecture (examples: cases in ZooKeeper, HDFS)

Slide 25

Analysis Results

Found 15 protocols that exhibit limplock:
- 8 in HDFS
- 1 in Hadoop
- 2 in ZooKeeper
- 4 in HBase

Limplock happens in almost all systems we have analyzed.

Slide 26

Outline

- Introduction
- System analysis
- Limplock
- Limpware-Tolerant Cloud Systems
- Conclusion

Slide 27

Principles of limpware …

Anticipation:
- Limpware-tolerant design patterns
- Limpware static analysis
- Limpware statistics (existing work covers memory failure, disk failure, etc.)

Detection:
- Performance degradation → implicit (no hard errors); a simple detector is sketched below
- Study explicit causes (e.g., block remapping, error correcting)

Recovery:
- How to “fail in place”?
- Better to fail-stop than fail-slow?
- Quarantine?

Utilization:
- Fail-stop: a component either fails or works
- Limpware: a component degrades to anywhere from 1% to 100% of its spec
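One way to make the implicit explicit, as a sketch only (the thresholds are assumptions, not a mechanism from the paper): compare a device's observed throughput against its specification and flag it once it stays far below spec for several consecutive samples.

```python
# Sketch of explicit limpware detection: flag a device whose observed throughput
# stays far below its spec for several consecutive samples. Thresholds are
# illustrative assumptions.
def is_limping(samples_mbps, spec_mbps, ratio=0.01, window=3):
    run = 0
    for sample in samples_mbps:
        run = run + 1 if sample < ratio * spec_mbps else 0
        if run >= window:
            return True
    return False

print(is_limping([950, 900, 930], spec_mbps=1000))   # False: healthy 1 Gbps NIC
print(is_limping([5, 3, 2, 4], spec_mbps=1000))      # True: limping at a few Mbps
```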

Slide 28

Conclusion

New failure modes → transform systems.

Limpware is a “new”, destructive failure mode:
- Orders-of-magnitude slowdowns
- Cascading failures
- No “fail in place” in current systems

A need for Limpware-Tolerant Systems.

Slide 29

Thank you! Questions?