
Slide 1

Data-Centric Reconfiguration with Network-Attached Disks

Alex Shraer (Technion)

Joint work with:

J.P. Martin, D. Malkhi, M. K. Aguilera (MSR)

I. Keidar (Technion)

Slide 2

Preview

The setting: data-centric replicated storage
Simple network-attached storage nodes

Our contributions:

First distributed reconfigurable R/W storage

Asynchronous vs. consensus-based reconfiguration

Allows storage nodes to be added and removed dynamically

Slide 3

Enterprise Storage Systems

Highly reliable, customized hardware
Controllers and I/O ports may become a bottleneck
Expensive
Usually not extensible
Different solutions for different scale
Example (HP): high end – XP (1152 disks); mid range – EVA (324 disks)

Slide 4

Alternative – Distributed Storage

Made up of many storage nodes
Unreliable, cheap hardware
Failures are the norm, not the exception

Challenges:

Achieving reliability and consistency
Supporting reconfigurations

Slide 5

Distributed Storage Architecture

[Figure: dynamic, fault-prone storage clients issue reads and writes over a LAN/WAN with unpredictable network delays (asynchrony) to fault-prone storage nodes forming the cloud storage.]

Slide 6

A Case for Data-Centric Replication

Client-side code runs replication logic

Communicates with multiple storage nodes

Simple storage nodes (servers)

Can be network-attached disks, not necessarily PCs with disks
Do not run application-specific code: fewer fault-prone components
Simply respond to client requests: high throughput

Do not communicate with each other

If storage nodes communicate, their failure is likely to be correlated!

Oblivious to where other replicas of each object are stored

Scalable: the same storage node can be used for many replication sets

[Figure: a not-so-thin client running the replication logic talks to thin storage nodes.]

Slide 7

Real Systems Are Dynamic

The challenge: maintain consistency, reliability, and availability

[Figure: storage nodes A, B, C, D, E attached to a LAN/WAN; one client issues reconfig {–A, –B} while another issues reconfig {–C, +F, …, +I}, adding new nodes F, G, H, I.]

Slide 8

Pitfall of Naïve Reconfiguration

[Figure: nodes A, B, C, D each initially store the configuration {A, B, C, D}. One client issues reconfig {+E} and writes {A, B, C, D, E}; another concurrently issues reconfig {–D} and writes {A, B, C}. Because some messages are delayed, the nodes end up storing a mix of {A, B, C, D, E} and {A, B, C}.]

Slide 9

Pitfall of Naïve Reconfiguration (continued)

[Figure: a writer that sees configuration {A, B, C, D, E} writes x = "Spain" with timestamp 2 to a majority of it; a reader that sees configuration {A, B, C} reads x from a majority of {A, B, C}, whose members still hold x = "Italy" with timestamp 1. The read returns "Italy" even though "Spain" was already written. Split brain!]
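The problem is easy to check directly (an illustrative sketch, not from the talk): majorities of two different configurations need not intersect, so a read quorum can miss a write quorum entirely.

    from itertools import combinations

    def majorities(config):
        """All minimal majority quorums of a configuration."""
        need = len(config) // 2 + 1
        return [set(q) for q in combinations(sorted(config), need)]

    old = {"A", "B", "C", "D", "E"}  # a client that applied reconfig {+E}
    new = {"A", "B", "C"}            # a client that applied reconfig {-D}

    # Some majority of the old config is disjoint from some majority of
    # the new one, so a read can miss a completed write:
    clash = [(w, r) for w in majorities(old)
                    for r in majorities(new) if not (w & r)]
    print(clash[0])  # e.g. ({'A', 'D', 'E'}, {'B', 'C'})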

Slide 10

Reconfiguration Option 1: Centralized

Can be automatic, e.g., Ursa Minor [Abd-El-Malek et al., FAST 05]

Downtime

Most solutions stop R/W while reconfiguring

Single point of failure

What if the manager crashes while changing the system?

"Tomorrow Technion servers will be down for maintenance from 5:30am to 6:45am. Virtually yours, Moshe Barak"

Slide 11

Reconfiguration Option 2: Distributed Agreement

Servers agree on the next configuration

Previous solutions are not data-centric
No downtime
In theory, might never terminate [FLP85]
In practice, we have partial synchrony, so it usually works

Slide 12

Reconfiguration Option 3: DynaStore [Aguilera, Keidar, Malkhi, Shraer, PODC 2009]

Distributed and completely asynchronous
No downtime
Always terminates
Not data-centric

Slide 13

In this work: DynaDisk – dynamic data-centric R/W storage

First distributed data-centric solution

No downtime

Tunable reconfiguration method
Modular design: coordination is separate from data
Makes it easy to set and compare the coordination method
Consensus-based vs. asynchronous reconfiguration

Many shared objects

Running a protocol instance per object is too costly

Transferring all state at once might be infeasible

Our solution: incremental state transfer

Built with an external (weak) location service
We formally state the requirements on such a service

Slide 14

Location Service

Used in practice, ignored in theory

We formalize the weak external service as an oracle:

oracle.query() returns some "legal" configuration
If reconfigurations stop and oracle.query() is invoked infinitely many times, it eventually returns the last system configuration

The oracle alone is not enough to solve reconfiguration
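A minimal sketch of such an oracle (the talk gives only the abstract properties; the class and method names below are assumptions):

    import random

    class LocationOracle:
        """Weak location service: query() may return any previously
        installed ("legal") configuration, not necessarily the latest.
        Only if reconfigurations stop must repeated queries eventually
        return the final configuration."""

        def __init__(self, initial_config):
            self._legal = [frozenset(initial_config)]

        def installed(self, config):
            # Hypothetical hook: a reconfiguration reports the
            # configuration it has installed.
            self._legal.append(frozenset(config))

        def query(self):
            # Models the weak guarantee: any legal configuration may be
            # returned.  Once reconfigurations stop, repeated calls
            # eventually return the last configuration.
            return random.choice(self._legal)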

Slide 15

The Coordination Module in DynaDisk

Storage devices in a configuration conf = {+A, +B, +C}

[Figure: storage devices A, B, and C each hold the replicated R/W objects x, y, z and a "next config" slot.]

Distributed R/W objects

Updated similarly to ABD

Distributed "weak snapshot" object

API:
update(set of changes) → OK
scan() → set of updates
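A sketch of the weak snapshot interface implied by this API (Python rendering; names and types are assumptions):

    from abc import ABC, abstractmethod

    class WeakSnapshot(ABC):
        """Per-configuration coordination object.  Weaker than consensus:
        scans need not agree on a single next configuration, but any two
        non-empty scans must intersect."""

        @abstractmethod
        def update(self, changes: frozenset) -> None:
            """Propose a set of changes, e.g. frozenset({'+D', '-C'})."""

        @abstractmethod
        def scan(self) -> set:
            """Return a (possibly empty) set of proposed updates."""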

Slide 16

Coordination with Consensus

[Figure: two clients concurrently invoke reconfig({–C}) and reconfig({+D}); consensus among the storage devices of the configuration picks +D, which is then written into the "next config" slot at a majority.]

update: run consensus on the proposed set of changes
scan: read and write back the next config from a majority
Every scan returns {+D} or the empty set
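A sketch of this consensus-based coordination (the consensus object and the majority read/write helpers are assumed, e.g. Disk Paxos over the configuration's devices):

    class ConsensusCoordination:
        """Weak snapshot built from consensus (sketch)."""

        def __init__(self, consensus, read_slot, write_slot):
            self.consensus = consensus    # propose(value) -> decided value
            self.read_slot = read_slot    # read "next config" from a majority
            self.write_slot = write_slot  # write "next config" to a majority

        def update(self, changes):
            decided = self.consensus.propose(changes)  # a single winner
            self.write_slot(decided)                   # install it

        def scan(self):
            decided = self.read_slot()  # None if nothing installed yet
            if decided is None:
                return set()
            self.write_slot(decided)    # write back before returning
            return {decided}            # non-empty scans all return the winner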

Slide 17

Weak Snapshot – Weaker than Consensus

No need to agree on the next configuration, as long as each process has a set of possible next configurations and all such sets intersect
The intersection allows clients to converge and again use a single config

Non-empty intersection property of weak snapshot:
Every two non-empty sets returned by scan() intersect

Example:
  Client 1's scan: {+D}    Client 2's scan: {+D}       (the only outcome consensus allows)
  Client 1's scan: {–C}    Client 2's scan: {+D, –C}   (allowed: the sets intersect)
  Client 1's scan: {+D}    Client 2's scan: {–C}       (not allowed: no intersection)

Slide 18

Coordination without Consensus

[Figure: a client invoking reconfig({–C}) tries to install its proposal with CAS({–C}, ⊥, 0) in slot 0 of each device's proposal vector. At devices where slot 0 already holds +D, the CAS fails and the client retries in the next slot with CAS({–C}, ⊥, 1); where slot 0 is empty, WRITE({–C}, 0) succeeds. Each device ends up with an ordered vector of proposals.]

update: install the proposal via compare-and-swap on successive slots at a majority
scan: read and write back proposals from a majority (twice)
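A rough sketch of the consensus-free update path, assuming each device exposes a cas(slot, expected, new) primitive (names are assumptions; the real protocol contacts devices in parallel and returns once a majority has acknowledged):

    def async_update(devices, changes, quorum):
        """Install `changes` without consensus: at each device, CAS the
        proposal into the first free slot of its proposal vector, so
        concurrent proposals land in different slots and scans can
        later find and merge them."""
        acks = 0
        for dev in devices:
            slot = 0
            while True:
                # Assumed primitive: cas(slot, expected, new) returns the
                # value now stored in the slot (new on success).
                stored = dev.cas(slot, None, changes)
                if stored == changes:
                    acks += 1
                    break
                slot += 1  # occupied by a concurrent proposal; try the next
            if acks >= quorum:
                return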

Slide 19

Tracking Evolving Configurations

With consensus: agree on next configuration

Without consensus – usually a chain, sometimes a DAG:

[Figure: with consensus, configurations form a chain: {A, B, C} → +D → {A, B, C, D} → –C → {A, B, D}. Without consensus, concurrent updates +D and –C can branch {A, B, C} into {A, B, C, D} and {A, B}; the inconsistent updates are found and merged into {A, B, D}. In terms of the weak snapshot, one scan() returns {+D, –C} and another returns {+D}; all non-empty scans intersect.]
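Merging is just applying the union of the discovered changes (runnable sketch; the change encoding is an assumption):

    def apply_changes(config, changes):
        """Apply a set of changes such as {'+D', '-C'} to a configuration."""
        cfg = set(config)
        for ch in changes:
            op, member = ch[0], ch[1:]
            if op == '+':
                cfg.add(member)
            else:  # '-'
                cfg.discard(member)
        return frozenset(cfg)

    # The branches created by concurrent +D and -C merge back into one:
    print(apply_changes({'A', 'B', 'C'}, {'+D', '-C'}))
    # frozenset({'A', 'B', 'D'}) -- element order may vary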

Slide 20

Consensus-Based vs. Asynchronous Coordination

Two implementations of weak snapshots

Asynchronous
Partially synchronous (consensus-based): Active Disk Paxos [Chockler, Malkhi, 2005], with exponential backoff for leader election

Unlike asynchronous coordination, consensus-based coordination might not terminate [FLP85]

Storage overhead (per storage device and configuration):
Asynchronous: a vector of updates, of size ≤ min(#reconfigs, #members in config)
Consensus-based: 4 integers and the chosen update

Slide 21

Strong progress guarantees are not for free

Comparing consensus-based and asynchronous (no consensus) coordination:
Asynchronous coordination has a significant negative effect on R/W latency while reconfigurations execute
Asynchronous coordination achieves slightly better, and much more predictable, reconfig latency when many reconfigs execute simultaneously
When there are no reconfigurations, the two methods perform the same

Slide 22

Future & Ongoing Work

Combine asynchronous and partially synchronous coordination
Consider other weak snapshot implementations

E.g., using randomized consensus

Use weak snapshots to reconfigure other services
Not just for R/W

Slide 23

Summary

DynaDisk – dynamic data-centric R/W storage
First decentralized solution
No downtime
Supports many objects, provides incremental reconfig
Uses one coordination object per config (not per object)
Tunable reconfiguration method
We implemented asynchronous and consensus-based coordination
Many other implementations of weak snapshots are possible
Asynchronous coordination in practice:
Works in more circumstances → more robust
But at a cost: it significantly affects ongoing R/W ops