
15-446 Distributed Systems
Spring 2009

L-11 Consistency

Important Lessons

- Replication is good for performance and reliability
- Key challenge: keeping replicas up to date
- Wide range of consistency models; we will see more next lecture
- Range of correctness properties
- The most obvious choice (sequential consistency) can be expensive to implement: multicast, primary, quorum

Today's Lecture

- ACID vs. BASE: philosophy
- Client-centric consistency models
- Eventual consistency
- Bayou

Two Views of Distributed Systems

- Optimist: a distributed system is a collection of independent computers that appears to its users as a single coherent system
- Pessimist: "You know you have one when the crash of a computer you've never heard of stops you from getting any work done." (Lamport)

Recurring Theme

Academics like:
- Clean abstractions
- Strong semantics
- Things that prove they are smart

Users like:
- Systems that work (most of the time)
- Systems that scale
- Consistency per se isn't important

Eric Brewer made the following observations.

A Clash of Cultures

Classic distributed systems focused on ACID (transaction) semantics:
- Atomicity: either the operation (e.g., write) is performed on all replicas or it is not performed on any of them
- Consistency: after each operation all replicas reach the same state
- Isolation: no operation (e.g., read) can see the data from another operation (e.g., write) in an intermediate state
- Durability: once a write has succeeded, that write will persist indefinitely

Modern Internet systems focus on BASE:
- Basically Available
- Soft-state (or scalable)
- Eventually consistent

ACID vs. BASE

ACID:
- Strong consistency for transactions is the highest priority
- Availability less important
- Pessimistic
- Rigorous analysis
- Complex mechanisms

BASE:
- Availability and scaling are the highest priorities
- Weak consistency
- Optimistic
- Best effort
- Simple and fast

Why Not ACID+BASE?

What goals might you want from a system? C, A, P:
- Strong Consistency: all clients see the same view, even in the presence of updates
- High Availability: all clients can find some replica of the data, even in the presence of failures
- Partition-tolerance: the system properties hold even when the system is partitioned

CAP Theorem [Brewer]

You can have only two of these three properties. The choice of which property to discard determines the nature of your system.

Consistency and Availability

Comment: providing transactional semantics requires all functioning nodes to be in contact with each other (no partition).

Examples:
- Single-site and clustered databases
- Other cluster-based designs

Typical features:
- Two-phase commit
- Cache-invalidation protocols
- Classic distributed-systems style

Partition-Tolerance and Availability

Comment: once consistency is sacrificed, life is easy...

Examples:
- DNS
- Web caches
- Practical distributed systems for mobile environments: Coda, Bayou

Typical features:
- Optimistic updating with conflict resolution
- TTLs and lease-based cache management
- This is the "Internet design style"

Voting with their Clicks

In terms of large-scale systems, the world has voted with its clicks: consistency is less important than availability and partition-tolerance.

Today's Lecture

- ACID vs. BASE: philosophy
- Client-centric consistency models
- Eventual consistency
- Bayou

Client-centric Consistency Models

A mobile user may access different replicas of a distributed database at different times. This behavior implies the need for a view of consistency that provides guarantees for a single client regarding its accesses to the data store.

Session Guarantees

When a client moves around and connects to different replicas, strange things can happen:
- Updates you just made are missing
- The database goes back in time

These guarantees are the responsibility of a "session manager," not the servers. Two sets capture update dependencies:
- Read-set: the set of writes that are relevant to the session's reads
- Write-set: the set of writes performed in the session

Four client-centric consistency models:
- Monotonic reads
- Monotonic writes
- Read your writes
- Writes follow reads

Monotonic Reads

A data store provides monotonic-read consistency if, when a process reads the value of a data item x, any successive read operation on x by that process always returns the same value or a more recent value.

Example error: successive accesses to email show "disappearing" messages.

[Figure: a monotonic-read consistent data store vs. one that is not. In both, the process moves from location L1 to L2; arrows indicate propagation of the earlier write. The inconsistent store has no propagation guarantees.]

Monotonic Writes

A write operation by a process on a data item x is completed before any successive write operation on x by the same process. This implies a copy must be up to date before a write is performed on it.

Example error: a library is updated in the wrong order.

[Figure: a monotonic-write consistent data store vs. one that is not. In both examples, the process performs a write at L1, moves, and performs a write at L2.]

Read Your Writes

The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.

Example error: deleted email messages reappear.

[Figure: a data store that provides read-your-writes consistency vs. one that does not. In both examples, the process performs a write at L1, moves, and performs a read at L2.]

Writes Follow Reads

A write operation by a process on a data item x, following a previous read operation on x by the same process, is guaranteed to take place on the same or a more recent value of x than the one that was read.

Example error: a newsgroup displays responses to articles before the original article has propagated there.

[Figure: a writes-follow-reads consistent data store vs. one that is not. In both examples, the process performs a read at L1, moves, and performs a write at L2.]
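The session guarantees above can be sketched with version vectors. This is a minimal illustration, not Bayou's actual code; the names (`Session`, `dominates`, `can_read_from`) are hypothetical. The session manager carries a read-set and a write-set, and before serving a read it checks that the chosen replica has seen every write those sets depend on:

```python
# Hypothetical sketch of a session manager enforcing monotonic reads and
# read-your-writes. Version vectors map server -> count of writes seen.

def dominates(vv_a, vv_b):
    """True if version vector vv_a includes every write counted in vv_b."""
    return all(vv_a.get(server, 0) >= count for server, count in vv_b.items())

class Session:
    def __init__(self):
        self.read_set = {}   # writes our previous reads depended on
        self.write_set = {}  # writes performed in this session

    def can_read_from(self, replica_vv):
        # Monotonic reads: replica must hold all writes behind prior reads.
        # Read-your-writes: it must also hold our own session writes.
        return dominates(replica_vv, self.read_set) and \
               dominates(replica_vv, self.write_set)

    def record_read(self, replica_vv):
        for server, count in replica_vv.items():
            self.read_set[server] = max(self.read_set.get(server, 0), count)

s = Session()
s.record_read({"L1": 3})                     # read at location L1
print(s.can_read_from({"L1": 3, "L2": 1}))   # True: L2 has caught up
print(s.can_read_from({"L2": 1}))            # False: L2 missing L1's writes
```

The other two guarantees are symmetric checks on the write-set when a write, rather than a read, is about to be performed.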

Today's Lecture

- ACID vs. BASE: philosophy
- Client-centric consistency models
- Eventual consistency
- Bayou

Many Kinds of Consistency

- Strict: updates happen instantly everywhere. A read has to return the result of the latest write on that data item. Assumes instantaneous propagation; not realistic.
- Linearizable: updates appear to happen instantaneously at some point in time. Like sequential consistency, but operations are ordered using a global clock. Primarily used for formal verification of concurrent programs.
- Sequential: all updates occur in the same order everywhere. Every client sees the writes in the same order; the order of writes from the same client is preserved, but the order of writes from different clients may not be. Equivalent to Atomicity + Consistency + Isolation.
- Eventual consistency: if all updating stops, then eventually all replicas converge to identical values.

Eventual Consistency

There are replication situations where updates (writes) are rare and a fair amount of inconsistency can be tolerated:
- DNS: names are rarely changed, removed, or added, and changes/additions/removals are done by a single authority
- Web pages: typically have a single owner and are updated infrequently

If no updates occur for a while, all replicas should gradually become consistent. This may be a problem for a mobile user who accesses different replicas (which may be inconsistent with each other).

Why (not) eventual consistency?

For:
- Supports disconnected operation
- Better to read a stale value than nothing
- Better to save writes somewhere than nowhere

Against:
- Potentially anomalous application behavior
- Stale reads and conflicting writes...

Implementing Eventual Consistency

Can be implemented with two steps:
1. All writes eventually propagate to all replicas
2. Writes, when they arrive, are written to a log and applied in the same order at all replicas

Easily done with timestamps and "undo-ing" optimistic writes.
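The two steps above can be sketched in a few lines. This is a deliberately naive illustration (replaying the whole log on every arrival), with names of my choosing, just to show how timestamp ordering plus undo gives convergence:

```python
# Minimal sketch of eventual consistency: every replica logs arriving
# writes, sorts the log by timestamp (the same total order everywhere),
# then undoes its optimistic state and replays the log.

class Replica:
    def __init__(self):
        self.log = []    # (timestamp, key, value)
        self.state = {}

    def receive(self, stamp, key, value):
        self.log.append((stamp, key, value))
        self.log.sort()            # identical total order at every replica
        self.state = {}            # undo optimistic writes...
        for _, k, v in self.log:   # ...and replay in timestamp order
            self.state[k] = v

r1, r2 = Replica(), Replica()
# The same writes arrive in different orders at the two replicas...
for w in [(1, "x", "a"), (2, "x", "b")]:
    r1.receive(*w)
for w in [(2, "x", "b"), (1, "x", "a")]:
    r2.receive(*w)
print(r1.state == r2.state)  # True: both converge to {"x": "b"}
```

A real system would only roll back the suffix of the log after the insertion point rather than replaying everything, but the invariant is the same.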

Update Propagation

- Rumor (epidemic) stage: attempt to spread an update quickly; willing to tolerate incomplete coverage in return for reduced traffic overhead
- Correcting omissions: making sure that replicas that weren't updated during the rumor stage get the update

Anti-Entropy

Every so often, two servers compare complete data sets, using various techniques to make this cheap. If any data item is discovered not to have been fully replicated, it is treated as a new rumor and spread again.

Today's Lecture

- ACID vs. BASE: philosophy
- Client-centric consistency models
- Eventual consistency
- Bayou

System Assumptions

Early days:
- Nodes always on when not crashed
- Bandwidth always plentiful (often LANs)
- Never needed to work on a disconnected node
- Nodes never moved
- Protocols were "chatty"

Now:
- Nodes detach, then reconnect elsewhere
- Even when attached, bandwidth is variable
- Reconnecting elsewhere often means talking to a different replica
- Work is done on detached nodes

Disconnected Operation

A challenge to the old paradigm: standard techniques disallowed any operations while disconnected, or disallowed operations by others.

But eventual consistency is not enough. Reconnecting to another replica could produce strange results, e.g., not seeing your own recent writes. Merely letting the latest write prevail may not be appropriate, and there is no detection of read dependencies. What do we do?

Bayou

A system developed at Xerox PARC in the mid-90s; the first coherent attempt to fully address the problem of disconnected operation. It has several different components.

Bayou Architecture

[Figure: Bayou architecture]

Motivating Scenario: Shared Calendar

Calendar updates are made by several people, e.g., meeting-room scheduling, or an exec plus an admin. We want to allow updates offline, but conflicts can't be prevented. Two possibilities:
- Disallow offline updates?
- Conflict resolution?

Conflict Resolution

Replication is not transparent to the application: only the application knows how to resolve conflicts. The application can do record-level conflict detection, not just file-level conflict detection. In the calendar example, detection is record-level and resolution is easy.

Split of responsibility:
- Replication system: propagates updates
- Application: resolves conflicts

Optimistic application of writes requires that writes be "undo-able."

Meeting Room Scheduler

- Reserving the same room at the same time: conflict
- Reserving different rooms at the same time: no conflict
- Reserving the same room at different times: no conflict

Only the application would know this!

[Figures: a room/time grid (Rm1, Rm2 vs. time) stepping through several reservations: non-conflicting cases, conflict detection, and automated resolution.]

Other Resolution Strategies

- Classes take priority over meetings
- Faculty reservations are bumped by admin reservations
- Move meetings to a bigger room, if available

The point: conflicts are detected at very fine granularity, and resolution can be policy-driven.

Updates

A client sends an update to a server. Each update is identified by a triple:
<Commit-stamp, Time-stamp, Server-ID of accepting server>

Updates are either committed or tentative. Commit-stamps increase monotonically; tentative updates have commit-stamp = inf.

Anti-Entropy Exchange

Each server keeps a vector timestamp. When two servers connect, exchanging their version vectors allows them to identify the missing updates. These updates are exchanged in the order of the logs, so that if the connection is dropped, the crucial monotonicity property still holds: if a server X has an update accepted by server Y, then X has all previous updates accepted by Y.
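The exchange just described can be sketched as follows. The class and method names are mine, not Bayou's; the point is that the receiver's version vector tells the sender exactly which log entries to ship, and shipping them in log order preserves the prefix (monotonicity) property even if the connection drops mid-transfer:

```python
# Sketch of anti-entropy with version vectors. Updates are stamped
# (timestamp, server_id); a version vector maps server_id to the highest
# timestamp seen from that server.

class Server:
    def __init__(self, name):
        self.name = name
        self.clock = 0
        self.log = []  # updates in accept order: (timestamp, server_id, op)
        self.vv = {}   # server_id -> highest timestamp seen from it

    def write(self, op):
        self.clock += 1
        self.log.append((self.clock, self.name, op))
        self.vv[self.name] = self.clock

    def missing_for(self, their_vv):
        # Updates the other side lacks, in the order of our log.
        return [u for u in self.log if u[0] > their_vv.get(u[1], 0)]

    def receive(self, updates):
        for ts, sid, op in updates:
            if ts > self.vv.get(sid, 0):
                self.log.append((ts, sid, op))
                self.vv[sid] = ts
                self.clock = max(self.clock, ts)  # Lamport-style advance

p, a = Server("P"), Server("A")
p.write("W(x)"); p.write("W(y)")
a.write("W(z)")
a.receive(p.missing_for(a.vv))   # anti-entropy: P -> A
p.receive(a.missing_for(p.vv))   # anti-entropy: A -> P
print(p.vv == a.vv)              # True: both now know {'P': 2, 'A': 1}
```

Because `missing_for` walks the log in order, a dropped connection leaves the receiver with a prefix of the sender's updates, which is exactly the monotonicity property above.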

Example with Three Servers

Version vectors: P [0,0,0], A [0,0,0], B [0,0,0]

Vector Clocks

Vector clocks overcome a shortcoming of Lamport logical clocks: L(e) < L(e') does not imply that e happened before e'. Vector timestamps are used to timestamp local events, and are applied in schemes for replication of data.

Vector Clocks

How to ensure causality? Two rules for delaying message processing:
- The VC must indicate that this is the next message from the source
- The VC must indicate that you have all the other messages that "caused" this message
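The two delivery rules can be written as a single predicate. This is an illustrative sketch with hypothetical names (`deliverable`, `vc`, `t`): a message from process `sender` carrying vector timestamp `t` is delivered at a receiver with vector `vc` only if it is the next message from that source and everything that causally precedes it has already been seen:

```python
# Causal-delivery check: a message is deliverable iff
# (1) t[sender] == vc[sender] + 1   (next message from that source), and
# (2) t[k] <= vc[k] for all k != sender (all its causes have been seen).

def deliverable(vc, t, sender):
    if t[sender] != vc[sender] + 1:        # rule 1
        return False
    return all(t[k] <= vc[k]               # rule 2
               for k in range(len(vc)) if k != sender)

vc = [0, 0, 0]                                   # receiver has seen nothing
print(deliverable(vc, [1, 0, 0], sender=0))      # True: first msg from p0
print(deliverable(vc, [2, 0, 1], sender=0))      # False: p0's first msg missing
print(deliverable(vc, [1, 0, 1], sender=2))      # False: caused by unseen p0 msg
```

Messages failing the check are buffered and re-tested as the receiver's vector advances.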

All Servers Write Independently

P: <inf,1,P> <inf,4,P> <inf,8,P>    [8,0,0]
A: <inf,2,A> <inf,3,A> <inf,10,A>   [0,10,0]
B: <inf,1,B> <inf,5,B> <inf,9,B>    [0,0,9]

Bayou Writes

Each write contains:
- Identifier (commit-stamp, time-stamp, server-ID)
- Nominal value
- Write dependencies
- Merge procedure

Conflict Detection

A write specifies the data the write depends on:
- Set X=8 if Y=5 and Z=3
- Set Cal(11:00-12:00)=dentist if Cal(11:00-12:00) is null

These write dependencies are crucial in eliminating unnecessary conflicts. If file-level detection were used, all updates would conflict with each other.
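The calendar dependency check above can be sketched directly. This is a toy illustration, not Bayou's interface; `apply_write`, `depends_ok`, and `mergeproc` are names I've invented to mirror the idea that a write carries its own dependency predicate and merge procedure:

```python
# Record-level conflict detection via write dependencies: a write is
# applied only if its dependency check still holds at apply time;
# otherwise its merge procedure is invoked.

calendar = {}

def apply_write(slot, value, depends_ok, mergeproc):
    """Apply value to slot if the dependency check passes, else merge."""
    if depends_ok(calendar):
        calendar[slot] = value
        return "applied"
    return mergeproc(calendar)

# "Set Cal(11:00-12:00) = dentist if Cal(11:00-12:00) is null"
result = apply_write(
    "11:00-12:00", "dentist",
    depends_ok=lambda cal: cal.get("11:00-12:00") is None,
    mergeproc=lambda cal: "conflict",
)
print(result)   # applied: the slot was free

result = apply_write(
    "11:00-12:00", "staff meeting",
    depends_ok=lambda cal: cal.get("11:00-12:00") is None,
    mergeproc=lambda cal: "conflict",  # a real mergeproc would pick another slot
)
print(result)   # conflict: the slot is already taken
```

Because the dependency is on one calendar slot rather than the whole calendar file, the second write conflicts only with writes to that slot.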

Conflict Resolution

Resolution is specified by a merge procedure (mergeproc). When a conflict is detected, the mergeproc is called, e.g., to move an appointment to an open spot on the calendar, or to move a meeting to an open room.

P and A Do Anti-Entropy Exchange

Before:
P: <inf,1,P> <inf,4,P> <inf,8,P>    [8,0,0]
A: <inf,2,A> <inf,3,A> <inf,10,A>   [0,10,0]
B: <inf,1,B> <inf,5,B> <inf,9,B>    [0,0,9]

After:
P: <inf,1,P> <inf,2,A> <inf,3,A> <inf,4,P> <inf,8,P> <inf,10,A>   [8,10,0]
A: <inf,1,P> <inf,2,A> <inf,3,A> <inf,4,P> <inf,8,P> <inf,10,A>   [8,10,0]
B: <inf,1,B> <inf,5,B> <inf,9,B>    [0,0,9]

Bayou Uses a Primary to Commit a Total Order

Why is it important to make the log stable?
- Stable writes can be committed
- The stable portion of the log can be truncated

Problem: if any node is offline, the stable portion of all logs stops growing.

Bayou's solution:
- A designated primary defines a total commit order
- The primary assigns CSNs (commit sequence numbers)
- Any write with a known CSN is stable
- All stable writes are ordered before tentative writes
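The primary's role can be sketched as follows (assumed names, not Bayou's actual code). Tentative writes carry commit-stamp `inf`; the primary replaces `inf` with the next CSN in log order, and since any finite CSN sorts before `inf`, stable writes automatically order before tentative ones:

```python
# Sketch of CSN assignment by the primary. Writes are (csn, timestamp,
# server) triples; tentative writes have csn = inf.

INF = float("inf")

def assign_csns(log, next_csn):
    """Assign commit sequence numbers to tentative writes, in log order;
    returns (committed log, next unused CSN)."""
    out = []
    for csn, ts, server in log:
        if csn == INF:
            csn, next_csn = next_csn, next_csn + 1
        out.append((csn, ts, server))
    return out, next_csn

# P's log after anti-entropy: three writes, all still tentative.
log = [(INF, 1, "P"), (INF, 2, "A"), (INF, 3, "A")]
log, next_csn = assign_csns(log, next_csn=1)
log.append((INF, 9, "B"))   # a later tentative write arrives
log.sort()                  # stable writes sort before tentative ones
print(log)  # [(1, 1, 'P'), (2, 2, 'A'), (3, 3, 'A'), (inf, 9, 'B')]
```

The prefix of the sorted log up to the last known CSN is stable and can be truncated once applied everywhere.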

P Commits Some Early Writes

Before:
P: <inf,1,P> <inf,2,A> <inf,3,A> <inf,4,P> <inf,8,P> <inf,10,A>   [8,10,0]

After:
P: <1,1,P> <2,2,A> <3,3,A> <inf,4,P> <inf,8,P> <inf,10,A>         [8,10,0]
A: <inf,1,P> <inf,2,A> <inf,3,A> <inf,4,P> <inf,8,P> <inf,10,A>   [8,10,0]
B: <inf,1,B> <inf,5,B> <inf,9,B>    [0,0,9]

P and B Do Anti-Entropy Exchange

Before:
P: <1,1,P> <2,2,A> <3,3,A> <inf,4,P> <inf,8,P> <inf,10,A>   [8,10,0]
B: <inf,1,B> <inf,5,B> <inf,9,B>    [0,0,9]

After:
P: <1,1,P> <2,2,A> <3,3,A> <inf,1,B> <inf,4,P> <inf,5,B> <inf,8,P> <inf,9,B> <inf,10,A>   [8,10,9]
A: <inf,1,P> <inf,2,A> <inf,3,A> <inf,4,P> <inf,8,P> <inf,10,A>   [8,10,0]
B: <1,1,P> <2,2,A> <3,3,A> <inf,1,B> <inf,4,P> <inf,5,B> <inf,8,P> <inf,9,B> <inf,10,A>   [8,10,9]

P Commits More Writes

Before:
P: <1,1,P> <2,2,A> <3,3,A> <inf,1,B> <inf,4,P> <inf,5,B> <inf,8,P> <inf,9,B> <inf,10,A>   [8,10,9]

After:
P: <1,1,P> <2,2,A> <3,3,A> <4,1,B> <5,4,P> <6,5,B> <7,8,P> <inf,9,B> <inf,10,A>           [8,10,9]

Bayou Summary

A simple gossip-based design. The key difference: it exploits knowledge of application semantics, both to identify conflicts and to handle merges. This means greater complexity for the programmer, but it might be useful in a ubicomp context.

Important Lessons

- ACID vs. BASE: understand the tradeoffs you are making. ACID makes things better for the programmer/system designer; BASE is often preferred by users.
- Client-centric consistency: different guarantees than data-centric consistency.
- Eventual consistency: a BASE-like design gives better performance/availability, but the system must be designed to tolerate it. Bayou is a good example of making that tolerance explicit.

Vector Clocks

Vector clock Vi at process pi is an array of N integers:
- Vi[i] is the number of events that pi has timestamped
- Vi[j] (j ≠ i) is the number of events at pj that pi has been affected by

Rules:
1. Initially Vi[j] = 0 for i, j = 1, 2, ..., N
2. Before pi timestamps an event, it sets Vi[i] := Vi[i] + 1
3. pi piggybacks t = Vi on every message it sends
4. When pi receives (m, t), it sets Vi[j] := max(Vi[j], t[j]) for j = 1, 2, ..., N (then, before the next event, it adds 1 to its own element using rule 2)
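The four rules above translate directly into code. The class and method names are mine; the logic follows the rules on this slide, and the small run at the bottom reproduces the p1/p2 example on the next slide:

```python
# Vector clock per the four rules: all-zeros init, tick-before-event,
# piggyback on send, pointwise max (then tick) on receive.

class VectorClock:
    def __init__(self, i, n):
        self.i = i
        self.v = [0] * n          # rule 1: initially all zeros

    def event(self):
        self.v[self.i] += 1       # rule 2: tick before timestamping
        return list(self.v)

    def send(self):
        self.event()              # sending is itself an event
        return list(self.v)       # rule 3: piggyback t = Vi

    def receive(self, t):
        # rule 4: pointwise max with the piggybacked timestamp...
        self.v = [max(a, b) for a, b in zip(self.v, t)]
        self.event()              # ...then tick for the receive event

p1, p2 = VectorClock(0, 3), VectorClock(1, 3)
a = p1.event()        # event a at p1
m1 = p1.send()        # event b; piggyback its timestamp on m1
p2.receive(m1)        # max then tick own element
print(a, m1, p2.v)    # [1, 0, 0] [2, 0, 0] [2, 1, 0]
```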

Vector Clocks

At p1: a occurs at (1,0,0); b occurs at (2,0,0); piggyback (2,0,0) on m1.
At p2: on receipt of m1, use max((0,0,0), (2,0,0)) = (2,0,0) and add 1 to its own element, giving (2,1,0).

Meaning of =, <=, max, etc. for vector timestamps: compare elements pairwise.

Vector Clocks

Note that e → e' implies V(e) < V(e'). The converse is also true. Can you see a pair of parallel events? c || e (parallel), because neither V(c) <= V(e) nor V(e) <= V(c).

Bayou

[Figure: three nodes N0, N1, N2, each with a write log (empty) and a version vector <0:0, 1:0, 2:0>.]

Bayou Propagation

[Figure: N0's write log holds 1:0 W(x), 2:0 W(y), 3:0 W(z) with version vector <0:3, 1:0, 2:0>; N1's log holds 1:1 W(x) with version vector <0:0, 1:1, 2:0>; N2's log is empty with version vector <0:0, 1:0, 2:0>. N0 propagates its log to N1.]

Bayou Propagation

[Figure: after anti-entropy with N0, N1's log is 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z) and its version vector is <0:3, 1:4, 2:0>; N0 and N2 are unchanged.]

Bayou Propagation

[Figure: N1 then syncs with N0, whose log becomes 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z) with version vector <0:4, 1:4, 2:0>; N2 is still empty.]

Which portion of the log is stable?

Bayou Propagation

[Figure: all three nodes now hold the log 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z); the version vectors are <0:3, 1:4, 2:0>, <0:4, 1:4, 2:0>, and <0:3, 1:4, 2:5>.]

Bayou Propagation

[Figure: further exchanges bring the version vectors to <0:3, 1:6, 2:5>, <0:4, 1:4, 2:0>, and <0:4, 1:4, 2:5>; every log holds 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z).]

Bayou Propagation

[Figure: with commit stamps, N0's log holds the committed writes 1:1:0 W(x), 2:2:0 W(y), 3:3:0 W(z) with version vector <0:3, 1:0, 2:0>; N1 holds the tentative write inf:1:1 W(x) with version vector <0:0, 1:1, 2:0>; N2 is empty.]

Bayou Propagation

[Figure: after anti-entropy, N0's log is 1:1:0 W(x), 2:2:0 W(y), 3:3:0 W(z), 4:1:1 W(x) with version vector <0:4, 1:1, 2:0>: the primary has committed N1's tentative write inf:1:1 W(x) as 4:1:1 W(x).]