
Slide1

Network algorithms

Presenter: Kurchi Subhra Hazra

Slide2

Agenda

Basic Algorithms such as Leader Election

Consensus in Distributed Systems

Replication and Fault Tolerance in Distributed Systems

GFS as an example of a Distributed System

Slide3

Network Algorithms

A distributed system is a collection of entities, each of which is autonomous, asynchronous, and failure-prone, communicating through unreliable channels to perform some common function.

Network algorithms enable such distributed systems to effectively perform these "common functions".

Slide4

Global State in Distributed Systems

We want to estimate a "consistent" state of a distributed system.

Required for determining whether the system is deadlocked or terminated, and for debugging.

Two approaches:

1. Centralized: all processes and channels report to a central process
2. Distributed: the Chandy-Lamport algorithm

Slide5

Chandy-Lamport Algorithm

Based on marker messages M.

On receiving M over channel c:

If own state is not recorded:
a) Record own state
b) Start recording the state of incoming channels
c) Send marker messages on all outgoing channels
Else:
a) Record the state of c
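
A minimal sketch of this marker rule in Python follows. The Process class, the channel ids, and capture_state() are illustrative scaffolding, not part of the original algorithm description.

# Sketch of the Chandy-Lamport marker rule above. Process, channel ids,
# send(), and capture_state() are hypothetical scaffolding.
class Process:
    MARKER = "MARKER"

    def __init__(self, pid, incoming, outgoing, send):
        self.pid = pid
        self.incoming = set(incoming)    # ids of incoming channels
        self.outgoing = list(outgoing)   # ids of outgoing channels
        self.send = send                 # send(channel_id, message)
        self.local_state = None          # own snapshot, once recorded
        self.recording = set()           # incoming channels still being recorded
        self.channel_state = {}          # channel id -> messages caught in flight

    def capture_state(self):
        return {"pid": self.pid}         # placeholder for real application state

    def handle(self, channel, msg):
        if msg == self.MARKER:
            if self.local_state is None:
                self.local_state = self.capture_state()       # a) record own state
                self.recording = self.incoming - {channel}    # b) record other incoming channels
                self.channel_state = {ch: [] for ch in self.incoming}
                # the channel the first marker arrived on is recorded as empty
                for ch in self.outgoing:                      # c) marker on every outgoing channel
                    self.send(ch, self.MARKER)
            else:
                # marker arrives after own state is recorded: stop recording
                # this channel; its state is whatever was buffered so far
                self.recording.discard(channel)
        elif channel in self.recording:
            # ordinary message that was in flight when the snapshot started
            self.channel_state[channel].append(msg)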

Slide6

Chandy-Lamport Algorithm

[Space-time diagram: processes P1, P2, P3 with events e1, e2, e3, application messages a and b, and marker messages M flowing over channels Ch12, Ch13, Ch21, Ch23, Ch31, Ch32.]

1. P1 initiates snapshot: records its state (S1); sends Markers to P2 & P3; turns on recording for channels Ch21 and Ch31
2. P2 receives Marker over Ch12, records its state (S2), sets state(Ch12) = {}; sends Marker to P1 & P3; turns on recording for channel Ch32
3. P1 receives Marker over Ch21, sets state(Ch21) = {a}
4. P3 receives Marker over Ch13, records its state (S3), sets state(Ch13) = {}; sends Marker to P1 & P2; turns on recording for channel Ch23
5. P2 receives Marker over Ch32, sets state(Ch32) = {b}
6. P3 receives Marker over Ch23, sets state(Ch23) = {}
7. P1 receives Marker over Ch31, sets state(Ch31) = {}

Taken from CS 425/UIUC/Fall 2009

Slide7

Leader Election

Suppose you want to:
- elect a master server out of n servers
- elect a coordinator among different mobile systems

Common leader election algorithms:
- Ring Election
- Bully Election

Two requirements:
Safety (the process with the best attribute is elected)
Liveness (the election terminates)

Slide8

Ring Election

Processes are organized in a ring.

A process starts an election by sending a message clockwise to the next process in the ring, carrying its id and its attribute value.

The next process checks the election message:
If its own attribute value is greater, it replaces the candidate in the message with its own id and attribute value.
If its own attribute value is less, it simply passes on the message.
If the id in the message is its own (the message has come back around), it declares itself the leader and passes on an "elected" message.

What happens when a node fails?
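
A minimal sketch of the forwarding rule above (essentially the Chang-Roberts algorithm), assuming a failure-free ring; the (id, attribute) pairs and the in-memory driver are illustrative.

# Sketch of the ring-election rule above; node failures are not handled.
def handle_election(my_id, my_attr, msg):
    """Return the message this node forwards clockwise."""
    msg_id, msg_attr = msg
    if msg_id == my_id:                  # our own entry came back: best attribute
        return ("ELECTED", my_id)        # declare leadership, circulate "elected"
    if msg_attr > my_attr:               # candidate in the message is better
        return (msg_id, msg_attr)        # pass it on unchanged
    return (my_id, my_attr)              # we are better: replace the candidate

ring = [(1, 10), (2, 40), (3, 25), (4, 5)]   # (process id, attribute value)
msg, i = (1, 10), 1                          # node 1 initiates with itself
while msg[0] != "ELECTED":
    node_id, attr = ring[i % len(ring)]
    msg = handle_election(node_id, attr, msg)
    i += 1
print(msg)   # ('ELECTED', 2): node 2 holds the highest attribute, 40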

Slide9

Ring Election - Example

Taken from CS 425/UIUC/Fall 2009

Slide10

Ring Election - Example

Taken from CS 425/UIUC/Fall 2009

Slide11

Bully Algorithm

Best case and worst case scenarios

Taken from CS 425/UIUC/Fall 2009

Slide12

Consensus

A set of n processes/systems attempt to “agree” on some information

Each process Pi begins in an undecided state and proposes a value vi ∈ D.

The Pi's communicate by exchanging values.

Each Pi sets its decision value di and enters the decided state.

Requirements:

1. Termination: eventually all correct processes decide, i.e., each correct process sets its decision variable.
2. Agreement: the decision value of all correct processes is the same.
3. Integrity: if all correct processes proposed v, then any correct decided process has di = v.

Slide13

2 Phase Commit Protocol

Useful in distributed transactions to perform an atomic commit.

Atomic commit: a set of distinct changes applied as a single operation.

Suppose A transfers $300 from A's account to B's bank account:

A = A - 300
B = B + 300

For consistency, either both operations must take effect or neither.
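
A minimal coordinator-side sketch of the two phases; can_commit/do_commit mirror the canCommit/doCommit messages on the next slide, while the participant interface and the abort path are assumptions.

# Coordinator-side sketch of 2PC. The participant objects are hypothetical.
def two_phase_commit(participants, txn):
    # Phase 1 (voting): every participant must vote Yes
    if all(p.can_commit(txn) for p in participants):
        # Phase 2 (completion): tell everyone to commit
        for p in participants:
            p.do_commit(txn)
        return "COMMITTED"
    # Any No vote (or a timeout, not modelled here) aborts the transaction
    for p in participants:
        p.do_abort(txn)
    return "ABORTED"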

Slide14

2 Phase Commit Protocol

What happens if the coordinator and a participant fail after doCommit?

Slide15

Issue with 2PC

The coordinator asks A and B: canCommit?

Slide16

Issue with 2PC

A and B reply: Yes.

Slide17

Issue with 2PC

The coordinator sends doCommit. A crashes; the coordinator also crashes; B commits.

A new coordinator cannot know whether A had committed.

Slide18

3 Phase Commit Protocol (3PC)

Use an additional stage.

Slide19

3PC Cont

Message flow between the coordinator and cohorts 1-3:

1. Coordinator -> cohorts: canCommit
2. Cohorts -> coordinator: ack
3. Coordinator -> cohorts: preCommit
4. Cohorts -> coordinator: ack
5. Coordinator -> cohorts: commit
6. Cohorts commit

Slide20

3PC Cont

Why is this better?

2PC: execute the transaction when everyone is willing to COMMIT it.

3PC: execute the transaction when everyone knows it will COMMIT.
(http://www.coralcdn.org/07wi-cs244b/notes/l4d.txt)

But 3PC is expensive: timeouts are triggered by slow machines.
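
A coordinator-side sketch showing where the extra preCommit stage fits; the message names follow the earlier slide's diagram, and the cohort objects are an assumption.

# Coordinator-side sketch of 3PC. canCommit/preCommit/commit mirror the
# message names on the flow above; the cohort interface is hypothetical.
def three_phase_commit(cohorts, txn):
    if not all(c.can_commit(txn) for c in cohorts):   # Phase 1: voting
        for c in cohorts:
            c.abort(txn)
        return "ABORTED"
    for c in cohorts:                                 # Phase 2: the extra stage --
        c.pre_commit(txn)                             # every cohort now knows the
                                                      # outcome will be commit
    # A cohort that acked preCommit and then times out can commit safely,
    # which avoids the blocking scenario shown for 2PC.
    for c in cohorts:                                 # Phase 3: actually commit
        c.commit(txn)
    return "COMMITTED"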

Slide21

Paxos Protocol

A consensus algorithm.

Important safety conditions:
Only one value is chosen.
Only a proposed value is chosen.

Important liveness conditions:
Some proposed value is eventually chosen.
Given a value is chosen, a process can eventually learn the value.

Nodes behave as proposers, acceptors, and learners.

Slide22

Paxos Protocol – Phase 1

The proposer selects a number n for a proposal of value v and sends a Prepare message to the acceptors.

What about an acceptor the Prepare does not reach? A majority of acceptors is enough.

Each acceptor responds with an acknowledgement carrying the highest n it has seen.

Slide23

Paxos Protocol – Phase 2

The proposer sends proposal n to the acceptors, and a majority of acceptors agree on proposal n with value v.

Slide24

Paxos Protocol – Phase 2

Once a majority of acceptors agree on proposal n with value v, the proposer sends an Accept message and the acceptors accept.

What if v is null?
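
An acceptor-side sketch of both phases; the message tuples and field names are assumptions, and a real deployment also needs stable storage and learner notification.

# Acceptor-side sketch of Paxos Phases 1 and 2. Message shapes are
# illustrative; durability and learners are omitted.
class Acceptor:
    def __init__(self):
        self.promised_n = -1    # highest proposal number promised so far
        self.accepted = None    # (n, v) most recently accepted, if any

    def on_prepare(self, n):
        # Phase 1: promise to ignore lower-numbered proposals, and report
        # any already-accepted (n, v) so the proposer must re-propose that v
        # instead of its own value -- this is what keeps one value chosen.
        if n > self.promised_n:
            self.promised_n = n
            return ("promise", n, self.accepted)
        return ("nack", self.promised_n)

    def on_accept(self, n, v):
        # Phase 2: accept unless a higher-numbered promise has been made
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted = (n, v)
            return ("accepted", n, v)
        return ("nack", self.promised_n)

This also answers the "What if v is null?" question above: the proposer must adopt the value from the highest-numbered accepted pair returned in the promises, and only when every promise reports no accepted value is it free to propose its own v.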

Slide25

Paxos Protocol Cont

What if an arbitrary number of proposers is allowed? Two proposers P and Q can pre-empt each other at the acceptors: P's proposal n1 in round 1 is superseded by Q's higher-numbered n2 in round 2, and so on.

Slide26

Paxos Protocol Cont

What if an arbitrary number of proposers is allowed? The duel can continue indefinitely (n3 in round 3, n4 in round 4, ...) with no proposal ever accepted.

To ensure progress, use a distinguished proposer.

Slide27

Paxos Protocol Contd

Some issues:

How to choose the proposer?
How do we ensure a unique n?
Expensive protocol.
No primary if distinguished proposer used.

Originally used by the Paxons to run their part-time parliament.

Slide28

Replication

Replication is important for

Fault Tolerance

Load Balancing

Increased Availability

Requirements:

Transparency

Consistency

Slide29

Failure in Distributed Systems

An important consideration in every design decision

Fault detectors should be :

Complete – should be able to detect a fault when it occurs

Accurate – does not raise false positives

Slide30

Byzantine Faults

Arbitrary messages and transitions.

Causes: e.g., software bugs, malicious attacks.

Byzantine Agreement Problem: "Can a set of concurrent processes achieve coordination in spite of the faulty behavior of some of them?"

The concurrent processes could be replicas in a distributed system.

Slide31

Practical Byzantine Fault Tolerance (PBFT)

A replication algorithm that is able to tolerate Byzantine faults.

Useful for software faults.

Why "Practical"? Because it can be used in an asynchronous environment like the Internet.

Important assumptions:

At most f nodes can be faulty (out of n = 3f + 1 replicas).
All replicas start in the same state.
Failures are independent – practical?

Slide32

PBFT Cont..

Normal-case message flow: client C sends a request to the primary replica R1; replicas R1–R4 exchange pre-prepare, prepare, and commit messages; all replicas then reply to the client.

C: Client
R1: Primary replica

The client blocks and waits for f + 1 matching replies.
A replica proceeds after accepting 2f prepares.
Execution happens after 2f + 1 commits.
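
A small sketch of these quorum thresholds, assuming n = 3f + 1 replicas; the helper names are illustrative, and real PBFT also validates views, sequence numbers, and message digests.

# Quorum thresholds from the slide above, for n = 3f + 1 replicas.
# Helper names are illustrative; message validation is omitted.
F = 1                  # assumed number of faulty replicas tolerated
N = 3 * F + 1          # minimum replica count for that f

def is_prepared(prepare_count):
    return prepare_count >= 2 * F          # matching pre-prepare + 2f prepares

def can_execute(commit_count):
    return commit_count >= 2 * F + 1       # execute after 2f + 1 commits

def client_accepts(matching_replies):
    # f + 1 matching replies guarantee at least one comes from a correct
    # replica, so the client can trust the result
    return matching_replies >= F + 1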

Slide33

PBFT Cont

The algorithm provides:

-> Safety: by guaranteeing linearizability; pre-prepare and prepare ensure a total order on messages.

-> Liveness: by providing for view change when the primary replica fails. Here, synchrony is assumed.

How do we know the value of f a priori?

Slide34

Google File System

Revisited traditional file system design

1. Component failures are the norm
2. Multi-GB files are common
3. Files are mutated by appending new data
4. Relaxed consistency model

Slide35

GFS Architecture

Leader Election/ Replication

Maintains metadata, namespace, chunk metadata

etc.

Slide36

GFS – Relaxed Consistency

Slide37

GFS – Design Issues

Single Master

Rationale: keep things simple

Problems:
Increasing volume of underlying storage -> increase in metadata
Clients not as fast as the master server -> the master server became a bottleneck

Current: multiple masters per data center

Ref: http://queue.acm.org/detail.cfm?id=1594206

Slide38

GFS Design Issues

Replication of chunks: replication across racks – the default number of replicas is 3.

Allowing concurrent changes to the same file:
-> In retrospect, they would rather have had a single writer.

The primary replica serializes mutations to chunks.
- They do not use any of the consensus protocols before applying mutations to the chunks.

Ref: http://queue.acm.org/detail.cfm?id=1594206

Slide39

Thank You