
Replication and Distribution - PowerPoint Presentation

Uploaded On 2018-01-31




Presentation Transcript

Slide1

Replication and Distribution

CSE 444

Spring 2012

University of Washington

Slide2

HASH MAPS

Slide3

Hash Maps

Precursors to Bloom filters.

Used to reduce communication while joining.

S = Set to transmit.

$S = \{x_1, x_2, \ldots, x_n\}$

H = Hash Map.

An array of $m$ bits.

Slide4

Operation

To insert x in H:

Compute the hash of x to get a bit position j.

Set bit j to 1.

To send S, insert all of its elements in H.
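The insert and send steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `HashMap`, the helper `build`, and the use of SHA-256 reduced modulo m as the hash function are choices made for this sketch, not something the slides specify.

```python
import hashlib


class HashMap:
    """An m-bit array with a single hash function."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.bits = bytearray(m)  # one byte per bit, kept simple for readability

    def _position(self, x: str) -> int:
        # Assumption: SHA-256 modulo m stands in for the slide's hash function.
        digest = hashlib.sha256(x.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % self.m

    def insert(self, x: str) -> None:
        # Compute the bit position j for x and set bit j to 1.
        self.bits[self._position(x)] = 1

    def might_contain(self, x: str) -> bool:
        # True may be a false positive; False is always correct.
        return self.bits[self._position(x)] == 1


def build(elements, m: int) -> HashMap:
    """Insert every element of S into a fresh hash map, ready to be sent."""
    h = HashMap(m)
    for x in elements:
        h.insert(x)
    return h
```

The receiver would call `might_contain` on each of its own elements and treat every hit as a candidate match that still has to be confirmed.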

Two distinct elements can hash to the same position.

This creates false positives.

Slide5

Question

Data supplier R has N = 1 million documents. Data supplier S also has N = 1 million documents. Each document is 1 KB. They have 50 documents in common and they want to compute these. They will proceed as follows:

1. R computes a hash map M with cN bits, where c = 8, and sends it to S.

2. S checks its items in M and sends all matches to R.

3. R computes the result and sends the matching 50 documents to S.

Q: Indicate the total number of bytes transferred over the network in each step.

Slide6

Analysis

Recall |H| = m.

Insert one element into H.

Probability that bit j remains 0?

$p = 1 - 1/m$

0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Slide7

Analysis

Recall |H| = m.

Insert all n elements into H.

Probability that bit j remains 0?

$p = (1 - 1/m)^n \approx e^{-n/m}$ (for large m)
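A quick numerical check shows how close the exponential approximation is to the exact expression. The function names below are hypothetical and chosen only for this sketch.

```python
import math


def prob_bit_zero_exact(n: int, m: int) -> float:
    """Probability that a given bit is still 0 after n inserts: (1 - 1/m)^n."""
    return (1.0 - 1.0 / m) ** n


def prob_bit_zero_approx(n: int, m: int) -> float:
    """Large-m approximation: e^(-n/m)."""
    return math.exp(-n / m)


if __name__ == "__main__":
    # Keep n/m fixed at 1/8 and let m grow to watch the gap shrink.
    for m in (8, 80, 8_000, 8_000_000):
        n = m // 8
        exact = prob_bit_zero_exact(n, m)
        approx = prob_bit_zero_approx(n, m)
        print(f"m={m:>9}  exact={exact:.6f}  approx={approx:.6f}")
```

For very small m the two values differ visibly; at the sizes used in the exercise (millions of bits) they agree to many decimal places.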

0 1 0 0 1 1 0 1 0 0 0 0 1 1 0 0 0 1 0 0

Slide8

Probability of False Positives

Take a random element y, and check whether the bit at its hash position is set to 1 in H.

Probability of FP = probability that this bit is 1.

Probability that bit j is 1?

$p = 1 - (1 - 1/m)^n \approx 1 - e^{-n/m}$ (for large m)
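The false-positive estimate can also be checked empirically. The sketch below assumes an idealized hash function that sends each element to a uniformly random bit position, which it models with Python's `random` module; the function names are hypothetical.

```python
import math
import random


def simulated_fp_rate(n: int, m: int, seed: int = 0) -> float:
    """Insert n elements at uniformly random positions of an m-bit array.

    A random non-member also hashes to a uniformly random position, so its
    chance of being a false positive equals the fraction of bits set to 1.
    """
    rng = random.Random(seed)
    bits = bytearray(m)
    for _ in range(n):
        bits[rng.randrange(m)] = 1
    return sum(bits) / m


def predicted_fp_rate(n: int, m: int) -> float:
    """Large-m estimate: 1 - e^(-n/m)."""
    return 1.0 - math.exp(-n / m)


if __name__ == "__main__":
    n, m = 10_000, 80_000  # same ratio n/m = 1/8 as in the exercise
    print(simulated_fp_rate(n, m), predicted_fp_rate(n, m))
```

With n/m = 1/8 both numbers should land near 0.12; the simulated value varies slightly with the seed.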

0 1 0 0 1 1 0 1 0 0 0 0 1 1 0 0 0 1 0 0

Slide9

Question

Data supplier R has N = 1 million documents. Data supplier S also has N = 1 million documents. Each document is 1 KB. They have 50 documents in common and they want to compute these. They will proceed as follows:

1. R computes a hash map M with cN bits, where c = 8, and sends it to S.

2. S checks its items in M and sends all matches to R.

3. R computes the result and sends the matching 50 documents to S.

Indicate the total number of bytes transferred over the network in each step.

Slide10

Solution

Step 1: Send the hash map.

cN bits = 8 million bits = 1 million bytes = 1 MB.

Step 2: Number of matched documents (including false positives)

FP rate = $1 - e^{-n/m} = 1 - e^{-1/8} \approx 11.75\%$, rounded to 11% below

110,000 false positive documents

110,050 documents in total (including the 50 common ones)

110.05 MB

Step 3: Send the 50 common documents.

50 documents = 50 KB

Total of 111.1 MB
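The arithmetic of the three steps can be reproduced with a short script. This is a sketch under assumptions that follow the slide's own rounding: 1 KB is taken as 1,000 bytes, 1 MB as 1,000,000 bytes, and the false-positive rate is applied to all N documents of S. The function name `transfer_bytes` and its parameters are hypothetical.

```python
import math


def transfer_bytes(n_docs=1_000_000, doc_bytes=1_000, common=50, c=8, fp_rate=None):
    """Return the bytes sent in steps 1, 2 and 3 of the protocol."""
    m_bits = c * n_docs
    if fp_rate is None:
        # Large-m estimate of the false-positive rate: 1 - e^(-n/m).
        fp_rate = 1.0 - math.exp(-n_docs / m_bits)
    step1 = m_bits // 8                          # hash map, bits to bytes
    false_positives = round(fp_rate * n_docs)    # documents S sends by mistake
    step2 = (false_positives + common) * doc_bytes
    step3 = common * doc_bytes                   # R returns the true matches
    return step1, step2, step3


if __name__ == "__main__":
    for label, rate in (("rounded to 11%", 0.11), ("unrounded", None)):
        steps = transfer_bytes(fp_rate=rate)
        print(label, steps, sum(steps) / 1_000_000, "MB")
```

With the rate rounded to 11% the total matches the 111.1 MB above; without the rounding the same arithmetic gives about 118.6 MB, which is still roughly a tenth of the naïve transfer.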

The naïve solution without hash maps takes 1 GB of data transfer.

Slide11

Distributed locking

Slide12

Setup

[Figure: five sites, each labeled with its share of the workload]

One site: 50% read only, 2% writes

Four other sites, each: 10% read only, 2% writes

Each site can communicate with every other site.

Slide13

Read-locks-one, write-locks-all

What is the average number of inter-site messages exchanged?

All reads are local, so they need no inter-site lock messages.

Each write requires locks at the 4 other sites.

Slide14

Majority locking

What is the average number of inter-site messages?

2 other locks needed for both reads and writes.

What if you could broadcast across sites with 1 message?

Lock acquisition and release each take 1 message for all sites.

Lock grants still take 1 message per site.

Slide15

Primary-copy locking

What is the average number of inter-site messages?

Sites other than the primary need to acquire a remote lock for each operation.

48% of the actions (those issued at the four non-primary sites) need remote locks.

Slide16
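The three locking schemes on the preceding slides can be compared with a small calculation of the expected number of remote locks per action. This sketch rests on several assumptions that the slides do not state outright: the workload figures are fractions of all actions, the primary copy lives at the site with 50% reads, and each remote lock costs three messages (request, grant, release). All names are hypothetical.

```python
# (read share, write share) per site, as fractions of all actions.
SITES = [(0.50, 0.02), (0.10, 0.02), (0.10, 0.02), (0.10, 0.02), (0.10, 0.02)]

# Assumption: one request, one grant and one release message per remote lock.
MSGS_PER_REMOTE_LOCK = 3


def read_one_write_all(sites):
    """Reads lock only the local copy; each write locks every other site."""
    others = len(sites) - 1
    writes = sum(w for _, w in sites)
    return writes * others


def majority(sites):
    """Every action locks a majority of copies: the local one plus the rest."""
    remote_needed = len(sites) // 2  # majority of 5 is 3, so 2 remote locks
    actions = sum(r + w for r, w in sites)
    return actions * remote_needed


def primary_copy(sites, primary=0):
    """Every action issued away from the primary needs one remote lock."""
    return sum(r + w for i, (r, w) in enumerate(sites) if i != primary)


if __name__ == "__main__":
    for name, scheme in (("read-one/write-all", read_one_write_all),
                         ("majority", majority),
                         ("primary copy", primary_copy)):
        locks = scheme(SITES)
        print(f"{name}: {locks:.2f} remote locks, "
              f"{locks * MSGS_PER_REMOTE_LOCK:.2f} messages per action")
```

Under these assumptions the sketch gives about 0.4, 2 and 0.48 remote locks per action respectively; a different message count per lock only scales the three results by the same factor.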

Two-phase commit

Slide17

Two-Phase Commit

Coordinator: 0

Three subordinates: {1, 2, 3}

Messages

P (Prepare)

C (Commit)

A (Abort)

Y (Yes vote)

N (No vote)

Ignore acks.

Slide18

2PC

What messages are exchanged for a successful commit?

(0,1,P), (0,2,P), (0,3,P), (1,0,Y), (2,0,Y), (3,0,Y), (0,1,C), (0,2,C), (0,3,C)
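Reading each triple above as (sender, receiver, message), the exchange can be generated by a short function. This is a simplified sketch, not a full protocol implementation: it assumes the coordinator collects every vote before deciding, sends its decision to all subordinates, and ignores acks, logging and failures. The function name and signature are hypothetical.

```python
def two_phase_commit(votes, coordinator=0):
    """Return the (sender, receiver, message) triples for one 2PC round.

    votes maps each subordinate id to "Y" (Yes vote) or "N" (No vote).
    """
    subordinates = sorted(votes)
    # Phase 1: the coordinator sends Prepare, each subordinate replies with its vote.
    messages = [(coordinator, s, "P") for s in subordinates]
    messages += [(s, coordinator, votes[s]) for s in subordinates]
    # Phase 2: Commit only if every vote is Yes, otherwise Abort.
    decision = "C" if all(votes[s] == "Y" for s in subordinates) else "A"
    # Simplification: the decision goes to every subordinate, including No voters.
    messages += [(coordinator, s, decision) for s in subordinates]
    return messages


if __name__ == "__main__":
    for message in two_phase_commit({1: "Y", 2: "Y", 3: "Y"}):
        print(message)
```

Calling it with three Yes votes reproduces the nine messages listed above; changing any vote to "N" turns the last three messages into aborts.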

When exactly does the commit occur?

When the coordinator force-writes the commit record.

Slide19

2PC (continued)

If the coordinator has sent all the prepare messages but has not yet received a vote from site 1, can it abort the transaction at this point, and send abort messages to the subordinates?

If the coordinator has sent all the prepare messages, received a No vote from site 1, but has not yet received the votes of sites 2 and 3, should it wait for the two missing votes, or should it proceed to abort?

If site 1 has received a prepare message and voted Yes, but has not received any commit or abort messages, and site 1 contacts all other subordinates and discovers that they have all voted Yes, can site 1 commit the transaction?