Slide1
Replication and Distribution
CSE 444
Spring 2012
University of Washington
Slide2
HASH MAPS
Slide3
Hash Maps
Precursors to Bloom filters.
Used to reduce communication while joining.
S = set to transmit.
$S = \{x_1, x_2, \dots, x_n\}$
H = hash map.
An array of m bits.
Slide4
Operation
To insert x in H:
Compute the hash of x to get a bit position j.
Set bit j to 1.
To send S, insert all of its elements in H.
Two distinct elements can hash to the same position.
This creates false positives.
Slide5
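The insert and lookup operations from the Operation slide can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: the class name `BitHashMap`, the helper `build`, and the use of SHA-256 as the hash function are all assumptions, since the slides do not name a hash function.

```python
import hashlib


class BitHashMap:
    """A single-hash bit array: m bits, one bit position per element."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.bits = bytearray((m + 7) // 8)  # m bits packed into bytes

    def _position(self, x: str) -> int:
        # Assumption: SHA-256 stands in for the unspecified hash function.
        digest = hashlib.sha256(x.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def insert(self, x: str) -> None:
        j = self._position(x)
        self.bits[j // 8] |= 1 << (j % 8)  # set bit j to 1

    def might_contain(self, y: str) -> bool:
        # True means "bit is 1": y is in S, or y collides with some element of S.
        j = self._position(y)
        return bool(self.bits[j // 8] & (1 << (j % 8)))


def build(elements, m: int) -> BitHashMap:
    """Insert every element of S into a fresh map H, ready to transmit."""
    h = BitHashMap(m)
    for x in elements:
        h.insert(x)
    return h
```

A lookup never misses an inserted element, but it can report an element that was never inserted. The extreme case `m = 1` makes this visible: after a single insert, `might_contain` returns `True` for every input.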
Question
Data supplier R has N = 1 million documents. Data supplier S also has N = 1 million documents. Each document is 1 KB. They have 50 documents in common and they want to compute these. They will proceed as follows:
1. R computes a hash map M with cN bits, where c = 8, and sends it to S.
2. S checks its items in M and sends all matches to R.
3. R computes the result and sends the matching 50 documents to S.
Q: Indicate the total number of bytes transferred over the network in each step.
Slide6
Analysis
Recall |H| = m.
Insert one element into H.
Probability that bit j remains 0?
$p = 1 - 1/m$
H = [0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Slide7
Analysis
Recall |H| = m.
Insert all n elements into H.
Probability that bit j remains 0?
$p = (1 - 1/m)^n \approx e^{-n/m}$ (for large m)
H = [0 1 0 0 1 1 0 1 0 0 0 0 1 1 0 0 0 1 0 0]
Slide8
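The step from $(1 - 1/m)^n$ to $e^{-n/m}$ is not spelled out on the slide. One way to see it, using the standard limit for $e^{-1}$:

```latex
\left(1 - \frac{1}{m}\right)^{n}
  = \left[\left(1 - \frac{1}{m}\right)^{m}\right]^{n/m}
  \approx \left(e^{-1}\right)^{n/m}
  = e^{-n/m},
\qquad\text{since } \lim_{m \to \infty}\left(1 - \frac{1}{m}\right)^{m} = e^{-1}.
```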
Probability of False Positives
Take a random element y, and check if its hash position is set to 1 in H.
Probability of FP = probability that the hashed bit is 1.
Probability that bit j is 1?
$p = 1 - (1 - 1/m)^n \approx 1 - e^{-n/m}$ (for large m)
H = [0 1 0 0 1 1 0 1 0 0 0 0 1 1 0 0 0 1 0 0]
Slide9
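As a quick numerical check of the false-positive formula, the exact expression and its large-m approximation can be compared directly. The function names below are illustrative only.

```python
import math


def fp_rate_exact(n: int, m: int) -> float:
    """Probability that a given bit is 1 after n insertions into m bits."""
    return 1.0 - (1.0 - 1.0 / m) ** n


def fp_rate_approx(n: int, m: int) -> float:
    """Large-m approximation 1 - e^(-n/m)."""
    return 1.0 - math.exp(-n / m)


# Example: n = 1 million elements in m = 8 million bits.
print(fp_rate_exact(1_000_000, 8_000_000))
print(fp_rate_approx(1_000_000, 8_000_000))
```

For n = 1 million and m = 8 million, both expressions evaluate to about 0.1175, so the approximation is very tight at this scale.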
Question
Data supplier R has N = 1 million documents. Data supplier S also has N = 1 million documents. Each document is 1 KB. They have 50 documents in common and they want to compute these. They will proceed as follows:
1. R computes a hash map M with cN bits, where c = 8, and sends it to S.
2. S checks its items in M and sends all matches to R.
3. R computes the result and sends the matching 50 documents to S.
Indicate the total number of bytes transferred over the network in each step.
Slide10
Solution
Step 1: Send the hash map.
cN bits = 8 million bits = 1 million bytes = 1 MB.
Step 2: Number of matched documents (including false positives).
FP rate = $1 - e^{-n/m} = 1 - e^{-1/8}$ ≈ 11% (truncated from about 11.75%).
110,000 false positive documents.
110,050 documents in total (including the 50 common ones).
110.05 MB.
Step 3: Send the 50 matching documents.
50 documents = 50 KB.
Total of 111.1 MB.
The naïve solution without hash maps takes 1 GB of data transfer.
Slide11
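The arithmetic in the Solution slide can be reproduced with a short script. This is a sketch under stated assumptions: the function name `transfer_bytes` is hypothetical, 1 KB is taken as 1,000 bytes (which matches the slide's "1 million bytes = 1 MB"), and the number of false positives is computed against all N documents of S, as the slide does, rather than the N - 50 non-matching ones.

```python
import math


def transfer_bytes(n_docs, doc_bytes, common, c, fp_rate):
    """Bytes sent in each of the three steps of the hash-map join."""
    step1 = c * n_docs // 8                     # hash map of c*N bits, in bytes
    false_positives = round(fp_rate * n_docs)   # docs of S that hit a 1 bit by chance
    step2 = (false_positives + common) * doc_bytes
    step3 = common * doc_bytes
    return step1, step2, step3


# Parameters from the Question slide; 1 KB is taken as 1,000 bytes.
N, DOC_BYTES, COMMON, C = 1_000_000, 1_000, 50, 8

rounded = transfer_bytes(N, DOC_BYTES, COMMON, C, fp_rate=0.11)
unrounded = transfer_bytes(N, DOC_BYTES, COMMON, C, fp_rate=1 - math.exp(-1 / C))
naive = N * DOC_BYTES  # one side ships its entire collection

print(rounded, sum(rounded))
print(unrounded, sum(unrounded))
print(naive)
```

With the slide's 11% rate the three steps add up to 111.1 MB. With the untruncated rate of about 11.75% the total comes to roughly 118.6 MB. Either way the transfer is about a ninth of the 1 GB needed to ship one whole collection.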
Distributed locking
Slide12
Setup
Five sites with the following shares of the total workload:
50% read only, 2% writes
10% read only, 2% writes
10% read only, 2% writes
10% read only, 2% writes
10% read only, 2% writes
Each site can communicate with every other site.
Slide13
Read-locks-one, write-locks-all
What is the average number of inter-site messages exchanged?
All reads are local, so they need no locks from other sites.
Each write requires locks from the 4 other sites.
Slide14
Majority locking
What is the average number of inter-site messages?
Locks from 2 other sites are needed for both reads and writes.
What if you could broadcast across sites with 1 message?
Lock acquisition and release each take 1 message for all sites.
Lock grants still take 1 message per site.
Slide15
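Primary-copy locking
What is the average number of inter-site messages?
Sites other than the primary must acquire locks from the primary site for each operation.
48% of the actions need such locks (the actions issued at the four sites with 12% of the workload each).
Slide16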
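The three locking schemes can be compared with a small calculation. The slides leave the per-lock message cost open, so the sketch below assumes that each remote lock costs 3 messages (request, grant, release) and that local locks cost none. It also assumes the primary copy sits at the site handling 52% of the workload, which is consistent with the 48% figure on the slide. All names are hypothetical.

```python
# Assumption (not stated on the slides): each remote lock costs 3 messages
# (request, grant, release), and local locks cost nothing.
MSGS_PER_REMOTE_LOCK = 3

# (read-only share, write share) for each of the five sites, from the Setup slide.
WORKLOAD = [(0.50, 0.02), (0.10, 0.02), (0.10, 0.02), (0.10, 0.02), (0.10, 0.02)]


def read_one_write_all(workload):
    """Reads lock only the local copy; a write locks the other n - 1 copies."""
    n = len(workload)
    writes = sum(w for _, w in workload)
    return writes * (n - 1) * MSGS_PER_REMOTE_LOCK


def majority(workload):
    """Every operation locks a majority: the local copy plus n // 2 remote ones."""
    n = len(workload)
    total = sum(r + w for r, w in workload)
    return total * (n // 2) * MSGS_PER_REMOTE_LOCK


def primary_copy(workload, primary=0):
    """Every operation issued away from the primary site needs one remote lock."""
    remote = sum(r + w for i, (r, w) in enumerate(workload) if i != primary)
    return remote * MSGS_PER_REMOTE_LOCK


for scheme in (read_one_write_all, majority, primary_copy):
    print(scheme.__name__, scheme(WORKLOAD))
```

Under these assumptions the averages per operation are about 1.2 messages for read-locks-one/write-locks-all, 6 for majority locking, and 1.44 for primary-copy locking. A different per-lock cost scales all three figures by the same factor.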
Two-phase commit
Slide17
Two-Phase Commit
Coordinator: 0
Three subordinates: {1, 2, 3}
Messages:
P (Prepare)
C (Commit)
A (Abort)
Y (Yes vote)
N (No vote)
Ignore acks.
Slide18
2PC
What messages are exchanged for a successful commit?
(0,1,P), (0,2,P), (0,3,P), (1,0,Y), (2,0,Y), (3,0,Y), (0,1,C), (0,2,C), (0,3,C)
When exactly does the commit occur?
When the coordinator force-writes the commit record.
Slide19
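The message list above reads as (sender, receiver, message) triples. A minimal sketch that generates such a trace for any set of votes is shown below. The function name `two_phase_commit` is hypothetical, and sending the decision to every subordinate, including any that voted No, is a simplifying assumption.

```python
def two_phase_commit(votes, coordinator=0):
    """Return the (sender, receiver, message) trace for one 2PC round.

    votes maps each subordinate id to "Y" or "N". Acks are ignored.
    """
    subs = sorted(votes)
    trace = [(coordinator, s, "P") for s in subs]        # phase 1: prepare
    trace += [(s, coordinator, votes[s]) for s in subs]  # subordinates vote
    # The coordinator commits only if every vote is Yes. In a real system it
    # force-writes the decision to its log before sending it out.
    decision = "C" if all(votes[s] == "Y" for s in subs) else "A"
    # Simplifying assumption: the decision goes to every subordinate.
    trace += [(coordinator, s, decision) for s in subs]  # phase 2: decision
    return trace


for message in two_phase_commit({1: "Y", 2: "Y", 3: "Y"}):
    print(message)
```

With three Yes votes the trace matches the nine messages listed on the slide. Changing any vote to `"N"` turns the final three messages into aborts.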
2PC (continued)
If the coordinator has sent all the prepare messages but has not yet received a vote from site 1, can it abort the transaction at this point and send abort messages to the subordinates?
If the coordinator has sent all the prepare messages and received a No vote from site 1, but has not yet received the votes of sites 2 and 3, should it wait for the two missing votes, or should it proceed to abort?
If site 1 has received a prepare message and voted Yes, but has not received any commit or abort message, and site 1 contacts all other subordinates and discovers that they have all voted Yes, can site 1 commit the transaction?