Slide1
Reconciling Differences: towards a theory of cloud complexity
George Varghese
UCSD, visiting at Yahoo! Labs
Slide2
Part 1: Reconciling Sets across a link
Joint with D. Eppstein, M. Goodrich, F. Uyeda
Appeared in SIGCOMM 2011
Slide3
Motivation 1: OSPF Routing (1990)
After a partition forms and heals, R1 needs the updates that arrived at R2 during the partition.
[Figure: routers R1 and R2; the partition between them heals]
Must solve the Set-Difference Problem!
Slide4
Motivation 2: Amazon S3 storage (2007)
Synchronizing replicas.
Periodic Anti-entropy Protocol between replicas
[Figure: replicas S1 and S2]
Set-Difference across cloud again!
Slide5
What is the Set-Difference problem?
What objects are unique to host 1?
What objects are unique to host 2?
[Figure: Host 1 and Host 2 hold objects drawn from A through F; A and F appear on both hosts]
Slide6
Use case 1: Data Synchronization
Identify missing data blocks
Transfer blocks to synchronize sets
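[Figure: Host 1 and Host 2 hold objects drawn from A through F; blocks B, C, D, E are transferred so that both hosts end up with the same set]
Slide7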
Use case 2: Data De-duplication
Identify all unique blocks.
Replace duplicate data with pointers
[Figure: Host 1 and Host 2 hold objects drawn from A through F; A and F are duplicated across the hosts]
Slide8
Prior work versus ours
Trade a sorted list of keys.
Let n be size of sets, U be size of key space
O(n log U) communication, O(n log n) computation
Bloom filters can improve to O(n) communication.
Polynomial Encodings (Minsky, Trachtenberg)
Let "d" be the size of the difference
O(d log U) communication, O(dn + d^3) computation
Invertible Bloom Filter (our result)
O(d log U) communication, O(n + d) computation
Slide9
Difference Digests
Efficiently solves the set-difference problem.
Consists of two data structures:
Invertible Bloom Filter (IBF)
Efficiently computes the set difference.
Needs the size of the difference
Strata Estimator
Approximates the size of the set difference.
Uses IBFs as a building block.
Slide10
IBFs: main idea
Sum over random subsets: Summarize a set by "checksums" over O(d) random subsets.
Subtract: Exchange and subtract checksums.
Eliminate: Hashing is used for subset choice, so common elements disappear after subtraction.
Invert fast: O(d) equations in d unknowns; randomness allows expected O(d) inversion.
Slide11
“Checksum” details
Array of IBF cells that form “checksum” words
For set difference of size d, use αd cells (α > 1)
Each element ID is assigned to many IBF cells
Each cell contains:
idSum: XOR of all IDs assigned to cell
hashSum: XOR of hash(ID) of IDs assigned to cell
count: Number of IDs assigned to cell
Slide12
IBF Encode
Assign ID to many cells
"Add" ID to cell: idSum ⊕ A, hashSum ⊕ H(A), count++
[Figure: ID A is hashed by Hash1, Hash2, Hash3 into three cells of the IBF; IDs B and C follow]
IBF: αd cells. Not O(n), like Bloom Filters!
All hosts use the same hash functions
Slide13
Invertible Bloom Filters (IBF)
Trade IBFs with remote host
[Figure: Host 1 and Host 2, holding objects drawn from A through F, exchange IBF 1 and IBF 2]
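Before IBFs can be traded, each host has to encode its own set. The following is a minimal Python sketch of the encode step from the "Checksum" details and IBF Encode slides, not the authors' implementation. The names `IBF`, `cell_indices` and `_hash`, the SHA-256-based hashing, the 64-bit integer IDs, the choice K = 3, and the split of the table into K banks (so that one ID never lands twice in the same cell) are all assumptions made for illustration.

```python
import hashlib

K = 3        # number of hash functions (assumed; the slides suggest 3 or 4)
CHECK = 255  # salt reserved for the hashSum hash H (assumed)


def _hash(key: int, salt: int) -> int:
    """Return a 64-bit hash of an integer ID; `salt` picks the hash function."""
    data = bytes([salt]) + key.to_bytes(8, "little")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "little")


def cell_indices(key: int, num_cells: int) -> list:
    """Pick K distinct cells for `key`, one in each bank of the table."""
    bank = num_cells // K
    return [i * bank + _hash(key, i) % bank for i in range(K)]


class IBF:
    def __init__(self, num_cells: int):
        if num_cells % K:
            raise ValueError("num_cells must be a multiple of K")
        self.id_sum = [0] * num_cells    # XOR of all IDs assigned to the cell
        self.hash_sum = [0] * num_cells  # XOR of H(ID) for those IDs
        self.count = [0] * num_cells     # number of IDs assigned to the cell

    def insert(self, key: int) -> None:
        """XOR the ID into each of its K cells (the slides' "Add" step)."""
        for i in cell_indices(key, len(self.count)):
            self.id_sum[i] ^= key
            self.hash_sum[i] ^= _hash(key, CHECK)
            self.count[i] += 1


ibf = IBF(num_cells=12)
for key in (101, 202, 303):
    ibf.insert(key)
print(sum(ibf.count))  # 9: three IDs, K cells each
```

The table size depends only on the expected difference d (αd cells), never on the set size n, which is why both hosts must agree on `num_cells` and on the hash functions before exchanging IBFs.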
Slide14
Invertible Bloom Filters (IBF)
“Subtract” IBF structures
Produces a new IBF containing only unique objects
[Figure: Host 1 and Host 2 hold objects drawn from A through F; IBF 1 and IBF 2 are combined into IBF (2 - 1)]
Slide15
IBF Subtract
Slide16
Disappearing act
After subtraction, elements common to both sets disappear because:
Any common element (e.g. W) is assigned to the same cells on both hosts (same hash functions on both sides)
On subtraction, W XOR W = 0. Thus, W vanishes.
While elements in the set difference remain, they may be randomly mixed, so we need a decode procedure.
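The cancellation argument above can be checked directly. This is a small sketch under assumptions of my own: cells are plain `[idSum, hashSum, count]` lists, hashing is SHA-256 based, IDs are 64-bit integers, and the table is split into K banks. The helper names `encode` and `subtract` are hypothetical. It verifies that subtracting two full IBFs gives exactly the IBF of the differences alone.

```python
import hashlib

K = 3        # number of hash functions (assumed)
CHECK = 255  # salt for the hashSum hash H (assumed)


def _hash(key, salt):
    """64-bit hash of an integer ID; `salt` selects the hash function."""
    data = bytes([salt]) + key.to_bytes(8, "little")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "little")


def encode(keys, num_cells):
    """Build an IBF as a list of [idSum, hashSum, count] cells."""
    cells = [[0, 0, 0] for _ in range(num_cells)]
    bank = num_cells // K  # one bank of cells per hash function
    for key in keys:
        for i in range(K):
            cell = cells[i * bank + _hash(key, i) % bank]
            cell[0] ^= key
            cell[1] ^= _hash(key, CHECK)
            cell[2] += 1
    return cells


def subtract(a, b):
    """Cell-wise a - b: XOR the two sums, subtract the counts."""
    return [[x[0] ^ y[0], x[1] ^ y[1], x[2] - y[2]] for x, y in zip(a, b)]


host1 = set(range(1, 1001))
host2 = (host1 - {3, 500}) | {2001, 2002}
diff = subtract(encode(host1, 30), encode(host2, 30))

# The 998 common IDs cancel: what is left is the IBF of the differences alone.
only1, only2 = host1 - host2, host2 - host1
assert diff == subtract(encode(only1, 30), encode(only2, 30))
```

IDs unique to host 1 show up with count +1 and IDs unique to host 2 with count -1, which is how a decoder can later tell the two sides apart.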
Slide17
IBF Decode
Test for Purity: H(idSum) = hashSum
Pure cell holding only V: H(V) = H(V)
Mixed cell holding V, X, Z: H(V ⊕ X ⊕ Z) ≠ H(V) ⊕ H(X) ⊕ H(Z)
Slide18
IBF Decode
Slide19
IBF Decode
Slide20
IBF Decode
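The decode slides peel one pure cell at a time: read the ID from a pure cell, remove it from every cell it was hashed to, and repeat with any cells that become pure. Below is a minimal sketch of that peeling loop, not the authors' code. The list-based cell layout, the SHA-256 hashing, the bank layout, and the names `encode`, `subtract`, `is_pure` and `decode` are assumptions for illustration; the table is deliberately oversized (300 cells for 4 differences) so that a stalled peel is very unlikely.

```python
import hashlib

K = 3        # number of hash functions (assumed)
CHECK = 255  # salt for the hashSum hash H (assumed)


def _hash(key, salt):
    data = bytes([salt]) + key.to_bytes(8, "little")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "little")


def indices(key, num_cells):
    bank = num_cells // K  # one bank per hash function
    return [i * bank + _hash(key, i) % bank for i in range(K)]


def encode(keys, num_cells):
    cells = [[0, 0, 0] for _ in range(num_cells)]  # [idSum, hashSum, count]
    for key in keys:
        for j in indices(key, num_cells):
            cells[j][0] ^= key
            cells[j][1] ^= _hash(key, CHECK)
            cells[j][2] += 1
    return cells


def subtract(a, b):
    return [[x[0] ^ y[0], x[1] ^ y[1], x[2] - y[2]] for x, y in zip(a, b)]


def is_pure(cell):
    """Purity test: count is +1 or -1 and H(idSum) equals hashSum."""
    return cell[2] in (1, -1) and _hash(cell[0], CHECK) == cell[1]


def decode(diff):
    """Peel pure cells; return (only_in_a, only_in_b), or None if peeling stalls."""
    cells = [c[:] for c in diff]
    only_a, only_b = set(), set()
    pending = [j for j, c in enumerate(cells) if is_pure(c)]
    while pending:
        j = pending.pop()
        if not is_pure(cells[j]):
            continue  # already emptied by an earlier peel
        key, sign = cells[j][0], cells[j][2]
        (only_a if sign == 1 else only_b).add(key)
        for i in indices(key, len(cells)):  # remove the ID everywhere
            cells[i][0] ^= key
            cells[i][1] ^= _hash(key, CHECK)
            cells[i][2] -= sign
            if is_pure(cells[i]):
                pending.append(i)
    if any(c != [0, 0, 0] for c in cells):
        return None  # leftover mixed cells: the table was too small
    return only_a, only_b


host_a = set(range(1, 1001))
host_b = (host_a - {3, 500}) | {2001, 2002}
print(decode(subtract(encode(host_a, 300), encode(host_b, 300))))
```

A `None` result corresponds to the failure case the later slides analyse: the caller would retry with a larger table.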
Slide21
How many IBF cells?
Overhead to decode at >99%
Small Diffs: 1.4x – 2.3x
Large Differences: 1.25x - 1.4x
[Figure: space overhead α versus set difference, for hash counts 3 and 4]
Slide22
How many hash functions?
1 hash function produces many pure cells initially but nothing to undo when an element is removed.
[Figure: elements A, B, C placed in cells using 1 hash function]
Slide23
How many hash functions?
1 hash function produces many pure cells initially but nothing to undo when an element is removed.
Many (say 10) hash functions: too many collisions.
[Figure: elements A, B, C placed in cells using many hash functions]
Slide24
How many hash functions?
1 hash function produces many pure cells initially but nothing to undo when an element is removed.
Many (say 10) hash functions: too many collisions.
We find by experiment that 3 or 4 hash functions work well. Is there some theoretical reason?
[Figure: elements A, B, C placed in cells using 3 hash functions]
Slide25
Theory
Let d = difference size, k = # hash functions.
Theorem 1: With (k + 1) d cells, failure probability falls exponentially with k. For k = 3, implies a 4x tax on storage, a bit weak.
[Goodrich, Mitzenmacher]: Failure is equivalent to finding a 2-core (loop) in a random hypergraph
Theorem 2: With c_k d cells, failure probability falls exponentially with k.
c_4 = 1.3x tax, agrees with experiments
Slide26
Recall experiments
Overhead to decode at >99%
Large Differences: 1.25x - 1.4x
[Figure: space overhead versus set difference, for hash counts 3 and 4]
Slide27
Connection to Coding
Mystery: IBF decode similar to peeling procedure used to decode Tornado codes. Why?
Explanation: Set Difference is equivalent to coding with insert-delete channels
Intuition: Given a code for set A, send checkwords only to B. Think of B as a corrupted form of A.
Reduction: If code can correct D insertions/deletions, then B can recover A and the set difference.
Reed Solomon <---> Polynomial Methods
LDPC (Tornado) <---> Difference Digest
Slide28
Random Subsets, Fast Elimination
[Figure: a system of αd equations; a sparse row X + Y + Z = . . and pure rows X = . . and Y = . .]
Roughly upper triangular and sparse
Slide29
Difference Digests
Consists of two data structures:
Invertible Bloom Filter (IBF)
Efficiently computes the set difference.
Needs the size of the difference
Strata Estimator
Approximates the size of the set difference.
Uses IBFs as a building block.
Slide30
Strata Estimator
Consistent Partitioning: divide keys into sampled subsets containing ~1/2^k of the keys
Encode each subset into an IBF of small fixed size
log(n) IBFs of ~20 cells each
[Figure: elements A, B, C are partitioned into strata holding ~1/2, ~1/4, ~1/8, 1/16 of the keys, stored in IBF 1 through IBF 4 of the Estimator]
Slide31
Strata Estimator
Attempt to subtract & decode IBFs at each level.
If level k decodes, then return: 2^k x (the number of IDs recovered)
[Figure: Estimator 1 and Estimator 2, one per host, each holding IBF 1 through IBF 4; matching IBFs are subtracted and decoded]
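The estimator idea above can be sketched in a few dozen lines. This is an illustrative sketch, not the authors' implementation. It assumes keys are assigned to a stratum by the number of trailing zero bits of a hash (so stratum i holds about 1/2^(i+1) of the keys), that each stratum is a 60-cell IBF, and that decoding runs from the sparsest stratum downward, scaling the running count by 2^(i+1) at the first stratum i that fails to decode. That scaling rule is one reading of the slide's "2^k x (the number of IDs recovered)". All names and constants are hypothetical.

```python
import hashlib

K, CHECK, STRATUM = 3, 255, 254  # hash count and reserved salts (assumed)
LEVELS, CELLS = 16, 60           # number of strata, cells per stratum IBF (assumed)


def _hash(key, salt):
    data = bytes([salt]) + key.to_bytes(8, "little")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "little")


def indices(key):
    bank = CELLS // K
    return [i * bank + _hash(key, i) % bank for i in range(K)]


def toggle(cells, key, sign):
    """Add (sign=+1) or remove (sign=-1) `key` from its K cells."""
    for j in indices(key):
        cells[j][0] ^= key
        cells[j][1] ^= _hash(key, CHECK)
        cells[j][2] += sign


def level_of(key):
    """Stratum i receives about 1/2^(i+1) of the keys."""
    x = _hash(key, STRATUM)
    zeros = (x & -x).bit_length() - 1 if x else LEVELS - 1
    return min(zeros, LEVELS - 1)


def build_estimator(keys):
    strata = [[[0, 0, 0] for _ in range(CELLS)] for _ in range(LEVELS)]
    for key in keys:
        toggle(strata[level_of(key)], key, +1)
    return strata


def is_pure(cell):
    return cell[2] in (1, -1) and _hash(cell[0], CHECK) == cell[1]


def count_decoded(a, b):
    """Subtract two IBFs and peel; return how many IDs were found, or None."""
    cells = [[x[0] ^ y[0], x[1] ^ y[1], x[2] - y[2]] for x, y in zip(a, b)]
    found = 0
    pending = [j for j, c in enumerate(cells) if is_pure(c)]
    while pending:
        j = pending.pop()
        if is_pure(cells[j]):
            key, sign = cells[j][0], cells[j][2]
            toggle(cells, key, -sign)  # remove the recovered ID everywhere
            found += 1
            pending.extend(i for i in indices(key) if is_pure(cells[i]))
    return found if all(c == [0, 0, 0] for c in cells) else None


def estimate(est_a, est_b):
    """Decode from the sparsest stratum down; scale up at the first failure."""
    count = 0
    for i in reversed(range(LEVELS)):
        found = count_decoded(est_a[i], est_b[i])
        if found is None:
            return 2 ** (i + 1) * count
        count += found
    return count  # every stratum decoded, so the count is exact


a = set(range(1, 20001))
b = set(range(1001, 21001))  # true difference: 2000 IDs
print(estimate(build_estimator(a), build_estimator(b)))
```

The printed value is only an estimate of the true difference; in practice a host would pad it by some safety factor before sizing the IBF it sends.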
Slide32
KeyDiff Service
Promising Applications:
File Synchronization
P2P file sharing
Failure Recovery
Operations: Add( key ), Remove( key ), Diff( host1, host2 )
[Figure: three Applications, each paired with a Key Service]
Slide33
Difference Digest Summary
Strata Estimator
Estimates Set Difference.
For 100K sets, 15KB estimator has <15% error
O(log n) communication, O(n) computation.
Invertible Bloom Filter
Identifies all IDs in the Set Difference.
16 to 28 Bytes per ID in Set Difference.
O(d) communication, O(n+d) computation
Worth it if set difference is < 20% of set sizes
Slide34
Connection to Sparse Recovery?
If we forget about subtraction, in the end we are recovering a d-sparse vector.
Note that the hash check is key for figuring out which cells are pure after differencing.
Is there a connection to compressed sensing? Could sensors do the random summing? The hash summing?
Connection the other way: could we use compressed sensing for differences?
Slide35
Comparison with Information Theory and Coding
Worst case complexity versus average
Information theory emphasizes communication complexity, not computation complexity: we focus on both.
Existence versus Constructive: some similar settings (Slepian-Wolf) are existential
Estimators: We want bounds based on difference and so start by efficiently estimating difference.
Slide36
Aside: IBFs in Digital Hardware
[Figure: a stream of set elements (a, b, x, y) enters Logic (Read, hash, Write), which uses Hash 1, Hash 2, Hash 3 and a Strata Hash to update Bank 1, Bank 2, Bank 3]
Hash to separate banks for parallelism, slight cost in space needed. Decode in software
Slide37
Part 2: Towards a theory of Cloud Complexity
[Figure: objects O1, O2, O3]
Complexity of reconciling “similar” objects?
Slide38
Example: Synching Files
[Figure: X.ppt.v1, X.ppt.v2, X.ppt.v3]
Measures: Communication bits, computation
Slide39
So far: Two sets, one link, set difference
[Figure: {a,b,c} and {d,a,c}]
Slide40
Mild Sensitivity Analysis: One set much larger than other
[Figure: Set A and Set B, small difference d]
Ω(|A|) bits needed, not O(d): Patrascu 2008
Simpler proof: DKS 2011
Slide41
Asymmetric set difference in LBFS File System (Mazieres)
[Figure: File A with chunks C1 C2 C3 . . . C97 C98 C99; File B with chunks C1 C5 C3 . . . C97 C98 C99, held as Chunk Set B at Server; 1 chunk difference]
LBFS sends all chunk hashes in File A: O(|A|)
Slide42
More Sensitivity Analysis: small intersection: database joins
[Figure: Set A and Set B, small intersection d]
Ω(|A|) bits needed, not O(d): Follows from results on hardness of set disjointness
Slide43
Sequences under Edit Distance (Files for example)
[Figure: File A = A B C D E F; File B = A C D E F G; edit distance 2]
Insert/delete can renumber all file blocks . . .
Slide44
Sequence reconciliation (with J. Ullman)
[Figure: File A = A B C D E F and File B = A C D E F, edit distance 1; piece hashes H1, H2, H3 for one file and H2, H3 for the other]
Send 2d+1 piece hashes. Clump unmatched pieces and recurse.
O( d log (N) )
Slide45
21 years of Sequence Reconciliation!
Schwartz, Bowdidge, Burkhard (1990): recurse on unmatched pieces, not aggregate.
Rsync: widely used tool that breaks file into roughly piece hashes, N is file length.
[Figure labels: UCSD, Lunch; Princeton, kids]
Slide46
Sets on graphs?
[Figure: graph whose nodes hold {a,b,c}, {d,c,e}, {b,c,d}, {a,f,g}]
Slide47
Generalizes rumor spreading which has disjoint singleton sets
[Figure: graph whose nodes hold {a}, {d}, {b}, {g}]
CLP10, G11: O( E n log n / conductance)
Slide48
Generalized Push-Pull (with N. Goyal and R. Kannan)
[Figure: graph whose nodes hold {a,b,c}, {d,c,e}, {b,c,d}]
Pick random edge
Do 2 party set reconciliation
Complexity: C + D, C as before, D = Sum_i (U – S_i)
Slide49
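The generalized push-pull procedure above (pick a random edge, reconcile the two endpoint sets) can be simulated in a few lines. This is a toy sketch under stated assumptions: the two-party reconciliation is modelled as a plain set union rather than an IBF exchange, the graph is a hypothetical 4-cycle, and `push_pull` is an illustrative name. It also checks the D term from the slide: the total number of elements received across all nodes equals the sum over nodes of |U - S_i|.

```python
import random


def push_pull(sets, edges, seed=0):
    """Reconcile along random edges until every node holds the union U.

    Returns (rounds, received), where `received` counts elements delivered
    to nodes that did not already hold them.
    """
    rng = random.Random(seed)
    state = [set(s) for s in sets]
    universe = set().union(*state)
    rounds = received = 0
    while any(s != universe for s in state):
        u, v = rng.choice(edges)         # pick a random edge
        merged = state[u] | state[v]     # stand-in for 2-party reconciliation
        received += len(merged - state[u]) + len(merged - state[v])
        state[u], state[v] = merged, set(merged)
        rounds += 1
    return rounds, received


# Node sets taken from the slides; the edge list (a 4-cycle) is an assumption.
sets = [{"a", "b", "c"}, {"d", "c", "e"}, {"b", "c", "d"}, {"a", "f", "g"}]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
rounds, received = push_pull(sets, edges)

universe = set().union(*sets)
assert received == sum(len(universe - s) for s in sets)  # the D term
print(rounds, received)
```

The number of rounds depends on the random edge choices, but `received` does not: every node only ever gains elements it was missing, so D is fixed by the initial sets.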
Sets on Steiner graphs?
[Figure: terminals holding {a} U S and {b} U S, connected through router R1]
Only terminals need sets. Push-pull wasteful!
Slide50
Butterfly example for Sets
[Figure: butterfly network with nodes X and Y; links carry S1, S2, and D = Diff(S1, S2)]
Set difference instead of XOR within network
Slide51
How does reconciliation on Steiner graphs relate to network coding?
Objects in general, not just bits.
Routers do not need objects but can transform/code objects.
What transformations within network allow efficient communication close to lower bound?
Slide52
Sequences with d mutations: VM code pages (with Ramjee et al)
[Figure: VM A holds pages A B C D E; VM B holds pages A X C D Y; 2 “errors”]
Reconcile Set A = {(A,1),(B,2),(C,3),(D,4),(E,5)} and Set B = {(A,1),(X,2),(C,3),(D,4),(Y,5)}
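The reduction above, from a sequence with in-place mutations to a set of (value, position) pairs, is easy to try out. In this small sketch Python's built-in set difference stands in for the IBF exchange, and `as_indexed_set` is a hypothetical helper name.

```python
def as_indexed_set(pages):
    """Turn a sequence into a set of (value, position) pairs, positions from 1."""
    return {(value, pos) for pos, value in enumerate(pages, start=1)}


vm_a = as_indexed_set("ABCDE")
vm_b = as_indexed_set("AXCDY")

# Each mutated position contributes one pair to each side of the difference.
only_a = sorted(vm_a - vm_b, key=lambda pair: pair[1])
only_b = sorted(vm_b - vm_a, key=lambda pair: pair[1])
print(only_a)  # [('B', 2), ('E', 5)]
print(only_b)  # [('X', 2), ('Y', 5)]
```

With d mutated positions the set difference has size 2d, so an IBF sized for 2d entries would suffice. Note that this only works for substitutions: an insertion or deletion shifts every later position.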
Slide53
Twist: IBFs for error correction? (with M. Mitzenmacher)
Write message M[1..n] of n words as set S = {(M[1],1), (M[2], 2), . . (M[n], n)}. Calculate IBF(S) and transmit M, IBF(S)
Receiver uses received message M’ to find IBF(S’); subtracts from IBF’(S) to locate errors.
Protect IBF using Reed-Solomon or redundancy
Why: Potentially O(e) decoding for e errors -- Raptor codes achieve this for erasure channels.
Slide54
The Cloud Complexity Milieu
                                   2 Node    Graph    Steiner Nodes
Sets (Key, values)                 EGUV11    GKV11    ?
Sequence, Edit Distance (Files)    SBB90     ?        ?
Sequence, errors only (VMs)        MV11 ?    ?        ?
Sets of sets (database tables)     ?         ?        ?
Streams (movies)                   ?         ?        ?
Other dimensions: approximate, secure, . . .
Slide55
Conclusions: Got Diffs?
Resiliency and fast recoding of random sums: set reconciliation; and error correction?
Sets on graphs
All terminals: generalizes rumor spreading
Routers, terminals: resemblance to network coding.
Cloud complexity: Some points covered, many remain
Practical, may be useful to synch devices across cloud.
Slide56
Comparison to Logs/Incremental Updates
IBFs work with no prior context.
Logs work with prior context, BUT
Redundant information when sync’ing with multiple parties.
Logging must be built into system for each write.
Logging adds overhead at runtime.
Logging requires non-volatile storage.
Often not present in network devices.
IBFs may out-perform logs when:
Synchronizing multiple parties
Synchronizations happen infrequently