Slide1
Network Applications of Bloom Filters: A Survey
Andrei Broder and Michael Mitzenmacher
Presenter: Chen Qian
Slides credit: Hongkun Yang
Slide2Outline
Bloom Filter Overview
Standard Bloom Filters
Counting Bloom Filters
Historical Applications
Network Applications
Distributed Caching
P2P/Overlay Networks
Resource Routing
Conclusion
Slide3Overview
Burton Bloom introduced the Bloom filter in 1970
Randomized data structure
Representing a set to support membership queries
Dramatic space savings
Allows false positives
Slide4Standard Bloom Filters: Notations
S: the set of n elements {x1, x2, …, xn}
k independent hash functions h1, …, hk with range {1, …, m}
Assume: hash functions map each item in the universe to a random number uniformly over the range {1, …, m}
MD5
An array B of m bits, initially filled with 0s
Slide5Standard Bloom Filters: How It Works
Hash each xi in S k times. If hj(xi) = a, set B[a] = 1.
To check whether y is in S, check B at hj(y), j = 1, 2, …, k
If all k values are set to 1, y is assumed to be in S
If not, y is clearly not in S
No False Negative
Possible False Positive
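The insert and check steps can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the class name `BloomFilter`, the salted-MD5 construction of the k hash functions, and the 0-based positions (0 … m−1 rather than 1 … m) are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter: an m-bit array B and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # B, initially filled with 0s

    def _positions(self, item: str) -> list[int]:
        # Assumption: MD5 salted with the index j stands in for the
        # j-th independent, uniform hash function.
        positions = []
        for j in range(self.k):
            digest = hashlib.md5(f"{j}:{item}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest, "big") % self.m)
        return positions

    def add(self, item: str) -> None:
        # Set B[a] = 1 for every hashed position a.
        for a in self._positions(item):
            self.bits[a] = 1

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[a] for a in self._positions(item))


bf = BloomFilter(m=1024, k=4)
for x in ["x1", "x2", "x3"]:
    bf.add(x)
```

Every inserted element is reported as present, which is the "no false negative" property; an element that was never inserted is reported as present only if all k of its positions happen to be set.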
Slide6Standard Bloom Filters: An Example
[Figure: initial state. The bit array B holds all 0s.]
Slide7Standard Bloom Filters: An Example
[Figure: insertion. Elements x1 and x2 are hashed into B, and the bits at the hashed positions are set to 1.]
Slide8Standard Bloom Filters: An Example
[Figure: check. Elements y1 and y2 are hashed, and the bits of B at the hashed positions are tested.]
Slide9Standard Bloom Filters: False Positive Rate (1)
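Pr[a given bit in B is 0] = $(1 - 1/m)^{kn} \approx e^{-kn/m}$
The probability of a false positive is $\left(1 - (1 - 1/m)^{kn}\right)^k \approx \left(1 - e^{-kn/m}\right)^k$
Let r be the proportion of 0 bits after all elements are inserted in the Bloom filter
Conditioned on r, the probability of a false positive is $(1 - r)^k$
Slide10Standard Bloom Filters: False Positive Rate (2)
The fraction of 0 bits is extremely concentrated around its expectation
Therefore, with high probability, $(1 - r)^k \approx \left(1 - (1 - 1/m)^{kn}\right)^k \approx \left(1 - e^{-kn/m}\right)^k$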
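As a quick numerical check, the exact expression $\left(1 - (1 - 1/m)^{kn}\right)^k$ and the approximation $\left(1 - e^{-kn/m}\right)^k$ can be compared directly. The function names below are hypothetical, and the parameter values are chosen only for illustration.

```python
import math


def fp_exact(m: int, n: int, k: int) -> float:
    """(1 - (1 - 1/m)^(kn))^k: false positive rate under the uniform-hash assumption."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k


def fp_approx(m: int, n: int, k: int) -> float:
    """(1 - e^(-kn/m))^k: the usual exponential approximation."""
    return (1.0 - math.exp(-k * n / m)) ** k


# Illustrative parameters: 10 bits per element, 7 hash functions.
exact = fp_exact(m=10_000, n=1_000, k=7)
approx = fp_approx(m=10_000, n=1_000, k=7)
```

For these parameters both values are a little under 1%, and they differ by far less than either value, which is why the approximation is normally used in place of the exact form.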
Slide11Standard Bloom Filters: Optimal Number of Hash Functions (1)
Two competing forces:
More hash functions give more chances to find a 0 bit for an element that is not a member of S
Fewer hash functions increase the fraction of 0 bits in the array
Slide12Standard Bloom Filters: Optimal Number of Hash Functions (2)
Slide13Standard Bloom Filters: Optimal Number of Hash Functions (3)
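Note that $f = \left(1 - e^{-kn/m}\right)^k = \exp\left(k \ln\left(1 - e^{-kn/m}\right)\right)$
Let $g = k \ln\left(1 - e^{-kn/m}\right)$, solve $\frac{dg}{dk} = \ln\left(1 - e^{-kn/m}\right) + \frac{kn}{m} \cdot \frac{e^{-kn/m}}{1 - e^{-kn/m}} = 0$
Rewrite g as $g = -\frac{m}{n} \ln(p) \ln(1 - p)$, where $p = e^{-kn/m}$
Using symmetry, g is minimal when p = ½
Then $k = \frac{m}{n} \ln 2$ and $f = (1/2)^k \approx (0.6185)^{m/n}$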
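The optimum can be checked numerically. The sketch below assumes the approximation $f = \left(1 - e^{-kn/m}\right)^k$ and the closed form $k = (m/n) \ln 2$; the function names and the choice of 8 bits per element are illustrative only.

```python
import math


def false_positive_rate(m: int, n: int, k: float) -> float:
    """Approximate false positive rate f = (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> float:
    """Real-valued minimiser k = (m/n) ln 2."""
    return (m / n) * math.log(2)


m, n = 8_000, 1_000  # 8 bits per element
k_star = optimal_k(m, n)

# In practice k must be an integer, so scan the candidates directly.
best_integer_k = min(range(1, 21), key=lambda k: false_positive_rate(m, n, k))
f_star = false_positive_rate(m, n, k_star)
```

At the real-valued optimum half of the bits are 0 (p = ½), so `f_star` equals `0.5 ** k_star`; the best integer k is one of the two integers next to `k_star`.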
Slide14Standard Bloom Filters: Space Efficiency
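A lower bound
Let $\epsilon$ be the false positive ratio, then $m \ge n \log_2(1/\epsilon)$
The optimal case
The false positive rate for the optimal Bloom filter is $f = (1/2)^k = (1/2)^{(m/n) \ln 2}$
Let $f \le \epsilon$, then $m \ge \frac{n \log_2(1/\epsilon)}{\ln 2} = n \log_2 e \cdot \log_2(1/\epsilon) \approx 1.44\, n \log_2(1/\epsilon)$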
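These two bounds translate into a small sizing helper. The sketch assumes the standard results $m \ge n \log_2(1/\epsilon)$ for any representation and $m \ge n \log_2(1/\epsilon) / \ln 2$ for an optimally configured Bloom filter; the function names are hypothetical.

```python
import math


def lower_bound_bits(n: int, eps: float) -> float:
    """Information-theoretic lower bound: n * log2(1/eps) bits."""
    return n * math.log2(1.0 / eps)


def bloom_bits(n: int, eps: float) -> float:
    """Bits needed by a Bloom filter with the optimal k: n * log2(1/eps) / ln 2."""
    return n * math.log2(1.0 / eps) / math.log(2)


def size_filter(n: int, eps: float) -> tuple[int, int]:
    """Return (m, k): array size in bits and an integer number of hash functions."""
    m = math.ceil(bloom_bits(n, eps))
    k = max(1, round((m / n) * math.log(2)))
    return m, k


m, k = size_filter(n=1_000, eps=0.01)
overhead = bloom_bits(1_000, 0.01) / lower_bound_bits(1_000, 0.01)
```

For 1,000 elements and a 1% target this gives roughly 9.6 bits per element and 7 hash functions, and `overhead` is the constant factor of about 1.44 between the Bloom filter and the lower bound.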
Slide15Standard Bloom Filters: Operations (1)
Union
Build a Bloom filter representing the union of A and B by taking the OR of BF(A) and BF(B)
Shrinking a Bloom filter
Halving the size by taking the OR of the first and the second half of the Bloom filter
Increases the false positive rate
The intersection of two sets
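Union and halving can be sketched on plain bit lists. The helper names are hypothetical, and the example assumes that both filters use the same m and the same hash functions, that m is even, and that a position is computed as a large hash value modulo the array size (so that a lookup in the halved filter reduces the same hash value modulo m/2).

```python
import hashlib


def positions(item: str, k: int, m: int) -> list[int]:
    # Assumption: MD5 salted with j stands in for the j-th hash function.
    return [
        int.from_bytes(hashlib.md5(f"{j}:{item}".encode("utf-8")).digest(), "big") % m
        for j in range(k)
    ]


def build(items: list[str], k: int, m: int) -> list[int]:
    bits = [0] * m
    for item in items:
        for a in positions(item, k, m):
            bits[a] = 1
    return bits


def query(bits: list[int], item: str, k: int) -> bool:
    return all(bits[a] for a in positions(item, k, len(bits)))


def union(b1: list[int], b2: list[int]) -> list[int]:
    # Bitwise OR; both filters must share m and the hash functions.
    return [x | y for x, y in zip(b1, b2)]


def halve(bits: list[int]) -> list[int]:
    # OR the first half with the second half; m is assumed even.
    half = len(bits) // 2
    return [bits[i] | bits[i + half] for i in range(half)]


K, M = 3, 512
set_a, set_b = ["a1", "a2"], ["b1", "b2"]
bf_union = union(build(set_a, K, M), build(set_b, K, M))
bf_small = halve(bf_union)
```

The OR of the two filters is bit-for-bit identical to a filter built from the combined set. The halved filter still answers "present" for every inserted element, but packs the same 1 bits into half the space, which raises the false positive rate.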
Slide16Standard Bloom Filters: Operations (2)
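The intersection of S1 and S2
The average number of 1 bits in the AND of BF(S1) and BF(S2) is $m \left[ 1 - (1 - 1/m)^{k|S_1|} - (1 - 1/m)^{k|S_2|} + (1 - 1/m)^{k(|S_1| + |S_2| - |S_1 \cap S_2|)} \right]$
Z1: the number of 0 bits in BF(S1); Z2: the number of 0 bits in BF(S2); Z12: the number of 0 bits in the AND of BF(S1) and BF(S2)
Then $(1 - 1/m)^{-k|S_1 \cap S_2|} \approx \frac{m (Z_1 + Z_2 - Z_{12})}{Z_1 Z_2}$, which gives an estimate of $|S_1 \cap S_2|$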
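The size of the intersection can be estimated from the zero counts alone. The sketch below assumes the relation $(1 - 1/m)^{-k|S_1 \cap S_2|} \approx m (Z_1 + Z_2 - Z_{12}) / (Z_1 Z_2)$, with Z1, Z2 and Z12 the numbers of 0 bits in BF(S1), BF(S2) and their AND; the function names are hypothetical.

```python
import math


def count_zeros(bits: list[int]) -> int:
    """Number of 0 bits in a filter given as a list of 0/1 values."""
    return sum(1 for b in bits if b == 0)


def estimate_intersection(z1: float, z2: float, z12: float, m: int, k: int) -> float:
    """Solve (1 - 1/m)^(-k * size) = m (z1 + z2 - z12) / (z1 z2) for size."""
    ratio = m * (z1 + z2 - z12) / (z1 * z2)
    return math.log(ratio) / (-k * math.log(1.0 - 1.0 / m))
```

With real filters the zero counts fluctuate around their expectations, so the result is an estimate rather than an exact count; it is exact only when the counts equal their expected values.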
Slide17Counting Bloom Filters: Motivation
Standard Bloom filters
Easy to insert elements
Cannot perform deletion operations
Counting Bloom filters
Each entry is not a single bit but a small counter
Insert an element: increment the corresponding counters
Delete an element: decrement the corresponding counters
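A counting Bloom filter differs from the standard one only in what each cell stores. The sketch below is illustrative: the class name, the salted-MD5 hashing, and the use of unbounded Python integers as counters (instead of small fixed-width counters) are assumptions, and `remove` must only be called for elements that were actually inserted.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter whose entries are small counters instead of single bits."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Assumption: MD5 salted with j stands in for the j-th hash function.
        return [
            int.from_bytes(hashlib.md5(f"{j}:{item}".encode("utf-8")).digest(), "big")
            % self.m
            for j in range(self.k)
        ]

    def add(self, item: str) -> None:
        for a in self._positions(item):
            self.counters[a] += 1

    def remove(self, item: str) -> None:
        # Only valid for items that were previously added.
        for a in self._positions(item):
            self.counters[a] -= 1

    def __contains__(self, item: str) -> bool:
        return all(self.counters[a] > 0 for a in self._positions(item))


cbf = CountingBloomFilter(m=64, k=3)
cbf.add("x1")
cbf.add("x2")
cbf.remove("x1")
```

After the deletion, "x2" is still reported as present because its counters were incremented independently of those of "x1".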
Slide18Counting Bloom Filters: An Example
[Figure: initial state. The counter array B holds all 0s.]
Slide19Counting Bloom Filters: An Example
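[Figure: insertion. Elements x1 and x2 are hashed into B, and the counters at the hashed positions are incremented; one counter reaches 2.]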
Slide20Counting Bloom Filters: An Example
[Figure: deletion. Element x1 is removed, and the counters at its hashed positions are decremented.]
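Slide21Counting Bloom Filters: How Large Counters Do We Need? (1)
n elements, k hash functions, m counters, and c(i) the count associated with the i-th counter: $\Pr[c(i) = j] = \binom{nk}{j} \left(\frac{1}{m}\right)^j \left(1 - \frac{1}{m}\right)^{nk - j}$
The tail probability is bounded by $\Pr[c(i) \ge j] \le \binom{nk}{j} \frac{1}{m^j} \le \left(\frac{e\,nk}{jm}\right)^j$
Then use the union bound again: $\Pr[\max_i c(i) \ge j] \le m \left(\frac{e\,nk}{jm}\right)^j$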
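Slide22Counting Bloom Filters: How Large Counters Do We Need? (2)
4 bits per counter is enough: with $k \le (m/n) \ln 2$, $\Pr[\max_i c(i) \ge 16] \le m \left(\frac{e \ln 2}{16}\right)^{16} \le 1.37 \times 10^{-15} \times m$
The maximum counter value is O(log m) with high probability, and hence O(log log m) bits are sufficient
Let $j = 3 \ln m / \ln \ln m$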
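The 4-bit claim can be checked numerically. The sketch assumes the union bound $\Pr[\max_i c(i) \ge j] \le m \left(e\,nk / (jm)\right)^j$ and the optimal setting $k = (m/n) \ln 2$; the function name and the parameter values are illustrative.

```python
import math


def max_counter_tail_bound(m: int, n: int, k: float, j: int) -> float:
    """Upper bound on Pr[some counter reaches j]: m * (e*n*k / (j*m))^j."""
    return m * (math.e * n * k / (j * m)) ** j


m, n = 10_000, 1_000
k = (m / n) * math.log(2)  # optimal number of hash functions (real-valued)

# A 4-bit counter overflows when it would have to reach 16.
overflow_bound = max_counter_tail_bound(m, n, k, j=16)
per_counter = overflow_bound / m
```

`per_counter` is about 1.37e-15, so even for very large m the probability that any 4-bit counter overflows stays negligible.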
Slide23Historical Applications
Dictionaries
Hyphenation programs
UNIX spell-checkers
Dictionary of unsuitable passwords
Databases
Semi-join operations
Differential files
Slide24Distributed Caching: Scenario
Slide25Distributed Caching: Summary Cache
Motivation
Sharing of caches among Web proxies to reduce Web traffic and alleviate network bottlenecks
Directly sharing lists of URLs has too much overhead
Solution
Use Bloom filters to reduce network traffic
Use a counting Bloom filter to track cache contents
Broadcast the corresponding standard Bloom filter to other proxies
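The split between the local counting filter and the broadcast bit vector can be sketched as follows. This is only an illustration of the idea on the slide, not the Summary Cache protocol itself: the class `ProxySummary`, its method names, the example URLs, and the salted-MD5 hashing are all assumptions.

```python
import hashlib


def positions(url: str, k: int, m: int) -> list[int]:
    # Assumption: MD5 salted with j stands in for the j-th hash function.
    return [
        int.from_bytes(hashlib.md5(f"{j}:{url}".encode("utf-8")).digest(), "big") % m
        for j in range(k)
    ]


class ProxySummary:
    """Tracks one proxy's cache with counters; exports a plain bit vector."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m

    def cache_add(self, url: str) -> None:
        for a in positions(url, self.k, self.m):
            self.counters[a] += 1

    def cache_evict(self, url: str) -> None:
        # Only valid for URLs currently in the cache.
        for a in positions(url, self.k, self.m):
            self.counters[a] -= 1

    def export_bits(self) -> list[int]:
        # The standard Bloom filter that would be sent to other proxies.
        return [1 if c > 0 else 0 for c in self.counters]


def peer_may_have(summary_bits: list[int], url: str, k: int) -> bool:
    """Check a received summary before asking the peer for the page."""
    return all(summary_bits[a] for a in positions(url, k, len(summary_bits)))


proxy = ProxySummary(m=256, k=4)
proxy.cache_add("http://example.org/a")
proxy.cache_add("http://example.org/b")
proxy.cache_evict("http://example.org/a")
summary = proxy.export_bits()
```

Only the bit vector travels over the network; the counters stay local, where they make evictions possible without rebuilding the filter.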
Slide26P2P/Overlay Networks: Content Delivery
Problem
Peer A has a set of items SA, peer B has SB; B wants useful items from A (SA − SB)
Solution
B sends A its Bloom filter BF(B)
A sends B its items that are not in SB according to BF(B)
Implications of false positives
Not all elements in SA − SB will be sent
Redundant items (e.g. erasure coding)
A large fraction of SA − SB is sufficient (not necessarily the entire set)
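The exchange can be sketched with two in-memory sets. The helper names, the item labels, and the filter parameters are assumptions for the example; a real system would send the filter and the items over the network.

```python
import hashlib


def positions(item: str, k: int, m: int) -> list[int]:
    # Assumption: MD5 salted with j stands in for the j-th hash function.
    return [
        int.from_bytes(hashlib.md5(f"{j}:{item}".encode("utf-8")).digest(), "big") % m
        for j in range(k)
    ]


def make_filter(items: set[str], k: int, m: int) -> list[int]:
    bits = [0] * m
    for item in items:
        for a in positions(item, k, m):
            bits[a] = 1
    return bits


def maybe_contains(bits: list[int], item: str, k: int) -> bool:
    return all(bits[a] for a in positions(item, k, len(bits)))


def items_to_send(s_a: set[str], bf_b: list[int], k: int) -> set[str]:
    """Peer A's side: keep the items that BF(B) says B does not have."""
    return {x for x in s_a if not maybe_contains(bf_b, x, k)}


K, M = 3, 128
s_a = {"i1", "i2", "i3", "i4", "i5", "i6"}
s_b = {"i4", "i5", "i6", "i7"}
bf_b = make_filter(s_b, K, M)      # step 1: B sends BF(B) to A
sent = items_to_send(s_a, bf_b, K)  # step 2: A sends the apparent difference
```

A never sends an item that B already holds, because the filter has no false negatives. A false positive can only cause A to withhold an item that B actually lacks, so `sent` is a subset of SA − SB.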
Slide27P2P/Overlay Networks: Efficient P2P Keyword Searching (1)
Problem
Peer A has a set of items SA, peer B has SB; A wants to determine SA ∩ SB
Solution
A sends B its Bloom filter BF(A)
B sends A its items that appear to be in SA according to BF(A)
A eliminates false positives and determines SA ∩ SB exactly
Fewer bits transmitted than A sending the entire set SA
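The two-step intersection can be sketched as follows, assuming that A is the peer that removes the false positives (it is the one that holds SA). The helper names, the small integer item sets, and the filter parameters are assumptions for the example.

```python
import hashlib


def positions(item: int, k: int, m: int) -> list[int]:
    # Assumption: MD5 salted with j stands in for the j-th hash function.
    return [
        int.from_bytes(hashlib.md5(f"{j}:{item}".encode("utf-8")).digest(), "big") % m
        for j in range(k)
    ]


def make_filter(items: set[int], k: int, m: int) -> list[int]:
    bits = [0] * m
    for item in items:
        for a in positions(item, k, m):
            bits[a] = 1
    return bits


def maybe_contains(bits: list[int], item: int, k: int) -> bool:
    return all(bits[a] for a in positions(item, k, len(bits)))


K, M = 3, 64
s_a = {1, 2, 3, 4}
s_b = {3, 4, 5, 6}

bf_a = make_filter(s_a, K, M)                               # A -> B: BF(A)
candidates = {y for y in s_b if maybe_contains(bf_a, y, K)}  # B -> A: apparent matches
exact = candidates & s_a                                    # A drops false positives
```

`candidates` always contains the true intersection and may contain a few extra items from SB; intersecting with SA on A's side removes them, so `exact` is SA ∩ SB.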
Slide28P2P/Overlay Networks: Efficient P2P Keyword Searching (2)
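[Figure: example with SA = {1, 2, 3, 4} on Server A and SB = {3, 4, 5, 6} on Server B. Labeled steps: (1) request from the Client, (2) BF(A). The item sets {3, 4, 6} and {3, 4} are also shown.]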
Slide29Resource Routing (1)
Network is in the form of a rooted tree
Nodes hold resources
Each node keeps Bloom filters representing
A unified list of resources that it holds or that are reachable through one of its children
Individual lists of resources for it and each child
When receiving a request for a resource
Check the unified list to see whether the node or its descendants hold the resource
Yes: check the individual lists
No: forward the request up the tree toward the root
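The routing rule can be sketched on a small tree. Everything here is an assumption made for illustration: the `Node` class, the `route` function, the fixed filter parameters, and the choice to stop when a false positive leads to a dead end.

```python
import hashlib
from typing import Optional

M, K = 1024, 3  # illustrative filter size and number of hash functions


def positions(item: str) -> list[int]:
    # Assumption: MD5 salted with j stands in for the j-th hash function.
    return [
        int.from_bytes(hashlib.md5(f"{j}:{item}".encode("utf-8")).digest(), "big") % M
        for j in range(K)
    ]


def make_filter(items: set[str]) -> list[int]:
    bits = [0] * M
    for item in items:
        for a in positions(item):
            bits[a] = 1
    return bits


def maybe_contains(bits: list[int], item: str) -> bool:
    return all(bits[a] for a in positions(item))


class Node:
    def __init__(self, name: str, resources: set[str],
                 children: "Optional[list[Node]]" = None) -> None:
        self.name = name
        self.parent: Optional["Node"] = None
        self.children = children or []
        for child in self.children:
            child.parent = self
        # Individual list for this node; each child's list is child.unified.
        self.own_filter = make_filter(resources)
        # Unified list: own resources OR anything reachable through a child.
        self.unified = list(self.own_filter)
        for child in self.children:
            self.unified = [x | y for x, y in zip(self.unified, child.unified)]


def route(start: Node, resource: str) -> list[str]:
    """Return the names of the nodes visited while looking for `resource`."""
    node, path = start, [start.name]
    # No: forward the request up the tree toward the root.
    while not maybe_contains(node.unified, resource):
        if node.parent is None:
            return path  # the resource is not in the tree
        node = node.parent
        path.append(node.name)
    # Yes: follow the individual lists down to the holder.
    while not maybe_contains(node.own_filter, resource):
        nxt = next((c for c in node.children
                    if maybe_contains(c.unified, resource)), None)
        if nxt is None:
            return path  # a false positive led to a dead end
        node = nxt
        path.append(node.name)
    return path


leaf_a = Node("a", {"r1"})
leaf_b = Node("b", {"r2"})
root = Node("root", set(), [leaf_a, leaf_b])
```

A request for "r2" that starts at leaf `a` climbs to the root, whose unified filter matches, and then descends to `b`. A false positive can send a request down a branch that does not hold the resource, which is why the sketch includes the dead-end case.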
Slide30Resource Routing (2)
Slide31Conclusion
Simple space-efficient representation of a set or a list that can handle membership queries
Applications in numerous networking problems
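Bloom filter principle: wherever a list or set is used, and space is at a premium, consider using a Bloom filter if the effect of false positives can be mitigated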
Slide32THANK YOU!