
Cuckoo Filter: Practically Better Than Bloom

Bin Fan (CMU/Google), David Andersen (CMU), Michael Kaminsky (Intel Labs), Michael Mitzenmacher (Harvard)

What Is a Bloom Filter? A Compact Data Structure for Set Membership

Bloom filters answer “is item x in set Y?” with either “definitely no” or “probably yes”, where “probably yes” is wrong with probability ε (the false positive rate).
Benefit: not always precise, but highly compact, typically a few bits per item.
Achieving a lower ε (more accurate answers) requires spending more bits per item.
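To make the last point concrete: under the standard textbook analysis of an optimally configured Bloom filter (not stated on the slide), the cost is about 1.44·log2(1/ε) bits per item, using k = log2(1/ε) hash functions. A small calculator under that assumption:

```python
import math

def bloom_bits_per_item(eps: float) -> float:
    """Bits per item for a Bloom filter using the optimal number of hash
    functions: m/n = log2(1/eps) / ln(2), i.e. about 1.44 * log2(1/eps)."""
    return math.log2(1.0 / eps) / math.log(2)

def bloom_optimal_k(eps: float) -> int:
    """Optimal number of hash functions, k = log2(1/eps), rounded."""
    return max(1, round(math.log2(1.0 / eps)))

for eps in (0.1, 0.01, 0.001):
    print(f"eps={eps}: {bloom_bits_per_item(eps):.2f} bits/item, k={bloom_optimal_k(eps)}")
```

Under this formula, each tenfold reduction in ε costs roughly 4.8 extra bits per item.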

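Example Use: Safe Browsing

Known malicious URLs are stored in a Bloom filter, which scales to millions of URLs.
[Figure: the browser runs Lookup(“www.binfan.com”) against its local Bloom filter. If the answer is “No!”, the URL is not known to be malicious. If the answer is “Probably yes!”, the browser asks a remote server to “Please verify www.binfan.com”, and the server replies “It is good!”]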

Bloom Filter Basics

A Bloom filter consists of m bits and k hash functions. Example: m = 10, k = 3.
[Figure: starting from an all-zero 10-bit array, Insert(x) sets the bits at hash1(x), hash2(x), and hash3(x) to 1, giving 1000001010. Lookup(y) checks the bits at hash1(y), hash2(y), and hash3(y); at least one of them is 0, so the result is “not found”.]
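The insert and lookup steps above can be sketched as a minimal Bloom filter. The way the k hash functions are derived (BLAKE2b over a one-byte seed plus the item) is a hypothetical choice for illustration; any k independent hash functions would do.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions (illustrative sketch)."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, item: str):
        # Hypothetical hash family: prefix a one-byte seed, hash with BLAKE2b.
        for seed in range(self.k):
            digest = hashlib.blake2b(bytes([seed]) + item.encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.m

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def lookup(self, item: str) -> bool:
        # Any 0 bit means "definitely no"; all 1 bits mean "probably yes".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=10, k=3)   # the slide's example parameters
bf.insert("x")
print(bf.bits)          # at most three bits are now 1
print(bf.lookup("x"))   # True: an inserted item is never reported missing
print(bf.lookup("y"))   # usually False; True would be a false positive
```

Note that there is no way to clear x's bits on deletion without risking other items that share them, which is why a plain Bloom filter cannot delete.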

Succinct Data Structures for Approximate Set-Membership Tests

[Table: Bloom filter, counting Bloom filter, and quotient filter, each rated on high performance, low space cost, and delete support.]
Can we achieve all three in practice?

Outline

Background
Cuckoo filter algorithm
Performance evaluation
Summary


Basic Idea: Store Fingerprints in Hash Table

Fingerprint(x): a hash value of x. A lower false positive rate ε requires a longer fingerprint.
Insert(x): add Fingerprint(x) to the hash table.
Lookup(x): search for Fingerprint(x) in the hash table; x is reported as found if it is present.
Delete(x): remove Fingerprint(x) from the hash table.
[Figure: an 8-bucket hash table (buckets 0-7) holding FP(a), FP(b), FP(c), and FP(x).]

How to construct the hash table?


(Minimal) Perfect Hashing: No Collision but Update Is Expensive (Strawman)

Perfect hashing maps all items with no collisions.
Changing the set requires recalculating f, so updates are costly and slow.
[Figure: f(x) maps {a, b, c, d, e, f} to FP(a) through FP(f) with no collisions; when the set changes to {a, b, c, d, e, g}, f(x) = ? must be recomputed.]

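Conventional Hash Table: High Space Cost (Strawman)

Chaining: pointers cause low space utilization.
Linear probing: making lookups O(1) requires keeping a large fraction of the table empty, so space utilization is low; lookups also compare multiple fingerprints sequentially, which produces more false positives.
[Figure: a chained table with buckets bkt0-bkt3 and a linear-probing table; in both, Lookup(x) compares x's fingerprint against several stored ones such as FP(a), FP(c), FP(d), and FP(f).]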
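The false positive cost of sequential comparisons can be quantified with a simple model (an assumption, not from the slide): if each compared fingerprint is unrelated to the query and matches with probability 2^-f, then comparing c fingerprints gives a false positive probability of 1 - (1 - 2^-f)^c, roughly c/2^f. Holding ε fixed while probing more slots therefore requires longer fingerprints.

```python
def false_positive_prob(f_bits: int, comparisons: int) -> float:
    """Probability that at least one of `comparisons` unrelated stored
    fingerprints equals the query's f-bit fingerprint, assuming fingerprints
    are uniform and independent."""
    return 1.0 - (1.0 - 2.0 ** -f_bits) ** comparisons

# With 8-bit fingerprints, scanning more slots raises the false positive rate
for c in (1, 4, 8, 16):
    print(f"{c:2d} comparisons: {false_positive_prob(8, c):.4f}")
```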

Cuckoo Hashing [Pagh2004]: Good, But...

High space utilization: a 4-way set-associative table keeps more than 95% of entries occupied.
Fast lookup: O(1), since lookup(x) checks only buckets hash1(x) and hash2(x).
[Figure: an 8-bucket table (buckets 0-7); lookup(x) probes hash1(x) and hash2(x).]
Standard cuckoo hashing doesn't work with fingerprints.

[Pagh2004] Cuckoo hashing.


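Standard Cuckoo Requires Storing Each Item

Insert(x): both candidate buckets, h1(x) and h2(x), are occupied, so x takes a's bucket.
Rehash a: alternate(a) = 4. Kick a to bucket 4.
Rehash c: alternate(c) = 1. Kick c to bucket 1.
Insert complete (or fail if MaxSteps is reached).
Computing alternate(a) and alternate(c) requires the original items, so the table must store them.
[Figure: an 8-bucket table (buckets 0-7) holding b, c, a, and x; x takes a's old bucket, a moves to bucket 4 (displacing c), and c moves to bucket 1.]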
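The insertion walk-through above can be sketched as standard cuckoo hashing with one item per bucket. The hash family, the 500-step limit, the random choice of which bucket to evict from, and the `victim` field (which keeps a homeless item instead of silently dropping it) are illustrative assumptions, not details from the talk. Note that `insert` calls `_buckets(item)` on each evicted item: that rehash is exactly the step that needs the original item rather than a fingerprint.

```python
import hashlib
import random

def bucket_hash(seed: int, item: str, n: int) -> int:
    """Hypothetical hash family: BLAKE2b over a one-byte seed plus the item."""
    digest = hashlib.blake2b(bytes([seed]) + item.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % n

class CuckooHashTable:
    """Standard cuckoo hashing, one item per bucket (illustrative sketch).
    Each item lives in bucket h1(x) or h2(x); full items are stored so a
    displaced item's alternate bucket can be recomputed."""

    def __init__(self, n_buckets: int, max_steps: int = 500):
        self.n = n_buckets
        self.max_steps = max_steps
        self.table = [None] * n_buckets
        self.victim = None  # holds the last homeless item if MaxSteps is hit

    def _buckets(self, item: str):
        return bucket_hash(1, item, self.n), bucket_hash(2, item, self.n)

    def lookup(self, item: str) -> bool:
        b1, b2 = self._buckets(item)
        return item in (self.table[b1], self.table[b2], self.victim)

    def insert(self, item: str) -> bool:
        if self.victim is not None:
            return False              # already overflowed; a real table would grow
        if self.lookup(item):
            return True
        b1, b2 = self._buckets(item)
        for b in (b1, b2):
            if self.table[b] is None:
                self.table[b] = item
                return True
        b = random.choice((b1, b2))   # both full: start kicking
        for _ in range(self.max_steps):
            # Place the item in bucket b and pick up the previous occupant
            item, self.table[b] = self.table[b], item
            v1, v2 = self._buckets(item)
            b = v2 if b == v1 else v1  # rehash: the evicted item's other bucket
            if self.table[b] is None:
                self.table[b] = item
                return True
        self.victim = item            # MaxSteps reached: keep it rather than lose it
        return False


t = CuckooHashTable(n_buckets=8)
for name in ("a", "b", "c", "x"):
    t.insert(name)
print([t.lookup(name) for name in ("a", "b", "c", "x")])
```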

Challenge: How to Perform Cuckoo?

Cuckoo hashing requires rehashing and displacing existing items.
With only the fingerprint, how do we calculate an item's alternate bucket?
[Figure: an 8-bucket table (buckets 0-7) holding FP(b), FP(c), and FP(a). Kick FP(a) to which bucket? Kick FP(c) to which bucket?]

We Apply Partial-Key Cuckoo Hashing (Solution)

Standard cuckoo hashing uses two independent hash functions for the two buckets:
bucket1 = hash1(x)
bucket2 = hash2(x)
Partial-key cuckoo hashing uses one bucket and the fingerprint to derive the other [Fan2013] (⊕ denotes bitwise XOR):
bucket1 = hash(x)
bucket2 = bucket1 ⊕ hash(FP(x))
To displace an existing fingerprint:
alternate(x) = current(x) ⊕ hash(FP(x))

[Fan2013] MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing.
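Why XOR works: applying the same operation to either bucket yields the other one, so a displaced fingerprint can be moved using only its current bucket index and the fingerprint itself. A sketch, assuming a power-of-two table and a BLAKE2b-based hash (both illustrative choices; the talk does not specify the hash or bit layout):

```python
import hashlib

NUM_BUCKETS = 8  # assumed power of two, so masking keeps the XOR inside the table

def hash64(data: bytes) -> int:
    """Hypothetical 64-bit hash built on BLAKE2b."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")

def alternate(current: int, fp: int) -> int:
    """alternate(x) = current(x) XOR hash(FP(x)), masked to the table size."""
    return (current ^ hash64(fp.to_bytes(4, "little"))) & (NUM_BUCKETS - 1)

h = hash64(b"x")
bucket1 = h & (NUM_BUCKETS - 1)       # bucket1 = hash(x)
fp = (h >> 32) & 0xFF                 # 8-bit fingerprint from other hash bits
bucket2 = alternate(bucket1, fp)      # bucket2 = bucket1 XOR hash(FP(x))
print(bucket1, bucket2, alternate(bucket2, fp))  # the last value equals bucket1
```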

Partial-Key Cuckoo Hashing (Solution)

Perform cuckoo hashing on fingerprints.
[Figure: an 8-bucket table (buckets 0-7) with FP(b), FP(c) in bucket 4, and FP(a) in bucket 6. Kick FP(a) to 6 ⊕ hash(FP(a)) and FP(c) to 4 ⊕ hash(FP(c)), where ⊕ is XOR.]
Can we still achieve high space utilization with partial-key cuckoo hashing?
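Putting the pieces together, a cuckoo filter with 4-slot buckets can be sketched as below. Parameter names and defaults (`num_buckets`, `f_bits`, `max_kicks`), the BLAKE2b hash, and the `victim` slot are assumptions for illustration, not the API of the C++ implementation. The key line is in the kick loop: the evicted fingerprint's next bucket is computed from `b` and `fp` alone.

```python
import hashlib
import random

def hash64(data: bytes) -> int:
    """Hypothetical 64-bit hash built on BLAKE2b."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")

class CuckooFilter:
    """Cuckoo filter sketch: b-slot buckets holding f-bit fingerprints,
    placed with partial-key cuckoo hashing. Names and defaults are illustrative."""

    def __init__(self, num_buckets=1024, bucket_size=4, f_bits=12, max_kicks=500):
        assert num_buckets & (num_buckets - 1) == 0, "num_buckets must be a power of two"
        self.mask = num_buckets - 1
        self.bucket_size = bucket_size
        self.f_bits = f_bits
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.victim = None  # (bucket, fp) left over if max_kicks is reached

    def _fp_and_bucket(self, item):
        h = hash64(item.encode())
        fp = (h >> 32) & ((1 << self.f_bits) - 1)   # FP(x): f bits of the hash
        return fp, h & self.mask                    # bucket1 = hash(x)

    def _alt(self, bucket, fp):
        # alternate(x) = current(x) XOR hash(FP(x)); works from either bucket
        return (bucket ^ hash64(fp.to_bytes(4, "little"))) & self.mask

    def _victim_matches(self, fp, b1, b2):
        return self.victim is not None and self.victim[1] == fp and self.victim[0] in (b1, b2)

    def insert(self, item):
        if self.victim is not None:
            return False                      # treat the filter as full
        fp, b1 = self._fp_and_bucket(item)
        b2 = self._alt(b1, fp)
        for b in (b1, b2):
            if len(self.buckets[b]) < self.bucket_size:
                self.buckets[b].append(fp)
                return True
        b = random.choice((b1, b2))           # both full: relocate fingerprints
        for _ in range(self.max_kicks):
            slot = random.randrange(self.bucket_size)
            fp, self.buckets[b][slot] = self.buckets[b][slot], fp
            b = self._alt(b, fp)              # only the fingerprint is needed
            if len(self.buckets[b]) < self.bucket_size:
                self.buckets[b].append(fp)
                return True
        self.victim = (b, fp)                 # keep the homeless fingerprint
        return False

    def lookup(self, item):
        fp, b1 = self._fp_and_bucket(item)
        b2 = self._alt(b1, fp)
        return fp in self.buckets[b1] or fp in self.buckets[b2] or self._victim_matches(fp, b1, b2)

    def delete(self, item):
        # Only call this for items that were actually inserted.
        fp, b1 = self._fp_and_bucket(item)
        b2 = self._alt(b1, fp)
        for b in (b1, b2):
            if fp in self.buckets[b]:
                self.buckets[b].remove(fp)    # removes a single copy
                return True
        if self._victim_matches(fp, b1, b2):
            self.victim = None
            return True
        return False


cf = CuckooFilter()
for word in ("apple", "banana", "cherry"):
    cf.insert(word)
print(cf.lookup("banana"))   # True: inserted items are always found
cf.delete("banana")
print(cf.lookup("banana"))   # usually False (True only on a fingerprint collision)
```

Deleting an item that was never inserted can remove some other item's matching fingerprint and cause a false negative, which is why `delete` should only be called for items known to be in the set.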

Fingerprints Must Be “Long” for Space Efficiency

In theory, the fingerprint must be Ω(log n / b) bits, where n is the hash table size and b is the bucket size (see the paper for more analysis).
When fingerprints are longer than 5 bits, table space utilization is high.
[Figure: table space utilization versus fingerprint size, for a table of n = 128 million entries.]

Semi-Sorting: Further Save 1 Bit per Item

Based on the observation that a monotonic sequence of integers is easier to compress [Bonomi2006].
Semi-sorting: sort the fingerprints within each bucket, then compress the sorted fingerprints.
+ For 4-way buckets, saves one bit per item.
- Slower lookup and insert.
[Figure: the fingerprints in a bucket, 21 97 88 04, are sorted to 04 21 88 97, which is easier to compress.]

[Bonomi2006] Beyond Bloom filters: From approximate membership checks to approximate state machines.
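Where the saved bit comes from, as a back-of-the-envelope sketch: order within a bucket carries no membership information, so once a 4-slot bucket is kept sorted, only its multiset of values needs storing. For 4-bit values there are C(16 + 4 - 1, 4) = C(19, 4) = 3876 such multisets, which fit in 12 bits instead of 4 × 4 = 16, saving 4 bits per bucket, or 1 bit per item. The lookup-table codec below illustrates this; applying it to the high-order 4 bits of each fingerprint while storing the remaining bits unchanged is an assumption about how a full scheme would use it. The encode/decode step is also where the slower lookup and insert come from.

```python
from itertools import combinations_with_replacement

# Every possible content of a 4-slot bucket of 4-bit values, once sorted.
# combinations_with_replacement yields non-decreasing tuples, one per multiset.
SORTED_BUCKETS = list(combinations_with_replacement(range(16), 4))
ENCODE = {contents: code for code, contents in enumerate(SORTED_BUCKETS)}

def encode_bucket(nibbles):
    """Sort four 4-bit values and map the multiset to one 12-bit code."""
    return ENCODE[tuple(sorted(nibbles))]

def decode_bucket(code):
    """Recover the four values, in sorted order."""
    return list(SORTED_BUCKETS[code])

print(len(SORTED_BUCKETS), 2 ** 12)   # 3876 4096: every code fits in 12 bits
code = encode_bucket([9, 2, 8, 0])
print(code, decode_bucket(code))      # order is lost, contents are kept
```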


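Space Efficiency

[Figure: bits per item needed to achieve a target false positive rate ε, for the Bloom filter, the cuckoo filter, and the cuckoo filter with semi-sorting, compared with the lower bound. Higher on the plot means more space; further right means more false positives.]
The cuckoo filter with semi-sorting is more compact than the Bloom filter when ε is below roughly 3%.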
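The curves can be approximated with simple formulas. A lookup compares against 2b fingerprints in two buckets, so the false positive rate is at most about 2b/2^f; hitting a target ε therefore needs f ≈ log2(1/ε) + log2(2b) bits per fingerprint, divided by the load factor to pay for empty slots. An optimal Bloom filter needs about 1.44·log2(1/ε) bits, and log2(1/ε) is the information-theoretic lower bound. The constants below, notably the 95.5% load factor, are assumptions for illustration.

```python
import math

BUCKET_SIZE = 4      # b: slots per bucket
LOAD_FACTOR = 0.955  # assumed achievable occupancy for 4-way buckets (slide: >95%)

def lower_bound_bits(eps):
    return math.log2(1 / eps)

def bloom_bits(eps):
    return 1.44 * math.log2(1 / eps)      # optimally configured Bloom filter

def cuckoo_bits(eps, semisort=False):
    # Choose f so that 2b / 2^f <= eps, i.e. f = log2(1/eps) + log2(2b)
    f = math.log2(1 / eps) + math.log2(2 * BUCKET_SIZE)
    if semisort:
        f -= 1                             # semi-sorting saves ~1 bit per item at b = 4
    return f / LOAD_FACTOR                 # empty slots also cost space

for eps in (0.1, 0.01, 0.001):
    print(f"eps={eps:<6} lower={lower_bound_bits(eps):5.2f} bloom={bloom_bits(eps):5.2f} "
          f"cuckoo={cuckoo_bits(eps):5.2f} semisort={cuckoo_bits(eps, semisort=True):5.2f}")
```

Under these formulas the semi-sorted cuckoo filter beats the Bloom filter at ε = 1% (about 9.1 versus 9.6 bits per item), while at ε = 10% the Bloom filter is smaller.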


Evaluation

Compare the cuckoo filter with:
Bloom filter (cannot delete)
Blocked Bloom filter [Putze2007] (cannot delete)
d-left counting Bloom filter [Bonomi2006]
Cuckoo filter + semi-sorting
More in the paper.
C++ implementation, single-threaded.

[Putze2007] Cache-, hash- and space-efficient Bloom filters.
[Bonomi2006] Beyond Bloom filters: From approximate membership checks to approximate state machines.


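Lookup Performance (MOPS)

[Figure: lookup throughput in million operations per second (MOPS) for the cuckoo filter, cuckoo filter + semi-sorting, blocked Bloom filter (no deletion), Bloom filter (no deletion), and d-left counting Bloom filter across workloads.]
The cuckoo filter is among the fastest regardless of workload.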

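Insert Performance (MOPS)

The cuckoo filter's insert rate decreases as the table fills, but overall it is slower only than the blocked Bloom filter.
[Figure: insert throughput (MOPS) for the cuckoo filter, cuckoo filter + semi-sorting, blocked Bloom filter, d-left Bloom filter, and standard Bloom filter.]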

Summary

Cuckoo filter, a Bloom filter replacement:
Deletion support
High performance
Less space than Bloom filters in practice
Easy to implement
Source code available in C++: https://github.com/efficient/cuckoofilter