Bloom-Filters S.Sioutas CEID@UPATRAS

Uploaded On 2023-10-28

Presentation Transcript

1. Bloom-Filters S.Sioutas CEID@UPATRAS

2. Bloom Filters
Lookup questions: does item “x” exist in a set or multiset?
The data set may be very big or expensive to access.
Filter lookup questions with negative results before accessing data.
Allow false positive errors, as they only cost us an extra data access.
Don’t allow false negative errors, because they result in wrong answers.

3. Bloom Filter [B70]
Encoding an attribute a with values drawn from a universe U.
Maintain a bit vector V of size m.
Use k hash functions (h1..hk), hi: U → [1..m].
Encoding: for item x, “turn on” bits V[h1(x)]..V[hk(x)].
Lookup: for item i, check bits V[h1(i)]..V[hk(i)]. If all equal 1, return “Probably Yes”; else return “Definitely No”.
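The encoding and lookup steps on this slide can be sketched in Python. This is a minimal illustration, not the presenter's implementation: the class name `BloomFilter`, the method names, and the choice to simulate the k hash functions by salting SHA-256 with the index i are all assumptions made for the example. Positions run from 0 to m-1 rather than 1 to m.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Hypothetical minimal Bloom filter: bit vector V of size m, k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        if m <= 0 or k <= 0:
            raise ValueError("m and k must be positive")
        self.m = m
        self.k = k
        # One byte per bit: wasteful, but keeps the sketch easy to read.
        self.bits = bytearray(m)

    def _positions(self, item: str) -> Iterator[int]:
        # Assumption: h_i(x) is simulated by hashing "i:x" with SHA-256.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Encoding: turn on bits V[h1(x)]..V[hk(x)].
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # Lookup: True means "Probably Yes", False means "Definitely No".
        return all(self.bits[pos] for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(m=1024, k=3)
    for word in ("a", "b", "c", "d"):
        bf.add(word)
    print(bf.might_contain("a"))  # an added item is never reported missing
    print(bf.might_contain("x"))  # usually False, but may be a false positive
```

An added item always passes the lookup, because its bits are only ever turned on and never cleared. That is why false negatives cannot occur in this scheme.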

4. Bloom Filter
[Figure: item x is mapped by h1(x), h2(x), h3(x), ..., hk(x) to positions in the bit vector V0..Vm-1, shown with contents 010001010000010.]

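5. Bloom Errors
[Figure: the bit vector V0..Vm-1 (010001010000010) with items a, b, c and d labeled, and the positions h1(x), h2(x), h3(x), ..., hk(x) of item x.]
x didn’t appear, yet its bits are already set.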

6. Error Estimation
Assumption: hash functions are perfectly random.
Probability of a bit being 0 after hashing all n elements: $(1 - 1/m)^{kn} \approx e^{-kn/m}$.
Let $p = e^{-kn/m}$. The probability of a false positive is approximately $(1 - p)^k = (1 - e^{-kn/m})^k$.
Assuming we are given m and n, the optimal k is $k = (m/n) \ln 2$.
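The optimal k on this slide can be derived in a few lines from the standard approximation, using the slide's definition p = e^{-kn/m}. The symbol f for the false positive probability is introduced here only for the derivation.

```latex
\begin{align*}
f(k) &= (1 - p)^k, \qquad p = e^{-kn/m}
      \;\Longrightarrow\; k = -\frac{m}{n}\ln p \\
\ln f &= k \ln(1 - p) = -\frac{m}{n}\,\ln p \,\ln(1 - p) \\
\intertext{The product $\ln p \,\ln(1-p)$ is symmetric in $p$ and $1-p$ and is
maximal at $p = \tfrac12$, so $\ln f$ is minimal there:}
p = \tfrac12 &\;\Longrightarrow\; k_{\mathrm{opt}} = \frac{m}{n}\ln 2, \qquad
f(k_{\mathrm{opt}}) = \left(\tfrac12\right)^{k_{\mathrm{opt}}}
  = e^{-(\ln 2)^2\, m/n} \approx 0.6185^{\,m/n}
\end{align*}
```

In practice k must be an integer, so one would round the optimal value and compare the two neighboring choices.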

7. Bloom Filter Tradeoffs
Three factors: m, k and n.
Normally, n and m are given, and we select k.
Small k: fewer computations. The actual number of bits accessed (nk) is smaller, so the chance of a “step over” is smaller too. However, fewer bits need to be stepped over to generate an error.
For big k, the exact opposite holds.
Not surprisingly, when k is optimal, the “hit ratio” (ratio of bits flipped in the array) is exactly 0.5.
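The tradeoff between small and big k can be checked numerically. The sketch below uses the standard approximation for the false positive rate. The function names and the sizes m = 10,000 and n = 1,000 are hypothetical choices for illustration, not values from the presentation.

```python
import math


def false_positive_rate(m: int, n: int, k: float) -> float:
    """Approximate false positive probability (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def expected_fill_ratio(m: int, n: int, k: float) -> float:
    """Expected fraction of bits set to 1 after encoding n items."""
    return 1.0 - (1.0 - 1.0 / m) ** (k * n)


def optimal_k(m: int, n: int) -> float:
    """Real-valued k minimizing the approximate false positive rate."""
    return (m / n) * math.log(2)


if __name__ == "__main__":
    m, n = 10_000, 1_000  # hypothetical sizes: 10 bits per item
    print("k  fill_ratio  false_positive_rate")
    for k in (1, 3, 5, 7, 9, 12):
        fill = expected_fill_ratio(m, n, k)
        fpr = false_positive_rate(m, n, k)
        print(f"{k:<2} {fill:<11.3f} {fpr:.5f}")
    print("optimal k is about", round(optimal_k(m, n), 2))
```

With these sizes the error rate falls as k grows toward roughly 7 and rises again beyond it, and the expected fill ratio at the real-valued optimum is very close to 0.5.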

8. The End