
Hash Tables: - PowerPoint Presentation

natalia-silvester
Uploaded On 2016-06-22






Presentation Transcript

Slide1

Hash Tables: Linear Probing

Uri Zwick

Tel Aviv University

Slide2

Hashing with open addressing: “Uniform probing”

Insert key $x$ in the first free position among $h(x,0), h(x,1), \dots, h(x,m-1)$

The probe sequence is (sometimes) assumed to be a permutation of the $m$ cells

To search, follow the same order

Table is not full $\Rightarrow$ insertion succeeds
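A minimal Python sketch of open addressing under the idealized uniform-probing model: each key's probe sequence is a pseudo-random permutation of the cells, produced here by seeding Python's `random.Random` with the key. The class name, its methods and the `salt` parameter are illustrative assumptions, not part of the lecture; real tables do not shuffle per key, this only mimics the model.

```python
import random


class UniformProbingTable:
    """Open addressing with idealized uniform probing: key x probes the cells
    in the order of a pseudo-random permutation h(x,0), ..., h(x,m-1)."""

    def __init__(self, m, salt=0):
        self.m = m
        self.salt = salt              # hypothetical knob selecting the permutations
        self.cells = [None] * m

    def probe_sequence(self, key):
        # Seeding a private generator with (salt, key) makes the permutation
        # reproducible, so a search retraces exactly the insertion order.
        order = list(range(self.m))
        random.Random(f"{self.salt}:{key}").shuffle(order)
        return order

    def insert(self, key):
        """Store key in the first free cell of its sequence; return #probes."""
        for probes, cell in enumerate(self.probe_sequence(key), start=1):
            if self.cells[cell] is None or self.cells[cell] == key:
                self.cells[cell] = key
                return probes
        raise RuntimeError("table is full")

    def search(self, key):
        """Return (found, #probes); an empty cell ends an unsuccessful search."""
        for probes, cell in enumerate(self.probe_sequence(key), start=1):
            if self.cells[cell] is None:
                return False, probes
            if self.cells[cell] == key:
                return True, probes
        return False, self.m
```

Because every probe sequence visits all $m$ cells, an insertion into a table that is not full always finds a free cell.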

Hash table of size $m$

Slide3

Linear probing: “The most important hashing technique”

$h(x,i) = (h(x) + i) \bmod m$
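Under this rule, insertion and search only ever walk forward through consecutive cells. A minimal sketch, assuming cells indexed $0,\dots,m-1$, a caller-supplied hash function `h`, and no deletions (the class and method names are illustrative):

```python
class LinearProbingTable:
    """Linear probing: key x probes h(x), h(x)+1, h(x)+2, ... (mod m)."""

    def __init__(self, m, h):
        self.m = m
        self.h = h                    # any function from keys to range(m)
        self.cells = [None] * m

    def insert(self, key):
        """Store key in the first free cell at or after h(key); return #probes."""
        i = self.h(key)
        for probes in range(1, self.m + 1):
            if self.cells[i] is None or self.cells[i] == key:
                self.cells[i] = key
                return probes
            i = (i + 1) % self.m      # next cell: consecutive memory, few cache misses
        raise RuntimeError("table is full")

    def search(self, key):
        """Return (found, #probes); the first empty cell ends the search."""
        i = self.h(key)
        for probes in range(1, self.m + 1):
            if self.cells[i] is None:
                return False, probes
            if self.cells[i] == key:
                return True, probes
            i = (i + 1) % self.m
        return False, self.m


t = LinearProbingTable(8, lambda x: x % 8)   # toy hash, for illustration only
print([t.insert(k) for k in (1, 9, 17)])     # prints [1, 2, 3]
print(t.search(25))                          # prints (False, 4)
```

The toy hash `x % 8` sends 1, 9 and 17 to the same home cell, so they form a single run of length 3 that a later search for 25 must walk through before reaching the empty cell 4.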

More probes than uniform probing due to clustering: long runs tend to get longer and merge with other runs

But, many fewer cache misses

Extremely efficient in practice

How do we analyze it?

Which hash functions should we use?

Slide4

Order of insertions

Theorem: The set of occupied cells and the total number of probes done while inserting a set of items into a hash table using linear probing do not depend on the order in which the items are inserted
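The theorem can be sanity-checked on small instances by brute force (this is a check, not a proof). The sketch below assumes the items are described only by their hash values, in a cyclic table of size $m$ with at most $m$ items; it builds the table in every possible insertion order and confirms that the occupied set and the total number of probes never change. Function names are illustrative.

```python
from itertools import permutations


def linear_probe_build(hashes, m):
    """Insert items with the given hash values, in list order, into a cyclic
    table of size m by linear probing; return (occupied cells, total probes)."""
    occupied = [False] * m
    total = 0
    for h in hashes:
        i = h
        total += 1                    # probe the home cell
        while occupied[i]:
            i = (i + 1) % m
            total += 1                # one probe per further cell examined
        occupied[i] = True
    return frozenset(j for j in range(m) if occupied[j]), total


def order_independent(hashes, m):
    """True if every insertion order yields the same (cells, probes) pair."""
    return len({linear_probe_build(list(order), m)
                for order in permutations(hashes)}) == 1


print(order_independent([3, 3, 4, 0, 6, 6], 7))   # prints True
```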

Exercise: Prove the theorem

Exercise: Is the same true for uniform probing?

Slide5

Number of probes

Exercise: Show that if, after inserting $n$ items into a table of size $m$, the occupied cells in the table form runs of length $\ell_1, \ell_2, \dots, \ell_k$, where $\sum_{i=1}^{k} \ell_i = n$, then the expected number of probes in an unsuccessful search, assuming the searched key is mapped into a uniformly random location in the table, is

$$1 + \frac{1}{m}\sum_{i=1}^{k} \frac{\ell_i(\ell_i+1)}{2}$$
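A quick numerical check of this formula, assuming a cyclic table with at least one empty cell and counting the final empty cell as a probe: the first function averages the cost of an unsuccessful search over all $m$ starting cells, and the last one evaluates $1 + \frac{1}{m}\sum_i \ell_i(\ell_i+1)/2$ from the run lengths. All names are illustrative.

```python
from fractions import Fraction


def avg_unsuccessful_probes(occupied):
    """Average #probes of an unsuccessful search over all m start cells of a
    cyclic table; `occupied` is a list of booleans with at least one False."""
    m = len(occupied)
    total = 0
    for s in range(m):
        i, probes = s, 1              # the final (empty) cell costs one probe
        while occupied[i]:            # ...and every occupied cell passed costs one
            i = (i + 1) % m
            probes += 1
        total += probes
    return Fraction(total, m)


def run_lengths(occupied):
    """Lengths of the maximal cyclic runs of occupied cells."""
    m = len(occupied)
    start = occupied.index(False)     # scan from an empty cell so no run is split
    runs, cur = [], 0
    for k in range(1, m + 1):
        if occupied[(start + k) % m]:
            cur += 1
        elif cur:
            runs.append(cur)
            cur = 0
    return runs


def expected_from_runs(occupied):
    """1 + (1/m) * sum over runs of r (r + 1) / 2."""
    m = len(occupied)
    return 1 + Fraction(sum(r * (r + 1) // 2 for r in run_lengths(occupied)), m)
```

For example, the table `[True, True, False, True, False, False, True, True, True, False]` has runs of lengths 1, 3 and 2, and both functions give exactly 2.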

 

Exercise: What are the smallest and largest possible total numbers of probes needed to construct a hash table that contains runs of length $\ell_1, \ell_2, \dots, \ell_k$?

Slide6

Probabilistic analysis of uniform probing [Peterson (1957)]

$n$ – number of elements in table

$m$ – size of hash table

Uniform probing: for every $x$, $h(x,0), h(x,1), \dots, h(x,m-1)$ is a random permutation, independent of all other permutations

$\alpha = n/m$ – load factor (Note: $\alpha < 1$)

Expected no. of probes in an unsuccessful search is at most $\frac{1}{1-\alpha}$

Expected no. of probes in a successful search of a random item is at most $\frac{1}{\alpha}\ln\frac{1}{1-\alpha}$

Slide7

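Probabilistic analysis of uniform probing [Peterson (1957)]

Claim: Expected no. of probes in an unsuccessful search is at most $\frac{1}{1-\alpha}$

The probability that a random cell is occupied is $\alpha$

The probability that the first $i$ cells probed are all occupied is at most $\alpha^i$

The number of probes exceeds $i$ exactly when the first $i$ cells probed are all occupied, so the expected number of probes is at most $\sum_{i \ge 0} \alpha^i = \frac{1}{1-\alpha}$

Exercise: Do the calculation more carefully and show that the expected no. of probes in an unsuccessful search is exactly $\frac{m+1}{m-n+1}$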
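The exact value can be checked with rational arithmetic. Under uniform probing, the number of probes exceeds $i$ exactly when the first $i$ probed cells are all occupied, which has probability $\frac{n}{m}\cdot\frac{n-1}{m-1}\cdots\frac{n-i+1}{m-i+1}$; summing these probabilities gives the expectation. A minimal sketch, valid for $n < m$ (the function name is illustrative):

```python
from fractions import Fraction


def uniform_unsuccessful_exact(m, n):
    """E[#probes] = sum_{i >= 0} Pr[first i probed cells are all occupied]
    for uniform probing with n < m items in a table of size m."""
    total, p = Fraction(0), Fraction(1)
    for i in range(n + 1):
        total += p                    # p = n/m * (n-1)/(m-1) * ... (i factors)
        p *= Fraction(n - i, m - i)   # becomes 0 once i = n
    return total
```

For every $0 \le n < m$ the result equals $\frac{m+1}{m-n+1}$, which never exceeds the bound $\frac{1}{1-\alpha} = \frac{m}{m-n}$.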

 Slide8

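Probabilistic analysis of linear probing [Knuth (1962)]

Random hash function: for every $x$, $h(x)$ is uniformly distributed on $\{0,\dots,m-1\}$, independent of all other $h(y)$, for $y \ne x$

$\alpha = n/m$ – load factor ($\alpha < 1$)

Expected no. of probes in an unsuccessful search is at most $\frac{1}{2}\left(1+\frac{1}{(1-\alpha)^2}\right)$

Expected no. of probes in a successful search of a random item is at most $\frac{1}{2}\left(1+\frac{1}{1-\alpha}\right)$

Slide9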

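Expected number of probes, assuming random hash functions:

Method | Successful search | Unsuccessful search
Uniform probing | $\frac{1}{\alpha}\ln\frac{1}{1-\alpha}$ | $\frac{1}{1-\alpha}$
Linear probing | $\frac{1}{2}\left(1+\frac{1}{1-\alpha}\right)$ | $\frac{1}{2}\left(1+\frac{1}{(1-\alpha)^2}\right)$

When $\alpha$ is, say, a constant bounded away from 1, all four are small constants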
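To get a feel for the numbers, the sketch below simply evaluates the four expressions in the table for a few load factors (the function name and the dictionary layout are illustrative):

```python
import math


def expected_probes(alpha):
    """The four expressions of the table, for a load factor 0 < alpha < 1."""
    return {
        ("uniform", "successful"): math.log(1 / (1 - alpha)) / alpha,
        ("uniform", "unsuccessful"): 1 / (1 - alpha),
        ("linear", "successful"): (1 + 1 / (1 - alpha)) / 2,
        ("linear", "unsuccessful"): (1 + 1 / (1 - alpha) ** 2) / 2,
    }


for alpha in (0.5, 0.75, 0.9):
    print(alpha, {k: round(v, 2) for k, v in expected_probes(alpha).items()})
```

At $\alpha = 0.5$ the four values are about 1.39, 2, 1.5 and 2.5; at $\alpha = 0.9$ the unsuccessful-search expression for linear probing is already about 50, which is where clustering becomes visible.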

 Slide10

Expected number of probes

[Plot: expected number of probes as a function of the load factor $\alpha$]

Slide11

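Probabilistic analysis of linear probing [Knuth (1962)]

What is the probability that cell $0$ is empty?

$n$ – number of elements in table

$m$ – size of hash table

By symmetry (the table is cyclic), all cells are equally likely to be empty; since exactly $m-n$ of the $m$ cells are empty, the probability is $\frac{m-n}{m} = 1-\alpha$

Slide12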

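What is the probability that cells $0$ and $k+1$ are empty and cells $1,\dots,k$ are occupied?

Exactly $k$ items should be mapped to $\{1,\dots,k+1\}$ and $n-k$ items should be mapped to $\{k+2,\dots,m-1,0\}$

Given that $k$ items are mapped to $\{1,\dots,k+1\}$, cell $k+1$ should remain empty: by the symmetry argument applied to a cyclic table of size $k+1$ holding $k$ items, this has probability $\frac{1}{k+1}$

Given that $n-k$ items are mapped to $\{k+2,\dots,m-1,0\}$, cell $0$ should remain empty: by the same argument for a cyclic table of size $m-k-1$ holding $n-k$ items, this has probability $\frac{m-n-1}{m-k-1}$

Slide13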

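What is the probability that cells $0$ and $k+1$ are empty and cells $1,\dots,k$ are occupied?

$$\binom{n}{k}\left(\frac{k+1}{m}\right)^{k}\left(\frac{m-k-1}{m}\right)^{n-k}\cdot\frac{1}{k+1}\cdot\frac{m-n-1}{m-k-1}$$

This is the probability that a run of size exactly $k$ starts at a given position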
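For tiny tables this probability can be confirmed by enumerating all $m^n$ equally likely hash sequences. The sketch assumes a cyclic table with $m \ge n+2$ (so that cells $0$ and $k+1$ are distinct and no denominator vanishes) and evaluates $\binom{n}{k}\left(\frac{k+1}{m}\right)^{k}\left(\frac{m-k-1}{m}\right)^{n-k}\frac{1}{k+1}\cdot\frac{m-n-1}{m-k-1}$ next to the brute-force count; function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb


def occupied_cells(hashes, m):
    """Occupancy of a cyclic table of size m after linear probing (n < m)."""
    occ = [False] * m
    for h in hashes:
        i = h
        while occ[i]:
            i = (i + 1) % m
        occ[i] = True
    return occ


def run_prob_exact(m, n, k):
    """Pr[cells 0 and k+1 empty, cells 1..k occupied], over all m**n sequences."""
    hits = 0
    for hashes in product(range(m), repeat=n):
        occ = occupied_cells(hashes, m)
        if not occ[0] and not occ[k + 1] and all(occ[1:k + 1]):
            hits += 1
    return Fraction(hits, m ** n)


def run_prob_formula(m, n, k):
    """The closed-form product above; requires m >= n + 2."""
    return (comb(n, k) * Fraction(k + 1, m) ** k * Fraction(m - k - 1, m) ** (n - k)
            * Fraction(1, k + 1) * Fraction(m - n - 1, m - k - 1))
```

The same enumeration also confirms the symmetry argument: cell $0$ is empty in exactly $(m-n)\,m^{n-1}$ of the $m^n$ sequences, i.e., with probability $1-\alpha$.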

 

Exercise:

,

 Slide14

What is the probability that an unsuccessful search encounters exactly $k$ occupied cells?

Interesting to note that

 Slide15

The expected no. of probes in an unsuccessful search, which is also the expected no. of probes needed to insert the $(n+1)$-st item, is
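Knuth's exact answer can be written as $\frac{1}{2}\left(1 + Q_1(m,n)\right)$ with $Q_1(m,n) = \sum_{k \ge 0} (k+1)\frac{n(n-1)\cdots(n-k+1)}{m^k}$. The sketch below, assuming a cyclic table with $n < m$ and counting the final empty cell as a probe, compares that expression with a brute-force average over all $m^n$ hash sequences and all $m$ starting cells. Names are illustrative.

```python
from fractions import Fraction
from itertools import product


def falling(n, k):
    """Falling factorial n (n-1) ... (n-k+1)."""
    r = 1
    for j in range(k):
        r *= n - j
    return r


def unsuccessful_knuth(m, n):
    """(1 + Q_1(m, n)) / 2 with Q_1(m, n) = sum_k (k+1) falling(n, k) / m**k."""
    q1 = sum(Fraction((k + 1) * falling(n, k), m ** k) for k in range(n + 1))
    return (1 + q1) / 2


def unsuccessful_bruteforce(m, n):
    """Average #probes over all m**n hash sequences and all m start cells."""
    total = 0
    for hashes in product(range(m), repeat=n):
        occ = [False] * m
        for h in hashes:                  # build the table by linear probing
            i = h
            while occ[i]:
                i = (i + 1) % m
            occ[i] = True
        for start in range(m):            # unsuccessful search from every cell
            i, probes = start, 1
            while occ[i]:
                i, probes = (i + 1) % m, probes + 1
            total += probes
    return Fraction(total, m ** (n + 1))
```

Since $n(n-1)\cdots(n-k+1)/m^k \le \alpha^k$ and $\sum_{k\ge0}(k+1)\alpha^k = 1/(1-\alpha)^2$, this expression is also at most $\frac{1}{2}\left(1+\frac{1}{(1-\alpha)^2}\right)$.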

 

 Slide16

Ex. 6.4.27, Knuth, Vol. 3

Slide17

Abel’s binomial theorem (see Knuth, Eq. 1.2.6-(16)):

$$\sum_{k=0}^{n}\binom{n}{k}\,x\,(x-kz)^{k-1}\,(y+kz)^{n-k} = (x+y)^n, \qquad x \ne 0$$

Slide18

 

 

 

 

Unsuccessful search

The birth of Knuth-style analysis of algorithms …

Slide19

Successful search / Construction time

The expected number of probes in a search of a randomly selected item is

The expected number of probes in the construction of the table is
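With no deletions, a successful search for an item retraces exactly the probes made when it was inserted, and inserting the $(i+1)$-st item costs as much, in expectation, as an unsuccessful search in a table holding $i$ items. So the expected construction cost is the sum of the unsuccessful-search expectations for $i = 0,\dots,n-1$, and the average successful search is that sum divided by $n$. The sketch checks this by brute force against Knuth's closed forms $\frac12(1+Q_1(m,n))$ (unsuccessful) and $\frac12(1+Q_0(m,n-1))$ with $Q_0(m,n) = \sum_{k\ge0}\frac{n(n-1)\cdots(n-k+1)}{m^k}$ (successful); it assumes a cyclic table with $n < m$, and the names are illustrative.

```python
from fractions import Fraction
from itertools import product


def falling(n, k):
    """Falling factorial n (n-1) ... (n-k+1)."""
    r = 1
    for j in range(k):
        r *= n - j
    return r


def unsuccessful_knuth(m, n):
    """Expected probes of an unsuccessful search with n items: (1 + Q_1(m, n)) / 2."""
    return (1 + sum(Fraction((k + 1) * falling(n, k), m ** k) for k in range(n + 1))) / 2


def successful_knuth(m, n):
    """Expected probes to find a random one of n items: (1 + Q_0(m, n - 1)) / 2."""
    return (1 + sum(Fraction(falling(n - 1, k), m ** k) for k in range(n))) / 2


def construction_bruteforce(m, n):
    """Expected total probes to insert n items, over all m**n hash sequences."""
    total = 0
    for hashes in product(range(m), repeat=n):
        occ = [False] * m
        for h in hashes:
            i = h
            total += 1                # probe the home cell h
            while occ[i]:
                i = (i + 1) % m
                total += 1            # one more probe per occupied cell skipped
            occ[i] = True
    return Fraction(total, m ** n)
```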

 Slide20

The “parking problem” [Knuth (1962)] [Konheim-Weiss (1966)]

A one-way street contains $m$ parking spots

$n$ cars arrive, one after the other

The $i$-th car chooses a random number $h_i$ between $1$ and $m$ and parks in the first free spot at or after location $h_i$, if there is one
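The process is easy to evaluate exactly for small streets by enumerating all $m^n$ equally likely choice sequences. A minimal sketch, assuming spots are numbered $1,\dots,m$ and that a car finding no free spot at or after its choice simply leaves (the function name is illustrative); it computes the probability asked about in the exercise numerically, without giving a closed form.

```python
from fractions import Fraction
from itertools import product


def all_park_probability(m, n):
    """Exact probability that all n cars park on a one-way street with spots
    1..m, each car picking a uniformly random spot (enumerates m**n cases)."""
    good = 0
    for choices in product(range(1, m + 1), repeat=n):
        taken = [False] * (m + 1)            # index 0 unused
        for spot in choices:
            while spot <= m and taken[spot]:
                spot += 1                    # drive on to the next spot
            if spot > m:                     # passed the end of the street
                break
            taken[spot] = True
        else:
            good += 1                        # no car was left without a spot
    return Fraction(good, m ** n)
```

Unlike the hash table, the street is not cyclic: a car that reaches the end does not wrap around to spot 1.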

 

Exercise: What is the probability that all cars find a parking spot?

Slide21

Linear Probing: Theory vs. Practice

In practice, we cannot use a truly random hash function

Does linear probing still have a constant expected time per operation when more realistic hash functions are used?

For chaining, 2-independence, or just “universality”, was enough

How much independence is needed for linear probing?

Slide22

Linear Probing: Theory vs. Practice

5-independence suffices for linear probing! [Pagh-Pagh-Ružić (2009)]

4-independence does not suffice! [Pătraşcu-Thorup (2010)]

Slide23

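$k$-independence

Definition: $X_1,\dots,X_n$ are $k$-independent iff for every distinct $i_1,\dots,i_k$, the variables $X_{i_1},\dots,X_{i_k}$ are independent

Definition: $X_1,\dots,X_k$ are independent iff for every $a_1,\dots,a_k$, we have $\Pr[X_1=a_1,\dots,X_k=a_k] = \prod_{j=1}^{k}\Pr[X_j=a_j]$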

 Slide24

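Families of $k$-independent hash functions

Let $\mathcal{H}$ be a family of hash functions from $U$ to $[m]$. $\mathcal{H}$ is $k$-independent iff for every distinct $x_1,\dots,x_k \in U$, the values $h(x_1),\dots,h(x_k)$ are independent, when $h$ is chosen at random from $\mathcal{H}$

If $\mathcal{H}$ is $k$-independent and $\mathcal{G} = \{f \circ h \mid h \in \mathcal{H}\}$, for some function $f$, then $\mathcal{G}$ is also $k$-independent

We usually require that for every $x \in U$, $h(x)$ is (almost) uniformly distributed on $[m]$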

 Slide25

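Polynomial hash functions

Lemma: If $\mathbb{F}$ is a field, then $\mathcal{H} = \{h(x) = \sum_{i=0}^{k-1} a_i x^i \mid a_0,\dots,a_{k-1} \in \mathbb{F}\}$ is a $k$-independent family of hash functions from $\mathbb{F}$ to $\mathbb{F}$

Corollary: If $p$ is a prime, and $m$ is arbitrary, then $\{h(x) = (\sum_{i=0}^{k-1} a_i x^i \bmod p) \bmod m \mid a_0,\dots,a_{k-1} \in \mathbb{Z}_p\}$ is a $k$-independent family of hash functions

When $p \gg m$, $h(x)$ is almost uniformly distributed on $[m]$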
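The lemma can be confirmed by brute force over a small prime field. The sketch below, assuming $\mathbb{F} = \mathbb{Z}_p$ for a small prime $p$ and keys in $\{0,\dots,p-1\}$, evaluates $h(x) = \sum_{i<k} a_i x^i \bmod p$ with Horner's rule and checks, over all $p^k$ coefficient vectors, that every $t$ distinct keys receive every tuple of values equally often. Function names are illustrative.

```python
from collections import Counter
from itertools import combinations, product


def poly_hash(coeffs, x, p):
    """h(x) = a_0 + a_1 x + ... + a_{k-1} x^(k-1) mod p, via Horner's rule;
    coeffs = [a_0, ..., a_{k-1}]."""
    r = 0
    for a in reversed(coeffs):
        r = (r * x + a) % p
    return r


def is_t_independent(p, k, t, keys):
    """Enumerate all p**k coefficient vectors and check that, for every t
    distinct keys, each tuple of t hash values occurs equally often."""
    for xs in combinations(keys, t):
        counts = Counter(tuple(poly_hash(c, x, p) for x in xs)
                         for c in product(range(p), repeat=k))
        if len(counts) != p ** t or len(set(counts.values())) != 1:
            return False
    return True
```

With $k$ coefficients the check passes for $t \le k$ and fails for $t = k+1$: there are only $p^k$ functions but $p^{k+1}$ possible value tuples.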

 Slide26

 

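Polynomial hash functions

For distinct $x_1,\dots,x_k$ and (not necessarily distinct) $y_1,\dots,y_k$, the condition $h(x_j) = y_j$ for $j = 1,\dots,k$ is a linear system in the coefficients:

$$\begin{pmatrix} 1 & x_1 & \cdots & x_1^{k-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_k & \cdots & x_k^{k-1} \end{pmatrix}\begin{pmatrix} a_0 \\ \vdots \\ a_{k-1} \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_k \end{pmatrix}$$

Unique solution! Hence $\Pr[h(x_1)=y_1,\dots,h(x_k)=y_k] = |\mathbb{F}|^{-k}$

Slide27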

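Vandermonde Determinant

$$\det\begin{pmatrix} 1 & x_1 & \cdots & x_1^{k-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_k & \cdots & x_k^{k-1} \end{pmatrix} = \prod_{1 \le i < j \le k} (x_j - x_i)$$

which is nonzero when $x_1,\dots,x_k$ are distinct

Slide28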

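Tabulation-based hash functions [Carter-Wegman (1979)] [Pătraşcu-Thorup (2010)]

Split a key $x$ into $c$ characters, $x = (x_1,\dots,x_c)$, and let $h(x) = h_1(x_1) \oplus h_2(x_2) \oplus \cdots \oplus h_c(x_c)$

The $h_i$ may be implemented using small look-up tables

Very efficient in practice

Slide29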

Tabulation-based hash functions [Carter-Wegman (1979)] [Pătraşcu-Thorup (2010)]

If $h_1,\dots,h_c$ in $h(x) = h_1(x_1) \oplus \cdots \oplus h_c(x_c)$ are independently chosen from a uniform 2-independent family, then $h$ is 2-independent

If $h_1,\dots,h_c$ are independently chosen from a uniform 3-independent family, then $h$ is 3-independent

Not 4-independent!
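Simple tabulation is short to write down. The sketch below assumes keys are integers split into `char_bits`-bit characters and fills each table with fully random entries from Python's `random` module (a special case of the uniform families in the statements above); all names are illustrative. It also shows why 4-independence fails: for keys $(a,b), (a,b'), (a',b), (a',b')$ every table entry appears twice in the XOR of the four hash values, so the fourth value is determined by the other three.

```python
import random


def make_simple_tabulation(c, char_bits, out_bits, seed=0):
    """Return h(x) = T_1[x_1] XOR ... XOR T_c[x_c], where x_i is the i-th
    char_bits-bit character of x and each table T_i holds random
    out_bits-bit values."""
    rng = random.Random(seed)
    tables = [[rng.getrandbits(out_bits) for _ in range(1 << char_bits)]
              for _ in range(c)]
    mask = (1 << char_bits) - 1

    def h(x):
        r = 0
        for i, table in enumerate(tables):
            r ^= table[(x >> (i * char_bits)) & mask]   # one look-up per character
        return r

    return h


# Four keys (a, b), (a, b'), (a', b), (a', b') with two 8-bit characters each.
h = make_simple_tabulation(c=2, char_bits=8, out_bits=32, seed=1)
keys = [a | (b << 8) for a in (3, 200) for b in (7, 91)]
print(h(keys[0]) ^ h(keys[1]) ^ h(keys[2]) ^ h(keys[3]))   # prints 0
```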

 Slide30

Tabulation-based hash functions [Thorup-Zhang (2012)]

If the component functions are independently chosen from a 5-independent family, then the combined tabulation-based hash function is 5-independent

Higher independence is possible at the cost of more table look-ups

Slide31

Linear probing with bounded independence [Pagh-Pagh-Ružić (2009)] [Pătraşcu-Thorup (2010)]

Independence | Search time | Construction time
5 | $\Theta(1)$ | $\Theta(n)$
4 | $\Theta(\log n)$ | $\Theta(n)$
3 | $\Theta(\log n)$ | $\Theta(n \log n)$
2 | $\Theta(\sqrt{n})$ | $\Theta(n \log n)$

Upper bounds hold for any set of keys and any family with the specified independence

Lower bounds hold for some sets of keys and some families with the specified independence

Slide32

Balls in Bins

Throw $n$ balls randomly into $m$ bins

All throws are uniform and (partially-)independent

Slide33

Balls in Bins

Throw $n$ balls randomly into $m$ bins

Let $X$ be the number of balls that fall into a specific bin, e.g., the first

Let $X_i$ be 1 if the $i$-th ball falls into the specific bin, and 0 otherwise, so that $X = \sum_{i=1}^{n} X_i$

We want to bound the probability that $X$ is large

 Slide34

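Tail bounds

Markov’s inequality: If $X \ge 0$, then $\Pr[X \ge a] \le \frac{E[X]}{a}$ for every $a > 0$

Chebyshev’s inequality: $\Pr[|X - E[X]| \ge a] \le \frac{\mathrm{Var}[X]}{a^2}$

Higher (even) moments: $\Pr[|X - E[X]| \ge a] \le \frac{E[(X - E[X])^{2k}]}{a^{2k}}$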

 Slide35

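Tail bounds

Chernoff bound: If $X_1,\dots,X_n$ are independent indicators, $X = \sum_{i=1}^{n} X_i$, $\mu = E[X]$, and $\delta > 0$, then $\Pr[X \ge (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}$

The Chernoff bound is stronger, but it requires complete independence.

Proof: Apply Markov’s inequality to $e^{tX}$ and choose $t = \ln(1+\delta)$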
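For balls in bins with full independence, $X$ is binomial with parameters $n$ and $1/m$, so the bounds can be compared with the exact tail. A sketch computing $\Pr[X \ge a]$ exactly and via Markov, Chebyshev and the Chernoff bound with $a = (1+\delta)\mu$; it uses floating point, which is fine for moderate $n$, and the names are illustrative.

```python
import math


def binom_tail(n, q, a):
    """Exact Pr[X >= a] for X ~ Binomial(n, q) (floating point, moderate n)."""
    return sum(math.comb(n, j) * q ** j * (1 - q) ** (n - j)
               for j in range(math.ceil(a), n + 1))


def tail_bounds(n, q, a):
    """Compare the exact tail with Markov, Chebyshev and Chernoff, for a > E[X]."""
    mu, var = n * q, n * q * (1 - q)
    delta = a / mu - 1                      # a = (1 + delta) * mu
    return {
        "exact": binom_tail(n, q, a),
        "markov": mu / a,                   # valid since X >= 0
        "chebyshev": var / (a - mu) ** 2,   # Pr[X >= a] <= Pr[|X - mu| >= a - mu]
        "chernoff": (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu,
    }


print(tail_bounds(n=1000, q=0.01, a=30))   # 1000 balls, 100 bins, bin gets >= 30
```

Here $a$ is three times the mean: Markov gives $1/3$, Chebyshev about $0.025$, and Chernoff about $2.4\cdot 10^{-6}$, while the exact tail is smaller still.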

 Slide36

Computing moments

 

 

 

 

 

 

 

 

] =

 

 

 Slide37

Computing moments

If

are

-

independent,

then so are

 

]

 

If

are distinct, then

 

]

 

If

differs from

, then

 

]

 

If

, then

 Slide38

Computing moments

If

are

-

independent
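Why 4-independence is enough for fourth moments can be seen exactly on a small example. The sketch below, assuming the degree-3 polynomial family modulo a small prime $p$ (4-independent by the lemma on polynomial hash functions) and distinct keys in $\{0,\dots,p-1\}$, lets $X$ count the keys hashed to bin $0$ and computes $E[(X-\mu)^4]$ over all $p^4$ functions. It matches $n\,q(1-q)\,(1 + 3(n-2)\,q(1-q))$, the value for $n$ fully independent indicators with success probability $q = 1/p$. Names are illustrative.

```python
from fractions import Fraction
from itertools import product


def fourth_moment_poly(p, keys):
    """Exact E[(X - mu)^4] where X = #{x in keys : h(x) = 0} and h ranges over
    all p**4 polynomials a0 + a1 x + a2 x^2 + a3 x^3 mod p."""
    mu = Fraction(len(keys), p)
    total = Fraction(0)
    for a0, a1, a2, a3 in product(range(p), repeat=4):
        count = sum(1 for x in keys if (a0 + a1 * x + a2 * x * x + a3 * x ** 3) % p == 0)
        total += (count - mu) ** 4
    return total / p ** 4


def fourth_moment_independent(n, q):
    """E[(X - mu)^4] for X a sum of n fully independent Bernoulli(q) indicators."""
    v = q * (1 - q)
    return n * v * (1 + 3 * (n - 2) * v)


print(fourth_moment_poly(7, range(6)) == fourth_moment_independent(6, Fraction(1, 7)))  # True
```

Expanding $(X-\mu)^4$ produces only expectations of products of at most four indicators, and those are the same under 4-independence as under full independence.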

 

 

 

 

 Slide39

Computing moments

If

are

-

independent

 

 

 

 

If

are

-

independent,

where

and

, then

 

 

Why?

(We only need 4th moments)

Slide40

Planting a binary tree

Slide41

Crowded nodes [Pătraşcu-Thorup (2010)]

$m$ is a power of 2

A node at height $\ell$ corresponds to $2^\ell$ consecutive cells in the table

 

 

A node at height

is

crowded

, if at least

items are mapped into its interval
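The tree structure is simple to compute. The sketch below, assuming $m = 2^L$ and counting items by hash value (that is, by the interval they are mapped into, not by their final cell), returns for each height $\ell$ the number of items mapped into each node's interval $[j2^\ell, (j+1)2^\ell)$. The crowding threshold is passed in as a function `threshold(ell)` so that any value of it can be plugged in; all names are hypothetical.

```python
def node_loads(hashes, m):
    """loads[ell][j] = number of items whose hash value lies in the interval
    [j * 2**ell, (j + 1) * 2**ell) of node j at height ell; m must be 2**L."""
    L = m.bit_length() - 1
    if m != 1 << L:
        raise ValueError("m must be a power of 2")
    loads = []
    for ell in range(L + 1):
        counts = [0] * (m >> ell)
        for h in hashes:
            counts[h >> ell] += 1            # node containing cell h at this height
        loads.append(counts)
    return loads


def crowded_nodes(hashes, m, threshold):
    """All (ell, j) with at least threshold(ell) items mapped into node j's
    interval; threshold is a caller-supplied (hypothetical) function of ell."""
    return [(ell, j)
            for ell, counts in enumerate(node_loads(hashes, m))
            for j, load in enumerate(counts)
            if load >= threshold(ell)]
```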

 

Simplifying assumptions: The final locations of items mapped into an interval may be outside the interval

Slide42

 

Simple observation I

 

 

 Slide43

 

 

 

 

Simple observation II

Slide44

Main observation [Pătraşcu-Thorup (2010)]

[Figure: nodes labeled 1, 2, 3, 4]

Consider a run of length

, where

 

At least one of the first four nodes at level

whose last cell belongs to the run is

crowded

 Slide45

Proof of main observation

[Figure: nodes labeled 1, 2, 3, 4]

Just before the run, there is an empty cell.

Thus, if

1

is not

crowded

, it contributes

less than

items to the run

 

If

2,3,4

are not

crowded

, then each of their

intervals can absorb at least

items

 

Thus, if none of

1,2,3,4

is

crowded

, the run ends at or

before the interval of

4

and its length is less than

 Slide46

Probability of being crowded

Assume that

 

Consider a node at height

 

Throwing

balls into

bins

 

 

 

 Slide47

Construction time [Pătraşcu-Thorup (2010)]

Let $\ell_1,\dots,\ell_k$, where $\sum_{i=1}^{k} \ell_i = n$, be the lengths of the consecutive runs in the table after inserting the $n$ items

 

The cost of the construction is at most

 

Runs of length

contribute only

 

By the main observation, if

, then

at least one of the first four nodes at level

whose last cell is in the run is

crowded

.

Each node corresponds to at most one run.

 

 Slide48

 

 

If

, we get

 

If

, we get

)

 

Construction time [Pătraşcu-Thorup (2010)]

Slide49

Query time (successful/unsuccessful) [Pătraşcu-Thorup (2010)]

If

is in a run of length

then the search time is

 

If

is in a run of length

then at least one of

12

nodes at height

associated with

is

crowded

 

,

 

 Slide50

Query time (successful/unsuccessful) [Pătraşcu-Thorup (2010)]

 

- The independence after

conditioning

on the hash value of the key searched

 

 

If

, we get

 

If

, we get

)

 

If

, we get

 Slide51

Why 12?

[Figure: nodes numbered 1–12 and the search position]

The constant 12 itself, of course, is not too important. The important thing is that it is a constant.