Slide1
Hash Tables: Linear Probing
Uri Zwick
Tel Aviv University
Slide2
Hashing with open addressing: “Uniform probing”
Hash table of size $m$
Insert key $k$ in the first free position among $h(k,0), h(k,1), \ldots, h(k,m-1)$
(Sometimes) $h(k,0), h(k,1), \ldots, h(k,m-1)$ is assumed to be a permutation
To search, follow the same order
Table is not full ⇒ Insertion succeeds
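The insert and search rules above can be sketched as follows. This is a minimal illustration, not the lecture's own code: the class name `UniformProbingTable`, the `seed` parameter, and the way a per-key permutation is derived from Python's `random.Random` are all assumptions made for the example.

```python
import random


class UniformProbingTable:
    """Open addressing where each key probes the cells in its own permutation."""

    def __init__(self, m, seed=0):
        self.m = m
        self.seed = seed
        self.cells = [None] * m

    def probe_order(self, key):
        # Hypothetical stand-in for h(k,0), ..., h(k,m-1): a permutation of the
        # cells derived deterministically from the key and a table-wide seed.
        order = list(range(self.m))
        random.Random(f"{self.seed}:{key!r}").shuffle(order)
        return order

    def insert(self, key):
        """Place key in the first free cell of its probe order; return the probe count."""
        for probes, cell in enumerate(self.probe_order(key), start=1):
            if self.cells[cell] is None or self.cells[cell] == key:
                self.cells[cell] = key
                return probes
        raise RuntimeError("table is full")

    def search(self, key):
        """Follow the same order; stop at the key or at the first free cell."""
        for cell in self.probe_order(key):
            if self.cells[cell] is None:
                return False
            if self.cells[cell] == key:
                return True
        return False
```

Because the probe order is a permutation of all cells, an insertion can only fail when every cell is occupied, which is the "table is not full ⇒ insertion succeeds" remark.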
Slide3
Linear probing: “The most important hashing technique”
More probes than uniform probing due to clustering: long runs tend to get longer and merge with other runs
But, many fewer cache misses
Extremely efficient in practice
How do we analyze it?
Which hash functions should we use?
Slide4
Order of insertions
Theorem: The set of occupied cells and the total number of probes done while inserting a set of items into a hash table using linear probing do not depend on the order in which the items are inserted
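The theorem can be checked by brute force on a small instance. The sketch below is only an illustration and not a proof; the function names and the representation of items as `(key, hash)` pairs are assumptions made for the example.

```python
from itertools import permutations


def build(m, hashed_keys):
    """Insert (key, hash) pairs with linear probing; return (occupied cells, total probes)."""
    table = [None] * m
    total_probes = 0
    for key, h in hashed_keys:
        pos = h % m
        total_probes += 1          # the probe of the home cell
        while table[pos] is not None:
            pos = (pos + 1) % m    # move to the next cell, wrapping around
            total_probes += 1
        table[pos] = key
    occupied = frozenset(i for i, cell in enumerate(table) if cell is not None)
    return occupied, total_probes


def order_independent(m, hashed_keys):
    """Check that every insertion order gives the same occupied set and probe total."""
    outcomes = {build(m, order) for order in permutations(hashed_keys)}
    return len(outcomes) == 1
```

For example, `order_independent(8, [("a", 0), ("b", 0), ("c", 1), ("d", 5), ("e", 6), ("f", 5)])` tries all 720 insertion orders of six items. The sketch assumes fewer items than cells, so every insertion terminates.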
Exercise: Prove the theorem
Exercise: Is the same true for uniform probing?
Slide5
Number of probes
Exercise: Show that if, after inserting $n$ items into a table of size $m$, the occupied cells in the table form runs of length $\ell_1, \ell_2, \ldots, \ell_k$, where $\sum_{i=1}^{k} \ell_i = n$, then the expected number of probes in an unsuccessful search, assuming the searched key is mapped into a uniformly random location in the table, is $1 + \frac{1}{2m}\sum_{i=1}^{k} \ell_i(\ell_i+1)$
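As a sanity check (not a solution to the exercise), the expected number of probes can be computed directly from a table layout and compared with the closed form $1 + \frac{1}{2m}\sum_i \ell_i(\ell_i+1)$. The sketch assumes that the final probe of the empty cell is counted and that the table is circular with at least one empty cell; all function names are hypothetical.

```python
from fractions import Fraction


def run_lengths(occupied, m):
    """Lengths of maximal runs of occupied cells in a circular table with an empty cell."""
    start = next(i for i in range(m) if i not in occupied)  # some empty cell
    runs, current = [], 0
    for step in range(1, m + 1):
        cell = (start + step) % m
        if cell in occupied:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    return runs


def expected_probes_direct(occupied, m):
    """Average, over all m start cells, of the probes made until an empty cell is found."""
    total = 0
    for h in range(m):
        pos, probes = h, 1
        while pos in occupied:
            pos = (pos + 1) % m
            probes += 1
        total += probes
    return Fraction(total, m)


def expected_probes_formula(occupied, m):
    """Closed form in terms of the run lengths."""
    runs = run_lengths(occupied, m)
    return 1 + Fraction(sum(l * (l + 1) for l in runs), 2 * m)
```

With `occupied = {1, 2, 4}` and `m = 6` the runs have lengths 2 and 1, and both functions return 5/3.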
Exercise: What are the smallest and largest possible total number of probes needed to construct a hash table that contains runs of length $\ell_1, \ell_2, \ldots, \ell_k$?
Slide6
Probabilistic analysis of uniform probing [Peterson (1957)]
$n$ – number of elements in table
$m$ – size of hash table
$\alpha = n/m$ – load factor (Note: $\alpha \le 1$)
Uniform probing: for every key $k$, $h(k,0), h(k,1), \ldots, h(k,m-1)$ is a random permutation, independent of all other permutations
Expected no. of probes in an unsuccessful search is at most $\frac{1}{1-\alpha}$
Expected no. of probes in a successful search of a random item is at most $\frac{1}{\alpha}\ln\frac{1}{1-\alpha}$
Slide7
Probabilistic analysis of uniform probing [Peterson (1957)]
Claim: Expected no. of probes in an unsuccessful search is at most $\frac{1}{1-\alpha}$
The probability that a random cell is occupied is $\alpha = n/m$
The probability that the first $i$ cells probed are all occupied is at most $\alpha^i$
Hence the expected no. of probes is at most $\sum_{i \ge 0} \alpha^i = \frac{1}{1-\alpha}$
Exercise: Do the calculation more carefully and show that the expected no. of probes in an unsuccessful search is exactly $\frac{m+1}{m-n+1}$
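The exact value can be checked numerically for small tables. The sketch below sums, over $i \ge 0$, the probability that the first $i$ probed cells are all occupied, using exact rational arithmetic; it is a check of the value $\frac{m+1}{m-n+1}$, not the requested derivation. The function name is an assumption, and $n < m$ is required.

```python
from fractions import Fraction


def expected_unsuccessful_probes(n, m):
    """Exact expected probes under uniform probing with n of m cells occupied (n < m)."""
    expected = Fraction(0)
    prob_first_i_occupied = Fraction(1)  # i = 0: the empty prefix is always "all occupied"
    for i in range(n + 1):
        expected += prob_first_i_occupied
        # Cells are probed without repetition, so given that the first i probes hit
        # occupied cells, the next probe hits one with probability (n - i) / (m - i).
        prob_first_i_occupied *= Fraction(n - i, m - i)
    return expected
```

For instance, `expected_unsuccessful_probes(1, 2)` is 3/2, which is below the simpler bound $\frac{1}{1-\alpha} = 2$.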
Slide8
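Probabilistic analysis of linear probing [Knuth (1962)]
Random hash function: for every key $k$, $h(k)$ is uniformly distributed, independent of all other $h(k')$, for $k' \ne k$
$\alpha = n/m$ – load factor ($\alpha < 1$)
Expected no. of probes in an unsuccessful search is at most $\frac{1}{2}\left(1+\frac{1}{(1-\alpha)^2}\right)$
Expected no. of probes in a successful search of a random item is at most $\frac{1}{2}\left(1+\frac{1}{1-\alpha}\right)$
Slide9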
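Expected number of probes (assuming random hash functions)
                  Successful Search                                 Unsuccessful Search
Uniform Probing   $\frac{1}{\alpha}\ln\frac{1}{1-\alpha}$           $\frac{1}{1-\alpha}$
Linear Probing    $\frac{1}{2}\left(1+\frac{1}{1-\alpha}\right)$    $\frac{1}{2}\left(1+\frac{1}{(1-\alpha)^2}\right)$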
When, say, $\alpha \le 0.5$, all small constants
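To see how small these constants are, the four standard bounds for random hash functions can be evaluated at a few load factors. The function names below are assumptions made for this sketch; the formulas are the classical ones for uniform probing, $\frac{1}{\alpha}\ln\frac{1}{1-\alpha}$ and $\frac{1}{1-\alpha}$, and for linear probing, $\frac{1}{2}\left(1+\frac{1}{1-\alpha}\right)$ and $\frac{1}{2}\left(1+\frac{1}{(1-\alpha)^2}\right)$.

```python
import math


def uniform_successful(alpha):
    return (1 / alpha) * math.log(1 / (1 - alpha))


def uniform_unsuccessful(alpha):
    return 1 / (1 - alpha)


def linear_successful(alpha):
    return 0.5 * (1 + 1 / (1 - alpha))


def linear_unsuccessful(alpha):
    return 0.5 * (1 + 1 / (1 - alpha) ** 2)


def bounds_table(alphas=(0.25, 0.5, 0.75, 0.9)):
    """Rows of (alpha, uniform succ., uniform unsucc., linear succ., linear unsucc.)."""
    return [
        (a, uniform_successful(a), uniform_unsuccessful(a),
         linear_successful(a), linear_unsuccessful(a))
        for a in alphas
    ]
```

At $\alpha = 0.5$ the four values are roughly 1.39, 2, 1.5 and 2.5. At $\alpha = 0.9$ they are roughly 2.56, 10, 5.5 and 50.5, which shows how quickly linear probing degrades as the table fills.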
Slide10
[Plot: expected number of probes]
Slide11
Probabilistic analysis of linear probing [Knuth (1962)]
$n$ – number of elements in table
$m$ – size of hash table
What is the probability that a given cell is empty?
By symmetry, all cells are equally likely to be empty, so the probability is $\frac{m-n}{m} = 1 - \frac{n}{m}$
Slide12
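What is the probability that two given cells are empty and the $k$ cells between them are occupied?
Exactly $k$ items should be mapped to the $k+1$ cells consisting of the run and one of its two bounding empty cells, and $n-k$ items should be mapped to the remaining $m-k-1$ cells
Given that $k$ items are mapped to these $k+1$ cells, the bounding cell among them should remain empty
Given that $n-k$ items are mapped to the remaining $m-k-1$ cells, the other bounding cell should remain empty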
Slide13
What is the probability that two given cells are empty and the $k$ cells between them are occupied?
It is the probability that a run of size exactly $k$ starts at a given position
Exercise:
,
Slide14
What is the probability that an unsuccessful search encounters exactly $k$ occupied cells?
Interesting to note that
Slide15
The expected no. of probes in an unsuccessful search, which is also the expected no. of probes needed to insert the $(n+1)$-st item, is
Slide16
Ex. 6.4.27, Knuth, Vol. 3
Slide17
Abel’s binomial theorem
(see Knuth Eq. 1.2.6-(16))
Slide18
Unsuccessful search
The birth of Knuth’s style Analysis of Algorithms …
Slide19
Successful search / Construction time
The expected number of probes in a search of a randomly selected item is
The expected number of probes in the construction of the table is
Slide20
The “parking problem” [Knuth (1962)] [Konheim-Weiss (1966)]
A one-way street contains $m$ parking spots
$n$ cars arrive, one after the other
The $i$-th car chooses a random number between $1$ and $m$ and parks in the first free spot at or after the chosen location, if there is one
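The parking process can be simulated directly, and for very small streets the success probability can be found by enumerating all equally likely sequences of choices. The sketch below does only that and does not give the closed form asked for in the exercise. The function names are assumptions, and spots are numbered $1$ to $m$ as on the slide.

```python
from fractions import Fraction
from itertools import product


def all_cars_park(m, choices):
    """Simulate the street: each car takes the first free spot at or after its choice."""
    taken = [False] * (m + 2)  # index 0 is unused; index m + 1 lies past the street
    for choice in choices:
        spot = choice
        while spot <= m and taken[spot]:
            spot += 1          # one-way street: cars never go back
        if spot > m:
            return False       # this car drove off the end without parking
        taken[spot] = True
    return True


def parking_probability(m, n):
    """Exact probability that n cars all park, by enumerating all m**n choice sequences."""
    good = sum(
        all_cars_park(m, choices)
        for choices in product(range(1, m + 1), repeat=n)
    )
    return Fraction(good, m ** n)
```

For two cars and two spots, only the sequence in which both cars choose spot 2 fails, so `parking_probability(2, 2)` is 3/4. Unlike a linear probing table, the street does not wrap around.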
Exercise: What is the probability that all cars find a parking spot?
Slide21
Linear Probing: Theory vs. Practice
In practice, we cannot use a truly random hash function
Does linear probing still have a constant expected time per operation when more realistic hash functions are used?
For chaining, 2-independence, or just “universality”, was enough
How much independence is needed for linear probing?
Slide22
Linear Probing: Theory vs. Practice
5-independence suffices for linear probing! [Pagh-Pagh-Ružić (2009)]
4-independence does not suffice! [Pătraşcu-Thorup (2010)]
Slide23
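$k$-independence
Definition: $X_1, X_2, \ldots, X_n$ are $k$-independent iff for every distinct $i_1, \ldots, i_k$, $X_{i_1}, \ldots, X_{i_k}$ are independent
Definition: $X_1, \ldots, X_k$ are independent iff for every $x_1, \ldots, x_k$, we have $\Pr[X_1 = x_1, \ldots, X_k = x_k] = \prod_{i=1}^{k} \Pr[X_i = x_i]$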
Slide24
Families of $k$-independent hash functions
Let $H$ be a family of hash functions from $U$ to $[m]$.
$H$ is $k$-independent iff for every distinct $x_1, \ldots, x_k \in U$, $h(x_1), \ldots, h(x_k)$ are independent, when $h$ is chosen at random from $H$
If $H$ is $k$-independent and $H' = \{ f \circ h : h \in H \}$, for some function $f$, then $H'$ is also $k$-independent
We usually require that for every $x \in U$, $h(x)$ is (almost) uniformly distributed on $[m]$
Slide25
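Polynomial hash functions
Lemma: If $F$ is a field, then $\{ x \mapsto \sum_{i=0}^{k-1} a_i x^i : a_0, \ldots, a_{k-1} \in F \}$ is a $k$-independent family of hash functions
Corollary: If $p$ is a prime, and $m$ is arbitrary, then $\{ x \mapsto ((\sum_{i=0}^{k-1} a_i x^i) \bmod p) \bmod m : a_0, \ldots, a_{k-1} \in Z_p \}$ is a $k$-independent family of hash functions
When $p \gg m$, $h(x)$ is almost uniformly distributed on $[m]$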
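For a tiny prime, the lemma can be verified exhaustively: for every choice of $k$ distinct keys, the $p^k$ coefficient vectors should produce each of the $p^k$ possible value tuples exactly once. The sketch below does this for polynomials over $Z_p$; the function names are assumptions, and the optional final reduction modulo $m$ follows the corollary.

```python
from collections import Counter
from itertools import product


def poly_hash(coeffs, x, p, m=None):
    """Evaluate sum(coeffs[i] * x**i) mod p by Horner's rule; optionally reduce mod m."""
    value = 0
    for a in reversed(coeffs):
        value = (value * x + a) % p
    return value if m is None else value % m


def is_k_independent_over_zp(p, k):
    """Exhaustive check that polynomials of degree < k over Z_p are k-independent."""
    for xs in product(range(p), repeat=k):
        if len(set(xs)) < k:
            continue  # the definition only constrains distinct keys
        counts = Counter(
            tuple(poly_hash(coeffs, x, p) for x in xs)
            for coeffs in product(range(p), repeat=k)
        )
        # Uniform and independent: every value tuple comes from exactly one polynomial.
        if len(counts) != p ** k or set(counts.values()) != {1}:
            return False
    return True
```

The check is only feasible for very small `p` and `k`, since it enumerates all $p^k$ polynomials for every key tuple. In practice one would pick random coefficients and a large prime `p`.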
Slide26
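Polynomial hash functions
For distinct $x_1, \ldots, x_k$ and (not necessarily distinct) $y_1, \ldots, y_k$, the equations $\sum_{i=0}^{k-1} a_i x_j^i = y_j$, for $j = 1, \ldots, k$, have a unique solution $a_0, \ldots, a_{k-1}$!
Slide27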
Vandermonde Determinant
Slide28
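Tabulation-based hash functions [Carter-Wegman (1979)] [Pătraşcu-Thorup (2010)]
A key $x$ is viewed as $c$ characters $x_1, \ldots, x_c$, and $h(x) = h_1(x_1) \oplus h_2(x_2) \oplus \cdots \oplus h_c(x_c)$
Each $h_i$ may be implemented using small look-up tables
Very efficient in practice
Slide29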
Tabulation-based hash functions [Carter-Wegman (1979)] [Pătraşcu-Thorup (2010)]
If $h_1, \ldots, h_c$ are independently chosen from a uniform 2-independent family, then $h(x) = h_1(x_1) \oplus \cdots \oplus h_c(x_c)$ is 2-independent
If $h_1, \ldots, h_c$ are independently chosen from a uniform 3-independent family, then $h(x) = h_1(x_1) \oplus \cdots \oplus h_c(x_c)$ is 3-independent
Not 4-independent!
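The failure of 4-independence has a short concrete witness: for the four two-character keys $(a,c)$, $(a,d)$, $(b,c)$, $(b,d)$, every table entry is used exactly twice, so the XOR of the four hash values is always 0 and the fourth value is determined by the other three. The sketch below illustrates this with fully random look-up tables, which is an assumption made for simplicity; the function names and parameters are hypothetical.

```python
import random


def make_simple_tabulation(c=2, char_bits=8, out_bits=16, seed=0):
    """Return h(chars) = T_1[chars[0]] ^ ... ^ T_c[chars[c-1]] with random tables."""
    rng = random.Random(seed)
    tables = [
        [rng.getrandbits(out_bits) for _ in range(1 << char_bits)]
        for _ in range(c)
    ]

    def h(chars):
        value = 0
        for table, ch in zip(tables, chars):
            value ^= table[ch]  # one table look-up per character
        return value

    return h


def four_key_xor(h, a, b, c, d):
    """XOR of the hashes of (a,c), (a,d), (b,c), (b,d); always 0 for simple tabulation."""
    return h((a, c)) ^ h((a, d)) ^ h((b, c)) ^ h((b, d))
```

Each look-up table here has only 256 entries, which is what makes the scheme fast in practice: the tables fit comfortably in cache.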
Slide30
Tabulation-based hash functions [Thorup-Zhang (2012)]
If $h_1, h_2, h_3$ are independently chosen from a 5-independent family, then $h(x_1, x_2) = h_1(x_1) \oplus h_2(x_2) \oplus h_3(x_1 + x_2)$ is 5-independent
Higher independence possible at the cost of more table look-ups
Slide31
Linear probing with bounded independence [Pagh-Pagh-Ružić (2009)] [Pătraşcu-Thorup (2010)]
[Table: search time and construction time for independence 2, 3, 4, 5]
Upper bounds hold for any set of keys and any family with the specified independence
Lower bounds hold for some sets of keys and some families with the specified independence
Slide32
Balls in Bins
Throw $n$ balls randomly into $m$ bins
All throws are uniform and (partially-)independent
Slide33
Balls in Bins
Throw $n$ balls randomly into $m$ bins
Let $X$ be the number of balls that fall into a specific bin, e.g., the first
Let $X_i$ be 1 if the $i$-th ball falls into the specific bin, and 0 otherwise
We want to bound the probability that $X$ is large
Slide34
Tail bounds
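Markov’s inequality: If $X \ge 0$, then $\Pr[X \ge t] \le \frac{E[X]}{t}$
Chebyshev’s inequality: $\Pr[|X - \mu| \ge t] \le \frac{\mathrm{Var}[X]}{t^2}$, where $\mu = E[X]$
Higher (even) moments: $\Pr[|X - \mu| \ge t] \le \frac{E[(X-\mu)^k]}{t^k}$, for even $k$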
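The strength of these bounds can be compared on the balls-in-bins load of a single bin. The sketch below assumes fully independent throws only so that the exact distribution, Binomial($n$, $1/m$), can be written down; Chebyshev's bound uses only the variance, and the fourth-moment bound only the fourth central moment. The function names are assumptions made for the example.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Exact probabilities Pr[X = j], j = 0..n, for X ~ Binomial(n, p)."""
    return [comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(n + 1)]


def tail_bounds(n, m, t):
    """Exact Pr[X - mu >= t] and three upper bounds, for the load X of one bin."""
    p = Fraction(1, m)
    pmf = binomial_pmf(n, p)
    mu = n * p

    def central_moment(k):
        return sum(q * (j - mu) ** k for j, q in enumerate(pmf))

    exact = sum(q for j, q in enumerate(pmf) if j - mu >= t)
    return {
        "exact": exact,
        "markov": mu / (mu + t),                     # Pr[X >= mu + t] <= mu / (mu + t)
        "chebyshev": central_moment(2) / t ** 2,     # Var[X] / t^2
        "fourth_moment": central_moment(4) / t ** 4, # E[(X - mu)^4] / t^4
    }
```

With `tail_bounds(30, 10, 6)` the mean load is 3, Markov gives 1/3, and Chebyshev gives 3/40. The fourth-moment bound is smaller still, which is why higher moments are worth computing when enough independence is available.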
Slide35
Tail bounds
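Chernoff bound: If $X_1, \ldots, X_n$ are independent indicators, $X = \sum_{i=1}^{n} X_i$, $\mu = E[X]$, and $\delta > 0$, then $\Pr[X \ge (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}$
Chernoff bound is stronger. But it requires complete independence.
Proof: Apply Markov’s inequality to $e^{tX}$ and choose $t = \ln(1+\delta)$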
Slide36
Computing moments
] =
Slide37
Computing moments
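If $X_1, \ldots, X_n$ are $k$-independent, then so are $Y_i = X_i - E[X_i]$, and $E[Y_i] = 0$
If $i_1, \ldots, i_k$ are distinct, then $E[Y_{i_1} Y_{i_2} \cdots Y_{i_k}] = 0$
If $i_1$ differs from $i_2, \ldots, i_k$, then $E[Y_{i_1} Y_{i_2} \cdots Y_{i_k}] = 0$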
If
, then
Slide38
Computing moments
If
are
-
independent
Slide39
Computing moments
If
are
-
independent
If
are
-
independent,
where
and
, then
Why?
(We only need 4-th moments)
Slide40
Planting a binary tree
Slide41
Crowded nodes [Pătraşcu-Thorup (2010)]
Simplifying assumptions: $m$ is a power of 2
A node at height $h$ corresponds to $2^h$ consecutive cells in the table
A node at height $h$ is crowded, if at least $\frac{2}{3} 2^h$ items are mapped into its interval
The final locations of items mapped into an interval may be outside the interval
Slide42
≤
Simple observation I
Slide43
Simple observation II
Slide44
Main observation [Pătraşcu-Thorup (2010)]
[Figure: four consecutive nodes, numbered 1, 2, 3, 4]
Consider a run of length $\ell$, where $2^{h+2} \le \ell < 2^{h+3}$
At least one of the first four nodes at level $h$ whose last cell belongs to the run is crowded
Slide45
Proof of main observation
[Figure: four consecutive nodes, numbered 1, 2, 3, 4]
Just before the run, there is an empty cell. Thus, if 1 is not crowded, it contributes less than $\frac{2}{3} 2^h$ items to the run
If 2, 3, 4 are not crowded, then each of their intervals can absorb at least $\frac{1}{3} 2^h$ items
Thus, if none of 1, 2, 3, 4 is crowded, the run ends at or before the interval of 4 and its length is less than $4 \cdot 2^h = 2^{h+2}$
Slide46
Probability of being crowded
Assume that
Consider a node at height $h$
Throwing $n$ balls into $m/2^h$ bins
Slide47
Construction time [Pătraşcu-Thorup (2010)]
Let $\ell_1, \ell_2, \ldots, \ell_r$, where $\sum_i \ell_i = n$, be the lengths of the consecutive runs in the table after inserting the $n$ items
The cost of the construction is at most $O\left(\sum_i \ell_i^2\right)$
Runs of length $\ell_i < 4$ contribute only $O(n)$
By the main observation, if $2^{h+2} \le \ell_i < 2^{h+3}$, then at least one of the first four nodes at level $h$ whose last cell is in the run is crowded.
Each node corresponds to at most one run.
Slide48
If
, we get
If
, we get
)
Construction time [Pătraşcu-Thorup (2010)]
Slide49
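Query time (successful/unsuccessful) [Pătraşcu-Thorup (2010)]
If the search position is in a run of length $\ell$, then the search time is $O(\ell)$
If the search position is in a run of length $\ell$, where $2^{h+2} \le \ell < 2^{h+3}$, then at least one of 12 nodes at height $h$ associated with the search position is crowded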
Slide50
Query time (successful/unsuccessful) [Pătraşcu-Thorup (2010)]
- The independence after
conditioning
on the hash value of the key searched
If
, we get
If
, we get
)
If
, we get
Slide51
Why 12?
[Figure: nodes numbered 1 to 12, with the search position marked]
The constant 12 itself, of course, is not too important. The important thing is that it is a constant