Hashing

Motivating Applications
Large collection of datasets
Datasets are dynamic (insert, delete)
Goal: efficient searching/insertion/deletion
Hashing is ONLY applicable for exact-match searching

Direct Address Tables
If the key domain is U
Create an array T of size |U|
For each key k, store the object in T[k]
Supports insertion/deletion/searching in O(1)

Direct Address Tables
Alg.: DIRECT-ADDRESS-SEARCH(T, k)
    return T[k]
Alg.: DIRECT-ADDRESS-INSERT(T, x)
    T[key[x]] ← x
Alg.: DIRECT-ADDRESS-DELETE(T, x)
    T[key[x]] ← NIL
Running time for these operations: O(1)
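The three operations above could be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the class name `DirectAddressTable` is hypothetical, keys are assumed to be integers in `range(universe_size)`, and the sketch passes a key and a value separately instead of an element x with a `key[x]` field.

```python
class DirectAddressTable:
    """Direct-address table for integer keys in range(universe_size)."""

    def __init__(self, universe_size):
        # One slot per possible key; None plays the role of NIL.
        self.slots = [None] * universe_size

    def search(self, key):
        # DIRECT-ADDRESS-SEARCH: return T[k]
        return self.slots[key]

    def insert(self, key, value):
        # DIRECT-ADDRESS-INSERT: T[key[x]] <- x
        self.slots[key] = value

    def delete(self, key):
        # DIRECT-ADDRESS-DELETE: T[key[x]] <- NIL
        self.slots[key] = None


table = DirectAddressTable(10)
table.insert(3, "record for key 3")
found = table.search(3)
table.delete(3)
missing = table.search(3)
```

Each method is a single array access, which is where the O(1) bound comes from. The array has one slot per possible key, whether or not that key is ever stored.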
Drawbacks
>> If U is large, e.g., the domain of integers, then T is large (sometimes infeasible)
>> Limited to integer keys and does not support duplicates
Solution is to use hash tables

Direct Address Tables: Example
Example 1:
Example 2:
U is the domain
K is the set of actual keys (|K| is the number of keys)

Hashing
A technique that maps values from one domain or range to another domain or range
[Figure: a hash function maps a domain of string values to a domain of integer values, e.g., 3, 15, 20, 55]

Hashing
A technique that maps values from one domain or range to another domain or range
[Figure: a hash function maps student IDs in the domain [950,000 … 960,000] to the range [0 … 10,000]]

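The slide does not say which function performs this mapping. One simple possibility, shown here purely as an assumption, is to subtract the lowest ID. The names `LOWEST_ID`, `HIGHEST_ID`, and `student_slot` are made up for this sketch.

```python
LOWEST_ID = 950_000   # smallest student ID in the domain
HIGHEST_ID = 960_000  # largest student ID in the domain


def student_slot(student_id):
    """Map an ID in [950000, 960000] to a slot in [0, 10000]."""
    if not LOWEST_ID <= student_id <= HIGHEST_ID:
        raise ValueError("student ID outside the expected domain")
    # Assumed mapping: shift the domain so that it starts at 0.
    return student_id - LOWEST_ID


first = student_slot(950_000)
last = student_slot(960_000)
```

Because the domain and the range have the same number of values here, this particular mapping never sends two IDs to the same slot.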
Hash Tables
When K is much smaller than U, a hash table requires much less space than a direct-address table
Can reduce storage requirements to |K|
Can still get O(1) search time, but in the average case, not the worst case

Hash Tables: Main Idea
Use a hash function h to compute the slot for each key k
Store the element in slot h(k)
Maintain a hash table of size m: T[0…m-1]
A hash function h transforms a key into an index in a hash table T[0…m-1]:
h : U → {0, 1, . . . , m - 1}
We say that k hashes to slot h(k)

Hash Tables: Main Idea
[Figure: keys k1 … k5 from K (actual keys), a subset of U (universe of keys), are mapped by h into a hash table of size m with slots 0 … m - 1; h(k2) = h(k5)]
>> m is much smaller than |U| (m << |U|)
>> m can be even smaller than |K|

Example
Back to the example of 100 students, each with a 9-digit SSN
All we need is a hash table of size 100

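As a rough sketch of this example: the slides do not name the hash function, so the code below assumes the SSN is simply reduced modulo the table size. `ssn_slot` is a hypothetical helper and the two records are made up.

```python
TABLE_SIZE = 100  # one slot per student, as in the example


def ssn_slot(ssn):
    """Assumed hash: reduce a 9-digit SSN to a slot in [0, 99]."""
    return ssn % TABLE_SIZE


table = [None] * TABLE_SIZE
for ssn, name in [(123456789, "Ada"), (987654321, "Grace")]:  # made-up records
    # Store each record in slot h(k); this sketch ignores collisions.
    table[ssn_slot(ssn)] = (ssn, name)
```

The table has 100 slots instead of the 10^9 slots a direct-address table over all 9-digit numbers would need. Two SSNs with the same last two digits would land in the same slot under this assumed function.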
What About Collisions?
[Figure: keys k1 … k5 from K, a subset of U, hashed into a table with slots 0 … m - 1; k2 and k5 map to the same slot, h(k2) = h(k5), marked "Collisions!"]
A collision means two or more keys hash to the same slot

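A small sketch can make a collision concrete. The hash function `h` (key modulo m), the table size, and the sample keys are all assumptions chosen for illustration.

```python
def h(key, m):
    """Illustrative hash function: remainder of an integer key modulo m."""
    return key % m


m = 10
keys = [12, 25, 37, 42, 58]

# Group keys by the slot they hash to.
slots = {}
for key in keys:
    slots.setdefault(h(key, m), []).append(key)

# Any slot holding more than one key is a collision.
collisions = {slot: ks for slot, ks in slots.items() if len(ks) > 1}
```

Here 12 and 42 both hash to slot 2, so `collisions` contains that one slot with both keys.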
Handling Collisions
Many ways to handle them
Chaining
Open addressing
    Linear probing
    Quadratic probing
    Double hashing

Chaining: Main Idea
Put all elements that hash to the same slot into a linked list (Chain)
Slot j contains a pointer to the head of the list of all elements that hash to j

Chaining - Discussion
Choosing the size of the hash table
Small enough not to waste space
Large enough such that lists remain short
Typically 10%-20% of the total number of elements
How should we keep the lists: ordered or not?
Usually each list is an unsorted linked list

Insertion in Hash Tables
Alg.: CHAINED-HASH-INSERT(T, x)
    insert x at the head of list T[h(key[x])]
Worst-case running time is O(1)
May or may not allow duplicates, depending on the application

Deletion in Hash Tables
Alg.: CHAINED-HASH-DELETE(T, x)
    delete x from the list T[h(key[x])]
Need to find the element to be deleted.
Worst-case running time: deletion depends on searching the corresponding list, so it is proportional to that list's length

Searching in Hash Tables
Alg.: CHAINED-HASH-SEARCH(T, k)
    search for an element with key k in list T[h(k)]
Running time is proportional to the length of the list of elements in slot h(k)
What are the worst case and the average case?

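The chained insert, delete, and search operations could be sketched together as below. This is one possible illustration under stated assumptions, not the slides' own code: `Node` and `ChainedHashTable` are hypothetical names, the hash function is assumed to be Python's built-in `hash` reduced modulo m, and `delete` takes a key and searches for it first, whereas the pseudocode takes the element x.

```python
class Node:
    """One element in a chain: a key, its value, and the next node."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.slots = [None] * m  # slot j holds the head of chain j

    def _h(self, key):
        # Assumed hash function: built-in hash reduced modulo m.
        return hash(key) % self.m

    def insert(self, key, value):
        # Insert at the head of the chain: O(1), duplicates allowed.
        j = self._h(key)
        self.slots[j] = Node(key, value, self.slots[j])

    def search(self, key):
        # Walk the chain in slot h(key); time is proportional to its length.
        node = self.slots[self._h(key)]
        while node is not None and node.key != key:
            node = node.next
        return node

    def delete(self, key):
        # Find the node first, then unlink it; returns True if removed.
        j = self._h(key)
        prev, node = None, self.slots[j]
        while node is not None and node.key != key:
            prev, node = node, node.next
        if node is None:
            return False
        if prev is None:
            self.slots[j] = node.next
        else:
            prev.next = node.next
        return True


table = ChainedHashTable(7)
table.insert(10, "ten")
table.insert(17, "seventeen")  # 10 and 17 both hash to slot 3
hit = table.search(10)
removed = table.delete(17)
miss = table.search(17)
```

Inserting at the head keeps insertion O(1) because no traversal is needed. An application that forbids duplicates would have to search the chain before inserting, which costs as much as a search.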
Analysis of Hashing with Chaining: Worst Case
All keys go to a single chain
Chain size is O(n)
Searching is O(n) + time to apply h(k)
[Figure: table T with slots 0 … m - 1; one slot holds a single long chain]

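The worst case can be reproduced with a deliberately bad hash function. In this sketch `chain_lengths` is a hypothetical helper, and the constant function that returns 0 for every key is an assumption chosen to force a single chain.

```python
def chain_lengths(keys, m, h):
    """Return the length of each of the m chains after hashing all keys."""
    lengths = [0] * m
    for key in keys:
        lengths[h(key)] += 1
    return lengths


m = 8
keys = list(range(100))

# A deliberately bad hash function: every key lands in slot 0.
worst = chain_lengths(keys, m, lambda key: 0)
longest = max(worst)
```

All 100 keys end up in one chain, so a search may have to walk all n elements.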
Analysis of Hashing with Chaining: Average Case
With a good hash function and a uniform distribution of keys
Any given element is equally likely to hash into any of the m slots
All chains will have similar sizes
Assume n is the total # of keys and m is the hash table size
Average chain size is O(n/m)
[Figure: table T with slots 0 … m - 1; several slots each hold a short chain]
Average search time is O(n/m): the common case

Analysis of Hashing with Chaining: Average Case
If m (# of slots) is proportional to n (# of keys):
m = Θ(n)
n/m = O(1)
Searching takes constant time on average

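To see why the average chain length stays constant when m grows with n, the sketch below hashes consecutive integers with an assumed function, key mod m, and compares two table sizes. `average_chain_length` is a hypothetical helper written for this illustration.

```python
def average_chain_length(n, m):
    """Hash the keys 0 .. n-1 with key mod m and return the mean chain length."""
    lengths = [0] * m
    for key in range(n):
        lengths[key % m] += 1
    # The chain lengths always sum to n, so the mean is exactly n / m.
    return sum(lengths) / m


small = average_chain_length(1000, 100)
large = average_chain_length(2000, 200)
```

Doubling both n and m leaves the average chain length unchanged, which is the sense in which search takes constant time on average. The mean is n/m for any hash function; what a good hash function adds is that individual chains stay close to that mean.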
Hash Functions

Hash Functions
A hash function transforms a key (k) into a table address (0…m-1)
What makes a good hash function?
(1) Easy to compute
(2) Approximates a random function: for every input, every output is equally likely (simple uniform hashing)
(3) Reduces the number of collisions

Hash Functions
Goal: Map a key k into one of the m slots in the hash table
Make table size (m) a prime number
Avoids even and power-of-2 numbers
Common function: h(k) = F(k) mod m
F(k): some function or operation on k (usually generates an integer)
The output of the "mod" is a number in [0…m-1]

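One reason to avoid a power-of-2 table size is that k mod 2^p depends only on the lowest p bits of k. The sketch below illustrates this under two assumptions: F(k) is the key itself, and the keys happen to be multiples of 8.

```python
def h(key, m):
    """h(k) = F(k) mod m with F(k) = k (an assumption for this sketch)."""
    return key % m


keys = [8 * i for i in range(16)]  # 0, 8, 16, ..., 120

# With m = 8 (a power of 2) only the low 3 bits matter, and they are all 0.
slots_power_of_two = {h(k, 8) for k in keys}

# With m = 7 (a prime) the same keys spread over every slot.
slots_prime = {h(k, 7) for k in keys}
```

With m = 8 every key collides in slot 0, while with m = 7 all seven slots are used.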
Examples of Hash Functions
Collection of images
F(k): Sum of the pixel colors
h(k) = F(k) mod m
Collection of strings
F(k): Sum of the ASCII values
h(k) = F(k) mod m
Collection of numbers
F(k): just return k
h(k) = F(k) mod m
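These three examples could be sketched as follows. The function names are hypothetical, the image is assumed to be a list of rows of integer pixel values, and `ord` returns the ASCII value for ASCII characters.

```python
def hash_string(text, m):
    """F(k) = sum of the character codes; h(k) = F(k) mod m."""
    return sum(ord(ch) for ch in text) % m


def hash_number(k, m):
    """F(k) = k; h(k) = k mod m."""
    return k % m


def hash_image(pixels, m):
    """F(k) = sum of all pixel values; pixels is a list of rows of ints."""
    return sum(sum(row) for row in pixels) % m


m = 13
slot_abc = hash_string("abc", m)
slot_cab = hash_string("cab", m)
slot_number = hash_number(100, m)
slot_image = hash_image([[0, 255], [128, 64]], m)
```

Note that `"abc"` and `"cab"` land in the same slot: summing character values ignores order, so anagrams always collide under this choice of F(k).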