
Hashing - PowerPoint Presentation

conchita-marotz
Uploaded On 2016-02-29



Presentation Transcript

Hashing

Motivating Applications

Large collection of datasets

Datasets are dynamic (insert, delete)

Goal: efficient searching/insertion/deletion

Hashing is applicable ONLY to exact-match searching (not, e.g., range queries)

Direct Address Tables

If the key domain (universe) is U

Create an array T of size |U|

For each key k, store the object in T[k]

Supports insertion, deletion, and searching in O(1) time

Direct Address Tables

Alg.: DIRECT-ADDRESS-SEARCH(T, k)
    return T[k]

Alg.: DIRECT-ADDRESS-INSERT(T, x)
    T[key[x]] ← x

Alg.: DIRECT-ADDRESS-DELETE(T, x)
    T[key[x]] ← NIL

Running time for these operations: O(1)
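A minimal Python sketch of the three operations above, assuming integer keys in U = {0, 1, …, |U| - 1}. The class name `DirectAddressTable` and the choice to pass the key separately from the object (instead of reading key[x]) are assumptions made for illustration.

```python
from typing import Any, Optional


class DirectAddressTable:
    """Direct-address table over the universe U = {0, 1, ..., universe_size - 1}."""

    def __init__(self, universe_size: int) -> None:
        # One slot per possible key: space grows with |U|, not with |K|.
        self.slots: list[Optional[Any]] = [None] * universe_size

    def search(self, k: int) -> Optional[Any]:
        # DIRECT-ADDRESS-SEARCH(T, k): return T[k]
        return self.slots[k]

    def insert(self, k: int, x: Any) -> None:
        # DIRECT-ADDRESS-INSERT(T, x): T[key[x]] <- x
        self.slots[k] = x

    def delete(self, k: int) -> None:
        # DIRECT-ADDRESS-DELETE(T, x): T[key[x]] <- NIL
        self.slots[k] = None


table = DirectAddressTable(universe_size=10)
table.insert(3, "alice")
table.insert(7, "bob")
print(table.search(3))  # alice
table.delete(3)
print(table.search(3))  # None
```

Every operation is a single array access, which is where the O(1) bound comes from; the cost is an array as large as the whole universe.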

Drawbacks:
>> If U is large (e.g., the domain of all integers), T is large and sometimes infeasible to allocate
>> Keys are limited to integer values, and duplicate keys are not supported

The solution is to use hash tables

Direct Address Tables: Example

[Figure: Example 1 and Example 2 of direct-address tables]

U is the universe (domain) of possible keys

K is the set of actual keys

Hashing

A technique that maps values from one domain (or range) to another domain (or range) using a hash function

[Figure: a hash function maps values from a domain of strings to a domain of integers, e.g., 3, 15, 20, 55]

Hashing

A technique that maps values from one domain (or range) to another domain (or range) using a hash function

[Figure: a hash function maps student IDs from the domain [950,000 … 960,000] to the range [0 … 10,000]]
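One possible hash function for the student-ID figure, sketched in Python. The slide does not say which function it uses; the shift h(k) = k - 950,000 is an assumption that maps the ID domain one-to-one onto the range [0 … 10,000].

```python
LOW, HIGH = 950_000, 960_000  # student-ID domain shown on the slide


def h(student_id: int) -> int:
    """Assumed hash: shift IDs in [950,000 ... 960,000] onto [0 ... 10,000]."""
    if not LOW <= student_id <= HIGH:
        raise ValueError("student ID outside the assumed domain")
    return student_id - LOW


print(h(950_000), h(955_432), h(960_000))  # 0 5432 10000
```

Because this shift is one-to-one, it never causes collisions; it is effectively a direct-address table with an offset. A table with fewer than 10,001 slots would need a many-to-one function such as k mod m.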

Hash Tables

When |K| is much smaller than |U|, a hash table requires much less space than a direct-address table

It can reduce storage requirements to Θ(|K|)

It can still achieve O(1) search time, but only in the average case, not the worst case

Hash Tables: Main Idea

Use a hash function h to compute the slot for each key k

Store the element in slot h(k)

Maintain a hash table of size m: T[0 … m - 1]

A hash function h transforms a key into an index in the hash table T[0 … m - 1]:
h : U → {0, 1, …, m - 1}

We say that k hashes to slot h(k)

Hash Tables: Main Idea

[Figure: keys k1 … k5 from the set K of actual keys, inside the universe U, are mapped by h into a hash table of size m with slots 0 … m - 1; h(k2) = h(k5)]

>> m is much smaller than |U| (m << |U|)

>> m can be even smaller than |K|

Example

Back to the example of 100 students, each with a 9-digit SSN: a direct-address table would need 10^9 slots

All we need is a hash table of size 100
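A minimal sketch of such a table's hash function, assuming h(k) = k mod 100 (the slide does not name a function). With m = 100 this simply keeps the last two digits of the SSN; the SSN values below are made up for illustration.

```python
def h(ssn: int, m: int = 100) -> int:
    """Assumed hash function h(k) = k mod m; with m = 100 it keeps the last two digits."""
    return ssn % m


# Made-up 9-digit SSNs, for illustration only.
ssns = [123456789, 987654321, 555000142]
print([h(s) for s in ssns])  # [89, 21, 42]
```

Any two SSNs that end in the same two digits land in the same slot, which leads directly to the question of collisions. The slides later recommend a prime table size rather than 100.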

What About Collisions

[Figure: keys k1 … k5 hashed into a table of size m; k2 and k5 land in the same slot, h(k2) = h(k5): a collision]

A collision occurs when two or more keys hash to the same slot
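A small sketch that detects collisions by grouping keys by slot, assuming h(k) = k mod m; the function name `find_collisions` is hypothetical. By the pigeonhole principle, any set of more than m keys is guaranteed to produce at least one collision.

```python
from collections import defaultdict


def find_collisions(keys: list[int], m: int) -> dict[int, list[int]]:
    """Return every slot that receives two or more keys under h(k) = k mod m."""
    slots: dict[int, list[int]] = defaultdict(list)
    for k in keys:
        slots[k % m].append(k)
    # Keep only the slots where a collision happened.
    return {slot: ks for slot, ks in slots.items() if len(ks) > 1}


print(find_collisions([3, 15, 20, 55, 8], m=5))  # {3: [3, 8], 0: [15, 20, 55]}
```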

Handling Collisions

There are several ways to handle collisions:
>> Chaining
>> Open addressing: linear probing, quadratic probing, double hashing
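The three open-addressing schemes differ only in the probe sequence h(k, i), the slot tried on the i-th attempt (i = 0, 1, …, m - 1). The sketch below uses common textbook forms; the auxiliary functions h'(k) = k mod m and h2(k) = 1 + (k mod (m - 1)), and the quadratic constants, are assumptions, since the slide names the schemes without formulas.

```python
def linear_probe(k: int, i: int, m: int) -> int:
    # h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m
    return (k % m + i) % m


def quadratic_probe(k: int, i: int, m: int) -> int:
    # h(k, i) = (h'(k) + i^2) mod m  (constants c1 = 0, c2 = 1 chosen for simplicity)
    return (k % m + i * i) % m


def double_hash_probe(k: int, i: int, m: int) -> int:
    # h(k, i) = (h1(k) + i * h2(k)) mod m; h2(k) is never 0 (requires m >= 2)
    h1 = k % m
    h2 = 1 + (k % (m - 1))
    return (h1 + i * h2) % m


m, k = 7, 10
for name, probe in [("linear", linear_probe),
                    ("quadratic", quadratic_probe),
                    ("double", double_hash_probe)]:
    print(name, [probe(k, i, m) for i in range(m)])
# linear [3, 4, 5, 6, 0, 1, 2]
# quadratic [3, 4, 0, 5, 5, 0, 4]
# double [3, 1, 6, 4, 2, 0, 5]
```

With m = 7 and k = 10, linear probing and double hashing visit every slot, while this quadratic sequence revisits slots and never reaches 1, 2, or 6, a known limitation of quadratic probing when the constants and m are not chosen carefully.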

Chaining: Main Idea

Put all elements that hash to the same slot into a linked list (a chain)

Slot j contains a pointer to the head of the list of all elements that hash to j
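A minimal Python sketch of a chained hash table, assuming integer keys and h(k) = k mod m. Insertion at the head, search, and delete mirror CHAINED-HASH-INSERT, CHAINED-HASH-SEARCH, and CHAINED-HASH-DELETE from the following slides; the class names and the choice to delete by key (rather than by element x) are assumptions.

```python
from typing import Any, Optional


class Node:
    """One element of a chain (singly linked list)."""

    def __init__(self, key: int, value: Any, next_node: "Optional[Node]" = None) -> None:
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, m: int) -> None:
        self.m = m
        # Slot j holds the head of the chain of elements that hash to j.
        self.table: list[Optional[Node]] = [None] * m

    def _h(self, key: int) -> int:
        return key % self.m  # assumed hash function: k mod m

    def insert(self, key: int, value: Any) -> None:
        # Put the new node at the head of its chain: O(1), no duplicate check.
        j = self._h(key)
        self.table[j] = Node(key, value, self.table[j])

    def search(self, key: int) -> Optional[Any]:
        # Walk the chain in slot h(key); cost is proportional to its length.
        node = self.table[self._h(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None

    def delete(self, key: int) -> bool:
        # Find the node in its chain (a search), then unlink it.
        j = self._h(key)
        prev, node = None, self.table[j]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.table[j] = node.next
                else:
                    prev.next = node.next
                return True
            prev, node = node, node.next
        return False


t = ChainedHashTable(m=5)
for k, v in [(15, "a"), (20, "b"), (3, "c")]:
    t.insert(k, v)  # 15 and 20 collide in slot 0
print(t.search(20))  # b
t.delete(20)
print(t.search(20), t.search(15))  # None a
```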

Chaining - Discussion

Choosing the size of the hash table

Small enough not to waste space

Large enough such that lists remain short

Typically 10% to 20% of the total number of elements

How should we keep the lists: ordered or not?

Usually each list is an unsorted linked list

Insertion in Hash Tables

Alg.: CHAINED-HASH-INSERT(T, x)
    insert x at the head of list T[h(key[x])]

Worst-case running time is O(1), assuming no check for an existing element with the same key

Duplicates may or may not be allowed, depending on the application

Deletion in Hash Tables

Alg.: CHAINED-HASH-DELETE(T, x)
    delete x from the list T[h(key[x])]

The element to be deleted must first be found in its list

Worst-case running time: deletion depends on searching the corresponding list

Searching in Hash Tables

Alg.: CHAINED-HASH-SEARCH(T, k)
    search for an element with key k in list T[h(k)]

Running time is proportional to the length of the list in slot h(k)

What are the worst-case and average-case running times?

Analysis of Hashing with Chaining: Worst Case

All keys hash to the same slot, forming a single chain

Chain length is Θ(n)

Searching takes O(n) plus the time to compute h(k)

[Figure: table T with slots 0 … m - 1, all keys in one chain]
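A small experiment contrasting the worst case with an even spread, assuming n = 1000 integer keys 0 … 999 and m = 100 slots. `bad_h` is a deliberately poor, hypothetical hash function that sends every key to slot 0; `good_h` is k mod m.

```python
from collections import deque


def build_chains(keys, m, h):
    """Place every key at the head of the chain in slot h(k); return the table."""
    table = [deque() for _ in range(m)]
    for k in keys:
        table[h(k)].appendleft(k)  # O(1) insertion at the head of the chain
    return table


def comparisons_to_find(table, k, h):
    """Count the key comparisons made while scanning the chain in slot h(k)."""
    count = 0
    for stored in table[h(k)]:
        count += 1
        if stored == k:
            break
    return count


def bad_h(k):
    return 0  # worst case: every key hashes to slot 0


def good_h(k):
    return k % 100  # keys 0..999 spread evenly over 100 slots


keys = list(range(1000))
worst = build_chains(keys, 100, bad_h)
even = build_chains(keys, 100, good_h)

print(max(len(chain) for chain in worst))    # 1000: one chain holds all n keys
print(max(len(chain) for chain in even))     # 10: every chain holds n/m keys
print(comparisons_to_find(worst, 0, bad_h))  # 1000: key 0 was inserted first, so it sits at the tail
print(comparisons_to_find(even, 0, good_h))  # 10: key 0 is at the tail of a chain of length 10
```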

Analysis of Hashing with Chaining: Average Case

With a good hash function and a uniform distribution of keys:

Any given element is equally likely to hash into any of the m slots

All chains will have similar lengths

Let n be the total number of keys and m the hash table size

Average chain length: n/m

[Figure: table T with slots 0 … m - 1 and several chains of similar length]

Average search time: O(1 + n/m), the common case

Analysis of Hashing with Chaining: Average Case

If m (# of slots) is proportional to n (# of keys), i.e., m = Θ(n), then n/m = O(1)

Searching takes constant time on average
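The same argument in standard notation, writing α = n/m for the load factor (the symbol α is not used on the slides). Under simple uniform hashing, a search scans on average about α elements of one chain, plus O(1) to compute h(k):

```latex
\alpha = \frac{n}{m}, \qquad
T_{\text{search}} = \Theta(1 + \alpha)

m = \Theta(n) \;\Longrightarrow\; \alpha = O(1) \;\Longrightarrow\; T_{\text{search}} = O(1)
```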

Hash Functions

Hash Functions

A hash function transforms a key k into a table address in [0 … m - 1]

What makes a good hash function?

(1) Easy to compute

(2) Approximates a random function: for every input, every output is equally likely (simple uniform hashing)

(3) Reduces the number of collisions

Hash Functions

Goal: map a key k into one of the m slots in the hash table

Make the table size m a prime number

Avoid even numbers and powers of 2: with m = 2^p, k mod m depends only on the p lowest-order bits of k

A common form is h(k) = F(k) mod m

F(k) is some function or operation on k (usually producing an integer)

The output of the mod is a number in [0 … m - 1]
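A quick check of why a power-of-2 table size can be risky, using h(k) = k mod m with F(k) = k. The keys are hypothetical multiples of 8, so their 3 lowest bits are all zero; m = 16 and m = 17 are chosen for comparison.

```python
def slots_used(keys: list[int], m: int) -> int:
    """Count the distinct slots that h(k) = k mod m assigns to the keys."""
    return len({k % m for k in keys})


# Hypothetical keys 8, 16, ..., 128: all multiples of 8 (lowest 3 bits are 0).
keys = [8 * i for i in range(1, 17)]

print(slots_used(keys, 16))  # 2: k mod 16 keeps only the 4 low bits, and only bit 3 varies
print(slots_used(keys, 17))  # 16: 17 is prime and shares no factor with 8, so every key gets its own slot
```

Sixteen keys crowd into just two slots of the power-of-2 table, while the prime-sized table spreads them into distinct slots.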

Examples of Hash Functions

Collection of images
    F(k): sum of the pixel colors
    h(k) = F(k) mod m

Collection of strings
    F(k): sum of the ASCII values of the characters
    h(k) = F(k) mod m

Collection of numbers
    F(k): just return k
    h(k) = F(k) mod m
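The three examples as Python functions, assuming an image is a 2-D list of integer pixel intensities and m = 13 (a prime, following the earlier advice); the function names are hypothetical. Summing character codes makes anagrams such as "cat" and "act" collide, one reason this string hash is only a simple example.

```python
def f_string(s: str) -> int:
    # F(k): sum of the ASCII (code point) values of the characters
    return sum(ord(c) for c in s)


def f_image(pixels: list[list[int]]) -> int:
    # F(k): sum of the pixel colors (here: one integer intensity per pixel)
    return sum(sum(row) for row in pixels)


def f_number(k: int) -> int:
    # F(k): just return k
    return k


def h(fk: int, m: int) -> int:
    # h(k) = F(k) mod m, always a slot in [0 ... m-1]
    return fk % m


m = 13
print(h(f_string("cat"), m), h(f_string("act"), m))  # 0 0  (anagrams collide)
print(h(f_image([[10, 20], [30, 40]]), m))           # 9
print(h(f_number(950123), m))                        # 5
```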