
Hashtables Picture of a - PowerPoint Presentation

min-jolicoeur
Uploaded On 2018-02-10




Presentation Transcript

Slide1

Hashtables

Slide2

Picture of a hashtable

KEY (e.g. student id)    VALUE (e.g. student name)
089                      JOHN
045                      DAVE
939                      STEVE

You can think of this as a dictionary, with words and definitions.

Slide3

A basic problem

We have to store some records and perform the following:
• add new record
• delete record
• search a record by key

Find a way to do these efficiently!

Slide4

Unsorted array

Use an array to store the records, in unsorted order.
• add - add the record as the last entry: fast, O(1)
• delete a target - slow at finding the target, fast at filling the hole (just take the last entry): O(n)
• search - sequential search: slow, O(n)

Slide5

Sorted array

Use an array to store the records, keeping them in sorted order.
• add - insert the record in its proper position; much record movement: slow, O(n)
• delete a target - how to handle the hole after deletion? Much record movement: slow, O(n)
• search - binary search: fast, O(log n)

Slide6

Linked list

Store the records in a linked list (unsorted).
• add - fast if one can insert the node anywhere: O(1)
• delete a target - fast at disposing of the node, but slow at finding the target: O(n)
• search - sequential search: slow, O(n)

(If we only use a linked list, we cannot use binary search even if the list is sorted.)
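The array trade-offs above can be sketched in a few lines of Python. This is only an illustration: the function names and sample keys are invented here, and Python lists stand in for the arrays. `bisect` is part of the standard library.

```python
import bisect

def delete_unsorted(records, key):
    """Unsorted array: O(n) to find the target, O(1) to fill the hole."""
    i = records.index(key)        # sequential search
    records[i] = records[-1]      # fill the hole with the last entry
    records.pop()                 # drop the duplicated last entry

def add_sorted(records, key):
    """Sorted array: later entries must shift to make room, O(n)."""
    bisect.insort(records, key)

def search_sorted(records, key):
    """Sorted array: binary search, O(log n). Returns an index or -1."""
    i = bisect.bisect_left(records, key)
    return i if i < len(records) and records[i] == key else -1

unsorted = [89, 45, 939, 7]
delete_unsorted(unsorted, 45)     # 7 moves into the hole left by 45

sorted_ids = []
for k in (89, 45, 939):
    add_sorted(sorted_ids, k)
```

Binary search works here because an array gives direct access to its middle element, which is exactly what a linked list lacks.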

Slide7

More approaches

These have better performance but are more complex:
• Hash table
• Tree (BST, Heap, …)

Slide8

What is a Hash Table?

The simplest kind of hash table is an array of records. This example has room for 701 records.

[Figure: an array of records, with slots indexed [0] to [700].]

Slide9

What is a Hash Table?

Each record has a special field, called its key. In this example, the key is a long integer field called Number.

[Figure: the array [0] to [700]; slot [4] holds a record with Number 506643548.]

Slide10

What is a Hash Table?

The number might be a person's identification number, and the rest of the record has information about the person.

[Figure: the array [0] to [700]; slot [4] holds a record with Number 506643548.]

Slide11

What is a Hash Table?

When a hash table is in use, some spots contain valid records, and other spots are "empty".

[Figure: the array [0] to [700], with four spots holding records: Number 506643548, Number 233667136, Number 281942902 and Number 155778322.]

Slide12

Inserting a New Record

In order to insert a new record, the key must somehow be converted to an array index. The index is called the hash value of the key.

[Figure: the array with its four records, plus a new record with Number 580625685 waiting to be inserted.]

Slide13

Inserting a New Record

Typical way to create a hash value: (Number mod 701)

What is (580625685 mod 701)?

[Figure: the array with its four records, plus the new record with Number 580625685.]

Slide14

Inserting a New Record

Typical way to create a hash value: (Number mod 701)

What is (580625685 mod 701)? 3

[Figure: the array with its four records, plus the new record with Number 580625685.]

Slide15

Inserting a New Record

The hash value is used for the location of the new record.
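The arithmetic on the preceding slides can be checked directly. This minimal sketch assumes a Python list of 701 slots with `None` standing for an empty spot; the names `TABLE_SIZE`, `hash_value` and `table` are illustrative only.

```python
TABLE_SIZE = 701                      # slots [0] .. [700], as in the example

def hash_value(number):
    """Convert a key to an array index: Number mod 701."""
    return number % TABLE_SIZE

table = [None] * TABLE_SIZE           # None stands for an "empty" spot
new_key = 580625685
index = hash_value(new_key)           # 580625685 mod 701 is 3
table[index] = new_key                # the hash value is the record's location
```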

[Figure: the new record with Number 580625685 is headed for slot [3] of the array, which already holds the four earlier records.]

Slide16

Inserting a New Record

The hash value is used for the location of the new record.

[Figure: the array now holds five records: Numbers 506643548, 233667136, 281942902, 155778322 and the newly placed 580625685.]

Slide17

Collisions

Here is another new record to insert, with a hash value of 2.

[Figure: the array with its five records; the new record with Number 701466868 says "My hash value is [2]."]

Slide18

Collisions

This is called a collision, because there is already another valid record at [2].

When a collision occurs, move forward until you find an empty spot.

[Figure, repeated on Slides 19 and 20: the new record with Number 701466868 moves forward past the occupied spots, looking for an empty one.]

Slide21

Collisions

The new record goes in the empty spot.

[Figure: the array now holds six records, including Number 701466868.]

Slide22

Where would you be placed in this table, if there is no collision? Use your national insurance number or some other favorite number.
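To try this exercise, and to replay the collision from the preceding slides, here is a small sketch of insertion with the "move forward" rule. It assumes Python, `None` for an empty spot, and wrap-around from [700] back to [0], which the slides do not spell out; `insert` is a hypothetical helper name.

```python
TABLE_SIZE = 701

def insert(table, number):
    """Place `number` at its hash value, moving forward past occupied spots."""
    index = number % TABLE_SIZE
    for _ in range(TABLE_SIZE):
        if table[index] is None:
            table[index] = number
            return index
        index = (index + 1) % TABLE_SIZE   # assumed wrap-around at the end
    raise RuntimeError("table is full")

table = [None] * TABLE_SIZE
for number in (506643548, 233667136, 281942902, 155778322, 580625685):
    insert(table, number)

slot = insert(table, 701466868)   # hash value 2 is taken, so it moves forward
```

With these six keys the last record lands in slot 5, after finding [2], [3] and [4] occupied. Calling `insert` with your own number on a fresh table returns the slot the exercise asks about.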

[Figure: the array [0] to [700] with its six records: Numbers 506643548, 233667136, 281942902, 155778322, 580625685 and 701466868.]

Slide23

Searching for a Key

The data that's attached to a key can be found fairly quickly.

[Figure: the array with its six records; the key being searched for is Number 701466868.]

Slide24

Searching for a Key

Calculate the hash value. Check that location of the array for the key.

[Figure: the search key, Number 701466868, says "My hash value is [2]." The record in the spot being checked answers "Not me."]

Slide25

Searching for a Key

Keep moving forward until you find the key, or you reach an empty spot.

[Figure, repeated on Slide 26: the search moves forward one spot; the record there again answers "Not me."]

Slide27

Searching for a Key

Keep moving forward until you find the key, or you reach an empty spot.

[Figure: the search moves forward again; this time the record answers "Yes!"]

Slide28

Searching for a Key

When the item is found, the information can be copied to the necessary location.
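The search walk-through above might look like this in Python. It is a sketch under the same assumptions as the running example (701 slots, hash value = Number mod 701), with `None` as the empty marker. The `(number, info)` tuples and the info strings are hypothetical, and the table is filled by hand so that 701466868 sits three spots past its hash value.

```python
TABLE_SIZE = 701

def search(table, number):
    """Return the info stored for `number`, or None if it is not in the table."""
    index = number % TABLE_SIZE            # calculate the hash value
    for _ in range(TABLE_SIZE):
        record = table[index]
        if record is None:                 # reached an empty spot: not present
            return None
        if record[0] == number:            # "Yes!" - copy the information out
            return record[1]
        index = (index + 1) % TABLE_SIZE   # "Not me." - keep moving forward
    return None

table = [None] * TABLE_SIZE
table[1] = (281942902, "info for 281942902")
table[2] = (233667136, "info for 233667136")
table[3] = (580625685, "info for 580625685")
table[4] = (506643548, "info for 506643548")
table[5] = (701466868, "info for 701466868")
table[700] = (155778322, "info for 155778322")

hit = search(table, 701466868)    # checks [2], [3], [4], then finds it at [5]
miss = search(table, 702)         # hash value 1; walks to the empty spot [6]
```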

[Figure: the array with its six records; the search key Number 701466868, with hash value [2], has found its record ("Yes!").]

Slide29

Deleting a Record

Records may also be deleted from a hash table.

[Figure: the array with its six records; one of them says "Please delete me."]

Slide30

Deleting a Record

Records may also be deleted from a hash table.

But the location must not be left as an ordinary "empty spot" since that could interfere with searches.
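Why an ordinary empty spot interferes with searches can be seen in a few lines. The sketch below assumes the same 701-slot table holding bare keys, `None` for a spot that was never used, and a hypothetical `DELETED` sentinel as one possible way of marking a spot specially. It is not necessarily how the original course code handled deletion.

```python
TABLE_SIZE = 701
DELETED = object()                        # hypothetical "used to hold something" marker

def find(table, number):
    """Return the slot holding `number`, or None; stops only at a truly empty spot."""
    index = number % TABLE_SIZE
    for _ in range(TABLE_SIZE):
        if table[index] is None:
            return None
        if table[index] == number:        # DELETED never equals a key
            return index
        index = (index + 1) % TABLE_SIZE
    return None

def build():
    table = [None] * TABLE_SIZE
    table[1], table[2], table[3] = 281942902, 233667136, 580625685
    table[4], table[5], table[700] = 506643548, 701466868, 155778322
    return table

naive = build()
naive[4] = None                           # delete 506643548 by just emptying [4]
lost = find(naive, 701466868)             # search stops at [4] and misses [5]

marked = build()
marked[4] = DELETED                       # delete by marking the spot instead
found = find(marked, 701466868)           # search passes [4] and reaches [5]
```

In the first table the record at [5] is still stored but can no longer be found; in the second the marker keeps the probe path intact.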

[Figure: the array after the deletion, with five records left: Numbers 233667136, 281942902, 155778322, 580625685 and 701466868. The record with Number 506643548 is gone.]

Slide31

Deleting a Record

Records may also be deleted from a hash table.

But the location must not be left as an ordinary "empty spot" since that could interfere with searches.

The location must be marked in some special way so that a search can tell that the spot used to have something in it.

[Figure: the array with the five remaining records: Numbers 233667136, 281942902, 155778322, 580625685 and 701466868.]

Slide32

Array as table

Consider this problem. We want to store 1,000 student records and search them by student id.

studid     name     score
9903030    tom      73
9802020    mary     100
9801010    peter    20
0056789    david    56.8
0012345    andy     81.5
0033333    betty    90
...        ...      ...
9908080    bill     49

Slide33

Array as table

One way is to store the records in a huge array (index 0..9999999). The index is used as the student id, i.e. the record of the student with studid 0012345 is stored at A[12345]. Is this a good idea?

index      name     score
0          :        :
12345      andy     81.5
33333      betty    90
56789      david    56.8
9908080    bill     49
9999999    :        :

If I have 70 friends, and I want to store their mobile phone numbers, I do not want an array 1000000 in size. I could use a table with about 140 slots in it.

Slide34

Array as table

It is also called a Direct-address Hash Table.
• Each slot, or position, corresponds to a key in U, the universe of possible keys.
• If there’s an element x with key k, then T[k] contains a pointer to x.
• Otherwise, T[k] is empty, represented by NIL.

Slide35

Array as table

Store the records in a huge array where the index corresponds to the key.
• add - very fast, O(1)
• delete - very fast, O(1)
• search - very fast, O(1)
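A direct-address table is short to write down, which is part of its appeal. This sketch assumes 7-digit student ids as on the earlier slide, so the array needs 10,000,000 slots. The names `table`, `add`, `delete` and `search` are illustrative, and `None` plays the role of NIL.

```python
UNIVERSE_SIZE = 10_000_000            # one slot per possible id 0..9999999

table = [None] * UNIVERSE_SIZE        # None plays the role of NIL

def add(studid, record):
    table[int(studid)] = record       # O(1): the key is the index

def delete(studid):
    table[int(studid)] = None         # O(1)

def search(studid):
    return table[int(studid)]         # O(1); None means "not present"

add("0012345", ("andy", 81.5))
add("9908080", ("bill", 49))
delete("9908080")
used = sum(slot is not None for slot in table)   # slots actually holding a record
```

After these calls a single slot out of ten million is in use.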

But it wastes a lot of memory! Not feasible.

Slide36

Hash function

function Hash(key: KeyType): integer;

Imagine that we have such a magic function Hash. It maps the keys (studid) of the 1000 records into the integers 0..999, one to one. No two different keys map to the same number.

H(‘0012345’) = 134
H(‘0033333’) = 67
H(‘0056789’) = 764
H(‘9908080’) = 3

Slide37

Hash Table

index    studid     name     score
0        :          :        :
3        9908080    bill     49
67       0033333    betty    90
134      0012345    andy     81.5
764      0056789    david    56.8
999      :          :        :

To store a record, we compute Hash(stud_id) for the record and store it at the location Hash(stud_id) of the array. To search for a student, we only need to peek at the location Hash(target stud_id).

Slide38

Hash Table with Perfect Hash

Such a magic function is called a perfect hash.
• add - very fast, O(1)
• delete - very fast, O(1)
• search - very fast, O(1)

But it is generally difficult to design a perfect hash (e.g. when the potential key space is large).

Slide39

Hash function

A hash function maps a key to an index within a range.

Desirable properties:
• simple and quick to calculate
• even distribution, avoiding collisions as much as possible

function Hash(key: KeyType);

Slide40

Division Method

$h(k) = k \bmod m$

Certain values of m may not be good:
• When $m = 2^p$, h(k) is just the p lower-order bits of the key.

Good values for m are prime numbers which are not close to exact powers of 2. For example, if you want to store 2000 elements, then m = 701 (m = hash table length) is such a prime. Since 2000 elements cannot fit one per slot in a table of 701, this example assumes colliding elements share a slot, about 3 per slot on average.
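The warning about powers of 2 can be seen with a few keys. In this sketch the sample keys are invented for illustration: they differ only in their high-order bits, so with m = 2^4 they all collide, while m = 701 separates them.

```python
def h(k, m):
    """Division method: h(k) = k mod m."""
    return k % m

# Invented keys that share their low-order bits and differ only higher up.
keys = [21 + 4096 * j for j in range(4)]        # 21, 4117, 8213, 12309

low_bits = [h(k, 16) for k in keys]             # m = 2**4 keeps only the 4 low-order bits
spread = [h(k, 701) for k in keys]              # a prime not close to a power of 2

same_as_mask = all(h(k, 16) == (k & 0b1111) for k in keys)
```

Here `low_bits` is `[5, 5, 5, 5]`, while `spread` is `[21, 612, 502, 392]`.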

With m = 701 the hash function is:

$h(k) = k \bmod 701$

Slide41

Collision

For most cases, we cannot avoid collisions.

Collision resolution - how to handle the case when two different keys map to the same index.

H(‘0012345’) = 134
H(‘0033333’) = 67
H(‘0056789’) = 764
H(‘9903030’) = 3
H(‘9908080’) = 3

Slide42

Solutions to Collision

The problem arises because we have two keys that hash to the same array entry, a collision. There are two ways to resolve collisions:
• Hashing with Chaining: every hash table entry contains a pointer to a linked list of keys that hash to the same entry.
• Hashing with Open Addressing: every hash table entry contains only one key. If a new key hashes to a table entry which is filled, systematically examine other table entries until you find one empty entry to place the new key.

Slide43

Chained Hash Table

One way to handle collision is to store the collided records in a linked list. The array now stores pointers to such lists. If no key maps to a certain hash value, that array entry points to nil.

[Figure: an array with slots 0 to 5 and onward to HASHMAX; some slots hold nil, and others point to linked lists of records such as (Key: 9903030, name: tom, score: 73).]

Which index has the collisions?

Slide44

Chained Hash Table

Put all elements that hash to the same slot into a linked list.
• Slot j contains a pointer to the head of the list of all stored elements that hash to j.
• If there are no such elements, slot j contains NIL.

Slide45

Chained Hash table

A hash table where collided records are stored in linked lists.
• Good hash function, appropriate hash size: few collisions. Add, delete, search very fast, O(1).
• Otherwise, some hash value has a long list of collided records:
  add - just insert at the head: fast, O(1)
  delete a target - delete from an unsorted linked list: slow, O(n)
  search - sequential search: slow, O(n)
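A chained table along these lines might be sketched as follows. `collections.deque` stands in for the linked lists, and the table size of 101 with a division-method hash is an assumption chosen for this example: under it, two of the student ids from the earlier slides happen to collide.

```python
from collections import deque

HASH_SIZE = 101                                   # assumed table size (a prime)
table = [deque() for _ in range(HASH_SIZE)]       # an empty chain plays the role of nil

def hash_key(studid):
    return int(studid) % HASH_SIZE                # division method, as an assumption

def add(studid, name, score):
    table[hash_key(studid)].appendleft((studid, name, score))   # insert at the head

def search(studid):
    for record in table[hash_key(studid)]:        # sequential search of one chain
        if record[0] == studid:
            return record
    return None

def delete(studid):
    chain = table[hash_key(studid)]
    for i, record in enumerate(chain):            # find the target in the chain
        if record[0] == studid:
            del chain[i]
            return True
    return False

add("9903030", "tom", 73)
add("9908080", "bill", 49)       # 9903030 and 9908080 both hash to 81 here
add("0012345", "andy", 81.5)
```

Only the chain at the key's own slot is ever scanned, so the cost of delete and search depends on how long that one chain has grown.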

Consider the two extremes.

Slide46

Open Addressing

An alternative to chaining for handling collisions.
• Store all keys in the hash table itself.
• Each slot contains either a key or NIL.
• To search for key k:
  - Compute h(k) and examine slot h(k). Examining a slot is known as a probe.
  - If slot h(k) contains key k, the search is successful. If this slot contains NIL, the search is unsuccessful.
  - There’s a third possibility: slot h(k) contains a key that is not k. We compute the index of some other slot, based on k and on which probe (count from 0: 0th, 1st, 2nd, etc.) we’re on.
  - Keep probing until we either find key k (successful search) or we find a slot holding NIL (unsuccessful search).

Slide47

How to compute probe sequences

Linear probing: Given auxiliary hash function h, the probe sequence starts at slot h(k) and continues sequentially through the table, wrapping after slot $m - 1$ to slot 0. Given key k and probe number i ($0 \le i < m$),

$h(k, i) = (h(k) + i) \bmod m$

Quadratic probing: As in linear probing, the probe sequence starts at h(k). Unlike linear probing, it examines cells 1, 4, 9, and so on, away from the original probe point (if $c_1 = 0$, $c_2 = 1$):

$h(k, i) = (h(k) + c_1 i + c_2 i^2) \bmod m$

Double hashing: Use two auxiliary hash functions, $h_1$ and $h_2$. $h_1$ gives the initial probe, and $h_2$ gives the remaining probes:

$h(k, i) = (h_1(k) + i\,h_2(k)) \bmod m$

Slide48

Open Addressing Example

Hash(89, 10) = 9
Hash(18, 10) = 8
Hash(49, 10) = 9
Hash(58, 10) = 8
Hash(9, 10) = 9

Slide49

Linear Probing:

$h(k, i) = (h(k) + i) \bmod m$

In linear probing, collisions are resolved by sequentially scanning an array (with wraparound) until an empty cell is found.
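As a worked example, the sketch below runs linear probing on the five keys from the preceding Open Addressing Example slide. It assumes that the second argument of Hash there is the table size (10) and that the auxiliary hash is h(k) = k mod 10, which matches the listed values; `insert_linear` is a hypothetical name.

```python
def insert_linear(table, k):
    """Insert k using h(k, i) = (h(k) + i) mod m; return the number of probes."""
    m = len(table)
    for i in range(m):
        slot = (k % m + i) % m          # i-th probe, wrapping around the table
        if table[slot] is None:
            table[slot] = k
            return i + 1
    raise RuntimeError("table is full")

table = [None] * 10
probes = [insert_linear(table, k) for k in (89, 18, 49, 58, 9)]
```

Under these assumptions 49 wraps around to slot 0, 58 ends up in slot 1 and 9 in slot 2, with probe counts `[1, 1, 2, 4, 4]`.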

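In the following example, table size m = 8, and the keys hash as follows:

k:       A,P,Q   B,O,R   C,N,S   D,M,T   E,L,U   F,K,V   G,J,W,Z   H,I,X,Y
h(k):    0       1       2       3       4       5       6         7

Each row shows the table after the action (. = empty, ± = deleted marker).

Action      [0] [1] [2] [3] [4] [5] [6] [7]   # probes
Store A      A   .   .   .   .   .   .   .    1
Store C      A   .   C   .   .   .   .   .    1
Store D      A   .   C   D   .   .   .   .    1
Store G      A   .   C   D   .   .   G   .    1
Store P      A   P   C   D   .   .   G   .    2
Store Q      A   P   C   D   Q   .   G   .    5
Delete P     A   ±   C   D   Q   .   G   .    2
Delete Q     A   ±   C   D   ±   .   G   .    5
Store B      A   B   C   D   ±   .   G   .    1
Store R      A   B   C   D   R   .   G   .    4
Store Q      A   B   C   D   R   Q   G   .    6

Slide50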

Choosing a Hash Function

Notice that the insertion of Q required several probes (5). This was caused by A and P mapping to slot 0, which is beside the C and D keys.

The performance of the hash table depends on having a hash function which evenly distributes the keys.

Choosing a good hash function is a black art.

Slide51

Clustering

Even with a good hash function, linear probing has its problems:
• The position of the initial mapping $i_0$ of key k is called the home position of k.
• When several insertions map to the same home position, they end up placed contiguously in the table. This collection of keys with the same home position is called a cluster.
• As clusters grow, the probability that a key will map to the middle of a cluster increases, increasing the rate of the cluster’s growth. This tendency of linear probing to place items together is known as primary clustering.
• As these clusters grow, they merge with other clusters, forming even bigger clusters which grow even faster.

Slide52

Quadratic Probing Example

Hash(89, 10) = 9
Hash(18, 10) = 8
Hash(49, 10) = 9
Hash(58, 10) = 8
Hash(9, 10) = 9

Slide53

Quadratic Probing:

$h(k, i) = (h(k) + c_1 i + c_2 i^2) \bmod m$

Quadratic probing eliminates the primary clustering problem of linear probing by examining certain cells away from the original probe point.
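For comparison with linear probing, the sketch below applies quadratic probing with c1 = 0 and c2 = 1 to the five keys from the Quadratic Probing Example slide. As before, it assumes a table of size 10 and auxiliary hash h(k) = k mod 10; `insert_quadratic` is a hypothetical name.

```python
def insert_quadratic(table, k, c1=0, c2=1):
    """Insert k using h(k, i) = (h(k) + c1*i + c2*i*i) mod m; return the probe count."""
    m = len(table)
    for i in range(m):
        slot = (k % m + c1 * i + c2 * i * i) % m
        if table[slot] is None:
            table[slot] = k
            return i + 1
    # Quadratic probing is not guaranteed to visit every slot.
    raise RuntimeError("no free slot found on this probe sequence")

table = [None] * 10
probes = [insert_quadratic(table, k) for k in (89, 18, 49, 58, 9)]
```

Here 58 jumps from its home slot 8 to slot 2 on its third probe, and 9 lands in slot 3, so the probe counts are `[1, 1, 2, 3, 3]`.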

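In the following example, table size m = 8, and c1 = 0, c2 = 1. The keys hash as follows:

k:       A,P,Q   B,O,R   C,N,S   D,M,T   E,L,U   F,K,V   G,J,W,Z   H,I,X,Y
h(k):    0       1       2       3       4       5       6         7

Each row shows the table after the action (. = empty, ± = deleted marker). Probe counts in parentheses are the corresponding linear probing counts.

Action      [0] [1] [2] [3] [4] [5] [6] [7]   # probes
Store A      A   .   .   .   .   .   .   .    1
Store C      A   .   C   .   .   .   .   .    1
Store D      A   .   C   D   .   .   .   .    1
Store G      A   .   C   D   .   .   G   .    1
Store P      A   P   C   D   .   .   G   .    2
Store Q      A   P   C   D   Q   .   G   .    3 (5)
Delete P     A   ±   C   D   Q   .   G   .    2
Delete Q     A   ±   C   D   ±   .   G   .    3 (5)
Store B      A   B   C   D   ±   .   G   .    1
Store R      A   B   C   D   ±   R   G   .    3 (4)
Store Q      A   B   C   D   Q   R   G   .    3 (6)

Slide54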

Double Hashing

Double hashing: Use two auxiliary hash functions, $h_1$ and $h_2$. $h_1$ gives the initial probe, and $h_2$ gives the remaining probes:

$h(k, i) = (h_1(k) + i\,h_2(k)) \bmod m$

Quadratic probing solves the primary clustering problem, but it has the secondary clustering problem, in which elements that hash to the same position probe the same alternative cells. Secondary clustering is a minor theoretical blemish.

Double hashing is a hashing technique that does not suffer from secondary clustering. A second hash function is used to drive the collision resolution. Limits are left to ponder
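To make the formula concrete, here is a sketch of double hashing on the same five keys used in the earlier open addressing examples. The slides do not give a second hash function, so `h2(k) = 7 - (k mod 7)` is purely an assumption for illustration, as are the table size of 10 and `h1(k) = k mod 10`.

```python
def h1(k, m):
    return k % m                  # initial probe position

def h2(k):
    return 7 - (k % 7)            # hypothetical second hash; always between 1 and 7

def insert_double(table, k):
    """Insert k using h(k, i) = (h1(k) + i*h2(k)) mod m; return the probe count."""
    m = len(table)
    for i in range(m):
        slot = (h1(k, m) + i * h2(k)) % m
        if table[slot] is None:
            table[slot] = k
            return i + 1
    # Unless h2(k) and m share no common factor, some slots are never probed.
    raise RuntimeError("no free slot found on this probe sequence")

table = [None] * 10
probes = [insert_double(table, k) for k in (89, 18, 49, 58, 9)]
```

Keys 49 and 9 share home slot 9 but step by 7 and 5 respectively, so they land in slots 6 and 4 instead of following the same path; that difference is what avoids secondary clustering. A table size of 10 is used only to match the earlier examples; a prime size keeps every step value coprime with m.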