Anagrams and Hash Tables


tatiana-dople
Uploaded On 2017-07-18



Presentation Transcript

Slide1

Anagrams and Hash Tables

Olac Fuentes

University of Texas at El Paso

Slide2

Can we find anagrams in constant time?

Yes, with O(v) time to preprocess the words, where v is the size of the vocabulary.

We will store the words of the English dictionary in a way that allows us to find all the anagrams of any given word in constant time (expected time, treating word length as a constant).

Let sort(w), where w is a string, be the string that contains the same characters as w (including repetitions) sorted in ascending order.

So sort(“java”) = “aajv”.
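A minimal Java sketch of sort(w), assuming words are ordinary Java Strings and that "ascending order" means ordering by char value (for lowercase English letters this is alphabetical order). The method name sortChars is hypothetical.

```java
import java.util.Arrays;

class SortChars {
    // sort(w): the characters of w, repetitions included, in ascending order
    static String sortChars(String w) {
        char[] cs = w.toCharArray();  // copy the characters out of the immutable String
        Arrays.sort(cs);              // ascending by char value
        return new String(cs);
    }

    public static void main(String[] args) {
        System.out.println(sortChars("java"));  // prints aajv
    }
}
```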

Observation: words w and x are anagrams of each other iff sort(w) == sort(x).

For example, sort(“aunt”) = “antu” and sort(“tuna”) = “antu”, so “aunt” and “tuna” are anagrams.
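The observation translates directly into a test that compares sorted forms. A sketch in Java (the method names are hypothetical); comparing two words this way takes O(k log k) time for words of length k, because of the two sorts.

```java
import java.util.Arrays;

class AnagramCheck {
    // sort(w): characters of w in ascending order
    static String sort(String w) {
        char[] cs = w.toCharArray();
        Arrays.sort(cs);
        return new String(cs);
    }

    // w and x are anagrams of each other iff sort(w) equals sort(x)
    static boolean areAnagrams(String w, String x) {
        return sort(w).equals(sort(x));
    }

    public static void main(String[] args) {
        System.out.println(sort("aunt") + " " + sort("tuna"));  // prints antu antu
        System.out.println(areAnagrams("aunt", "tuna"));         // prints true
        System.out.println(areAnagrams("aunt", "tune"));         // prints false
    }
}
```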

We will use a hash table with the words' sorted characters as keys. The node that contains the key S also contains a reference to a list of words {W1, ..., Wn} such that sort(W1) = sort(W2) = ... = sort(Wn) = S; thus all words in the list are anagrams of one another.
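For comparison, the same grouping can be sketched with Java's built-in java.util.HashMap in place of the hand-built chained table used in these slides (a swapped-in structure, not the slides' implementation). Keys are sorted words and values are lists of the original words; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AnagramIndex {
    // key: sort(w); value: every word w added so far with that sorted form
    private final Map<String, List<String>> groups = new HashMap<>();

    static String sort(String w) {
        char[] cs = w.toCharArray();
        Arrays.sort(cs);
        return new String(cs);
    }

    // Preprocessing step: called once per vocabulary word
    void add(String word) {
        groups.computeIfAbsent(sort(word), k -> new ArrayList<>()).add(word);
    }

    // All stored words that are anagrams of w (w itself included if it was added)
    List<String> anagramsOf(String w) {
        return groups.getOrDefault(sort(w), Collections.emptyList());
    }

    public static void main(String[] args) {
        AnagramIndex idx = new AnagramIndex();
        for (String w : new String[] {"back", "code", "coed", "diet", "edit",
                                      "flow", "lake", "tide", "tied"})
            idx.add(w);
        System.out.println(idx.anagramsOf("tied"));  // prints [diet, edit, tide, tied]
    }
}
```

Building the index makes one pass over the vocabulary; each query sorts the query word once and performs one expected-constant-time map lookup.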

Let’s see how this works with an example.

Slide3

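[Figure: the finished hash table H (size 5) for the example; each slot holds a chain of sorted keys, each key points to its list of anagrams, and / marks an empty slot]

H: 0: aekl {lake} | 1: / | 2: flow {flow} | 3: abck {back}, cdeo {code, coed} | 4: deit {diet, edit, tide, tied}

Slide4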

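// Each public class goes in its own .java file.
public class hashTableChaining {
    private SortedStringNode[] H;   // bucket array; each bucket is a chain of SortedStringNodes

    public hashTableChaining(int n) {
        H = new SortedStringNode[n];
        for (int i = 0; i < n; i++)
            H[i] = null;            // every bucket starts empty
    }
}

public class SortedStringNode {
    public String SortedString;     // key: a word with its characters sorted
    public SortedStringNode next;   // next key in the same bucket (chaining)
    public StringNode anagrams;     // list of words w with sort(w) == SortedString

    public SortedStringNode(String S, SortedStringNode n) {
        SortedString = S;
        next = n;
        anagrams = null;
    }
}

public class StringNode {
    public String word;             // one word of the vocabulary
    public StringNode next;         // next word in the same anagram list

    public StringNode(String S, StringNode n) {
        word = S;
        next = n;
    }
}

CLASS DEFINITIONS

Slide5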

First we need to define a hash function that distributes the words more or less uniformly over the table.

Here are examples of bad hash functions:

h(S) = S.length() % H.length // All words of the same length hash to the same location

h(S) = ((int) S.charAt(0)) % H.length // All words that start with the same letter hash to the same location
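To make the problem concrete, a small Java sketch that applies both functions to the nine words of the example vocabulary used later in these slides, with a table of size 5 (the class and method names are hypothetical). Since every word there has four letters, the length-based function sends all nine to slot 4.

```java
import java.util.Arrays;

class BadHashes {
    // h(S) = S.length() % m: all words of the same length collide
    static int lengthHash(String s, int m) {
        return s.length() % m;
    }

    // h(S) = ((int) S.charAt(0)) % m: all words with the same first letter collide
    static int firstCharHash(String s, int m) {
        return ((int) s.charAt(0)) % m;
    }

    public static void main(String[] args) {
        String[] words = {"back", "code", "coed", "diet", "edit",
                          "flow", "lake", "tide", "tied"};
        int m = 5;
        int[] byLength = new int[m];   // how many words land in each slot
        int[] byFirst = new int[m];
        for (String w : words) {
            byLength[lengthHash(w, m)]++;
            byFirst[firstCharHash(w, m)]++;
        }
        System.out.println(Arrays.toString(byLength));  // prints [0, 0, 0, 0, 9]
        System.out.println(Arrays.toString(byFirst));   // prints [1, 3, 1, 2, 2]
    }
}
```

The first-letter function spreads these words somewhat better only by accident: “edit”, “tide” and “tied” still share slot 1, because 'e' (101) and 't' (116) leave the same remainder mod 5.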

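A better way is to view a string as a number in a base-26 number system (since we have 26 letters), with a = 0, b = 1, and so on. For example, in a base-10 system

$\mathrm{value}(2017) = 7 \cdot 10^0 + 1 \cdot 10^1 + 0 \cdot 10^2 + 2 \cdot 10^3$

Similarly,

$\mathrm{value}(\text{``be''}) = 4 \cdot 26^0 + 1 \cdot 26^1 = 30$

In general, for a word $w = c_0 c_1 \cdots c_{k-1}$ whose letters have values $d(c_i)$, the first letter is the most significant digit: $\mathrm{value}(w) = \sum_{i=0}^{k-1} d(c_i)\, 26^{\,k-1-i}$.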
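A sketch in Java of value(w) and of the anagram-friendly hash h(w) = value(sort(w)) % H.length that the slide defines next. Assumptions: words use lowercase a-z only, and the method names valueMod and h are hypothetical.

```java
import java.util.Arrays;

class AnagramHash {
    // sort(w): the characters of w in ascending order
    static String sort(String w) {
        char[] cs = w.toCharArray();
        Arrays.sort(cs);
        return new String(cs);
    }

    // value(w) % m, reading w as a base-26 number (a = 0, ..., z = 25) with the
    // first letter as the most significant digit. Horner's rule with a reduction
    // mod m at each step gives the same remainder without computing value(w)
    // itself, which can overflow a long once words reach 14 letters.
    // Assumes lowercase a-z and m small enough that 26 * m + 25 fits in an int.
    static int valueMod(String w, int m) {
        int h = 0;
        for (int i = 0; i < w.length(); i++) {
            int digit = w.charAt(i) - 'a';
            h = (h * 26 + digit) % m;
        }
        return h;
    }

    // h(w) = value(sort(w)) % m: anagrams share a sorted form, hence a slot
    static int h(String w, int m) {
        return valueMod(sort(w), m);
    }

    public static void main(String[] args) {
        System.out.println(valueMod("be", 1000));  // value("be") = 30, so prints 30
        for (String w : new String[] {"back", "code", "diet", "flow", "lake"})
            System.out.println(w + " -> " + h(w, 5));
    }
}
```

For these words the loop prints back -> 3, code -> 3, diet -> 4, flow -> 2 and lake -> 0, the same slots used in the worked example. With m = 5 it happens that 26 ≡ 1 (mod 5), so the slot is simply the sum of the letter values mod 5.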

Thus a good hash function would be h(w) = value(w) % H.length. However, for our application, we need all words that are anagrams of one another to hash to the same location, so we will use the hash function

h(w) = value(sort(w)) % H.length

Slide6

EXAMPLE

Consider the following vocabulary:

back, code, coed, diet, edit, flow, lake, tide, tied

We will build a hash table of size 5 (in practice you should use much larger sizes) to store the words, organized as groups of anagrams.

Slide7

Initially:

H: 0: / | 1: / | 2: / | 3: / | 4: /

Slide8

Initially:

H: 0: / | 1: / | 2: / | 3: / | 4: /

Insert “back”.

sort(“back”) = “abck” and h(sort(“back”)) = h(“abck”) = 3. Since “abck” is not in the hash table, we create a new node containing the key “abck” and insert the original word “back” into the list associated with that node.

Slide9

Insert “back”.

sort(“back”) = “abck” and h(sort(“back”)) = h(“abck”) = 3. Since “abck” is not in the hash table, we create a new node containing the key “abck” and insert the original word “back” into the list associated with that node.

H: 0: / | 1: / | 2: / | 3: abck {back} | 4: /

Slide10

H: 0: / | 1: / | 2: / | 3: abck {back} | 4: /

Insert “code”.

sort(“code”) = “cdeo” and h(sort(“code”)) = h(“cdeo”) = 3. Since “cdeo” is not in the hash table, we create a new node containing the key “cdeo” and insert the original word “code” into the list associated with that node.

Slide11

Insert “code”.

sort(“code”) = “cdeo” and h(sort(“code”)) = h(“cdeo”) = 3. Since “cdeo” is not in the hash table, we create a new node containing the key “cdeo” and insert the original word “code” into the list associated with that node.

H: 0: / | 1: / | 2: / | 3: abck {back}, cdeo {code} | 4: /

Slide12

H: 0: / | 1: / | 2: / | 3: abck {back}, cdeo {code} | 4: /

Insert “coed”.

sort(“coed”) = “cdeo” and h(sort(“coed”)) = h(“cdeo”) = 3. The sorted word “cdeo” is already in the hash table, so we insert “coed” into the list associated with that node.

Slide13

Insert “coed”.

sort(“coed”) = “cdeo” and h(sort(“coed”)) = h(“cdeo”) = 3. The sorted word “cdeo” is already in the hash table, so we insert “coed” into the list associated with that node.

H: 0: / | 1: / | 2: / | 3: abck {back}, cdeo {code, coed} | 4: /

Slide14

H: 0: / | 1: / | 2: / | 3: abck {back}, cdeo {code, coed} | 4: /

Insert “diet”.

sort(“diet”) = “deit” and h(sort(“diet”)) = h(“deit”) = 4. Since “deit” is not in the hash table, we create a new node containing the key “deit” and insert the original word “diet” into the list associated with that node.

Slide15

Insert “diet”.

sort(“diet”) = “deit” and h(sort(“diet”)) = h(“deit”) = 4. Since “deit” is not in the hash table, we create a new node containing the key “deit” and insert the original word “diet” into the list associated with that node.

H: 0: / | 1: / | 2: / | 3: abck {back}, cdeo {code, coed} | 4: deit {diet}

Slide16

H: 0: / | 1: / | 2: / | 3: abck {back}, cdeo {code, coed} | 4: deit {diet}

Insert “edit”.

sort(“edit”) = “deit” and h(sort(“edit”)) = h(“deit”) = 4. The sorted word “deit” is already in the hash table, so we insert “edit” into the list associated with that node.

Slide17

Insert “edit”.

sort(“edit”) = “deit” and h(sort(“edit”)) = h(“deit”) = 4. The sorted word “deit” is already in the hash table, so we insert “edit” into the list associated with that node.

H: 0: / | 1: / | 2: / | 3: abck {back}, cdeo {code, coed} | 4: deit {diet, edit}

Slide18

H: 0: / | 1: / | 2: / | 3: abck {back}, cdeo {code, coed} | 4: deit {diet, edit}

Insert “flow”.

sort(“flow”) = “flow” and h(sort(“flow”)) = h(“flow”) = 2. Since “flow” is not in the hash table, we create a new node containing the key “flow” and insert the original word “flow” into the list associated with that node.

Slide19

Insert “flow”.

sort(“flow”) = “flow” and h(sort(“flow”)) = h(“flow”) = 2. Since “flow” is not in the hash table, we create a new node containing the key “flow” and insert the original word “flow” into the list associated with that node.

H: 0: / | 1: / | 2: flow {flow} | 3: abck {back}, cdeo {code, coed} | 4: deit {diet, edit}

Slide20

H: 0: / | 1: / | 2: flow {flow} | 3: abck {back}, cdeo {code, coed} | 4: deit {diet, edit}

Insert “lake”.

sort(“lake”) = “aekl” and h(sort(“lake”)) = h(“aekl”) = 0. Since “aekl” is not in the hash table, we create a new node containing the key “aekl” and insert the original word “lake” into the list associated with that node.

Slide21

Insert “lake”.

sort(“lake”) = “aekl” and h(sort(“lake”)) = h(“aekl”) = 0. Since “aekl” is not in the hash table, we create a new node containing the key “aekl” and insert the original word “lake” into the list associated with that node.

H: 0: aekl {lake} | 1: / | 2: flow {flow} | 3: abck {back}, cdeo {code, coed} | 4: deit {diet, edit}

Slide22

H: 0: aekl {lake} | 1: / | 2: flow {flow} | 3: abck {back}, cdeo {code, coed} | 4: deit {diet, edit}

Insert “tide”.

sort(“tide”) = “deit” and h(sort(“tide”)) = h(“deit”) = 4. The sorted word “deit” is already in the hash table, so we insert “tide” into the list associated with that node.

Slide23

Insert “tide”.

sort(“tide”) = “deit” and h(sort(“tide”)) = h(“deit”) = 4. The sorted word “deit” is already in the hash table, so we insert “tide” into the list associated with that node.

H: 0: aekl {lake} | 1: / | 2: flow {flow} | 3: abck {back}, cdeo {code, coed} | 4: deit {diet, edit, tide}

Slide24

H: 0: aekl {lake} | 1: / | 2: flow {flow} | 3: abck {back}, cdeo {code, coed} | 4: deit {diet, edit, tide}

Insert “tied”.

sort(“tied”) = “deit” and h(sort(“tied”)) = h(“deit”) = 4. The sorted word “deit” is already in the hash table, so we insert “tied” into the list associated with that node.

Slide25

Insert “tied”.

sort(“tied”) = “deit” and h(sort(“tied”)) = h(“deit”) = 4. The sorted word “deit” is already in the hash table, so we insert “tied” into the list associated with that node.

H: 0: aekl {lake} | 1: / | 2: flow {flow} | 3: abck {back}, cdeo {code, coed} | 4: deit {diet, edit, tide, tied}
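The slides trace insertion step by step but do not list the code for it. The following is a minimal Java sketch of that procedure under stated assumptions: it mirrors the slides' node classes as nested classes, uses the value(sort(w)) % H.length hash from earlier with lowercase letters only, places each new key node and each new word at the head of its list, and adds a lookup. The names insert, find and anagramsOf are hypothetical, not from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AnagramTableChaining {
    // Node layout follows the slides' StringNode and SortedStringNode classes
    static class StringNode {
        String word;
        StringNode next;
        StringNode(String s, StringNode n) { word = s; next = n; }
    }

    static class SortedStringNode {
        String sortedString;      // key: a word with its characters sorted
        SortedStringNode next;    // next key in the same slot (chaining)
        StringNode anagrams;      // all words w with sort(w) == sortedString
        SortedStringNode(String s, SortedStringNode n) { sortedString = s; next = n; anagrams = null; }
    }

    private final SortedStringNode[] H;

    AnagramTableChaining(int n) { H = new SortedStringNode[n]; }  // Java sets every slot to null

    static String sort(String w) {
        char[] cs = w.toCharArray();
        Arrays.sort(cs);
        return new String(cs);
    }

    // h(S) = value(S) % H.length with a = 0, ..., z = 25 (assumes lowercase a-z)
    private int h(String s) {
        int v = 0;
        for (int i = 0; i < s.length(); i++)
            v = (v * 26 + (s.charAt(i) - 'a')) % H.length;
        return v;
    }

    // Walk the chain in slot h(key); return the node holding key, or null if absent
    private SortedStringNode find(String key) {
        for (SortedStringNode t = H[h(key)]; t != null; t = t.next)
            if (t.sortedString.equals(key)) return t;
        return null;
    }

    void insert(String word) {
        String key = sort(word);
        SortedStringNode node = find(key);
        if (node == null) {                              // key not in table: add a node to its chain
            int slot = h(key);
            node = new SortedStringNode(key, H[slot]);   // new node goes at the head
            H[slot] = node;
        }
        node.anagrams = new StringNode(word, node.anagrams);  // prepend word to the list
    }

    // All stored anagrams of w, in list order (most recently inserted first)
    List<String> anagramsOf(String w) {
        List<String> out = new ArrayList<>();
        SortedStringNode node = find(sort(w));
        if (node != null)
            for (StringNode t = node.anagrams; t != null; t = t.next)
                out.add(t.word);
        return out;
    }

    public static void main(String[] args) {
        AnagramTableChaining table = new AnagramTableChaining(5);
        for (String w : new String[] {"back", "code", "coed", "diet", "edit",
                                      "flow", "lake", "tide", "tied"})
            table.insert(w);
        System.out.println(table.anagramsOf("edit"));  // prints [tied, tide, edit, diet]
        System.out.println(table.anagramsOf("kale"));  // prints [lake]
    }
}
```

With head insertion, the words in an anagram list come out in reverse insertion order; keeping insertion order would instead need a tail pointer or a walk to the end of the list.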

Slide26

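// Each public class goes in its own .java file.
public class hashTableChaining {
    private SortedStringNode[] H;   // bucket array; each bucket is a chain of SortedStringNodes

    public hashTableChaining(int n) {
        H = new SortedStringNode[n];
        for (int i = 0; i < n; i++)
            H[i] = null;            // every bucket starts empty
    }
}

public class SortedStringNode {
    public String SortedString;     // key: a word with its characters sorted
    public SortedStringNode next;   // next key in the same bucket (chaining)
    public StringNode anagrams;     // list of words w with sort(w) == SortedString

    public SortedStringNode(String S, SortedStringNode n) {
        SortedString = S;
        next = n;
        anagrams = null;
    }
}

public class StringNode {
    public String word;             // one word of the vocabulary
    public StringNode next;         // next word in the same anagram list

    public StringNode(String S, StringNode n) {
        word = S;
        next = n;
    }
}

CLASS DEFINITIONS

Slide27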

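// Each public class goes in its own .java file.
public class hashTableChaining {
    private SortedStringNode[] H;   // each slot holds at most one SortedStringNode

    public hashTableChaining(int n) {
        H = new SortedStringNode[n];
        for (int i = 0; i < n; i++)
            H[i] = null;            // every slot starts empty
    }
}

public class SortedStringNode {
    public String SortedString;     // key: a word with its characters sorted
    public StringNode anagrams;     // list of words w with sort(w) == SortedString
    // no next field: with linear probing, colliding keys go to other slots

    public SortedStringNode(String S) {
        SortedString = S;
        anagrams = null;
    }
}

public class StringNode {
    public String word;             // one word of the vocabulary
    public StringNode next;         // next word in the same anagram list

    public StringNode(String S, StringNode n) {
        word = S;
        next = n;
    }
}

WHAT IF WE USE LINEAR PROBING?

CLASS DEFINITIONS

Slide28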

H: 0: / | 1: / | 2: / | 3: abck {back} | 4: /

Insert “code” into the table.

h(“cdeo”) = 3, but slot 3 is already taken.

Easy: insert “cdeo” into the next available slot.

Slide29

Insert “code” into the table.

h(“cdeo”) = 3, but slot 3 is already taken.

Easy: insert “cdeo” into the next available slot.

H: 0: / | 1: / | 2: / | 3: abck {back} | 4: cdeo {code}

Slide30

Insert “coed” into the table.

h(“cdeo”) = 3; slot 3 is already taken by “abck”, but we find “cdeo” in the next slot. We now insert “coed” into the appropriate list as before.

H: 0: / | 1: / | 2: / | 3: abck {back} | 4: cdeo {code, coed}
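A minimal Java sketch of the linear-probing version, assuming the revised node class without a next link, the same value(sort(w)) % H.length hash with lowercase letters only, and a table that is never allowed to fill up (resizing and deletion are not handled). The method names probe, insert, anagramsOf and slotOf are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AnagramTableProbing {
    static class StringNode {
        String word;
        StringNode next;
        StringNode(String s, StringNode n) { word = s; next = n; }
    }

    // With linear probing each slot holds at most one key, so there is no next link
    static class SortedStringNode {
        String sortedString;
        StringNode anagrams;
        SortedStringNode(String s) { sortedString = s; anagrams = null; }
    }

    private final SortedStringNode[] H;

    AnagramTableProbing(int n) { H = new SortedStringNode[n]; }

    static String sort(String w) {
        char[] cs = w.toCharArray();
        Arrays.sort(cs);
        return new String(cs);
    }

    // h(S) = value(S) % H.length with a = 0, ..., z = 25 (assumes lowercase a-z)
    private int h(String s) {
        int v = 0;
        for (int i = 0; i < s.length(); i++)
            v = (v * 26 + (s.charAt(i) - 'a')) % H.length;
        return v;
    }

    // Start at h(key) and step to the next slot (wrapping around) until reaching
    // the slot holding key or an empty slot. Assumes the table is never full;
    // otherwise a search for an absent key would loop forever.
    private int probe(String key) {
        int i = h(key);
        while (H[i] != null && !H[i].sortedString.equals(key))
            i = (i + 1) % H.length;
        return i;
    }

    void insert(String word) {
        String key = sort(word);
        int i = probe(key);
        if (H[i] == null)                  // key absent: claim the empty slot
            H[i] = new SortedStringNode(key);
        H[i].anagrams = new StringNode(word, H[i].anagrams);
    }

    List<String> anagramsOf(String w) {
        List<String> out = new ArrayList<>();
        SortedStringNode node = H[probe(sort(w))];
        if (node != null)
            for (StringNode t = node.anagrams; t != null; t = t.next)
                out.add(t.word);
        return out;
    }

    // Slot index holding key, or -1 if key is not stored
    int slotOf(String key) {
        int i = probe(key);
        return H[i] == null ? -1 : i;
    }

    public static void main(String[] args) {
        AnagramTableProbing table = new AnagramTableProbing(5);
        table.insert("back");
        table.insert("code");
        table.insert("coed");
        System.out.println(table.slotOf("abck") + " " + table.slotOf("cdeo"));  // prints 3 4
        System.out.println(table.anagramsOf("code"));                          // prints [coed, code]
    }
}
```

Running it reproduces the placement traced above: “abck” stays in slot 3, and “cdeo”, which also hashes to 3, lands in slot 4.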