Slide1
Anagrams and Hash Tables
Olac Fuentes
University of Texas at El Paso
Slide2
Can we find anagrams in constant time?
Yes, with time O(v) to preprocess the words, where v is the size of the vocabulary.
We will store the words of the English dictionary in a way that allows us to find all the anagrams of any given word in constant time.
Let sort(w), where w is a string, be the string that contains the same characters as w (including repetitions) sorted in ascending order.
So sort(“java”) = “aajv”.
Observation: words w and x are anagrams of each other iff sort(w) == sort(x).
For example, sort(“aunt”) = “anut” and sort(“tuna”) = “anut”, so “aunt” and “tuna” are anagrams.
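The observation above can be sketched in Java as follows. This is a minimal illustration, not code from the slides: the class name AnagramKey and the method areAnagrams are made up for the example, and strings are compared with equals() because == on Java strings compares references rather than contents.

```java
import java.util.Arrays;

final class AnagramKey {
    // Returns the characters of w (including repetitions) in ascending order.
    static String sort(String w) {
        char[] chars = w.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // w and x are anagrams iff their sorted forms are equal.
    static boolean areAnagrams(String w, String x) {
        return sort(w).equals(sort(x));
    }

    public static void main(String[] args) {
        System.out.println(sort("java"));                // aajv
        System.out.println(areAnagrams("aunt", "tuna")); // true
        System.out.println(areAnagrams("aunt", "back")); // false
    }
}
```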
We will use a hash table, using the words with sorted characters as keys. The node that contains the key S also contains a reference to a list of words {W1, ..., Wn} such that sort(W1) = sort(W2) = ... = sort(Wn) = S; thus all words in the list are anagrams of one another.
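Before building the table by hand, the key-to-list idea can be tried out with Java's standard HashMap. This is only a sketch under that substitution: java.util.HashMap stands in for the hand-built table, and the names AnagramGroups and group are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AnagramGroups {
    static String sort(String w) {
        char[] chars = w.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Maps each sorted key S to the list of words W with sort(W) = S.
    static Map<String, List<String>> group(List<String> words) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String w : words) {
            groups.computeIfAbsent(sort(w), k -> new ArrayList<>()).add(w);
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = group(Arrays.asList("aunt", "tuna", "java"));
        // All anagrams of a word are found with one lookup on its sorted key.
        System.out.println(groups.get(sort("tuna"))); // [aunt, tuna]
    }
}
```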
Let’s see how this works with an example.
Slide3
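[Figure: the finished hash table H, with 5 slots and chaining. Slot 0: “aekl” → lake. Slot 1: empty. Slot 2: “flow” → flow. Slot 3: “abck” → back; “cdeo” → code, coed. Slot 4: “deit” → diet, edit, tide, tied.]
Slide4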
CLASS DEFINITIONS
public class hashTableChaining {
    private SortedStringNode[] H;

    public hashTableChaining(int n) {
        H = new SortedStringNode[n];
        for (int i = 0; i < n; i++)
            H[i] = null;
    }
}

public class SortedStringNode {
    public String SortedString;
    public SortedStringNode next;
    public StringNode anagrams;

    public SortedStringNode(String S, SortedStringNode n) {
        SortedString = S;
        next = n;
        anagrams = null;
    }
}

public class StringNode {
    public String word;
    public StringNode next;

    public StringNode(String S, StringNode n) {
        word = S;
        next = n;
    }
}
Slide5
First we need to define a hash function that distributes the words more or less uniformly over the table.
Here are examples of bad hash functions:
h(S) = S.length() % H.length // All words of the same length hash to the same location
h(S) = ((int) S.charAt(0)) % H.length // All words that start with the same letter hash to the same location
A better way is to view a string as a number in a base-26 number system (since we have 26 letters), with a=0, b=1, and so on.
For example, in a base-10 system:
value(2017) = 7*10^0 + 1*10^1 + 0*10^2 + 2*10^3
Similarly,
value(“be”) = 4*26^0 + 1*26^1 = 30
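The base-26 value can be computed left to right with Horner's rule. The sketch below is one possible way to write it, assuming the word contains only the lowercase letters a to z; the class name Base26 is made up for the example.

```java
final class Base26 {
    // Reads w as a base-26 number with a=0, b=1, ..., z=25.
    // The last character is the least significant digit, so
    // value("be") = 1*26^1 + 4*26^0.
    // Assumes w contains only 'a'..'z'. Words longer than 13 letters
    // can overflow a long.
    static long value(String w) {
        long v = 0;
        for (int i = 0; i < w.length(); i++) {
            v = v * 26 + (w.charAt(i) - 'a'); // Horner's rule
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println(value("be"));   // 30
        System.out.println(value("abck")); // 738
    }
}
```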
Thus a good hash function would be:
h(w) = value(w) % H.length
However, for our application we need all words that are anagrams of one another to hash to the same location, so we will use the hash function
h(w) = value(sort(w)) % H.length
Slide6
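The hash function h(w) = value(sort(w)) % H.length defined above could be sketched as follows. Two things here are assumptions rather than part of the slides: the table size is passed in as a parameter, and the remainder is taken at every step of Horner's rule so that long words cannot overflow. The result is the same as computing value first and reducing once. AnagramHash is a hypothetical name.

```java
import java.util.Arrays;

final class AnagramHash {
    static String sort(String w) {
        char[] chars = w.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // h(w) = value(sort(w)) % tableSize, for words made of 'a'..'z'.
    // Reducing modulo tableSize at each step keeps v small.
    static int h(String w, int tableSize) {
        String key = sort(w);
        long v = 0;
        for (int i = 0; i < key.length(); i++) {
            v = (v * 26 + (key.charAt(i) - 'a')) % tableSize;
        }
        return (int) v;
    }

    public static void main(String[] args) {
        // Anagrams share a sorted key, so they share a slot.
        System.out.println(h("code", 5)); // 3
        System.out.println(h("coed", 5)); // 3
        System.out.println(h("lake", 5)); // 0
    }
}
```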
EXAMPLE
Consider the following vocabulary:
back
code
coed
diet
edit
flow
lake
tide
tied
We will build a hash table of size 5 (in practice you should use much larger sizes) to store the words, organized as groups of anagrams.
Slide7
Initially:
[Figure: H with slots 0 through 4, all empty.]
Slide8
Insert “back”.
sort(“back”) = “abck” and h(sort(“back”)) = h(“abck”) = 3.
Since “abck” is not in the hash table, we create a new node containing the key “abck” and insert the original word “back” into the list associated with that node.
[Figure: H before the insertion. All five slots are empty.]
Slide9
[Figure: H after inserting “back”. Slot 3: “abck” → back. Slots 0, 1, 2 and 4: empty.]
Slide10
Insert “code”.
sort(“code”) = “cdeo” and h(sort(“code”)) = h(“cdeo”) = 3.
Since “cdeo” is not in the hash table, we create a new node containing the key “cdeo” and insert the original word “code” into the list associated with that node.
[Figure: H before the insertion. Slot 3: “abck” → back. Slots 0, 1, 2 and 4: empty.]
Slide11
[Figure: H after inserting “code”. Slot 3: “abck” → back; “cdeo” → code. Slots 0, 1, 2 and 4: empty.]
Slide12
Insert “coed”.
sort(“coed”) = “cdeo” and h(sort(“coed”)) = h(“cdeo”) = 3.
The sorted word “cdeo” is already in the hash table, so we insert “coed” into the list associated with that node.
[Figure: H before the insertion. Slot 3: “abck” → back; “cdeo” → code. Slots 0, 1, 2 and 4: empty.]
Slide13
[Figure: H after inserting “coed”. Slot 3: “abck” → back; “cdeo” → code, coed. Slots 0, 1, 2 and 4: empty.]
Slide14
Insert “diet”.
sort(“diet”) = “deit” and h(sort(“diet”)) = h(“deit”) = 4.
Since “deit” is not in the hash table, we create a new node containing the key “deit” and insert the original word “diet” into the list associated with that node.
[Figure: H before the insertion. Slot 3: “abck” → back; “cdeo” → code, coed. Slots 0, 1, 2 and 4: empty.]
Slide15
[Figure: H after inserting “diet”. Slot 3: “abck” → back; “cdeo” → code, coed. Slot 4: “deit” → diet. Slots 0, 1 and 2: empty.]
Slide16
Insert “edit”.
sort(“edit”) = “deit” and h(sort(“edit”)) = h(“deit”) = 4.
The sorted word “deit” is already in the hash table, so we insert “edit” into the list associated with that node.
[Figure: H before the insertion. Slot 3: “abck” → back; “cdeo” → code, coed. Slot 4: “deit” → diet. Slots 0, 1 and 2: empty.]
Slide17
[Figure: H after inserting “edit”. Slot 3: “abck” → back; “cdeo” → code, coed. Slot 4: “deit” → diet, edit. Slots 0, 1 and 2: empty.]
Slide18
Insert “flow”.
sort(“flow”) = “flow” and h(sort(“flow”)) = h(“flow”) = 2.
Since “flow” is not in the hash table, we create a new node containing the key “flow” and insert the original word “flow” into the list associated with that node.
[Figure: H before the insertion. Slot 3: “abck” → back; “cdeo” → code, coed. Slot 4: “deit” → diet, edit. Slots 0, 1 and 2: empty.]
Slide19
[Figure: H after inserting “flow”. Slot 2: “flow” → flow. Slot 3: “abck” → back; “cdeo” → code, coed. Slot 4: “deit” → diet, edit. Slots 0 and 1: empty.]
Slide20
Insert “lake”.
sort(“lake”) = “aekl” and h(sort(“lake”)) = h(“aekl”) = 0.
Since “aekl” is not in the hash table, we create a new node containing the key “aekl” and insert the original word “lake” into the list associated with that node.
[Figure: H before the insertion. Slot 2: “flow” → flow. Slot 3: “abck” → back; “cdeo” → code, coed. Slot 4: “deit” → diet, edit. Slots 0 and 1: empty.]
Slide21
[Figure: H after inserting “lake”. Slot 0: “aekl” → lake. Slot 1: empty. Slot 2: “flow” → flow. Slot 3: “abck” → back; “cdeo” → code, coed. Slot 4: “deit” → diet, edit.]
Slide22
Insert “tide”.
sort(“tide”) = “deit” and h(sort(“tide”)) = h(“deit”) = 4.
The sorted word “deit” is already in the hash table, so we insert “tide” into the list associated with that node.
[Figure: H before the insertion. Slot 0: “aekl” → lake. Slot 1: empty. Slot 2: “flow” → flow. Slot 3: “abck” → back; “cdeo” → code, coed. Slot 4: “deit” → diet, edit.]
Slide23
[Figure: H after inserting “tide”. Slot 0: “aekl” → lake. Slot 1: empty. Slot 2: “flow” → flow. Slot 3: “abck” → back; “cdeo” → code, coed. Slot 4: “deit” → diet, edit, tide.]
Slide24
Insert “tied”.
sort(“tied”) = “deit” and h(sort(“tied”)) = h(“deit”) = 4.
The sorted word “deit” is already in the hash table, so we insert “tied” into the list associated with that node.
[Figure: H before the insertion. Slot 0: “aekl” → lake. Slot 1: empty. Slot 2: “flow” → flow. Slot 3: “abck” → back; “cdeo” → code, coed. Slot 4: “deit” → diet, edit, tide.]
Slide25
[Figure: H after inserting “tied”. Slot 0: “aekl” → lake. Slot 1: empty. Slot 2: “flow” → flow. Slot 3: “abck” → back; “cdeo” → code, coed. Slot 4: “deit” → diet, edit, tide, tied.]
Slide26
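The insertion steps walked through above, together with the lookup they make possible, might be coded roughly as follows. This is a sketch, not the author's implementation: the class AnagramTable and the methods insert, findNode and anagramsOf are hypothetical names, the node classes only mirror the shape of SortedStringNode and StringNode, and the choices to put a new key at the head of its slot and to append words at the end of their list are assumptions. Words are assumed to use only the letters a to z, and duplicates are not checked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AnagramTable {
    // One word in a list of anagrams (same shape as StringNode).
    static final class StringNode {
        final String word;
        StringNode next;
        StringNode(String word, StringNode next) { this.word = word; this.next = next; }
    }

    // A sorted key, the next key in the same slot, and the words sharing the key.
    static final class SortedStringNode {
        final String sortedString;
        final SortedStringNode next;
        StringNode anagrams;
        SortedStringNode(String s, SortedStringNode n) { sortedString = s; next = n; }
    }

    private final SortedStringNode[] H;

    AnagramTable(int n) { H = new SortedStringNode[n]; } // slots start out null

    static String sort(String w) {
        char[] c = w.toCharArray();
        Arrays.sort(c);
        return new String(c);
    }

    // value(key) % H.length, reduced at every step to avoid overflow.
    private int h(String key) {
        long v = 0;
        for (int i = 0; i < key.length(); i++) {
            v = (v * 26 + (key.charAt(i) - 'a')) % H.length;
        }
        return (int) v;
    }

    // Walks the chain in the key's slot; returns the key's node or null.
    private SortedStringNode findNode(String key) {
        for (SortedStringNode p = H[h(key)]; p != null; p = p.next) {
            if (p.sortedString.equals(key)) return p;
        }
        return null;
    }

    void insert(String word) {
        String key = sort(word);
        SortedStringNode node = findNode(key);
        if (node == null) {               // key not in the table: create its node
            int i = h(key);
            node = new SortedStringNode(key, H[i]);
            H[i] = node;
        }
        StringNode added = new StringNode(word, null);
        if (node.anagrams == null) {
            node.anagrams = added;
        } else {                          // append so the list keeps insertion order
            StringNode last = node.anagrams;
            while (last.next != null) last = last.next;
            last.next = added;
        }
    }

    // Returns every stored word with the same sorted key as the given word.
    List<String> anagramsOf(String word) {
        List<String> result = new ArrayList<>();
        SortedStringNode node = findNode(sort(word));
        if (node != null) {
            for (StringNode q = node.anagrams; q != null; q = q.next) result.add(q.word);
        }
        return result;
    }

    public static void main(String[] args) {
        AnagramTable t = new AnagramTable(5);
        String[] vocabulary = {"back", "code", "coed", "diet", "edit", "flow", "lake", "tide", "tied"};
        for (String w : vocabulary) t.insert(w);
        System.out.println(t.anagramsOf("tide")); // [diet, edit, tide, tied]
        System.out.println(t.anagramsOf("java")); // []
    }
}
```

With a table much larger than 5 and a hash function that spreads the keys well, each chain stays short, which is what keeps the lookup close to constant time.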
CLASS DEFINITIONS
[The class definitions from Slide4 are shown again here, unchanged.]
Slide27
WHAT IF WE USE LINEAR PROBING?
CLASS DEFINITIONS
public class hashTableChaining {
    private SortedStringNode[] H;

    public hashTableChaining(int n) {
        H = new SortedStringNode[n];
        for (int i = 0; i < n; i++)
            H[i] = null;
    }
}

public class SortedStringNode {
    public String SortedString;
    public StringNode anagrams;

    // n is unused here: with linear probing a key node has no next pointer.
    public SortedStringNode(String S, SortedStringNode n) {
        SortedString = S;
        anagrams = null;
    }
}

public class StringNode {
    public String word;
    public StringNode next;

    public StringNode(String S, StringNode n) {
        word = S;
        next = n;
    }
}
Slide28
Insert “code” into the table.
h(“cdeo”) = 3, but slot 3 is already taken.
Easy: insert “cdeo” into the next available slot.
[Figure: H before the insertion. Slot 3: “abck” → back. Slots 0, 1, 2 and 4: empty.]
Slide29
[Figure: H after inserting “code”. Slot 3: “abck” → back. Slot 4: “cdeo” → code. Slots 0, 1 and 2: empty.]
Slide30
Insert “coed” into the table.
h(“cdeo”) = 3 and slot 3 is already taken, but we find “cdeo” in the next slot.
We now insert “coed” into the appropriate list as before.
[Figure: H after inserting “coed”. Slot 3: “abck” → back. Slot 4: “cdeo” → code, coed. Slots 0, 1 and 2: empty.]
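The linear probing variant could be sketched like this. As before, this is an illustration under assumptions, not the author's code: ProbingAnagramTable, probe, slotOf and anagramsOf are hypothetical names, the probe wraps around from the last slot to slot 0, nothing is ever deleted, and inserting a new key into a full table raises an exception.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ProbingAnagramTable {
    static final class StringNode {
        final String word;
        StringNode next;
        StringNode(String word, StringNode next) { this.word = word; this.next = next; }
    }

    // With linear probing each slot holds at most one key, so there is no next field.
    static final class SortedStringNode {
        final String sortedString;
        StringNode anagrams;
        SortedStringNode(String s) { sortedString = s; }
    }

    private final SortedStringNode[] H;

    ProbingAnagramTable(int n) { H = new SortedStringNode[n]; }

    static String sort(String w) {
        char[] c = w.toCharArray();
        Arrays.sort(c);
        return new String(c);
    }

    private int h(String key) {
        long v = 0;
        for (int i = 0; i < key.length(); i++) {
            v = (v * 26 + (key.charAt(i) - 'a')) % H.length;
        }
        return (int) v;
    }

    // Scans from h(key), wrapping around, and stops at the slot that already
    // holds key or at the first empty slot. Returns -1 if there is neither.
    private int probe(String key) {
        int start = h(key);
        for (int k = 0; k < H.length; k++) {
            int i = (start + k) % H.length;
            if (H[i] == null || H[i].sortedString.equals(key)) return i;
        }
        return -1;
    }

    void insert(String word) {
        String key = sort(word);
        int i = probe(key);
        if (i < 0) throw new IllegalStateException("hash table is full");
        if (H[i] == null) H[i] = new SortedStringNode(key);
        StringNode added = new StringNode(word, null);
        if (H[i].anagrams == null) {
            H[i].anagrams = added;
        } else {
            StringNode last = H[i].anagrams;
            while (last.next != null) last = last.next;
            last.next = added;
        }
    }

    // Slot that holds the word's sorted key, or -1 if the key is not stored.
    int slotOf(String word) {
        int i = probe(sort(word));
        return (i >= 0 && H[i] != null) ? i : -1;
    }

    List<String> anagramsOf(String word) {
        List<String> result = new ArrayList<>();
        int i = slotOf(word);
        if (i >= 0) {
            for (StringNode q = H[i].anagrams; q != null; q = q.next) result.add(q.word);
        }
        return result;
    }

    public static void main(String[] args) {
        ProbingAnagramTable t = new ProbingAnagramTable(5);
        t.insert("back");
        t.insert("code");
        t.insert("coed");
        System.out.println(t.slotOf("back"));     // 3
        System.out.println(t.slotOf("coed"));     // 4
        System.out.println(t.anagramsOf("code")); // [code, coed]
    }
}
```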