Slide 1

Integer Sorting on the word-RAM
Uri Zwick
Tel Aviv University
Started: May 2015
Last update: December 21, 2016
Slide 2

Integer sorting

Memory is composed of w-bit words, where w >= log n.
Arithmetical, logical and shift operations on w-bit words take O(1) time.
How fast can we sort an array of length n?
Slide 3

Comparison based algorithms: O(n log n) time.
Time bounds dependent on w:
  O(n log w) - van Emde Boas trees.
Time bounds independent of w:
  O(n log n / log log n) - [Fredman-Willard (1993)]
Some of these algorithms are randomized and some use multiplications:
  O(n log log n) - [Andersson et al. (1998)]
  O(n sqrt(log log n)) - [Han-Thorup (2002)]
Slide 4

Fundamental open problem

Can we sort in O(n) time, for any w ???
Slide 5

Variants of Sorting

Each item in the array to be sorted is a w-bit word:
a k-bit key and (w - k) info bits.

Sort an array of n w-bit words according to their k-bit keys.

First choice: (stably) sort the array, or return a permutation that (stably) sorts the array.
Second choice: the info bits are important, or the info bits may be destroyed.

The variants are related using simple O(n)-time and O(n)-space deterministic reductions.
Slide 6

Variants of Sorting

Sort an array of n w-bit words according to their k-bit keys,
using simple O(n)-time and O(n)-space deterministic reductions:
- By adding a (log n)-bit index to each key.
  What if k + log n > w? Use radix-sort or "double-precision".
- Using hashing.
Thus, all variants are essentially equivalent.
We shall use sort(n, k) to refer to all of them.
Slide 7

(Adapted from Cormen, Leiserson, Rivest and Stein, Introduction to Algorithms, Third Edition, 2009, p. 195)
Slide 8

Backward/LSD Radix sort
(Slides from an undergrad course.)

Stably sort according to "digits", starting from the least significant digit.
To sort according to a "digit", use bucket or count sort.
After the i-th pass, the numbers are sorted according to the i least significant digits.

[Slides 8-14: a worked example of LSD radix sort on an array of 4-digit numbers, one pass per slide.]
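As a concrete illustration (not part of the original slides), LSD radix sort with a stable bucket sort on each digit can be sketched as follows:

```python
def lsd_radix_sort(a, base=10):
    """Backward/LSD radix sort of non-negative integers.
    Each pass stably sorts by one digit, least significant first."""
    a = list(a)
    if not a:
        return a
    exp = 1
    while max(a) // exp > 0:
        # One pass: stable bucket sort on the digit (x // exp) % base.
        buckets = [[] for _ in range(base)]
        for x in a:
            buckets[(x // exp) % base].append(x)
        a = [x for b in buckets for x in b]
        exp *= base
    return a

print(lsd_radix_sort([2871, 4591, 6572, 1301, 7022, 8394, 3555]))
# [1301, 2871, 3555, 4591, 6572, 7022, 8394]
```

Stability of each pass is what makes the earlier (less significant) passes survive the later ones.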
Slide 15

Backward/LSD Radix Sort in the word-RAM model

Sort according to the b least significant bits.
Then stably sort according to the remaining k - b bits.
The b least significant bits can be sorted in O(n + 2^b) time using bucket sort or count sort.
Total running time is O((n + 2^b) * (k/b)).
Running time is O(n k / log n) if b = log n.
Can we do better?
Slide 16

Two techniques

Range reduction: reduce sorting k-bit keys to sorting (k/2)-bit keys using only O(n) extra work.

Packed sorting (word-level parallelism): pack many keys in each word. In O(1) time we can then perform simple operations on all the keys packed in a word.
Slide 17

Four results

We will cover the following results:
- A randomized O(n log log n)-time sorting algorithm [Andersson-Hagerup-Nilsson-Raman (1998)]
- A randomized O(n log(k / log n))-time sorting algorithm [Kirkpatrick-Reisch (1984)]
- Sorting in O(n) expected time if w >= log^(2+eps) n [Andersson-Hagerup-Nilsson-Raman (1998)]
- Sorting strings with w-bit characters in time proportional to the number of distinguishing characters [Andersson-Nilsson (1994)]

Slide 18
Reminder: Bucket Sort

Each item in the array to be sorted is a w-bit word: a k-bit key and (w - k) info bits.

Initialize an array B of 2^k lists/buckets. Each list is initially empty.
Sequentially scan the items in the input array A, appending each item to the bucket indexed by its key.
Sequentially scan the buckets and copy the items back to A.

Time and space: O(n + 2^k).
(The 2^k term is the time and space for initializing and scanning the buckets.)
Slide 19

Bucket Sort using hashing

Each item in the array to be sorted is a w-bit word: a k-bit key and (w - k) info bits.

Initialize B to be an empty hash table and L to be an empty list.
Sequentially scan the items in the input array A.
If the key of an item is not in B, add it to B, initialize its bucket to be an empty list, and append the key to L.
Append the item to its bucket.
Sort L to get a sorted list of non-empty buckets.
Scan the non-empty buckets, in sorted order, and copy the items back to A.

Space: O(n). Time: O(n), plus the time needed to sort L,
where n' = |L| is the number of non-empty buckets.
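A sketch of the hashed variant in Python (the dictionary plays the role of the hash table B, and `order` the role of the list L; the final sort of the bucket keys is delegated to a parameter):

```python
def hashed_bucket_sort(items, sort_keys=sorted):
    """Bucket sort using a hash table instead of an array of 2^k buckets.
    items: (key, info) pairs.  Space O(n); time O(n) plus the cost of
    sorting the n' distinct keys, delegated here to sort_keys."""
    buckets = {}      # hash table B: key -> its bucket (a list of items)
    order = []        # list L of keys, in order of first appearance
    for key, info in items:
        if key not in buckets:
            buckets[key] = []
            order.append(key)
        buckets[key].append((key, info))   # stable within a bucket
    result = []
    for key in sort_keys(order):           # sort only the non-empty buckets
        result.extend(buckets[key])
    return result

print(hashed_bucket_sort([(3, 'a'), (1, 'b'), (3, 'c'), (2, 'd')]))
# [(1, 'b'), (2, 'd'), (3, 'a'), (3, 'c')]
```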
Slide 20

Range reduction [Kirkpatrick-Reisch (1984)]

Each item: a k-bit key and info bits.
Split each k-bit key into two (k/2)-bit parts, high and low.
Sort (recursively) according to the high part.
Sort (recursively) according to the low part.
Cleverly combine the two sorting steps into one.
Slide 21

Range reduction [Kirkpatrick-Reisch (1984)]

First attempt: low part first.
Split each k-bit key into two (k/2)-bit parts, high and low.
Sort (recursively) according to the low parts.
Use hashed bucket sort to stably sort according to the high parts.
This involves another recursive call on n' keys, where n' is the number of distinct high parts = the number of non-empty buckets.
In the worst case, if all high parts are different, n' = n.
But if all, or most, high parts are different, we do not really need to sort according to the low parts.
(Same as standard radix sort.)
Slide 22

Range reduction [Kirkpatrick-Reisch (1984)]

Second attempt: high part first.
Split each k-bit key into two (k/2)-bit parts, high and low.
Throw the items into (hashed) buckets indexed by the high parts.
Maintain a list of non-empty buckets.
Let n' be the number of non-empty buckets, and let n_i be the number of items in the i-th non-empty bucket.
Sort the list of non-empty buckets.
Sort the items in each non-empty bucket according to the low parts.
What is the total cost of all these sorts?
Slide 23

Range reduction [Kirkpatrick-Reisch (1984)]

Third (and final) attempt: high and low together.
Split each k-bit key into two (k/2)-bit parts, high and low.
Add all non-minimal items to a single list L.
Maintain (hashed) buckets indexed by the high parts.
In a bucket, keep only the item with the appropriate high value and the smallest low part seen so far.
When the first item with a given high value is encountered, add its high part to the list L.
Sort the list L recursively according to the (k/2)-bit values. From the sorted L, extract a sorted list of non-empty buckets.
(When a high part in L is encountered, check that its bucket is non-empty, and add the bucket to the list, if it was not added before.)
Throw all the other items of L into the appropriate buckets.
Concatenate the lists in the non-empty buckets.
Slide 24

Range reduction [Kirkpatrick-Reisch (1984)]

Third (and final) attempt: high and low together (cont.)

The algorithm sorts correctly. The length of the list L is at most n.
Same complexity may be obtained using van Emde Boas trees.
We can then pack several keys in each word!
In O(n) expected time, we can reduce sort(n, k) to sort(n, k/2).
It can work with info fields (see pseudo-code).
Thus, sorting k-bit keys reduces to sorting (k/2)-bit keys.
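A hedged Python sketch of one Kirkpatrick-Reisch reduction step (names and the tiny base case are mine, not the slides'): sorting n k-bit keys is reduced to one recursive sort of at most n (k/2)-bit values — every distinct high part plus every non-minimal low part.

```python
def kr_sort(items, k):
    """Sort (key, info) pairs by their k-bit keys, Kirkpatrick-Reisch style."""
    if len(items) <= 1 or k <= 4:
        return sorted(items, key=lambda t: t[0])      # tiny base case
    half = (k + 1) // 2
    mask = (1 << half) - 1
    min_item = {}   # high -> (low, info) of the minimal item in its bucket
    rest = []       # non-minimal items, as (high, low, info)
    for key, info in items:
        hi, lo = key >> half, key & mask
        if hi not in min_item:
            min_item[hi] = (lo, info)
        elif lo < min_item[hi][0]:
            old_lo, old_info = min_item[hi]
            rest.append((hi, old_lo, old_info))
            min_item[hi] = (lo, info)
        else:
            rest.append((hi, lo, info))
    # One recursive sort of (k/2)-bit values: distinct highs + non-minimal lows.
    sub = [(hi, ('HIGH', hi)) for hi in min_item]
    sub += [(lo, ('LOW', hi, info)) for hi, lo, info in rest]
    sub_sorted = kr_sort(sub, half)
    bucket_order, lows = [], {}
    for val, tag in sub_sorted:
        if tag[0] == 'HIGH':
            bucket_order.append(tag[1])               # highs arrive in order
        else:
            lows.setdefault(tag[1], []).append((val, tag[2]))
    result = []
    for hi in bucket_order:                           # concatenate the buckets
        lo, info = min_item[hi]
        result.append(((hi << half) | lo, info))
        for lo, info in lows.get(hi, []):             # already in sorted order
            result.append(((hi << half) | lo, info))
    return result
```

One call halves the key length, so O(log(k / log n)) levels suffice; the real algorithm stops recursing once the keys have about log n bits.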
Slide 25

[Slides 25-26: pseudo-code for the Kirkpatrick-Reisch bucketing step: throw the items into hashed buckets, generate the list L, sort L, extract a sorted list of the non-empty buckets, and concatenate them. Slide 26 shows a corrected version of the code.]
Slide 27

Packed representation

A word holds several b-bit keys, each in its own field, with a test bit (initially 0) above each key.
We can easily take two such words and produce a single word containing all of their b-bit keys.
Slide 28

Packed representation

Useful constants:
  a word with a 1 in the test-bit position of every field: 1 00...0 1 00...0 ... 1 00...0
  a word with every key field full of 1s and every test bit 0: 0 11...1 0 11...1 ... 0 11...1

Exercise: How quickly can these constants be computed?
Slide 29

Packed representation

Useful operation: comparing all the packed keys in two words at once.
Set the test bits of the first word to 1, keep the test bits of the second word at 0, and subtract.
In the result, the test bit of a field is 1 exactly when the corresponding key of the first word is at least the corresponding key of the second word.
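The subtraction trick in Python (a sketch; `T` is the test-bit mask, and both operands are assumed to have clear test bits):

```python
def pack(fields, b):
    """Pack b-bit values into (b+1)-bit slots with 0 test bits."""
    return sum(v << (i * (b + 1)) for i, v in enumerate(fields))

def packed_geq(A, B, T):
    """Test bit i of the result is 1 iff field i of A >= field i of B.
    Setting A's test bits to 1 guarantees no borrow leaves a field."""
    return ((A | T) - B) & T

b = 4
A = pack([3, 9], b)              # fields 3 and 9
B = pack([5, 2], b)              # fields 5 and 2
T = pack([1 << b, 1 << b], b)    # test-bit mask
print(bin(packed_geq(A, B, T)))
# 0b1000000000  (only field 1 satisfies 9 >= 2)
```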
Slide 30

Packed Sorting [Paul-Simon (1980)] [Albers-Hagerup (1997)]

Goal: sort n b-bit keys on a machine with w-bit words.
Partition the items into groups of about w/b keys each.
Sort each group naively. (For the time being.)
Pack each group into a single word.
The time required for this preliminary step is O(n log(w/b)).
This is o(n log n) if log(w/b) = o(log n).
Slide 31

(Packed) Merge Sort
Slide 32

Packed Merge Sort [Paul-Simon (1980)] [Albers-Hagerup (1997)]

Merge packed sorted sequences of length w/b into packed sorted sequences of length 2w/b, then of length 4w/b, etc., until a single sorted sequence of length n is obtained.
As a basic operation, use the merging of two packed sorted sequences of length w/b each.
We shall implement this basic operation in O(log(w/b)) time, by simulating a bitonic sorting network.
Slide 33

Packed Merge Sort [Paul-Simon (1980)] [Albers-Hagerup (1997)]

Standard merge sort takes O(n log n) time.
We save a factor of (w/b)/log(w/b), as we merge w/b items in O(log(w/b)) rather than O(w/b) time.
Thus, the running time is O((n log n)(b/w) log(w/b) + n).
For b = w/log n, the running time is O(n log log n).
Slide 34

Packed Merge Sort [Paul-Simon (1980)] [Albers-Hagerup (1997)]

[Slides 34-37: step-by-step illustration of merging two packed sorted sequences.]
Repeatedly merge the smallest w/b items from both sequences; the smallest w/b items go to the output sequence.

Slide 38: Small technical detail: we need to know how many of the smallest items came from each sequence.
Slide 39: Simple solution: add a bit to each key, telling where it is coming from, and count the number of keys coming from each sequence. (How do we count?)
Slide 40

Batcher's bitonic sort

To merge two packed sorted sequences of K keys each, we use Batcher's bitonic sort.
We need to reverse one of the sequences and concatenate it to the other sequence.
Suppose that K is a power of 2. Bitonic sort is composed of log K iterations.
In iteration i, we need to compare/swap the items whose indices differ only in their i-th bit.
Slide 41

One step of bitonic sort

[Slides 41-46 trace one compare/swap step on packed words, with bit diagrams:]
(1) Compare/swap the items whose indices differ in the i-th bit: extract the items whose indices have a 1 in their i-th bit, and the items whose indices have a 0 in their i-th bit.
(2) Shift the first word 2^i fields to the right, and set its test bits to 1.
(3) Subtract.
(4) Collect the winners and the losers.
(5) Shift the winners 2^i fields to the left.
(6) Combine them together again. The i-th step is over!
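The whole step can be sketched in Python (names are mine; the compare/swap here always puts the smaller key at the lower index, which is the word-level primitive that bitonic merging is built from):

```python
def pack(fields, b):
    return sum(v << (i * (b + 1)) for i, v in enumerate(fields))

def unpack(word, nfields, b):
    return [(word >> (i * (b + 1))) & ((1 << b) - 1) for i in range(nfields)]

def compare_swap(X, nfields, b, d):
    """Compare/swap the packed fields whose indices differ in the index
    bit worth d (a power of two); the smaller key ends at the lower index.
    nfields must be a power of two, and all test bits of X must be 0."""
    f = b + 1
    FULL = (1 << f) - 1
    lo_mask = sum(FULL << (i * f) for i in range(nfields) if not i & d)
    hi_mask = sum(FULL << (i * f) for i in range(nfields) if i & d)
    T = sum(1 << (b + i * f) for i in range(nfields) if not i & d)
    Xlo = X & lo_mask
    Xhi = (X & hi_mask) >> (d * f)       # align partners with the lo fields
    M = ((Xlo | T) - Xhi) & T            # test bit set iff lo >= hi
    ge = M - (M >> b)                    # spread each test bit over its field
    mins = (Xhi & ge) | (Xlo & ~ge & lo_mask)
    maxs = (Xlo & ge) | (Xhi & ~ge & lo_mask)
    return mins | (maxs << (d * f))

print(unpack(compare_swap(pack([5, 3, 7, 2], 4), 4, 4, 1), 4, 4))
# [3, 5, 2, 7]
```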
Slide 47

Packed Bitonic Sort [Albers-Hagerup (1997)]

[Slides 47-48: pseudo-code. For i from log K - 1 downto 0, perform the compare/swap step on the items whose indices differ in their i-th bit.]
Slide 49

Reversing the fields in a word

For each i, in any order, swap the fields with a 0 in the i-th bit of their index with the fields 2^i positions to the left.
Similar to the implementation of bitonic sort. We already know how to do it.
Exercise: Show that this indeed reverses the fields.
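A Python sketch of the reversal, in the same masking style as the compare/swap step (`nfields` is assumed to be a power of two):

```python
def pack(fields, b):
    return sum(v << (i * (b + 1)) for i, v in enumerate(fields))

def unpack(word, nfields, b):
    return [(word >> (i * (b + 1))) & ((1 << b) - 1) for i in range(nfields)]

def reverse_fields(X, nfields, b):
    """Reverse the order of the packed (b+1)-bit fields.  For each index
    bit worth i, swap the fields with a 0 in that bit with the fields i
    slots to the left; flipping every index bit maps slot j to nfields-1-j."""
    f = b + 1
    FULL = (1 << f) - 1
    i = 1
    while i < nfields:
        lo = sum(FULL << (j * f) for j in range(nfields) if not j & i)
        hi = sum(FULL << (j * f) for j in range(nfields) if j & i)
        X = ((X & lo) << (i * f)) | ((X & hi) >> (i * f))
        i *= 2
    return X

print(unpack(reverse_fields(pack([1, 2, 3, 4], 4), 4, 4), 4, 4))
# [4, 3, 2, 1]
```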
Slide 50

Packed Merge Sort

We began by splitting the n input numbers into groups of size w/b, naively sorting them, and then packing them into words.
This is good enough for obtaining an O(n log log n)-time algorithm, but the naive sorting is clearly not optimal.
Exercise: Show that n integers, each of b bits, can be sorted in O((n log n)(b/w) log(w/b) + n) time.
Slide 51

Integer Sorting in O(n log log n) time [Andersson-Hagerup-Nilsson-Raman (1998)]

Putting everything together, we get a randomized O(n log log n)-time sorting algorithm for any w.
How much space are we using?
If the recursion stack is managed carefully, the algorithm uses only O(n) space.
Are we using multiplications?
Yes! In the hashing.
Non-AC0 operations are required to get O(1) search time.
Slide 52

Sorting strings / multi-precision integers

We have n strings of arbitrary length. Each character is a w-bit word.
We want to sort them lexicographically.
Let D be the number of characters that must be examined to determine the order of the strings.
The problem can be reduced in O(n + D) time to the problem of sorting at most D characters!
We get an O(n + D + sort(D))-time algorithm.
Slide 53

Sorting strings / multi-precision integers

[Slides 53-55: an example collection of strings (DADADA..., AACA..., WQ..., FGQPJ..., BDLMCX..., etc.), with the distinguishing characters highlighted.]
We move pointers to the strings, not the strings themselves.
It is necessary and sufficient to examine the distinguishing characters.
Slide 56

Forward Radix Sort [Andersson-Nilsson (1994)]

After the i-th pass, the strings are sorted according to their first i characters.
The strings are partitioned into groups. We keep the starting/end positions of each group.
Groups are active or inactive.

[Slides 56-67: a worked example of forward radix sort on the string collection, one pass per slide, showing how the groups are split.]
Slide 68

Forward Radix Sort [Andersson-Nilsson (1994)]

The i-th pass:
Sequentially scan the items in the active groups.
Append each item into the bucket indexed by its i-th character.
(The buckets are shared by all groups.)
Scan the non-empty buckets, in increasing order. (How do we find the non-empty buckets?)
"Empty" each active group.
Append each item to its group. (Each item remembers the group it belongs to.)
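A hedged Python sketch of the pass structure. For simplicity it keeps an explicit work-list of active groups and processes them group by group rather than pass by pass; the output is the same, and only the distinguishing prefixes (plus group bookkeeping) are examined:

```python
def forward_radix_sort(strings):
    """Forward/MSD radix sort with active groups.
    Sorts arbitrary-length strings lexicographically."""
    a = list(strings)
    groups = [(0, len(a), 0)]        # active groups: (start, end, position)
    while groups:
        lo, hi, i = groups.pop()
        if hi - lo <= 1:
            continue                 # a group of size 1 is inactive
        buckets = {}
        done = []                    # strings that end before position i
        for s in a[lo:hi]:
            if i >= len(s):
                done.append(s)
            else:
                buckets.setdefault(s[i], []).append(s)
        # Exhausted strings come first; then the buckets in character order.
        a[lo:hi] = done + [s for c in sorted(buckets) for s in buckets[c]]
        pos = lo + len(done)
        for c in sorted(buckets):    # each bucket becomes a new group
            groups.append((pos, pos + len(buckets[c]), i + 1))
            pos += len(buckets[c])
    return a

print(forward_radix_sort(["dada", "aa", "ca", "dadad", "b"]))
# ['aa', 'b', 'ca', 'dada', 'dadad']
```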
Slide 69

Forward Radix Sort (Slight deviation from [Andersson-Nilsson (1994)])

The i-th pass:
Consider each active group separately.
Use hashing to determine the different characters appearing in the i-th position.
If there are g different characters, then the number of groups increases by g - 1.
Sort the non-minimal characters.
Total size of all sorting problems, in all passes, is at most D.
Slide 70

Forward Radix Sort (Slight deviation from [Andersson-Nilsson (1994)])

Total size of all sorting problems, in all passes, is at most D.
Having a collection of smaller problems is almost always better.
We promised one sorting problem of size at most D.
But, in some cases, e.g., if we want to use naive bucket sort, with a large initialization cost, having one large problem is better, and in a sense "cleaner".
Actually, removing the minimal character is not really faster, as usually it does not change the asymptotic cost.
Slide 71

Forward Radix Sort [Andersson-Nilsson (1994)]
Obtaining one sorting problem of size D.

Perform two phases.
In the first phase, split into sub-groups, but keep the sub-groups in arbitrary order.
Weaker invariant: after the i-th pass, all items in an active group have the same first i characters. Items in different groups do not have the same first i characters.
If c is a non-minimal character appearing in the i-th position in some group, add (i, c) to a list.
Slide 72

Forward Radix Sort [Andersson-Nilsson (1994)]
Obtaining one sorting problem of size D.

If c is a non-minimal character appearing in the i-th position in some group, in the i-th pass, then add (i, c) to a list.
The total length of the list is at most D.
After sorting the list, in the second phase, we can run the original algorithm.
Slight problem: i cannot be bounded in terms of D.
Slide 73

Forward Radix Sort [Andersson-Nilsson (1994)]
Obtaining one sorting problem of size D.

If c is a non-minimal character appearing in the i-th position in some group, in the i-th pass, then add (i, c) to a list.
Slight problem: i cannot be bounded in terms of D.
Simple solution: Replace i by p(i), where p(i) is the number of passes, at or before the i-th pass, in which at least one group splits.
Now p(i) <= D, and the pairs (p(i), c) can be encoded using O(log D + w) bits.

Slide 74
Range reduction revisited

We can view each k-bit word as a 2-character string, composed of (k/2)-bit characters.
Using the forward radix sort of Andersson and Nilsson, we get an alternative to the range reduction step of Kirkpatrick and Reisch.
Slide 75

Signature Sort [Andersson-Hagerup-Nilsson-Raman (1998)]
Sorting in O(n) expected time if w >= log^(2+eps) n.

Split each w-bit key into q parts/characters, such that (q log n)-bit keys can be sorted in linear time. We can choose q = log^eps n.
Use a hash function to assign each (w/q)-bit character a unique O(log n)-bit signature.
Form shortened keys by concatenating the signatures of the parts, and sort them in linear time.
Construct a compressed trie of the shortened keys.
Sort the edges of the trie, possibly using recursion.
The keys now appear in the trie in sorted order.
Slide 76

Compressed tries

[Figure: a compressed trie of the example strings; a node is yellow if it corresponds to an input string.]
Also known as PATRICIA tries [Morrison (1968)]:
"Practical Algorithm To Retrieve Information Coded In Alphanumeric".
The number of nodes is at most 2n. (Only the root may be unary.)
In our case, all strings would have the same length, so no string would be a prefix of another.
Exercise: Show that a compressed trie of a sorted collection of strings can be constructed in O(n) time.
Slide 77

Signature sort example

[Slides 77-78: an example run of signature sort: the original characters, their lowercase signatures, the compressed trie of the shortened keys, and the signature order b < d < c < a < f < g.]
Slide 79

Signature Sort [Andersson-Hagerup-Nilsson-Raman (1998)]
Sorting in O(n) expected time if w >= log^(2+eps) n.

Q: How do we find unique signatures? Really? What do we do if not?
Q: How do we sort the shortened keys?
A: Using packed sorting in O(n) time.
Q: How do we construct the trie of the shortened keys? (Note that O(n log n) time is not fast enough!)
Q: How do we reorder the trie of the shortened keys to obtain the trie of the original keys?
A: Sort by the original first character on each edge. If the characters are not short enough, use recursion.

Slide 80
Signature Sort [Andersson-Hagerup-Nilsson-Raman (1998)]

The trie has up to 2n edges, so up to 2n characters to sort.
Why is this 2n and not n?
As we are only going to repeat it a constant number of times, it does not really matter.
But we can get down to n.
Use the trick of finding the minimum edge and not including it in the sort.
Exercise: Show that the number of characters to sort is at most n.
Slide 81

Signature Sort [Andersson-Hagerup-Nilsson-Raman (1998)]

Splitting into q = log^eps n parts shrinks the character length by a factor of about q per round.
If we iterate t times, the remaining character length is about w / q^t.
Slide 82

Signature Sort [Andersson-Hagerup-Nilsson-Raman (1998)]

If we iterate t times, the remaining character length is about w / q^t.
If t is a large enough constant, the characters are of length O(log n), and can be sorted in linear time using packed sorting.
Thus, for w >= log^(2+eps) n, we can sort in O(n) expected time.
Slide 83

Constructing a compressed trie in O(n) time

Add the sorted strings to the trie one by one.
Suppose that we are about to insert the next string. The left-most path corresponds to the most recently inserted string.
Find the longest common prefix of the new string and the previous one.
As the strings are packed, we can do it in O(1) time.
We may need to add an internal node, unless the common prefix ends at an existing node.
How do we find the parent of the new internal node?
Slide 84

Constructing a compressed trie in O(n) time

How do we find the parent of the new internal node?
We can (probably) use bit tricks to do it in O(1) time.
We can also slowly climb up from the last leaf.
Each node we pass exits the left-most path.
Total number of operations is O(n).
Note: Similar to the linear time construction of Cartesian trees.
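A Python sketch of the construction (class and helper names are mine): insert the strings in sorted order, compute the longest common prefix with the previous string, climb up from the last leaf, and split an edge if needed.

```python
class Node:
    def __init__(self, depth, edge=""):
        self.depth = depth      # characters from the root to this node
        self.edge = edge        # label of the edge from the parent
        self.parent = None
        self.children = []      # in sorted order, since the input is sorted
        self.string = None      # set on nodes that are input strings

def _attach(parent, child, edge):
    child.parent, child.edge = parent, edge
    parent.children.append(child)

def build_compressed_trie(strings):
    """Compressed trie of a sorted list of distinct strings, none a
    prefix of another; O(n) node operations overall (amortized climb)."""
    root, prev, last = Node(0), None, None
    for s in strings:
        leaf = Node(len(s))
        leaf.string = s
        if prev is None:
            _attach(root, leaf, s)
        else:
            l = 0                       # length of lcp(prev, s)
            while l < min(len(prev), len(s)) and prev[l] == s[l]:
                l += 1
            v = last                    # climb up from the last leaf
            while v.parent is not None and v.parent.depth >= l:
                v = v.parent
            if v.depth == l:            # prefix ends at an existing node
                _attach(v, leaf, s[l:])
            else:                       # split the edge entering v at depth l
                u, cut = v.parent, l - v.parent.depth
                m = Node(l)
                m.parent, m.edge = u, v.edge[:cut]
                u.children[u.children.index(v)] = m
                v.parent, v.edge = m, v.edge[cut:]
                m.children = [v]
                _attach(m, leaf, s[l:])
        prev, last = s, leaf
    return root

def leaves(node):
    if node.string is not None:
        return [node.string]
    return [x for c in node.children for x in leaves(c)]

print(leaves(build_compressed_trie(["aaa", "aab", "abc", "bbb"])))
# ['aaa', 'aab', 'abc', 'bbb']
```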
Slide 85

Computing unique signatures [Andersson-Hagerup-Nilsson-Raman (1998)]

We have at most nq different characters.
Let H be an (almost) universal family of hash functions from (w/q)-bit characters to s-bit signatures.
The expected number of collisions is at most (nq)^2 / 2^s.
For s = Theta(log n), there are no collisions, w.h.p. Really? What do we do if not?
Which family of hash functions should we use?
How do we compute the signatures of all the characters of a given word in O(1) time?

Slide 86
Multiplicative hash functions [Dietzfelbinger-Hagerup-Katajainen-Penttonen (1997)]

h_a(x) = ((a * x) mod 2^w) div 2^(w-s), with a odd, forms an "almost-universal" family.
Extremely fast in practice!
(The explicit mod 2^w is not necessary if the machine already multiplies w-bit words modulo 2^w.)