Presentation Transcript

Slide1

Integer Sorting on the word-RAM

Uri Zwick

Tel Aviv University

May 2015

Last updated: June 30, 2015

Slide2

Integer sorting

Memory is composed of w-bit words.

Arithmetical, logical and shift operations on w-bit words take O(1) time.

How fast can we sort an array of length n?

Slide3

Comparison based algorithms: O(n log n)

Time bounds dependent on w:

O(n log w) - van Emde Boas trees

Time bounds independent of w:

O(n log n / log log n) - Fredman-Willard (1993)

O(n log log n) - Andersson et al. (1998)

O(n √(log log n)) - Han-Thorup (2002)

Some of these algorithms are randomized.

Slide4

Fundamental open problem

Can we sort in O(n) time, for any w???

Slide5

Sorting – three variants

[…] – Return a permutation that (stably) sorts n b-bit keys, each stored in a separate word, on a machine with w-bit words.

[…] – Sort n b-bit keys, each stored in a separate word, on a machine with w-bit words.

[…] – (Stably) sort n w-bit words according to b-bit keys. (The keys are the first/last b bits of each word.)

(We will not be very precise in using these names…)

Slide6

Sorting – three variants

Exercise: Reduce […] to […] with only […] extra work.

Exercise: Reduce […] to […] with only […] extra work. (Hint: hashing.)

Exercise: Reduce […] to […] with only […] extra work.

Exercise: Reduce […] to […] with no extra work.

Slide7

Two techniques

Range reduction

Reduce […] to […] using only […] extra work.

Packed sorting (Word-level parallelism)

Solve […] by packing […] keys in each word. In O(1) time we can perform simple operations on […] keys.

Slide8

Backward/LSD Radix sort

Stably sort according to “digits”, starting from the least significant digit.

To sort according to a “digit”, use bucket or count sort.

(Slides from undergrad course.)

Input: 2871 4591 6572 1301 2472 3555 7022 8394 4844 3536

After sorting by the last digit: 2871 4591 1301 6572 2472 7022 8394 4844 3555 3536

Slide9

Backward/LSD Radix sort

After pass 1: 2871 4591 1301 6572 2472 7022 8394 4844 3555 3536

Slide10

Backward/LSD Radix sort

After pass 1: 2871 4591 1301 6572 2472 7022 8394 4844 3555 3536

After pass 2: 1301 7022 3536 4844 3555 2871 6572 2472 4591 8394

After the i-th pass, numbers are sorted according to the i least significant digits.

Slide11

Backward/LSD Radix sort

After pass 2: 1301 7022 3536 4844 3555 2871 6572 2472 4591 8394

Slide12

Backward/LSD Radix sort

After pass 2: 1301 7022 3536 4844 3555 2871 6572 2472 4591 8394

After pass 3: 7022 1301 8394 2472 3536 3555 6572 4591 4844 2871

Slide13

Backward/LSD Radix sort

After pass 3: 7022 1301 8394 2472 3536 3555 6572 4591 4844 2871

Slide14

Backward/LSD Radix sort

After pass 3: 7022 1301 8394 2472 3536 3555 6572 4591 4844 2871

After pass 4: 1301 2472 2871 3536 3555 4591 4844 6572 7022 8394

Slide15

(Adapted from Cormen, Leiserson, Rivest and Stein, Introduction to Algorithms, Third Edition, 2009, p. 195)

Slide16

Backward/LSD Radix Sort in the word-RAM model

Sort according to the log n least significant bits.

Stably sort according to the remaining w − log n bits.

The log n least significant bits can be sorted in O(n) time using bucket sort or count sort.

Total running time is O(n · w / log n).

Running time is O(n) if w = O(log n).
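The passes of backward/LSD radix sort can be sketched as follows. This is a minimal illustration, not code from the slides: the function name `lsd_radix_sort` and its parameters `base` and `num_digits` are assumptions made for the example, and each pass is a stable counting sort on one digit.

```python
def lsd_radix_sort(keys, base, num_digits):
    """Sort non-negative integers below base**num_digits, least significant digit first."""
    place = 1  # base**(index of the current digit)
    for _ in range(num_digits):
        # Counting sort on the current digit; it is stable, which LSD radix sort relies on.
        counts = [0] * base
        for x in keys:
            counts[(x // place) % base] += 1
        start = 0
        for d in range(base):
            # Turn the count of digit d into the first output position for digit d.
            counts[d], start = start, start + counts[d]
        out = [0] * len(keys)
        for x in keys:
            d = (x // place) % base
            out[counts[d]] = x
            counts[d] += 1
        keys = out
        place *= base
    return keys


example = [2871, 4591, 6572, 1301, 2472, 3555, 7022, 8394, 4844, 3536]
print(lsd_radix_sort(example, base=10, num_digits=4))
```

Each pass costs O(n + base) time. Choosing a base of about n, that is, digits of about log n bits, gives roughly w / log n passes of O(n) time each.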

 

We shall revisit Radix Sort later…

Slide17

“Ultra” Radix Sort
[Kirkpatrick-Reisch (1984)]

First attempt:

Split each w-bit word into two w/2-bit parts, a high part and a low part.

Sort according to the low part.

Scan the array sequentially and append each item into a bucket indexed by its high part.

Use a hash table to maintain the non-empty buckets.

Sort the indices of the non-empty buckets.

Go over the non-empty buckets, in sorted order, and concatenate the sorted lists of the buckets. (Each bucket is sorted.)

Slide18

“Ultra” Radix Sort
[Kirkpatrick-Reisch (1984)]

First attempt:

The algorithm is correct.

But, to sort an array of w-bit words, we need to sort two arrays of the same length with w/2-bit words.

We cannot do that more than a constant number of times.

To fix the problem we use a trick similar to the one used in van Emde Boas trees.

Slide19

“Ultra” Radix Sort
[Kirkpatrick-Reisch (1984)]

Working version:

Scan the items in arbitrary order and throw them into buckets according to their high part.

In each bucket, indexed by the high part, keep only the item(s) with the smallest low part found so far.

Put all non-minimal items in a list.

For each non-empty bucket, add […] to the list. ([…] replaces the minimal item in bucket […].)

(The length of the list is at most n.)

Slide20

“Ultra” Radix Sort
[Kirkpatrick-Reisch (1984)]

Working version (continued):

Sort the list according to […].

Append each remaining item in the list into bucket […].

Extract from the list a sorted list of the non-empty buckets. (When we encounter the first […] pair, check whether bucket […] is non-empty.)

Use hashing to maintain the buckets in both phases.

Use the sorted list of non-empty buckets to concatenate the buckets in the appropriate order.

Slide21

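“Ultra” Radix Sort
[Kirkpatrick-Reisch (1984)]

Complexity

T(n, w) – time to sort n items according to a w-bit key

Each level does O(n) expected work and makes one recursive call on at most n keys of half the length: T(n, w) ≤ T(n, w/2) + O(n), so T(n, w) = O(n log w).

Matching van Emde Boas trees…

In […] time, we can reduce […] to […].

We can then pack […] keys in each word!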
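The range-reduction idea can be sketched as below. This is a hedged illustration rather than the slides' exact procedure: the name `kr_sort`, the `base_bits` cut-off, the `(key, payload)` pair representation and the `"item"`/`"bucket"` tags are assumptions of the sketch. A Python `dict` stands in for the hash table, and Python's built-in `sorted` stands in for a linear-time bucket sort on short keys.

```python
def kr_sort(pairs, w, base_bits=8):
    """Sort (key, payload) pairs by key; every key is a non-negative integer below 2**w."""
    if len(pairs) <= 1 or w <= base_bits:
        # Stand-in for a linear-time bucket sort on short keys.
        return sorted(pairs, key=lambda p: p[0])
    half = (w + 1) // 2
    low_mask = (1 << half) - 1

    # Phase 1: one bucket per high part; each bucket keeps only the item
    # with the smallest low part seen so far, all others go to `rest`.
    minimum = {}
    rest = []
    for p in pairs:
        high = p[0] >> half
        cur = minimum.get(high)
        if cur is None:
            minimum[high] = p
        elif (p[0] & low_mask) < (cur[0] & low_mask):
            rest.append(cur)
            minimum[high] = p
        else:
            rest.append(p)

    # One list of half-width keys, of length len(pairs): the low parts of the
    # non-minimal items plus the index (high part) of every non-empty bucket.
    work = [(p[0] & low_mask, ("item", p)) for p in rest]
    work += [(high, ("bucket", high)) for high in minimum]
    work = kr_sort(work, half, base_bits)

    # Phase 2: refill the buckets in sorted order of low part, and read off
    # the sorted order of the bucket indices from the same sorted list.
    buckets = {high: [p] for high, p in minimum.items()}
    order = []
    for _, (kind, value) in work:
        if kind == "bucket":
            order.append(value)
        else:
            buckets[value[0] >> half].append(value)
    return [p for high in order for p in buckets[high]]


keys = [(i * 2654435761) % (1 << 32) for i in range(500)] + [7, 7, 7]
result = kr_sort([(k, i) for i, k in enumerate(keys)], 32)
```

The single recursive call on at most n half-width keys per level is what yields the recurrence T(n, w) ≤ T(n, w/2) + O(n). The sketch does not try to be stable.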

 Slide22

Packed representation

[Figure: a word divided into fields; each field consists of a test bit, set to 0, followed by the bits of one key.]

We can easily take two such words and produce a word that contains […] keys.

Slide23

Packed representation

Useful constants:

[Figure: constant words in the packed layout, including one with a 1 in every test bit and 0s elsewhere.]

Exercise: How quickly can we construct these constants?

Slide24

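Packed representation

Useful operation:

[Figure: a word in which only test bits may be set (fields of the form 1 00…0 or 0 00…0) is turned into a mask whose fields are 0 11…1 where the test bit was 1 and 0 00…0 where it was 0.]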
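Test bits and the mask operation are what make word-level parallelism work: one subtraction compares all fields at once. The sketch below is an illustration under assumptions, not the slides' code. Python integers stand in for machine words, field i occupies bits i·(b+1) to i·(b+1)+b with the test bit on top, and the names `pack`, `unpack` and `packed_min` are hypothetical. The constants are built with loops here, whereas on a word-RAM they would be precomputed.

```python
def pack(keys, b):
    """Pack keys of at most b bits each; field i starts at bit i*(b+1), test bit left at 0."""
    word = 0
    for i, key in enumerate(keys):
        word |= key << (i * (b + 1))
    return word


def unpack(word, count, b):
    """Read back the b key bits of each of the `count` fields."""
    return [(word >> (i * (b + 1))) & ((1 << b) - 1) for i in range(count)]


def packed_min(x, y, count, b):
    """Field-wise minimum of two packed words using a constant number of word operations."""
    test_bits = pack([1 << b] * count, b)        # 1 in every test bit, 0 elsewhere
    key_bits = pack([(1 << b) - 1] * count, b)   # 1 in every key bit, 0 in test bits
    # Setting the test bits of x stops borrows from crossing field boundaries;
    # after the subtraction, test bit i is 1 exactly when x_i >= y_i.
    t = ((x | test_bits) - y) & test_bits
    # Turn each set test bit 1 00...0 into the mask 0 11...1 over its key bits.
    mask = t - (t >> b)
    return (y & mask) | (x & (key_bits ^ mask))


xs = [5, 0, 7, 3]
ys = [2, 6, 7, 1]
print(unpack(packed_min(pack(xs, 3), pack(ys, 3), 4, 3), 4, 3))
```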

 Slide25

Packed Sorting
[Paul-Simon (1980)] [Albers-Hagerup (1997)]

Sort […]-bit keys on a machine with w-bit words.

Partition the items into groups of size […].

Sort each group naïvely.

Pack each group into a single word.

Time required for this preliminary step is […]. This is […], if […].

Slide26

(Packed) Merge Sort

 

 

 

 Slide27

Packed Merge Sort
[Paul-Simon (1980)] [Albers-Hagerup (1997)]

Merge packed sorted sequences of length k to sorted sequences of length 2k, and then of length 4k, etc., until a single sorted sequence of length n is obtained.

As a basic operation, use the merging of two sorted sequences of length k.

We shall implement this basic operation in O(log k) time, by simulating a bitonic sorting network.

Slide28

Packed Merge Sort
[Paul-Simon (1980)] [Albers-Hagerup (1997)]

Standard merge sort takes O(n log n) time.

We save a factor of […].

Thus, the running time is […].

For […], the running time is […].

Slide29

Packed Merge Sort
[Paul-Simon (1980)] [Albers-Hagerup (1997)]

[Figure: two packed sorted sequences being merged.]

Merge the smallest k items from both sequences.

The smallest k items go to the output sequence.

Slide30

Packed Merge Sort
[Paul-Simon (1980)] [Albers-Hagerup (1997)]

[Figure: next frame of the same merge, with the same caption.]

Slide31

Packed Merge Sort
[Paul-Simon (1980)] [Albers-Hagerup (1997)]

[Figure: next frame of the same merge, with the same caption.]

Slide32

Packed Merge Sort
[Paul-Simon (1980)] [Albers-Hagerup (1997)]

Small technical detail: We need to know how many of the k smallest items came from each sequence.

Slide33

Packed Merge Sort
[Paul-Simon (1980)] [Albers-Hagerup (1997)]

Simple solution: Add a bit to each key, telling where it is coming from. Count the number of keys coming from each sequence. (But, how do we count?)

Slide34

Batcher’s bitonic sort

To merge two packed sorted sequences of k keys each, we use Batcher’s bitonic sort.

We need to reverse one of the sequences and concatenate it to the other sequence.

Suppose that k is a power of 2. Bitonic sort is composed of […] iterations.

In iteration i, we need to compare/swap items whose indices differ only in their i-th bit.
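Before packing anything into words, the merging network itself can be sketched on an ordinary list. This is an illustrative sketch: the name `bitonic_merge` is an assumption, and the two inputs are assumed to be sorted lists of the same power-of-two length.

```python
def bitonic_merge(a, b):
    """Merge two sorted lists of equal power-of-two length with a bitonic merging network."""
    seq = a + b[::-1]  # ascending followed by descending: a bitonic sequence
    n = len(seq)
    step = n // 2
    while step >= 1:
        # One iteration: compare/swap items whose indices differ only in the bit `step`.
        for i in range(n):
            if i & step == 0:
                j = i | step
                if seq[i] > seq[j]:
                    seq[i], seq[j] = seq[j], seq[i]
        step //= 2
    return seq


print(bitonic_merge([1, 4, 6, 9], [2, 3, 7, 8]))
```

All compare/swap operations of one iteration are independent of each other, which is what allows a packed implementation to carry out a whole iteration with a constant number of word operations.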

 Slide35

One step of bitonic sort (1)

Compare/swap items whose indices differ in the i-th bit.

Extract items whose indices have a 1 in their i-th bit, and items whose indices have a 0 in their i-th bit. (In the example […].)

[Figure: the packed word, and the two words obtained by keeping only the fields whose indices have a 1, respectively a 0, in their i-th bit.]

Slide36

One step of bitonic sort (2)

Shift the first word […] fields to the right, and set their test bits to 1.

[Figure: the two words after the shift.]

Slide37

One step of bitonic sort (3)

Shift the first word […] fields to the right, and set their test bits to 1.

Subtract.

[Figure: the two words and the result of the subtraction.]

Slide38

One step of bitonic sort (4)

Subtract.

Collect winners and losers.

[Figure: the result of the subtraction, and the words holding the winners and the losers.]

Slide39

One step of bitonic sort (5)

Shift the winners […] fields to the left.

[Figure: the winners and losers words after the shift.]

Slide40

One step of bitonic sort (6)

Shift the winners […] fields to the left.

Combine them together again:

[Figure: the shifted winners and the losers combined into a single packed word.]

The i-th step is over!
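The six-part step (extract, shift, set test bits, subtract, collect winners and losers, shift back and combine) can be sketched on a Python integer standing in for a machine word. This is an illustration under assumptions: field i occupies bits i·(b+1) to i·(b+1)+b, field 0 is meant to end up holding the smallest key, and the names `pack`, `unpack` and `packed_bitonic_merge` are hypothetical. The selection masks are built with loops here; on a word-RAM they would be precomputed constants.

```python
def pack(keys, b):
    """Pack keys of at most b bits each; field i starts at bit i*(b+1), test bit left at 0."""
    word = 0
    for i, key in enumerate(keys):
        word |= key << (i * (b + 1))
    return word


def unpack(word, count, b):
    return [(word >> (i * (b + 1))) & ((1 << b) - 1) for i in range(count)]


def packed_bitonic_merge(word, count, b):
    """Sort a packed bitonic sequence of `count` keys (count a power of two)."""
    f = b + 1
    step = count // 2
    while step >= 1:
        low = [i for i in range(count) if i & step == 0]
        keys_low = sum(((1 << b) - 1) << (i * f) for i in low)  # key bits of selected fields
        tests_low = sum(1 << (i * f + b) for i in low)          # test bits of selected fields
        # Extract the two halves of every pair and align them field by field.
        lo = word & keys_low
        hi = (word >> (step * f)) & keys_low
        # Set test bits and subtract: test bit i survives exactly when hi_i >= lo_i.
        t = ((hi | tests_low) - lo) & tests_low
        mask = t - (t >> b)
        # Collect the smaller and the larger key of every pair.
        smaller = (lo & mask) | (hi & (keys_low ^ mask))
        larger = (hi & mask) | (lo & (keys_low ^ mask))
        # Shift the larger keys back and combine.
        word = smaller | (larger << (step * f))
        step //= 2
    return word


first, second = [1, 4, 6, 9], [2, 3, 7, 8]
bitonic = pack(first + second[::-1], 4)  # reversal done while packing, for simplicity
print(unpack(packed_bitonic_merge(bitonic, 8, 4), 8, 4))
```

Each iteration of the loop uses a constant number of shifts, masks and one subtraction, independent of the number of fields.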

 Slide41

Packed Bitonic Sort
[Albers-Hagerup (1997)]

for […] downto 1:

[…]

Slide42

Packed Bitonic Sort
[Albers-Hagerup (1997)]

for […] downto 1:

[…]

 Slide43

Reversing the fields in a word

For each bit position i of the field index, in any order, swap fields with a 0 in the i-th bit of their index with the fields 2^i positions to the left (bits numbered from 0).

Similar to the implementation of bitonic sort.

Exercise: Show that this indeed reverses the fields.
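A small sketch of the reversal, with a Python integer standing in for the machine word. The layout (field i at bits i·f to i·f+f−1, higher indices further to the left) and the names `pack`, `unpack` and `reverse_fields` are assumptions of this example. The selection mask is built with a loop here and would be a precomputed constant on a word-RAM.

```python
def pack(keys, f):
    """Pack keys into fields of f bits each; field i starts at bit i*f."""
    word = 0
    for i, key in enumerate(keys):
        word |= key << (i * f)
    return word


def unpack(word, count, f):
    return [(word >> (i * f)) & ((1 << f) - 1) for i in range(count)]


def reverse_fields(word, count, f):
    """Reverse the order of `count` fields (count a power of two)."""
    step = 1
    while step < count:
        # Fields whose index has a 0 in the bit `step`.
        low_fields = sum(((1 << f) - 1) << (i * f) for i in range(count) if i & step == 0)
        lo = word & low_fields
        hi = (word >> (step * f)) & low_fields
        # Swap every selected field with the field `step` positions to its left.
        word = (lo << (step * f)) | hi
        step *= 2
    return word


print(unpack(reverse_fields(pack([1, 2, 3, 4, 5, 6, 7, 8], 5), 8, 5), 8, 5))
```

After all bit positions have been handled, the field that started at index i sits at the index obtained by flipping every bit of i, which is count − 1 − i.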

We already know how to do it.

Slide44

Packed Merge Sort

We began by splitting the n input numbers into groups of size […], naively sorting them, and then packing them into words.

This is good enough for obtaining an O(n log log n)-time algorithm, but the naïve sorting is clearly not optimal.

Exercise: Show that n integers, each of w/(log n log log n) bits, can be sorted in O(n) time.

Slide45

Integer Sorting in O(n log log n) time
[Andersson-Hagerup-Nilsson-Raman (1998)]

Putting everything together, we get a randomized O(n log log n)-time sorting algorithm for any w.

How much space are we using?

If the recursion stack is managed carefully, the algorithm uses only O(n) space.

Are we using multiplications?

Yes! In the hashing. Non-AC⁰ operations are required to get O(1) search time.

Slide46

Sorting strings/multi-precision integers

We have n strings of arbitrary length.

Each character is a w-bit word.

We want to sort them lexicographically.

Let D be the number of characters that must be examined to determine the order of the strings.

The problem can be reduced in O(n + D) time to the problem of sorting n characters!

We get an O(n log log n + D)-time algorithm.
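To make the count of characters that must be examined concrete, the sketch below computes it for a small set of strings. It assumes one common reading of the definition, which may differ in detail from the slides: for each string, the distinguishing prefix is one character longer than its longest common prefix with any other string, capped at the string's length. The names `lcp` and `distinguishing_characters` are hypothetical, the strings are assumed distinct, and the built-in `sorted` is used only to find each string's nearest neighbours.

```python
def lcp(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def distinguishing_characters(strings):
    """Total length of the distinguishing prefixes of a set of distinct strings."""
    order = sorted(strings)
    total = 0
    for i, s in enumerate(order):
        # The longest common prefix with any other string is attained at a sorted neighbour.
        longest = 0
        if i > 0:
            longest = max(longest, lcp(order[i - 1], s))
        if i + 1 < len(order):
            longest = max(longest, lcp(s, order[i + 1]))
        total += min(len(s), longest + 1)
    return total


print(distinguishing_characters(["DADA", "DAAC", "CAWQ"]))
```

For the three example strings, `CAWQ` is separated from the others by its first character, while `DAAC` and `DADA` each need three characters.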

 Slide47

Sorting strings/multi-precision integers

[Figure: example strings, drawn as rows of characters.]

Slide48

Sorting strings/multi-precision integers

[Figure: the example strings.]

We move pointers to the strings, not the strings themselves.

Slide49

Sorting strings/multi-precision integers

[Figure: the example strings with their distinguishing characters.]

Necessary and sufficient to examine the distinguishing characters.

Slide50

Forward Radix Sort
[Andersson-Nilsson (1994)]

[Figure: the example strings.]

After the i-th pass, the strings are sorted according to the first i characters.

Slide51

Forward Radix Sort
[Andersson-Nilsson (1994)]

[Figure: the example strings, with positions 1, 4, 5 and 10 marked.]

The strings are partitioned into groups. We keep the starting/end positions of each group. Groups are active or inactive.

Slide52

Forward Radix Sort
[Andersson-Nilsson (1994)]

[Figure: the example strings, with positions 1 and 5 marked.]

The strings are partitioned into groups. We keep the starting position of each group. Groups are active or inactive.

Slide53

Forward Radix Sort
[Andersson-Nilsson (1994)]

[Figure: slides 53–61 are animation frames of the same forward radix sort example, with position markers moving over the strings and no text beyond the slide title.]

Slide62

Forward Radix Sort
[Andersson-Nilsson (1994)]

The i-th pass:

Sequentially scan the items in the active groups. Append each item into the bucket indexed by its i-th character. (The buckets are shared by all groups.)

“Empty” each active group.

Scan the non-empty buckets, in increasing order. Append each item to its group. (Each item remembers the group it belongs to.)

How do we find the non-empty buckets?

Slide63

Forward Radix Sort
(Slight deviation from [Andersson-Nilsson (1994)])

The i-th pass:

Consider each active group separately.

Use hashing to determine the different characters appearing in the i-th position.

If there are d different characters, then the number of groups increases by d − 1.

Sort the d − 1 non-minimal characters.

Total size of all sorting problems is at most n.
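The per-group variant of the pass can be sketched as follows. This is an illustrative sketch under assumptions, not the authors' code: the names `char_at` and `forward_radix_sort` are hypothetical, a Python `dict` plays the role of the hash table, the end of a string is treated as a character smaller than all others, and the built-in `sorted` stands in for whatever sorting routine is applied to the non-minimal characters. Groups hold indices (pointers), not the strings themselves.

```python
def char_at(s, i):
    """Character code at position i, or -1 if the string has already ended."""
    return ord(s[i]) if i < len(s) else -1


def forward_radix_sort(strings):
    """Sort strings by refining groups one character position per pass."""
    # Each group is (list of string indices, active flag); groups are kept in sorted order.
    groups = [(list(range(len(strings))), len(strings) > 1)]
    i = 0
    while any(active for _, active in groups):
        next_groups = []
        for members, active in groups:
            if not active:
                next_groups.append((members, False))
                continue
            # Hashing finds the different characters appearing in position i.
            buckets = {}
            for j in members:
                buckets.setdefault(char_at(strings[j], i), []).append(j)
            # The minimum is found separately; only the other characters are sorted.
            smallest = min(buckets)
            order = [smallest] + sorted(c for c in buckets if c != smallest)
            for c in order:
                sub = buckets[c]
                # A sub-group stays active while it can still split.
                next_groups.append((sub, len(sub) > 1 and c != -1))
        groups = next_groups
        i += 1
    return [strings[j] for members, _ in groups for j in members]


words = ["DADA", "DAAC", "ADAD", "DADAA", "CAW", "DA", "DA", "GFGQ"]
print(forward_radix_sort(words))
```

A group with d different characters in the current position contributes d − 1 characters to sort and creates d − 1 new groups, so the total number of characters sorted over all passes stays below the number of strings.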

 Slide64

Forward Radix Sort
(Slight deviation from [Andersson-Nilsson (1994)])

Total size of all sorting problems, in all passes, is at most n.

We promised one sorting of size at most n.

Having a collection of smaller problems is in many cases better.

But, in some cases, e.g., if we want to use naïve bucket sort, having one large problem is better.

Slide65

Forward Radix Sort
[Andersson-Nilsson (1994)]

Obtaining one sorting problem of size n.

Perform two phases.

In the first phase, split into sub-groups, but keep the sub-groups in arbitrary order.

Weaker invariant: After the i-th pass, all items in an active group have the same first i characters.

If c is a non-minimal character appearing in the i-th position in some group, add (i, c) to a list.

Slide66

Forward Radix Sort
[Andersson-Nilsson (1994)]

Obtaining one sorting problem of size n.

If c is a non-minimal character appearing in the i-th position in some group, in the i-th pass, then add (i, c) to a list.

The total length of the list is at most n.

After sorting the list, in the second phase, we can run the original algorithm.

Slight problem: i cannot be bounded in terms of n.

Slide67

Forward Radix Sort
[Andersson-Nilsson (1994)]

Obtaining one sorting problem of size n.

If c is a non-minimal character appearing in the i-th position in some group, in the i-th pass, then add (i, c) to a list.

Slight problem: i cannot be bounded in terms of n.

Simple solution: Replace (i, c) by (i′, c), where i′ is the number of passes, at or before the i-th pass, in which at least one group splits.

Now i′ < n and can be encoded using log n bits.

Slide68

Range reduction revisited

We can view each w-bit word as a 2-character string, composed of w/2-bit characters.

Using forward radix sort of Andersson and Nilsson, we get an alternative to the range reduction step of Kirkpatrick and Reisch.

Slide69

Signature Sort
[Andersson-Hagerup-Nilsson-Raman (1998)]

Sorting in O(n) expected time if w ≥ log^(2+ε) n.

Split each w-bit key into […] parts/characters, such that […]-bit keys can be sorted in linear time. We can choose […].

Use a hash function to assign each […]-bit character a unique […]-bit signature.

Form shortened keys by concatenating the signatures of the parts, and sort them in linear time.

Construct a compressed trie of the shortened keys.

Sort the edges of the trie, possibly using recursion.

The keys now appear in the trie in sorted order.

Slide70

Compressed tries

[Figure: example strings and their compressed trie; edge labels include AA, C, DA, FG…, BD…, CA, G and WQ.]

Also known as PATRICIA tries [Morrison (1968)]: “Practical Algorithm To Retrieve Information Coded In Alphanumeric”.

Number of nodes is at most 2n. (Only the root may be unary.)

Exercise: Show that a compressed trie of a sorted collection of strings can be constructed in O(n) time.

Slide71

Signature sort example

[Figure: example strings (uppercase), their signatures (lowercase), and a compressed trie.]

Slide72

Signature sort example

[Figure: the example strings and their signatures, with the compressed trie of the signatures next to the compressed trie of the original strings.]

b < d < c < a < f < g

Slide73

Signature Sort
[Andersson-Hagerup-Nilsson-Raman (1998)]

Sorting in O(n) expected time if w ≥ log^(2+ε) n.

Q: How do we find unique signatures?

Q: How do we sort the shortened keys?
A: Using packed sorting in O(n) time.

Q: How do we construct the trie of the shortened keys? Note that […] is not fast enough!

Q: How do we reorder the trie of the shortened keys to obtain the trie of the original keys?
A: Sort the original first character on each edge. If characters are not short enough use recursion.

Slide74

Signature Sort
[Andersson-Hagerup-Nilsson-Raman (1998)]

The trie has up to […] edges. Why is this […] and not […]?

As we are only going to repeat it a constant number of times, it does not really matter. But we can get down to […].

Use the trick of finding the minimum edge separately and not including it in the sort.

Exercise: Number of chars to sort is exactly […].

Slide75

Signature Sort
[Andersson-Hagerup-Nilsson-Raman (1998)]

As […] we have:

[…]

(This is […] if […].)

If we iterate […] times we get:

[…]

Slide76

Signature Sort
[Andersson-Hagerup-Nilsson-Raman (1998)]

If we iterate […] times we get:

[…]

If […], then […].

If […] is a large enough constant, e.g., […], then […].

Thus, for w ≥ log^(2+ε) n, we can sort in O(n) time.

Slide77

Constructing a compressed trie in O(n) time

Add the strings to the trie one by one.

Suppose that we are about to insert the next string. The left-most path corresponds to the previously inserted string.

Find the longest common prefix of the two strings. As the strings are packed, we can do it in O(1) time.

We may need to add an internal node, unless the common prefix ends at a node.

How do we find the parent of the new internal node?

Slide78

Constructing a compressed trie in O(n) time

How do we find the parent of the new internal node?

We can (probably) use bit tricks to do it in O(1) time.

We can also slowly climb up from the last leaf. Each node we pass exits the left-most path.

Total number of operations is O(n).

Note: Similar to the linear time construction of Cartesian trees.
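The climbing strategy can be sketched with an explicit stack holding the path to the most recently inserted leaf. This is a sketch under assumptions, not the slides' implementation: the names `Node`, `lcp`, `build_compressed_trie` and `leaves` are hypothetical, the keys are assumed to be distinct strings of equal length given in sorted order, and the longest common prefix is computed character by character rather than in O(1) time on packed words. New leaves are appended as the last child here, so the path being climbed is the mirror image of the left-most path mentioned above.

```python
class Node:
    def __init__(self, depth, key=None):
        self.depth = depth    # number of characters spelled out from the root to this node
        self.key = key        # the stored string for a leaf, None for an internal node
        self.children = []


def lcp(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def build_compressed_trie(sorted_keys):
    root = Node(0)
    path = [root]  # nodes from the root down to the most recently inserted leaf
    prev = None
    for key in sorted_keys:
        common = lcp(prev, key) if prev is not None else 0
        last = None
        # Climb up from the last leaf; every popped node leaves the path for good.
        while path[-1].depth > common:
            last = path.pop()
        if path[-1].depth < common:
            # The common prefix ends inside an edge: split it with a new internal node.
            middle = Node(common)
            middle.children.append(last)
            path[-1].children[-1] = middle
            path.append(middle)
        leaf = Node(len(key), key)
        path[-1].children.append(leaf)
        path.append(leaf)
        prev = key
    return root


def leaves(node):
    """Keys stored in the leaves, from first child to last."""
    if node.key is not None:
        return [node.key]
    return [k for child in node.children for k in leaves(child)]


keys = ["ABDL", "CAWQ", "DAAC", "DADA", "DADG"]
trie = build_compressed_trie(keys)
print(leaves(trie))
```

Every node is pushed once and popped at most once, so the climbing costs O(n) stack operations in total, the same amortized argument as in the linear-time Cartesian tree construction.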

Slide79

Computing unique signatures
[Andersson-Hagerup-Nilsson-Raman (1998)]

We have at most […] different characters.

Let […] be an (almost) universal family of hash functions from […] to […].

The expected number of collisions is at most […].

For […], there are no collisions, w.h.p.

Which family of hash functions should we use?

How do we compute the signatures of all characters of a given word in […] time?

Slide80

Multiplicative hash functions
[Dietzfelbinger-Hagerup-Katajainen-Penttonen (1997)]

h_a(x) = (a · x mod 2^k) div 2^(k−ℓ), for odd a with 0 < a < 2^k

Form an “almost-universal” family.

Extremely fast in practice!

Not necessary if
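A multiplicative hash function of this kind, and its use for picking collision-free signatures, can be sketched as follows. This is a hedged illustration: it assumes the multiply-shift form h_a(x) = (a·x mod 2^k) div 2^(k−ℓ) with an odd multiplier a, and the names `multiplicative_hash` and `unique_signatures`, the retry loop and the `max_tries` limit are assumptions of the example rather than part of the slides.

```python
import random


def multiplicative_hash(a, k, l):
    """Return h_a mapping k-bit integers to l-bit values: (a*x mod 2**k) div 2**(k-l)."""
    if a % 2 == 0:
        raise ValueError("the multiplier a must be odd")
    mask = (1 << k) - 1
    return lambda x: ((a * x) & mask) >> (k - l)


def unique_signatures(chars, k, l, rng, max_tries=100):
    """Pick random odd multipliers until all distinct characters get distinct signatures."""
    distinct = set(chars)
    for _ in range(max_tries):
        a = rng.randrange(1, 1 << k, 2)  # a random odd k-bit multiplier
        h = multiplicative_hash(a, k, l)
        signatures = {c: h(c) for c in distinct}
        if len(set(signatures.values())) == len(distinct):
            return a, signatures
    raise RuntimeError("no collision-free multiplier found; try a larger l")


h = multiplicative_hash(5, 8, 3)
print(h(7), h(100))
a, signatures = unique_signatures([3, 17, 42, 99, 1000, 65535], 16, 12, random.Random(0))
```

The only operations per hash value are one multiplication, one mask and one shift. When k equals the word length, the mask comes for free from word overflow.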