
Slide1

Integer Sorting on the word-RAM

Uri Zwick

Tel Aviv University

Started: May 2015. Last update: December 21, 2016.

1Slide2

Integer sorting

Memory is composed of w-bit words, where w ≥ log n.

Arithmetical, logical and shift operations on w-bit words take O(1) time.

How fast can we sort an array of n such words?

 

2Slide3

Comparison based algorithms - Θ(n log n)

Time bounds dependent on w:

O(n log w) - van Emde Boas trees

Time bounds independent of w:

O(n log n / log log n) - [Fredman-Willard (1993)]

Some of these algorithms are randomized and some use multiplications.

O(n log log n) - [Andersson et al. (1998)]

O(n √(log log n)) - [Han-Thorup (2002)]

 

3Slide4

Fundamental open problem

Can we sort in O(n) time, for any w?

 

4Slide5

Variants of Sorting

Each item in the array to be sorted is a w-bit word:
a k-bit key plus info bits.

Sort an array of n w-bit words according to their k-bit keys.

First choice: (Stably) sort the array, or
return a permutation that (stably) sorts the array.

Second choice: the info bits are important, or
the info bits may be destroyed.

Using simple O(n)-time and O(n)-space deterministic reductions.

 

5Slide6

Variants of Sorting

Sort an array of n w-bit words according to k-bit keys.

Using simple O(n)-time and O(n)-space deterministic reductions:

Stability can be obtained by adding a (log n)-bit index to each key.

What if the extended keys no longer fit in a word?
Use radix-sort or "double-precision".

One of the reductions uses hashing.

Thus, all variants are essentially equivalent.
We shall use a single notation to refer to all of them.

 

6Slide7

(Adapted from Cormen, Leiserson, Rivest and Stein, Introduction to Algorithms, Third Edition, 2009, p. 195.)

7Slide8

Backward/LSD Radix sort

(Slides from an undergrad course. Slides 8-14 step through a worked
example on a small set of three-digit numbers, one digit per pass.)

Stably sort according to "digits", starting from the least significant digit.

To sort according to a "digit", use bucket or count sort.

After the i-th pass, the numbers are sorted according to the
i least significant digits.

14Slide15

Backward/LSD Radix Sort in the word-RAM model

Sort according to the b least significant bits of the keys.
Then stably sort according to the remaining k - b bits.

The b least significant bits can be sorted in O(n + 2^b) time
using bucket sort or count sort.

With b = log n, each pass takes O(n) time, so the total running
time is O(n ⌈k / log n⌉).

The running time is O(n) if k = O(log n).

Can we do better?
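To make the passes concrete, here is a minimal Python sketch of LSD radix sort with b-bit "digits", one stable counting sort per pass. Python integers stand in for w-bit words; the function and parameter names are ours, not the slides'.

    # LSD radix sort: stable counting sort on each b-bit digit, least
    # significant digit first.
    def lsd_radix_sort(a, key_bits, b):
        mask = (1 << b) - 1
        for shift in range(0, key_bits, b):
            counts = [0] * (1 << b)
            for x in a:                      # count occurrences of each digit
                counts[(x >> shift) & mask] += 1
            pos, total = [0] * (1 << b), 0
            for d in range(1 << b):          # prefix sums: first slot of each digit value
                pos[d], total = total, total + counts[d]
            out = [0] * len(a)
            for x in a:                      # stable placement
                d = (x >> shift) & mask
                out[pos[d]] = x
                pos[d] += 1
            a = out
        return a

    # Two passes of 4-bit digits sort 8-bit keys.
    print(lsd_radix_sort([170, 45, 75, 90, 2, 24, 66], key_bits=8, b=4))

With b = log n each pass costs O(n), matching the O(n ⌈k / log n⌉) bound above.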

15Slide16

Two techniques

Range reduction:
Reduce sorting long keys to sorting shorter keys,
using only O(n) extra work.

Packed sorting (word-level parallelism):
When keys are short, pack many keys into each word.
In O(1) time we can then perform simple operations on all
the keys packed into a word.

16Slide17

Four results

We will cover the following results:

A randomized O(n log log n) sorting algorithm
[Andersson-Hagerup-Nilsson-Raman (1998)]

A randomized O(n log(w / log n)) sorting algorithm
[Kirkpatrick-Reisch (1984)]

Sorting in O(n) expected time if w ≥ (log n)^(2+ε)
[Andersson-Hagerup-Nilsson-Raman (1998)]

Sorting strings whose characters are w-bit words, in O(n + D) time plus
the time needed to sort the D distinguishing characters
[Andersson-Nilsson (1994)]

17Slide18

Reminder: Bucket Sort

Each item in the array to be sorted is a w-bit word: a k-bit key plus info bits.

Initialize an array B of 2^k lists/buckets. Each list is initially empty.

Sequentially scan the items in the input array A.
Append each item to the bucket indexed by its key.

Sequentially scan the buckets and copy the items back to A.

Time and space: O(n + 2^k).
The 2^k term is the time and space for initializing and scanning the buckets.

18Slide19

Bucket Sort using hashing

Each item in the array to be sorted is a w-bit word: a k-bit key plus info bits.

Initialize H to be an empty hash table.
Initialize L to be an empty list.

Sequentially scan the items in the input array A.
If the key of the current item is not in H, insert it into H,
append it to L, and initialize its bucket to be an empty list.
Append the item to its bucket.

Sort L to get a sorted list of non-empty buckets.

Scan the non-empty buckets, in sorted order, and copy the items back to A.

Space: O(n).
Time: O(n), plus the time needed to sort the m keys in L,
where m is the number of non-empty buckets.
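A minimal Python sketch of the hashed variant, with a dict standing in for the hash table H; plain sorted() stands in for the (possibly recursive) sort of the m distinct keys, and all names are illustrative.

    def hashed_bucket_sort(items, key):
        buckets = {}            # hash table: key -> its bucket (a list)
        distinct = []           # the list L of distinct keys, in order of first appearance
        for x in items:
            k = key(x)
            if k not in buckets:
                buckets[k] = []
                distinct.append(k)
            buckets[k].append(x)            # appending keeps the sort stable
        out = []
        for k in sorted(distinct):          # sort only the m non-empty bucket keys
            out.extend(buckets[k])
        return out

    # Stably sort (key, info) pairs by key.
    print(hashed_bucket_sort([(3, 'a'), (1, 'b'), (3, 'c'), (0, 'd')], key=lambda p: p[0]))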

 

19Slide20

Range reduction [Kirkpatrick-Reisch (1984)]

Split each 2b-bit key into two b-bit parts: high and low.

Sort (recursively) according to the high part.

Sort (recursively) according to the low part.

Cleverly combine the two sorting steps into one.

20Slide21

Range reduction [Kirkpatrick-Reisch (1984)]

First attempt: low part first

Split each 2b-bit key into two b-bit parts, high and low.

Sort (recursively) according to the low part.

Use hashed bucket sort to stably sort according to the high part.
This involves another recursive call on m keys, where m is the number
of distinct high parts, i.e., of non-empty buckets.

In the worst case, if all high parts are different, then m = n.

But if all, or most, high parts are different,
we do not really need to sort according to the low part.

(Same as standard radix sort.)

 

 

21Slide22

Range reduction [Kirkpatrick-Reisch (1984)]

Second attempt: high part first

Split each 2b-bit key into two b-bit parts, high and low.

Throw the items into (hashed) buckets indexed by the high part.
Maintain a list of non-empty buckets.
Let m be the number of non-empty buckets, and let n_i be the number
of items in the i-th non-empty bucket.

Sort the list of non-empty buckets (a recursive call on m keys).

Sort the items in each non-empty bucket according to the low part
(a recursive call on n_i keys for each bucket).

Can all these recursive calls be combined into a single one?

22Slide23

Range reduction [Kirkpatrick-Reisch (1984)]

Third (and final) attempt: high and low together

Split each 2b-bit key into two b-bit parts, high and low.

Maintain (hashed) buckets indexed by the high part.
In each bucket, keep only the item with the smallest low part seen so far.

When the first item with a given high value is encountered,
add that high value to a list.
Add the low part of every non-minimal item to the same list.

Sort the list (recursively).

From the sorted list extract a sorted list of non-empty buckets.
(When a high value is encountered, check whether its bucket was already
added to the list of non-empty buckets, and add it if it was not.)

Throw all the other items of the sorted list into the appropriate buckets.

Concatenate the lists of the non-empty buckets.
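Below is a minimal Python sketch of one level of the reduction, for plain keys (no info bits). The recursive sort of the n b-bit values in the list is replaced here by a plain sort, and all names are illustrative.

    def kr_reduce_and_sort(keys, b):
        lo_mask = (1 << b) - 1
        buckets = {}   # high part -> bucket; the bucket minimum is kept at index 0
        sub = []       # the list: one 'hi' entry per bucket + every non-minimal low part
        for x in keys:
            hi, lo = x >> b, x & lo_mask
            if hi not in buckets:
                buckets[hi] = [x]                      # first item of this bucket
                sub.append(('hi', hi, hi))
            else:
                cur = buckets[hi][0]
                if lo < (cur & lo_mask):               # new minimum; old one becomes non-minimal
                    buckets[hi][0] = x
                    sub.append(('lo', hi, cur & lo_mask))
                else:
                    sub.append(('lo', hi, lo))
        sub.sort(key=lambda t: t[2])                   # the single recursive call (|sub| = n values of b bits)
        order = []
        for kind, hi, val in sub:
            if kind == 'hi':
                order.append(hi)                       # buckets appear in sorted order of high part
            else:
                buckets[hi].append((hi << b) | val)    # non-minimal items arrive with lows already sorted
        return [x for hi in order for x in buckets[hi]]

    print(kr_reduce_and_sort([0b1110, 0b0101, 0b1101, 0b0110, 0b0111], b=2))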

23Slide24

Range reduction [Kirkpatrick-Reisch (1984)]

Third (and final) attempt: high and low together (cont.)

The algorithm sorts correctly.
The length of the list is exactly n.

Thus, in O(n) time, sorting n keys of 2b bits is reduced to
sorting n keys of b bits.

The same complexity may be obtained using van Emde Boas trees.

It can work with fields (see pseudo-code).

We can then pack twice as many keys into each word!

24Slide25

 

(Pseudo-code for the combined range-reduction step; only its structure
and comments remain legible:

// Throw the items into hashed buckets and build the list,
//   initializing a bucket when a high value is first seen
// Sort the list according to value
// Generate the sorted list of non-empty buckets from the sorted list
// Concatenate the non-empty buckets)

 

25Slide26

 

(A corrected version of the same pseudo-code, marked "(Code fixed)" on the
slide, in which the non-empty buckets are kept in a list and the buckets
themselves in a hash table; again only the structural comments remain.)

26Slide27

Packed representation

A word holds k keys of b bits each. Each key is preceded by a test bit,
initially 0, so each field occupies b + 1 bits.

We can easily take two such words and produce a word that contains
2k b-bit keys.

 

27Slide28

Packed representation

Useful constants, e.g., a word with a 1 in the test-bit position of every
field and 0 elsewhere:

1 00…0 1 00…0 1 … 1 00…0

Exercise: How quickly can these constants be computed?

 

28Slide29

Packed representation

Useful operation: comparing all the packed fields of two words at once,
by setting the test bits of one word and subtracting the other;
each test bit of the result records the outcome of one comparison.
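A minimal Python sketch of this word-level parallel comparison, under the layout above (b data bits plus one test bit per field). The constant M and the ">=" semantics are the standard form of this trick; the exact operation on the original slide may differ slightly.

    b = 7
    F = b + 1                                          # bits per field, incl. test bit
    k = 8                                              # fields per word
    M = sum(1 << (i * F + b) for i in range(k))        # 1 in every test-bit position

    def pack(keys):
        return sum(x << (i * F) for i, x in enumerate(keys))   # test bits left as 0

    def parallel_ge(X, Y):
        # Per-field (X_i >= Y_i): setting the test bits of X prevents borrows
        # from crossing field boundaries during the subtraction.
        return ((X | M) - Y) & M

    R = parallel_ge(pack([5, 17, 100, 3]), pack([9, 17, 42, 4]))
    print([(R >> (i * F + b)) & 1 for i in range(4)])          # [0, 1, 1, 0]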

 

29Slide30

Packed Sorting [Paul-Simon (1980)] [Albers-Hagerup (1997)]

Sort n b-bit keys on a machine with w-bit words.

Partition the items into groups of size k, where k keys
(with their test bits) fit into a single word.

Sort each group naïvely. (For the time being; see the exercise on Slide 50.)

Pack each group into a single word.

The time required for this preliminary step is not the bottleneck
when k is small.

30Slide31

(Packed) Merge Sort

 

 

 

 

31Slide32

Packed Merge Sort [Paul-Simon (1980)] [Albers-Hagerup (1997)]

Merge packed sorted sequences of length k into sorted sequences of
length 2k, then of length 4k, etc., until a single sorted sequence
of length n is obtained.

As a basic operation, use the merging of two sorted sequences of k keys,
each packed into a single word.

We shall implement this basic operation in O(log k) time,
by simulating a bitonic sorting network.

 

32Slide33

Packed Merge Sort [Paul-Simon (1980)] [Albers-Hagerup (1997)]

Standard merge sort takes O(n log n) time.

We save a factor of k / log k, as we merge k items in O(log k),
rather than O(k), time.

Thus, the running time is O((n log n / k) log k + n).

For k ≥ log n log log n, the running time is O(n).

 

33Slide34

Packed Merge Sort [Paul-Simon (1980)] [Albers-Hagerup (1997)]

(Slides 34-37 animate the basic merging step:)

Merge the smallest k items from both sequences.
The smallest k of them go to the output sequence.

 

37Slide38

Packed Merge Sort [Paul-Simon (1980)] [Albers-Hagerup (1997)]

Small technical detail: We need to know how many of the k smallest
items came from each sequence.

 

38Slide39

Packed Merge Sort [Paul-Simon (1980)] [Albers-Hagerup (1997)]

Simple solution: Add a bit to each key, telling which sequence it is
coming from. Count the number of keys coming from each sequence.
(How do we count?)

39Slide40

Batcher's bitonic sort

To merge two packed sorted sequences of k keys each,
we use Batcher's bitonic sorter.

We need to reverse one of the sequences and concatenate it
to the other sequence.

Suppose that k is a power of 2.
Bitonic sorting of the resulting 2k keys is composed of log(2k) iterations.

In iteration i, we compare/swap items whose indices differ
only in their i-th bit.

 

40Slide41

One step of bitonic sort (1)

Compare/swap items whose indices differ in the i-th bit.

Extract the items whose indices have a 1 in their i-th bit,
and the items whose indices have a 0 in their i-th bit.

41Slide42

One step of bitonic sort (2)

Shift the first word 2^i fields to the right,
and set their test bits to 1.

 

42Slide43

One step of bitonic sort (3)

Shift the first word 2^i fields to the right, and set their test bits to 1.

Subtract the other word from it.
The test bit of each field now records the outcome of the corresponding
comparison.

43Slide44

One step of bitonic sort (4)

Using the resulting test bits, collect the winners and the losers of the
compare/swap operations into separate words.

44Slide45

One step of bitonic sort (5)

Shift the winners 2^i fields back to the left.

45Slide46

One step of bitonic sort (6)

Shift the winners 2^i fields back to the left.

Combine the two words together again.

The i-th step is over!
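Putting the six steps together, here is a minimal Python sketch of a word-level bitonic merge. Masks are recomputed per round for clarity (a real O(log k) implementation would precompute them), and all names are illustrative.

    def bitonic_merge_packed(word, m, b):
        # Sorts a packed *bitonic* sequence of m = 2^t keys, b data bits plus
        # one test bit per field, field 0 in the least significant position.
        F, ones = b + 1, (1 << b) - 1
        d = m // 2
        while d >= 1:
            sel = [j for j in range(m) if (j // d) % 2 == 0]   # lower index of each pair
            fields = sum(ones << (j * F) for j in sel)
            tests = sum(1 << (j * F + b) for j in sel)
            A = word & fields                                  # items at the lower indices
            B = (word >> (d * F)) & fields                     # their partners, shifted to align
            ge = ((((A | tests) - B) & tests) >> b) * ones     # all-ones in fields where A_j >= B_j
            winners = (A & ~ge) | (B & ge)                     # smaller key of each pair
            losers = (A & ge) | (B & ~ge)                      # larger key of each pair
            word = winners | (losers << (d * F))               # winners low, losers high
            d //= 2
        return word

    def pack(keys, b):
        return sum(x << (i * (b + 1)) for i, x in enumerate(keys))

    def unpack(word, m, b):
        return [(word >> (i * (b + 1))) & ((1 << b) - 1) for i in range(m)]

    # Merge two sorted runs of 4 keys: reverse the second run so the packed
    # sequence is bitonic, then merge.
    xs, ys = [2, 5, 9, 14], [1, 6, 7, 30]
    word = pack(xs + ys[::-1], b=7)
    print(unpack(bitonic_merge_packed(word, m=8, b=7), m=8, b=7))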

 

46Slide47

Packed Bitonic Sort [Albers-Hagerup (1997)]

for i = log(2k) - 1 downto 0:
    perform one compare/swap step on the pairs of items whose indices
    differ only in their i-th bit, as described above.

48Slide49

Reversing the fields in a word

For i = 0, 1, …, log k - 1, in any order, swap the fields with a 0 in the
i-th bit of their index with the fields 2^i positions to the left.

Similar to the implementation of bitonic sort.
We already know how to do it.

Exercise: Show that this indeed reverses the fields.
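A minimal Python sketch of the reversal, assuming k is a power of two and the same field layout as above (b data bits plus one test bit per field).

    def reverse_fields(word, k, b):
        F = b + 1
        field = (1 << F) - 1
        d, i = 1, 0
        while d < k:
            low = sum(field << (j * F) for j in range(k) if ((j >> i) & 1) == 0)
            high = low << (d * F)
            # Swap each field having a 0 in bit i of its index with the field
            # 2^i positions above it.
            word = ((word & low) << (d * F)) | ((word & high) >> (d * F))
            d, i = d * 2, i + 1
        return word

    def pack(keys, b):
        return sum(x << (j * (b + 1)) for j, x in enumerate(keys))

    def unpack(word, k, b):
        return [(word >> (j * (b + 1))) & ((1 << b) - 1) for j in range(k)]

    print(unpack(reverse_fields(pack([1, 2, 3, 4, 5, 6, 7, 8], 7), 8, 7), 8, 7))
    # -> [8, 7, 6, 5, 4, 3, 2, 1]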

49Slide50

Packed Merge Sort

We began by splitting the input numbers into groups of size k,
naively sorting them, and then packing them into words.

This is good enough for obtaining the claimed running time,
but the naïve sorting is clearly not optimal.

Exercise: Show that k integers, each of b bits, packed into a single
word, can be sorted in O(log^2 k) time.

 

50Slide51

Integer Sorting in O(n log log n) time
[Andersson-Hagerup-Nilsson-Raman (1998)]

Putting everything together, we get a randomized O(n log log n)-time
sorting algorithm for any w ≥ log n.

How much space are we using?
If the recursion stack is managed carefully,
the algorithm uses only O(n) space.

Are we using multiplications?
Yes! In the hashing.
Non-AC^0 operations are required to get O(1) search time.

 

51Slide52

Sorting strings/multi-precision integers

We have n strings of arbitrary length. Each character is a w-bit word.

We want to sort them lexicographically.

Let D be the number of characters that must be examined
to determine the order of the strings.

The problem can be reduced in O(n + D) time to the problem
of sorting D characters!

We thus get an algorithm whose running time is O(n + D)
plus the time needed to sort D characters.

 

52Slide53

Sorting strings/multi-precision integers

(Slides 53-55 show a worked example with a small set of strings,
drawn as columns of characters.)

We move pointers to the strings, not the strings themselves.

It is necessary and sufficient to examine the distinguishing characters.

55Slide56

Forward Radix Sort [Andersson-Nilsson (1994)]

(Slides 56-67 animate forward radix sort on the example strings,
one character position per pass.)

After the i-th pass, the strings are sorted according to their
first i characters.

The strings are partitioned into groups.
We keep the starting/end positions of each group.
Groups are active or inactive.

67Slide68

Forward Radix Sort [Andersson-Nilsson (1994)]

The i-th pass:

Sequentially scan the items in the active groups.
Append each item to the bucket indexed by its i-th character.
(The buckets are shared by all groups.
Each item remembers the group it belongs to.)

Scan the non-empty buckets, in increasing order.
"Empty" each active group.
Append each item to its group.

How do we find the non-empty buckets?
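A minimal Python sketch in the spirit of the pass above: groups of strings that still agree on their first i characters are refined one position at a time. A dict stands in for the hashed buckets and plain sorted() for sorting a pass's distinct characters; group bookkeeping is simplified to (start, end) ranges over an index array.

    def forward_radix_sort(strings):
        order = list(range(len(strings)))       # current arrangement, as indices
        groups = [(0, len(order))]              # active groups
        i = 0
        while groups:
            new_groups = []
            for start, end in groups:
                buckets = {}
                for idx in order[start:end]:    # bucket by the i-th character
                    c = strings[idx][i] if i < len(strings[idx]) else ''
                    buckets.setdefault(c, []).append(idx)
                pos = start
                for c in sorted(buckets):       # scan the non-empty buckets in order
                    members = buckets[c]
                    order[pos:pos + len(members)] = members
                    if len(members) > 1 and c != '':
                        new_groups.append((pos, pos + len(members)))   # still active
                    pos += len(members)
            groups, i = new_groups, i + 1
        return [strings[idx] for idx in order]

    print(forward_radix_sort(["DADADA", "AACA", "DAD", "FGQ", "AAB", "DADAC"]))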

68Slide69

Forward Radix Sort
(Slight deviation from [Andersson-Nilsson (1994)])

The i-th pass:

Consider each active group separately.
Use hashing to determine the different characters appearing
in the i-th position.
If there are g different characters, the number of groups
increases by g - 1.

Sort the non-minimal characters.

The total size of all the sorting problems, over all passes,
is at most D.

 

69Slide70

Forward Radix Sort
(Slight deviation from [Andersson-Nilsson (1994)])

The total size of all the sorting problems, over all passes, is at most D.

Having a collection of smaller problems is almost always better.
We promised, however, one sorting problem of size at most D.

But, in some cases, e.g., if we want to use naïve bucket sort,
with a large initialization cost, having one large problem is better,
and in a sense "cleaner".

(Actually, removing the minimal characters is not really faster,
as usually it changes the total size by only a small amount.)

 

70Slide71

Forward Radix Sort [Andersson-Nilsson (1994)]

Obtaining one sorting problem of size D:

Perform two phases.
In the first phase, split into sub-groups,
but keep the sub-groups in arbitrary order.

Weaker invariant: After the i-th pass, all items in an active group
have the same first i characters. Items in different groups do not
have the same first i characters.

If c is a non-minimal character appearing in the i-th position
in some group, add the pair (i, c) to a list.

 

71Slide72

Forward Radix Sort [Andersson-Nilsson (1994)]

Obtaining one sorting problem of size D:

If c is a non-minimal character appearing in the i-th position in some
group, in the i-th pass, then add the pair (i, c) to a list.

The total length of the list is at most D.

After sorting the list, in the second phase,
we can run the original algorithm.

Slight problem: i cannot be bounded in terms of n.

 

72Slide73

Forward Radix Sort [Andersson-Nilsson (1994)]

Obtaining one sorting problem of size D:

If c is a non-minimal character appearing in the i-th position in some
group, in the i-th pass, then add the pair (i, c) to a list.

Slight problem: i cannot be bounded in terms of n.

Simple solution: Replace i by j, where j is the number of passes,
at or before the i-th pass, in which at least one group splits.

Now j ≤ n, and j can be encoded using O(log n) bits.

73Slide74

Range reduction revisited

We can view each w-bit word as a 2-character string,
composed of two (w/2)-bit characters.

Using the forward radix sort of Andersson and Nilsson, we get an
alternative to the range reduction step of Kirkpatrick and Reisch.

 

 

74Slide75

Signature Sort [Andersson-Hagerup-Nilsson-Raman (1998)]

Sorting in O(n) expected time if w ≥ (log n)^(2+ε).

Split each w-bit key into q parts/characters, such that keys of
q·O(log n) bits can be sorted in linear time by packed sorting.

Use a hash function to assign each (w/q)-bit character a unique
O(log n)-bit signature.

Form shortened keys by concatenating the signatures of the parts,
and sort them in linear time.

Construct a compressed trie of the shortened keys.

Sort the edges of the trie, possibly using recursion.

The keys now appear in the trie in sorted order.

75Slide76

Compressed tries

(The slide shows the compressed trie of the example strings.)

Also known as PATRICIA tries [Morrison (1968)]:
"Practical Algorithm To Retrieve Information Coded In Alphanumeric".

The number of nodes is at most 2n. (Only the root may be unary.)

A node is colored yellow if it corresponds to an input string.
In our case, all strings have the same length,
so no string is a prefix of another.

Exercise: Show that a compressed trie of a sorted collection of
strings can be constructed in O(n) time.

76Slide77

Signature sort example

(Slides 77-78 show a worked example: each character of the input strings
is replaced by its signature, drawn in lowercase; the shortened keys are
sorted and a compressed trie of them is built; the trie edges are then
re-sorted according to the original characters, which in the example
gives the order b < d < c < a < f < g on the signatures.)

78Slide79

Signature Sort [Andersson-Hagerup-Nilsson-Raman (1998)]

Sorting in O(n) expected time if w ≥ (log n)^(2+ε).

Q: How do we find unique signatures?

Q: How do we sort the shortened keys?
A: Using packed sorting, in O(n) time.

Q: How do we construct the trie of the shortened keys?
(Note that the obvious approach is not fast enough!)

Q: How do we reorder the trie of the shortened keys to obtain
the trie of the original keys?
A: Sort the original first character on each edge.
If the characters are not short enough, use recursion.

79Slide80

Signature Sort [Andersson-Hagerup-Nilsson-Raman (1998)]

The trie has up to about 2n edges.
Why is the resulting sorting problem of size roughly 2n, and not n?

As we are only going to repeat it a constant number of times,
it does not really matter. But we can get down to n:

Use the trick of finding the minimum edge at each node
and not including it in the sort.

Exercise: The number of characters to sort is then at most n.

 

80Slide81

Signature Sort [Andersson-Hagerup-Nilsson-Raman (1998)]

One application of the reduction costs O(n) expected time and replaces
sorting the n original keys by sorting n characters that are q times
shorter.

(This is already O(n) overall if the shorter characters can be sorted
directly by packed sorting.)

We now iterate the reduction a constant number of times.

 

 

81Slide82

Signature Sort [Andersson-Hagerup-Nilsson-Raman (1998)]

After iterating the reduction a constant number of times, the remaining
characters are short enough to be sorted directly by packed sorting,
in O(n) time.

Thus, for w ≥ (log n)^(2+ε), we can sort in O(n) expected time.

 

82Slide83

Constructing a compressed trie in O(n) time

Add the sorted strings to the trie one by one.
The left-most path corresponds to the smallest string.

Suppose that we are about to insert the next string.

Find the longest common prefix of the new string and the previously
inserted string.
As the strings are packed, we can do it in O(1) time.

We may need to add an internal node,
unless the common prefix ends at an existing node.

How do we find the parent of the new internal node?

83Slide84

Constructing a compressed trie in O(n) time

How do we find the parent of the new internal node?

We can (probably) use bit tricks to do it in O(1) time.

We can also slowly climb up from the last leaf.
Each node we pass exits the left-most path, never to return,
so the total number of operations is O(n).

Note: Similar to the linear-time construction of Cartesian trees.
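A minimal Python sketch of the incremental construction from strings given in sorted order, assuming no string is a prefix of another (as in our case). The longest common prefix is found character by character here; on packed words it would be a single XOR plus a most-significant-set-bit step.

    class Node:
        def __init__(self, depth):
            self.depth = depth        # characters from the root
            self.children = []        # (first character of edge, child), in order
            self.parent = None
            self.string = None        # set at leaves

    def build_compressed_trie(sorted_strings):
        root, last = Node(0), None
        for s in sorted_strings:
            leaf = Node(len(s)); leaf.string = s
            if last is None:
                root.children.append((s[0], leaf)); leaf.parent = root
            else:
                lcp = 0                                    # longest common prefix with the previous string
                while lcp < min(len(s), len(last.string)) and s[lcp] == last.string[lcp]:
                    lcp += 1
                v = last                                   # climb from the last leaf;
                while v.parent is not None and v.parent.depth >= lcp:
                    v = v.parent                           # each node is passed at most once overall
                if v.depth > lcp:                          # split the edge entering v: new internal node
                    u = Node(lcp); u.parent = v.parent
                    for j, (c, w) in enumerate(v.parent.children):
                        if w is v:
                            v.parent.children[j] = (c, u)
                    u.children.append((last.string[lcp], v)); v.parent = u
                    v = u
                v.children.append((s[lcp], leaf)); leaf.parent = v
            last = leaf
        return root

    def leaves(v):                     # depth-first: strings come out in sorted order
        return [v.string] if v.string is not None else [x for _, w in v.children for x in leaves(w)]

    print(leaves(build_compressed_trie(["AAB", "AAC", "DAD", "DAF", "FGQ"])))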

84Slide85

Computing unique signatures
[Andersson-Hagerup-Nilsson-Raman (1998)]

We have at most nq different characters.

Let H be an (almost) universal family of hash functions
from (w/q)-bit characters to l-bit signatures.

The expected number of collisions is at most about (nq)^2 / 2^l.

For l = Θ(log n), there are no collisions, w.h.p.

Which family of hash functions should we use?

How do we compute the signatures of all the characters of a given
word in O(1) time?
Really? And what do we do if we cannot?

85Slide86

Multiplicative hash functions
[Dietzfelbinger-Hagerup-Katajainen-Penttonen (1997)]

Extremely fast in practice!

The "almost-universal" family maps w-bit keys to l-bit values:

    h_a(x) = (a · x mod 2^w) div 2^(w-l),   where a is odd, 0 < a < 2^w.

(The explicit mod 2^w is not necessary if the machine's multiplication
already works modulo 2^w.)

86
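A minimal Python sketch of this family, with illustrative parameter values.

    import random

    w, l = 64, 16                         # word length and signature length

    def random_multiplicative_hash():
        a = random.getrandbits(w) | 1     # a random odd w-bit multiplier
        def h(x):
            return ((a * x) & ((1 << w) - 1)) >> (w - l)   # (a*x mod 2^w) div 2^(w-l)
        return h

    h = random_multiplicative_hash()
    print(h(123456789), h(987654321))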