
Equivalence - PowerPoint Presentation

Uploaded On 2016-11-05

Equivalence Between Priority Queues and Sorting in External Memory. Zhewei Wei (Renmin University of China; MADALGO, Aarhus University) and Ke Yi (The Hong Kong University of Science and Technology).




Presentation Transcript

Slide1

Equivalence Between Priority Queues and Sorting in External Memory

Zhewei Wei
Renmin University of China
MADALGO, Aarhus University

Ke Yi
The Hong Kong University of Science and Technology

Slide2

Priority Queue

Maintain a set of keys
Support insertions, deletions and findmin (deletemin)
Fundamental data structure
Used as a subroutine in greedy algorithms
Dijkstra’s single-source shortest path algorithm
Prim’s minimum spanning tree algorithm

Slide3

Sorting to Priority Queue

A priority queue can do sorting
Given N unsorted keys:
Insert the keys into the priority queue
Perform N deletemin operations (find the minimum and delete it)
If a priority queue supports insertion, deletion and findmin in S(N) time, then the sorting algorithm runs in O(N*S(N)) time.

Slide4
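The sorting reduction described above (insert N keys, then perform N deletemin operations) can be sketched in a few lines. This is only an illustration: it uses Python's binary heap from `heapq` as a stand-in priority queue, so here S(N) = O(log N), and the function name `pq_sort` is hypothetical.

```python
import heapq


def pq_sort(keys):
    """Sort keys using only priority-queue operations."""
    pq = []
    for key in keys:
        # N insertions, each costing S(N)
        heapq.heappush(pq, key)
    # N deletemin operations, each costing S(N)
    return [heapq.heappop(pq) for _ in range(len(pq))]
```

Any priority queue with the same interface could replace the heap; the total running time is then N times the per-operation cost of that queue.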

Priority Queue to Sorting

Thorup [2007]: sorting can do priority queue!
A sorting algorithm sorts N keys in N*S(N) time in the RAM model
A priority queue supports all operations in O(S(N)) time
Use the sorting algorithm as a black box
O(N loglog N) sorting -> O(loglog N) priority queue
O() sorting -> O() priority queue

Slide5

The I/O Model [Aggarwal and Vitter 1988]

CPU
Memory (size: M)
Disk (unlimited size)
Block (size: B)
Complexity: # of block transfers (I/Os)
CPU computations and memory accesses are free

Slide6

Cache-Oblivious Model

CPU
Memory (size: ?)
Disk (unlimited size)
Block (size: ?)
Optimal without knowledge of M and B
Optimal for all M and B

Slide7

Sorting in the I/O Model

Sorting bound: Sort(N) = Θ((N/B) * log_{M/B} N) I/Os
Upper bound: external merge sort
Lower bound: holds for the comparison model or under the indivisibility assumption (treat keys as atoms)
Conjecture: the lower bound holds for B not too small, even without the indivisibility assumption

Slide8
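To get a feel for the sorting bound above, the expression (N/B) * log_{M/B} N can be evaluated for concrete parameters. This is a rough sketch that ignores the constant hidden by the Θ; the function name `sort_io_bound` and the sample values of N, M and B are assumptions chosen for illustration.

```python
import math


def sort_io_bound(n, m, b):
    """Evaluate (N/B) * log_{M/B} N, ignoring constant factors."""
    if m < 2 * b:
        raise ValueError("memory must hold at least two blocks")
    fanout = m / b          # base of the logarithm, M/B
    return (n / b) * math.log(n, fanout)


# Example: N = 2^30 keys, M = 2^20 keys of memory, B = 2^10 keys per block.
# log_{2^10}(2^30) = 3, so the bound is about 3 * 2^20 block transfers.
example = sort_io_bound(2**30, 2**20, 2**10)
```

For comparison, simply reading the input once takes N/B = 2^20 block transfers, so the logarithmic factor in this example is only 3.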

Priority Queue in External Memory

Tree-based priority queues in the I/O model, O((1/B) * log_{M/B} N) amortized cost:
Buffer tree [Arge 1995]
M/B-ary heaps [Fadel et al. 1999]
Array heaps [Brodal and Katajainen 1998]
Tree-based: do not give any priority queue-to-sorting reduction

Slide9

Priority Queue in External Memory

Cache-oblivious priority queue [Arge et al. 2002]
O((1/B) * log_{M/B} N) with the tall cache assumption M > B^2
Keys are moving around in loglog N levels
Reduction: given an external sorting algorithm that sorts N keys in N*S(N)/B I/Os, there is an external priority queue that supports all operations in O(S(N) loglog N / B) amortized I/Os

Slide10

Our Results

A sorting algorithm sorts N keys in N*S(N)/B I/Os in the I/O model
Use the sorting algorithm as a black box
A priority queue supports all operations in (1/B) * Σ_{i≥0} S(B*log^(i)(N/B)) amortized I/Os
The sum is S(N) + S(B*log N) + S(B*loglog N) + …
O(S(N)/B) for S(N) = Ω(2^(log* N)), or M = Ω(B*log^(c) N)
Otherwise O((S(N) log* N)/B)
No new bounds for external priority queues
External priority queue lower bound -> external sorting lower bound

Slide11
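The sum in the result above, (1/B) * Σ_{i≥0} S(B*log^(i)(N/B)), can be made concrete with a small sketch. The helper names `layer_sizes` and `amortized_io_cost`, the use of base-2 logarithms, and the rule of stopping once the iterated logarithm drops to 1 are all assumptions for illustration; `s` is a hypothetical caller-supplied function standing in for S.

```python
import math


def layer_sizes(n, b):
    """Return the sizes B*log^(i)(N/B) for i = 0, 1, 2, ...

    Assumption: the list stops once the iterated logarithm is at most 1.
    """
    sizes = []
    x = n / b               # log^(0)(N/B) = N/B, so the first size is N
    while x > 1:
        sizes.append(b * x)
        x = math.log2(x)    # next iterated logarithm
    return sizes


def amortized_io_cost(n, b, s):
    """Evaluate (1/B) * sum of S(size) over all layer sizes."""
    return sum(s(size) for size in layer_sizes(n, b)) / b
```

For N = 2^26 and B = 2^10 the sizes are 2^26, 16*B, 4*B and 2*B, so only four terms contribute; the number of terms grows like log* N.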

Outline

How Thorup did it (on a high level)
How we extend it in external memory (on a high level)
Open problems

Slide12

Thorup’s Reduction

Word RAM model:
each word consists of w ≥ log N bits
constant number of registers, each with capacity for one word
Atomic heap [Han 2004]: supports insertions, deletions, and predecessor queries in a set of O(log^2 N) size in constant time

Slide13

Thorup’s Reduction – O(S(N)*log N)

Level sizes: c keys, 2c keys, …, N/4 keys, N/2 keys, N keys
O(log N) levels
Keep the min in the head
Invariant: keys in a higher level are larger than keys in a lower level

Slide14

Thorup’s Reduction – O(S(N)*log N)

Level sizes: c keys, 2c keys, …, N/4 keys, N/2 keys, N keys
O(log N) levels
Rebalance cost for level 2^j: 2^j * S(N)
# of sorts in N updates: N/2^j
Amortized cost in level 2^j: S(N)
log N levels
Cost: O(S(N)*log N)

Slide15
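The O(S(N)*log N) accounting above can be checked numerically: a level holding 2^j keys costs 2^j * S(N) per rebalance and is rebalanced N/2^j times during N updates, so every level contributes S(N) per update. The sketch below just adds these contributions; the function name `simple_reduction_cost` and the choice to treat S(N) as a plain number `s_of_n` are assumptions.

```python
import math


def simple_reduction_cost(n, s_of_n):
    """Amortized cost per update of the basic level structure."""
    levels = int(math.log2(n))
    total = 0.0
    for j in range(levels):
        rebalance_cost = 2**j * s_of_n   # one rebalance of the level with 2^j keys
        rebalances = n / 2**j            # rebalances of that level in N updates
        total += rebalance_cost * rebalances
    return total / n                     # average over the N updates
```

With n = 1024 and s_of_n = 3.0 each of the 10 levels contributes 3.0, giving 30.0 = S(N) * log N.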

Thorup’s Reduction

Base sets of size log N
Levels hold 1 base set, 2 base sets, …, N/(4 log N) base sets, N/(2 log N) base sets, N/log N base sets
O(log N) levels
Split/merge base sets: S(N) amortized
Rebalancing level 2^j: 2^j * S(N)/log N
# of rebalances in N updates: N/2^j
Amortized cost for level 2^j: S(N)/log N
O(S(N)) amortized cost

Slide16

Thorup’s Reduction

Atomic heap of size log N: O(1) cost
Base sets of size log N
Levels hold 1 base set, 2 base sets, …, N/(4 log N) base sets, N/(2 log N) base sets, N/log N base sets
Split/merge base sets: S(N) amortized
Rebalancing level 2^j: 2^j * S(N)/log N
# of rebalances in N updates: N/2^j
Amortized cost for level 2^j: S(N)/log N
O(S(N)) amortized cost

Slide17

Thorup’s Reduction

Amortized cost: O(S(N))
Atomic heap of size log N: O(1) cost
Levels hold 1 base set, 2 base sets, …, N/(4 log N) base sets, N/(2 log N) base sets, N/log N base sets
Buffer sizes: …, N/(4 log N), N/(2 log N), N/log N
Levels and buffers: O(S(N)) amortized cost

Slide18

Externalize Thorup’s Reduction

Where does B come in?

How to replace atomic heap?

How to handle deletions in external memory?

Slide19

Where does B come in?

Buffer of size B*log N
Base set size: B*log N
Levels hold 1 base set, 2 base sets, …, N/(4B log N) base sets, N/(2B log N) base sets, N/(B log N) base sets
Buffer sizes: …, N/(4 log N), N/(2 log N), N/log N

Slide20

I/O-efficient Flush Operation

Buffer size: |R|
k substructures
Sort keys in buffer: O(R*S(R)/B)
Distribute keys to k substructures: O(R/B + k)
Total I/O cost: O(R*S(N)/B + k)
If k = O(R/B), the total flush cost is O(R*S(N)/B) and the amortized cost is O(S(N)/B)

Slide21
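The two steps of the flush operation above (sort the buffer, then distribute the keys to k substructures) can be sketched in memory. This is not the external-memory procedure itself: Python lists stand in for disk-resident buffers, `sorted` stands in for the black-box sorting algorithm, and the name `flush` and the `separators` argument (k - 1 sorted keys that split the key range among the k substructures) are assumptions.

```python
from bisect import bisect_left


def flush(buffer, separators):
    """Sort the buffer, then route each key to its substructure.

    Substructure i receives the keys in (separators[i-1], separators[i]].
    """
    parts = [[] for _ in range(len(separators) + 1)]
    for key in sorted(buffer):               # step 1: sort the buffer
        # step 2: one pass over the sorted keys, appending to each target
        parts[bisect_left(separators, key)].append(key)
    buffer.clear()                            # the buffer is empty after a flush
    return parts
```

Because the keys arrive in sorted order, each substructure receives its keys as one contiguous run, which is what lets the distribution step cost O(R/B + k) block transfers on disk.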

Where does B come in?

If a level holds 2^j keys:
Largest buffer size: 2^j/log N
Largest # of base sets: 2^j/(B log N)
Smallest base set (head) size: B*log N
Amortized I/O cost for flushing level buffers: O(S(N)/B)

Slide22

Replacing Atomic Heap

Buffer of size B*log N
R = B*log N
k = log N

Slide23

Replacing Atomic Heap

Buffer of size B*log N
Amortized I/O cost: O(S(N)/B)
Head of size O(B log N)
Recursively build the structure in the head

Slide24

Recursively Build Layers

Layer sizes: N keys, B*log(N/B) keys, B*loglog(N/B) keys, …, 2^c*B keys, cB keys
O(log* N) layers
Levels rebalancing
- Move base sets around
- Redistribute buffer
- S(N)/(B log N) for one level
- S(N)/B for one layer
- S(N) log* N/B amortized I/O cost

Slide25
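The O(log* N) layer count above comes from the iterated logarithm: each layer is logarithmically smaller than the one before it, so the number of layers is the number of times the logarithm can be applied before the value drops to 1. A minimal sketch follows, assuming base-2 logarithms and a hypothetical helper name `log_star`.

```python
import math


def log_star(n):
    """Number of times log2 must be applied to n before the result is <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count
```

For example, 65536 -> 16 -> 4 -> 2 -> 1 takes four steps, so `log_star(65536)` is 4; the function grows so slowly that the log* N factor in the amortized bound is tiny for any realistic N.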

Recursively Build Layers

Layer sizes: N keys, B*log(N/B) keys, B*loglog(N/B) keys, …, 2^c*B keys, cB keys
O(log* N) layers
Layers rebalancing
- Rebuild the first (last) level
- S(N)/B for one layer
- S(N) log* N/B amortized I/O cost

Slide26

Recursively Build Layers

Layer sizes: N keys, B*log(N/B) keys, B*loglog(N/B) keys, …, 2^c*B keys, cB keys
O(log* N) layers

Slide27

Recursively Build Layers

Layer sizes: N keys, B*log(N/B) keys, B*loglog(N/B) keys, …, 2^c*B keys, cB keys
Memory buffer of size O(B)
R = B
k = log* N

Slide28

Recursively Build Layers

Layer sizes: N keys, B*log(N/B) keys, B*loglog(N/B) keys, …, 2^c*B keys, cB keys
Memory buffer of size O(B)
Amortized cost: log* N/B
I/O cost per update: O(S(N) log* N/B)

Slide29

Handle Deletions

Following a pointer to perform a deletion takes 1 I/O per deletion
Deleting signals: Delete x -> Insert (-, x)
Perform the actual deletion afterwards
Unlike the buffer tree, we don’t have access to the “leaves” (base sets)
Invariant: only process deleting signals in the head

Slide30
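The deleting-signal idea above (Delete x becomes Insert (-, x), and the signal is only acted on once it reaches the head) can be illustrated with an in-memory sketch. This is not the external structure from the talk: a `heapq` binary heap stands in for the whole queue, the class name `SignalPQ` is hypothetical, and the sketch assumes keys are distinct and each key is deleted at most once.

```python
import heapq

# A DELETE signal sorts just before an inserted copy of the same key,
# so the two meet at the head of the queue.
DELETE, INSERT = 0, 1


class SignalPQ:
    def __init__(self):
        self._heap = []

    def insert(self, key):
        heapq.heappush(self._heap, (key, INSERT))

    def delete(self, key):
        # No search for the key: just insert a deleting signal (-, key).
        heapq.heappush(self._heap, (key, DELETE))

    def deletemin(self):
        while self._heap:
            key, kind = heapq.heappop(self._heap)
            if kind == INSERT:
                return key
            # A signal reached the head: cancel the matching key if present.
            if self._heap and self._heap[0] == (key, INSERT):
                heapq.heappop(self._heap)
        raise IndexError("deletemin from an empty priority queue")
```

The point of the design is that `delete` never has to locate the key, which is what would cost one I/O per deletion when following a pointer on disk.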

Schedule

Avoid repeated sorting
If the head or memory buffer is unbalanced:
Flush stage: flush all overflowed buffers and rebalance all unbalanced base sets
Push stage: rebalance all overflowed layers and levels (expand)
Pull stage: deal with delete signals and rebalance all underflowed layers and levels (shrink)

Slide31

Open problems

Optimal reduction?
Priority queue that supports insertions/deletions in O(1/B) I/O cost for a set of size O(B*log^(c) N)
New reduction framework
Better (than loglog N) reduction in the cache-oblivious model?
Hard to do I/O-efficient flushing and rebalancing without knowing B

Slide32

Thank You!