
Augmenting Data Structures - PowerPoint Presentation

natalia-silvester
Uploaded On 2017-03-20



Presentation Transcript

Slide1

Augmenting Data Structures

Dr. Yingwu Zhu

Chapter 14

Slide2

Be Creative, but No Need to Be a Genius

You do not need to create an entirely new type of data structure for every application or problem.

It often suffices to augment an existing data structure:

Store additional information

Support new operations

Slide3

Two Examples

Order statistics trees (OS-Trees)

Interval trees

Slide4

OS-Trees: Dynamic Order Statistics

We’ve seen algorithms for finding the i-th element of an unordered set in O(n) time.

OS-Trees: a structure that supports finding the i-th element of a dynamic set in O(lg n) time.

Support the standard dynamic-set operations (Insert(), Delete(), Min(), Max(), Succ(), Pred()).

Also support these order-statistic operations:

    node* OS-Select(root, i);

    int OS-Rank(x);

Slide5

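OS-Trees

OS-Trees augment red-black trees:

Associate a size field with each node in the tree.

x->size records the size of the subtree rooted at x, including x itself:

[Figure: example OS-tree, each node labeled key(size). Root M(8) has left child C(5) and right child P(2). C has children A(1) and F(3); F has children D(1) and H(1); P has right child Q(1).]

Slide6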

OS-Select(x, i)

Return a pointer to the node containing the i-th smallest key in the subtree rooted at x.

Example: OS-Select(root, 5):

[Figure: the example OS-tree from Slide5: M(8); C(5), P(2); A(1), F(3), Q(1); D(1), H(1).]

Slide7

OS-Select

Example: show OS-Select(root, 5):

[Figure: the example OS-tree from Slide5.]

OS-Select(x, i)
{
    r = x->left->size + 1;
    if (i == r)
        return x;
    else if (i < r)
        return OS-Select(x->left, i);
    else
        return OS-Select(x->right, i-r);
}

Slide8

OS-Select

Example: show OS-Select(root, 5):

[Figure: the example OS-tree from Slide5; OS-Select pseudocode as on Slide7.]

Trace so far:

At M: i = 5, r = 6

Slide9

OS-Select

Example: show OS-Select(root, 5):

[Figure: the example OS-tree from Slide5; OS-Select pseudocode as on Slide7.]

Trace so far:

At M: i = 5, r = 6

At C: i = 5, r = 2

Slide10

OS-Select

Example: show OS-Select(root, 5):

[Figure: the example OS-tree from Slide5; OS-Select pseudocode as on Slide7.]

Trace so far:

At M: i = 5, r = 6

At C: i = 5, r = 2

At F: i = 3, r = 2

Slide11

OS-Select

Example: show OS-Select(root, 5):

[Figure: the example OS-tree from Slide5; OS-Select pseudocode as on Slide7.]

Trace so far:

At M: i = 5, r = 6

At C: i = 5, r = 2

At F: i = 3, r = 2

At H: i = 1, r = 1

Slide12

OS-Select

Example: show OS-Select(root, 5):

Note: use a sentinel NIL element at the leaves with size = 0 to simplify the code and avoid testing for NULL.

[Figure: the example OS-tree from Slide5; OS-Select pseudocode as on Slide7.]

Complete trace:

At M: i = 5, r = 6

At C: i = 5, r = 2

At F: i = 3, r = 2

At H: i = 1, r = 1, so H is returned

Slide13
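Looking back at the OS-Select walkthrough on the preceding slides, the procedure can be sketched in Python. This is only an illustrative sketch under stated assumptions: it uses a plain, hand-built binary search tree rather than a red-black tree, `None` stands in for the size-0 NIL sentinel, and the names `Node`, `size`, and `os_select` are hypothetical, not taken from the slides.

```python
class Node:
    """Tree node augmented with the size of the subtree rooted at it."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.size = 1 + size(left) + size(right)

def size(node):
    """Subtree size; None plays the role of the size-0 NIL sentinel."""
    return node.size if node is not None else 0

def os_select(x, i):
    """Return the node holding the i-th smallest key (1-based) under x."""
    while x is not None:
        r = size(x.left) + 1          # rank of x within its own subtree
        if i == r:
            return x
        if i < r:
            x = x.left
        else:
            x, i = x.right, i - r     # skip the r keys at or before x
    return None                       # i was larger than the subtree size

# The example tree from the slides (assumed shape: M; C, P; A, F, Q; D, H).
f = Node("F", Node("D"), Node("H"))
c = Node("C", Node("A"), f)
p = Node("P", None, Node("Q"))
root = Node("M", c, p)

print(os_select(root, 5).key)         # H, matching the trace on the slides
```

The loop is the iterative form of the recursive pseudocode; each step descends one level, so the running time is proportional to the height of the tree.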

Determining the Rank of an Element

[Figure: the example OS-tree from Slide5.]

Idea: the rank of a right child x is one more than its parent’s rank, plus the size of x’s left subtree.

Slide14

Determining the Rank of an Element

[Figure: the example OS-tree from Slide5.]

OS-Rank(T, x)
{
    r = x->left->size + 1;
    y = x;
    while (y != T->root) {
        if (y == y->p->right)
            r = r + y->p->left->size + 1;
        y = y->p;
    }
    return r;
}

Idea: the rank of a right child x is one more than its parent’s rank, plus the size of x’s left subtree.

Slide15

Determining the Rank of an Element

[Figure: the example OS-tree from Slide5; OS-Rank pseudocode as on Slide14.]

Example 1: find the rank of the element with key H

y = H: r = 1

Slide16

Determining the Rank of an Element

[Figure: the example OS-tree from Slide5; OS-Rank pseudocode as on Slide14.]

Example 1: find the rank of the element with key H

y = H: r = 1

y = F: r = 1 + 1 + 1 = 3

Slide17

Determining the Rank of an Element

[Figure: the example OS-tree from Slide5; OS-Rank pseudocode as on Slide14.]

Example 1: find the rank of the element with key H

y = H: r = 1

y = F: r = 3

y = C: r = 3 + 1 + 1 = 5

Slide18

Determining the Rank of an Element

[Figure: the example OS-tree from Slide5; OS-Rank pseudocode as on Slide14.]

Example 1: find the rank of the element with key H

y = H: r = 1

y = F: r = 3

y = C: r = 5

y = M (the root): r = 5, so the rank of H is 5

Slide19

Determining the Rank of an Element

[Figure: the example OS-tree from Slide5; OS-Rank pseudocode as on Slide14.]

Example 2: find the rank of the element with key P

y = P: r = 1

Slide20

Determining the Rank of an Element

[Figure: the example OS-tree from Slide5; OS-Rank pseudocode as on Slide14.]

Example 2: find the rank of the element with key P

y = P: r = 1

y = M (the root): r = 1 + 5 + 1 = 7

Slide21
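As a recap of the two OS-Rank examples on the preceding slides, here is a small Python sketch of the same walk from a node up to the root. It is a sketch under assumptions rather than a definitive implementation: the tree is a hand-built binary search tree with parent pointers (no red-black balancing), `None` replaces the NIL sentinel, and `Node`, `size`, and `os_rank` are hypothetical names.

```python
class Node:
    """Tree node with a parent pointer and a subtree-size field."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.p = None
        for child in (left, right):
            if child is not None:
                child.p = self        # link each child back to this node
        self.size = 1 + size(left) + size(right)

def size(node):
    """Subtree size; None plays the role of the size-0 NIL sentinel."""
    return node.size if node is not None else 0

def os_rank(root, x):
    """Return the 1-based rank of node x in the tree rooted at root."""
    r = size(x.left) + 1              # rank of x within its own subtree
    y = x
    while y is not root:
        if y is y.p.right:
            # The parent and its whole left subtree precede y.
            r += size(y.p.left) + 1
        y = y.p
    return r

# The example tree from the slides (assumed shape: M; C, P; A, F, Q; D, H).
h = Node("H")
f = Node("F", Node("D"), h)
c = Node("C", Node("A"), f)
p = Node("P", None, Node("Q"))
root = Node("M", c, p)

print(os_rank(root, h), os_rank(root, p))   # 5 7
```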

Maintaining Subtree Sizes

So by keeping subtree sizes, order-statistic operations can be done in O(lg n) time.

Next: maintain sizes during Insert() and Delete() operations.

Insert(): increment the size fields of the nodes traversed during the search down the tree.

Delete(): decrement sizes along the path from the deleted node to the root.

Both: update sizes correctly during rotations.

Slide22

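Maintaining Subtree Sizes

Note that a rotation invalidates only the sizes of x and y: it is a local operation.

Their sizes can be recalculated in constant time.

Thm 14.1: any property that depends only on a node, its left child, and its right child can be maintained during Insert() and Delete() without affecting their O(lg n) running time.

[Figure: rotation example with subtree sizes. Before: y (size 19) is the root with left child x (size 11); x has subtrees of sizes 6 and 4, and y has a right subtree of size 7. After rightRotate(y): x (size 19) is the root with right child y (size 12); x keeps the subtree of size 6, and y has subtrees of sizes 4 and 7. leftRotate(x) reverses the change.]

Slide23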
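The rotation example from the preceding slide can be checked with a short Python sketch. It assumes a simplified node without parent pointers or colors, and it models the three untouched subtrees as single stub nodes whose `size` is set by hand to 6, 4, and 7; `Node`, `stub`, `right_rotate`, and `left_rotate` are hypothetical names used only for illustration.

```python
class Node:
    """Tree node with a subtree-size field (no parent pointer, no color)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + size(left) + size(right)

def size(node):
    return node.size if node is not None else 0

def right_rotate(y):
    """Lift y's left child x above y and return x, the new subtree root."""
    x = y.left
    y.left, x.right = x.right, y
    # Only y and x changed children. Fix y first: it is now below x.
    y.size = 1 + size(y.left) + size(y.right)
    x.size = 1 + size(x.left) + size(x.right)
    return x

def left_rotate(x):
    """Inverse of right_rotate: lift x's right child y above x."""
    y = x.right
    x.right, y.left = y.left, x
    x.size = 1 + size(x.left) + size(x.right)
    y.size = 1 + size(y.left) + size(y.right)
    return y

def stub(label, n):
    """Stand-in for an untouched subtree of n nodes."""
    s = Node(label)
    s.size = n
    return s

x = Node("x", stub("a", 6), stub("b", 4))   # size 11
y = Node("y", x, stub("c", 7))              # size 19
top = right_rotate(y)
print(top.key, top.size, y.size)            # x 19 12
```

Both rotations recompute the lower node before the upper one, which is why two constant-time updates are enough.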

Methodology for Augmenting Data Structures

Choose an underlying data structure

Determine the additional information to maintain

Verify that the information can be maintained for operations that modify the structure

Develop new operations

Slide24

Interval Trees

The problem: maintain a set of intervals

E.g., time intervals for a scheduling program:

[Figure: timeline showing the intervals [4,8], [5,11], [7,10], [15,18], [17,19], and [21,23].]

i = [7,10]; i->low = 7; i->high = 10

Slide25

Interval Trees

The problem: maintain a set of intervals

E.g., time intervals for a scheduling program:

Query: find an interval in the set that overlaps a given query interval

[14,16] → [15,18]

[16,19] → [15,18] or [17,19]

[12,14] → NULL

[Figure: timeline showing the intervals [4,8], [5,11], [7,10], [15,18], [17,19], and [21,23].]

i = [7,10]; i->low = 7; i->high = 10

Slide26
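The three queries listed on the preceding slide can be reproduced with a brute-force Python sketch. It assumes closed intervals represented as `(low, high)` tuples and simply scans the whole set, so it only pins down what an overlap query should return; it is not the interval tree itself. The names `overlap` and `overlapping` are hypothetical.

```python
def overlap(a, b):
    """Closed intervals overlap iff each one starts no later than the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]

# The six intervals from the scheduling example.
intervals = [(4, 8), (5, 11), (7, 10), (15, 18), (17, 19), (21, 23)]

def overlapping(query):
    """Return every stored interval that overlaps the query (linear scan)."""
    return [iv for iv in intervals if overlap(iv, query)]

print(overlapping((14, 16)))   # [(15, 18)]
print(overlapping((16, 19)))   # [(15, 18), (17, 19)]
print(overlapping((12, 14)))   # []
```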

Interval Trees

Following the methodology:

Pick underlying data structure

Decide what additional information to store

Figure out how to maintain the information

Develop the desired new operations

Slide27

Interval Trees

Following the methodology:

Pick underlying data structure

Red-black trees will store intervals, keyed on i->low

Decide what additional information to store

Figure out how to maintain the information

Develop the desired new operations

Slide28

Interval Trees

Following the methodology:

Pick underlying data structure

Red-black trees will store intervals, keyed on i->low

Decide what additional information to store

We will store max, the maximum endpoint in the subtree rooted at i

Figure out how to maintain the information

Develop the desired new operations

Slide29

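Interval Trees

[Figure: interval tree; each node stores an interval (int) and a max field. Root [17,19] has left child [5,11] and right child [21,23]. [5,11] has children [4,8] and [15,18]; [15,18] has left child [7,10].]

What are the max fields?

Slide30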

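Interval Trees

[Figure: the same interval tree with the max fields filled in: [17,19] max 23; [5,11] max 18; [21,23] max 23; [4,8] max 8; [15,18] max 18; [7,10] max 10.]

Note that: x->max = max(x->interval->high, x->left->max, x->right->max)

Slide31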

Interval Trees

Following the methodology:

Pick underlying data structure

Red-black trees will store intervals, keyed on i->low

Decide what additional information to store

Store the maximum endpoint in the subtree rooted at i

Figure out how to maintain the information

How would we maintain the max field for a BST?

What’s different?

Develop the desired new operations

Slide32

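Interval Trees

What are the new max values for the subtrees?

[Figure: rotation example. Before: y = [11,35] (max 35) is the root with left child x = [6,20] (max 20); x has subtrees with max 14 and 19, and y has a right subtree with max 30. After rightRotate(y): x = [6,20] (max ???) is the root with right child y = [11,35] (max ???); the max values of the three subtrees are shown as ???. leftRotate(x) reverses the change.]

Slide33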

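Interval Trees

What are the new max values for the subtrees?

A: Unchanged

What are the new max values for x and y?

[Figure: the same rotation example. Before: y = [11,35] (max 35) with left child x = [6,20] (max 20). After rightRotate(y): x = [6,20] (max ???) is the root with right child y = [11,35] (max ???). The three subtrees keep max 14, 19, and 30 on both sides.]

Slide34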

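Interval Trees

What are the new max values for the subtrees?

A: Unchanged

What are the new max values for x and y?

A: the root’s value is unchanged; recompute the other

[Figure: the same rotation example. Before: y = [11,35] (max 35) with left child x = [6,20] (max 20). After rightRotate(y): x = [6,20] (max 35) is the root with right child y = [11,35] (max 35). The three subtrees keep max 14, 19, and 30 on both sides.]

Slide35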
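The max values in the rotation example on the preceding slides can be recomputed with a short Python sketch. It assumes a simplified node without parent pointers or colors. Because the slides show only the max values 14, 19, and 30 for the three subtrees, the single-node intervals (2,14), (8,19), and (25,30) used for them are made up for illustration; `Node`, `update_max`, and `right_rotate` are hypothetical names.

```python
class Node:
    """Interval-tree node keyed on low, augmented with the subtree max endpoint."""
    def __init__(self, low, high, left=None, right=None):
        self.low, self.high = low, high
        self.left, self.right = left, right
        update_max(self)

def update_max(x):
    """Recompute x.max from x's own high endpoint and its children's max fields."""
    child_max = [c.max for c in (x.left, x.right) if c is not None]
    x.max = max([x.high] + child_max)

def right_rotate(y):
    """Lift y's left child x above y and return x, the new subtree root."""
    x = y.left
    y.left, x.right = x.right, y
    update_max(y)      # y is now the lower node, so fix it first
    update_max(x)      # x then sees y's corrected max
    return x

# Hypothetical single-node subtrees with max 14, 19, and 30.
x = Node(6, 20, Node(2, 14), Node(8, 19))
y = Node(11, 35, x, Node(25, 30))
before = (x.max, y.max)
top = right_rotate(y)
print(before, (x.max, y.max))   # (20, 35) (35, 35)
```

The new root ends up with the old root's max, as the slide states, and only the lower node needs a genuinely new value.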

Interval Trees

Following the methodology:

Pick underlying data structure

Red-black trees will store intervals, keyed on i->low

Decide what additional information to store

Store the maximum endpoint in the subtree rooted at i

Figure out how to maintain the information

Insert: update max on the way down, and during rotations

Delete: similar

Develop the desired new operations

Slide36

Searching Interval Trees

Find a node in tree T whose interval overlaps interval i

IntervalSearch(T, i)

Slide37

Searching Interval Trees

IntervalSearch(T, i)
{
    x = T->root;
    while (x != NULL && !overlap(i, x->interval))
        if (x->left != NULL && x->left->max >= i->low)
            x = x->left;
        else
            x = x->right;
    return x;
}

What will be the running time?

Slide38

IntervalSearch() Example

Example: search for an interval overlapping [14,16]

[Figure: the interval tree from Slide30 with its max fields: [17,19] max 23; [5,11] max 18; [21,23] max 23; [4,8] max 8; [15,18] max 18; [7,10] max 10. IntervalSearch pseudocode as on Slide37.]

Slide39

IntervalSearch() Example

Example: search for an interval overlapping [12,14]

[Figure: the interval tree from Slide30 with its max fields: [17,19] max 23; [5,11] max 18; [21,23] max 23; [4,8] max 8; [15,18] max 18; [7,10] max 10. IntervalSearch pseudocode as on Slide37.]

Slide40
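The two IntervalSearch() examples on the preceding slides can be replayed with a Python sketch. It is a sketch under assumptions: the tree is built by hand in the shape implied by the max fields on the slides (no red-black balancing), intervals are closed, the query is a `(low, high)` tuple, and the branch test uses `>=` as in CLRS. `Node`, `overlap`, and `interval_search` are hypothetical names.

```python
class Node:
    """Interval-tree node keyed on low, augmented with the subtree max endpoint."""
    def __init__(self, low, high, left=None, right=None):
        self.low, self.high = low, high
        self.left, self.right = left, right
        child_max = [c.max for c in (left, right) if c is not None]
        self.max = max([high] + child_max)

def overlap(i, x):
    """True if the closed query interval i = (low, high) overlaps node x's interval."""
    return i[0] <= x.high and x.low <= i[1]

def interval_search(root, i):
    """Return some node whose interval overlaps i, or None if there is none."""
    x = root
    while x is not None and not overlap(i, x):
        if x.left is not None and x.left.max >= i[0]:
            x = x.left       # an overlap, if any exists, can be found on the left
        else:
            x = x.right      # nothing on the left reaches as far as i's low end
    return x

# The interval tree from the slides.
root = Node(17, 19,
            Node(5, 11, Node(4, 8), Node(15, 18, Node(7, 10))),
            Node(21, 23))

hit = interval_search(root, (14, 16))
print((hit.low, hit.high))                 # (15, 18)
print(interval_search(root, (12, 14)))     # None
```

Each iteration moves one level down, so the search time is proportional to the tree height, which is O(lg n) for a balanced tree.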

Correctness of IntervalSearch()

Key idea: need to check only 1 of a node’s 2 children

Case 1: search goes right

Show that ∃ overlap in right subtree, or no overlap at all

Case 2: search goes left

Show that ∃ overlap in left subtree, or no overlap at all

Slide41

Correctness of IntervalSearch()

Case 1: if search goes right, ∃ overlap in the right subtree or no overlap in either subtree

If ∃ overlap in right subtree, we’re done

Otherwise:

x->left = NULL, or x->left->max < i->low (Why?)

Thus, no overlap in left subtree: every interval there ends at or before x->left->max, which is before i begins

    while (x != NULL && !overlap(i, x->interval))
        if (x->left != NULL && x->left->max >= i->low)
            x = x->left;
        else
            x = x->right;
    return x;

Slide42

Correctness of IntervalSearch()

Case 2: if search goes left, ∃ overlap in the left subtree or no overlap in either subtree

If ∃ overlap in left subtree, we’re done

Otherwise:

i->low ≤ x->left->max, by branch condition

x->left->max = y->high for some y in left subtree

Since i and y don’t overlap and i->low ≤ y->high, i->high < y->low

Since tree is sorted by low’s, i->high < any low in right subtree

Thus, no overlap in right subtree

    while (x != NULL && !overlap(i, x->interval))
        if (x->left != NULL && x->left->max >= i->low)
            x = x->left;
        else
            x = x->right;
    return x;

Slide43

An Application of Interval Trees

Slide44

Overlapping Windows

PROBLEM:

INPUT: We are given a set S = {w_1, w_2, …, w_n} of n axis-parallel rectangular windows; w = [l(w), r(w), b(w), t(w)], for w ∈ S.

OUTPUT MODE:

Reporting: Report all (R) pairs of overlapping input windows.

Counting: How many pairs of overlapping input windows? (= R)

Disjointness: Is there an overlapping pair of input windows? (R > 0?)

Can we do better than the obvious O(n^2) time brute-force solution?

YES: Counting & Disjointness in O(n log n) time, Reporting in O(R + n log n) time.

[Figure: a window w in the x-y plane, spanning l(w) to r(w) horizontally and b(w) to t(w) vertically.]

Slide45

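Plane Sweep Method

[Figure: two panels in the x-y plane. A sweep line moves across the windows; the rectangles it currently crosses are the active rectangles, and their projections onto the x-axis are the active x-intervals.]

Slide46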

Sweep Schedule & Sweep Status

Sweep Schedule is the 2n events (corresponding to the bottom and top y coordinates of the n windows in input set S) in ascending order. We can pre-sort these 2n events in O(n log n) time.

Sweep Status is the dictionary of the active windows maintained in an interval tree T. For each window w, its x-interval is [l(w), r(w)], starting at l(w) and finishing at r(w). Dictionary size: |T| ≤ n.

Operations:

Insert(w,T): insert (x-interval of) activated window w into T.

Delete(w,T): delete (x-interval of) deactivated window w from T.

OverlapSearch(w,T): return a window in T that overlaps w.

CountAllOverlaps(w,T): return the number of windows in T that overlap w.

ReportAllOverlaps(w,T): report all windows in T that overlap w.

All but the last operation take O(log n) time each. ReportAllOverlaps takes O(R_w + log n) time, where R_w is the output size.

$\sum_{w \in S} O(R_w + \log n) = O\big(\big(\sum_{w \in S} R_w\big) + n \log n\big) = O(R + n \log n)$.

Slide47

Plane Sweep Algorithm

ALGORITHM ReportOverlappingPairs(S, n)

Step 0: Initialize sweep status interval tree T: T ← ∅

Step 1: Initialize sweep schedule of events E:
    E = { ⟨b(w), w⟩ | w ∈ S } ∪ { ⟨t(w), w⟩ | w ∈ S }.
    Sort E in ascending order of the first component.
    [Tie-breaking rule: if b(w) = t(w’), b has precedence over t]

Step 2: Sweep the plane according to sweep schedule E:
    for each ⟨y, w⟩ ∈ E, in ascending order of y do
        if y = b(w) then do
            ReportAllOverlaps(w,T)
            Insert(w,T)
        end-then
        else Delete(w,T)    (* y = t(w) *)
    end-for

end

Running time = O(R + n log n).

Slide48
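The control flow of ReportOverlappingPairs from the preceding slide can be sketched in Python. One important assumption: the sweep status here is a plain dictionary scanned linearly, used as a stand-in for the interval tree T, so this sketch shows the event handling and the tie-breaking rule but does not achieve the O(R + n log n) bound. Windows are given as `(l, r, b, t)` tuples with closed boundaries, and the function and variable names are hypothetical.

```python
def report_overlapping_pairs(windows):
    """windows: dict name -> (l, r, b, t). Return the set of overlapping pairs."""
    events = []
    for name, (l, r, b, t) in windows.items():
        events.append((b, 0, name))   # 0 = bottom edge, sorts before a top at equal y
        events.append((t, 1, name))   # 1 = top edge
    events.sort()                     # sweep schedule, ascending in y

    active = {}                       # sweep status: name -> x-interval (l, r)
    pairs = set()
    for y, kind, name in events:
        l, r = windows[name][0], windows[name][1]
        if kind == 0:
            # ReportAllOverlaps: linear scan standing in for the interval tree.
            for other, (lo, hi) in active.items():
                if l <= hi and lo <= r:
                    pairs.add(frozenset((name, other)))
            active[name] = (l, r)     # Insert(w, T)
        else:
            del active[name]          # Delete(w, T)
    return pairs

windows = {
    "A": (0, 4, 0, 4),
    "B": (2, 6, 2, 6),
    "C": (5, 8, 0, 1),
    "D": (10, 12, 10, 12),
}
print(report_overlapping_pairs(windows))   # {frozenset({'A', 'B'})}
```

Swapping the dictionary for an interval tree that supports ReportAllOverlaps in O(R_w + log n) time would give the running time stated on the slide.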

Exercises

Slide49

Range-Search-Counting:

Given a Red-Black tree T and a pair of keys a and b, a < b (not necessarily in T), we wish to output the number of items x in T such that a ≤ key[x] ≤ b. Augment T so that, without degrading the time complexities of the dictionary operations Search, Insert, Delete, operation RangeSearchCount can be done in O(log n) time in the worst case, where n is the number of items in T.

Dictionary with Average and Median:

Operations Average(K,T) and Median(K,T) return, respectively, the average and median of those items in dictionary T that are ≤ the given key K.

(a) Describe the augmented dictionary data structure to support these new operations.

(b) Describe the algorithms for Average and Median, and explain their running times.

(c) What are the required modifications to Search, Insert, Delete? Are their running times affected?

[CLRS, Exercise 14.1-7, p. 345]

We wish to output the number of inversions in a given array A[1..n] of n real numbers. Show how to use an Order Statistics Dictionary (OSD) to solve this problem in O(n log n) time.

Selection in a Pair of Order Statistics Dictionaries:

As we learned in the course, an Order Statistics Dictionary (OSD) can be implemented efficiently by an augmented Red-Black Tree in which each node x has the extra field size[x], which stores the number of descendants of node x. We are given two OSDs S and T with n keys in total, and a positive integer k, k ≤ n. The elements within an OSD have distinct keys, but the elements in the two OSDs may have some equal keys. The problem is to find the k-th smallest key among the n keys stored in S and T (without changing S and T). Design and analyze an algorithm that solves this problem in worst-case O(log n) time.

Slide50

[CLRS, Exercise 14.1-8, p. 345]

Consider n given chords on a circle, each defined by its two endpoints. Describe an O(n log n) time algorithm to determine the number of pairs of chords that intersect inside the circle.

[For instance, if the n chords are all diameters that meet at the center, the correct answer would be n(n-1)/2.]

For simplicity, assume that no two chords share an end-point.

[CLRS, Problem 14-1, p. 354]

We wish to keep track of a Point of Maximum Overlap (PMO) in a dynamic set T of intervals, i.e., a point on the real line that is overlapped by the largest number of intervals in data set T.

(a) Show that there is always a PMO that is an end-point of an interval in T (if not empty).

(b) Design and analyze an augmented interval data structure T that efficiently supports the old operations Insert and Delete, and the new operation FindPMO(T). The latter returns a point of maximum overlap with the intervals in the data set T.

[Hint: Keep a Red-Black tree of all endpoints. Associate a value of +1 with each left end-point, and associate a value of -1 with each right end-point. Augment each node of the tree with extra information to maintain the PMO.]

[CLRS, Exercise 14.3-6, p. 354] Dynamic Closest Pair:

Show how to maintain a dynamic data set Q of numbers that supports the operation ClosestPair, which returns the pair of numbers in Q with minimum absolute value difference. For example, if Q = {1, 5, 9, 15, 18, 22}, then ClosestPair(Q) returns (15,18), since |18 - 15| = 3 is the minimum difference among any distinct pair of numbers in Q.

Design and analyze an augmented data structure to support such a data set with the operations Search, Insert, Delete, and ClosestPair as efficiently as possible.

Slide51

Closest Bichromatic Pair:

We want to maintain a dynamic data set Q of items. Each item has two attributes: a key, which is a real number, and a colour, which is either blue or green. In addition to the usual dictionary operations, we wish to support a new operation called Closest Bichromatic Pair (CBP).

CBP(Q) returns a pair of items in Q that have different colours and minimum possible absolute value difference in their key values. (Note, we are referring to the values of their keys, not their index or position in the data set.) CBP returns nil if all items in Q have the same colour.

Design and analyze an augmented data structure to support such a data set with the operations Search, Insert, Delete, and CBP as efficiently as possible.

[CLRS, Exercise 14.3-3, p. 353] EarliestOverlapSearch(I,T):

We are given an interval tree T and an interval I. For simplicity, assume all intervals have distinct starting points. Design and analyze an efficient implementation of EarliestOverlapSearch(I,T), which returns the interval with minimum starting time among those intervals in T that overlap interval I. It returns nil if no interval in T overlaps I.

[CLRS, Exercise 14.3-4, p. 354] ReportAllOverlaps(I,T):

(a) Show that ReportAllOverlaps(I,T) can be done in O(min{n, (R+1) log n}) time, where n is the number of intervals in interval tree T, and R is the output size, i.e., the number of intervals in T that overlap interval I. You should not redefine or augment interval trees in any way. However, the operation may temporarily modify T during its process (e.g., delete an item, then reinsert it back into T).

(b) Answer part (a) but with ReportAllOverlaps not modifying T in any way, not even temporarily.

(c) [Extra credit and optional:] Improve the time complexity of ReportAllOverlaps(I,T) to O(R + log n), without degrading the time complexities of the other operations. [You may redefine the structure.]

Slide52

Nested Rectangles:

We are given a set R = {R_1, ..., R_n} of n axis-parallel rectangles in the plane. Each rectangle is given by its two pairs of extreme x and y coordinates. One of these rectangles is the bounding box and encloses all the remaining n-1 rectangles in R. All rectangles in R have disjoint boundaries. So, for any pair of rectangles R_i and R_j in R, either they are completely outside each other, or one encloses the other. Therefore, we may have several levels of nesting among these rectangles. (See Fig. (a) below.) We say R_i immediately encloses R_j if R_i encloses R_j and there is no other rectangle R_k in R such that R_i encloses R_k and R_k encloses R_j.

(a) The immediate enclosure relationship on R defines a directed graph, where we have a directed edge from R_i to R_j iff R_i immediately encloses R_j. Prove this digraph is indeed a rooted tree (called the Nesting Tree of R). That is, you need to prove that the said digraph is acyclic, and every node has in-degree 1 except for one node that has in-degree 0. What rectangle in R corresponds to the root of that tree? (As an illustration, see Fig. (b) corresponding to Fig. (a).)

(b) Design and analyze an efficient algorithm that, given the set R, constructs its corresponding Nesting Tree by parent pointers. That is, it outputs the parent array p[1..n] such that p[j] = i means R_i immediately encloses R_j. Carefully explain the correctness and running time of your algorithm.

(a) A set of axis-parallel rectangles with disjoint boundaries. [Figure: nested rectangles labeled A through J.]

(b) The Nesting Tree. [Figure: the corresponding tree on the nodes A through J.]