Slide1
Augmenting Data Structures
Dr. Yingwu Zhu
Chapter 14
Slide2
Be Creative, but No Need to Be a Genius
You do not need to create an entirely new type of data structure for each application or problem.
It suffices to augment an existing data structure:
  Store additional information
  Support new operations
Slide3
Two Examples
Order-statistic trees (OS-trees)
Interval trees
Slide4
OS-Trees: Dynamic Order Statistics
We've seen algorithms for finding the i-th element of an unordered set in O(n) time.
OS-trees: a structure that supports finding the i-th element of a dynamic set in O(lg n) time.
Supports the standard dynamic-set operations (Insert(), Delete(), Min(), Max(), Succ(), Pred()).
Also supports these order-statistic operations:
    node* OS-Select(root, i);    // returns the node holding the i-th smallest key
    int OS-Rank(T, x);           // returns the rank of node x in tree T
Slide5
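OS-Trees
OS-trees augment red-black trees:
Associate a size field with each node in the tree.
x->size records the size of the subtree rooted at x, including x itself.
[Figure: example OS-tree, each node labeled key (size). Root M (8); M's children are C (5) and P (2); C's children are A (1) and F (3); F's children are D (1) and H (1); P's right child is Q (1).]
Slide6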
OS-Select(x, i)
Returns a pointer to the node containing the i-th smallest key in the subtree rooted at x.
Example: OS-Select(root, 5)
[Figure: the example OS-tree with root M (8), children C (5) and P (2), and further nodes A (1), F (3), D (1), H (1), Q (1).]
Slide7
OS-Select
Example: show OS-Select(root, 5):
[Figure: the example OS-tree with root M (8); M's children C (5) and P (2); C's children A (1) and F (3); F's children D (1) and H (1); P's right child Q (1).]
OS-Select(x, i)
{
    r = x->left->size + 1;
    if (i == r)
        return x;
    else if (i < r)
        return OS-Select(x->left, i);
    else
        return OS-Select(x->right, i - r);
}
Trace:
  at M: i = 5, r = 6, so go left
  at C: i = 5, r = 2, so go right with i = 5 - 2 = 3
  at F: i = 3, r = 2, so go right with i = 3 - 2 = 1
  at H: i = 1, r = 1, so return H
Note: use a sentinel NIL element at the leaves with size = 0 to simplify the code and avoid testing for NULL.
Slide13
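As a rough illustration of the OS-Select procedure, here is a minimal Python sketch. It assumes a plain (unbalanced) binary tree built by hand rather than a red-black tree, uses None with size 0 in place of the NIL sentinel, and replaces the recursion with a loop. The names Node, size, os_select, and build_example are hypothetical and not part of the slides.

```python
class Node:
    """Tree node augmented with the size of the subtree rooted at it."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.size = 1 + size(left) + size(right)

def size(node):
    """Subtree size; None stands in for the NIL sentinel (size 0)."""
    return node.size if node is not None else 0

def os_select(x, i):
    """Return the node with the i-th smallest key (1-based) under x, or None."""
    while x is not None:
        r = size(x.left) + 1          # rank of x within its own subtree
        if i == r:
            return x
        if i < r:
            x = x.left                # answer lies in the left subtree
        else:
            i -= r                    # skip x and its left subtree
            x = x.right
    return None                       # i was out of range

def build_example():
    """Build the example tree from the slides: M(8), C(5), P(2), ..."""
    f = Node("F", Node("D"), Node("H"))
    c = Node("C", Node("A"), f)
    p = Node("P", None, Node("Q"))
    return Node("M", c, p)

root = build_example()
print(os_select(root, 5).key)
```

On the example tree the call follows the same path as the trace on the slide (M, C, F, H). An out-of-range rank returns None here, which is a choice made for this sketch.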
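Determining the Rank of an Element
[Figure: the example OS-tree with root M (8); M's children C (5) and P (2); C's children A (1) and F (3); F's children D (1) and H (1); P's right child Q (1).]
Idea: the rank of a right child x is one more than its parent's rank, plus the size of x's left subtree.
Slide14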
Determining the Rank of an Element
OS-Rank(T, x)
{
    r = x->left->size + 1;
    y = x;
    while (y != T->root) {
        if (y == y->p->right)
            r = r + y->p->left->size + 1;
        y = y->p;
    }
    return r;
}
Idea: the rank of a right child x is one more than its parent's rank, plus the size of x's left subtree.
Slide15
Determining the Rank of an Element
Example 1: find the rank of the element with key H, using OS-Rank(T, x) on the example OS-tree.
  y = H: r = 1 (H has an empty left subtree)
  y = F: H is the right child of F, so r = 1 + 1 + 1 = 3
  y = C: F is the right child of C, so r = 3 + 1 + 1 = 5
  y = M: C is the left child of M, so r stays 5; y is the root, so return 5
Slide19
Determining the Rank of an Element
Example 2: find the rank of the element with key P, using OS-Rank(T, x) on the example OS-tree.
  y = P: r = 1 (P has an empty left subtree)
  y = M: P is the right child of M, so r = 1 + 5 + 1 = 7; y is the root, so return 7
Slide21
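The OS-Rank walk from a node up to the root can be sketched in Python as follows. This is only an illustration under stated assumptions: the tree is wired up by hand with parent pointers (no red-black balancing), None replaces the NIL sentinel, and the helper names Node, size, link, and os_rank are hypothetical.

```python
class Node:
    """Tree node with a parent pointer and a subtree-size field."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.parent = None
        self.size = 1

def size(node):
    """Subtree size; None stands in for the NIL sentinel (size 0)."""
    return node.size if node is not None else 0

def link(parent, left=None, right=None):
    """Attach children, set their parent pointers, recompute parent's size."""
    parent.left, parent.right = left, right
    for child in (left, right):
        if child is not None:
            child.parent = parent
    parent.size = 1 + size(left) + size(right)
    return parent

def os_rank(root, x):
    """Return the 1-based rank of node x among all keys in the tree."""
    r = size(x.left) + 1              # rank of x within its own subtree
    y = x
    while y is not root:
        if y is y.parent.right:       # everything in the parent's left
            r += size(y.parent.left) + 1  # subtree, and the parent, precede x
        y = y.parent
    return r

# The example tree from the slides, linked bottom-up so sizes are correct.
nodes = {k: Node(k) for k in "ACDFHMPQ"}
link(nodes["F"], nodes["D"], nodes["H"])
link(nodes["C"], nodes["A"], nodes["F"])
link(nodes["P"], None, nodes["Q"])
root = link(nodes["M"], nodes["C"], nodes["P"])

print(os_rank(root, nodes["H"]), os_rank(root, nodes["P"]))
```

The two calls correspond to Example 1 (key H) and Example 2 (key P) on the slides.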
Maintaining Subtree Sizes
So by keeping subtree sizes, order-statistic operations can be done in O(lg n) time.
Next: maintain sizes during Insert() and Delete() operations.
Insert(): Increment the size fields of the nodes traversed during the search down the tree.
Delete(): Decrement sizes along the path from the deleted node to the root.
Both: Update sizes correctly during rotations.
Slide22
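The Insert() rule (increment every size on the way down) can be illustrated with a short Python sketch. It assumes an ordinary unbalanced BST with distinct keys, so the rotations and recoloring of a real red-black insert are deliberately left out; Node, insert, and check_sizes are hypothetical names.

```python
class Node:
    """BST node with a subtree-size field."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1

def insert(root, key):
    """Insert key (assumed not already present) and return the tree root."""
    new = Node(key)
    if root is None:
        return new
    x = root
    while True:
        x.size += 1                   # the new node will end up below x
        if key < x.key:
            if x.left is None:
                x.left = new
                return root
            x = x.left
        else:
            if x.right is None:
                x.right = new
                return root
            x = x.right

def check_sizes(x):
    """Recount every subtree and compare with the stored size fields."""
    if x is None:
        return 0
    n = 1 + check_sizes(x.left) + check_sizes(x.right)
    assert x.size == n
    return n

root = None
for k in "MCPAFQDH":                  # this order reproduces the example tree
    root = insert(root, k)
print(root.size, root.left.size, root.right.size)
```

Each insertion touches only the nodes on one root-to-leaf path, so the extra bookkeeping costs time proportional to the height of the tree.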
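Maintaining Subtree Sizes
Note that a rotation invalidates only the sizes of x and y: it is a local operation.
Their sizes can be recalculated in constant time.
Thm 14.1: a property that depends only on a node, its left child, and its right child can be maintained during Insert() and Delete() without affecting their O(lg n) running time.
[Figure: rotations with subtree sizes. Before rightRotate(y): y (size 19) has left child x (size 11) and a right subtree of size 7; x has subtrees of sizes 6 and 4. After rightRotate(y): x (size 19) has a left subtree of size 6 and right child y (size 12), whose subtrees have sizes 4 and 7. leftRotate(x) reverses the change.]
Slide23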
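The constant-time size fix-up during a rotation might look like the Python sketch below. It is a simplification under stated assumptions: parent pointers and red-black colors are omitted, each rotation returns the new subtree root instead of relinking a parent, and a node with an explicit size stands in for a whole subtree. The names Node, size_of, right_rotate, and left_rotate are hypothetical.

```python
class Node:
    """Tree node with a size field; `size` may be given explicitly so that
    a single node can stand in for an entire subtree of that size."""
    def __init__(self, key, left=None, right=None, size=None):
        self.key, self.left, self.right = key, left, right
        self.size = size if size is not None else 1 + size_of(left) + size_of(right)

def size_of(node):
    return node.size if node is not None else 0

def right_rotate(y):
    """Lift y's left child x above y; return x, the new subtree root."""
    x = y.left
    y.left = x.right
    x.right = y
    x.size = y.size                                     # x now roots the same nodes y did
    y.size = 1 + size_of(y.left) + size_of(y.right)     # recompute y from its children
    return x

def left_rotate(x):
    """Lift x's right child y above x; return y, the new subtree root."""
    y = x.right
    x.right = y.left
    y.left = x
    y.size = x.size
    x.size = 1 + size_of(x.left) + size_of(x.right)
    return y

# The configuration from the slide: subtrees of sizes 6, 4 and 7.
a, b, c = Node("a", size=6), Node("b", size=4), Node("c", size=7)
x = Node("x", a, b)                   # size 11
y = Node("y", x, c)                   # size 19
top = right_rotate(y)
print(top.key, top.size, top.right.key, top.right.size)
```

Only the two rotated nodes have their size fields rewritten; the three hanging subtrees are untouched, which is why the update is O(1).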
Methodology for Augmenting Data Structures
1. Choose the underlying data structure
2. Determine the additional information to maintain
3. Verify that the information can be maintained for operations that modify the structure
4. Develop new operations
Slide24
Interval Trees
The problem: maintain a set of intervals.
E.g., time intervals for a scheduling program:
[Figure: six intervals on a number line: [4,8], [5,11], [7,10], [15,18], [17,19], [21,23].]
Notation: for i = [7,10], i->low = 7 and i->high = 10.
Query: find an interval in the set that overlaps a given query interval:
  [14,16] -> [15,18]
  [16,19] -> [15,18] or [17,19]
  [12,14] -> NULL
Slide26
Interval Trees
Following the methodology:
1. Pick the underlying data structure
   Red-black trees will store intervals, keyed on i->low
2. Decide what additional information to store
   We will store max, the maximum endpoint in the subtree rooted at i
3. Figure out how to maintain the information
4. Develop the desired new operations
Slide29
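Interval Trees
What are the max fields?
[Figure: interval tree; each node stores an interval (int) and a max field. Root [17,19] (max 23); its children are [5,11] (max 18) and [21,23] (max 23); the children of [5,11] are [4,8] (max 8) and [15,18] (max 18); the left child of [15,18] is [7,10] (max 10).]
Note that: x->max = max(x->high, x->left->max, x->right->max)
Slide31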
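To check the max fields of the example tree, the following Python sketch computes each node's max from its own high endpoint and its children's max values. It assumes a hand-built tree without balancing; IntervalNode and subtree_max are hypothetical names, and an empty subtree is given max = negative infinity so that it never wins the comparison.

```python
class IntervalNode:
    """Node keyed on the low endpoint, augmented with the largest
    high endpoint found anywhere in its subtree."""
    def __init__(self, low, high, left=None, right=None):
        self.low, self.high = low, high
        self.left, self.right = left, right
        self.max = max(high, subtree_max(left), subtree_max(right))

def subtree_max(node):
    """max field of a subtree; an empty subtree contributes -infinity."""
    return node.max if node is not None else float("-inf")

# The interval tree from the slides.
root = IntervalNode(
    17, 19,
    IntervalNode(5, 11,
                 IntervalNode(4, 8),
                 IntervalNode(15, 18, IntervalNode(7, 10))),
    IntervalNode(21, 23),
)
print(root.max, root.left.max, root.right.max)
```

Because max depends only on a node and its two children, it fits the pattern that lets an augmented field be maintained through insertions, deletions, and rotations.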
Interval Trees
Following the methodology:
1. Pick the underlying data structure
   Red-black trees will store intervals, keyed on i->low
2. Decide what additional information to store
   Store the maximum endpoint in the subtree rooted at i
3. Figure out how to maintain the information
   How would we maintain the max field for a BST?
   What's different?
4. Develop the desired new operations
Slide32
Interval Trees
What are the new max values for the subtrees?
A: Unchanged.
What are the new max values for x and y?
A: The root value is unchanged; recompute the other.
[Figure: rotations with max fields. Before rightRotate(y): y = [11,35] (max 35) has left child x = [6,20] (max 20) and a right subtree with max 30; x has subtrees with max 14 and 19. After rightRotate(y): x = [6,20] (max 35) has a left subtree with max 14 and right child y = [11,35] (max 35), whose subtrees have max 19 and 30. leftRotate(x) reverses the change.]
Slide35
Interval Trees
Following the methodology:
1. Pick the underlying data structure
   Red-black trees will store intervals, keyed on i->low
2. Decide what additional information to store
   Store the maximum endpoint in the subtree rooted at i
3. Figure out how to maintain the information
   Insert: update max on the way down and during rotations
   Delete: similar
4. Develop the desired new operations
Slide36
Searching Interval Trees
Find a node in tree T whose interval overlaps interval i.
IntervalSearch(T, i)
{
    x = T->root;
    while (x != NULL && !overlap(i, x->interval))
        if (x->left != NULL && x->left->max >= i->low)
            x = x->left;
        else
            x = x->right;
    return x;
}
What will be the running time?
A: O(lg n), since each iteration descends one level of a balanced tree.
Slide38
IntervalSearch() Example
Example: search for an interval overlapping [14,16].
[Figure: the interval tree with root [17,19] (max 23); children [5,11] (max 18) and [21,23] (max 23); [5,11]'s children [4,8] (max 8) and [15,18] (max 18); [15,18]'s left child [7,10] (max 10).]
Trace of IntervalSearch(T, i):
  at [17,19]: no overlap; left max 18 >= 14, so go left
  at [5,11]: no overlap; left max 8 < 14, so go right
  at [15,18]: overlaps [14,16], so return this node
Slide39
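IntervalSearch() Example
Example: search for an interval overlapping [12,14].
[Figure: the interval tree with root [17,19] (max 23); children [5,11] (max 18) and [21,23] (max 23); [5,11]'s children [4,8] (max 8) and [15,18] (max 18); [15,18]'s left child [7,10] (max 10).]
Trace of IntervalSearch(T, i):
  at [17,19]: no overlap; left max 18 >= 12, so go left
  at [5,11]: no overlap; left max 8 < 12, so go right
  at [15,18]: no overlap; left max 10 < 12, so go right
  x = NULL, so return NULL: no interval overlaps [12,14]
Slide40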
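The IntervalSearch procedure and the two example queries can be tried out with the Python sketch below. It is an illustration under stated assumptions: the tree is built by hand (no red-black balancing), intervals are closed, the query is passed as two numbers rather than an interval object, and the names IntervalNode, subtree_max, overlap, and interval_search are hypothetical.

```python
class IntervalNode:
    """Node keyed on low, augmented with the max high endpoint below it."""
    def __init__(self, low, high, left=None, right=None):
        self.low, self.high = low, high
        self.left, self.right = left, right
        self.max = max(high, subtree_max(left), subtree_max(right))

def subtree_max(node):
    return node.max if node is not None else float("-inf")

def overlap(a_low, a_high, b_low, b_high):
    """Closed intervals overlap iff each one starts before the other ends."""
    return a_low <= b_high and b_low <= a_high

def interval_search(root, low, high):
    """Return some node whose interval overlaps [low, high], or None."""
    x = root
    while x is not None and not overlap(low, high, x.low, x.high):
        if x.left is not None and x.left.max >= low:
            x = x.left                # an overlap, if any exists, is on the left
        else:
            x = x.right               # the left subtree cannot contain an overlap
    return x

# The interval tree from the slides.
root = IntervalNode(
    17, 19,
    IntervalNode(5, 11,
                 IntervalNode(4, 8),
                 IntervalNode(15, 18, IntervalNode(7, 10))),
    IntervalNode(21, 23),
)

hit = interval_search(root, 14, 16)
print((hit.low, hit.high))
print(interval_search(root, 12, 14))
```

The loop inspects one node per level, so on a balanced tree the search takes O(lg n) time regardless of how many intervals overlap the query.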
Correctness of IntervalSearch()
Key idea: we need to check only 1 of a node's 2 children.
Case 1: the search goes right
  Show that there is an overlap in the right subtree, or no overlap at all
Case 2: the search goes left
  Show that there is an overlap in the left subtree, or no overlap at all
Slide41
Correctness of IntervalSearch()
Case 1: if the search goes right, there is an overlap in the right subtree or no overlap in either subtree.
If there is an overlap in the right subtree, we're done.
Otherwise:
  x->left = NULL, or x->left->max < i->low (Why? The search went right, so the branch condition failed.)
  Thus, no overlap in the left subtree: every interval there ends before i begins.
    while (x != NULL && !overlap(i, x->interval))
        if (x->left != NULL && x->left->max >= i->low)
            x = x->left;
        else
            x = x->right;
    return x;
Slide42
Correctness of IntervalSearch()
Case 2: if the search goes left, there is an overlap in the left subtree or no overlap in either subtree.
If there is an overlap in the left subtree, we're done.
Otherwise:
  i->low ≤ x->left->max, by the branch condition
  x->left->max = y->high for some y in the left subtree
  Since i and y don't overlap and i->low ≤ y->high, i->high < y->low
  Since the tree is sorted by low endpoints, i->high < any low endpoint in the right subtree
  Thus, no overlap in the right subtree
    while (x != NULL && !overlap(i, x->interval))
        if (x->left != NULL && x->left->max >= i->low)
            x = x->left;
        else
            x = x->right;
    return x;
Slide43
An Application of Interval Trees
Slide44
Overlapping Windows
PROBLEM:
INPUT: We are given a set S = {w_1, w_2, …, w_n} of n axis-parallel rectangular windows; w = [l(w), r(w), b(w), t(w)], for w ∈ S.
OUTPUT MODE:
  Reporting: Report all R pairs of overlapping input windows.
  Counting: How many pairs of overlapping input windows are there? (= R)
  Disjointness: Is there an overlapping pair of input windows? (R > 0?)
Can we do better than the obvious O(n^2)-time brute-force solution?
YES: Counting & Disjointness in O(n log n) time, Reporting in O(R + n log n) time.
[Figure: a window w in the x-y plane, with left and right edges at x = l(w) and x = r(w), and bottom and top edges at y = b(w) and y = t(w).]
Slide45
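Plane Sweep Method
[Figure: a sweep line moving across the x-y plane. The rectangles it currently crosses are the active rectangles; their projections onto the x-axis are the active x-intervals.]
Slide46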
Sweep Schedule & Sweep Status
Sweep Schedule is the 2n events (corresponding to the bottom and top y-coordinates of the n windows in input set S) in ascending order. We can pre-sort these 2n events in O(n log n) time.
Sweep Status is the dictionary of the active windows maintained in an interval tree T. For each window w, its x-interval is [l(w), r(w)], starting at l(w) and finishing at r(w). Dictionary size: |T| ≤ n.
Operations:
  Insert(w, T): insert (x-interval of) activated window w into T.
  Delete(w, T): delete (x-interval of) deactivated window w from T.
  OverlapSearch(w, T): return a window in T that overlaps w.
  CountAllOverlaps(w, T): return the number of windows in T that overlap w.
  ReportAllOverlaps(w, T): report all windows in T that overlap w.
All but the last operation take O(log n) time each. ReportAllOverlaps takes O(R_w + log n) time, where R_w is the output size.
Σ_{w ∈ S} O(R_w + log n) = O((Σ_{w ∈ S} R_w) + n log n) = O(R + n log n).
Slide47
Plane Sweep Algorithm
ALGORITHM ReportOverlappingPairs(S, n)
Step 0: Initialize sweep status interval tree T: T ← ∅
Step 1: Initialize sweep schedule of events E:
        E = { ⟨b(w), w⟩ | w ∈ S } ∪ { ⟨t(w), w⟩ | w ∈ S }.
        Sort E in ascending order of the first component.
        [Tie-breaking rule: if b(w) = t(w'), b has precedence over t.]
Step 2: Sweep the plane according to sweep schedule E:
        for each ⟨y, w⟩ ∈ E, in ascending order of y do
            if y = b(w) then do
                ReportAllOverlaps(w, T)
                Insert(w, T)
            end-then
            else Delete(w, T)    (* y = t(w) *)
        end-for
end
Running time = O(R + n log n).
Slide48
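The control flow of ReportOverlappingPairs can be sketched in Python as below. Note the substitution: to keep the example short, the sweep status is a plain set scanned linearly instead of an interval tree, so this sketch does not achieve the O(R + n log n) bound. Windows are assumed to be closed rectangles given as (l, r, b, t) tuples, and the function name and return format are hypothetical.

```python
def report_overlapping_pairs(windows):
    """Return sorted index pairs (i, j), i < j, of overlapping windows.

    Each window is a tuple (l, r, b, t). The active set is scanned
    linearly here; the slides use an interval tree for this step.
    """
    events = []
    for idx, (l, r, b, t) in enumerate(windows):
        events.append((b, 0, idx))    # bottom edge: window becomes active
        events.append((t, 1, idx))    # top edge: window becomes inactive
    events.sort()                     # on equal y, kind 0 (bottom) precedes kind 1 (top)

    active = set()                    # sweep status: indices of active windows
    pairs = []
    for _, kind, idx in events:
        if kind == 0:
            l, r, _, _ = windows[idx]
            for other in active:      # stand-in for ReportAllOverlaps(w, T)
                o_l, o_r, _, _ = windows[other]
                if l <= o_r and o_l <= r:
                    pairs.append((min(idx, other), max(idx, other)))
            active.add(idx)           # Insert(w, T)
        else:
            active.remove(idx)        # Delete(w, T)
    return sorted(pairs)

windows = [(0, 4, 0, 4), (2, 6, 2, 6), (5, 8, 0, 1), (10, 12, 0, 10)]
print(report_overlapping_pairs(windows))
```

Only windows that are active at the same time are compared, and for those only the x-intervals need checking, since being active together already means their y-ranges intersect. Swapping the set for an interval tree that supports ReportAllOverlaps is what gives the running time stated on the slide.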
Exercises
Slide49
Range-Search-Counting:
Given a Red-Black tree T and a pair of keys a and b, a < b (not necessarily in T), we wish to output the number of items x in T such that a ≤ key[x] ≤ b. Augment T so that, without degrading the time complexities of the dictionary operations Search, Insert, and Delete, the operation RangeSearchCount can be done in O(log n) time in the worst case, where n is the number of items in T.
Dictionary with Average and Median:
Operations Average(K,T) and Median(K,T) return, respectively, the average and median of those items in dictionary T that are ≤ the given key K.
(a) Describe the augmented dictionary data structure to support these new operations.
(b) Describe the algorithms for Average and Median, and explain their running times.
(c) What are the required modifications to Search, Insert, Delete? Are their running times affected?
[CLRS, Exercise 14.1-7, p. 345]
We wish to output the number of inversions in a given array A[1..n] of n real numbers. Show how to use an Order Statistics Dictionary (OSD) to solve this problem in O(n log n) time.
Selection in a Pair of Order Statistics Dictionaries:
As we learned in the course, an Order Statistics Dictionary (OSD) can be implemented efficiently by an augmented Red-Black Tree in which each node x has the extra field size[x], which stores the number of descendants of node x. We are given two OSDs S and T with n keys in total, and a positive integer k, k ≤ n. The elements within an OSD have distinct keys, but the elements in the two OSDs may have some equal keys. The problem is to find the k-th smallest key among the n keys stored in S and T (without changing S and T). Design and analyze an algorithm that solves this problem in worst-case O(log n) time.
Slide50
[CLRS, Exercise 14.1-8, p. 345]
Consider n given chords on a circle, each defined by its two endpoints. Describe an O(n log n) time algorithm to determine the number of pairs of chords that intersect inside the circle.
[For instance, if the n chords are all diameters that meet at the center, the correct answer would be n(n-1)/2.]
For simplicity, assume that no two chords share an end-point.
[CLRS, Problem 14-1, p. 354]
We wish to keep track of a Point of Maximum Overlap (PMO) in a dynamic set T of intervals, i.e., a point on the real line that is overlapped by the largest number of intervals in data set T.
(a) Show that there is always a PMO that is an end-point of an interval in T (if not empty).
(b) Design and analyze an augmented interval data structure T that efficiently supports the old operations Insert and Delete, and the new operation FindPMO(T). The latter returns a point of maximum overlap with the intervals in the data set T.
[Hint: Keep a Red-Black tree of all endpoints. Associate a value of +1 with each left end-point, and associate a value of –1 with each right end-point. Augment each node of the tree with extra information to maintain the PMO.]
[CLRS, Exercise 14.3-6, p. 354] Dynamic Closest Pair:
Show how to maintain a dynamic data set Q of numbers that supports the operation ClosestPair, which returns the pair of numbers in Q with minimum absolute value difference. For example, if Q = {1, 5, 9, 15, 18, 22}, then ClosestPair(Q) returns (15,18), since |18 – 15| = 3 is the minimum difference among any distinct pair of numbers in Q.
Design and analyze an augmented data structure to support such a data set with the operations Search, Insert, Delete, and ClosestPair as efficiently as possible.
Slide51
Closest Bichromatic Pair:
We want to maintain a dynamic data set Q of items. Each item has two attributes: a key, which is a real number, and a colour, which is either blue or green. In addition to the usual dictionary operations, we wish to support a new operation called Closest Bichromatic Pair (CBP). CBP(Q) returns a pair of items in Q that have different colours and minimum possible absolute value difference in their key values. (Note, we are referring to the values of their keys, not their index or position in the data set.) CBP returns nil if all items in Q have the same colour.
Design and analyze an augmented data structure to support such a data set with the operations Search, Insert, Delete, and CBP as efficiently as possible.
[CLRS, Exercise 14.3-3, p. 353] EarliestOverlapSearch(I,T):
We are given an interval tree T and an interval I. For simplicity, assume all intervals have distinct starting points. Design and analyze an efficient implementation of EarliestOverlapSearch(I,T), which returns the interval with minimum starting time among those intervals in T that overlap interval I. It returns nil if no interval in T overlaps I.
[CLRS, Exercise 14.3-4, p. 354] ReportAllOverlaps(I,T):
(a) Show that ReportAllOverlaps(I,T) can be done in O(min{n, (R+1) log n}) time, where n is the number of intervals in interval tree T, and R is the output size, i.e., the number of intervals in T that overlap interval I. You should not redefine or augment interval trees in any way. However, the operation may temporarily modify T during its process (e.g., delete an item, then reinsert it back into T).
(b) Answer part (a) but with ReportAllOverlaps not modifying T in any way, not even temporarily.
(c) [Extra credit and optional:] Improve the time complexity of ReportAllOverlaps(I,T) to O(R + log n), without degrading the time complexities of the other operations. [You may redefine the structure.]
Slide52
Nested Rectangles:
We are given a set R = {R_1, ..., R_n} of n axis-parallel rectangles in the plane. Each rectangle is given by its two pairs of extreme x and y coordinates. One of these rectangles is the bounding box and encloses all the remaining n-1 rectangles in R. All rectangles in R have disjoint boundaries. So, for any pair of rectangles R_i and R_j in R, either they are completely outside each other, or one encloses the other. Therefore, we may have several levels of nesting among these rectangles. (See Fig. (a) below.) We say R_i immediately encloses R_j if R_i encloses R_j and there is no other rectangle R_k in R such that R_i encloses R_k and R_k encloses R_j.
(a) The immediate enclosure relationship on R defines a directed graph, where we have a directed edge from R_i to R_j iff R_i immediately encloses R_j. Prove this digraph is indeed a rooted tree (called the Nesting Tree of R). That is, you need to prove that the said digraph is acyclic, and every node has in-degree 1 except for one node that has in-degree 0. What rectangle in R corresponds to the root of that tree? (As an illustration, see Fig. (b) corresponding to Fig. (a).)
(b) Design and analyze an efficient algorithm that, given the set R, constructs its corresponding Nesting Tree by parent pointers. That is, it outputs the parent array p[1..n] such that p[j] = i means R_i immediately encloses R_j. Carefully explain the correctness and running time of your algorithm.
[Figure (a): a set of axis-parallel rectangles, labeled A through J, with disjoint boundaries.]
[Figure (b): the Nesting Tree of the rectangles A through J.]