Slide1
Encodings of Range Maximum-Sum Segment Queries and Applications
Paweł Gawrychowski* and Pat Nicholson**
*University of Warsaw
**Max-Planck-Institut für Informatik
Slide2
Range Queries in Arrays
Input: an array
Preprocess the array to answer queries of the form
“Given a range, find _____ in the subarray”
Where _____ is something like:
the index of the maximum/minimum element
the indices of the top-k values
the index of the k-th largest/smallest number
the maximum-sum range
Slide3
Encoding Range Queries in Arrays
How much space do we need to answer these queries?
As an example, think of range minimum queries (RMinQ):
If we return the value of the minimum, then we must store the array. Why?
Because we can ask the query for each single-element range [i, i]. This allows us to recover the entire array.
If we return just the array index, then we can do much better.
There is a succinct data structure that occupies 2n + o(n) bits for an array of n elements, and answers queries in constant time. Fischer and Heun (2011)
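The value-versus-index distinction can be illustrated with a small Python sketch. The helper names (`rmin_value`, `rmin_index`, `recover_from_values`) are hypothetical and not from the talk, and both queries scan the range naively instead of using a succinct structure.

```python
def rmin_value(a, i, j):
    """Value of the minimum in a[i..j] (0-based, inclusive)."""
    return min(a[i:j + 1])


def rmin_index(a, i, j):
    """Index of the leftmost minimum in a[i..j] (0-based, inclusive)."""
    return min(range(i, j + 1), key=lambda k: a[k])


def recover_from_values(query, n):
    """Value-returning queries on single-element ranges reveal the whole array."""
    return [query(i, i) for i in range(n)]


a = [3, 1, 4, 1, 5]
b = [30, 10, 40, 10, 50]  # different values, same relative order

recovered = recover_from_values(lambda i, j: rmin_value(a, i, j), len(a))

# Index-returning queries give identical answers on a and b for every range,
# so an index-only encoding does not need to store the values themselves.
same_answers = all(
    rmin_index(a, i, j) == rmin_index(b, i, j)
    for i in range(len(a))
    for j in range(i, len(a))
)
```

Here `recovered` equals `a`, while `same_answers` shows that the two arrays are indistinguishable to index-returning queries.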
Slide4
Typical Data Structure
Input Data (Relatively Big)
Preprocess
Data Structure
Slide6
Encoding Approach
Input Data (Relatively Big)
Preprocess w.r.t. Some Query
Encoding (Hope: much smaller)
Auxiliary Data Structures (Should be smaller still)
Succinct Data Structure: Minimum Space Possible
Query (Hope: as fast as non-succinct counterpart)
Slide12
This Talk: Maximum-Sum Segments
From Jon Bentley’s “Programming Pearls”:
Input: an array containing arbitrary numbers
Output: the range whose sum is maximized
Only non-trivial if the array contains negative numbers
Can be solved in linear time (credited to Kadane)
Applications:
Bentley [1986]: “[problem] is a toy – it was never incorporated into a system.”
Chen and Chao [2004]: “…plays an important role in sequence analysis.”
We focus on the range query case:
Given a query range, find the subrange within it whose sum is maximized
Also motivated by biological sequence analysis applications
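For reference, the linear-time algorithm credited to Kadane can be sketched as follows in Python. The function name `max_sum_segment` is illustrative, and the sketch assumes the segment must be non-empty, which the slide does not state.

```python
def max_sum_segment(a):
    """Return (best_sum, start, end) of a non-empty maximum-sum segment of a.

    Indices are 0-based and inclusive. Runs in linear time.
    Assumes a is non-empty.
    """
    best_sum, best_start, best_end = a[0], 0, 0
    cur_sum, cur_start = a[0], 0
    for k in range(1, len(a)):
        if cur_sum < 0:
            # A negative running sum can only hurt: restart the segment at k.
            cur_sum, cur_start = a[k], k
        else:
            cur_sum += a[k]
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, k
    return best_sum, best_start, best_end
```

For example, `max_sum_segment([2, -3, 4, -1, 2, -5, 3])` selects the segment `4, -1, 2` at positions 2 to 4.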
Slide13
Range Maximum-Sum Segment Queries
What was known:
Chen and Chao [ISAAC 2004, Disc. App. Math. 2007]
This can be done in a linear number of words of space and constant query time
Very closely related to the range maximum problem:
RMSSQ can answer RMaxQ: pad elements with large negative numbers
RMinQ/RMaxQ can answer RMSSQ: more complicated argument
Slide14
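The padding reduction mentioned on the previous slide can be sketched in Python as follows. This is only one way to realize it: the penalty value, the interleaving layout, and the function names are assumptions made for illustration, and the RMSSQ itself is answered by brute force rather than by any structure from the talk.

```python
def rmssq_bruteforce(b, lo, hi):
    """Maximum-sum non-empty segment of b[lo..hi]; returns (start, end), inclusive."""
    best = None
    for s in range(lo, hi + 1):
        total = 0
        for e in range(s, hi + 1):
            total += b[e]
            if best is None or total > best[0]:
                best = (total, s, e)
    return best[1], best[2]


def pad(a):
    """Interleave a with a penalty that is more negative than any element."""
    penalty = -(max(abs(x) for x in a) + 1)
    b = []
    for x in a:
        b += [x, penalty]
    return b[:-1]  # drop the trailing pad; a[k] sits at position 2k


def rmaxq_via_rmssq(a, i, j):
    """Index of a maximum of a[i..j], answered with one RMSSQ on the padded array."""
    b = pad(a)
    s, e = rmssq_bruteforce(b, 2 * i, 2 * j)
    # With this penalty, joining two elements across a pad always loses,
    # so the best segment is a single original element.
    assert s == e and s % 2 == 0
    return s // 2
```

Each pad outweighs any single element, so extending a segment across a pad strictly lowers its sum and the maximum-sum segment collapses to the maximum element.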
Range Maximum-Sum Segment Queries
What was not known:
Is there an efficient encoding structure for this problem?
That is: can we beat a linear number of words?
Really similar to RMaxQ
Solution stores several word arrays; compares various array elements
Range Maximum-Sum Segment Queries
Our main results:
We can encode these queries using a linear number of bits
(Rest of this talk)
A lower bound on the number of bits needed
(Enumeration argument)
Application to computing k-covers
(k disjoint subranges that achieve the maximum sum)
Csűrös: “The problem arises in DNA and protein segmentation, and in postprocessing of sequence alignments.”
Slide16
Main Idea: linear-word solution
Define an array consisting of the partial sums of the input array
Slide17
Imagine shooting a ray to the left from each entry of the partial-sum array
Slide19
Now find the minimum in this range
Slide21
Define another array storing these minima
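The three steps above (partial sums, ray to the left, minimum within the ray's range) can be sketched in Python. The indexing conventions, the stopping rule for the ray (it stops at the first entry that is at least as large), and the names `c`, `left`, and `partner` are assumptions for illustration; the slides' own symbols did not survive extraction. The sketch is quadratic for clarity, not the linear-time construction.

```python
def build_partners(a):
    """Return (c, left, partner) for the array a.

    c[k]       = a[0] + ... + a[k-1], with c[0] = 0, so len(c) == len(a) + 1.
    left[i]    = nearest j < i with c[j] >= c[i], or -1 if the ray escapes.
    partner[i] = position of the minimum of c over [max(left[i], 0), i - 1].
    Entries at index 0 are unused placeholders.
    """
    n = len(a)
    c = [0] * (n + 1)
    for k in range(n):
        c[k + 1] = c[k] + a[k]

    left = [-1] * (n + 1)
    partner = [-1] * (n + 1)
    for i in range(1, n + 1):
        j = i - 1
        while j >= 0 and c[j] < c[i]:  # the ray passes over smaller entries
            j -= 1
        left[i] = j
        lo = max(j, 0)
        partner[i] = min(range(lo, i), key=lambda k: c[k])
    return c, left, partner
```

With these conventions, the segment `a[partner[i] .. i-1]` has sum `c[i] - c[partner[i]]`, which is the best sum over all start positions the ray from `c[i]` can see.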
Slide22
Candidate Pairs
We call each such pair a candidate
We define (yet another) array as follows: each entry is the score of the corresponding candidate
That is: the sum within the candidate’s range
Slide23
What Do They Store?
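The array of cumulative sums: a linear number of words
The array of candidate partners: a linear number of words
Range min (RMinQ) structure on the cumulative sums: a linear number of bits
Range max (RMaxQ) structure on the candidate scores: a linear number of bits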
Slide24
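Main Idea: linear-word solution
How to answer a query: the easy case
Slide25
Let the range max structure pick the highest-scoring candidate whose right end lies in the query range, and examine that candidate pair
Slide26
If its partner is in the query range, return the candidate
Slide27
How to answer a query: the not so easy case
Slide28
Pick the highest-scoring candidate as before … this time its partner falls outside the query range
Slide30
Let the range min structure find the best partner inside the query range, and let the range max structure pick the best candidate further to the right
Slide31
Return the greater sum: the shortened segment or the candidate further to the right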
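The two query cases can be sketched end to end in Python. This is a sketch under stated assumptions, not the authors' code: the indexing, the tie-breaking, and the names `build`, `argmin`, `argmax`, and `rmssq` are hypothetical, and the range min/max queries are plain linear scans standing in for constant-time RMinQ/RMaxQ structures.

```python
def build(a):
    """Cumulative sums c, candidate partners, and candidate scores for a."""
    n = len(a)
    c = [0] * (n + 1)                  # c[k] = a[0] + ... + a[k-1]
    for k in range(n):
        c[k + 1] = c[k] + a[k]
    partner = [0] * (n + 1)
    score = [float("-inf")] * (n + 1)  # index 0 is an unused placeholder
    for i in range(1, n + 1):
        j = i - 1
        while j >= 0 and c[j] < c[i]:  # ray to the left from c[i]
            j -= 1
        partner[i] = min(range(max(j, 0), i), key=lambda k: c[k])
        score[i] = c[i] - c[partner[i]]
    return c, partner, score


def argmin(xs, lo, hi):
    return min(range(lo, hi + 1), key=lambda k: xs[k])


def argmax(xs, lo, hi):
    return max(range(lo, hi + 1), key=lambda k: xs[k])


def rmssq(c, partner, score, l, r):
    """Maximum-sum segment inside a[l..r] (0-based, inclusive); returns (start, end)."""
    # Right ends correspond to positions l+1..r+1 of c; allowed left ends are l..r.
    t = argmax(score, l + 1, r + 1)
    if partner[t] >= l:                  # easy case: the candidate fits
        return partner[t], t - 1
    q = argmin(c, l, t - 1)              # best in-range start for right end t
    best = (c[t] - c[q], q, t - 1)
    if t < r + 1:
        s = argmax(score, t + 1, r + 1)  # best candidate to the right of t
        if score[s] > best[0]:
            best = (score[s], partner[s], s - 1)
    return best[1], best[2]
```

A quick way to gain confidence in such a sketch is to compare it against a brute-force search over all subranges on small random arrays.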
Slide32
Reducing the Space
What are the bottlenecks in the data structure?
Storing the array of candidate partners
We need to store the candidate pairs
Storing the array of cumulative sums
We must compare scores of candidates in the not so easy case
Slide33
Dealing with Bottleneck I
Slide35
Nested Is Good
Imagine indices as vertices, candidate pairs as edges
A nested graph can be represented in a number of bits linear in its number of vertices and edges
Also known as a one-page or outerplanar graph
Navigation is efficient: select vertices, follow edges, etc.
Jacobson (1989), Munro and Raman (2001)
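One common way to write a nested graph as balanced parentheses can be sketched in Python. The exact layout used here (a `()` marker per vertex, followed by closing parentheses for edges ending there and opening parentheses for edges starting there) and the function names are assumptions for illustration; the sketch stores the string plainly and omits the auxiliary structures that make navigation fast.

```python
def encode_nested(n, edges):
    """Encode a nested graph on vertices 1..n as a parenthesis string.

    Each vertex contributes "()", then one ")" per edge ending at it,
    then one "(" per edge starting at it. Assumes the edges (u, v) with
    u < v are pairwise non-crossing.
    """
    ends = [0] * (n + 1)
    starts = [0] * (n + 1)
    for u, v in edges:
        starts[u] += 1
        ends[v] += 1
    return "".join("()" + ")" * ends[v] + "(" * starts[v] for v in range(1, n + 1))


def decode_nested(s):
    """Recover (n, sorted edges) from a string produced by encode_nested."""
    edges, stack, v, k = [], [], 0, 0
    while k < len(s):
        if s.startswith("()", k):   # vertex marker
            v += 1
            k += 2
        elif s[k] == "(":           # an edge opens at the current vertex
            stack.append(v)
            k += 1
        else:                       # an edge closes at the current vertex
            edges.append((stack.pop(), v))
            k += 1
    return v, sorted(edges)


example_edges = [(1, 2), (1, 5), (2, 4), (3, 4), (5, 7), (5, 8), (6, 7)]
encoded = encode_nested(8, example_edges)
```

The string uses two characters per vertex and two per edge, so a graph with n vertices and m edges takes 2n + 2m bits in this layout.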
Example with vertices 1 to 8, one group of parentheses per vertex:
()(( ())( ()( ())) ())(( ()( ())) ())
Slide36
Dealing with Bottleneck I
Slide37
Dealing with Bottleneck II
Slide39
We define the left sibling of a point
Knowing the left sibling, we can handle the not so easy case.
Slide40
Recall The Query Algorithm
Return the greater of the two sums
Slide41
If the left sibling of is
Slide43
Left sibling of can’t be here
Slide44
Dealing with Bottleneck II
Problem: cannot store the left siblings explicitly
Slide45
Idea: try to find something that is nested
Slide46
Solution: the pairs formed by each point and its left sibling are nested
Slide49
What Do We Store?
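The graph representing candidates: a linear number of bits
The graph representing left siblings: a linear number of bits
Range min (RMinQ) structure on the cumulative sums: a linear number of bits
Range max (RMaxQ) structure on the candidate scores: a linear number of bits
Grand total: a linear number of bits (can be reduced slightly with more tricks)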
Slide50
Thank You