Slide1
Encoding Nearest Larger Values
Pat Nicholson* and Rajeev Raman**
* MPII
** University of Leicester
Slide2
déjà vu: The Encoding Approach
Input Data (Relatively Big)
Slide3
déjà vu: The Encoding Approach
Input Data (Relatively Big)
Preprocess w.r.t. Some Query
Encoding (Hope: much smaller)
Slide4
déjà vu: The Encoding Approach
Input Data (Relatively Big)
Preprocess w.r.t. Some Query
Encoding (Hope: much smaller)
Slide5
déjà vu: The Encoding Approach
Input Data (Relatively Big)
Preprocess w.r.t. Some Query
Encoding (Hope: much smaller)
Auxiliary Data Structures: (Should be smaller still)
Slide6
déjà vu: The Encoding Approach
Input Data (Relatively Big)
Preprocess w.r.t. Some Query
Encoding (Hope: much smaller)
Auxiliary Data Structures: (Should be smaller still)
Succinct Data Structure: Minimum Space Possible
Slide7
déjà vu: The Encoding Approach
Input Data (Relatively Big)
Preprocess w.r.t. Some Query
Encoding (Hope: much smaller)
Auxiliary Data Structures: (Should be smaller still)
Succinct Data Structure: Minimum Space Possible
Query (Hope: as fast as non-succinct counterpart)
Slide8
NEAREST LARGER VALUES
Support nearest larger value queries on an array A:
  Given index i, return position j of the “nearest” value larger than A[i]
Two important questions:
#1: What does “nearest” mean?
  Many possible variants of the problem:
    Unidirectional: return the index of the NLV to the left (j < i)
    Bidirectional: return the indices of the NLV to the left AND right
    Nondirectional: return the index of the closest NLV (min. |i − j|)
#2: Are all elements in the array distinct?
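The three query variants can be sketched directly from their definitions. The following is a minimal, brute-force illustration and not the encoding discussed in the talk. The function names are hypothetical, indices are 0-based, and ties in the nondirectional case are broken to the right (an assumption here, matching the tie-breaking rule used later in the deck). The array is the example shown on this slide.

```python
from typing import List, Optional, Tuple


def nlv_left(a: List[int], i: int) -> Optional[int]:
    """Unidirectional: index of the nearest larger value to the left of i."""
    for j in range(i - 1, -1, -1):
        if a[j] > a[i]:
            return j
    return None  # no larger value to the left


def nlv_right(a: List[int], i: int) -> Optional[int]:
    """Mirror image of nlv_left: nearest larger value to the right of i."""
    for j in range(i + 1, len(a)):
        if a[j] > a[i]:
            return j
    return None  # no larger value to the right


def nlv_bidirectional(a: List[int], i: int) -> Tuple[Optional[int], Optional[int]]:
    """Bidirectional: both the left and the right answer."""
    return nlv_left(a, i), nlv_right(a, i)


def nlv_nondirectional(a: List[int], i: int) -> Optional[int]:
    """Nondirectional: the closer of the two answers (minimise |i - j|).

    Assumption: a tie in distance is broken by choosing the right candidate.
    """
    left, right = nlv_bidirectional(a, i)
    if left is None:
        return right
    if right is None:
        return left
    return left if i - left < right - i else right


example = [10, 2, 3, 1, 9, 8, 7, 11, 5, 4]
print([nlv_nondirectional(example, i) for i in range(len(example))])
```

Each query here costs linear time in the worst case; the point of the encodings in this talk is to answer the same queries without access to the array at all.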
Example array: 10 2 3 1 9 8 7 11 5 4
Slide9
OVERview: ENCODING NLV
For all these results: space bound is optimal to within lower order terms
Distinct | Problem        | Space | Q | Notes
Yes      | Unidirectional |       |   | Cartesian Tree
Yes      | Bidirectional  |       |   | Cartesian Tree
Yes      | Nondirectional |       |   |
No       | Unidirectional |       |   | [Fischer et al. 2009] Cartesian Tree
No       | Bidirectional  |       |   | [Fischer 2011] Schröder Trees (Navigate CSA)
No       | Nondirectional |       |   |
Slide10
OVERview: ENCODING NLV
Distinct | Problem        | Space | Q | Notes
Yes      | Unidirectional |       |   | Cartesian Tree
Yes      | Bidirectional  |       |   | Cartesian Tree
Yes      | Nondirectional |       |   | This paper: NLV Tree
No       | Unidirectional |       |   | [Fischer et al. 2009] Cartesian Tree
No       | Bidirectional  |       |   | [Fischer 2011] Schröder Trees (Navigate CSA)
No       | Nondirectional |       |   |
Still very open: What is the constant? […]? Prove it!
Slide11
BIGGER PICTURE
Encoding 1D Range Minimum Queries
  Fischer and Heun [SICOMP 2011]
  Encoding also using […] bits via Cartesian tree
All-Nearest Larger Values
  Asano et al. [Mehlhorn’s Festschrift 2009, WADS 2013]: Trade-offs for computing the solutions to all NLV queries
  Berkman et al. [J. Alg 1993]: Parallel algorithms for parenthesis matching, triangulating monotone polygons
Encoding 2D Nearest Larger Values
  Jo, Raman, and Rao [WALCOM 2015], Jayapaul et al. [IWOCA 2014]
  Encode NLV of […] array under […] metric using […] bits
Encoding 2D Range Minimum Queries
  See Brodal et al. [ESA 2010, 2012, and 2013]
  Encode RMQ for an […] matrix requires […]
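For context on the all-nearest larger values problem mentioned above, here is a minimal sketch of the textbook sequential stack method, which answers every unidirectional query in linear total time. It is not the algorithm of Asano et al. or of Berkman et al.; the function names are hypothetical and indices are 0-based.

```python
from typing import List, Optional


def all_nlv_left(a: List[int]) -> List[Optional[int]]:
    """For every i, the index of the nearest larger value to its left."""
    result: List[Optional[int]] = [None] * len(a)
    stack: List[int] = []  # indices whose values are strictly decreasing
    for i, x in enumerate(a):
        # Anything not larger than x can never be an answer for later
        # positions, because x is both closer and at least as large.
        while stack and a[stack[-1]] <= x:
            stack.pop()
        result[i] = stack[-1] if stack else None
        stack.append(i)
    return result


def all_nlv_right(a: List[int]) -> List[Optional[int]]:
    """Same scan on the reversed array, with indices mapped back."""
    n = len(a)
    mirrored = all_nlv_left(a[::-1])
    return [None if j is None else n - 1 - j for j in reversed(mirrored)]


example = [10, 2, 3, 1, 9, 8, 7, 11, 5, 4]
print(all_nlv_left(example))
print(all_nlv_right(example))
```

Each index is pushed and popped at most once, which is where the linear bound comes from.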
Slide13
CARTESIAN TREES Review
We can rebuild him. We have the technology.
Slide14
Nondirectional NLV Tree
Tie breaking rule: break ties by choosing the one to the right.
Slide15
TIEBREAKING MATTERS?
Rule           | 1 | 2 | 3 | 4  | 5  | 6   | 7   | 8    | 9    | 10
To the right   | 1 | 2 | 5 | 14 | 40 | 116 | 341 | 1010 | 3009 | 9012
To the smaller | 1 | 2 | 5 | 14 | 42 | 126 | 383 | 1178 | 3640 | 11316
To the larger  | 1 | 2 | 5 | 12 | 32 | 88  | 248 | 702  | 1998 | 5696
Open problem: Does the tie breaking rule affect the constant factor: i.e., […]?
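Counts of this kind can be explored by brute force. The sketch below is one interpretation, not the authors' program: it assumes that a "structure" is the vector of nondirectional NLV answers of a permutation, and that the three rules resolve an equal-distance tie by taking the right candidate, the candidate with the smaller value, or the candidate with the larger value. All names are hypothetical. Under these assumptions every rule gives 1, 2, 5 for sizes 1 to 3, which agrees with the first three columns of the table on this slide.

```python
from itertools import permutations
from typing import Optional, Sequence, Tuple


def nlv_signature(perm: Sequence[int], rule: str) -> Tuple[Optional[int], ...]:
    """Vector of nondirectional NLV answers for one permutation (0-based)."""
    n = len(perm)
    answers = []
    for i in range(n):
        left = next((j for j in range(i - 1, -1, -1) if perm[j] > perm[i]), None)
        right = next((j for j in range(i + 1, n) if perm[j] > perm[i]), None)
        if left is None or right is None:
            # At most one candidate exists (None if i holds the maximum).
            answers.append(left if right is None else right)
        elif i - left != right - i:
            answers.append(left if i - left < right - i else right)
        elif rule == "right":
            answers.append(right)
        elif rule == "smaller":
            answers.append(left if perm[left] < perm[right] else right)
        else:  # "larger"
            answers.append(left if perm[left] > perm[right] else right)
    return tuple(answers)


def count_structures(n: int, rule: str) -> int:
    """Number of distinct answer vectors over all permutations of size n."""
    return len({nlv_signature(p, rule) for p in permutations(range(n))})


for rule in ("right", "smaller", "larger"):
    print(rule, [count_structures(n, rule) for n in range(1, 7)])
```

The enumeration visits all n! permutations, so it is only practical for small n.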
Slide16
IDEA: Compress RUNS
Slide17
IDEA: Compress RUNS
Slide18
IDEA: Compress RUNS
Slide19
IDEA: Compress RUNS
Slide20
DIGRESSION: PATH (OR CHAIN) COMPRESSION
If there are […] deleted nodes, and […] chains, then store:
  Path/chain-compressed tree: ~[…] bits
  Bitvector marking chain terminals: […] bits
  Bitvector of length […] with […] ones: unary chain lengths ([…] bits)
  Bitvector of length […] indicating a zig or zag for each deleted node
This works out to […] bits
Note: doesn’t support queries, just recovers structure
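To make the "unary chain lengths" bitvector concrete, here is a minimal sketch of one possible layout. It is an assumption, not necessarily the layout used in the talk: each chain length is written as that many zeros followed by a single one, so the vector has one 1 per chain. The function names are hypothetical.

```python
from typing import List


def encode_unary(lengths: List[int]) -> List[int]:
    """Concatenate the unary codes of the lengths: `length` zeros, then a 1."""
    bits: List[int] = []
    for length in lengths:
        bits.extend([0] * length)
        bits.append(1)
    return bits


def decode_unary(bits: List[int]) -> List[int]:
    """Recover the lengths by counting the zeros before each 1."""
    lengths: List[int] = []
    run = 0
    for bit in bits:
        if bit:
            lengths.append(run)
            run = 0
        else:
            run += 1
    return lengths


chain_lengths = [2, 0, 3]
encoded = encode_unary(chain_lengths)
print(encoded)
print(decode_unary(encoded))
```

With this layout the vector has length equal to the sum of the lengths plus the number of chains. As on the slide, it only recovers structure and supports no queries by itself.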
Figure legend: Degree two, Degree one, Terminal, Subtree
Slide21
Compressing Cartesian trees w.r.t. NLVs
Lemma: Excluding chains containing nodes representing array elements […] or […], if a chain contains […] deleted elements, then there are exactly […] combinatorially distinct chains with respect to answering nearest larger value queries (breaking ties to the right).
Forget about whether it zigs or zags, just store # in prefix…
Slide22
The Encoding
If there are […] deleted nodes, and […] chains, then store:
  Path/chain-compressed tree: ~[…] bits
  Bitvector marking chain terminals: […] bits
  Bitvector of length […] with […] ones: unary chain lengths ([…] bits)
  For each deleted chain of length […], store number of nodes in prefix
    This takes no more than […] bits
If we maximize this expression in terms of […] and […]:
  Upper bounded by […] bits
We can improve the encoding of the chain lengths:
  Encode multiset of lengths to zeroth order empirical entropy
  This improves the upper bound constant factor to about […] ([…])
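As a reminder of what the zeroth order empirical entropy of the chain lengths measures, here is a minimal sketch that computes the standard quantity: the sum, over each distinct length, of its count times log2(total / count). The function name is hypothetical, and the sketch only evaluates the bound; it does not build the encoder or reproduce the constant from the talk.

```python
from collections import Counter
from math import log2
from typing import List


def zeroth_order_entropy_bits(lengths: List[int]) -> float:
    """Total bits needed at zeroth order empirical entropy (n * H0)."""
    total = len(lengths)
    counts = Counter(lengths)
    # Each distinct length that occurs c times contributes c * log2(total / c).
    return sum(c * log2(total / c) for c in counts.values())


print(zeroth_order_entropy_bits([1, 1, 1, 1]))  # all chains equal
print(zeroth_order_entropy_bits([1, 1, 2, 2]))  # two lengths, evenly split
print(zeroth_order_entropy_bits([1, 1, 1, 2]))  # skewed distribution
```

Skewed length distributions cost fewer bits than evenly spread ones, which is the effect the improved chain-length encoding relies on.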
Slide23
Sub-optimality examples
Slide24
Sub-optimality examples
Slide25
Sub-optimality examples
Slide26
Sub-optimality examples
Slide27
Encoding
Data structure
Tree decomposition: Mini-Micro trees
  Farzan and Munro
  Davoodi et al. showed how to support select-inorder for binary tree
We simply plug our compression into this framework
  Need to support two additional operations: is_chain_prefix/suffix
  Decompress fingerprints, use lookup tables: tree + inorder position
Theorem: Space bound the same as encoding ([…] bits) and supports nondirectional NLV query in […] time
Slide28
Lower bound sketch
“Computer assisted” lower bound idea:
Rough Idea 1: Use the computer to count number of distinct structures on […] elements for some integer […]. Call this value […].
Rough Idea 2: Given an instance of size […], glue pieces of size […] together without restricting the number of possible configurations of each piece
  Two adjacent […]-structures can obviously interfere with each other in non-trivial ways
Slide29
Lower bound sketch
Clearer Idea:
Fix tiebreaking rule: to the right
[…] is the number of distinct […] sized NLV structures
[…] is the number of distinct […] sized NLV structures with added restrictions
Break permutation into upper and lower half:
  Green blocks come from upper half, blue from lower half
Interleave: green guys can have […] configurations, blue guys: […]
Only the max in each green block will “exit”
Example array: 10 2 3 1 9 8 7 11 5 4
Slide30
Conclusions and Open Problems
For nearest larger value problems the details are crucial:
Distinct elements?
Definition of nearest?
Tiebreaking rules?
We have considered encodings of nondirectional NLV
  For an array containing distinct elements these can be encoded using less than […] bits: slightly less than the Cartesian tree
Open Problems:
  What is the optimal space bound for the nondirectional NLV?
  Distinct vs. nondistinct?
  Does the tiebreaking rule affect the constant factor?
  Other formulations?
Slide31
Thank You