Slide1
Yuqing Wu, Dirk Van Gucht (Indiana University); Marc Gyssens (Hasselt University); Jan Paredaens (University of Antwerp)
A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization
Slide2
Research in XML
XML data model
XML query languages
  XPath
  XQuery
  ……
XML data repositories
  Support from DB vendors
  LORE, Niagara, TIMBER, ……
Slide3
Research in XML
Characteristics of XML query languages
  XPath and fragments
  Characteristics: expressiveness, distinguishability, complexity, …
System design
  Query processing and evaluation
  New access methods: structural join
  Integrity, security, ……
Slide4
Theoretical Study, System Design
RDB did very well in this aspect
Our work at Indiana University
  Coupling the theoretical study of XML data and query languages with the system design of XML search engines [ICDT-EROW 07]
  Coupling the partition of XML documents induced by the structure of the XML document with the partition induced by fragments of XPath algebras [DBPL 07, IS 08]
  Applying the coupling in the design of structural indices for XML [WebDB 08]
  Designing workload-sensitive structural indices for XML [in submission]
Slide5
Outline
What we studied
Equivalences of query languages
Normal form
Resolution expressiveness
Efficient query evaluation
Summary and discussion
Slide6
Outline
What we studied
  XML documents
  Path+ algebra
  Tree queries
Equivalences of query languages
Normal form
Resolution expressiveness
Efficient query evaluation
Summary and discussion
Slide7
XML Documents
A labeled tree (V, Ed, l), where
  V is the set of nodes
  Ed is the set of edges
  l : V → L is a node-labeling function.
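The definition above can be made concrete with a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `Document`, its helper methods, and the three-node example are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A node-labeled tree (V, Ed, l). All names here are illustrative."""
    labels: dict                               # l : V -> L; the keys form the node set V
    edges: set = field(default_factory=set)    # Ed, stored as (parent, child) pairs

    def children(self, node):
        # Children of `node`, sorted so the result is deterministic.
        return sorted(c for (p, c) in self.edges if p == node)

    def parent(self, node):
        # The unique parent of `node`, or None for the root.
        return next((p for (p, c) in self.edges if c == node), None)

    def root(self):
        # The only node without a parent.
        return next(n for n in self.labels if self.parent(n) is None)


# Hypothetical document: a chain a / b / c.
doc = Document(
    labels={"n1": "a", "n2": "b", "n3": "c"},
    edges={("n1", "n2"), ("n2", "n3")},
)
```

Storing Ed as a set of pairs mirrors the definition directly; a real system would more likely keep parent and child pointers per node.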
Slide8
Querying XML Document
for $i in doc(…)//a/b
for $j in $i/c/*/d[e]
for $k in $j/*/f
return ($i, $k)
intersect
for $i in doc(…)//a/b
for $j in $i/c/a/d
for $k in $j/c/f
return ($i, $k)
Slide9
Path+ Algebra – Path Semantics
Slide10
Path+ Expression – An Example
E(D) = {(n8, n11), (n8, n12)}
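Under path semantics, an expression E evaluates on a document D to a set of (source, destination) node pairs, as in the example result above. The sketch below illustrates that idea. The operator set it uses (empty set, identity, label test, child step, parent step, composition, intersection, and first and second projection) is an assumption, since the definition table of the deck did not survive extraction; the tuple encoding, the function name `evaluate`, and the four-node document are likewise hypothetical.

```python
# Hypothetical document: n1(a) -> n2(b) -> {n3(c), n4(c)}.
LABELS = {"n1": "a", "n2": "b", "n3": "c", "n4": "c"}
EDGES = {("n1", "n2"), ("n2", "n3"), ("n2", "n4")}   # (parent, child)


def evaluate(expr, labels=LABELS, edges=EDGES):
    """Return E(D) as a set of (source, destination) pairs (assumed operator set)."""
    op, *args = expr
    if op == "empty":
        return set()
    if op == "eps":                       # identity on every node
        return {(n, n) for n in labels}
    if op == "label":                     # identity on nodes carrying the given label
        return {(n, n) for n in labels if labels[n] == args[0]}
    if op == "down":                      # parent-to-child step
        return set(edges)
    if op == "up":                        # child-to-parent step
        return {(c, p) for (p, c) in edges}
    if op in ("proj1", "proj2"):          # keep sources (proj1) or destinations (proj2)
        inner = evaluate(args[0], labels, edges)
        return {(m, m) for (m, _) in inner} if op == "proj1" else {(n, n) for (_, n) in inner}
    if op in ("comp", "inter"):
        left = evaluate(args[0], labels, edges)
        right = evaluate(args[1], labels, edges)
        if op == "inter":
            return left & right
        return {(m, n) for (m, k) in left for (k2, n) in right if k == k2}
    raise ValueError(f"unknown operator: {op}")


# b-labeled nodes paired with their c-labeled children.
query = ("comp", ("label", "b"), ("comp", ("down",), ("label", "c")))
result = evaluate(query)
```

On this toy document `result` pairs n2 with each of its two c children, which has the same shape as the two-pair answer shown on the slide.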
Slide11
Interesting Sub-languages
Path+ :
Path+(Π1, Π2) :
DPath+(Π1) :
Slide12
Tree Query for XML
A tree query is a 3-tuple (T, s, d), with
  T: a labeled tree, whose nodes are labeled either with a symbol of L or with a wildcard *.
  s and d: nodes of T, called the source and destination nodes.
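One way to read a tree query (T, s, d) is as a pattern: every mapping h of the query nodes into the document that preserves edges and respects non-wildcard labels contributes the pair (h(s), h(d)). That reading is an assumption here, since the slide gives only the syntax. The brute-force sketch below enumerates all such mappings; the function name and the example trees are hypothetical, and the enumeration is exponential, so it is meant only for tiny inputs.

```python
from itertools import product


def evaluate_tree_query(q_labels, q_edges, s, d, d_labels, d_edges):
    """Collect (h(s), h(d)) over all label- and edge-preserving maps h (assumed semantics)."""
    q_nodes = list(q_labels)
    result = set()
    # Try every assignment of document nodes to query nodes.
    for image in product(d_labels, repeat=len(q_nodes)):
        h = dict(zip(q_nodes, image))
        # A query label must be the wildcard or equal the document label.
        labels_ok = all(q_labels[v] in ("*", d_labels[h[v]]) for v in q_nodes)
        # Every query edge must map onto a document edge.
        edges_ok = all((h[p], h[c]) in d_edges for (p, c) in q_edges)
        if labels_ok and edges_ok:
            result.add((h[s], h[d]))
    return result


# Hypothetical document: n1(a) -> n2(b) -> {n3(c), n4(c)}.
D_LABELS = {"n1": "a", "n2": "b", "n3": "c", "n4": "c"}
D_EDGES = {("n1", "n2"), ("n2", "n3"), ("n2", "n4")}

# Hypothetical tree query: a wildcard node with two c-labeled children, the source and the destination.
Q_LABELS = {"p": "*", "src": "c", "dst": "c"}
Q_EDGES = {("p", "src"), ("p", "dst")}

pairs = evaluate_tree_query(Q_LABELS, Q_EDGES, "src", "dst", D_LABELS, D_EDGES)
```

Because the mapping is not required to be injective in this sketch, a node is also paired with itself.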
Slide13
Outline
What we studied
Equivalences of query languages
Normal form
Resolution expressiveness
Efficient query evaluation
Summary and discussion
Slide14
Equivalences of Query Languages
Theorem: The query languages Path+, T and Path+(Π1, Π2) are all equivalent in expressive power, and there exist translation algorithms between any two of them.
Slide15
Path+ Expression → Tree Query T
[Figure: tree queries for the basic Path+ expressions; nodes labeled * or l, with source s and destination d]
Slide16
Transformation of Composition
Slide17
Transformation of Intersection
[Figure: tree queries for E1 and E2]
Slide18
Tree Query T → Path+ Expression
Base cases:
  Empty tree
  (<{n}, ∅>, n, n)
  (<{n1, n2}, {(n1, n2)}>, n1, n2)
  (<{n1, n2}, {(n1, n2)}>, n2, n1)
[Figure: the base-case trees, with source s, destination d and label l]
Slide19
Tree Query T → Path+ Expression
Recursive case #1: s is not an ancestor of d.
  s has no child and l(s) = *
  d is the parent of s, d has no ancestor, no other child, and l(d) = *
[Figure: trees T1 and T2 with nodes s, d and p]
Slide20
Tree Query T → Path+ Expression
Recursive case #2: s is not the root.
  s has no child and l(s) = *
[Figure: trees T1 and T2 with nodes s, d and r]
Slide21
Tree Query T → Path+ Expression
Recursive case #3: s is a strict ancestor of d.
  d has no child and l(d) = *
  s is the parent of d, s has no child other than d, and l(d) = *
[Figure: trees T1 and T2 with nodes s, d and p]
Slide22
Tree Query T → Path+ Expression
Recursive case #4: s = d is the root.
  l(s) = *
[Figure: root s, d with children c1, …, cn and subtrees T1, …, Tn]
Slide23
Equivalences of Query Languages
Theorem: The query languages Path+, T and Path+(Π1, Π2) are all equivalent in expressive power, and there exist translation algorithms between any two of them.
[Figure: translations among Path+ expressions, T queries and Path+(Π1, Π2) expressions]
Slide24
Outline
What we studied
Equivalences of query languages
Normal form
Resolution expressiveness
Efficient query evaluation
Summary and discussion
Slide25
Normal Form
Observation about the tree query → Path+(Π1, Π2) transformation:
The resultant Path+(Π1, Π2) expression is of the form [formula], where
  m ≥ 0 and n ≥ 0
  Ci (i = um, …, u1, d1, …, dn) are of the form [formula]
  Ctop is of the form [formula]
  E is a DPath+(Π1) expression.
[Figure: tree query with nodes r, t, s and d]
Slide26
Normal Form
E(Tts)^-1 ; E(Ttt) ; Π2(E(Trt)) ; E(Ttd)
E(Tts), E(Ttt), E(Trt), E(Ttd) are DPath+(Π1) expressions
[Figure: tree query with nodes r, t, s and d, decomposed into the subtrees Tts, Ttd, Trt and Ttt]
Slide27
Outline
What we studied
Equivalences of query languages
Normal form
Resolution expressiveness
Efficient query evaluation
Summary and discussion
Slide28
Resolution Expressiveness
Resolution expressiveness: a language's ability to distinguish a pair of nodes or a pair of paths in the document.
Slide29
Expression Equivalence
Nodes m1 and m2 are expression-related (m1 ≥exp m2) if, for each expression E, E(D)(m1) ≠ ∅ implies E(D)(m2) ≠ ∅, where E(D)(m) = {n | (m, n) ∈ E(D)}.
m1 =exp m2 if m1 ≥exp m2 and m2 ≥exp m1.
Slide30
1-equivalence
Nodes m1 and m2 are downward 1-related (m1 ≥↓1 m2) iff
  l(m1) = l(m2);
  for each child n1 of m1, there exists a child n2 of m2 such that n1 ≥↓1 n2.
Nodes m1 and m2 are 1-related (m1 ≥1 m2) iff
  m1 ≥↓1 m2;
  if m1 is not the root and p1 is the parent of m1, then m2 is not the root and has a parent p2 such that p1 ≥1 p2.
m1 =1 m2 if m1 ≥1 m2 and m2 ≥1 m1.
(m1, n1) ≥1 (m2, n2) if m1 ≥1 m2 and n1 ≥1 n2 and sig(m1, n1) = sig(m2, n2).
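The two node relations defined on this slide are recursive, one walking down through children and the other walking up through parents, so they translate almost line by line into code. The sketch below is a direct transcription under stated assumptions: function names and the example document are hypothetical, and the pair relation is left out because sig is not defined in this deck.

```python
def children(node, edges):
    return [c for (p, c) in edges if p == node]


def parent(node, edges):
    return next((p for (p, c) in edges if c == node), None)


def downward_1_related(m1, m2, labels, edges):
    """Same label, and every child of m1 is downward 1-related to some child of m2."""
    if labels[m1] != labels[m2]:
        return False
    return all(
        any(downward_1_related(n1, n2, labels, edges) for n2 in children(m2, edges))
        for n1 in children(m1, edges)
    )


def one_related(m1, m2, labels, edges):
    """Downward 1-related, and the parents (if m1 has one) are 1-related in turn."""
    if not downward_1_related(m1, m2, labels, edges):
        return False
    p1 = parent(m1, edges)
    if p1 is None:
        return True
    p2 = parent(m2, edges)
    return p2 is not None and one_related(p1, p2, labels, edges)


def one_equivalent(m1, m2, labels, edges):
    return one_related(m1, m2, labels, edges) and one_related(m2, m1, labels, edges)


# Hypothetical document: r(a) -> x(b) -> c1(c), and r(a) -> y(b) -> {c2(c), d1(d)}.
LABELS = {"r": "a", "x": "b", "y": "b", "c1": "c", "c2": "c", "d1": "d"}
EDGES = {("r", "x"), ("r", "y"), ("x", "c1"), ("y", "c2"), ("y", "d1")}
```

In this example x is 1-related to y, but not the other way round, because y has a d child that x cannot match; the asymmetry carries over to c1 and c2 through their parents.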
Slide31
Resolution Expressiveness
Theorem: m1 =exp m2 iff m1 =1 m2.
Theorem: (m1, n1) ∈ E(D) implies (m2, n2) ∈ E(D) iff (m1, n1) ≥1 (m2, n2).
Slide32
Outline
What we studied
Equivalences of query languages
Normal form
Resolution expressiveness
Efficient query evaluation
Summary and discussion
Slide33
Tree Query Minimization
1st Reduction: merging 1-equivalent nodes in a tree query.
[Figure: a tree query before and after the 1st reduction; nodes labeled *, a, c and d, with source s and destination d]
Slide34
Tree Query Minimization
1-*-related (≥*1): relax 1-related with l(m1) + l(m2) = l(m2).
2nd Reduction: deleting from a tree query, in a top-down fashion, every node m1 for which there exists another node m2 such that m1 ≥*1 m2.
[Figure: a tree query before and after the 2nd reduction; nodes labeled *, a, c and d, with source s and destination d]
Slide35
Efficient Query Evaluation
Tree Query → (1st & 2nd Reduction) → Minimum Tree Query → (Normal Form) → Path+ Expression
E(Tts)^-1 ; E(Ttt) ; Π2(E(Trt)) ; E(Ttd)
E(Tts), E(Ttt), E(Trt), E(Ttd) are DPath+(Π1) expressions
[Figure: tree query with nodes r, t, s and d, decomposed into the subtrees Tts, Ttd, Trt and Ttt]
Slide36
Efficient Query Evaluation
Exp = E(Tts)^-1 ; E(Ttt) ; Π2(E(Trt)) ; E(Ttd)
E(Tts), E(Ttt), E(Trt), E(Ttd) are DPath+(Π1) expressions
DPath+(Π1) queries can be evaluated via an index-only plan using the P[k]-Trie index. [Bre08]
[Bre08]: Sofia Brenes, Yuqing Wu, Dirk Van Gucht, Pablo Santa Cruz. Trie Indices for Efficient XML Query Evaluation. WebDB 2008.
Slide37
Outline
What we studied
Equivalences of query languages
Normal form
Resolution expressiveness
Efficient query evaluation
Summary and discussion
Slide38
Summary
Objects of study:
  XML document: a tree
  Path+ language
  Tree queries
Areas of study:
  Expressiveness
  Equivalence
  Normal form
  Query evaluation
Slide39
Extending the Path+ language
Adding operators:
Will the results hold?
  Expressiveness
  Equivalence
  Normal form
  Query evaluation
Slide40
Thank you.
Questions?
Slide41
SIGMOD/PODS 2010, Indianapolis, Indiana, USA
Conference date: Jun, 2010
Deadlines: SIGMOD early Nov, 2009; PODS early Dec, 2009
Slide42
P[k]-Trie Index
Keep track of the P[k]-partitions
Use the reverse label path as key
P[2]
(A): {(A1, A1), (A2, A2)}
(B): {(B1, B1), (B2, B2), (B3, B3), (B4, B4), (B5, B5)}
(C): {(C1, C1), (C2, C2), (C3, C3), (C4, C4)}
(D): {(D1, D1)}
(A,A): {(A1, A2)}
(A,B): {(A1, B1), (A2, B2), (A2, B3), (A1, B4)}
(B,B): {(B4, B5)}
(B,C): {(B1, C1), (B2, C2), (B3, C3), (B5, C4)}
(B,D): {(B2, D1)}
(A,A,B): {(A1, B2), (A1, B3)}
(A,B,B): {(A1, B5)}
(A,B,C): {(A1, C1), (A2, C2), (A2, C3)}
(A,B,D): {(A2, D1)}
(B,B,C): {(B4, C4)}
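The P[2] entries listed on this slide can be reproduced by grouping node pairs by the label path that connects them, for paths of at most k edges. The sketch below does that with a plain dictionary. It is an illustration under assumptions, not the index structure of [Bre08]: the document's edges are inferred here from the length-1 entries of the listing, the function name `build_pk_index` is hypothetical, and keys are stored top-down in a dict rather than reversed in a trie.

```python
from collections import defaultdict

# Document inferred from the listing: the (X,Y) entries are read as parent-child edges.
LABELS = {
    "A1": "A", "A2": "A",
    "B1": "B", "B2": "B", "B3": "B", "B4": "B", "B5": "B",
    "C1": "C", "C2": "C", "C3": "C", "C4": "C",
    "D1": "D",
}
EDGES = {
    ("A1", "A2"), ("A1", "B1"), ("A2", "B2"), ("A2", "B3"), ("A1", "B4"),
    ("B4", "B5"), ("B1", "C1"), ("B2", "C2"), ("B3", "C3"), ("B5", "C4"),
    ("B2", "D1"),
}


def build_pk_index(labels, edges, k):
    """Map each label path of at most k edges to the (ancestor, descendant) pairs it connects."""
    parent = {c: p for (p, c) in edges}
    index = defaultdict(set)
    for node in labels:
        index[(labels[node],)].add((node, node))   # path of length 0
        path = [labels[node]]                      # labels collected from `node` upward
        ancestor = node
        for _ in range(k):
            if ancestor not in parent:
                break
            ancestor = parent[ancestor]
            path.append(labels[ancestor])
            # `path` is the reverse label path; flip it to get the top-down key shown on the slide.
            index[tuple(reversed(path))].add((ancestor, node))
    return dict(index)


index = build_pk_index(LABELS, EDGES, 2)
```

Walking upward from each node builds the reverse label path first, which is the order a trie keyed as the slide describes would consume.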
Slide43
Query Evaluation with P[k]-Trie Index
Query 1: //A/B/C
Slide44
Query Evaluation with P[k]-Trie Index
Query 2: //B/C
Slide45
Query Evaluation with P[k]-Trie Index
Query 3: //A/B[./D]/C
Slide46
Query Evaluation with P[k]-Trie Index
Query 3: //A/B[./D]/C
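Query 3 has a branch, so a single label-path lookup does not answer it; one plausible index-only plan combines three lookups with a projection and two compositions. The sketch below is such a plan, written as an assumption, since the plan drawn on the slide did not survive extraction. The three relations are copied from the (A,B), (B,D) and (B,C) entries of the P[2] listing in this deck; the helper names are hypothetical.

```python
# Index entries taken from the P[2] listing: label path -> (ancestor, descendant) pairs.
AB = {("A1", "B1"), ("A2", "B2"), ("A2", "B3"), ("A1", "B4")}
BD = {("B2", "D1")}
BC = {("B1", "C1"), ("B2", "C2"), ("B3", "C3"), ("B5", "C4")}


def compose(r1, r2):
    # Relational composition: follow r1, then r2.
    return {(m, n) for (m, k) in r1 for (k2, n) in r2 if k == k2}


def proj1(r):
    # First projection: identity pairs on the sources of r.
    return {(m, m) for (m, _) in r}


def query3():
    """Evaluate //A/B[./D]/C using only the three index entries above."""
    b_with_d = proj1(BD)               # B nodes that have a D child
    a_to_b = compose(AB, b_with_d)     # A/B[./D]
    a_to_c = compose(a_to_b, BC)       # A/B[./D]/C
    return sorted(n for (_, n) in a_to_c)


answer = query3()
```

With these entries only B2 has a D child, so the plan returns the single node C2. Queries 1 and 2 need no combination step at all: their answers are the destinations of the (A,B,C) and (B,C) entries respectively.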