
Yuqing Wu, Dirk Van Gucht Indiana University - PowerPoint Presentation

Uploaded On 2016-04-11



Presentation Transcript

Slide1

Yuqing Wu, Dirk Van Gucht, Indiana University
Marc Gyssens, Hasselt University
Jan Paredaens, University of Antwerp

A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form and Minimization

Slide2

Research in XML

- XML data model
- XML query languages
  - XPath
  - XQuery
  - ……
- XML data repositories
  - Support from DB vendors
  - LORE, Niagara, TIMBER, ……

Slide3

Research in XML

- Characteristics of XML query languages
  - XPath and fragments
  - Characteristics: expressiveness, distinguishability, complexity, …
- System design
  - Query processing and evaluation
  - New access methods: structural join
  - Integrity, security, ……

Slide4

Theoretical Study ↔ System Design

- RDB did very well in this aspect
- Our work at Indiana University
  - Coupling the theoretical study of XML data and query languages with the system design of XML search engines [ICDT-EROW 07]
  - Coupling the partition of XML documents induced by the structure of the XML document with the partition induced by fragments of XPath algebras [DBPL 07, IS 08]
  - Applying the coupling in the design of structural indices for XML [WebDB 08]
  - Designing workload-sensitive structural indices for XML [in submission]

Slide5

Outline

- What we studied
- Equivalences of query languages
- Normal form
- Resolution expressiveness
- Efficient query evaluation
- Summary and discussion

Slide6

Outline

- What we studied
  - XML documents
  - Path+ algebra
  - Tree queries
- Equivalences of query languages
- Normal form
- Resolution expressiveness
- Efficient query evaluation
- Summary and discussion

Slide7

XML Documents

A labeled tree $(V, Ed, l)$, where

- $V$ is the set of nodes;
- $Ed$ is the set of edges;
- $l : V \to L$ is a node-labeling function.
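The definition above can be mirrored directly in a small data structure. The following is a minimal sketch, not taken from the slides: the class name `Document`, its methods, and the node identifiers are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A labeled tree (V, Ed, l). All names here are illustrative."""
    labels: dict = field(default_factory=dict)  # l : V -> L (keys are the nodes V)
    edges: set = field(default_factory=set)     # Ed, stored as (parent, child) pairs

    def add_node(self, node, label, parent=None):
        self.labels[node] = label
        if parent is not None:
            self.edges.add((parent, node))

    def children(self, node):
        return {c for (p, c) in self.edges if p == node}

    def parent(self, node):
        for p, c in self.edges:
            if c == node:
                return p
        return None  # the root has no parent


# A tiny example document: a(b(c, c))
doc = Document()
doc.add_node("n1", "a")
doc.add_node("n2", "b", parent="n1")
doc.add_node("n3", "c", parent="n2")
doc.add_node("n4", "c", parent="n2")
```

Storing `Ed` as a set of pairs keeps the structure close to the formal definition; a production system would more likely keep child lists per node.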

Slide8

Querying XML Document

for $i in doc(…)//a/b
  for $j in $i/c/*/d[e]
    for $k in $j/*/f
      return ($i, $k)
intersect
for $i in doc(…)//a/b
  for $j in $i/c/a/d
    for $k in $j/c/f
      return ($i, $k)

Slide9

Path+ Algebra – Path Semantics
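The semantics table on this slide did not survive in the transcript. As a rough sketch only, the evaluator below assumes the operator set usually given for Path+ (empty set, ε, label test, ↓, ↑, composition, Π1, Π2 and intersection), with every expression denoting a set of node pairs. The tuple encoding of expressions and the helper `seq` are hypothetical conveniences, not the authors' notation.

```python
def evaluate(expr, labels, edges):
    """Return E(D) as a set of node pairs for the document D = (labels, edges)."""
    op = expr[0]
    if op == "empty":
        return set()
    if op == "eps":                       # identity on all nodes
        return {(n, n) for n in labels}
    if op == "label":                     # identity on nodes carrying the label
        return {(n, n) for n, lab in labels.items() if lab == expr[1]}
    if op == "down":                      # parent -> child
        return set(edges)
    if op == "up":                        # child -> parent
        return {(c, p) for (p, c) in edges}
    if op == "comp":                      # E1 ; E2
        left = evaluate(expr[1], labels, edges)
        right = evaluate(expr[2], labels, edges)
        return {(m, n) for (m, x) in left for (y, n) in right if x == y}
    if op == "pi1":                       # first projection, as identity pairs
        return {(m, m) for (m, _) in evaluate(expr[1], labels, edges)}
    if op == "pi2":                       # second projection, as identity pairs
        return {(n, n) for (_, n) in evaluate(expr[1], labels, edges)}
    if op == "cap":                       # E1 intersect E2
        return evaluate(expr[1], labels, edges) & evaluate(expr[2], labels, edges)
    raise ValueError(f"unknown operator: {op!r}")


def seq(*exprs):
    """Fold a list of expressions into nested compositions."""
    out = exprs[0]
    for e in exprs[1:]:
        out = ("comp", out, e)
    return out


labels = {"n1": "a", "n2": "b", "n3": "c", "n4": "d"}
edges = {("n1", "n2"), ("n2", "n3"), ("n2", "n4")}

# a/b[d]/c, with the predicate [d] expressed through Pi1
query = seq(("label", "a"), ("down",), ("label", "b"),
            ("pi1", seq(("down",), ("label", "d"))),
            ("down",), ("label", "c"))
result = evaluate(query, labels, edges)
```

On this toy document the query relates the `a` node to the `c` child of a `b` node that also has a `d` child.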

Slide10

Path+ Expression – An Example

$E(D) = \{(n_8, n_{11}), (n_8, n_{12})\}$

Slide11

Interesting Sub-languages

Path+ :

Path+(Π1, Π2) :

DPath+(Π1) :

Slide12

Tree Query for XML

A tree query T is a 3-tuple (T, s, d), with

- T: a labeled tree, whose nodes are each labeled either with a symbol of L or with a wildcard *;
- s and d: nodes of T, called the source and destination nodes.
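The slide defines the shape of a tree query but not how it is evaluated. A common reading, assumed here, is that the answer contains (h(s), h(d)) for every mapping h from query nodes to document nodes that preserves edges and every non-wildcard label. The brute-force sketch below follows that assumption; the function name and the example data are hypothetical.

```python
from itertools import product


def eval_tree_query(q_labels, q_edges, s, d, doc_labels, doc_edges):
    """Return {(h(s), h(d))} over all label- and edge-preserving maps h."""
    q_nodes = list(q_labels)
    doc_nodes = list(doc_labels)
    answers = set()
    # Try every assignment of document nodes to query nodes (exponential,
    # which is acceptable only for tiny illustrative inputs).
    for image in product(doc_nodes, repeat=len(q_nodes)):
        h = dict(zip(q_nodes, image))
        labels_ok = all(q_labels[q] in ("*", doc_labels[h[q]]) for q in q_nodes)
        edges_ok = all((h[p], h[c]) in doc_edges for (p, c) in q_edges)
        if labels_ok and edges_ok:
            answers.add((h[s], h[d]))
    return answers


# Document: a(b(c), c)
doc_labels = {"n1": "a", "n2": "b", "n3": "c", "n4": "c"}
doc_edges = {("n1", "n2"), ("n1", "n3"), ("n2", "n4")}

# Query: a wildcard root with a b-child (source) and a c-child (destination)
q_labels = {"t": "*", "s": "b", "d": "c"}
q_edges = {("t", "s"), ("t", "d")}
result = eval_tree_query(q_labels, q_edges, "s", "d", doc_labels, doc_edges)
```

Only the root `n1` has both a `b` child and a `c` child, so the single answer pairs `n2` with `n3`.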

Slide13

Outline

- What we studied
- Equivalences of query languages
- Normal form
- Resolution expressiveness
- Efficient query evaluation
- Summary and discussion

Slide14

Equivalences of Query Languages

Theorem: The query languages Path+, T and Path+(Π1, Π2) are all equivalent in expressive power, and there exist translation algorithms between any two of them.

Slide15

Path+ Expression → Tree Query T

[Figure: tree queries with node labels * and l, with source s and destination d marked]

Slide16

Transformation of Composition

Slide17

Transformation of Intersection

[Figure: tree queries of $E_1$ and $E_2$]

Slide18

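Tree Query T → Path+ Expression

Base cases:

- Empty tree
- $(\langle \{n\}, \emptyset \rangle, n, n)$
- $(\langle \{n_1, n_2\}, \{(n_1, n_2)\} \rangle, n_1, n_2)$
- $(\langle \{n_1, n_2\}, \{(n_1, n_2)\} \rangle, n_2, n_1)$

[Figure: the base-case tree queries, with source s, destination d and label l marked]

Slide19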

Tree Query T → Path+ Expression

Recursive case #1: s is not an ancestor of d.

- s has no child and l(s) = *
- d is the parent of s, d has no ancestor, no other child and l(d) = *

[Figure: tree query with nodes s, d and p and subtrees T1 and T2]

Slide20

Tree Query T → Path+ Expression

Recursive case #2: s is not the root.

- s has no child and l(s) = *

[Figure: tree query with nodes s, d and r and subtrees T1 and T2]

Slide21

Tree Query T → Path+ Expression

Recursive case #3: s is a strict ancestor of d.

- d has no child and l(d) = *
- s is the parent of d, s has no child other than d and l(d) = *

[Figure: tree query with nodes s, d and p and subtrees T1 and T2]

Slide22

Tree Query T → Path+ Expression

Recursive case #4: s = d is the root.

- l(s) = *

[Figure: tree query with root s = d, children c1 … cn and subtrees T1 … Tn]

Slide23

Equivalences of Query Languages

Theorem: The query languages Path+, T and Path+(Π1, Π2) are all equivalent in expressive power, and there exist translation algorithms between any two of them.

[Figure: translations among Path+ expressions, T queries and Path+(Π1, Π2) expressions]

Slide24

Outline

- What we studied
- Equivalences of query languages
- Normal form
- Resolution expressiveness
- Efficient query evaluation
- Summary and discussion

Slide25

Normal Form

Observation about the tree query → Path+(Π1, Π2) transformation:

The resultant Path+(Π1, Π2) expression is of the form […], where

- $m \geq 0$ and $n \geq 0$;
- $C_i$ ($i = u_m, \ldots, u_1, d_1, \ldots, d_n$) are of the form […];
- $C_{top}$ is of the form […];
- $E$ is a DPath+(Π1) expression.

[Figure: tree query with nodes r, t, s and d]

Slide26

Normal Form

$E(T_{ts})^{-1};\ E(T_{tt});\ \Pi_2(E(T_{rt}));\ E(T_{td})$

$E(T_{ts})$, $E(T_{tt})$, $E(T_{rt})$ and $E(T_{td})$ are DPath+(Π1) expressions.

[Figure: tree query with nodes r, t, s and d and subtrees $T_{ts}$, $T_{td}$, $T_{rt}$ and $T_{tt}$]
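To make the shape of the normal form concrete, the sketch below combines four already-evaluated relations in the order the expression prescribes. It assumes that ";" is relational composition, "-1" is the inverse, and Π2 keeps the second components as identity pairs; the function names and the toy relations are hypothetical.

```python
def compose(r1, r2):
    """Relational composition r1 ; r2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


def inverse(r):
    return {(b, a) for (a, b) in r}


def pi2(r):
    """Second projection, kept as identity pairs."""
    return {(b, b) for (_, b) in r}


def normal_form(e_ts, e_tt, e_rt, e_td):
    """E(T_ts)^-1 ; E(T_tt) ; Pi2(E(T_rt)) ; E(T_td) on evaluated relations."""
    out = compose(inverse(e_ts), e_tt)   # walk up from s to t, then filter t
    out = compose(out, pi2(e_rt))        # keep t only if reachable from r
    return compose(out, e_td)            # walk down from t to d


# Toy relations over a document with nodes r, t, s, d
e_ts = {("t", "s")}
e_tt = {("t", "t")}
e_rt = {("r", "t")}
e_td = {("t", "d")}
result = normal_form(e_ts, e_tt, e_rt, e_td)
```

Each of the four inputs would itself come from evaluating a DPath+(Π1) expression, so the composition is the only place where the upward direction appears.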

Slide27

Outline

- What we studied
- Equivalences of query languages
- Normal form
- Resolution expressiveness
- Efficient query evaluation
- Summary and discussion

Slide28

Resolution Expressiveness

Resolution expressiveness: a language's ability to distinguish a pair of nodes or a pair of paths in the document.

Slide29

Expression Equivalence

Nodes $m_1$ and $m_2$ are expression-related ($m_1 \geq_{exp} m_2$) if, for each expression $E$, $E(D)(m_1) \neq \emptyset$ implies $E(D)(m_2) \neq \emptyset$, where $E(D)(m) = \{n \mid (m, n) \in E(D)\}$.

$m_1 =_{exp} m_2$ if $m_1 \geq_{exp} m_2$ and $m_2 \geq_{exp} m_1$.

Slide30

1-equivalence

Nodes $m_1$ and $m_2$ are downward 1-related ($m_1 \geq_1^{\downarrow} m_2$) iff

- $l(m_1) = l(m_2)$;
- for each child $n_1$ of $m_1$, there exists a child $n_2$ of $m_2$ such that $n_1 \geq_1^{\downarrow} n_2$.

Nodes $m_1$ and $m_2$ are 1-related ($m_1 \geq_1 m_2$) iff

- $m_1 \geq_1^{\downarrow} m_2$;
- if $m_1$ is not the root and $p_1$ is the parent of $m_1$, then $m_2$ is not the root and has a parent $p_2$ such that $p_1 \geq_1 p_2$.

$m_1 =_1 m_2$ if $m_1 \geq_1 m_2$ and $m_2 \geq_1 m_1$.

$(m_1, n_1) \geq_1 (m_2, n_2)$ if $m_1 \geq_1 m_2$, $n_1 \geq_1 n_2$ and $sig(m_1, n_1) = sig(m_2, n_2)$.
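The node-level definitions above are recursive and translate almost line by line into code. The sketch below covers the downward 1-related, 1-related and 1-equivalent checks on single nodes; it leaves out the pair relation because `sig` is not defined in the transcript. The function names and the dictionary-based tree encoding are assumptions made for illustration.

```python
def downward_1_related(m1, m2, labels, children):
    """Equal labels, and every child of m1 is matched by some child of m2."""
    if labels[m1] != labels[m2]:
        return False
    return all(
        any(downward_1_related(n1, n2, labels, children)
            for n2 in children.get(m2, ()))
        for n1 in children.get(m1, ())
    )


def one_related(m1, m2, labels, children, parent):
    """Downward 1-related, and the parents (if m1 has one) are 1-related."""
    if not downward_1_related(m1, m2, labels, children):
        return False
    p1 = parent.get(m1)
    if p1 is None:                 # m1 is the root: no upward condition
        return True
    p2 = parent.get(m2)
    return p2 is not None and one_related(p1, p2, labels, children, parent)


def one_equivalent(m1, m2, labels, children, parent):
    return (one_related(m1, m2, labels, children, parent)
            and one_related(m2, m1, labels, children, parent))


# Document: r:a with children x:b (child c1:c) and y:b (children c2:c, d1:d)
labels = {"r": "a", "x": "b", "y": "b", "c1": "c", "c2": "c", "d1": "d"}
children = {"r": ["x", "y"], "x": ["c1"], "y": ["c2", "d1"]}
parent = {"x": "r", "y": "r", "c1": "x", "c2": "y", "d1": "y"}

x_to_y = one_related("x", "y", labels, children, parent)
y_to_x = one_related("y", "x", labels, children, parent)
```

In this example everything below `x` can be matched below `y`, but `y` has a `d` child that `x` lacks, so the relation holds in one direction only and the two nodes are not 1-equivalent.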

Slide31

Resolution Expressiveness

Theorem: $m_1 =_{exp} m_2$ iff $m_1 =_1 m_2$.

Theorem: $(m_1, n_1) \in E(D)$ implies $(m_2, n_2) \in E(D)$ iff $(m_1, n_1) \geq_1 (m_2, n_2)$.

Slide32

Outline

- What we studied
- Equivalences of query languages
- Normal form
- Resolution expressiveness
- Efficient query evaluation
- Summary and discussion

Slide33

Tree Query Minimization

1st Reduction: merging 1-equivalent nodes in a tree query.

[Figure: a tree query before and after the 1st reduction]

Slide34

Tree Query Minimization

- 1-*-related ($\geq_1^{*}$): relaxes 1-related by replacing the label condition $l(m_1) = l(m_2)$ with $l(m_1) + l(m_2) = l(m_2)$;
- 2nd Reduction: deleting from a tree query, in a top-down fashion, every node $m_1$ for which there exists another node $m_2$ such that $m_1 \geq_1^{*} m_2$.

[Figure: a tree query before and after the 2nd reduction]

Slide35

Efficient Query Evaluation

[Figure: pipeline connecting Path+ Expression, Tree Query, 1st & 2nd Reduction, Minimum Tree Query and Normal Form]

Normal form: $E(T_{ts})^{-1};\ E(T_{tt});\ \Pi_2(E(T_{rt}));\ E(T_{td})$

$E(T_{ts})$, $E(T_{tt})$, $E(T_{rt})$ and $E(T_{td})$ are DPath+(Π1) expressions.

[Figure: tree query with nodes r, t, s and d and subtrees $T_{ts}$, $T_{td}$, $T_{rt}$ and $T_{tt}$]

Slide36

Efficient Query Evaluation

$Exp = E(T_{ts})^{-1};\ E(T_{tt});\ \Pi_2(E(T_{rt}));\ E(T_{td})$

- $E(T_{ts})$, $E(T_{tt})$, $E(T_{rt})$ and $E(T_{td})$ are DPath+(Π1) expressions.
- DPath+(Π1) queries can be evaluated via an index-only plan using the P[k]-Trie index [Bre08].

[Bre08]: Sofia Brenes, Yuqing Wu, Dirk Van Gucht, Pablo Santa Cruz. Trie Indices for Efficient XML Query Evaluation. WebDB 2008.

Slide37

Outline

- What we studied
- Equivalences of query languages
- Normal form
- Resolution expressiveness
- Efficient query evaluation
- Summary and discussion

Slide38

Summary

Objects of study:

- XML document: a tree
- Path+ language : Tree queries

Areas of study:

- Expressiveness
- Equivalence
- Normal form
- Query evaluation

Slide39

Extending the Path+ language

Adding operators:

Will the results hold?

- Expressiveness
- Equivalence
- Normal form
- Query evaluation

Slide40

Thank you.

Questions?

Slide41

SIGMOD/PODS 2010

Indianapolis, Indiana, USA

Conference date: Jun, 2010

Deadlines: SIGMOD early Nov, 2009; PODS early Dec, 2009

Slide42

P[k]-Trie Index

- Keep track of the P[k]-partitions
- Use the reverse label path as key

P[2]-partitions:

Label path   Node pairs
(A)          {(A1, A1), (A2, A2)}
(B)          {(B1, B1), (B2, B2), (B3, B3), (B4, B4), (B5, B5)}
(C)          {(C1, C1), (C2, C2), (C3, C3), (C4, C4)}
(D)          {(D1, D1)}
(A,A)        {(A1, A2)}
(A,B)        {(A1, B1), (A2, B2), (A2, B3), (A1, B4)}
(B,B)        {(B4, B5)}
(B,C)        {(B1, C1), (B2, C2), (B3, C3), (B5, C4)}
(B,D)        {(B2, D1)}
(A,A,B)      {(A1, B2), (A1, B3)}
(A,B,B)      {(A1, B5)}
(A,B,C)      {(A1, C1), (A2, C2), (A2, C3)}
(A,B,D)      {(A2, D1)}
(B,B,C)      {(B4, C4)}
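The partitions listed above can be reproduced with a short routine. This is a sketch under stated assumptions rather than the index structure of [Bre08]: the document tree is inferred from the node pairs on the slide, the index is a plain dictionary instead of a trie, and the function name `build_pk_index` is hypothetical. Keys are reverse label paths, as the slide suggests, so the slide's partition (A,B,C) is stored under `("C", "B", "A")`.

```python
from collections import defaultdict


def build_pk_index(labels, parent, k):
    """Map each reverse label path of at most k edges to its (ancestor, node) pairs."""
    index = defaultdict(set)
    for node in labels:
        key = [labels[node]]              # reverse path starts at the node itself
        ancestor = node
        index[tuple(key)].add((ancestor, node))
        for _ in range(k):                # climb at most k edges
            ancestor = parent.get(ancestor)
            if ancestor is None:
                break
            key.append(labels[ancestor])
            index[tuple(key)].add((ancestor, node))
    return dict(index)


# Tree inferred from the slide's partitions; the label is the first character.
names = ["A1", "A2", "B1", "B2", "B3", "B4", "B5", "C1", "C2", "C3", "C4", "D1"]
labels = {n: n[0] for n in names}
parent = {
    "A2": "A1", "B1": "A1", "B4": "A1",
    "B2": "A2", "B3": "A2",
    "C1": "B1", "C2": "B2", "D1": "B2", "C3": "B3",
    "B5": "B4", "C4": "B5",
}
index = build_pk_index(labels, parent, k=2)
```

Keying by the reverse path means that all label paths ending in the same suffix share a prefix in a trie, which is presumably why the slide recommends it.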

Slide43

Query Evaluation with P[k]-Trie Index

Query 1: //A/B/C

Slide44

Query Evaluation with P[k]-Trie Index

Query 2: //B/C

Slide45

Query Evaluation with P[k]-Trie Index

Query 3: //A/B[./D]/C
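The evaluation plans for the three queries were shown as figures and are not in the transcript. The sketch below is one plausible index-only plan, not necessarily the authors': it hard-codes the relevant P[2] partitions from the index slide (keyed by label path in document order for readability), answers Queries 1 and 2 with a single lookup, and answers Query 3 by joining partitions on the shared B node. Returning the set of final C nodes is an assumption borrowed from XPath semantics.

```python
# Partitions copied from the P[2] index slide.
P2 = {
    ("A", "B"): {("A1", "B1"), ("A2", "B2"), ("A2", "B3"), ("A1", "B4")},
    ("B", "C"): {("B1", "C1"), ("B2", "C2"), ("B3", "C3"), ("B5", "C4")},
    ("B", "D"): {("B2", "D1")},
    ("A", "B", "C"): {("A1", "C1"), ("A2", "C2"), ("A2", "C3")},
}


def targets(pairs):
    """Second components of a set of node pairs."""
    return {n for (_, n) in pairs}


# Query 1: //A/B/C is a single lookup of the label path (A,B,C).
q1 = targets(P2[("A", "B", "C")])

# Query 2: //B/C is a single lookup of the label path (B,C).
q2 = targets(P2[("B", "C")])

# Query 3: //A/B[./D]/C needs the same B node in all three steps,
# so the pairs are joined on B instead of using (A,B,C) directly.
b_with_a_parent = targets(P2[("A", "B")])
b_with_d_child = {b for (b, _) in P2[("B", "D")]}
good_b = b_with_a_parent & b_with_d_child
q3 = {c for (b, c) in P2[("B", "C")] if b in good_b}
```

The (A,B,C) partition alone cannot answer Query 3 because its pairs do not record which B node lies between the A and the C.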
