Optimal Query Processing Meets Information Theory


Uploaded On 2018-10-06


Dan Suciu (University of Washington), Hung Ngo, Mahmoud Abo-Khamis; PODS 2016, PODS 2017; RelationalAI Inc. Basic question: what is the optimal runtime to compute a query Q on a database?




Presentation Transcript

Slide1

Optimal Query Processing Meets Information Theory

Dan Suciu – University of Washington

Hung Ngo

Mahmoud Abo-Khamis

[PODS’2016]

[PODS’2017]

RelationalAI Inc.

Slide2

Basic Question

What is the optimal runtime to compute a query Q on a database D?

Q and D are labeled hypergraphs.

Problem 1: list all occurrences of Q in D.

Problem 2: check whether there exists an occurrence of Q in D.

Data complexity: Q is fixed, and the runtime is a function f(D).

Slide3

Example Queries

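Enumerate all labeled triangles:

Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X)

Check if there exists a labeled 4-cycle:

Q() = ∃x∃y∃z∃u R(x,y) ∧ S(y,z) ∧ T(z,u) ∧ K(u,x)

[Figure: the triangle on X, Y, Z with edges labeled R, S, T, and the 4-cycle on X, Y, Z, U with edges labeled R, S, T, K]

Slide4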

Main Results

Fix the query Q. Fix statistics for D (cardinalities, functional dependencies, max degrees).

Problem 1: enumeration problem
Thm. For every D that satisfies the statistics:
(1) |Q(D)| ≤ Entropic-bound ≤ Polymatroid-bound
(2) Q(D) is computable in time Õ(Polymatroid-bound)
The entropic bound is tight, but it is open whether it is computable; the polymatroid bound is computable, but not tight.

Problem 2: decision problem
Thm. For every D, Q(D) is computable in time Õ(2^{submodular-width}). Optimal?

Slide5

Main Principle

Find an information-theoretic proof of the upper bound, or of the submodular width.

Convert the proof to an algorithm.

Slide6

Outline

Enumeration problem

Decision problem

Conclusions

Slide7

Maximum Output Size

max_{D satisfies stats} |Q(D)|

E.g. Q = R(X,Y) ∧ S(Y,Z) with |R|, |S| ≤ N:
No other info: |Q(D)| ≤ N^2
S.Y is a key: |Q(D)| ≤ N
S.Y has degree ≤ d: |Q(D)| ≤ d×N

E.g. Q = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) with |R|, |S|, |T| ≤ N:
No other info: |Q(D)| ≤ N^{3/2}
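A quick way to sanity-check the path-join bounds is to compute |R ⋈ S| as the sum over y of deg_R(y)·deg_S(y). The Python sketch below does this and builds small databases, our own illustrative constructions rather than examples from the talk, that attain each bound for N = 100 and d = 5; `path_join_size` and `max_degree` are hypothetical helper names.

```python
from collections import Counter

def path_join_size(R, S):
    """|Q(D)| for Q = R(X,Y) ∧ S(Y,Z): sum over y of deg_R(y) * deg_S(y)."""
    deg_R = Counter(y for (_, y) in R)
    deg_S = Counter(y for (y, _) in S)
    return sum(k * deg_S[y] for y, k in deg_R.items())

def max_degree(S):
    """Largest number of Z values paired with a single Y value in S(Y,Z)."""
    return max(Counter(y for (y, _) in S).values(), default=0)

N, d = 100, 5

# No other info: one shared Y value makes every R tuple join every S tuple.
R_star = {(i, 0) for i in range(N)}
S_star = {(0, j) for j in range(N)}

# S.Y is a key (degree 1): each R tuple joins at most one S tuple.
S_key = {(y, y) for y in range(N)}

# S.Y has degree <= d: N // d values of Y, each paired with exactly d values of Z.
S_deg = {(y, z) for y in range(N // d) for z in range(d)}
R_deg = {(i, i % (N // d)) for i in range(N)}

print(path_join_size(R_star, S_star))  # 10000 = N * N
print(path_join_size(R_star, S_key))   # 100 = N
print(path_join_size(R_deg, S_deg))    # 500 = d * N
```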

Slide15

Background: Entropy, Polymatroid

Fix a set X = {X_1, …, X_k} and a function H: 2^X → R_+.

Def. H is called entropic if there exist random variables X such that H(U) = the entropy of U, for every U ⊆ X.

Def. H is a polymatroid if
H(∅) = 0
H(V) ≥ H(U) for U ⊆ V (monotonicity)
H(U) + H(V) ≥ H(U ∩ V) + H(U ∪ V) (submodularity)

Every entropic function is a polymatroid. The converse fails for k ≥ 4 [Zhang & Yeung’98].
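The two definitions can be exercised numerically. The sketch below is illustrative only: `entropy_function` and `is_polymatroid` are hypothetical helpers, and H is taken to be the entropy (in bits) of the empirical distribution of a list of rows, one random variable per column. Such an H is entropic, so it should pass the polymatroid check, while a set function such as H(U) = |U|² violates submodularity.

```python
from collections import Counter
from itertools import combinations
from math import log2

def entropy_function(rows, k):
    """H(U) for every U ⊆ {0, ..., k-1}: entropy in bits of the projection of
    `rows` onto the positions in U, under the empirical distribution of `rows`."""
    n = len(rows)
    H = {}
    for r in range(k + 1):
        for U in combinations(range(k), r):
            counts = Counter(tuple(row[i] for i in U) for row in rows)
            H[frozenset(U)] = -sum(c / n * log2(c / n) for c in counts.values())
    return H

def is_polymatroid(H, eps=1e-9):
    """Check H(∅) = 0, monotonicity, and submodularity on every pair of subsets."""
    if abs(H[frozenset()]) > eps:
        return False
    for U in H:
        for V in H:
            if U <= V and H[U] > H[V] + eps:            # monotonicity
                return False
            if H[U] + H[V] + eps < H[U & V] + H[U | V]:  # submodularity
                return False
    return True

rows = [(0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 0), (0, 0, 1), (2, 1, 1)]
print(is_polymatroid(entropy_function(rows, 3)))  # True
```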

The three polymatroid conditions are the Shannon inequalities.

Slide16

Enumeration Problem

Fix a set of statistics for D (cardinalities, FDs, degrees). Fix a query Q with variables X = {X_1, …, X_k}.

Theorem. For every D that satisfies the statistics,
log |Q(D)| ≤ max_{H entropic, satisfying stats} H(X) ≤ max_{H polymatroid, satisfying stats} H(X)

The entropic bound is asymptotically tight, but it is open whether it is computable. The polymatroid bound is computable in EXPTIME, but it is not tight.

Thm. For every D, Q(D) is computable in time Õ(Polymatroid-bound).

Slide17


Proof of Upper Bound

Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X)

Database D → entropic function H: take the uniform distribution over the output Q(D), so that H(XYZ) = log |Q(D)|.

Output Q(D), each row with probability 1/5:
X Y Z
a 3 m
a 2 q
b 2 q
d 3 m
a 3 q

Database D, each tuple with its marginal probability under this distribution:
R(X,Y): (a,3) 2/5, (a,2) 1/5, (b,2) 1/5, (d,3) 1/5
S(Y,Z): (3,m) 2/5, (2,q) 2/5, (3,q) 1/5, (2,m) 0
T(Z,X): (m,a) 1/5, (q,a) 2/5, (q,b) 1/5, (m,d) 1/5

Cardinalities, functional dependencies, and max degrees become constraints on H:
H(XY) ≤ log N_R
H(YZ) ≤ log N_S
H(XZ) ≤ log N_T
H(Z|Y) ≤ log deg_S(Z|Y)
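The numbers on this slide can be recomputed directly. The sketch below assumes the tables exactly as transcribed and uses hypothetical helper names; it recovers the marginals and H(XYZ) = log |Q(D)| for the five output rows shown. Evaluated literally, the join over these tables also contains (a, 2, m), because R(a,2), S(2,m), and T(m,a) are all present; the output table on the slide omits that row, which is why S(2,m) has marginal 0 there.

```python
from collections import Counter
from math import isclose, log2

# Database D, transcribed from the slide.
R = {("a", 3), ("a", 2), ("b", 2), ("d", 3)}          # R(X, Y)
S = {(3, "m"), (2, "q"), (3, "q"), (2, "m")}          # S(Y, Z)
T = {("m", "a"), ("q", "a"), ("q", "b"), ("m", "d")}  # T(Z, X)

# Output rows as listed on the slide; columns 0 = X, 1 = Y, 2 = Z.
slide_output = [("a", 3, "m"), ("a", 2, "q"), ("b", 2, "q"), ("d", 3, "m"), ("a", 3, "q")]

def triangles(R, S, T):
    """Naive evaluation of Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X)."""
    return {(x, y, z) for (x, y) in R for (y2, z) in S if y2 == y and (z, x) in T}

def marginal(rows, cols):
    """Marginal probabilities of the projection onto `cols` under the uniform distribution on rows."""
    n = len(rows)
    counts = Counter(tuple(r[c] for c in cols) for r in rows)
    return {v: k / n for v, k in counts.items()}

def h(rows, cols):
    """Entropy in bits of the projection onto `cols` under the uniform distribution on rows."""
    return -sum(p * log2(p) for p in marginal(rows, cols).values())

print(marginal(slide_output, (1, 2)))  # S(Y,Z) marginals, as on the slide
print(isclose(h(slide_output, (0, 1, 2)), log2(len(slide_output))))  # True: H(XYZ) = log |output|
```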

Slide23


Proof of Upper Bound

Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X), |R|, |S|, |T| ≤ N ⇒ |Q(D)| ≤ N^{3/2}

3 log N ≥ h(XY) + h(YZ) + h(XZ)   (since h(XY) ≤ log |R|, h(YZ) ≤ log |S|, h(XZ) ≤ log |T|)
≥ h(XYZ) + h(Y) + h(XZ)   (submodularity)
≥ h(XYZ) + h(XYZ) + h(∅) = 2 h(XYZ)   (submodularity)
= 2 log |Q(D)|

Shearer’s inequality: h(XY) + h(YZ) + h(XZ) ≥ 2 h(XYZ)
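The bound |Q(D)| ≤ N^{3/2} is attained: if R, S, and T each contain all n² pairs over an n-element domain, then N = n² and every (x, y, z) is a triangle, so |Q(D)| = n³ = N^{3/2}. A small Python check, with `triangle_count` as an illustrative helper:

```python
from itertools import product

def triangle_count(R, S, T):
    """|Q(D)| for Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X), by nested loops with an index on S."""
    zs_of_y = {}
    for y, z in S:
        zs_of_y.setdefault(y, []).append(z)
    return sum(1 for (x, y) in R for z in zs_of_y.get(y, []) if (z, x) in T)

n = 10
N = n * n                                # tuples per relation
full = set(product(range(n), repeat=2))  # all pairs over {0, ..., n-1}
R = S = T = full

print(len(R), triangle_count(R, S, T))   # 100 1000, i.e. N and N^{3/2}
```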

Slide31


Proof to Algorithm

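Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X)

Proof (Shearer’s inequality):
h(XY) + h(YZ) + h(XZ) ≥ h(XYZ) + h(Y) + h(XZ) ≥ h(XYZ) + h(XYZ) = 2 h(XYZ)

Algorithm: split R into R_light and R_heavy according to degree(Y) ≤ N^{1/2} or > N^{1/2}.
Step h(XY) + h(YZ) ≥ h(XYZ) + h(Y): compute R_light(X,Y) ∧ S(Y,Z), at most N^{3/2} tuples.
Step h(Y) + h(XZ) ≥ h(XYZ): compute R_heavy(Y) ∧ T(Z,X), at most N^{3/2} tuples, since there are at most N^{1/2} heavy Y values.
Each candidate triple is then checked against the remaining relations.

Runtime Õ(N^{3/2})

Slide36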

Enumeration Problem: Discussion

Cardinalities [Atserias, Grohe, Marx’08; Ngo, Ré, Rudra’13]:
Entropic bound = polymatroid bound.
The algorithm for Q(D) has a single log factor.

Cardinalities + FDs + max degrees:
Entropic bound ≨ polymatroid bound.
The algorithm for Q(D) has a polylog factor.
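The heavy/light algorithm from the “Proof to Algorithm” slide can be sketched in Python. This is a minimal sketch under our own assumptions, not the authors' implementation: relations are Python sets of pairs, N is the size of the largest relation, degree(Y) is the number of X values paired with Y in R, and hash-set lookups stand in for the indexes a real engine would use.

```python
from collections import defaultdict
from math import sqrt

def triangles_heavy_light(R, S, T):
    """Enumerate Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X) by splitting R on degree(Y)."""
    N = max(len(R), len(S), len(T), 1)
    limit = sqrt(N)
    xs_of_y = defaultdict(set)              # y -> {x : R(x, y)}
    for x, y in R:
        xs_of_y[y].add(x)
    out = set()

    # Light Y (degree <= sqrt(N)): R_light(X,Y) ∧ S(Y,Z) has at most N * sqrt(N)
    # candidates; each one is checked against T(Z,X).
    for y, z in S:
        xs = xs_of_y.get(y, ())
        if len(xs) <= limit:
            for x in xs:
                if (z, x) in T:
                    out.add((x, y, z))

    # Heavy Y (degree > sqrt(N)): fewer than sqrt(N) such values, so
    # R_heavy(Y) ∧ T(Z,X) has at most sqrt(N) * N candidates; check R and S.
    for y, xs in xs_of_y.items():
        if len(xs) > limit:
            for z, x in T:
                if x in xs and (y, z) in S:
                    out.add((x, y, z))
    return out
```

Every triangle has a Y value that is either light or heavy, so the two loops together produce all of Q(D).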

Slide37

Outline

Enumeration problem

Decision problem

Conclusions

Slide38

Decision Problem

Fix Q; fix statistics on D. Problem: does Q occur in D?

Theorem. One can check whether Q occurs in D in time Õ(2^{subw(Q)}), where subw(Q) is the “submodular width” of Q.

Optimal? (A fine-grained lower bound is open!)

Slide39

Background: Tree Decomposition

Informally: a TD is a tree in which each node t represents an enumeration problem.

Fractional hypertree width [Grohe, Marx’14]: min_{tree} max_{node t} max_{D}

Submodular width [Marx’2013]: max_{D} min_{tree} max_{node t}
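To see why committing to one tree decomposition can cost N², take the labeled 4-cycle from the example queries, whose two tree decompositions have bags {x,y,z}, {z,u,x} and {y,z,u}, {u,x,y}. The hypothetical database below (our construction, not from the slides) makes y = 0 and x = 0 high-degree values, so materializing the bags of either decomposition produces N²/2 tuples even though every relation has at most N tuples. `join_size` and `td_bag_sizes` are illustrative helper names.

```python
from collections import Counter

def join_size(left, right, lpos, rpos):
    """Size of left ⋈ right on left[lpos] = right[rpos], computed from value degrees."""
    lcount = Counter(t[lpos] for t in left)
    rcount = Counter(t[rpos] for t in right)
    return sum(c * rcount[v] for v, c in lcount.items())

def td_bag_sizes(R, S, T, K):
    """Bag sizes for the two tree decompositions of R(x,y) ∧ S(y,z) ∧ T(z,u) ∧ K(u,x)."""
    td1 = (join_size(R, S, 1, 0),   # bag {x,y,z}: R(x,y) ⋈ S(y,z) on y
           join_size(T, K, 1, 0))   # bag {z,u,x}: T(z,u) ⋈ K(u,x) on u
    td2 = (join_size(S, T, 1, 0),   # bag {y,z,u}: S(y,z) ⋈ T(z,u) on z
           join_size(K, R, 1, 0))   # bag {u,x,y}: K(u,x) ⋈ R(x,y) on x
    return td1, td2

N = 1000
M = N // 2
R = {(i, 0) for i in range(M)} | {(0, j) for j in range(M)}  # y = 0 and x = 0 are heavy
S = {(0, j) for j in range(N)}    # every S tuple has y = 0
T = {(j, j) for j in range(N)}    # a matching
K = {(j, 0) for j in range(N)}    # every K tuple has x = 0

td1, td2 = td_bag_sizes(R, S, T, K)
print(td1, td2)                   # (500000, 1000) (1000, 500000)
print(min(max(td1), max(td2)))    # 500000 = N^2 / 2
```

Splitting the data by degree, as in the triangle algorithm, lets different parts of D use different decompositions; this is the intuition behind the max_{D} min_{tree} order of the submodular width.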

Slide40


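Q() = ∃x∃y∃z∃u R(x,y) ∧ S(y,z) ∧ T(z,u) ∧ K(u,x), with |R|, |S|, |T|, |K| ≤ N; an O(N^{3/2}) algorithm is known [Alon, Yuster, Zwick’97].

Tree decompositions:
T1: bags {R(x,y), S(y,z)} and {T(z,u), K(u,x)}
T2: bags {S(y,z), T(z,u)} and {K(u,x), R(x,y)}

With the order min_{tree} max_{node t} max_{D} (fractional hypertree width), the runtime is Õ(N^2) (suboptimal).

[Figure: the labeled 4-cycle on X, Y, Z, U with edges R, S, T, K]

Slide44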


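Q() = ∃x∃y∃z∃u R(x,y) ∧ S(y,z) ∧ T(z,u) ∧ K(u,x), with |R|, |S|, |T|, |K| ≤ N; an O(N^{3/2}) algorithm is known [Alon, Yuster, Zwick’97].

For the tree decompositions T1 (bags xyz, zux) and T2 (bags yzu, uxy):

min(max(h(xyz), h(zux)), max(h(yzu), h(uxy)))
= max(min(h(xyz), h(yzu)), min(h(xyz), h(uxy)), min(h(zux), h(yzu)), min(h(zux), h(uxy))) ≤ 3/2 log N

Each min term is at most 3/2 log N; for the first one,
3 log N ≥ h(xy) + h(yz) + h(zu) ≥ h(xyz) + h(y) + h(zu) ≥ h(xyz) + h(yzu) + h(∅) ≥ 2 min(h(xyz), h(yzu)),
and the other three follow symmetrically.

subw(Q) = 3/2 log N

[Figure: the labeled 4-cycle on X, Y, Z, U with edges R, S, T, K]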
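The chain of inequalities for the pair (xyz, yzu) can be read as an algorithm for a disjunctive rule A(x,y,z) ∨ B(y,z,u) ← R(x,y) ∧ S(y,z) ∧ T(z,u): every tuple of R ⋈ S ⋈ T must have its (x,y,z) part in A or its (y,z,u) part in B. The slides omit the details; the sketch below is our assumption of one way to do it, reusing a heavy/light split on the degree of y in R so that both A and B have at most N^{3/2} tuples. `disjunctive_rule` is a hypothetical name.

```python
from collections import defaultdict
from math import sqrt

def disjunctive_rule(R, S, T):
    """Hypothetical sketch: return (A, B) with |A|, |B| <= N^{3/2} such that every
    x, y, z, u with R(x,y), S(y,z), T(z,u) has (x,y,z) in A or (y,z,u) in B."""
    N = max(len(R), len(S), len(T), 1)
    limit = sqrt(N)
    xs_of_y = defaultdict(set)                  # y -> {x : R(x, y)}
    for x, y in R:
        xs_of_y[y].add(x)

    # Light y (degree <= sqrt(N)): A = R_light(x,y) ∧ S(y,z), at most N * sqrt(N) tuples.
    A = {(x, y, z)
         for (y, z) in S
         if len(xs_of_y.get(y, ())) <= limit
         for x in xs_of_y.get(y, ())}

    # Heavy y (degree > sqrt(N)): fewer than sqrt(N) such values, so
    # B = heavy(y) ∧ S(y,z) ∧ T(z,u) has at most sqrt(N) * N tuples.
    heavy = [y for y, xs in xs_of_y.items() if len(xs) > limit]
    B = {(y, z, u) for y in heavy for (z, u) in T if (y, z) in S}
    return A, B
```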

The bound uses the order max_{D} min_{tree} max_{node t} (submodular width).

[Figure: tree decompositions T1 with bags {R(x,y), S(y,z)} and {T(z,u), K(u,x)}; T2 with bags {S(y,z), T(z,u)} and {K(u,x), R(x,y)}]

Slide54

Proof to Algorithm

Use the proof of
h(xyz) + h(yzu) ≤ h(xy) + h(yz) + h(zu)
to compute the disjunctive datalog rule
A(x,y,z) ∨ B(y,z,u) ← R(x,y) ∧ S(y,z) ∧ T(z,u)
(details omitted).

Runtime Õ(N^{3/2})

Slide55

Outline

Enumeration problem

Decision problem

Conclusions

Slide56

Conclusions

Query evaluation summary:
Information theory → Proof
Proof → Algorithm

Open problems:
Better “Proof → Algorithm”
Fine-grained lower bounds

Slide57

Thank You!

Questions?

Hung Ngo

Mahmoud Abo-Khamis

[PODS’2016]

[PODS’2017]

RelationalAI Inc.