/
1 Data integration 1 Data integration

1 Data integration - PowerPoint Presentation

faustina-dinatale
faustina-dinatale . @faustina-dinatale
Follow
445 views
Uploaded On 2016-12-12

1 Data integration - PPT Presentation

Most slides are borrowed from Dr Chen Li UC Irvine 2 Motivation Legacy database Plain text files Biblio sever Support seamless access to autonomous and heterogeneous information sources 3 Comparison Shopping ID: 500828

qtr crs title query crs qtr query title source std mediator reg year isbn views rewriting data queries containment

Share:

Link:

Embed:

Download Presentation from below link

Download Presentation The PPT/PDF document "1 Data integration" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

Slide1

1

Data integration

Most slides are borrowed from Dr. Chen Li, UC IrvineSlide2

2

Motivation

Legacy database

Plain text files

Biblio sever

Support seamless access to autonomous and heterogeneous information sources.Slide3

3

Comparison Shopping

Lowest price of the

DVD: “The Matrix”?

Applications

Comparison shopping

Supply-chain management

Supplier 2

Integrator

Supplier M

Supplier 1

Buyer 2

Buyer M

Buyer 1

…Slide4

4

Mediation architecture

Mediator

Wrapper

Source 1

Wrapper

Source 2

Wrapper

Source nSlide5

5

Sources are heterogeneous:

Different data models: relational, object-oriented, XML, …

Different schemas and representations. E.g.,

“Keanu Reeves” or “Reeves, Keanu” or “Reeves, K.” etc.

Describe source contentsUse source data to answer queriesSources have limited query capabilitiesData qualityPerformance… …ChallengesSlide6

6

Research projects

Garlic (IBM),

Information Manifold (AT&T)

InfoSleuth (MCC),

Tsimmis, InfoMaster (Stanford)Internet Softbot/Razor/Tukwila (U Wash.)Hermes (Maryland)Telegraph / Eddies (UC Berkeley)Niagara (Univ Wisconsin)DISCO, Agora (INRIA, France)SIMS/Ariadne (USC/ISI)Emerac/Havasu (ASU)Slide7

7

Industry

Nimble Technology

Enosys Markets

IBM

BEASlide8

8

Virtual integration

Leave the data in the sources

When a query comes in:

Determine the relevant sources to the query

Break down the query into sub-queries for the sourcesGet the answers from the sources, filter them if needed and combine them appropriatelyData is freshOtherwise known as On Demand IntegrationSlides from Dr. Michalis Petropoulos Slide9

9

Mediator

Virtual Integration Architecture

Data

Source

Data

Source

Global

Schema

Local

Schema

Local

Schema

Query

Result

Wrapper

Wrapper

End User

Design-Time

Mediation

Language

Mapping Tool

Run-Time

Query

Reformulation

Optimization

& Execution

XML

Web Services

1

Slides from Dr. Michalis Petropoulos

Slide10

10

Design-Time

Mediator

Virtual Integration Architecture

Data

Source

Data

Source

Global

Schema

Local

Schema

Local

Schema

Query

Result

Wrapper

Wrapper

End User

Mediation

Language

Mapping Tool

Run-Time

Query

Reformulation

Optimization

& Execution

XML

Web Services

1

2

Slides from Dr. Michalis Petropoulos

Slide11

11

Mediator

Virtual Integration Architecture

Data

Source

DataSource

Global

Schema

Local

Schema

Local

Schema

Query

Result

Wrapper

Wrapper

End User

Design-Time

Mediation

Language

Mapping Tool

Run-Time

Query

Reformulation

Optimization

& Execution

XML

Web Services

1

2

3

Slides from Dr. Michalis Petropoulos

Slide12

12

Mediator

Virtual Integration Architecture

Data

Source

Data

Source

Global

Schema

Local

Schema

Local

Schema

Query

Result

Wrapper

Wrapper

End User

Design-Time

Mediation

Language

Mapping Tool

Run-Time

Query

Reformulation

Optimization

& Execution

XML

Web Services

1

2

3

4

Slides from Dr. Michalis Petropoulos

Slide13

13

Mediator

Virtual Integration Architecture

Data

Source

Data

Source

Global

Schema

Local

Schema

Local

Schema

Query

Result

Wrapper

Wrapper

End User

Design-Time

Mediation

Language

Mapping Tool

Run-Time

Query

Reformulation

Optimization

& Execution

XML

Web Services

1

2

5

3

4

Slides from Dr. Michalis Petropoulos

Slide14

14

Mediator

Virtual Integration Architecture

Data

Source

Data

Source

Global

Schema

Local

Schema

Local

Schema

Query

Result

End User

Wrapper

Wrapper

Design-Time

Mediation

Language

Mapping Tool

Run-Time

Query

Reformulation

Optimization

& Execution

XML

Web Services

1

2

5

6

3

4

Slides from Dr. Michalis Petropoulos

Slide15

15

Outline

Basics: theories of conjunctive queries

Global-as-view (GAV) approach to data integration

Local-as-view (LAV) approach to data integrationSlide16

16

Conjunctive Queries (CQ’s) in Datalog

Most common form of query; equivalent to select-project-join (SPJ) queries

Useful for data integration

Form:

q(X) :- p1(X1), p2(X

2),…, pn

(Xn).Head

q(X) represents the query answersBody

p1(X1), p2(X2

),…, pn(Xn)

represents the query conditions

The head is true if all the subgoals are true.Each pi(Xi

) is called a subgoal. Xi is a vector of variables or constants.

Shared variables represent join conditions

Constants represent “Attribute=const” selection conditionsA relation can appear in multiple predicates (subgoals)

head

body

subgoals

q(X) :- p

1

(X

1

), p

2

(X

2

), …, p

n

(X

n

)Slide17

17

Conjunctive queries

Head and subgoals are

atoms

.

An atom consists of a predicate applied to zero or more argumentsPredicates represent relations.An atom is true for given values of its variables iff the arguments form a tuple of the relation.Whenever an assignment of values to all variables makes all subgoals true, the rule asserts that the resulting head is also true.Slide18

18

Conjunctive Queries: example

Schema

student(name, courseNum), course(number, Instructor)

SQL

SELECT name FROM student, course WHERE student.courseNum=course.number AND instructor=‘Li’;Equal to: ans(SN) :- student(SN, CN), course(CN,’Li’).Predicates student and course

correspond to relations namesTwo subgoals: student(SN, CN) and course(CN,’Li’)

Variables: SN, CN. Constant: ‘Li’Shared variable, CN, corresponds to “student.courseNum=course.number”

Variable SN in the head: the answer to the querySlide19

19

Why not SQL

Datalog

is more concise

Let us state some general principles

e.g., containment of rules that are almost impossible to state correctly in SQL.Will see that laterRecursion is much easier to express in Datalog.Slide20

20

Answer to a CQ

For a CQ

Q

on database

D, the answer Q(D) is a set of heads of Q if we:Substitute constants for variables in the body of Q in all possible waysRequire all subgoals to be trueExample: ans(SN) :-

student(SN, CN), course(CN,’Li

’).Tuples are also called facts:

student(Jack, 184), student(Tom,215), …, course(184,Li), course(215,Li), …

Answer “Jack”: SNJack,CN184Answer “Tom”: SN

Tom,CN215Answer “Jack”: SN

Jack,CN215 (duplicate eliminated)

Student

CourseSlide21

21

Query containment

For two queries

Q

1

and Q2, we say Q1 is contained in Q2, denoted Q1Q2, if any database D, we have Q1(D) Q

2(D).We say Q1 and

Q2 are equivalent, denoted Q1

Q2, if Q1

(D) Q2

(D) and Q2(D

) 

Q1(D).Example: Q

1: ans(SN

) :- student(SN, CN),

course(CN, ’Li’).Q2: ans(SN

) :- student(SN

, CN), course(CN, INS). We have: Q

1(D) 

Q2(D).Slide22

22

Another example

Q

1

:

p(X,Y) :- r(X,W), b(W,Z), r(Z,Y).Q2: p(X,Y) :- r(X,W),

b(W,W), r(W,Y).

We have: Q2 

Q1Proof:

For any DB D, suppose p(x,y

) is in Q2

(D

). Then there is a w such that

r(x,w), b(w,w

), and

r(w,y) are in D.For Q

1, consider the substitution:

X x

, W w, Z w

, Y y. Thus the head of

Q1 becomes p(x,y)

, meaning that p(x,y

) is also in Q1(D

).In general, how to test containment of

CQ’s?Containment mappingsSlide23

23

Test containment

Two approaches:

Containment mappings.

Canonical databases.

Really the same in the simple CQ case covered so far.Containment test is NP-complete, but CQ’s tend to be small so here is one case where intractability doesn’t hurt you.Slide24

24

Containment mappings

A containment mapping from Q2 to Q1: Map variables of Q2 to variables of Q1, such that:

Head of Q2 becomes head of Q1;

Each

subgoal of Q2 becomes some subgoal of Q1.It is not necessary that every subgoal of Q1 is the target of some subgoal of Q2.

Q1 

Q2 iff

there is a containment mapping from Q2 to Q1. Note that the containment mapping is opposite the containment --- it goes from the larger (containing CQ) to the smaller (contained CQ).

Example:

Q1:

p(X,Y) :-

r(X,W), b(W,Z), r(Z,Y).

Q2:

p(X,Y) :-

r(X,W), b(W,W), r(W,Y).

Containment mapping from Q1 to Q2: X

 X, Y  Y, W

 W, Z 

WNo containment mapping from Q2 to Q1: For b(W,W

) in Q2, its only possible target in Q1 is b(W,Z)However, we cannot have a mapping WW and WZ, since each variable cannot be mapped to two different variablesSlide25

25

A slightly different example

Q1:

p(X,Y

):-

r(X,Z), g(Z,Z), r(Z,Y).

Q2: p(A,B):- r(A,C

), g(C,D),

r(D,B).

Containment mapping m

:m

(A)=X;

m(B)=Y;

m(C)=

m

(D)=Z.Slide26

26

Q1:

p(X,Y):- r(X,Y), g(Y,Z).

Q2:

p(A,B):- r(A,B), r(A,C).

Q1 looks for:

Q2 looks for:

Another Example

X

Z

Y

A

B

CSlide27

27

Q1:

p(X,Y):- r(X,Y), g(Y,Z).

Q2:

p(A,B):- r(A,B), r(A,C).

Containment mapping:m(A)=X;m(B)=

m(C)=Y.Example - Continued

Notice two

subgoals can

map to one.

And not

every subgoal

need be a

target.Slide28

28

Example - Concluded

Q1:

p(X,Y

):-

r(X,Y), g(Y,Z).Q2: p(A,B):- r(A,B),

r(A,C).

No containment mapping from Q1 to Q2.g(Y,Z

) cannot map anywhere, since there is no g

subgoal in Q2.

Thus, Q1 properly contained in Q2.Slide29

29

Extending CQ’s

CQ’s

with built-in predicates:

We can add more conditions to variables in a CQ.

Example: student(name, GPA, courseNum), course(number,instructor,year) Q1(SN) :- student(SN, G, CN), course(CN, ’Li’),

G>=3.5. Q2(SN) :- student(SN, G, CN),

course(CN, ’Li’), G>=3.5, Y < 2002. Q2(SN)

 Q1(SN).

Datalog queries: a (possibly infinite) set of CQ’s with (possibly) recursion

Example: parent(Parent, Child)

Query: finding all ancestors of Tom

ancestor(P,C) :- parent(P, C).

ancestor(P,C) :- ancestor(P,X), parent(X

, C).

result(P) :- ancestor(P, ‘tom’).Slide30

30

Although CQ theory first appeared at a database conference, the AI community has taken CQ’s to heart.

CQ’s, or similar logics like description logic, are used in a number of AI applications.

Again, their design theory is really containment and equivalence.Slide31

31

Outline

Basics: theories of conjunctive queries

Global-as-view (GAV) approach to data integration

Local-as-view (LAV) approach to data integrationSlide32

32

GAV approach to data integration

Readings:

Jeffrey Ullman, Information Integration Using Logical Views, ICDT 1997.

Ramana Yerneni, Chen Li, Hector Garcia-Molina, and Jeffrey Ullman, Computing Capabilities of Mediators, SIGMOD 1999. Slide33

33

Global-as-view Approach

Mediator

Mediator exports

views

defined on source relations

med(Dealer,City,Make,Year

) = R1 R2

A query is posted on mediator views:

SELECT * FROM med

WHERE Year = ‘2001’;

ans(D,C,M

, ‘2001’) :- med(D,C,M,‘2001’).

Mediator

expands

query to source queries:

SELECT * FROM R1, R2

WHERE Year = ‘2001’;

ans(D,C,M,’2001’) :- R1(D,C), R2(D,M, ‘2001’).

R1(Dealer,City)

R2(Dealer, Make, Year)

med(Dealer,City,Make,Year

) = R1 R2Slide34

34

Project: TSIMMIS at Stanford

Advantages:

User queries are easy to define

Query transformation generation is straightforward

Disadvantages:Not all source information is exported: Not easily scalable: every time a new source is added, mediator views need to be changed.Research issuesEfficient query execution?Deal with limited source capabilities?GAV ApproachSlide35

35

Limited source capabilities

Complete scans of relations

not

possible

Reasons: Legacy databases or structured files: limited interfaces Security/Privacy Performance concerns Example 1: legacy databases with restrictive interfaces

Ullman

DBMS

Knuth

TeX

author

title

Given an author,

return the books.Slide36

36

Another example: Web search forms

www.imdb.comSlide37

37

Problems

How to describe source restrictions?

How to compute mediator restrictions from sources?

How to answer queries efficiently given these restrictions?

How to compute as many answers as possible to a query?…Slide38

38

Computing mediator restrictions

Motivation: do not want users to be frustrated by submitting a query that cannot be answerable by the mediator

Example:

Source 1:

book(author?, title, price)Capability: “bff”i.e., we must provide an author, and can get title and price infoSource 2: review(title?, reviewer, rate)Capability: “bff”i.e., we must provide a book title, and can get other infoMediator view: MedView(A?,T,P,RV,RT) :- book(A,T,P),review(T,RV,RT).

Query on the mediator view:Ans(RT) :- MedView(A, ‘db’, P, RV, RT).

I.e., “find the review rates of DB books”But the mediator cannot answer this query, since we do not know the authors.

We want to tell the user beforehand what queries can be answeredSlide39

39

Outline

Basics: theories of conjunctive queries;

Global-as-view (GAV) approach to data integration;

Local-as-view (LAV) approach to data integration.Slide40

40

Local-as-view (LAV) approach

Mediator

There are

global predicates

, e.g., “car,” “person,” “book,” etc.

They can been seen as mediator views

The content of each source is described using these global predicates

A query to the mediator is also defined on the global predicates

The mediator finds a way to answer the query using the source contents

sourcesSlide41

41

Example

Mediator

Global predicates:

Loc(Dealer,City),Sell(Dealer,Make,Year)

Source content defined on global predicates:

S1(Dealer,City) :- Loc(Dealer, City).

S2(Dealer,Make,Year) :- Sell(Dealer, Make, Year).

In general, each definition could be more complicated, rather than direct copies.

Queries defined on global predicates.

Q: ans(D,M,Y) :- Loc(D, ’windsor’), Sell(D, M, Y).

Users do not know source views.

The mediator decides how to use source views to answer queries.

“Answering queries using views”:

ans(D, M, Y) :- S1(D,’windsor’), S2(D,M,Y).

S1(Dealer,City)

S2(Dealer,Make,Year)Slide42

42

Answering queries using views

Mediator

Source views can be complicated: SPJs, arithmetic comparisons,…

Not easy to decide how to answer a query using source views

Query: ans(D,M) :- Loc(D,‘windsor'), Sell(D,M,Y).

Rewriting:

ans(D,M) :- V3(D,‘windsor’, M,Y).

ans(D,M) :- V1(D,’windsor’), V2(D,M,Y).

“Equivalent rewriting”: compute the “same” answer as the query

A rewriting can join multiple source views

V1(Dealer,City):- Loc(Dealer, City).

V2(Dealer,Make,Year):-Sell(Dealer, Make, Year).

V3(D,C,M,Y) :- Loc(D,C),Sell(D,M,Y).

V4(D,C,M,Y) :- Loc(D,C),Sell(D,M,Y), Y<1970.

QuerySlide43

43

Arithmetic comparisons

Mediator

Comparisons can make the problem even trickier

Query: ans(D,M) :- Loc(D,‘windsor'), Sell(D,M,Y).

Rewriting: ans(D,M) :- V(D,‘windsor’, M,Y).

Contained rewriting:

only retrieve cars before 1970.

Query: ans(D,M):- Loc(D, ‘windsor'), Sell(D,M,Y), Y < 1960.

Rewriting: ans(D,M) :- V(D,‘windsor’, M, Y), Y < 1960.

V(D,C,M,Y):- Loc(D,C),Sell(D,M,Y),Y<1970.Slide44

44

Local-as-View (LAV)

Source

1

Source

2

Source

3

Source

4

Source

5

Local Schema

Local

Schema

Local

Schema

Local

Schema

Global Schema

Book

ISBN

Title

Genre

Year

Author

ISBN

Name

R1

ISBN

Title

Name

Local Schema

R5

ISBNTitle

Books before 1970

Humor Books

Create View R1 AS

SELECT B.ISBN, B.Title, A.Name

FROM Book B, Author A

WHERE A.ISBN = B.ISBN

AND B.Year < 1970

R1(ISBN, Title, Name):-

Book(ISBN, Title, Genre,Year),

Author(ISBN, Name), Year<1970.

Create View R5 AS

SELECT B.ISBN, B.Title

FROM Book B

WHERE B.Genre = ‘Humor’

R5(ISBN, Title)

:-Book(ISBN, Title, ‘humor’, Year).Slide45

45

LAV details

Query: Find authors of humor books

Q(Name):-Book(ISBN,Title,”humor”,YEAR), Author(ISBN, Name)

Views:

R1(ISBN, Title, Name):- Book(ISBN, Title, Genre,Year), Author(ISBN, Name), Year<1970.R5(ISBN, Title) :-Book(ISBN, Title, ‘humor’, Year).Rewriting of Q using views:Q’(Name):-R1(ISBN, Title, Name), R2(ISBN, Title)Expansion of Q’Q’’(Name):- Book(ISBN, Title, Genre,Year), Author(ISBN, Name), Year<1970, Book(ISBN, Title, ‘humor’, Year). Q’’’(Name):- Author(ISBN, Name), Year<1970, Book(ISBN, Title, ‘humor’, Year).Q’’’ is contained in QSlide46

46

Query Rewritings

Given a query Q and a set of views V:

A

conjunctive query

P is called a “rewriting” of Q using V if P only uses views in V, and P computes a partial answer of Q. That is: P

exp Q

. A rewriting is also called a “contained rewriting” (CR).

A

conjunctive query P is called an “

equivalent rewriting”

(ER) of Q using V if P only uses views in V, and P computes the

exact answer of Q. That is: Pexp

 Q.Slide47

47

Bucket algorithm

It is the basic method for query rewriting

Each subgoal must be “covered” by some view

Make a list of candidates (buckets) per query subgoal

Consider combinations of candidates from different bucketsNot all combos are “compatible”Keep the compatible ones and minimize themDiscard the ones contained in anotherTake their unionSlide48

48

The Bucket Algorithm: Example

V1

(Std,Crs,Qtr,Title)

:- reg(Std,Crs,Qtr), course(Crs,Title),

Crs ≥ 500, Qtr ≥ Aut98V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title),

teaches(Prof,Crs,Qtr), Qtr ≤ Aut97q

(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T), C ≥ 300, Q ≥ Aut95

Step 1: For each query subgoal, put the

relevant sources into a bucketSlide49

49

The Bucket Algorithm: Example

V1

(Std,Crs,Qtr,Title)

:- reg(Std,Crs,Qtr), course(Crs,Title),

Crs ≥ 500, Qtr ≥ Aut98V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title), teaches(Prof,Crs,Qtr)

, Qtr ≤ Aut97q

(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T),

C ≥ 300, Q ≥ Aut95PProf, CCrs, QQtr

Note: Arithmetic predicates don’t pose a problem in this step

V2

Buckets

V4

teaches

reg

courseSlide50

50

The Bucket Algorithm: Example

V1

(Std,Crs,Qtr,Title)

:- reg(Std,Crs,Qtr), course(Crs,Title), Crs ≥ 500, Qtr ≥ Aut98V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94

V4(Prof,Crs,Title,Qtr) :-

reg(Std,Crs,Qtr), course(Crs,Title), teaches(Prof,Crs,Qtr)

, Qtr ≤ Aut97q

(S,C,P) :- teaches(P,C,Q),

reg(S,C,Q), course(C,T), C ≥ 300, Q ≥ Aut95

SStd, CCrs, QQtr

Note: V3 doesn’t work: arithmetic predicates not consistent V4 doesn’t work: S not in the output of V4

V2

Buckets

V4

teaches

reg

course

V1

V2Slide51

51

The Bucket Algorithm: Example

V1

(Std,Crs,Qtr,Title)

:-

reg(Std,Crs,Qtr), course(Crs,Title), Crs ≥ 500, Qtr ≥ Aut98V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)V3(Std,Crs) :-

reg(Std,Crs,Qtr), Qtr ≤ Aut94V4(Prof,Crs,Title,Qtr)

:- reg(Std,Crs,Qtr), course(Crs,Title)

, teaches(Prof,Crs,Qtr)

, Qtr ≤ Aut97q(S,C,P)

:- teaches(P,C,Q), reg(S,C,Q)

, course(C,T)

, C ≥ 300, Q ≥ Aut95CCrs, TTitle

V2

Buckets

V4

teaches

reg

course

V1

V2

V1

V4Slide52

52

The Bucket Algorithm: Example

Step 2:

Try all combos of views, one each from a bucket

Test satisfaction of arithmetic predicates in each case

e.g., two views may not overlap, i.e., they may be inconsistentDesired rewriting = union of surviving onesQuery rewriting 1:

q1

(S,C,P) :- V2

(S’,P,C,Q), V1(S,C,Q,T’),

V1(S”,C,Q’,T)

no problem from arithmetic predicates (none in V2)May or may not be minimal (why?)

V2

V4

teaches

reg

course

V1

V2

V1

V4Slide53

53

The Bucket Algorithm: Example

Unfolding of rewriting 1:

q1’

(S,C,P)

:- r(S’,C,Q), t(P,C,Q), r(S,C,Q), c(C,T’), r(S”,C,Q’), c(C,T),

C ≥ 500, Q

≥ Aut98, C ≥

500, Q’ ≥ Aut98

Black r’s can be mapped to green r:

S’

S, S”S, Q’Q

Black c can be mapped to green c: just extend above mapping to

TT’

Minimized unfolding of rewriting 1:

q1m’(S,C,P) :- t(P,C,Q),

r(S,C,Q),

c(C,T’), C ≥

500, Q ≥ Aut98

Minimized rewriting 1:q1m(S,C,P) :-

V2(S’,P,C,Q), V1(S,C,Q,T’)Slide54

54

The Bucket Algorithm: Example

Query Rewriting 2:

q2

(S,C,P)

:- V2(S’,P,C,Q), V1(S,C,Q,T’), V4(P’,C,T,Q’)

q2’(S,C,P)

:- r(S’,C,Q), t(P,C,Q), r(S,C,Q),

r(S,C,Q), c(C,T’),

C ≥ 500, Q ≥ Aut98, r(S”,C,Q’),

c(C,T), t(P’,C,Q’), Q’ ≤ Aut97

This combo is infeasible: consider the conjunction of arithmetic predicates in

V1 and V4

Query rewriting 3:

q3

(S,C,P) :-

V2(S’,P,C,Q),

V2(S,P’,C,Q), V4

(P”,C,T,Q’)

V2

V4

teaches

reg

course

V1

V2

V1

V4

V2

V4

teaches

reg

course

V1

V2

V1

V4Slide55

55

The Bucket Algorithm: Example

Unfolding of rewriting 3:

q3’

(S,C,P)

:- r(S’,C,Q), t(P,C,Q), r(S,C,Q), t(P’,C,Q), r(S”,C,Q’),

c(C,T),

t(P”,C,Q’), Q’ ≤ Aut97

The green subgoals can cover the black ones under the mapping: S’S, S”S, P’P, P”P, Q’Q

Minimized rewriting 3:

q3m(

S,C,P)

:- V2(S,P,C,Q), V4

(P,C,T,Q)

Verify that there are only two rewritings that are not covered by others

Maximally Contained Rewriting:

q’ = q1m  q3mSlide56

56

The Bucket Algorithm: Example 2

Query:

q

(X)

:- cites(X,Y), cites(Y,X), sameTopic(X,Y)Views:V4(A) :- cites(A,B), cites(B,A)V5(C,D) :- sameTopic(C,D)V6(F,H) :- cites(F,G), cites(G,H), sameTopic(F,G)

Note: Should we list V4(X) twice in the buckets?

V4

Buckets

V6

cites

cites

sameTopic

V4

V6

V5

V6Slide57

57

Bucket algorithm

Query:

q(x):-car(x), sell(x, d), loc(d, ’windsor’).

Views:

v1(x) :- car(x).v2(x) :- car(x), sell(x, d).v3(x,d) :- sell(x, d), loc(d, ’windsor’).v4(x) :- sell(x, d), loc(d, ’windsor’).

Car(x)

Sell(x,d)

Loc(d,’windsor’)

V1(x)

V2(x)

v2(x)

V3(x,d)

V3(x,d)

V4(x)

V4(x)

q(x):-v1(x), v2(x), v3(x,d).

q(x):-v1(x), v3(x,d).

q(x):-v1(x), v4(x).

q(x):-v2(x), v3(x,d).

q(x):-v2(x), v4(x).

… Slide58

58

Projects: Information Manifold, Infomaster, Tukwila, …

Advantages:

Scalable: new sources easy to add without modifying the mediator views

All we need to do is to define the new source using the existing mediator views (predicates)

Disadvantages:Hard to decide how to answer a query using viewsReading: Alon Halevy, Answering Queries Using Views: A Survey. Projects using the LAV approach