Most slides are borrowed from Dr Chen Li UC Irvine 2 Motivation Legacy database Plain text files Biblio sever Support seamless access to autonomous and heterogeneous information sources 3 Comparison Shopping ID: 500828
Download Presentation The PPT/PDF document "1 Data integration" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.
Slide1
1
Data integration
Most slides are borrowed from Dr. Chen Li, UC IrvineSlide2
2
Motivation
Legacy database
Plain text files
Biblio sever
Support seamless access to autonomous and heterogeneous information sources.Slide3
3
Comparison Shopping
Lowest price of the
DVD: “The Matrix”?
Applications
Comparison shopping
Supply-chain management
Supplier 2
…
Integrator
Supplier M
Supplier 1
Buyer 2
Buyer M
Buyer 1
…Slide4
4
Mediation architecture
Mediator
Wrapper
Source 1
Wrapper
Source 2
Wrapper
Source nSlide5
5
Sources are heterogeneous:
Different data models: relational, object-oriented, XML, …
Different schemas and representations. E.g.,
“Keanu Reeves” or “Reeves, Keanu” or “Reeves, K.” etc.
Describe source contentsUse source data to answer queriesSources have limited query capabilitiesData qualityPerformance… …ChallengesSlide6
6
Research projects
Garlic (IBM),
Information Manifold (AT&T)
InfoSleuth (MCC),
Tsimmis, InfoMaster (Stanford)Internet Softbot/Razor/Tukwila (U Wash.)Hermes (Maryland)Telegraph / Eddies (UC Berkeley)Niagara (Univ Wisconsin)DISCO, Agora (INRIA, France)SIMS/Ariadne (USC/ISI)Emerac/Havasu (ASU)Slide7
7
Industry
Nimble Technology
Enosys Markets
IBM
BEASlide8
8
Virtual integration
Leave the data in the sources
When a query comes in:
Determine the relevant sources to the query
Break down the query into sub-queries for the sourcesGet the answers from the sources, filter them if needed and combine them appropriatelyData is freshOtherwise known as On Demand IntegrationSlides from Dr. Michalis Petropoulos Slide9
9
Mediator
Virtual Integration Architecture
Data
Source
Data
Source
Global
Schema
Local
Schema
Local
Schema
Query
Result
Wrapper
Wrapper
End User
Design-Time
Mediation
Language
Mapping Tool
Run-Time
Query
Reformulation
Optimization
& Execution
XML
Web Services
1
Slides from Dr. Michalis Petropoulos
Slide10
10
Design-Time
Mediator
Virtual Integration Architecture
Data
Source
Data
Source
Global
Schema
Local
Schema
Local
Schema
Query
Result
Wrapper
Wrapper
End User
Mediation
Language
Mapping Tool
Run-Time
Query
Reformulation
Optimization
& Execution
XML
Web Services
1
2
Slides from Dr. Michalis Petropoulos
Slide11
11
Mediator
Virtual Integration Architecture
Data
Source
DataSource
Global
Schema
Local
Schema
Local
Schema
Query
Result
Wrapper
Wrapper
End User
Design-Time
Mediation
Language
Mapping Tool
Run-Time
Query
Reformulation
Optimization
& Execution
XML
Web Services
1
2
3
Slides from Dr. Michalis Petropoulos
Slide12
12
Mediator
Virtual Integration Architecture
Data
Source
Data
Source
Global
Schema
Local
Schema
Local
Schema
Query
Result
Wrapper
Wrapper
End User
Design-Time
Mediation
Language
Mapping Tool
Run-Time
Query
Reformulation
Optimization
& Execution
XML
Web Services
1
2
3
4
Slides from Dr. Michalis Petropoulos
Slide13
13
Mediator
Virtual Integration Architecture
Data
Source
Data
Source
Global
Schema
Local
Schema
Local
Schema
Query
Result
Wrapper
Wrapper
End User
Design-Time
Mediation
Language
Mapping Tool
Run-Time
Query
Reformulation
Optimization
& Execution
XML
Web Services
1
2
5
3
4
Slides from Dr. Michalis Petropoulos
Slide14
14
Mediator
Virtual Integration Architecture
Data
Source
Data
Source
Global
Schema
Local
Schema
Local
Schema
Query
Result
End User
Wrapper
Wrapper
Design-Time
Mediation
Language
Mapping Tool
Run-Time
Query
Reformulation
Optimization
& Execution
XML
Web Services
1
2
5
6
3
4
Slides from Dr. Michalis Petropoulos
Slide15
15
Outline
Basics: theories of conjunctive queries
Global-as-view (GAV) approach to data integration
Local-as-view (LAV) approach to data integrationSlide16
16
Conjunctive Queries (CQ’s) in Datalog
Most common form of query; equivalent to select-project-join (SPJ) queries
Useful for data integration
Form:
q(X) :- p1(X1), p2(X
2),…, pn
(Xn).Head
q(X) represents the query answersBody
p1(X1), p2(X2
),…, pn(Xn)
represents the query conditions
The head is true if all the subgoals are true.Each pi(Xi
) is called a subgoal. Xi is a vector of variables or constants.
Shared variables represent join conditions
Constants represent “Attribute=const” selection conditionsA relation can appear in multiple predicates (subgoals)
head
body
subgoals
q(X) :- p
1
(X
1
), p
2
(X
2
), …, p
n
(X
n
)Slide17
17
Conjunctive queries
Head and subgoals are
atoms
.
An atom consists of a predicate applied to zero or more argumentsPredicates represent relations.An atom is true for given values of its variables iff the arguments form a tuple of the relation.Whenever an assignment of values to all variables makes all subgoals true, the rule asserts that the resulting head is also true.Slide18
18
Conjunctive Queries: example
Schema
student(name, courseNum), course(number, Instructor)
SQL
SELECT name FROM student, course WHERE student.courseNum=course.number AND instructor=‘Li’;Equal to: ans(SN) :- student(SN, CN), course(CN,’Li’).Predicates student and course
correspond to relations namesTwo subgoals: student(SN, CN) and course(CN,’Li’)
Variables: SN, CN. Constant: ‘Li’Shared variable, CN, corresponds to “student.courseNum=course.number”
Variable SN in the head: the answer to the querySlide19
19
Why not SQL
Datalog
is more concise
Let us state some general principles
e.g., containment of rules that are almost impossible to state correctly in SQL.Will see that laterRecursion is much easier to express in Datalog.Slide20
20
Answer to a CQ
For a CQ
Q
on database
D, the answer Q(D) is a set of heads of Q if we:Substitute constants for variables in the body of Q in all possible waysRequire all subgoals to be trueExample: ans(SN) :-
student(SN, CN), course(CN,’Li
’).Tuples are also called facts:
student(Jack, 184), student(Tom,215), …, course(184,Li), course(215,Li), …
Answer “Jack”: SNJack,CN184Answer “Tom”: SN
Tom,CN215Answer “Jack”: SN
Jack,CN215 (duplicate eliminated)
Student
CourseSlide21
21
Query containment
For two queries
Q
1
and Q2, we say Q1 is contained in Q2, denoted Q1Q2, if any database D, we have Q1(D) Q
2(D).We say Q1 and
Q2 are equivalent, denoted Q1
Q2, if Q1
(D) Q2
(D) and Q2(D
)
Q1(D).Example: Q
1: ans(SN
) :- student(SN, CN),
course(CN, ’Li’).Q2: ans(SN
) :- student(SN
, CN), course(CN, INS). We have: Q
1(D)
Q2(D).Slide22
22
Another example
Q
1
:
p(X,Y) :- r(X,W), b(W,Z), r(Z,Y).Q2: p(X,Y) :- r(X,W),
b(W,W), r(W,Y).
We have: Q2
Q1Proof:
For any DB D, suppose p(x,y
) is in Q2
(D
). Then there is a w such that
r(x,w), b(w,w
), and
r(w,y) are in D.For Q
1, consider the substitution:
X x
, W w, Z w
, Y y. Thus the head of
Q1 becomes p(x,y)
, meaning that p(x,y
) is also in Q1(D
).In general, how to test containment of
CQ’s?Containment mappingsSlide23
23
Test containment
Two approaches:
Containment mappings.
Canonical databases.
Really the same in the simple CQ case covered so far.Containment test is NP-complete, but CQ’s tend to be small so here is one case where intractability doesn’t hurt you.Slide24
24
Containment mappings
A containment mapping from Q2 to Q1: Map variables of Q2 to variables of Q1, such that:
Head of Q2 becomes head of Q1;
Each
subgoal of Q2 becomes some subgoal of Q1.It is not necessary that every subgoal of Q1 is the target of some subgoal of Q2.
Q1
Q2 iff
there is a containment mapping from Q2 to Q1. Note that the containment mapping is opposite the containment --- it goes from the larger (containing CQ) to the smaller (contained CQ).
Example:
Q1:
p(X,Y) :-
r(X,W), b(W,Z), r(Z,Y).
Q2:
p(X,Y) :-
r(X,W), b(W,W), r(W,Y).
Containment mapping from Q1 to Q2: X
X, Y Y, W
W, Z
WNo containment mapping from Q2 to Q1: For b(W,W
) in Q2, its only possible target in Q1 is b(W,Z)However, we cannot have a mapping WW and WZ, since each variable cannot be mapped to two different variablesSlide25
25
A slightly different example
Q1:
p(X,Y
):-
r(X,Z), g(Z,Z), r(Z,Y).
Q2: p(A,B):- r(A,C
), g(C,D),
r(D,B).
Containment mapping m
:m
(A)=X;
m(B)=Y;
m(C)=
m
(D)=Z.Slide26
26
Q1:
p(X,Y):- r(X,Y), g(Y,Z).
Q2:
p(A,B):- r(A,B), r(A,C).
Q1 looks for:
Q2 looks for:
Another Example
X
Z
Y
A
B
CSlide27
27
Q1:
p(X,Y):- r(X,Y), g(Y,Z).
Q2:
p(A,B):- r(A,B), r(A,C).
Containment mapping:m(A)=X;m(B)=
m(C)=Y.Example - Continued
Notice two
subgoals can
map to one.
And not
every subgoal
need be a
target.Slide28
28
Example - Concluded
Q1:
p(X,Y
):-
r(X,Y), g(Y,Z).Q2: p(A,B):- r(A,B),
r(A,C).
No containment mapping from Q1 to Q2.g(Y,Z
) cannot map anywhere, since there is no g
subgoal in Q2.
Thus, Q1 properly contained in Q2.Slide29
29
Extending CQ’s
CQ’s
with built-in predicates:
We can add more conditions to variables in a CQ.
Example: student(name, GPA, courseNum), course(number,instructor,year) Q1(SN) :- student(SN, G, CN), course(CN, ’Li’),
G>=3.5. Q2(SN) :- student(SN, G, CN),
course(CN, ’Li’), G>=3.5, Y < 2002. Q2(SN)
Q1(SN).
Datalog queries: a (possibly infinite) set of CQ’s with (possibly) recursion
Example: parent(Parent, Child)
Query: finding all ancestors of Tom
ancestor(P,C) :- parent(P, C).
ancestor(P,C) :- ancestor(P,X), parent(X
, C).
result(P) :- ancestor(P, ‘tom’).Slide30
30
Although CQ theory first appeared at a database conference, the AI community has taken CQ’s to heart.
CQ’s, or similar logics like description logic, are used in a number of AI applications.
Again, their design theory is really containment and equivalence.Slide31
31
Outline
Basics: theories of conjunctive queries
Global-as-view (GAV) approach to data integration
Local-as-view (LAV) approach to data integrationSlide32
32
GAV approach to data integration
Readings:
Jeffrey Ullman, Information Integration Using Logical Views, ICDT 1997.
Ramana Yerneni, Chen Li, Hector Garcia-Molina, and Jeffrey Ullman, Computing Capabilities of Mediators, SIGMOD 1999. Slide33
33
Global-as-view Approach
Mediator
Mediator exports
views
defined on source relations
med(Dealer,City,Make,Year
) = R1 R2
A query is posted on mediator views:
SELECT * FROM med
WHERE Year = ‘2001’;
ans(D,C,M
, ‘2001’) :- med(D,C,M,‘2001’).
Mediator
expands
query to source queries:
SELECT * FROM R1, R2
WHERE Year = ‘2001’;
ans(D,C,M,’2001’) :- R1(D,C), R2(D,M, ‘2001’).
R1(Dealer,City)
R2(Dealer, Make, Year)
med(Dealer,City,Make,Year
) = R1 R2Slide34
34
Project: TSIMMIS at Stanford
Advantages:
User queries are easy to define
Query transformation generation is straightforward
Disadvantages:Not all source information is exported: Not easily scalable: every time a new source is added, mediator views need to be changed.Research issuesEfficient query execution?Deal with limited source capabilities?GAV ApproachSlide35
35
Limited source capabilities
Complete scans of relations
not
possible
Reasons: Legacy databases or structured files: limited interfaces Security/Privacy Performance concerns Example 1: legacy databases with restrictive interfaces
Ullman
DBMS
Knuth
TeX
…
…
author
title
Given an author,
return the books.Slide36
36
Another example: Web search forms
www.imdb.comSlide37
37
Problems
How to describe source restrictions?
How to compute mediator restrictions from sources?
How to answer queries efficiently given these restrictions?
How to compute as many answers as possible to a query?…Slide38
38
Computing mediator restrictions
Motivation: do not want users to be frustrated by submitting a query that cannot be answerable by the mediator
Example:
Source 1:
book(author?, title, price)Capability: “bff”i.e., we must provide an author, and can get title and price infoSource 2: review(title?, reviewer, rate)Capability: “bff”i.e., we must provide a book title, and can get other infoMediator view: MedView(A?,T,P,RV,RT) :- book(A,T,P),review(T,RV,RT).
Query on the mediator view:Ans(RT) :- MedView(A, ‘db’, P, RV, RT).
I.e., “find the review rates of DB books”But the mediator cannot answer this query, since we do not know the authors.
We want to tell the user beforehand what queries can be answeredSlide39
39
Outline
Basics: theories of conjunctive queries;
Global-as-view (GAV) approach to data integration;
Local-as-view (LAV) approach to data integration.Slide40
40
Local-as-view (LAV) approach
Mediator
There are
global predicates
, e.g., “car,” “person,” “book,” etc.
They can been seen as mediator views
The content of each source is described using these global predicates
A query to the mediator is also defined on the global predicates
The mediator finds a way to answer the query using the source contents
sourcesSlide41
41
Example
Mediator
Global predicates:
Loc(Dealer,City),Sell(Dealer,Make,Year)
Source content defined on global predicates:
S1(Dealer,City) :- Loc(Dealer, City).
S2(Dealer,Make,Year) :- Sell(Dealer, Make, Year).
In general, each definition could be more complicated, rather than direct copies.
Queries defined on global predicates.
Q: ans(D,M,Y) :- Loc(D, ’windsor’), Sell(D, M, Y).
Users do not know source views.
The mediator decides how to use source views to answer queries.
“Answering queries using views”:
ans(D, M, Y) :- S1(D,’windsor’), S2(D,M,Y).
S1(Dealer,City)
S2(Dealer,Make,Year)Slide42
42
Answering queries using views
Mediator
Source views can be complicated: SPJs, arithmetic comparisons,…
Not easy to decide how to answer a query using source views
Query: ans(D,M) :- Loc(D,‘windsor'), Sell(D,M,Y).
Rewriting:
ans(D,M) :- V3(D,‘windsor’, M,Y).
ans(D,M) :- V1(D,’windsor’), V2(D,M,Y).
…
“Equivalent rewriting”: compute the “same” answer as the query
A rewriting can join multiple source views
V1(Dealer,City):- Loc(Dealer, City).
V2(Dealer,Make,Year):-Sell(Dealer, Make, Year).
V3(D,C,M,Y) :- Loc(D,C),Sell(D,M,Y).
V4(D,C,M,Y) :- Loc(D,C),Sell(D,M,Y), Y<1970.
QuerySlide43
43
Arithmetic comparisons
Mediator
Comparisons can make the problem even trickier
Query: ans(D,M) :- Loc(D,‘windsor'), Sell(D,M,Y).
Rewriting: ans(D,M) :- V(D,‘windsor’, M,Y).
Contained rewriting:
only retrieve cars before 1970.
Query: ans(D,M):- Loc(D, ‘windsor'), Sell(D,M,Y), Y < 1960.
Rewriting: ans(D,M) :- V(D,‘windsor’, M, Y), Y < 1960.
V(D,C,M,Y):- Loc(D,C),Sell(D,M,Y),Y<1970.Slide44
44
Local-as-View (LAV)
Source
1
Source
2
Source
3
Source
4
Source
5
Local Schema
Local
Schema
Local
Schema
Local
Schema
Global Schema
Book
ISBN
Title
Genre
Year
Author
ISBN
Name
R1
ISBN
Title
Name
Local Schema
R5
ISBNTitle
Books before 1970
Humor Books
Create View R1 AS
SELECT B.ISBN, B.Title, A.Name
FROM Book B, Author A
WHERE A.ISBN = B.ISBN
AND B.Year < 1970
R1(ISBN, Title, Name):-
Book(ISBN, Title, Genre,Year),
Author(ISBN, Name), Year<1970.
Create View R5 AS
SELECT B.ISBN, B.Title
FROM Book B
WHERE B.Genre = ‘Humor’
R5(ISBN, Title)
:-Book(ISBN, Title, ‘humor’, Year).Slide45
45
LAV details
Query: Find authors of humor books
Q(Name):-Book(ISBN,Title,”humor”,YEAR), Author(ISBN, Name)
Views:
R1(ISBN, Title, Name):- Book(ISBN, Title, Genre,Year), Author(ISBN, Name), Year<1970.R5(ISBN, Title) :-Book(ISBN, Title, ‘humor’, Year).Rewriting of Q using views:Q’(Name):-R1(ISBN, Title, Name), R2(ISBN, Title)Expansion of Q’Q’’(Name):- Book(ISBN, Title, Genre,Year), Author(ISBN, Name), Year<1970, Book(ISBN, Title, ‘humor’, Year). Q’’’(Name):- Author(ISBN, Name), Year<1970, Book(ISBN, Title, ‘humor’, Year).Q’’’ is contained in QSlide46
46
Query Rewritings
Given a query Q and a set of views V:
A
conjunctive query
P is called a “rewriting” of Q using V if P only uses views in V, and P computes a partial answer of Q. That is: P
exp Q
. A rewriting is also called a “contained rewriting” (CR).
A
conjunctive query P is called an “
equivalent rewriting”
(ER) of Q using V if P only uses views in V, and P computes the
exact answer of Q. That is: Pexp
Q.Slide47
47
Bucket algorithm
It is the basic method for query rewriting
Each subgoal must be “covered” by some view
Make a list of candidates (buckets) per query subgoal
Consider combinations of candidates from different bucketsNot all combos are “compatible”Keep the compatible ones and minimize themDiscard the ones contained in anotherTake their unionSlide48
48
The Bucket Algorithm: Example
V1
(Std,Crs,Qtr,Title)
:- reg(Std,Crs,Qtr), course(Crs,Title),
Crs ≥ 500, Qtr ≥ Aut98V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title),
teaches(Prof,Crs,Qtr), Qtr ≤ Aut97q
(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T), C ≥ 300, Q ≥ Aut95
Step 1: For each query subgoal, put the
relevant sources into a bucketSlide49
49
The Bucket Algorithm: Example
V1
(Std,Crs,Qtr,Title)
:- reg(Std,Crs,Qtr), course(Crs,Title),
Crs ≥ 500, Qtr ≥ Aut98V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title), teaches(Prof,Crs,Qtr)
, Qtr ≤ Aut97q
(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T),
C ≥ 300, Q ≥ Aut95PProf, CCrs, QQtr
Note: Arithmetic predicates don’t pose a problem in this step
V2
Buckets
V4
teaches
reg
courseSlide50
50
The Bucket Algorithm: Example
V1
(Std,Crs,Qtr,Title)
:- reg(Std,Crs,Qtr), course(Crs,Title), Crs ≥ 500, Qtr ≥ Aut98V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr ≤ Aut94
V4(Prof,Crs,Title,Qtr) :-
reg(Std,Crs,Qtr), course(Crs,Title), teaches(Prof,Crs,Qtr)
, Qtr ≤ Aut97q
(S,C,P) :- teaches(P,C,Q),
reg(S,C,Q), course(C,T), C ≥ 300, Q ≥ Aut95
SStd, CCrs, QQtr
Note: V3 doesn’t work: arithmetic predicates not consistent V4 doesn’t work: S not in the output of V4
V2
Buckets
V4
teaches
reg
course
V1
V2Slide51
51
The Bucket Algorithm: Example
V1
(Std,Crs,Qtr,Title)
:-
reg(Std,Crs,Qtr), course(Crs,Title), Crs ≥ 500, Qtr ≥ Aut98V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr)V3(Std,Crs) :-
reg(Std,Crs,Qtr), Qtr ≤ Aut94V4(Prof,Crs,Title,Qtr)
:- reg(Std,Crs,Qtr), course(Crs,Title)
, teaches(Prof,Crs,Qtr)
, Qtr ≤ Aut97q(S,C,P)
:- teaches(P,C,Q), reg(S,C,Q)
, course(C,T)
, C ≥ 300, Q ≥ Aut95CCrs, TTitle
V2
Buckets
V4
teaches
reg
course
V1
V2
V1
V4Slide52
52
The Bucket Algorithm: Example
Step 2:
Try all combos of views, one each from a bucket
Test satisfaction of arithmetic predicates in each case
e.g., two views may not overlap, i.e., they may be inconsistentDesired rewriting = union of surviving onesQuery rewriting 1:
q1
(S,C,P) :- V2
(S’,P,C,Q), V1(S,C,Q,T’),
V1(S”,C,Q’,T)
no problem from arithmetic predicates (none in V2)May or may not be minimal (why?)
V2
V4
teaches
reg
course
V1
V2
V1
V4Slide53
53
The Bucket Algorithm: Example
Unfolding of rewriting 1:
q1’
(S,C,P)
:- r(S’,C,Q), t(P,C,Q), r(S,C,Q), c(C,T’), r(S”,C,Q’), c(C,T),
C ≥ 500, Q
≥ Aut98, C ≥
500, Q’ ≥ Aut98
Black r’s can be mapped to green r:
S’
S, S”S, Q’Q
Black c can be mapped to green c: just extend above mapping to
TT’
Minimized unfolding of rewriting 1:
q1m’(S,C,P) :- t(P,C,Q),
r(S,C,Q),
c(C,T’), C ≥
500, Q ≥ Aut98
Minimized rewriting 1:q1m(S,C,P) :-
V2(S’,P,C,Q), V1(S,C,Q,T’)Slide54
54
The Bucket Algorithm: Example
Query Rewriting 2:
q2
(S,C,P)
:- V2(S’,P,C,Q), V1(S,C,Q,T’), V4(P’,C,T,Q’)
q2’(S,C,P)
:- r(S’,C,Q), t(P,C,Q), r(S,C,Q),
r(S,C,Q), c(C,T’),
C ≥ 500, Q ≥ Aut98, r(S”,C,Q’),
c(C,T), t(P’,C,Q’), Q’ ≤ Aut97
This combo is infeasible: consider the conjunction of arithmetic predicates in
V1 and V4
Query rewriting 3:
q3
(S,C,P) :-
V2(S’,P,C,Q),
V2(S,P’,C,Q), V4
(P”,C,T,Q’)
V2
V4
teaches
reg
course
V1
V2
V1
V4
V2
V4
teaches
reg
course
V1
V2
V1
V4Slide55
55
The Bucket Algorithm: Example
Unfolding of rewriting 3:
q3’
(S,C,P)
:- r(S’,C,Q), t(P,C,Q), r(S,C,Q), t(P’,C,Q), r(S”,C,Q’),
c(C,T),
t(P”,C,Q’), Q’ ≤ Aut97
The green subgoals can cover the black ones under the mapping: S’S, S”S, P’P, P”P, Q’Q
Minimized rewriting 3:
q3m(
S,C,P)
:- V2(S,P,C,Q), V4
(P,C,T,Q)
Verify that there are only two rewritings that are not covered by others
Maximally Contained Rewriting:
q’ = q1m q3mSlide56
56
The Bucket Algorithm: Example 2
Query:
q
(X)
:- cites(X,Y), cites(Y,X), sameTopic(X,Y)Views:V4(A) :- cites(A,B), cites(B,A)V5(C,D) :- sameTopic(C,D)V6(F,H) :- cites(F,G), cites(G,H), sameTopic(F,G)
Note: Should we list V4(X) twice in the buckets?
V4
Buckets
V6
cites
cites
sameTopic
V4
V6
V5
V6Slide57
57
Bucket algorithm
Query:
q(x):-car(x), sell(x, d), loc(d, ’windsor’).
Views:
v1(x) :- car(x).v2(x) :- car(x), sell(x, d).v3(x,d) :- sell(x, d), loc(d, ’windsor’).v4(x) :- sell(x, d), loc(d, ’windsor’).
Car(x)
Sell(x,d)
Loc(d,’windsor’)
V1(x)
V2(x)
v2(x)
V3(x,d)
V3(x,d)
V4(x)
V4(x)
q(x):-v1(x), v2(x), v3(x,d).
q(x):-v1(x), v3(x,d).
q(x):-v1(x), v4(x).
q(x):-v2(x), v3(x,d).
q(x):-v2(x), v4(x).
… Slide58
58
Projects: Information Manifold, Infomaster, Tukwila, …
Advantages:
Scalable: new sources easy to add without modifying the mediator views
All we need to do is to define the new source using the existing mediator views (predicates)
Disadvantages:Hard to decide how to answer a query using viewsReading: Alon Halevy, Answering Queries Using Views: A Survey. Projects using the LAV approach