Review Relational Query Languages Announcements The due data for selecting papers is Tuesday Your top 5 choices 2 Relational model and query languages Relational model defines data organization ID: 812913
Download The PPT/PDF document "CS 519: Big Data Exploration and Analyt..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.
Slide1
CS 519: Big Data Exploration and Analytics
Review: Relational Query Languages
Slide2Announcements
The due data for selecting papers is Tuesday.
Your top 5 choices
2
Slide3Relational model and query languages
Relational model defines data organization
Relational query languages define data retrieval/manipulation operations.
3
Slide44
Relational model
Title Price Category Year
MySQL $102.1 computer 2001
Cell biology $201.69 biology 1954
French cinema $53.99 art 2002
NBA History $63.65 sport 2010
tuples
Attribute names
Relation name
Book:
Slide5Relational Model
Attributes
Atomic values
Domain: string, integer, real, Keys: no duplicate valuesEach relation must have keys A relation does not contain duplicate tuples.
5
Slide6Database Schema vs. Database Instance
Schema: S
:
Book(Title, Price, Category, Year)Instance:
Values of each attribute A in I:
active domain
of A,
adom
(A)
6
Title Price Category YearMySQL $102.1 computer 2001Cell biology $201.69 biology 1954French cinema $53.99 art 2002NBA History $63.65 sport 2010
Slide7SQL
A declarative language for querying data stored in relational databases.
Much easier to use than procedural languages.
Say what instead of howSELECT returned attribute(s)FROM table(s)WHERE conditions on the tuples of the table(s)
7
Slide8SQL Example
Movie(
id
, title, year, total-gross)Actor(id, name, b-year)Plays(mid, aid
)
What movies are made in 1998?
SELECT title
FROM movie
WHERE year = 1998;
8
Slide9SQL Example
Find actors who played in a movie whose total gross is more than $2,000,000.
SELECT * FROM Actor, Movie, Plays WHERE
Movie.id
=
Plays.mid
AND
Plays.aid = Actor.id
AND total-gross > 2000000; 9Movie(mid, title, year, total-gross)Actor(aid, name, b-year)Plays(mid, aid)
Slide10Formal Relational Query Languages
Formal languages that express queries over relational schemas.
Relational Algebra
Datalog (recursion-free with negation) Relational calculusUsed to explore the properties of relational model.Easier to use than SQL in some application domains.
10
Slide11Relational Algebra (RA)
Used by RDBMS to execute queries
Six operators
Selection σ Projection ΠJoin ∞Union
Difference –
Renaming
ρ
(for named perspective)
11
Slide12Relational Algebra
Selection
σ
σtitle=‘Fargo’ (Movie)Projection Π
Π
b
-year
(Actor)
Join ∞Movie ∞
id=mid Plays
12Movie(mid, title, year, total-gross)Actor(aid, name, b-year)Plays(mid, aid)
Slide13Datalog
First created to support recursive queries over relational databases.
Easier to use than relational algebra.
Used extensively in research and industry Data integration, networking, logic programming, learning, distributed processing, …We talk about the recursion-free datalog.
13
Slide14Datalog
Each tuple in database is a
fact
Movie(236878, ‘Godfather I’, 1972, 40000000) Movie(879900, ‘Godfather II’, 1974, 3900000) Actor(090988,’Robert De Niro’, 1943)
Each query is a
rule
Movies that were produced in 1998 and made more than $2,000.
Q1(y):- Movie(x,y,1998,z),z > 2000.
14
Movie(mid, title, year, total-gross)Actor(aid, name, b-year)Plays(mid, aid)
Slide15Datalog Example
Actors who played in a movie whose total gross is more than $2,000.
Q2(y):- Actor(
x,y,z),Plays(t,x
),Movie(
t,v,w,f
),
f > 2000.
Actors who played in a movie whose total gross is more than $2,000 and a movie made in 1998.
Q3(y):- Actor(x,y,z
),Plays(t,x),Movie(t,v,w,f), f > 2000, Plays(g,x), Movie(g,l,1998,h).15Movie(mid, title, year, total-gross)Actor(aid, name, b-year)Plays(mid,
aid)
Slide16Datalog
Q2(y):- Actor(
x,y,z
),Plays(t,x),Movie(t,v,w,f
),
f > 2000.
y: head variable; x, z, t : existential variables
Extensional Database Predicates (EDB)
Movie, Actor, Plays
Intentional Database Predicate (IDB) Q2
16headatom
atom
body
Movie(
mid
, title, year, total-gross)Actor(aid, name, b-year)Plays(mid, aid)
Slide17Datalog programs
A collection of rules: union
Actors who played in a movie with gross of more than $2,000 or a movie made after 1990.
Q4(y):- Actor(x,y,z
),Plays(
t,x
),Movie(
t,v,w,f
),
f > 2000.Q4(y):- Actor(x,y,z
),Plays(t,x),Movie(t,v,w,f), w > 1990.17Movie(mid, title, year, total-gross)Actor(aid, name, b-year)
Plays(mid, aid)
Slide18Views
Similar to views in SQL
Actors who played in a movie with gross of more than $2000 and in a movie with
‘Robert De Niro’.
V(
x,y,z
):- Actor(
x,y,z
), Plays(
t,x), Movie(t,v,w,f
), f > 20000.Q5(y):- V(x,y,z), Plays(t,x), Plays(t,f), Actor(f, ’Robert De Niro’, g, h).UnfoldingQ5(y):- Actor(x,y,z), Plays(t,x), Movie(t,v,w,f), f > 20000, Plays(u,x
), Plays(u,f),
Actor(f,’Robert De
Niro’,g, h).
18
Definition of VMovie(
mid, title, year, total-gross)Actor(aid, name, b-year)Plays(mid, aid)
Slide19Datalog with negation
All actors who did not play in a movie with ‘Robert De
Niro
’.U(
x,y,z
):- Actor(
x,y,z
), Plays(
t,x), Plays(
t,f),
Actor(f, ’Robert De Niro’, g).Q6(y):- Actor(x,y,z), not U(x,y,z).19Movie(mid, title, year, total-gross)Actor(aid
, name, b-year)Plays(mid, aid
)
Slide20Safe datalog rules
Unsafe
rules:
V(x,y,z):- Actor(x,y,1998), z > 200.
W(
x,y,z
):- Actor(
x,y,z
),
not
Plays(t,x).A datalog rule is safe if every variable appear in at least one positive predicate20Movie(mid, title, year, total-gross)Actor(aid, name, b-year)Plays(mid, aid
)
Slide21Datalog to SQL
Non-recursive datalog with negation represents the core functionalities of SQL
We can translate each non-recursive datalog program to a core SQL query and vice versa.
21
Slide22Equivalency Theorem
RA and non-recursive datalog with negation the same set of queries.
Relational queries.
22
Slide23Conjunctive queries (CQ)
One datalog rule.
SELECT-DISTINCT-FROM-WHERE.
Select/project/join (σ, Π, ∞) fragment of RA.
Existential/ conjunctive fragment of RC
There is not any comparison operator (<, ≠, …) in CQ.
If used the family is called CQ
<
, CQ
≠, …
23
Slide24CQ examples
Actors who played in “LTR”.
Q7(y):- Actor(x, y, z), Plays(t, x),
Movie(t, ’LTR’, w, f).Non-CQ
: Actors who played in some movies with only one actor.
24
Movie(
mid
, title, year, total-gross)
Actor(
aid, name, b-year)Plays(mid, aid)
Slide25Query equivalency and containment
Interesting and long standing problems in query processing.
Queries q
1 and q2 are
equivalent
if and only if for every database instance I, q
1
(I) = q
2(I)
Shown as q1 q
2Query q1 is contained in q2 if and only if for every database instance I, q1(I) q2(I)Shown as q1 q2 25
Slide26Containment examples
Is q
1
q2?q1(x):- R(
x,y
),R(
y,z
),R(
z,w).
q2(x):- R(x,y),R(
y,z).q1(x):-R(x,y),R(y,’Joe’).q2(x):-R(x,y),R(y,z).q1(x):- R(x,y),R(y,z),R(z,x).q2(x):- R(x,y
),R(y,x).
26
Slide27Containment examples
Is q
1
q2?q1(x):- R(
x,y
),R(
y,y
).
q2(x):- R(x,y
),R(y,z),R(
z,t).27
Slide28Query semantics
Rules based form of a CQ
q(u):- R1(u1),…,
R
n
(u
n
).
ui
is shorthand for (x, y,…, z).Valuation v is a total function from a set of variables to domain (dom) and identity on constants in the domain.28
Slide29Query semantics
The set of variables in
q
is shown as var(q) e.g. var
(q
1
)
= {
x,y}
q1(x):- R(x,y),R(
y,y).The image of database instance I under query q, q(u):- R1(u1),…,Rn(un) is q(I) = {v(u)| v is a valuation over var
(q), for each }
29
Slide30Query homomorphism
A
homomorphism
is a function from var(
q
2
) to
var
(q
1)
s.t. for each atom R(x, y, …) in the query q1 there is an atom R(h(x), h(y), …) in q2 .h leaves the constants in q2 intact.
Exampleq1(x):- R(
x,y),R(y,z
),R(z,w).
q2(x):- R(x,y),R(
y,z).We treat head variables, ‘x’, as constants, i.e., the same in q1 and q2 .30
Slide31Homomorphism Theorem
Given CQs q
1
and q2, we have q1 q
2
if and only if there exists a homomorphism .
Example:
q1(x):-R(
x,y),R(
y,z),R(z,w). q2(x):-R(x,y),R(y,z).Since is a homomorphism, we have .31
Slide32Homomorphism examples
q1(x):-R(
x,y
),R(y,’Joe’).q2(x):-R(
x,y
),R(
y,z
).
q1(x):- R(
x,y
),R(y,z),R(z,x).q2(x):- R(x,y),R(y,x).There is no homomorphism: 32
Slide33Homomorphism examples
Is q
1
q2?q1(x):- R(
x,y
),R(
y,y
).
q2(x):- R(x,y
),R(y,z),R(
z,t).33
Slide34Homomorphism Theorem
Given CQs
q
and q’, we have q
q
’
if and only if there exists a homomorphism .
Proof:
For each w in q(I), there is a valuation v that maps free tuples in q to I such that v(u)= w. Thus, h(v) will map free tuples in q’ into I and h(v(u’)) = w, where w is in q’(I). Using canonical instances: read the book page 117.34
Slide35Checking containment
Check if there exists a homomorphism between queries.
The problem is NP-complete, proved by reducing from 3-SAT.
Since the size of queries are relatively small, the process is sufficiently fast.
35
Slide36Query minimization
A conjunctive query
q
is minimal if for every other conjunctive query q’ , if q’ q,
q
’
has at least as many atoms as
q.
Examples:
q1(x):- R(x,z),R(
z,t),R(x,w). q2(x):- R(x,z),R(z,t),R(x,’Joe’). 36
Slide37Query minimization algorithm
1. Remove an atom from
q
. Let’s call new query q’.2. We have q q’.
3. Check to see if
q
’
q,
if it is then remove the atom permanently. Example:
q1(x):- R(x,z
),R(z,t),R(x,w). q2(x):- R(x,z),R(z,t). We have a homomorphism from q1 to q2. 37
Slide38Larger families: UCQ
CQ with union
Movies that were produced in 1998 or made more than
$2,000. Q1(y):- Movie(x,y,1998,z). Q1(y):- Movie(
x,y,z,t
), t > 2000.
We can extend homomorphism theorem for UCQs.
38
Movie(
mid
, title, year, total-gross)Actor(aid, name, b-year)Plays(mid, aid)
Slide39Homomorphism Theorem for UCQ
Given UCQs and , we have
if and only if for every
there is a , such that Thus, we can use apply homomorphism theorem to each CQ in a UCQ to check the containment.
Containment checking for UCQs is NP-complete.
39
Slide40Larger families: relational queries
Containment checking for relational queries is undecidable.
Proved by reduction from finite satisfiability problem:
Given a query, is there any (finite) database where the query as at least one answer.
40