Slide1
Trees, semistructured data, and other strange ways to go beyond tables
Serge Abiteboul, INRIA & ENS Cachan
PODS 30th Anniversary, 2011
Luc
Véro
Another one of these No-SQL talks?
IMS, hierarchical model, V-relations, Jacobs’s calculus, Hardgrave’s broom, nested relations, format model, complex objects, logical data model, object databases, lambda calculus, regular trees, F-logic, N1NF, NF2, COL, IFO, LDL, IQL, SGML, HTML, ASN.1, XML, YAML, JSON…
Slide2
Trees are useless
A tree is a tree. How many more do you have to look at?
Ronald Reagan, governor of California, opposing the expansion of Redwood National Park (1966)
We don’t need anything beyond relations. These things are useless. Reject!
Anonymous referee (circa 1990)
Knowledge lives in trees
But of the tree of the knowledge of good and evil, thou shalt not eat of it: for in the day that thou eatest thereof thou shalt surely die.
Genesis, 2.17
Introduction
Theorem: Information lives in trees and not in relations
Proof: the Bible does not say « But of the two dimensional table of knowledge of good and evil … »
Slide3
Organization
Introduction
Hierarchical data model (60s)
Nested relations (80s)
Complex objects (early 90s)
Semistructured data & unranked labeled trees (late 90s)
Unranked labeled ordered trees, aka XML (early 00s)
Evolving trees, aka Active XML (mid 00s)
Cycles (90s to now)
Conclusion
More or less chronological
Slide4
For lack of time, we will ignore IMS and the hierarchical model
The language was purely navigational anyway
We will also ignore early works such as Makinouchi, Jacobs or Hardgrave
We will start with N1NF
François Bancilhon in France
Hans Schek in Germany
PhD thesis of Nicole Bidoit
Slide5
Non-First-Normal-Form (N1NF)
Data live in 1NF relations
Name   Child  Car
Alice  Toto   Jaguar
Alice  Lulu   2CV
Bob    Mimi   Mustang
Bob    Zaza   Prius
DB101
A quarter on tables. Now what? Trees!
Data would prefer to live in infamous nested relations, aka V-relations, aka N1NF relations, aka NF2 relations
Name   Child         Car
Alice  {Toto, Lulu}  {Jaguar, 2CV}
Bob    {Mimi, Zaza}  {Mustang, Prius}
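As a rough illustration of the two tables on this slide, the following Python sketch nests the flat 1NF relation by Name. The function name `nest_by_name` and the choice to collect Child and Car into two independent sets are assumptions made for this example; the slide itself only shows the before and after pictures.

```python
from collections import defaultdict

# Flat 1NF relation from the slide: one (Name, Child, Car) tuple per row.
FLAT = [
    ("Alice", "Toto", "Jaguar"),
    ("Alice", "Lulu", "2CV"),
    ("Bob", "Mimi", "Mustang"),
    ("Bob", "Zaza", "Prius"),
]

def nest_by_name(rows):
    """Group rows by Name, collecting children and cars into sets (hypothetical helper)."""
    nested = defaultdict(lambda: {"Child": set(), "Car": set()})
    for name, child, car in rows:
        nested[name]["Child"].add(child)
        nested[name]["Car"].add(car)
    return dict(nested)

NESTED = nest_by_name(FLAT)
```

Nesting Child and Car independently forgets which child went with which car, so this particular nesting cannot be undone by a plain unnest.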
Slide6
The devil is in the details
V-relations vs. N1NF-relations
[Figure: example relations over attributes A, B and C with their nested forms; the cell values are not recoverable]
A is not a key: the size is now possibly exponential in the size of the domain
A is a key: no new power
Slide7
Complex object model
tuple and set constructors used freely
[Figure: the Families object drawn as a tree. Families is a set (*) of tuples; each tuple has a Name (Peter), a set of Cars (tuples with Name and Year: BMW 2010; 2CV 1976) and a set of Children (tuples with Name and Sex: Toto M; Mimi F; Zaza F).]
Slide8
A logic and algebra for complex objects
Logic: main novelty is set variables (not first-order)
Example: Abou Banat query
{ T.Father | Families(T) ∧ ∀X ∈ T.Children ( X.Sex = F ) }
Algebra: powerset operation, unnest/nest
[Figure: nest/unnest example on a relation with attributes Name, Child and Car; the flat form contains the tuples (Bob, Mimi, Mustang), (Bob, Zaza, Mustang) and (Bob, Lulu, Prius)]
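Reading the Abou Banat query as "the fathers all of whose children are girls", a minimal Python sketch of its evaluation could look as follows. The `FAMILIES` instance, the names in it and the function `abou_banat` are invented for illustration; only the attribute names Father, Children and Sex come from the slide.

```python
# Hypothetical instance: each family tuple has a Father and a set of Children.
FAMILIES = [
    {"Father": "Peter", "Children": [{"Name": "Mimi", "Sex": "F"},
                                     {"Name": "Zaza", "Sex": "F"}]},
    {"Father": "Paul", "Children": [{"Name": "Toto", "Sex": "M"},
                                    {"Name": "Lulu", "Sex": "F"}]},
    {"Father": "Jack", "Children": []},
]

def abou_banat(families):
    """Fathers for whom every child X in T.Children satisfies X.Sex = F."""
    return {t["Father"] for t in families
            if all(x["Sex"] == "F" for x in t["Children"])}

RESULT = abou_banat(FAMILIES)
```

With a universal quantifier, a father with no children qualifies vacuously, which is why Jack appears in the result of this sketch.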
Slide9
Results
Equivalence theorem: algebra and logic have the same expressive power
Remark: one can compute TC (transitive closure) using algebra/logic (waoh! Cool!)
Also studied: fixpoint, datalog, while…
Complexity: each new level of nesting introduces one more exponential
Need to control the use of powerset
$2^n,\ 2^{2^n},\ \ldots$
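One textbook way to see both the remark about TC and the warning about powerset is the brute-force construction below: enumerate every binary relation over the domain, keep those that contain the input and are transitive, and intersect them. This is an illustrative sketch, not necessarily the construction used in the talk; the function names are assumptions.

```python
from itertools import chain, combinations, product

def powerset(items):
    """All subsets of items, as tuples."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))

def is_transitive(rel):
    """True if (a, b) and (b, d) in rel always imply (a, d) in rel."""
    return all((a, d) in rel
               for (a, b) in rel for (c, d) in rel if b == c)

def tc_via_powerset(edges, domain):
    """Intersect every transitive relation over domain that contains edges."""
    all_pairs = product(domain, repeat=2)
    supersets = [set(s) for s in powerset(all_pairs)]
    closed = [s for s in supersets if edges <= s and is_transitive(s)]
    return set.intersection(*closed)

EDGES = {(1, 2), (2, 3)}
TC = tc_via_powerset(EDGES, [1, 2, 3])
```

Even for a 3-element domain the sketch walks through 2^9 = 512 candidate relations, which hints at why the use of powerset needs to be controlled.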
Slide10
From complex objects to semistructured data
[Figure: the Families complex object as a tree. Families is a set (*) of tuples; each tuple has a Name (Peter), a set of Cars (tuples with Name and Year: BMW 2010; 2CV 1976) and a set of Children (tuples with Name and Sex: Toto M; Mimi F; Zaza F).]
Slide11
Revolution 1: more flexibility
[Figure: the Families tree (Name, Cars with Name and Year, Children with Name and Sex) extended with two extra nodes labeled Annotations and Trash]
Slide12
Revolution 2: Remove some nodes; name all
[Figure: the Families tree with some nodes removed (the child Mimi is gone) and the element nodes now named Family, Car and Child; the Ann. and Trash nodes are still present]
Slide13
Unranked labeled trees
[Figure: the Families tree without set (*) nodes. Families has Family children, each with a Name (Peter), Cars (Car nodes with Name and Year: BMW 2010; 2CV 1976) and Children (Child nodes with Name and Sex: Toto M; Zaza F), plus the Ann. and Trash nodes.]
Slide14
This is better adapted to a Web context
Self describing data: No separation between schema and data
Flexibility
Not such a big deal
Maybe the main contribution is the format?
<Families><Family><Name>Peter</Name><Cars><Car><Name>BMW</Name><Year>2010</Year></Car></Cars><Children><Child> …
Plus ça change, plus c’est la même chose
The more things change, the more they stay the same
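To make "self describing" concrete, here is a small Python sketch that recovers the label vocabulary from a document without any schema. The document `DOC` is a hypothetical completion of the truncated fragment on the slide (the Children part is invented), and tag names use one consistent capitalization because XML names are case-sensitive.

```python
import xml.etree.ElementTree as ET

# Hypothetical, completed version of the fragment; the slide's own is truncated.
DOC = """
<Families>
  <Family>
    <Name>Peter</Name>
    <Cars><Car><Name>BMW</Name><Year>2010</Year></Car></Cars>
    <Children><Child><Name>Toto</Name><Sex>M</Sex></Child></Children>
  </Family>
</Families>
"""

root = ET.fromstring(DOC)

def labels(node):
    """Collect the distinct element names found in the document itself."""
    found = {node.tag}
    for child in node:
        found |= labels(child)
    return found

LABELS = labels(root)
CAR_NAMES = [car.findtext("Name") for car in root.iter("Car")]
```

The "schema" here is simply whatever labels occur in the data, which is the point the slide makes about there being no separation between the two.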
Slide15
What else? The trees are unbounded
Like nested relations, trees are unbounded in width
Unlike nested relations, they are unbounded in depth
One can simulate 2-counter machines with 2 branches
Do applications simulate 2-counter machines with XML documents?
I am still looking for one
XML documents are rarely deep
But even for bounded trees there are fun questions: e.g., is the equivalence of monadic datalog decidable for bounded data trees?
[Figure: a tree rooted at r whose nodes are labeled a, b and $]
Slide16
What else? The trees are ordered
Unranked labeled ordered trees = XML
Ignore order: classical optimization
Respect order: totally new ball game, bring in tree automata
Reconcile
Order is often painful for optimization
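The difference between respecting and ignoring order can be sketched in a few lines of Python. Trees are modeled here as `(label, children)` pairs, and `canonical` is a hypothetical helper that sorts siblings recursively; it is one simple way to compare trees up to sibling order, not a technique taken from the talk.

```python
# A tree is a (label, children) pair; children is a list, so order is kept.
def canonical(tree):
    """Sort children recursively so that sibling order no longer matters."""
    label, children = tree
    return (label, tuple(sorted(canonical(c) for c in children)))

T1 = ("Car", [("Name", [("BMW", [])]), ("Year", [("2010", [])])])
T2 = ("Car", [("Year", [("2010", [])]), ("Name", [("BMW", [])])])

SAME_ORDERED = (T1 == T2)                          # order respected
SAME_UNORDERED = (canonical(T1) == canonical(T2))  # order ignored
```

The two trees differ as ordered trees but coincide once sibling order is ignored; that freedom to reorder is what classical optimization relies on.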
Slide17
Selling argument is the Web…
The move from relations to trees is interesting
But the move from centralized to distributed is interesting as well, and much less investigated
Where the fun is:
Scale is beyond what we thought was thinkable
Machines are totally autonomous
Schema replaced by numerous ontologies
True/false logic replaced by inconsistency, probabilities, trust, belief…
Slide18
And the trees are evolving (aka Active XML)
An old idea from object databases: mix data and computation
[Figure: a Resorts tree containing a Resort node with Name (Aspen), State (Colorado), snowcond (Unit: Meter, Depth: 1) and hotels, plus two embedded service calls marked with “!”: Unisys.com/snow(“Aspen”) and Yahoo.com/GetHotels(<city name=“Aspen”/>)]
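The idea of mixing data and computation can be mimicked with a toy Python structure in which some nodes are calls rather than values. Everything below is a sketch under assumptions: the two services are local stubs with made-up return values (no network access), and `Call` and `materialize` are hypothetical names, not part of any Active XML system.

```python
# Stub services standing in for the calls on the slide; no network access.
# Their return values are made up for this example.
def snow(city):
    return {"Unit": "Meter", "Depth": "1"}

def get_hotels(city):
    return {"Hotel": "(hypothetical hotel in %s)" % city}

SERVICES = {"Unisys.com/snow": snow, "Yahoo.com/GetHotels": get_hotels}

class Call:
    """An embedded service call: a node that is computation, not data."""
    def __init__(self, service, arg):
        self.service, self.arg = service, arg

def materialize(node):
    """Replace every Call in a nested dict by the result of invoking it."""
    if isinstance(node, Call):
        return materialize(SERVICES[node.service](node.arg))
    if isinstance(node, dict):
        return {k: materialize(v) for k, v in node.items()}
    return node

RESORT = {
    "Name": "Aspen",
    "State": "Colorado",
    "snowcond": Call("Unisys.com/snow", "Aspen"),
    "hotels": Call("Yahoo.com/GetHotels", "Aspen"),
}
EVALUATED = materialize(RESORT)
```

The tree "evolves" in the sense that `RESORT` still holds the calls while `EVALUATED` holds their results; calling `materialize` again later could yield different data if the services changed.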
Slide19
And there are cycles
For lack of time, I will not mention the network model [Codasyl 1969]
The language was purely navigational anyway
If I added references to XML, I’d get cycles
Lots of models for graph data, e.g., IQL
Some fun results: e.g., some copy elimination problem when trying to obtain a Chandra-Harel completeness for IQL
Similar issue for unordered trees [recent result with Vianu]
[Figure: two Person nodes, one with Name Adam and one with Name Eve, each with a Spouse reference]
Paris C. Kanellakis
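A minimal Python sketch of how references turn a tree into a graph with cycles, using the Adam and Eve example. The identifiers `p1` and `p2`, the assumption that each Spouse reference points to the other person, and the helper `has_cycle` are all introduced for illustration.

```python
# Persons keyed by identifier; Spouse holds an identifier, not a subtree.
PERSONS = {
    "p1": {"Name": "Adam", "Spouse": "p2"},
    "p2": {"Name": "Eve", "Spouse": "p1"},
}

def has_cycle(start, persons):
    """Follow Spouse references from start; True if a node is revisited."""
    seen = set()
    current = start
    while current is not None:
        if current in seen:
            return True
        seen.add(current)
        current = persons[current].get("Spouse")
    return False

CYCLIC = has_cycle("p1", PERSONS)
```

Without the reference (a Person with no Spouse entry), the same traversal terminates and the data remains a plain tree.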
Slide20
Conclusion
Is this a good time to do research on trees in databases?
The best time to plant a tree was 20 years ago. The next best time is now.
Chinese Proverb
Slide21
Advertisement
Book on Web data management to appear at Cambridge University Press
http://webdam.inria.fr/Jorge