Trees, semistructured data,

Uploaded On 2020-06-30

and other strange ways to go beyond tables. Serge Abiteboul, INRIA & ENS Cachan. PODS 30th Anniversary, 2011.

Presentation Transcript

Slide1

Trees, semistructured data, and other strange ways to go beyond tables

Serge Abiteboul, INRIA & ENS Cachan
PODS 30th Anniversary, 2011

Luc

Véro

Another one of these No-SQL talks?

IMS, hierarchical model, V-relations, Jacobs’s calculus, Hardgrave’s broom, nested relations, format model, complex objects, logical data model, object databases, lambda calculus, regular trees, F-logic, N1NF, NF2, COL, IFO, LDL, IQL, SGML, HTML, ASN.1, XML, YAML, JSON…

Slide2

Trees are useless

“A tree is a tree. How many more do you have to look at?”
Ronald Reagan, governor of California, opposing the expansion of Redwood National Park (1966)

“We don’t need anything beyond relations. These things are useless. Reject!”
Anonymous referee (circa 1990)

Knowledge lives in trees

“But of the tree of the knowledge of good and evil, thou shalt not eat of it: for in the day that thou eatest thereof thou shalt surely die.”
Genesis 2:17

Introduction

Theorem: Information lives in trees and not in relations.

Proof: the Bible does not say « But of the two dimensional table of knowledge of good and evil … »

Slide3

Organization

More or less chronological:

Introduction
Hierarchical data model (60s)
Nested relations (80s)
Complex objects (early 90s)
Semistructured data & unranked labeled trees (late 90s)
Unranked labeled ordered trees, aka XML (early 00s)
Evolving trees, aka Active XML (mid 00s)
Cycles (90s to now)
Conclusion

Slide4

For lack of time, we will ignore IMS and the hierarchical model.

The language was purely navigational anyway.

We will also ignore early works such as Makinouchi, Jacobs or Hardgrave.

We will start with N1NF:
François Bancilhon in France
Hans Schek in Germany
PhD thesis of Nicole Bidoit

Slide5

Non-First-Normal-Form N1NF

Data live in 1NF relations (DB101):

Name   Child  Car
Alice  Toto   Jaguar
Alice  Lulu   2CV
Bob    Mimi   Mustang
Bob    Zaza   Prius

A quarter on tables. Now what? Trees!

Data would prefer to live in infamous nested relations, aka V-relations, aka N1NF relations, aka NF2 relations:

Name   Child         Car
Alice  {Toto, Lulu}  {Jaguar, 2CV}
Bob    {Mimi, Zaza}  {Mustang, Prius}
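As a rough illustration of the step from the flat table to the nested one, here is a minimal Python sketch. The function name `nest` and the choice to group on Name while collecting Child and Car into independent sets are assumptions made for this example; they mirror the two tables on the slide rather than any particular system's operator.

```python
from collections import defaultdict

# Flat 1NF relation from the slide: (Name, Child, Car) rows.
FLAT = [
    ("Alice", "Toto", "Jaguar"),
    ("Alice", "Lulu", "2CV"),
    ("Bob", "Mimi", "Mustang"),
    ("Bob", "Zaza", "Prius"),
]

def nest(rows):
    """Group rows by Name; Child and Car values become set-valued attributes.

    Hypothetical helper for illustration only.
    """
    children = defaultdict(set)
    cars = defaultdict(set)
    for name, child, car in rows:
        children[name].add(child)
        cars[name].add(car)
    return {name: (frozenset(children[name]), frozenset(cars[name]))
            for name in children}

nested = nest(FLAT)
for name, (kids, cars) in sorted(nested.items()):
    print(name, sorted(kids), sorted(cars))
```

Note that in this nested form the two sets are independent: it no longer records which child went with which car in the flat rows.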

Slide6

The devil is in the details

V-relations: A is a key. No new power.

N1NF-relations: A is not a key. The size is now possibly exponential in the size of the domain.

[Figure: example V-relation and N1NF-relation instances over attributes A, B and C; the cell values are not recoverable from the extraction.]

Slide7

Complex object model

Tuple and set constructors used freely

Families
  *
    Name: Peter
    Cars
      *
        Name: BMW, Year: 2010
    Children
      *
        Name: Toto, Sex: M

    Name: Peter
    Cars
      *
        Name: 2CV, Year: 1976
    Children
      *
        Name: Mimi, Sex: F
        Name: Zaza, Sex: F

Slide8

A logic and algebra for complex objects

Logic: the main novelty is set variables (non first-order)

Example: AbouBanat query

{ T.Father | Families(T) ∧ ∀X ∈ T.Children ( X.Sex = F ) }

Algebra: powerset operation, unnest/nest

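[Figure: three relations over Name, Child, Car illustrating unnest and nest. The first, a partially nested relation with rows for Alice and Bob, is not fully recoverable from the extraction.]

Unnested:

Name  Child  Car
Bob   Mimi   Mustang
Bob   Zaza   Mustang
Bob   Lulu   Prius

Nested:

Name  Child               Car
Bob   {Mimi, Zaza, Lulu}  {Mustang, Prius}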
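The AbouBanat query on this slide quantifies over a set-valued attribute. A minimal Python sketch of the same idea follows. The dictionary encoding, the key `Father` (the tree figure labels it Name) and the helper names are assumptions made for illustration; the data are the two families drawn on the complex-object slide.

```python
# Hypothetical encoding of the Families instance from the tree figure.
# Assumption: the Name attribute of a family is the father's name.
FAMILIES = [
    {"Father": "Peter",
     "Children": [{"Name": "Toto", "Sex": "M"}]},
    {"Father": "Peter",
     "Children": [{"Name": "Mimi", "Sex": "F"},
                  {"Name": "Zaza", "Sex": "F"}]},
]

def abou_banat(families):
    """Fathers all of whose children are girls (vacuously true if no children)."""
    return {t["Father"] for t in families
            if all(x["Sex"] == "F" for x in t["Children"])}

def unnest_children(families):
    """Flatten the set-valued Children attribute into (Father, Child, Sex) rows."""
    return [(t["Father"], x["Name"], x["Sex"])
            for t in families for x in t["Children"]]

print(abou_banat(FAMILIES))
print(unnest_children(FAMILIES))
```

The universal quantifier maps onto `all(...)` over the nested collection, which is exactly the part that has no direct counterpart over flat tuples.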

Slide9

Results

Equivalence theorem: algebra and logic have the same expressive power.

Remark: one can compute TC (transitive closure) using algebra/logic (waoh! Cool!)

Also studied: fixpoint, datalog, while…

Complexity: each new level of nesting introduces one more exponential.

Need to control the use of powerset: 2^n, 2^(2^n), …
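To make the remark about transitive closure and the cost of powerset concrete, here is a small Python sketch. It uses one standard characterization (the transitive closure of R is the intersection of all transitive relations containing R) and enumerates the powerset of all pairs over the domain. The function names are made up for this example, and this is not claimed to be the construction used in the talk.

```python
from itertools import chain, combinations, product

def powerset(items):
    """All subsets of items, as tuples."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))

def is_transitive(rel):
    """True if (a, b) and (b, d) in rel always implies (a, d) in rel."""
    return all((a, d) in rel
               for (a, b) in rel for (c, d) in rel if b == c)

def tc_via_powerset(edges, domain):
    """Intersect every transitive superset of edges; count candidates examined."""
    pairs = list(product(domain, repeat=2))
    closure = set(pairs)  # the full relation is transitive and contains edges
    candidates = 0
    for subset in powerset(pairs):
        candidates += 1
        s = set(subset)
        if edges <= s and is_transitive(s):
            closure &= s
    return closure, candidates

edges = {(1, 2), (2, 3)}
closure, candidates = tc_via_powerset(edges, [1, 2, 3])
print(sorted(closure))  # [(1, 2), (1, 3), (2, 3)]
print(candidates)       # 512 subsets of the 9 possible pairs
```

With a domain of only three values the search already visits 2^9 candidate relations, which is the blow-up that motivates controlling the use of powerset.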

Slide10

From complex objects to semistructured data

[Figure: the Families complex-object tree of Slide7, shown again: two families named Peter, with Cars (BMW 2010; 2CV 1976) and Children (Toto M; Mimi F, Zaza F).]

Slide11

Revolution 1: more flexibility

[Figure: the same Families tree, with two additional nodes labeled Annotations and Trash.]

Slide12

Revolution 2: Remove some nodes; name all

[Figure: the Families tree again, with the Ann. and Trash nodes, the child Mimi no longer present, and nodes labeled Family, Car and Child added.]

Slide13

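Unranked labeled trees

Families
  Family
    Name: Peter
    Cars
      Car
        Name: BMW
        Year: 2010
    Children
      Child
        Name: Toto
        Sex: M
  Family
    Name: Peter
    Cars
      Car
        Name: 2CV
        Year: 1976
    Children
      Child
        Name: Zaza
        Sex: F

(The figure also carries Ann. and Trash nodes; their position in the tree is not recoverable from the extraction.)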

Slide14

This is better adapted to a Web context

Self-describing data: no separation between schema and data

Flexibility

Not such a big deal

Maybe the main contribution is the format?

<Families><Family><Name>Peter</Name><Cars><Car><Name>BMW</Name><Year>2010</Year></Car></Cars><Children><Child> …

Plus ça change, plus c’est la même chose

The more things change, the more they stay the same
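As a small illustration of "self-describing" data, the sketch below builds the first family of the tree figure with Python's standard `xml.etree.ElementTree` and then recovers the set of labels from the document alone, without any schema. The capitalized element names and the Toto child are taken from the tree figure and are an assumption about how the truncated XML on this slide would continue.

```python
import xml.etree.ElementTree as ET

# Build the first family from the tree figure (element names are assumed
# to follow the figure's labels).
families = ET.Element("Families")
family = ET.SubElement(families, "Family")
ET.SubElement(family, "Name").text = "Peter"
cars = ET.SubElement(family, "Cars")
car = ET.SubElement(cars, "Car")
ET.SubElement(car, "Name").text = "BMW"
ET.SubElement(car, "Year").text = "2010"
children = ET.SubElement(family, "Children")
child = ET.SubElement(children, "Child")
ET.SubElement(child, "Name").text = "Toto"
ET.SubElement(child, "Sex").text = "M"

# Serialize to text: labels travel with the data.
xml_text = ET.tostring(families, encoding="unicode")
print(xml_text)

# Parse the text back and list the labels found in the document itself.
labels = sorted({node.tag for node in ET.fromstring(xml_text).iter()})
print(labels)
```

Nothing outside the document is needed to discover that it talks about families, cars and children, which is the sense in which schema and data are not separated.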

Slide15

What else? The trees are unbounded

Like nested relations, trees are unbounded in width.
Unlike nested relations, they are unbounded in depth.
One can simulate 2-counter machines with 2 branches.
Do applications simulate 2-counter machines with XML documents? I am still looking for one.
XML documents are rarely deep.
But even for bounded trees there are fun questions: e.g., is the equivalence of monadic datalog decidable for bounded data trees?

[Figure: a tree with root r, chains of nodes labeled a marked by $ nodes, and a chain alternating a and b.]
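The claim that two branches suffice for a 2-counter machine can be sketched as follows. The encoding is an assumption made for this example (each counter is the length of a unary chain of a nodes under its own branch of the root r); it is not necessarily the construction behind the figure, and the instruction format is hypothetical.

```python
class Node:
    """A node of an unranked labeled tree."""
    def __init__(self, label):
        self.label = label
        self.children = []

def value(root, i):
    """Counter i is the depth of the chain below branch i."""
    n, node = 0, root.children[i]
    while node.children:
        node = node.children[0]
        n += 1
    return n

def is_zero(root, i):
    return not root.children[i].children

def inc(root, i):
    """Increment: append one a node at the bottom of branch i."""
    node = root.children[i]
    while node.children:
        node = node.children[0]
    node.children.append(Node("a"))

def dec(root, i):
    """Decrement: remove the bottom node of branch i (no-op at zero)."""
    node = root.children[i]
    while node.children and node.children[0].children:
        node = node.children[0]
    node.children.clear()

def make_tree(first, second):
    root = Node("r")
    root.children = [Node("branch"), Node("branch")]
    for index, amount in enumerate((first, second)):
        for _ in range(amount):
            inc(root, index)
    return root

def run(program, root, max_steps=10_000):
    """Interpret ("inc", c, next), ("jzdec", c, if_zero, otherwise), ("halt",)."""
    pc = 0
    for _ in range(max_steps):
        op = program[pc]
        if op[0] == "halt":
            return root
        if op[0] == "inc":
            _, c, nxt = op
            inc(root, c)
            pc = nxt
        else:  # "jzdec": jump if zero, otherwise decrement and continue
            _, c, if_zero, otherwise = op
            if is_zero(root, c):
                pc = if_zero
            else:
                dec(root, c)
                pc = otherwise
    raise RuntimeError("step budget exhausted")

# Move counter 0 into counter 1 (adds the two counters).
ADD_PROGRAM = [("jzdec", 0, 2, 1), ("inc", 1, 0), ("halt",)]
result = run(ADD_PROGRAM, make_tree(3, 2))
print(value(result, 0), value(result, 1))  # 0 5
```

The counters can grow without bound only because the tree depth is unbounded, which is the difference from nested relations that the slide points at.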

Slide16

What else? The trees are ordered

Unranked labeled ordered trees = XML

Ignore order: classical optimization
Respect order: totally new ball game; bring in tree automata

Reconcile

Order is often painful for optimization
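The difference between respecting and ignoring sibling order can be seen on a tiny example. In the Python sketch below a tree is a pair (label, tuple of children); this representation and the function names are assumptions made for illustration.

```python
def ordered_equal(t1, t2):
    """Equality of ordered trees: sibling order matters."""
    return t1 == t2

def canonical(tree):
    """Canonical form for unordered comparison: sort children recursively."""
    label, children = tree
    return (label, tuple(sorted(canonical(c) for c in children)))

def unordered_equal(t1, t2):
    """Equality of unordered trees: sibling order is ignored."""
    return canonical(t1) == canonical(t2)

# The same Car with its two fields listed in different orders.
car_a = ("Car", (("Name", (("BMW", ()),)), ("Year", (("2010", ()),))))
car_b = ("Car", (("Year", (("2010", ()),)), ("Name", (("BMW", ()),))))

print(ordered_equal(car_a, car_b))    # False
print(unordered_equal(car_a, car_b))  # True
```

An optimizer that may ignore order is free to treat the two trees as interchangeable; one that must respect order is not.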

Slide17

Selling argument is the Web…

The move from relations to trees is interesting.
But the move from centralized to distributed is interesting as well, and much less investigated.

Where the fun is:
Scale is beyond what we thought was thinkable
Machines are totally autonomous
Schema replaced by numerous ontologies
True/false logic replaced by inconsistency, probabilities, trust, belief…

Slide18

And the trees are evolving (aka Active XML)

An old idea from object databases: mix data and computation

Resorts
  Resort
    Name: Aspen
    State: Colorado
    snowcond
      !Unisys.com/snow(“Aspen”)
    hotels
      !Yahoo.com/GetHotels(<city name=“Aspen”/>)

The figure also shows a snow element with Depth: 1 and Unit: Meter.
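The idea of mixing data and computation can be sketched in a few lines of Python. Everything below is a toy model, not the Active XML system: the `Call` class, the `materialize` function and the two stub services are hypothetical, and the stubs return canned values (the snow depth from the figure, and a placeholder hotel list) instead of contacting the services named on the slide.

```python
class Call:
    """An embedded service call: a node whose content is computed on demand."""
    def __init__(self, service, argument):
        self.service = service
        self.argument = argument

def snow_stub(place):
    # Stand-in for Unisys.com/snow; values taken from the figure.
    return {"Depth": 1, "Unit": "Meter"}

def hotels_stub(city):
    # Stand-in for Yahoo.com/GetHotels; placeholder result, not real data.
    return ["(placeholder hotel)"]

SERVICES = {
    "Unisys.com/snow": snow_stub,
    "Yahoo.com/GetHotels": hotels_stub,
}

tree = {"Resorts": [{"Resort": {
    "Name": "Aspen",
    "State": "Colorado",
    "snowcond": Call("Unisys.com/snow", "Aspen"),
    "hotels": Call("Yahoo.com/GetHotels", "Aspen"),
}}]}

def materialize(node, services):
    """Return a copy of the tree in which every call is replaced by its result."""
    if isinstance(node, Call):
        return materialize(services[node.service](node.argument), services)
    if isinstance(node, dict):
        return {k: materialize(v, services) for k, v in node.items()}
    if isinstance(node, list):
        return [materialize(v, services) for v in node]
    return node

snapshot = materialize(tree, SERVICES)
print(snapshot["Resorts"][0]["Resort"]["snowcond"])
```

The original tree keeps its calls, so it can be materialized again later and yield a different snapshot, which is one way to read "the trees are evolving".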

Slide19

And there are cycles

For lack of time, I will not mention the network model [Codasyl 1969].
The language was purely navigational anyway.
If I added references to XML, I’d get cycles.

Lots of models for graph data, e.g., IQL.
Some fun results: e.g., some copy elimination problem when trying to obtain a Chandra-Harel completeness for IQL.
Similar issue for unordered trees [recent result with Vianu].

[Figure: two Person nodes, one with Name Adam and one with Name Eve, each with a Spouse edge.]

Paris C. Kanellakis
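A short Python sketch of how references turn a tree into a graph with cycles, using the Adam and Eve example. The identifiers `adam` and `eve`, the dictionary layout and the assumption that each Spouse reference points at the other person are made up for this illustration.

```python
# Two Person nodes; Spouse holds a reference (an identifier), not a subtree.
persons = {
    "adam": {"Name": "Adam", "Spouse": "eve"},
    "eve": {"Name": "Eve", "Spouse": "adam"},
}

# The same data without references is a plain tree.
no_refs = {
    "adam": {"Name": "Adam"},
    "eve": {"Name": "Eve"},
}

def has_cycle(nodes, ref_field):
    """Follow ref_field from every node; report whether some node is revisited."""
    for start in nodes:
        seen = set()
        current = start
        while current is not None:
            if current in seen:
                return True
            seen.add(current)
            current = nodes[current].get(ref_field)
    return False

print(has_cycle(persons, "Spouse"))  # True
print(has_cycle(no_refs, "Spouse"))  # False
```

Expanding each Spouse reference into a nested subtree would never terminate here, which is why such data needs a graph model rather than a tree model.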

Slide20

Conclusion

Is this a good time to do research on trees in databases?

The best time to plant a tree was 20 years ago. The next best time is now.
Chinese Proverb

Slide21

Advertisement

Book on Web data management to appear at Cambridge University Press

http://webdam.inria.fr/Jorge