Ian Horrocks, Information Systems Group


Uploaded On 2018-02-27



Presentation Transcript


Ian Horrocks, Information Systems Group, Department of Computer Science, University of Oxford

History of the Semantic Web

“A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities.”

Standards

RDFa: Oct 2008
RDFa 1.1: Mar 2015
JSON-LD 1.0: Jan 2014
SKOS: Aug 2009
GRDDL: Sep 2007
POWDER: Sep 2009
PROV: Apr 2013
RIF: Feb 2013
SAWSDL: Aug 2007
RDB2RDF: Sep 2012

Tools

HermiT

Initial focus on ontology engineering

Main reasoning tasks are (class) consistency and subsumption/classification

Applications: Travel planning and booking

Applications: Content Management


Applications: Search

Explicit KR sometimes needed, e.g., Knowledge Graph

Less rigorous treatment of semantics

Not using Semantic Web standards

Hiring Semantic Web people


Application Domains

Agriculture, Astronomy, Oceanography, Defence, Education, Energy management, Geography, Geoscience, Life sciences, …

Application Domains

OBO Foundry includes more than 100 biological and biomedical ontologies

Siemens: “actively building OWL based clinical solutions”

SNOMED CT (Clinical Terms) ontology:
used in healthcare systems of more than 15 countries, including Australia, Canada, Denmark, Spain, Sweden and the UK
also used by major US providers, e.g., Kaiser Permanente
ontology provides a common vocabulary for recording clinical data


Ontologies as structured vocabularies

Ontologies used as structured vocabularies, e.g., SNOMED is distributed as a simple taxonomy

Ontology terms used as annotations/tags

Benefits of using a logic-based ontology language include:
Reasoning support (critical) for development & maintenance, in particular derivation of the taxonomy from descriptions (classification)
Easier location of relevant terms within large structured vocabularies
Query answers enhanced by exploiting the class hierarchy

But limited query answering w.r.t. ontology + data
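As a toy illustration of the last two benefits, the Python sketch below derives a taxonomy and then uses it to answer a class query. It assumes an ontology made only of atomic subclass axioms, so classification reduces to a reflexive-transitive closure; a real reasoner such as HermiT classifies far richer class descriptions. All class names and records are hypothetical.

```python
def classify(subclass_axioms):
    """Map each class to the set of all its superclasses (reflexively).

    Assumes only atomic axioms (Sub, Super), so the taxonomy is the
    reflexive-transitive closure of the told subsumptions.
    """
    classes = {c for axiom in subclass_axioms for c in axiom}
    supers = {c: {c} for c in classes}
    for sub, sup in subclass_axioms:
        supers[sub].add(sup)
    changed = True
    while changed:  # repeat until no new subsumption is derived
        changed = False
        for c in classes:
            inherited = set().union(*(supers[s] for s in supers[c]))
            if not inherited <= supers[c]:
                supers[c] |= inherited
                changed = True
    return supers


def instances_of(cls, taxonomy, assertions):
    """Answer cls(x): individuals asserted to be in cls or in any subclass of it."""
    return {ind for ind, asserted in assertions
            if cls in taxonomy.get(asserted, {asserted})}


# Hypothetical vocabulary and patient records.
axioms = [("ViralPneumonia", "Pneumonia"),
          ("Pneumonia", "LungDisease"),
          ("LungDisease", "Disease")]
taxonomy = classify(axioms)
records = [("patient1", "ViralPneumonia"),
           ("patient2", "Pneumonia"),
           ("patient3", "Fracture")]

print(sorted(taxonomy["ViralPneumonia"]))
print(sorted(instances_of("LungDisease", taxonomy, records)))
```

Because patient1 is recorded only as ViralPneumonia, it is returned for LungDisease purely via the derived hierarchy.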

Data Access Applications: “Big Data”

Huge quantity of data, increasing at an exponential rate

Identifying & accessing relevant data is of critical importance

Handling data variety & complexity often turns out to be the main challenge

Ontologies can provide an integrated and user-centric view of heterogeneous data sources


Data Access: Statoil Exploration

Geologists & geophysicists use data from previous operations in nearby locations to develop stratigraphic models of unexplored areas

TBs of relational data using diverse schemata, spread over 1,000s of tables and multiple databases

Data access:
900 geologists & geophysicists
30-70% of time spent on data gathering
4-day turnaround for new queries

Data exploitation:
Better use of experts' time
Data analysis is the “most important factor” for drilling success

Data Access: Statoil Exploration

Complex models, interdisciplinary: geology, physics, chemistry, …

Combine (in ad hoc ways) data from:
Multiple specialised sources
Direct and indirect observations
Different vendors
Incompatible models
Overlapping content


Data Access: Statoil Exploration

Query: Wellbores with cores that overlap log curves


Data Access: Optique Approach


OWL Profiles

OWL 2 (2009) defines language subsets, aka profiles, that can be more simply and/or efficiently implemented

OWL 2 QL: based on DL-Lite; efficiently implementable via rewriting into relational queries (OBDA)


OWL 2 QL and Query Rewriting

Given a QL ontology O, query Q and mappings M:

Use O to rewrite Q → Q′ s.t. answering Q′ without O is equivalent to answering Q w.r.t. O for any dataset

Map ontology queries → DB queries (typically SQL): use mappings M to rewrite Q′ into a DB query

Evaluate the (SQL) query against the DB
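The three steps can be sketched end to end for a deliberately tiny, hypothetical QL-style ontology containing only class inclusions (A ⊑ B) and axioms of the form ∃R ⊑ B. The mappings M are plain SQL strings rather than R2RML, and every class, property and table name is invented for illustration; a real OBDA system handles full conjunctive queries, inverse roles and much more.

```python
import sqlite3

# Hypothetical QL ontology O: class inclusions (A ⊑ B) and axioms ∃R ⊑ B.
SUBCLASS = [("ExplorationWellbore", "Wellbore"), ("DevelopmentWellbore", "Wellbore")]
EXISTS = [("hasCore", "Wellbore")]  # anything with a core is a Wellbore

# Hypothetical mappings M: each ontology atom is populated by a SQL query.
MAPPINGS = {
    ("class", "Wellbore"): "SELECT id FROM wellbore",
    ("class", "ExplorationWellbore"): "SELECT id FROM expl_wellbore",
    ("class", "DevelopmentWellbore"): "SELECT id FROM dev_wellbore",
    ("exists", "hasCore"): "SELECT wellbore_id FROM core",
}


def rewrite(cls):
    """Step 1: use O to rewrite the query cls(x) into a union of atoms."""
    atoms, todo = set(), [("class", cls)]
    while todo:
        atom = todo.pop()
        if atom in atoms:
            continue
        atoms.add(atom)
        kind, name = atom
        if kind == "class":
            todo += [("class", sub) for sub, sup in SUBCLASS if sup == name]
            todo += [("exists", prop) for prop, sup in EXISTS if sup == name]
    return atoms


def unfold(atoms):
    """Step 2: use M to turn the rewritten query into a single SQL query."""
    return " UNION ".join(MAPPINGS[a] for a in sorted(atoms) if a in MAPPINGS)


db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE wellbore(id TEXT);
    CREATE TABLE expl_wellbore(id TEXT);
    CREATE TABLE dev_wellbore(id TEXT);
    CREATE TABLE core(wellbore_id TEXT);
    INSERT INTO wellbore VALUES ('w1');
    INSERT INTO expl_wellbore VALUES ('w2');
    INSERT INTO dev_wellbore VALUES ('w3');
    INSERT INTO core VALUES ('w4');
""")

sql = unfold(rewrite("Wellbore"))
print(sql)
# Step 3: evaluate the SQL query against the (untouched) database.
answers = sorted(row[0] for row in db.execute(sql))
print(answers)
```

The tables are never copied or modified; only the query is transformed, which is what lets the data stay in legacy storage.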

Query Rewriting: Issues

1. Rewriting
May be large (worst case exponential in size of ontology)
Queries may be hard for existing DBMSs
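To see where the worst-case blow-up comes from, consider a hypothetical query with n atoms over distinct variables, where the ontology gives each atom k alternatives (the class itself plus its subclasses). Written as a union of conjunctive queries, the rewriting needs one disjunct per combination of choices, i.e. k^n disjuncts. The sketch below simply enumerates them.

```python
from itertools import product


def expand(query_atoms, alternatives):
    """Expand a conjunctive query into a union of conjunctive queries (UCQ).

    query_atoms: class names, one per query atom (variables omitted for brevity).
    alternatives: maps a class to every class that may replace it in a
    rewriting (the class itself plus its subclasses in this toy ontology).
    """
    choices = [alternatives.get(atom, [atom]) for atom in query_atoms]
    return [list(combo) for combo in product(*choices)]


# Hypothetical ontology: each class Ci has two subclasses, Cia and Cib.
n = 5
alternatives = {f"C{i}": [f"C{i}", f"C{i}a", f"C{i}b"] for i in range(n)}
ucq = expand([f"C{i}" for i in range(n)], alternatives)
print(len(ucq))  # 3 ** 5 = 243 disjuncts
```

With k = 3, each additional query atom triples the number of disjuncts; optimised rewriting algorithms try to keep this in check, but the worst-case bound on the slide still applies.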

Query: Wellbores with cores that overlap log curves


Query Rewriting: Issues

1. Rewriting
May be large (worst case exponential in size of ontology)
Queries may be hard for existing DBMSs

2. Mappings
May be difficult to develop and maintain

Data Access: Optique Approach

Query Rewriting: Issues

1. Rewriting
May be large (worst case exponential in size of ontology)
Queries may be hard for existing DBMSs

2. Mappings
May be difficult to develop and maintain

3. Expressivity
OWL 2 QL (necessarily) has (very) restricted expressive power, e.g.:
No functional or transitive properties
No universal (for-all) restrictions
…

OWL Profiles: Beyond QL?

OWL 2 (2009) defines language subsets, aka profiles, that can be more simply and/or efficiently implemented

OWL 2 QL: based on DL-Lite; efficiently implementable via rewriting into relational queries (OBDA)

OWL 2 RL: based on “Description Logic Programs” ( ); implementable via Datalog query answering

OWL 2 EL: based on EL++; implementable via Datalog query answering plus “filtration”


RL/Datalog Query Answering via Materialisation

Given (RDF) data DB, RL/Datalog ontology O and query Q:

Materialise (RDF) data DB → DB′ s.t. evaluating Q w.r.t. DB′ is equivalent to answering Q w.r.t. DB and O
NB: closely related to the chase procedure used with DB dependencies

Evaluate Q against DB′
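Materialisation can be sketched as semi-naive forward chaining: the rules are applied repeatedly, each round joining only against facts that are new since the previous round, until nothing new is derived and DB′ is reached. This is a toy, single-threaded sketch of that general technique, not RDFox's implementation; rules are limited to two body atoms, and the locatedIn property and wellbore data are hypothetical.

```python
# Toy rules with two body atoms; terms starting with "?" are variables.
RULES = [
    # (x rdf:type C), (C rdfs:subClassOf D) -> (x rdf:type D)
    (("?x", "rdf:type", "?d"),
     (("?x", "rdf:type", "?c"), ("?c", "rdfs:subClassOf", "?d"))),
    # locatedIn is treated as transitive (hypothetical property)
    (("?x", "locatedIn", "?z"),
     (("?x", "locatedIn", "?y"), ("?y", "locatedIn", "?z"))),
]


def match(pattern, fact, binding):
    """Extend binding so that pattern matches fact; return None on a clash."""
    b = dict(binding)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):
            if b.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return b


def apply_rule(rule, facts, delta):
    """Derive head facts from joins in which at least one premise is in delta."""
    head, (atom1, atom2) = rule
    out = set()
    for first, second in ((delta, facts), (facts, delta)):
        for f1 in first:
            b1 = match(atom1, f1, {})
            if b1 is None:
                continue
            for f2 in second:
                b2 = match(atom2, f2, b1)
                if b2 is not None:
                    out.add(tuple(b2.get(t, t) for t in head))
    return out


def materialise(db, rules):
    """Semi-naive evaluation: compute DB' as the fixpoint of the rules over DB."""
    facts, delta = set(db), set(db)
    while delta:
        derived = set()
        for rule in rules:
            derived |= apply_rule(rule, facts, delta)
        delta = derived - facts  # only genuinely new facts drive the next round
        facts |= delta
    return facts


DB = {("w1", "rdf:type", "ExplorationWellbore"),
      ("ExplorationWellbore", "rdfs:subClassOf", "Wellbore"),
      ("w1", "locatedIn", "field1"),
      ("field1", "locatedIn", "northSea")}
DB_prime = materialise(DB, RULES)

# Q: wellbores located in northSea, evaluated directly over DB' (no reasoning).
answers = sorted(x for (x, p, o) in DB_prime
                 if p == "locatedIn" and o == "northSea"
                 and (x, "rdf:type", "Wellbore") in DB_prime)
print(answers)
```

Q itself is then a plain lookup over DB′: w1 is returned only because both its class membership and its location were derived during materialisation.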

Materialisation: Issues

1. Scalability
PTime-complete
Efficiently implementable in practice?

2. Updates
Additions relatively easy (continue materialisation)
But what about retraction?

3. Migrating data to RDF
Materialisation assumes data in a “special” (RDF triple) store
How can legacy data be migrated?

4. Expressivity
In particular, no RHS existentials
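The asymmetry between additions and retractions shows up even with one toy rule (a hypothetical transitive locatedIn relation). Because Datalog is monotonic, a new fact can simply be fed into the existing materialisation; a retraction, however, may invalidate derived facts, and the naive remedy shown here, recomputing from scratch, is what incremental algorithms such as the FBF algorithm used in RDFox aim to avoid.

```python
def closure(facts):
    """Materialise one hypothetical rule:
    locatedIn(x, y), locatedIn(y, z) -> locatedIn(x, z)."""
    facts = set(facts)
    while True:
        new = {(x, z) for (x, y) in facts for (y2, z) in facts if y == y2} - facts
        if not new:
            return facts
        facts |= new


explicit = {("w1", "field1"), ("field1", "northSea")}
m = closure(explicit)

# Addition: continuing from the existing materialisation m gives the same
# result as recomputing from scratch, because Datalog is monotonic.
added = ("northSea", "europe")
assert closure(m | {added}) == closure(explicit | {added})

# Retraction: deleting the explicit fact from m leaves a derived fact behind
# that is no longer entailed...
removed = ("field1", "northSea")
stale = m - {removed}
print(("w1", "northSea") in stale)
# ...so the naive (and expensive) fix is to recompute from the remaining facts.
print(("w1", "northSea") in closure(explicit - {removed}))
```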

Materialisation: Scalability

Efficient Datalog/RL engine is critical

Existing approaches mainly target distributed “shared-nothing” architectures, often via MapReduce
High communication overhead
Typically focus on small fragments (e.g., RDFS), so don't really address the expressivity issue
Even then, query answering over (distributed) materialised data is non-trivial and may require considerable communication

RDFox Datalog Engine

Targets state-of-the-art main-memory, multi-core architectures

Optimised in-memory storage with ‘mostly’ lock-free parallel inserts

Memory efficient: a commodity server with 128 GB can store >10⁹ triples

Exploits multi-core architecture: 10-20x speedup with 32/16 threads/cores

LUBM 120K (>10¹⁰ triples) in 251 s (20M t/s) on T5-8 (4 TB/1024 threads)

RDFox Datalog Engine

Incremental addition and retraction of triples
Retraction via novel FBF “view maintenance” algorithm
Retraction of 5,000 triples from materialised LUBM 50k in less than 1 s

Many other novel features:
Handles more general (than RL) Datalog and SWRL rules
SPARQL features such as BIND and FILTER in rule bodies
Native equality handling (owl:sameAs) via rewriting
Stratified negation as failure (NAF)

Materialisation: Data Migration

Need to specify a suitable migration process

Use R2RML mappings to extract data and transform it into RDF

But where do these mappings come from?

Recall query rewriting: mappings M are R2RML mappings

Run mappings in reverse to extract and transform data: “lazy ETL”
Deploy query rewriting (OBDA) system
Extend O and M as needed
Use M to ETL data into RDF store
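A minimal sketch of using mappings to ETL relational data into an RDF store. Each hypothetical mapping pairs a source SQL query with a function that turns a result row into triples; this plays the role that R2RML mappings play but is not R2RML syntax, and the ex: IRIs, tables and columns are invented for illustration.

```python
import sqlite3

# Hypothetical mappings M: source SQL query -> function producing triples for one row.
MAPPINGS = [
    ("SELECT id FROM wellbore",
     lambda row: [(f"ex:wellbore/{row[0]}", "rdf:type", "ex:Wellbore")]),
    ("SELECT id, wellbore_id FROM core",
     lambda row: [(f"ex:core/{row[0]}", "rdf:type", "ex:Core"),
                  (f"ex:wellbore/{row[1]}", "ex:hasCore", f"ex:core/{row[0]}")]),
]


def extract_triples(db, mappings):
    """Run every mapping's source query and transform each row into triples."""
    triples = set()
    for sql, to_triples in mappings:
        for row in db.execute(sql):
            triples.update(to_triples(row))
    return triples


db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE wellbore(id TEXT);
    CREATE TABLE core(id TEXT, wellbore_id TEXT);
    INSERT INTO wellbore VALUES ('w1');
    INSERT INTO core VALUES ('c7', 'w1');
""")

for triple in sorted(extract_triples(db, MAPPINGS)):
    print(triple)
```

Following the slide, the same M could first serve an OBDA deployment, be extended as needed, and only then be reused for ETL into the triple store.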

Materialisation: Expressivity

RL is more powerful than QL, but has limitations, in particular no RHS existentials

Can't express, e.g.,

Recall OWL 2 EL: based on EL++; implementable via Datalog query answering plus “filtration”
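Why RHS existentials are a problem for plain materialisation can be seen with a toy chase. The hypothetical axiom “every Person has a parent who is a Person” (Person ⊑ ∃hasParent.Person) forces the creation of fresh anonymous individuals, and each new individual triggers the axiom again, so the chase never reaches a fixpoint. The sketch caps the number of rounds to make the unbounded growth visible.

```python
from itertools import count


def chase(persons, max_rounds):
    """Restricted chase for Person(x) -> exists y. hasParent(x, y), Person(y).

    A fresh anonymous individual is created only for persons with no parent
    yet, but every new individual is itself a Person, so without the
    max_rounds cap this loop would never reach a fixpoint.
    """
    fresh = count()
    person, has_parent = set(persons), set()
    for _ in range(max_rounds):
        children = {child for (child, _parent) in has_parent}
        needy = sorted(x for x in person if x not in children)
        if not needy:
            return person, has_parent  # fixpoint: only reached with no persons at all
        for x in needy:
            y = f"_:n{next(fresh)}"  # fresh labelled null, like an RDF blank node
            has_parent.add((x, y))
            person.add(y)
    return person, has_parent


for rounds in (1, 2, 5):
    person, has_parent = chase({"ann"}, rounds)
    print(rounds, len(person), len(has_parent))
```

Each extra round adds one more anonymous ancestor; a Datalog rule without an existential (e.g., a hypothetical Person(x) → Agent(x)) can only derive facts about existing constants and therefore always terminates.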


OWL 2 EL via Datalog + Filtration

Given (RDF) data set, EL ontology O and query Q:

Over-approximate O into Datalog program D

Evaluate Q over D + data set (via materialisation)

Use (polynomial) filtering procedure to eliminate spurious answers
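The over-approximate-then-filter idea can be illustrated on a tiny hypothetical case. The EL axiom A ⊑ ∃R.B is approximated in Datalog by pointing every A at one shared auxiliary constant; this never loses answers, but it can create spurious joins through that constant. The filter below is a deliberately simplified check that only handles this toy query shape; it is not the actual filtering procedure referred to on the slide, which works for arbitrary conjunctive queries.

```python
from itertools import product

# Hypothetical EL ontology: A ⊑ ∃R.B.  Data: A(a1), A(a2).
DATA_A = {"a1", "a2"}
AUX = "c_R_B"  # one shared auxiliary constant standing in for the existential


def over_approximate(individuals_a):
    """Datalog approximation D: A(x) -> R(x, AUX), B(AUX). Returns the R facts."""
    return {(x, AUX) for x in individuals_a}


def evaluate(r_facts):
    """Query Q(x, y) :- R(x, z), R(y, z); returns all matches (x, y, z)."""
    return {(x, y, z1)
            for (x, z1), (y, z2) in product(r_facts, repeat=2) if z1 == z2}


def filter_spurious(matches):
    """Toy filter: in the real (tree-shaped) models each anonymous element has a
    single R-predecessor, so a match joining two different individuals on the
    auxiliary constant cannot be a genuine answer."""
    return {(x, y) for (x, y, z) in matches if not (z == AUX and x != y)}


candidates = evaluate(over_approximate(DATA_A))
print(sorted({(x, y) for (x, y, _) in candidates}))  # includes spurious pairs
print(sorted(filter_spurious(candidates)))
```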

Discussion

QL-rewriting has many advantages:
Data can be left untouched and in legacy storage
Exploits existing DB infrastructure and scalability

But what if more expressiveness/flexibility is needed?
Query answering for EL and RL still tractable (polynomial)
Critically depends on Datalog scalability: RDFox to the rescue!
Easy migration path from QL-rewriting via “lazy ETL”



Ongoing/Future Work

Piloting, evaluation and tuning
Semantic (data) partitioning and distributed query evaluation
Improved query planning
(Incremental maintenance of) aggregations
Bag semantics for OBDA approach
Expressiveness beyond RL/EL via PAGOdA techniques
Time series data and stream reasoning
Hybrid rewriting/materialisation (on demand) approach
…

Acknowledgements


Thank you for listening

Any questions?

Cartoon: “A pessimist says the glass is half empty.” “An optimist says it’s half full.” “You made the glass twice as big as it needed to be.”