Scalable Integration and Processing of Linked Data

Uploaded On 2016-05-28



Presentation Transcript

Slide1

Scalable Integration and Processing of Linked Data

Andreas Harth, Aidan Hogan, Spyros Kotoulas, Jacopo Urbani

Slide2

Outline

Session 1: Introduction to Linked Data

Foundations and Architectures

Crawling and Indexing

Querying

Session 2: Integrating Web Data with Reasoning

Introduction to RDFS/OWL on the Web

Introduction and Motivation for Reasoning

Session 3: Distributed Reasoning: Because Size Matters

Problems and Challenges

MapReduce and WebPIE

Session 4: Putting Things Together (Demo)

The LarKC Platform

Implementing a LarKC Workflow

Slide3

PART 1: How can we query Linked Data?

PART 2: How can we reason over Linked Data? (start of Session 2)

Slide4

Answer: SPARQL (W3C Rec. 2008)

…SPARQL 1.1 upcoming (W3C Rec. 201?)

Slide5

Introducing SPARQL

SPARQL Protocol and RDF Query Language (SPARQL)

Standardised query language (and supporting recommendations) for querying RDF

~SQL-like language

…but only if you squint

…and without the vendor-specific headaches

Slide6

The anatomy of a typical SPARQL query

Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname

PREFIX DECLARATIONS

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX oo: <http://purl.org/openorg/>

RESULT CLAUSE

SELECT ?name ?expertise

DATASET CLAUSE

FROM NAMED <http://data.southampton.ac.uk/>

QUERY CLAUSE

WHERE {
  ?person foaf:name ?name ;
          foaf:familyName ?surname .
  ?person rdf:type foaf:Person .
  ?person foaf:title ?title .
  FILTER regex(?title, "^Prof")
  OPTIONAL {
    ?person oo:availableToCommentOn ?expertiseURI .
    ?expertiseURI rdfs:label ?expertise
  }
}

SOLUTION MODIFIERS

ORDER BY ?surname

Slide7

Slide8

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

PREFIX oo: <http://purl.org/openorg/>

Prefix Declarations

foaf:Person ⇔ <http://xmlns.com/foaf/0.1/Person>

Use http://prefix.cc/
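As a rough illustration of what a prefix declaration does, the Python sketch below expands a prefixed name into a full URI using the four prefixes from the query. The names `PREFIXES` and `expand` are hypothetical and chosen only for this example; they are not part of SPARQL or of any library.

```python
# Prefix table copied from the PREFIX declarations in the example query.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "oo": "http://purl.org/openorg/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand e.g. 'foaf:Person' to its full URI (hypothetical helper)."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError("unknown prefix in %r" % prefixed_name)
    # A prefixed name is just the namespace URI with the local part appended.
    return prefixes[prefix] + local


if __name__ == "__main__":
    print(expand("foaf:Person"))
```

The expansion is plain string concatenation, which is why the slide can write foaf:Person and the full URI as equivalent.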

PREFIX DECLARATIONS

Slide9

Slide10

Result Clause

SELECT ?name ?expertise

1. SELECT

2. CONSTRUCT (RDF)

3. ASK

4. DESCRIBE (RDF)

Slide11

Result Clause 1. SELECT…

SELECT ?name ?expertise

Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname

Return all tuples for the bindings of the variables ?name and ?expertise

-----------------------------------------------------------
| “Professor Robert Allen”  | “Control engineering”    |
| “Professor Robert Allen”  | “Biomedical engineering” |
| “Prof Carl Leonetto Amos” |                          |
| “Professor Peter Ashburn” | “Silicon technology”     |
| “Professor Robert Allen”  | “Control engineering”    |
-----------------------------------------------------------

Slide12

Result Clause 1. SELECT DISTINCT…

SELECT DISTINCT ?name ?expertise

Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname

Return all unique tuples for the bindings of the variables ?name and ?expertise

-----------------------------------------------------------
| “Professor Robert Allen”  | “Control engineering”    |
| “Professor Robert Allen”  | “Biomedical engineering” |
| “Prof Carl Leonetto Amos” |                          |
| “Professor Peter Ashburn” | “Silicon technology”     |
| “Professor Robert Allen”  | “Control engineering”    | (duplicate row, dropped by DISTINCT)
-----------------------------------------------------------

Slide13

Result Clause 2. CONSTRUCT…

CONSTRUCT {
  ?person foaf:name ?name ;
          ex:expertise ?expertise .
}

Return RDF using bindings for the variables:

ex:RAllen foaf:name “Professor Robert Allen” ;
          ex:expertise “Biomedical engineering” , “Control engineering” .
ex:PAshburn foaf:name “Peter Ashburn” ;
            ex:expertise “Silicon technology” .

Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname

Slide14

Result Clause 3. ASK…

ASK … WHERE { … }

Are there any results?

Returns: true or false

Slide15

Result Clause 4. DESCRIBE…

DESCRIBE ?person … WHERE { ?person … }

Returns some RDF which “describes” the given resource…

No standard for what to return!

Typically returns:

all triples where the given resource appears as subject and/or object

OR

Concise Bounded Descriptions…

Slide16

Result Clause 4. DESCRIBE (DIRECT)…

DESCRIBE ex:RAllen

(…can give URIs directly without need for a WHERE clause.)

Slide17

Slide18

Dataset clause (FROM / FROM NAMED)

FROM NAMED <http://data.southampton.ac.uk/>

Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname

(Briefly)

Restrict the dataset against which you wish to query

SPARQL stores named graphs: sets of triples which are associated with (URI) names

Can match across graphs!

Named graphs typically correspond with data provenance (i.e., documents)!

Default graph typically corresponds to the merge of all graphs

Many engines will typically dereference a graph if not available locally!

Slide19

Slide20

Query clause (WHERE)

Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname

WHERE {
  ?person foaf:name ?name ;
          foaf:familyName ?surname .
  ?person rdf:type foaf:Person .
  ?person foaf:title ?title .
  FILTER regex(?title, "^Prof")
  OPTIONAL {
    ?person oo:availableToCommentOn ?expertiseURI .
    ?expertiseURI rdfs:label ?expertise
  }
}

Example match: ex:PAshburn ✓, “Professor Peter Ashburn”, “Ashburn”, “Professor”, ex:Silicon ✓, “Silicon technology”

Slide21

Quick mention for UNION

WHERE { …
  { ?person oo:availableToCommentOn ?expertiseURI . }
  UNION
  { ?person foaf:interest ?expertiseURI . }
… }

Represent disjunction (OR)

Useful when there’s more than one property/class that represents the same information you’re interested in (heterogeneity)

Reasoning can also help, assuming terms are mapped (more later)

Slide22

Slide23

Solution Modifiers

Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname

ORDER BY ?surname

Order output results by surname (as you probably guessed)

…also… LIMIT and OFFSET

ORDER BY ?surname LIMIT 10

Only return 10 results

ORDER BY ?surname LIMIT 10 OFFSET 20

Return results 21‒30 (skip the first 20, then return 10)

Slide24


Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname

PREFIX DECLARATIONS

RESULT CLAUSE

QUERY CLAUSE

SOLUTION MODIFIERS

DATASET CLAUSE

What are you looking for?

Which results do you want?

Where should we look?

How should results be ordered/split?

Shortcuts for URIs

The summary of a typical SPARQL query

Slide25

Trying out a typical SPARQL query

Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX oo: <http://purl.org/openorg/>
SELECT ?name ?expertise
FROM NAMED <http://data.southampton.ac.uk/>
WHERE {
  ?person foaf:name ?name ;
          foaf:familyName ?surname .
  ?person rdf:type foaf:Person .
  ?person foaf:title ?title .
  FILTER regex(?title, "^Prof")
  OPTIONAL {
    ?person oo:availableToCommentOn ?expertiseURI .
    ?expertiseURI rdfs:label ?expertise
  }
}
ORDER BY ?surname

Slide26

List of Public SPARQL Endpoints:

SparqlEndpoints (W3C Wiki)

http://www.w3.org/wiki/SparqlEndpoints

(or just use Google)

Slide27

Coming Soon:

SPARQL 1.1

Currently a W3C Working Draft

http://www.w3.org/TR/sparql11-query/

(or just use Google)

Slide28

Highly recommend checking out:

“SPARQL by example”

By Cambridge Semantics

Lee Feigenbaum & Eric Prud'hommeaux

http://www.cambridgesemantics.com/2008/09/sparql-by-example/

(or just use Google)

Slide29

After the break…

Session 1: Introduction to Linked Data

Foundations and Architectures

Crawling and Indexing

Querying

Session 2: Integrating Web Data with Reasoning

Introduction to RDFS/OWL on the Web

Introduction and Motivation for Reasoning

Session 3: Distributed Reasoning: Because Size Matters

Problems and Challenges

MapReduce and WebPIE

Session 4: Putting Things Together (Demo)

The LarKC Platform

Implementing a LarKC Workflow

Slide30

During the break…

Question:

Find the people who have won both an academy award for best director and a raspberry award for worst director

Endpoint:

http://dbpedia.org/sparql/ (that is, if you want to use SPARQL… feel free to use whatever) or http://google.com/ (to make it fair)

Hint: Look at http://dbpedia.org/page/Michael_Bay and http://dbpedia.org/page/Woody_Allen for examples

(The same prefixes therein are understood by the endpoint, …so no need to declare them in the query)

Slide31

The Winning (?) Query:

SELECT DISTINCT ?name
WHERE {
  ?director dcterms:subject category:Worst_Director_Golden_Raspberry_Award_winners ,
                            category:Best_Director_Academy_Award_winners ;
            foaf:name ?name .
}
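One way to submit a query like this programmatically is a plain HTTP GET with the query text in a `query` parameter. The Python sketch below uses only the standard library; the endpoint URL is the one given on the earlier slide, while `build_request` and `run_select` are hypothetical helper names, and whether the endpoint is reachable or still predefines the `dcterms:` and `category:` prefixes is an assumption.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql/"  # endpoint named on the slide

QUERY = """
SELECT DISTINCT ?name
WHERE {
  ?director dcterms:subject
      category:Worst_Director_Golden_Raspberry_Award_winners ,
      category:Best_Director_Academy_Award_winners ;
    foaf:name ?name .
}
"""


def build_request(endpoint, query):
    """Build a GET request carrying the query in the 'query' parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the standard JSON serialisation of SELECT results.
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def run_select(endpoint, query, timeout=30):
    """Send the query and return one dict per result row (needs network)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        data = json.load(resp)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


if __name__ == "__main__":
    for row in run_select(ENDPOINT, QUERY):
        print(row["name"])
```

Keeping request construction separate from the network call makes the URL easy to inspect before anything is sent.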

The Answer:

And the answer is…

Slide32

PART 1: How can we query Linked Data?

PART 2: How can we reason over Linked Data?

…and why?!

Slide33

A Web of Data

[Figure: Linked Open Data cloud snapshots for August 2007, November 2007, February 2008, March 2008, September 2008, March 2009, July 2009 and September 2010]

Images from: http://richard.cyganiak.de/2007/10/lod/; Cyganiak, Jentzsch

Slide34

Reasoning

explicit

data

implicit

data

How can consumers query the implicit data?

Slide35

…so what’s The Problem?…

heterogeneity

…need to integrate data from different sources

Slide36

Take Query Answering…

Gimme webpages relating to Tim Berners-Lee

timbl:i foaf:page ?pages .

Slide37

Heterogeneity in schema

webpage: properties

foaf:page

foaf:homepage

foaf:isPrimaryTopicOf

foaf:weblog

doap:homepage

foaf:topic

foaf:primaryTopic

mo:musicBrainz

mo:myspace

[Figure legend: edges between the properties denote rdfs:subPropertyOf or owl:inverseOf]

Slide38

Linked Data, RDFS and OWL: Linked Vocabularies

[Figure: constellation of linked vocabularies, including SKOS]

Image from: http://blog.dbtune.org/public/.081005_lod_constellation_m.jpg; Giasson, Bergman

Slide39

Heterogeneity in naming

Tim Berners-Lee: URIs

timbl:i

dblp:100007

identica:45563

adv:timbl

fb:en.tim_berners-lee

db:Tim-Berners_Lee

[Figure legend: edges between the URIs denote owl:sameAs]

Slide40

Returning to our simple query…

Gimme webpages relating to Tim Berners-Lee

timbl:i foaf:page ?pages .

... 7 x 6 = 42 possible patterns

Properties (7): foaf:page, foaf:homepage, foaf:isPrimaryTopicOf, doap:homepage, foaf:topic, foaf:primaryTopic, mo:myspace

URIs (6): timbl:i, dblp:100007, identica:45563, adv:timbl, fb:en.tim_berners-lee, db:Tim-Berners_Lee

Slide41

…reasoning to the rescue?

Slide42

Challenges…

…what (OWL) reasoning is feasible for Linked Data?

Slide43

Linked Data Reasoning: Challenges

Scalable

Expressive

Robust

Domain-Agnostic

Slide44

Scalability

At least tens of billions of statements (for the moment)

Near linear scale!!!

Noisy data

Inconsistencies galore

Publishing errors

Linked Data Reasoning: Challenges

Slide45

Challenges (Semantic Web Wikipedia Article)

Some of the challenges for the Semantic Web include vastness, vagueness, uncertainty, inconsistency and deceit. Automated reasoning systems will have to deal with all of these issues in order to deliver on the promise of the Semantic Web.

Vastness: The World Wide Web contains at least 48 billion pages as of this writing (August 2, 2009). The SNOMED CT medical terminology ontology contains 370,000 class names, and existing technology has not yet been able to eliminate all semantically duplicated terms. Any automated reasoning system will have to deal with truly huge inputs.

Vagueness: These are imprecise concepts like "young" or "tall". This arises from the vagueness of user queries, of concepts represented by content providers, of matching query terms to provider terms and of trying to combine different knowledge bases with overlapping but subtly different concepts. Fuzzy logic is the most common technique for dealing with vagueness.

Uncertainty: These are precise concepts with uncertain values. For example, a patient might present a set of symptoms which correspond to a number of different distinct diagnoses each with a different probability. Probabilistic reasoning techniques are generally employed to address uncertainty.

Inconsistency: These are logical contradictions which will inevitably arise during the development of large ontologies, and when ontologies from separate sources are combined. Deductive reasoning fails catastrophically when faced with inconsistency, because "anything follows from a contradiction". Defeasible reasoning and paraconsistent reasoning are two techniques which can be employed to deal with inconsistency.

Deceit: This is when the producer of the information is intentionally misleading the consumer of the information. Cryptography techniques are currently utilized to ameliorate this threat.

Linked Data Reasoning: Challenges

Slide46

Proposition 1: Web data is noisy.

Proof:

08445a31a78661b5c746feff39a9db6e4e2cc5cf

sha1-sum of ‘mailto:’

common value for foaf:mbox_sha1sum

An inverse-functional (uniquely identifying) property!!!

Any person who shares the same value will be considered the same

Q.E.D.
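The effect described in this proof can be reproduced in a few lines. The Python sketch below hashes mailbox URIs with SHA-1 and groups people by the resulting value; the people and mailboxes are made up for illustration, and `mbox_sha1sum` and `merge_by_mbox` are hypothetical helper names. Running it prints the digest of the bare string `mailto:` so it can be compared with the value quoted on the slide.

```python
import hashlib
from collections import defaultdict


def mbox_sha1sum(mailbox_uri):
    """SHA-1 hex digest of a mailbox URI, as used for foaf:mbox_sha1sum."""
    return hashlib.sha1(mailbox_uri.encode("utf-8")).hexdigest()


def merge_by_mbox(people):
    """Group people sharing a hash, as an inverse-functional property implies."""
    groups = defaultdict(list)
    for person, mailbox_uri in people.items():
        groups[mbox_sha1sum(mailbox_uri)].append(person)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Made-up data: two exporters emitted an empty mailbox ("mailto:" only).
PEOPLE = {
    "ex:alice": "mailto:alice@example.org",
    "ex:bob": "mailto:",
    "ex:carol": "mailto:",
}

if __name__ == "__main__":
    print(mbox_sha1sum("mailto:"))
    print(merge_by_mbox(PEOPLE))
```

Everyone whose exporter hashed an empty mailbox ends up in one group, which is how a single noisy value can merge unrelated people.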

Noisy Data: Omnipotent Being

Slide47

Noisy Data: Redefining everything …and home in time for tea

Alternate proof (courtesy of http://www.eiao.net/rdf/1.0)

rdf:type rdf:type owl:Property .
rdf:type rdfs:label “type”@en .
rdf:type rdfs:comment “Type of resource” .
rdf:type rdfs:domain eiao:testRun .
rdf:type rdfs:domain eiao:pageSurvey .
rdf:type rdfs:domain eiao:siteSurvey .
rdf:type rdfs:domain eiao:scenario .
rdf:type rdfs:domain eiao:rangeLocation .
rdf:type rdfs:domain eiao:startPointer .
rdf:type rdfs:domain eiao:endPointer .
rdf:type rdfs:domain eiao:header .
rdf:type rdfs:domain eiao:runs .

Slide48

foaf:Person owl:disjointWith foaf:Document .

Inconsistent Data: Cannot compute…

Slide49

…herein, we look at (monotonic) rules.

Expressive reasoning is (also) possible through tableaux, but has yet to demonstrate the desired scale

Slide50

Rules

IF (Body/Antecedent/Condition) THEN (Head/Consequent)

?c1 rdfs:subClassOf ?c2 . ?x rdf:type ?c1 . ⇒ ?x rdf:type ?c2 .

foaf:Person rdfs:subClassOf foaf:Agent . (Schema/Terminology/Ontological)
timbl:me rdf:type foaf:Person . (Instance/Assertional)
⇒ timbl:me rdf:type foaf:Agent .

Slide51

Rules (Inconsistencies [a.k.a. Contradictions])

IF … THEN …

?c1 owl:disjointWith ?c2 . ?x rdf:type ?c1 . ?x rdf:type ?c2 . ⇒ false

foaf:Person owl:disjointWith foaf:Document .
ex:sleepygirl rdf:type foaf:Person .
ex:sleepygirl rdf:type foaf:Document .
⇒ false
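The disjointness rule above can be checked mechanically. The following Python sketch represents triples as plain tuples of strings (an assumption made for brevity, not how a triple store works) and reports every resource typed with two disjoint classes; `find_contradictions` is a hypothetical helper name.

```python
def find_contradictions(triples):
    """Return (resource, class1, class2) for each violated owl:disjointWith."""
    disjoint = {(s, o) for s, p, o in triples if p == "owl:disjointWith"}
    # Collect the asserted classes of every resource.
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return sorted(
        (resource, c1, c2)
        for resource, classes in types.items()
        for c1, c2 in disjoint
        if c1 in classes and c2 in classes
    )


# The example from the slide.
TRIPLES = [
    ("foaf:Person", "owl:disjointWith", "foaf:Document"),
    ("ex:sleepygirl", "rdf:type", "foaf:Person"),
    ("ex:sleepygirl", "rdf:type", "foaf:Document"),
]

if __name__ == "__main__":
    print(find_contradictions(TRIPLES))
```

Rather than deriving "false" and stopping, the sketch returns the offending resources, in line with the later point that inconsistencies are useful for finding noise.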

Body/Antecedent/Condition

Head/Consequent

Slide52

Materialisation (Forward-Chaining):

Write the consequences of the rules down
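A minimal sketch of forward-chaining materialisation, assuming triples are plain tuples of strings and only the rdfs:subClassOf rule from the earlier slide is applied. The rule is re-run until no new triples appear (a fixpoint). The class `ex:Thing` is a made-up addition so that a second round of inference is needed; `materialise` is a hypothetical helper name and not the method of any system named in these slides.

```python
def materialise(triples):
    """Apply '?c1 rdfs:subClassOf ?c2 . ?x rdf:type ?c1 => ?x rdf:type ?c2' to a fixpoint."""
    closure = set(triples)
    while True:
        subclass = [(s, o) for s, p, o in closure if p == "rdfs:subClassOf"]
        inferred = {
            (x, "rdf:type", c2)
            for x, p, c1 in closure
            if p == "rdf:type"
            for sub, c2 in subclass
            if sub == c1
        }
        new = inferred - closure
        if not new:
            return closure  # nothing new: every consequence is written down
        closure |= new


INPUT = {
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
    ("foaf:Agent", "rdfs:subClassOf", "ex:Thing"),  # hypothetical extra axiom
    ("timbl:me", "rdf:type", "foaf:Person"),
}

if __name__ == "__main__":
    for triple in sorted(materialise(INPUT) - INPUT):
        print(triple)
```

The output is the input plus the inferred triples, which is the "write the consequences down" idea: queries afterwards need no reasoning at runtime.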

Executing rules: Materialisation

Slide53

Materialisation

Forward-chaining Materialisation

Avoid runtime expense

Users taught impatience by Google

Pre-compute for quick retrieval

Web-scale systems should scale well

More data = more disk-space/machines

One size does not fit all!

Don't materialise too much!

Slide54

INPUT:

Flat file of triples (quads)

OUTPUT:

Flat file of (partial) inferred triples (quads)

Slide55

What rulesets?

“Standard”

RDFS

OWL 2 RL (W3C Rec: 27 Oct. 2009)

“Non-standard”

DLP

pD* (OWL Horst)

OWL–

…

Slide56

Let’s look at a recent corpus of Linked Data and see what schema’s inside

(and what the rulesets support)

Open-domain crawl May 2010

1.1 billion quadruples

3.985 million sources (docs)

780 pay-level domains (e.g., dbpedia.org)

Ran “special” PageRank over documents

86 thousand docs contained some RDFS/OWL schema data (2.2% of docs... but <0.2% of triples)

Summed ranks of docs using each primitive

What rules?

Slide57

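Survey of Linked Data schema: Top 15 ranks

| #  | Axiom                         | Rank(Σ) | RDFS | Horst | O2R |
| 1  | rdfs:subClassOf               | 0.295   | ✓    | ✓     | ✓   |
| 2  | rdfs:range                    | 0.294   | ✓    | ✓     | ✓   |
| 3  | rdfs:domain                   | 0.292   | ✓    | ✓     | ✓   |
| 4  | rdfs:subPropertyOf            | 0.090   | ✓    | ✓     | ✓   |
| 5  | owl:FunctionalProperty        | 0.063   | ✘    | ✓     | ✓   |
| 6  | owl:disjointWith              | 0.049   | ✘    | ✘     | ✓   |
| 7  | owl:inverseOf                 | 0.047   | ✘    | ✓     | ✓   |
| 8  | owl:unionOf                   | 0.035   | ✘    | ✘     | ✓   |
| 9  | owl:SymmetricProperty         | 0.033   | ✘    | ✓     | ✓   |
| 10 | owl:TransitiveProperty        | 0.030   | ✘    | ✓     | ✓   |
| 11 | owl:equivalentClass           | 0.021   | ✘    | ✓     | ✓   |
| 12 | owl:InverseFunctionalProperty | 0.030   | ✘    | ✓     | ✓   |
| 13 | owl:equivalentProperty        | 0.030   | ✘    | ✓     | ✓   |
| 14 | owl:someValuesFrom            | 0.030   | ✘    | ✓     | ✓   |
| 15 | owl:hasValue                  | 0.028   | ✘    | ✓     | ✓   |

Slide58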

What about noise? …

…need to consider the provenance of Web data

Slide59

Consider source of schema data

Class/property URIs dereference to their authoritative document

FOAF spec authoritative for foaf:Person ✓

MY spec not authoritative for foaf:Person ✘

Allow “extension” in third-party documents

my:Person rdfs:subClassOf foaf:Person . (MY spec) ✓

BUT: Reduce obscure memberships

foaf:Person rdfs:subClassOf my:Person . (MY spec) ✘

ALSO: Protect specifications

foaf:knows a owl:SymmetricProperty . (MY spec) ✘
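The three ✓/✘ examples above can be mimicked with a small filter. The Python sketch below is a simplification resting on two loud assumptions: a document is treated as authoritative for every term sharing its prefix (a stand-in for actually dereferencing the URI), and an axiom is accepted only if its source is authoritative for the axiom's subject. That second assumption fits the subclass and symmetric-property cases shown here; other rules (for example those for owl:inverseOf) may need to check a different term. `is_authoritative` and `accept_axiom` are hypothetical names.

```python
def namespace(term):
    """Prefix part of a prefixed name, e.g. 'foaf' for 'foaf:Person'."""
    return term.split(":", 1)[0]


def is_authoritative(source_ns, term):
    # Assumption: the document published under a prefix is the one its
    # terms dereference to, so it is authoritative for exactly those terms.
    return namespace(term) == source_ns


def accept_axiom(source_ns, axiom):
    """Keep a schema axiom only if its source is authoritative for the subject."""
    subject, _predicate, _obj = axiom
    return is_authoritative(source_ns, subject)


if __name__ == "__main__":
    examples = [
        ("my", ("my:Person", "rdfs:subClassOf", "foaf:Person")),
        ("my", ("foaf:Person", "rdfs:subClassOf", "my:Person")),
        ("my", ("foaf:knows", "rdf:type", "owl:SymmetricProperty")),
    ]
    for source, axiom in examples:
        print(source, axiom, accept_axiom(source, axiom))
```

Under these assumptions, third parties can still extend a vocabulary by subclassing its terms, but cannot change what instances of the original terms entail.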

Authoritative Reasoning

Slide60

Noisy Data: Redefining everything …and home in time for tea

More proof (courtesy of http://www.eiao.net/rdf/1.0)

rdf:type rdf:type owl:Property .
rdf:type rdfs:label “type”@en .
rdf:type rdfs:comment “Type of resource” .
rdf:type rdfs:domain eiao:testRun .
rdf:type rdfs:domain eiao:pageSurvey .
rdf:type rdfs:domain eiao:siteSurvey .
rdf:type rdfs:domain eiao:scenario .
rdf:type rdfs:domain eiao:rangeLocation .
rdf:type rdfs:domain eiao:startPointer .
rdf:type rdfs:domain eiao:endPointer .
rdf:type rdfs:domain eiao:header .
rdf:type rdfs:domain eiao:runs .

Not Authoritative

Slide61

Authoritative Reasoning: read more …w/ essential plugs

Gong Cheng, Yuzhong Qu. "Integrating Lightweight Reasoning into Class-Based Query Refinement for Object Search." ASWC 2008.

Aidan Hogan, Andreas Harth, Axel Polleres. "Scalable Authoritative OWL Reasoning for the Web." IJSWIS 2009.

Aidan Hogan, Jeff Z. Pan, Axel Polleres and Stefan Decker. "SAOR: Template Rule Optimisations for Distributed Reasoning over 1 Billion Linked Data Triples." ISWC 2010.

My thesis: http://aidanhogan.com/docs/thesis/ (or use Google).

Slide62

Alternative to Authoritative Reasoning?

Quarantined reasoning!

Separate and cache hierarchy of schema documents/dependencies…

Slide63

Quarantined Reasoning [Delbru et al.; 2008]

Slide64

Quarantined Reasoning [Delbru et al.; 2008]

Slide65

Quarantined Reasoning [Delbru et al.; 2008]

Slide66

Quarantined Reasoning [Delbru et al.; 2008]

A-Box / Instance Data (e.g., a FOAF file)

T-Box / Ontology Data (e.g., the FOAF ontology and its indirect imports)

Slide67

Noisy Data: Redefining everything …and home in time for tea

More proof (courtesy of http://www.eiao.net/rdf/1.0)

rdf:type rdf:type owl:Property .
rdf:type rdfs:label “type”@en .
rdf:type rdfs:comment “Type of resource” .
rdf:type rdfs:domain eiao:testRun .
rdf:type rdfs:domain eiao:pageSurvey .
rdf:type rdfs:domain eiao:siteSurvey .
rdf:type rdfs:domain eiao:scenario .
rdf:type rdfs:domain eiao:rangeLocation .
rdf:type rdfs:domain eiao:startPointer .
rdf:type rdfs:domain eiao:endPointer .
rdf:type rdfs:domain eiao:header .
rdf:type rdfs:domain eiao:runs .

Not In Here

Slide68

Quarantined Reasoning: read more

R. Delbru, A. Polleres, G. Tummarello and S. Decker. "Context Dependent Reasoning for Semantic Documents in Sindice." 4th International Workshop on Scalable Semantic Web Knowledge Base Systems, 2008.

Slide69

…what about owl:sameAs?

Slide70

Consolidation for Linked Data

Slide71

Consolidation: Baseline

Use provided owl:sameAs mappings in the data

timbl:i owl:sameAs identica:45563 .
dbpedia:Berners-Lee owl:sameAs identica:45563 .

Store “equivalences” found

timbl:i ->
identica:45563 ->
dbpedia:Berners-Lee ->

{ timbl:i, identica:45563, dbpedia:Berners-Lee }

Slide72

For each set of equivalent identifiers, choose a canonical term

timbl:i

identica:45563

dbpedia:Berners-Lee

Consolidation: Baseline

Slide73

Canonicalisation

Afterwards, rewrite identifiers to their canonical version:

Before:

timbl:i rdf:type foaf:Person .
identica:48404 foaf:knows identica:45563 .
dbpedia:Berners-Lee dpo:birthDate “1955-06-08”^^xsd:date .

After:

dbpedia:Berners-Lee rdf:type foaf:Person .
identica:48404 foaf:knows dbpedia:Berners-Lee .
dbpedia:Berners-Lee dpo:birthDate “1955-06-08”^^xsd:date .
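The baseline consolidation steps (collect owl:sameAs pairs, build equivalence sets, pick a canonical term, rewrite) can be sketched with a union-find structure. In the Python sketch below, triples are plain tuples of strings and the canonical term is simply the lexicographically smallest identifier in each set; that choice is an assumption made for this example (it happens to select dbpedia:Berners-Lee, as on the slide), since the slides do not say how the canonical term is picked. `consolidate` is a hypothetical helper name.

```python
def consolidate(triples):
    """Rewrite identifiers to a canonical member of their owl:sameAs set."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            low, high = sorted((root_a, root_b))
            parent[high] = low  # the smaller identifier stays the root

    for s, p, o in triples:
        if p == "owl:sameAs":
            union(s, o)

    def canonical(term):
        return find(term) if term in parent else term

    # The owl:sameAs triples are used for the mapping and left out of the output.
    return [
        (canonical(s), p, canonical(o))
        for s, p, o in triples
        if p != "owl:sameAs"
    ]


TRIPLES = [
    ("timbl:i", "owl:sameAs", "identica:45563"),
    ("dbpedia:Berners-Lee", "owl:sameAs", "identica:45563"),
    ("timbl:i", "rdf:type", "foaf:Person"),
    ("identica:48404", "foaf:knows", "identica:45563"),
    ("dbpedia:Berners-Lee", "dpo:birthDate", '"1955-06-08"^^xsd:date'),
]

if __name__ == "__main__":
    for triple in consolidate(TRIPLES):
        print(triple)
```

Union-find keeps the equivalence sets cheap to maintain even when the owl:sameAs pairs arrive in arbitrary order.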

{ timbl:i, identica:45563, dbpedia:Berners-Lee }

Slide74

Infer owl:sameAs through reasoning (OWL 2 RL/RDF)

explicit owl:sameAs (again)

owl:InverseFunctionalProperty

owl:FunctionalProperty

owl:cardinality 1 / owl:maxCardinality 1

foaf:homepage a owl:InverseFunctionalProperty .
timbl:i foaf:homepage w3c:timblhomepage .
adv:timbl foaf:homepage w3c:timblhomepage .
⇒ timbl:i owl:sameAs adv:timbl .
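The inverse-functional case above can be sketched as a grouping step: subjects that share the same value for an inverse-functional property are inferred to be the same. The Python sketch below again uses plain tuples of strings and writes `a` out as rdf:type; `infer_same_as` is a hypothetical helper name. Because owl:sameAs is symmetric, each pair is emitted once, in sorted order.

```python
from collections import defaultdict
from itertools import combinations


def infer_same_as(triples):
    """Infer owl:sameAs between subjects sharing an inverse-functional value."""
    ifps = {
        s for s, p, o in triples
        if p == "rdf:type" and o == "owl:InverseFunctionalProperty"
    }
    # Group subjects by (property, value) for every inverse-functional property.
    subjects_by_value = defaultdict(set)
    for s, p, o in triples:
        if p in ifps:
            subjects_by_value[(p, o)].add(s)
    inferred = set()
    for subjects in subjects_by_value.values():
        for a, b in combinations(sorted(subjects), 2):
            inferred.add((a, "owl:sameAs", b))
    return inferred


TRIPLES = [
    ("foaf:homepage", "rdf:type", "owl:InverseFunctionalProperty"),
    ("timbl:i", "foaf:homepage", "w3c:timblhomepage"),
    ("adv:timbl", "foaf:homepage", "w3c:timblhomepage"),
]

if __name__ == "__main__":
    print(infer_same_as(TRIPLES))
```

The inferred pairs can then be fed into the same consolidation step used for explicit owl:sameAs triples.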

…then apply consolidation as before

Extended Consolidation

Slide75

Consolidation: Results

For our Linked Data corpus:

~12 million explicit owl:sameAs triples (as before)

~8.7 million thru. owl:InverseFunctionalProperty

~106 thousand thru. owl:FunctionalProperty

none thru. owl:cardinality/owl:maxCardinality

In terms of equivalences found (baseline vs. extended):

~2.8 million sets of equivalent identifiers (1.31x baseline)

~14.86 million identifiers involved (2.58x baseline)

~5.8 million URIs!! (1.014x baseline)!!

Slide76

Conclusion…

Slide77

Conclusions

Heterogeneity poses a significant problem for consuming Linked Data

Heterogeneity in schema

Heterogeneity in naming

…but we can use the mappings provided by publishers to integrate heterogeneous Linked Data corpora (with a little caution)

Lightweight rule-based reasoning can go a long way

Deceit/Noise ≠ End Of World

Consider source of data!

Inconsistency ≠ End Of World

Useful for finding noise in fact!

Explicit owl:sameAs vs. extended consolidation:

Extended consolidation mostly (but not entirely) for consolidating blank-nodes from older FOAF exporters

Slide78

Next up…

How can we reason at Web scale?

Scalable/distributed rule-based materialisation over MapReduce using the WebPIE system

Slide79

timbl:i foaf:page ?pages .

{ timbl:i, identica:45563, dbpedia:Berners-Lee }

dbpedia:Berners-Lee foaf:page ?pages .

Slide80

Authoritative Reasoning (Appendix)

OWL 2 RL rule prp-inv1

?p1 owl:inverseOf ?p2 . ?x ?p1 ?y . ⇒ ?y ?p2 ?x .

OWL 2 RL rule prp-inv2

?p1 owl:inverseOf ?p2 . ?x ?p2 ?y . ⇒ ?y ?p1 ?x .

TBOX:

foo:doesntKnow owl:inverseOf foaf:knows . (from foo:)

ABOX:

bar:Aidan foo:doesntKnow bar:Axel .
bar:Stefan foaf:knows bar:Jeff .

AUTHORITATIVE INFERENCE:

bar:Axel foaf:knows bar:Aidan .
bar:Jeff foo:doesntKnow bar:Stefan .
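The two rules can be applied to this example with a short Python sketch. Triples are plain tuples of strings, and `apply_inverse_rules` is a hypothetical helper name. Each inference is tagged with the rule that produced it and with an authority flag computed under a stated assumption: a rule application counts as authoritative when the axiom's source (here the prefix `foo`) is authoritative for the property that the rule matches against the instance data, and a source is authoritative for exactly the terms sharing its prefix. The transcript does not preserve how the slide marked the two inferences, so the flags reflect this assumption only.

```python
def namespace(term):
    """Prefix part of a prefixed name, e.g. 'foo' for 'foo:doesntKnow'."""
    return term.split(":", 1)[0]


def apply_inverse_rules(tbox, abox, source_ns):
    """Apply prp-inv1 and prp-inv2; return (triple, rule, authoritative?) tuples."""
    results = []
    for p1, relation, p2 in tbox:
        if relation != "owl:inverseOf":
            continue
        for x, p, y in abox:
            if p == p1:
                # prp-inv1: the instance triple uses ?p1.
                results.append(((y, p2, x), "prp-inv1", namespace(p1) == source_ns))
            if p == p2:
                # prp-inv2: the instance triple uses ?p2.
                results.append(((y, p1, x), "prp-inv2", namespace(p2) == source_ns))
    return results


TBOX = [("foo:doesntKnow", "owl:inverseOf", "foaf:knows")]  # served from foo:
ABOX = [
    ("bar:Aidan", "foo:doesntKnow", "bar:Axel"),
    ("bar:Stefan", "foaf:knows", "bar:Jeff"),
]

if __name__ == "__main__":
    for inference in apply_inverse_rules(TBOX, ABOX, "foo"):
        print(inference)
```

Under this reading, an axiom published under `foo:` can drive inferences from data that uses `foo:doesntKnow`, but not from data that only uses `foaf:knows`.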