Slide1
Scalable Integration and Processing of Linked Data
Andreas Harth,
Aidan Hogan,
Spyros Kotoulas,
Jacopo Urbani
Slide2
Outline
Session 1: Introduction to Linked Data
Foundations and Architectures
Crawling and Indexing
Querying
Session 2: Integrating Web Data with Reasoning
Introduction to RDFS/OWL on the Web
Introduction and Motivation for Reasoning
Session 3: Distributed Reasoning: Because Size Matters
Problems and Challenges
MapReduce and WebPIE
Session 4: Putting Things Together (Demo)
The LarKC Platform
Implementing a LarKC Workflow
Slide3
PART 1: How can we query Linked Data?
PART 2: How can we reason over Linked Data? (start of Session 2)
Slide4
Answer: SPARQL (W3C Rec. 2008)
…SPARQL 1.1 upcoming (W3C Rec. 201?)
Slide5
Introducing SPARQL
SPARQL Protocol and RDF Query Language (SPARQL)
Standardised query language (and supporting recommendations) for querying RDF
~SQL-like language
…but only if you squint
…and without the vendor-specific headaches
Slide6
The anatomy of a typical SPARQL query
Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX oo: <http://purl.org/openorg/>
SELECT ?name ?expertise
FROM NAMED <http://data.southampton.ac.uk/>
WHERE {
  ?person foaf:name ?name ;
          foaf:familyName ?surname .
  ?person rdf:type foaf:Person .
  ?person foaf:title ?title .
  FILTER regex(?title, "^Prof")
  OPTIONAL {
    ?person oo:availableToCommentOn ?expertiseURI .
    ?expertiseURI rdfs:label ?expertise
  }
}
ORDER BY ?surname
PREFIX DECLARATIONS
RESULT CLAUSE
QUERY CLAUSE
SOLUTION MODIFIERS
DATASET CLAUSE
Slide8
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX oo: <http://purl.org/openorg/>
Prefix Declarations
foaf:Person ⇔ <http://xmlns.com/foaf/0.1/Person>
Use http://prefix.cc/ …
PREFIX DECLARATIONS
Slide10
Result Clause
SELECT ?name ?expertise
1. SELECT
2. CONSTRUCT (RDF)
3. ASK
4. DESCRIBE (RDF)
RESULT CLAUSE
Slide11
Result Clause 1. SELECT…
SELECT ?name ?expertise
Return all tuples for the bindings of the variables ?name and ?expertise
-----------------------------------------------------------
| “Professor Robert Allen”  | “Control engineering”    |
| “Professor Robert Allen”  | “Biomedical engineering” |
| “Prof Carl Leonetto Amos” |                          |
| “Professor Peter Ashburn” | “Silicon technology”     |
| “Professor Robert Allen”  | “Control engineering”    |
-----------------------------------------------------------
RESULT CLAUSE
Slide12
Result Clause 1. SELECT DISTINCT…
SELECT DISTINCT ?name ?expertise
Return all unique tuples for the bindings of the variables ?name and ?expertise
-----------------------------------------------------------
| “Professor Robert Allen”  | “Control engineering”    |
| “Professor Robert Allen”  | “Biomedical engineering” |
| “Prof Carl Leonetto Amos” |                          |
| “Professor Peter Ashburn” | “Silicon technology”     |
-----------------------------------------------------------
(the repeated “Professor Robert Allen” | “Control engineering” row is dropped by DISTINCT)
RESULT CLAUSE
Slide13
Result Clause 2. CONSTRUCT…
CONSTRUCT {
  ?person foaf:name ?name ;
          ex:expertise ?expertise .
}
Return RDF using bindings for the variables:
ex:RAllen foaf:name “Professor Robert Allen” ;
          ex:expertise “Biomedical engineering” , “Control engineering” .
ex:PAshburn foaf:name “Professor Peter Ashburn” ;
            ex:expertise “Silicon technology” .
RESULT CLAUSE
Slide14
Result Clause 3. ASK…
ASK … WHERE { … }
Are there any results?
Returns: true or false
RESULT CLAUSE
Slide15
Result Clause 4. DESCRIBE…
DESCRIBE ?person … WHERE { ?person … }
Returns some RDF which “describes” the given resource…
No standard for what to return!
Typically returns:
all triples where the given resource appears as subject and/or object
OR
Concise Bounded Descriptions…
RESULT CLAUSE
Slide16
Result Clause 4. DESCRIBE (DIRECT)…
DESCRIBE ex:RAllen
(…can give URIs directly without need for a WHERE clause.)
RESULT CLAUSE
Slide18
Dataset clause (FROM / FROM NAMED)
FROM NAMED <http://data.southampton.ac.uk/>
(Briefly)
Restrict the dataset against which you wish to query
SPARQL stores named graphs: sets of triples which are associated with (URI) names
Can match across graphs!
Named graphs typically correspond with data provenance (i.e., documents)!
Default graph typically corresponds to the merge of all graphs
Many engines will typically dereference a graph if not available locally!
DATASET CLAUSE
Slide20
Query clause (WHERE)
WHERE {
  ?person foaf:name ?name ;
          foaf:familyName ?surname .
  ?person rdf:type foaf:Person .
  ?person foaf:title ?title .
  FILTER regex(?title, "^Prof")
  OPTIONAL {
    ?person oo:availableToCommentOn ?expertiseURI .
    ?expertiseURI rdfs:label ?expertise
  }
}
QUERY CLAUSE
[Figure: an example match in the data. ex:PAshburn ✓ with name “Professor Peter Ashburn”, title “Professor” and surname “Ashburn”; ex:Silicon ✓ with label “Silicon technology”]
Slide21
Quick mention for UNION
WHERE { …
  { ?person oo:availableToCommentOn ?expertiseURI . }
  UNION
  { ?person foaf:interest ?expertiseURI . }
… }
QUERY CLAUSE
Represents disjunction (OR)
Useful when there’s more than one property/class that represents the same information you’re interested in (heterogeneity)
Reasoning can also help, assuming terms are mapped (more later)
Slide23
Solution Modifiers
ORDER BY ?surname
Order output results by surname (as you probably guessed)
…also… LIMIT and OFFSET
ORDER BY ?surname LIMIT 10
Only return 10 results
ORDER BY ?surname LIMIT 10 OFFSET 20
Skip the first 20 results and return results 21‒30
SOLUTION MODIFIERS
Slide24
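The solution modifiers can be mimicked over an in-memory list of solutions. The sketch below is only an illustration in Python, not how a SPARQL engine is implemented; the rows and the `apply_modifiers` helper are made up for the example.

```python
# Illustrative only: ORDER BY / LIMIT / OFFSET expressed as sort + slice.
# The surnames are synthetic placeholders, not data from the Southampton dataset.
rows = [{"name": f"Professor S{i:02d}", "surname": f"S{i:02d}"} for i in range(34, -1, -1)]


def apply_modifiers(solutions, order_key, limit=None, offset=0):
    """Sort solutions by one variable, then skip `offset` rows and keep `limit` rows."""
    ordered = sorted(solutions, key=lambda row: row[order_key])
    end = None if limit is None else offset + limit
    return ordered[offset:end]


# ORDER BY ?surname LIMIT 10
first_page = apply_modifiers(rows, "surname", limit=10)

# ORDER BY ?surname LIMIT 10 OFFSET 20: skips 20 solutions, returns the next 10
third_page = apply_modifiers(rows, "surname", limit=10, offset=20)
```

OFFSET counts skipped solutions, so OFFSET 20 with LIMIT 10 yields the 21st to the 30th solution in the chosen order.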
The summary of a typical SPARQL query
PREFIX DECLARATIONS: Shortcuts for URIs
RESULT CLAUSE: Which results do you want?
DATASET CLAUSE: Where should we look?
QUERY CLAUSE: What are you looking for?
SOLUTION MODIFIERS: How should results be ordered/split?
Slide25
Trying out a typical SPARQL query
Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname
Slide26
List of Public SPARQL Endpoints:
SparqlEndpoints (W3C Wiki)
http://www.w3.org/wiki/SparqlEndpoints
(or just use Google)
Slide27
Coming Soon:
SPARQL 1.1
Currently a W3C Working Draft
http://www.w3.org/TR/sparql11-query/
(or just use Google)
Slide28
Highly recommend checking out:
“SPARQL by example”
By Cambridge Semantics
Lee Feigenbaum & Eric Prud'hommeaux
http://www.cambridgesemantics.com/2008/09/sparql-by-example/
(or just use Google)
Slide29
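Public endpoints are usually queried over HTTP, with the query text passed in a `query` parameter as defined by the SPARQL Protocol. The sketch below only builds such a request URL; the endpoint address is a placeholder (an assumption, not one of the listed endpoints) and nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint for illustration; substitute a real one from the list.
ENDPOINT = "http://example.org/sparql"

QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?name WHERE {
  ?person foaf:name ?name .
} LIMIT 10
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode a query as a SPARQL Protocol GET request (the 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)

# To actually run it, one could fetch `url` with urllib.request.urlopen and ask
# for a result format via the Accept header (e.g. application/sparql-results+json).
```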
After the break…
Session 1: Introduction to Linked Data
Foundations and Architectures
Crawling and Indexing
Querying
Session 2: Integrating Web Data with Reasoning
Introduction to RDFS/OWL on the Web
Introduction and Motivation for Reasoning
Session 3: Distributed Reasoning: Because Size Matters
Problems and Challenges
MapReduce and WebPIE
Session 4: Putting Things Together (Demo)
The LarKC Platform
Implementing a LarKC Workflow
Slide30
During the break…
Question: Find the people who have won both an academy award for best director and a raspberry award for worst director
Endpoint: (that is, if you want to use SPARQL… feel free to use whatever) http://dbpedia.org/sparql/ or http://google.com/ (to make it fair)
Hint: Look at http://dbpedia.org/page/Michael_Bay and http://dbpedia.org/page/Woody_Allen for examples (The same prefixes therein are understood by the endpoint, …so no need to declare them in the query)
Slide31
And the answer is…
The Winning (?) Query:
SELECT DISTINCT ?name
WHERE {
  ?director dcterms:subject category:Worst_Director_Golden_Raspberry_Award_winners ,
                            category:Best_Director_Academy_Award_winners ;
            foaf:name ?name .
}
The Answer:
…
Slide32
PART 1: How can we query Linked Data?
PART 2: How can we reason over Linked Data?
…and why?!
Slide33
A Web of Data
[Figure: snapshots of the Linked Data cloud diagram from August 2007, November 2007, February 2008, March 2008, September 2008, March 2009, July 2009 and September 2010]
Images from: http://richard.cyganiak.de/2007/10/lod/ ; Cyganiak, Jentzsch
Slide34
Reasoning
[Figure labels: explicit data, implicit data]
How can consumers query the implicit data?
Slide35
…so what’s The Problem?…
…heterogeneity
…need to integrate data from different sources
Slide36
Take Query Answering…
Gimme webpages relating to Tim Berners-Lee
timbl:i foaf:page ?pages .
Slide37
Heterogeneity in schema…
webpage: properties
foaf:page
foaf:homepage
foaf:isPrimaryTopicOf
foaf:weblog
doap:homepage
foaf:topic
foaf:primaryTopic
mo:musicBrainz
mo:myspace
…
[Figure legend: links between the properties denote rdfs:subPropertyOf and owl:inverseOf]
Slide38
Linked Data, RDFS and OWL: Linked Vocabularies
[Figure: constellation of linked vocabularies, including SKOS]
Image from: http://blog.dbtune.org/public/.081005_lod_constellation_m.jpg ; Giasson, Bergman
Slide39
Heterogeneity in naming…
Tim Berners-Lee: URIs
timbl:i
dblp:100007
identica:45563
adv:timbl
fb:en.tim_berners-lee
db:Tim-Berners_Lee
…
[Figure legend: links between the URIs denote owl:sameAs]
Slide40
Returning to our simple query…
Gimme webpages relating to Tim Berners-Lee
timbl:i foaf:page ?pages .
... 7 x 6 = 42 possible patterns
Properties: foaf:page, foaf:homepage, foaf:isPrimaryTopicOf, doap:homepage, foaf:topic, foaf:primaryTopic, mo:myspace
Identifiers: timbl:i, dblp:100007, identica:45563, adv:timbl, fb:en.tim_berners-lee, db:Tim-Berners_Lee
Slide41
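To make the 7 x 6 = 42 count concrete, the sketch below enumerates one triple pattern per combination of property and identifier and joins them into a single UNION query. Treating foaf:topic and foaf:primaryTopic as pointing from the page to the person is this sketch's reading of the owl:inverseOf links; the helper names are invented for illustration.

```python
from itertools import product

# (property, direction): "forward" is  <id> <property> ?pages,
# "inverse" is  ?pages <property> <id>  (assumed for the two topic properties).
PROPERTIES = [
    ("foaf:page", "forward"),
    ("foaf:homepage", "forward"),
    ("foaf:isPrimaryTopicOf", "forward"),
    ("doap:homepage", "forward"),
    ("mo:myspace", "forward"),
    ("foaf:topic", "inverse"),
    ("foaf:primaryTopic", "inverse"),
]

IDENTIFIERS = [
    "timbl:i", "dblp:100007", "identica:45563",
    "adv:timbl", "fb:en.tim_berners-lee", "db:Tim-Berners_Lee",
]


def triple_patterns():
    """One pattern per (property, identifier) pair."""
    patterns = []
    for (prop, direction), ident in product(PROPERTIES, IDENTIFIERS):
        if direction == "forward":
            patterns.append(f"{ident} {prop} ?pages .")
        else:
            patterns.append(f"?pages {prop} {ident} .")
    return patterns


def union_query():
    """Join all patterns into one (unwieldy) UNION query."""
    branches = " UNION ".join("{ " + p + " }" for p in triple_patterns())
    return "SELECT DISTINCT ?pages WHERE { " + branches + " }"


patterns = triple_patterns()
query = union_query()
```

A query with this many UNION branches is exactly what a user should not have to write by hand, which motivates pushing the mappings into reasoning instead.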
…reasoning to the rescue?
Slide42
Challenges…
…what (OWL) reasoning is feasible for Linked Data?
Slide43
Linked Data Reasoning: Challenges
Scalable
Expressive
Robust
Domain-Agnostic
Slide44
Linked Data Reasoning: Challenges
Scalability
At least tens of billions of statements (for the moment)
Near linear scale!!!
Noisy data
Inconsistencies galore
Publishing errors
Slide45
Linked Data Reasoning: Challenges
Challenges (Semantic Web Wikipedia Article)
Some of the challenges for the Semantic Web include vastness, vagueness, uncertainty, inconsistency and deceit. Automated reasoning systems will have to deal with all of these issues in order to deliver on the promise of the Semantic Web.
Vastness: The World Wide Web contains at least 48 billion pages as of this writing (August 2, 2009). The SNOMED CT medical terminology ontology contains 370,000 class names, and existing technology has not yet been able to eliminate all semantically duplicated terms. Any automated reasoning system will have to deal with truly huge inputs.
Vagueness: These are imprecise concepts like "young" or "tall". This arises from the vagueness of user queries, of concepts represented by content providers, of matching query terms to provider terms and of trying to combine different knowledge bases with overlapping but subtly different concepts. Fuzzy logic is the most common technique for dealing with vagueness.
Uncertainty: These are precise concepts with uncertain values. For example, a patient might present a set of symptoms which correspond to a number of different distinct diagnoses each with a different probability. Probabilistic reasoning techniques are generally employed to address uncertainty.
Inconsistency: These are logical contradictions which will inevitably arise during the development of large ontologies, and when ontologies from separate sources are combined. Deductive reasoning fails catastrophically when faced with inconsistency, because "anything follows from a contradiction". Defeasible reasoning and paraconsistent reasoning are two techniques which can be employed to deal with inconsistency.
Deceit: This is when the producer of the information is intentionally misleading the consumer of the information. Cryptography techniques are currently utilized to ameliorate this threat.
Slide46
Noisy Data: Omnipotent Being
Proposition 1: Web data is noisy.
Proof:
08445a31a78661b5c746feff39a9db6e4e2cc5cf
sha1-sum of ‘mailto:’
common value for foaf:mbox_sha1sum
An inverse-functional (uniquely identifying) property!!!
Any person who shares the same value will be considered the same
Q.E.D.
Slide47
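The "omnipotent being" arises when an exporter hashes an empty mailbox, i.e. the bare string ‘mailto:’. A reader can check the value quoted in the proof with a few lines of Python; the helper name is invented here, and the sketch assumes the usual FOAF convention of hashing the full mailto: URI.

```python
import hashlib


def mbox_sha1sum(mailto_uri: str) -> str:
    """SHA-1 hex digest of a mailto: URI, as used for foaf:mbox_sha1sum."""
    return hashlib.sha1(mailto_uri.encode("utf-8")).hexdigest()


# Value quoted in the proof above.
QUOTED_VALUE = "08445a31a78661b5c746feff39a9db6e4e2cc5cf"

# What an exporter produces when the e-mail address is missing.
empty_digest = mbox_sha1sum("mailto:")

# True if the quoted value is indeed the hash of the empty mailbox.
matches_quoted_value = empty_digest == QUOTED_VALUE
print(empty_digest, matches_quoted_value)
```

Every profile exported with a missing address carries the same digest, so an inverse-functional reading of foaf:mbox_sha1sum merges all of those people into one.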
Noisy Data: Redefining everything …and home in time for tea
Alternate proof (courtesy of http://www.eiao.net/rdf/1.0 )
rdf:type rdf:type owl:Property .
rdf:type rdfs:label “type”@en .
rdf:type rdfs:comment “Type of resource” .
rdf:type rdfs:domain eiao:testRun .
rdf:type rdfs:domain eiao:pageSurvey .
rdf:type rdfs:domain eiao:siteSurvey .
rdf:type rdfs:domain eiao:scenario .
rdf:type rdfs:domain eiao:rangeLocation .
rdf:type rdfs:domain eiao:startPointer .
rdf:type rdfs:domain eiao:endPointer .
rdf:type rdfs:domain eiao:header .
rdf:type rdfs:domain eiao:runs .
Slide48
Inconsistent Data: Cannot compute…
foaf:Person owl:disjointWith foaf:Document .
Slide49
…herein, we look at (monotonic) rules.
Expressive reasoning is (also) possible through tableaux, but has yet to demonstrate the desired scale
Slide50
Rules
IF ⇒ THEN
Body/Antecedent/Condition ⇒ Head/Consequent
?c1 rdfs:subClassOf ?c2 . ?x rdf:type ?c1 . ⇒ ?x rdf:type ?c2 .
foaf:Person rdfs:subClassOf foaf:Agent . (Schema/Terminology/Ontological)
timbl:me rdf:type foaf:Person . (Instance/Assertional)
⇒ timbl:me rdf:type foaf:Agent .
Slide51
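A single application of the subclass rule can be sketched over triples held as Python tuples. This is a naive nested-loop illustration with invented names, not the implementation used by any of the systems discussed.

```python
# Triples as (subject, predicate, object) tuples, using prefixed names as plain strings.
triples = {
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),  # schema / terminological
    ("timbl:me", "rdf:type", "foaf:Person"),           # instance / assertional
}


def apply_subclass_rule(data):
    """?c1 rdfs:subClassOf ?c2 . ?x rdf:type ?c1 .  =>  ?x rdf:type ?c2 ."""
    inferred = set()
    for c1, p, c2 in data:
        if p != "rdfs:subClassOf":
            continue
        for x, q, cls in data:
            if q == "rdf:type" and cls == c1:
                inferred.add((x, "rdf:type", c2))
    # Only report triples that were not already explicit.
    return inferred - set(data)


new_triples = apply_subclass_rule(triples)
```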
Rules (Inconsistencies [a.k.a. Contradictions])
IF ⇒ THEN
Body/Antecedent/Condition ⇒ Head/Consequent
?c1 owl:disjointWith ?c2 . ?x rdf:type ?c1 . ?x rdf:type ?c2 . ⇒ false
foaf:Person owl:disjointWith foaf:Document .
ex:sleepygirl rdf:type foaf:Person .
ex:sleepygirl rdf:type foaf:Document .
⇒ false
Slide52
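Rules with a false head do not add triples; they flag contradictions. A minimal sketch of the disjointness check, again over tuples and with an invented function name:

```python
triples = [
    ("foaf:Person", "owl:disjointWith", "foaf:Document"),
    ("ex:sleepygirl", "rdf:type", "foaf:Person"),
    ("ex:sleepygirl", "rdf:type", "foaf:Document"),
    ("timbl:me", "rdf:type", "foaf:Person"),
]


def find_inconsistencies(data):
    """Return (instance, class1, class2) for every violated owl:disjointWith axiom."""
    # Index the classes asserted for each instance.
    types = {}
    for s, p, o in data:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)

    violations = []
    for c1, p, c2 in data:
        if p != "owl:disjointWith":
            continue
        for instance, classes in types.items():
            if c1 in classes and c2 in classes:
                violations.append((instance, c1, c2))
    return violations


violations = find_inconsistencies(triples)
```

Reporting the offending instance, rather than just "false", is what makes inconsistency useful for locating noise.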
Executing rules: Materialisation
Materialisation (Forward-Chaining):
Write the consequences of the rules down
Slide53
Materialisation
Forward-chaining Materialisation
Avoid runtime expense
Users taught impatience by Google
Pre-compute for quick retrieval
Web-scale systems should scale well
More data = more disk-space/machines
One size does not fit all!
Don't materialise too much!
Slide54
INPUT:
Flat file of triples (quads)
OUTPUT:
Flat file of (partial) inferred triples (quads)
Slide55
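Materialisation repeats rule application until nothing new is produced, then writes out the inferred triples. The sketch below assumes a toy line format ("s p o ." with prefixed names, not real N-Triples), covers only two RDFS rules, and keeps everything in memory, which is precisely what a Web-scale system cannot do; `ex:Thing` is a made-up class added to force a second iteration.

```python
def parse_line(line):
    """Parse a toy 's p o .' line into a (s, p, o) tuple."""
    s, p, o = line.strip().rstrip(" .").split(None, 2)
    return (s, p, o)


def materialise(lines):
    """Forward-chain subClassOf/subPropertyOf rules to a fixpoint; return inferred triples."""
    explicit = {parse_line(line) for line in lines if line.strip()}
    closure = set(explicit)
    while True:
        new = set()
        for s, p, o in closure:
            if p == "rdfs:subClassOf":
                # ?x rdf:type s  =>  ?x rdf:type o
                new |= {(x, "rdf:type", o) for x, q, c in closure if q == "rdf:type" and c == s}
            elif p == "rdfs:subPropertyOf":
                # ?x s ?y  =>  ?x o ?y
                new |= {(x, o, y) for x, q, y in closure if q == s}
        new -= closure
        if not new:  # fixpoint reached: nothing left to infer
            break
        closure |= new
    return closure - explicit


input_lines = [
    "foaf:Person rdfs:subClassOf foaf:Agent .",
    "foaf:Agent rdfs:subClassOf ex:Thing .",
    "foaf:homepage rdfs:subPropertyOf foaf:page .",
    "timbl:me rdf:type foaf:Person .",
    "timbl:me foaf:homepage w3c:timblhomepage .",
]

inferred = materialise(input_lines)
output_lines = sorted(f"{s} {p} {o} ." for s, p, o in inferred)
```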
What rulesets?
“Standard”
RDFS
OWL 2 RL (W3C Rec: 27 Oct. 2009)
“Non-standard”
DLP
pD* (OWL Horst)
OWL–
…
Slide56
What rules?
Let’s look at a recent corpus of Linked Data and see what schema’s inside (and what the rulesets support)
Open-domain crawl May 2010
1.1 billion quadruples
3.985 million sources (docs)
780 pay-level domains (e.g., dbpedia.org)
Ran “special” PageRank over documents
86 thousand docs contained some RDFS/OWL schema data (2.2% of docs... but <0.2% of triples)
Summed ranks of docs using each primitive
Slide57
Survey of Linked Data schema: Top 15 ranks
#  | Axiom                         | Rank(Σ) | RDFS | Horst | O2R
1  | rdfs:subClassOf               | 0.295   | ✓    | ✓     | ✓
2  | rdfs:range                    | 0.294   | ✓    | ✓     | ✓
3  | rdfs:domain                   | 0.292   | ✓    | ✓     | ✓
4  | rdfs:subPropertyOf            | 0.090   | ✓    | ✓     | ✓
5  | owl:FunctionalProperty        | 0.063   | ✘    | ✓     | ✓
6  | owl:disjointWith              | 0.049   | ✘    | ✘     | ✓
7  | owl:inverseOf                 | 0.047   | ✘    | ✓     | ✓
8  | owl:unionOf                   | 0.035   | ✘    | ✘     | ✓
9  | owl:SymmetricProperty         | 0.033   | ✘    | ✓     | ✓
10 | owl:TransitiveProperty        | 0.030   | ✘    | ✓     | ✓
11 | owl:equivalentClass           | 0.021   | ✘    | ✓     | ✓
12 | owl:InverseFunctionalProperty | 0.030   | ✘    | ✓     | ✓
13 | owl:equivalentProperty        | 0.030   | ✘    | ✓     | ✓
14 | owl:someValuesFrom            | 0.030   | ✘    | ✓     | ✓
15 | owl:hasValue                  | 0.028   | ✘    | ✓     | ✓
Slide58
What about noise? …
…need to consider the provenance of Web data
Slide59
Authoritative Reasoning
Consider source of schema data
Class/property URIs dereference to their authoritative document
FOAF spec authoritative for foaf:Person ✓
MY spec not authoritative for foaf:Person ✘
Allow “extension” in third-party documents
my:Person rdfs:subClassOf foaf:Person . (MY spec) ✓
BUT: Reduce obscure memberships
foaf:Person rdfs:subClassOf my:Person . (MY spec) ✘
ALSO: Protect specifications
foaf:knows a owl:SymmetricProperty . (MY spec) ✘
Slide60
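The three MY spec examples can be reproduced with a deliberately simplified check: accept a schema triple only if its source document is the one the subject term dereferences to. This is an assumption made for the sketch; the full definition of authoritative reasoning is rule-specific. The dereferencing map and function names are hypothetical stand-ins for real HTTP lookups.

```python
# Hypothetical lookup table: which document each term URI dereferences to.
DEREFERENCES_TO = {
    "foaf:Person": "FOAF spec",
    "foaf:knows": "FOAF spec",
    "my:Person": "MY spec",
}


def is_authoritative(source_doc, term):
    """A document is authoritative for a term if the term dereferences to it."""
    return DEREFERENCES_TO.get(term) == source_doc


def accept_schema_triple(triple, source_doc):
    """Simplified rule: the source must be authoritative for the triple's subject."""
    subject, _predicate, _object = triple
    return is_authoritative(source_doc, subject)


extension = accept_schema_triple(("my:Person", "rdfs:subClassOf", "foaf:Person"), "MY spec")
obscure = accept_schema_triple(("foaf:Person", "rdfs:subClassOf", "my:Person"), "MY spec")
hijack = accept_schema_triple(("foaf:knows", "rdf:type", "owl:SymmetricProperty"), "MY spec")
```

Here `extension` is accepted while `obscure` and `hijack` are rejected, matching the ✓ and ✘ marks on the slide.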
Noisy Data: Redefining everything …and home in time for tea
More proof (courtesy of http://www.eiao.net/rdf/1.0 )
rdf:type rdf:type owl:Property .
rdf:type rdfs:label “type”@en .
rdf:type rdfs:comment “Type of resource” .
rdf:type rdfs:domain eiao:testRun .
rdf:type rdfs:domain eiao:pageSurvey .
rdf:type rdfs:domain eiao:siteSurvey .
rdf:type rdfs:domain eiao:scenario .
rdf:type rdfs:domain eiao:rangeLocation .
rdf:type rdfs:domain eiao:startPointer .
rdf:type rdfs:domain eiao:endPointer .
rdf:type rdfs:domain eiao:header .
rdf:type rdfs:domain eiao:runs .
Not Authoritative
Slide61
Authoritative Reasoning: read more …w/ essential plugs
Gong Cheng, Yuzhong Qu. "Integrating Lightweight Reasoning into Class-Based Query Refinement for Object Search." ASWC 2008.
Aidan Hogan, Andreas Harth, Axel Polleres. "Scalable Authoritative OWL Reasoning for the Web." IJSWIS 2009.
Aidan Hogan, Jeff Z. Pan, Axel Polleres and Stefan Decker. "SAOR: Template Rule Optimisations for Distributed Reasoning over 1 Billion Linked Data Triples." ISWC 2010.
My thesis: http://aidanhogan.com/docs/thesis/ (or use Google).
Slide62
Alternative to Authoritative Reasoning?
Quarantined reasoning!
Separate and cache hierarchy of schema documents/dependencies…
Slide63
Quarantined Reasoning [Delbru et al.; 2008]
[Figure, built up over four slides: A-Box / Instance Data (e.g., a FOAF file) and T-Box / Ontology Data (e.g., the FOAF ontology and its indirect imports)]
Slide67
Noisy Data: Redefining everything …and home in time for tea
More proof (courtesy of http://www.eiao.net/rdf/1.0 )
rdf:type rdf:type owl:Property .
rdf:type rdfs:label “type”@en .
rdf:type rdfs:comment “Type of resource” .
rdf:type rdfs:domain eiao:testRun .
rdf:type rdfs:domain eiao:pageSurvey .
rdf:type rdfs:domain eiao:siteSurvey .
rdf:type rdfs:domain eiao:scenario .
rdf:type rdfs:domain eiao:rangeLocation .
rdf:type rdfs:domain eiao:startPointer .
rdf:type rdfs:domain eiao:endPointer .
rdf:type rdfs:domain eiao:header .
rdf:type rdfs:domain eiao:runs .
Not In Here
Slide68
Quarantined Reasoning: read more
R. Delbru, A. Polleres, G. Tummarello and S. Decker. "Context Dependent Reasoning for Semantic Documents in Sindice." 4th International Workshop on Scalable Semantic Web Knowledge Base Systems, 2008.
Slide69
…what about owl:sameAs?
Slide70
Consolidation for Linked Data
Slide71
Consolidation: Baseline
Use provided owl:sameAs mappings in the data
timbl:i owl:sameAs identica:45563 .
dbpedia:Berners-Lee owl:sameAs identica:45563 .
Store “equivalences” found
timbl:i -> { timbl:i, identica:45563, dbpedia:Berners-Lee }
identica:45563 -> { timbl:i, identica:45563, dbpedia:Berners-Lee }
dbpedia:Berners-Lee -> { timbl:i, identica:45563, dbpedia:Berners-Lee }
Slide72
Consolidation: Baseline
For each set of equivalent identifiers, choose a canonical term
{ timbl:i, identica:45563, dbpedia:Berners-Lee }
Slide73
Canonicalisation
Afterwards, rewrite identifiers to their canonical version:
Before:
timbl:i rdf:type foaf:Person .
identica:48404 foaf:knows identica:45563 .
dbpedia:Berners-Lee dpo:birthDate “1955-06-08”^^xsd:date .
After:
dbpedia:Berners-Lee rdf:type foaf:Person .
identica:48404 foaf:knows dbpedia:Berners-Lee .
dbpedia:Berners-Lee dpo:birthDate “1955-06-08”^^xsd:date .
Equivalent identifiers: timbl:i, identica:45563, dbpedia:Berners-Lee
Slide74
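Baseline consolidation can be sketched with a union-find structure: merge identifiers linked by owl:sameAs, pick one canonical term per set, and rewrite the remaining triples. Choosing the lexicographically smallest identifier as canonical is an assumption of this sketch (the slides do not say how the term is chosen), and everything is held in memory for simplicity.

```python
def consolidate(triples):
    """Rewrite subjects and objects to a canonical identifier per owl:sameAs set."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path compression
            term = parent[term]
        return term

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumed policy: the lexicographically smaller root stays canonical.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    for s, p, o in triples:
        if p == "owl:sameAs":
            union(s, o)

    def canonical(term):
        return find(term) if term in parent else term

    return [(canonical(s), p, canonical(o)) for s, p, o in triples if p != "owl:sameAs"]


data = [
    ("timbl:i", "owl:sameAs", "identica:45563"),
    ("dbpedia:Berners-Lee", "owl:sameAs", "identica:45563"),
    ("timbl:i", "rdf:type", "foaf:Person"),
    ("identica:48404", "foaf:knows", "identica:45563"),
    ("dbpedia:Berners-Lee", "dpo:birthDate", '"1955-06-08"^^xsd:date'),
]

rewritten = consolidate(data)
```

With this policy the canonical term for the example happens to be dbpedia:Berners-Lee, the same term used in the rewritten triples above.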
Extended Consolidation
Infer owl:sameAs through reasoning (OWL 2 RL/RDF)
explicit owl:sameAs (again)
owl:InverseFunctionalProperty
owl:FunctionalProperty
owl:cardinality 1 / owl:maxCardinality 1
foaf:homepage a owl:InverseFunctionalProperty .
timbl:i foaf:homepage w3c:timblhomepage .
adv:timbl foaf:homepage w3c:timblhomepage .
⇒ timbl:i owl:sameAs adv:timbl .
…then apply consolidation as before
Slide75
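The inverse-functional case can be sketched by grouping subjects that share the same value for the same inverse-functional property. Only that one case is covered here; functional properties and cardinality restrictions are left out, and the function name is invented for the example.

```python
from collections import defaultdict


def infer_same_as(triples):
    """Infer owl:sameAs between subjects sharing a value for an inverse-functional property."""
    ifps = {
        s for s, p, o in triples
        if p == "rdf:type" and o == "owl:InverseFunctionalProperty"
    }

    subjects_by_value = defaultdict(set)
    for s, p, o in triples:
        if p in ifps:
            subjects_by_value[(p, o)].add(s)

    inferred = set()
    for subjects in subjects_by_value.values():
        ordered = sorted(subjects)
        # Link every subject to the first one; owl:sameAs is symmetric and
        # transitive, so this is enough for the consolidation step to merge them.
        for other in ordered[1:]:
            inferred.add((ordered[0], "owl:sameAs", other))
    return inferred


data = [
    ("foaf:homepage", "rdf:type", "owl:InverseFunctionalProperty"),
    ("timbl:i", "foaf:homepage", "w3c:timblhomepage"),
    ("adv:timbl", "foaf:homepage", "w3c:timblhomepage"),
    ("ex:someoneElse", "foaf:homepage", "ex:otherHomepage"),
]

same_as = infer_same_as(data)
```

The inferred triples can then be fed into the baseline consolidation alongside the explicit owl:sameAs statements.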
Consolidation: Results
For our Linked Data corpus:
~12 million explicit owl:sameAs triples (as before)
~8.7 million thru. owl:InverseFunctionalProperty
~106 thousand thru. owl:FunctionalProperty
none thru. owl:cardinality/owl:maxCardinality
In terms of equivalences found (baseline vs. extended):
~2.8 million sets of equivalent identifiers (1.31x baseline)
~14.86 million identifiers involved (2.58x baseline)
~5.8 million URIs!! (1.014x baseline)!!
Slide76
Conclusion…
Slide77
Conclusions
Heterogeneity poses a significant problem for consuming Linked Data
Heterogeneity in schema
Heterogeneity in naming
…but we can use the mappings provided by publishers to integrate heterogeneous Linked Data corpora (with a little caution)
Lightweight rule-based reasoning can go a long way
Deceit/Noise ≠ End Of World
Consider source of data!
Inconsistency ≠ End Of World
Useful for finding noise in fact!
Explicit owl:sameAs vs. extended consolidation:
Extended consolidation mostly (but not entirely) for consolidating blank-nodes from older FOAF exporters
Slide78
Next up…
How can we reason at Web scale?
Scalable/distributed rule-based materialisation over MapReduce using the WebPIE system
Slide79
timbl:i foaf:page ?pages .
Equivalent identifiers: timbl:i, identica:45563, dbpedia:Berners-Lee
dbpedia:Berners-Lee foaf:page ?pages .
Slide80
Authoritative Reasoning (Appendix)
OWL 2 RL rule prp-inv1
?p1 owl:inverseOf ?p2 . ?x ?p1 ?y . ⇒ ?y ?p2 ?x .
OWL 2 RL rule prp-inv2
?p1 owl:inverseOf ?p2 . ?x ?p2 ?y . ⇒ ?y ?p1 ?x .
TBOX:
foo:doesntKnow owl:inverseOf foaf:knows . (from foo:)
ABOX:
bar:Aidan foo:doesntKnow bar:Axel .
bar:Stefan foaf:knows bar:Jeff .
AUTHORITATIVE INFERENCE:
bar:Axel foaf:knows bar:Aidan . ✓
bar:Jeff foo:doesntKnow bar:Stefan . ✘