Slide1
Linked Data Best Practices (and Abuses)
Lessons Learned in IBM Rational
Arthur Ryman
2014-04-15
Slide2
Best Practices
Publishing vocabularies
Data model customization
Real-world things
JSON and RDF
Multi-valued and optional properties
Provenance and inverse properties
Ontologies and constraints
Slide3
Publishing vocabularies
Slide4
Publishing vocabularies
We should use established vocabularies if they exist
W3C, Dublin Core, OSLC, …
Any new terms we define should be described in vocabulary documents rooted at http://jazz.net/ns
propose generally useful terms to OSLC
When you look up an RDF term, you should get its vocabulary document
HTML for web browsers
RDF for programs, e.g. query builders
e.g. http://jazz.net/ns/qm/rqm#Category
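The lookup rule on this slide can be sketched in Python. This is only an illustration under stated assumptions, not how jazz.net is implemented: the helper names vocabulary_document and build_lookup are hypothetical, text/turtle is just one possible RDF media type, and the request is built but never sent.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def vocabulary_document(term_uri: str) -> str:
    """Return the URL of the document that should describe a hash-URI term."""
    # HTTP clients never send the fragment, so the vocabulary document
    # is the term URI with its fragment removed.
    document_url, _fragment = urldefrag(term_uri)
    return document_url


def build_lookup(term_uri: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) a GET request for the vocabulary document."""
    # A browser would ask for text/html; a program such as a query builder
    # asks for an RDF media type through content negotiation.
    return Request(vocabulary_document(term_uri), headers={"Accept": media_type})


request = build_lookup("http://jazz.net/ns/qm/rqm#Category")
print(request.full_url)              # URL of the vocabulary document
print(request.get_header("Accept"))  # media type requested by the program
```

The same term URI serves both audiences: only the Accept header differs between a browser and a program.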
Slide5
Vocabulary page for http://jazz.net/ns/qm/rqm#Category
Slide6
How to publish a vocabulary
We have a new public wiki!
https://jazz.net/wiki/bin/view/LinkedData
Read the guidelines
Create a wiki page and attach the HTML, Turtle, and RDF/XML files
Request a review from Nelson
Allow dev time to address issues
Arthur will redirect jazz.net/ns to the wiki
Slide7
LinkedData wiki
Slide8
Abuses
You published your vocabulary but skimped on the content
e.g. minimal or cryptic comments
You published your vocabulary, but didn’t keep it up-to-date
e.g. Focal Point 227292
You created some new terms but didn’t publish your vocabulary
e.g. JLIP Tracked Resource Set 306919
Slide9
Data model customization
Slide10
Data model customization
Many of our tools allow customization
e.g. RTC work items
We need to expose the custom data elements as RDF
Tools should allow users to map custom data elements to externally defined RDF terms
industry standards
corporate standards
When no mapping is specified, tools should generate local RDF terms and vocabularies
vocabularies are needed by query authors
tools must host the vocabularies they generate
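A minimal Python sketch of the mapping rule on this slide. The function name rdf_term_for, the example.com namespaces, and the attribute names are all hypothetical; a real tool would read the mapping from its project configuration.

```python
from urllib.parse import quote


def rdf_term_for(attribute_id, mappings, local_namespace):
    """Pick the RDF term URI for a custom data element."""
    # Prefer a user-supplied mapping to an externally defined term
    # (an industry or corporate standard).
    if attribute_id in mappings:
        return mappings[attribute_id]
    # Otherwise generate a readable local term in a vocabulary that the
    # tool itself hosts, so query authors can look it up.
    return local_namespace + quote(attribute_id, safe="")


# Hypothetical configuration: one attribute is mapped, one is not.
mappings = {"priority": "https://example.com/corporate/vocab#priority"}
local_ns = "https://example.com/project1/vocab#"

print(rdf_term_for("priority", mappings, local_ns))
print(rdf_term_for("customerImpact", mappings, local_ns))
```

Deriving the local term from the attribute name, rather than from an opaque internal identifier, keeps the URI readable for humans.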
Slide11
Abuses
Your tool generates a cryptic URI for local RDF terms
Obfuscates meaning
Forces humans to access vocabulary document
Your tool does not generate a vocabulary document for local RDF terms
e.g. RTC 304143
see following case study
When the mapping to RDF is changed, your tool does not create TRS change events for just the affected resources
Slide12
Case study: RTC Work Items
Some attributes are built-in
Some are defined by OSLC CM 2.0
Some are user defined
Consider Priority
Slide13
Project area editor allows customization
Slide14
Enumerated values should specify RDF URIs (External Value)
Slide15
Priority values are enumerated
Slide16
Get the resource URL
Slide17
Look for priority in the RDF representation of Task 224727
Slide18
RDF triple for priority
Subject (good)
<https://jazzop05.rtp.raleigh.ibm.com:9943/jazz/resource/itemName/com.ibm.team.workitem.WorkItem/224727>
Predicate (bad)
<http://open-services.net/ns/cm-x#priority>
Object (ugly)
<https://jazzop05.rtp.raleigh.ibm.com:9943/jazz/oslc/enumerations/_QYx2UBIzEd6bpunPP4ZLOA/priority/priority.literal.l3>
Slide19
Object of priority is not an RDF vocabulary term
Slide20
Problems
The priority predicate comes from a non-existent vocabulary (bad)
http://open-services.net/ns/cm-x#
RDF vocabularies should be dereferenceable
OSLC should publish it, tagged as archaic
The object is a dereferenceable URI (good), but not a vocabulary term (ugly)
Need rdfs:label, rdfs:comment for query authors
Result: no easy way to write queries based on priority
Slide21
Best Practice for external vocabularies
RTC project template should refer to external vocabularies for standard terms
OSLC CM V3 defines priority and 4 values
Teach and enable clients to create corporate standard vocabularies for reuse of common terms (UA)
Needed for cross-project queries
Provide export/import UI to manage vocabularies
E.g. Focal Point uses simple spreadsheet format
Slide22
Best Practice for local vocabularies
RTC (and all other tools) should generate a local RDF vocabulary for all user-defined terms
Include rdfs:label, rdfs:comment for query authors (and other consumers)
LQE admin should load user-defined vocabularies into LQE to make them available to queries
provide programmatic integration, e.g. a special purpose vocabulary TRS
Slide23
Best Practice for all vocabularies
When an administrator changes the RDF representation of a set of resources, corresponding change events MUST be added to the TRS change log
Add/remove custom attributes and values
Modify mapping to RDF URIs
Allow the administrator to make multiple representation changes and then manually trigger the generation of change events
Batch multiple representation changes together to minimize re-indexing time and server load
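The batching idea can be sketched in Python as follows. This is a simplified model, not a TRS implementation: the dictionaries stand in for TRS change events, and the names change_events_for, pending, and affected_resources are hypothetical.

```python
def change_events_for(pending_changes):
    """Collapse several representation changes into one event per resource."""
    affected = set()
    for change in pending_changes:
        # A resource touched by several changes is only recorded once.
        affected.update(change["affected_resources"])
    # sorted() gives a deterministic order for the generated events.
    return [{"type": "Modification", "changed": uri} for uri in sorted(affected)]


# Hypothetical batch: the administrator made two changes before triggering
# the generation of change events.
pending = [
    {"what": "add custom attribute", "affected_resources": ["wi/1", "wi/2"]},
    {"what": "remap priority values", "affected_resources": ["wi/2", "wi/3"]},
]

for event in change_events_for(pending):
    print(event)
```

Because wi/2 is affected by both changes, batching yields three events instead of four, which is where the saving in re-indexing time comes from.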
Slide24
Real-world things
Slide25
La Trahison des Images
"The famous pipe. How people reproached me for it! And yet, could you stuff my pipe? No, it's just a representation, is it not? So if I had written on my picture "This is a pipe", I'd have been lying!"
- René Magritte
Slide26
Real-world things
Linked Data differentiates between two kinds of thing
Information, e.g. a document on the web
Real-world, e.g. a person
Both kinds should be identified with HTTP URIs
Looking up a real-world URI should result in an information resource that contains information about the real-world thing
URI-references (hash URIs)
HTTP redirect: 303 See Other (303 URIs)
Refer to Cool URIs for the Semantic Web
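The two lookup strategies can be contrasted with a small Python sketch. It models the outcome of a lookup rather than performing HTTP; the function name lookup_plan, the describing_document parameter, and the example.org URIs are hypothetical.

```python
from urllib.parse import urldefrag


def lookup_plan(thing_uri, describing_document=None):
    """Describe how a client reaches the information resource for a URI."""
    document_url, fragment = urldefrag(thing_uri)
    if fragment:
        # Hash URI: the client strips the fragment and fetches the document.
        return {"status": 200, "document": document_url}
    if describing_document is not None:
        # 303 URI: the server answers "See Other" and points at a document
        # that contains information about the real-world thing.
        return {"status": 303, "document": describing_document}
    # No fragment and no redirect: the URI names an information resource.
    return {"status": 200, "document": thing_uri}


print(lookup_plan("https://example.org/people/alice#me"))
print(lookup_plan("https://example.org/id/alice",
                  describing_document="https://example.org/doc/alice"))
```

Hash URIs need no server support, while 303 URIs cost an extra round trip but keep the identifier free of a fragment.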
Slide27
Example foaf:Person
Suppose you create a document, http://people.org/johnsmith, about John Smith on 2013-09-17
The following is nonsense because John Smith was not created on 2013-09-17:
<http://people.org/johnsmith> a foaf:Person .
<http://people.org/johnsmith> dcterms:created "2013-09-17"^^xsd:date .
The following makes sense:
<http://people.org/johnsmith#me> a foaf:Person .
<http://people.org/johnsmith> dcterms:created "2013-09-17"^^xsd:date .
Slide28
Abuses
Failure to differentiate between a person and an account owned by a person
Leads to nonsense triples
Focal Point Defect 234212
JTS Defect 307861
See following JTS users case study
NOTE: email address is the preferred way to identify people across tools
Slide29
Work items refer to people
Slide30
JTS Users
OSLC Core specifies that the object of dcterms:creator, dcterms:contributor, oslc:modifiedBy should be a resource of class foaf:Agent or foaf:Person (real-world)
RTC implements OSLC CM and has triples like:
<https://jazz.net/jazz02/resource/...WorkItem/72226>
    dcterms:creator <https://jazz.net/jts04/users/ryman> ;
    dcterms:contributor <https://jazz.net/jts04/users/retchles> .
Slide31
RDF representation of person contains nonsense
Slide32
Best Practice
The property j.1:archived applies to the user account (information resource), not the person (real-world)
Solution 1: use hash URIs for people:
<https://jazz.net/jts04/users/ryman#me>
Solution 2: use 303 URIs for accounts (preferred by Philippe):
<https://jazz.net/jts04/accounts/ryman>
Slide33
303 URI Solution
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix jfs: <http://jazz.net/xmlns/prod/jazz/jfs/1.0/> .
<https://jazz.net/jts04/accounts/ryman> a foaf:OnlineAccount ;
    jfs:archived false .
<https://jazz.net/jts04/users/ryman> a foaf:Person ;
    foaf:account <https://jazz.net/jts04/accounts/ryman> ;
    foaf:img <https://jazz.net/jts04/users/photo/ryman> ;
    foaf:mbox <mailto:ryman@ca.ibm.com> ;
    foaf:name "Arthur Ryman" ;
    foaf:nick "ryman" .
Slide34
JSON and RDF
Slide35
JSON
Familiar to OO and Web developers
Popularity fueled by Cloud
e.g. Amazon uses JSON as the payload in AWS REST APIs as an alternative to SOAP and XML
Simpler/faster to handle by web clients
Use is spreading across the stack
MongoDB, CouchDB / Cloudant
node.js
Slide36
JSON and RDF
Some developers are saying:
“JSON is simpler and more popular than RDF. Let’s use JSON instead of RDF.”
This is a false dichotomy
JSON is just as problematic as XML for data integration
JSON and XML are message formats
Linked Data is our integration strategy
RDF expresses semantics
Use JSON-LD, now a W3C standard
OSLC and Rational should publish standard contexts
See following LQE Security Context case study
Slide37
Initial JSON design
Simple, but no explicit semantics
Use of UUIDs instead of HTTP URIs
[
  {
    "security_context_id" : "urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "name" : "Resources for Alpha project"
  },
  {
    "security_context_id" : "urn:uuid:g92e5gbf-8efd-22e1-b876-11b1d02f7cg7",
    "name" : "Resources for Beta project"
  }
]
Slide38
Equivalent JSON-LD design
{
  "@context": {
    "@base": "https://example.com/sc",
    "dcterms": "http://purl.org/dc/terms/"
  },
  "@graph": [
    {
      "@id": "#1",
      "dcterms:title": "Resources for Alpha project"
    },
    {
      "@id": "#2",
      "dcterms:title": "Resources for Beta project"
    }
  ]
}
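To make the step from the plain JSON design to the JSON-LD design concrete, here is a small Python sketch. The function name to_json_ld and the choice to number fragment ids by list position are assumptions made for illustration; the slides do not say how the ids were assigned.

```python
import json


def to_json_ld(records, base="https://example.com/sc"):
    """Rewrite the plain JSON records as a JSON-LD document."""
    graph = []
    for index, record in enumerate(records, start=1):
        graph.append({
            # Relative HTTP URI (resolved against @base) instead of a UUID.
            "@id": f"#{index}",
            # "name" becomes a term with published semantics.
            "dcterms:title": record["name"],
        })
    return {
        "@context": {"@base": base, "dcterms": "http://purl.org/dc/terms/"},
        "@graph": graph,
    }


plain = [
    {"name": "Resources for Alpha project"},
    {"name": "Resources for Beta project"},
]
print(json.dumps(to_json_ld(plain), indent=2))
```

The payload is still ordinary JSON for clients that ignore the @ keys, which is why JSON and RDF are not an either/or choice.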
Slide39
Final JSON-LD design with type info
{
  "@graph": [
    {
      "@id": "https://example.com/sc",
      "@type": "http://open-services.net/ns/core/sc#SecurityContextList"
    },
    {
      "@id": "https://example.com/sc#1",
      "@type": "http://open-services.net/ns/core/sc#SecurityContext",
      "http://purl.org/dc/terms/title": "Resources for Alpha project"
    },
    {
      "@id": "https://example.com/sc#2",
      "@type": "http://open-services.net/ns/core/sc#SecurityContext",
      "http://purl.org/dc/terms/title": "Resources for Beta project"
    }
  ]
}
Slide40
Multi-valued and optional properties
Slide41
Multi-valued and optional properties
RDF documents contain sets of triples
Model multi-valued properties by a set of triples that share a common subject and predicate
Model the absence of an optional property by an empty set of triples
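A short Python sketch of both modeling rules, with triples represented as plain tuples. The function name work_item_triples is hypothetical; the property names dcterms:subject and rtc_cm:estimate are borrowed from the RTC case study in this deck.

```python
def work_item_triples(subject, tags, estimate=None):
    """Build triples for a work item with multi-valued tags and an optional estimate."""
    # One triple per value: same subject and predicate, different objects.
    triples = [(subject, "dcterms:subject", tag) for tag in tags]
    # An absent optional value produces no triple at all,
    # rather than a triple whose object is an empty string.
    if estimate is not None:
        triples.append((subject, "rtc_cm:estimate", estimate))
    return triples


for triple in work_item_triples("<271867>", ["datagap", "oslc"]):
    print(triple)
```

With one triple per tag, a query for work items tagged oslc is an exact match on the object and needs no substring search.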
Slide42
Abuses
Model multiple values of a property by concatenating the values into a single object
Defeats database indexing
Slows queries since substring matching must be used
Model the absence of an optional value using the presence of an empty string
Adds many unnecessary triples
Slows queries (longer scans)
Sometimes an empty string is a meaningful value
Sometimes an empty string is lexically invalid
See following RTC tag case study
Defect 271867
Slide43
“Tags” is multi-valued
“Estimate” is optional
Slide44
RDF representation
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rtc_cm: <http://jazz.net/xmlns/prod/jazz/rtc/cm/1.0/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@base <https://jazz.net/jazz/resource/itemName/com.ibm.team.workitem.WorkItem/> .
<271867>
    dcterms:subject "datagap, oslc, next_release_candidate, data_gap, reporting-gap"^^xsd:string ;
    …
    rtc_cm:estimate ""^^xsd:long .
Syntax validated OK. There were warnings:
Typed literal has an invalid lexical value: Input string was not in the correct format: s.Length==0.: ""^^<http://www.w3.org/2001/XMLSchema#long>.
Slide45
Provenance and inverse properties
Slide46
Provenance: Where did the triple come from?
A statement is represented by a triple
Triples from multiple documents may be merged and queried
Default graph is a triple store
When storing RDF documents, the document URL is often used as the name of a graph (e.g. in LQE)
triple + graph name = quad
triple stores are really quad stores
Provenance of triples is important in several use cases
Updating a document
Access control
VVC (which version)
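The "updating a document" use case shows why the graph name matters. Below is a minimal Python sketch that models a quad store as a set of 4-tuples; the function name replace_document and the sample URIs are hypothetical, and this is not how LQE is implemented.

```python
def replace_document(quads, graph_name, new_triples):
    """Update one document: drop its old quads, then add the new ones."""
    # Provenance lets us remove exactly the triples that came from this
    # document, leaving identical triples from other documents alone.
    kept = {quad for quad in quads if quad[3] != graph_name}
    kept.update((s, p, o, graph_name) for (s, p, o) in new_triples)
    return kept


store = {
    ("ex:A", "ex:linksTo", "ex:B", "https://example.com/docA"),
    ("ex:A", "ex:linksTo", "ex:B", "https://example.com/docC"),
}
store = replace_document(store, "https://example.com/docA",
                         [("ex:A", "ex:linksTo", "ex:D")])
for quad in sorted(store):
    print(quad)
```

Without the fourth component, re-indexing docA could not tell whether the link from A to B should survive because docC also asserts it.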
Slide47
Provenance and authority
The authority (trust) of a triple depends on the author of the document that contains the triple
Triples should be placed in the document that the author is authorized to modify
When creating a link from A to B, put the link in the document that the author is editing, not necessarily A or B or both
Document C may contain links from A to B
Slide48
Inverse properties
Directed relations between resources (links) may be stated in two equivalent ways, e.g.
Testcase1 validates Requirement2 .
Requirement2 isValidatedBy Testcase1 .
There is no benefit to having mutual inverse pairs of properties
The existence of mutual inverse pairs of properties makes query authoring more complex, and query execution more expensive
A triple should be put in the document that the author of the triple is editing (provenance)
There is no special significance attached to being the subject of a triple
See OSLC guidance on preferred direction of properties
Direction should be from downstream to upstream, e.g. test case validates requirement
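The cost of inverse pairs for query authors can be seen in a small Python sketch over tuples. Both function names are hypothetical, and the in-memory scan stands in for a real query against a triple store.

```python
def validated_by(triples, requirement):
    """One direction only: a single pattern answers the question."""
    return {s for (s, p, o) in triples
            if p == "validates" and o == requirement}


def validated_by_with_inverses(triples, requirement):
    """With inverse pairs, every query must check both patterns."""
    forward = {s for (s, p, o) in triples
               if p == "validates" and o == requirement}
    backward = {o for (s, p, o) in triples
                if p == "isValidatedBy" and s == requirement}
    return forward | backward


data = [
    ("Testcase1", "validates", "Requirement2"),
    ("Requirement2", "isValidatedBy", "Testcase3"),
]
print(validated_by(data, "Requirement2"))
print(validated_by_with_inverses(data, "Requirement2"))
```

The single-direction query silently misses Testcase3 when some tool writes the inverse property, so every query author has to remember the union.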
Slide49
Abuses
OSLC domain specs define many pairs of mutual inverse predicates
Recommendation
Deprecate one member of each pair
Replace deprecated property in all RDF representations and queries
Slide50
Ontologies and constraints
Slide51
Vocabularies and Ontologies
A vocabulary defines the meaning of terms
Use RDFS: rdfs:label, rdfs:comment, rdfs:isDefinedBy, …
An ontology defines inference rules
Given a set of triples, infer more triples
Use RDFS: rdfs:domain, rdfs:range, rdfs:subClassOf, …
Use OWL for more complex inference rules
Slide52
Ontologies and Constraints
Ontologies are not designed to define integrity constraints
See Linked Data Interfaces for examples
An RDFS or OWL reasoner will add triples to create a model for the ontology
A reasoner will report an inconsistency if it cannot create a model
However, this mechanism cannot in practice be used to check for typical integrity constraints
Slide53
Best Practice: Ontologies
Your triples may end up in a reasoner one day, so only add inference rules when they produce the intended results
If you define generic properties, such as “uses”, then you probably SHOULD NOT define rdfs:domain and rdfs:range
If you define type-specific properties, such as “usesTestCase” then rdfs:domain and rdfs:range MAY make sense
e.g. If you intend to infer that the object of oslc_qm:usesTestCase is an oslc_qm:TestCase then include the following triple in an ontology:
oslc_qm:usesTestCase rdfs:range oslc_qm:TestCase .
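What a reasoner does with an rdfs:range triple can be sketched in a few lines of Python. This implements only the range rule over tuples, not a full RDFS reasoner; the function name infer_range_types and the ex: resources are hypothetical.

```python
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"


def infer_range_types(data, ontology):
    """Apply the rdfs:range rule: the object of P gets the range class of P."""
    ranges = {}
    for s, p, o in ontology:
        if p == RDFS_RANGE:
            ranges.setdefault(s, set()).add(o)
    inferred = set()
    for s, p, o in data:
        for cls in ranges.get(p, ()):
            inferred.add((o, RDF_TYPE, cls))
    # Only report triples that were not already stated.
    return inferred - set(data)


ontology = [("oslc_qm:usesTestCase", RDFS_RANGE, "oslc_qm:TestCase")]
data = [
    ("ex:plan1", "oslc_qm:usesTestCase", "ex:requirement7"),
    ("ex:requirement7", RDF_TYPE, "oslc_rm:Requirement"),
]
print(infer_range_types(data, ontology))
```

Pointing the property at a requirement does not raise an error. The rule simply adds a triple saying the requirement is also a test case, which is why range declarations act as inference rules and not as integrity constraints.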
Slide54
Best Practice: Constraints
W3C is starting an activity on RDF validation
See W3C workshop
We have submitted the OSLC Resource Shape specification to W3C
See Resource Shape 2.0
Use Resource Shape 2.0 to describe integrity constraints on RDF documents
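As a rough illustration of what a shape-based integrity check looks like, the Python sketch below validates cardinality only. It is a simplification under stated assumptions: the shape is a plain dictionary, not an RDF Resource Shape document; the four cardinality names mirror the values that Resource Shape allows for oslc:occurs; and the function name check_occurs is hypothetical.

```python
# Allowed (minimum, maximum) counts per cardinality; None means unbounded.
LIMITS = {
    "Exactly-one": (1, 1),
    "Zero-or-one": (0, 1),
    "Zero-or-many": (0, None),
    "One-or-many": (1, None),
}


def check_occurs(triples, subject, shape):
    """Return one message per property whose cardinality is violated."""
    violations = []
    for predicate, occurs in shape.items():
        count = sum(1 for (s, p, _o) in triples
                    if s == subject and p == predicate)
        low, high = LIMITS[occurs]
        if count < low or (high is not None and count > high):
            violations.append(f"{predicate}: expected {occurs}, found {count}")
    return violations


shape = {"dcterms:title": "Exactly-one", "rtc_cm:estimate": "Zero-or-one"}
triples = [("ex:wi1", "rtc_cm:estimate", 3600)]
print(check_occurs(triples, "ex:wi1", shape))
```

Unlike a reasoner, the check reports the missing title as a violation and adds nothing to the data.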
Slide55
Other topics
Blank nodes
Mean there exists or some
use fragment ids for internal resources
Containers
Avoid Seq, Bag, List
Use Linked Data Platform containers
Consuming external vocabularies
Tools should gracefully degrade when external resources are unreachable
Be a well-behaved HTTP client wrt caching, etc.
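Graceful degradation can be sketched in Python as a fetch with a cached fallback. The names load_vocabulary, fetch, and cache are hypothetical, and fetch is passed in as a parameter so the sketch stays independent of any particular HTTP library. It covers only the fallback, not HTTP cache validation such as conditional requests.

```python
def load_vocabulary(url, fetch, cache):
    """Return the vocabulary document, falling back to a cached copy."""
    try:
        document = fetch(url)
    except OSError:
        # Network failures such as urllib.error.URLError and timeouts are
        # OSError subclasses. Degrade to the last good copy, or None.
        return cache.get(url)
    cache[url] = document
    return document


def unreachable(url):
    # Stand-in for a fetch function whose server cannot be reached.
    raise OSError("host unreachable")


cache = {"https://example.com/vocab": "cached copy"}
print(load_vocabulary("https://example.com/vocab", unreachable, cache))
print(load_vocabulary("https://example.com/other", unreachable, cache))
```

A caller that receives None can still show raw URIs in place of labels, so the tool keeps working with reduced presentation.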