Linked Data Best Practices



Presentation Transcript

Slide1

Linked Data Best Practices (and Abuses): Lessons Learned in IBM Rational

Arthur Ryman

2014-04-15

Slide2

Best Practices

Publishing vocabularies

Data model customization

Real-world things

JSON and RDF

Multi-valued and optional properties

Provenance and inverse properties

Ontologies and constraints

Slide3

Publishing vocabularies

Slide4

Publishing vocabularies

We should use established vocabularies if they exist

W3C, Dublin Core, OSLC, …

Any new terms we define should be described in vocabulary documents rooted at http://jazz.net/ns

Propose generally useful terms to OSLC

When you look up an RDF term, you should get its vocabulary document

HTML for web browsers

RDF for programs, e.g. query builders

e.g. http://jazz.net/ns/qm/rqm#Category
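Serving HTML to browsers and RDF to programs from the same URI is usually done with HTTP content negotiation on the Accept header. The following is a minimal sketch, assuming a server that can produce HTML, Turtle, and RDF/XML; the function name and fallback choices are illustrative, not jazz.net's actual implementation. Note that clients never send the fragment: looking up http://jazz.net/ns/qm/rqm#Category requests http://jazz.net/ns/qm/rqm, so one vocabulary document serves every term in that namespace.

```python
# Media types this (hypothetical) vocabulary server can produce.
OFFERED = ("text/html", "text/turtle", "application/rdf+xml")


def choose_representation(accept_header: str) -> str:
    """Pick HTML for browsers or RDF for programs from an HTTP Accept header."""
    ranked = []
    for position, item in enumerate(accept_header.split(",")):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # Higher quality first; among equal qualities, earlier entries first.
        ranked.append((quality, -position, media_type))
    for quality, _, media_type in sorted(ranked, reverse=True):
        if quality <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in OFFERED:
            return media_type
        if media_type in ("*/*", "text/*"):
            return "text/html"
        if media_type == "application/*":
            return "application/rdf+xml"
    return "text/html"  # nothing matched: default to the human-readable page
```

A real server should also send a Vary: Accept response header so that caches keep the HTML and RDF variants apart.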

Slide5

Vocabulary page for http://jazz.net/ns/qm/rqm#Category

Slide6

How to publish a vocabulary

We have a new public wiki! https://jazz.net/wiki/bin/view/LinkedData

Read the guidelines

Create a wiki page and attach the HTML, Turtle, and RDF/XML files

Request a review from Nelson

Allow dev time to address issues

Arthur will redirect jazz.net/ns to the wiki

Slide7

LinkedData wiki

Slide8

Abuses

You published your vocabulary but skimped on the content

e.g. minimal or cryptic comments

You published your vocabulary but didn’t keep it up to date

e.g. Focal Point 227292

You created some new terms but didn’t publish your vocabulary

e.g. JLIP Tracked Resource Set 306919

Slide9

Data model customization

Slide10

Data model customization

Many of our tools allow customization

e.g. RTC work items

We need to expose the custom data elements as RDF

Tools should allow users to map custom data elements to externally defined RDF terms

industry standards

corporate standards

When no mapping is specified, tools should generate local RDF terms and vocabularies

vocabularies are needed by query authors

tools must host the vocabularies they generate
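The mapping rule above (use the administrator's external term if there is one, otherwise generate a readable local term) can be sketched as follows. This is a minimal illustration assuming a hypothetical attribute-id scheme, a hypothetical local namespace, and a label-based naming convention; it is not how RTC or any other Rational tool actually mints terms.

```python
import re


def rdf_term_for(attribute_id, label, mappings, local_namespace):
    """Return the RDF property URI under which a custom attribute is exposed.

    mappings: administrator-defined map from attribute id to an external term
    (industry or corporate standard). local_namespace: the tool-hosted
    vocabulary namespace used when no mapping exists. All names are hypothetical.
    """
    if attribute_id in mappings:
        return mappings[attribute_id]
    # No mapping: mint a readable local name (e.g. "Customer Impact" ->
    # "customerImpact") instead of an opaque generated id.
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words:
        raise ValueError(f"cannot derive a local name from label {label!r}")
    local_name = words[0].lower() + "".join(w[0].upper() + w[1:] for w in words[1:])
    return local_namespace + local_name
```

One design caveat: a name derived from the display label changes when the label changes, so a real tool would presumably store the minted URI once and keep it stable afterwards.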

Slide11

Abuses

Your tool generates a cryptic URI for local RDF terms

Obfuscates meaning

Forces humans to look up the vocabulary document

Your tool does not generate a vocabulary document for local RDF terms

e.g. RTC 304143

See the following case study

When the mapping to RDF is changed, your tool does not create TRS change events for just the affected resources

Slide12

Case study: RTC Work Items

Some attributes are built-in

Some are defined by OSLC CM 2.0

Some are user-defined

Consider Priority

Slide13

Project area editor allows customization

Slide14

Enumerated values should specify RDF URIs (External Value)

Slide15

Priority values are enumerated

Slide16

Get the resource URL

Slide17

Look for priority in the RDF representation of Task 224727

Slide18

RDF triple for priority

Subject (good):

<https://jazzop05.rtp.raleigh.ibm.com:9943/jazz/resource/itemName/com.ibm.team.workitem.WorkItem/224727>

Predicate (bad):

<http://open-services.net/ns/cm-x#priority>

Object (ugly):

<https://jazzop05.rtp.raleigh.ibm.com:9943/jazz/oslc/enumerations/_QYx2UBIzEd6bpunPP4ZLOA/priority/priority.literal.l3>

Slide19

Object of priority is not an RDF vocabulary term

Slide20

Problems

The priority predicate comes from a non-existent vocabulary (bad): http://open-services.net/ns/cm-x#

RDF vocabularies should be dereferenceable

OSLC should publish it, tagged as archaic

The object is a dereferenceable URI (good), but not a vocabulary term (ugly)

Need rdfs:label, rdfs:comment for query authors

Result: no easy way to write queries based on priority

Slide21

Best Practice for external vocabularies

RTC project template should refer to external vocabularies for standard terms

OSLC CM V3 defines priority and 4 values

Teach and enable clients to create corporate standard vocabularies for reuse of common terms (UA)

Needed for cross-project queries

Provide export/import UI to manage vocabularies

e.g. Focal Point uses a simple spreadsheet format

Slide22

Best Practice for local vocabularies

RTC (and all other tools) should generate a local RDF vocabulary for all user-defined terms

Include rdfs:label, rdfs:comment for query authors (and other consumers)

LQE admin should load user-defined vocabularies into LQE to make them available to queries

Provide programmatic integration, e.g. a special-purpose vocabulary TRS
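A generated local vocabulary document only needs a few RDFS triples per term to be useful to query authors. Below is a minimal sketch that serializes user-defined terms as Turtle with rdfs:label, rdfs:comment, and rdfs:isDefinedBy; the namespace, the term tuple layout, and the function names are hypothetical assumptions for illustration.

```python
def turtle_string(text):
    """Quote text as a Turtle string literal, escaping the characters that need it."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def vocabulary_document(namespace, terms):
    """Serialize user-defined terms as a small Turtle vocabulary document.

    terms: iterable of (local_name, rdf_type, label, comment) tuples, where
    rdf_type is a prefixed name such as "rdf:Property" or "rdfs:Class".
    """
    # The vocabulary document itself is the namespace without a trailing "#".
    document = namespace[:-1] if namespace.endswith("#") else namespace
    lines = [
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        f"@prefix : <{namespace}> .",
        "",
    ]
    for local_name, rdf_type, label, comment in terms:
        lines.append(f":{local_name} a {rdf_type} ;")
        lines.append(f"    rdfs:label {turtle_string(label)} ;")
        lines.append(f"    rdfs:comment {turtle_string(comment)} ;")
        lines.append(f"    rdfs:isDefinedBy <{document}> .")
        lines.append("")
    return "\n".join(lines)


print(vocabulary_document(
    "https://example.com/ns/project-x#",
    [("customerImpact", "rdf:Property", "Customer Impact",
      "How strongly the defect affects customers.")],
))
```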

Slide23

Best Practice for all vocabularies

When an administrator changes the RDF representation of a set of resources, corresponding change events MUST be added to the TRS change log

Add/remove custom attributes and values

Modify mapping to RDF URIs

Allow the administrator to make multiple representation changes and then manually trigger the generation of change events

Batch multiple representation changes together to minimize re-indexing time and server load
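One way to satisfy both rules (events only for the affected resources, and administrator-triggered batching) is to accumulate the attributes that were edited and emit one change event per affected resource when the batch is published. This sketch assumes the tool can look up which resources use each attribute (the hypothetical `resources_by_attribute` index); the event dictionaries are a simplified stand-in, not the actual TRS serialization.

```python
from dataclasses import dataclass, field


@dataclass
class PendingRepresentationChanges:
    """Accumulates RDF mapping edits until the administrator publishes them."""

    # Hypothetical index: attribute id -> URIs of resources that use it.
    resources_by_attribute: dict
    changed_attributes: set = field(default_factory=set)

    def record(self, attribute_id):
        """Note an edit (custom attribute/value added or removed, mapping changed)."""
        self.changed_attributes.add(attribute_id)

    def publish(self, next_order):
        """Return one change event per affected resource, then clear the batch."""
        affected = set()
        for attribute_id in self.changed_attributes:
            affected |= self.resources_by_attribute.get(attribute_id, set())
        # A resource touched by several edits still gets a single event.
        events = [
            {"type": "Modification", "changed": uri, "order": next_order + offset}
            for offset, uri in enumerate(sorted(affected))
        ]
        self.changed_attributes.clear()
        return events
```

Deduplicating across the whole batch is what keeps re-indexing cost proportional to the number of distinct affected resources rather than to the number of edits.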

Slide24

Real-world things

Slide25

La Trahison des Images

"The famous pipe. How people reproached me for it! And yet, could you stuff my pipe? No, it's just a representation, is it not? So if I had written on my picture 'This is a pipe', I'd have been lying!"

- René Magritte

Slide26

Real-world things

Linked Data differentiates between two kinds of things

Information, e.g. a document on the web

Real-world, e.g. a person

Both kinds should be identified with HTTP URIs

Looking up a real-world URI should result in an information resource that contains information about the real-world thing

URI-references (hash URIs)

HTTP redirect: 303 See Other (303 URIs)

Refer to Cool URIs for the Semantic Web
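The 303 approach can be sketched as a tiny WSGI application: a URI that names a real-world thing answers with 303 See Other pointing at the document that describes it, and only the document URI returns content. The path layout (/id/ for things, /doc/ for documents) and the default host name are assumptions for illustration.

```python
def app(environ, start_response):
    """Minimal WSGI app: /id/<name> names a thing, /doc/<name> describes it."""
    path = environ.get("PATH_INFO", "")
    host = environ.get("HTTP_HOST", "people.example")  # hypothetical host
    if path.startswith("/id/"):
        # A real-world thing has no representation of its own, so point the
        # client at the information resource that describes it.
        name = path[len("/id/"):]
        start_response("303 See Other", [("Location", f"http://{host}/doc/{name}")])
        return [b""]
    if path.startswith("/doc/"):
        name = path[len("/doc/"):]
        thing, document = f"http://{host}/id/{name}", f"http://{host}/doc/{name}"
        body = (
            f"<{thing}> a <http://xmlns.com/foaf/0.1/Person> .\n"
            f"<{document}> <http://xmlns.com/foaf/0.1/primaryTopic> <{thing}> .\n"
        ).encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/turtle"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

To try it locally: `from wsgiref.simple_server import make_server; make_server("", 8000, app).serve_forever()`. The trade-off against hash URIs is an extra round trip per lookup in exchange for independently addressable documents.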

Slide27

Example: foaf:Person

Suppose you create a document, http://people.org/johnsmith, about John Smith on 2013-09-17. The following is nonsense because John Smith was not created on 2013-09-17:

<http://people.org/johnsmith> a foaf:Person .
<http://people.org/johnsmith> dcterms:created "2013-09-17"^^xsd:date .

The following makes sense:

<http://people.org/johnsmith#me> a foaf:Person .
<http://people.org/johnsmith> dcterms:created "2013-09-17"^^xsd:date .
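With hash URIs, the document URI is simply the person URI with the fragment removed, because HTTP clients never send the fragment. The sketch below uses that rule to attach document metadata (dcterms:created) to the document and the type to the person, emitting N-Triples; the function name is hypothetical.

```python
from urllib.parse import urldefrag

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
DCTERMS_CREATED = "http://purl.org/dc/terms/created"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def describe_person(person_uri, created):
    """Return N-Triples: the person gets the type, the document gets the date."""
    document_uri, fragment = urldefrag(person_uri)
    if not fragment:
        raise ValueError("expected a hash URI for the person, e.g. ...#me")
    return [
        f"<{person_uri}> <{RDF_TYPE}> <{FOAF_PERSON}> .",
        f'<{document_uri}> <{DCTERMS_CREATED}> "{created}"^^<{XSD_DATE}> .',
    ]


for line in describe_person("http://people.org/johnsmith#me", "2013-09-17"):
    print(line)
```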

Slide28

Abuses

Failure to differentiate between a person and an account owned by a person

Leads to nonsense triples

Focal Point Defect 234212

JTS Defect 307861

See the following JTS users case study

NOTE: email address is the preferred way to identify people across tools

Slide29

Work items refer to people

Slide30

JTS Users

OSLC Core specifies that the object of dcterms:creator, dcterms:contributor, and oslc:modifiedBy should be a resource of class foaf:Agent or foaf:Person (real-world)

RTC implements OSLC CM and has triples like:

<https://jazz.net/jazz02/resource/...WorkItem/72226>
    dcterms:creator <https://jazz.net/jts04/users/ryman> ;
    dcterms:contributor <https://jazz.net/jts04/users/retchles> .

Slide31

RDF representation of person contains nonsense

Slide32

Best Practice

The property j.1:archived applies to the user account (information resource), not the person (real-world)

Solution 1: use hash URIs for people:

<https://jazz.net/jts04/users/ryman#me>

Solution 2: use 303 URIs for accounts (preferred by Philippe):

<https://jazz.net/jts04/accounts/ryman>

Slide33

303 URI Solution

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix jfs: <http://jazz.net/xmlns/prod/jazz/jfs/1.0/> .

<https://jazz.net/jts04/accounts/ryman> a foaf:OnlineAccount ;
    jfs:archived false .

<https://jazz.net/jts04/users/ryman> a foaf:Person ;
    foaf:account <https://jazz.net/jts04/accounts/ryman> ;
    foaf:img <https://jazz.net/jts04/users/photo/ryman> ;
    foaf:mbox <mailto:ryman@ca.ibm.com> ;
    foaf:name "Arthur Ryman" ;
    foaf:nick "ryman" .

Slide34

JSON and RDF

Slide35

JSON

Familiar to OO and Web developers

Popularity fueled by Cloud

e.g. Amazon uses JSON as the payload in AWS REST APIs as an alternative to SOAP and XML

Simpler and faster for web clients to handle

Use is spreading across the stack

MongoDB, CouchDB/Cloudant

node.js

Slide36

JSON and RDF

Some developers are saying:

“JSON is simpler and more popular than RDF. Let’s use JSON instead of RDF.”

This is a false dichotomy

JSON is just as problematic as XML for data integration

JSON and XML are message formats

Linked Data is our integration strategy

RDF expresses semantics

Use JSON-LD, now a W3C standard

OSLC and Rational should publish standard contexts

See the following LQE Security Context case study

Slide37

Initial JSON design

Simple, but no explicit semantics

Use of UUIDs instead of HTTP URIs

[
  {
    "security_context_id" : "urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "name" : "Resources for Alpha project"
  },
  {
    "security_context_id" : "urn:uuid:g92e5gbf-8efd-22e1-b876-11b1d02f7cg7",
    "name" : "Resources for Beta project"
  }
]

Slide38

Equivalent JSON-LD design

{
  "@context": {
    "@base": "https://example.com/sc",
    "dcterms": "http://purl.org/dc/terms/"
  },
  "@graph": [
    { "@id": "#1", "dcterms:title": "Resources for Alpha project" },
    { "@id": "#2", "dcterms:title": "Resources for Beta project" }
  ]
}
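The @context is what gives the compact JSON explicit semantics: prefixed keys expand to full property URIs and relative @id values resolve against @base. The sketch below handles only those two features, enough for the example above; it is not a conforming JSON-LD processor (a real one also handles @vocab, term definitions, value objects, and produces the normative expanded form), and the function name is hypothetical.

```python
from urllib.parse import urljoin


def expand_compact(doc):
    """Expand prefixed keys and relative @id values using the document's @context.

    Handles only prefix definitions and @base, the features used in the example.
    """
    context = doc.get("@context", {})
    base = context.get("@base", "")
    prefixes = {k: v for k, v in context.items() if not k.startswith("@")}

    def expand_key(key):
        prefix, sep, suffix = key.partition(":")
        if sep and prefix in prefixes:
            return prefixes[prefix] + suffix
        return key  # keywords such as "@type" and absolute URIs pass through

    expanded = []
    for node in doc.get("@graph", []):
        new_node = {}
        for key, value in node.items():
            if key == "@id":
                new_node["@id"] = urljoin(base, value)  # "#1" -> ".../sc#1"
            else:
                new_node[expand_key(key)] = value
        expanded.append(new_node)
    return expanded
```

Applied to the document above, the first node becomes {"@id": "https://example.com/sc#1", "http://purl.org/dc/terms/title": "Resources for Alpha project"}, which matches the full URIs used in the final design.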

Slide39

Final JSON-LD design with type info

{
  "@graph": [
    {
      "@id": "https://example.com/sc",
      "@type": "http://open-services.net/ns/core/sc#SecurityContextList"
    },
    {
      "@id": "https://example.com/sc#1",
      "@type": "http://open-services.net/ns/core/sc#SecurityContext",
      "http://purl.org/dc/terms/title": "Resources for Alpha project"
    },
    {
      "@id": "https://example.com/sc#2",
      "@type": "http://open-services.net/ns/core/sc#SecurityContext",
      "http://purl.org/dc/terms/title": "Resources for Beta project"
    }
  ]
}

Slide40

Multi-valued and optional properties

Slide41

Multi-valued and optional properties

RDF documents contain sets of triples

Model multi-valued properties by a set of triples that share a common subject and predicate

Model the absence of an optional property by an empty set of triples
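The two rules above translate directly into code that produces triples: one triple per value for a multi-valued property, and no triple at all for an absent optional value. This is a minimal sketch that represents typed literals as (lexical form, datatype URI) pairs; the function name is hypothetical, while the property URIs are the ones used in the RTC case study that follows.

```python
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"
RTC_ESTIMATE = "http://jazz.net/xmlns/prod/jazz/rtc/cm/1.0/estimate"
XSD = "http://www.w3.org/2001/XMLSchema#"


def work_item_triples(subject, tags, estimate=None):
    """Build triples for a work item's tags and optional estimate.

    Objects are (lexical form, datatype URI) pairs standing in for typed literals.
    """
    # Multi-valued: one triple per tag, all sharing the same subject and predicate.
    triples = {(subject, DCTERMS_SUBJECT, (tag, XSD + "string")) for tag in tags}
    # Optional: an absent estimate produces no triple at all, not an empty string.
    if estimate is not None:
        triples.add((subject, RTC_ESTIMATE, (str(estimate), XSD + "long")))
    return triples
```

Finding work items tagged "oslc" is then an exact match on the object, which a store can index, instead of a substring scan over one concatenated string.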

Slide42

Abuses

Model multiple values of a property by concatenating the values into a single object

Defeats database indexing

Slows queries since substring matching must be used

Model the absence of an optional value using the presence of an empty string

Adds many unnecessary triples

Slows queries (longer scans)

Sometimes an empty string is a meaningful value

Sometimes an empty string is lexically invalid

See the following RTC tag case study

Defect 271867

Slide43

“Tags” is multi-valued

“Estimate” is optional

Slide44

RDF representation

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rtc_cm: <http://jazz.net/xmlns/prod/jazz/rtc/cm/1.0/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@base <https://jazz.net/jazz/resource/itemName/com.ibm.team.workitem.WorkItem/> .

<271867>
    dcterms:subject "datagap, oslc, next_release_candidate, data_gap, reporting-gap"^^xsd:string ;
    rtc_cm:estimate ""^^xsd:long .

Syntax validated OK. There were warnings:

Typed literal has an invalid lexical value: Input string was not in the correct format: s.Length == 0.: ""^^<http://www.w3.org/2001/XMLSchema#long>.
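The validator warning arises because the empty string is not in the lexical space of xsd:long: a lexical form is an optional sign followed by one or more decimal digits, and the value must fit in a signed 64-bit integer. A minimal check is sketched below (the function name is hypothetical; it does not model XML Schema whitespace processing).

```python
import re

_XSD_LONG_LEXICAL = re.compile(r"[+-]?[0-9]+")
_LONG_MIN, _LONG_MAX = -2**63, 2**63 - 1


def is_valid_xsd_long(lexical):
    """Return True if lexical is a valid xsd:long lexical form."""
    if not _XSD_LONG_LEXICAL.fullmatch(lexical):
        return False  # e.g. "" or "4.0"
    return _LONG_MIN <= int(lexical) <= _LONG_MAX
```

A tool that omits the rtc_cm:estimate triple when no estimate is set never has to produce such a literal in the first place.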

Slide45

Provenance and inverse properties

Slide46

Provenance: Where did the triple come from?

A statement is represented by a triple

Triples from multiple documents may be merged and queried

Default graph is a triple store

When storing RDF documents, the document URL is often used as the name of a graph (e.g. in LQE)

triple + graph name = quad

triple stores are really quad stores

Provenance of triples is important in several use cases

Updating a document

Access control

VVC (which version)
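Naming each graph after the document it came from is what makes the "updating a document" use case tractable: replacing a document replaces exactly its triples, and the graph names record which documents assert a given triple. A toy sketch of that quad-store behavior follows; the class and method names are hypothetical and it does not model LQE's actual storage.

```python
from collections import defaultdict


class QuadStore:
    """Toy quad store: each RDF document is stored as a graph named by its URL."""

    def __init__(self):
        self.graphs = defaultdict(set)  # graph name (document URL) -> triples

    def put_document(self, document_url, triples):
        # Updating a document replaces its graph, so stale triples disappear
        # without touching triples that other documents assert.
        self.graphs[document_url] = set(triples)

    def delete_document(self, document_url):
        self.graphs.pop(document_url, None)

    def merged(self):
        """Union of all graphs, i.e. what a query over merged documents sees."""
        return set().union(*self.graphs.values())

    def provenance(self, triple):
        """Names of the graphs (documents) that assert a triple."""
        return {name for name, triples in self.graphs.items() if triple in triples}
```

The same graph names can drive access control (filter graphs the user may not read) and version selection (store each version as its own graph).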

Slide47

Provenance and authority

The authority (trust) of a triple depends on the author of the document that contains the triple

Triples should be placed in the document that the author is authorized to modify

When creating a link from A to B, put the link in the document that the author is editing, not necessarily A or B or both

Document C may contain links from A to B

Slide48

Inverse properties

Directed relations between resources (links) may be stated in two equivalent ways, e.g.

Testcase1 validates Requirement2 .

Requirement2 isValidatedBy Testcase1 .

There is no benefit to having mutual inverse pairs of properties

The existence of mutual inverse pairs of properties makes query authoring more complex, and query execution more expensive

A triple should be put in the document that the author of the triple is editing (provenance)

There is no special significance attached to being the subject of a triple

See OSLC guidance on preferred direction of properties

Direction should be from downstream to upstream, e.g. test case validates requirement
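Once one member of each inverse pair is preferred, existing data can be normalized so that every link is stated in a single direction, and a query needs only one triple pattern. The sketch below uses the slide's illustrative names `validates` and `isValidatedBy` as plain strings; the mapping table and function names are hypothetical.

```python
# Hypothetical table: deprecated inverse property -> preferred (downstream-to-upstream) property.
PREFERRED_DIRECTION = {"isValidatedBy": "validates"}


def normalize_direction(triples):
    """Rewrite every triple that uses a deprecated inverse into the preferred direction."""
    normalized = set()
    for subject, predicate, obj in triples:
        if predicate in PREFERRED_DIRECTION:
            normalized.add((obj, PREFERRED_DIRECTION[predicate], subject))
        else:
            normalized.add((subject, predicate, obj))
    return normalized


def validating_test_cases(triples, requirement):
    """With a single direction, one pattern answers the question."""
    return {s for s, p, o in triples if p == "validates" and o == requirement}
```

Without normalization, the same question would need a union of two patterns, one per direction, which is the query-complexity cost described above.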

Slide49

Abuses

OSLC domain specs define many pairs of mutual inverse predicates

Recommendation

Deprecate one member of each pair

Replace the deprecated property in all RDF representations and queries

Slide50

Ontologies and constraints

Slide51

Vocabularies and Ontologies

A vocabulary defines the meaning of terms

Use RDFS: rdfs:label, rdfs:comment, rdfs:isDefinedBy, etc.

An ontology defines inference rules

Given a set of triples, infer more triples

Use RDFS: rdfs:domain, rdfs:range, rdfs:subClassOf, etc.

Use OWL for more complex inference rules

Slide52

Ontologies and Constraints

Ontologies are not designed to define integrity constraints

See Linked Data Interfaces for examples

An RDFS or OWL reasoner will add triples to create a model for the ontology

A reasoner will report an inconsistency if it cannot create a model

However, in practice this mechanism cannot be used to check typical integrity constraints

Slide53

Best Practice: Ontologies

Your triples may end up in a reasoner one day, so only add inference rules when they produce the intended results

If you define generic properties, such as “uses”, then you probably SHOULD NOT define rdfs:domain and rdfs:range

If you define type-specific properties, such as “usesTestCase”, then rdfs:domain and rdfs:range MAY make sense

e.g. if you intend to infer that the object of oslc_qm:usesTestCase is an oslc_qm:TestCase, then include the following triple in an ontology:

oslc_qm:usesTestCase rdfs:range oslc_qm:TestCase .
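What a reasoner does with that triple can be sketched as forward chaining over three RDFS rules (domain, range, subClassOf) until no new triples appear. The sketch is illustrative, not a complete RDFS reasoner; the `ex:` resources and `ex:QualityArtifact` class in the usage are hypothetical. It also shows why ontologies are poor integrity constraints: when the object of oslc_qm:usesTestCase is a requirement, the reasoner does not report an error, it simply infers that the requirement is also a test case.

```python
RDF_TYPE = "rdf:type"


def rdfs_closure(triples):
    """Apply the rdfs:domain, rdfs:range and rdfs:subClassOf rules to a fixpoint."""
    inferred = set(triples)
    while True:
        domains = {(s, o) for s, p, o in inferred if p == "rdfs:domain"}
        ranges = {(s, o) for s, p, o in inferred if p == "rdfs:range"}
        subclasses = {(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"}
        new = set()
        for s, p, o in inferred:
            for prop, cls in domains:
                if p == prop:
                    new.add((s, RDF_TYPE, cls))  # subject gets the domain class
            for prop, cls in ranges:
                if p == prop:
                    new.add((o, RDF_TYPE, cls))  # object gets the range class
            if p == RDF_TYPE:
                for sub, sup in subclasses:
                    if o == sub:
                        new.add((s, RDF_TYPE, sup))  # instances inherit superclasses
        if new <= inferred:
            return inferred
        inferred |= new


data = {
    ("oslc_qm:usesTestCase", "rdfs:range", "oslc_qm:TestCase"),
    ("ex:plan1", "oslc_qm:usesTestCase", "ex:req7"),
    ("ex:req7", RDF_TYPE, "oslc_rm:Requirement"),
}
# ex:req7 is now typed as both a requirement and a test case; nothing is rejected.
print(sorted(t for t in rdfs_closure(data) if t[0] == "ex:req7"))
```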

Slide54

Best Practice: Constraints

W3C is starting an activity on RDF validation

See W3C workshop

We have submitted the OSLC Resource Shape specification to W3C

See Resource Shape 2.0

Use Resource Shape 2.0 to describe integrity constraints on RDF documents
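In contrast to inference, a constraint check is closed-world: it looks only at the triples actually present and reports what is missing or mistyped. The sketch below uses a hypothetical dictionary "shape" with min/max counts and a required class, loosely modeled on the kind of occurrence and range constraints a resource shape expresses; it is not the Resource Shape 2.0 vocabulary or any published validator API.

```python
RDF_TYPE = "rdf:type"


def check_shape(subject, triples, shape):
    """Return violation messages for subject; an empty list means it conforms.

    shape: {property: {"min": int, "max": int, "class": str}} (all keys optional).
    """
    violations = []
    for prop, rule in shape.items():
        values = sorted(o for s, p, o in triples if s == subject and p == prop)
        low, high = rule.get("min", 0), rule.get("max")
        if len(values) < low:
            violations.append(f"{prop}: expected at least {low} value(s), found {len(values)}")
        if high is not None and len(values) > high:
            violations.append(f"{prop}: expected at most {high} value(s), found {len(values)}")
        required_class = rule.get("class")
        if required_class is not None:
            for value in values:
                # Closed-world: the type must be stated, it is never inferred here.
                if (value, RDF_TYPE, required_class) not in triples:
                    violations.append(f"{prop}: {value} is not declared to be a {required_class}")
    return violations
```

Run against a test plan whose oslc_qm:usesTestCase value is typed only as oslc_rm:Requirement, this check reports a violation, whereas the RDFS reasoning in the ontology example would silently retype the requirement.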

Slide55

Other topics

Blank nodes

Mean “there exists” or “some”

Use fragment IDs for internal resources

Containers

Avoid Seq, Bag, List

Use Linked Data Platform containers

Consuming external vocabularies

Tools should gracefully degrade when external resources are unreachable

Be a well-behaved HTTP client with respect to caching, etc.
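Graceful degradation and polite caching fit together naturally: keep the last good copy of each external vocabulary with its ETag, revalidate with a conditional request, and fall back to the cached copy when the server is unreachable or errors. The following is a minimal sketch using only the standard library; the class name and the injectable `opener` parameter are assumptions made for illustration and testing.

```python
import urllib.error
import urllib.request


class VocabularyCache:
    """Fetch external vocabulary documents politely and degrade gracefully."""

    def __init__(self, opener=urllib.request.urlopen, timeout=5.0):
        self.opener = opener      # injectable for testing
        self.timeout = timeout    # never hang a tool on a slow vocabulary server
        self._entries = {}        # url -> (etag, body) of the last good response

    def get(self, url):
        """Return the vocabulary bytes, or None if never fetched and now unreachable."""
        etag, cached_body = self._entries.get(url, (None, None))
        headers = {"Accept": "text/turtle"}
        if etag is not None and cached_body is not None:
            # Conditional request: the server can answer 304 instead of resending.
            headers["If-None-Match"] = etag
        request = urllib.request.Request(url, headers=headers)
        try:
            with self.opener(request, timeout=self.timeout) as response:
                body = response.read()
                self._entries[url] = (response.headers.get("ETag"), body)
                return body
        except OSError:
            # urllib raises HTTPError (an OSError subclass) for 304 Not Modified and
            # for 4xx/5xx responses, and URLError for network failures. In all of
            # these cases, fall back to the last good copy, which may be None.
            return cached_body
```

Callers must handle a None result (the vocabulary was never reachable), for example by showing raw URIs instead of rdfs:label values.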
