
©University of Glamorgan - PowerPoint Presentation

Uploaded On 2017-01-19




Presentation Transcript

Slide1

©University of Glamorgan

“the key to interoperability”

http://www.heritagedata.org/

Slide2

The SENESCHAL Project

seneschal, n. (Historical): the steward or major-domo of a medieval great house

12-month AHRC-funded project, March 2013 - February 2014

Deliverables:
- Controlled vocabularies online
  - Linked data (SKOS)
  - Downloadable files
- Web services
  - Term suggestion, term validation, legacy data alignment
- Tools to align data with controlled vocabularies
  - Browser-based ‘widget’ controls

Introduction - Vocabularies - Data Alignment - Web Services - Widgets

Slide3

You say potato, I say tomato…

- Multiple datasets, multiple schema, multiple organisations, multiple languages
- Unification of data schema is possible, BUT…
  - Incompatible terminology hinders cross search and prevents greater interoperability
  - Applications attempting to reuse data must all individually sort out the same old problems
- E.g. Get all the iron age post holes…

Feature                 Period
Post-hole               IRON AGE
Posthole                |ron age
POST HOLE               Iron age?
POSTHLOLE               EARLY IRON AGE
POST HOLE (POSSIBLE)    250 BC
POSTHOLES               C 500-200 B.C.

History repeated... Solution = data cleansing and controlled vocabularies?


Slide4

Solutions - SENESCHAL

- Controlled vocabularies (again)
  - Commonly agreed concepts, terminology and identifiers
  - Existing / new thesauri: community contributions?
- Openness and availability
  - Licensing, web services, downloads, data formats
- Alignment of existing data
  - Data cleansing tools
  - Alignment techniques
- Alignment of new data
  - Interactive embedded data entry tools
  - Validation at point of data entry

Rather than trying to solve this familiar vocabulary problem, help to prevent it from happening in the first place.

Slide5

General System Architecture

[Architecture diagram. Components shown: native vocabularies; data conversion using STELLAR (SKOS) templates; SKOS RDF vocabularies (upload); additional metadata; SENESCHAL data store; Linked Data (REST API); SPARQL query endpoint; Web Services (REST API); widget controls & applications.]


Slide6

Vocabularies online as (SKOS) Linked Data

- Vocabularies from English Heritage
  - Archaeological Sciences
  - Building Materials
  - Components
  - Event Type
  - Evidence
  - FISH Archaeological Objects
  - Maritime Craft Type
  - Monument Type
  - Periods
- Vocabularies from RCAHMS
  - Archaeological Objects Thesaurus
  - Maritime Craft Thesaurus
  - Monument Type Thesaurus - multilingual - including Scottish Gaelic translations
- Vocabularies from RCAHMW
  - Monument Type Thesaurus
  - Period
- Moving from term based towards concept based indexing
  - Start to create links between concepts… between vocabularies… between datasets… between sites… between countries
  - Cross searching of thesauri from different providers
  - Cross searching of (multilingual) cultural heritage resources


Slide7

Linked Data API

The project implements a Linked Data (RESTful) API.

The base URI is http://purl.org/heritagedata/

Seneschal is a sub-project within the wider scope of ‘www.heritagedata.org’, so:
- http://www.heritagedata.org/blog/seneschal - wiki/blog for project details, and
- <base uri>/schemes/123 (e.g.) for the actual data API (see below)

REST API:
- /schemes - return list of all SKOS concept schemes held
- /schemes/{id} - return details of specified SKOS concept scheme
- /schemes/{id}.html, .n3, .rdf, .json - return different serializations of that data, obtained either by content negotiation or by direct request including extension
- /schemes/{id}/concepts/{id} - return details of specified SKOS concept (current version)
- /schemes/{id}/concepts/{id}.html, .n3, .rdf, .json - return different serializations of the data, obtained either by content negotiation or by direct request including extension
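The URL patterns listed for the REST API can be sketched as a small helper. This is only an illustration: the base URI and the scheme id 123 come from the slide, while the function names, the concept id 456 and the extension-to-media-type mapping are assumptions made here (the media types are conventional choices and may differ from what the server actually negotiates). No request is sent.

```python
from typing import Dict, Optional

BASE_URI = "http://purl.org/heritagedata/"  # base URI as stated on the slide

# Assumed mapping from the listed extensions to conventional media types,
# for use in an Accept header when relying on content negotiation.
MEDIA_TYPES: Dict[str, str] = {
    "html": "text/html",
    "n3": "text/n3",
    "rdf": "application/rdf+xml",
    "json": "application/json",
}


def _with_extension(url: str, ext: Optional[str]) -> str:
    """Append one of the listed serialization extensions, if requested."""
    if ext is None:
        return url
    if ext not in MEDIA_TYPES:
        raise ValueError(f"unsupported extension: {ext}")
    return f"{url}.{ext}"


def scheme_url(scheme_id: str, ext: Optional[str] = None) -> str:
    """Build <base uri>/schemes/{id}[.ext]."""
    return _with_extension(f"{BASE_URI}schemes/{scheme_id}", ext)


def concept_url(scheme_id: str, concept_id: str, ext: Optional[str] = None) -> str:
    """Build <base uri>/schemes/{id}/concepts/{id}[.ext]."""
    return _with_extension(
        f"{BASE_URI}schemes/{scheme_id}/concepts/{concept_id}", ext
    )


def accept_header(ext: str) -> Dict[str, str]:
    """Header for content negotiation instead of an explicit extension."""
    return {"Accept": MEDIA_TYPES[ext]}


print(scheme_url("123", "json"))
print(concept_url("123", "456"), accept_header("rdf"))
```

Either route should lead to the same serialization: request the plain URL with an Accept header, or request the URL with the extension appended.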

Slide8

Multilinguality

- Multilingual labels & notes
- Labels attached to concepts, so possible to search in one language, retrieve in another
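Because labels hang off concepts rather than the other way round, a cross-language lookup reduces to "find the concept by its label in one language, then read its label in another". The sketch below shows the idea with a tiny in-memory store; the URIs and labels are hypothetical placeholders chosen for illustration and are not taken from the published thesauri.

```python
from typing import Dict, Optional

# Hypothetical store: concept URI -> {language tag: preferred label}.
CONCEPTS: Dict[str, Dict[str, str]] = {
    "http://example.org/concepts/1": {"en": "castle", "gd": "caisteal", "cy": "castell"},
    "http://example.org/concepts/2": {"en": "church", "gd": "eaglais", "cy": "eglwys"},
}


def find_concept(label: str, lang: str) -> Optional[str]:
    """Return the URI of the concept whose label in `lang` matches, if any."""
    wanted = label.casefold()
    for uri, labels in CONCEPTS.items():
        if labels.get(lang, "").casefold() == wanted:
            return uri
    return None


def translate(label: str, source_lang: str, target_lang: str) -> Optional[str]:
    """Search in one language, retrieve the label in another."""
    uri = find_concept(label, source_lang)
    if uri is None:
        return None
    return CONCEPTS[uri].get(target_lang)


print(translate("Castle", "en", "gd"))
```

Indexing records by concept URI rather than by label string is what makes this work: the record never needs to know which language the searcher used.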


Slide9

Online Linked Data – Examples


Slide10

Bulk Data Alignment Exercise

- Bulk metadata alignment
  - ADS OASIS
  - ADS ImageBank
- Alignment of specific fields against 3 vocabularies
  - Monument types
  - Object types
  - Periods


Slide11

Typical alignment issues encountered

- Simple spelling errors
  - “POSTHLOLE”, “CESS PITT”, “FURRROWS”, “FLINT SCRAPPER”
- Alternate word forms
  - “BOUNDARY”/“BOUNDARIES”, “GULLEY”/“GULLIES”
- Prefixes / suffixes
  - “RED HILL (POSSIBLE)”, “TRACKWAY (COBBLED)”, “CROFT?”, “CAIRN (POSSIBLE)”, “PORTAL DOLMEN (RE-ERECTED)”
- Nested delimiters
  - “POTTERY, CERAMIC TILE, IRON OBJECTS, GLASS”
- Terms not intended for indexing
  - “NONE”, “UNIDENTIFIED OBJECT”, “N/A”, “NA”, “INCOHERENT”
- Terms that would not be in (any) thesauri
  - “WOTSITS PACKET”, “CHARLES 2ND COIN”, “ROMAN STRUCTURE POSSIBLY A VILLA”, “ST GUTHLACS BENEDICTINE PRIORY”, “WORCESTER-BIRMINGHAM CANAL”, “KUNGLIGA SLOTTET”, “SUB-FOSSIL BEETLES”
- More specific phrases
  - “SIDE WALL OF POT WITH LUG”, “BRICK-LINED INDUSTRIAL WELL OR MINE SHAFT”, “ALIGNMENT OF PLATFORMS AND STONES”
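Several of the issues listed above (bracketed suffixes, question marks, nested delimiters, placeholder values) can be reduced by a cleaning pass before any matching is attempted. The following is a minimal sketch of such a pass; the function name and the specific rules are assumptions for illustration and are not the project's documented procedure.

```python
import re
from typing import List

# Values the slide lists as "terms not intended for indexing".
NON_INDEXING = {"NONE", "UNIDENTIFIED OBJECT", "N/A", "NA", "INCOHERENT"}


def clean_terms(raw: str) -> List[str]:
    """Split a raw field value into cleaned candidate terms."""
    terms = []
    for part in raw.split(","):                  # nested delimiters
        term = re.sub(r"\([^)]*\)", " ", part)   # drop "(POSSIBLE)", "(COBBLED)"
        term = term.replace("?", " ")            # drop uncertainty markers
        term = " ".join(term.split()).upper()    # collapse whitespace, upper-case
        if term and term not in NON_INDEXING:
            terms.append(term)
    return terms


print(clean_terms("POTTERY, CERAMIC TILE, IRON OBJECTS, GLASS"))
print(clean_terms("CAIRN (POSSIBLE)"))
```

Qualifiers such as "(POSSIBLE)" carry meaning, so a real pipeline would probably record them separately rather than discard them as this sketch does.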


Slide12

Data alignment approach

- Levenshtein edit distance algorithm
  - Measures the minimum number of character edits required to change one string into another
  - Accommodates small spelling differences/errors
- Compare term to all terms from specified thesaurus and obtain best textual match
- Similarity threshold introduced to suppress low scoring matches
- Periods require an additional approach
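The approach above can be sketched in a few lines. The edit distance is the standard dynamic-programming Levenshtein algorithm; the way the distance is turned into a score (1 minus distance over the longer string's length) and the 0.8 threshold are assumptions made for this example, since the slides do not state the exact scoring formula, so the scores will not necessarily reproduce the percentages reported in the results.

```python
from typing import Iterable, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, lower as edits grow."""
    a, b = a.casefold(), b.casefold()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def best_match(term: str, vocabulary: Iterable[str],
               threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Compare `term` with every vocabulary term; return the best-scoring
    one, or None when even the best score falls below the threshold."""
    best: Optional[Tuple[str, float]] = None
    for candidate in vocabulary:
        score = similarity(term, candidate)
        if best is None or score > best[1]:
            best = (candidate, score)
    if best is not None and best[1] >= threshold:
        return best
    return None


vocabulary = ["CAIRN", "DITCH", "HILLFORT"]  # illustrative subset only
print(best_match("CAIRNN", vocabulary))
print(best_match("DOMKYRKAN", vocabulary))
```

Raising the threshold trades incorrect matches for non-matches, which is why the results still call for expert oversight.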

Slide13

Data Alignment Results – Monument Types

Data value                               Highest scoring match  Score
ABBEY FOUNDATIONS                        Foundation             74%
AXE FACOTRY                              Axe Factory            90%
BOUNDARIES                               BOUNDARY               77%
BOUNDARY                                 BOUNDARY               100%
BUIED SOIL HORIZON                       BURIED SOIL HORIZON    97%
CAIRN                                    CAIRN                  100%
CAIRN (POSSIBLE)                         CAIRN                  100%
CAIRNN                                   CAIRN                  90%
CESS PITT                                CESS PIT               94%
CHAMBERED TOM                            CHAMBERED TOMB         96%
COMERCIAL                                COMMERCIAL             94%
CROFT?                                   CROFT                  90%
CUP-MARKED STONE                         CUP MARKED STONE       93%
DICTH                                    DITCH                  80%
ENCLSOURE                                ENCLOSURE              88%
EXTRACTION PIT                           EXTRACTIVE PIT         85%
EXTRACTIVE PIT                           EXTRACTIVE PIT         100%
FEATURE – COBBLED SURFACE                Cobbled Surface        75%
GULLEY                                   GULLY                  90%
GULLIES                                  GULLY                  66%
HILL FORT                                HILLFORT               94%
HILLFORT                                 HILLFORT               100%
IINEAR SYSTEM                            LINEAR SYSTEM          92%
MEDIEVAL CASTLE / FORTIFIED MANOR RUINS  FORTIFIED MANOR HOUSE  60%
PARIS CHURCH                             PARISH CHURCH          96%
PASSAGE GRACE                            PASSAGE GRAVE          92%
PORTAL DOLMEN (RE-ERECTED)               PORTAL DOLMEN          100%
POSTHLOLE                                POST HOLE              88%
PRIORY? WALL                             Priory Wall            95%
RED HILL (POSSIBLE)                      RED HILL               100%
ROMAN STRUCTURE POSSIBLY A VILLA         TRAINING STRUCTURE     52%
SOIL FILLED PIT                          RIFLE PIT              66%
ST GUTHLACS BENEDICTINE PRIORY           Benedictine Priory     75%
STONE ALIGMENT                           STONE ALIGNMENT        96%
TRACKWAY (COBBLED)                       TRACKWAY               100%
WORCESTER-BIRMINGHAM CANAL               ORNAMENTAL CANAL       52%


Slide14

Data Alignment Results (Objects and Periods)

Objects:

Data value            Highest scoring match   Score
BRICK                 PICK                    66%
FE NAILS              NAIL                    66%
FLINT SCRAPPER        SCRAPER (TOOL)          66%
INDUSTRIAL RSSIDUE    INDUSTRIAL BY PRODUCT   71%
LOOM WEIGHT           LOOMWEIGHT              95%
POTTEY                POTTERY                 92%
SAMIEN SHERD          RIM SHERD               66%
UNIDENTIFIED OBJECT   UNIDENTIFIED OBJECT     100%

Periods:

Data value            Highest scoring match   Score
NEOLOTHIC             NEOLITHIC               88%
NEOTLITHIC            NEOLITHIC               94%
POST-MEDIEVAL         POST MEDIEVAL           92%
|RON AGE              IRON AGE                87%


Slide15

Data alignment results - categorised

- Correct matches: may not be 100% textual match
  - “AXE FACOTRY” → AXE FACTORY
  - “CAIRNN” → CAIRN
  - “PASSAGE GRACE” → PASSAGE GRAVE
  - “STONE ALIGMENT” → STONE ALIGNMENT
- Unsure matches: illustrate the need for expert oversight of results
  - “ARCHITECTURAL FEATURE” → ARCHITECTURAL FRAGMENT
  - “AXIAL-STONE CIRCLE” → SMALL STONE CIRCLE
  - “RADIAL CAIRN” → TRI RADIAL CAIRN
- Incorrect matches: may be reduced by raising the match threshold
  - “CLAY STRUCTURE” → COAL GAS STRUCTURE
  - “CONCENTRATION CAMP” → CONSTRUCTION CAMP
  - “RAIN MAKING SITE” → PAINTBALLING SITE
- Non matches: score exceeding threshold was not achieved
  - “ARCHAEOLOGY”, “CLAVA CAIRN COMPLEX”, “DOMKYRKAN”, “WEDGE TOMB”

Dataset                     Correct  Unsure  Incorrect  No match  Total
OASIS monument types           1617      47        216       836   2716
OASIS object types              564      11         86       717   1378
OASIS periods                    39       0          0        13     52
Image Bank monument types       131       4          7        24    166
Image Bank periods               43       0          0        38     81
Totals                         2394      62        309      1629   4395


Slide16

Period alignment – identifying years

Achieved by matching predefined textual patterns (plus a bit of processing - e.g. “AD 375-8”)

AD: centuries starting at year 1 and finishing at year 100
  “EARLY” = 1 → 40
  “MID”   = 30 → 70
  “LATE”  = 60 → 100

BC: centuries starting at year -100 and finishing at year -1
  “EARLY” = -100 → -60
  “MID”   = -70 → -30
  “LATE”  = -40 → -1

There is no year zero…

Once delimiting years are identified, can align with known periods framework.

Data value                 Identified start year   Identified end year
250-400                    250                     400
500 BC                     -500                    -500
600-300 BC                 -600                    -300
AD 375-8                   375                     378
AD400-600                  400                     600
C2-C3                      101                     300
C6                         501                     600
EARLY 3RD CENTURY          201                     240
EARLY FOURTH CENTURY BC    -400                    -360
LATE 3RD CENTURY           260                     300
LATE FOURTH CENTURY BC     -340                    -301
MID 4TH CENTURY BC         -370                    -330
MID THIRD CENTURY          230                     270
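The rules and the example rows above are enough to sketch a small pattern-based parser. The regular expressions, function names and the list of spelled-out ordinals are assumptions made for this illustration; only the century offsets and the expected start/end years come from the slide. Patterns outside the table (for instance “C 500-200 B.C.”) are not handled and return None.

```python
import re
from typing import Optional, Tuple

ORDINALS = {"FIRST": 1, "SECOND": 2, "THIRD": 3, "FOURTH": 4, "FIFTH": 5,
            "SIXTH": 6, "SEVENTH": 7, "EIGHTH": 8, "NINTH": 9, "TENTH": 10}

# (start, end) offsets within a century, as given on the slide.
AD_PARTS = {None: (1, 100), "EARLY": (1, 40), "MID": (30, 70), "LATE": (60, 100)}
BC_PARTS = {None: (-100, -1), "EARLY": (-100, -60), "MID": (-70, -30), "LATE": (-40, -1)}

CENTURY_RE = re.compile(
    r"^(?:(EARLY|MID|LATE)\s+)?(\d+(?:ST|ND|RD|TH)|[A-Z]+)\s+CENTURY(\s+BC)?$")
C_RANGE_RE = re.compile(r"^C(\d+)(?:\s*-\s*C(\d+))?$")
YEAR_RE = re.compile(r"^(AD\s*)?(\d+)(?:\s*-\s*(\d+))?(\s*BC)?$")


def century_span(n: int, bc: bool = False,
                 part: Optional[str] = None) -> Tuple[int, int]:
    """Start and end year of (part of) the n-th century; no year zero."""
    base = (n - 1) * 100
    if bc:
        lo, hi = BC_PARTS[part]
        return (lo - base, hi - base)
    lo, hi = AD_PARTS[part]
    return (base + lo, base + hi)


def parse_period(text: str) -> Optional[Tuple[int, int]]:
    """Return (start year, end year) for a recognised pattern, else None."""
    t = " ".join(text.upper().split())
    m = CENTURY_RE.match(t)
    if m:
        part, ordinal, bc = m.groups()
        if ordinal[0].isdigit():
            n = int(re.match(r"\d+", ordinal).group())
        else:
            n = ORDINALS.get(ordinal)
        return None if n is None else century_span(n, bc=bool(bc), part=part)
    m = C_RANGE_RE.match(t)
    if m:
        first = int(m.group(1))
        last = int(m.group(2) or first)
        return (century_span(first)[0], century_span(last)[1])
    m = YEAR_RE.match(t)
    if m:
        _, start, end, bc = m.groups()
        end = end or start
        if len(end) < len(start):  # abbreviated end year: "375-8" -> "378"
            end = start[:len(start) - len(end)] + end
        s, e = int(start), int(end)
        return (-s, -e) if bc else (s, e)
    return None


for value in ["AD 375-8", "C2-C3", "LATE FOURTH CENTURY BC"]:
    print(value, parse_period(value))
```

Once a value is reduced to a pair of years, aligning it with a periods framework becomes an interval comparison rather than a string comparison.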


Slide17

Vocabulary Services

Descriptions and example service calls at http://www.heritagedata.org/blog/services/

- getSchemes
- getTopConceptsForScheme
- getConceptsForScheme
- getConceptRelations
- getConceptLabels
- getConceptLabelMatch
- getConceptExists

+ Alignment functionality as a service (soon)


Slide18

Browser-based ‘widget’ controls


Slide19

Summary

- Controlled vocabularies online
  - Linked Open Data (SKOS)
  - Downloadable data files
  - Hierarchical and alphabetical listings, generated from SKOS files
- Data alignment
  - Identify Linked Data URIs for free text terms
- Web services
  - Vocabulary access
  - Term suggestion & validation
- Tools using controlled vocabularies
  - Browser-based ‘widget’ controls for embedding into web pages

Slide20

Next Steps

- Complete case studies and document
- Move towards publication by vocab providers?
- Promote HeritageData on Linked Data sites
- Final workshop at ADS and at RCAHMS
- Linked vocabulary data workshop at CAA Paris
- Management & Governance of HeritageData.org passes to FISH Terminology Working Group at end of SENESCHAL

Slide21

Future Possibilities

- Perform vocab mapping between the UK thesauri
- Vocabulary mapping tool
- Mapping metadata
- Search services/widgets (e.g. semantic expansion)
- Potential other vocabularies from FISH members?
- Potential further applications of terminology services and widgets?

Slide22

Contact information

ceri.binding@southwales.ac.uk

douglas.tudhope@southwales.ac.uk

http://www.heritagedata.org/ - SENESCHAL

http://hypermedia.research.glam.ac.uk/kos/STELLAR/

http://hypermedia.research.glam.ac.uk/resources/STELLAR-applications/ - STELLAR tools, templates and documentation

http://data.archaeologydataservice.ac.uk - STELLAR linked data

http://hypermedia.research.glam.ac.uk/kos/STAR/

http://hypermedia.research.glam.ac.uk/resources/star-demonstrator/ - STAR Research Demonstrator

http://intarch.ac.uk/journal/issue30/tudhope_index.html - STAR Internet Archaeology paper (open access)

Slide23

©University of Glamorgan

“the key to interoperability”

http://www.heritagedata.org/