Slide1
©University of Glamorgan
“the key to interoperability”
http://www.heritagedata.org/
Slide2
The SENESCHAL Project
seneschal n. Historical The steward or major-domo of a medieval great house
12 month AHRC funded project, March 2013 - February 2014
Deliverables
- Controlled vocabularies online
  - Linked data (SKOS)
  - Downloadable files
- Web services
  - Term suggestion, term validation, legacy data alignment
- Tools to align data with controlled vocabularies
  - Browser-based ‘widget’ controls
Introduction - Vocabularies - Data Alignment - Web Services - Widgets
Slide3
You say potato, I say tomato…
- Multiple datasets, multiple schema, multiple organisations, multiple languages
- Unification of data schema is possible, BUT…
  - Incompatible terminology hinders cross search and prevents greater interoperability
  - Applications attempting to reuse data must all individually sort out the same old problems
- E.g. Get all the iron age post holes…

Feature               Period
Post-hole             IRON AGE
Posthole              |ron age
POST HOLE             Iron age?
POSTHLOLE             EARLY IRON AGE
POST HOLE (POSSIBLE)  250 BC
POSTHOLES             C 500-200 B.C.
History repeated... Solution = data cleansing and controlled vocabularies?
Slide4
Solutions - SENESCHAL
- Controlled vocabularies (again)
  - Commonly agreed concepts, terminology and identifiers
  - Existing / new thesauri – community contributions?
- Openness and availability
  - Licensing, web services, downloads, data formats
- Alignment of existing data
  - Data cleansing tools
  - Alignment techniques
- Alignment of new data
  - Interactive embedded data entry tools
  - Validation at point of data entry
- Rather than trying to solve this familiar vocabulary problem, help to prevent it from happening in the first place
Slide5
General System Architecture
(Architecture diagram; labelled components:)
- Native vocabularies
- Data conversion – STELLAR (SKOS) templates
- SKOS RDF vocabularies (upload)
- Additional metadata
- SENESCHAL data store
- SPARQL query endpoint
- Linked Data (REST API)
- Web Services (REST API)
- Widget controls & applications
Slide6
Vocabularies online as (SKOS) Linked Data
- Vocabularies from English Heritage
  - Archaeological Sciences
  - Building Materials
  - Components
  - Event Type
  - Evidence
  - FISH Archaeological Objects
  - Maritime Craft Type
  - Monument Type
  - Periods
- Vocabularies from RCAHMS
  - Archaeological Objects Thesaurus
  - Maritime Craft Thesaurus
  - Monument Type Thesaurus - multilingual - including Scottish Gaelic translations
- Vocabularies from RCAHMW
  - Monument Type Thesaurus
  - Period
- Moving from term based towards concept based indexing
  - Start to create links between concepts… between vocabularies… between datasets… between sites… between countries
  - Cross searching of thesauri from different providers
  - Cross searching of (multilingual) cultural heritage resources
Slide7
Linked Data API
- The project implements a Linked Data (RESTful) API
- The base URI is http://purl.org/heritagedata/
- Seneschal is a sub-project within the wider scope of ‘www.heritagedata.org’ – so:
  - http://www.heritagedata.org/blog/seneschal - wiki/blog for project details, and
  - <base uri>/schemes/123 (e.g.) for actual data API – see below…
- REST API:
  - /schemes – return list of all SKOS concept schemes held
  - /schemes/{id} – return details of specified SKOS concept scheme
  - /schemes/{id}.html, .n3, .rdf, .json – return different serializations of that data, obtained either by content negotiation or by direct request including extension
  - /schemes/{id}/concepts/{id} – return details of specified SKOS concept (current version)
  - /schemes/{id}/concepts/{id}.html, .n3, .rdf, .json – return different serializations of the data, obtained either by content negotiation or by direct request including extension
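The two ways of asking for a serialization can be sketched in Python as follows. This is a minimal illustration, not the project's client code: the identifiers `123` and `456` are placeholders, the media types mapped to each extension are assumptions about what the server accepts, and no request is actually sent.

```python
from typing import Optional
from urllib.request import Request

BASE_URI = "http://purl.org/heritagedata/"  # base URI quoted on the slide

# Assumed media types for the four serializations named on the slide.
MEDIA_TYPES = {
    "html": "text/html",
    "n3": "text/n3",
    "rdf": "application/rdf+xml",
    "json": "application/json",
}


def resource_path(scheme_id: str, concept_id: Optional[str] = None) -> str:
    """Build 'schemes/{id}' or 'schemes/{id}/concepts/{id}'."""
    path = f"schemes/{scheme_id}"
    if concept_id is not None:
        path += f"/concepts/{concept_id}"
    return path


def by_extension(fmt: str, scheme_id: str, concept_id: Optional[str] = None) -> str:
    """Direct request: append the extension to the resource URI."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unknown serialization: {fmt}")
    return f"{BASE_URI}{resource_path(scheme_id, concept_id)}.{fmt}"


def by_negotiation(fmt: str, scheme_id: str, concept_id: Optional[str] = None) -> Request:
    """Content negotiation: plain resource URI plus an Accept header."""
    return Request(
        BASE_URI + resource_path(scheme_id, concept_id),
        headers={"Accept": MEDIA_TYPES[fmt]},
    )


if __name__ == "__main__":
    print(by_extension("json", "123"))
    req = by_negotiation("rdf", "123", "456")
    print(req.full_url, req.get_header("Accept"))
```

Passing the `Request` object to `urllib.request.urlopen` would perform the call; whether a given media type is honoured is something to confirm against the live service.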
Slide8
Multilinguality
- Multilingual labels & notes
- Labels attached to concepts – so possible to search in one language, retrieve in another
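A minimal sketch of why labels attached to concepts make this possible, assuming a tiny in-memory store. The concept identifier `concept/1` is hypothetical and the two labels are illustrative only, not taken from the published thesauri.

```python
from typing import Dict, Optional

# Hypothetical store: concept identifier -> {language tag: preferred label}.
LABELS: Dict[str, Dict[str, str]] = {
    "concept/1": {"en": "cairn", "gd": "càrn"},
}


def find_concept(label: str, lang: str) -> Optional[str]:
    """Return the identifier of the concept carrying this label in this language."""
    wanted = label.casefold()
    for concept_id, labels in LABELS.items():
        if lang in labels and labels[lang].casefold() == wanted:
            return concept_id
    return None


def translate(label: str, source_lang: str, target_lang: str) -> Optional[str]:
    """Search in one language, retrieve the label in another via the shared concept."""
    concept_id = find_concept(label, source_lang)
    if concept_id is None:
        return None
    return LABELS[concept_id].get(target_lang)


if __name__ == "__main__":
    print(translate("Cairn", "en", "gd"))
```

The lookup goes through the concept identifier rather than comparing strings across languages, which is the point of moving from term-based to concept-based indexing.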
Slide9
Online Linked Data – Examples
Slide10
Bulk Data Alignment Exercise
- Bulk metadata alignment
  - ADS OASIS
  - ADS ImageBank
- Alignment of specific fields against 3 vocabularies
  - Monument types
  - Object types
  - Periods
Slide11
Typical alignment issues encountered
- Simple spelling errors
  - “POSTHLOLE”, “CESS PITT”, “FURRROWS”, “FLINT SCRAPPER”
- Alternate word forms
  - “BOUNDARY”/“BOUNDARIES”, “GULLEY”/“GULLIES”
- Prefixes / suffixes
  - “RED HILL (POSSIBLE)”, “TRACKWAY (COBBLED)”, “CROFT?”, “CAIRN (POSSIBLE)”, “PORTAL DOLMEN (RE-ERECTED)”
- Nested delimiters
  - “POTTERY, CERAMIC TILE, IRON OBJECTS, GLASS”
- Terms not intended for indexing
  - “NONE”, “UNIDENTIFIED OBJECT”, “N/A”, “NA”, “INCOHERENT”
- Terms that would not be in (any) thesauri
  - “WOTSITS PACKET”, “CHARLES 2ND COIN”, “ROMAN STRUCTURE POSSIBLY A VILLA”, “ST GUTHLACS BENEDICTINE PRIORY”, “WORCESTER-BIRMINGHAM CANAL”, “KUNGLIGA SLOTTET”, “SUB-FOSSIL BEETLES”
- More specific phrases
  - “SIDE WALL OF POT WITH LUG”, “BRICK-LINED INDUSTRIAL WELL OR MINE SHAFT”, “ALIGNMENT OF PLATFORMS AND STONES”
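Several of these issues can be handled before any matching is attempted. The following Python sketch shows one possible clean-up pass; the function name, the stop list and the decision to strip question marks are assumptions made for illustration, not the project's documented procedure.

```python
import re
from typing import List

# Assumed stop list, taken from the "not intended for indexing" examples.
NON_INDEXING = {"NONE", "N/A", "NA", "UNIDENTIFIED OBJECT", "INCOHERENT"}


def clean_values(raw: str) -> List[str]:
    """Split a free-text field into candidate terms ready for matching."""
    cleaned = []
    for part in raw.split(","):                         # nested delimiters
        term = re.sub(r"\s*\([^)]*\)\s*$", "", part)    # trailing "(POSSIBLE)" etc.
        term = term.replace("?", "")                    # uncertainty markers
        term = " ".join(term.upper().split())           # case and whitespace
        if term and term not in NON_INDEXING:
            cleaned.append(term)
    return cleaned


if __name__ == "__main__":
    for value in ["CAIRN (POSSIBLE)", "POTTERY, CERAMIC TILE, IRON OBJECTS, GLASS", "CROFT?", "N/A"]:
        print(repr(value), "->", clean_values(value))
```

Spelling errors, alternate word forms and out-of-vocabulary phrases are deliberately left alone here, since they need fuzzy matching or expert review rather than string clean-up.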
Slide12
Data alignment approach
- Levenshtein edit distance algorithm
  - Measures the minimum number of character edits required to change one string into another
  - Accommodates small spelling differences/errors
- Compare term to all terms from specified thesaurus – obtain best textual match
- Similarity threshold introduced to suppress low scoring matches
- Periods require an additional approach
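The approach can be sketched in Python as below. The edit distance is the standard Levenshtein algorithm; the normalisation into a 0 to 1 score (one minus distance over the longer length) and the 0.8 threshold are assumptions, so the scores will not necessarily reproduce the percentages reported in the results slides.

```python
from typing import Iterable, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    if len(a) < len(b):
        a, b = b, a                      # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 for nothing shared."""
    a, b = a.upper(), b.upper()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(value: str, terms: Iterable[str],
               threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Compare value to every thesaurus term; suppress results below the threshold."""
    best = None
    for term in terms:
        score = similarity(value, term)
        if best is None or score > best[1]:
            best = (term, score)
    if best is None or best[1] < threshold:
        return None
    return best


if __name__ == "__main__":
    thesaurus = ["CAIRN", "DITCH", "HILLFORT"]
    print(best_match("CAIRNN", thesaurus))
    print(best_match("DOMKYRKAN", thesaurus))
```

Raising `threshold` trades fewer incorrect matches for more non-matches, which is the balance the similarity threshold is meant to control.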
Slide13
Data Alignment Results – Monument Types
Data value                               Highest scoring match  Score
ABBEY FOUNDATIONS                        Foundation             74%
AXE FACOTRY                              Axe Factory            90%
BOUNDARIES                               BOUNDARY               77%
BOUNDARY                                 BOUNDARY               100%
BUIED SOIL HORIZON                       BURIED SOIL HORIZON    97%
CAIRN                                    CAIRN                  100%
CAIRN (POSSIBLE)                         CAIRN                  100%
CAIRNN                                   CAIRN                  90%
CESS PITT                                CESS PIT               94%
CHAMBERED TOM                            CHAMBERED TOMB         96%
COMERCIAL                                COMMERCIAL             94%
CROFT?                                   CROFT                  90%
CUP-MARKED STONE                         CUP MARKED STONE       93%
DICTH                                    DITCH                  80%
ENCLSOURE                                ENCLOSURE              88%
EXTRACTION PIT                           EXTRACTIVE PIT         85%
EXTRACTIVE PIT                           EXTRACTIVE PIT         100%
FEATURE – COBBLED SURFACE                Cobbled Surface        75%
GULLEY                                   GULLY                  90%
GULLIES                                  GULLY                  66%
HILL FORT                                HILLFORT               94%
HILLFORT                                 HILLFORT               100%
IINEAR SYSTEM                            LINEAR SYSTEM          92%
MEDIEVAL CASTLE / FORTIFIED MANOR RUINS  FORTIFIED MANOR HOUSE  60%
PARIS CHURCH                             PARISH CHURCH          96%
PASSAGE GRACE                            PASSAGE GRAVE          92%
PORTAL DOLMEN (RE-ERECTED)               PORTAL DOLMEN          100%
POSTHLOLE                                POST HOLE              88%
PRIORY? WALL                             Priory Wall            95%
RED HILL (POSSIBLE)                      RED HILL               100%
ROMAN STRUCTURE POSSIBLY A VILLA         TRAINING STRUCTURE     52%
SOIL FILLED PIT                          RIFLE PIT              66%
ST GUTHLACS BENEDICTINE PRIORY           Benedictine Priory     75%
STONE ALIGMENT                           STONE ALIGNMENT        96%
TRACKWAY (COBBLED)                       TRACKWAY               100%
WORCESTER-BIRMINGHAM CANAL               ORNAMENTAL CANAL       52%
Slide14
Data Alignment Results (Objects and Periods)
Objects
Data value           Highest scoring match  Score
BRICK                PICK                   66%
FE NAILS             NAIL                   66%
FLINT SCRAPPER       SCRAPER (TOOL)         66%
INDUSTRIAL RSSIDUE   INDUSTRIAL BY PRODUCT  71%
LOOM WEIGHT          LOOMWEIGHT             95%
POTTEY               POTTERY                92%
SAMIEN SHERD         RIM SHERD              66%
UNIDENTIFIED OBJECT  UNIDENTIFIED OBJECT    100%

Periods
Data value     Highest scoring match  Score
NEOLOTHIC      NEOLITHIC              88%
NEOTLITHIC     NEOLITHIC              94%
POST-MEDIEVAL  POST MEDIEVAL          92%
|RON AGE       IRON AGE               87%
Slide15
Data alignment results - categorised
- Correct matches – may not be 100% textual match
  - “AXE FACOTRY” → AXE FACTORY
  - “CAIRNN” → CAIRN
  - “PASSAGE GRACE” → PASSAGE GRAVE
  - “STONE ALIGMENT” → STONE ALIGNMENT
- Unsure matches – illustrate the need for expert oversight of results
  - “ARCHITECTURAL FEATURE” → ARCHITECTURAL FRAGMENT
  - “AXIAL-STONE CIRCLE” → SMALL STONE CIRCLE
  - “RADIAL CAIRN” → TRI RADIAL CAIRN
- Incorrect matches – may be reduced by raising the match threshold
  - “CLAY STRUCTURE” → COAL GAS STRUCTURE
  - “CONCENTRATION CAMP” → CONSTRUCTION CAMP
  - “RAIN MAKING SITE” → PAINTBALLING SITE
- Non matches – score exceeding threshold was not achieved
  - “ARCHAEOLOGY”, “CLAVA CAIRN COMPLEX”, “DOMKYRKAN”, “WEDGE TOMB”
Dataset                    Correct  Unsure  Incorrect  No match  Total
OASIS monument types       1617     47      216        836       2716
OASIS object types         564      11      86         717       1378
OASIS periods              39       0       0          13        52
Image Bank monument types  131      4       7          24        166
Image Bank periods         43       0       0          38        81
Totals                     2394     62      309        1628      4393
Slide16
Period alignment – identifying years
- Achieved by matching predefined textual patterns (plus a bit of processing - e.g. “AD 375-8”)
- AD: centuries starting at year 1 and finishing at year 100
  - “EARLY” = 1 to 40
  - “MID” = 30 to 70
  - “LATE” = 60 to 100
- BC: centuries starting at year -100 and finishing at year -1
  - “EARLY” = -100 to -60
  - “MID” = -70 to -30
  - “LATE” = -40 to -1
- There is no year zero…
- Once delimiting years are identified, can align with known periods framework

Data value               Identified start year  Identified end year
250-400                  250                    400
500 BC                   -500                   -500
600-300 BC               -600                   -300
AD 375-8                 375                    378
AD400-600                400                    600
C2-C3                    101                    300
C6                       501                    600
EARLY 3RD CENTURY        201                    240
EARLY FOURTH CENTURY BC  -400                   -360
LATE 3RD CENTURY         260                    300
LATE FOURTH CENTURY BC   -340                   -301
MID 4TH CENTURY BC       -370                   -330
MID THIRD CENTURY        230                    270
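The pattern-matching step can be sketched in Python as follows. The century offsets come straight from this slide; the regular expressions, the function names and the handling of abbreviated end years such as “375-8” are assumptions chosen to reproduce the example rows, not the project's actual pattern set.

```python
import re
from typing import Optional, Tuple

Span = Tuple[int, int]

ORDINAL_WORDS = {"FIRST": 1, "SECOND": 2, "THIRD": 3, "FOURTH": 4, "FIFTH": 5,
                 "SIXTH": 6, "SEVENTH": 7, "EIGHTH": 8, "NINTH": 9, "TENTH": 10}

# Offsets within the first century AD / BC, as given on the slide.
PARTS_AD = {None: (1, 100), "EARLY": (1, 40), "MID": (30, 70), "LATE": (60, 100)}
PARTS_BC = {None: (-100, -1), "EARLY": (-100, -60), "MID": (-70, -30), "LATE": (-40, -1)}


def century_span(number: int, part: Optional[str] = None, bc: bool = False) -> Span:
    """Shift the first-century offsets to the requested century."""
    first, last = (PARTS_BC if bc else PARTS_AD)[part]
    shift = (number - 1) * 100
    return (first - shift, last - shift) if bc else (first + shift, last + shift)


def identify_years(text: str) -> Optional[Span]:
    """Return (start year, end year), or None when no pattern matches."""
    value = " ".join(text.upper().split())

    # "EARLY 3RD CENTURY", "MID THIRD CENTURY", "LATE FOURTH CENTURY BC"
    m = re.fullmatch(r"(?:(EARLY|MID|LATE) )?(\d+(?:ST|ND|RD|TH)|[A-Z]+) CENTURY( BC)?", value)
    if m:
        token = m.group(2)
        if token[0].isdigit():
            number = int(re.match(r"\d+", token).group())
        else:
            number = ORDINAL_WORDS.get(token)
        if number is None:
            return None
        return century_span(number, m.group(1), bc=m.group(3) is not None)

    # "C6", "C2-C3": start of the first century to end of the last
    m = re.fullmatch(r"C(\d+)(?:-C(\d+))?", value)
    if m:
        first, last = int(m.group(1)), int(m.group(2) or m.group(1))
        return century_span(first)[0], century_span(last)[1]

    # "250-400", "500 BC", "600-300 BC", "AD 375-8", "AD400-600"
    m = re.fullmatch(r"(?:AD ?)?(\d+)(?:-(\d+))?( ?BC)?", value)
    if m:
        start, end = m.group(1), m.group(2) or m.group(1)
        bc = m.group(3) is not None
        if not bc and len(end) < len(start):
            # Abbreviated end year: "375-8" means 375 to 378.
            end = start[: len(start) - len(end)] + end
        return (-int(start), -int(end)) if bc else (int(start), int(end))

    return None


if __name__ == "__main__":
    for sample in ["AD 375-8", "C2-C3", "MID 4TH CENTURY BC", "600-300 BC"]:
        print(sample, "->", identify_years(sample))
```

Once a value has been reduced to a start and end year in this way, it can be compared against the year ranges of a periods framework rather than against period names.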
Slide17
Vocabulary Services
- Descriptions and example service calls at http://www.heritagedata.org/blog/services/
  - getSchemes
  - getTopConceptsForScheme
  - getConceptsForScheme
  - getConceptRelations
  - getConceptLabels
  - getConceptLabelMatch
  - getConceptExists
- + Alignment functionality as a service (soon)
Slide18
Browser-based ‘widget’ controls
Slide19
Summary
- Controlled vocabularies online
  - Linked Open Data (SKOS)
  - Downloadable data files
  - Hierarchical and alphabetical listings, generated from SKOS files
- Data alignment
  - Identify Linked Data URIs for free text terms
- Web services
  - Vocabulary access
  - Term suggestion & validation
- Tools using controlled vocabularies
  - Browser-based ‘widget’ controls for embedding into web pages
Slide20
Next Steps
- Complete case studies and document
- Move towards publication by vocab providers?
- Promote HeritageData on Linked Data sites
- Final workshop at ADS and at RCAHMS
- Linked vocabulary data workshop at CAA Paris
- Management & Governance of HeritageData.org passes to FISH Terminology Working Group at end of SENESCHAL
Slide21
Future Possibilities
- Perform vocab mapping between the UK thesauri
  - Vocabulary mapping tool
  - Mapping metadata
- Search services/widgets (e.g. semantic expansion)
- Potential other vocabularies from FISH members?
- Potential further applications of terminology services and widgets?
Slide22
Contact information
ceri.binding@southwales.ac.uk
douglas.tudhope@southwales.ac.uk
http://www.heritagedata.org/ SENESCHAL
http://hypermedia.research.glam.ac.uk/kos/STELLAR/
http://hypermedia.research.glam.ac.uk/resources/STELLAR-applications/ STELLAR tools, templates and documentation
http://data.archaeologydataservice.ac.uk STELLAR linked data
http://hypermedia.research.glam.ac.uk/kos/STAR/
http://hypermedia.research.glam.ac.uk/resources/star-demonstrator/ STAR Research Demonstrator
http://intarch.ac.uk/journal/issue30/tudhope_index.html STAR Internet Archaeology paper (open access)
Slide23
©University of Glamorgan
“the key to interoperability”
http://www.heritagedata.org/