Slide1
Mapping the NCI Thesaurus and the Collaborative Inter-Lingual Index
Amanda Hicks
University of Florida
aehicks@ufl.edu
HealthInsight Workshop, Oslo, Norway
20 May 2016
Slide2
Overview
Ontologies versus wordnets
Inter-Lingual Index
Future work to map National Cancer Institute Thesaurus to the Collaborative Inter-Lingual Index
In collaboration with
Francis Bond, Nanyang Technological University, Singapore
Selja Seppälä, University of Florida, USA
Slide3
Ontologies versus Wordnets
Slide4
Semantic Networks
Words, concepts, or classes that are arranged in a network
Provide a framework for machine-readable meaning and context for what would otherwise be uninterpreted syntax
Simultaneously logical objects and mathematical objects, subject to inference and graph-theoretical analysis
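A minimal sketch of that dual reading, using toy data that is not from the talk: the same is-a network supports inference (following the transitive relation) and graph analysis (measuring path length). The node names and edges are illustrative assumptions only.

```python
from collections import deque

# Toy is-a edges (child -> parents). Illustrative data, not taken from any real wordnet.
IS_A = {
    "cat": ["feline"],
    "feline": ["carnivore"],
    "dog": ["canine"],
    "canine": ["carnivore"],
    "carnivore": ["mammal"],
}

def ancestors(node):
    """Inference: is-a is transitive, so collect every reachable parent."""
    seen, stack = set(), list(IS_A.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, []))
    return seen

def path_length(a, b):
    """Graph analysis: shortest path between two nodes, ignoring edge direction."""
    neighbours = {}
    for child, parents in IS_A.items():
        for parent in parents:
            neighbours.setdefault(child, set()).add(parent)
            neighbours.setdefault(parent, set()).add(child)
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two nodes are not connected
```

Here `ancestors("cat")` yields feline, carnivore and mammal, while `path_length("cat", "dog")` is 4, a simple distance between concepts.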
Slide5
Wordnets versus Ontologies
Wordnets are semantic networks that represent how we use language. Meanings are stated in natural language definitions and relatively sparse semantic relations.
The word ‘cat’ in context
Ontologies are semantic networks that represent properties of things in the world. Meanings are encoded in logical form.
What it is to be a cat
Slide6
Comparative Strengths
Wordnets
NLP applications that require word sense discrimination
Cross-lingual comparison of lexical categories
Distance or related measures of concepts
Ontologies
Provide a coherent, stable and unified frame of reference for the interpretation of concepts and specification of classes
May support interoperability of data sets
Support deductive reasoning over structured data
Slide7
Inter-Lingual Index
Slide8
Current State of Mapping Wordnets
Wordnets exist for many languages.
33 open wordnets in the Global Wordnet Grid
http://globalwordnet.org/global-wordnet-grid/
Mapping often occurs through English WordNet.
English centric
English does not have a word for every concept.
Some wordnets are mapped to each other directly.
Slide9
Mapping wordnets to each other directly gets messy.
Vossen, Global WordNet Conference, 2016
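A back-of-envelope sketch of why direct mapping gets messy, assuming one mapping per pair of wordnets and, alternatively, one mapping per wordnet to a shared index. The function names are illustrative and the counts are simple arithmetic, not figures from the talk.

```python
from math import comb

def direct_mappings(n_wordnets: int) -> int:
    """Every pair of wordnets needs its own mapping."""
    return comb(n_wordnets, 2)

def index_mappings(n_wordnets: int) -> int:
    """Each wordnet maps once to a shared index."""
    return n_wordnets
```

For 33 wordnets, that is 528 pairwise mappings against 33 mappings to a shared index.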
Slide10
The Collaborative Inter-Lingual Index (CILI)
[Figure: diagram with wordnets labeled en, es, no, pt]
Slide11
CILI
Flat list of concepts with a persistent Semantic Web compliant IRI
Synsets from wordnets mapped directly to the ILI IRI
English WordNet 3.0, 3.1 and Dutch Open Wordnet currently mapped
Unique English definitions are associated with each ILI to support mapping (but no English words or labels)
Not imposed on linked wordnets
Open, anyone can contribute
https://github.com/globalwordnet/ili
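The structure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual CILI data format: the `example.org` IRI is a placeholder, the Spanish entry is invented sample data, and `IliConcept`, `SYNSET_TO_ILI` and `translate` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IliConcept:
    iri: str         # persistent identifier (placeholder value, not a real CILI IRI)
    definition: str  # unique English definition; no words or labels are attached

# A flat list: concepts carry no relations to one another.
ILI = [
    IliConcept(
        "http://example.org/ili/i1",
        "a small flat triangular bone in front of the knee that protects the knee joint",
    ),
]

# Each wordnet maps its own synsets (tuples of lemmas here) to ILI IRIs.
SYNSET_TO_ILI = {
    "en": {("patella", "kneecap", "kneepan"): "http://example.org/ili/i1"},
    "es": {("rótula",): "http://example.org/ili/i1"},
}

def translate(lemma, source, target):
    """Return target-language lemmas that share an ILI concept with `lemma`."""
    iris = {iri for lemmas, iri in SYNSET_TO_ILI[source].items() if lemma in lemmas}
    found = []
    for lemmas, iri in SYNSET_TO_ILI[target].items():
        if iri in iris:
            found.extend(lemmas)
    return found
```

Because every wordnet points at the same identifiers, a cross-lingual lookup never passes through English words, only through the shared concept.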
Slide12
NCI Thesaurus and CILI
Slide13
National Cancer Institute Thesaurus (NCIt)
An English medical reference terminology
Definitions crafted by teams of medical experts and terminologists
Covers vocabulary for clinical care, translational and basic research, and public information and administrative activities
Widely used in biomedical and health informatics in the USA
Slide14
Why map NCIt to CILI?
Specialized terminology in CILI should be defined by subject matter experts, not linguists.
There is currently no prototype for mapping specialized vocabulary to CILI.
More resources may lead to improved formal semantics to be integrated with CILI.
To support integration of health knowledge extracted from linguistically heterogeneous sources
Multi-lingual
Layperson/specialized vocab
Slide15
“Patella” in WN and NCIt
WordNet
patella, kneecap, kneepan
A small flat triangular bone in front of the knee that protects the knee joint
Part_holonym: knee
Hypernym: Sesamoid bone
NCIt
BONE, PATELLA
A small flat triangular bone in front of the knee that articulates with the femur and protects the knee joint.
subClassOf: Bone of the Lower Extremity; Short Bone
Semantic Type: Anatomical Structure
Additional synonyms
Additional definitional knowledge: potentially useful for formalizing semantics
Additional formal semantic information
Slide16
Semantic Modeling
One of the goals of CILI is to have ontologies that provide formal semantics for the indexed concepts.
Different semantic resources encode different semantic information.
NCIt can be used to enrich the common semantic model.
Slide17
Our Planned Approach To the Project
Map NCI Thesaurus to CILI
Convert NCI Thesaurus to Lexical Markup Framework
Partially automate mapping NCIt to CILI using string matching on WordNet synsets and NCIt names and similarity measures on definitions
There will still be false negatives that will need to be identified by hand.
Formalize the semantics of NCIt-related CILIs with other ontologies.
KYOTO
BFO
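The partially automated step might look something like the sketch below. It is one possible reading, not the project's method: token-level Jaccard overlap stands in for the unspecified similarity measure, the 0.5 threshold is an arbitrary assumption, and all function names are hypothetical. The sample strings are the patella entries from the earlier slide.

```python
import re

def tokens(text):
    """Lowercased set of alphabetic tokens, e.g. 'BONE, PATELLA' -> {'bone', 'patella'}."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Share of distinct tokens that two definitions have in common."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def candidate_match(synset_lemmas, synset_def, ncit_name, ncit_def, threshold=0.5):
    """Propose a mapping when a lemma appears in the NCIt name and the definitions overlap."""
    name_hit = any(lemma.lower() in tokens(ncit_name) for lemma in synset_lemmas)
    similarity = jaccard(synset_def, ncit_def)
    return name_hit and similarity >= threshold, similarity

WN_DEF = "A small flat triangular bone in front of the knee that protects the knee joint"
NCIT_DEF = ("A small flat triangular bone in front of the knee that articulates "
            "with the femur and protects the knee joint.")

matched, score = candidate_match(
    ["patella", "kneecap", "kneepan"], WN_DEF, "BONE, PATELLA", NCIT_DEF
)
```

A synset whose lemmas never appear in the NCIt name would be missed by this filter, which is the kind of false negative that still has to be found by hand.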
Slide18
Use for Knowledge Integration
[Figure: “… smokes hubbly-bubbly on weekends …”, ILI Concept, Smoking status]
Slide19
Thank you