Knowledge-based Method for Determining the Meaning of Ambiguous Biomedical Terms

Uploaded On 2020-06-23

Presentation Transcript

Slide1

Knowledge-based Method for Determining the Meaning of Ambiguous Biomedical Terms Using Information Content Measures of Similarity

Bridget McInnes, Ted Pedersen, Ying Liu, Genevieve B. Melton, Serguei Pakhomov

Slide2

Objective of this work

Develop and evaluate a method that can disambiguate terms in biomedical text by exploiting similarity information extrapolated from the Unified Medical Language System (UMLS)

Evaluate the efficacy of information content-based similarity measures over path-based similarity measures for word sense disambiguation (WSD)

Slide3

Word Sense Disambiguation

Word sense disambiguation is the task of determining the appropriate sense of a term given the context in which it is used.

TERM: tolerance

Drug Tolerance

Immune Tolerance

Slide4

Word Sense Disambiguation

Word sense disambiguation is the task of determining the appropriate sense of a term given the context in which it is used.

Buspirone attenuates tolerance to morphine in mice with skin cancer

Drug Tolerance

Immune Tolerance

Slide5

Sense inventory: Unified Medical Language System

Unified Medical Language System (UMLS)

Semantic Network

Metathesaurus

  ~1.7 million biomedical and clinical concepts; integrated semi-automatically

  CUIs (Concept Unique Identifiers), linked:

    Hierarchical: PAR/CHD and RB/RN

    Non-hierarchical: SIB, RO

  Sources viewed together or independently

    Medical Subject Headings (MSH)

SPECIALIST Lexicon

  Biomedical and clinical terms, including variants

Slide6

Word Sense Disambiguation

Buspirone attenuates tolerance to morphine in mice with skin cancer

Drug Tolerance: C0013220

Immune Tolerance: C0020963

Concept Unique Identifiers: CUIs

Slide7

SenseRelate algorithm

Each possible sense of a target word is assigned a score: the sum of the similarities between it and its surrounding terms

Assign the target word the sense with the highest score

Proposed by Patwardhan and Pedersen (2003) using WordNet

UMLS::SenseRelate is a modification of this algorithm using information from the UMLS

NEXT UP: an example

Slide15

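SenseRelate Example

Buspirone attenuates tolerance to morphine in mice with skin cancer

Candidate senses of "tolerance":

Drug Tolerance: C0013220

Immune Tolerance: C0020963

Surrounding terms:

Buspirone: C0006462

Morphine: C0026549

Mice: C0026809

Skin cancer: C0007114

Similarity scores between Drug Tolerance and the surrounding terms: 0.09, 0.09, 0.16, 0.11

Similarity scores between Immune Tolerance and the surrounding terms: 0.09, 0.09, 0.05, 0.04

Drug Tolerance Score = 0.09 + 0.09 + 0.16 + 0.11 = 0.45

Immune Tolerance Score = 0.09 + 0.09 + 0.05 + 0.04 = 0.27

Drug Tolerance has the higher score, so it is assigned as the sense of "tolerance".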
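The scoring in this example can be sketched in Python as follows. This is a minimal illustration, not the UMLS::SenseRelate implementation: the names `sense_relate`, `table_similarity` and `SIM` are hypothetical, and because the slide does not say which score belongs to which surrounding term, the per-term assignment in the table is an assumption. Only the individual values and the per-sense totals come from the slide.

```python
DRUG_TOLERANCE = "C0013220"
IMMUNE_TOLERANCE = "C0020963"
# Buspirone, morphine, mice, skin cancer
CONTEXT = ["C0006462", "C0026549", "C0026809", "C0007114"]

# Hypothetical lookup table: (sense CUI, context CUI) -> similarity.
# ASSUMPTION: the pairing of each value with a particular term is invented
# for illustration; the slide only shows the values per sense.
SIM = {
    (DRUG_TOLERANCE, "C0006462"): 0.09,
    (DRUG_TOLERANCE, "C0026549"): 0.16,
    (DRUG_TOLERANCE, "C0026809"): 0.09,
    (DRUG_TOLERANCE, "C0007114"): 0.11,
    (IMMUNE_TOLERANCE, "C0006462"): 0.09,
    (IMMUNE_TOLERANCE, "C0026549"): 0.05,
    (IMMUNE_TOLERANCE, "C0026809"): 0.09,
    (IMMUNE_TOLERANCE, "C0007114"): 0.04,
}


def table_similarity(sense, cui):
    """Look up a similarity; pairs missing from the table contribute 0."""
    return SIM.get((sense, cui), 0.0)


def sense_relate(senses, context, similarity):
    """Score each sense by summing its similarity to every surrounding
    concept, then return the highest-scoring sense and all scores."""
    scores = {
        sense: sum(similarity(sense, cui) for cui in context)
        for sense in senses
    }
    best = max(scores, key=scores.get)
    return best, scores


best, scores = sense_relate(
    [DRUG_TOLERANCE, IMMUNE_TOLERANCE], CONTEXT, table_similarity
)
print(best, {s: round(v, 2) for s, v in scores.items()})
```

Passing the similarity function as a parameter keeps the scoring loop independent of the measure, so any path-based or IC-based measure could be substituted for the lookup table.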

Slide16

SenseRelate Assumption

An ambiguous word is often used in the sense that is most similar to the sense of the terms that surround it

Slide17

SenseRelate Components

Identifying the concepts of surrounding terms

Calculating semantic similarity

Slide21

Identifying the concepts of the surrounding terms

Use the SPECIALIST Lexicon to identify the terms, then map each term to a concept by string match against the MRCONSO table in the UMLS

Buspirone attenuates tolerance to morphine in mice with skin cancer

SPECIALIST Lexicon:

...
skin cancer
skin grafting
skin disease
...

MRCONSO:

...
skin cancer C0007114
skin grafting C0037297
skin disease C0037274
...
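The two steps on this slide (find the terms, then look each one up) might be sketched as below. The slide does not say how multi-word terms are segmented, so greedy longest-match is an assumption here, and `LEXICON`, `MRCONSO`, `MAX_TERM_WORDS`, `identify_terms` and `map_to_cuis` are hypothetical stand-ins: tiny in-memory dictionaries rather than the real SPECIALIST Lexicon or MRCONSO table.

```python
# Toy stand-in for MRCONSO: term string -> CUI (CUIs taken from the slides).
# The real table is far larger, and one string can map to several CUIs.
MRCONSO = {
    "buspirone": "C0006462",
    "morphine": "C0026549",
    "mice": "C0026809",
    "skin cancer": "C0007114",
    "skin grafting": "C0037297",
    "skin disease": "C0037274",
}
# Toy stand-in for the SPECIALIST Lexicon: the set of known terms.
# "tolerance" (the target word) and "attenuates" are deliberately left
# out of the toy MRCONSO to show terms that remain unmapped.
LEXICON = set(MRCONSO) | {"tolerance", "attenuates"}
MAX_TERM_WORDS = 3  # ASSUMPTION: upper bound on words per term


def identify_terms(sentence):
    """Greedy longest-match segmentation of a sentence into lexicon terms."""
    words = sentence.lower().split()
    terms, i = [], 0
    while i < len(words):
        for n in range(min(MAX_TERM_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in LEXICON:
                terms.append(candidate)
                i += n
                break
        else:
            i += 1  # word does not start any known term; skip it
    return terms


def map_to_cuis(terms):
    """Exact string match against the toy MRCONSO; unmapped terms are dropped."""
    return {t: MRCONSO[t] for t in terms if t in MRCONSO}


terms = identify_terms(
    "Buspirone attenuates tolerance to morphine in mice with skin cancer"
)
print(terms)
print(map_to_cuis(terms))
```

Trying the longest candidate first is what lets "skin cancer" win over a single-word match on "skin".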

Slide22

Semantic Similarity Measures

Path-based measures

  Path

  Wu and Palmer

  Leacock and Chodorow

  Nguyen and Al-Mubaid

Information content (IC)-based measures

  Resnik

  Lin

  Jiang and Conrath

Slide27

Path-based similarity measures

Use only the path information obtained from a taxonomy

Path measure

sim(c1,c2) = 1 / minpath(c1,c2)

where minpath is the length of the shortest path between the two concepts

Leacock and Chodorow, 1998

sim(c1,c2) = -log( minpath(c1,c2) / (2D) )

where D is the total depth of the taxonomy

Wu and Palmer, 1994

sim(c1,c2) = (2 * depth(LCS(c1,c2))) / (depth(c1) + depth(c2))

where LCS is the least common subsumer of the two concepts

Nguyen and Al-Mubaid, 2006

sim(c1,c2) = log( 2 + (minpath(c1,c2) - 1) * (D - depth(LCS(c1,c2))) )

Slide28

Path-based Similarity Measures

USE ONLY THE PATH INFORMATION OBTAINED FROM A TAXONOMY

[Figure: taxonomy fragment containing the following concepts]

Disease: C0012634

Drug Related Disorder: C0277579

Drug Tolerance: C0013220

Neoplasm: C1302761

Neoplastic Disease: C1882062

Malignant Neoplasm: C0006826

Skin cancer: C0007114
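As a rough illustration, the four path-based measures can be computed over the concepts listed on this slide. This is a sketch under stated assumptions, not the UMLS::Similarity code:

- The figure's edges are not recoverable from the transcript, so the parent links in `PARENT` are assumed: Disease is the root, with one branch leading to Drug Tolerance and one to Skin cancer.
- Path lengths and depths are counted in nodes, with the root at depth 1. This is also an assumption.
- The Nguyen and Al-Mubaid measure is implemented as log(2 + (minpath - 1) * (D - depth(LCS))).

```python
import math

# ASSUMED single-parent taxonomy: child CUI -> parent CUI.
PARENT = {
    "C0277579": "C0012634",  # Drug Related Disorder -> Disease
    "C0013220": "C0277579",  # Drug Tolerance -> Drug Related Disorder
    "C1302761": "C0012634",  # Neoplasm -> Disease
    "C1882062": "C1302761",  # Neoplastic Disease -> Neoplasm
    "C0006826": "C1882062",  # Malignant Neoplasm -> Neoplastic Disease
    "C0007114": "C0006826",  # Skin cancer -> Malignant Neoplasm
}


def ancestors(cui):
    """Concepts from `cui` up to the root, starting with `cui` itself."""
    path = [cui]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(cui):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(cui))


def lcs(c1, c2):
    """Least common subsumer: the deepest concept subsuming both."""
    seen = set(ancestors(c1))
    return next(a for a in ancestors(c2) if a in seen)


def minpath(c1, c2):
    """Number of nodes on the shortest path between c1 and c2."""
    return depth(c1) + depth(c2) - 2 * depth(lcs(c1, c2)) + 1


# Total depth of this toy taxonomy.
D = max(depth(c) for c in set(PARENT) | set(PARENT.values()))


def sim_path(c1, c2):
    return 1 / minpath(c1, c2)


def sim_wup(c1, c2):
    return 2 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))


def sim_lch(c1, c2):
    return -math.log(minpath(c1, c2) / (2 * D))


def sim_nam(c1, c2):
    return math.log(2 + (minpath(c1, c2) - 1) * (D - depth(lcs(c1, c2))))


drug_tolerance, skin_cancer = "C0013220", "C0007114"
for measure in (sim_path, sim_wup, sim_lch, sim_nam):
    print(measure.__name__, round(measure(drug_tolerance, skin_cancer), 4))
```

In this toy tree the two concepts meet only at the root, so the shortest path runs through all seven nodes.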

Slide30

Information content-based Measures

Incorporate the probability of the concepts

IC = -log(P(concept))

P(concept)

Calculated by summing the probability of the concept and the probabilities of its descendants

Probabilities are obtained from an external corpus

Slide33

Information content-based Measures

Incorporate the probability of the concepts

IC = -log(P(concept))

Resnik, 1995

sim(c1,c2) = IC(LCS(c1,c2))

Jiang and Conrath, 1997

sim(c1,c2) = 1 / (IC(c1) + IC(c2) - 2 * IC(LCS(c1,c2)))

Lin, 1998

sim(c1,c2) = (2 * IC(LCS(c1,c2))) / (IC(c1) + IC(c2))

Slide34

IC-based similarity measures

[Figure: the taxonomy fragment below, with the probability of each concept (from an external corpus) added to the path information]

Disease: C0012634

Drug Related Disorder: C0277579

Drug Tolerance: C0013220

Neoplasm: C1302761

Neoplastic Disease: C1882062

Malignant Neoplasm: C0006826

Skin cancer: C0007114

PATH INFORMATION + PROBABILITY OF CONCEPTS (EXTERNAL CORPUS)
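A small sketch of how the IC-based measures combine the taxonomy with corpus probabilities is given below. Everything numeric here is invented for illustration: the counts in `COUNT` are hypothetical (they are not UMLSonMedline frequencies), the parent links in `PARENT` are an assumed reading of the figure, and the guards for zero denominators are a choice made for this sketch rather than documented behaviour of any package.

```python
import math

# ASSUMED single-parent taxonomy: child CUI -> parent CUI.
PARENT = {
    "C0277579": "C0012634",  # Drug Related Disorder -> Disease
    "C0013220": "C0277579",  # Drug Tolerance -> Drug Related Disorder
    "C1302761": "C0012634",  # Neoplasm -> Disease
    "C1882062": "C1302761",  # Neoplastic Disease -> Neoplasm
    "C0006826": "C1882062",  # Malignant Neoplasm -> Neoplastic Disease
    "C0007114": "C0006826",  # Skin cancer -> Malignant Neoplasm
}
# HYPOTHETICAL corpus frequencies of each concept on its own.
COUNT = {
    "C0012634": 10, "C0277579": 20, "C0013220": 30, "C1302761": 10,
    "C1882062": 10, "C0006826": 40, "C0007114": 80,
}


def ancestors(cui):
    """Concepts from `cui` up to the root, starting with `cui` itself."""
    path = [cui]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lcs(c1, c2):
    """Least common subsumer: the deepest concept subsuming both."""
    seen = set(ancestors(c1))
    return next(a for a in ancestors(c2) if a in seen)


def propagated_counts():
    """Credit each concept's count to itself and to all of its ancestors,
    so a concept's total covers itself plus its descendants."""
    total = dict.fromkeys(COUNT, 0)
    for cui, n in COUNT.items():
        for a in ancestors(cui):
            total[a] += n
    return total


FREQ = propagated_counts()
N = sum(COUNT.values())


def ic(cui):
    """IC = -log(P(concept)), written as log(N / freq)."""
    return math.log(N / FREQ[cui])


def sim_res(c1, c2):
    return ic(lcs(c1, c2))


def sim_jcn(c1, c2):
    distance = ic(c1) + ic(c2) - 2 * ic(lcs(c1, c2))
    return 1 / distance if distance > 0 else math.inf  # guard is an assumption


def sim_lin(c1, c2):
    denom = ic(c1) + ic(c2)
    return 2 * ic(lcs(c1, c2)) / denom if denom > 0 else 0.0  # guard is an assumption


skin_cancer, malignant_neoplasm = "C0007114", "C0006826"
for measure in (sim_res, sim_jcn, sim_lin):
    print(measure.__name__, round(measure(skin_cancer, malignant_neoplasm), 4))
```

Because the root subsumes every occurrence, its probability is 1 and its IC is 0. Two concepts that share only the root therefore get a Resnik and Lin similarity of 0 in this sketch.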

Slide35

Experimental Framework

Use the open-source UMLS::Similarity package to obtain the similarity between the terms and possible senses in the SenseRelate algorithm

Path information: parent/child relations in the MSH source

Information content: calculated using the UMLSonMedline dataset created by NLM

  Consists of concepts from the 2009AB UMLS and the frequency with which they occurred in Medline, using the Essie Search Engine (Ide et al 2007)

  Medline: database of citations of biomedical/clinical articles

Slide36

Evaluation Data: MSH-WSD

MSH-WSD dataset (Jimeno-Yepes et al 2011)

203 target words (ambiguous words) from Medline

  106 terms, e.g. tolerance

  88 acronyms, e.g. CA (calcium, California)

  9 mixtures, e.g. bat (brown adipose tissue)

Each target word has ~187 instances (Medline abstracts)

  abstract = ~500 words

Each target word in the instances is assigned a concept from MSH by exploiting the manually assigned MSH concepts of the abstract

Average of 2.08 possible senses per target word

Majority sense over all the target words is 54.5%

Slide37

Results

[Figure: accuracy of the baseline, the path-based measures (path, lch, wup, nam) and the IC-based measures (res, jcn, lin)]

Slide38

Comparison across subsets of MSH-WSD

[Figure: accuracy across subsets of MSH-WSD]

Slide43

Window sizes

Use the terms surrounding the target word within a specified window: 1, 2, 5, 10, 25, 50, 60, 70

Buspirone attenuates tolerance to morphine in mice with skin_cancer

WINDOW SIZE = 2
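One way to read the window is "up to N tokens on each side of the target word". The slide does not state whether the window counts raw tokens or only mapped terms, so the token-based reading below is an assumption, and `context_window` is a hypothetical helper name.

```python
def context_window(tokens, target_index, size):
    """Return up to `size` tokens on each side of the target, excluding it."""
    left = tokens[max(0, target_index - size):target_index]
    right = tokens[target_index + 1:target_index + 1 + size]
    return left + right


tokens = "Buspirone attenuates tolerance to morphine in mice with skin_cancer".split()
target = tokens.index("tolerance")
print(context_window(tokens, target, 2))
# -> ['Buspirone', 'attenuates', 'to', 'morphine']
```

Under this reading a window of 2 never reaches "mice" or "skin_cancer", which is one reason larger windows may supply more usable context.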

Slide44

Comparison of window sizes for lin

[Figure: accuracy of the lin measure versus window size]

Slide45

Surrounding terms

Not all terms have a concept in the UMLS

Therefore, not all surrounding terms in the window are mapped to CUIs

Slide46

Window sizes versus mapped terms

[Figure: number of mappings versus window size]

Slide47

Future work: mapping terms

Currently looking at mapping the terms to CUIs using information from the concept mapping system MetaMap

Obtain the terms from MetaMap and do a dictionary look-up in MRCONSO

  Hypothesis: the terms obtained by MetaMap are more accurate than those obtained using the SPECIALIST Lexicon

Obtain the CUIs from MetaMap

  Hypothesis: the CUIs obtained by MetaMap will be more accurate than the dictionary look-up

Slide48

Objective #1

Develop and evaluate a method that can disambiguate terms in biomedical text by exploiting similarity information extrapolated from the UMLS

UMLS::SenseRelate achieves statistically significantly higher disambiguation accuracy than the baseline

On par with previous unsupervised methods for terms

Slide49

Objective #2

Evaluate the efficacy of IC-based similarity measures over path-based measures on a secondary task

There is no statistically significant difference between the accuracies obtained by the IC-based measures

There is a statistically significant difference between the IC-based measures and the path-based measures


Slide50

Take home message:

An ambiguous word is often used in the sense that is most similar to the sense of the concepts of the terms that surround it

Slide51

Resources

Software:

UMLS::SenseRelate
http://search.cpan.org/dist/UMLS-SenseRelate/

UMLS::Similarity
http://search.cpan.org/dist/UMLS-Similarity/

Data:

MSH-WSD
http://wsd.nlm.nih.gov/collaboration.shtml
