Presentation Transcript


Processing Semantic Relations Across Textual Genres

Bryan Rink

University of Texas at Dallas

December 13, 2013

Outline

Introduction

Supervised relation identification

Unsupervised relation discovery

Proposed work

Conclusions

Motivation

We think about our world in terms of:

Concepts (e.g., bank, afternoon, decision, nose)

Relations (e.g., Is-A, Part-Whole, Cause-Effect)

Powerful mental constructions for:

Representing knowledge about the world

Reasoning over that knowledge: from Part-Whole(brain, Human) and Is-A(Socrates, Human) we can reason that Part-Whole(brain, Socrates)

Representation and Reasoning

Large general knowledge bases exist:

WordNet, Wikipedia/DBpedia/Yago, ConceptNet, OpenCyc

Some domain specific knowledge bases exist:

Biomedical (UMLS)

Music (Musicbrainz)

Books (RAMEAU)

All of these are available in the standard RDF/OWL data model

Powerful reasoners exist for making inferences over data stored in RDF/OWL

Knowledge acquisition is still the most time-consuming and difficult of these steps

Relation Extraction from Text

Relations between concepts are encoded explicitly or implicitly in many textual resources:

Encyclopedias, news articles, emails, medical records, academic articles, web pages

For example:

The report found Firestone made mistakes in the production of the tires.

Product-Producer(tires, Firestone)

Outline

Introduction

Supervised relation identification

Unsupervised relation discovery

Proposed work

Conclusions

Supervised Relation Identification

SemEval-2010 Task 8 – “Multi-Way Classification of Semantic Relations Between Pairs of Nominals”

Given a sentence and two marked nominals

Determine the semantic relation and directionality of that relation between the nominals.

Example:

A small [piece] of rock landed into the [trunk]

This contains an Entity-Destination(piece, trunk) relation:

The situation described in the sentence entails that trunk is the destination of piece, in the sense of piece moving (in a physical or abstract sense) toward trunk.

Semantic Relations

Relation: Definition

Cause-Effect: X causes Y
Instrument-Agency: Y uses X; X is the instrument of Y
Product-Producer: Y produces X; X is the product of Y
Content-Container: X is or was stored or carried inside Y
Entity-Origin: Y is the origin of an entity X (X coming/derived from Y)
Entity-Destination: X moves toward Y
Component-Whole: X is a component of Y and has a functional relation
Member-Collection: X is a member of Y
Message-Topic: X is a message containing information about Y
Other: used if none of the nine relations appears to be suitable

Observations

Three types of evidence useful for classifying relations:

Lexical/Contextual cues:
"The seniors poured [flour] into [wax paper] and threw the items as projectiles on freshmen during a morning pep rally"

Knowledge of the typical role of one nominal:
"The [rootball] was in a [crate] the size of a refrigerator, and some of the arms were over 12 feet tall."

Knowledge of a pre-existing relation between the nominals:
"The Ca content in the [corn] [flour] has also a strong dependence on the pericarp thickness."

Approach

Use an SVM classifier to first determine the relation type

Each relation type then has its own SVM classifier to determine the direction of the relation (a minimal sketch of this two-stage setup follows the feature list below)

All SVMs share the same set of 45 feature types, which fall into the following 8 categories:

Lexical/Contextual

Hypernyms from WordNet

Dependency parse

PropBank parse

FrameNet parse

Nominalization

Nominal similarity derived from Google N-Grams

TextRunner predicates
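A minimal sketch of the two-stage classification just described, assuming scikit-learn and hypothetical feature dictionaries (the actual 45 feature types are detailed on the following slides; this is not the original implementation):

```python
# Sketch of the two-stage classification: one multi-class SVM picks the relation
# type, then a per-type SVM picks the argument order. Feature dicts are
# hypothetical placeholders, not the original feature extractors.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

def train_two_stage(instances, relation_types, directions):
    """instances: list of feature dicts, e.g. {"between=into": 1, "distance": 4}."""
    vec = DictVectorizer()
    X = vec.fit_transform(instances)

    # Stage 1: multi-class SVM over relation types (e.g. "Entity-Destination").
    type_clf = LinearSVC().fit(X, relation_types)

    # Stage 2: one binary SVM per relation type for direction, (e1,e2) vs. (e2,e1).
    dir_clfs = {}
    for rel in set(relation_types):
        idx = [i for i, r in enumerate(relation_types) if r == rel]
        if len({directions[i] for i in idx}) > 1:
            dir_clfs[rel] = LinearSVC().fit(X[idx], [directions[i] for i in idx])
    return vec, type_clf, dir_clfs

def predict_two_stage(vec, type_clf, dir_clfs, instance):
    x = vec.transform([instance])
    rel = type_clf.predict(x)[0]
    direction = dir_clfs[rel].predict(x)[0] if rel in dir_clfs else "(e1,e2)"
    return rel, direction
```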

System

[System architecture diagram]

Lexical/Contextual Features

Words between the nominals are very important:

cause → Cause-Effect
used → Instrument-Agency
makes → Product-Producer
contained → Content-Container
from → Entity-Origin
into → Entity-Destination
on → Component-Whole
of → Member-Collection
about → Message-Topic

Number of tokens between the nominals is also helpful:

Product-Producer, Entity-Origin often have zero: "organ builder", "Coconut oil"

Additional features for: E1/E2 words, E1/E2 part of speech, words before/after the nominals, prefixes of words between

Sequence of word classes between the nominals: Verb_Determiner, Preposition_Determiner, Preposition_Adjective_Adjective, etc.

Example Feature Values

Sentence: Forward [motion]E1 of the vehicle through the air caused a [suction]E2 on the road draft tube.

Feature values:

e1Word=motion, e2Word=suction
e1OrE2Word={motion, suction}
between={of, the, vehicle, through, the, air, caused, a}
posE1=NN, posE2=NN
posE1orE2=NN
posBetween=I_D_N_I_D_N_V_D
distance=8
wordsOutside={Forward, on}
prefix5Between={air, cause, a, of, the, vehic, throu, the}

Parsing Features

Dependency Parse (Stanford parser)
Paths of length 1 from each nominal
Paths of length 2 between E1 and E2

PropBank SRL Parse (ASSERT)
Predicate associated with both nominals
Number of tokens in the predicate
Hypernyms of the predicate
Argument types of the nominals

FrameNet SRL Parse (LTH)
Lemmas of frame trigger words, with and without part of speech

Also make use of VerbNet to generalize verbs from dependency and PropBank parses

Example Feature Values

Sentence: Forward [motion]E1 of the vehicle through the air caused a [suction]E2 on the road draft tube.

Dependency:
<E1> nsubj caused dobj <E2>
<E1> nsubj vn:27 dobj <E2>

VerbNet/Levin class 27 is the class of engender verbs such as: cause, spawn, generate, etc.

This feature value indicates that E1 is the subject of an engender verb, and the direct object is E2

PropBank:
Hypernyms of the predicate: cause#v#1, create#v#1

Example Feature Values

Sentence: Forward [motion]E1 of the vehicle through the air caused a [suction]E2 on the road draft tube.

Feature Set | Feature Values
Dependency | depPathLen1={caused nsubj <E1>, caused dobj <E2>, ...}; depPathLen1VN={vn:27 nsubj <E1>, vn:27 dobj <E2>}; depPathLen2VerbNet={<E1> nsubj vn:27 dobj <E2>}; depPathLen2Location={<E1> nsubj BETWEEN dobj <E2>}
PropBank | pbPredStem=caus, pbVerbNet=27, pbE1CoarseRole=ARG0, pbE2CoarseRole=ARG1, pbE1orE2CoarseRole={ARG0, ARG1}, pbNumPredToks=1, pbE1orE2PredHyper={cause#v#1, create#v#1}
FrameNet | fnAnyLU={cause.v, vehicle.n, road.n}, fnAnyTarget={cause, vehicle, road}, fnE2LU=cause.v, fnE1OrE2LU=cause.v

Nominal Role Affiliation Features

Sometimes context is not enough and we must use background knowledge about the nominals

Consider the nominal: writer

Knowing that a writer is a person increases the likelihood that the nominal will act as a Producer or an Agency

Use WordNet hypernyms for the nominal's sense, determined by SenseLearner

Additionally, writer is a nominalization of the verb write, which is classified by Levin as a "Creation and Transformation" verb, so it is most likely to act as a Producer

Use NomLex-Plus to determine the verb being nominalized and retrieve the Levin class from VerbNet
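A minimal sketch of the hypernym lookup, assuming NLTK's WordNet interface; sense disambiguation with SenseLearner and the NomLex-Plus/VerbNet lookup are not reproduced, so the first noun sense is used as a stand-in assumption:

```python
# Sketch of the WordNet hypernym feature. The original system disambiguates the
# nominal's sense with SenseLearner; here the first sense is a stand-in.
from nltk.corpus import wordnet as wn

def hypernym_features(nominal):
    synsets = wn.synsets(nominal, pos=wn.NOUN)
    if not synsets:
        return set()
    hypers = set()
    frontier = [synsets[0]]          # assumed sense; SenseLearner would pick it
    while frontier:
        syn = frontier.pop()
        for h in syn.hypernyms():
            if h.name() not in hypers:
                hypers.add(h.name())
                frontier.append(h)
    return hypers

# hypernym_features("writer") includes 'person.n.01' -- the kind of evidence
# that a nominal can fill a Producer or Agency role.
print(hypernym_features("writer"))
```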

Google N-Grams for Nominal Role Affiliation

Semantically-similar nominals should participate in the same roles

They should also occur in similar contexts in a large corpus

Using Google 5-grams, the 1,000 most frequent words appearing in the context of a nominal are collected

Using Jaccard similarity on those context words, the 4 nearest-neighbor nominals are determined and used as a feature

Also, determine the role most frequently associated with those neighbors
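A small sketch of the nearest-neighbor step, assuming the context-word sets have already been collected from the Google 5-grams (the placeholder names are illustrative, not the original code):

```python
# Given, for each nominal, the set of its most frequent context words (collected
# elsewhere from the Google 5-grams), find the 4 most similar nominals by
# Jaccard similarity.
def jaccard(a, b):
    return len(a & b) / len(a | b) if a or b else 0.0

def nearest_neighbors(target, contexts, k=4):
    """contexts: dict mapping nominal -> set of its top context words."""
    scores = [(jaccard(contexts[target], ctx), nom)
              for nom, ctx in contexts.items() if nom != target]
    return [nom for _, nom in sorted(scores, reverse=True)[:k]]

# The most frequent role among these neighbors (looked up in the training data)
# is then used as an additional feature, e.g. Collection for "legion".
```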

Example Values for Google N-Grams Feature

Sentence 4739: As part of his wicked plan, Pete promotes Mickey and his pals into the [legion]E1 of [musketeers]E2 and assigns them to guard Minnie.

Member-Collection(E2, E1)

E1 nearest neighbors: legion, army, heroes, soldiers, world
Most frequent role: Collection

E2 nearest neighbors: musketeers, admirals, sentries, swordsmen, larks
Most frequent role: Member

Pre-existing Relation Features

Sometimes the context gives few clues about the relation

Can use knowledge about a context-independent relation between the nominals

TextRunner

A queryable database of Noun-Verb-Noun triples from a large corpus of web text

Plug in E1 and E2 as the nouns and query for predicates that occur between them

Example Feature Values for TextRunner Features

Sentence: Forward [motion]E1 of the vehicle through the air caused a [suction]E2 on the road draft tube.

E1 ____ E2: may result from, to contact, created, moves, applies, causes, fall below, corresponds to which

E2 ____ E1: including, are moved under, will cause, according to, are effected by, repeats, can match

Results

Relation            Precision  Recall  F1
Cause-Effect        89.63      89.63   89.63
Component-Whole     74.34      81.73   77.86
Content-Container   84.62      85.94   85.27
Entity-Destination  88.22      89.73   88.96
Entity-Origin       83.87      80.62   82.21
Instrument-Agency   71.83      65.38   68.46
Member-Collection   84.30      87.55   85.89
Message-Topic       81.02      85.06   82.99
Product-Producer    82.38      74.89   78.46
Other               52.97      51.10   52.02
Overall             82.25      82.28   82.19

Learning Curve

[Chart: F1 as a function of training set size – 73.08, 77.02, 79.93, and 82.19 with increasing amounts of training data]

Ablation Tests

All 255 (= 2^8 – 1) combinations of the 8 feature sets were evaluated by 10-fold cross-validation (a sketch of the procedure follows the table below)

# of feature sets  Optimal feature sets              F1
1                  Lexical                           73.8
2                  +Hypernym                         77.8
3                  +FrameNet                         78.9
4                  +Ngrams                           79.7
5                  -FrameNet +PropBank +TextRunner   80.5
6                  +FrameNet                         81.1
7                  +Dependency                       81.3
8                  +NomLex-Plus                      81.3

Lexical is the single best feature set, Lexical+Hypernym is the best 2-feature-set combination, etc.
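A sketch of the ablation procedure, with a placeholder cross_validate_f1 helper standing in for the actual training and evaluation (an assumption, not the original code):

```python
# Evaluate every non-empty combination of the 8 feature sets (2**8 - 1 = 255)
# with 10-fold cross-validation; cross_validate_f1 is a placeholder.
from itertools import combinations

FEATURE_SETS = ["Lexical", "Hypernym", "Dependency", "PropBank",
                "FrameNet", "NomLexPlus", "Ngrams", "TextRunner"]

def ablation(instances, labels, cross_validate_f1):
    results = {}
    for k in range(1, len(FEATURE_SETS) + 1):
        for combo in combinations(FEATURE_SETS, k):
            results[combo] = cross_validate_f1(instances, labels, combo, folds=10)
    # Report the best combination for each number of feature sets, as in the table.
    for k in range(1, len(FEATURE_SETS) + 1):
        best = max((c for c in results if len(c) == k), key=results.get)
        print(k, best, round(results[best], 1))
    return results
```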

Other Supervised Tasks

Causal relations between events – FLAIRS 2010

Causal Relations Between Events

Discovered graph patterns that were then used as features in a supervised classifier

Example pattern: "Under the agreement", "In the affidavits", etc.

Detecting Indications of Appendicitis in Radiology Reports

Submitted to AMIA TBI 2013

Resolving Coreference in Medical Records

i2b2 2011 and JAMIA 2012

Approach

Based on Stanford Multi-Pass Sieve method

Added supervised learning by introducing features to each pass

Showed that creating a first pass which identifies all the mentions of the patient provides a competitive baseline

Extracting Relations Between Concepts in Medical Records

i2b2 2010 Shared Task and JAMIA 2011

Supervised Relations Conclusion

Identifying semantic relations requires going beyond contextual and lexical features

Use the fact that arguments sometimes have a high affinity for one of the semantic roles

Knowledge of pre-existing relations can aid classification when context is not enough

Outline

Introduction

Supervised relation identification

Unsupervised relation discovery

Proposed work

Conclusions

Relations in Electronic Medical Records

Medical records contain natural language narrative with very valuable information

Often in the form of a relation between medical treatments, tests, and problems

Example:

… with the [transfusion] and [IV Lasix] she did not go into [flash pulmonary edema]

Treatment-Improves-Problem relations:
(transfusion, flash pulmonary edema)
(IV Lasix, flash pulmonary edema)

Relations in Electronic Medical Records

Additional examples:

[Anemia] secondary to [blood loss].
A causal relationship between problems

On [exam], the patient looks well and lying down flat in her bed with no [acute distress].
Relationship between a medical test ("exam") and what it revealed ("acute distress").
We consider both positive and negative findings.

Relations in Electronic Medical Records

Utility

Detected relations can aid information retrieval

Automated systems which review patient records for unusual circumstances

Drugs prescribed despite previous allergy

Tests and treatments never performed despite recommendation

Relations in Electronic Medical Records

Unsupervised detection of relations

No need for large annotation efforts

Easily adaptable to new hospitals, doctors, medical domains

Does not require a pre-defined set of relation types

Discover relations actually present in the data, not what the annotator thinks is present

Relations can be informed by very large corpora

Unsupervised Relation Discovery

Assumptions:

Relations exist between entities in text

Those relations are often triggered by contextual words (trigger words): secondary to, improved, revealed, caused

Entities in relations belong to a small set of semantic classes:
Anemia, heart failure, edema: problems
Exam, CT scan, blood pressure: tests

Entities near each other in text are more likely to have a relation

Unsupervised Relation Discovery

Latent Dirichlet Allocation baseline

Assume entities have already been identified

Form pseudo-documents for every consecutive pair of entities:

Words from first entity

Words between the entities

Words from second entity

Example:

If she has evidence of [neuropathy] then we would consider a [nerve biopsy]

Pseudo-document: {neuropathy, then, we, would, consider, a, nerve, biopsy}
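A minimal sketch of this baseline, assuming gensim for LDA (an assumed library choice, not necessarily the original toolkit):

```python
# Build one pseudo-document per consecutive entity pair (words of entity 1 +
# words between + words of entity 2) and fit a topic model over them.
from gensim import corpora, models

def pseudo_documents(tokens, entity_spans):
    """entity_spans: list of non-overlapping (start, end) token indices, in textual order."""
    docs = []
    for (s1, e1), (s2, e2) in zip(entity_spans, entity_spans[1:]):
        docs.append(tokens[s1:e1] + tokens[e1:s2] + tokens[s2:e2])
    return docs

def fit_lda(docs, num_topics=10):
    dictionary = corpora.Dictionary(docs)
    bows = [dictionary.doc2bow(d) for d in docs]
    return models.LdaModel(bows, num_topics=num_topics, id2word=dictionary)
```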

Unsupervised Relation Discovery

These pseudo-documents lead LDA to form clusters such as:

"causal"   "stopwords"  "reveal problem"  "prescription"
to         and          was               (
due        ,            on                mg
secondary  is           and               )
was        she          ,                 needed
be         had          which             as
,          has          showed            PO
likely     this         he                PRN
have       are          done              :
found      that         showing           for
thought    after        demonstrated      every

Unsupervised Relation Discovery

Clusters formed by LDA

Some good trigger words

Many stop words as well

No differentiation between:

Words in first argument

Words between the arguments

Words in second argument

We can do a better job by better modeling the linguistic phenomenon

Relation Discovery Model (RDM)

Three observable variables:

w1: tokens from the first argument
wc: context words (between the arguments)
w2: tokens from the second argument

Example: Recent [chest x-ray] shows [resolving right lower lobe pneumonia].

w1: {chest, x-ray}
wc: {shows}
w2: {resolving, right, lower, lobe, pneumonia}

Relation Discovery Model (RDM)

In RDM:

A relation type (tr) is generated

Context words (wc) are generated from either:
a relation type-specific word distribution (showed, secondary, etc.); or
a general word distribution (she, patient, hospital)

Relation type-specific semantic classes for the arguments are generated
e.g., a problem-causes-problem relation would be unlikely to generate a test or a treatment class

Argument words (w1, w2) are generated from argument class-specific word distributions
e.g., "pneumonia", "anemia", "neuropathy" from a problem class
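A simplified generative-story sketch of the RDM for a single entity pair; priors, inference, and the exact plate structure of the published model are omitted, and the distributions are illustrative placeholders:

```python
# Simplified generative story for one entity pair; all distributions are
# illustrative numpy parameters supplied by the caller, not the real model.
import numpy as np

rng = np.random.default_rng(0)

def generate_pair(phi_rel, phi_gen, phi_class, class_prior, rel_prior,
                  lambda_trigger, n_context, n_arg):
    t_r = rng.choice(len(rel_prior), p=rel_prior)                # relation type
    context = []
    for _ in range(n_context):                                   # context words
        if rng.random() < lambda_trigger:
            context.append(rng.choice(len(phi_rel[t_r]), p=phi_rel[t_r]))
        else:                                                    # general/background word
            context.append(rng.choice(len(phi_gen), p=phi_gen))
    c1 = rng.choice(len(class_prior[t_r]), p=class_prior[t_r])   # argument classes
    c2 = rng.choice(len(class_prior[t_r]), p=class_prior[t_r])
    arg1 = [rng.choice(len(phi_class[c1]), p=phi_class[c1]) for _ in range(n_arg)]
    arg2 = [rng.choice(len(phi_class[c2]), p=phi_class[c2]) for _ in range(n_arg)]
    return t_r, context, (c1, arg1), (c2, arg2)
```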

Relation Discovery Model (RDM)

Graphical model: [plate diagram not reproduced in this transcript]

Experimental Setup

Dataset

349 medical records from 4 hospitals

Annotated with:

Entities: problems, treatments, tests

Relations: Used to evaluate our unsupervised approach

Treatment-Addresses-Problem

Treatment-Causes-Problem

Treatment-Improves-Problem

Treatment-Worsens-Problem

Treatment-Not-Administered-Due-To-Problem

Test-Reveals-Problem

Test-Conducted-For-Problem

Problem-Indicates-Problem

Results

Trigger word clusters formed by the RDM:

"connected problems"  "test showed"  "prescription"  "prescription 2"
due                   showed         mg              (
consistent            no             p.r.n.          )
not                   revealed       p.o.            Working
likely                evidence       hours           ICD9
secondary             done           pm              Problem
patient               2007           q               Diagnosis
(                     performed      needed          30
started               demonstrated   day             cont
most                  without        q.              )
:                     s/p            normal          4
closed

Results

Instances of “connected problems”

First Argument | Context | Second Argument
ESRD | secondary to her | DM
slightly lightheaded | and with | increased HR
Echogenic kidneys | consistent with | renal parenchymal disease
A 40% RCA | , which was | Hazy
Librium | for | Alcohol withdrawal

The last example is actually a Treatment-Administered-For-Problem

Results

Instances of “Test showed”

First Argument | Context | Second Argument
V-P lung scan | Was performed on May 24 2007, showed | low probability of PE
A bedside transthoracic echocardiogram | done in the Cardiac Catheterization laboratory without evidence of | an effusion
Exploration of the abdomen | revealed | significant nodularity of the liver
echocardiogram | showed | moderate dilated left atrium
An MRI of the right leg | was done which was equivocal for | osteomyelitis

Results

Instances of “prescription”

First Argument | Context | Second Argument
Haldol | 0.5-1 milligrams p.o. q.6-8h. P.r.n. | agitation
Plavix | every day to prevent | failure of these stents
KBL mouthwash | , 15 cc p.o. q.d. prn | mouth discomfort
Miconazole nitrate powder | tid prn for | groin rash
AmBisome | 300 mg IV q.d. for treatment of | her hepatic candidiasis

Results

Instances of “prescription 2”

First Argument | Context | Second Argument
MAGNESIUM HYDROXIDE SUSP | 30 ML ) , 30 mL , Susp , By Mouth , At Bedtime , PRN, For | constipation
Depression, major NOS | ( ICD9 296.00 , Working , Problem ) cont | home meds
Diabetes mellitus type II | ( ICD9 250.00 , Working , Problem ) cont | home meds
ASCITES | ( ICD9 789.6 , Working , Diagnosis ) on | spironalactone
Dilutional hyponatremia | ( SNMCT **ID-NUM , Working , Diagnosis ) improved with | fluid restriction

Results

Discovered Argument Classes

"problems"  "treatments/tests"  "tests"
pain        Percocet            CT
disease     Hgb                 scan
right       Hct                 chest
left        Anion               x-ray
renal       Gap                 examination
patient     Vicodin             Chest
artery      RDW                 EKG
-           Bili                MRI
symptoms    RBC                 culture
mild        Ca                  head

Evaluation

Two versions of the data:

DS1: Consecutive pairs of entities which have a manually identified relation between them

DS2: All consecutive pairs of entities

Train/Test sets:

Train: 349 records, with 5,264 manually annotated relations

Test: 477 records, with 9,069 manually annotated relations

Evaluation

Evaluation metrics

NMI: Normalized Mutual Information

An information-theoretic measure of how well two clusterings match

F measure:

Computed based on the cluster precision and cluster recall

Each cluster is paired with the cluster which maximizes the score
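A sketch of these metrics, assuming scikit-learn for NMI and a simple best-match cluster F; the exact F definition used in the evaluation may differ in detail:

```python
# NMI via scikit-learn, plus a best-match cluster F measure: each predicted
# cluster is paired with the gold cluster that maximizes its F score.
from collections import Counter
from sklearn.metrics import normalized_mutual_info_score

def cluster_f(gold, pred):
    gold_sizes, pred_sizes = Counter(gold), Counter(pred)
    joint = Counter(zip(pred, gold))
    total_f = 0.0
    for p, p_size in pred_sizes.items():
        best = 0.0
        for g, g_size in gold_sizes.items():
            overlap = joint[(p, g)]
            if overlap:
                prec, rec = overlap / p_size, overlap / g_size
                best = max(best, 2 * prec * rec / (prec + rec))
        total_f += p_size * best
    return total_f / len(pred)

def evaluate(gold, pred):
    return normalized_mutual_info_score(gold, pred), cluster_f(gold, pred)
```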

Evaluation

Method         DS1 NMI  DS1 F  DS2 NMI  DS2 F
Train set:
Complete-link  4.2      37.8   N/A      N/A
K-means        8.25     38.0   5.4      38.1
LDA baseline   12.8     23.1   15.6     26.2
RDM            18.2     39.1   18.1     37.4
Test set:
LDA baseline   10.0     26.1   11.5     26.3
RDM            11.8     37.7   14.0     36.4

Results with 9 relation types, 15 general word classes, and 15 argument classes for RDM.

Unsupervised Relations Conclusion

Trigger words and argument classes are jointly modeled

RDM uses only entities and tokens

Relations are local to the context, rather than global

RDM outperforms several baselines

Discovered relations match well with manually chosen relations

Presented at EMNLP 2011

Additional Relation Tasks

Relational Similarity – SemEval 2012 Task 2

Define a relation through prototypes: water:drop, time:moment, pie:slice

Decide which is most similar: feet:inches or country:city

Used a probabilistic approach to detect high-precision patterns for the relations

Pattern precision was then used to rank word pairs occurring with that pattern
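A small sketch of the pattern-precision idea under stated assumptions (collecting patterns from the corpus is not shown; all names are illustrative):

```python
# Score each contextual pattern by how often the word pairs it occurs with are
# prototype pairs of the relation, then rank candidate pairs by the precision
# of the patterns they occur with. pair_patterns maps a word pair to the set of
# patterns observed with it in a large corpus.
def pattern_precision(pair_patterns, prototypes):
    pattern_pairs = {}
    for pair, patterns in pair_patterns.items():
        for pat in patterns:
            pattern_pairs.setdefault(pat, set()).add(pair)
    return {pat: len(pairs & prototypes) / len(pairs)
            for pat, pairs in pattern_pairs.items()}

def rank_pairs(candidates, pair_patterns, precision):
    score = lambda pair: max((precision.get(p, 0.0) for p in pair_patterns.get(pair, [])),
                             default=0.0)
    return sorted(candidates, key=score, reverse=True)
```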

Relational Selectional Preferences

Submitted to IWCS 2013

Use LDA to induce latent semantic classes

Outline

Introduction

Supervised relation identification

Unsupervised relation discovery

Proposed work

Conclusions

Proposed work

Supervised vector representations

Initially: word representations

Most existing approaches create unsupervised word representations

Latent Semantic Analysis (Deerwester et al., 1990)

Latent Dirichlet Allocation (Blei et al., 2003)

Integrated Components Analysis (Scholkopf, 1998)

More recent approaches allow for supervision

Existing supervised approaches

HDLR: "Structured Metric Learning for High Dimensional Problems", Davis and Dhillon (KDD 2008)

S2Net: "Learning Discriminative Projections for Text Similarity Measures", Yih, Toutanova, Platt, and Meek (CoNLL 2011)
Learns lower-dimensional representations of documents
Optimizes a cosine similarity metric in the lower-dimensional space for similar-document retrieval

Supervised word representations

Relational selectional preferences: classify words according to their admissibility for filling the role of a relation

report, article, thesis, poem are admissible for the Message role of a Message-Topic relation

Assume a (possibly very small) training set

Supervised word representations

Each word is represented by a high-dimensional context vector v over a large corpus
e.g., documents the word occurs in, other words it co-occurs with, or grammatical links

Learn a transformation matrix T which transforms v into a much lower-dimensional vector w, subject to a loss function which is maximized when words from the target set have high cosine similarity

Learning can be performed using L-BFGS optimization on the loss function because the cosine similarity function is twice differentiable
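A minimal sketch of learning such a transformation T with L-BFGS; the loss below (negative mean pairwise cosine among target-set words) is a simplified stand-in for the proposed objective, not the exact formulation:

```python
# Learn a projection matrix T so that words from a target set (e.g. admissible
# Message fillers) get high cosine similarity in the low-dimensional space.
import numpy as np
from itertools import combinations
from scipy.optimize import minimize

def learn_projection(V, target_idx, low_dim=50, seed=0):
    """V: (vocab_size, high_dim) matrix of context vectors; target_idx: word indices."""
    high_dim = V.shape[1]
    rng = np.random.default_rng(seed)
    T0 = rng.normal(scale=0.01, size=(high_dim, low_dim))

    def loss(t_flat):
        T = t_flat.reshape(high_dim, low_dim)
        W = V @ T                                   # low-dimensional representations
        total = 0.0
        for i, j in combinations(target_idx, 2):    # pull target words together
            wi, wj = W[i], W[j]
            total -= wi @ wj / (np.linalg.norm(wi) * np.linalg.norm(wj) + 1e-8)
        return total

    res = minimize(loss, T0.ravel(), method="L-BFGS-B")
    return res.x.reshape(high_dim, low_dim)
```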

Proposed application

Supervised word representations can be used for many supervised tasks which use words as features

Relation arguments

Contextual words

Not limited to words

arbitrary n-grams

syntactic features

We believe this approach could be useful for any high-dimensional linguistic features (sparse features)

Benefit comes from both a larger corpus and the supervised learning of the representation

Additional evaluations

ACE 2004/2005 relation data

Relations between entities in newswire

e.g., Member-Of-Group – "an activist for Peace Now"

BioInfer 2007

Relations between biomedical concepts

e.g., locations, causality, part-whole, regulation

SemEval 2013 Task 4 and SemEval 2010 Task 9

Paraphrases for noun compounds

e.g., "flu virus" → "cause", "spread", "give"

Outline

Introduction

Supervised relation identification

Unsupervised relation discovery

Proposed work

Conclusions

Conclusions

State-of-the-art supervised relation extraction methods in both general-domain and medical texts

Identifying relations in text relies on more than just context

Semantic and background knowledge of arguments

Background knowledge about relations themselves

An unsupervised relation discovery model

Thank you!

Questions??