Processing Semantic Relations Across Textual Genres
Bryan Rink
University of Texas at Dallas
December 13, 2013
Outline
Introduction
Supervised relation identification
Unsupervised relation discovery
Proposed work
Conclusions
Motivation
We think about our world in terms of:
Concepts (e.g., bank, afternoon, decision, nose)
Relations (e.g., Is-A, Part-Whole, Cause-Effect)
Powerful mental constructions for:
Representing knowledge about the world
Reasoning over that knowledge:
From Part-Whole(brain, Human) and Is-A(Socrates, Human) we can reason that Part-Whole(brain, Socrates)
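As a minimal sketch of this inference, assuming facts are stored as (relation, arg1, arg2) triples, a single illustrative rule suffices: Part-Whole(x, c) together with Is-A(y, c) yields Part-Whole(x, y).

```python
# Minimal sketch of the inference above; the fact encoding and the single
# rule are illustrative only.
facts = {("Part-Whole", "brain", "Human"), ("Is-A", "Socrates", "Human")}

def infer_part_whole(facts):
    """Part-Whole(x, c) and Is-A(y, c) together yield Part-Whole(x, y)."""
    derived = set()
    for rel1, x, c1 in facts:
        for rel2, y, c2 in facts:
            if rel1 == "Part-Whole" and rel2 == "Is-A" and c1 == c2:
                derived.add(("Part-Whole", x, y))
    return derived

print(infer_part_whole(facts))  # {('Part-Whole', 'brain', 'Socrates')}
```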
Representation and Reasoning
Large general knowledge bases exist:
WordNet, Wikipedia/DBpedia/Yago, ConceptNet, OpenCyc
Some domain-specific knowledge bases exist:
Biomedicine (UMLS)
Music (MusicBrainz)
Books (RAMEAU)
All of these are available in the standard RDF/OWL data model
Powerful reasoners exist for making inferences over data stored in RDF/OWL
Of these capabilities, knowledge acquisition remains the most time-consuming and difficult
Relation Extraction from Text
Relations between concepts are encoded explicitly or implicitly in many textual resources:
Encyclopedias, news articles, emails, medical records, academic articles, web pages
For example:
“The report found Firestone made mistakes in the production of the tires.”
Product-Producer(tires, Firestone)
Outline
Introduction
Supervised relation identification
Unsupervised relation discovery
Proposed work
Conclusions
Supervised Relation Identification
SemEval-2010 Task 8 – “Multi-Way Classification of Semantic Relations Between Pairs of Nominals”
Given a sentence and two marked nominals
Determine the semantic relation and the directionality of that relation between the nominals
Example: A small [piece] of rock landed into the [trunk]
This contains an Entity-Destination(piece, trunk) relation:
The situation described in the sentence entails that trunk is the destination of piece, in the sense of piece moving (in a physical or abstract sense) toward trunk.
Semantic Relations
Relation | Definition
Cause-Effect | X causes Y
Instrument-Agency | Y uses X; X is the instrument of Y
Product-Producer | Y produces X; X is the product of Y
Content-Container | X is or was stored or carried inside Y
Entity-Origin | Y is origin of an entity X, X coming/derived from Y
Entity-Destination | X moves toward Y
Component-Whole | X is component of Y and has a functional relation
Member-Collection | X is a member of Y
Message-Topic | X is a message containing information about Y
Other | if none of the nine relations appears to be suitable
Observations
Three types of evidence useful for classifying relations:
Lexical/contextual cues:
“The seniors poured [flour] into [wax paper] and threw the items as projectiles on freshmen during a morning pep rally”
Knowledge of the typical role of one nominal:
“The [rootball] was in a [crate] the size of a refrigerator, and some of the arms were over 12 feet tall.”
Knowledge of a pre-existing relation between the nominals:
“The Ca content in the [corn flour] has also a strong dependence on the pericarp thickness.”
Approach
Use an SVM classifier to first determine the relation type
Each relation type then has its own SVM classifier to determine direction of the relation
All SVMs share the same set of 45 feature types, which fall into the following 8 categories (a sketch of the two-stage setup follows the list):
Lexical/Contextual
Hypernyms from WordNet
Dependency parse
PropBank parse
FrameNet parse
Nominalization
Nominal similarity derived from Google N-Grams
TextRunner predicates
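A minimal sketch of this two-stage classification, using scikit-learn's LinearSVC as a stand-in for the original SVM implementation; the feature extraction is stubbed out and the example dict keys are hypothetical:

```python
# Sketch (not the original system): one SVM selects the relation type, then
# a per-relation SVM selects the direction.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

def extract_features(example):
    # Stand-in for the 45 feature types described above.
    return {"between:" + w: 1 for w in example["between"]}

def train(examples):
    vec = DictVectorizer()
    X = vec.fit_transform([extract_features(e) for e in examples])
    type_clf = LinearSVC().fit(X, [e["relation"] for e in examples])
    dir_clfs = {}
    for rel in {e["relation"] for e in examples}:
        subset = [e for e in examples if e["relation"] == rel]
        Xr = vec.transform([extract_features(e) for e in subset])
        # Assumes both directions of `rel` are observed in the training data.
        dir_clfs[rel] = LinearSVC().fit(Xr, [e["direction"] for e in subset])
    return vec, type_clf, dir_clfs

def predict(vec, type_clf, dir_clfs, example):
    x = vec.transform([extract_features(example)])
    rel = type_clf.predict(x)[0]
    return rel, dir_clfs[rel].predict(x)[0]
```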
System
(system architecture diagram)
Lexical/Contextual Features
Words between the nominals are very important (see table below)
Number of tokens between the nominals is also helpful:
Product-Producer and Entity-Origin often have zero: “organ builder”, “coconut oil”
Additional features for: E1/E2 words, E1/E2 part of speech, words before/after the nominals, prefixes of words between
Sequence of word classes between the nominals: Verb_Determiner, Preposition_Determiner, Preposition_Adjective_Adjective, etc.

Word between nominals | Relation
cause | Cause-Effect
used | Instrument-Agency
makes | Product-Producer
contained | Content-Container
from | Entity-Origin
into | Entity-Destination
on | Component-Whole
of | Member-Collection
about | Message-Topic
Example Feature Values
Sentence: Forward [motion]E1 of the vehicle through the air caused a [suction]E2 on the road draft tube.
Feature values:
e1Word=motion, e2Word=suction
e1OrE2Word={motion, suction}
between={of, the, vehicle, through, the, air, caused, a}
posE1=NN, posE2=NN
posE1orE2=NN
posBetween=I_D_N_I_D_N_V_D
distance=8
wordsOutside={Forward, on}
prefix5Between={air, cause, a, of, the, vehic, throu, the}
Parsing Features
Dependency Parse (Stanford parser)
Paths of length 1 from each nominal
Paths of length 2 between E1 and E2
PropBank SRL Parse (ASSERT)
Predicate associated with both nominals
Number of tokens in the predicate
Hypernyms of the predicate
Argument types of the nominals
FrameNet SRL Parse (LTH)
Lemmas of frame trigger words, with and without part of speech
Also make use of VerbNet to generalize verbs from dependency and PropBank parses
Example Feature Values
Sentence: Forward [motion]E1 of the vehicle through the air caused a [suction]E2 on the road draft tube.
Dependency:
<E1> nsubj caused dobj <E2>
<E1> nsubj vn:27 dobj <E2>
VerbNet/Levin class 27 is the class of engender verbs such as cause, spawn, generate, etc.
This feature value indicates that E1 is the subject of an engender verb, and the direct object is E2
PropBank:
Hypernyms of the predicate: cause#v#1, create#v#1
Example Feature Values
Sentence: Forward [motion]E1 of the vehicle through the air caused a [suction]E2 on the road draft tube.

Feature Set | Feature Values
Dependency | depPathLen1={caused nsubj <E1>, caused dobj <E2>, ...}; depPathLen1VN={vn:27 nsubj <E1>, vn:27 dobj <E2>}; depPathLen2VerbNet={<E1> nsubj vn:27 dobj <E2>}; depPathLen2Location={<E1> nsubj BETWEEN dobj <E2>}
PropBank | pbPredStem=caus, pbVerbNet=27, pbE1CoarseRole=ARG0, pbE2CoarseRole=ARG1, pbE1orE2CoarseRole={ARG0, ARG1}, pbNumPredToks=1, pbE1orE2PredHyper={cause#v#1, create#v#1}
FrameNet | fnAnyLU={cause.v, vehicle.n, road.n}, fnAnyTarget={cause, vehicle, road}, fnE2LU=cause.v, fnE1OrE2LU=cause.v
Nominal Role Affiliation Features
Sometimes context is not enough and we must use background knowledge about the nominals
Consider the nominal: writer
Knowing that a writer is a person increases the likelihood that the nominal will act as a Producer or an Agency
Use WordNet hypernyms for the nominal's sense, as determined by SenseLearner
Additionally, writer nominalizes the verb write, which Levin classifies as a “Creation and Transformation” verb
Most likely to act as a Producer
Use NomLex-Plus to determine the verb being nominalized and retrieve its Levin class from VerbNet
Google N-Grams for Nominal Role Affiliation
Semantically-similar nominals should participate in the same roles
They should also occur in similar contexts in a large corpus
Using Google 5-grams, the 1,000 most frequent words appearing in the context of a nominal are collected
Using Jaccard similarity on those context words, the 4 nearest-neighbor nominals are determined and used as a feature (sketched below)
Also, determine the role most frequently associated with those neighbors
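A sketch of this neighbor lookup, assuming a precomputed map from each nominal to the set of its ~1,000 most frequent Google 5-gram context words:

```python
# Sketch of the Jaccard-based nearest-neighbor lookup over context words.
def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def nearest_neighbors(nominal, context_words, k=4):
    target = context_words[nominal]
    scored = sorted(((jaccard(target, ctx), other)
                     for other, ctx in context_words.items()
                     if other != nominal), reverse=True)
    return [other for _, other in scored[:k]]
```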
Example Values for Google N-Grams Feature
Sentence 4739: As part of his wicked plan, Pete promotes Mickey and his pals into the [legion]E1 of [musketeers]E2 and assigns them to guard Minnie.
Member-Collection(E2, E1)
E1 nearest neighbors: legion, army, heroes, soldiers, world
Most frequent role: Collection
E2 nearest neighbors: musketeers, admirals, sentries, swordsmen, larks
Most frequent role: Member
Pre-existing Relation Features
Sometimes the context gives few clues about the relation
Can use knowledge about a context-independent relation between the nominals
TextRunner
A queryable database of Noun-Verb-Noun triples from a large corpus of web text
Plug in E1 and E2 as the nouns and query for predicates that occur between them
Example Feature Values for TextRunner Features
Sentence: Forward [motion]E1 of the vehicle through the air caused a [suction]E2 on the road draft tube.
E1 ____ E2: may result from, to contact, created, moves, applies, causes, fall below, corresponds to which
E2 ____ E1: including, are moved under, will cause, according to, are effected by, repeats, can match
Results
Relation | Precision | Recall | F1
Cause-Effect | 89.63 | 89.63 | 89.63
Component-Whole | 74.34 | 81.73 | 77.86
Content-Container | 84.62 | 85.94 | 85.27
Entity-Destination | 88.22 | 89.73 | 88.96
Entity-Origin | 83.87 | 80.62 | 82.21
Instrument-Agency | 71.83 | 65.38 | 68.46
Member-Collection | 84.30 | 87.55 | 85.89
Message-Topic | 81.02 | 85.06 | 82.99
Product-Producer | 82.38 | 74.89 | 78.46
Other | 52.97 | 51.10 | 52.02
Overall | 82.25 | 82.28 | 82.19
Learning Curve
(Figure: learning curve of F1 vs. training set size: 73.08, 77.02, 79.93, and 82.19 with the full training set.)
Ablation Tests
All 255 (= 2^8 − 1) combinations of the 8 feature sets were evaluated by 10-fold cross-validation

# of feature sets | Optimal feature sets | F1
1 | Lexical | 73.8
2 | +Hypernym | 77.8
3 | +FrameNet | 78.9
4 | +Ngrams | 79.7
5 | −FrameNet +PropBank +TextRunner | 80.5
6 | +FrameNet | 81.1
7 | +Dependency | 81.3
8 | +NomLex-Plus | 81.3

Lexical is the single best feature set, Lexical+Hypernym is the best 2-feature set combination, etc.
Other Supervised Tasks
Causal relations between events – FLAIRS 2010
Causal Relations Between Events
Discovered graph patterns that were then used as features in a supervised classifier
Example patterns: “Under the agreement”, “In the affidavits”, etc.
Detecting Indications of Appendicitis in Radiology Reports
Submitted to AMIA TBI 2013
Resolving Coreference in Medical Records
i2b2 2011 and JAMIA 2012
Approach
Based on Stanford Multi-Pass Sieve method
Added supervised learning by introducing features to each pass
Showed that creating a first pass which identifies all the mentions of the patient provides a competitive baseline
Extracting Relations Between Concepts in Medical Records
i2b2 2010 Shared Task and JAMIA 2011
Supervised Relations Conclusion
Identifying semantic relations requires going beyond contextual and lexical features
Use the fact that arguments sometimes have a high affinity for one of the semantic roles
Knowledge of pre-existing relations can aid classification when context is not enough
Outline
Introduction
Supervised relation identification
Unsupervised relation discovery
Proposed work
Conclusions
Relations in Electronic Medical Records
Medical records contain natural language narrative with very valuable information
Often in the form of a relation between medical treatments, tests, and problems
Example: “... with the [transfusion] and [IV Lasix] she did not go into [flash pulmonary edema]”
Treatment-Improves-Problem relations:
(transfusion, flash pulmonary edema)
(IV Lasix, flash pulmonary edema)
Relations in Electronic Medical Records
Additional examples:
“[Anemia] secondary to [blood loss].”
A causal relationship between problems
“On [exam], the patient looks well and lying down flat in her bed with no [acute distress].”
Relationship between a medical test (“exam”) and what it revealed (“acute distress”)
We consider both positive and negative findings.
Relations in Electronic Medical Records
Utility
Detected relations can aid information retrieval
Automated systems which review patient records for unusual circumstances
Drugs prescribed despite previous allergy
Tests and treatments never performed despite recommendation
Relations in Electronic Medical Records
Unsupervised detection of relations
No need for large annotation efforts
Easily adaptable to new hospitals, doctors, medical domains
Does not require a pre-defined set of relation types
Discover relations actually present in the data, not what the annotator thinks is present
Relations can be informed by very large corpora
Unsupervised Relation Discovery
Assumptions:
Relations exist between entities in text
Those relations are often triggered by contextual words (trigger words): secondary to, improved, revealed, caused
Entities in relations belong to a small set of semantic classes:
anemia, heart failure, edema: problems
exam, CT scan, blood pressure: tests
Entities near each other in text are more likely to have a relation
Unsupervised Relation Discovery
Latent Dirichlet Allocation baseline
Assume entities have already been identified
Form pseudo-documents for every consecutive pair of entities:
Words from the first entity
Words between the entities
Words from the second entity
Example: “If she has evidence of [neuropathy] then we would consider a [nerve biopsy]”
Pseudo-document: {neuropathy, then, we, would, consider, a, nerve, biopsy}
(A sketch of this baseline follows.)
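A minimal sketch of the pseudo-document construction and the LDA baseline, using scikit-learn's LatentDirichletAllocation as a stand-in topic model:

```python
# Sketch of the LDA baseline. entity_spans are (start, end) token offsets
# of the already-identified entities, assumed sorted by position.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def pseudo_documents(tokens, entity_spans):
    docs = []
    for (s1, e1), (s2, e2) in zip(entity_spans, entity_spans[1:]):
        # entity-1 words + words between + entity-2 words
        docs.append(" ".join(tokens[s1:e1] + tokens[e1:s2] + tokens[s2:e2]))
    return docs

tokens = "If she has evidence of neuropathy then we would consider a nerve biopsy".split()
docs = pseudo_documents(tokens, [(5, 6), (11, 13)])
# docs == ['neuropathy then we would consider a nerve biopsy']
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=3).fit(X)  # tiny toy setting
```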
Unsupervised Relation Discovery
These pseudo-documents lead LDA to form clusters such as:
“causal” | “stopwords” | “reveal problem” | “prescription”
to | and | was | (
due | , | on | mg
secondary | is | and | )
was | she | , | needed
be | had | which | as
, | has | showed | PO
likely | this | he | PRN
have | are | done | :
found | that | showing | for
thought | after | demonstrated | every
Unsupervised Relation Discovery
Clusters formed by LDA:
Some good trigger words
Many stop words as well
No differentiation between:
Words in the first argument
Words between the arguments
Words in the second argument
We can do a better job by better modeling the linguistic phenomenon
Relation Discovery Model (RDM)
Three observable variables:
w1: tokens from the first argument
wc: context words (between the arguments)
w2: tokens from the second argument
Example: Recent [chest x-ray] shows [resolving right lower lobe pneumonia].
w1: {chest, x-ray}
wc: {shows}
w2: {resolving, right, lower, lobe, pneumonia}
Relation Discovery Model (RDM)
In the RDM:
A relation type (tr) is generated
Context words (wc) are generated from either:
a relation type-specific word distribution (showed, secondary, etc.); or
a general word distribution (she, patient, hospital)
Relation type-specific semantic classes for the arguments are generated
e.g., a problem-causes-problem relation would be unlikely to generate a test or a treatment class
Argument words (w1, w2) are generated from argument class-specific word distributions
e.g., “pneumonia”, “anemia”, “neuropathy” from a problem class
(A sketch of this generative story follows.)
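An illustrative sketch of this generative story for a single entity pair; the distributions, priors, and the Poisson context length here are placeholders rather than the model's actual parameterization:

```python
# Illustrative sketch of the RDM generative story for one entity pair.
import numpy as np

rng = np.random.default_rng(0)

def generate_pair(theta, phi_rel, phi_gen, psi, chi, lam=0.5):
    """theta: P(relation type); phi_rel[t]: trigger-word distribution for
    type t; phi_gen: general word distribution; psi[t]: P(arg class | t);
    chi[c]: word distribution for argument class c; lam: P(trigger word)."""
    t = rng.choice(len(theta), p=theta)                    # relation type
    context = []
    for _ in range(rng.poisson(3) + 1):                    # context words
        dist = phi_rel[t] if rng.random() < lam else phi_gen
        context.append(rng.choice(len(dist), p=dist))
    c1 = rng.choice(len(psi[t]), p=psi[t])                 # argument classes
    c2 = rng.choice(len(psi[t]), p=psi[t])
    w1 = rng.choice(len(chi[c1]), p=chi[c1])               # argument words
    w2 = rng.choice(len(chi[c2]), p=chi[c2])
    return t, context, (c1, w1), (c2, w2)
```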
Relation Discovery Model (RDM)
Graphical model (figure)
Experimental Setup
Dataset
349 medical records from 4 hospitals
Annotated with:
Entities: problems, treatments, tests
Relations: Used to evaluate our unsupervised approach
Treatment-Addresses-Problem
Treatment-Causes-Problem
Treatment-Improves-Problem
Treatment-Worsens-Problem
Treatment-Not-Administered-Due-To-Problem
Test-Reveals-Problem
Test-Conducted-For-Problem
Problem-Indicates-Problem
Results
Trigger word clusters formed by the RDM:
“connected problems” | “test showed” | “prescription” | “prescription 2”
due | showed | mg | (
consistent | no | p.r.n. | )
not | revealed | p.o. | Working
likely | evidence | hours | ICD9
secondary | done | pm | Problem
patient | 2007 | q | Diagnosis
( | performed | needed | 30
started | demonstrated | day | cont
most | without | q. | )
: | s/p | normal | 4
closed | | |
Results
Instances of “connected problems”
First Argument | Context | Second Argument
ESRD | secondary to her | DM
slightly lightheaded | and with | increased HR
Echogenic kidneys | consistent with | renal parenchymal disease
A 40% RCA | , which was | Hazy
Librium | for | Alcohol withdrawal

The last example is actually a Treatment-Administered-For-Problem
Results
Instances of “Test showed”
First Argument | Context | Second Argument
V-P lung scan | was performed on May 24 2007, showed | low probability of PE
A bedside transthoracic echocardiogram | done in the Cardiac Catheterization laboratory without evidence of | an effusion
Exploration of the abdomen | revealed | significant nodularity of the liver
echocardiogram | showed | moderate dilated left atrium
An MRI of the right leg | was done which was equivocal for | osteomyelitis
Results
Instances of “prescription”
First Argument | Context | Second Argument
Haldol | 0.5-1 milligrams p.o. q.6-8h. p.r.n. | agitation
Plavix | every day to prevent | failure of these stents
KBL mouthwash | , 15 cc p.o. q.d. prn | mouth discomfort
Miconazole nitrate powder | tid prn for | groin rash
AmBisome | 300 mg IV q.d. for treatment of | her hepatic candidiasis
Results
Instances of “prescription 2”
First Argument | Context | Second Argument
MAGNESIUM HYDROXIDE SUSP 30 ML | ) , 30 mL , Susp , By Mouth , At Bedtime , PRN , For | constipation
Depression, major NOS | ( ICD9 296.00 , Working , Problem ) cont | home meds
Diabetes mellitus type II | ( ICD9 250.00 , Working , Problem ) cont | home meds
ASCITES | ( ICD9 789.6 , Working , Diagnosis ) on | spironalactone
Dilutional hyponatremia | ( SNMCT **ID-NUM , Working , Diagnosis ) improved with | fluid restriction
Results
Discovered Argument Classes
“problems” | “treatments/tests” | “tests”
pain | Percocet | CT
disease | Hgb | scan
right | Hct | chest
left | Anion | x-ray
renal | Gap | examination
patient | Vicodin | Chest
artery | RDW | EKG
- | Bili | MRI
symptoms | RBC | culture
mild | Ca | head
Evaluation
Two versions of the data:
DS1: Consecutive pairs of entities which have a manually identified relation between them
DS2: All consecutive pairs of entities
Train/Test sets:
Train: 349 records, with 5,264 manually annotated relations
Test: 477 records, with 9,069 manually annotated relations
Evaluation
Evaluation metrics
NMI: Normalized Mutual Information, an information-theoretic measure of how well two clusterings match
F measure: computed from cluster precision and cluster recall; each cluster is paired with the cluster which maximizes the score
(Both metrics are sketched below.)
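A sketch of both metrics, using scikit-learn for NMI and one simple variant of the pairing-based F measure (the exact definition used in the paper may differ); the gold/pred labels are hypothetical:

```python
# Sketch: scoring an induced clustering against gold relation labels.
from collections import Counter
from sklearn.metrics import normalized_mutual_info_score

def cluster_f(gold, pred):
    total = 0.0
    for cluster in set(pred):
        members = [g for g, p in zip(gold, pred) if p == cluster]
        best = 0.0
        for label, tp in Counter(members).items():
            prec = tp / len(members)          # cluster precision
            rec = tp / gold.count(label)      # cluster recall
            best = max(best, 2 * prec * rec / (prec + rec))
        total += len(members) * best
    return total / len(gold)

gold = ["Test-Reveals-Problem", "Test-Reveals-Problem",
        "Treatment-Addresses-Problem", "Problem-Indicates-Problem"]
pred = [0, 0, 1, 1]                           # induced cluster ids
print(normalized_mutual_info_score(gold, pred), cluster_f(gold, pred))
```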
Evaluation
Method | NMI (DS1) | F (DS1) | NMI (DS2) | F (DS2)
Train Set:
Complete-link | 4.2 | 37.8 | N/A | N/A
K-means | 8.25 | 38.0 | 5.4 | 38.1
LDA baseline | 12.8 | 23.1 | 15.6 | 26.2
RDM | 18.2 | 39.1 | 18.1 | 37.4
Test Set:
LDA baseline | 10.0 | 26.1 | 11.5 | 26.3
RDM | 11.8 | 37.7 | 14.0 | 36.4

Results with 9 relation types, 15 general word classes, and 15 argument classes for the RDM.
Unsupervised Relations Conclusion
Trigger words and argument classes are jointly modeled
RDM uses only entities and tokens
Relations are local to the context, rather than global
RDM outperforms several baselines
Discovered relations match well with manually chosen relations
Presented at EMNLP 2011
Additional Relation Tasks
Relational Similarity – SemEval 2012 Task 2
Define a relation through prototypes: water:drop, time:moment, pie:slice
Decide which is most similar: feet:inches or country:city
Used a probabilistic approach to detect high-precision patterns for the relations
Pattern precision was then used to rank word pairs occurring with that pattern (sketched below)
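A sketch of the pattern-precision idea with hypothetical counts: precision is estimated as the fraction of a pattern's corpus occurrences whose word pair belongs to the target relation:

```python
# Sketch: counts[(pattern, relation)] -> # of prototype pairs of `relation`
# observed with `pattern` in the corpus (hypothetical numbers below).
def pattern_precision(counts, pattern, relation):
    total = sum(n for (p, _), n in counts.items() if p == pattern)
    return counts.get((pattern, relation), 0) / total if total else 0.0

counts = {("X of Y", "Part-Whole"): 80, ("X of Y", "Member-Collection"): 20}
print(pattern_precision(counts, "X of Y", "Part-Whole"))  # 0.8
```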
Relational Selectional Preferences
Submitted to IWCS 2013
Use LDA to induce latent semantic classes
Outline
Introduction
Supervised relation identification
Unsupervised relation discovery
Proposed work
Conclusions
Proposed work
Supervised vector representations
Initially: word representations
Most existing approaches create unsupervised word representations:
Latent Semantic Analysis (Deerwester et al., 1990)
Latent Dirichlet Allocation (Blei et al., 2003)
Kernel Principal Components Analysis (Schölkopf et al., 1998)
More recent approaches allow for supervision
Existing supervised approaches
HDLR
“Structured Metric Learning for High Dimensional Problems”
Davis and Dhillon (KDD 2008)
S2Net
“Learning Discriminative Projections for Text Similarity Measures”
Yih, Toutanova, Platt, and Meek (CoNLL 2011)
Learns lower-dimensional representations of documents
Optimizes a cosine similarity metric in the lower-dimensional space for similar-document retrieval
Supervised word representations
Relational selectional preferences:
Classify words according to their admissibility for filling the role of a relation:
report, article, thesis, poem are admissible for the Message role of a Message-Topic relation
Assume a (possibly very small) training set
Supervised word representations
Each word is represented by a high-dimensional context vector v over a large corpus
e.g., documents the word occurs in, other words it co-occurs with, or grammatical links
Learn a transformation matrix T which transforms v into a much lower-dimensional vector w, subject to a loss function that is maximized when words from the target set have high cosine similarity
Learning can be performed using L-BFGS optimization on the loss function, because the cosine similarity function is twice differentiable (a sketch follows)
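A minimal sketch of this setup with NumPy and SciPy's L-BFGS-B: learn T by minimizing the negative cosine similarity over pairs of target-set words; the objective and initialization are illustrative, and gradients are approximated numerically here for brevity:

```python
# Sketch: learn a projection T so that target-set word pairs end up with
# high cosine similarity in the low-dimensional space.
import numpy as np
from scipy.optimize import minimize

def learn_projection(V, pos_pairs, dim=50, seed=0):
    """V: (n_words, n_contexts) context-count matrix; pos_pairs: index pairs
    of words that should have high cosine similarity after projection."""
    n_words, n_ctx = V.shape
    rng = np.random.default_rng(seed)

    def loss(t_flat):
        T = t_flat.reshape(n_ctx, dim)
        W = V @ T                                          # low-dim vectors
        W = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-8)
        return -sum(W[i] @ W[j] for i, j in pos_pairs)     # negative cosine

    t0 = rng.standard_normal(n_ctx * dim) * 0.01
    res = minimize(loss, t0, method="L-BFGS-B")
    return res.x.reshape(n_ctx, dim)
```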
Proposed application
Supervised word representations can be used for many supervised tasks which use words as features:
Relation arguments
Contextual words
Not limited to words:
arbitrary n-grams
syntactic features
We believe this approach could be useful for any high-dimensional, sparse linguistic features
The benefit comes both from the larger corpus and from the supervised learning of the representation
Additional evaluations
ACE 2004/2005 relation data
Relations between entities in newswire
e.g., Member-Of-Group – “an [activist] for [Peace Now]”
BioInfer 2007
Relations between biomedical concepts
e.g., locations, causality, part-whole, regulation
SemEval 2013 Task 4 and SemEval 2010 Task 9
Paraphrases for noun compounds
e.g., “flu virus”: “cause”, “spread”, “give”
Outline
Introduction
Supervised relation identification
Unsupervised relation discovery
Proposed work
Conclusions
Conclusions
State-of-the-art supervised relation extraction methods in both general-domain and medical texts
Identifying relations in text relies on more than just context:
Semantic and background knowledge of arguments
Background knowledge about relations themselves
An unsupervised relation discovery model
Thank you!
Questions??