Slide1
Identifying Negation/Uncertainty Attributes for SHARPn NLP
Presentation to SHARPn Summit “Secondary Use” June 11-12, 2012
Cheryl Clark, PhD
MITRE Corporation
Slide2
The Challenge: Text Mentions versus Clinical Facts
Negation: event has not occurred or entity does not exist
Example: She had no fever yesterday.
Uncertainty: a measure of doubt
Example: The symptoms are not inconsistent with renal failure.
Conditional: could exist or occur under certain circumstances
Example: The patient should come back to the ED if any rash occurs.
Subject: person the observation is on; experiencer
Example: Mother had lung cancer.
Generic: no clear subject/experiencer
Example: E. coli is sensitive to Cipro but enterococcus is not.
Text mention and clinical fact:
fever: no
renal infarction: uncertain
rash: conditional
lung cancer: family member
Cipro …: generic
Slide3
Background: Assertion Analysis Tool, Version 1
Pipeline diagram labels:
Input: docs, i2b2 concepts
Identify sections
Negation & Uncertainty Cue/Scope Tagger
Compute scope enclosures by rule
Extract words, concepts, locations
Identify word classes and ordering
Assertion Classifier (Maximum Entropy)
Output: i2b2 assertions
Independent Evaluation: i2b2/VA 2010 Clinical NLP Challenge, Assertion Status Task
F Score = 0.93
Slide4
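The final stage of the Version 1 pipeline is a maximum entropy classifier. As a minimal sketch of how such a classifier turns features into an assertion label, the code below computes a softmax over summed feature weights. The feature names, weights, and label set are hypothetical placeholders chosen for illustration; they are not taken from the MITRE tool or its trained model.

```python
import math


def maxent_probabilities(features, weights, labels):
    """Return P(label | features) under a maximum entropy (softmax) model.

    `weights` maps (label, feature) pairs to real-valued weights; pairs that
    are absent contribute 0. All names here are illustrative assumptions.
    """
    scores = {
        label: sum(weights.get((label, f), 0.0) for f in features)
        for label in labels
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {label: math.exp(score - top) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Hypothetical toy model: two labels, two features.
LABELS = ["present", "absent"]
WEIGHTS = {
    ("present", "bias"): 0.5,
    ("absent", "negation_cue_in_scope"): 2.0,
}

probs = maxent_probabilities({"bias", "negation_cue_in_scope"}, WEIGHTS, LABELS)
best_label = max(probs, key=probs.get)
```

In a real system the weights would be learned from annotated data such as the i2b2 assertion corpus rather than written by hand.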
Assertion Status Integration within SHARPn Clinical Document Pipeline
Pipeline diagram labels:
Input docs
cTAKES analysis engines …
Annotations
Identify sections
Negation & Uncertainty Cue/Scope Tagger
Compute scope enclosures by rule
Extract words, concepts, locations
Identify word classes and ordering
Assertion Classifier (Maximum Entropy)
Updated attribute annotations
All annotations are UIMA Common Analysis Structure (CAS)
Slide5
i2b2 Assertion Categories
Assertion classification system designed to meet requirements of 2010 i2b2/VA Challenge Assertion subtask
Present: default category
Example: Patient had a stroke
Absent: problem does not exist in the patient
Example: History inconsistent with stroke
Possible: uncertainty expressed
Example: We are unable to determine whether she has leukemia
Conditional: patient experiences the problem only under certain conditions
Example: Patient reports shortness of breath upon climbing stairs
Hypothetical: medical problems the patient may develop (corresponds to SHARPn conditional)
Example: If you experience wheezing or shortness of breath
Not Patient: problem associated with someone who is not the patient
Example: Family history of prostate cancer
Slide6
Re-architecting Assertions
i2b2 assertion output values
defined for medical problems
closed set of values
mutually exclusive (fixed priority when multiple values apply)
single, multi-way classifier
values: present, absent, possible, hypothetical, not patient, conditional (no SHARPn equivalent)
SHARPn assertion attributes
apply to various entities, events, relations
independent attributes can have multiple values
additional attributes may be added
multiple classifiers, some binary
attributes: negation yes/no; uncertainty yes/no; conditional yes/no; subject multi-valued (patient, family, donor, other…); …
Slide7
Assertion Module Refactoring: Phase 1
Simple mapping from i2b2 assertion classes to SHARPn attributes
Uses existing i2b2-trained single classifier model
Identifies i2b2/SHARPn equivalences
Maps to SHARPn attribute values
Example: Please call physician if you develop [shortness of breath].
i2b2 assertion status = “hypothetical”
SHARPn conditional attribute = “true”
Slide8
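The Phase 1 mapping from i2b2 assertion classes to SHARPn attributes can be sketched as a lookup table. The equivalences for absent, possible, and hypothetical follow the pairings named in this deck. The class name `SharpnAttributes`, the function name, the fallback for i2b2 "conditional" (which the deck says has no SHARPn equivalent), and the use of `family_member` as the subject for every "not patient" case are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharpnAttributes:
    """Independent SHARPn-style attributes (hypothetical container)."""
    negation: bool = False
    uncertainty: bool = False
    conditional: bool = False
    subject: str = "patient"


I2B2_TO_SHARPN = {
    "present": SharpnAttributes(),
    "absent": SharpnAttributes(negation=True),
    "possible": SharpnAttributes(uncertainty=True),
    "hypothetical": SharpnAttributes(conditional=True),
    # Assumption: "not patient" does not say who the subject is, so this
    # sketch picks family_member, the value shown in the deck's example.
    "not patient": SharpnAttributes(subject="family_member"),
    # Assumption: i2b2 "conditional" has no SHARPn equivalent, so the
    # sketch falls back to default attribute values.
    "conditional": SharpnAttributes(),
}


def map_i2b2_status(status: str) -> SharpnAttributes:
    """Translate one i2b2 assertion status into SHARPn attribute values."""
    return I2B2_TO_SHARPN[status.strip().lower()]
```

For the example "Please call physician if you develop shortness of breath", `map_i2b2_status("hypothetical")` yields an attribute set whose `conditional` field is true.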
Assertion Module Refactoring: Phase 2
Direct assignment of SHARPn attribute values
Will use multiple classifiers trained on SHARPn data
Will identify attribute values directly
Benefits
Aligns with SHARPn concept attributes requirements
Aligns with SHARPn clinical data annotation
Enables more accurate meaning representation
Example: He does not smoke, has no hypertension, and has family history of coronary artery disease.
i2b2 2010 Paradigm
Choose one: present, absent, possible, hypothetical, conditional, not patient
negator “no”: absent
“family”: not patient
SHARPn Attribute Paradigm
negation = present
subject = family_member
Slide9
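One way to see why independent attributes carry more meaning than a single i2b2 label is to collapse a set of attributes back into one mutually exclusive category. The sketch below does that with a fixed priority order. The priority order and the function name are assumptions for illustration; the deck states only that a fixed priority applies when multiple values apply, not what the order is.

```python
# Hypothetical priority order, highest first (an assumption, not from the deck).
PRIORITY = ["not patient", "hypothetical", "possible", "absent", "present"]


def collapse_to_single_label(negation, uncertainty, conditional, subject):
    """Pick one i2b2-style label for a set of independent attribute values."""
    applicable = {"present"}  # default category
    if negation:
        applicable.add("absent")
    if uncertainty:
        applicable.add("possible")
    if conditional:
        applicable.add("hypothetical")
    if subject != "patient":
        applicable.add("not patient")
    # The first label in priority order wins; all other information is dropped.
    return next(label for label in PRIORITY if label in applicable)


# A mention that is both negated and about a family member (a constructed
# case, not one from the deck) keeps only one of the two facts.
label = collapse_to_single_label(
    negation=True, uncertainty=False, conditional=False, subject="family_member"
)
```

Under this assumed ordering the label is "not patient", and the negation is lost. Assigning each attribute directly avoids that loss.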
System Errors => Need for Better Linguistic Analysis for Assertions
Need for phrasal structure; scope extent not always enough
She had [no chest pain or chest pressure] with this and this was deemed a negative test.
Callout labels: negated; not negated
Slide10
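The point that scope extent alone is not always enough can be illustrated with a deliberately naive span rule: everything after a negation cue, up to the next punctuation mark, is treated as negated. This is a sketch written for illustration, not the rule set used by the assertion tool; the cue list and function names are assumptions. The test sentence comes from the backup slides of this deck, where infection is absent and erythema is present.

```python
import re

# Hypothetical, very small cue list for illustration only.
NEGATION_CUES = {"no", "not", "denies", "without"}
SCOPE_TERMINATORS = {".", ",", ";", ":"}


def tokenize(text):
    """Lowercase the text and split it into words and punctuation marks."""
    return re.findall(r"[a-z]+|[.,;:]", text.lower())


def naive_negated_tokens(sentence):
    """Return the tokens that fall between a negation cue and the next
    punctuation mark (or the end of the sentence)."""
    negated = set()
    in_scope = False
    for token in tokenize(sentence):
        if token in NEGATION_CUES:
            in_scope = True
        elif token in SCOPE_TERMINATORS:
            in_scope = False
        elif in_scope:
            negated.add(token)
    return negated


SENTENCE = (
    "She had no signs of infection on her leg wounds and she did have "
    "some mild erythema around her right great toe"
)
negated = naive_negated_tokens(SENTENCE)
infection_negated = "infection" in negated
erythema_negated = "erythema" in negated
```

Both flags come out true: the span rule correctly negates "infection" but also sweeps in "erythema", which should not be negated. Separating the two requires knowing that "and" starts a new clause, which is structural information rather than span length.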
Syntactic Approaches*
Insert a signifier node into constituency parse above entity
Use tree kernel methods to compare similarity with negated sentences in training data (can be used on other modifiers as well with varying degrees of success)
* Slide courtesy of Tim Miller, Children’s Hospital Boston
Slide11
Tree kernel fragment mining*
Use TK model to extract tree fragment features (Pighin & Moschitti 07)
Allows interaction with other feature types
Faster to find fragments than do whole-tree comparisons
* Slide courtesy of Tim Miller, Children’s Hospital Boston
Slide12
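The two syntactic steps described above (marking the entity in the parse, then comparing trees with a kernel) can be sketched in a few lines. The code below uses a Collins-Duffy style subset-tree kernel as a stand-in; the deck does not say which tree kernel was used. The tuple tree representation, the `ENTITY` marker label, and the hand-written parses are all assumptions for illustration.

```python
def insert_signifier(tree, target, marker="ENTITY"):
    """Wrap every subtree equal to `target` in a (marker, subtree) node."""
    if tree == target:
        return (marker, tree)
    if isinstance(tree, str):
        return tree
    return (tree[0],) + tuple(insert_signifier(c, target, marker) for c in tree[1:])


def internal_nodes(tree):
    """List all non-leaf nodes; leaves are plain strings."""
    if isinstance(tree, str):
        return []
    found = [tree]
    for child in tree[1:]:
        found.extend(internal_nodes(child))
    return found


def production(node):
    """The grammar production at a node: its label plus its children's labels."""
    return (node[0],) + tuple(
        ("word", c) if isinstance(c, str) else ("node", c[0]) for c in node[1:]
    )


def common_fragments(n1, n2, decay=1.0):
    """Weighted count of tree fragments rooted at both n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):
            score *= 1.0 + common_fragments(c1, c2, decay)
    return score


def tree_kernel(t1, t2, decay=1.0):
    """Sum the shared-fragment counts over all pairs of internal nodes."""
    return sum(
        common_fragments(a, b, decay)
        for a in internal_nodes(t1)
        for b in internal_nodes(t2)
    )


# Hand-written toy parses (assumptions, not parser output).
NEGATED = ("S", ("NP", "she"),
           ("VP", ("V", "had"), ("NP", ("DT", "no"), ("NN", "fever"))))
AFFIRMED = ("S", ("NP", "she"),
            ("VP", ("V", "had"), ("NP", ("NN", "fever"))))

marked = insert_signifier(NEGATED, ("NN", "fever"))
self_similarity = tree_kernel(NEGATED, NEGATED)
cross_similarity = tree_kernel(NEGATED, AFFIRMED)
```

A negated training tree scores higher against itself than against the affirmed variant, which is the signal a kernel-based classifier would exploit. Mining individual fragments as features, as the fragment-mining slide describes, avoids running this pairwise comparison against every training tree.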
Next Steps: Assertions for Relations
Some assertion attributes apply to relations, too.
negation
uncertainty
conditional
Example: The fundal AVMs are a potential site of bleeding although do not explain the extent of bleeding.
location relation (“site of”): uncertain (“potential”)
causal relation (“explain”): negated (“not”)
Slide13
Next Steps: Classifier Retraining and Component Evaluation
Model Retraining
Models for individual attributes
Linguistic features based on parser output
Training on SHARPn data
Enhancements to parsers
Evaluation
Accuracy on i2b2 gold annotations vs. accuracy on SHARPn gold annotations
i2b2 absent vs. SHARPn negated
i2b2 possible vs. SHARPn uncertainty
i2b2 hypothetical vs. SHARPn conditional
Evaluation based on system-generated entity annotations
Evaluation on CEM concept rather than on individual mentions
Slide14
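The per-attribute comparisons listed under Evaluation (for example i2b2 absent vs. SHARPn negated) reduce to scoring one binary decision against gold annotations. The sketch below computes accuracy and F score for a single attribute. The function names are assumptions, and the gold and predicted lists are made-up toy values for illustration, not results from either corpus.

```python
def accuracy(gold, predicted):
    """Fraction of items where the prediction matches the gold value."""
    matches = sum(1 for g, p in zip(gold, predicted) if g == p)
    return matches / len(gold)


def f_score(gold, predicted, positive=True):
    """Balanced F score (F1) for one attribute value treated as positive."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy values only: True means the mention is negated.
gold_negation = [True, False, True, True]
predicted_negation = [True, True, False, True]

toy_accuracy = accuracy(gold_negation, predicted_negation)
toy_f = f_score(gold_negation, predicted_negation)
```

Running the same functions once with i2b2 "absent" labels and once with SHARPn negation values would give the paired numbers the comparison calls for.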
Thank you!
SHARPn Negation/Uncertainty Team
John Aberdeen
David Carrell
Cheryl Clark
Matt Coarr
Scott Halgrim
Lynette Hirschman
Donna Ihrke
Tim Miller
Guergana Savova
Ben Wellner
Slide15
Backup Slides
Slide16
Clarifying Definitions
Negation and temporal
Example: The patient had the tumor removed.
Originally annotated as negated. No longer annotated as negated.
Course: degree_of (tumor, CHANGED (span for “removed”))
The text span “removed” indicates the tumor was there but does not exist anymore.
Circumstantial negation (i2b2 calls this conditional)
Example: While smoking, he does not use his nicotine patch
Annotated as negated
Allergens
Medications mentioned as allergens originally negated
Allergen status distinguished from negation: Allergy_indicator_class
Example: ALLERGIES: PCN, Sulpha, Zocor, Asendin, Rocephin
Slide17
System Errors => Need for Better Linguistic Analysis for Assertions
Issue is structure and not simply span extent:
She had no signs of infection on her leg wounds and she did have some mild erythema around her right great toe
present = should not be negated
absent = negated
She had [no chest pain or chest pressure] with this and this was deemed a negative test.
Callout labels: negated; not negated
Slide18
MASTIF-Generated SHARPn attributes in cTAKES Output
[Add screenshot]
default values
calculated value
Slide19
Assertions for Different Concept Types
polarity = -1: negated
Slide20
Issues: Differences in training data annotation
UMLS CUI-driven annotation (SHARPn)
UMLS contains some concept-internal negation; concept-internal subject
Cigarette smoker: Concept: [C0337667] (finding)
Never smoked: Concept: [C0425293] Never smoked tobacco (finding)
Non-smoker: Concept: [C0337672] Non-smoker (finding)
Mother smokes: Concept: [C0424969] (finding)
Father smokes: Concept: [C0424968] (finding)
Mother does not smoke: Concept: [C2586137] (finding)
Father does not smoke: Concept: [C2733448] (finding)
i2b2 concept excludes contextual cues; SHARPn concept includes them.
Example: The patient has never smoked.
i2b2 concept: smoked (negated)
SHARPn concept: never smoked (not negated)
Slide21
Issue: Differences in training data annotation
No known allergies: Concept: [C0262580] No known allergies
i2b2: concept = known allergies; type = problem; assertion = absent
SHARPn: concept = no known allergies; type = disease/disorder (finding in UMLS); assertion = present
NKA
i2b2: concept = nka; type = problem; assertion = absent
Slide22
Abstract
We describe a methodology for identifying negation and uncertainty in clinical documents and a system that uses that information to assign assertion values to medical problems mentioned in clinical text. This system was among the top performing systems in the assertion subtask of the 2010 i2b2/VA community evaluation Challenges in natural language processing for clinical data, and has subsequently been packaged as a UIMA module called the MITRE Assertion Status Tool for Interpreting Facts (MASTIF), which can be integrated with cTAKES. We describe the process of extending MASTIF, which uses a single multi-way classifier to select among a closed set of mutually exclusive assertion categories, to a system that uses individual, independent classifiers to assign values to independent negation and uncertainty attributes associated with a variety of clinical concepts (e.g., medications, procedures, and relations) as specified by SHARPn requirements. We discuss the benefits that result from this new representation and the challenges associated with generating it automatically. We compare the accuracy of MASTIF on i2b2 data with accuracy on a subset of SHARPn clinical documents, and discuss the contribution of linguistic features to accuracy and generalizability of the system. Finally, we discuss our plans for future development.