Slide1
Data Normalization
Dr. Stan Huff
Slide2
Acknowledgements
Tom Oniki
Joey Coyle
Craig Parker
Yan Heras
Cessily Johnson
Roberto Rocha
Lee Min Lau
Alan James
Many, many others…
Slide3
What are detailed clinical models?
Why do we need them?
Slide4
A diagram of a simple clinical model
Clinical Element Model for Systolic Blood Pressure
SystolicBPObs (SystolicBP)
  data: 138 mmHg
  quals:
    BodyLocation (BodyLocation)
      data: Right Arm
    PatientPosition (PatientPosition)
      data: Sitting
Slide5
Need for a standard model
A stack of coded items is ambiguous (SNOMED CT)
Numbness of right arm and left leg
Numbness (44077006)
Right (24028007)
Arm (40983000)
Left (7771000)
Leg (30021000)
Numbness of left arm and right leg
Numbness (44077006)
Left (7771000)
Arm (40983000)
Right (24028007)
Leg (30021000)
Slide6
What if there is no model?
Site #1
  Hct, manual: 37 %
  Hct, auto: 35 %
Site #2
  Hct: 37 %
  Method: Manual, Auto, Estimated
Slide7
HL7 V2.X Messages
Site 1:
OBX|1|CE|4545-0^Hct, manual||37||%|
OBX|1|CE|4544-3^Hct, auto||35||%|
Site 2:
OBX|1|CE|20570-8^Hct||37||%|….|manual|
OBX|1|CE|20570-8^Hct||35||%|….|auto|
Slide8
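The two site encodings in the HL7 V2.X messages can be reduced to one common form. The following is a minimal Python sketch, not the presenters' implementation. It assumes the simplified field positions shown on the slide (not the full HL7 specification), drops the elided "…." fields from the Site 2 sample, and uses a hypothetical lookup table named PRECOORDINATED.

```python
# Hypothetical lookup: pre-coordinated LOINC code -> (general code, method).
# The codes themselves are the ones shown on the slides.
PRECOORDINATED = {
    "4545-0": ("20570-8", "manual"),
    "4544-3": ("20570-8", "auto"),
}


def normalize_obx(segment):
    """Return (code, method, value, unit) for one simplified OBX segment."""
    fields = segment.rstrip("|").split("|")
    code = fields[3].split("^")[0]       # identifier before the ^ separator
    value, unit = fields[5], fields[7]   # positions as laid out on the slide
    if code in PRECOORDINATED:
        # Site 1 style: the method is baked into the code.
        code, method = PRECOORDINATED[code]
    else:
        # Site 2 style: assume the method travels in the last field.
        method = fields[-1]
    return code, method, value, unit


site1 = "OBX|1|CE|4545-0^Hct, manual||37||%|"
site2 = "OBX|1|CE|20570-8^Hct||37||%|manual|"
print(normalize_obx(site1))
print(normalize_obx(site2))
```

Both segments normalize to the same tuple, which is the point of agreeing on a single model before exchanging data.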
Too many ways to say the same thing
A single name/code and value
Hct, manual is 37 %
Two names/codes and values
Hct is 37 %
Method is manual (spun)
Slide9
Model fragment in XML
Pre-coordinated representation
<observation>
  <cd>Hct, manual (LOINC 4545-0)</cd>
  <value>37 %</value>
</observation>
Post-coordinated (compositional) representation
<observation>
  <cd>Hct (LOINC 20570-8)</cd>
  <qualifier>
    <cd>Method</cd>
    <value>Manual</value>
  </qualifier>
  <value>37 %</value>
</observation>
Slide10
Isosemantic Models
Precoordinated Model
HematocritManualModel
  HematocritManual (LOINC 4545-0)
  data: 37 %
Post coordinated Model (Storage Model)
HematocritModel
  Hematocrit (LOINC 20570-8)
  data: 37 %
  quals:
    HematocritMethodModel
      Hematocrit Method
      data: Manual
Slide11
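Because the precoordinated and post coordinated hematocrit models are isosemantic, one can be converted to the other without losing information. The sketch below illustrates that idea with plain Python dictionaries. The dictionary layout, the PRE_TO_POST table and both function names are assumptions made for illustration, not part of the CEM tooling.

```python
# Hypothetical mapping table; the LOINC codes are the ones shown on the slides.
PRE_TO_POST = {
    "4545-0": {"code": "20570-8", "method": "Manual"},
}


def to_post_coordinated(model):
    """Expand a precoordinated model into code plus method qualifier."""
    target = PRE_TO_POST[model["code"]]
    return {
        "code": target["code"],
        "data": model["data"],
        "quals": {"HematocritMethod": {"data": target["method"]}},
    }


def to_pre_coordinated(model):
    """Collapse code plus method qualifier back into a single code."""
    method = model["quals"]["HematocritMethod"]["data"]
    for pre_code, target in PRE_TO_POST.items():
        if target["code"] == model["code"] and target["method"] == method:
            return {"code": pre_code, "data": model["data"]}
    raise KeyError("no precoordinated equivalent for this code and method")


pre = {"code": "4545-0", "data": "37 %"}
post = to_post_coordinated(pre)
print(post)
print(to_pre_coordinated(post) == pre)
```

The round trip returns the original instance, which is what makes the post coordinated form usable as a storage model.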
Relational database implications
If the patient’s hematocrit is <= 35 then ….
Patient Identifier | Date and Time | Observation Type | Observation Value | Units
123456789 | 7/4/2005 | Hct, manual | 37 | %
123456789 | 7/19/2005 | Hct, auto | 35 | %
Patient Identifier | Date and Time | Observation Type | Method | Observation Value | Units
123456789 | 7/4/2005 | Hct | manual | 37 | %
123456789 | 7/19/2005 | Hct | auto | 35 | %
Slide12
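The rule "hematocrit <= 35" reads differently against the two table layouts. The sketch below loads both layouts into an in-memory SQLite database using Python's standard sqlite3 module. The table and column names are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Layout 1: the method is part of the observation type.
conn.execute(
    "CREATE TABLE pre (patient_id TEXT, observed TEXT, obs_type TEXT, "
    "value REAL, units TEXT)"
)
# Layout 2: the method is stored in its own column.
conn.execute(
    "CREATE TABLE post (patient_id TEXT, observed TEXT, obs_type TEXT, "
    "method TEXT, value REAL, units TEXT)"
)
conn.executemany(
    "INSERT INTO pre VALUES (?, ?, ?, ?, ?)",
    [
        ("123456789", "7/4/2005", "Hct, manual", 37, "%"),
        ("123456789", "7/19/2005", "Hct, auto", 35, "%"),
    ],
)
conn.executemany(
    "INSERT INTO post VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("123456789", "7/4/2005", "Hct", "manual", 37, "%"),
        ("123456789", "7/19/2005", "Hct", "auto", 35, "%"),
    ],
)

# Layout 1: the query must enumerate every precoordinated name.
low_pre = conn.execute(
    "SELECT observed FROM pre "
    "WHERE obs_type IN ('Hct, manual', 'Hct, auto') AND value <= 35"
).fetchall()
# Layout 2: one observation type covers all methods.
low_post = conn.execute(
    "SELECT observed FROM post WHERE obs_type = 'Hct' AND value <= 35"
).fetchall()
print(low_pre, low_post)
```

Both queries find the same row, but the first breaks silently whenever a new precoordinated name (for example an estimated hematocrit) is added to the data.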
More complicated items:
Signs, symptoms
Diagnoses
Problem list
Family History
Use of negation – “No Family Hx of Cancer”
Description of a heart murmur
Description of breath sounds
“Rales in right and left upper lobes”
“Rales, rhonchi, and egophony in right lower lobe”
Slide13
What do we model?
All health care data, including:
Allergies
Problem lists
Laboratory results
Medication and diagnostic orders
Medication administration
Physical exam and clinical measurements
Signs, symptoms, diagnoses
Clinical documents
Procedures
Family history, medical history and review of symptoms
Slide14
How are the models used?
EMR: data entry screens, flow sheets, reports, ad hoc queries
Basis for application access to clinical data
Data normalization
Creation of maps from models in the local system to the standard model
Target for the output of structured data from NLP
Validation of data as it is stored in the database
Phenotype algorithms (decision logic)
Basis for referencing data in phenotype definitions
Does NOT dictate physical storage strategy
Slide15
Model Source Expression (CDL)
model BloodPressurePanel is panel
{
key code(BloodPressurePanel_KEY_ECID);
  statement SystolicBloodPressureMeas systolicBloodPressureMeas optional
    systolicBloodPressureMeas.methodDevice.conduct(methodDevice)
    systolicBloodPressureMeas.bodyLocationPrecoord.conduct(bodyLocationPrecoord)
    systolicBloodPressureMeas.bodyPosition.conduct(bodyPosition)
    systolicBloodPressureMeas.relativeTemporalContext.conduct(relativeTemporalContext)
    systolicBloodPressureMeas.subject.conduct(subject)
    systolicBloodPressureMeas.observed.conduct(observed)
    systolicBloodPressureMeas.reportedReceived.conduct(reportedReceived)
    systolicBloodPressureMeas.verified.conduct(verified);
  statement DiastolicBloodPressureMeas diastolicBloodPressureMeas optional ….
  statement MeanArterialPressureMeas meanArterialPressureMeas optional ….
  qualifier MethodDevice methodDevice optional;
    md.code.domain(BloodPressureMeasurementDevice_DOMAIN_ECID);
  qualifier BodyLocationPrecoord bodyLocationPrecoord optional;
    blp.code.domain(BloodPressureBodyLocationPrecoord_DOMAIN_ECID);
  modifier Subject subject optional;
  attribution Observed observed optional;
  attribution ReportedReceived reportedReceived optional;
  attribution Verified verified optional;
}
Slide16
Compiler
CE Source File
CE Translator
“In Memory” Form
HTML
SMArt RDF?
openEHR Archetype?
HL7 RIM Static Models?
Java Class
XML Template - .xsd
OWL?
UML?
Slide17
Artifacts Used
CDL Model Definition
CEM XML Schema
HL7 Data Source
CEM XML Instance
Slide18
StandardLabObsQuantitative - CDL Definition
import StandardLabObs;
import ReferenceRangeNar;
model StandardLabObsQuantitative is statement extends StandardLabObs {
  key domain(StandardLabObsQuantitative_KEY_VALUESET_ECID);
  data PQ primaryPQValue unit.domain (UnitsOfMeasure_VALUESET_ECID)
    alternate {
      match CD secondaryCDValue code.domain(LabValue_VALUESET_ECID);
      match CD altCDValue code.domain(LabValue_VALUESET_ECID);
      otherwise ST altSTValue; };
  qualifier ReferenceRangeNar referenceRangeNar card(0..1);
  constraint primaryPQValue.isNullReasonCode.domain(LabNullFlavor_VALUESET_ECID);
  constraint abnormalInterpretation.CD.code.domain (AbnormalInterpretationNumericNom_VALUESET_ECID);
  constraint deltaFlag.CD.code.domain (DeltaFlagNumericNom_VALUESET_ECID);
}
Slide19
StandardLabObsQuantitative - Schema Snippet
<xs:complexType name="StandardLabObsQuantitative">
  <xs:sequence>
    <xs:element name="key" minOccurs="0" maxOccurs="1" type="CD"/>
    <xs:element name="primaryPQValue" type="PQ"/>
    <xs:element name="referenceRangeNar" minOccurs="0" maxOccurs="1" type="ReferenceRangeNar"/>
    <xs:element name="accessionNumber" minOccurs="0" maxOccurs="1" type="AccessionNumber"/>
    <xs:element name="fillerOrderNumber" minOccurs="0" maxOccurs="1" type="FillerOrderNumber"/>
    <xs:element name="placerOrderNumber" minOccurs="0" maxOccurs="1" type="PlacerOrderNumber"/>
    <xs:element name="resultStatus" minOccurs="0" maxOccurs="1" type="ResultStatus"/>
    <xs:element name="reportingPriority" minOccurs="0" maxOccurs="1" type="ReportingPriority"/>
    <xs:element name="abnormalInterpretation" minOccurs="0" maxOccurs="1" type="AbnormalInterpretation"/>
    <xs:element name="ordinalInterpretation" minOccurs="0" maxOccurs="1" type="OrdinalInterpretation"/>
    <xs:element name="deltaFlag" minOccurs="0" maxOccurs="1" type="DeltaFlag"/>
    <xs:element name="responsibleObserver" minOccurs="0" maxOccurs="unbounded" type="ResponsibleObserver"/>
    <xs:element name="performingLaboratory" minOccurs="0" maxOccurs="1" type="PerformingLaboratory"/>
    <xs:element name="comment" minOccurs="0" maxOccurs="unbounded" type="Comment"/>
    <xs:element name="subject" minOccurs="0" maxOccurs="1" type="Subject"/>
    <xs:element name="specimenCollected" minOccurs="0" maxOccurs="1" type="SpecimenCollected"/>
    <xs:element name="specimenReceivedByLab" minOccurs="0" maxOccurs="1" type="SpecimenReceivedByLab"/>
    <xs:element name="resulted" minOccurs="0" maxOccurs="1" type="Resulted"/>
    <xs:element name="patientId" minOccurs="0" maxOccurs="1" type="anonymous.2"/>
    <xs:element name="status" minOccurs="0" maxOccurs="1" type="anonymous"/>
    <xs:element name="instanceId" minOccurs="0" maxOccurs="1" type="anonymous.2"/>
    <xs:element name="typeId" minOccurs="0" maxOccurs="1" type="anonymous.3"/>
  </xs:sequence>
  <xs:attribute name="class" type="statement.type" default="statement"/>
  <xs:attribute name="type" type="ecid.type" default="b1ceaebb-dd15-4317-3f99-67ef3af81778"/>
</xs:complexType>
Slide20
HL7 Source Instance
MSH|^~\&|OADD|153|DADD|XNEPHA|20110208000109||ORU^R01|20110207000036|T|2.2||||
EVN|R01|201102080000|
PID||1234567|274382554|007261|WHYLING^KAYLIE^O'TEST||19460413|F||W|||(801)224-1528|(866)772-3150||||21443041|535194412|
PV1||O|XNEPHA^XNEPHA^^IM||||28826^Allyson^Josephine^ O'TEST |^||||||||||OP||||||||||||||||||||||||||201102070000||||||||
ORC|RE||F506556|||||||||28826^Allyson^Josephine^ O'TEST ||||^|
OBR||^|F506556^|HCT^HEMATOCRIT|R||201102071554|||70011^ROSEN,AUBRY^ O'TEST |||20110207161200|^|28826^Allyson^Josephine^ O'TEST ||||M2415648||||C|F|RFP^RFP|^^^^^R|^~^~^|||||||
OBX|1|NM|HCT^HEMATOCRIT|1.1|48|%|||R||F|||201102080000|IM^Performed at Inte|58528^ANDERSON^MARK|
Slide21
LabObsQuantitative - XML Instance Snippet
<labObsQuantitative type="b1ceaebb-dd15-4317-3f99-67ef3af81778">
  <key>
    <code>
      <value>20570-8</value>
    </code>
    <codeSystem>
      <value>LOINC</value>
    </codeSystem>
    <originalText>HCT</originalText>
  </key>
  <primaryPQValue>
    <operator>
      <value>equals</value>
    </operator>
    <unit>
      <value>%</value>
    </unit>
    <value>48</value>
  </primaryPQValue>
  <referenceRangeNar type="6f422ce6-7bc6-2cc2-8c96-58c137b5c9fc"> … </referenceRangeNar>
  <abnormalInterpretation type="9a3c3c60-18f7-5a91-c10c-c15532a96303"> … </abnormalInterpretation>
</labObsQuantitative>
Slide22
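To connect the HL7 source instance to the XML instance snippet, the sketch below builds the key and primaryPQValue elements from the OBX segment with Python's standard xml.etree.ElementTree module. It is an illustration under assumptions: LOCAL_TO_LOINC is a hypothetical one-entry lookup table, obx_to_cem is a hypothetical function name, and the type attributes and elided elements of the real instance are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical local-code table; the real pipeline uses site-specific resources.
LOCAL_TO_LOINC = {"HCT": "20570-8"}


def obx_to_cem(obx):
    """Build a partial labObsQuantitative element from one OBX segment."""
    fields = obx.split("|")
    local_code = fields[3].split("^")[0]   # OBX-3 identifier
    root = ET.Element("labObsQuantitative")

    key = ET.SubElement(root, "key")
    ET.SubElement(ET.SubElement(key, "code"), "value").text = LOCAL_TO_LOINC[local_code]
    ET.SubElement(ET.SubElement(key, "codeSystem"), "value").text = "LOINC"
    ET.SubElement(key, "originalText").text = local_code

    pq = ET.SubElement(root, "primaryPQValue")
    ET.SubElement(ET.SubElement(pq, "operator"), "value").text = "equals"
    ET.SubElement(ET.SubElement(pq, "unit"), "value").text = fields[6]   # OBX-6 units
    ET.SubElement(pq, "value").text = fields[5]                          # OBX-5 value
    return root


obx = ("OBX|1|NM|HCT^HEMATOCRIT|1.1|48|%|||R||F|||201102080000|"
       "IM^Performed at Inte|58528^ANDERSON^MARK|")
instance = obx_to_cem(obx)
print(ET.tostring(instance, encoding="unicode"))
```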
Issues
Different groups use models differently
NLP versus EMR
Structuring the models to meet more than one use
Options for different granularities of models
Hematocrit model, model of pneumonia
Quantitative lab result model, x-ray finding
Terminology integration – use of standards and terminology services
Models for “rare” kinds of data
Medication being taken by a friend, not recommended by the physician
Slide23
Questions?
Slide24
Data Normalization
Dr. Christopher Chute
Slide25
IHC-Medication, Mayo, IHC LAB to CEM
[Diagram components, medication pipeline: HL7 (Meds); HL7 Initializer; IHC-GCN-TO-RXNORM Annotator; Drug CEM CAS Consumer]
[Diagram components, lab pipeline: HL7 (Labs); HL7 Initializer; Generic-LAB-Annotator; LAB CEM CAS Consumer]
[Other diagram components: Mirth; SharpDb; Mayo LOINC resource; IHC LOINC resource; IHC RXNORM resource]
Slide26
UIMA Normalization Pipeline
Convert HL7 V2.x Lab / Med Order Messages into CEM XML instances
Load SofA with HL7 message
Create Segment Objects in CAS
Normalize Segments in CAS
Transform Segments into CEM instances
Slide27
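The four steps of the normalization pipeline (load, create segment objects, normalize, transform) can be sketched as a chain of plain Python functions. This is not UIMA code: a dictionary stands in for the CAS, and every function name and the code_table argument are assumptions made for illustration.

```python
def load_sofa(message):
    """Step 1: load the subject of analysis (the raw HL7 message)."""
    return {"sofa": message, "segments": [], "cems": []}


def create_segments(cas):
    """Step 2: create one segment object per non-empty message line."""
    for line in cas["sofa"].splitlines():
        if line.strip():
            fields = line.strip().split("|")
            cas["segments"].append({"name": fields[0], "fields": fields})
    return cas


def normalize_segments(cas, code_table):
    """Step 3: attach a standard code to each OBX segment."""
    for seg in cas["segments"]:
        if seg["name"] == "OBX":
            local_code = seg["fields"][3].split("^")[0]
            seg["loinc"] = code_table.get(local_code)
    return cas


def transform_to_cem(cas):
    """Step 4: turn normalized OBX segments into CEM-like records."""
    for seg in cas["segments"]:
        if seg["name"] == "OBX":
            cas["cems"].append(
                {"key": seg["loinc"], "value": seg["fields"][5], "unit": seg["fields"][6]}
            )
    return cas


message = "PID||1234567\nOBX|1|NM|HCT^HEMATOCRIT|1.1|48|%|"
cas = transform_to_cem(
    normalize_segments(create_segments(load_sofa(message)), {"HCT": "20570-8"})
)
print(cas["cems"])
```

Keeping each stage as a separate function mirrors the annotator style of the pipeline, where new modules can be added without touching the others.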
Mayo, IHC LAB to CEM
One of the new pipelines created to normalize HL7 2.x Lab Messages into CEM instances.
We pre-processed the HL7 messages, converting from HL7 pipe syntax into HL7 XML format.
[Diagram components: Mirth; HL7 Pipe Delimited; HL7 (XML); HL7 Initializer; Generic-LAB-Annotators; LAB CEM CAS Consumer; SharpDb; Mayo LOINC resource; IHC LOINC resource]
Slide28
UIMA Pipeline Flow
Mayo, IHC LAB to CEM
[Diagram: pipeline stages Initialize, Parse, Normalize, Transform. Labels: HL7 message; CAS (SOFA=HL7-XML); CAS segments PID, PV1, OBX; codes 10109 and 45373-3; CEM]
Slide29
Normalization Anatomy
Lab Annotators
HL7 Segment Parser
Date-Time To ISO Format
Syntactic Integrity
LOINC lookups
IHC codes to LOINC table
Mayo codes to LOINC table
LexGrid/CTS2 Terminology Services
Slide30
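Two of the annotator tasks named in the normalization anatomy, date-time conversion to ISO format and LOINC lookup, are simple enough to sketch in Python. The format table, the function names and the contents of SITE_TABLES are assumptions for illustration. Time zones and fractional seconds, which HL7 timestamps may carry, are not handled here.

```python
from datetime import datetime

# HL7 v2 timestamps seen in the sample message, keyed by string length.
HL7_FORMATS = {8: "%Y%m%d", 12: "%Y%m%d%H%M", 14: "%Y%m%d%H%M%S"}

# Hypothetical per-site lookup tables (local code -> LOINC code).
SITE_TABLES = {
    "IHC": {"HCT": "20570-8"},
    "Mayo": {},
}


def hl7_to_iso(timestamp):
    """Convert an HL7 date or date-time string to ISO 8601."""
    return datetime.strptime(timestamp, HL7_FORMATS[len(timestamp)]).isoformat()


def lookup_loinc(site, local_code):
    """Return the LOINC code for a site's local code, or None if unmapped."""
    return SITE_TABLES[site].get(local_code)


print(hl7_to_iso("201102080000"))
print(hl7_to_iso("20110207161200"))
print(lookup_loinc("IHC", "HCT"))
```

Returning None for an unmapped code leaves the decision about what to do (reject, queue for review, call a terminology service) to the caller.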
Architectural Opportunities
[Diagram labels: Mirth; CAS To XML; Mayo; HL7 2.x; CDA; CEM format; Time, Syntax, Etc.; Semantic]
Slide31
Tactical Next Step Enhancements
Single CEM for multiple OBX segments
Efficiently utilize terminology services
Incorporate a library for HL7 clean-up routines
Increase scope of vocabulary standardization
Enhancements for the Drug Annotator
Context enhancement issue
Drug name surprises
Slide32
Additional Vocabularies
Review sources used for normalization opportunities
E.g.
In HL7 OBR Segments
Standardize Service ID (Codes)
In HL7 OBX Segments
Standardize Units
Standardize Reference Ranges
Standardize Normal Flags
Slide33
Drug Name Disambiguation
Real patient data presented a unique case in drug names: “ToDAY” is a brand name for cephapirin sodium.
This presents an interesting named entity disambiguation use case.
Slide34
Where Persistence Fits In…
[Diagram: numbered data flow, steps 1 to 10 plus 6a, through the components listed here]
IHC (Backend CDR Systems)
Mirth Connect
IHC NwHIN Aurion Gateway
SHARP NwHIN Aurion Gateway
Mirth Connect
UIMA Pipeline
CEM Instance Database
Mayo EDT System
Slide35
Persistence Channels
One Channel per model
Data stored as an XML Instance of the model
Fields extracted from XML to use as indices
XML Schema defined for each model
Stored using database transactions
CEM Model | Mirth Channel
Administrative Diagnosis | CemAdminDxToDatabase
Standard Lab Panel | CemLabToDatabase
Ambulatory Medication Order | CemMedicationToDatabase
Slide36
General Channel Design
Input Message Directory
Channel
CEM XML Instance
Processed Message Directory
Error Message Directory
Persistence Store
Connector
Connector
Slide37
SharpDB: a CEM Instance Database
Slide38
Database Tables
Table | Purpose
Demographics | Patient demographics (one row per patient)
PatientCrossReference | Associates internal Patient ID with Site Patient ID (one row per cross map)
SourceData | Information about the original source data (one row per instance message)
PatientData | CEM Instance XML with some source information (one row per instance message)
IndexData | Indices into the XML instance (multiple rows per instance message: AdminDx, one per message; Lab, one per observation; Medication, one per orderable item)
Slide39
Patient Demographics
Each message contains patient demographics
Demographics created on first received message based on site patient ID
Internal Patient ID is created and cross mapped to site patient ID
SharpDB is keyed off internally generated Patient ID
Slide40
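The patient identity handling described for SharpDB (demographics created on the first message, an internal ID cross mapped to the site patient ID) can be sketched in memory as follows. The class name, its attributes and the integer ID scheme are assumptions for illustration. The slides only say that the real system keeps this in database tables.

```python
import itertools


class PatientRegistry:
    """In-memory stand-in for the Demographics and PatientCrossReference tables."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.cross_reference = {}   # (site, site_patient_id) -> internal ID
        self.demographics = {}      # internal ID -> demographics

    def resolve(self, site, site_patient_id, demographics):
        """Return the internal ID, creating it on the first message received."""
        key = (site, site_patient_id)
        if key not in self.cross_reference:
            internal_id = next(self._next_id)
            self.cross_reference[key] = internal_id
            self.demographics[internal_id] = demographics
        return self.cross_reference[key]


registry = PatientRegistry()
first = registry.resolve("IHC", "1234567", {"sex": "F", "birth_date": "1946-04-13"})
again = registry.resolve("IHC", "1234567", {"sex": "F", "birth_date": "1946-04-13"})
other = registry.resolve("Mayo", "1234567", {"sex": "M", "birth_date": "1950-01-01"})
print(first, again, other)
```

Keying on the pair of site and site patient ID keeps identical numbers issued by different sites from being merged into one patient.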
Running in a Cloud…
Various images were installed:
NwHIN Gateway provided by Aurion
MIRTH Connect, our interface engine
UIMA Pipelines of various sorts
MySQL database for persistence
JBOSS / Drools rules engine
All open source, running in an Ubuntu Cloud!
Slide41
SHARP Hardware Infrastructure
[Diagram: a Cloud Server (Cloud Controller, Walrus Controller, Cluster Controller, Storage Controller) and Node Servers 1, 2, 3, … 11, each with a Node Controller and several VMs, together with a Private Switch, Persistence Storage, Image Storage and a Build/Backup Server. Admin Client Interface over VPN/LAN: To Manage Cloud. User over VPN/LAN: To Connect To Instances.]
Hardware | No. of Physical Machines | CPU | Memory | Disk | Disk Space | Networking | Functionality | No. of NICs
Cloud Server | 1 | 8 | 12 GB | 10000 RPM SAS | 1 TB | 1 Gbps | Cloud, Walrus, Cluster and Storage Controller | 4
Node Server | 1 | 8 | 32 GB | 10000 RPM SAS | 1 TB | 1 Gbps | Node Controller | 4
Node Server | 8 | 24 | 128 GB | 10000 RPM SAS | 600 GB/600 GB | 1 Gbps | Node Controller | 4
Node Server | 1 | 8 | 64 GB | 7200 RPM SATA | 1 TB/1 TB | 1 Gbps | Node Controller | 4
Node Server | 1 | 8 | 32 GB | 10000 RPM SAS | 4 TB | 1 Gbps | Node Controller | 4
Build/Backup Server | 1 | 2 | 8 GB | 7200 RPM SATA | 2 TB | 1 Gbps | Build and Backup | 2
Storage | 2 | | | 10000 RPM SAS | 7.5 TB | 1 Gbps | Persistence and Image Storage |
Storage | 2 | | | 10000 RPM SAS | 3.6 TB | 1 Gbps | Volume Storage |
Cisco 48 Port Switch | 2 | | | | | 1 GB | |
Slide42
Data Normalization Summary
Initial “tracer shot” at Data Normalization
Cloud based processing using open source tools
Proof of concept, UIMA for Data Normalization
Move on to new problems / solutions…
Opportunities exist:
Add new annotators (modules) to the pipelines
Widen usage and scope of vocabulary services
Switch to real live flows and add HOSS clean up routines.
Various tweaks in NLP algorithms