
Data Normalization - PowerPoint Presentation

ellena-manuel
Uploaded On 2016-11-25

Presentation Transcript

Slide1

Data Normalization

Dr. Stan Huff

Slide2

Acknowledgements

Tom Oniki

Joey Coyle

Craig Parker

Yan Heras

Cessily Johnson

Roberto Rocha

Lee Min Lau

Alan James

Many, many others…

Slide3

What are detailed clinical models?

Why do we need them?

Slide4

A diagram of a simple clinical model

SystolicBP (SystolicBPObs)
    data: 138 mmHg
    quals:
        BodyLocation (BodyLocation)
            data: Right Arm
        PatientPosition (PatientPosition)
            data: Sitting
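The nesting in the diagram (a value plus qualifiers that are themselves small models) can be sketched as a recursive data structure. This is a minimal illustration, not the CEM implementation: the class name `ClinicalElement` and the field names `model`, `key`, `data`, and `quals` are assumptions chosen to mirror the diagram labels.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClinicalElement:
    """One node of the model: a named element, its value, and its qualifiers."""
    model: str   # model name from the diagram, e.g. "SystolicBPObs"
    key: str     # what is being stated, e.g. "SystolicBP"
    data: str    # the value carried by this element
    quals: List["ClinicalElement"] = field(default_factory=list)

    def qualifier(self, key: str) -> "ClinicalElement":
        """Return the qualifier with the given key, or raise KeyError."""
        for qual in self.quals:
            if qual.key == key:
                return qual
        raise KeyError(key)


# The systolic blood pressure example from the diagram.
systolic = ClinicalElement(
    model="SystolicBPObs",
    key="SystolicBP",
    data="138 mmHg",
    quals=[
        ClinicalElement("BodyLocation", "BodyLocation", "Right Arm"),
        ClinicalElement("PatientPosition", "PatientPosition", "Sitting"),
    ],
)

print(systolic.qualifier("BodyLocation").data)
```

Because each qualifier is the same kind of object as its parent, the body location and patient position stay attached to the measurement they describe instead of floating as separate codes.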

Clinical Element Model for Systolic Blood Pressure

Slide5

Need for a standard model

A stack of coded items is ambiguous (SNOMED CT)

Numbness of right arm and left leg

Numbness (44077006)

Right (24028007)

Arm (40983000)

Left (7771000)

Leg (30021000)

Numbness of left arm and right leg

Numbness (44077006)

Left (7771000)

Arm (40983000)

Right (24028007)

Leg (30021000)

Slide6

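What if there is no model?

Figure: two sites record hematocrit differently.

Site #1 has two separate fields: "Hct, manual" (37 %) and "Hct, auto" (35 %).

Site #2 has a single "Hct" field (%) with a method choice: Manual, Auto, Estimated.

Slide7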

HL7 V2.X Messages

Site 1:

OBX|1|CE|4545-0^Hct, manual||37||%|

OBX|1|CE|4544-3^Hct, auto||35||%|

Site 2:

OBX|1|CE|20570-8^Hct||37||%|….|manual|

OBX|1|CE|20570-8^Hct||35||%|….|auto|

Slide8

Too many ways to say the same thing

A single name/code and value

Hct, manual is 37 %

Two names/codes and values

Hct is 37 %

Method is manual (spun)

Slide9

Model fragment in XML

Pre-coordinated representation

<observation>
  <cd> Hct, manual (LOINC 4545-0) </cd>
  <value> 37 % </value>
</observation>

Post-coordinated (compositional) representation

<observation>
  <cd> Hct (LOINC 20570-8) </cd>
  <qualifier>
    <cd> Method </cd>
    <value> Manual </value>
  </qualifier>
  <value> 37 % </value>
</observation>

Slide10

Isosemantic Models

HematocritManualModel
    HematocritManual (LOINC 4545-0)
    data: 37 %

HematocritModel
    Hematocrit (LOINC 20570-8)
    data: 37 %
    quals:
        HematocritMethodModel
            Hematocrit Method
            data: Manual
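Two models are isosemantic when one can be converted to the other without losing meaning. The sketch below converts the pre-coordinated hematocrit codes used in this deck (LOINC 4545-0 and 4544-3) into the post-coordinated form (LOINC 20570-8 plus a method qualifier). The dictionary layout and the function name `to_post_coordinated` are assumptions made for illustration only.

```python
# Pre-coordinated LOINC codes from the hematocrit examples in this deck,
# mapped to the post-coordinated code plus a method qualifier.
PRECOORDINATED_TO_POST = {
    "4545-0": ("20570-8", "manual"),  # Hct, manual
    "4544-3": ("20570-8", "auto"),    # Hct, auto
}


def to_post_coordinated(obs):
    """Return obs in post-coordinated form; other observations pass through."""
    code = obs["code"]
    if code in PRECOORDINATED_TO_POST:
        base_code, method = PRECOORDINATED_TO_POST[code]
        return {"code": base_code, "value": obs["value"], "unit": obs["unit"],
                "quals": {"method": method}}
    return {"code": code, "value": obs["value"], "unit": obs["unit"],
            "quals": dict(obs.get("quals", {}))}


# Site 1 sends pre-coordinated codes; site 2 sends a code plus a method.
site1 = [
    {"code": "4545-0", "value": 37, "unit": "%"},
    {"code": "4544-3", "value": 35, "unit": "%"},
]
site2 = [
    {"code": "20570-8", "value": 37, "unit": "%", "quals": {"method": "manual"}},
    {"code": "20570-8", "value": 35, "unit": "%", "quals": {"method": "auto"}},
]

normalized_site1 = [to_post_coordinated(o) for o in site1]
normalized_site2 = [to_post_coordinated(o) for o in site2]

# After normalization a single rule can test one code for both sites.
low = [o for o in normalized_site1 + normalized_site2
       if o["code"] == "20570-8" and o["value"] <= 35]
print(len(low))
```

Once both sites share the post-coordinated form, a rule such as "hematocrit <= 35" needs to know only one code rather than every pre-coordinated variant.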

Precoordinated Model

Post coordinated Model (Storage Model)

Slide11

Relational database implications

If the patient's hematocrit is <= 35 then ….

Patient Identifier  Date and Time  Observation Type  Observation Value  Units
123456789           7/4/2005       Hct, manual       37                 %
123456789           7/19/2005      Hct, auto         35                 %

Patient Identifier  Date and Time  Observation Type  Weight type  Observation Value  Units
123456789           7/4/2005       Hct               manual       37                 %
123456789           7/19/2005      Hct               auto         35                 %

Slide12

More complicated items:

Signs, symptoms

Diagnoses

Problem list

Family History

Use of negation: No Family Hx of Cancer

Description of a heart murmur

Description of breath sounds

“Rales in right and left upper lobes”

“Rales, rhonchi, and egophony in right lower lobe”

Slide13

What do we model?

All health care data, including:

Allergies

Problem lists

Laboratory results

Medication and diagnostic orders

Medication administration

Physical exam and clinical measurements

Signs, symptoms, diagnoses

Clinical documents

Procedures

Family history, medical history and review of symptoms

Slide14

How are the models used?

EMR: data entry screens, flow sheets, reports, ad hoc queries

Basis for application access to clinical data

Data normalization

Creation of maps from models in the local system to the standard model

Target for the output of structured data from NLP

Validation of data as it is stored in the database

Phenotype algorithms (decision logic)

Basis for referencing data in phenotype definitions

Does NOT dictate physical storage strategy

Slide15

Model Source Expression (CDL)

model BloodPressurePanel is panel
{
    key code(BloodPressurePanel_KEY_ECID);
    statement SystolicBloodPressureMeas systolicBloodPressureMeas optional
        systolicBloodPressureMeas.methodDevice.conduct(methodDevice)
        systolicBloodPressureMeas.bodyLocationPrecoord.conduct(bodyLocationPrecoord)
        systolicBloodPressureMeas.bodyPosition.conduct(bodyPosition)
        systolicBloodPressureMeas.relativeTemporalContext.conduct(relativeTemporalContext)
        systolicBloodPressureMeas.subject.conduct(subject)
        systolicBloodPressureMeas.observed.conduct(observed)
        systolicBloodPressureMeas.reportedReceived.conduct(reportedReceived)
        systolicBloodPressureMeas.verified.conduct(verified);
    statement DiastolicBloodPressureMeas diastolicBloodPressureMeas optional ….
    statement MeanArterialPressureMeas meanArterialPressureMeas optional ….
    qualifier MethodDevice methodDevice optional;
        md.code.domain(BloodPressureMeasurementDevice_DOMAIN_ECID);
    qualifier BodyLocationPrecoord bodyLocationPrecoord optional;
        blp.code.domain(BloodPressureBodyLocationPrecoord_DOMAIN_ECID);
    modifier Subject subject optional;
    attribution Observed observed optional;
    attribution ReportedReceived reportedReceived optional;
    attribution Verified verified optional;
}

Slide16

Compiler

Figure labels: CE Source File, CE Translator, In Memory Form.

Output formats shown: HTML, SMArt RDF?, openEHR Archetype?, HL7 RIM Static Models?, Java Class, XML Template (.xsd), OWL?, UML?

Slide17

Artifacts Used

CDL Model Definition

CEM XML Schema

HL7 Data Source

CEM XML Instance

Slide18

StandardLabObsQuantitative - CDL Definition

import StandardLabObs;
import ReferenceRangeNar;

model StandardLabObsQuantitative is statement extends StandardLabObs {
    key domain(StandardLabObsQuantitative_KEY_VALUESET_ECID);
    data PQ primaryPQValue unit.domain (UnitsOfMeasure_VALUESET_ECID)
    alternate {
        match CD secondaryCDValue code.domain(LabValue_VALUESET_ECID);
        match CD altCDValue code.domain(LabValue_VALUESET_ECID);
        otherwise ST altSTValue;
    };
    qualifier ReferenceRangeNar referenceRangeNar card(0..1);
    constraint primaryPQValue.isNullReasonCode.domain(LabNullFlavor_VALUESET_ECID);
    constraint abnormalInterpretation.CD.code.domain (AbnormalInterpretationNumericNom_VALUESET_ECID);
    constraint deltaFlag.CD.code.domain (DeltaFlagNumericNom_VALUESET_ECID);
}

Slide19

StandardLabObsQuantitative - Schema Snippet

<xs:complexType name="StandardLabObsQuantitative">
  <xs:sequence>
    <xs:element name="key" minOccurs="0" maxOccurs="1" type="CD"/>
    <xs:element name="primaryPQValue" type="PQ"/>
    <xs:element name="referenceRangeNar" minOccurs="0" maxOccurs="1" type="ReferenceRangeNar"/>
    <xs:element name="accessionNumber" minOccurs="0" maxOccurs="1" type="AccessionNumber"/>
    <xs:element name="fillerOrderNumber" minOccurs="0" maxOccurs="1" type="FillerOrderNumber"/>
    <xs:element name="placerOrderNumber" minOccurs="0" maxOccurs="1" type="PlacerOrderNumber"/>
    <xs:element name="resultStatus" minOccurs="0" maxOccurs="1" type="ResultStatus"/>
    <xs:element name="reportingPriority" minOccurs="0" maxOccurs="1" type="ReportingPriority"/>
    <xs:element name="abnormalInterpretation" minOccurs="0" maxOccurs="1" type="AbnormalInterpretation"/>
    <xs:element name="ordinalInterpretation" minOccurs="0" maxOccurs="1" type="OrdinalInterpretation"/>
    <xs:element name="deltaFlag" minOccurs="0" maxOccurs="1" type="DeltaFlag"/>
    <xs:element name="responsibleObserver" minOccurs="0" maxOccurs="unbounded" type="ResponsibleObserver"/>
    <xs:element name="performingLaboratory" minOccurs="0" maxOccurs="1" type="PerformingLaboratory"/>
    <xs:element name="comment" minOccurs="0" maxOccurs="unbounded" type="Comment"/>
    <xs:element name="subject" minOccurs="0" maxOccurs="1" type="Subject"/>
    <xs:element name="specimenCollected" minOccurs="0" maxOccurs="1" type="SpecimenCollected"/>
    <xs:element name="specimenReceivedByLab" minOccurs="0" maxOccurs="1" type="SpecimenReceivedByLab"/>
    <xs:element name="resulted" minOccurs="0" maxOccurs="1" type="Resulted"/>
    <xs:element name="patientId" minOccurs="0" maxOccurs="1" type="anonymous.2"/>
    <xs:element name="status" minOccurs="0" maxOccurs="1" type="anonymous"/>
    <xs:element name="instanceId" minOccurs="0" maxOccurs="1" type="anonymous.2"/>
    <xs:element name="typeId" minOccurs="0" maxOccurs="1" type="anonymous.3"/>
  </xs:sequence>
  <xs:attribute name="class" type="statement.type" default="statement"/>
  <xs:attribute name="type" type="ecid.type" default="b1ceaebb-dd15-4317-3f99-67ef3af81778"/>
</xs:complexType>

Slide20

HL7 Source Instance

MSH|^~\&|OADD|153|DADD|XNEPHA|20110208000109||ORU^R01|20110207000036|T|2.2||||

EVN|R01|201102080000|

PID||1234567|274382554|007261|WHYLING^KAYLIE^O'TEST||19460413|F||W|||(801)224-1528|(866)772-3150||||21443041|535194412|

PV1||O|XNEPHA^XNEPHA^^IM||||28826^Allyson^Josephine^ O'TEST |^||||||||||OP||||||||||||||||||||||||||201102070000||||||||

ORC|RE||F506556|||||||||28826^Allyson^Josephine^ O'TEST ||||^|

OBR||^|F506556^|HCT^HEMATOCRIT|R||201102071554|||70011^ROSEN,AUBRY^ O'TEST |||20110207161200|^|28826^Allyson^Josephine^ O'TEST ||||M2415648||||C|F|RFP^RFP|^^^^^R|^~^~^|||||||

OBX|1|NM|HCT^HEMATOCRIT|1.1|48|%|||R||F|||201102080000|IM^Performed at Inte|58528^ANDERSON^MARK|

Slide21

LabObsQuantitative - XML Instance Snippet

<labObsQuantitative type="b1ceaebb-dd15-4317-3f99-67ef3af81778">
  <key>
    <code>
      <value>20570-8</value>
    </code>
    <codeSystem>
      <value>LOINC</value>
    </codeSystem>
    <originalText>HCT</originalText>
  </key>
  <primaryPQValue>
    <operator>
      <value>equals</value>
    </operator>
    <unit>
      <value>%</value>
    </unit>
    <value>48</value>
  </primaryPQValue>
  <referenceRangeNar type="6f422ce6-7bc6-2cc2-8c96-58c137b5c9fc"> … </referenceRangeNar>
  <abnormalInterpretation type="9a3c3c60-18f7-5a91-c10c-c15532a96303"> … </abnormalInterpretation>
</labObsQuantitative>

Slide22

Issues

Different groups use models differently

NLP versus EMR

Structuring the models to meet more than one use

Options for different granularities of models

Hematocrit model, model of pneumonia

Quantitative lab result model, x-ray finding

Terminology integration – use of standards and terminology services

Models for “rare” kinds of data

Medication being taken by a friend, not recommended by the physician

Slide23

Questions?

Slide24

Data Normalization

Dr. Christopher Chute

Slide25

IHC-Medication, Mayo, IHC LAB to CEM

Figure: two pipelines, with these labels.

Medication pipeline: HL7 (Meds), HL7 Initializer, IHC-GCN TO-RXNORM Annotator, Drug CEM CAS Consumer

Lab pipeline: HL7 (Labs), HL7 Initializer, Generic-LAB-Annotator, LAB CEM CAS Consumer

Resources: Mayo LOINC resource, IHC LOINC resource, IHC RXNORM resource

Other labels: Mirth, SharpDb

Slide26

UIMA Normalization Pipeline

Convert HL7 V2.x Lab / Med Order Messages into CEM XML instances

Load SofA with HL7 message

Create Segment Objects in CAS

Normalize Segments in CAS

Transform Segments into CEM instances

Slide27

Mayo, IHC LAB to CEM

Figure labels: Mirth (shown twice), SharpDb, HL7 (XML), HL7 Initializer, LAB CEM CAS Consumer, Mayo LOINC resource, IHC LOINC resource.

One of the new pipelines created to normalize HL7 2.x Lab Messages into CEM instances.

We pre-processed the HL7 messages, converting from HL7 pipe syntax into HL7 XML format.
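The pre-processing step, turning pipe syntax into XML, can be illustrated with a deliberately simplified converter. This is not the official HL7 v2 XML encoding and not the Mirth transformation used in the project: the function name `segment_to_xml` and the generic `C.n` component elements are assumptions, and MSH (whose first fields are the separators themselves) is not handled.

```python
import xml.etree.ElementTree as ET


def segment_to_xml(segment: str) -> ET.Element:
    """Convert one pipe-delimited HL7 v2 segment into a simple XML element.

    Fields become <SEG.n> children; components (split on "^") become
    generic <C.n> children. Empty fields and components are skipped.
    """
    fields = segment.strip().split("|")
    name = fields[0]
    root = ET.Element(name)
    for f_index, raw in enumerate(fields[1:], start=1):
        if not raw:
            continue  # keep field numbering, but emit nothing for empty fields
        field_el = ET.SubElement(root, f"{name}.{f_index}")
        components = raw.split("^")
        if len(components) == 1:
            field_el.text = raw
        else:
            for c_index, comp in enumerate(components, start=1):
                if comp:
                    ET.SubElement(field_el, f"C.{c_index}").text = comp
    return root


# A shortened form of the hematocrit OBX segment shown earlier in the deck.
obx = "OBX|1|NM|HCT^HEMATOCRIT|1.1|48|%|"
root = segment_to_xml(obx)
print(ET.tostring(root, encoding="unicode"))
```

Working from XML rather than raw pipe syntax lets later pipeline stages address fields by name (for example the `OBX.5` value) instead of by counting delimiters.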

Figure labels: HL7 Pipe Delimited, Generic-LAB-Annotators (shown twice).

Slide28

Figure: an HL7 message is loaded into the CAS (SOFA=HL7-XML) and passes through four stages: Initialize, Parse, Normalize, Transform.

Labels inside the CAS: PID, PV1, OBX, 10109, 45373-3, CEM.
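The four stages can be sketched as plain functions passing a shared work area from one to the next. This is an illustration in ordinary Python, not the UIMA API: the dictionary standing in for the CAS, the function names, and the one-entry `LOCAL_TO_LOINC` table (taken from the HCT example in this deck) are all assumptions.

```python
from datetime import datetime

# Local code to LOINC; the single entry mirrors the HCT example in this deck.
LOCAL_TO_LOINC = {"HCT": "20570-8"}


def initialize(message):
    """Stage 1: load the raw message into a work area (stand-in for the CAS)."""
    return {"raw": message, "segments": [], "observations": [], "cem": []}


def parse(cas):
    """Stage 2: split the message into segments, and segments into fields."""
    cas["segments"] = [line.split("|")
                       for line in cas["raw"].splitlines() if line.strip()]
    return cas


def normalize(cas):
    """Stage 3: map local codes to LOINC and timestamps to ISO 8601."""
    for seg in cas["segments"]:
        if seg[0] != "OBX":
            continue
        local_code = seg[3].split("^")[0]
        observed = None
        if len(seg) > 14 and seg[14]:
            # Assumes the YYYYMMDDHHMM form used in the sample message.
            observed = datetime.strptime(seg[14], "%Y%m%d%H%M").isoformat()
        cas["observations"].append({
            "local_code": local_code,
            "loinc": LOCAL_TO_LOINC.get(local_code),  # None when unmapped
            "value": seg[5],
            "unit": seg[6],
            "observed": observed,
        })
    return cas


def transform(cas):
    """Stage 4: shape each observation like the CEM instance fields."""
    for obs in cas["observations"]:
        cas["cem"].append({
            "key": {"code": obs["loinc"], "codeSystem": "LOINC",
                    "originalText": obs["local_code"]},
            "primaryPQValue": {"value": obs["value"], "unit": obs["unit"]},
            "observed": obs["observed"],
        })
    return cas


message = (
    "EVN|R01|201102080000|\n"
    "OBX|1|NM|HCT^HEMATOCRIT|1.1|48|%|||R||F|||201102080000|"
    "IM^Performed at Inte|58528^ANDERSON^MARK|"
)
result = transform(normalize(parse(initialize(message))))
print(result["cem"][0])
```

Keeping each stage as a separate step is what allows new annotators to be added to the pipeline without touching the parsing or the final CEM transformation.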

UIMA Pipeline Flow

Mayo, IHC LAB to CEM

Slide29

Normalization Anatomy

Lab Annotators

HL7 Segment Parser

Date-Time To ISO Format

Syntactic Integrity

LOINC lookups

IHC codes to LOINC table

Mayo codes to LOINC table

LexGrid/CTS2 Terminology Services

Slide30

Architectural Opportunities

Figure labels: Mirth (shown twice), CAS To XML, Mayo.

Message formats shown: HL7 2.x, CDA, CEM format.

Two groupings are labeled: "Time, Syntax, Etc." and "Semantic".

Slide31

Tactical Next Step Enhancements

Single CEM for multiple OBX segments

Efficiently utilize terminology services

Incorporate a library for HL7 clean-up routines

Increase scope of vocabulary standardization

Enhancements for the Drug Annotator

Context enhancement issue

Drug name surprises

Slide32

Additional Vocabularies

Review sources used for normalization opportunities, e.g.

In HL7 OBR Segments

Standardize Service ID (Codes)

In HL7 OBX Segments

Standardize Units

Standardize Reference Ranges

Standardize Normal Flags

Slide33

Drug Name Disambiguation

Real patient data presented a unique case in drug names: “ToDAY” is a brand name for cephapirin sodium.

This presents an interesting named entity disambiguation use case.

Slide34

Where Persistence Fits In…

Figure: data flow with numbered steps 1 through 10 (including 6a) across these components: IHC (Backend CDR Systems), Mirth Connect, IHC NwHIN Aurion Gateway, SHARP NwHIN Aurion Gateway, Mirth Connect, UIMA Pipeline, CEM Instance Database, and Mayo EDT System.

Slide35

Persistence Channels

One Channel per model

Data stored as an XML Instance of the model

Fields extracted from XML to use as indices

XML Schema defined for each model

Stored using database transactions

CEM Model                    Mirth Channel
Administrative Diagnosis     CemAdminDxToDatabase
Standard Lab Panel           CemLabToDatabase
Ambulatory Medication Order  CemMedicationToDatabase

Slide36

General Channel Design

Input Message Directory

Channel

CEM XML Instance

Processed Message Directory

Error Message Directory

Persistence Store

Connector

Connector

Slide37

SharpDB, a CEM Instance Database

Slide38

Database Tables

Table                  Purpose
Demographics           Patient demographics (one row per patient)
PatientCrossReference  Associates internal Patient ID with Site Patient ID (one row per cross map)
SourceData             Information about the original source data (one row per instance message)
PatientData            CEM Instance XML with some source information (one row per instance message)
IndexData              Indices into the XML instance (multiple rows per instance message: AdminDx, one per message; Lab, one per observation; Medication, one per orderable item)

Slide39

Patient Demographics

Each message contains patient demographics

Demographics created on first received message based on site patient ID

Internal Patient ID is created and cross mapped to site patient ID

SharpDB is keyed off internally generated Patient ID

Slide40

Running in a Cloud…

Various images were installed:

NwHIN Gateway provided by Aurion

MIRTH Connect, our interface engine

UIMA Pipelines of various sorts

MySQL database for persistence

JBOSS / Drools rules engine

All open source, running in an Ubuntu Cloud!

Slide41

SHARP Hardware Infrastructure

Figure: cloud topology. An Admin client interface connects over VPN/LAN to manage the cloud, and a User connects over VPN/LAN to the instances. The Cloud Server carries the Cloud Controller, Walrus Controller, Cluster Controller and Storage Controller. Node Servers (labeled 1, 2, 3 and 11) each run a Node Controller and several VMs. Also shown: Persistence Storage, Image Storage, a Private Switch, and the Build/Backup Server.

Hardware              No. of Physical Machines  CPU  Memory  Disk           Disk Space     Networking  Functionality                                  No. of NICs
Cloud Server          1                         8    12 GB   10000 RPM SAS  1 TB           1 Gbps      Cloud, Walrus, Cluster and Storage Controller  4
Node Server           1                         8    32 GB   10000 RPM SAS  1 TB           1 Gbps      Node Controller                                4
Node Server           8                         24   128 GB  10000 RPM SAS  600 GB/600 GB  1 Gbps      Node Controller                                4
Node Server           1                         8    64 GB   7200 RPM SATA  1 TB/1 TB      1 Gbps      Node Controller                                4
Node Server           1                         8    32 GB   10000 RPM SAS  4 TB           1 Gbps      Node Controller                                4
Build/Backup Server   1                         2    8 GB    7200 RPM SATA  2 TB           1 Gbps      Build and Backup                               2
Storage               2                         -    -       10000 RPM SAS  7.5 TB         1 Gbps      Persistence and Image Storage                  -
Storage               2                         -    -       10000 RPM SAS  3.6 TB         1 Gbps      Volume Storage                                 -
Cisco 48 Port Switch  2                         -    -       -              -              1 GB        -                                              -

Slide42

Data Normalization Summary

Initial “tracer shot” at Data Normalization

Cloud based processing using open source tools

Proof of concept, UIMA for Data Normalization

Move on to new problems / solutions…

Opportunities exist:

Add new annotators (modules) to the pipelines

Widen usage and scope of vocabulary services

Switch to real live flows and add HOSS clean up routines.

Various tweaks in NLP algorithms