Discovering De Facto Diagnosis Specialties

Uploaded On 2017-09-11

ACM BCB 2015. Xun Lu 1, Aston Zhang 1, Carl A. Gunter 1, Daniel Fabbri 2, David Liebovitz 3, Bradley Malin 2. 1 University of Illinois at Urbana-Champaign, 2 Vanderbilt University

Presentation Transcript

Slide1

Discovering De Facto Diagnosis Specialties

ACM BCB 2015

Xun Lu1*, Aston Zhang1*, Carl A. Gunter1, Daniel Fabbri2, David Liebovitz3, Bradley Malin2
1University of Illinois at Urbana-Champaign, 2Vanderbilt University, 3Northwestern University
Presented by Aston Zhang | Sep 11, 2015
*Equal contributors

Slide2

Medical specialties provide information about which providers have the skills needed to carry out key procedures or make critical judgments
However, organizing specialties into departments or wards has limitations
  Some specialties may be lacking or inaccurate (they are not always entered for new hire documents)
  Employees can change roles
  Encoded departments do not always align with specialties
As a result, there could be a gap between diagnosis histories of certain providers and their specialties
Medical specialties are useful, but could be inconsistent with diagnosis histories

Slide3

Providers select from Health Care Provider Taxonomy Code Set (HPTCS) when they apply for their National Provider Identifiers (NPI)
However, providers may not always choose their taxonomy codes based on the certifications they hold
  National Plan & Provider Enumeration System does not verify the selected taxonomy code
  Certain taxonomy codes do not correspond to any nationwide certifications that are approved by a professional board (e.g., Men and Masculinity)
  Some national certifications are not reflected by the taxonomy code list
National Provider Identifiers (NPI) are not always accurate

Slide4

As we have seen, there are limitations in purely relying on NPI taxonomy codes
Hence, we propose to leverage real-world diagnosis histories to infer and recognize actual specialties
De facto specialties are medical specialties that exist in practice regardless of the specialty codes (NPI taxonomy codes)
De facto diagnosis specialties are medical specialties that exist in practice and are highly predictable by the diagnoses inherent in the EHRs
We leverage diagnosis histories to infer and recognize actual specialties (de facto specialties)

Slide5

Urology is an example of a diagnosis specialty, as opposed to anesthesiology
It should be easier to characterize a urologist in terms of medical diagnoses for conditions, for example, of the kidney, ureter, and bladder
It should be harder to characterize an anesthesiologist, whose duties are more cross-cutting with respect to diagnoses, concerning essentially all conditions related to surgeries
De facto diagnosis specialties are highly predictable by the diagnoses inherent in EHRs

Slide6

There is no ground truth to determine the validity of a discovered de facto diagnosis specialty
A discovered de facto diagnosis specialty can be recognized by classifiers as accurately as the existing listed diagnosis specialties
We discover de facto diagnosis specialties that do not have corresponding codes in HPTCS

Slide7

The users (providers) can be likened to readers of documents, where there is an archive of documents in which the words in each document correspond to diagnoses
Users with specialties are groups of readers who have a common de facto diagnosis specialty and interest in the same group
To solve the de facto diagnosis specialty discovery problem we aim to develop a classifier that characterizes this common interest in terms of the documents that they have read
We can think of users as readers of documents whose words are diagnoses

Slide8

We use access log data from a hospital and combine it with the diagnosis lists in patient discharge records
Fine-grained data set
  A small portion of the data has an explicit mapping between users and diagnoses of the EHRs they accessed
General data set (more representative of the challenging scenarios encountered in practice)
  The entire data after removal of all the fine-grained mapping information
We study two data sets from Northwestern Memorial Hospital

Slide9

A few taxonomy codes account for the majority of specialists in the data sets

Slide10

The ICD-9 codes for diagnoses are mapped down to 603 Clinical Classifications Software (CCS) codes
NPI taxonomy codes with fewer than 20 user instances are filtered out
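
The two preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the lookup table `toy_mapping`, and every identifier in the toy data are hypothetical placeholders rather than real ICD-9, CCS, or NPI taxonomy values. Only the threshold of 20 users comes from the slide.

```python
from collections import Counter

MIN_USERS_PER_CODE = 20  # threshold stated on the slide


def map_to_ccs(icd9_codes, icd9_to_ccs):
    """Map ICD-9 codes to CCS categories, skipping codes with no mapping."""
    return [icd9_to_ccs[code] for code in icd9_codes if code in icd9_to_ccs]


def filter_rare_taxonomy_codes(user_to_code, min_users=MIN_USERS_PER_CODE):
    """Keep only users whose taxonomy code is shared by at least min_users users."""
    counts = Counter(user_to_code.values())
    return {u: c for u, c in user_to_code.items() if counts[c] >= min_users}


# Toy data: placeholder identifiers, not real ICD-9 / CCS / NPI values.
toy_mapping = {"icd_a1": "ccs_1", "icd_a2": "ccs_1", "icd_b1": "ccs_2"}
toy_users = {f"u{i}": "code_common" for i in range(25)}
toy_users.update({f"r{i}": "code_rare" for i in range(3)})

ccs = map_to_ccs(["icd_a1", "icd_b1", "icd_zz"], toy_mapping)
kept = filter_rare_taxonomy_codes(toy_users)
```

In this toy run the unmapped code is dropped, and the three users under the rare taxonomy code are filtered out while the 25 users under the common code are kept.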

Based on the guidance of clinicians and hospital administrators, we further identify 12 NPI taxonomy codes as diagnosis specialties (core NPI taxonomy codes)
  Obstetrics & Gynecology, Cardiovascular Disease, Neurology, Ophthalmology, Gastroenterology, Dermatology, Orthopaedic Surgery, Neonatal-Perinatal Medicine, Infectious Disease, Pulmonary Disease, Neurological Surgery, and Urology
We identify 12 NPI taxonomy codes as diagnosis specialties

Slide11

We invoke machine learning to discover potential de facto diagnosis specialties in the data set that lack corresponding codes in the HPTCS
  A semi-supervised learning model (PathSelClus) for the fine-grained data set
  An unsupervised learning model (LDA) for the larger general data set
We use supervised learning models to evaluate the recognition accuracy of the discovered specialty by comparing our approach with the existing listed diagnosis specialties (12 core NPI taxonomy codes)
  Ideally, their recognition accuracy should be similar
  Such recognition accuracy is evaluated by four classifiers: decision trees, random forests, PCA-KNN, and SVM
We solve the problem under a general discovery-evaluation framework

Slide12

A heterogeneous information network consists of multiple types of objects and/or multiple types of links. Link-based clustering in heterogeneous information networks groups objects based on their connections to other objects in the networks
Two meta-paths in our model:
  User (access) -> Patient (accessed by) -> User
  User (access) -> Patient (diagnosed with) -> Diagnosis (assigned to) -> Patient (accessed by) -> User
PathSelClus is used for discovery in the fine-grained data set

Slide13

PathSelClus provides semi-supervised learning on heterogeneous information networks
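
The two meta-paths named on the previous slide can be illustrated by counting path instances with matrix products. The sketch below is not PathSelClus itself; it only shows how each meta-path induces a different user-to-user similarity. The matrices `A` (user accessed patient) and `D` (patient has diagnosis), and all of their values, are hypothetical toy data.

```python
import numpy as np

# A[u, p] = 1 if user u accessed patient p (toy data, 3 users x 4 patients).
A = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
])

# D[p, d] = 1 if patient p was diagnosed with d (toy data, 4 patients x 2 diagnoses).
D = np.array([
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
])

# Meta-path User -> Patient -> User:
# entry [i, j] counts patients accessed by both user i and user j.
upu = A @ A.T

# Meta-path User -> Patient -> Diagnosis -> Patient -> User:
# UD[u, d] counts accessed patients of user u that carry diagnosis d,
# and UD @ UD.T counts the path instances linking two users via a diagnosis.
UD = A @ D
updpu = UD @ UD.T
```

In this toy network, users 1 and 2 share no patient (`upu[1, 2]` is 0) but are still connected through a common diagnosis (`updpu[1, 2]` is 1), which is the kind of signal the longer meta-path adds.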

Slide14

In practice, fine-grained data sets may not be available for PathSelClus. Hence, we also employ LDA, an unsupervised learning method based on topic modeling
In LDA, topics act as summaries of different themes pervasive in the corpus and documents are characterized with respect to these topics
We associate users with diagnoses via the patients they access
Latent Dirichlet Allocation (LDA) is used for discovery in the general data set

Slide15

After applying LDA, each user is assigned an allocation in the specialty topic simplex, i.e., a set of proportions over the specialty topics
A higher proportion for a specialty indicates that the user is more likely to access patients with diagnoses popular in that specialty
We cluster each user with the closest specialty, which is the specialty with the highest proportion in that user's allocation
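
A minimal sketch of this clustering step, assuming scikit-learn's `LatentDirichletAllocation` as the LDA implementation (the talk does not say which implementation was used). The count matrix, the number of topics `N_TOPICS`, and the random seed are hypothetical choices for illustration only.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Toy user-by-diagnosis count matrix (hypothetical data): entry [u, d] is how
# often user u accessed patients carrying diagnosis d. Each row plays the role
# of a "document" whose "words" are diagnoses.
counts = np.array([
    [8, 6, 0, 0],
    [7, 9, 1, 0],
    [9, 5, 0, 1],
    [0, 1, 7, 8],
    [1, 0, 9, 6],
    [0, 0, 6, 9],
])

N_TOPICS = 2  # assumed number of candidate specialties

lda = LatentDirichletAllocation(n_components=N_TOPICS, random_state=0)

# theta[u] is user u's allocation in the topic simplex (rows sum to 1).
theta = lda.fit_transform(counts)

# Assign each user to the topic with the highest proportion.
clusters = theta.argmax(axis=1)

# For each topic, list the indices of its two highest-weight diagnoses.
top_diagnoses = np.argsort(-lda.components_, axis=1)[:, :2]
```

On real data the topics would then be inspected through their most probable diagnoses to decide whether a topic corresponds to a recognizable specialty.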

LDA provides unsupervised clustering of users

Slide16

Four classifiers are used for evaluation

Features: we map each user to a term frequency-inverse document frequency (TF-IDF) weighted diagnosis vector
  TF: number of times that a user has accessed patients with a diagnosis
  IDF: inverse of the number of users that have accessed patients with a diagnosis
Classifiers:
  Decision trees
  Random forests
  PCA-KNN
  SVM

Slide17

The de facto diagnosis specialty Breast Cancer is discovered by PathSelClus
It is represented by the 10 diagnoses most accessed by all the users that are associated with the Breast Cancer specialty

Slide18

De facto diagnosis specialties Breast Cancer and Obesity are discovered by LDA
Each is represented by its 10 most probable diagnoses, as output by LDA

Slide19

The recognition accuracies of the discovered de facto diagnosis specialty and of the ones listed in HPTCS are similar
Evaluation of the Breast Cancer specialty discovered by PathSelClus
Evaluation of the Breast Cancer specialty discovered by LDA
Evaluation of the Obesity specialty discovered by LDA
P: Precision, R: Recall, F1: F1 Score
All values are percentages
Boldfaced results indicate significant improvement (5x2 cross-validation & paired t-test with p < 0.05)

Slide20

In conclusion, de facto diagnosis specialties can be discovered systematically
Medical specialties are useful, but are often inconsistent with actual diagnosis histories
Even National Provider Identifiers are not always accurate
Machine learning can be leveraged to discover and evaluate de facto diagnosis specialties, such as Breast Cancer and Obesity
  Semi-supervised and unsupervised learning are used for discovery
  Supervised learning is used for evaluation
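
As a rough illustration of the evaluation step, the sketch below builds TF-IDF weighted diagnosis vectors and classifies held-out users with PCA followed by k-nearest neighbors, one of the four classifiers named in the talk. Several things here are assumptions: the talk describes IDF only as an inverse user frequency, so the common log-scaled form `log(n_users / df)` is used; the function names, the toy counts, the labels, and the train/test split are all hypothetical; and the 5x2 cross-validation used in the talk is not reproduced.

```python
import numpy as np


def tfidf(counts):
    """TF-IDF weights for a user-by-diagnosis count matrix.

    TF is the raw access count; IDF is assumed to be log(n_users / df),
    where df is the number of users who accessed patients with the diagnosis.
    """
    counts = np.asarray(counts, dtype=float)
    n_users = counts.shape[0]
    df = np.count_nonzero(counts, axis=0)
    idf = np.log(n_users / np.maximum(df, 1))
    return counts * idf


def pca_project(train, test, n_components):
    """Fit PCA on the training rows (via SVD) and project both sets."""
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    components = vt[:n_components]
    return (train - mean) @ components.T, (test - mean) @ components.T


def knn_predict(train_x, train_y, test_x, k=1):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    preds = []
    for x in test_x:
        dist = np.linalg.norm(train_x - x, axis=1)
        nearest = np.argsort(dist)[:k]
        labels, votes = np.unique(train_y[nearest], return_counts=True)
        preds.append(labels[np.argmax(votes)])
    return np.array(preds)


# Toy data (hypothetical): 6 users x 4 diagnoses, two specialties labeled 0 and 1.
counts = np.array([
    [5, 3, 0, 0],
    [4, 4, 0, 0],
    [6, 2, 0, 0],
    [0, 0, 5, 3],
    [0, 0, 4, 4],
    [0, 0, 2, 6],
])
labels = np.array([0, 0, 0, 1, 1, 1])

features = tfidf(counts)
train_idx, test_idx = [0, 1, 3, 4], [2, 5]
train_x, test_x = pca_project(features[train_idx], features[test_idx], n_components=2)
preds = knn_predict(train_x, labels[train_idx], test_x, k=1)
```

A discovered specialty would be added as one more label, and its precision, recall, and F1 compared against those of the listed specialties.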