NLP-Based Parsing of Reports


Uploaded On 2024-02-03


THE TUH EEG SEIZURE CORPUS
M. Golmohammadi (1), V. Shah (2), S. Lopez (2), S. Ziyabari (2), S. Yang (2), J. Camaratta (1), I. Obeid (2) and J. Picone (2)
(1) Biosignal Analytics, Inc.; (2) The Neural Engineering Data Consortium, Temple University




Presentation Transcript

1. THE TUH EEG SEIZURE CORPUS

M. Golmohammadi (1), V. Shah (2), S. Lopez (2), S. Ziyabari (2), S. Yang (2), J. Camaratta (1), I. Obeid (2) and J. Picone (2)
(1) Biosignal Analytics, Inc.
(2) The Neural Engineering Data Consortium, Temple University
Web: www.nedcdata.org, http://www.biosignalanalytics.com

Abstract

Introduction: Automatic seizure detection can reduce the time to diagnosis and enhance real-time applications such as ICU monitoring. A major goal of this study was to generate a large annotated corpus of seizure events to support the development of machine learning technology.

Methods: Using the TUH EEG Corpus, we implemented a semi-automated strategy:
- EEG reports were parsed using natural language processing techniques to locate sessions most likely to contain seizures.
- Two seizure detection tools (Persyst and AutoEEG) were used to identify sessions likely to contain seizures.
- Sessions for which both tools agreed with high confidence were manually annotated by a group of experts based on ACNS guidelines.
- The data was partitioned by patient into an evaluation and training set.

Results: The current dataset includes 50 patients for evaluation and 219 patients for training.

Conclusion: The existence of the TUH EEG Seizure Corpus provides a sufficient amount of data for machine learning research.

The TUH EEG Data Corpus

Existing corpora are not large enough to train complex deep learning models:
- The CHB-MIT dataset contains EEG recordings from only 22 pediatric subjects.
- IEEG uses only intracranial EEGs and contains EEG recordings from animals and humans.

Our publicly available corpus consists of 30,000+ clinical EEG recordings from 16,000+ patients (see www.isip.piconepress.com/projects/tuh_eeg).

Corpus development involved pairing, de-identification and annotation of EEG data.

Visualization Tools For Annotation

- An annotation tool was developed to increase productivity, accuracy and consistency.
- Waveform, spectrogram and energy displays are supported in user-customizable displays.
- Alternate methods for visualizing signals allow more accurate identification of seizure start times.
- Standard filters commonly found in commercial EEG tools (e.g., notch filters) are supported.
- Users can scroll forward by time or by selected events.
- Per-channel and per-epoch labels are supported.
- Integrated annotation and cohort retrieval tools are being developed in related research projects.
- A modular object-oriented Python programming environment is used that makes it easy to add views and customize displays.

Data Extraction and Annotation Process

- The lack of big data resources that can be used to train sophisticated statistical models compounds a major problem in automatic seizure detection.
- Manual annotation of a large amount of data by a team of certified neurologists is extremely expensive and time consuming.
- We have developed a team of students trained by an expert to expedite data selection and preliminary annotation.
- Two commercially available automatic seizure detection tools were used to find sessions that most likely have seizures.
- These sessions were annotated by a group of trained students using medical reports and advanced visualization tools.
- The inter-rater agreement was found to be 0.83 using the kappa statistic.

NLP-Based Parsing of Reports

EEG reports were parsed using a natural language processing (NLP) method based on NegEx to locate the sessions most likely to contain seizures.

Algorithm:
(1) Pre-process reports to show one sentence per line.
(2) Remove all punctuation.
(3) Index medical conditions.
(4) Index different types of negation.

Two types of negation were selected: [POST] and [PREN]. Labels were also assigned for the word "seizure" ([PRES]) and for affirmative expressions ([AFFR]).

Examples:
"Seizures were not observed during the recording" [PREN]
"Two seizures were observed as the patient …" [AFFR]

Approximately 25% of the sessions identified contained seizures.

[Figure labels: EEG Reports, EEG Signals, NegEx]

Seizure Annotation

Seizure event annotations include:
- start and stop times;
- localization of a seizure (e.g., focal, generalized);
- type of seizure (e.g., simple partial);
- nature of the seizure (e.g., convulsive).

Non-seizure event annotations include:
- artifacts which could be confused with seizure-like events, such as ventilatory artifacts;
- non-epileptiform activity that may resemble epileptiform discharges, such as psychomotor variant, mu, breach rhythms and POSTS;
- abnormal background which could be confused with seizure-like events (e.g., triphasics);
- interictal and postictal states.

There are multiple sessions for each patient record; each expert reviewed the entire record. Our goal is to reach a consensus amongst all annotators, so several iterations are being conducted to reconcile differences.

The TUH EEG Seizure Corpus

- Each seizure event was manually annotated by at least three neurologists as well as three members of our annotation team.
- Types of seizures included in the corpus: Tonic, Tonic-Clonic, Simple-partial, Complex-partial, Myoclonic, and Absence.
- The corpus includes many EEGs that are difficult to interpret, which is crucial for training robust machine learning technology.
- For example, the accompanying screenshot is from a patient with Lennox-Gastaut syndrome. It is very hard to identify a seizure event, particularly the onset of the seizure, when there are generalized spikes or sharp waves.

[Figure: EEG screenshot, Lennox-Gastaut Syndrome]

Corpus Statistics

                     Train    Eval
Patients               219      50
Sessions               590     262
Files                 2270    1190
Seizure (hrs.)          50      12
Non-Seizure (hrs.)     300     140
Total (hrs.)           350     152

Outcomes

The TUH EEG Seizure Detection Corpus is an ongoing effort that includes:
- identifying and annotating the remaining sessions with seizures in the TUH EEG Corpus;
- manually reviewing each annotation by a panel of at least three expert neurologists;
- collecting marked data from other institutions (e.g., NYU, Duke and Emory).

The TUH EEG Seizure Corpus is the world's largest open-source clinical EEG seizure corpus, including more than 500 hrs. of EEGs.

A hybrid machine learning system was developed on this data using a combination of hidden Markov models (HMMs) for sequential decoding and deep learning for postprocessing. A deep learning system is also under development.

Acknowledgements

Research reported in this poster was supported by the National Human Genome Research Institute of the NIH under award number 3U01HG008468-02S1 and by the NSF under Grant No. IIP-1622765. The TUH EEG Corpus development was sponsored by DARPA, Temple University's College of Engineering and Office of Research.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding organizations.
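
The four report-parsing steps named on the poster (one sentence per line, punctuation removal, indexing the condition, indexing negation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The trigger phrases, the function names and the rule that a trigger before the term yields [PREN] while a trigger after it yields [POST] are all assumptions borrowed from the general NegEx convention, so the tag given to a particular sentence may differ from the poster's own examples.

```python
import re
import string

# Hypothetical trigger phrases; the poster does not list the ones actually used.
PRE_NEGATION = ("no evidence of", "no", "without", "denies")
POST_NEGATION = ("were not observed", "was not observed", "not seen", "ruled out")
TERM = re.compile(r"\bseizures?\b")  # the indexed medical condition


def split_sentences(report):
    """Step 1: one sentence per entry (naive split after ., ! or ?)."""
    parts = re.split(r"(?<=[.!?])\s+", report.strip())
    return [p for p in parts if p]


def strip_punctuation(sentence):
    """Step 2: remove all punctuation and lowercase the text."""
    table = str.maketrans("", "", string.punctuation)
    return sentence.translate(table).lower()


def has_trigger(text, triggers):
    """True if any trigger phrase occurs as whole words in the text."""
    return any(re.search(rf"\b{re.escape(t)}\b", text) for t in triggers)


def tag_sentence(sentence):
    """Steps 3-4: tag one sentence; returns [] when no seizure term is present."""
    text = strip_punctuation(sentence)
    match = TERM.search(text)  # only the first mention is examined
    if match is None:
        return []
    before, after = text[:match.start()], text[match.end():]
    tags = ["PRES"]
    if has_trigger(before, PRE_NEGATION):
        tags.append("PREN")
    if has_trigger(after, POST_NEGATION):
        tags.append("POST")
    if len(tags) == 1:  # term present and no negation found
        tags.append("AFFR")
    return tags


def report_mentions_seizure(report):
    """Flag a report if any sentence carries an affirmed seizure mention."""
    return any("AFFR" in tag_sentence(s) for s in split_sentences(report))
```

A report flagged by `report_mentions_seizure` would be a candidate session for manual review. The sketch searches the whole text on either side of the term; NegEx itself limits the scope of a trigger, which matters for long sentences with several clauses.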
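
The poster reports an inter-rater agreement of 0.83 "using the kappa statistic" without saying which variant was used or at what granularity labels were compared. As an illustration only, the sketch below computes Cohen's kappa for two raters over a sequence of per-epoch labels. The two-rater form, the function name and the label strings are assumptions; agreement among more than two annotators would normally call for a multi-rater statistic instead.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical per-epoch labels from two annotators.
rater_1 = ["seiz", "seiz", "bckg", "bckg"]
rater_2 = ["seiz", "bckg", "bckg", "bckg"]
kappa = cohen_kappa(rater_1, rater_2)
```

Here the raters agree on three of four epochs (observed agreement 0.75) while chance agreement is 0.5, which gives a kappa of 0.5.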