Slide1
Bringing Together the Social and Technical in Big Data Analytics: Why You Can't Predict the Flu from Twitter, and Here's How
David A. Broniatowski
Asst. Prof., EMSE
http://www.seas.gwu.edu/~broniatowski
Slide2
Public Health Cycle
Population
Doctors
Surveillance
Intervention
Slide3
Traditional mechanisms
Surveys
Clinical visits
Requires:
Data on the population
This has limited research
Slide4
Twitter
Short messages (140 chars) posted to the public internet
Content: news, conversation, pointless babble
Huge volume
500 million a day
Slide5
Why Twitter?
Huge volumes of data
A constant stream of small updates
Nothing like waiting in line to buy cigarettes behind a guy in a business suit buying gasoline with ten dollars in dimes
I eat pizza too much
I'm at Cvs Pharmacy (117th and kendall, Miami)
Slide6
Influenza Surveillance
Slide7
Influenza Surveillance
CDC has a nationwide surveillance network with 2700 outpatient centers reporting
ILI: influenza-like illness
Cons:
Slow (2 weeks)
Varying levels of geographic granularity
Slide8
Twitter Surveillance
Twitter influenza surveillance must:
1) Accurately track ground truth
Identify infection tweets
2) Be effective at both municipal and national levels
Expand tweet geolocation and evaluate municipal accuracy
3) Be predictive in real time
Deploy previously trained system on this flu season
Slide9
Slide10
Slide11
Pipeline Classifiers
Three steps using supervised machine learning + NLP
Step 1: Identify health tweets
Step 2: Identify flu-related tweets
Step 3: Awareness vs. infection
Slide12
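The three steps on the Pipeline Classifiers slide form a cascade: a tweet only reaches a later step if it passed the earlier one. A minimal sketch of that control flow follows. The keyword rules and every name in it (HEALTH_WORDS, is_health, classify, and so on) are hypothetical placeholders standing in for the trained supervised classifiers, which the slides do not describe.

```python
# Placeholder vocabularies standing in for trained classifiers.
# Assumption: the real models and their features are not given in the slides.
HEALTH_WORDS = {"sick", "flu", "fever", "cough", "doctor", "vaccine"}
FLU_WORDS = {"flu", "influenza"}
AWARENESS_WORDS = {"news", "vaccine", "shot", "outbreak", "season"}


def tokens(text: str) -> set:
    """Lowercase the text and split it into a set of bare words."""
    return {word.strip(".,!?#@\"'") for word in text.lower().split()}


def is_health(text: str) -> bool:
    """Step 1 stand-in: does the tweet mention health at all?"""
    return bool(tokens(text) & HEALTH_WORDS)


def is_flu(text: str) -> bool:
    """Step 2 stand-in: is the health tweet about flu?"""
    return bool(tokens(text) & FLU_WORDS)


def is_infection(text: str) -> bool:
    """Step 3 stand-in: infection report rather than general awareness?"""
    return not (tokens(text) & AWARENESS_WORDS)


def classify(text: str) -> str:
    """Run the three steps in order, stopping at the first one that fails."""
    if not is_health(text):
        return "not_health"
    if not is_flu(text):
        return "health_other"
    return "infection" if is_infection(text) else "awareness"


if __name__ == "__main__":
    for tweet in [
        "I eat pizza too much",
        "Home sick with the flu, fever all night",
        "Get your flu shot before flu season",
    ]:
        print(classify(tweet), "|", tweet)
```

Only tweets labeled "infection" would be counted toward a surveillance signal; swapping each placeholder rule for a trained model leaves the cascade unchanged.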
Twitter Surveillance
Twitter influenza surveillance must:
1) Accurately track ground truth
Identify infection tweets
2) Be effective at both municipal and national levels
Expand tweet geolocation and evaluate municipal accuracy
3) Be predictive in real time
Deploy previously trained system on this flu season
Slide13
Local Effectiveness
Current work focuses on US national flu rates
Useful surveillance needed by region/state/city
How can Twitter track local trends?
Is it accurate?
Is there enough data?
Only about 1% of Twitter is geocoded
Slide14
Slide15
Carmen
(Dredze et al., 2013)
Over 4000 known locations (countries, states, counties, cities)
Geocoordinates only: ~1%
Expanded locations: ~22%
Available in Python and Java
Slide16
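The jump from ~1% to ~22% coverage on the Carmen slide comes from using more than exact geocoordinates. One plausible way to expand coverage, sketched below, is to fall back on the free-text location in the user's profile and match it against a table of known locations. This is an illustration under that assumption, not Carmen's API: KNOWN_LOCATIONS and resolve are hypothetical names, and the two table entries are made up for the example. The tweet is assumed to be a dict shaped like a Twitter API tweet object, with a GeoJSON "coordinates" field and a "user" object holding "location".

```python
from typing import Optional, Tuple

# Hypothetical lookup table; a real resolver would hold thousands of entries.
KNOWN_LOCATIONS = {
    "baltimore, md": ("United States", "Maryland", "Baltimore"),
    "miami, fl": ("United States", "Florida", "Miami"),
}


def resolve(tweet: dict) -> Optional[Tuple[str, tuple]]:
    """Return (source, location) for a tweet, or None if nothing matches.

    Exact coordinates are preferred; the profile string is the fallback.
    """
    geo = tweet.get("coordinates")
    if geo and geo.get("coordinates"):
        # GeoJSON points are ordered longitude, latitude.
        lon, lat = geo["coordinates"]
        return ("coordinates", (lat, lon))

    profile = (tweet.get("user") or {}).get("location") or ""
    key = " ".join(profile.lower().split())  # normalize case and spacing
    if key in KNOWN_LOCATIONS:
        return ("profile", KNOWN_LOCATIONS[key])
    return None


if __name__ == "__main__":
    geotagged = {"coordinates": {"type": "Point", "coordinates": [-76.6, 39.3]}}
    profile_only = {"coordinates": None, "user": {"location": "Baltimore,  MD"}}
    print(resolve(geotagged))
    print(resolve(profile_only))
```

Returning the source alongside the location lets downstream analysis report how much of the coverage comes from each method.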
Twitter Surveillance
Twitter influenza surveillance must:
1) Accurately track ground truth
Identify infection tweets
2) Be effective at both municipal and national levels
Expand tweet geolocation and evaluate municipal accuracy
3) Be predictive in real time
Deploy previously trained system on this flu season
Slide17
Surveillance Results
Pearson Correlation    2009     2011
Keywords               0.97     0.646
Flu Classifier         0.97     0.519
Google Flu Trends      0.97     0.897
Infection              0.972    0.7832
Slide18
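Each figure in the Surveillance Results table is a Pearson correlation between a Twitter-derived (or Google-derived) series and a ground-truth series over the same periods. A minimal sketch of the calculation follows; the function name pearson and the sample numbers in the usage section are illustrative assumptions, not data from the study.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up values for illustration only.
    infection_tweet_rate = [0.8, 1.1, 1.9, 2.6, 2.2, 1.4]
    ili_rate = [1.0, 1.3, 2.1, 3.0, 2.4, 1.5]
    print(round(pearson(infection_tweet_rate, ili_rate), 3))
```

A value near 1 means the two series rise and fall together; it says nothing about whether their absolute levels agree.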
Google Flu Trends Gets it Wrong?
Lohr, S. (2014). Google flu trends: the limits of big data. New York Times.
Slide19
Pearson Correlation:
Keywords: 0.75
Infection: 0.93
Slide20
Slide21
ILI counts:
Infection: 0.88
Keywords: 0.72
Blind Evaluation
Slide22
2013-2014
0.95 Correlation
Slide23
Slide24
Most recent data
Broniatowski, D. A., Dredze, M., Paul, M. J., & Dugas, A. (2015). Using Social Media to Perform Local Influenza Surveillance in an Inner-City Hospital: A Retrospective Observational Study. JMIR Public Health and Surveillance, 1(1), e5.
Slide25
Predicting actual FLU in Baltimore
Broniatowski, D. A., Dredze, M., Paul, M. J., & Dugas, A. (2015). Using Social Media to Perform Local Influenza Surveillance in an Inner-City Hospital: A Retrospective Observational Study. JMIR Public Health and Surveillance, 1(1), e5.
Slide26
Healthtweets.org
Slide27
Healthtweets worldwide
Slide28
Some Other Projects
David A. Broniatowski
Asst. Prof., EMSE
http://www.seas.gwu.edu/~broniatowski
Slide29
Big Data for Group Decision Making:
Extracting Social Networks from FDA Advisory Panel Meeting Transcripts
(Broniatowski & Magee, 2013, American Journal of Therapeutics; Broniatowski & Magee, 2012, IEEE Signal Processing Magazine; Broniatowski & Magee, in preparation)
Slide30
“Germs are Germs” and “Why Not Take A Risk?”
Models and Data for Risky Decision Making in the ED
(Broniatowski, Klein, & Reyna, in press, Medical Decision Making; Broniatowski & Reyna, in preparation)
Slide31
Tree Hierarchy
Examples:
Phylogenetic trees
General Motors
Problem decomposition
Layered Hierarchy
Examples:
Levels of abstraction
Law firm organization
Problem abstraction
Grid Networks and Teams
Examples:
Contagion
Markets
Crowdsourcing
Families (teams)
How Do We Design Systems to Use Information Flow to Our Advantage?
We would like to deepen our intuition regarding system architectures
(Broniatowski & Moses, in preparation)
Slide32
Questions?
Big data
Influenza tracking and coupled contagion
Group decision-making
Individual decision-making
Formal models
Medical and engineering applications
Formal and mathematical models
Systems architecture
Design for flexibility
broniatowski@gwu.edu