Slide1
Data Mining, Integration and Analysis
Karin Becker
Slide2
Data Mining, Integration and Analysis
Research Areas
Knowledge Discovery
Web and Text Mining
Data Science
Recommendation Systems
Scalability and Performance
Reproducibility
Faculty
Ana Lucia Cetertich Bazzan
Joao Luiz Dihl Comba
Karin Becker
Leandro Krug Wives
Lucas Mello Schnorr
Mara Abel
Renata De Matos Galante
Viviane Pereira Moreira
Slide3
Knowledge Discovery
What do we do?
Slide4
Knowledge Discovery
Data Collection
Data Integration
Data Preprocessing
Data Mining
Data Analysis
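The five stages above can be read as a pipeline in which each stage consumes the output of the previous one. The minimal Python sketch below illustrates that reading only. The function names, the toy records, and the term-frequency "mining" step are hypothetical stand-ins chosen for illustration and are not taken from the slides.

```python
from collections import Counter

def collect():
    # Data Collection: two hypothetical sources of raw text records.
    source_a = [{"id": 1, "text": "Data mining is FUN"}, {"id": 2, "text": "  "}]
    source_b = [{"id": 3, "text": "Mining text data"}]
    return source_a, source_b

def integrate(*sources):
    # Data Integration: merge all sources into a single list of records.
    return [record for source in sources for record in source]

def preprocess(records):
    # Data Preprocessing: normalise case and whitespace, drop empty texts.
    cleaned = []
    for record in records:
        text = record["text"].strip().lower()
        if text:
            cleaned.append({"id": record["id"], "tokens": text.split()})
    return cleaned

def mine(records):
    # Data Mining: a stand-in "pattern", here simple term frequencies.
    counts = Counter()
    for record in records:
        counts.update(record["tokens"])
    return counts

def analyse(counts, top_n=2):
    # Data Analysis: report the most frequent terms.
    return counts.most_common(top_n)

def run_pipeline():
    # Chain the five stages in the order listed on the slide.
    return analyse(mine(preprocess(integrate(*collect()))))
```

Calling `run_pipeline()` on the toy data returns the two most frequent terms. In a real project each stage would be far more involved, but the hand-off between stages stays the same.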
Slide5
Karin Becker
Slide6
Extract Knowledge from Social Media
Semantic enrichment framework for event-related tweet identification (Simone Romero)
No assumptions about event properties
Contextual knowledge from semantic web and external documents
Improvement mainly in recall
Simone Romero, Karin Becker. A framework for event classification in tweets based on hybrid semantic enrichment. Expert Systems with Applications 118: 522-538 (2019)
Slide7
Extract Knowledge from Social Media
Identification of stance in tweets (Marcelo Dias)
No threads of argumentation
Unsupervised and weakly supervised* frameworks (runner-up)
Target and stance expression depends on the domain
Marcelo Dias, Karin Becker. An Heuristics-based, Weakly-Supervised Approach for Classification of Stance in Tweets. Proc. of Web Intelligence, 2016.
Slide8
Extract Knowledge from Social Media
Identification of stance in tweets
Unsupervised framework
Excellent performance on straightforward targets (Hillary, Clinton)
Marcelo Dias, Karin Becker. An Heuristics-based, Weakly-Supervised Approach for Classification of Stance in Tweets. Proc. of Web Intelligence, 2016.
Slide9
Extracting Knowledge from Social Media
Analyze the emotions people express about terrorism events on Twitter using demographics (Jonathas Harb)
Automatic emotion classification (4 terrorism events)
Tested deep learning with different seeding strategies
Demographic analysis (Face++, Profile Location)
Jonathas Harb, Karin Becker. Emotion Analysis of Reaction to Terrorism on Twitter. Proc. of Workshop on Big Social Data and Urban Computing, 2018.
Slide10
Analysis
Q2: Do different terrorism events raise the same emotional reaction?
NO
Gender? Age? Location?
Our hypothesis: it depends on how people relate to the event
Slide11
Extracting Knowledge from Social Media
Compare engagement of Twitter users in Pink October and Blue November campaigns (Roberto Walter)
5 different countries
Demographic analysis (Face++, Profile Location)
Tweet topic categorization
Roberto Walter, Karin Becker. Caracterização e Comparação das Campanhas do Outubro Rosa e Novembro Azul no Twitter. SBBD 2018: 133-144
Slide12
Extracting Knowledge from Social Media
Topic discovery and drift analysis
Slide13
Extracting Knowledge from Social Interaction
Relating conversational topics and toxic behavior effects in a MOBA game (Joaquim Mesquita)
MOBA Games (LoL)
Effects of toxic behavior on other players
Behavioral patterns based on on-line chats
Joaquim A. M. Neto, Karin Becker: Relating conversational topics and toxic behavior effects in a MOBA game. Entertainment Computing 26: 10-29 (2018)
Slide14
Extracting Knowledge from Social Interaction
Relating conversational topics and toxic behavior effects in a MOBA game (Joaquim Mesquita)
MOBA Games (LoL)
Effects of toxic behavior
Behavioral patterns based on on-line chats
Joaquim A. M. Neto, Karin Becker: Relating conversational topics and toxic behavior effects in a MOBA game. Entertainment Computing 26: 10-29 (2018)
Slide15
Extracting Knowledge from Medical Data
Machine translation for biomedical texts, parallel corpus (Felipe Soares)
Hierarchical classifier for non-invasive colorectal cancer screening
Plasma fluorescence data
Cancer, No findings, Further investigation
Felipe Soares, Karin Becker, Michel J. Anzanello: A hierarchical classifier based on human blood plasma fluorescence for non-invasive colorectal cancer screening. Artificial Intelligence in Medicine 82: 1-10 (2017)
Slide16
Extracting Knowledge from Medical Data
Relating mental states using social media (Vanessa Borba)
Characterization of mental states (verbal cues, emotions and sentiments, behavioral and social patterns)
Analysis of temporal evolution of mental states (e.g. anxiety - depression - suicide)
Detecting Anomalies in Health Provision Records (Cristiano Sulzbach)
Lack of parameters of "normality"
Discovery of groups of data
Analysis of closeness
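One way to picture anomaly detection without predefined "normality" parameters is to let each group of records define its own baseline and then measure how close every record is to it. The sketch below is an illustration under stated assumptions, not the method used in the work cited on the slide. It groups by a known field instead of discovering groups, and it uses the median absolute deviation (MAD) as a swapped-in closeness measure. The record layout, the provider and procedure names, and the threshold `k` are all hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical health provision records: (provider, procedure, billed amount).
RECORDS = [
    ("p1", "xray", 100.0), ("p2", "xray", 110.0), ("p3", "xray", 95.0),
    ("p4", "xray", 105.0), ("p5", "xray", 400.0),
    ("p1", "mri", 900.0), ("p2", "mri", 950.0), ("p3", "mri", 920.0),
]

def flag_far_from_group(records, k=5.0):
    # Group records by procedure so each group supplies its own baseline.
    groups = defaultdict(list)
    for provider, procedure, amount in records:
        groups[procedure].append((provider, amount))

    flagged = []
    for procedure, members in groups.items():
        amounts = [amount for _, amount in members]
        center = median(amounts)
        # MAD: the typical distance of a record from the group median.
        mad = median([abs(amount - center) for amount in amounts])
        if mad == 0:
            continue  # no spread in this group, nothing to compare against
        for provider, amount in members:
            # Closeness test: flag records more than k MADs from the median.
            if abs(amount - center) > k * mad:
                flagged.append((provider, procedure, amount))
    return flagged
```

On the toy data only the 400.0 x-ray record is flagged, because it lies far from the other x-ray amounts while every MRI amount stays close to its group median.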
Slide17
A final word on Software Engineering
Strong background in software engineering
Industry experience
Agile Methods
Sentiment analysis on software artifacts
Satisfaction of IT users (Sentiment analysis on IT Tickets, Blaz, 2016)
Analysis of assertiveness of user stories and development productivity and quality metrics (Guilherme Dias, 2018)
Using gamification in SCRUM for self-improvement (Camilla Schmidt, on-going)
Slide18
Renata Galante
galante@inf.ufrgs.br
Data Integration
Data Analysis
Slide19
Raul Barth (master)
Passenger density and flow analysis and city zones and bus stops classification for public bus service management
Slide20
Framework
DMBSM: Data Mining Framework for Bus Service Management
Input: GPS, bus stop and smart card data
Extracting passengers' density and flow information
Bus stop segmentation based on travel purposes
Finding the real bus service demand
Enabling decision-making
Based on Lambda Architecture, using Big Data for parallel processing
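As a rough illustration of how passenger density and flow could be derived from smart card data, the sketch below counts boardings per stop and hour and counts movements of the same card between different stops. It is not the DMBSM implementation. The tap record layout, the stop and card identifiers, and both function names are hypothetical, and it runs on a single machine with no Lambda Architecture or parallel processing.

```python
from collections import Counter
from datetime import datetime

# Hypothetical smart-card taps: (card_id, stop_id, ISO timestamp).
TAPS = [
    ("c1", "stop_A", "2018-03-05T07:15:00"),
    ("c2", "stop_A", "2018-03-05T07:40:00"),
    ("c3", "stop_B", "2018-03-05T07:55:00"),
    ("c1", "stop_B", "2018-03-05T18:05:00"),
]

def boardings_per_stop_hour(taps):
    # Density proxy: number of boardings at each stop in each hour of the day.
    density = Counter()
    for _card, stop, timestamp in taps:
        hour = datetime.fromisoformat(timestamp).hour
        density[(stop, hour)] += 1
    return density

def flows_between_stops(taps):
    # Flow proxy: consecutive taps of the same card at two different stops.
    last_stop = {}
    flows = Counter()
    # Sort by card and then by time so each card's taps are in order.
    for card, stop, _timestamp in sorted(taps, key=lambda t: (t[0], t[2])):
        previous = last_stop.get(card)
        if previous is not None and previous != stop:
            flows[(previous, stop)] += 1
        last_stop[card] = stop
    return flows
```

With the toy taps, `stop_A` has two boardings in the 07:00 hour and one card moves from `stop_A` to `stop_B`. Aggregates of this kind are the sort of input a later step could use to segment bus stops by travel purpose.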
Slide21
Framework - Architecture and Results
Slide22
Case Study - Results
Slide23
Ricardo
Slide24
Slide25
Slide26
Slide27
Slide28
Slide29
Slide30
Slide31
Slide32
Marcos
Slide33
Drunk Text Identification
Marcos Grzeça, Karin Becker, Renata Galante (UFRGS)
Slide34
Drunk Text Identification
Detection of texts written by intoxicated people
Marcos Grzeça, Karin Becker, Renata Galante (UFRGS)
Romero & Becker (2019)
Slide35
Drunk Text Identification
Slide36
Drunk Text Identification