Mention-anomaly-based Event Detection and Tracking in Twitter
Adrien Guille, ERIC Lab, University [...]

Uploaded On 2014-11-16


[...]guille@univ-lyon2.fr
Cécile Favre, ERIC Lab, University of Lyon 2, cecile.favre@univ-lyon2.fr

Abstract: The ever-growing number of people using Twitter makes it a valuable source of timely information. However, detecting events in Twitter is a difficult task [...]



Presentation Transcript

[...] i.e. tweets. Users share, discuss and forward various kinds of information in real time, ranging from personal daily events to important, global event-related information. The ever-growing number of users tweeting around the world makes Twitter a valuable source of timely information. On the other hand, it gives rise to an information overload phenomenon, and it becomes increasingly difficult to identify relevant information related to interesting events. These facts raise the following question: how can we use Twitter for automated event detection and tracking? The answer to this question would help analyze which events, or types of events, most interest the crowd. This is critical to applications for journalistic analysis, playback of events, etc. Yet the list of "trends" determined by Twitter is of limited help, since it only lists keywords and provides neither the level of attention each receives from the crowd nor any temporal indication.

Twitter delivers a continuous stream of tweets, thus allowing the study of how topics grow and fade over time [1]. In particular, event detection methods focus on detecting "bursty" patterns, which are intuitively assumed to signal events [2] [...]

[...] Number of tweets in the corpus that contain the word $t$ and at least one mention
$N^i_{@t}$: Number of tweets that contain the word [...]

[...] distinct words used in tweets.

Computation of the magnitude of impact
The magnitude of impact, Mag, of an event associated with the time interval $I = [a; b]$ and the main word $t$ is given by the formula below. It corresponds to the algebraic area of the anomaly function on $[a; b]$.

$$\mathrm{Mag}(t, I) = \int_a^b \mathrm{anomaly}(t, i)\, di = \sum_{i=a}^{b} \mathrm{anomaly}(t, i)$$

The algebraic area is obtained by integrating the discrete anomaly function, which in this case boils down to a sum.

Identification of events
For each word $t \in V_@$, we identify the interval that maximizes the magnitude of impact by solving [...]

[...] (ii) a period of time and (iii) the magnitude of its impact over the tweeting behavior of the users, Mag [...]
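As a rough illustration of the magnitude of impact and of the interval search described above, the sketch below sums a precomputed list of anomaly values and scans for the interval with the largest sum. The transcript breaks off before stating how the maximization is solved, so the use of Kadane's maximum-sum scan here is an assumption, as are the function names `magnitude` and `best_interval` and the 0-based time-slice indices. The anomaly values themselves are taken as given input.

```python
from typing import List, Tuple


def magnitude(anomaly: List[float], a: int, b: int) -> float:
    """Mag(t, I) for I = [a; b]: sum of the discrete anomaly values, both ends included."""
    return sum(anomaly[a:b + 1])


def best_interval(anomaly: List[float]) -> Tuple[int, int, float]:
    """Return (a, b, Mag) for the interval with the largest magnitude of impact.

    Assumption: the maximization is done with Kadane's maximum-sum
    contiguous-subsequence scan, which runs in linear time.
    """
    if not anomaly:
        raise ValueError("anomaly series must not be empty")
    best_a = best_b = 0
    best_sum = anomaly[0]
    cur_a = 0
    cur_sum = anomaly[0]
    for i in range(1, len(anomaly)):
        if cur_sum < 0:
            # A negative running sum can only hurt: restart the interval here.
            cur_a = i
            cur_sum = anomaly[i]
        else:
            cur_sum += anomaly[i]
        if cur_sum > best_sum:
            best_a, best_b, best_sum = cur_a, i, cur_sum
    return best_a, best_b, best_sum


if __name__ == "__main__":
    # Hypothetical anomaly values for one word over six time-slices.
    series = [-1.0, 2.0, 3.0, -4.0, 5.0, -2.0]
    a, b, mag = best_interval(series)
    print(f"I = [{a}; {b}], Mag = {mag}")
```

Because the anomaly function is signed, a dip inside an interval lowers its magnitude, but the interval can still be worth extending if the values that follow more than compensate for it.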
[...] such that $\forall t'_q \in S,\ w_q > \theta$. The parameters $p$ and $\theta$ allow the users of MABED to adjust the level of information and detail they require.

E. Generating the List of the Top k Events

Each time an event has been processed by the second component, it is passed to the third component. It is responsible for storing the description of the events while managing duplicated events. For that, it uses two graph structures: the topic graph and the redundancy graph. The first is a directed, weighted, labeled graph that stores the descriptions of the detected events. The representation of an event $e$ in this graph is as follows. One node represents the main word $t$ and is labeled with the interval [...]

[...] in the redundancy graph. When the count of distinct events reaches $k$, the duplicated events are merged and the list of the top $k$ most impactful events is returned. We describe how duplicated events are identified and how they are merged together hereafter.

Detecting duplicated events
The event $e_1$ is considered to be a duplicate of the event $e_0$ already stored in the topic graph if (i) the main words $t_1$ and $t_0$ would be mutually connected and (ii) if the overlap coefficient between the periods of time $I_1$ and $I_0$ exceeds a fixed threshold. The overlap coefficient is defined as $\frac{|I_1 \cap I_0|}{\min(I_1, I_0)}$ and the threshold is noted $\sigma$, $\sigma \in\, ]0; 1]$. In this case, the description of $e_1$ is stored aside and a relation is added between $t_1$ and $t_0$ in the redundancy graph.

Merging duplicated events
Identifying which duplicated events should be merged together is equivalent to identifying the connected components in the redundancy graph. This is done in linear time using the algorithm described in [19]. In each connected component, there is exactly one node that corresponds to an event stored in the topic graph. The definition of this event is updated according to the extra information brought by duplicated events. The main word becomes the aggregation of the main words of all duplicated events. The words describing the updated event are the $p$ words among all the words describing the duplicated events with the $p$ highest weights.

TABLE II. Corpus statistics. @: proportion of tweets that contain mentions; RT: proportion of retweets.

Corpus   Tweets      Authors   @      RT
C_en     1,437,126   52,494    0.54   0.17
C_fr     2,086,136   150,209   0.68   0.43

IV. EVALUATION

[...]
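Returning to the duplicate handling of Section E, the overlap test and the grouping of duplicates can be sketched as follows. This is only an illustration under stated assumptions: intervals are taken to be inclusive pairs of time-slice indices, the `min` in the denominator is read as the minimum of the two interval lengths, the "mutually connected" condition is passed in as a precomputed boolean, and connected components are found with a plain depth-first search rather than the specific algorithm cited as [19]. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Interval = Tuple[int, int]  # inclusive time-slice indices (a, b); an assumption


def length(interval: Interval) -> int:
    """Number of time-slices covered by an inclusive interval."""
    a, b = interval
    return b - a + 1


def overlap_coefficient(i1: Interval, i0: Interval) -> float:
    """|I1 ∩ I0| divided by the length of the shorter interval."""
    lo = max(i1[0], i0[0])
    hi = min(i1[1], i0[1])
    intersection = max(0, hi - lo + 1)
    return intersection / min(length(i1), length(i0))


def is_duplicate(i1: Interval, i0: Interval,
                 mutually_connected: bool, sigma: float) -> bool:
    """Both conditions from the text: connected main words and overlap above sigma."""
    return mutually_connected and overlap_coefficient(i1, i0) > sigma


def connected_components(edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group main words linked in the redundancy graph (iterative DFS)."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in list(graph):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component: Set[str] = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components
```

Each returned component corresponds to one group of events to merge. Aggregating the main words and keeping the $p$ highest-weighted describing words would then be applied per component.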
[...] word lists. All timestamps are in GMT. Table II gives further details about each corpus.

Baselines for comparison
We consider two recent methods from the literature: ET (clustering-based) and TS (term-weighting-based). ET is based on the hierarchical clustering of bigrams using content and appearance pattern similarity [9]. TS is a normalized frequency metric for identifying n-grams that are related to events [4]. We apply it to both bigrams (TS2) and trigrams (TS3). We also consider a variant of MABED, noted $\alpha$-MABED, that ignores the presence of mentions in tweets. This means that the first component detects events and estimates their magnitude of impact based on the values of $N^i_t$ instead of [...]

[...] significant) ratings to each event. The annotators are French graduate students who are not involved in this project. An event is considered significant if it could be covered in traditional media. Overall, a detected event is significant if it has been rated 1 by both annotators. Considering that both corpora cover a 1-month time period and that annotating events is a time-consuming task for the annotators, we limit the evaluation to the 40 most impactful events detected by each method (i.e. $k = 40$) in each corpus. We measure precision as the fraction of detected events that both annotators have rated 1, and recall as the fraction of distinct significant events among all the detected events [8]. We also measure the DERate [8], which denotes the percentage of events that are duplicates among all significant events detected.

B. Quantitative Evaluation

Hereafter, we discuss the performance of the five considered methods, based on the ratings assigned by the annotators. The inter-annotator agreement measured with Cohen's kappa is $\kappa \simeq 0.76$, showing strong agreement. Table III reports the precision, the F-measure (defined as the harmonic mean of precision and recall), the DERate and the running time (averaged over three runs) of each method for both corpora. We notice that MABED achieves better performance than [...]

[...] ranging from [...]
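The evaluation measures defined above can be computed along the following lines. This is a minimal sketch, not the authors' evaluation script: it assumes binary ratings (1 for significant, 0 otherwise) from two annotators, and it assumes a hypothetical `topic_ids` list that tells which detected events refer to the same real-world event, so that duplicates can be counted. The DERate is returned as a fraction rather than a percentage, and Cohen's kappa uses the standard formula $(p_o - p_e)/(1 - p_e)$.

```python
from typing import Dict, Hashable, Sequence


def cohen_kappa(r1: Sequence[int], r2: Sequence[int]) -> float:
    """Cohen's kappa for two annotators rating the same items.

    Undefined (division by zero) when chance agreement p_e equals 1.
    """
    n = len(r1)
    observed = sum(x == y for x, y in zip(r1, r2)) / n
    labels = set(r1) | set(r2)
    expected = sum((list(r1).count(l) / n) * (list(r2).count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)


def evaluate(ratings_a: Sequence[int], ratings_b: Sequence[int],
             topic_ids: Sequence[Hashable]) -> Dict[str, float]:
    """Precision, recall, F-measure and DERate over the k detected events.

    topic_ids[i] is a hypothetical label for the real-world event behind
    detection i; two detections with the same label are duplicates.
    """
    k = len(ratings_a)
    # An event is significant only if both annotators rated it 1.
    significant = [i for i in range(k) if ratings_a[i] == 1 and ratings_b[i] == 1]
    distinct = {topic_ids[i] for i in significant}
    precision = len(significant) / k
    recall = len(distinct) / k
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    duplicates = len(significant) - len(distinct)
    de_rate = duplicates / len(significant) if significant else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "de_rate": de_rate}
```

With these definitions recall can never exceed precision, and the two coincide exactly when no significant event is detected twice, which is why the DERate is reported alongside them.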