Identifying Breakpoints in Public Opinion


Cuneyt Gurcan Akcora, Murat Ali Bayir, Murat Demirbas
Computer Science & Eng. Department, University at Buffalo, SUNY, 14260, Buffalo, NY, USA
{cgakcora,mbayir,demirbas}@cse.buffalo.edu

Hakan Ferhatosmanoglu
Computer Science & Eng. Department, The Ohio State University, Columbus, OH 43210, USA
hakan@cse.ohio-state.edu

ABSTRACT
While polls are traditionally used for observing public opinion, they provide a point snapshot, not a continuum. We consider the problem of identifying breakpoints in public opinion, and propose using micro-blogging sites to capture trends in public opinion. We develop methods to detect changes in public opinion, and to find the events that cause these changes. Our experiments show that the proposed methods are able to determine changes in public opinion and effectively extract the major news about the events. We also deploy an application where users can view the important news stories for a continuing event and find related articles on the web.

Categories and Subject Descriptors
H.2.8 [Information Systems]: Database Management - Database Applications, Data Mining; H.3 [Information Systems]: Information Storage and Retrieval

General Terms
Opinion Mining, Emotion Corpus, Microblogging, Sentiment Analysis.

1. INTRODUCTION
Since 1824¹, polls have been used to take a snapshot of public opinion, but they cannot reach many people, nor can they capture opinions about topics that are not asked in the questionnaire. Moreover, while events unfold rapidly and public opinion changes with those events, polls cannot account for the temporal changes in public opinion. With the advance of micro-blogging sites like Twitter [7, 10], we are now able to observe individual opinions and keep up with the changes in public opinion. When carefully aggregated and classified, individual opinions can give us a better understanding of how events are received by the public.
¹ Conducted in the contest for the United States presidency.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
1st Workshop on Social Media Analytics (SOMA '10), July 25, 2010, Washington, DC, USA.
Copyright 2010 ACM 978-1-4503-0217-3 ...$10.00.

In this paper, we propose efficient methods to identify and classify opinions in a large stream of information, and to pinpoint the related events that stimulate users to express their opinions. In particular, the contributions of this paper are as follows:

• We develop and utilize an emotion corpus to detect emotions in tweets. This method expands opinion representation from binary options ("positive or negative") to multiple dimensions, providing more granularity in classification.
• We propose combining set and vector space models to observe public opinion and detect changes over time. In our experiments, using these two methods together reduces false positives and improves the accuracy of our findings.
• We develop a dynamic scoring function that gives a synopsis of the news (in terms of prominent words) that led to breakpoints in public opinion.
• We create a customized event tracking application that can notify users without flooding them with every new entry about the event. We show that our application is more user friendly than the Google Alert² service.

2. RELATED WORK
Opinion mining has received great attention recently, and researchers have started to investigate people's opinions about certain topics or news [6]. Existing opinion mining methods are usually grouped under two categories [8, 11], called document-based and attribute-based approaches. These approaches focus on characterizing user opinions as positive or negative over domain-specific websites [4, 13] for different applications. As a document-level approach, Turney [14] proposed determining the polarity of documents by using the semantic orientation of extracted phrases. As an example of attribute-based approaches, Zhuang et al. [15] proposed a method for grouping movie reviews based on frequent opinion terms. Unlike these approaches, which classify opinions as positive or negative, we propose a finer-granularity classification of opinions into 8 emotion classes.

To account for temporal changes in public opinion, a work related to our approach is that of Ku et al. [9], where the authors used the language characteristics of Chinese. In the temporal dimension, their method captures opinions and shows changes in overall sentiment about the candidates in a presidential election.

² http://www.google.com/alerts

3. METHODOLOGY
We begin our discussion of the methodology by first explaining what indicates a change in public opinion in streaming tweets. For this purpose, we note two observations on Twitter data.

Observation 1: If an event results in a change of public opinion, more tweets contain emotion words. Furthermore, the emotion pattern of tweets in that time period is different from the emotion pattern of the preceding period, but more similar to the emotion pattern of tweets in the following period, i.e., the news has an enduring impression on the public.
Example Tweet: (Transgression claims admitted by Woods.) Tiger Woods - What a disappointment.

Observation 2: If an important story about the event appears, the word pattern of tweets is different from that of the last period. On the other hand, the same word pattern repeats in the next period, i.e., tweets contain similar words in the next period, as the same topic is still being discussed.
Example Tweet: (Companies start ending sponsorship agreements.) Accenture Dumps Tiger Woods From Corporate Homepage.

Following these observations, we conclude that, to claim a change in public opinion, both the emotion pattern and the word pattern must change as described above. We are looking for news stories that are both major events and opinion changers. In Section 3.1 we discuss how we find emotion and word patterns and use the above observations to detect opinion changes. We continue with
finding topics related to the events in Section 3.2.

3.1 Opinion Detection
For the emotion pattern we use an emotion corpus based method, and for the word pattern we use the set space model.

Emotion Corpus Based Method. This method is based on the vector space model for calculating document similarity. For emotion detection in tweets, we use an emotion corpus based on 8 basic classes, E = {Anger, Sadness, Love, Fear, Disgust, Shame, Joy, Surprise}, from [12]. We built a 309-word emotion corpus to populate those 8 classes. Each class represents a dimension in the Boolean emotion vector of a tweet. We look for emotion words in a tweet, and if one is found, we set the corresponding class dimension in the emotion vector to 1; otherwise it remains 0.

Tweet: I was on main street in Norfolk when I heard about tiger woods updates and it made me feel angry, on 2009-12-11.
Emotion vector: (1, 0, 0, 0, 0, 0, 0, 0).

For all the tweets in a chosen time interval, the centroid of their emotion vectors is calculated, and this centroid is considered the document for that interval. For a given time interval $T$ that contains $N$ tweets, let $V = \{v_1, v_2, \ldots, v_N\}$ be the set of vectors (with $l = 8$ dimensions each) generated from these tweets, and let $v_{k,j}$ denote the $j$-th dimension of $v_k$. We define the centroid $\bar{v}_T$ for period $T$ as:

$$\bar{v}_T = \left( \frac{1}{N}\sum_{k=1}^{N} v_{k,1},\ \frac{1}{N}\sum_{k=1}^{N} v_{k,2},\ \ldots,\ \frac{1}{N}\sum_{k=1}^{N} v_{k,l} \right) \qquad (1)$$

After finding the centroid vector for each interval, we define the opinion similarity between two intervals $T_1$ and $T_2$ as the cosine similarity between their centroid vectors:

$$\mathrm{Sim}(T_1, T_2) = \frac{\bar{v}_{T_1} \cdot \bar{v}_{T_2}}{\lVert \bar{v}_{T_1} \rVert \, \lVert \bar{v}_{T_2} \rVert} \qquad (2)$$

Set Space Model. This model represents each interval by a single document, which is the union of the tweets posted in that particular time interval. After removing stop words and stemming the terms using the Porter stemmer³, we collect all terms of each interval $T$ in a hash set, $\mathrm{Set}_T$. We define the similarity between two intervals $T_1$ and $T_2$ as their Jaccard similarity [2]:

$$\mathrm{Sim}(T_1, T_2) = \frac{|\mathrm{Set}_{T_1} \cap \mathrm{Set}_{T_2}|}{|\mathrm{Set}_{T_1} \cup \mathrm{Set}_{T_2}|} \qquad (3)$$

To find the changes, neither the corpus based method nor the set space model alone is suitable. For the corpus based method, a change in the centroid can be misleading when the interval has very few emotion words compared to its neighbors. For the set space model, a change in similarity does not by itself imply an opinion change, because not all of the words are emotion words. In our method, we
first analyze the vector space similarity. If we detect a possible change, we validate it by analyzing the Jaccard similarity. Following Observations 1 and 2, if both methods detect the change, we report that point as a breakpoint. $T_n$ is a time break if the following conditions hold for both the corpus based method and the set space model, i.e., the similarity across the transition from $T_{n-1}$ to $T_n$ is a local minimum:

$$\mathrm{Sim}(T_{n-1}, T_n) \le \mathrm{Sim}(T_{n-2}, T_{n-1}) \qquad (4)$$

$$\mathrm{Sim}(T_{n-1}, T_n) \le \mathrm{Sim}(T_n, T_{n+1}) \qquad (5)$$

3.2 Breakpoint Representation
After detecting the changes, we set out to identify the events that caused them. To this end, we look for the prominent words of an interval to represent the breakpoint. For prominent word selection, we propose a TfIdf-based dynamic scoring function. The algorithm should effectively find recently emerging keywords to guide users toward breaking news, and pay special attention to words that emerge in one period and start appearing in more periods as time progresses.

The Streaming TfIdf Algorithm. To identify the events that caused breakpoints, we need to find keywords that represent the topics of these events. We propose the Streaming TfIdf algorithm for extracting event-related keywords from an information stream of tweets.

Document Phase. For breakpoint representation, the same time interval length as in opinion detection is used, and for every time interval $T_n$, a document $D_n$ contains the stemmed words of all tweets in that period. For a word $x$ in document $D_n$, the term frequency $\mathrm{Tf}_{x,D_n}$ is calculated as:

$$\mathrm{Tf}_{x,D_n} \leftarrow \frac{\mathrm{Count}_{x,D_n}}{\sum_{k \in D_n} \mathrm{Count}_{k,D_n}} \qquad (6)$$

³ http://tartarus.org/martin/PorterStemmer/

Letting $n$ be the total number of documents up to and including $D_n$, the inverse document frequency of a word $x$ in document $D_n$ is calculated as:

$$\mathrm{Idf}_{x,D_n} = \log\left( \frac{n}{|\{k : k \le n,\ x \in D_k\}|} \right) \qquad (7)$$

Note that $n$ is not a fixed value: as we move from the oldest document to the newest, the total number of documents, $n$, increases. With this parameter, the first appearance of a keyword will always have a bigger Idf value, and the following appearances of the word will have smaller values. Based on the calculated $\mathrm{Idf}_{x,D_n}$ and $\mathrm{Tf}_{x,D_n}$, we calculate the TfIdf value as:

$$\mathrm{TfIdf}_{x,D_n} \leftarrow \mathrm{Tf}_{x,D_n} \times \mathrm{Idf}_{x,D_n} \qquad (8)$$

Prominence Update Phase. For a keyword $x$ that recently appeared in $D_n$, we define $\mathrm{Tf}_{x,D_o}$ for the word $x$ in an older document $D_o$, where $o < n$, as:

$$\mathrm{Tf}_{x,D_o} \leftarrow \mathrm{Tf}_{x,D_o} + F(T_o, T_n)\,\mathrm{Tf}_{x,D_n} \qquad (9)$$

Here, we apply a decay function $F(T_o, T_n)$ to prevent the word $x$ in document $D_n$ from increasing the Tf value of $x$ in a document $D_o$ that is too old. This follows from the fact that tweets are highly temporal, i.e., new events tend to affect user tweets only for a short period of time. As we move forward in the time domain, a keyword in a new period should not increase the prominence of that keyword in a period far in the past, because it is highly unlikely that its appearance is due to a very old event. For the period numbers $o$ and $n$, we define the decay function for two periods $T_o$ and $T_n$ as:

$$F(T_o, T_n) = \frac{1}{n - o} \qquad (10)$$

With the updated Tf value of the keyword $x$ in document $D_o$, we recalculate $\mathrm{TfIdf}_{x,D_o}$ as:

$$\mathrm{TfIdf}_{x,D_o} \leftarrow \mathrm{Tf}_{x,D_o} \times \mathrm{Idf}_{x,D_o} \qquad (11)$$

We choose the $p$ words with the highest TfIdf values from each document, and call them the prominent words of that document.

4. EXPERIMENTAL RESULTS
In this section, we present experimental results of our methods on Twitter. We analyzed data about two topics: (1) the Fort Hood shootings in Texas, USA, on November 05, 2009, and (2) Tiger Woods' car accident on November 27, 2009. Due to space limitations, we only present the Tiger Woods news story here. We used a Twitter search engine, Twopular⁴, to collect data. We processed 258548 tweets, and found 23280 emotion words in those tweets. Figure 1 shows the tweet count of each day.
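Before turning to the results, the detection rule of Section 3.1, Eqs. (1)-(5), can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the eight-word LEXICON is a hypothetical stand-in for the 309-word emotion corpus, tokenization is a plain lowercase split without stop-word removal or Porter stemming, and an interval with no emotion words is assigned a cosine similarity of 0. It computes the emotion and Jaccard similarities of successive intervals and reports $T_n$ only when both series satisfy Eqs. (4) and (5).

```python
import math
from typing import Dict, List, Sequence, Set

EMOTION_CLASSES = ["Anger", "Sadness", "Love", "Fear",
                   "Disgust", "Shame", "Joy", "Surprise"]

# Hypothetical stand-in for the paper's 309-word corpus: word -> class index.
LEXICON: Dict[str, int] = {
    "angry": 0, "sad": 1, "love": 2, "afraid": 3,
    "disgusting": 4, "ashamed": 5, "happy": 6, "shocked": 7,
}


def tokenize(tweet: str) -> List[str]:
    # Simplification: lowercase, split on whitespace, trim punctuation.
    # The paper additionally removes stop words and applies the Porter stemmer.
    words = (w.strip(".,!?:;\"'()") for w in tweet.lower().split())
    return [w for w in words if w]


def emotion_vector(tweet: str) -> List[int]:
    # Boolean vector: dimension j is 1 if the tweet contains a class-j word.
    vec = [0] * len(EMOTION_CLASSES)
    for word in tokenize(tweet):
        if word in LEXICON:
            vec[LEXICON[word]] = 1
    return vec


def centroid(tweets: Sequence[str]) -> List[float]:
    # Eq. (1): per-dimension mean of the interval's emotion vectors.
    vectors = [emotion_vector(t) for t in tweets]
    if not vectors:  # empty interval (assumption: treat as all zeros)
        return [0.0] * len(EMOTION_CLASSES)
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(EMOTION_CLASSES))]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Eq. (2); 0.0 when either centroid is all zeros (assumption).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Eq. (3) over the term sets of two intervals.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dips(sims: Sequence[float]) -> Set[int]:
    # sims[i] = Sim(T_i, T_{i+1}). T_n qualifies when
    # sims[n-1] <= sims[n-2] (Eq. 4) and sims[n-1] <= sims[n] (Eq. 5).
    return {n for n in range(2, len(sims))
            if sims[n - 1] <= sims[n - 2] and sims[n - 1] <= sims[n]}


def breakpoints(intervals: Sequence[Sequence[str]]) -> List[int]:
    # Report T_n only when both the emotion model and the word model flag it.
    cents = [centroid(iv) for iv in intervals]
    terms = [{w for t in iv for w in tokenize(t)} for iv in intervals]
    pairs = range(len(intervals) - 1)
    emo = [cosine(cents[i], cents[i + 1]) for i in pairs]
    word = [jaccard(terms[i], terms[i + 1]) for i in pairs]
    return sorted(dips(emo) & dips(word))


if __name__ == "__main__":
    intervals = [["happy golf win"], ["happy golf game"],
                 ["angry crash news"], ["angry crash report"],
                 ["angry crash police"]]
    print(breakpoints(intervals))  # [2]
```

In this toy stream both similarity series dip between $T_1$ and $T_2$, so $T_2$ is reported. Because Eqs. (4) and (5) use non-strict inequalities, a perfectly flat similarity series would flag every interval; a strict comparison is an easy variant if exact ties occur.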
⁴ www.twopular.com

[Figure 1: Tweet Count of Days]

4.1 Opinion Detection
The length of the time intervals is an important factor in our analysis. We evaluated unit lengths varying from 2 hours to 24 hours. Intervals shorter than 12 hours lead to biased results, because they contain too few tweets to form a meaningful sample. On the other hand, intervals longer than 24 hours are not suitable for the problem domain (the media news cycle). We chose 12 hours because it is the shortest interval that provides meaningful data, while still enabling us to capture events at a fine granularity.

In our data for 20 days, we found 8 possible breaks with the Emotion Corpus Method (Figure 2), {5, 10, 17, 23, 25, 27, 32, 36}, and 5 of them, {5, 10, 23, 25, 27}, were also captured by the Jaccard similarity (Figure 3). Figure 2 contains black bars that represent outlier intervals with very few tweets. We tested our findings against a timeline of Tiger Woods related events from CNN, ABC News and ESPN⁵. Three of the validated breakpoints are related to the following events, in order: (5) transgression claims accepted by Tiger Woods; (10) more women alleged to have had affairs with Woods; and (23) Gatorade ends a sponsorship agreement with Woods, while Twitter users start writing thousands of jokes about Woods with Santa Claus #hashtags as Christmas nears. The other two validated breakpoints, 25 and 27, are false positives.

⁵ http://sports.espn.go.com/golf/news/story?id=4922436

[Figure 2: Emotion Vector Similarity of two successive intervals]

[Figure 3: Jaccard Similarity of two successive intervals]

4.2 Breakpoint Representation
Upon detecting opinion changes in the Tiger Woods case, we found the frequent keywords of all periods, and by using the Streaming TfIdf algorithm, we extracted the prominent words from these keywords. While creating documents for each 12-hour period, we put the top F most frequent words into their respective documents. During this process, we used the Porter Stemmer to remove the commoner morphological and inflexional endings of words, and analyzed the frequency distribution graph of the words. We found F = 50 to be the best choice, because for values larger than 50, big clusters of words with low frequencies appear. For the number of prominent words, we used p = 5. The first document has the prominent words: crash, report, florida, injur, golfer.

The prominent words are often self-explanatory: accenture, drop, stop, golfer, sponsorship. These refer to Accenture's decision to drop its sponsorship of Tiger Woods.

The algorithm can successfully detect the appearance dates of emerging topics. While the prominent words of the 11th document under traditional TfIdf do not include the word "voicemail", the Streaming TfIdf algorithm correctly identifies it as breaking news and adds it to the prominent words.

Apart from identifying the prominent words, the algorithm correctly discriminates against words that are not related to the events. In the 11th interval, the word "Afghanistan" is in the set of prominent words. This is because of tweets that protest the Tiger Woods headlines while the "Afghanistan war" gets more violent. In the following days, the prominent word set of the document is updated and "Afghanistan" disappears from it, as it is not actually related to the event.

The breakpoint representation method identifies the significant periods as 6, 11 and 24. Note that a break on the n-th bar in the similarity graphs (Figures 2-3) indicates an opinion change between the n-th and (n+1)-th time periods. For these breakpoints, Table 1 gives the prominent words of the (n+1)-th intervals.

Run Time Analysis. The run time of our methods grows linearly as the tweet count increases. To test scalability, we experimented with 5000, 10000 and 20000 tweets and found the run time of our methods to be 24224, 45985 and 92867 milliseconds, respectively, on an AMD Turion Dual-Core 2.00 GHz processor.
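The Streaming TfIdf algorithm of Section 3.2, Eqs. (6)-(11), can be sketched as below. It is a minimal Python illustration rather than the authors' code, and it fills in details the paper leaves open, all of them assumptions: the prominence update (9) touches only older documents that already contain $x$, each document keeps the Idf value computed when it arrived, and ties in TfIdf are broken by Tf and then alphabetically. The class and method names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


class StreamingTfIdf:
    """Sketch of Eqs. (6)-(11). Documents are numbered 1, 2, ... in arrival order."""

    def __init__(self) -> None:
        self.tf: List[Dict[str, float]] = []   # tf[n-1][x] = Tf_{x,D_n}
        self.idf: List[Dict[str, float]] = []  # idf[n-1][x] = Idf_{x,D_n}, fixed at arrival
        self.df: Counter = Counter()           # x -> number of documents so far containing x

    def add_document(self, words: Sequence[str]) -> None:
        counts = Counter(words)
        total = sum(counts.values())
        n = len(self.tf) + 1
        self.df.update(counts.keys())          # +1 per distinct word in D_n
        tf_n = {x: c / total for x, c in counts.items()}           # Eq. (6)
        idf_n = {x: math.log(n / self.df[x]) for x in counts}      # Eq. (7)
        # Prominence update, Eqs. (9)-(10): for every older D_o (o < n) that
        # already contains x, add Tf_{x,D_n} damped by F = 1 / (n - o).
        for o in range(1, n):
            tf_o = self.tf[o - 1]
            for x, t in tf_n.items():
                if x in tf_o:
                    tf_o[x] += t / (n - o)
        self.tf.append(tf_n)
        self.idf.append(idf_n)

    def tfidf(self, n: int) -> Dict[str, float]:
        # Eqs. (8) and (11): current Tf times the stored Idf for D_n.
        tf_n, idf_n = self.tf[n - 1], self.idf[n - 1]
        return {x: tf_n[x] * idf_n[x] for x in tf_n}

    def prominent(self, n: int, p: int = 5) -> List[str]:
        # Top-p words of D_n by TfIdf; ties broken by Tf, then alphabetically.
        scores, tf_n = self.tfidf(n), self.tf[n - 1]
        return sorted(scores, key=lambda x: (-scores[x], -tf_n[x], x))[:p]


if __name__ == "__main__":
    stream = StreamingTfIdf()
    for doc in [["crash", "golf", "crash"], ["golf", "wife"],
                ["wife", "voicemail", "voicemail"], ["voicemail", "golf"]]:
        stream.add_document(doc)
    print(stream.prominent(3, p=1))  # ['voicemail']
```

In this toy stream, "voicemail" first appears in $D_3$ and recurs in $D_4$, so the update raises its Tf in $D_3$ from 2/3 to 7/6, mirroring how a word that keeps appearing after it emerges is pushed up in the period where it first appeared. Under Eq. (7), every word of the first document gets Idf log(1/1) = 0, which is why the sketch needs a tie-breaker there.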
| Period | Prominent words |
|---|---|
| 1 | crash, florida, injur, golf, accident |
| 6 | crash, wife, accident, mistress, golf |
| 11 | voicemail, wife, f***, golf, cheat |
| 24 | drop, stop, santa, claus, gatorade |

Table 1: Prominent Words for Significant Periods

5. CUSTOMIZED NEWS TRACKING
We developed a news tracking application on Twitter. The resulting application can be seen at the project website⁶, and its screenshot is given in Figure 4. The application uses an interactive JavaScript interface that lists the tweet counts of each period. The user can click on a period's column to see the events of that time period, based on its prominent words. For each period, we search for articles that were published within the date range of the period. We do not store these web links in a database, because links can be removed or relocated over time. Google Alert offers a similar customized web service: it notifies users by email whenever a chosen keyword has a new entry on the web. Whereas Google sends updates about every entry on a tracked keyword, our application observes public opinion to identify breakpoints, and finds the keywords of important events to notify users about them.

⁶ http://ubicomp.cse.buffalo.edu/upinion/

[Figure 4: Upinion Application]

6. CONCLUSIONS
In this paper we presented an efficient way to observe public opinion in the temporal dimension. Our methods can identify breakpoints, and
find the related events that caused these opinion changes. We tested our results against the timeline of the Tiger Woods case and showed the accuracy of our results. We developed an application that can serve users with news pages depending on the time period. We are currently working on expanding the emotion corpus to eliminate outlier intervals in our analysis. As future work, we plan to develop a customized version of our web service that enables web users to track their selected topics on Twitter. We are also working on a distributed implementation of our system over the Hadoop⁷ Map/Reduce framework. Map/Reduce [5] allows large software frameworks [1, 3] to process very large amounts of data in a distributed manner. By using the power of the Map/Reduce paradigm, we plan to handle millions of tweets belonging to multiple topics at the same time.

⁷ http://hadoop.apache.org/

7. REFERENCES
[1] M. A. Bayir, I. H. Toroslu, A. Cosar, and G. Fidan. Smartminer: a new framework for mining large scale web usage data. In WWW, pages 161-170, 2009.
[2] M. W. Berry, editor. Survey of text mining: clustering, classification, and retrieval. Springer, 2004.
[3] H. Cao, D. Jiang, J. Pei, Q. He, Z. Liao, E. Chen, and H. Li. Context-aware query suggestion by mining click-through and session data. In KDD, pages 875-883, 2008.
[4] K. Dave, S. Lawrence, and D. M. Pennock. Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In WWW, pages 519-528, 2003.
[5] J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. In OSDI, pages 137-150, 2004.
[6] N. Diakopoulos and D. A. Shamma. Characterizing debate performance via aggregated twitter sentiment. In Conference on Human Factors in Computing Systems (CHI), April 2010.
[7] A. Java, X. Song, T. Finin, and B. Tseng. Why we twitter: understanding microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, pages 56-65. ACM, 2007.
[8] W. Jin, H. H. Ho, and R. K. Srihari. Opinionminer: a novel machine learning system for web opinion mining and extraction. In KDD, pages 1195-1204, 2009.
[9] L. Ku, Y. Liang, and H. Chen. Opinion extraction, summarization and tracking in news and blog corpora. In Proceedings of AAAI-2006 Spring Symposium on Computational Approaches to Analyzing Weblogs, pages 100-107, 2006.
[10] H. Kwak, C. Lee, H. Park, and S. B. Moon. What is twitter, a social network or a news media? In WWW, pages 591-600, 2010.
[11] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1-135, 2007.
[12] W. G. Parrott, editor. Emotions in social psychology: essential readings. Psychology Press, 2001.
[13] A. Popescu and O. Etzioni. Extracting product features and opinions from reviews. In EMNLP-05, 2005.
[14] P. D. Turney. Thumbs up or thumbs down? semantic orientation applied to unsupervised classification of reviews. In ACL, pages 417-424, 2002.
[15] L. Zhuang, F. Jing, X.-Y. Zhu, and L. Zhang. Movie review mining and summarization. In CIKM-06, 2006.