Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

Aditi Gupta*, Hemank Lamba**, Ponnurangam Kumaraguru*, Anupam Joshi†
*Indraprastha Institute of Information Technology, Delhi, India
**IBM Research Labs, Delhi, India
†University of Maryland Baltimore County, Maryland, USA
{aditig, pk}@iiitd.ac.in, helamba1@in.ibm.com, joshi@cs.umbc.edu

ABSTRACT

In today's world, online social media plays a vital role during real world events, especially crisis events. Social media coverage of events has both positive and negative effects: it can be used by authorities for effective disaster management, or by malicious entities to spread rumors and fake news. The aim of this paper is to highlight the role of Twitter during Hurricane Sandy (2012) in spreading fake images about the disaster. We identified 10,350 unique tweets containing fake images that were circulated on Twitter during Hurricane Sandy. We performed a characterization analysis to understand the temporal, social reputation and influence patterns for the spread of fake images. Eighty-six percent of the tweets spreading the fake images were retweets; hence very few were original tweets. Our results showed that the top thirty users out of 10,215 users (0.3%) resulted in 90% of the retweets of fake images; also, network links such as follower relationships on Twitter contributed very little (only 11%) to the spread of these fake photo URLs. Next, we used classification models to distinguish fake images from real images of Hurricane Sandy. The best results were obtained with the Decision Tree classifier, which gave 97% accuracy in distinguishing fake images from real ones. Tweet based features were very effective in distinguishing tweets with fake images from tweets with real images, while the performance of user based features was very poor. Our results showed that automated techniques can be used to distinguish real images from fake images posted on Twitter.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous; D.2.8 [Software Engineering]: Metrics: complexity measures, performance measures

Keywords
Online social media, Twitter, crisis, fake pictures

Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author's site if the Material is used in electronic media. WWW 2013 Companion, May 13-17, 2013, Rio de Janeiro, Brazil. ACM 978-1-4503-2038-2/13/05.

1. INTRODUCTION

Over the past few years there has been an increase in the usage of Online Social Media (OSM) services as a medium for people to share, coordinate and spread information about events while they are going on. Though a large volume of content is posted on OSM, not all of the information is of good quality with respect to the event; it may be fake, incorrect or noisy. Extracting good quality information is one of the biggest challenges in utilizing information from OSM. Over the last few years, people have highlighted how OSM can be used to help in extracting useful information about real life events. On the other hand, there have been many instances that highlight the negative effects of content on online social media on real life events. Because the information shared and accessed on social media such as Twitter is in real-time, any malicious activity, such as spreading fake images and rumors, needs to be detected and curbed immediately. Such false and incorrect information can lead to chaos and panic among people on the ground. Since detecting whether posted images are fake using traditional image analysis methods can be highly time and resource consuming, we explore the option of using Twitter-specific features, such as the content of the tweet and the user details, to distinguish fake images from real ones.

Hurricane Sandy: Hurricane Sandy caused mass destruction and turmoil in and around the USA from October 22nd to October 31st, 2012. According to NBC News, the death toll in the U.S. was 109, including at least 40 in New York City. NBC also reported that damages from Hurricane Sandy exceeded $50 billion. Online social media such as Twitter and Facebook were widely used by people to keep abreast of the latest updates on the storm.¹ Social media was also widely exploited by malicious entities during Sandy to spread rumors and fake pictures in real-time.² ³ Such fake images and news became extremely viral on OSM and caused panic and chaos among the people affected by the hurricane. Hence, it is an ideal event for analyzing the spread and impact of fake and incorrect information on social media. Figure 1 shows some of the fake images that were spread during Hurricane Sandy, which we also found in our dataset.

¹ http://www.guardian.co.uk/world/us-news-blog/2013/feb/20/mta-conedison-hurricane-sandy-social-media-week
² http://news.yahoo.com/10-fake-photos-hurricane-sandy-075500934.html
³ http://www.guardian.co.uk/news/datablog/2012/nov/06/fake-sandy-pictures-social-media

Figure 1: Some of the fake pictures of Hurricane Sandy that were shared on Twitter. (a) Picture of shark in New Jersey (b) Faked image of stormy New York skyline (c) Another picture of shark in the streets.

There is a dire need to build automated solutions that can help people judge the quality of information appearing on OSM in real-time. The aim of this work is to characterize and identify the propagation of fake pictures on an OSM, Twitter. These fake images created panic and chaos among the people. The effect of spreading such false information can be multifold in crisis situations. Hence, we analyzed the propagation of fake image URLs during Hurricane Sandy. The power and impact of online social media in shaping real world events has been widely studied by researchers across the globe. To the best of our knowledge, this is the first paper to study the diffusion and spread of fake pictures on OSM. The main contributions of this work are:

• We performed an in-depth characterization of tweets sharing fake images on Twitter during Hurricane Sandy. We found that the tweets containing the fake image URLs were mostly retweets (86%); hence very few users posted original tweets with fake images. We also found that the social network of a user on Twitter had little impact on making these fake images viral: there was just 11% overlap between the retweet and follower graphs of tweets containing fake images.

• We used classification algorithms to distinguish between tweets containing fake and real images. We primarily used two kinds of features: user level and tweet level features. The best accuracy of 97% was achieved using a decision tree classifier with tweet based features.

The rest of the paper is organized as follows: Section 2 describes the work most closely related to this paper. Section 3 explains the methodology that we used in collecting data, and in analyzing and classifying the tweets. Section 4 describes the analysis performed. Section 5 summarizes the results from our analysis and highlights their implications. The last section presents the limitations and future work of the paper.

2. RELATED WORK

2.1 Role of OSM during Real World Events

The role of social media and its impact in the real world has been analyzed by computer scientists, psychologists and sociologists. OSM has progressed from being merely a medium to share users' opinions, to an information sharing and dissemination agent, to a means of propagating and coordinating relief and response efforts. Palen et al. presented a path-breaking vision of how Internet resources (technology and crowd based) can be used for support and assistance during mass emergencies and disasters [20]. They viewed people collectively as an important resource that can play a critical role in a crisis. In follow-up work to that research proposal, Palen et al. studied two real world events to understand and characterize the wide-scale interaction on social networking websites with respect to the events [21]. The two events they considered were the Northern Illinois University (NIU) shootings of February 14, 2008 and the Virginia Tech (VT) tragedy 10 months earlier. Sakaki et al. used tweets as social sensors to detect earthquake events. They developed a probabilistic spatio-temporal model for predicting the center and trajectory of an event using Kalman and particle filtering techniques. Based on these models, they created an earthquake reporting application for Japan, which detected earthquake occurrences based on tweets and sent users alert emails [23]. In a different research work, Sakaki et al. analyzed tweet trends to extract the events that happen during a crisis from the Twitter log of user activity; they analyzed Japanese tweets on all earthquakes during 2010-2011 [24]. Among the prominent results they obtained via statistical analysis was that tweets from feature phones and smartphones were dominant just after the earthquake, whereas tweets from PCs were dominant in less-damaged areas. Cheong et al. performed social network analysis on Twitter data during the Australian floods of 2011 to identify active players and their effectiveness in disseminating critical information [6].

Work has been done to extract situational awareness information from the vast amount of data posted on OSM during real-world events. Vieweg et al. analyzed the Twitter logs for the Oklahoma Grassfires (April 2009) and the Red River Floods (March and April 2009) for the presence of situational awareness content. Vieweg et al. also developed an automated framework to enhance situational awareness during emergency situations. They extracted geo-location and location-referencing information from users' tweets, which helped in increasing situational awareness during emergency events [26]. Verma et al. used natural language techniques to build an automated classifier to detect messages on Twitter that may contribute to situational awareness [25]. Another closely related work was done by Oh et al., who analyzed the Twitter stream during the 2008 Mumbai terrorist attacks [19]. Their analysis showed how information available on online social media during the attacks aided the terrorists in their decision making by increasing their social awareness. Corvey et al. analyzed one of the important aspects of applying computational techniques and algorithms to social media data to obtain useful information from social media content, i.e. linguistic and behavioral annotations [8]. One important conclusion they obtained was that during emergency situations, users use a specific vocabulary to convey tactical information on Twitter, as indicated by the accuracy achieved using a bag-of-words model for classifying situational awareness tweets. Mendoza et al. used data from the 2010 earthquake in Chile to explore the behavior of Twitter users for emergency response activity [15]. Their results showed that the propagation of tweets related to rumors differed from that of true news, and that this difference could be used to develop automated classification solutions to identify correct information. De Longueville et al. analyzed Twitter feeds during the Marseille forest fire event in France. They showed that information from location based social networks can be used to acquire spatio-temporal data that can be analyzed to provide useful localized information about the event [9]. A team at National ICT Australia Ltd. (NICTA) has been working on developing a focused search engine for Twitter and Facebook that can be used in humanitarian crisis situations.⁴ Hughes et al. compared the properties of tweets and users during an emergency to normal situations [1]. They performed empirical and statistical analysis on data collected during disaster events and showed an increase in the use of URLs in tweets and a decrease in @-mentions during emergency situations.

⁴ http://leifhanlen.wordpress.com/2011/07/22/crisis-management-using-twitter-and-facebook-for-the-greater-good/

2.2 Assessing Quality of Information on OSM

The presence of spam, compromised accounts, malware, and phishing attacks are major concerns with respect to the quality of information on Twitter. Techniques to filter out spam and phishing on Twitter have been studied and various effective solutions have been proposed. Chhabra et al. highlighted the role of URL shortener services like bit.ly⁵ in spreading phishing; their results showed that URL shorteners are used not only for saving space but also for hiding the identity of phishing links [7]. In a follow-up study, Aggarwal et al. further analyzed and identified features that are indicative of phishing tweets [2]. Using them, they detected phishing tweets with an accuracy of 92.52%. One of the major contributions of their work was the Chrome extension they developed and deployed for real-time phishing detection on Twitter. Grier et al. characterized spam spread on Twitter via URLs. They found that 8% of 25 million URLs posted on Twitter point to phishing, malware, and scams listed on popular blacklists [12]. Ghosh et al. characterized link farming on Twitter and proposed a methodology to combat it [11]. Yang et al. analyzed the community, or ecosystem, of cyber criminals and their supporters on Twitter [28]. Yardi et al. applied machine learning techniques to identify spammers [29]. They used three features: (1) searches for URLs; (2) username pattern matches; and (3) keyword detection; and obtained 91% accuracy. Benevenuto et al. classified real YouTube users as spammers, promoters, and legitimate users [3]. They used supervised machine learning algorithms to detect promoters and spammers; they achieved higher accuracy for detecting promoters, while the algorithms were less effective for detecting spammers. Nazir et al. provided an insightful characterization of phantom profiles for gaming applications on Facebook [17]. They proposed a classification framework using an SVM classifier to distinguish phantom profiles from real profiles based on certain social network related features.

⁵ https://bitly.com/

We now discuss some of the research work done to assess, characterize, analyze and compute the trust and credibility of content on online social media. Truthy⁶ was developed by Ratkiewicz et al. to study information diffusion on Twitter and compute a trustworthiness score for a public stream of micro-blogging updates related to an event, in order to detect political smears, astroturfing, misinformation, and other forms of social pollution [22]. It works on real-time Twitter data with three months of data history. Castillo et al. showed that automated classification techniques can be used to distinguish news topics from conversational topics, and assessed their credibility based on various Twitter features [5]. They achieved precision and recall of 70-80% using the J48 decision tree classification algorithm. They evaluated their results against data annotated by humans as ground truth. Canini et al. analyzed the use of automated ranking strategies to measure the credibility of sources of information on Twitter for any given topic [4]. The authors define a credible information source as one which has trust and domain expertise associated with it. Gupta et al., in their work analyzing tweets posted during the terrorist bomb blasts in Mumbai (India, 2011), showed that the majority of sources of information are unknown and have low Twitter reputation (few followers) [14]. This highlights the difficulty of measuring the credibility of information and the need to develop automated mechanisms to assess the credibility of information on Twitter. In a follow-up study, the authors applied machine learning algorithms (SVM Rank) and information retrieval techniques (relevance feedback) to assess the credibility of content on Twitter [13]. They analyzed fourteen high impact events of 2011; their results showed that on average 30% of the total tweets posted about an event contained situational information about the event, while 14% was spam. Only 17% of the total tweets posted about the event contained situational awareness information that was credible. Another very similar work was done by Xia et al. on tweets generated during the England riots of 2011 [27]. They used a supervised Bayesian Network method to predict the credibility of tweets in emergency situations. O'Donovan et al. focused on finding indicators of credibility in different situations (tweets from 8 separate events were considered). Their results showed that the best indicators of credibility were URLs, mentions, retweets and tweet length [18]. A different methodology was followed by Morris et al., who conducted a survey to understand users' perceptions regarding the credibility of content on Twitter [16]. They asked about 200 participants to mark what they consider to be indicators of the credibility of content and users on Twitter. They found that the prominent features on which users base their credibility judgments are features visible at a glance, for example, the username and picture of a user. Another approach to detecting users with high credibility and trustworthiness was taken by Ghosh et al., who identified topic-based experts on Twitter [10]. Their techniques rely on the wisdom of the Twitter crowds, i.e. they used the Twitter Lists feature to identify experts in various topics.
⁶ http://truthy.indiana.edu/

3. METHODOLOGY

In this section, we discuss our research methodology in detail. We first describe how we collected data from Twitter, and then the analytical techniques applied in this paper.

3.1 Data

For data collection from Twitter we have a 24x7 setup, which has been functional for about the last 20 months. We collected data from Twitter using the Streaming API.⁷ This API enables researchers to extract tweets in real-time, based on query parameters such as words in the tweet, time of posting of the tweet, etc. We queried the Twitter Trends API every hour for the current trending topics,⁸ and collected tweets using these topics as search keywords for the Streaming API. Hurricane Sandy's impact lasted from Oct. 20th to Nov. 1st, 2012; hence, from all the tweets collected during this period, we extracted the tweets containing the words `sandy' and `hurricane'. This yielded about 1.8 million tweets by 1.2 million unique users on Hurricane Sandy from Oct. 20th to Nov. 1st, 2012. Table 1 gives the descriptive statistics of the tweets and users collected for the event, and Figure 2 shows the spatial distribution of these tweets (about 19K tweets had geo-location embedded in them).

Table 1: Descriptive statistics of the Twitter dataset for Hurricane Sandy.

  Total tweets          1,782,526
  Total unique users    1,174,266
  Tweets with URLs        622,860

Figure 2: Spatial distribution of total tweets on Hurricane Sandy. Here we have plotted about 19K tweets, which had embedded geo-location data in them.

Using online resources (articles, tweets and blogs) we were able to identify URLs that belonged to fake pictures of Hurricane Sandy. One of the prominent data sources we used was the list of fake and real images made public by the Guardian news media company.⁹ The list provided by the Guardian classified the top image URLs shared during the hurricane as fake or real image URLs, which we used to form our dataset. There were many other articles and blogs that covered the real and fake images that were spread on Twitter.¹⁰ ¹¹ ¹² Table 2 describes the statistics for the tweets containing fake and real image URLs. We identified eight unique fake images of Sandy that were spread on Twitter in our dataset, and we collected about 10K tweets for these URLs.

Table 2: Descriptive statistics of the tweets with fake and real image URLs.

  Tweets with fake images    10,350
  Users with fake images     10,215
  Tweets with real images     5,767
  Users with real images      5,678

⁷ https://dev.twitter.com/docs/streaming-api
⁸ https://dev.twitter.com/docs/api/1/get/trends
⁹ http://www.guardian.co.uk/news/datablog/2012/nov/06/fake-sandy-pictures-social-media
¹⁰ http://now.msn.com/hurricane-sandy-fake-photos
¹¹ http://mashable.com/2012/10/29/fake-hurricane-sandy-photos/
¹² http://theweek.com/article/index/235578/10-fake-photos-of-hurricane-sandy

3.2 Characterization Analysis

We characterized the tweets containing fake image URLs and their propagation, to understand how they became viral. First, we performed a temporal analysis of the fake image tweets. We analyzed how many such tweets were shared per hour on Twitter. We also analyzed the sudden peaks in the graph (from hour x1 to hour x1 + 1) more closely. We constructed the retweet graph for the sudden peak in the temporal analysis, to find out what changes in the network topology led to the viral spread of these images. We obtained useful insights about the nature and spread of fake image URLs on Twitter, which are summarized in the next section.

Next, we analyzed what role the social network graph of a user on Twitter plays in the propagation of fake URLs. The explicit social network of a user on Twitter is the user's follower graph. We wanted to analyze what percentage of information diffusion takes place via this follower network. The algorithm used to compute this overlap is summarized in Algorithm 1.

Algorithm 1 Compute_Overlap
  Create_Graph_Retweets()
  Create_Graph_Followers()
  for each edge in the retweet network do
      num_retweet_edges++
      Insert edge into hashmap, H[1..n]
  end for
  for each edge in the follower network do
      Insert each edge in hashmap, H[1..n]
      if collision then
          intersections++
      end if
  end for
  % overlap = (intersections / num_retweet_edges) × 100

In the function Create_Graph_Followers, we crawled the follower network of all the unique users that had tweeted the fake images, using the REST API of Twitter. The network created had 10,779,122 edges and 10,215 nodes. In Create_Graph_Retweets, we created a retweet network, where an edge between two nodes exists if one user had retweeted the other's tweet. A hashmap, H[1..n], is created to compute the overlap between the follower and retweet graphs.

3.3 Classification Analysis

We analyzed the effectiveness of machine learning algorithms in detecting tweets containing fake image URLs versus tweets containing real images of Sandy. We performed two-class classification using Naive Bayes and J48 Decision Tree classifiers. We had a dataset of 10,350 tweets containing fake image URLs and 5,767 tweets containing real image URLs. To avoid any bias due to the unequal size of the classes, we randomly selected 5,767 tweets from the fake image tweets, and then applied classification.

We used two kinds of features for the classification algorithms. Table 3 summarizes the features we computed for each tweet and the user who posted it.

• Source or user level features [F1]: The attributes of the user who posted the tweet. We consider properties such as the number of friends, followers and status messages of the user as part of this set.

• Content or tweet level features [F2]: The 140 characters posted by users contain data (e.g. words, URLs, hashtags) and meta-data (e.g. whether the tweet is a reply or a retweet) related to it.

Table 3: User and tweet based features used for classification of fake and real images of Sandy.

  User Features [F1]
    Number of Friends
    Number of Followers
    Follower-Friend Ratio
    Number of times listed
    User has a URL
    User is a verified user
    Age of user account

  Tweet Features [F2]
    Length of Tweet
    Number of Words
    Contains Question Mark?
    Contains Exclamation Mark?
    Number of Question Marks
    Number of Exclamation Marks
    Contains Happy Emoticon
    Contains Sad Emoticon
    Contains First Order Pronoun
    Contains Second Order Pronoun
    Contains Third Order Pronoun
    Number of uppercase characters
    Number of negative sentiment words
    Number of positive sentiment words
    Number of mentions
    Number of hashtags
    Number of URLs
    Retweet count

4. RESULTS

In this section, we summarize the results obtained from the characterization and classification analyses.

4.1 Characterization Results

We found that out of the 10,350 tweets we identified as containing fake image URLs, about 86% were retweets. That is, only about 14% were original tweets that contained such URLs. From the temporal analysis, we plotted the per-hour tweeting activity for the fake image URLs. From Figure 3 we see that the spread of the fake URLs spikes about 12 hours after the URLs were introduced into the Twitter network. We now analyze the spread of these picture URLs one hour before and after the spike. We construct the reply and retweet graph for the tweets sharing these fake picture URLs on October 29th, at 21 hours and 22 hours, as shown in Figure 5. We see that there are only a few users with very high degree; that is, only a few users account for the majority of the retweets. We confirmed this statistically: Figure 4 (CDF) shows that the top 30 users (0.3% of the users) resulted in 90% of the retweets of the fake images. Combining the results from both graphs, we conclude that the fake URLs were present in the Twitter network for almost 12 hours before they became viral, and that the sudden spike in their propagation via retweets was caused by only a few users.
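The overlap computation of Algorithm 1 (Section 3.2), whose result is reported next, can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function name `compute_overlap` and the edge representation are assumptions. In particular, we assume that a retweet edge is stored as (retweeting user, original author) and a follower edge as (follower, followee), so that an edge present in both graphs means the retweeting user follows the author. A Python set plays the role of the hashmap H.

```python
from typing import Hashable, Iterable, Tuple

# Assumed edge convention (not specified in the paper):
#   retweet edge  = (retweeting_user, original_author)
#   follower edge = (follower, followee)
Edge = Tuple[Hashable, Hashable]


def compute_overlap(retweet_edges: Iterable[Edge],
                    follower_edges: Iterable[Edge]) -> float:
    """Return the percentage of distinct retweet edges that also
    appear as edges in the follower network."""
    # Hash every distinct retweet edge (the hashmap H of Algorithm 1).
    retweet_set = set(retweet_edges)
    if not retweet_set:
        return 0.0

    # Stream over the (much larger) follower network and record each
    # retweet edge that "collides" with a follower edge. Using a set
    # ensures a duplicated follower edge is not counted twice.
    matched = set()
    for edge in follower_edges:
        if edge in retweet_set:
            matched.add(edge)

    return 100.0 * len(matched) / len(retweet_set)


# Tiny made-up example: two of the three distinct retweet edges are
# also follower edges.
example_retweets = [("a", "b"), ("c", "b"), ("d", "a"), ("a", "b")]
example_followers = [("a", "b"), ("x", "y"), ("d", "a")]
example_overlap = compute_overlap(example_retweets, example_followers)
```

Only the retweet edges are held in memory; the follower edges can be streamed from disk, which matters when the follower network has millions of edges and the retweet network only thousands. As a check on the arithmetic of the last step, 1,215 common edges out of 10,508 retweet edges gives 1,215 / 10,508 × 100, or roughly 11.6%, in line with the 11% overlap reported in the paper.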
Next, we determine the role of the Twitter network graph in the propagation of the fake image tweets via retweets. We ran the Compute_Overlap algorithm discussed above. We found 1,215 overlapping edges, which leads to a percentage overlap of 11% between the retweet and follower graphs. Table 4 summarizes the results of the Compute_Overlap algorithm. This indicates that very little of the retweet activity originated from the people in a user's follower graph. Hence, in cases of crisis, people often retweet and propagate tweets that they find in Twitter search or trending topics, irrespective of whether they follow the user or not.

Table 4: Results of the algorithm Compute_Overlap. We found only 11% overlap between the follower and retweet graphs for the tweets containing fake images.

  Total edges in the retweet network                      10,508
  Total edges in the follower-followee network        10,799,122
  Total edges that exist in both the retweet network
    and the follower-followee network                      1,215
  % overlap                                                  11%

Figure 3: Details of the data collected for fake image URL sharing. Temporal distribution of tweets, hour-wise, starting from the first hour in which a fake image tweet was posted.

Figure 4: CDF of retweets of the fake image tweets by the users. (a) All users (b) Top 30 users. It shows that the top 30 users (0.3% of the users) resulted in 90% of the retweets of the fake images.

4.2 Classification Results

In the above section, we characterized the properties and behavior associated with the spread of false information, in the form of fake images, on Twitter. The next important step is to explore features and algorithms that can effectively help us in identifying the fake content in real-time. We performed 10-fold cross validation while applying the classification models. We applied two standard classification algorithms: Naive Bayes and Decision Tree (J48). As described before, we took 5,767 tweets for each of the two classes (tweets containing fake images and tweets containing real images). For each data point, we created user and tweet level feature vectors. Table 5 summarizes the results of the classification experiment. With tweet based features we achieve an accuracy above 90% for both classifiers, though Decision Tree outperforms the Naive Bayes classifier. We can also see that user based features provide very poor accuracy in distinguishing fake image URLs, while tweet based features performed very well. We would also like to mention that the high accuracy we obtained may be attributed to the similar nature of many tweets (since a lot of the tweets in our dataset are retweets of other tweets). We conclude that content and property analysis of tweets can help us identify real image URLs being shared on Twitter with high accuracy.
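Most of the tweet level features [F2] in Table 3 can be computed directly from the tweet text. The following Python sketch shows one way to do so; it is an illustration under stated assumptions, not the feature extractor used in the study. The function name `tweet_features`, the emoticon lists, and the pronoun lists are hypothetical choices, and the three pronoun features are read here as first-, second- and third-person pronouns. The sentiment word counts and the retweet count are omitted because they need a sentiment lexicon and tweet meta-data, respectively.

```python
import re

# Hypothetical emoticon and pronoun lists; the paper does not give its own.
HAPPY_EMOTICONS = (":)", ":-)", ":D")
SAD_EMOTICONS = (":(", ":-(")
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}
SECOND_PERSON = {"you", "your", "yours"}
THIRD_PERSON = {"he", "she", "it", "they", "him", "her", "them",
                "his", "hers", "its", "their", "theirs"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def tweet_features(text: str) -> dict:
    """Compute a subset of the tweet level features of Table 3."""
    # Strip URLs, mentions and hashtags before looking for pronouns,
    # so that e.g. "@you" is not counted as a second-person pronoun.
    cleaned = HASHTAG_RE.sub(" ", MENTION_RE.sub(" ", URL_RE.sub(" ", text)))
    tokens = {t.lower() for t in re.findall(r"[A-Za-z']+", cleaned)}

    return {
        "length": len(text),
        "num_words": len(text.split()),
        "has_question_mark": "?" in text,
        "has_exclamation_mark": "!" in text,
        "num_question_marks": text.count("?"),
        "num_exclamation_marks": text.count("!"),
        "has_happy_emoticon": any(e in text for e in HAPPY_EMOTICONS),
        "has_sad_emoticon": any(e in text for e in SAD_EMOTICONS),
        "has_first_person_pronoun": bool(tokens & FIRST_PERSON),
        "has_second_person_pronoun": bool(tokens & SECOND_PERSON),
        "has_third_person_pronoun": bool(tokens & THIRD_PERSON),
        "num_uppercase_chars": sum(1 for c in text if c.isupper()),
        "num_mentions": len(MENTION_RE.findall(text)),
        "num_hashtags": len(HASHTAG_RE.findall(text)),
        "num_urls": len(URL_RE.findall(text)),
    }


# Made-up example tweet, for illustration only.
example = tweet_features(
    "OMG! Is this shark real? @nycnews #Sandy http://t.co/abc :(")
```

Each tweet then becomes one fixed-length feature vector that can be handed to a classifier such as Naive Bayes or a decision tree. Because these features depend only on the text, retweets of the same tweet produce nearly identical vectors, which is consistent with the caveat above about the similar nature of many tweets in the dataset.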
Table 5: Classification results for tweets containing fake images and real images. Our results showed that tweet based features are more effective in distinguishing the two classes.

                   F1        F2        F1+F2
  Naive Bayes      56.32%    91.97%    91.52%
  Decision Tree    53.24%    97.65%    96.65%

5. DISCUSSION

Online social media can play the role of either a lifesaver or a demon during times of crisis. In this research work, we highlighted one malicious use of Twitter during a real-world event. We analyzed the activity on the online social networking website Twitter during Hurricane Sandy (2012) that spread fake images. We identified 10,350 unique tweets containing fake images that were circulated on Twitter during Hurricane Sandy. We performed a characterization analysis to understand the temporal, social reputation and influence patterns of the spread of these fake images. We found that 86% of the tweets spreading the fake images were retweets; hence very few were original tweets by users. Our results also showed that the top 30 users (0.3% of the users) resulted in 90% of the retweets of the fake images. Hence, we can conclude that only a handful of users contributed to the majority of the damage, via retweeting activity on Twitter. We analyzed the role of the Twitter social graph in propagating the fake images. We crawled the network links, that is, the follower relationships of the users, and applied our algorithm to compute the overlap. We found only an 11% overlap between the retweet and follower graphs for the users who tweeted fake images of Sandy. This result highlights the fact that, at the time of a crisis, users retweet information from other users irrespective of whether they follow them or not. Next, we used classification models to distinguish fake images from real images of Hurricane Sandy. The best results were obtained with the Decision Tree classifier, which gave 97% accuracy in distinguishing fake images from real ones. Tweet based features are very effective in distinguishing tweets with fake images from tweets with real images, while the performance of user based features was very poor. Our research work provided insights into the behavioral pattern of the spread of fake image tweets. Our results also provided a proof of concept that automated techniques can be used to distinguish real images from fake images posted on Twitter.

6. FUTURE WORK

Our work provides a proof of concept that automated techniques can be used to identify malicious or fake content spread on Twitter during real world events. We would like to conduct a larger study with more events for the identification of fake images and news propagation. We would also like to expand our study to detecting rumors and other malicious content, apart from images, spread during real world events. As a next step, we would like to develop a browser plug-in that can detect fake images being shared on Twitter in real-time.

Figure 5: Spread of fake picture URLs (retweet and reply graph); the number on each node is the user profile ID on Twitter. The figure shows that the fake images became viral very fast: within an hour there was a tremendous growth in the number of people tweeting them. (a) Oct. 29, 2100 hours (b) Oct. 29, 2200 hours.

7. ACKNOWLEDGMENTS

We would like to thank the Government of India for funding this project. We would like to express our sincerest thanks to all members of the PreCog research group at IIIT-Delhi¹³ for their continued support and feedback on the project.

¹³ precog.iiitd.edu.in

8. REFERENCES

[1] Amanda L. Hughes and Leysia Palen. Twitter Adoption and Use in Mass Convergence and Emergency Events. ISCRAM Conference, 2009.
[2] Anupama Aggarwal, Ashwin Rajadesingan, and Ponnurangam Kumaraguru. PhishAri: Automatic realtime phishing detection on Twitter. 7th IEEE APWG eCrime Researchers Summit (eCRS), 2012.
[3] Fabrício Benevenuto, Tiago Rodrigues, Virgílio Almeida, Jussara Almeida, and Marcos Gonçalves. Detecting spammers and content promoters in online video social networks. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, SIGIR '09, pages 620-627, New York, NY, USA, 2009. ACM.
[4] Kevin R. Canini, Bongwon Suh, and Peter L. Pirolli. Finding credible information sources in social networks based on content and social structure. In SocialCom, 2011.
[5] Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. Information credibility on Twitter. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 675-684, New York, NY, USA, 2011. ACM.
[6] France Cheong and Christopher Cheong. Social media data mining: A social network analysis of tweets during the 2010-2011 Australian floods. In PACIS, 2011.
[7] Sidharth Chhabra, Anupama Aggarwal, Fabricio Benevenuto, and Ponnurangam Kumaraguru. Phi.sh/$ocial: the phishing landscape through short URLs. In Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference, CEAS '11, pages 92-101, New York, NY, USA, 2011. ACM.
[8] William J. Corvey, Sudha Verma, Sarah Vieweg, Martha Palmer, and James H. Martin. Foundations of a multilayer annotation framework for Twitter communications during crisis events. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC '12), Istanbul, Turkey, May 2012. European Language Resources Association (ELRA).
[9] Bertrand De Longueville, Robin S. Smith, and Gianluca Luraschi. "OMG, from here, I can see the flames!": a use case of mining location based social networks to acquire spatio-temporal data on forest fires. In Proceedings of the 2009 International Workshop on Location Based Social Networks, LBSN '09, pages 73-80, New York, NY, USA, 2009. ACM.
[10] Saptarshi Ghosh, Naveen Sharma, Fabricio Benevenuto, Niloy Ganguly, and Krishna Gummadi. Cognos: crowdsourcing search for topic experts in microblogs. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, SIGIR '12, 2012.
[11] Saptarshi Ghosh, Bimal Viswanath, Farshad Kooti, Naveen Kumar Sharma, Gautam Korlam, Fabricio Benevenuto, Niloy Ganguly, and Krishna Phani Gummadi. Understanding and combating link farming in the Twitter social network. In Proceedings of the 21st international conference on World Wide Web, WWW '12, 2012.
[12] Chris Grier, Kurt Thomas, Vern Paxson, and Michael Zhang. @spam: the underground on 140 characters or less. In Proceedings of the 17th ACM conference on Computer and communications security, CCS '10, pages 27-37, New York, NY, USA, 2010. ACM.
[13] Aditi Gupta and Ponnurangam Kumaraguru. Credibility ranking of tweets during high impact events. In Proceedings of the 1st Workshop on Privacy and Security in Online Social Media, PSOSM '12, pages 2:2-2:8, New York, NY, USA, 2012. ACM.
[14] Aditi Gupta and Ponnurangam Kumaraguru. Twitter explodes with activity in Mumbai blasts! A lifeline or an unmonitored daemon in the lurking? IIIT, Delhi, Technical report, IIITD-TR-2011-005, 2011.
[15] Marcelo Mendoza, Barbara Poblete, and Carlos Castillo. Twitter under crisis: can we trust what we RT? In Proceedings of the First Workshop on Social Media Analytics, SOMA '10, pages 71-79, New York, NY, USA, 2010. ACM.
[16] Meredith Ringel Morris, Scott Counts, Asta Roseway, Aaron Hoff, and Julia Schwarz. Tweeting is believing?: understanding microblog credibility perceptions. In Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work, CSCW '12, pages 441-450, New York, NY, USA, 2012. ACM.
[17] Atif Nazir, Saqib Raza, Chen-Nee Chuah, and Burkhard Schipper. Ghostbusting Facebook: detecting and characterizing phantom profiles in online social gaming applications. In Proceedings of the 3rd conference on Online social networks, WOSN '10, 2010.
[18] J. O'Donovan, B. Kang, G. Meyer, T. Höllerer, and S. Adali. Credibility in context: An analysis of feature distributions in Twitter. ASE/IEEE International Conference on Social Computing, SocialCom, 2012.
[19] Onook Oh, Manish Agrawal, and H. Raghav Rao. Information control and terrorism: Tracking the Mumbai terrorist attack through Twitter. Information Systems Frontiers, 13(1):33-43, March 2011.
[20] Leysia Palen, Kenneth M. Anderson, Gloria Mark, James Martin, Douglas Sicker, Martha Palmer, and Dirk Grunwald. A vision for technology-mediated support for public participation & assistance in mass emergencies & disasters. In Proceedings of the 2010 ACM-BCS Visions of Computer Science Conference, ACM-BCS '10, 2010.
[21] Leysia Palen and Sarah Vieweg. The emergence of online widescale interaction in unexpected events: assistance, alliance & retreat. In Proceedings of the 2008 ACM conference on Computer supported cooperative work, CSCW '08, pages 117-126, New York, NY, USA, 2008. ACM.
[22] Jacob Ratkiewicz, Michael Conover, Mark Meiss, Bruno Gonçalves, Snehal Patil, Alessandro Flammini, and Filippo Menczer. Truthy: mapping the spread of astroturf in microblog streams. WWW '11, 2011.
[23] Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th international conference on World wide web, WWW '10, pages 851-860, New York, NY, USA, 2010. ACM.
[24] Takeshi Sakaki, Fujio Toriumi, and Yutaka Matsuo. Tweet trend analysis in an emergency situation. In Proceedings of the Special Workshop on Internet and Disasters, SWID '11, pages 3:1-3:8, New York, NY, USA, 2011. ACM.
[25] Sudha Verma, Sarah Vieweg, William Corvey, Leysia Palen, James H. Martin, Martha Palmer, Aaron Schram, and Kenneth Mark Anderson. Natural language processing to the rescue? Extracting "situational awareness" tweets during mass emergency. In Lada A. Adamic, Ricardo A. Baeza-Yates, and Scott Counts, editors, ICWSM. The AAAI Press, 2011.
[26] Sarah Vieweg, Amanda L. Hughes, Kate Starbird, and Leysia Palen. Microblogging during two natural hazards events: what Twitter may contribute to situational awareness. In Proceedings of the 28th international conference on Human factors in computing systems, CHI '10, pages 1079-1088, New York, NY, USA, 2010. ACM.
[27] Xin Xia, Xiaohu Yang, Chao Wu, Shanping Li, and Linfeng Bao. Information credibility on Twitter in emergency situation. In Proceedings of the 2012 Pacific Asia conference on Intelligence and Security Informatics, PAISI '12, 2012.
[28] Chao Yang, Robert Harkreader, Jialong Zhang, Seungwon Shin, and Guofei Gu. Analyzing spammers' social networks for fun and profit: a case study of cyber criminal ecosystem on Twitter. In Proceedings of the 21st international conference on World Wide Web, WWW '12, 2012.
[29] Sarita Yardi, Daniel Romero, Grant Schoenebeck, and Danah Boyd. Detecting spam in a Twitter network. First Monday, 15(1), January 2010.