Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

Aditi Gupta*, Hemank Lamba**, Ponnurangam Kumaraguru*, Anupam Joshi†
*Indraprastha Institute of Information Technology, Delhi, India
**IBM Research Labs, Delhi, India
†University of Maryland Baltimore County, Maryland, USA
{aditig,pk}@iiitd.ac.in, helamba1@in.ibm.com, joshi@cs.umbc.edu

ABSTRACT

In today's world, online social media plays a vital role during real world events, especially crisis events. Social media coverage of events has both positive and negative effects: it can be used by authorities for effective disaster management or by malicious entities to spread rumors and fake news. The aim of this paper is to highlight the role of Twitter during Hurricane Sandy (2012) in spreading fake images about the disaster. We identified 10,350 unique tweets containing fake images that were circulated on Twitter during Hurricane Sandy. We performed a characterization analysis to understand the temporal, social reputation and influence patterns of the spread of fake images. Eighty-six percent of the tweets spreading the fake images were retweets, hence very few were original tweets. Our results showed that the top thirty users out of 10,215 users (0.3%) resulted in 90% of the retweets of fake images; also, network links such as follower relationships on Twitter contributed very little (only 11%) to the spread of these fake photo URLs. Next, we used classification models to distinguish fake images from real images of Hurricane Sandy. The best results were obtained with the Decision Tree classifier, which achieved 97% accuracy in predicting fake images from real. Also, tweet based features were very effective in distinguishing fake image tweets from real, while the performance of user based features was very poor. Our results showed that automated techniques can be used in identifying real images from fake images posted on Twitter.

Categories and Subject Descriptors

H.4 [Information Systems Applications]: Miscellaneous; D.2.8 [Software Engineering]: Metrics - complexity measures, performance measures

Keywords

Online social media, Twitter, crisis, fake pictures

Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author's site if the Material is used in electronic media. WWW 2013 Companion, May 13-17, 2013, Rio de Janeiro, Brazil. ACM 978-1-4503-2038-2/13/05.

1. INTRODUCTION

Over the past few years there has been an increase in the usage of Online Social Media (OSM) services as a medium for people to share, coordinate and spread information about events while they are going on. Though a large volume of content is posted on OSM, not all of the information is of good quality with respect to the event; it may be fake, incorrect or noisy. Extracting good quality information is one of the biggest challenges in utilizing information from OSM. Over the last few years, people have highlighted how OSM can be used to help in extracting useful information about real life events. On the other hand, there have been many instances which have highlighted the negative effects of content on online social media on real life events. Because the information on social media such as Twitter is shared and accessed in real-time, any malicious activity, like spreading fake images and rumors, needs to be detected and curbed immediately. Such false and incorrect information can lead to chaos and panic among people on the ground. Since detecting whether posted images are fake using traditional image analysis methods can be highly time and resource consuming, we explore the option of using Twitter specific features, like the content of the tweet and the user details, in identifying fake images from real.

Hurricane Sandy: Hurricane Sandy caused mass destruction and turmoil in and around the USA from October 22nd to October 31st, 2012. According to NBC News, the death toll in the U.S. was 109, including at least 40 in New York City. NBC also reported that damages from Hurricane Sandy exceeded $50 billion. Online social media such as Twitter and Facebook were widely used by people to keep abreast of the latest updates on the storm.¹ Social media was also widely exploited by malicious entities during Sandy to spread rumors and fake pictures in real-time.²,³ Such fake images and news became extremely viral on OSM and caused panic and chaos among the people affected by the hurricane. Hence, it is an ideal event for analyzing the spread and impact of fake and incorrect information on social media. Figure 1 shows some of the fake images that were spread during Hurricane Sandy, which we also found in our dataset.

¹ http://www.guardian.co.uk/world/us-news-blog/2013/feb/20/mta-conedison-hurricane-sandy-social-media-week
² http://news.yahoo.com/10-fake-photos-hurricane-sandy-075500934.html
³ http://www.guardian.co.uk/news/datablog/2012/nov/06/fake-sandy-pictures-social-media

Figure 1: Some of the fake pictures of Hurricane Sandy that were shared on Twitter. (a) Picture of a shark in New Jersey (b) Faked image of a stormy New York skyline (c) Another picture of a shark in the streets.

There is a dire need to build automated solutions that can help people judge the quality of information appearing on OSM in real-time. The aim of this work is to characterize and identify the propagation of fake pictures on OSM, specifically Twitter. These fake images created panic and chaos among the people. The effect of spreading such false information can be multifold in crisis situations. Hence, we analyzed the propagation of fake image URLs during Hurricane Sandy. The power and impact of online social media in shaping real world events has been widely studied by researchers across the globe. To the best of our knowledge, this is the first paper to study the diffusion and spread of fake pictures on OSM. The main contributions of this work are:

• We performed an in-depth characterization of tweets sharing fake images on Twitter during Hurricane Sandy. We found that the tweets containing the fake image URLs were mostly retweets (86%), hence very few users posted original tweets with fake images. Also, we found that the social network of a user on Twitter had little impact on making these fake images viral: there was just 11% overlap between the retweet and follower graphs of tweets containing fake images.

• We used classification algorithms to distinguish between tweets containing fake and real images. We primarily used two kinds of features: user level and tweet level features. The best accuracy of 97% was achieved using a decision tree classifier with tweet based features.

The rest of the paper is organized as follows: Section 2 describes work closely related to this paper. Section 3 explains the methodology that we used in collecting data, and in analyzing and classifying the tweets. Section 4 describes the analysis performed. Section 5 summarizes the results from our analysis and highlights their implications. The last section presents the limitations and future work of the paper.

2. RELATED WORK

2.1 Role of OSM during Real World Events

The role of social media has been analyzed by computer scientists, psychologists and sociologists for its impact in the real world. OSM has progressed from being merely a medium to share users' opinions, to an information sharing and dissemination agent, to a means of propagating and coordinating relief and response efforts. Palen et al. presented a path breaking vision of how Internet resources (technology and crowd based) can be used for support and assistance during mass emergencies and disasters [20]. They viewed people collectively as an important resource that can play a critical role in a crisis. In a follow-up to the above research proposal, Palen et al. studied two real world events to understand and characterize the widescale interaction on social networking websites with respect to the events [21]. The two events they considered were the Northern Illinois University (NIU) shootings of February 14, 2008 and the Virginia Tech (VT) tragedy 10 months earlier. Sakaki et al. used tweets as social sensors to detect earthquake events. They developed a probabilistic spatio-temporal model for predicting the center and trajectory of an event using Kalman and particle filtering techniques. Based upon these models, they created an earthquake reporting application for Japan, which detected earthquake occurrences based on tweets and sent users alert emails [23]. In a different work, Sakaki et al. analyzed tweet trends to extract the events that happen during a crisis from the Twitter log of user activity; they analyzed Japanese tweets on all earthquakes during 2010-2011 [24]. Among the prominent results they obtained via statistical analysis: tweets from feature phones and smartphones were dominant just after the earthquake, while tweets from PCs were dominant in less-damaged areas. Cheong et al. performed social network analysis on Twitter data during the Australian floods of 2011 to identify active players and their effectiveness in disseminating critical information [6].

Work has been done to extract situational awareness information from the vast amount of data posted on OSM during real-world events. Vieweg et al. analyzed the Twitter logs for the Oklahoma Grassfires (April 2009) and the Red River Floods (March and April 2009) for the presence of situational awareness content. Vieweg et al. also developed an automated framework to enhance situational awareness during emergency situations. They extracted geo-location and location-referencing information from users' tweets, which helped in increasing situational awareness during emergency events [26]. Verma et al. used natural language techniques to build an automated classifier to detect messages on Twitter that may contribute to situational awareness [25]. Another closely related work was done by Oh et al., who analyzed the Twitter stream during the 2008 Mumbai terrorist attacks [19]. Their analysis showed how information available on online social media during the attacks aided the terrorists in their decision making by increasing their social awareness. Corvey et al. analyzed one of the important aspects of applying computational techniques and algorithms to social media data to obtain useful information from social media content, i.e. linguistic and behavioral annotations [8]. One important conclusion they reached was that during emergency situations, users use a specific vocabulary to convey tactical information on Twitter, as indicated by the accuracy achieved using a bag-of-words model for classifying situational awareness tweets. Mendoza et al. used data from the 2010 earthquake in Chile to explore the behavior of Twitter users for emergency response activity [15]. Their results showed that the propagation of tweets related to rumors differed from that of true news, and that this difference could be used to develop automated classification solutions to identify correct information. Longueville et al. analyzed Twitter feeds during the Marseille forest fire event in France. They showed that information from location based social networks can be used to acquire spatio-temporal data that can be analyzed to provide useful localized information about the event [9]. A team at National ICT Australia Ltd. (NICTA) has been working on developing a focused search engine for Twitter and Facebook that can be used in humanitarian crisis situations.⁴ Hughes et al. compared the properties of tweets and users during an emergency to normal situations [1]. They performed empirical and statistical analysis on data collected during disaster events and showed an increase in the use of URLs in tweets and a decrease in @-mentions during emergency situations.

2.2 Assessing Quality of Information on OSM

The presence of spam, compromised accounts, malware, and phishing attacks are major concerns with respect to the quality of information on Twitter. Techniques to filter out spam / phishing on Twitter have been studied and various effective solutions have been proposed. Chhabra et al. highlighted the role of URL shortener services like bit.ly⁵ in spreading phishing; their results showed that URL shorteners are used not only for saving space but also for hiding the identity of phishing links [7]. In a follow-up study, Aggarwal et al. further analyzed and identified features that indicate phishing tweets [2]. Using them, they detected phishing tweets with an accuracy of 92.52%. One of the major contributions of their work was the Chrome extension they developed and deployed for real-time phishing detection on Twitter. Grier et al. characterized spam spread on Twitter via URLs. They found that 8% of 25 million URLs posted on Twitter point to phishing, malware, and scams listed on popular blacklists [12]. Ghosh et al. characterized link farming on Twitter and also proposed a methodology to combat it [11]. Yang et al. analyzed the community or ecosystem of cyber criminals and their supporters on Twitter [28]. Yardi et al. applied machine learning techniques to identify spammers [29]. They used the features (1) searches for URLs; (2) username pattern matches; and (3) keyword detection, and obtained 91% accuracy. Benevenuto et al. classified real YouTube users as spammers, promoters, and legitimate users [3]. They used supervised machine learning algorithms to detect promoters and spammers; they achieved higher accuracy for detecting promoters, while the algorithms were less effective for detecting spammers. Nazir et al. provided an insightful characterization of phantom profiles for gaming applications on Facebook [17]. They proposed a classification framework using an SVM classifier for distinguishing phantom profiles from real profiles based on certain social network related features.

⁴ http://leifhanlen.wordpress.com/2011/07/22/crisis-management-using-twitter-and-facebook-for-the-greater-good/
⁵ https://bitly.com/

Now, we discuss some of the research work done to assess, characterize, analyze and compute the trust and credibility of content on online social media. Truthy⁶ was developed by Ratkiewicz et al. to study information diffusion on Twitter and compute a trustworthiness score for a public stream of micro-blogging updates related to an event, in order to detect political smears, astroturfing, misinformation, and other forms of social pollution [22]. It works on real-time Twitter data with three months of data history. Castillo et al. showed that automated classification techniques can be used to distinguish news topics from conversational topics, and assessed their credibility based on various Twitter features [5]. They achieved a precision and recall of 70-80% using the J48 decision tree classification algorithm. They evaluated their results against data annotated by humans as ground truth. Canini et al. analyzed the usage of automated ranking strategies to measure the credibility of sources of information on Twitter for any given topic [4]. The authors define a credible information source as one which has trust and domain expertise associated with it. Gupta et al., in their work on analyzing tweets posted during the terrorist bomb blasts in Mumbai (India, 2011), showed that the majority of sources of information are unknown and have low Twitter reputation (a small number of followers) [14]. This highlights the difficulty in measuring the credibility of information and the need to develop automated mechanisms to assess the credibility of information on Twitter. In a follow-up study, the authors applied machine learning algorithms (SVM Rank) and information retrieval techniques (relevance feedback) to assess the credibility of content on Twitter [13]. They analyzed fourteen high impact events of 2011; their results showed that on average 30% of the total tweets posted about an event contained situational information about the event while 14% was spam. Only 17% of the total tweets posted about the event contained situational awareness information that was credible. Another, very similar work was done by Xia et al. on tweets generated during the England riots of 2011 [27]. They used a supervised Bayesian Network method to predict the credibility of tweets in emergency situations. O'Donovan et al. focused their work on finding indicators of credibility during different situations (tweets from 8 separate events were considered). Their results showed that the best indicators of credibility were URLs, mentions, retweets and tweet length [18]. A different methodology from the above papers was followed by Morris et al., who conducted a survey to understand users' perceptions regarding the credibility of content on Twitter [16]. They asked about 200 participants to mark what they consider to be indicators of the credibility of content and users on Twitter. They found that the prominent features based on which users judge credibility are features visible at a glance, for example, the username and picture of a user. Another approach to detecting users with high credibility and trustworthiness was taken by Ghosh et al., who identified topic based experts on Twitter [10]. Their techniques rely on the wisdom of the Twitter crowds - i.e. they used the Twitter Lists feature to identify experts in various topics.
⁶ http://truthy.indiana.edu/

3. METHODOLOGY

In this section, we discuss our research methodology in detail. First we describe the methodology of collecting data from Twitter, followed by the various analytical techniques applied in this paper.

3.1 Data

For data collection from Twitter we have a 24x7 setup, which has been functional for about the last 20 months. We collected data from Twitter using the Streaming API.⁷ This API enables researchers to extract tweets in real-time, based on certain query parameters like words in the tweet, time of posting of the tweet, etc. We queried the Twitter Trends API every hour for the current trending topics,⁸ and collected tweets corresponding to these topics by using them as query search words for the Streaming API. Hurricane Sandy's impact lasted from Oct. 20th to Nov. 1st, 2012; hence, from all the tweets collected during this period, we selected the tweets containing the words 'sandy' and 'hurricane'. This yielded about 1.8 million tweets by about 1.2 million unique users on Hurricane Sandy from Oct. 20th to Nov. 1st, 2012. Table 1 gives the descriptive statistics of the tweets and users data collected for the event, and Figure 2 shows the spatial distribution of these tweets (about 19K tweets had geo-location embedded in them).

⁷ https://dev.twitter.com/docs/streaming-api
⁸ https://dev.twitter.com/docs/api/1/get/trends

Table 1: Descriptive statistics of the Twitter dataset for Hurricane Sandy.

  Total tweets          1,782,526
  Total unique users    1,174,266
  Tweets with URLs        622,860

Figure 2: Spatial distribution of total tweets on Hurricane Sandy. Here we have plotted about 19K tweets, which had embedded geo-location data in them.

Using certain online resources (articles, tweets and blogs) we were able to identify certain URLs that belonged to fake pictures of Hurricane Sandy. One of the prominent data sources we used was the list of fake and real images made public by the Guardian news media company.⁹ The list provided by the Guardian classified the top image URLs shared during the hurricane as fake or real image URLs, which we used to form our dataset. There were many other articles and blogs that covered the real and fake images that were spread on Twitter.¹⁰,¹¹,¹² Table 2 describes the statistics for data related to tweets containing fake and real image URLs. We identified eight unique fake images of Sandy that were spread on Twitter in our dataset, and we collected about 10K tweets for these URLs.

⁹ http://www.guardian.co.uk/news/datablog/2012/nov/06/fake-sandy-pictures-social-media
¹⁰ http://now.msn.com/hurricane-sandy-fake-photos
¹¹ http://mashable.com/2012/10/29/fake-hurricane-sandy-photos/
¹² http://theweek.com/article/index/235578/10-fake-photos-of-hurricane-sandy

Table 2: Descriptive statistics of the tweets with fake and real image URLs.

  Tweets with fake images    10,350
  Users with fake images     10,215
  Tweets with real images     5,767
  Users with real images      5,678

3.2 Characterization Analysis

We performed a characterization of the tweets containing fake image URLs and of their propagation, to understand how they became viral. First we performed temporal analysis on the fake image tweets. We analyzed how many such tweets were shared per hour on Twitter. Also, we analyzed the sudden peaks (from hour x1 to hour x1+1) in the graph more closely. We constructed the retweet graph for the sudden peak in the temporal analysis, to find out what changes in the network topology led to the viral spread of these images. We obtained certain useful insights about the nature and spread of fake image URLs on Twitter, which are summarized in the next section.

Next, we analyzed what role the social network graph of a user on Twitter plays in the propagation of fake URLs. The explicit social network of a user on Twitter is his follower graph. We wanted to analyze what percentage of information diffusion takes place via this follower network graph of a user. The details of the algorithm used to compute this are summarized in Algorithm 1.

Algorithm 1 Compute_Overlap
   1: Create_Graph_Retweets()
   2: Create_Graph_Followers()
   3: for each edge in the retweet network do
   4:     num_retweet_edges++
   5:     Insert edge into hashmap, H[1..n]
   6: end for
   7: for each edge in the follower network do
   8:     Insert each edge in hashmap, H[1..n]
   9:     if collision then
  10:         intersections++
  11:     end if
  12: end for
  13: % overlap = (intersections / num_retweet_edges) × 100

In the function Create_Graph_Followers, we crawled the follower network of all the unique users that had tweeted the fake images, using the REST API of Twitter. The network created had 10,779,122 edges and 10,215 nodes. In Create_Graph_Retweets, we created a retweet network, where an edge between two nodes exists if one user had retweeted the other's tweet. A hashmap, H[1..n], is created to compute the overlap between the follower and retweet graphs.

3.3 Classification Analysis

We analyzed the effectiveness of machine learning algorithms in detecting tweets containing fake image URLs versus tweets containing real images of Sandy. We performed two-class classification using Naive Bayes and J48 Decision Tree classifiers. We had a dataset of 10,350 tweets containing fake image URLs and 5,767 tweets containing real image URLs. To avoid any bias due to the unequal size of the classes, we randomly selected 5,767 tweets from the fake image tweets, and then applied classification.

We used two kinds of features for the classification algorithm. Table 3 summarizes the features we computed for each tweet and the user of the tweet.

• Source or user level features [F1]: The attributes of the user who posted the tweet. We consider properties such as the number of friends, followers and status messages of the user as part of this set.

• Content or tweet level features [F2]: The 140 characters posted by users contain data (e.g. words, URLs, hashtags) and meta-data (e.g. whether the tweet is a reply or a retweet) related to it.

Table 3: User and tweet based features used for classification of fake and real images of Sandy.

  User Features [F1]
    Number of Friends
    Number of Followers
    Follower-Friend Ratio
    Number of times listed
    User has a URL
    User is a verified user
    Age of user account

  Tweet Features [F2]
    Length of Tweet
    Number of Words
    Contains Question Mark?
    Contains Exclamation Mark?
    Number of Question Marks
    Number of Exclamation Marks
    Contains Happy Emoticon
    Contains Sad Emoticon
    Contains First Order Pronoun
    Contains Second Order Pronoun
    Contains Third Order Pronoun
    Number of uppercase characters
    Number of negative sentiment words
    Number of positive sentiment words
    Number of mentions
    Number of hashtags
    Number of URLs
    Retweet count

4. RESULTS

In this section, we summarize the results obtained for the characterization and classification analysis performed.

4.1 Characterization Results

We found that out of the 10,350 tweets we identified as containing fake image URLs, about 86% were retweets. That is, only about 14% of these tweets were original posts containing such URLs. From the temporal analysis, we plotted the per hour tweeting activity of the fake image URLs. From Figure 3 we see that the spread of the fake URLs spikes about 12 hours after the introduction of the URLs into the Twitter network. We now analyze the spread of these picture URLs one hour before and after the spike. We construct the reply and retweet graph for the tweets sharing these fake picture URLs on October 29th, at 2100 hours and 2200 hours, as shown in Figure 5. We see that there are only a few users with very high degree, that is, only a few users account for the majority of the retweets. We confirmed this statistically: Figure 4 (CDF) shows that the top 30 users (0.3% of the users) resulted in 90% of the retweets of the fake images. Combining results from both graphs, we conclude that the fake URLs were present in the Twitter network for almost 12 hours before they became viral, and that the sudden spike in their propagation via retweets happened only because of a few users.

Figure 3: Details of data collected for the fake image URL sharing. Temporal distribution of tweets, hour wise, starting from the first hour that a fake image tweet was posted.

Figure 4: CDF of retweets of the fake image tweets by the users. (a) All users (b) Top 30 users. It shows that the top 30 users (0.3% of the users) resulted in 90% of the retweets of the fake images.

Next, we determine the role of the Twitter network graph in the retweet propagation of the fake image tweets. We ran the Compute_Overlap algorithm discussed above. We found the number of overlapping edges to be 1,215, which leads to a percentage overlap of 11% between the retweet and follower graphs. Table 4 summarizes the results of the Compute_Overlap algorithm. This indicates that only a very limited share of the retweet activity originated from the people in a user's follower graph. Hence, in cases of crisis, people often retweet and propagate tweets that they find in Twitter search or trending topics, irrespective of whether they follow the user or not.

Table 4: Results of the algorithm Compute_Overlap. We found only 11% overlap between the follower and retweet graphs for the tweets containing fake images.

  Total edges in the retweet network                                               10,508
  Total edges in the follower-followee network                                 10,799,122
  Total edges that exist in both the retweet and the follower-followee network      1,215
  %age overlap                                                                        11%

4.2 Classification Results

In the above section, we characterized the properties and behavior associated with the spread of false information, in the form of fake images, on Twitter. The next important step is to explore features and algorithms that can effectively help us in identifying the fake content in real-time. We performed 10-fold cross validation while applying the classification models. We applied two standard algorithms used for classification: Naive Bayes and Decision Tree (J48). As described before, we took 5,767 tweets each for the fake and real image tweets. For each data point, we created user and tweet level feature vectors. Table 5 summarizes the results from the classification experiment. With tweet based features, we achieve a good accuracy of above 90% for both classifiers, though Decision Tree outperforms the Naive Bayes classifier. We can also see that user based features provide very poor accuracy in distinguishing fake image URLs, while tweet based features performed very well. We would also like to mention that the high accuracy we obtained may be attributed to the similar nature of many tweets (since a lot of the tweets in our dataset are retweets of other tweets). We can conclude that content and property analysis of tweets can help us in identifying real image URLs being shared on Twitter with high accuracy.

Table 5: Classification results for tweets containing fake images and real images. Our results showed that tweet based features are more effective in distinguishing the two classes.

                   F1        F2        F1+F2
  Naive Bayes      56.32%    91.97%    91.52%
  Decision Tree    53.24%    97.65%    96.65%

5. DISCUSSION

Online social media can play the role of either a life saver or a daemon during times of crisis. In this research work, we highlighted one of the malicious uses of Twitter during a real-world event. We analyzed the activity on the online social networking website Twitter during Hurricane Sandy (2012) that spread fake images. We identified 10,350 unique tweets containing fake images that were circulated on Twitter during Hurricane Sandy. We performed a characterization analysis to understand the temporal, social reputation and influence patterns of the spread of these fake images. We found that 86% of the tweets spreading the fake images were retweets, hence very few were original tweets by users. Also, our results showed that the top 30 users (0.3% of the users) resulted in 90% of the retweets of the fake images. Hence, we can conclude that only a handful of users contributed to the majority of the damage, via retweeting activity on Twitter. We analyzed the role of the Twitter social graph in propagating the fake images. We crawled the network links, that is, the follower relationships of the users, and applied our algorithm to compute the overlap. We found only an 11% overlap between the retweet and follower graphs for the users who tweeted fake images of Sandy. This result highlights the fact that, at the time of a crisis, users retweet information from other users irrespective of whether they follow them or not. Next, we used classification models to distinguish fake images from real images of Hurricane Sandy. The best results were obtained with the Decision Tree classifier, which achieved 97% accuracy in predicting fake images from real. Tweet based features are very effective in distinguishing fake image tweets from real, while the performance of user based features was very poor. Our research work provided insights into the behavioral pattern of the spread of fake image tweets. Also, our results provided a proof of concept that automated techniques can be used in identifying real images from fake images posted on Twitter.

6. FUTURE WORK

Our work provides a proof of concept that automated techniques can be used to identify malicious or fake content spread on Twitter during real world events. We would like to conduct a larger study with more events for the identification of fake images and news propagation. Also, we would like to expand our study to detecting rumors and other malicious content, apart from images, spread during real world events. As a next step, we would like to develop a browser plug-in that can detect fake images being shared on Twitter in real-time.
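The overlap computation of Algorithm 1 (Section 3.2) could be sketched in Python roughly as follows. This is a minimal illustration and not the implementation used in the study: the function name compute_overlap, the representation of each edge as a directed (source, target) pair of user IDs, and the removal of duplicate edges are all assumptions made for the sketch, and a Python set stands in for the hashmap H.

```python
from typing import Hashable, Iterable, Tuple

# Assumed edge representation: (retweeter, original_author) for the retweet
# network and (follower, followee) for the follower network.
Edge = Tuple[Hashable, Hashable]


def compute_overlap(retweet_edges: Iterable[Edge],
                    follower_edges: Iterable[Edge]) -> float:
    """Return the percentage of retweet edges that are also follower edges."""
    # Lines 3-6 of Algorithm 1: store every retweet edge in a hash structure.
    # Using a set also removes duplicate edges (an assumption of this sketch).
    retweet_set = set(retweet_edges)
    if not retweet_set:
        return 0.0  # avoid dividing by zero for an empty retweet network

    # Lines 7-12: a "collision" is a follower edge already present in the set.
    intersections = sum(1 for edge in set(follower_edges) if edge in retweet_set)

    # Line 13: percentage overlap relative to the number of retweet edges.
    return 100.0 * intersections / len(retweet_set)


if __name__ == "__main__":
    # Toy data, purely illustrative.
    retweets = [("a", "b"), ("c", "b"), ("d", "b"), ("e", "a")]
    followers = [("a", "b"), ("e", "a"), ("x", "y")]
    print(compute_overlap(retweets, followers))  # 2 of 4 retweet edges -> 50.0
```

Applied to the counts reported in Table 4, the same ratio is 1,215 / 10,508, or about 11.6%, consistent with the roughly 11% overlap reported above.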
Figure 5: Spread of fake picture URLs (retweet and reply graph); the number on each node is the user profile ID on Twitter. The figure shows that the fake images became viral very fast: within an hour there was a tremendous growth in the number of people tweeting them. (a) Oct. 29, 2100 hours (b) Oct. 29, 2200 hours.

7. ACKNOWLEDGMENTS

We would like to thank the Government of India for funding this project. We would like to express our sincerest thanks to all members of the PreCog research group at IIIT,¹³ Delhi, for their continued support and feedback on the project.

¹³ precog.iiitd.edu.in

8. REFERENCES

[1] Amanda L. Hughes and Leysia Palen. Twitter Adoption and Use in Mass Convergence and Emergency Events. ISCRAM Conference, 2009.
[2] Anupama Aggarwal, Ashwin Rajadesingan, and Ponnurangam Kumaraguru. PhishAri: Automatic realtime phishing detection on twitter. 7th IEEE APWG eCrime Researchers Summit (eCRS), 2012.
[3] Fabrício Benevenuto, Tiago Rodrigues, Virgílio Almeida, Jussara Almeida, and Marcos Gonçalves. Detecting spammers and content promoters in online video social networks. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, SIGIR '09, pages 620-627, New York, NY, USA, 2009. ACM.
[4] Kevin R. Canini, Bongwon Suh, and Peter L. Pirolli. Finding credible information sources in social networks based on content and social structure. In SocialCom, 2011.
[5] Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. Information credibility on twitter. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 675-684, New York, NY, USA, 2011. ACM.
[6] France Cheong and Christopher Cheong. Social media data mining: A social network analysis of tweets during the 2010-2011 australian floods. In PACIS, 2011.
[7] Sidharth Chhabra, Anupama Aggarwal, Fabricio Benevenuto, and Ponnurangam Kumaraguru. Phi.sh/$ocial: the phishing landscape through short urls. In Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference, CEAS '11, pages 92-101, New York, NY, USA, 2011. ACM.
[8] William J. Corvey, Sudha Verma, Sarah Vieweg, Martha Palmer, and James H. Martin. Foundations of a multilayer annotation framework for twitter communications during crisis events. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC '12), Istanbul, Turkey, May 2012. European Language Resources Association (ELRA).
[9] Bertrand De Longueville, Robin S. Smith, and Gianluca Luraschi. "omg, from here, i can see the flames!": a use case of mining location based social networks to acquire spatio-temporal data on forest fires. In Proceedings of the 2009 International Workshop on Location Based Social Networks, LBSN '09, pages 73-80, New York, NY, USA, 2009. ACM.
[10] Saptarshi Ghosh, Naveen Sharma, Fabricio Benevenuto, Niloy Ganguly, and Krishna Gummadi. Cognos: crowdsourcing search for topic experts in microblogs. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, SIGIR '12, 2012.
[11] Saptarshi Ghosh, Bimal Viswanath, Farshad Kooti, Naveen Kumar Sharma, Gautam Korlam, Fabricio Benevenuto, Niloy Ganguly, and Krishna Phani Gummadi. Understanding and combating link farming in the twitter social network. In Proceedings of the 21st international conference on World Wide Web, WWW '12, 2012.
[12] Chris Grier, Kurt Thomas, Vern Paxson, and Michael Zhang. @spam: the underground on 140 characters or less. In Proceedings of the 17th ACM conference on Computer and communications security, CCS '10, pages 27-37, New York, NY, USA, 2010. ACM.
[13] Aditi Gupta and Ponnurangam Kumaraguru. Credibility ranking of tweets during high impact events. In Proceedings of the 1st Workshop on Privacy and Security in Online Social Media, PSOSM '12, pages 2:2-2:8, New York, NY, USA, 2012. ACM.
[14] Aditi Gupta and Ponnurangam Kumaraguru. Twitter explodes with activity in mumbai blasts! a lifeline or an unmonitored daemon in the lurking? IIIT, Delhi, Technical report, IIITD-TR-2011-005, 2011.
[15] Marcelo Mendoza, Barbara Poblete, and Carlos Castillo. Twitter under crisis: can we trust what we rt? In Proceedings of the First Workshop on Social Media Analytics, SOMA '10, pages 71-79, New York, NY, USA, 2010. ACM.
[16] Meredith Ringel Morris, Scott Counts, Asta Roseway, Aaron Hoff, and Julia Schwarz. Tweeting is believing?: understanding microblog credibility perceptions. In Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work, CSCW '12, pages 441-450, New York, NY, USA, 2012. ACM.
[17] Atif Nazir, Saqib Raza, Chen-Nee Chuah, and Burkhard Schipper. Ghostbusting facebook: detecting and characterizing phantom profiles in online social gaming applications. In Proceedings of the 3rd conference on Online social networks, WOSN '10, 2010.
[18] J. O'Donovan, B. Kang, G. Meyer, T. Höllerer, and S. Adali. Credibility in context: An analysis of feature distributions in twitter. ASE/IEEE International Conference on Social Computing, SocialCom, 2012.
[19] Onook Oh, Manish Agrawal, and H. Raghav Rao. Information control and terrorism: Tracking the mumbai terrorist attack through twitter. Information Systems Frontiers, 13(1):33-43, March 2011.
[20] Leysia Palen, Kenneth M. Anderson, Gloria Mark, James Martin, Douglas Sicker, Martha Palmer, and Dirk Grunwald. A vision for technology-mediated support for public participation & assistance in mass emergencies & disasters. In Proceedings of the 2010 ACM-BCS Visions of Computer Science Conference, ACM-BCS '10, 2010.
[21] Leysia Palen and Sarah Vieweg. The emergence of online widescale interaction in unexpected events: assistance, alliance & retreat. In Proceedings of the 2008 ACM conference on Computer supported cooperative work, CSCW '08, pages 117-126, New York, NY, USA, 2008. ACM.
[22] Jacob Ratkiewicz, Michael Conover, Mark Meiss, Bruno Goncalves, Snehal Patil, Alessandro Flammini, and Filippo Menczer. Truthy: mapping the spread of astroturf in microblog streams. WWW '11, 2011.
[23] Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. Earthquake shakes twitter users: real-time event detection by social sensors. In Proceedings of the 19th international conference on World wide web, WWW '10, pages 851-860, New York, NY, USA, 2010. ACM.
[24] Takeshi Sakaki, Fujio Toriumi, and Yutaka Matsuo. Tweet trend analysis in an emergency situation. In Proceedings of the Special Workshop on Internet and Disasters, SWID '11, pages 3:1-3:8, New York, NY, USA, 2011. ACM.
[25] Sudha Verma, Sarah Vieweg, William Corvey, Leysia Palen, James H. Martin, Martha Palmer, Aaron Schram, and Kenneth Mark Anderson. Natural language processing to the rescue? extracting "situational awareness" tweets during mass emergency. In Lada A. Adamic, Ricardo A. Baeza-Yates, and Scott Counts, editors, ICWSM. The AAAI Press, 2011.
[26] Sarah Vieweg, Amanda L. Hughes, Kate Starbird, and Leysia Palen. Microblogging during two natural hazards events: what twitter may contribute to situational awareness. In Proceedings of the 28th international conference on Human factors in computing systems, CHI '10, pages 1079-1088, New York, NY, USA, 2010. ACM.
[27] Xin Xia, Xiaohu Yang, Chao Wu, Shanping Li, and Linfeng Bao. Information credibility on twitter in emergency situation. In Proceedings of the 2012 Pacific Asia conference on Intelligence and Security Informatics, PAISI '12, 2012.
[28] Chao Yang, Robert Harkreader, Jialong Zhang, Seungwon Shin, and Guofei Gu. Analyzing spammers' social networks for fun and profit: a case study of cyber criminal ecosystem on twitter. In Proceedings of the 21st international conference on World Wide Web, WWW '12, 2012.
[29] Sarita Yardi, Daniel Romero, Grant Schoenebeck, and Danah Boyd. Detecting spam in a Twitter network. First Monday, 15(1), January 2010.