Autograph: Toward Automated, Distributed Worm Signature Detection

Hyang-Ah Kim, hakim@cs.cmu.edu, Carnegie Mellon University
Brad Karp, brad.n.karp@intel.com, bkarp@cs.cmu.edu, Intel Research / Carnegie Mellon University

Abstract

Today's Internet intrusion detection systems (IDSes) monitor edge networks' DMZs to identify and/or filter malicious flows. While an IDS helps protect the hosts on its local edge network from compromise and denial of service, it cannot alone effectively intervene to halt and reverse the spreading of novel Internet worms. Generation of the worm signatures required by an IDS (the byte patterns sought in monitored traffic to identify worms) today entails non-trivial human labor, and thus significant delay: as network operators detect anomalous behavior, they communicate with one another and manually study packet traces to produce a worm signature. Yet intervention must occur early in an epidemic to halt a worm's spread. In this paper, we describe Autograph, a system that automatically generates signatures for novel Internet worms that propagate using TCP transport. Autograph generates signatures by analyzing the prevalence of portions of flow payloads, and thus uses no knowledge of protocol semantics above the TCP level. It is designed to produce signatures that exhibit high sensitivity (high true positives) and high specificity (low false positives); our evaluation of the system on real DMZ traces validates that it achieves these goals. We extend Autograph to share port scan reports among distributed monitor instances, and using trace-driven simulation, demonstrate the value of this technique in speeding the generation of signatures for novel worms. Our results elucidate the fundamental trade-off between early generation of signatures for novel worms and the specificity of these generated signatures.

1 Introduction and Motivation

In recent years, a series of Internet worms has exploited the confluence of the relative lack of diversity in system and server software run by Internet-attached hosts, and the ease with which these hosts can communicate. A worm program is self-replicating: it remotely exploits a software vulnerability on a victim host, such that the victim becomes infected, and itself begins remotely infecting other victims. The severity of the worm threat goes far beyond mere inconvenience. The total cost of the Code Red worm epidemic, as measured in lost productivity owing to interruptions in computer and network services, is estimated at $2.6 billion [7].

Motivated in large part by the costs of Internet worm epidemics, the research community has investigated worm propagation and how to thwart it. Initial investigations focused on case studies of the spreading of successful worms [8], and on comparatively modeling diverse propagation strategies future worms might use [18, 21]. More recently, researchers' attention has turned to methods for containing the spread of a worm. Broadly speaking, three chief strategies exist for containing worms by blocking their connections to potential victims: discovering ports on which worms appear to be spreading, and filtering all traffic destined for those ports; discovering source addresses of infected hosts and filtering all traffic (or perhaps traffic destined for a few ports) from those source addresses; and discovering the payload content string that a worm uses in its infection attempts, and filtering all flows whose payloads contain that content string.

Detecting that a worm appears to be active on a particular port [22] is a useful first step toward containment, but is often too blunt an instrument to be used alone; simply blocking all traffic for port 80 at edge networks across the Internet shuts down the entire web when a worm that targets web servers is released. Moore et al. [9] compared the relative efficacy of source-address filtering and content-based filtering. Their results show that content-based filtering of infection attempts slows the spreading of a worm more effectively: to confine an epidemic within a particular target fraction of the vulnerable host population, one may begin content-based filtering far later after the release of a worm than address-based filtering. Motivated by the efficacy of content-based filtering, we seek in this paper to answer the complementary question unanswered in prior work: how should one obtain worm content signatures for use in content-based filtering?

Here, a signature is a tuple (IP-proto, dst-port, byteseq), where IP-proto is an IP protocol number, dst-port is a destination port number for that protocol, and byteseq is a variable-length, fixed sequence of bytes.¹ Content-based filtering consists of matching network flows (possibly requiring flow reassembly) against signatures; a match occurs when byteseq is found within the payload of a flow using the IP-proto protocol destined for dst-port. We restrict our investigation to worms that propagate over TCP in this work, and thus hereafter consider signatures as (dst-port, byteseq) tuples.

Today, there exist TCP-flow-matching systems that are “consumers” of these sorts of signatures. Intrusion detection systems (IDSes), such as Bro [11] and Snort [19], monitor all incoming traffic at an edge network's DMZ, perform TCP flow reassembly, and search for known worm signatures. These systems log the occurrence of inbound worm connections they observe, and can be configured (in the case of Bro) to change access control lists in the edge network's router(s) to block traffic from source IP addresses that have sent known worm payloads. Cisco's NBAR system [3] for routers searches for signatures in flow payloads, and blocks flows on the fly whose payloads are found to contain known worm signatures. We limit the scope of our inquiry to the detection and generation of signatures for use by these and future content-based filtering systems.

It is important to note that all the content-based filtering systems use databases of worm signatures that are manually generated: as network operators detect anomalous behavior, they communicate with one another, manually study packet traces to produce a worm signature, and publish that signature so that it may be added to IDS systems' signature databases. This labor-intensive, human-mediated process of signature generation is slow (on the order of hours or longer), and renders today's IDSes unhelpful in stemming worm epidemics: by the time a signature has been found manually by network operators, a worm may already have compromised a significant fraction of vulnerable hosts on the Internet.

We seek to build a system that automatically, without foreknowledge of a worm's payload or time of introduction, detects the signature of any worm that propagates by randomly scanning IP addresses. We assume the system monitors all inbound network traffic at an edge network's DMZ. Autograph, our worm signature detection system, has been designed to meet that goal. The system consists of three interconnected modules: a flow classifier, a content-based signature generator, and tattler, a protocol through which multiple distributed Autograph monitors may share information, in the interest of speeding detection of a signature that matches a newly released worm.

In our evaluation of Autograph, we explore two important themes. First, there is a trade-off between early detection of worm signatures and avoiding generation of signatures that cause false positives. Intuitively, early in an epidemic, worm traffic is less of an outlier against the background of innocuous traffic. Thus, targeting early detection of worm signatures increases the risk of mistaking innocuous traffic for worm traffic, and producing signatures that incur false positives. Second, we demonstrate the utility of distributed, collaborative monitoring in speeding detection of a novel worm's signature after its release.

In the remainder of this paper, we proceed as follows: In the next section, we catalog the goals that drove Autograph's design. In Section 3, we describe the detailed workings of a single Autograph monitor: its traffic classifier and content-based signature generator. Next, in Section 4, we evaluate the quality of the signatures Autograph finds when run on real DMZ traces from two edge networks. In Section 5 we describe tattler and the distributed version of Autograph, and using DMZ-trace-driven simulation evaluate the speed at which the distributed Autograph can detect signatures for newly introduced worms. After cataloging limitations of Autograph and possible attacks against it in Section 6, and describing related work in Section 7, we conclude in Section 8.

              high true+               low true+
high false+   sensitive, unspecific    insensitive, unspecific
low false+    sensitive, specific      insensitive, specific

Figure 1: Combinations of sensitivity and specificity.

2 Desiderata for a Worm Signature Detection System

Signature quality. Ideally, a signature detection system should generate signatures that match worms and only worms. In describing the efficacy of worm signatures in filtering traffic, we adopt the parlance used in epidemiology to evaluate a diagnostic test:

Sensitivity relates to the true positives generated by a signature; in a mixed population of worm and non-worm flows, the fraction of the worm flows matched, and thus successfully identified, by the signature. Sensitivity is typically reported as t ∈ [0, 1], the fraction of true positives among worm flows.

Specificity relates to the false positives generated by a signature; again, in a mixed population, the fraction of non-worm flows matched by the signature, and thus incorrectly identified as worms. Specificity is typically reported as (1 − f) ∈ [0, 1], where f is the fraction of false positives among non-worm flows.

Throughout this paper, we classify signatures according to this terminology, as shown in Figure 1.

In practice, there is a tension between perfect sensitivity and perfect specificity; one often suffers when the other improves, because a diagnostic test (e.g., “is this flow a worm or not?”) typically measures only a narrow set of features in its input, and thus does not perfectly classify it. There may be cases where two inputs present with identical features in the eyes of a test, but belong in different classes. We examine this sensitivity-specificity trade-off in detail in Section 4.

Signature quantity and length. Systems that match flow payloads against signatures must compare a flow to all signatures known for its IP protocol and port. Thus, fewer signatures speed matching. Similarly, the cost of signature matching is proportional to the length of the signature, so short signatures may be preferable to long ones. Signature length profoundly affects specificity: when one signature is a subsequence of another, the longer one is expected to match fewer flows than the shorter one.

Robustness against polymorphic worms. A polymorphic worm² changes its payload in successive infection attempts. Such worms pose a particular challenge to match with signatures, as a signature sensitive to a portion of one worm payload may not be sensitive to any part of another worm payload. If a worm were “ideally” polymorphic, each of its payloads would contain no byte sequence in common with any other. That ideal is impossible, of course; single-byte sequences are shared by all payloads. In practice, a “strongly” polymorphic worm is one whose successive payloads share only very short byte subsequences in common. Such short subsequences, e.g., 4 bytes long, cannot safely be used as worm signatures, as they may be insufficiently specific. Polymorphism generally causes an explosion in the number of signatures required to match a worm. An evaluation of the extent to which such worm payloads are achievable is beyond the scope of this paper. We note, however, that if a worm exhibits polymorphism, but does not change one or more relatively long subsequences across its variants, an efficient signature detection system will generate signatures that match these invariant subsequences, and thus minimize the number of signatures required to match all the worm's variants.

Timeliness of detection. Left unchecked by patches, traffic filtering, or other means, port-scanning worms infect vulnerable hosts at an exponential rate, until the infected population saturates. Provos [12] shows in simulation that patching of infected hosts is more effective the earlier it is begun after the initial release of a new worm, and that in practical deployment scenarios, patching must begin quickly (before 5% of vulnerable hosts become infected) in order to have hope of stemming an epidemic such that no more than 50% of vulnerable hosts ever become infected. Moore et al. [9] show similarly that signature-based filtering of worm traffic stops worm propagation most effectively when begun early.

Automation. A signature detection system should require minimal real-time operator intervention. Vetting signatures for specificity with human eyes, e.g., is at odds with timeliness of signature detection for novel worms.

Application neutrality. Knowledge of application protocol semantics above the TCP layer (e.g., HTTP, NFS RPCs, &c.) may be useful in distinguishing worm and innocuous traffic, and thus in producing signatures that are sensitive and specific. Avoiding leaning on such application-protocol knowledge, however, broadens the applicability of the signature detection system to all protocols layered atop TCP.

Bandwidth efficiency. If a signature detection system is deployed in distributed fashion, such that traffic monitors communicate with one another about their observations, that communication should remain scalable, even when a worm generates tremendous network activity as it tries to spread. That is, monitor-to-monitor communication should grow slowly as worm activity increases.

Figure 2: Architecture of an Autograph Monitor. (Diagram labels: cross-DMZ traffic; suspicious flow selection, selecting suspicious traffic using heuristics; suspicious inbound packets; non-suspicious inbound packets; signature generation, comprising flow reassembly, payload partitioning (COPP), content blocks, and prevalence histogram construction; prevalence histogram; worm signatures; tattler; other Autograph monitors; port-scanner IP addresses.)

3 Autograph System Design

Motivated by the design goals given in the previous section, we now present Autograph. We begin with a schematic overview of the system, shown in Figure 2. A single Autograph monitor's input is all traffic crossing an edge network's DMZ, and its output is a list of worm signatures. We defer discussion of tattler, used in distributed deployments of Autograph, to Section 5.2.

There are two main stages in a single Autograph monitor's analysis of traffic. First, a suspicious flow selection stage uses heuristics to classify inbound TCP flows as either suspicious or non-suspicious. After classification, packets for these inbound flows are stored on disk in a suspicious flow pool and non-suspicious flow pool, respectively. For clarity, throughout this paper, we refer to the output of the classifier using those terms, and refer to the true nature of a flow as malicious or innocuous. Further processing occurs only on payloads in the suspicious flow pool. Thus, flow classification reduces the volume of traffic that must be processed subsequently. We assume in our work that such heuristics will be far from perfectly accurate. Yet any heuristic that generates a suspicious flow pool in which truly malicious flows are a greater fraction of flows than in the total inbound traffic mix crossing the DMZ will likely reduce generation of signatures that cause false positives, by focusing Autograph's further processing on a flow population containing a lesser fraction of innocuous traffic.

Autograph performs TCP flow reassembly for inbound payloads in the suspicious flow pool. The resulting reassembled payloads are analyzed in Autograph's second stage, signature generation.

We stress that Autograph segregates flows by destination port for signature generation; in the remainder of this paper, one should envision one separate instance of signature generation for each destination port, operating on flows in the suspicious flow pool destined for that port. Signature generation involves analysis of the content of payloads of suspicious flows to select sensitive and specific signatures. Two properties of worms suggest that content analysis may be fruitful. First, a worm propagates by exploiting one software vulnerability or a set of such vulnerabilities. That commonality in functionality has to date led to commonality in code, and thus in payload content, across worm infection payloads. In fact, Internet worms to date have had a single, unchanging payload in most cases. Even in those cases where multiple variants of a worm's payload have existed (e.g., Nimda), those variants have shared significant overlapping content.³ Second, a worm generates voluminous network traffic as it spreads; this trait stems from worms' self-propagating nature. For port-scanning worms, the exponential growth in the population of infected hosts and attendant exponential growth in infection attempt traffic are well known [8]. As also noted and exploited by Singh et al. [15], taken together, these two traits of worm traffic (content commonality and magnitude of traffic volume) suggest that analyzing the frequency of payload content should be useful in identifying worm payloads. During signature generation, Autograph measures the frequency with which non-overlapping payload substrings occur across all suspicious flow payloads, and proposes the most frequently occurring substrings as candidate signatures.

In the remainder of this section, we describe Autograph's two stages in further detail.

3.1 Selecting Suspicious Traffic

In this work, we use a simple port-scanner detection technique as a heuristic to identify malicious traffic; we classify all flows from port-scanning sources as suspicious. Note that we do not focus on the design of suspicious flow classifiers herein; Autograph can adopt any anomaly detection technique that classifies worm flows as suspicious with high probability. In fact, we deliberately use a port-scanning flow classifier because it is simple, computationally efficient, and clearly imperfect; our aim is to demonstrate that Autograph generates highly selective and specific signatures, even with a naive flow classifier. With more accurate flow classifiers, one will only expect the quality of Autograph's signatures to improve.

Many recent worms rely on scanning of the IP address space to search for vulnerable hosts while spreading. If a worm finds another machine that runs the desired service on the target port, it sends its infectious payload. Probing a non-existent host or service, however, results in an unsuccessful connection attempt, easily detectable by monitoring outbound ICMP host/port unreachable messages, or identifying unanswered inbound SYN packets. Hit-list worms [18], while not yet observed in the wild, violate this port-scanning assumption; we do not address them in this paper, but comment on them briefly in Section 6.

Autograph stores the source and destination addresses of each inbound unsuccessful TCP connection it observes. Once an external host has made unsuccessful connection attempts to more than s internal IP addresses, the flow classifier considers it to be a scanner. All successful connections from an IP address flagged as a scanner are classified as suspicious, and their inbound packets written to the suspicious flow pool, until that IP address is removed after a timeout (24 hours in the current prototype).⁴ Packets held in the suspicious flow pool are dropped from storage after a configurable interval t. Thus, the suspicious flow pool contains all packets received from suspicious sources in the past time period t.⁵

Autograph reassembles all TCP flows in the suspicious flow pool. Every r minutes, Autograph considers initiating signature generation. It does so when for a single destination port, the suspicious flow pool contains more than a threshold number of flows q. In an online deployment of Autograph, we envision typical r values on the order of ten minutes. We continue with a detailed description of signature generation in the next subsection.

3.2 Content-Based Signature Generation

Autograph next selects the most frequently occurring byte sequences across the flows in the suspicious flow pool as signatures. To do so, it divides each suspicious flow into smaller content blocks, and counts the number of suspicious flows in which each content block occurs. We term this count a content block's prevalence, and rank content blocks from most to least prevalent. As previously described, the intuition behind this ranking is that a worm's payload appears increasingly frequently as that worm spreads. When all worm flows contain a common, worm-specific byte sequence, that byte sequence will be observed in many suspicious flows, and so will be highly ranked.

Let us first describe how Autograph divides suspicious flows' payloads into shorter blocks. One might naively divide payloads into fixed-size, non-overlapping blocks, and compute the prevalence of those blocks across all suspicious flows. That approach, however, is brittle if worms even trivially obfuscate their payloads by reordering them, or inserting or deleting a few bytes. To see why, consider what occurs when a single byte is deleted or inserted from a worm's payload; all fixed-size blocks beyond the insertion or deletion will most likely change in content. Thus, a worm author could evade accurate counting of its substrings by trivial changes in its payload, if fixed-size, non-overlapping blocks were used to partition payloads for counting substring prevalence.
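The brittleness of fixed-size blocks described above can be seen in a few lines of Python. This is only an illustrative sketch, not code from Autograph: the helper names `fixed_blocks` and `prevalence`, the 8-byte block size, and the synthetic 64-byte payload are all assumptions chosen for the demonstration. Prevalence is computed as defined in the text, i.e., the number of flows in which a block occurs.

```python
from collections import Counter


def fixed_blocks(payload: bytes, size: int) -> list[bytes]:
    """Split a payload into fixed-size, non-overlapping blocks (last may be short)."""
    return [payload[i:i + size] for i in range(0, len(payload), size)]


def prevalence(flows: list[bytes], size: int) -> Counter:
    """For each block, count the number of flows in which it occurs at least once."""
    counts: Counter = Counter()
    for payload in flows:
        # A set ensures a block repeated within one flow is counted once.
        counts.update(set(fixed_blocks(payload, size)))
    return counts


# Hypothetical "worm" payload: 64 distinct byte values, so every block is unique.
payload = bytes(range(64))
# The same payload with a single byte inserted at the front.
shifted = b"\xff" + payload

# Two identical flows: every block is seen in both flows.
same = prevalence([payload, payload], 8)
# Original plus shifted copy: every block boundary moves, so no block is shared.
moved = prevalence([payload, shifted], 8)

print(max(same.values()))   # 2: each block occurs in both flows
print(max(moved.values()))  # 1: no block occurs in more than one flow
```

With this synthetic payload, a one-byte insertion leaves no fixed-size block in common between the two flows, so prevalence counting would not register the shared content at all.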
Figure 3: COPP with a breakmark of r(“0007”).

Instead, as first done in the file system domain in LBFS [10], we divide a flow's payload into variable-length content blocks using COntent-based Payload Partitioning (COPP). Because COPP determines the boundaries of each block based on payload content, the set of blocks COPP generates changes little under byte insertion or deletion.

To partition a flow's payload into content blocks, COPP computes a series of Rabin fingerprints r_i over a sliding k-byte window of the flow's payload, beginning with the first k bytes in the payload, and sliding one byte at a time toward the end of the payload. It is efficient to compute a Rabin fingerprint over a sliding window [13]. As COPP slides its window along the payload, it ends a content block when r_i matches a predetermined breakmark, B; that is, when r_i ≡ B (mod a).⁶ The average content block size produced by COPP, a, is configurable; assuming random payload content, the window at any byte position within the payload equals the breakmark B (mod a) with probability 1/a.

Figure 3 presents an example of COPP, using a 2-byte window, for two flows f0 and f1. Sliding a 2-byte window from the first 2 bytes to the last byte, COPP ends a content block c_i whenever it sees the breakmark equal to the Rabin fingerprint for the byte string “0007”. Even if there exist byte insertions, deletions, or replacements between the two flows, COPP finds identical c1 and c3 blocks in both of them.

Because COPP decides content block boundaries probabilistically, there may be cases where COPP generates very short content blocks, or takes an entire flow's payload as a single content block. Very short content blocks are highly unspecific; they will generate many false positives. Taking the whole payload is not desirable either, because long signatures are not robust in matching worms that might vary their payloads. Thus, we impose minimum and maximum content block sizes, m and M, respectively. When COPP reaches the end of a content block and fewer than m bytes remain in the flow thereafter, it generates a content block that contains the last m bytes of the flow's payload. In this way, COPP avoids generating too short a content block, and avoids ignoring the end of the payload.

After Autograph divides every flow in the suspicious flow pool into content blocks using COPP, it discards from further consideration content blocks that appear only in flows originating from a single source IP address. We found early on when applying Autograph to DMZ traces that such content blocks typically correspond to misconfigured or otherwise malfunctioning sources that are not malicious; such content blocks typically occur in many innocuous flows, and thus often lead to signatures that cause false positives. Singh et al. [15] also had this insight: they consider flow endpoint address distributions when generating worm signatures.

Suppose there are N distinct flows in the suspicious flow pool. Each remaining content block matches some portion of these N flows. Autograph repeatedly selects content blocks as signatures, until the selected set of signatures matches a configurable fraction w of the flows in the suspicious flow pool. That is, Autograph selects a signature set that “covers” at least wN flows in the suspicious flow pool.

We now describe how Autograph greedily selects content blocks as signatures from the set of remaining content blocks. Initially the suspicious flow pool F contains all suspicious flows, and the set of content blocks C contains all content blocks produced by COPP that were found in flows originating from more than one source IP address. Autograph measures the prevalence of each content block (the number of suspicious flows in F in which each content block in C appears) and sorts the content blocks from greatest to least prevalence. The content block with the greatest prevalence is chosen as the next signature. It is removed from the set of remaining content blocks C, and the flows it matches are removed from the suspicious flow pool, F. This entire process then repeats; the prevalence of content blocks in C in flows in F is computed, the most prevalent content block becomes a signature, and so on, until wN flows in the original F have been covered. This greedy algorithm attempts to minimize the size of the set of signatures by choosing the most prevalent content block at each step.

We incorporate a blacklisting technique into signature generation. An administrator may configure Autograph with a blacklist of disallowed signatures, in an effort to prevent the system from generating signatures that will cause false positives. The blacklist is simply a set of strings. Any signature Autograph selects that is a substring of an entry in the blacklist is discarded; Autograph eliminates that content block from C without selecting it as a signature, and continues as usual. We envision that an administrator may run Autograph for an initial training period, and vet signatures with human eyes during that period. Signatures generated during this period that match common patterns in innocuous flows (e.g., GET /index.html HTTP/1.0) can be added to the blacklist.

At the end of this process, Autograph reports the selected set of signatures. The current version of the system publishes signature byte patterns in Bro's signature format, for direct use in Bro. Table 1 summarizes the parameters that control Autograph's behavior.

Note that because the flow classifier heuristic is imperfect, innocuous flows will unavoidably be included in the signature generation process. We expect two chief consequences of their inclusion:

Prevalent signatures matching innocuous and malicious flows. One possible result is that the probabilistic COPP process will produce content blocks that contain only protocol header or trailer data common to nearly all flows carrying that protocol, whether innocuous or malicious. Such blocks will top the prevalence histogram, but would clearly be abysmally unspecific if adopted for traffic filtering. To avoid choosing such unspecific content blocks, we can vary a and m toward longer block sizes.

Non-prevalent signatures for innocuous flows. Another possibility is that Autograph chooses a content block common to only a few innocuous flows. Such content blocks will not be prevalent, and will be at the tail of the prevalence histogram. Two heuristics can exclude these signatures from publication. First, by using a smaller w value, Autograph can avoid generation of signatures for the bottom (1 − w)% of the prevalence distribution, though this choice may have the undesirable side effect of delaying detection of worms. The second useful heuristic comes from our experience with the initial COPP implementation. Figure 4 shows the prevalence histogram Autograph generates from a real DMZ trace. Among all content blocks, only a few are prevalent (those from Code-RedII, Nimda, and WebDAV) and the prevalence distribution has a noticeable tail. We can restrict Autograph to choose a content block as a signature only if more than p flows in the suspicious flow pool contain it, to avoid publishing signatures for non-prevalent content blocks.

Figure 4: Prevalence histogram of content blocks, a = 64 bytes, ICSI2 DMZ trace, day 3 (24 hrs). (Plot: occurrence (%) of content blocks for m = 8, 16, 32, 40, 64; annotated blocks from Code-RedII and from Nimda (16 different payloads).)

Symbol  Description
s       Port scanner detection threshold
a       COPP parameter: average content block size
m       COPP parameter: minimum content block size
M       COPP parameter: maximum content block size
w       Target percentage of suspicious flows to be represented in generated signatures
p       Minimum content block prevalence for use as signature
t       Duration suspicious flows held in suspicious flow pool
r       Interval between signature generation attempts
q       Minimum size of suspicious flow pool to allow signature generation process

Table 1: Autograph's signature generation parameters.

4 Evaluation: Local Signature Detection

We now evaluate the quality of signatures Autograph generates. In this section, we answer the following two questions: First, how does content block size affect the sensitivity and specificity of the signatures Autograph generates? And second, how robust is Autograph to worms that vary their payloads?

Our experiments demonstrate that as content block size decreases, the likelihood that Autograph detects commonality across suspicious flows increases. As a result, as content block size decreases, Autograph generates progressively more sensitive but less specific signatures. They also reveal that small block sizes are more resilient to worms that vary their content, in that they can detect smaller common parts among worm payloads.

4.1 Offline Signature Detection on DMZ Traces

We first investigate the effect of content block size on the quality of the signatures generated by Autograph. In this subsection, we use a suspicious flow pool accumulated during an interval t of 24 hours, and consider only a single invocation of signature generation on that flow pool. No blacklisting is used in the results in this subsection, and filtering of content blocks that appear only from one source address before signature generation is disabled. All results we present herein are for a COPP Rabin fingerprint window of width k = 4 bytes.⁷

In our experiments, we feed Autograph one of three packet traces from the DMZs of two research labs; one from Intel Research Pittsburgh (Pittsburgh, USA) and two from ICSI (Berkeley, USA). IRP's Internet link was a T1 at the time our trace was taken, whereas ICSI's is over a 100 Mbps fiber to UC Berkeley. All three traces contain the full payloads of all packets. The ICSI and ICSI2 traces only contain inbound traffic to TCP port 80, and are IP-source-anonymized. Both sites have address spaces of 2⁹ IP addresses, but the ICSI traces contain more port 80 traffic, as ICSI's web servers are more frequently visited than IRP's.

For comparison, we obtain the full list of HTTP worms in the traces using Bro with well-known signatures for the Code-Red, Code-RedII, and Nimda HTTP worms, and for an Agobot worm variant that exploits the WebDAV buffer overflow vulnerability (present only in the ICSI2 trace). Table 2 summarizes the characteristics of all three traces.

                         IRP            ICSI           ICSI2
Measurement Period       Aug 1-7 2003   Jan 26 2004    Mar 22-29 2004
                         1 week         24 hours       1 week
Inbound HTTP packets     70K            793K           6353K
Inbound HTTP flows       26K            102K           825K
HTTP worm sources        72             351            1582
  scanned                56             303            1344
  not scanned            16             48             238
Nimda sources            18             57             254
CodeRedII sources        54             294            997
WebDav exploit sources   -              -              336
HTTP worm flows          375            1396           7127
Nimda flows              303            1022           5392
CodeRed flows            72             374            1365
WebDav exploit flows     -              -              370
Table 2: Summary of traces.

Autograph's suspicious flow classifier identifies unsuccessful connection attempts in each trace. For the IRP trace, Autograph uses ICMP host/port unreachable messages to compile the list of suspicious remote IP addresses. As neither ICSI trace includes outbound ICMP packets, Autograph infers failed connection attempts in those traces by looking at incoming TCP SYN and ACK pairs.

We run Autograph with varied scanner detection thresholds, s ∈ {1, 2, 4}. These thresholds are lower than those used by Bro and Snort, in the interest of catching as many worm payloads as possible (crucial early in an epidemic). As a result, our flow classifier misclassifies flows as suspicious more often, and more innocuous flows are submitted for signature generation.

We also vary the minimum content block size (m) and average content block size (a) parameters that govern COPP, but fix the maximum content block size (M) at 1024 bytes. We vary w ∈ [10%, 100%] in our experiments. Recall that w limits the fraction of suspicious flows that may contribute content to the signature set. COPP adds content blocks to the signature set (most prevalent content block first, and then in order of decreasing prevalence) until one or more content blocks in the set match w percent of flows in the suspicious flow pool.

We first characterize the content block prevalence distribution found by Autograph with a simple example. Figure 5 shows the prevalence of content blocks found by COPP when we run COPP with m = 64, a = 64, and w = 100% over a suspicious flow pool captured from the full 24-hour ICSI trace with s = 1. At w = 100%, COPP adds content blocks to the signature set until all suspicious flows are matched by one or more content blocks in the set. Here, the x axis represents the order in which COPP adds content blocks to the signature set (most prevalent first). The y axis represents the cumulative fraction of the population of suspicious flows containing any of the set of signatures, as the set of signatures grows. The trace contains Code-RedII, Nimda, and WebDAV worm flows. Nimda sources send 16 different flows with every infection attempt, to search for vulnerabilities under 16 different URLs. The first signature COPP generates matches Code-RedII; 28% of the suspicious flows are Code-RedII instances. Next, COPP selects 16 content blocks as signatures, one for each of the different payloads Nimda-infected machines transmit. About 5% of the suspicious flows are misclassified flows. We observe that commonality across those misclassified flows is insignificant. Thus, the content blocks from those misclassified flows tend to be lowly ranked.

Figure 5: Prevalence of Selected Content Blocks in Suspicious Flow Pool, ICSI DMZ trace (24 hrs). (Plot: popularity distribution of suspicious flows, s = 1, m = 64, a = 64, ICSI; CDF versus selected content block ID; legend: CodeRedII, Nimda (1), Nimda (16), Misclassified (1).)

To measure true positives (fraction of worm flows found), we run Bro with the standard set of policies to detect worms (distributed with the Bro software) on a trace, and then run Bro using the set of signatures generated by Autograph on that same trace. The true positive rate is the fraction of the total number of worms found by Bro's signatures (presumed to find all worms) also found by Autograph's signatures.

To measure false positives (fraction of non-worm flows matched by Autograph's signatures), we create a sanitized trace consisting of all non-worm traffic. To do so, we eliminate all flows from a trace that are identified by Bro as worms. We then run Bro using Autograph's signatures on the sanitized trace. The false positive rate is the fraction of all flows in the sanitized trace identified by Autograph's signatures as worms.

Because the number of false positives is very low compared to the total number of HTTP flows in the trace, we report our false positive results using the efficiency metric proposed by Staniford et al. [17]. Efficiency is the ratio of the number of true positives to the total number of positives, both false and true. Efficiency falls as the number of false positives rises, but shows the detail in the false positive trend when the false positive rate is low.

The graphs in Figure 6 show the sensitivity and the efficiency of the signatures generated by Autograph running on the full 24-hour ICSI trace for varied m. Here, we present experimental results for s = 2, but the results for other s are similar. Note that in these experiments, we apply the signatures Autograph generates from the 24-hour trace to the same 24-hour trace used to generate them.

Figure 6: Sensitivity and Efficiency of Selected Signatures, ICSI DMZ trace (24 hrs). (Four panels, for minimum block size m = 16, 32, 40, and 64; each plots sensitivity and efficiency against coverage w, with curves for a = 16, 32, 40, 64, 128, restricted to a ≥ m.)

Figure 7: Number of Signatures, ICSI DMZ trace (24 hrs). (Four panels, for minimum block size m = 16, 32, 40, and 64; each plots number of signatures against coverage w, with curves for a = 16, 32, 40, 64, 128, restricted to a ≥ m.)

The x axis varies w. As w increases, the set of signatures Autograph generates leads to greater sensitivity (fewer false negatives). This result is expected; greater w values cause Autograph to add content blocks to the signature set for an ever-greater fraction of the suspicious flow pool. Thus, if a worm appears rarely in the suspicious flow pool, and thus generates non-prevalent content blocks, those blocks will eventually be included in the signature set, for sufficiently large w.

However, recall from Figure 5 that about 5% of the suspicious flows are innocuous flows that are misclassified by the port-scanner heuristic as suspicious. As a result, for w > 95%, COPP risks generating a less specific signature set, as COPP begins to select content blocks from the innocuous flows. Those content blocks are most often HTTP trailers, found in common across misclassified innocuous flows.

For this trace, COPP with w ∈ [90%, 94.8%] produces a set of signatures that is perfect: it causes 0 false negatives and 0 false positives. Our claim is not that this w parameter value is valid for traces at different sites, or even at different times; on the contrary, we expect that the range in which no false positives and no false negatives occurs is sensitive to the details of the suspicious flow population. Note, however, that the existence of a range of w values for which perfect sensitivity and specificity are possible serves as a very preliminary validation of the COPP approach: if no such range existed for this trace, COPP would always be forced to trade false negatives for false positives, or vice-versa, for any w parameter setting. Further evaluation of COPP on a more diverse and numerous set of traffic traces is clearly required to determine whether such a range exists for a wider range of workloads.

During examination of the false positive cases found by Autograph-generated signatures when w > 94.8%, we noted with interest that Autograph's signatures detected Nimda sources not detected by Bro's stock signatures. There are only three stock signatures used by Bro to spot a Nimda source, and the Nimda sources in the ICSI trace did not transmit those particular payloads. We removed these few cases from the count of false positives, as Auto
graph'ssignaturescorrectlyidentiedthemaswormows,andthuswehaderroneouslyaggedthemasfalsepositivesbyassumingthatanyownotcaughtbyBro'sstocksignaturesisnotaworm.WenowturntotheeffectofcontentblocksizeonthespecicityandthenumberofsignaturesAutographgenerates.Eveninthepresenceofinnocuousowsmisclassiedassus-picious,thelargestaverageandminimumcontentblocksizes(suchas64and128bytes)avoidmostfalsepositives;ef-ciencyremainscloseto1.Weexpectthisresultbecausein-creasedblocksizelowerstheprobabilityofndingcommoncontentacrossmisclassiedowsduringthesignaturegen-erationprocess.Moreover,assignaturelengthincreases,thenumberofinnocuousowsthatmatchasignaturedecreases.Thus,choosinglargeraandmvalueswillhelpAutographavoidgeneratingsignaturesthatcausefalsepositives.Note,however,thereisatrade-offbetweencontentblocklengthandthenumberofsignaturesAutographgenerates,too.Forlargeaandm,itismoredifcultforCOPPtodetectcommonalityacrosswormowsunlesstheowsareidentical.Soasaandmincrease,COPPmustse-lectmoresignaturestomatchanygroupofvariantsofawormthatcontainsomecommoncontent.ThegraphsinFigure7presentthesizeofthesignaturesetAutographgeneratesasafunctionofw.Forsmalleraandm,Au-tographneedsfewercontentblockstocoverwpercentofthesuspiciousows.Inthistrace,forexample,COPPcanselectashortbytesequenceincommonacrossdif-ferentNimdapayloadvariants(e.g.,cmd.exe?c+dirHTTP/1.0..Host:www..Connection:close....)whenweusesmallaandm,suchas16.Thesizeofthesignaturesetbecomesaparticularconcernwhenwormsaggressivelyvarytheircontentacrossinfectionattempts,aswediscussinthenextsection.Beforecontinuingon,wenotethatresultsobtainedrunningAutographontheIRPandICSI2tracesarequitesimilartothosereportedabove,andarethereforeelidedintheinterestofbrevity.4.2PolymorphicandMetamorphicWorms 8162432404864128Number of Signatures min=8 min=16 min=24 min=32 min=40 min=48 
Figure 8: Content block size vs. number of signatures.

We expect short content blocks to be most robust against worms that vary their content, such as polymorphic worms, which encrypt their content differently on each connection, and metamorphic worms, which obfuscate their instruction sequences on each connection. Unfortunately (fortunately?) no such Internet worm has yet been reported in the wild. To test Autograph's robustness against these varying worms, we generate a synthetic polymorphic worm based on the Code-RedII payload. A Code-RedII worm payload consists of a regular HTTP GET header, more than 220 filler characters, a sequence of Unicode, and the main worm executable code. The Unicode sequence causes a buffer overflow and transfers execution flow to the subsequent worm binary. We use random values for all filler bytes, and even for the worm code, but leave the HTTP GET command and 56-byte Unicode sequence fixed. This degree of variation in content is more severe than that introduced by the various obfuscation techniques discussed by Christodorescu et al. [2]. As shown in Figure 8, when a relatively short, invariant string is present in a polymorphic or metamorphic worm, Autograph can find a short signature that matches it, when run with small average and minimum content block sizes. However, such short content block sizes may be unspecific, and thus yield signatures that cause false positives.

5 Evaluation: Distributed Signature Detection

Our evaluation of Autograph in the preceding section focused chiefly on the behavior of a single monitor's content-based approach to signature generation. That evaluation considered the case of offline signature detection on a DMZ trace 24 hours in length. We now turn to an examination of Autograph's speed in detecting a signature for a new worm after the worm's release, and demonstrate that operating multiple, distributed instances of Autograph significantly speeds this process, vs. running a single instance of Autograph on a single edge network. We use a combination of simulation of a worm's propagation and DMZ-trace-driven simulation to evaluate the system in the online setting; our sense of ethics restrains us from experimentally measuring Autograph's speed at detecting a novel worm in vivo.

Measuring how quickly Autograph detects and generates a signature for a newly released worm is
important because it has been shown in the literature that successfully containing a worm requires early intervention. Recall that Provos' results [12] show that reversing an epidemic such that fewer than 50% of vulnerable hosts ever become infected can require intervening in the worm's propagation before 5% of vulnerable hosts are infected. Two delays contribute to the total delay of signature generation:

- How long must an Autograph monitor wait until it accumulates enough worm payloads to generate a signature for that worm?
- Once an Autograph monitor receives sufficient worm payloads, how long will it take to generate a signature for the worm, given the background “noise” (innocuous flows misclassified as suspicious) in the trace?

We proceed now to measure these two delays.

5.1 Single vs. Multiple Monitors

Let us now measure the time required for an Autograph monitor to accumulate worm payloads after a worm is released. We first describe our simulation methodology for simulating a Code-RedI-v2-like worm, which is after that of Moore et al. [9]. We simulate a vulnerable population of 338,652 hosts, the number of infected source IPs observed in [8] that are uniquely assignable to a single Autonomous System (AS) in the BGP table data (obtained from RouteViews [20]) of the 19th of July, 2001, the date of the Code-Red outbreak. There are 6378 ASes that contain at least one such vulnerable host in the simulation. Unlike Moore et al., we do not simulate the reachability among ASes in that BGP table; we make the simplifying assumption that all ASes may reach all other ASes. This assumption may cause the worm to spread somewhat faster in our simulation than in Moore et al.'s.

We assign actual IP address ranges for real ASes from the BGP table snapshot to each AS in the simulation, according to a truncated distribution of the per-AS IP address space sizes from the entire BGP table snapshot. The distribution of address ranges we assign is truncated in that we avoid assigning any address blocks larger than /16s to any AS in the simulation. We avoid large address blocks for two reasons: first, few such monitoring points exist, so it may be unreasonable to assume that Autograph will be deployed at one, and second, a worm programmer may trivially code a worm to avoid scanning addresses within a /8 known to harbor an Autograph monitor. Our avoidance of large address blocks only lengthens the time it will take Autograph to generate a worm signature after a novel worm's release.

We assume 50% of the address space within the vulnerable ASes is populated with reachable hosts, that 25% of these reachable hosts run web servers, and we fix the 338,652 vulnerable web servers uniformly at random among the total population of web servers in the simulation. Finally, the simulated worm propagates using random IP address scanning over the entire 2^28 non-class-D IP address space, and a probe rate of 10 probes per second. We simulate network and processing delays, randomly chosen in [0.5, 1.5] seconds, between a victim's receipt of an infecting connection and its initiation of outgoing infection attempts. We begin the epidemic by infecting 25 vulnerable hosts at time zero. Figure 9 shows the growth of the epidemic within the vulnerable host population over time.

[Figure 9 plot: infected machines (%) vs. time (min).]
Figure 9: Infection progress for a simulated Code-RedI-v2-like worm.
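As a rough cross-check on the shape of such an epidemic, a uniformly random-scanning worm can be approximated by the standard deterministic logistic ("simple epidemic") model. The sketch below is not the per-host, per-AS simulator described above: it ignores AS address allocation and the [0.5, 1.5]-second infection delay. The function names and the default address-space size (all of IPv4 minus class D) are assumptions made for illustration only.

```python
import math

# Population, seed count, and probe rate are taken from the text.
# ADDRESS_SPACE is an assumption (IPv4 minus class D), not a value from the paper.
VULNERABLE_HOSTS = 338_652
INITIALLY_INFECTED = 25
PROBES_PER_SECOND = 10.0
ADDRESS_SPACE = 2**32 - 2**28


def _growth_rate(vulnerable, probe_rate, address_space):
    """Rate at which one infected host finds victims while almost none are infected."""
    return probe_rate * vulnerable / address_space


def infected_fraction(t_seconds, vulnerable=VULNERABLE_HOSTS,
                      initial=INITIALLY_INFECTED,
                      probe_rate=PROBES_PER_SECOND,
                      address_space=ADDRESS_SPACE):
    """Infected fraction i(t) solving di/dt = beta * i * (1 - i)."""
    beta = _growth_rate(vulnerable, probe_rate, address_space)
    odds_against = (vulnerable - initial) / initial  # (1 - i0) / i0
    return 1.0 / (1.0 + odds_against * math.exp(-beta * t_seconds))


def time_to_fraction(fraction, vulnerable=VULNERABLE_HOSTS,
                     initial=INITIALLY_INFECTED,
                     probe_rate=PROBES_PER_SECOND,
                     address_space=ADDRESS_SPACE):
    """Seconds until the infected fraction reaches `fraction` (inverse of i(t))."""
    beta = _growth_rate(vulnerable, probe_rate, address_space)
    odds_against = (vulnerable - initial) / initial
    return math.log(odds_against * fraction / (1.0 - fraction)) / beta


if __name__ == "__main__":
    for minutes in (0, 60, 120, 180, 240):
        print(f"{minutes:3d} min: {infected_fraction(60 * minutes):.4f}")
```

Because this model omits the structure of the full simulation, its curve need not coincide with Figure 9; it is meant only to show how the probe rate, the vulnerable population, and the scanned address space together set the growth rate.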
[Figure 10 plot: number of payloads (log scale) vs. time (min); curves: s=1 (Max), s=4 (Max), s=1 (Median), s=4 (Median).]
Figure 10: Payloads observed over time: single, isolated monitors.

In these first simulations, we place Autograph monitors at a randomly selected 1% of the ASes that include vulnerable hosts (63 monitors). Figure 10 shows the maximum and median numbers of payloads detected over time across all monitors; note that the y axis is log-scaled. First, let us consider the case where only a single site on the Internet deploys Autograph on its network. In this case, it is the median time required by all 63 monitors to detect a given number of flows that approximates the expected time for a singleton monitor to do the same. When monitors identify port scanners aggressively, after a single failed connection from a source address (s = 1), the median monitor accumulates 5 worm payloads after over 9000 seconds. Using the more conservative port-scan threshold s = 4, the median monitor accumulates no payloads within 10000 seconds. These results are not encouraging: from Figure 9, we know that after 9000 seconds (150 minutes), over 25% of vulnerable hosts have been infected.

Now let us consider the case where 63 monitors are all in active use simultaneously and distributedly. If we presume that the first monitor to generate a signature for the worm may (nearly) instantly disseminate that signature to all who wish to filter worm traffic, by application-level multicast [1] or other means, the earliest Autograph can possibly find the worm's signature is governed by the “luckiest” monitor in the system: the first one to accumulate the required number q of worm payloads. The “luckiest” monitor in this simulated distributed deployment detects 5 worm payloads shortly before 4000 seconds have elapsed. This result is far more encouraging: after 4000 seconds (66 minutes), fewer than 1% of vulnerable hosts have been infected. Thus, provided that all Autograph monitors disseminate the worm signatures they detect in a timely fashion, there is immense benefit in the speed of detection of a signature for a novel worm when Autograph is deployed distributedly, even at as few as 1% of ASes that contain vulnerable hosts.

Using the more conservative port-scan threshold s = 4, the monitor in the distributed system to have accumulated the most worm payloads after 10000 seconds has still only collected 4. Here, again, we observe that targeting increased specificity (by identifying suspicious flows more conservatively) comes at a cost of reduced sensitivity; in this case, sensitivity may be seen as the number of worm flows matched over time.

Running multiple independent Autograph monitors clearly pays a dividend in faster worm signature detection. A natural question that follows is whether detection speed might be improved further if the Autograph monitors shared information with one another in some way.

5.2 tattler: Distributed Gathering of Suspect IP Addresses

At the start of a worm's propagation, the aggregate rate at which all infected hosts scan the IP address space is quite low. Because Autograph relies on overhearing unsuccessful scans to identify suspicious source IP addresses, early in an epidemic an Autograph monitor will be slow to accumulate suspicious addresses, and in turn slow to accumulate worm payloads. We now introduce an extension to Autograph named tattler that, as its name suggests, shares suspicious source addresses among all monitors, toward the goal of accelerating the accumulation of worm payloads.

We assume in the design of tattler that a multicast facility is available to all Autograph monitors, and that they all join a single multicast group. While IP multicast is not a broadly deployed service on today's Internet, there are many viable end-system-oriented multicast systems that could provide this functionality, such as Scribe [1]. In brief, Autograph monitor instances could form a Pastry overlay, and use Scribe to multicast to the set of all monitors. We further assume that users are willing to publish the IP addresses that have been port scanning them.⁸

The tattler protocol is essentially an application of the RTP Control Protocol (RTCP) [14], originally used to control multicast multimedia conferencing sessions, slightly extended for use in the Autograph context. The chief goal of RTCP is to allow a set of senders who all subscribe to the same multicast group to share a capped quantity of bandwidth fairly. In Autograph, we seek to allow monitors to announce to others the (IP-addr, dst-port) pairs they have observed port scanning themselves, to limit the total bandwidth of announcements sent to the multicast group within a pre-determined cap, and to allocate
announcement bandwidth relatively fairly among monitors.

We recount the salient features of RTCP briefly:

- A population of senders all joins the same multicast group. Each is configured to respect the same total bandwidth limit, B, for the aggregate traffic sent to the group.
- Each sender maintains an interval value I it uses between its announcements. Transmissions are jittered uniformly at random within [0.5, 1.5] times this timer value.
- Each sender stores a list of the unique source IP addresses from which it has received announcement packets. By counting these, each sender learns an estimate of the total number of senders, N. Entries in the list expire if their sources are not heard from within a timeout interval.
- Each sender computes I = N/B. Senders keep a running average of the sizes of all announcement packets received, and scale I according to the size of the announcement they wish to send next.
- When too many senders join in a brief period, the aggregate sending rate may exceed C. RTCP uses a reconsideration procedure to combat this effect, whereby senders lengthen I probabilistically.
- Senders which depart may optionally send a BYE packet in compliance with the I inter-announcement interval, to speed other senders' learning of the decrease in the total group membership.

RTCP has been shown to scale to thousands of senders.

In the tattler protocol, each announcement a monitor makes contains between one and 100 port-scanner reports of the form (src-IP, dst-port). Monitors only announce scanners they've heard themselves. Hearing a report from another monitor for a scanner suppresses announcement of that scanner for a refresh interval. After a timeout interval, a monitor expires a scanner entry if that scanner has not directly scanned it and no other monitor has announced that scanner. Announcement packets are sent in accordance with RTCP. Every time the interval I expires, a monitor sends any announcements it has accumulated that haven't been suppressed by other monitors' announcements. If the monitor has no port scans to report, it instead sends a BYE, to relinquish its share of the total report channel bandwidth to other monitors.

Figure 11 shows the bandwidth consumed by the tattler protocol during a simulated Code-RedI-v2 epidemic, for three deployed monitor populations (6, 63, and 630 monitors). We use an aggregate bandwidth cap C of 512 Kbps in this simulation. Note that the peak bandwidth consumed across all deployments is a mere 15 Kbps. Thus, sharing port scanner information among monitors is quite tractable. While we've not yet explicitly explored dissemination of signatures in our work thus far, we expect a similar protocol to tattler will be useful and scalable for advertising signatures, both to Autograph monitors and to other boxes that may wish to filter using Autograph-generated signatures.

[Figure 11 plot: bandwidth (kbps) vs. time (min); curves: 10% deployment, 1% deployment, 0.1% deployment.]
Figure 11: Bandwidth consumed by tattler during a Code-RedI-v2 epidemic, for varying numbers of deployed monitors.

Note well that “background” port scanning activities unrelated to the release of a new worm are prevalent on the Internet, and tattler must tolerate the load caused by such background port scanning. dshield.org [4] reports daily measurements of port scanning activities, as measured by monitors that cover approximately 2^19 IP addresses. The dshield.org statistics from December 2003 and January 2004 suggest that approximately 600,000 unique (source-IP, dst-port) pairs occur in a 24-hour period. If we conservatively double that figure, tattler would have to deliver 1.2M reports per day. A simple back-of-the-envelope calculation reveals that tattler would consume 570 bits/second to deliver that report volume, assuming one announcement packet per (source-IP, dst-port) pair. Thus, background port scanning as it exists in today's Internet represents insignificant load to tattler.

We now measure the effect of running tattler on the time required for Autograph to accumulate worm flow payloads in a distributed deployment. Figure 12 shows the time required to accumulate payloads in a deployment of 63 monitors that use tattler. Note that for a port scanner detection threshold s = 1, the shortest time required to accumulate 5 payloads across monitors has been reduced to approximately 1500 seconds, from nearly 4000 seconds without tattler (as shown in Figure 10). Thus, sharing scanner address information among monitors with tattler speeds worm signature detection.

[Figure 12 plot: number of payloads (log scale) vs. time (min); curves: s=1 (Max), s=4 (Max).]
Figure 12: Payloads observed over time: tattler among distributed monitors.

In sum, running a distributed population of Autograph monitors holds promise for speeding worm signature detection in two ways: it allows the “luckiest” monitor that first accumulates sufficient worm payloads to determine the delay until signature detection, and it allows monitors to chatter about port-scanning source addresses, and thus all monitors classify worm flows as suspicious earlier.

5.3 Online, Distributed, DMZ-Trace-Driven Evaluation

The simulation results presented thus far have quantified the time required for Autograph to accumulate worm payloads after a worm's release. We now use DMZ-trace-driven simulation on the one-day ICSI trace to measure how long it takes Autograph to identify a newly released worm among the background noise of flows that are not worms, but have been categorized by the flow classifier as suspicious after port scanning the monitor. We are particularly interested in the trade-off between early signature generation (sensitivity across time, in a sense) and specificity of the generated signatures. We measure the speed of signature generation by the fraction of vulnerable hosts infected when Autograph first detects the worm's signature, and the specificity of the generated signatures by counting the number of signatures generated that cause false positives. We introduce this latter metric for specificity because raw specificity is difficult to interpret: if a signature based on non-worm-flow content (from a misclassified innocuous flow) is generated, the number of false positives it causes depends strongly on the traffic mix at that particular site. Furthermore, an unspecific signature may be relatively straightforward to identify as such with “signature blacklists” (disallowed signatures that should not be used for filtering traffic) provided by a system operator.⁹

We simulate an online deployment of Autograph as follows. We run a single Autograph monitor on the ICSI trace. To initialize the list of suspicious IP addresses known to the monitor, we run Bro on the entire 24-hour trace using all known worm signatures, and exclude worm flows from the trace. We then scan the entire resulting worm-free 24-hour trace for port scan activity, and record the list of port scanners detected with thresholds of s ∈ {1, 2, 4}. To emulate the steady-state operation of Autograph, we populate the monitor's suspicious IP address list with the full set of port scanners from one of these lists, so that all flows from these sources will be classified as suspicious. We can then generate a background noise trace, which consists of only non-worm flows from port scanners, as would be detected by a running Autograph monitor for each of s ∈ {1, 2, 4}. Figure 13 shows the quantity of non-worm noise flows in Autograph's suspicious traffic pool over the trace's full 24 hours.

[Figure 13 plots: three panels (s=1, s=2, s=4) of background noise vs. time of day (6:00 to 24:00).]
Figure 13: Background “noise” flows classified as suspicious vs. time, with varying port-scanner thresholds; ICSI DMZ trace.

We simulate the release of a novel worm at a time of our choosing within the 24-hour trace as follows. We configure Autograph with a signature generation periodicity r of 10 minutes, and a holding period t for the suspicious flow pool of 30 minutes. Using the simulation results from Section 5.2, we count the number of worm flows expected to have been accumulated by the “luckiest” monitor among the 63 deployed during each 30-minute period, at intervals of 10 minutes. We then add that number of complete Code-RedI-v2 flows (available from the pristine, unfiltered trace) to the suspicious traffic pool from the corresponding 30-minute portion of the ICSI trace, to produce a realistic mix of DMZ-trace noise and the expected volume of worm traffic (as predicted by the worm propagation simulation). In these simulations, we vary q, the total number of flows that must be found in the suspicious traffic pool to cause signature generation to be triggered. All simulations use w = 95%. Because the quantity of noise varies over time, we uniformly randomly choose the time of the worm's introduction, and take means over ten simulations.

Figure 14 shows the fraction of the vulnerable host population that is infected when Autograph detects the newly released worm as a function of q, for varying port scanner detection sensitivities/specificities (s ∈ {1, 2, 4}). Note the log-scaling of the x axis. These results demonstrate that for a very sensitive/unspecific flow classifier (s = 1), across a wide range of qs (between 1 and 40), Autograph generates a signature for the worm before the worm spreads to even 1% of vulnerable hosts. As the flow classifier improves in specificity but becomes less sensitive (s ∈ {2, 4}), Autograph's generation of the worm's signature is delayed, as expected.

[Figure 14 plot: fraction of vulnerable hosts uninfected vs. q (log scale); curves: s=1, s=2, s=4.]
Figure 14: Fraction of vulnerable hosts uninfected when worm signature detected vs. q, number of suspicious flows required to trigger signature detection.

[Figure 15 plot: number of unspecific signatures vs. q (log scale); curves: s=1, s=2, s=4.]
Figure 15: Number of unspecific signatures generated vs. q, number of suspicious flows required to trigger signature detection.

Figure 15 shows the number of unspecific (false-positive-inducing) signatures generated by Autograph, as a function of q, for different sensitivities/specificities of flow classifier. The goal, of course, is for the system to generate zero unspecific signatures, but to generate a worm signature before the worm spreads too far. Our results show that for s = 2 and q = 15, Autograph generates signatures that cause no false positives, yet generates the signature for the novel worm before 2% of vulnerable hosts become infected. Our point is not to argue for these particular parameter values, but rather to show that there exists a region of operation where the system meets our stated design goals. More importantly, though, these results show that an improved flow classifier improves Autograph: as flow classifiers benefit from further research and improve, Autograph can adopt these improvements to offer faster worm signature generation with lower false positive rates.

6 Attacks and Limitations

We briefly catalog a few attacks that one might mount against Autograph, and limitations of the current system.

Overload. Autograph reassembles suspicious TCP flows. Flow reassembly is costly in state in comparison with processing packets individually, but defeats the subterfuge of fragmenting a worm's payload across many small packets [11]. We note that the number of inbound flows a monitor observes may be large, in particular after a worm spreads successfully. If Autograph tries to reassemble every incoming suspicious flow, it may be susceptible to DoS attack. We note that Autograph treats all destination ports separately, and thus parallelizes well across ports; a site could run multiple instances of Autograph on separate hardware, and thus increase its aggregate processing power, for flow reassembly and all other processing. Autograph may also sample suspicious flows when the number of suspicious flows to process exceeds some threshold; we intend to investigate this heuristic in future.

Source-address-spoofed port scans. Port scans from spoofed IP source addresses are a peril for most IDSes. The chief reason for monitoring port scans is to limit the damage their originators can inflict, most often by filtering packets that originate from known port scanners. Such filtering invites attackers to spoof port scans from the IP addresses of those whose traffic they would like to block [11, 5]. Source-spoofed port scans can be used to mount different attacks, more specific to Autograph: the tattler mechanism must carry report traffic proportional to the number of port scanners. An attacker could attempt to saturate tattler's bandwidth limit with spoofed scanner source addresses, and thus render tattler useless in disseminating addresses of true port scanners. A source-spoofing attacker could also cause a remote source's traffic to be included by Autograph in signature generation.

Fortunately, a simple mechanism holds promise for rendering both these attacks ineffective. Autograph classifies an inbound SYN destined for an unpopulated IP address or port with no listening process as a port scan. To identify TCP port scans from spoofed IP source addresses, an Autograph monitor could respond to such inbound SYNs with a SYN/ACK, provided the router and/or firewall on the monitored network can be configured not to respond with an ICMP host or port unreachable. If the originator of the connection responds with an ACK with the appropriate sequence number, the source address on the SYN could not have been spoofed. The monitor may thus safely view all source addresses that send proper ACK responses to SYN/ACKs as port scanners. Non-ACK responses to these SYN/ACKs (RSTs or silence) can then be ignored; i.e., the source address of the SYN is not recorded as a port scanner. Note that while a non-source-spoofing port scanner may choose not to respond with an ACK, any source that hopes to complete a connection and successfully transfer an infecting payload must respond with an ACK, and thus identify itself as a port scanner. Jung et al. independently propose this same technique in [5].
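The SYN/ACK check described above can be sketched as a small state machine. This is an illustration under stated assumptions, not Autograph's implementation: the class and method names, the randomly drawn initial sequence number, and the tuple used as a connection key are all hypothetical, and the "silence" case is reduced to an explicit expiry call rather than a real timer.

```python
import random


class SpoofResistantScanDetector:
    """Records a source as a port scanner only after it answers a SYN/ACK correctly.

    Hypothetical sketch: the monitor replies to a SYN for an unused address or
    port with a SYN/ACK, and remembers the acknowledgment number a genuine
    sender must return (the monitor's initial sequence number plus one).
    """

    def __init__(self, rng=None):
        self._rng = rng or random.Random()
        self._pending = {}     # (src_ip, src_port, dst_ip, dst_port) -> expected ACK number
        self.scanners = set()  # (src_ip, dst_port) pairs confirmed as non-spoofed

    def on_syn_to_unused(self, src_ip, src_port, dst_ip, dst_port):
        """Handle a SYN to an unpopulated address or closed port.

        Returns the initial sequence number to place in the SYN/ACK reply.
        """
        isn = self._rng.getrandbits(32)
        key = (src_ip, src_port, dst_ip, dst_port)
        self._pending[key] = (isn + 1) % 2**32
        return isn

    def on_reply(self, src_ip, src_port, dst_ip, dst_port,
                 is_ack, is_rst, ack_number):
        """Handle the reply to our SYN/ACK; True if it confirms a real scanner."""
        key = (src_ip, src_port, dst_ip, dst_port)
        expected = self._pending.pop(key, None)
        if expected is None:
            return False  # no SYN/ACK outstanding for this connection
        if is_ack and not is_rst and ack_number == expected:
            self.scanners.add((src_ip, dst_port))
            return True
        return False  # RST or wrong ACK number: possibly spoofed, so ignore it

    def expire(self, src_ip, src_port, dst_ip, dst_port):
        """Forget a SYN/ACK that was never answered (the silence case)."""
        self._pending.pop((src_ip, src_port, dst_ip, dst_port), None)
```

Only sources that reach `scanners` would be eligible for announcement or for the suspicious flow pool in this sketch; a spoofed victim that answers the unexpected SYN/ACK with a RST never does.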
Hit-list scanning. If a worm propagates using a hit list [18], rather than by scanning IP addresses that may or may not correspond to listening servers, Autograph's port-scan-based suspicious flow classifier will fail utterly to include that worm's payloads in signature generation. Identifying worm flows that propagate by hit lists is beyond the scope of this paper. We are unaware at this writing of any published system that detects such flows; state-of-the-art malicious payload gathering methods, such as honeypots, are similarly stymied by hit-list propagation. Nevertheless, any future innovation in the detection of flows generated by hit-list-using worms may be incorporated into Autograph, to augment or replace the naive port-scan-based heuristic used in our prototype.

7 Related Work

Singh et al. [15] generate signatures for novel worms by measuring packet content prevalence and address dispersion at a single monitoring point. Their system, EarlyBird, avoids the computational cost of flow reassembly, but is susceptible to attacks that spread worm-specific byte patterns over a sequence of short packets. Autograph instead incurs the expense of flow reassembly, but mitigates that expense by first identifying suspicious flows, and thereafter performing flow reassembly and content analysis only on those flows. EarlyBird reverses these stages; it finds sub-packet content strings first, and applies techniques to filter out innocuous content strings second. Autograph and EarlyBird both make use of Rabin fingerprints, though in different ways: Autograph's COPP technique uses them as did LBFS, to break flow payloads into non-overlapping, variable-length chunks efficiently, based on payload content. EarlyBird uses them to generate hashes of overlapping, fixed-length chunks at every byte offset in a packet efficiently. Singh et al. independently describe using a white-list to disallow signatures that cause false positives (described herein as a blacklist for signatures, rather than a white-list for traffic), and report examples of false positives that are prevented with such a white-list [16].

Kreibich and Crowcroft [6] describe Honeycomb, a system that gathers suspicious traffic using a honeypot, and searches for longest common substrings in that traffic to generate worm signatures. Honeycomb relies on the inherent suspiciousness of traffic received by a honeypot to limit the traffic considered for signature generation to truly suspicious flows. This approach to gathering suspicious traffic is complementary to that adopted in Autograph; we intend to investigate acquiring suspicious flows using honeypots for signature generation by Autograph in future. The evaluation of Honeycomb assumes all traffic received by a honeypot is suspicious; that assumption may not always hold, in particular if attackers deliberately submit innocuous traffic to the system. Autograph, Honeycomb, and EarlyBird will face that threat as knowledge of their deployment spreads; we believe vetting candidate signatures for false positives among many distributed monitors may help to combat it.

Provos [12] observes the complementary nature of honeypots and content-based signature generation; he suggests providing payloads gathered by honeyd to Honeycomb. We observe that Autograph would similarly benefit from honeyd's captured payloads. Furthermore, if honeyd participated in tattler, Autograph's detection of suspicious IP addresses would be sped, with less communication than that required to transfer complete captured payloads from instances of honeyd to instances of Autograph.

Yegneswaran et al. [23] corroborate the benefit of distributed monitoring, both in speeding the accurate accumulation of port scanners' source IP addresses, and in speeding the accurate determination of port scanning volume. Their DOMINO system detects port scanners using active-sinks (honeypots), both to generate source IP address blacklists for use in address-based traffic filtering, and to detect an increase in port scanning activity on a port with high confidence. The evaluation of DOMINO focuses on speed and accuracy in determining port scan volume and port scanners' IP addresses, whereas our evaluation of Autograph focuses on speed and accuracy in generating worm signatures, as influenced by the speed and accuracy of worm payload accumulation.

Our work is the first we know to evaluate the tradeoff between earliness of detection of a novel worm and generation of signatures that cause false positives in content-based signature detection.

8 Conclusion and Future Work

In this paper, we present design criteria for an automated worm signature detection system, and the design and evaluation of Autograph, a DMZ monitoring system that is a first step toward realizing them. Autograph uses a naive, port-scan-based flow classifier to reduce the volume of traffic on which it performs content-prevalence analysis to generate signatures. The system ranks content according to its prevalence, and only generates signatures as needed to cover its pool of suspicious flows; it therefore is designed to minimize the number of signatures it generates. Our offline evaluation of Autograph on real DMZ traces reveals that the system can be tuned to generate sensitive and specific signature sets that exhibit high true positives and low false positives. Our simulations of the propagation of a Code-RedI-v2 worm demonstrate that by tattling to one another about port scanners they overhear, distributed Autograph monitors can detect worms earlier than isolated, individual Autograph monitors, and that the bandwidth required to achieve this sharing of state is minimal. DMZ-trace-driven simulations of the introduction of a novel worm show that a distributed deployment of 63 Autograph monitors, despite using a naive flow classifier to identify suspicious traffic, can detect a newly released Code-RedI-v2-like worm's signature before 2% of the vulnerable host population becomes infected. Our collected results illuminate
the inherent tension between early generation of a worm's signature and generation of specific signatures.

Autograph is a young system. Several avenues bear further investigation. We are currently evaluating a single Autograph monitor's performance in an online setting, where the system generates signatures periodically using the most recent suspicious flow pool. Early results indicate that in a single signature generation interval, this online system can produce signatures for common HTTP worms, including Code-RedII and Nimda, and that using a minimal blacklist, the generated signatures can incur zero false positives. We will continue this evaluation using more diverse traces and protocol (port) workloads, to further validate these initial results. We look forward to deploying Autograph distributedly, including tattler, which has so far only been evaluated in simulation. Finally, we are keen to explore sharing information beyond port scanners' source IP addresses among monitors, in the interest of ever-faster and ever-higher-quality signature generation.

Acknowledgments

We are grateful to Vern Paxson of ICSI and to Casey Helfrich and James Gurganus of Intel Research for providing the DMZ traces used to evaluate Autograph. We also thank Adrian Perrig, Phil Gibbons, Robert Morris, Luigi Rizzo, and the anonymous reviewers for insightful discussions and comments that improved our work.

Notes

1. Signatures may employ more complicated payload patterns, such as regular expressions. We restrict our attention to fixed byte sequences.
2. We include both poly- and metamorphism here; see Section 4.2.
3. In future, worms may be designed to minimize the overlap in their successive infection payloads; we consider such worms in Section 4.2.
4. Note that an IP address may have sent traffic before being identified as a scanner; such traffic will be stored in the non-suspicious flow pool. We include only subsequently arriving traffic in the suspicious flow pool, in the interest of simplicity, at the expense of potentially missing worm traffic sent by the scanner before our having detected it as such.
5. Worms that propagate very slowly may only accumulate in sufficient volume to be detected by Autograph for long values of t.
6. Note that each Autograph monitor may independently choose its breakmark. Were the breakmark universal and well-known, worm authors might
try to tailor payloads to force COPP to choose block boundaries that mix invariant payload bytes with changing payload bytes within a content block.
7 We have since adopted a 16-byte COPP window in our implementation, to make it harder for worm authors to construct payloads so as to force particular content block boundaries; results are quite similar for k = 16.
8 In cases where a source address owner complains that his address is advertised, the administrator of an Autograph monitor could configure Autograph not to report addresses from the uncooperative address block.
9 We have implemented blacklists at this writing, but omit a full evaluation of them in the interest of brevity. Our experience has shown that blacklists of even 2 to 6 disallowed signatures can significantly reduce false positives caused by misclassified innocuous flows, for HTTP traffic.

References

[1] CASTRO, M., DRUSCHEL, P., KERMARREC, A.-M., AND ROWSTRON, A. Scribe: A Large-scale and Decentralized Application-level Multicast Infrastructure. IEEE Journal on Selected Areas in Communication (JSAC) 20, 8 (Oct. 2002).
[2] CHRISTODORESCU, M., AND JHA, S. Static Analysis of Executables to Detect Malicious Patterns. In Proceedings of the 12th USENIX Security Symposium (Aug. 2003).
[3] CISCO SYSTEMS. Network-Based Application Recognition. http://www.cisco.com/univercd/cc/td/doc/product/software/ios122/122newf%t/122t/122t8/dtnbarad.htm.
[4] DSHIELD.ORG. DShield - Distributed Intrusion Detection System. http://dshield.org.
[5] JUNG, J., PAXSON, V., BERGER, A. W., AND BALAKRISHNAN, H. Fast Portscan Detection Using Sequential Hypothesis Testing. In Proceedings of the IEEE Symposium on Security and Privacy (May 2004).
[6] KREIBICH, C., AND CROWCROFT, J. Honeycomb - Creating Intrusion Detection Signatures Using Honeypots. In Proceedings of the 2nd Workshop on Hot Topics in Networks (HotNets-II) (Nov. 2003).
[7] LEMOS, R. Counting the Cost of Slammer. CNET news.com. http://news.com.com/2100-1001-982955.html, Jan. 2003.
[8] MOORE, D., AND SHANNON, C. Code-Red: A Case Study on the Spread and Victims of an Internet Worm. In Proceedings of the 2002 ACM SIGCOMM Internet Measurement Workshop (IMW 2002) (Nov. 2002).
[9] MOORE, D., SHANNON, C., VOELKER, G. M., AND SAVAGE, S. Internet Quarantine: Requirements for Containing Self-Propagating Code.
In Proceedings of IEEE INFOCOM 2003 (Mar. 2003).
[10] MUTHITACHAROEN, A., CHEN, B., AND MAZIERES, D. A Low-bandwidth Network File System. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP 2001) (Oct. 2001).
[11] PAXSON, V. Bro: A System for Detecting Network Intruders in Real-Time. Computer Networks 31, 23-24 (Dec. 1999).
[12] PROVOS, N. A Virtual Honeypot Framework. Tech. Rep. 03-1, CITI (University of Michigan), Oct. 2003.
[13] RABIN, M. O. Fingerprinting by Random Polynomials. Tech. Rep. TR-15-81, Center for Research in Computing Technology, Harvard University, 1981.
[14] SCHULZRINNE, H., CASNER, S., FREDERICK, R., AND JACOBSON, V. RFC 1889 - RTP: A Transport Protocol for Real-Time Applications, Jan. 1996.
[15] SINGH, S., ESTAN, C., VARGHESE, G., AND SAVAGE, S. The EarlyBird System for Real-time Detection of Unknown Worms. Tech. Rep. CS2003-0761, UCSD, Aug. 2003.
[16] SINGH, S., ESTAN, C., VARGHESE, G., AND SAVAGE, S. Automated Worm Fingerprinting. Unpublished draft, received May 2004.
[17] STANIFORD, S., HOAGLAND, J. A., AND MCALERNEY, J. M. Practical Automated Detection of Stealthy Portscans. Journal of Computer Security 10, 1-2 (Jan. 2002).
[18] STANIFORD, S., PAXSON, V., AND WEAVER, N. How to 0wn the Internet in Your Spare Time. In Proceedings of the 11th USENIX Security Symposium (Aug. 2002).
[19] THE SNORT PROJECT. Snort, The Open-Source Network Intrusion Detection System. http://www.snort.org/.
[20] UNIVERSITY OF OREGON. University of Oregon Route Views Project. http://www.routeviews.org/.
[21] WEAVER, N. C. Warhol Worms: The Potential for Very Fast Internet Plagues. http://www.cs.berkeley.edu/~nweaver/warhol.html.
[22] WU, J., VANGALA, S., GAO, L., AND KWIAT, K. An Effective Architecture and Algorithm for Detecting Worms with Various Scan Techniques. In Proceedings of the Network and Distributed System Security Symposium 2004 (NDSS 2004) (Feb. 2004).
[23] YEGNESWARAN, V., BARFORD, P., AND JHA, S. Global Intrusion Detection in the DOMINO Overlay System. In Proceedings of Network and Distributed System Security Symposium (NDSS 2004) (Feb. 2004).