and that those compromised hosts would be restored to an uncompromised state quickly. Conversely, machines in institution B would be reached by a larger number of attacks, and compromised hosts may not be noticed or repaired until long after the compromise has taken place. Institution B's network is unclean.

We can observe the uncleanliness of a network by examining its result. If a host is compromised, we expect that the attacker will use it to, among other activities, spam, scan and DDoS other hosts. If uncleanliness is a network-specific property, we expect that compromised hosts will cluster in specific networks, which we can identify via the phenomena of spatial and temporal uncleanliness. We emphasize that uncleanliness is a network property: hosts are compromised, networks are unclean.

We define spatial uncleanliness as the tendency for compromised hosts to cluster in unclean networks. Spatial uncleanliness implies that if we see a host engaged in hostile activity (such as scanning), we have a good chance of finding another IP address in the same network engaged in hostile activity. We will test for spatial uncleanliness by examining the clustering of addresses within networks. If our hypothesis about spatial uncleanliness is correct, then we would expect a set of compromised addresses to reside in fewer equally sized networks than addresses chosen at random from a population reflecting the structure of the Internet.

We define temporal uncleanliness as the tendency for compromised hosts to repeatedly appear in the same networks over time. Temporal uncleanliness implies that if a host is compromised, then other hosts within that network will be compromised in the future. We will test for temporal uncleanliness by examining the ability of unclean networks to predict future host compromises. If our hypothesis about temporal uncleanliness is correct, then networks containing compromised hosts will predict future compromised hosts more accurately than equally sized networks chosen at random.

Figure 1 explains our intuition for spatial and temporal uncleanliness. This figure shows two plots: the upper counts the number of unique hosts scanning a large network from January to April, 2006. The lower plot shows how many of these scanning addresses were also present in a botnet reported during the first week of March, 2006. This plot contains two lines: one counts the number of unique addresses from the bot report that were also identified scanning; the second counts the number of unique addresses from the bot report that were present in a 24-bit CIDR block where at least one address was also scanning.

First note that these reports resulted from two different detection methods: the bot data was collected by observing IP addresses communicating on IRC channels, while the scanning data was collected using a behavioral scan detection method deployed on an observed network [6]. There is a strong intersection between the two sets: at its peak, 35% of the addresses reported as belonging to the botnet are scanning the observed network. Second, we observe that using the /24's comprising the botnet identifies more scanners than the botnet addresses alone. We demonstrate in §4 that this result is significant.

Finally, as this figure shows, abnormal scanning (and therefore botnet compromise) occurs over several weeks. If bots take several weeks to be identified and removed, we expect that an unclean network will remain unclean for some time, and therefore we can predict future hostile activity from the same network over the lifetime of a particular compromise.

The primary contribution of this paper is a study of the properties of uncleanliness and whether they can be used effectively to predict future activity. To do so, we test for the existence of spatial and temporal uncleanliness by comparing the traffic from various reports of hostile activity. We demonstrate that compromised hosts are both more densely clustered than normal traffic and predict future unclean activity. In addition, we show that scanning, spamming and bots show evidence of cross relationship, such as the scanning observed in Figure 1. We also show that these phenomena do not predict future phishing sites, but that past phishing sites do. We therefore demonstrate that temporal uncleanliness holds for all four indicators. We then test the strength of this predictive mechanism by evaluating its suitability to block traffic crossing a large network. We demonstrate that limited predictive blocking is feasible, due to the impact of locality [17] evident in network traffic.

The remainder of this paper is structured as follows: §2 outlines relevant previous work in reputation management and identifying hostile groups by past history. In §3, we describe and classify the data sources that we use in this paper. §4 examines the spatial uncleanliness hypothesis, and §5 examines the temporal uncleanliness hypothesis. §6 examines the impact of blocking unclean networks and §7 discusses the results.

2. PREVIOUS WORK

Researchers initially studied botnets due to their use in DDoS attacks. Mirkovic et al. [18] identified DDoS attacks which used two distinct phases: acquiring hosts to use for the DDoS and using those hosts to conduct an attack. Freiling et al. [5] identify a variety of other attacks that botnets can conduct efficiently. Collins et al. [2] examined attacks as conducted by opportunistic attackers: that is, the attacker has no interest or knowledge of the target except that the target is exploitable. Our work uses these concepts to study largely automated acquisition and its impact on network defense.

Botnet demographics have been studied using honeypots and by actively probing bot networks [8, 9, 21]. Rajab et al.'s [21] analysis is particularly relevant due to the extended period during which they observed network traffic, allowing them to identify not only botnet demographics but activity. Our work differs from these analyses by comparing multiple observed phenomena and using this information to predict future activity.

In operational security, blacklists are commonly used to identify and block hosts that are already assumed to be hostile. Examples of such blacklists include Spamhaus' ZEN list [20] and the Bleeding Snort rule set [23]. Researchers such as Levy [16] note that spammers increasingly rely on the use of occupied hosts to generate spam messages; these approaches are more attractive to spammers because they offload processing requirements from the spammer (as noted by Laurie et al. [15]) and because they hide the attacker's identity [4].

In addition, researchers have studied the impact of blacklists on spamming and other hostile activity. Jung et al. [12] compare spamming blacklists against spam traffic to MIT

| Tag | Type | Class | Valid Dates | Size | Reporting method |
|---|---|---|---|---|---|
| Unclean reports | | | | | |
| bot | Provided | Bots | 2006/10/01-2006/10/14 | 621,861 | Bot addresses acquired through private reports from a third party |
| phish | Provided | Phishing | 2006/05/01-2006/11/01 | 53,789 | Addresses from a Phishing report list |
| scan | Observed | Scanning | 2006/10/01-2006/10/14 | 151,908 | IP addresses scanning the observed network |
| spam | Observed | Spam | 2006/10/01-2006/10/14 | 397,306 | IP addresses spamming the observed network |
| Reports for hypothesis testing | | | | | |
| bottest | Provided | Bots | 2006/05/10 | 186 | Botnet addresses acquired through private communication |
| control | Observed | N/A | 2006/09/25-2006/10/02 | 46,899,928 | Control addresses acquired from the observed network |

Table 1: Table of tags used to analyze spatial and temporal uncleanliness.

can use different methodologies to observe the same effects. For example, a phishing list can acquire IP addresses by using spam traps [19] or by collecting user reports (e.g., the submission form at the CastleCops PIRT service [1]). For the analyses within this paper, we use only one source per report and assume that the source's collection methodology is consistent over the report period.

In contrast to provided reports, observed reports are generated from network traffic logs reporting traffic covering a large edge network. Because we generate observed reports, we are able to collect observed reports at any time, which gives us greater flexibility in picking data than in the case of provided reports.

Each report is differentiated by a tag which, for this paper, summarizes the period and source for the report. We express this using the notation $R_T$, where $T$ is the tag (e.g., scan). A list of reports is provided in Table 1; this list is used for testing uncleanliness properties. Another list, given in Table 2, will be used for the analysis in §6.

Because we expect uncleanliness to be a network property, we approximate distinct networks by using identically sized CIDR blocks. We define a CIDR masking function $C_n(i)$. The CIDR masking function evaluates to the unique CIDR block with prefix length $n$ that contains the IP address $i$ (e.g., $C_{16}(127.1.135.14) = 127.1.0.0/16$). For convenience, when the CIDR masking function is applied on a report $S$, the result is set-valued and returns the set of all $n$-bit CIDR blocks in that set, that is:

$$C_n(S) \equiv \bigcup_{i \in S} C_n(i) \qquad (1)$$

When determining whether or not an IP address resides within a set of CIDR blocks, we will use a CIDR inclusion relation, $\sqsubset$, to indicate that an IP address is resident in one of a set of CIDR blocks:

$$i \sqsubset S \rightarrow \exists\, n \ \text{s.t.}\ C_n(i) \in C_n(S) \qquad (2)$$

With all sets and reports, we use bars to indicate cardinality, i.e., $|S|$ is the number of elements in the set $S$.

3.2 Reports

Table 1 is an inventory of the reports used in this paper to test spatial and temporal uncleanliness. Recall that provided reports have been given to us by other parties and that we generate observed reports using traffic logs from the observed network. Because of this, the dates that we can test for temporal uncleanliness are constrained by the times that the provided reports cover.

The observed network is composed of over 20 million distinct IPv4 addresses and contains several servers that are heavily used by clients across the Internet. Given the size and activity of the observed network, we assume that IP addresses from the Internet crossing into it are a representative sample of the Internet as a whole.

All reports have been filtered to only include addresses that are outside of the observed network and are not otherwise reserved (e.g., all addresses specified in RFC 1918 have been removed from reports). This filtering step is intended to remove selection bias from our observed reports; given our familiarity with the observed network and its size, we may identify more of a particular phenomenon than the provided reports may identify.

We classify four of the reports in this list as unclean reports. These are the reports we use as ground truth for identifying the four classes described in §3.1: bots, phishing, scanning and spamming. During the two week period of October 1st-14th, 2006, we have both provided and observed reports on all classes of unclean activity; consequently, we use this period to test temporal uncleanliness.

The next set of reports are used specifically to test the spatial and temporal uncleanliness hypotheses. The bottest report describes a small botnet from five months before all the other activity analyzed in this paper. bottest is used as an extreme case for prediction: if a five-month old report can accurately predict current unclean activity, then a recent report should be more effective.

The control report consists of 47 million unique IP addresses observed during the week of September 25th, 2006. We compare the data from our other reports against randomly generated subsets of control in order to determine whether or not these reports exhibit spatial or temporal uncleanliness. We use the control report to more accurately reflect the structure of IPv4 space than we would using purely randomly chosen IP addresses. The report consists of IP addresses observed to engage in payload-bearing TCP activity,
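
As an illustration of the notation from §3.1, the following is a minimal Python sketch of the CIDR masking function, its set-valued form, and the inclusion test. It is not the authors' implementation: the function names (`cidr_mask`, `cidr_mask_set`, `included`) are hypothetical, reports are assumed to be plain iterables of dotted-quad IPv4 strings, and the inclusion test is evaluated at one fixed prefix length rather than over all prefix lengths.

```python
import ipaddress


def cidr_mask(addr, n):
    """C_n(i): the unique n-bit CIDR block containing IPv4 address addr."""
    # strict=False lets ip_network zero the host bits instead of raising.
    return ipaddress.ip_network(f"{addr}/{n}", strict=False)


def cidr_mask_set(report, n):
    """C_n(S): the set of all n-bit CIDR blocks that contain an address in report."""
    return {cidr_mask(addr, n) for addr in report}


def included(addr, report, n):
    """Inclusion test at a fixed prefix length n (a simplifying assumption):
    true when the n-bit block of addr is one of the blocks of report."""
    return cidr_mask(addr, n) in cidr_mask_set(report, n)


if __name__ == "__main__":
    # The example from the text: C_16(127.1.135.14)
    print(cidr_mask("127.1.135.14", 16))
    # Three illustrative addresses, two of which share a /24
    sample = ["10.0.0.1", "10.0.0.200", "10.0.1.5"]
    print(len(cidr_mask_set(sample, 24)))
```

Because `ipaddress` network objects are hashable, the cardinality of a masked report is simply the `len()` of the resulting set, which is the quantity the bar notation refers to.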
Figure 2: Comparison of density estimation techniques (naive and empirical) against actual botnet density. Note that the number of blocks estimated using the naive technique is considerably higher than the other two.

smaller size of the phishing reports in comparison to the other reports. As shown in Table 1, the six month phishing report is approximately an order of magnitude smaller than the other unclean reports. As with Figure 3(i), addresses in the phishing report are more tightly packed than addresses selected from the control report.

Figure 3(iii) plots the volume of $R_{\text{spam}}$ from October 1st to 14th, 2006. Figure 3(iv) plots the volume of $R_{\text{scan}}$ for the same period. Each of these reports is more tightly packed than the comparative control reports.

As Figures 2 and 3 show, unclean reports have an $n$-bit density greater than or equal to the $n$-bit density of the control reports for all values of $n$. Consequently, this data supports the spatial uncleanliness hypothesis: compromised hosts are disproportionately concentrated in certain networks.

5. TEMPORAL UNCLEANLINESS

We now address temporal uncleanliness: the propensity for networks to remain unclean for extended periods of time. In order to test for temporal uncleanliness we compare the ability of a report of unclean addresses to predict future compromised addresses; in particular, whether or not a report of bot addresses can predict future bots, spamming, scanning and phishing.

This section is divided as follows: §5.1 describes our method for measuring the presence of temporal uncleanliness, and §5.2 shows the results.

5.1 Model and Methodology

To observe temporal uncleanliness, we examine the predictive capacity of reports of unclean data. Consider three reports: $R_{\text{event}}^{\text{past}}$, which reports on some event in the past; $R_{\text{normal}}^{\text{past}}$, which reports on past activity without any particular criterion; and $R_{\text{event}}^{\text{present}}$, which describes the same event's population in the present. If $R_{\text{event}}^{\text{past}}$ and $R_{\text{normal}}^{\text{past}}$ are of equal cardinality, then $R_{\text{event}}^{\text{past}}$ is a better predictor of the report $R_{\text{event}}^{\text{present}}$ at prefix length $n$ if:

$$|C_n(R_{\text{event}}^{\text{past}}) \cap C_n(R_{\text{event}}^{\text{present}})| > |C_n(R_{\text{normal}}^{\text{past}}) \cap C_n(R_{\text{event}}^{\text{present}})| \qquad (4)$$

If temporal uncleanliness exists, then we expect that unclean reports will consistently be better predictors of future unclean reports than a control report. However, we note that due to spatial uncleanliness, an unclean report will populate fewer equally sized blocks than an equivalent control report. As a consequence, as block size increases, the control report will have a larger number of imprecise successes. Therefore, there will be some prefix length below which the unclean report will be a worse predictor.

For testing, we use the form of the temporal uncleanliness hypothesis given in the equation below. Given that $R_{\text{unclean}}^{\text{past}}$ and $R_{\text{normal}}^{\text{past}}$ have equal cardinality, then

$$\exists\, n \in [16, 32] \ \text{s.t.}\ |C_n(R_{\text{unclean}}^{\text{past}}) \cap C_n(R_{\text{unclean}}^{\text{present}})| > |C_n(R_{\text{normal}}^{\text{past}}) \cap C_n(R_{\text{unclean}}^{\text{present}})| \qquad (5)$$

That is, there exists a prefix length where a previously generated report of unclean activity is more predictive of present unclean activity than a control report of equal cardinality. As with spatial uncleanliness, we limit our analyses to blocks with a CIDR prefix length of at least 16 bits.

Figure 3: Comparative density of unclean blocks against addresses selected from $R_{\text{control}}$. Panels: (i) $R_{\text{bot}}$, (ii) $R_{\text{phish}}$, (iii) $R_{\text{spam}}$, (iv) $R_{\text{scan}}$. Note that in each case, the expected number of blocks in $R_{\text{control}}$ is higher than the observed values, indicating that unclean addresses are more densely packed in those blocks than randomly selected addresses.

5.2 Analysis

We now test the temporal uncleanliness hypothesis formulated in Equation 5. To do so, we use $R_{\text{bottest}}$ as $R_{\text{unclean}}^{\text{past}}$ and then compare against each of our unclean reports collected during the period of October 1st-14th, 2006. Recall that we don't control the dates for which we receive provided reports. During this period, we have data from each of the provided reports and could generate observed reports for the same period. By using a five month gap in time, we also test an extreme case: if past activity can effectively predict future activity five months in advance, then we should be able to predict future activity over shorter periods.

Figure 4 shows the relative predictive capacity of $R_{\text{bottest}}$ against future unclean reports; for these figures, $R_{\text{phish}}$ is a subreport of $R_{\text{phish}}$ from Table 1. This report is considerably smaller than the others, containing 2302 addresses. This results in a smaller degree of intersection with the randomly generated reports from the control report.

As in §4.2, we generate the reference line by plotting a boxplot showing the variance of 1000 randomly selected test reports drawn from $R_{\text{control}}$. In contrast with Figure 3, the small cardinality of $R_{\text{bottest}}$ ensures that the variations observed by the boxplot are visible. We consider $R_{\text{bottest}}$ to be a better predictor than $R_{\text{control}}$ if the cardinality of its intersection with the corresponding unclean report is higher than the intersection with randomly selected addresses in 95% of the observed cases.

As Figure 4 shows, $R_{\text{bottest}}$ is a better predictor than $R_{\text{control}}$ for botnets, spamming and scanning at various prefix lengths. Also of note is the impact of spatial uncleanliness: in these three figures, $R_{\text{bottest}}$ is a better predictor for prefix lengths of approximately 19-20 bits and longer. At shorter prefix lengths, randomly selected addresses become better predictors. Using the 95% threshold, $R_{\text{bottest}}$ is a stronger predictor of future botnet activity between 20 and 25 bits, spamming between 19 and 32 bits, and scanning between 20 and 24 bits. For prefix lengths longer than these values, the two reports are equally predictive due to the low probability of seeing CIDR blocks from either report intersect.

Figure 4(ii) plots the predictive capacity of $R_{\text{bottest}}$ against $R_{\text{phish}}$. In contrast to the other plots in Figure 4, this plot indicates that $R_{\text{bottest}}$ is not a good predictor of future phishing activity in comparison to randomly selected control sets. We have two hypotheses as to why this occurs for phishing data. Ramachandran et al. [22] describe how botnet owners place a higher premium on addresses that have not yet been identified as bots. Because phishing sites need to be publicized, a phishing IP address becomes public knowledge, marked on blacklists and consequently highly unattractive for the owner of a botnet.

An alternative explanation is that, in contrast to botnets, phishing sites are generally hosted on web servers, and a phisher may prefer to host phishing sites in an actual datacenter to ensure robustness during a flash crowd. At the minimum, a phishing site must be publicly accessible, while a useful bot can exist behind a NAT or a firewall. Therefore, phishers may prefer sites that are already hosting web servers and have the resources to handle a high traffic load.

In order to determine whether the temporal uncleanliness hypothesis does hold for phishing, we now consider a test that uses phishing data exclusively. Figure 5 plots the intersection of $R_{\text{phishtest}}$ against the same phishing set as in Figure 4(ii). In this case, $|R_{\text{phishtest}}| = 1386$. We note that this figure shows strong evidence for temporal uncleanliness in phishing.

Since these results show that five month old reports can be used to more effectively predict the population of future reports than randomly selected IP addresses from a week before, we conclude that the temporal uncleanliness hypothesis is supported by this data. Furthermore, although in Equation 5 we chose a range of IP blocks arbitrarily, we can now establish a lower limit for the prefix length of 20 bits, and an upper limit in excess of 24 bits.

We have also shown that phishing activity and botnet activity are not related in the way that bots, scanning and spamming are. As noted elsewhere [21, 15], scanning and spamming are commonly implemented with botnets, so we would expect that $R_{\text{bot}}$, $R_{\text{scan}}$ and $R_{\text{spam}}$ are related. However, the inability of $R_{\text{bottest}}$ to predict future phishing activity suggests that a measurement for uncleanliness will have to be multidimensional: phishing sites are still taken over, but it may be that phishers have different criteria for the machines they occupy than botnet owners.

6. BLOCKING TESTS

The spatial and temporal uncleanliness hypotheses together provide a method for identifying the risk that traffic from a particular network originates from a compromised host. We now address the issue of whether unclean networks can be effectively blocked; that is, whether or not blocking a set of unclean networks will adversely affect legitimate traffic entering an active network.

To determine whether we can effectively block traffic, we conduct a limited experiment to show the impact that blocking a set of unclean networks would have on incoming traffic to a live network. The remainder of this section is structured as follows: §6.1 describes our analytical method, and §6.2 discusses the results.

6.1 Method

To determine whether we can productively block traffic from unclean networks, we examine traffic logs from a live network and compare the intersection between incoming traffic, the $R_{\text{bottest}}$ report and other uncleanliness reports from the same observation period as the incoming traffic.

We begin by collecting traffic logs of all traffic that crosses the observed network from all IP addresses $i \sqsubset C_{24}(R_{\text{bottest}})$ for the observation period of October 1st-14th, 2006. This report, $R_{\text{candidate}}$, consists of all IP addresses observed in traffic crossing the observed network that share a /24 in common with any of the IP addresses in $R_{\text{bottest}}$. This allows us to test the effectiveness of filtering from the /24 to the /32 range; we pick this range because, as seen in Figure 3, 24 bits is the minimum block size at which $R_{\text{bottest}}$ is an unambiguously better predictor of future uncleanliness than control data. We further constrain $R_{\text{candidate}}$ to those addresses that generate at least one TCP record during this period.

The traffic data used in this analysis consists of Cisco NetFlow V5 records (http://www.cisco.com/go/netflow). NetFlow records are a representation of approximate sessions consisting of a log of all identically addressed packets within a limited time. Flow records are a compact representation of traffic, but do not contain payload. Consequently, our analysis includes a degree of uncertainty because we cannot validate what any session was engaged in. To compensate for this, we differentiate addresses by membership in one of the unclean reports and by behavior observed in the flow records.

We partition the addresses in $R_{\text{candidate}}$ into three reports: $R_{\text{unknown}}$, $R_{\text{hostile}}$ and $R_{\text{innocent}}$. A full inventory of the reports used in this analysis is given in Table 2.

$R_{\text{hostile}}$ consists of any IP address in $R_{\text{candidate}}$ that is also present in the unclean reports (i.e., scanning, spamming, phishing or botnet membership). The hostile set is identified purely by intersecting these reports, and once an IP address is identified as hostile it cannot be present in the remaining two reports.

$R_{\text{unknown}}$ consists of the addresses in $R_{\text{candidate}}$ that are not present in any of the unclean reports but have no payload-bearing flows. We define a flow as payload-bearing if it is a TCP flow with at least 36 bytes of payload and at least one ACK flag. Due to TCP options, a 3-packet SYN scan will often have 36 bytes of payload, even though this data is still part of the TCP handshake. Hand-examination of the flow logs found multiple examples of 36-byte SYN-only scans to apparently randomly selected ports on diverse targets.

The IP addresses in $R_{\text{unknown}}$ are not proven to be hostile but are highly suspicious. Due to the lack of payload in flow data, we cannot definitively categorize members of this report into either of the other two reports and consequently we remove them from the false positive calculations. For this analysis, we consider the false negative rate to be effectively zero, as we are only considering addresses that we have opted to block.

The population of $R_{\text{innocent}}$ consequently consists of any IP address that does conduct payload-bearing TCP activity and is not present in any of the unclean reports.

Our prediction scenario assumes that the network blocks $C_n(R_{\text{bottest}})$ for some value of $n \in [24, 32]$. The success

Figure 5: Comparative predictive capacity of phishing reports. Note that this data does effectively predict future traffic, like the bots in Figure 4(i), (iii) and (iv).

$$FP(n) = \sum_{i \sqsubset C_n(R_{\text{bottest}})} m(i, R_{\text{candidate}} \cap R_{\text{innocent}}) \qquad (9)$$

Table 3 summarizes the effectiveness of this prediction method. As this table shows, all three populations decrease as the prefix length increases. At $n = 24$, 90% of the incoming addresses are correctly identified as hostile. If we assume that unknown addresses are hostile, the true positive rate is 97%. Furthermore, the false positive rate remains relatively low until $n = 26$.

| n | TP(n) | FP(n) | pop(n) | $R_{\text{unknown}}$ |
|---|---|---|---|---|
| 24 | 287 | 35 | 322 | 708 |
| 25 | 172 | 22 | 194 | 344 |
| 26 | 81 | 1 | 82 | 200 |
| 27 | 38 | 1 | 39 | 105 |
| 28 | 18 | 0 | 18 | 60 |
| 29 | 7 | 0 | 7 | 29 |
| 30 | 1 | 0 | 1 | 14 |
| 31 | 1 | 0 | 1 | 7 |
| 32 | 1 | 0 | 1 | 0 |
Table 3: Observed true and false positive counts

Of note with this dataset is the volume of uncertain addresses (i.e., the population of $R_{\text{unknown}}$). At a 24 bit prefix length, $|C_{24}(R_{\text{bottest}}) \cap C_{24}(R_{\text{unknown}})|$ yields approximately 700 addresses. We first note that unknown addresses have engaged in TCP communications, but have not exchanged payload; consequently, blocking these addresses does not impact traffic.

Of more concern is that all of the addresses in $R_{\text{unknown}}$ engage in some form of suspicious behavior (that is, suspicious apart from trying to connect with the network and not exchanging payload). Hand examination found many addresses trying to open communications from ephemeral ports to ephemeral ports or engaged in slow scanning. The latter addresses did not appear in our scanning report because the scan detection mechanism is calibrated to identify scans that take place over an hour, while scans observed in this dataset would often contact fewer than 30 addresses per day over the observation period.

The strength of this blocking method is predicated on the relatively sparse amount of traffic issuing from these blocks. As Table 3 shows, 1030 IP addresses were blocked when $n$ was set to 24 bits. $|C_{24}(R_{\text{bottest}})| = 173$, which yields a potential set of 44,288 addresses that can be blocked. Consequently, less than 2% of the total IP addresses available in those /24s communicated with the observed network during this time.

Some of the effectiveness of this method may be attributed to the demographics of the botnet and the observed network: $R_{\text{bottest}}$ consists primarily of addresses outside the English-speaking world, with 70% of the addresses coming from Turkey. Despite its size, the observed network is an edge network; all traffic at its border is either originating from an address within that border or going to an IP address within that border.

We therefore conclude that our test results indicate the feasibility of blocking hostile addresses, but that this approach is best used in conjunction with other traffic analysis mechanisms in order to determine the best practices for individual networks.

| Tag | Type | Class | Valid Dates | Size | Reporting method |
|---|---|---|---|---|---|
| Reports used for prediction testing | | | | | |
| unclean | Provided | Special | 2006/10/01-2006/10/14 | 1,158,103 | The union of the four unclean reports; note that there is overlap |
| candidate | Observed | N/A | 2006/10/01-2006/10/14 | 1030 | IP addresses crossing the network border that are in the same /24's as $R_{\text{unclean}}$ |
| hostile | Observed | N/A | 2006/10/01-2006/10/14 | 287 | Members of $R_{\text{candidate}}$ also present in $R_{\text{unclean}}$ |
| unknown | Observed | N/A | 2006/10/01-2006/10/14 | 708 | Members of $R_{\text{candidate}}$ not in $R_{\text{unclean}}$, but engaged in suspicious activity |
| innocent | Observed | N/A | 2006/10/01-2006/10/14 | 35 | Members of $R_{\text{candidate}}$ not present in $R_{\text{hostile}}$ or $R_{\text{unknown}}$ |

Table 2: Table of reports used for prediction test.

7. CONCLUSION

In this paper, we have demonstrated that it is possible to effectively predict future hostile activity from past network activity. To do so, we have defined a network-based quality of uncleanliness, which is an indicator of how likely a network is to contain compromised hosts.

As an initial work in this field, we have focused on testing basic hypotheses about uncleanliness, which we have defined with the spatial and temporal uncleanliness hypotheses. Using reports of network activity and traffic logs of a large network, we have shown evidence of spatial and temporal uncleanliness. We have also shown that an uncleanliness measure may involve multiple dimensions, such as botnets and phishing.

Finally, we have demonstrated that spatial and temporal uncleanliness, coupled with the limited audience of an edge network, can be effectively used to block hostile traffic in the future. Given the demographics issues noted in §6, uncleanliness may best be used as a risk indicator: by showing that a network is demonstrating unclean behavior, security personnel can evaluate whether the risk of hostile activity from the network is worth the benefit of receiving commerce and communication from that network under normal circumstances.

Our immediate goal following this work is to develop a more rigorous and precise uncleanliness metric; in particular, a multidimensional uncleanliness metric to measure the aggregate probability that an address is occupied. The elements of this metric involve the components discussed in this work as well as other predictive indicators of vulnerability (e.g., communication with botnet C&C nodes).

We also believe that spatial uncleanliness has useful implications for network log analysis. If we know that a host from one network is attacking, scanning or otherwise interfering with the traffic on an observed network, it is reasonable to examine other traffic from that network to see if there is coordinated hostile activity.

8. ACKNOWLEDGEMENTS

We would like to thank the referees and our shepherd for their insightful comments in preparing this paper.

9. ADDITIONAL AUTHORS

Additional author: Joseph B. Kadane (CMU Department of Statistics, email: kadane@stat.cmu.edu)

10. REFERENCES

[1] CastleCops. Castlecops phishing incident reporting & termination (PIRT) squad. Accessible at http://www.castlecops.com/pirt, fetched on January 29th, 2007.

[2] M. Collins, C. Gates, and G. Kataria. A model for opportunistic network exploits: The case of P2P worms. In Proceedings of the 2006 Workshop on Economics and Information Security, 2006.

[3] M. Collins and M. Reiter. An empirical analysis of target-resident DoS filters. In Proceedings of the 2004 IEEE Symposium on Security and Privacy, May 9-12, 2004.

[4] D. Cook, J. Hartnett, K. Manderson, and J. Scanlan. Catching spam before it arrives: domain specific dynamic blacklists. In ACSW Frontiers '06: Proceedings of the 2006 Australasian workshops on Grid computing and e-research, Darlinghurst, Australia, 2006.

[5] F. Freiling, T. Holz, and G. Wicherski. Botnet tracking: Exploring a root-cause methodology to prevent distributed denial-of-service attacks. In Proceedings of the 2005 European Symposium on Research in Computer Security, 2005.

[6] C. Gates, J. McNutt, J. Kadane, and M. Kellner. Detecting scans at the ISP level. Technical Report CMU/SEI-2006-TR-005, Software Engineering Institute, 2006.

[7] C. Gates, J. McNutt, J. Kadane, and M. Kellner. Scan detection on very large networks using logistic regression modeling. In ISCC '06: Proceedings of the 11th IEEE Symposium on Computers and Communications, Washington, DC, USA, 2006.

[8] T. Holz. Learning more about attack patterns with honeypots. In Sicherheit 2006: Sicherheit - Schutz und Zuverlässigkeit, Beiträge der 3. Jahrestagung des