Baggy Bounds Checking: An Efficient and Backwards-Compatible Defense against Out-of-Bounds Errors

Periklis Akritidis,* Manuel Costa,† Miguel Castro,† Steven Hand*

* Computer Laboratory, University of Cambridge, UK, {pa280,smh22}@cl.cam.ac.uk
† Microsoft Research, Cambridge, UK, {manuelc,mcastro}@microsoft.com

Abstract

Attacks that exploit out-of-bounds errors in C and C++ programs are still prevalent despite many years of research on bounds checking. Previous backwards compatible bounds checking techniques, which can be applied to unmodified C and C++ programs, maintain a data structure with the bounds for each allocated object and perform lookups in this data structure to check if pointers remain within bounds. This data structure can grow large and the lookups are expensive.

In this paper we present a backwards compatible bounds checking technique that substantially reduces performance overhead. The key insight is to constrain the sizes of allocated memory regions and their alignment to enable efficient bounds lookups and hence efficient bounds checks at runtime. Our technique has low overhead in practice (only 8% throughput decrease for Apache) and is more than two times faster than the fastest previous technique and about five times faster, while using less memory, than recording object bounds using a splay tree.

1 Introduction

Bounds checking C and C++ code protects against a wide range of common vulnerabilities. The challenge has been making bounds checking fast enough for production use and at the same time backwards compatible with binary libraries to allow incremental deployment. Solutions using fat pointers [24, 18] extend the pointer representation with bounds information. This enables efficient bounds checks but breaks backwards compatibility because increasing the pointer size changes the memory layout of data structures. Backwards compatible bounds checking techniques [19, 30, 36, 15] use a separate data structure to look up bounds information. Initial attempts incurred a significant overhead [19, 30, 36] (typically 2x–10x) because looking up bounds is expensive and the data structure can grow large. More recent work [15] has applied sophisticated static pointer analysis to reduce the number of bounds lookups; this managed to reduce the runtime overhead on the Olden benchmarks to 12% on average.

Figure 1: Allocated memory is often padded to a particular alignment boundary, and hence can be larger than the requested object size. By checking allocation bounds rather than object bounds, we allow benign accesses to the padding, but can significantly reduce the cost of bounds lookups at runtime.
[Diagram labels: Object Bounds, Allocation Bounds, Padding, Object]

In this paper we present baggy bounds checking, a backwards compatible bounds checking technique that reduces the cost of bounds checks. We achieve this by enforcing allocation bounds rather than precise object bounds, as shown in Figure 1. Since memory allocators pad object allocations to align the pointers they return, there is a class of benign out-of-bounds errors that violate the object bounds but fall within the allocation bounds. Previous work [4, 19, 2] has exploited this property in a variety of ways.

Here we apply it to efficient backwards compatible bounds checking. We use a binary buddy allocator to enable a compact representation of the allocation bounds: since all allocation sizes are powers of two, a single byte is sufficient to store the binary logarithm of the allocation
size. Furthermore, there is no need to store additional information because the base address of an allocation with size s can be computed by clearing the log2(s) least significant bits of any pointer to the allocated region. This allows us to use a space and time efficient data structure for the bounds table. We use a contiguous array instead of a more expensive data structure (such as the splay trees used in previous work). It also provides us with an elegant way to deal with common cases of temporarily out-of-bounds pointers. We describe our design in more detail in Section 2.

We implemented baggy bounds checking as a compiler plug-in for the Microsoft Phoenix [22] code generation framework, along with additional runtime components (Section 3). The plug-in inserts code to check bounds for all pointer arithmetic that cannot be statically proven safe, and to align and pad stack variables where necessary. The runtime component includes a binary buddy allocator for heap allocations, and user-space virtual memory handlers for growing the bounds table on demand.

In Section 4 we evaluate the performance of our system using the Olden benchmark (to enable a direct comparison with Dhurjati and Adve [15]) and SPECINT 2000. We compare our space overhead with a version of our system that uses the splay tree implementation from [19, 30]. We also verify the efficacy of our system in preventing attacks using the test suite described in [34], and run a number of security critical COTS components to confirm its applicability.

Section 5 describes our design and implementation for 64-bit architectures. These architectures typically have "spare" bits within pointers, and we describe a scheme that uses these to encode bounds information directly in the pointer rather than using a separate lookup table. Our comparative evaluation shows that the performance benefit of using these spare bits to encode bounds may not in general justify the additional complexity; however using them just to encode information to recover the bounds for out-of-bounds pointers may be worthwhile.

Finally we survey related work (Section 6), discuss limitations and possible future work (Section 7) and conclude (Section 8).

2 Design

2.1 Baggy Bounds Checking

Our system shares the overall architecture of backwards compatible bounds checking systems for C/C++ (Figure 2). It converts source code to an intermediate representation (IR), finds potentially unsafe pointer arithmetic operations, and inserts checks to ensure their results are within bounds. Then, it links the generated code with our runtime library and binary libraries (compiled with or without checks) to create a hardened executable.

Figure 2: Overall system architecture, with our contribution highlighted within the dashed box.
[Diagram labels: Source Code, Generate IR, Insert Checks, Generate Code, Link, Baggy Bounds Checking, Binary Libraries, Runtime Support Library, Hardened Executable]

We use the referent object approach for bounds checking introduced by Jones and Kelly [19]. Given an in-bounds pointer to an object, this approach ensures that any derived pointer points to the same object. It records bounds information for each object in a bounds table. This table is updated on allocation and deallocation of objects: this is done by the malloc family of functions for heap-based objects; on function entry and exit for stack-based objects; and on program startup for global objects.

The referent object approach performs bounds checks on pointer arithmetic. It uses the source pointer to look up the bounds in the table, performs the operation, and checks if the destination pointer remains in bounds. If the destination pointer does not point to the same object, we mark it out-of-bounds to prevent any dereference (as in [30, 15]). However we permit its use in further pointer arithmetic, since it may ultimately result in an in-bounds pointer. The marking mechanism is described in detail in Section 2.4.

Baggy bounds checking uses a very compact representation for bounds information. Previous techniques recorded a pointer to the start of the object and its size in the bounds table, which requires at least eight bytes. We pad and align objects to powers of two and enforce allocation bounds instead of object bounds. This enables us to use a single byte to encode bounds information. We store the binary logarithm of the allocation size in the bounds table:

    e = log2(size);

Given this information, we can recover the allocation size and a pointer to the start of the allocation with:

    size = 1 << e;
    base = p & ~(size-1);

To convert from an in-bounds pointer to the bounds for the object we require a bounds table. Previous solutions based on the referent object approach (such as [19, 30, 15]) have implemented the bounds table using a splay tree.

Baggy bounds, by contrast, implement the bounds table using a contiguous array. The table is small because each entry uses a single byte. Additionally, we partition memory into aligned slots with slot_size bytes. The bounds table has an entry for each slot rather than an entry per byte. So the space overhead of the table is 1/slot_size, and we can tune slot_size to balance memory waste between padding and table size. We align objects to slot boundaries to ensure that no two objects share a slot.

Accesses to the table are fast. To obtain a pointer to the entry corresponding to an address, we right-shift the address by the constant log2(slot_size) and add the constant table base. We can use this pointer to retrieve the bounds information with a single memory access, instead of having to traverse and splay a splay tree (as in previous solutions).

Note that baggy bounds checking permits benign out-of-bounds accesses to the memory padding after an object. This does not compromise security because these accesses cannot write or read other objects. They cannot be exploited for typical attacks such as (a) overwriting a return address, function pointer or other security critical data; or (b) reading sensitive information from another object, such as a password.

We also defend against a less obvious attack where the program reads values from the padding area that were originally written to a deleted object that occupied the same memory. We prevent this attack by clearing the padding on memory allocation.

    Pointer arithmetic operation:
        p' = p + i
    Explicit bounds check:
        size = 1 << table[p >> slot_size]
        base = p & ~(size-1)
        p' >= base && p' - base < size
    Optimized bounds check:
        (p ^ p') >> table[p >> slot_size] == 0

Figure 3: Baggy bounds enables optimized bounds checks: we can verify that pointer p' derived from pointer p is within bounds by simply checking that p and p' have the same prefix with only the e least significant bits modified, where e is the binary logarithm of the allocation size.

2.2 Efficient Checks

In general, bounds checking the result p' of pointer arithmetic on p involves two comparisons: one against the lower bound and one against the upper bound, as shown in Figure 3.

We devised an optimized bounds check that does not even need to compute the lower and upper bounds. It uses the value of p and the value of the binary logarithm of the allocation size, e, retrieved from the bounds table. The constraints on allocation size and alignment ensure that p' is within the allocation bounds if it differs from p only in the e least significant bits. Therefore, it is sufficient to shift p ^ p' by e and check if the result is zero, as shown in Figure 3.

Furthermore, for pointers p' where sizeof(*p') > 1, we also need to check that (char*)p' + sizeof(*p') - 1 is within bounds to prevent a subsequent access to *p' from crossing the allocation bounds. Baggy bounds checking can avoid this extra check if p' points to a built-in type. Aligned accesses to these types cannot overlap an allocation boundary because their size is a power of two and is less than slot_size. When checking pointers to structures that do not satisfy these constraints, we perform both checks.

2.3 Interoperability

Baggy bounds checking works even when instrumented code is linked against libraries that are not instrumented.
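The bounds-table lookup of Section 2.1 and the two checks of Figure 3 can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: addresses are plain offsets into a hypothetical 1 KB arena rather than real pointers, the slot size is taken to be 16 bytes, and every name (`bounds_table`, `register_allocation`, `explicit_check`, `optimized_check`, `checks_agree`) is invented for the example. The table index is computed by shifting with log2 of the slot size, following the prose description.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed parameters for illustration only. */
#define SLOT_SIZE_LOG2 4u                      /* 16-byte slots */
#define ARENA_SIZE     1024u                   /* simulated address space */
#define TABLE_ENTRIES  (ARENA_SIZE >> SLOT_SIZE_LOG2)

/* One byte per slot: e = log2(allocation size) of the allocation covering it. */
static uint8_t bounds_table[TABLE_ENTRIES];

/* Record a 2^e-byte allocation that starts at 'base' (aligned to its size). */
static void register_allocation(uint32_t base, uint8_t e)
{
    uint32_t size = 1u << e;
    assert(e >= SLOT_SIZE_LOG2);
    assert((base & (size - 1)) == 0 && base + size <= ARENA_SIZE);
    for (uint32_t a = base; a < base + size; a += 1u << SLOT_SIZE_LOG2)
        bounds_table[a >> SLOT_SIZE_LOG2] = e;
}

/* Explicit check: recover size and base from p, then compare q to both bounds. */
static int explicit_check(uint32_t p, uint32_t q)
{
    uint32_t size = 1u << bounds_table[p >> SLOT_SIZE_LOG2];
    uint32_t base = p & ~(size - 1);
    return q >= base && q - base < size;
}

/* Optimized check: p and q may differ only in their e least significant bits. */
static int optimized_check(uint32_t p, uint32_t q)
{
    return ((p ^ q) >> bounds_table[p >> SLOT_SIZE_LOG2]) == 0;
}

/* Returns 1 if both checks give the same answer for every q in the arena. */
static int checks_agree(uint32_t p)
{
    for (uint32_t q = 0; q < ARENA_SIZE; q++)
        if (explicit_check(p, q) != optimized_check(p, q))
            return 0;
    return 1;
}
```

For example, after `register_allocation(64, 6)` the allocation covers offsets 64 to 127, so a derived offset of 127 passes both checks from source offset 70, while 128 and 63 fail both.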
The library code works without change because it performs no checks but it is necessary to ensure that instrumented code works when accessing memory allocated in an uninstrumented library. This form of interoperability is important because some libraries are distributed in binary form.

We achieve interoperability by using the binary logarithm of the maximum allocation size as the default value for bounds table entries. Instrumented code overwrites the default value on allocations with the logarithm of the allocation size and restores the default value on deallocations. This ensures that table entries for objects allocated in uninstrumented libraries inherit the default value. Therefore, instrumented code can perform checks as normal when accessing memory allocated in a library, but checking is effectively disabled for these accesses. We could intercept heap allocations in library code at link time and use the buddy allocator to enable bounds checks on accesses to library-allocated memory, but this is not done in the current prototype.

2.4 Support for Out-Of-Bounds Pointers

A pointer may legally point outside the object bounds in C. Such pointers should not be dereferenced but can be compared and used in pointer arithmetic that can eventually result in a valid pointer that may be dereferenced by the program.

Out-of-bounds pointers present a challenge for the referent object approach because it relies on an in-bounds pointer to retrieve the object bounds. The C standard only allows out-of-bounds pointers to one element past the end of an array. Jones and Kelly [19] support these legal out-of-bounds pointers by padding objects with one byte. We did not use this technique because it interacts poorly with our constraints on allocation sizes: adding one byte to an allocation can double the allocated size in the common case where the requested allocation size is a power of two.

Many programs violate the C standard and generate illegal but harmless out-of-bounds pointers that they never dereference. Examples include faking a base-one array by decrementing the pointer returned by malloc and other equally tasteless uses. CRED [30] improved on the Jones and Kelly bounds checker [19] by tracking such pointers using another auxiliary data structure. We did not use this approach because it adds overhead on deallocations of heap and local objects: when an object is deallocated the auxiliary data structure must be searched to remove entries tracking out-of-bounds pointers to the object. Additionally, entries in this auxiliary data structure may accumulate until their referent object is deallocated.

Figure 4: We can tell whether a pointer that is out-of-bounds by less than slot_size/2 is below or above an allocation. This lets us correctly adjust it to get a pointer to the object by respectively adding or subtracting slot_size.
[Diagram labels: Out-of-bounds pointer in bottom half of slot, Out-of-bounds pointer in top half of slot, slot, slot, object]

We handle out-of-bounds pointers within slot_size/2 bytes from the original object as follows. First, we mark out-of-bounds pointers to prevent them from being dereferenced (as in [15]). We use the memory protection hardware to prevent dereferences by setting the most significant bit in these pointers and by restricting the program to the lower half of the address space (this is often already the case for user-space programs). We can recover the original pointer by clearing the bit.

The next challenge is to recover a pointer to the referent object from the out-of-bounds pointer without resorting to an additional data structure. We can do this for the common case when out-of-bounds pointers are at most slot_size/2 bytes before or after the allocation. Since the allocation bounds are aligned to slot boundaries, we can find if a marked pointer is below or above the allocation by checking whether it lies in the top or bottom half of a memory slot respectively, as illustrated in Figure 4. We can recover a pointer to the referent object by adding or subtracting slot_size bytes. This technique cannot handle pointers that go more than slot_size/2 bytes outside the original object. In Section 5.2, we show how to take advantage of the spare bits in pointers on 64-bit architectures to increase this range, and in Section 7 we discuss how we could add support for arbitrary out-of-bounds pointers while avoiding some of the problems of previous solutions.

It is not necessary to instrument pointer dereferences. Similarly, there is no need to instrument pointer equality comparisons because the comparison will be correct whether the pointers are out-of-bounds or not. But we need to instrument inequality comparisons to support
comparing an out-of-bounds pointer with an in-bounds one: the instrumentation must clear the high-order bit of the pointers before comparing them. We also instrument pointer differences in the same way.

Like previous bounds checking solutions [19, 30, 15], we do not support passing an out-of-bounds pointer to uninstrumented code. However, this case is rare. Previous work [30] did not encounter this case in several million lines of code.

2.5 Static Analysis

Bounds checking has relied heavily on static analysis to optimize performance [15]. Checks can be eliminated if it can be statically determined that a pointer is safe, i.e. always within bounds, or that a check is redundant due to a previous check. Furthermore, checks or just the bounds lookup can be hoisted out of loops. We have not implemented a sophisticated analysis and, instead, focused on making checks efficient.

Nevertheless, our prototype implements a simple intra-procedural analysis to detect safe pointer operations. We track allocation sizes and use the compiler's variable range analysis to eliminate checks that are statically shown to be within bounds. We also investigate an approach to hoist checks out of loops that is described in Section 3.

We also use static analysis to reduce the number of local variables that are padded and aligned. We only pad and align local variables that are indexed unsafely within the function, or whose address is taken, and therefore possibly leaked from the function. We call these variables unsafe.

3 Implementation

We used the Microsoft Phoenix [22] code generation framework to implement a prototype system for x86 machines running Microsoft Windows. The system consists of a plug-in to the Phoenix compiler and a runtime support library. In the rest of this section, we describe some implementation details.

3.1 Bounds Table

We chose a slot_size of 16 bytes to avoid penalizing small allocations. Therefore, we reserve 1/16th of the address space for the bounds table. Since pages are allocated to the table on demand, this increases memory utilization by only 6.25%. We reserve the address space required for the bounds table on program startup and install a user space page fault handler to allocate missing table pages on demand. All the bytes in these pages are initialized by the handler to the value 31, which encompasses all the addressable memory in the x86 (an allocation size of 2^31 at base address 0). This prevents out-of-bounds errors when instrumented code accesses memory allocated by uninstrumented code.

3.2 Padding and Aligning

We use a binary buddy allocator to satisfy the size and alignment constraints on heap allocations. Binary buddy allocators provide low external fragmentation but suffer from internal fragmentation because they round allocation sizes to powers of two. This shortcoming is put to good use in our system. Our buddy allocator implementation supports a minimum allocation size of 16 bytes, which matches our slot_size parameter, to ensure that no two objects share the same slot.

We instrument the program to use our version of malloc-style heap allocation functions based on the buddy allocator. These functions set the corresponding bounds table entries and zero the padding area after an object. For local variables, we align the stack frames of functions that contain unsafe local variables at runtime and we instrument the function entry to zero the padding and update the appropriate bounds table entries. We also instrument function exit to reset table entries to 31 for interoperability when uninstrumented code reuses stack memory. We align and pad static variables at compile time and their bounds table entries are initialized when the program starts up.

Unsafe function arguments are problematic because padding and aligning them would violate the calling convention. Instead, we copy them on function entry to appropriately aligned and padded local variables and we change all references to use the copies (except for uses of va_list that need the address of the last explicit argument to correctly extract subsequent arguments). This preserves the calling convention while enabling bounds checking for function arguments.

The Windows runtime cannot align stack objects to more than 8K nor static objects to more than 4K (configurable using the /ALIGN linker switch). We could replace these large stack and static allocations with heap allocations to remove this limitation but our current prototype sets the bounds table entries for these objects to 31.

Zeroing the padding after an object can increase space and time overhead for large padding areas. We avoid this overhead by relying on the operating system to zero allocated pages on demand. Then we track the subset of these pages that is modified and we zero padding areas in these pages on allocations. Similar issues are discussed in [9] and the standard allocator uses a similar technique for calloc. Our buddy allocator also uses this technique to avoid explicitly zeroing large memory areas allocated with calloc.

3.3 Checks

We add checks for each pointer arithmetic and array indexing operation but, following [15], we do not instrument accesses to scalar fields in structures and we do not check pointer dereferences. This facilitates a direct comparison with [15]. We could easily modify our implementation to perform these checks, for example, using the technique described in [14].

We optimize bounds checks for the common case of in-bounds pointers. To avoid checking if a pointer is marked out-of-bounds in the fast path, we set all the entries in the bounds table that correspond to out-of-bounds pointers to zero. Since out-of-bounds pointers have their most significant bit set, we implement this by mapping all the virtual memory pages in the top half of the bounds table to a shared zero page. This ensures that our slow path handler is invoked on any arithmetic operation involving a pointer marked out-of-bounds.

    bounds lookup:
        mov eax, buf
        shr eax, 4
        mov al, byte ptr [TABLE+eax]
    pointer arithmetic:
        char *p = &buf[i];
    bounds check:
        mov ebx, buf
        xor ebx, p
        shr ebx, al
        jz ok
        p = slowPath(buf, p)
    ok:

Figure 5: Code sequence inserted to check unsafe pointer arithmetic.

Figure 5 shows the x86 code sequence that we insert before an example pointer arithmetic operation. First, the source pointer, buf, is right shifted to obtain the index of the bounds table entry for the corresponding slot. Then the logarithm of the allocation size e is loaded from the bounds table into register al. The result of the pointer
arithmetic, p, is xored with the source pointer, buf, and right shifted by al to discard the bottom bits. If buf and p are both within the allocation bounds they can only differ in the e least significant bits (as discussed before). So if the zero flag is set, p is within the allocation bounds. Otherwise, the slowPath function is called.

The slowPath function starts by checking if buf has been marked out-of-bounds. In this case, it obtains the referent object as described in Section 2.4, resets the most significant bit in p, and returns the result if it is within bounds. Otherwise, the result is out-of-bounds. If the result is out-of-bounds by more than half a slot, the function signals an error. Otherwise, it marks the result out-of-bounds and returns it. Any attempt to dereference the returned pointer will trigger an exception. To avoid disturbing register allocation in the fast path, the slowPath function uses a special calling convention that saves and restores all registers.

As discussed in Section 2.2, we must add sizeof(*p) to the result and perform a second check if the pointer is not a pointer to a built-in type. In this case, buf is a char*.

Similar to previous work, we provide bounds checking wrappers for Standard C Library functions such as strcpy and memcpy that operate on pointers. During instrumentation, we replace calls to these functions with calls to their wrappers.

3.4 Optimizations

Typical optimizations used with bounds checking include eliminating redundant checks, hoisting checks out of loops, or hoisting just bounds table lookups out of loops. Optimization of inner loops can have a dramatic impact on performance. We experimented with hoisting bounds table lookups out of loops when all accesses inside a loop body are to the same object. Unfortunately, performance did not improve significantly, probably because our bounds lookups are inexpensive and hoisting can adversely affect register allocation.

Hoisting the whole check out of a loop is preferable when static analysis can determine symbolic bounds on the pointer values in the loop body. However, hoisting out the check is only possible if the analysis can determine that these bounds are guaranteed to be reached in every execution. Figure 6 shows an example where the loop bounds are easy to determine but the loop may terminate before reaching the upper bound. Hoisting out the check would trigger a false alarm in runs where the loop exits before violating the bounds.

We experimented with an approach that generates two versions of the loop code, one with checks and one without. We switch between the two versions on loop entry. In the example of Figure 6, we look up the bounds of p and if n does not exceed the size we run the unchecked version of the loop. Otherwise, we run the checked version.

    for (i = 0; i < n; i++) {
        if (p[i] == 0) break;
        ASSERT(IN_BOUNDS(p, &p[i]));
        p[i] = 0;
    }

    is transformed into

    if (IN_BOUNDS(p, &p[n-1])) {
        for (i = 0; i < n; i++) {
            if (p[i] == 0) break;
            p[i] = 0;
        }
    } else {
        for (i = 0; i < n; i++) {
            if (p[i] == 0) break;
            ASSERT(IN_BOUNDS(p, &p[i]));
            p[i] = 0;
        }
    }

Figure 6: The compiler's range analysis can determine that the range of variable i is at most 0...n-1. However, the loop may exit before i reaches n-1. To prevent erroneously raising an error, we fall back to an instrumented version of the loop if the hoisted check fails.

4 Experimental Evaluation

In this section we evaluate the performance of our system using CPU intensive benchmarks, its effectiveness in preventing attacks using a buffer overflow suite, and its usability by building and measuring the performance of real world security critical code.

4.1 Performance

We evaluate the time and peak memory overhead of our system using the Olden benchmarks and SPECINT 2000. We chose these benchmarks in part to allow a comparison against results reported for some other solutions [15, 36, 23]. In addition, to enable a more detailed comparison with splay-tree-based approaches (including measuring their space overhead) we implemented a variant of our approach which uses the splay tree code from previous systems [19, 30]. This implementation uses the standard allocator and lacks support for illegal out-of-bounds pointers, but is otherwise identical to our system. We compiled all benchmarks with the Phoenix compiler using the /O2 optimization level and ran them on a 2.33 GHz Intel Core 2 Duo processor with 2 GB of RAM.

From SPECINT 2000 we excluded eon since it uses C++ which we do not yet support. For our splay-tree-based implementation only we did not run vpr due to its lack of support for illegal out-of-bounds pointers. We also could not run gcc because of code that subtracted a pointer from a NULL pointer and subtracted the result from NULL again to recover the pointer. Running this would require more comprehensive support for out-of-bounds pointers (such as that described in [30], as we propose in Section 7).

We made the following modifications to some of the benchmarks: First, we modified parser from SPECINT 2000 to fix an overflow that triggered a bound error when using the splay tree. It did not trigger an error with baggy bounds checking because in our runs the overflow was entirely contained in the allocation, but should it overlap another object during a run, the baggy checking would detect it. The unchecked program also survived our runs because the object was small enough for the overflow to be contained even in the padding added by the standard allocator.

Then, we had to modify perlbmk by changing two lines to prevent an out-of-bounds arithmetic whose result is never used and gap by changing 5 lines to avoid an out-of-bounds pointer. Both cases can be handled by the extension described in Section 5, but are not covered by the small out-of-bounds range supported by our 32-bit implementation and the splay-tree-based implementation.

Finally, we modified mst from Olden to disable a custom allocator that allocates 32 Kbyte chunks of memory at a time that are then broken down to 12 byte objects. This increases protection at the cost of memory allocation overhead and removes an unfair advantage for the splay tree whose time and space overheads are minimized when the tree contains just a few nodes, as well as baggy space overhead that benefits from the power of two allocation. This issue, shared with other systems offering protection at the memory block level [19, 30, 36, 15, 2], illustrates a frequent situation in C programs that may require tweaking memory allocation routines in the source code to take full advantage of checking. In this case merely changing a macro definition was sufficient.

We first ran the benchmarks replacing the standard allocator with our buddy system allocator to isolate its effects on performance, and then we ran them using our full system. For the Olden benchmarks, Figure 7 shows the execution time and Figure 8 the peak memory usage.

Figure 7: Execution time for the Olden benchmarks using the buddy allocator and our full system, normalized by the execution time using the standard system allocator without instrumentation.

In Figure 7 we observe that some benchmarks in the Olden suite (mst, health) run significantly faster with
the buddy allocator than with the standard one. These benchmarks are memory intensive and any memory savings reflect on the running time. In Figure 8 we can see that the buddy system uses less memory for these than the standard allocator. This is because these benchmarks contain numerous small allocations for which the padding to satisfy alignment requirements and the per-allocation metadata used by the standard allocator exceed the internal fragmentation of the buddy system.

Figure 8: Peak memory use with the buddy allocator alone and with the full system for the Olden benchmarks, normalized by peak memory using the standard allocator without instrumentation.

Figure 9: Execution time for SPECINT 2000 benchmarks using the buddy allocator and our full system, normalized by the execution time using the standard system allocator without instrumentation.

This means that the average time overhead of the full system across the entire Olden suite is actually zero, because the positive effects of using the buddy allocator mask the costs of checks. The time overhead of the checks alone as measured against the buddy allocator as a baseline is 6%. The overhead of the fastest previous bounds checking system [15] on the same benchmarks and same protection (modulo allocation vs. object bounds) is 12%, but their system also benefits from the technique of pool allocation which can also be used independently. Based on the breakdown of results reported in [15], their overhead measured against the pool allocation is 15%, and it seems more reasonable to compare these two numbers,
as both the buddy allocator and pool allocation can be in principle applied independently on either system.

Next we measured the system using the SPECINT 2000 benchmarks. Figures 9 and 10 show the time and space overheads for SPECINT 2000 benchmarks.

Figure 10: Peak memory use with the buddy allocator alone and with the full system for SPECINT 2000 benchmarks, normalized by peak memory using the standard allocator without instrumentation.

Figure 11: Execution time of baggy bounds checking versus using a splay tree for the Olden benchmark suite, normalized by the execution time using the standard system allocator without instrumentation. Benchmarks mst and health used too much memory and thrashed so their execution times are excluded.

We observe that the use of the buddy system has little effect on performance on average. The average runtime overhead of the full system with the benchmarks from SPECINT 2000 is 60%. vpr has the highest overhead of 127% because its frequent use of illegal pointers to fake base-one arrays invokes our slow path. We observed that adjusting the allocator to pad each allocation with 8 bytes from below decreases the time overhead to 53% with only 5% added to the memory usage, although in general we are not interested in tuning the benchmarks like this. Interestingly, the overhead for mcf is a mere 16% compared to the 185% in [36] but the overhead of gzip is 55% compared to 15% in [36]. Such differences in performance are due to different levels of protection such as checking structure field indexing and checking dereferences, the effectiveness of different static analysis implementations in optimizing away checks, and the
different compilers used.

To isolate these effects, we also measured our system using the standard memory allocator and the splay tree implementation from previous systems [19, 30]. Figure 11 shows the time overhead for baggy bounds versus using a splay tree for the Olden benchmarks. The splay tree runs out of physical memory for the last two Olden benchmarks (mst, health) and slows down to a crawl, so we exclude them from the average of 30% for the splay tree. Figure 12 compares the time overhead against using a splay tree for the SPECINT 2000 benchmarks. The overhead of the splay tree exceeds 100% for all benchmarks, with an average of 900% compared to the average of 60% for baggy bounds checking.

Figure 12: Execution time of baggy bounds checking versus using a splay tree for SPECINT 2000 benchmarks, normalized by the execution time using the standard system allocator without instrumentation.

Figure 13: Peak memory use of baggy bounds checking versus using a splay tree for the Olden benchmark suite, normalized by peak memory using the standard allocator without instrumentation.

Perhaps the most interesting result of our evaluation was space overhead. Previous solutions [19, 30, 15] do not report on the memory overheads of using splay trees, so we measured the memory overhead of our system using splay trees and compared it with the memory overhead of baggy bounds. Figure 13 shows that our system had
negligible memory overhead for Olden, as opposed to the splay tree version's 170% overhead. Clearly Olden's numerous small allocations stress the splay tree by forcing it to allocate an entry for each.

Figure 14: Peak memory use of baggy bounds checking versus using a splay tree for SPECINT 2000 benchmarks, normalized by peak memory using the standard allocator without instrumentation.

Indeed, we see in Figure 14 that its space overhead for most SPECINT 2000 benchmarks is very low. Nevertheless, the overhead of 15% for baggy bounds is less than the 20% average of the splay tree. Furthermore, the potential worst case of double the memory was not encountered for baggy bounds in any of our experiments, while the splay tree did exhibit greater than 100% overhead for one benchmark (twolf).

Figure 15: Throughput of Apache web server for varying numbers of concurrent requests.

The memory overhead is also low, as expected, compared to approaches that track metadata for each pointer. Xu et al. [36] report 331% for Olden, and Nagarakatte et al. [23] report an average of 87% using a hash-table (and 64% using a contiguous array) over Olden and a subset of SPECINT and SPECFP, but more than about 260% (or about 170% using the array) for the pointer intensive Olden benchmarks alone. These systems suffer memory overheads per pointer in order to provide optional temporal protection [36] and sub-object protection [23] and
it is interesting to contrast with them although they are not directly comparable.

Figure 16: Throughput of NullHTTPD web server for varying numbers of concurrent requests.

4.2 Effectiveness

We evaluated the effectiveness of our system in preventing buffer overflows using the benchmark suite from [34]. The attacks required tuning to have any chance of success, because our system changes the stack frame layout and copies unsafe function arguments to local variables, but the benchmarks use the address of the first function argument to find the location of the return address they aim to overwrite.

Baggy bounds checking prevented 17 out of 18 buffer overflows in the suite. It failed, however, to prevent the overflow of an array inside a structure from overwriting a pointer inside the same structure. This limitation is also shared with other systems that detect memory errors at the level of memory blocks [19, 30, 36, 15].

4.3 Security Critical COTS Applications

Finally, to verify the usability of our approach, we built and measured a few additional larger and security critical COTS applications. Table 1 lists the total number of lines compiled in our experiments.

    Program            KSLOC
    openssl-0.9.8k       397
    Apache-2.2.11        474
    nullhttpd-0.5.1        2
    libpng-1.2.5          36
    SPECINT 2000         309
    Olden                  6
    Total               1224

Table 1: Source lines of code in programs successfully built and run with baggy bounds.

We built the OpenSSL toolkit version 0.9.8k [28], comprising about 400 KSLOC, and executed its test suite, measuring 10% time and 11% memory overhead.

Then we built and measured two web servers, Apache [31] and NullHTTPD [27]. Running NullHTTPD revealed three bounds violations similar to, and including, the one reported in [8]. We used the Apache benchmark utility with the keep-alive option to compare the throughput over a LAN connection of the instrumented and uninstrumented versions of both web servers. We managed to saturate the CPU by using the keep-alive option of the benchmarking utility to reuse connections for subsequent requests. We issued repeated requests for the servers' default pages and varied the number of concurrent clients until the throughput of the uninstrumented version leveled off (Figures 15 and 16). We verified that the server's CPU was saturated at this point, and measured a throughput decrease of 8% for Apache and 3% for NullHTTPD.

Finally, we built libpng, a notoriously vulnerability prone library that is widely used. We successfully ran its test program for 1000 PNG files between 1–2K found on a desktop machine, and measured an average runtime overhead of 4% and a peak memory overhead of 3.5%.

5 64-bit Architectures

In this section we verify and investigate ways to optimize our approach on 64-bit architectures. The key observation is that pointers in 64-bit architectures have spare bits to use. In Figure 17 (a) and (b) we see that current models of AMD64 processors use 48 out of 64 bits in pointers, and Windows further limits this to 43 bits for user space programs. Thus 21 bits in the pointer representation are not used. Next we describe two uses for these spare bits, and present a performance evaluation on AMD64.
[Figure 17 panels, with the field labels recoverable from the diagram: (a) AMD64 hardware: sign extended, software address space. (b) 64-bit Windows user-space: zero, user address space. (c) Tagged pointer: zero, size, supported address space. (d) Out-of-bounds tagged pointer: offset, size, zero, supported address space.]

Figure 17: Use of pointer bits by AMD64 hardware, Windows applications, and baggy bounds tagged pointers.

5.1 Size Tagging

Since baggy bounds occupy less than a byte, they can fit in a 64-bit pointer's spare bits, removing the need for a separate data structure. These tagged pointers are similar to fat pointers in changing the pointer representation but have several advantages.

First, tagged pointers retain the size of regular pointers, avoiding fat pointers' register and memory waste. Moreover, their memory stores and loads are atomic, unlike fat pointers, which break code relying on this. Finally, they preserve the memory layout of structures, overcoming the main drawback of fat pointers, which is that they break interoperability with uninstrumented code.

For interoperability, we must also enable instrumented code to use pointers from uninstrumented code and vice versa. We achieve the former by interpreting the default zero value found in unused pointer bits as maximal bounds, so checks on pointers missing bounds succeed. The other direction is harder because we must avoid raising a hardware exception when uninstrumented code dereferences a tagged pointer.

We solved this using the paging hardware to map all addresses that differ only in their tag bits to the same memory. This way, unmodified binary libraries can use tagged pointers, and instrumented code avoids the cost of clearing the tag too.

As shown in Figure 17(c), we use 5 bits to encode the size, allowing objects up to 2^32 bytes. In order to use the paging hardware, these 5 bits have to come from the 43 bits supported by the operating system, thus leaving 38 bits of address space for programs.

With 5 address bits used for the bounds, we need to map 32 different address regions to the same memory. We implemented this entirely in user space using the CreateFileMapping and MapViewOfFileEx Windows API functions to replace the process image, stack, and heap with a file backed by the system paging file and mapped at 32 different locations in the process address space.

We use the 5 bits effectively ignored by the hardware to store the size of memory allocations. For heap allocations, our malloc-style functions set the tags for pointers they return. For locals and globals, we instrument the address-taking operator "&" to properly tag the resulting pointer. We store the bit complement of the size logarithm, enabling interoperability with untagged pointers by interpreting their zero bit pattern as all bits set (representing a maximal allocation of 2^32).

    extract bounds:
        mov rax, buf
        shr rax, 26h
        xor rax, 1fh
    pointer arithmetic:
        char *p = &buf[i];
    bounds check:
        mov rbx, buf
        xor rbx, p
        shr rbx, al
        jz ok
        p = slowPath(buf, p)
    ok:

Figure 18: AMD64 code sequence inserted to check unsafe arithmetic with tagged pointers.

With the bounds encoded in pointers, there is no need for a memory lookup to check pointer arithmetic. Figure 18 shows the AMD64 code sequence for checking pointer arithmetic using a tagged pointer. First, we extract the encoded bounds from the source pointer by right-shifting a copy to bring the tag to the bottom 8 bits of the register and xoring them with the value 0x1f to recover the size logarithm by inverting the bottom 5 bits. Then we check that the result of the arithmetic is within bounds by xoring the source and result pointers, shifting the result by the tag stored in al, and checking for zero.

Similar to the table-based implementation of Section 3, out-of-bounds pointers trigger a bounds error to simplify the common case. To cause this, we zero the bits that were used to hold the size and save them using 5 more bits in the pointer, as shown in Figure 17(d).

Figure 19: Normalized execution time on AMD64 with Olden benchmarks.
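The size-tagging scheme of Section 5.1 can be illustrated with a minimal C sketch. It is not the prototype's code: pointers are modeled as plain 64-bit integers, the helper names tag_pointer, size_log, and in_bounds are hypothetical, and the tag position (bits 38 to 42) is taken from the 38-bit address space described above and the shift by 26h in Figure 18. The out-of-bounds marking of Figure 17(d) and the slow path are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: a 5-bit tag directly above 38 address bits. */
#define TAG_SHIFT 38u
#define TAG_MASK  0x1fu

/* Hypothetical helper: store the bit complement of log2(allocation size)
 * in the tag bits of an address that has no tag yet. */
static uint64_t tag_pointer(uint64_t addr, unsigned log2_size) {
    assert(log2_size <= TAG_MASK);
    uint64_t tag = ~(uint64_t)log2_size & TAG_MASK;
    return addr | (tag << TAG_SHIFT);
}

/* Recover log2(allocation size) by inverting the 5 tag bits.
 * An untagged pointer (tag bits zero) decodes to the largest value, 31. */
static unsigned size_log(uint64_t ptr) {
    uint64_t tag = (ptr >> TAG_SHIFT) & TAG_MASK;
    return (unsigned)(tag ^ TAG_MASK);
}

/* Same idea as the Figure 18 check: source and result must agree in
 * every bit above the size logarithm, so the shifted xor must be zero. */
static int in_bounds(uint64_t src, uint64_t result) {
    return ((src ^ result) >> size_log(src)) == 0;
}
```

For a 64-byte allocation at address 0x10000, tag_pointer(0x10000, 6) yields a pointer for which in_bounds accepts an offset of 63 and rejects an offset of 64. An untagged pointer decodes to the largest logarithm, so its checks succeed across a wide range, which is what lets pointers from uninstrumented code pass.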
Figure 20: Normalized execution time on AMD64 with SPECINT 2000 benchmarks.

5.2 Out-Of-Bounds Offset

The spare bits can also store an offset that allows us to adjust an out-of-bounds pointer to recover the address of its referent object. We can use 13 bits for this offset, as shown in Figure 17(d). These bits can count slot or even allocation size multiples, increasing the supported out-of-bounds range to at least 2^16 bytes above or below an allocation.

This technique does not depend on size tagging and can be used with a table instead. When looking up a pointer in the table, however, the top bits have to be masked off.

5.3 Evaluation

We evaluated baggy bounds checking on AMD64 using the subset of benchmarks from Section 4.1 that run unmodified on 64 bits. We measured the system using a contiguous array against the system using tagged pointers (Baggy and Tag in the figure legends respectively). We also measured the overhead using the buddy allocator only.

The multiple memory mappings complicated measuring memory use because Windows counts shared memory multiple times in peak memory reports. To overcome this, we measured memory use without actually tagging the pointers, to avoid touching more than one address for the same memory, but with the memory mappings in place to account for at least the top-level memory management overheads.

Figure 21: Normalized peak memory use on AMD64 with Olden benchmarks.

Figure 22: Normalized peak memory use on AMD64 with SPECINT 2000 benchmarks.

Figures 19 and 20 show the time overhead. The average using a table on 64 bits is 4% for Olden and 72% for SPECINT 2000, close to the 32-bit results of Section 3. Figures 21 and 22 show the space overhead. The average using a table is 21% for Olden and 11% for SPECINT 2000. Olden's space overhead is higher than in the 32-bit version; unlike the 32-bit case, the buddy allocator contributes to this overhead by 14% on average.

Tagged pointers are 1–2% faster on average than the table, and use about 5% less memory for most benchmarks, with a few exceptions such as power and crafty. These exceptions arise because our prototype does not map pages to different addresses on demand, but instead maps 32 30-bit regions of virtual address space on program startup. Hence the fixed overhead is notable for these benchmarks because their absolute memory usage is low.

While we successfully implemented mapping multiple views entirely in user space, a robust implementation would probably require kernel-mode support. We feel that the gains are too small to justify the complexity. However, using the spare bits to store an out-of-bounds offset is a good solution for tracking out-of-bounds pointers when using the referent object approach of Jones and Kelly [19].

6 Related Work

Many techniques have been proposed to detect memory errors in C programs. Static analysis techniques, e.g., [33, 21, 7], can detect defects before software ships and they do not introduce runtime overhead, but they can miss defects and raise false alarms.

Since static techniques do not remove all defects, they have been complemented with dynamic techniques. Debugging tools such as Purify [17] and Annelid [25] can find memory errors during testing. While these tools can be used without source code, they typically slow down applications by a factor of 10 or more. Some dynamic techniques detect specific errors such as stack overflows [13, 16, 32] or format string exploits [12]; they have low overhead but they cannot detect all spatial memory errors. Techniques such as control-flow integrity [20, 1] or taint tracking (e.g. [10, 26, 11, 35]) detect broad classes of errors, but they do not provide general protection from spatial memory errors.

Some systems provide probabilistic protection from memory errors [5]. In particular, DieHard [4] increases heap allocation sizes by a random amount to make more out-of-bounds errors benign at a low performance cost. Our system also increases the allocation size but enforces the allocation bounds to prevent errors and also protects stack-allocated objects in addition to heap-allocated ones.

Several systems prevent all spatial memory errors in C programs. Systems such as SafeC [3], CCured [24], Cyclone [18], and the technique in Xu et al. [36] associate bounds information with each pointer. CCured [24] and Cyclone [18] are memory-safe dialects of C. They extend the pointer representation with bounds information, i.e., they use a fat pointer representation, but this changes memory layout and breaks binary compatibility. Moreover, they require a significant effort to port applications to the safe dialects. For example, CCured required changing 1287 out of 6000 lines of code for the Olden benchmarks [15], and an average of 10% of the lines of code have to be changed when porting programs from C to Cyclone [34]. CCured has 28% average runtime overhead for the Olden benchmarks, which is significantly higher than the baggy bounds overhead. Xu et al. [36] track pointers to detect spatial errors as well as temporal errors with additional overhead, thus their space overhead is proportional to the number of pointers. The average time overhead for spatial protection on the benchmarks we overlap is 73% versus 16% for baggy bounds, with a space overhead of 273% versus 4%.

Other systems map any memory address within an allocated object to the bounds information for the object. Jones and Kelly [19] developed a backwards-compatible bounds checking solution that uses a splay tree to map addresses to bounds. The splay tree is updated on allocation and deallocation, and operations on pointers are instrumented to look up the bounds using an in-bounds pointer. The advantage over previous approaches using fat pointers is interoperability with code that was compiled without instrumentation. They increase the allocation size to support legal out-of-bounds pointers one byte beyond the object size. Baggy bounds checking offers similar interoperability with less time and space overhead, which we evaluated by using their implementation of splay trees with our system. CRED [30] improves on the solution of Jones and Kelly by adding support for tracking out-of-bounds pointers and making sure that they are never dereferenced unless they are brought within bounds again. Real programs often violate the C standard and contain such out-of-bounds pointers that may be saved to data structures. The performance overhead for programs that do not have out-of-bounds pointers is similar to Jones and Kelly if the same level of runtime checking is used, but the authors recommend only checking strings to lower the overhead to acceptable levels. For programs that do contain such out-of-bounds pointers, the cost of tracking them includes scanning a hash table on every dereference to remove entries for out-of-bounds pointers. Our solution is more efficient, and we propose ways to track common cases of out-of-bounds pointers that avoid using an additional data structure.

The fastest previous technique for bounds checking, by Dhurjati et al. [15], is more than two times slower than our prototype. It uses inter-procedural data structure analysis to partition allocations into pools statically and uses a separate splay tree for each pool. They can avoid inserting some objects in the splay tree when the analysis finds that a pool is size-homogeneous. This should significantly reduce the memory usage of the splay tree compared to previous solutions, but unfortunately they do not report memory overheads. This work also optimizes the handling of out-of-bounds pointers in CRED [30] by relying on hardware memory protection to detect the dereference of out-of-bounds pointers.

The latest proposal, SoftBound [23], tracks bounds for each pointer to achieve sub-object protection. Sub-object protection, however, may introduce compatibility problems with code using pointer arithmetic to traverse structures. SoftBound maintains interoperability by storing bounds in a hash table or a large contiguous array. Storing bounds for each pointer can lead to a worst-case memory footprint as high as 300% for the hash-table version or 200% for the contiguous array. The average space overhead across Olden and a subset of SPECINT and SPECFP is 87% using a hash table and 64% for the contiguous array, and the average runtime overhead for checking both reads and writes is 93% for the hash table and 67% for the contiguous array. Our average space overhead over Olden and SPECINT is 7.5% with an average time overhead of 32%.

Other approaches associate different kinds of metadata with memory regions to enforce safety properties. The technique in [37] detects some invalid pointer dereferences by marking all writable memory regions and preventing writes to non-writable memory; it reports an average runtime overhead of 97%. DFI [8] computes reaching definitions statically and enforces them at runtime. DFI has an average overhead of 104% on the SPEC benchmarks. WIT [2] computes the approximate set of objects written by each instruction and dynamically prevents writes to objects not in the set. WIT does not protect from invalid reads, and is subject to the precision of a points-to analysis when detecting some out-of-bounds errors. On the other hand, WIT can detect accesses to deallocated/unallocated objects and some accesses through dangling pointers to re-allocated objects in different analysis sets. WIT is six times faster than baggy bounds checking for SPECINT 2000, so it is also an attractive point in the error coverage/performance design space.

7 Limitations and Future Work

Our system shares some limitations with other solutions based on the referent object approach. Arithmetic on integers holding addresses is unchecked, casting an integer that holds an out-of-bounds address back to a pointer or passing an out-of-bounds pointer to unchecked code will break the program, and custom memory allocators reduce protection.

Our system does not address temporal memory safety violations (accesses through "dangling pointers" to re-allocated memory). Conservative garbage collection for C [6] is one way to address these but introduces its own compatibility issues and unpredictable overheads.

Our approach cannot protect from memory errors in sub-objects such as structure fields. To offer such protection, a system must track the bounds of each pointer [23] and risk false alarms for some legal programs that use pointers to navigate across structure fields.

In Section 4 we found two programs using out-of-bounds pointers beyond the slot_size/2 bytes supported on 32 bits and one beyond the 2^16 bytes supported on 64 bits. Unfortunately the real applications built in Section 4.3 were limited to software we could readily port to the Windows toolchain; wide use will likely encounter occasional problems with out-of-bounds pointers, especially on 32-bit systems. We plan to extend our system to support all out-of-bounds pointers using the data structure from [30], but take advantage of the more efficient mechanisms we described for the common cases. To solve the delayed deallocation problem discussed in Section 6 and deallocate entries as soon as the out-of-bounds pointer is deallocated, we can track out-of-bounds pointers using the pointer's address instead of the address of the pointer's referent object (similar to the approach [23] takes for all pointers). To optimize scanning this data structure on every deallocation we can use an array with an entry for every few memory pages. A single memory read from this array on deallocation (e.g. on function exit) is sufficient to confirm the data structure has no entries for a memory range. This is the common case since most out-of-bounds pointers are handled by the other mechanisms we described in this paper.

Our prototype uses a simple intra-procedural analysis to find safe operations and does not eliminate redundant checks. We expect that integrating state-of-the-art analyses to reduce the number of checks will further improve performance.

Finally, our approach tolerates harmless bound violations, making it less suitable for debugging than slower techniques that can uncover these errors. On the other hand, being faster makes it more suitable for production runs, and tolerating faults in production runs may be desired [29].

8 Conclusions

Attacks that exploit out-of-bounds errors in C and C++ continue to be a serious security problem. We presented baggy bounds checking, a backwards-compatible bounds checking technique that implements efficient bounds checks. It improves the performance of bounds checks by checking allocation bounds instead of object bounds and by using a binary buddy allocator to constrain the size and alignment of allocations to powers of 2. These constraints enable a concise representation for allocation bounds and let baggy bounds checking store this information in an array that can be looked up and maintained efficiently. Our experiments show that replacing a splay tree, which was used to store bounds information in previous systems, by our array reduces time overhead by an order of magnitude without increasing space overhead.

We believe baggy bounds checking can be used in practice to harden security-critical applications because it has low overhead, it works on unmodified C and C++ programs, and it preserves binary compatibility with uninstrumented libraries. For example, we were able to compile the Apache Web server with baggy bounds checking and the throughput of the hardened version of the server decreases by only 8% relative to an uninstrumented version.

Acknowledgments

We thank our shepherd R. Sekar, the anonymous reviewers, and the members of the Networks and Operating Systems group at Cambridge University for comments that helped improve this paper. We also thank Dinakar Dhurjati and Vikram Adve for their communication.

References

[1] Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. Control-flow integrity. In Proceedings of the 12th ACM Conference on Computer and Communications Security (CCS), 2005.
[2] Periklis Akritidis, Cristian Cadar, Costin Raiciu, Manuel Costa, and Miguel Castro. Preventing memory error exploits with WIT. In Proceedings of the 2008 IEEE Symposium on Security and Privacy, 2008.
[3] Todd M. Austin, Scott E. Breach, and Gurindar S. Sohi. Efficient detection of all pointer and array access errors. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 1994.
[4] Emery D. Berger and Benjamin G. Zorn. DieHard: probabilistic memory safety for unsafe languages. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2006.
[5] Sandeep Bhatkar, R. Sekar, and Daniel C. DuVarney. Efficient techniques for comprehensive protection from memory error exploits. In Proceedings of the 14th USENIX Security Symposium, 2005.
[6] Hans-Juergen Boehm and Mark Weiser. Garbage collection in an uncooperative environment. In Software Practice & Experience, 1988.
[7] William R. Bush, Jonathan D. Pincus, and David J. Sielaff. A static analyzer for finding dynamic programming errors. In Software Practice & Experience, 2000.
[8] Miguel Castro, Manuel Costa, and Tim Harris. Securing software by enforcing data-flow integrity. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI), 2006.
[9] Jim Chow, Ben Pfaff, Tal Garfinkel, and Mendel Rosenblum. Shredding your garbage: reducing data lifetime through secure deallocation. In Proceedings of the 14th USENIX Security Symposium, 2005.
[10] Manuel Costa, Jon Crowcroft, Miguel Castro, Antony Rowstron, Lidong Zhou, Lintao Zhang, and Paul Barham. Can we contain Internet worms? In Proceedings of the Third Workshop on Hot Topics in Networks (HotNets-III), 2004.
[11] Manuel Costa, Jon Crowcroft, Miguel Castro, Antony Rowstron, Lidong Zhou, Lintao Zhang, and Paul Barham. Vigilante: end-to-end containment of internet worms. In Proceedings of the 20th ACM SIGOPS Symposium on Operating Systems Principles (SOSP), 2005.
[12] Crispin Cowan, Matt Barringer, Steve Beattie, Greg Kroah-Hartman, Mike Frantzen, and Jamie Lokier. FormatGuard: automatic protection from printf format string vulnerabilities. In Proceedings of the 10th USENIX Security Symposium, 2001.
[13] Crispin Cowan, Calton Pu, Dave Maier, Heather Hinton, Jonathan Walpole, Peat Bakke, Steve Beattie, Aaron Grier, Perry Wagle, and Qian Zhang. StackGuard: automatic adaptive detection and prevention
of buffer-overflow attacks. In Proceedings of the 7th USENIX Security Symposium, 1998.
[14] John Criswell, Andrew Lenharth, Dinakar Dhurjati, and Vikram Adve. Secure virtual architecture: a safe execution environment for commodity operating systems. In Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles (SOSP), 2007.
[15] Dinakar Dhurjati and Vikram Adve. Backwards-compatible array bounds checking for C with very low overhead. In Proceedings of the 28th International Conference on Software Engineering (ICSE), 2006.
[16] Hiroaki Etoh and Kunikazu Yoda. Protecting from stack-smashing attacks. http://www.trl.ibm.com/projects/security/ssp/main.html.
[17] Reed Hasting and Bob Joyce. Purify: Fast detection of memory leaks and access errors. In Proceedings of the Winter USENIX Conference, 1992.
[18] Trevor Jim, J. Greg Morrisett, Dan Grossman, Michael W. Hicks, James Cheney, and Yanling Wang. Cyclone: A safe dialect of C. In Proceedings of the General Track of the USENIX Annual Conference, 2002.
[19] Richard W. M. Jones and Paul H. J. Kelly. Backwards-compatible bounds checking for arrays and pointers in C programs. In Proceedings of the 3rd International Workshop on Automatic Debugging (AADEBUG), 1997.
[20] Vladimir Kiriansky, Derek Bruening, and Saman P. Amarasinghe. Secure execution via program shepherding. In Proceedings of the 11th USENIX Security Symposium, 2002.
[21] David Larochelle and David Evans. Statically detecting likely buffer overflow vulnerabilities. In Proceedings of the 10th USENIX Security Symposium, 2001.
[22] Microsoft. Phoenix compiler framework. http://connect.microsoft.com/Phoenix.
[23] Santosh Nagarakatte, Jianzhou Zhao, Milo Martin, and Steve Zdancewic. SoftBound: Highly compatible and complete spatial memory safety for C. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2009.
[24] George C. Necula, Scott McPeak, and Westley Weimer. CCured: type-safe retrofitting of legacy code. In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2002.
[25] Nicholas Nethercote and Jeremy Fitzhardinge. Bounds-checking entire programs without recompiling. In Informal Proceedings of the Second Workshop on Semantics, Program Analysis, and Computing Environments for Memory Management (SPACE), 2004.
[26] James Newsome and Dawn Song. Dynamic taint analysis for automatic detection, analysis and signature generation of exploits on commodity software. In Proceedings of the 12th Network and Distributed System Security Symposium (NDSS), 2005.
[27] NullLogic. Null HTTPd. http://nullwebmail.sourceforge.net/httpd.
[28] OpenSSL Toolkit. http://www.openssl.org.
[29] Martin Rinard, Cristian Cadar, Daniel Dumitran, Daniel M. Roy, Tudor Leu, and William S. Beebee, Jr. Enhancing server availability and security through failure-oblivious computing. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI), 2004.
[30] Olatunji Ruwase and Monica S. Lam. A practical dynamic buffer overflow detector. In Proceedings of the 11th Network and Distributed System Security Symposium (NDSS), 2004.
[31] The Apache Software Foundation. The Apache HTTP Server Project. http://httpd.apache.org.
[32] Vendicator. StackShield. http://www.angelfire.com/sk/stackshield.
[33] David Wagner, Jeffrey S. Foster, Eric A. Brewer, and Alexander Aiken. A first step towards automated detection of buffer overrun vulnerabilities. In Proceedings of the 7th Network and Distributed System Security Symposium (NDSS), 2000.
[34] John Wilander and Mariam Kamkar. A comparison of publicly available tools for dynamic buffer overflow prevention. In Proceedings of the 10th Network and Distributed System Security Symposium (NDSS), 2003.
[35] Wei Xu, Sandeep Bhatkar, and R. Sekar. Taint-enhanced policy enforcement: a practical approach to defeat a wide range of attacks. In Proceedings of the 15th USENIX Security Symposium, 2006.
[36] Wei Xu, Daniel C. DuVarney, and R. Sekar. An efficient and backwards-compatible transformation to ensure memory safety of C programs. In Proceedings of the 12th ACM SIGSOFT International Symposium on Foundations of Software Engineering (SIGSOFT/FSE), 2004.
[37] Suan Hsi Yong and Susan Horwitz. Protecting C programs from attacks via invalid pointer dereferences. In Proceedings of the 11th ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), 2003.