Unleashing MAYHEM on Binary Code
Sang Kil Cha, Thanassis Avgerinos, Alexandre Rebert and ...

Abstract: In this paper we present MAYHEM, a new system for automatically finding exploitable bugs in binary (i.e., executable) programs. Every bug reported by MAYHEM is accompanied by a working shell-spawning exploit. The working [...]

2012 IEEE Symposium on Security and Privacy. 1081-6011/12 $26.00 © 2012 IEEE. DOI 10.1109/SP.2012.31

[...] works on raw binary code without debugging information. To make exploit generation possible at the binary-level, MAYHEM addresses two major technical challenges: actively managing execution paths without exhausting memory, and reasoning about symbolic memory indices, where a load or a store address depends on user input. To this end, we propose two novel techniques: 1) hybrid symbolic execution for combining online and offline (concolic) execution to maximize the benefits of both techniques, and 2) index-based memory modeling, a technique that allows MAYHEM to efficiently reason about symbolic memory at the binary level. We used M [...]

[...] executable) programs. MAYHEM produces a working control-hijack exploit for each bug it reports, thus guaranteeing each bug report is actionable and security-critical. By working with binary code MAYHEM enables even those without source code access to check the (in)security of their software.

MAYHEM detects and generates exploits based on the basic principles introduced in our previous work on AEG [2]. At a high-level, MAYHEM finds exploitable paths by augmenting symbolic execution [16] with additional constraints at potentially vulnerable program points. The constraints [...]

[...] addresses is essential to exploit real-world bugs. Principle #1 is necessary for running complex applications, since most non-trivial programs will contain a potentially infinite number of paths to explore.

Current approaches to symbolic execution, e.g., CUTE [26], BitBlaze [5], KLEE [9], SAGE [13], McVeto [27], AEG [2], S2E [28], and others [3], [21], do not satisfy all the above design points. Conceptually, current executors can be [...]

[...] run of the system needs to restart execution of the program from the very beginning. Conceptually, the same instructions need to be executed repeatedly for every execution trace. Our experimental results show that this re-execution can be very expensive (see §VIII).

Online symbolic execution [9], [28] forks at each branch point. Previous instructions are never re-executed, but the continued forking puts a strain on memory, slowing down the execution engine as the number of branches increases. The result is no forward progress and thus principles #1 and #3 are not met. Some online executors such as KLEE stop forking to avoid being slowed down by their memory use. Such executors satisfy principle #1 but not principle #3 (interesting paths are potentially eliminated).

MAYHEM combines the best of both worlds by introducing hybrid symbolic execution, where execution alternates between online and offline symbolic execution runs. Hybrid execution acts like a memory manager in an OS, except that it is designed to efficiently swap out symbolic execution engines. When memory is under pressure, the hybrid engine picks a running executor and saves its current execution state and path formula. The thread is restored by restoring the formula, concretely running the program up to the previous execution state, and then continuing. Caching the path formulas prevents the symbolic re-execution of instructions, which is the bottleneck in offline execution, while managing memory more efficiently than online execution.

MAYHEM also proposes techniques for efficiently reasoning about symbolic memory. A symbolic memory access occurs when a load or store address depends on input. Symbolic pointers are very common at the binary level, and being able to reason about them is necessary to generate control-hijack exploits. In fact, our experiments show that 40% of the generated exploits would have been impossible due to concretization constraints (§VIII). To overcome this problem, MAYHEM employs an index-based memory model (§V) to avoid constraining the index whenever possible.

Results are encouraging. While there is ample room for new research, MAYHEM currently generates exploits for several security vulnerabilities: buffer overflows, function pointer overwrites, and format string vulnerabilities for 29 different programs. MAYHEM also demonstrates 2-10× speedup over offline symbolic execution without having the memory constraints of online symbolic execution.

Overall, MAYHEM makes the following contributions:

1) Hybrid execution.
We introduce a new scheme for symbolic execution, which we call hybrid symbolic execution, that allows us to find a better balance between speed and memory requirements. Hybrid execution enables MAYHEM to explore multiple paths faster than existing approaches (see §IV).

2) Index-based memory modeling. We propose an index-based memory model as a practical approach to dealing with symbolic indices at the binary-level (see §V).

3) Binary-only exploit generation. We present the first end-to-end binary-only exploitable bug finding system that demonstrates exploitability by outputting working control hijack exploits.

II. OVERVIEW OF MAYHEM

In this section we describe the overall architecture, usage scenario, and challenges for finding exploitable bugs. We use an HTTP server, orzHttpd [1] (shown in Figure 1a), as an example to highlight the main challenges and present how MAYHEM works. Note that we show source for clarity and simplicity; MAYHEM runs on binary code.

     1 #define BUFSIZE 4096
     2
     3 typedef struct {
     4   char buf[BUFSIZE];
     5   int used;
     6 } STATIC_BUFFER_t;
     7
     8 typedef struct conn {
     9   STATIC_BUFFER_t read_buf;
    10   ... // omitted
    11 } CONN_t;
    12
    13 static void serverlog(LOG_TYPE_t type,
    14                       const char *format, ...)
    15 {
    16   ... // omitted
    17   if (format != NULL) {
    18     va_start(ap, format);
    19     vsprintf(buf, format, ap);
    20     va_end(ap);
    21   }
    22   fprintf(log, buf); // vulnerable point
    23   fflush(log);
    24 }
    25
    26 HTTP_STATE_t http_read_request(CONN_t *conn)
    27 {
    28   ... // omitted
    29   while (conn->read_buf.used < BUFSIZE) {
    30     sz = static_buffer_read(conn, &conn->read_buf);
    31     if (sz < 0) {
    32       ...
    33     conn->read_buf.used += sz;
    34     if (memcmp(&conn->read_buf.buf[conn->read_buf.used] - 4, "\r\n\r\n", 4) == 0)
    35     {
    36       break;
    37     }
    38   }
    39   if (conn->read_buf.used >= BUFSIZE) {
    40     conn->status.st = HTTP_STATUS_400;
    41     return HTTP_STATE_ERROR;
    42   }
    43   ...
    44   serverlog(ERROR_LOG,
    45             "%s\n",
    46             conn->read_buf.buf);
    47   ...
    48 }

(a) Code snippet.

[Figure 1b: stack diagram of the vulnerable program, drawn from high to low addresses. Labels: buf ptr, log (file pointer), fprintf frame pointer, return addr to serverlog, buf (in serverlog), serverlog frame pointer, old ebp. An exploit generated by MAYHEM: \x5c\xca\xff\xbf\x5e\xca\xff\xbf%51832c%17$hn %62847c%18$hn \x90\x90 ... shellcode, with an arrow marking the shellcode address.]

(b) Stack diagram of the vulnerable program.

Figure 1: orzHttpd vulnerability

In orzHttpd, each HTTP connection is passed to http_read_request. This routine in turn calls static_buffer_read as part of the loop on line 29 to get the user request string. The user input is placed into the 4096-byte buffer conn->read_buf.buf on line 30. Each read increments the variable conn->read_buf.used by the number of bytes read so far in order to prevent a buffer overflow. The read loop continues until \r\n\r\n is found, checked on line 34. If the user passes in more than 4096 bytes without an HTTP end-of-line character, the read loop aborts and the server returns a 400 error status message on line 41. Each non-error request gets logged via the serverlog function.

The vulnerability itself is in serverlog, which calls fprintf with a user specified format string (an HTTP request). Variadic functions such as fprintf use a format string specifier to determine how to walk the stack looking for arguments. An exploit for this vulnerability works by supplying format strings that cause fprintf to walk the stack to user-controlled data. The exploit then uses additional format specifiers to write to the desired location [22]. Figure 1b shows the stack layout of orzHttpd when the format string vulnerability is detected. There is a call to fprintf and the formatting argument is a string of user-controlled bytes.

We highlight several key points for finding exploitable bugs:

Low-level details matter: Determining exploitability requires that we reason about low-level details like return addresses and stack pointers. This is our motivation for focusing on binary-level techniques.

There are an enormous number of paths: In the example, there is a new path on every encounter of an if statement, which can lead to an exponential path explosion. Additionally, the number of paths in many portions of the code is related to the size of the input. For example, memcmp unfolds a loop, creating a new path for symbolic execution on each iteration.
Longer inputs mean more conditions, more forks, and harder scalability challenges. Unfortunately most exploits are not short strings, e.g., in a buffer overflow typical exploits are hundreds or thousands of bytes long.

The more checked paths, the better: To reach the exploitable fprintf bug in the example, MAYHEM needs to reason through the loop, read input, fork a new interpreter for every possible path and check for errors. Without careful resource management, an engine can get bogged down with too many symbolic execution threads because of the huge number of possible execution paths.

Execute as much natively as possible: Symbolic execution is slow compared to concrete execution since the semantics of an instruction are simulated in software. In orzHttpd, millions of instructions set up the basic server before an attacker can even connect to a socket. We want to execute these instructions concretely and then switch to symbolic execution.

[Figure 2: MAYHEM architecture. Labels: Binary, Input Spec., Concrete Execution Client (CEC), Taint Tracker, Dynamic Binary Instrumentator (DBI), Virtualization Layer, Operating System, Hardware, Target Machine, Symbolic Execution Server (SES), Symbolic Evaluator, Path Selector, Exploit Generator, Checkpoint Manager, Check Points, Test Cases, Buggy Inputs, Exploits.]

The MAYHEM architecture for finding exploitable bugs is shown in Figure 2. The user starts MAYHEM by running:

    mayhem -sym-net 80 400 ./orzhttpd

The command-line tells MAYHEM to symbolically execute orzHttpd, and open sockets on port 80 to receive symbolic 400-byte long packets. All remaining steps to create an exploit are performed automatically.

MAYHEM consists of two concurrently running processes: a Concrete Executor Client (CEC), which executes code natively on a CPU, and a Symbolic Executor Server (SES).
Both are shown in Figure 2. At a high level, the CEC runs on a target system, and the SES runs on any platform, waiting for connections from the CEC. The CEC takes in a binary program along with the potential symbolic sources (input specification) as an input, and begins communication with the SES. The SES then symbolically executes blocks that the CEC sends, and outputs several types of test cases including normal test cases, crashes, and exploits. The steps followed by MAYHEM to find the vulnerable code and generate an exploit are:

1) The --sym-net 80 400 argument tells MAYHEM to perform symbolic execution on data read in from a socket on port 80. Effectively this is specifying which input sources are potentially under attacker control. MAYHEM can handle attacker input from environment variables, files, and the network.

2) The CEC loads the vulnerable program and connects to the SES to initialize all symbolic input sources. After the initialization, MAYHEM executes the binary concretely on the CPU in the CEC. During execution, the CEC instruments the code and performs dynamic taint analysis [23]. Our taint tracking engine checks if a block contains tainted instructions, where a block is a sequence of instructions that ends with a conditional jump or a call instruction.

3) When the CEC encounters a tainted branch condition or jump target, it suspends concrete execution. A tainted jump means that the target may be dependent on attacker input. The CEC sends the instructions to the SES and the SES determines which branches are feasible. The CEC will later receive the next branch target to explore from the SES.

4) The SES, running in parallel with the CEC, receives a stream of tainted instructions from the CEC. The SES jits the instructions to an intermediate language (§III), and symbolically executes the corresponding IL. The CEC provides any concrete values whenever needed, e.g., when an instruction operates on a symbolic operand and a concrete operand. The SES maintains two types of formulas:

Path Formula: The path formula reflects the constraints to reach a particular line of code. Each conditional jump adds a new constraint on the input. For example, lines 32-33 create two new paths: one which is constrained so that the read input ends in an \r\n\r\n and line 35 is executed, and one where the input does not end in \r\n\r\n and line 28 will be executed.

Exploitability Formula: The exploitability formula determines whether i) the attacker can gain control of the instruction pointer, and ii) execute a payload.

5) When MAYHEM hits a tainted branch point, the SES decides whether we need to fork execution by querying the SMT solver. If we need to fork execution, all the new forks are sent to the path selector to be prioritized. Upon picking a path, the SES notifies the CEC about the change and the corresponding execution state is restored. If the system resource cap is reached, then the checkpoint manager starts generating checkpoints instead of forking new executors (§IV). At the end of the process, test cases are generated for the terminated executors and the SES informs the CEC about which checkpoint should continue execution next.

6) During the execution, the SES switches context between executors and the CEC checkpoints/restores the provided execution state and continues execution. To do so, the CEC maintains a virtualization layer to handle the program interaction with the underlying system and checkpoint/restore between multiple program execution states (§IV-C).

7) When MAYHEM detects a tainted jump instruction, it builds an exploitability formula, and queries an SMT solver to see if it is satisfiable. A satisfying input will be, by construction, an exploit. If no exploit is found on the tainted branch instruction, the SES keeps exploring execution paths.

8) The above steps are performed at each branch until an exploitable bug is found, MAYHEM hits a user-specified maximum runtime, or all paths are exhausted.

III. BACKGROUND

Binary Representation in our language. Basic symbolic execution is performed on assembly instructions as they execute. In the overall system the stream comes from the CEC as explained earlier; here we assume they are simply given to us. We leverage BAP [15], an open-source binary analysis framework to convert x86 assembly to an intermediate language suitable for symbolic execution. For each instruction executed, the symbolic executor jits the instruction to the BAP IL. The SES performs symbolic execution directly on the IL, introduces additional constraints related to specific attack payloads, and sends the formula to an SMT solver to check satisfiability. For example, the IL for a ret instruction consists of two statements: one that loads an address from memory, and one that jumps to that address.

Symbolic Execution on the IL. In concrete execution, the program is given a concrete value as input, it executes statements to produce new values, and terminates with final values. In symbolic execution we do not restrict execution to a single value, but instead provide a symbolic input variable that represents the set of all possible input values. The symbolic execution engine evaluates expressions for each statement in terms of the original symbolic inputs. When symbolic execution hits a branch, it considers two possible worlds: one where the true branch target is followed and one where the false branch target is followed. It does so by forking off an interpreter for each branch and asserting in the generated formula that the branch guard must be satisfied. The final formula encapsulates all branch conditions that must be met to execute the given path, thus is called the path formula or path predicate.
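
The forking behaviour described above can be illustrated with a minimal sketch. It is not MAYHEM's implementation: the names `State`, `fork`, and `satisfiable` are hypothetical, the symbolic input is assumed to be a single byte, and an exhaustive search over its 256 values stands in for the SMT solver.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A guard is a predicate over one symbolic input byte (an assumption made
# to keep the example tiny; real guards are IL expressions).
Guard = Callable[[int], bool]


@dataclass(frozen=True)
class State:
    path: Tuple[Guard, ...]  # conjunction of branch guards collected so far
    target: str              # label of the block to execute next


def satisfiable(path: Tuple[Guard, ...]) -> bool:
    """Brute-force stand-in for an SMT solver: try every byte value."""
    return any(all(guard(x) for guard in path) for x in range(256))


def negate(guard: Guard) -> Guard:
    return lambda x: not guard(x)


def fork(state: State, guard: Guard, if_true: str, if_false: str) -> List[State]:
    """Fork at a conditional jump and keep only the feasible successors."""
    taken = State(state.path + (guard,), if_true)
    skipped = State(state.path + (negate(guard),), if_false)
    return [s for s in (taken, skipped) if satisfiable(s.path)]


entry = State(path=(), target="entry")
first = fork(entry, lambda x: x > 100, "big", "small")
# On the "big" path the input is already constrained to x > 100, so the
# guard x < 50 cannot hold and only the false successor survives.
second = fork(first[0], lambda x: x < 50, "unreachable", "reachable")
```

Each surviving `State` carries its own path predicate, which is why the number of live states, and with it memory use, can grow with every feasible branch.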

In MAYHEM, each IL statement type has a corresponding symbolic execution rule. Assertions in the IL are immediately appended to the formula. Conditional jump statements create two formulas: one where the branch guard is asserted true and the true branch is followed, and one which asserts the negation of the guard and the false branch is followed. For example, if we already have formula $f$ and execute cjmp $e_1$, $e_2$, $e_3$ where $e_1$ is the branch guard and $e_2$ and $e_3$ are jump targets, then we create the two formulas:

$f \wedge e_1 \wedge FSE(path_{e_2})$
$f \wedge \neg e_1 \wedge FSE(path_{e_3})$

where $FSE$ stands for forward symbolic execution of the jump target. Due to space, we give the exact semantics in a companion paper [15], [24].

IV. HYBRID SYMBOLIC EXECUTION

MAYHEM is a hybrid symbolic execution system. Instead of running in pure online or offline execution mode, MAYHEM can alternate between modes. In this section we present the motivation and mechanics of hybrid execution.

A. Previous Symbolic Execution Systems

Offline symbolic execution, as found in systems such as SAGE [13], requires two inputs: the target program and an initial seed input. In the first step, offline systems concretely execute the program on the seed input and record a trace. In the second step, they symbolically execute the instructions in the recorded trace. This approach is called concolic execution, a juxtaposition of concrete and symbolic execution. Offline execution is attractive because of its simplicity and low resource requirements; we only need to handle a single execution path at a time.

[Figure 3: Hybrid execution tries to combine the speed of online execution and the memory use of offline execution to efficiently explore the input space. Panels: Offline, Online, Hybrid.]

[Figure 4: Online execution throughput versus memory use. Axes: memory use (KBytes) and test case generation throughput (num/sec.).]

The top-left diagram of Figure 3 highlights an immediate drawback of this approach. For every explored execution path, we need to first re-execute a (potentially) very large number of instructions until we reach the symbolic condition where execution forked, and then begin to explore new instructions.

Online symbolic execution avoids this re-execution cost by forking two interpreters at branch points, each one having a copy of the current execution state. Thus, to explore a different path, online execution simply needs to perform a context switch to the execution state of a suspended interpreter. S2E [28], KLEE [9] and AEG [2] follow this approach by performing online symbolic execution on LLVM bytecode.

However, forking off a new executor at each branch can quickly strain the memory, causing the entire system to grind to a halt. State-of-the-art online executors try to address this problem with aggressive copy-on-write optimizations. For example, KLEE has an immutable state representation and S2E shares common state between snapshots of physical memory and disks. Nonetheless, since all execution states are kept in memory simultaneously, eventually all online executors will reach the memory cap. The problem can be mitigated by using DFS (Depth-First-Search); however, this is not a very useful strategy in practice. To demonstrate the problem, we downloaded S2E [28] and ran it on a coreutils application (echo) with 2 symbolic arguments, each one 10 bytes long. Figure 4 shows how the symbolic execution throughput (number of test cases generated per second) is slowed down as the memory use increases.

B. Hybrid Symbolic Execution

MAYHEM introduces hybrid symbolic execution to actively manage memory without constantly re-executing the same instructions. Hybrid symbolic execution alternates between online and offline modes to maximize the effectiveness of each mode. MAYHEM starts analysis in online mode. When the system reaches a memory cap, it switches to offline mode and does not fork any more executors. Instead, it produces checkpoints to start new online executions later on. The crux of the system is to distribute the online execution tasks into subtasks without losing potentially interesting paths. The hybrid execution algorithm employed by MAYHEM is split into four main phases:

1. Initialization: The first time MAYHEM is invoked for a program, it initializes the checkpoint manager, the checkpoint database, and test case directories. It then starts online execution of the program and moves to the next phase.

2. Online Exploration: During the online phase, MAYHEM symbolically executes the program in an online fashion, context-switching between current active execution states, and generating test cases.

3. Checkpointing: The checkpoint manager monitors online execution. Whenever the memory utilization reaches a cap, or the number of running executors exceeds a threshold, it will select and generate a checkpoint for an active executor. A checkpoint contains the symbolic execution state of the suspended executor (path predicate, statistics, etc.) and replay information¹. The concrete execution state is discarded. When the online execution eventually finishes all active execution paths, MAYHEM moves to the next phase.

4. Checkpoint Restoration: The checkpoint manager selects a checkpoint based on a ranking heuristic (§IV-D) and restores it in memory. Since the symbolic execution state was saved in the checkpoint, MAYHEM only needs to re-construct the concrete execution state. To do so, MAYHEM concretely executes the program using one satisfiable assignment of the path predicate as input, until the program reaches the instruction when the execution state was suspended. At that point, the concrete state is restored and the online exploration (phase 2) restarts. Note that phase 4 avoids symbolically re-executing instructions during the checkpoint restoration phase [...]

¹ Note that the term "checkpoint" differs from an offline execution "seed", which is just a concrete input.

[...] simplifies symbolic expressions and formulas by applying algebraic simplifications, e.g. $x \oplus x = 0$, $x \,\&\, 0 = 0$, and so on.

Recall from §IV-C that MAYHEM uses taint analysis [11], [23] to selectively execute instruction blocks that deal with symbolic data. This optimization gives an 8× speedup on average over executing all instruction blocks (see §VIII-G).

V. INDEX-BASED MEMORY MODELING

MAYHEM introduces an index-based memory model as a practical approach to handling symbolic memory loads. The index-based model allows MAYHEM to adapt its treatment of symbolic memory based on the value of the index. In this section we present the entire memory model of MAYHEM.

MAYHEM models memory as a map $\mu : I \rightarrow E$ from 32-bit indices ($i$) to expressions ($e$). In a $load(\mu, i)$ expression, we say that index $i$ indexes memory $\mu$, and the loaded value $e$ represents the contents of the $i$-th memory cell. A load with a concrete index $i$ is directly translated by MAYHEM into an appropriate lookup in $\mu$ (i.e., $\mu[i]$). A $store(\mu, i, e)$ instruction results in a new memory $\mu[i \leftarrow e]$ where $i$ is mapped to $e$.

A. Previous Work & Symbolic Index Modeling

A symbolic index occurs when the index used in a memory lookup is not a number, but an expression, a pattern that appears very frequently in binary code. For example, a C switch(c) statement is compiled down to a jump-table lookup where the input character c is used as the index.
Standard string conversion functions (such as ASCII to Unicode and vice versa, to_lower, to_upper, etc.) are all in this category.

Handling arbitrary symbolic indices is notoriously hard, since a symbolic index may (in the worst case) reference any cell in memory. Previous research and state-of-the-art tools indicate that there are two main approaches for handling a symbolic index: a) concretizing the index and b) allowing memory to be fully symbolic.

First, concretizing means instead of reasoning about all possible values that could be indexed in memory, we concretize the index to a single specific address. This concretization can reduce the complexity of the produced formulas and improve solving/exploration times. However, constraining the index to a single value may cause us to miss paths, for instance if they depend on the value of the index. Concretization is the natural choice for offline executors, such as SAGE [13] or BitBlaze [5], since only a single memory address is accessed during concrete execution.

Reasoning about all possible indices is also possible by treating memory as fully symbolic. For example, tools such as McVeto [27], BAP [15] and BitBlaze [5] offer capabilities to handle symbolic memory. The main tradeoff, when compared with the concretization approach, is performance. Formulas involving symbolic memory are more expressive, thus solving/exploration times are usually higher.

B. Memory Modeling in MAYHEM

The first implementation of MAYHEM followed the simple concretization approach and concretized all memory indices. This decision proved to be severely limiting in that selecting a single address for the index usually did not allow us to satisfy the exploit payload constraints. Our experiments show that 40% of the examples require us to handle symbolic memory; simple concretization was insufficient (see §VIII).

The alternative approach was symbolic memory. To avoid the scalability problems associated with fully symbolic memory, MAYHEM models memory partially, where writes are always concretized, but symbolic reads are allowed to be modeled symbolically. In the rest of this section we describe the index-based memory model of MAYHEM in detail, as well as some of the key optimizations.

Memory Objects.
To model symbolic reads, MAYHEM introduces memory objects. Similar to the global memory $\mu$, a memory object $M$ is also a map from 32-bit indices to expressions. Unlike the global memory however, a memory object is immutable. Whenever a symbolic index is used to read memory, MAYHEM generates a fresh memory object $M$ that contains all values that could be accessed by the index; $M$ is a partial snapshot of the global memory.

Using the memory object, MAYHEM can reduce the evaluation of a $load(\mu, i)$ expression to $M[i]$. Note that this is semantically equivalent to returning $\mu[i]$. The key difference is in the size of the symbolic array we introduce in the formula. In most cases, the memory object $M$ will be orders of magnitude smaller than the entire memory $\mu$.

Memory Object Bounds Resolution. Instantiating the memory object requires MAYHEM to find all possible values of a symbolic index $i$. In the worst case, this may require up to $2^{32}$ queries to the solver (for 32-bit memory addresses). To tackle this problem MAYHEM exchanges some accuracy for scalability by resolving the bounds $[L, U]$ of the memory region, where $L$ is the lower and $U$ is the upper bound of the index. The bounds need to be conservative, i.e., all possible values of the index should be within the $[L, U]$ interval. Note that the memory region does not need to be continuous, for example $i$ might have only two realizable values ($L$ and $U$).

To obtain these bounds MAYHEM uses the solver to perform binary search on the value of the index in the context of the current path predicate. For example, initially for the lowest bound of a 32-bit $i$: $L \in [0, 2^{32} - 1]$. If $i < \frac{2^{32} - 1}{2}$ is satisfiable then $L \in [0, \frac{2^{32} - 1}{2} - 1]$ while unsatisfiability indicates that $L \in [\frac{2^{32} - 1}{2}, 2^{32} - 1]$. We repeat the process until we recover both bounds. Using the bounds we can now instantiate the memory object (using a fresh symbolic array $M$) as follows: $\forall i \in [L, U] : M[i] = \mu[i]$.
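
The binary search over solver queries can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `make_solver`, `resolve_bounds`, and the range query `sat(lo, hi)` are hypothetical names, and the "solver" is simulated by a lookup in a known set of feasible index values rather than by an SMT query under a path predicate.

```python
import bisect


def make_solver(feasible_indices):
    """Simulated solver: answers 'can the index lie in [lo, hi]?'."""
    values = sorted(feasible_indices)
    stats = {"queries": 0}

    def sat(lo, hi):
        stats["queries"] += 1
        k = bisect.bisect_left(values, lo)   # first feasible value >= lo
        return k < len(values) and values[k] <= hi

    return sat, stats


def resolve_bounds(sat, width=32):
    """Return conservative bounds (L, U) using binary search on sat()."""
    lo, hi = 0, 2 ** width - 1
    if not sat(lo, hi):
        raise ValueError("index has no feasible value")

    # Lower bound: shrink [l_lo, l_hi] while it still contains the minimum.
    l_lo, l_hi = lo, hi
    while l_lo < l_hi:
        mid = (l_lo + l_hi) // 2
        if sat(l_lo, mid):
            l_hi = mid
        else:
            l_lo = mid + 1

    # Upper bound: shrink [u_lo, u_hi] while it still contains the maximum.
    u_lo, u_hi = l_lo, hi
    while u_lo < u_hi:
        mid = (u_lo + u_hi + 1) // 2
        if sat(mid, u_hi):
            u_lo = mid
        else:
            u_hi = mid - 1

    return l_lo, u_lo


# The index has only two realizable values, as in the non-continuous case above.
sat, stats = make_solver({0x1000, 0x1040})
L, U = resolve_bounds(sat)

# Instantiate the memory object as a snapshot of mu over [L, U];
# cells absent from the toy memory are assumed to read as 0.
mu = {0x1000: 0x41, 0x1040: 0x42}
memory_object = {i: mu.get(i, 0) for i in range(L, U + 1)}
```

In this toy run the object spans 65 cells although only two are reachable, which is the over-approximation that the optimizations in the rest of the section aim to reduce.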

The bounds resolution algorithm described above is sufficient to generate a conservative representation of memory objects and allow MAYHEM to reason about symbolic memory reads. In the rest of the section we detail the main optimization techniques MAYHEM includes to tackle some of the caveats of the original algorithm:

• Querying the solver on every symbolic memory dereference is expensive. Even with binary search, identifying both bounds of a 32-bit index required $\approx 54$ queries on average (§VIII) (§V-B1, §V-B2, §V-B3).
• The memory region may not be continuous. Even though many values between the bounds may be infeasible, they are still included in the memory object, and consequently, in the formula (§V-B2).
• The values within the memory object might have structure. By modeling the object as a single byte array we are missing opportunities to optimize our formulas based on the structure (§V-B4, §V-B5).
• In the worst case, a symbolic index may access any possible location in memory (§V-C).

[Figure 5: (a) the to_lower conversion table, (b) the generated IST, and (c) the IST after linearization.]

1) Value Set Analysis (VSA): MAYHEM employs an online version of VSA [4] to reduce the solver load when resolving the bounds of a symbolic index ($i$). VSA returns a strided interval for the given symbolic index. A strided interval represents a set of values in the form $S[L, U]$, where $S$ is the stride and $L$, $U$ are the bounds. For example, the interval $2[1, 5]$ represents the set $\{1, 3, 5\}$. The strided interval output by VSA will be an over-approximation of all possible values the index might have. For instance, $i = (1 + byte) \ll 1$, where $byte$ is a symbolic byte with an interval $1[0, 255]$, results in an interval: $VSA(i) = 2[2, 512]$.
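
The strided-interval arithmetic in this example can be reproduced with a small sketch. The class `StridedInterval` and its two operations are hypothetical and cover only what the example needs (adding a constant and shifting left by a constant); overflow and wrap-around, which a real VSA has to handle, are ignored here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StridedInterval:
    stride: int
    lo: int
    hi: int

    def add_const(self, c: int) -> "StridedInterval":
        # Adding a constant shifts both bounds and keeps the stride.
        return StridedInterval(self.stride, self.lo + c, self.hi + c)

    def shl_const(self, k: int) -> "StridedInterval":
        # Shifting left by k multiplies stride and bounds by 2**k.
        return StridedInterval(self.stride << k, self.lo << k, self.hi << k)

    def values(self) -> List[int]:
        # Enumerate the represented set; a stride of 0 denotes a single value.
        return list(range(self.lo, self.hi + 1, max(self.stride, 1)))


byte = StridedInterval(1, 0, 255)          # 1[0, 255]
index = byte.add_const(1).shl_const(1)     # (1 + byte) << 1
```

Here `index` evaluates to `StridedInterval(2, 2, 512)`, matching $2[2, 512]$ in the text, and `StridedInterval(2, 1, 5).values()` enumerates the set $\{1, 3, 5\}$.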

The strided interval produced by VSA is then refined by the solver (using the same binary-search strategy) to get the tight lower and upper bounds of the memory object. For instance, if the path predicate asserts that $byte < 32$, then the interval for the index $(1 + byte) \ll 1$ can be refined to $2[2, 64]$. Using VSA as a preprocessing step has a cascading effect on our memory modeling: a) we perform 70% fewer queries to resolve the exact bounds of the memory object (§VIII), b) the strided interval can be used to eliminate impossible values in the $[L, U]$ region, thus making formulas simpler, and c) the elimination can trigger other optimizations (see §V-B5).

2) Refinement Cache: Every VSA interval is refined using solver queries. The refinement process can still be expensive (for instance, the over-approximation returned by VSA might be too coarse). To avoid repeating the process for the same intervals, MAYHEM keeps a cache mapping intervals to potential refinements. Whenever we get a cache hit, we query the solver to check whether the cached refinement is accurate for the current symbolic index, before resorting to binary search for refinement. The refinement cache can reduce the number of bounds-resolution queries by 82% (§VIII).

3) Lemma Cache: Checking an entry of the refinement cache still requires solver queries. MAYHEM uses another level of caching to avoid repeatedly querying $\alpha$-equivalent formulas, i.e., formulas that are structurally equivalent up to variable renaming. To do so, MAYHEM converts queried formulas to a canonical representation ($F$) and caches the query results ($Q$) in the form of a lemma: $F \rightarrow Q$. The answer for any formula mapping to the same canonical representation is retrieved immediately from the cache. The lemma cache can reduce the number of bounds-resolution queries by up to 96% (§VIII). The effectiveness of this cache depends on the independent formulas optimization (§IV-E). The path predicate has to be represented as a set of independent formulas, otherwise any new formula addition to the current path predicate would invalidate all previous entries of the lemma cache.
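
One way to picture the lemma cache is the sketch below. The canonical form used by MAYHEM is not specified in this section, so the renaming scheme here (variables renamed to `v0`, `v1`, ... in order of first occurrence) is an assumption, as are the names `canonical`, `LemmaCache`, and `toy_solve` and the encoding of formulas as nested tuples.

```python
def canonical(formula):
    """Rename variables to v0, v1, ... in order of first occurrence."""
    names = {}

    def walk(node):
        if isinstance(node, tuple):          # (operator, operand, ...)
            return (node[0],) + tuple(walk(child) for child in node[1:])
        if isinstance(node, str):            # variable name
            if node not in names:
                names[node] = "v%d" % len(names)
            return names[node]
        return node                          # integer constant

    return walk(formula)


class LemmaCache:
    def __init__(self, solve):
        self.solve = solve      # fallback solver, called only on a miss
        self.lemmas = {}        # canonical formula -> cached answer
        self.misses = 0

    def query(self, formula):
        key = canonical(formula)
        if key not in self.lemmas:
            self.misses += 1
            self.lemmas[key] = self.solve(formula)
        return self.lemmas[key]


def toy_solve(formula):
    # Stand-in for an SMT query; the answer is irrelevant to the caching logic.
    return True


cache = LemmaCache(toy_solve)
cache.query(("<", ("+", "x", 1), "y"))
cache.query(("<", ("+", "a", 1), "b"))   # same shape, different names: cache hit
```

This renaming does not normalize operand order, so `x + 1` and `1 + x` would still occupy separate entries; a fuller canonicalization would also sort commutative operands.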

4) Index Search Trees (ISTs): Any value loaded from a memory object $M$ is symbolic. To resolve constraints involving a loaded value ($M[i]$), the solver needs to both find an entry in the object that satisfies the constraints and ensure that the index to the object entry is realizable. To lighten the burden on the solver, MAYHEM replaces memory object lookup expressions with index search trees (ISTs). An IST is a binary search tree where the symbolic index is the key and the leaf nodes contain the entries of the object. The entire tree is encoded in the formula representation of the load expression.

More concretely, given a (sorted by address) list of entries $E$ within a memory object $M$, a balanced IST for a symbolic index $i$ is defined as: $IST(E) = ite(i < addr(E_{right}), E_{left}, E_{right})$, where $ite$ represents an if-then-else expression, $E_{left}$ ($E_{right}$) represents the left (right) half of the initial entries $E$, and $addr(\cdot)$ returns the lowest address of the given entries. For a single entry the IST returns the entry without constructing any $ite$ expressions.

Note that the above definition constructs a balanced IST. We could instead construct the IST with nested $ite$ expressions, making the formula depth $O(n)$ in the number of object entries instead of $O(\log n)$. However, our experimental results show that a balanced IST is 4× faster than a nested IST (§VIII). Figure 5 shows how MAYHEM constructs the IST when given the entries of a memory object (the to_lower conversion table) with a single symbolic character as the index.

5) Bucketization with Linear Functions: The IST generation algorithm creates a leaf node for each entry in the memory object. To reduce the number of entries, MAYHEM performs an extra preprocessing step before passing the object to the IST. The idea is that we can use the memory object structure to combine multiple entries into a single bucket. A bucket is an index-parameterized expression that returns the value of the memory object for every index within a range.
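
The balanced IST definition from §V-B4 can be made concrete with a short sketch, shown here before any bucketization is applied. It is illustrative only: `build_ist`, `evaluate`, and `depth` are hypothetical names, the tree is a nested Python tuple rather than a solver expression, and the index is evaluated concretely to show which leaf the `ite` chain selects.

```python
def build_ist(entries):
    """entries: non-empty list of (address, value) pairs sorted by address.

    Values are assumed to be plain integers so that tuples always denote
    inner ite nodes.
    """
    if len(entries) == 1:
        return entries[0][1]                 # single entry: no ite needed
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    pivot = right[0][0]                      # addr(E_right): lowest address on the right
    return ("ite", pivot, build_ist(left), build_ist(right))


def evaluate(ist, index):
    """Follow ite(index < pivot, left, right) down to a leaf."""
    while isinstance(ist, tuple):
        _, pivot, left, right = ist
        ist = left if index < pivot else right
    return ist


def depth(ist):
    if not isinstance(ist, tuple):
        return 0
    return 1 + max(depth(ist[2]), depth(ist[3]))


# ASCII-only to_lower table over all 256 byte values.
table = [(c, c + 32 if 65 <= c <= 90 else c) for c in range(256)]
ist = build_ist(table)
```

For the 256-entry table the balanced tree has depth 8, whereas a fully nested chain of `ite` expressions would have depth 255; every leaf is still a separate entry at this stage.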

MAYHEM uses linear functions to generate buckets. Specifically, MAYHEM sweeps all entries within a memory object and joins consecutive points ($\langle index, value \rangle$ tuples) into lines, a process we call linearization. Any two points can form a line $y = \alpha x + \beta$. Follow-up points $\langle i_i, v_i \rangle$ will be included in the same line if $v_i = \alpha i_i + \beta$. At the end of linearization, the memory object is split into a list of buckets, where each bucket is either a line or an isolated point. The list of buckets can now be passed to the IST algorithm. Figure 5 shows the to_lower IST after applying linearization. Linearization effectively reduces the number of leaf nodes from 256 to 3.

The idea of using linear functions to simplify memory lookups comes from a simple observation: linear-like patterns appear frequently for several operations at the binary level. For example, jump tables generated by switch statements, conversion and translation tables (e.g., ASCII to Unicode and vice versa) all contain values that are scaling linearly with the index.

C. Prioritized Concretization.

Modeling a symbolic load using a memory object is beneficial when the size of the memory object is significantly smaller than the entire memory ($|M| \ll |\mu|$). Thus, the above optimizations are only activated when the size of the memory object, approximated by the range, is below a threshold ($|M| < 1024$ in our experiments).

Whenever the memory object size exceeds the threshold, MAYHEM will concretize the index used to access it. However, instead of picking a satisfying value at random, MAYHEM attempts to prioritize the possible concretization values.

     1 typedef struct {
     2   int value;
     3   char *bar;
     4 } foo;
     5 int vulnerable(char *input)
     6 {
     7   foo *ptr = init;
     8   buffer[100];
     9   strcpy(buffer, input);
    10   buffer[0] = ptr->bar[0];
    11   return 0;
    12 }

[Stack diagram labels: bar *, ptr *, value, buffer, symbolic region 1, symbolic region 2, symbolic region 3.]

Figure 6: MAYHEM reconstructing symbolic data structures.

Specifically, for every symbolic pointer, MAYHEM performs three checks:

1) Check if it is possible to redirect the pointer to unmapped memory under the context of the current path predicate. If true, MAYHEM will generate a crash test case for the satisfying value.
2) Check if it is possible to redirect the symbolic pointer to symbolic data. If it is, MAYHEM will redirect (and concretize) the pointer to the least constrained region of the symbolic data. By redirecting the pointer towards the least constrained region, MAYHEM tries to avoid loading overconstrained values, which would eliminate potentially interesting paths that depend on these values. To identify the least constrained region, MAYHEM splits memory into symbolic regions, and sorts them based on the complexity of constraints associated with each region.

3) If all of the above checks fail, MAYHEM concretizes the index to a valid memory address and continues execution.

The above steps infer whether a symbolic expression is a pointer, and if so, whether it is valid or not (e.g., NULL). For example, Figure 6 contains a buffer overflow at line 9. However, an attacker is not guaranteed to hijack control even if strcpy overwrites the return address. The program needs to reach the return instruction to actually transfer control. At line 10, however, the program performs two dereferences, both of which need to succeed (i.e., avoid crashing the program) to reach line 11 (note that pointer ptr is already overwritten with user data). MAYHEM augmented with prioritized concretization will generate 3 distinct test cases: 1) a crash test case for an invalid dereference of pointer ptr, 2) a crash test case where dereferencing pointer bar fails after successfully redirecting ptr to symbolic data, and 3) an exploit test case, where both dereferences succeed and user input hijacks control of the program. Figure 6 shows the memory layout for the third test case.

VI. EXPLOIT GENERATION

MAYHEM checks for two exploitable properties: a symbolic (tainted) instruction pointer, and a symbolic format string. These properties correspond to a buffer overflow and a format string attack, respectively. Whenever any of the two

| OS | Program | Exploit Type | Input Source | Symbolic Input Size | Symb. Mem. | Precondition | Advisory ID | Exploit Gen. Time (s) |
|---|---|---|---|---|---|---|---|---|
| Linux | A2ps | Stack Overflow | Env. Vars | 550 | | crashing | EDB-ID-816 | 189 |
| Linux | Aeon | Stack Overflow | Env. Vars | 1000 | | length | CVE-2005-1019 | 10 |
| Linux | Aspell | Stack Overflow | Stdin | 750 | | crashing | CVE-2004-0548 | 82 |
| Linux | Atphttpd | Stack Overflow | Network | 800 | ✓ | crashing | CVE-2000-1816 | 209 |
| Linux | FreeRadius | Stack Overflow | Env. Vars | 9000 | | length | Zero-Day | 133 |
| Linux | GhostScript | Stack Overflow | Arg. | 2000 | | prefix | CVE-2010-2055 | 18 |
| Linux | Glftpd | Stack Overflow | Arg. | 300 | | length | OSVDB-ID-16373 | 4 |
| Linux | Gnugol | Stack Overflow | Env. Vars | 3200 | | length | Zero-Day | 22 |
| Linux | Htget | Stack Overflow | Env. Vars | 350 | ✓ | length | N/A | 7 |
| Linux | Htpasswd | Stack Overflow | Arg. | 400 | ✓ | prefix | OSVDB-ID-10068 | 4 |
| Linux | Iwconfig | Stack Overflow | Arg. | 400 | | length | CVE-2003-0947 | 2 |
| Linux | Mbse-bbs | Stack Overflow | Env. Vars | 4200 | ✓ | length | CVE-2007-0368 | 362 |
| Linux | nCompress | Stack Overflow | Arg. | 1400 | | length | CVE-2001-1413 | 11 |
| Linux | OrzHttpd | Format String | Network | 400 | | length | OSVDB-ID-60944 | 6 |
| Linux | PSUtils | Stack Overflow | Arg. | 300 | | length | EDB-ID-890 | 46 |
| Linux | Rsync | Stack Overflow | Env. Vars | 100 | ✓ | length | CVE-2004-2093 | 8 |
| Linux | SharUtils | Format String | Arg. | 300 | | prefix | OSVDB-ID-10255 | 17 |
| Linux | Socat | Format String | Arg. | 600 | | prefix | CVE-2004-1484 | 47 |
| Linux | SquirrelMail | Stack Overflow | Arg. | 150 | | length | CVE-2004-0524 | 2 |
| Linux | Tipxd | Format String | Arg. | 250 | | length | OSVDB-ID-12346 | 10 |
| Linux | xGalaga | Stack Overflow | Env. Vars | 300 | | length | CVE-2003-0454 | 3 |
| Linux | Xtokkaetama | Stack Overflow | Arg. | 100 | | crashing | OSVDB-ID-2343 | 10 |
| Windows | Coolplayer | Stack Overflow | Files | 210 | ✓ | crashing | CVE-2008-3408 | 164 |
| Windows | Destiny | Stack Overflow | Files | 2100 | ✓ | crashing | OSVDB-ID-53249 | 963 |
| Windows | Dizzy | Stack Overflow (SEH) | Arg. | 519 | ✓ | crashing | EDB-ID-15566 | 13,260 |
| Windows | GAlan | Stack Overflow | Files | 1500 | ✓ | prefix | OSVDB-ID-60897 | 831 |
| Windows | GSPlayer | Stack Overflow | Files | 400 | ✓ | crashing | OSVDB-ID-69006 | 120 |
| Windows | Muse | Stack Overflow | Files | 250 | ✓ | crashing | OSVDB-ID-67277 | 481 |
| Windows | Soritong | Stack Overflow (SEH) | Files | 1000 | ✓ | crashing | CVE-2009-1643 | 845 |

Table I: List of programs that MAYHEM demonstrated as exploitable.
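The linearization step of Section V-B5 (joining consecutive (index, value) points into lines y = αx + β) can be sketched as follows. This is a hedged illustration, not MAYHEM's code: `Bucket`, `linearize`, and `bucket_lookup` are hypothetical names, the sweep is a simple greedy pass, and the memory object is assumed to be dense (every index between the first and last point has an entry).

```python
from fractions import Fraction
from typing import List, NamedTuple, Tuple


class Bucket(NamedTuple):
    lo: int           # first index covered by the bucket
    hi: int           # last index covered (inclusive)
    alpha: Fraction   # slope of the line
    beta: Fraction    # intercept of the line


def linearize(points: List[Tuple[int, int]]) -> List[Bucket]:
    """Greedily join consecutive (index, value) points, sorted by index, into lines."""
    buckets: List[Bucket] = []
    k, n = 0, len(points)
    while k < n:
        x0, y0 = points[k]
        if k + 1 == n:
            # Isolated trailing point: a constant "line" covering one index.
            buckets.append(Bucket(x0, x0, Fraction(0), Fraction(y0)))
            break
        x1, y1 = points[k + 1]
        alpha = Fraction(y1 - y0, x1 - x0)   # any two points define a line
        beta = y0 - alpha * x0
        end = k + 1
        # Absorb follow-up points while they satisfy v = alpha * i + beta.
        while end + 1 < n and points[end + 1][1] == alpha * points[end + 1][0] + beta:
            end += 1
        buckets.append(Bucket(x0, points[end][0], alpha, beta))
        k = end + 1
    return buckets


def bucket_lookup(buckets: List[Bucket], index: int) -> int:
    """Return the memory object's value at index using the bucket list."""
    for b in buckets:
        if b.lo <= index <= b.hi:
            return int(b.alpha * index + b.beta)
    raise KeyError(index)


def to_lower(c: int) -> int:
    """ASCII-only stand-in for the to_lower conversion table."""
    return c + 32 if 65 <= c <= 90 else c


points = [(c, to_lower(c)) for c in range(256)]
buckets = linearize(points)
```

On this ASCII-only table the sweep yields three buckets, covering indices 0 to 64, 65 to 90, and 91 to 255, which agrees with the reduction from 256 leaf nodes to 3 reported in the text.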
execution reaches the maximum number of live interpreters and starts terminating execution paths. At this point, the memory keeps increasing linearly as the paths we explore become deeper. Note that at the beginning, hybrid execution consumes as much memory as online execution without exceeding the memory threshold, and utilizes memory resources more aggressively than offline execution throughout the execution. Offline execution requires much less memory (less than 500KB on average), but at a performance cost, as demonstrated below.

Faster than Offline Execution. Figure 8 shows the exploration time for /bin/echo using different limits on the maximum number of running executors. For this experiment, we use 6 bytes of symbolic arguments to explore the entire input space in a reasonable amount of time. When the maximum number of running executors is 1, it means MAYHEM will produce a disk checkpoint (the average checkpoint size was 30KB) for every symbolic branch,

Figure 8: Exploration times for different limits on the maximum number of running executors. (x-axis: maximum number of running executors, 1 to 128; y-axis: time to cover all paths in seconds, split into re-execution time and exploration time.)

| Program | AEG Time | AEG LLVM | MAYHEM Time | MAYHEM ASM | MAYHEM Tainted ASM | MAYHEM Tainted IL |
|---|---|---|---|---|---|---|
| iwconfig | 0.506s | 10,876 | 1.90s | 394,876 | 2,200 | 12,893 |
| aspell | 8.698s | 87,056 | 24.62s | 696,275 | 26,647 | 133,620 |
| aeon | 2.188s | 18,539 | 9.67s | 623,684 | 7,087 | 43,804 |
| htget | 0.864s | 12,776 | 6.76s | 576,005 | 2,670 | 16,391 |
| tipxd | 2.343s | 82,030 | 9.91s | 647,498 | 2,043 | 19,198 |
| ncompress | 5.511s | 60,860 | 11.30s | 583,330 | 8,778 | 71,195 |

Table IV: AEG comparison: binary-only execution requires more instructions.

Figure 10: Exploit generation time versus precondition size. (x-axis: normalized precondition size, 50% to 100%; y-axis: exploit generation time in seconds, with a timeout line; programs plotted: xtokkaetama, sharutils, ghostscript, socat, htpasswd, a2ps.)

utility. The results are shown in Figure 9.
We used the 21 tools with the smallest code size, and 4 bigger tools that we selected. MAYHEM achieved a 97.56% average coverage per application and got 100% coverage on 13 tools. For comparison, KLEE achieved 100% coverage on 12 coreutils without simulated system call failures (to have the same configuration as MAYHEM). Thus, MAYHEM seems to be competitive with KLEE for this data set. Note that MAYHEM is not designed specifically for maximizing code coverage. However, our experiments provide a rough comparison point against other symbolic executors.

F. Comparison against AEG

We picked 8 different programs from the AEG working examples [2] and ran both tools to compare exploit generation times on each of those programs using the same configuration (Table IV). MAYHEM was on average 3.4× slower than AEG. AEG uses source code, and thus has the advantage of operating at a higher level of abstraction. At the binary level, there are no types and high-level structures such as functions, variables, buffers and objects. The number of instructions executed (Table IV) is another factor that highlights the difference between source and binary-only analysis. Considering this, we believe this is a positive and competitive result for MAYHEM.

Figure 12: Tainted instructions (%) for 24 Linux applications. (y-axis: percentage of tainted instructions, 0 to 5%.)

Precondition Size. As an additional experiment, we measured how the presence of a precondition affects exploit generation times. Specifically, we picked 6 programs that require a crashing input to find an exploitable bug and started to iteratively decrease the size of the precondition and measured exploit generation times. Figure 10 summarizes our results in terms of normalized precondition sizes. For example, a normalized precondition of 70% for a 100-byte crashing input means that we provide 70 bytes of the crashing input as a precondition to MAYHEM. While the behavior appeared to be program-dependent, in most of the programs we observed a sudden phase transition, where the removal of a single character could cause MAYHEM to not detect the exploitable bug within the time limit. We believe this to be an interesting topic for future work in the area.
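The normalized precondition size described above is simple arithmetic, sketched below. The function name `split_precondition` is hypothetical, and the sketch assumes the provided bytes are a prefix of the crashing input; the text states only how many bytes are provided, not which ones.

```python
from typing import Tuple


def split_precondition(crashing_input: bytes, normalized_pct: int) -> Tuple[bytes, int]:
    """Return (bytes fixed by the precondition, number of bytes left symbolic).

    Assumption: the fixed bytes are taken from the start of the crashing input.
    """
    if not 0 <= normalized_pct <= 100:
        raise ValueError("normalized size must be between 0 and 100")
    known = len(crashing_input) * normalized_pct // 100
    return crashing_input[:known], len(crashing_input) - known


crash = bytes([0x41]) * 100              # a stand-in 100-byte crashing input
fixed, symbolic = split_precondition(crash, 70)
```

With a 100-byte input and a 70% precondition, 70 bytes are fixed and 30 remain symbolic, which is the example given in the text.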
G. Performance Tuning

Formula Optimizations. Recall from § IV-E that MAYHEM uses various optimization techniques to make solver queries faster. To compare against our optimized version of MAYHEM, we turned off some or all of these optimizations.

We chose 15 Linux programs to evaluate the speedup obtained with different levels of optimizations turned on. Figure 11 shows the head-to-head comparison (in exploit finding and generation times) between 4 different formula optimization options. Algebraic simplifications usually speed up our analysis and offer an average speedup of 10% for the 15 test programs. Significant speedups occur when the independent formula optimization is turned on along with simplifications, offering speedups of 10-100×.

Z3 supports incremental solving, so as an additional experiment, we measured the exploit generation time with Z3 in incremental mode. In most cases solving times for incremental formulas are comparable to the times we obtain with the independent formulas optimization. In fact, in half of our examples (7 out of 15) incremental formulas outperform independent formulas. In contrast to previous results, this implies that using the solver in incremental mode can alleviate the need for many formula simplifications and optimizations. A downside of using the solver in incremental mode was that it made our symbolic execution state mutable, and thus was less memory efficient during our long-running tests.

Tainted Instructions. Only tainted instruction blocks are evaluated symbolically by MAYHEM; all other blocks are executed natively. Figure 12 shows the percentage of tainted instructions for 24 programs (taken from Table I). More than 95% of instructions were not tainted in our sample programs, and this optimization gave about 8× speedup on average.
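The tainted-instruction optimization above can be illustrated with a toy block-level dispatcher. This is a simplified sketch under stated assumptions, not MAYHEM's taint tracker: instructions are modeled as hypothetical `(destination, sources)` tuples, `needs_symbolic_execution` is an invented name, and taint propagates only through explicit data flow.

```python
from typing import List, Set, Tuple

Instr = Tuple[str, Tuple[str, ...]]  # (destination, sources); hypothetical model


def needs_symbolic_execution(block: List[Instr], tainted: Set[str]) -> bool:
    """Propagate taint through a block; return True if any instruction reads tainted data.

    The tainted set is updated in place so that later blocks see the new state.
    """
    symbolic = False
    for dst, srcs in block:
        if any(src in tainted for src in srcs):
            tainted.add(dst)       # the result now depends on user input
            symbolic = True
        else:
            tainted.discard(dst)   # overwritten with untainted data
    return symbolic


tainted = {"input"}
block_a = [("eax", ("ebx",)), ("ecx", ("eax",))]            # touches no user input
block_b = [("eax", ("input",)), ("edx", ("eax", "ebx"))]    # reads user input
block_c = [("eax", ("ebx",))]                               # overwrites eax cleanly

run_a = needs_symbolic_execution(block_a, tainted)
run_b = needs_symbolic_execution(block_b, tainted)
tainted_after_b = set(tainted)
run_c = needs_symbolic_execution(block_c, tainted)
```

In this model only `block_b` would be handed to the symbolic executor, while `block_a` and `block_c` could run natively, which is the source of the speedup when the large majority of instructions are untainted.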
XI. CONCLUSION

We presented MAYHEM, a tool for automatically finding exploitable bugs in binary (i.e., executable) programs in an efficient and scalable way. To this end, MAYHEM introduces a novel hybrid symbolic execution scheme that combines the benefits of existing symbolic execution techniques (both online and offline) into a single system. We also present index-based memory modeling, a technique that allows MAYHEM to discover more exploitable bugs at the binary level. We used MAYHEM to analyze 29 applications and automatically identified and demonstrated 29 exploitable vulnerabilities.

XII. ACKNOWLEDGEMENTS

We thank our shepherd, Cristian Cadar, and the anonymous reviewers for their helpful comments and feedback. This research was supported by a DARPA grant to CyLab at Carnegie Mellon University (N11AP20005/D11AP00262), a NSF Career grant (CNS0953751), and partial CyLab ARO support from grant DAAD19-02-1-0389 and W911NF-09-1-0273. The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

REFERENCES

[1] "Orzhttpd, a small and high performance http server," http://code.google.com/p/orzhttpd/.
[2] T. Avgerinos, S. K. Cha, B. L. T. Hao, and D. Brumley, "AEG: Automatic exploit generation," in Proc. of the Network and Distributed System Security Symposium, Feb. 2011.
[3] D. Babić, L. Martignoni, S. McCamant, and D. Song, "Statically-Directed Dynamic Automated Test Generation," in International Symposium on Software Testing and Analysis. New York, NY, USA: ACM Press, 2011, pp. 12-22.
[4] G. Balakrishnan and T. Reps, "Analyzing memory accesses in x86 executables," in Proc. of the International Conference on Compiler Construction, 2004.
[5] "BitBlaze binary analysis project," http://bitblaze.cs.berkeley.edu, 2007.
[6] BitTurner, "BitTurner," http://www.bitturner.com.
[7] D. Brumley, P. Poosankam, D. Song, and J. Zheng, "Automatic patch-based exploit generation is possible: Techniques and implications," in Proc. of the IEEE Symposium on Security and Privacy, May 2008.
[8] J. Caballero, P. Poosankam, S. McCamant, D. Babic, and D. Song, "Input generation via decomposition and re-stitching: Finding bugs in malware," in Proc. of the ACM Conference on Computer and Communications Security, Chicago, IL, October 2010.
[9] C. Cadar, D. Dunbar, and D. Engler, "KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs," in Proc. of the USENIX Symposium on Operating System Design and Implementation, Dec. 2008.
[10] M. Costa, M. Castro, L. Zhou, L. Zhang, and M. Peinado, "Bouncer: Securing software by blocking bad input," in Symposium on Operating Systems Principles, Oct. 2007.
[11] J. R. Crandall and F. Chong, "Minos: Architectural support for software security through control data integrity," in Proc. of the International Symposium on Microarchitecture, Dec. 2004.
[12] L. M. de Moura and N. Bjørner, "Z3: An efficient SMT solver," in TACAS, 2008, pp. 337-340.
[13] P. Godefroid, M. Levin, and D. Molnar, "Automated whitebox fuzz testing," in Proc. of the Network and Distributed System Security Symposium, Feb. 2008.
[14] S. Heelan, "Automatic Generation of Control Flow Hijacking Exploits for Software Vulnerabilities," Oxford University, Tech. Rep. MSc Thesis, 2002.
[15] I. Jager, T. Avgerinos, E. J. Schwartz, and D. Brumley, "BAP: A binary analysis platform," in Proc. of the Conference on Computer Aided Verification, 2011.
[16] J. King, "Symbolic execution and program testing," Communications of the ACM, vol. 19, pp. 386-394, 1976.
[17] Launchpad, https://bugs.launchpad.net/ubuntu, open bugs in Ubuntu. Checked 03/04/12.
[18] C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood, "Pin: Building customized program analysis tools with dynamic instrumentation," in Proc. of the ACM Conference on Programming Language Design and Implementation, Jun. 2005.
[19] R. Majumdar and K. Sen, "Hybrid concolic testing," in Proc. of the ACM Conference on Software Engineering, 2007, pp. 416-426.
[20] L. Martignoni, S. McCamant, P. Poosankam, D. Song, and P. Maniatis, "Path-exploration lifting: Hi-fi tests for lo-fi emulators," in Proc. of the International Conference on Architectural Support for Programming Languages and Operating Systems, London, UK, Mar. 2012.
[21] A. Moser, C. Kruegel, and E. Kirda, "Exploring multiple execution paths for malware analysis," in Proc. of the IEEE Symposium on Security and Privacy, 2007.
[22] T. Newsham, "Format string attacks," Guardent, Inc., Tech. Rep., 2000.
[23] J. Newsome and D. Song, "Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software," in Proc. of the Network and Distributed System Security Symposium, Feb. 2005.
[24] E. J. Schwartz, T. Avgerinos, and D. Brumley, "All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask)," in Proc. of the IEEE Symposium on Security and Privacy, May 2010, pp. 317-331.
[25] E. J. Schwartz, T. Avgerinos, and D. Brumley, "Q: Exploit hardening made easy," in Proc. of the USENIX Security Symposium, 2011.
[26] K. Sen, D. Marinov, and G. Agha, "CUTE: A concolic unit testing engine for C," in Proc. of the ACM Symposium on the Foundations of Software Engineering, 2005.
[27] A. V. Thakur, J. Lim, A. Lal, A. Burton, E. Driscoll, M. Elder, T. Andersen, and T. W. Reps, "Directed proof generation for machine code," in CAV, 2010, pp. 288-305.
[28] G. C. Vitaly Chipounov, Volodymyr Kuznetsov, "S2E: A platform for in-vivo multi-path analysis of software systems," in Proc. of the International Conference on Architectural Support for Programming Languages and Operating Systems, 2011, pp. 265-278.
Figure 11: Exploit generation time of MAYHEM for different optimizations. (Bar chart, time in seconds on a log scale, for 15 programs: iwconfig, squirrelmail, xgalaga, glftpd, orzhttpd, aeon, ncompress, tipxd, ghostscript, xtokkaetama, sharutils, aspell, socat, psutils, atphttpd. Series: Indep. Formula + Simplification; Inc. Formula + Simplification; Indep. Formula; Simplification. Timeouts are marked.)

IX. DISCUSSION

Most of the work presented in this paper focuses on exploitable bug finding. However, we believe that the main techniques can be adapted to other application domains under the context of symbolic execution. We also believe that our hybrid symbolic execution and index-based memory modeling represent new points in the design space of symbolic execution.

We stress that the intention of MAYHEM is informing a user that an exploitable bug exists. The exploit produced is intended to demonstrate the severity of the problem, and to help debug and address the underlying issue. MAYHEM makes no effort to bypass OS defenses such as ASLR and DEP, which will likely protect systems against exploits we generate. However, our previous work on Q [25] shows that a broken exploit (that no longer works because of ASLR and DEP) can be automatically transformed, with high probability, into an exploit that bypasses both defenses on modern OSes. While we could feed the exploits generated by MAYHEM directly into Q, we do not explore this possibility in this paper.

MAYHEM does not have models for all system/library calls. The current implementation models about 30 system calls in Linux, and 12 library calls in Windows. To analyze larger and more complicated programs, more system calls need to be modeled. This is an artifact of performing per-process symbolic execution. Whole-system symbolic executors such as S2E [28] or BitBlaze [5] can execute both user and kernel code, and thus do not have this limitation. The downside is that whole-system analysis can be much more expensive, because of the higher state restoration cost and the time spent analyzing kernel code.

Another limitation is that MAYHEM can currently analyze only a single execution thread on every run. MAYHEM cannot handle multi-threaded programs when threads interact with each other (through message-passing or shared memory). Last, MAYHEM executes only tainted instructions, thus it is subject to all the pitfalls of taint analysis, including undertainting, overtainting and implicit flows [24].

Future Work: Our experiments show that MAYHEM can generate exploits for standard vulnerabilities such as stack-based buffer overflows and format strings. An interesting future direction is to extend MAYHEM to handle more advanced exploitation techniques such as exploiting heap-based buffer overflows, use-after-free vulnerabilities, and information disclosure attacks. At a high level, it should be possible to detect such attacks using safety properties similar to the ones MAYHEM currently employs. However, it is still an open question how the same techniques can scale and detect such exploits in bigger programs.

X. RELATED WORK

Brumley et al. [7] introduced the automatic patch-based exploit generation (APEG) challenge. APEG used the patch to point out the location of the bug and then used slicing to construct a formula for code paths from input source to vulnerable line. MAYHEM finds vulnerabilities and vulnerable code paths itself. In addition, APEG's notion of an exploit is more abstract: any input that violates the checks introduced by the patch is considered an exploit. Here we consider specifically control flow hijack exploits, which were not automatically generated by APEG.

Heelan [14] was the first to describe a technique that takes in a crashing input for a program, along with a jump register, and automatically generates an exploit. Our research explores the state space to find such crashing inputs.

AEG [2] was the first system to tackle the problem of both identifying exploitable bugs and automatically generating exploits. AEG worked solely on source code and introduced preconditioned symbolic execution as a way to focus symbolic execution towards a particular part of the search space. MAYHEM is a logical extension of AEG to binary code. In practice, working on binary code opens up automatic exploit generation to a wider class of programs and scenarios.

There are several binary-only symbolic execution frameworks such as Bouncer [10], BitFuzz [8], BitTurner [6], FuzzBall [20], McVeto [27], SAGE [13], and S2E [28], which have been used in a variety of application domains. The main question we tackle in MAYHEM is scaling to find and demonstrate exploitable bugs. The hybrid symbolic execution technique we present in this paper is completely different from hybrid concolic testing [19], which interleaves random testing with concolic execution to achieve better code coverage.
| | L Hits | R Hits | Misses | # Queries | Time (sec) |
|---|---|---|---|---|---|
| No opt. | N/A | N/A | N/A | 217,179 | 1,841 |
| + VSA | N/A | N/A | N/A | 49,424 | 437 |
| + R cache | N/A | 3996 | 7 | 10,331 | 187 |
| + L cache | 3940 | 56 | 7 | 242 | 77 |

Table II: Effectiveness of bounds resolution optimizations. The L and R caches are the Lemma and Refinement caches, respectively.

thus is equivalent to offline execution. When the maximum number of running executors was 128 or above, MAYHEM did not have to checkpoint to disk, thus is equivalent to an online executor. As a result, online execution took around 25 seconds to explore the input space while offline execution needed 1,400 seconds. Online was 56× faster than offline in this experiment. We identified two major reasons for this performance boost.

First, the re-execution cost is higher than context-switching between two execution states (§ IV-B). MAYHEM spent more than 25% of the time re-executing previous paths in the offline scheme. For the online case, 2% of the time was spent context-switching. Second, online is more cache-efficient than offline execution in our implementation. Specifically, online execution makes more efficient use of the Pin code cache [18] by switching between paths in-memory during a single execution. As a result, the code cache made online execution 40× faster than offline execution.

Additionally, we ran a Windows GUI program to compare the throughput between offline and hybrid execution. We chose this program because it does not require user interaction (e.g., a mouse click) to start symbolic execution. We ran the program for 1 hour for each execution mode. Hybrid execution was 10× faster than offline execution.

D. Handling Symbolic Memory in Real-World Applications

Recall from § V that index-based memory modeling enables MAYHEM to reason about symbolic indices. Our experiments from Table I show that more than 40% of the programs required symbolic memory modeling (column 6) to exploit. In other words, MAYHEM, after several hours of analysis, was unable to generate exploits for these programs without index-based memory modeling. To understand why, we evaluated our index-based memory modeling optimizations on the atphttpd server.

Bounds Resolution. Table II shows the time taken by MAYHEM to find a vulnerability in atphttpd using different levels of optimizations for the bounds resolution algorithm. The times include exploit detection but not exploit generation time (since it is not affected by the bounds resolution algorithm). Row 3 shows that VSA reduces the average number of SMT solver queries per symbolic memory access (54 without VSA) and reduces the total time by 75%. Row 4 shows the number of queries when the refinement cache (R cache) is enabled on top of VSA. The R cache reduces the number of necessary binary searches from 4003 to 7, resulting in a 57% speedup. The last row shows the effect of the lemma cache (L cache) on top of the other optimizations. The L cache takes most of the burden off the R cache, thus resulting in an additional 59% speedup. We do not expect the L cache to always be that efficient, since it relies heavily on the independence of formulas in the path predicate. The cumulative speedup was 96%.

| Formula Representation | Time (sec.) |
|---|---|
| Unbalanced binary tree | 1,754 |
| Balanced binary tree | 425 |
| Balanced binary tree + Linearization | 192 |

Table III: Performance comparison for different IST representations.

Index Search Tree Representation. Recall from § V-B that MAYHEM models symbolic memory loads as ISTs. To show the effectiveness of this optimization we ran three different formula representations (shown in Table III). The balanced IST was more than 4× faster than the unbalanced binary tree representation, and with linearization of the formula we obtained a cumulative speedup of about 9× (1,754 s versus 192 s in Table III). Note that with symbolic arrays (no ISTs) we were unable to detect an exploit within the time limit.

Figure 9: Code coverage achieved by MAYHEM as time progresses for 25 coreutils applications. (x-axis: time in seconds, 0 to 3500; y-axis: code coverage, 0 to 100%.)

E. MAYHEM Coverage Comparison

To evaluate MAYHEM's ability to cover new paths, we downloaded an open-source symbolic executor (KLEE) to compare the performance against MAYHEM. Note that KLEE runs on source, while MAYHEM runs on binary. We measured the code coverage of 25 coreutils applications as a function of time. MAYHEM ran for one hour, at most, on each of those applications. We used the generated test cases to measure the code coverage using the GNU gcov
exploitable policies are violated, MAYHEM generates an exploitability formula and tries to find a satisfying answer, i.e., an exploit.

MAYHEM can generate both local and remote attacks. Our generic design allows us to handle both types of attacks similarly. For Windows, MAYHEM detects an overwritten Structured Exception Handler (SEH) on the stack when an exception occurs, and tries to create an SEH-based exploit.

Buffer Overflows: MAYHEM generates exploits for any possible instruction-pointer overwrite, commonly triggered by a buffer overflow. When MAYHEM finds a symbolic instruction pointer, it first tries to generate jump-to-register exploits, similar to previous work. For this type of exploit, the instruction pointer should point to a trampoline, e.g., jmp %eax, and the register (e.g., %eax) should point to a place in memory where we can place our shellcode. By encoding those constraints into the formula, MAYHEM is able to query the solver for a satisfying answer. If an answer exists, we have proved that the bug is exploitable. If we cannot generate a jump-to-register exploit, we try to generate a simpler exploit by making the instruction pointer point directly to a place in memory where we can place shellcode.

Format String Attacks: To identify and generate format string attacks, MAYHEM checks whether the format argument of format string functions, e.g., printf, contains any symbolic bytes. If any symbolic bytes are detected, it tries to place a format string payload within the argument that will overwrite the return address of the formatting function.

VII. IMPLEMENTATION

MAYHEM consists of about 27,000 lines of C/C++ and OCaml code. Our binary instrumentation framework was built on Pin, and all the hooks for modeled system and API calls were written in C/C++. The symbolic execution engine is written solely in OCaml and consists of about 10,000 lines of code. We rely on BAP to convert assembly instructions to the IL. We use Z3 as our decision procedure, for which we built direct OCaml bindings. To allow for remote communication between the two components we implemented our own cross-platform, light-weight RPC protocol (both in C++ and OCaml). Additionally, to compare different symbolic execution modes, we implemented all three: online, offline and hybrid.

VIII. EVALUATION

A. Experimental Setup

We evaluated our system on 2 virtual machines running on a desktop with a 3.40GHz Intel(R) Core i7-2600 CPU and 16GB of RAM. Each VM had 4GB of RAM; one ran Debian Linux (Squeeze) and the other Windows XP SP3.

Figure 7: Memory use in online, offline, and hybrid mode. (Axes: memory use (bytes) versus time (sec.); series: online, hybrid, offline.)

B. Exploitable Bug Detection

We downloaded 29 different vulnerable programs to check the effectiveness of MAYHEM. Table I summarizes our results. Experiments were performed on stripped, unmodified binaries on both Linux and Windows. One of the Windows applications MAYHEM exploited was a packed binary.

Column 3 shows the type of exploits that MAYHEM detected, as described in §VI. Column 4 shows the symbolic sources that we considered for each program. There are examples from all the symbolic input sources that MAYHEM supports, including command-line arguments (Arg.), environment variables (Env. Vars), network packets (Network) and symbolic files (Files). Column 5 is the size of each symbolic input. Column 6 describes the precondition types that we provided to MAYHEM for each of the 29 programs. They are split into three categories: length, prefix and crashing input, as described in §IV-D. Column 7 shows the advisory reports for all the demonstrated exploits. In fact, MAYHEM found 2 zero-day exploits for two Linux applications, both of which we reported to the developers.

The last column contains the exploit generation time for the programs that MAYHEM analyzed. We measured the exploit generation time as the time taken from the start of analysis until the creation of the first working exploit. The time required varies greatly with the complexity of the application and the size of symbolic inputs. The fastest program to exploit was the Linux wireless configuration utility iwconfig, in 1.90 seconds, and the longest was the Windows program Dizzy, which took about 4 hours.

C. Scalability of Hybrid Symbolic Execution

We measured the effectiveness of hybrid symbolic execution across two scaling dimensions: memory use and speed.

Less Memory-Hungry than Online Execution. Figure 7 shows the average memory use of MAYHEM over time while analyzing a utility in coreutils with online, offline and hybrid execution. After a few minutes, online

within a memory object, a balanced IST for a symbolic index $i$ is defined as: $IST(E) = ite(i < addr(E_{right}), E_{left}, E_{right})$, where $ite$ represents an if-then-else expression, $E_{left}$ ($E_{right}$) represents the left (right) half of the initial entries $E$, and $addr(\cdot)$ returns the lowest address of the given entries. For a single entry the IST returns the entry without constructing any $ite$ expressions.

Note that the above definition constructs a balanced IST. We could instead construct the IST with nested ite expressions, making the formula depth linear in the number of object entries instead of logarithmic. However, our experimental results show that a balanced IST is faster than a nested IST (§VIII). Figure 5 shows how MAYHEM constructs the IST when given the entries of a memory object (the to_lower conversion table) with a single symbolic character as the index.

5) Bucketization with Linear Functions: The IST generation algorithm creates a leaf node for each entry in the memory object. To reduce the number of entries, MAYHEM performs an extra preprocessing step before passing the object to the IST. The idea is that we can use the memory object structure to combine multiple entries into a single bucket. A bucket is an index-parameterized expression that returns the value of the memory object for every index within a range.

MAYHEM uses linear functions to generate buckets. Specifically, MAYHEM sweeps all entries within a memory object and joins consecutive points ((index, value) tuples) into lines, a process we call linearization. Any two points can form a line. Follow-up points are included in the same line if they satisfy the same linear relation. At the end of linearization, the memory object is split into a list of buckets, where each bucket is either a line or an isolated point. The list of buckets can now be passed to the IST algorithm. Figure 5 shows the IST after applying linearization. Linearization effectively reduces the number of leaf nodes from 256 to 3.

The idea of using linear functions to simplify memory lookups comes from a simple observation: linear-like patterns appear frequently for several operations at the binary level. For example, jump tables generated by switch statements and conversion and translation tables (e.g., ASCII to Unicode and vice versa) all contain values that scale linearly with the index.

C. Prioritized Concretization

Modeling a symbolic load using a memory object is beneficial when the size of the memory object is significantly smaller than the entire memory ($|M| \ll |\mu|$). Thus, the above optimizations are only activated when the size of the memory object, approximated by the range, is below a threshold.

Whenever the memory object size exceeds the threshold, MAYHEM will concretize the index used to access it. However, instead of picking a satisfying value at random, MAYHEM attempts to prioritize the possible concretization values. Specifically, for every symbolic pointer, MAYHEM performs three checks:

1) Check if it is possible to redirect the pointer to unmapped memory under the context of the current path predicate. If true, MAYHEM will generate a crash test case for the satisfying value.

2) Check if it is possible to redirect the symbolic pointer to symbolic data. If it is, MAYHEM will redirect (and concretize) the pointer to the least constrained region of the symbolic data. By redirecting the pointer towards the least constrained region, MAYHEM tries to avoid loading overconstrained values, which would eliminate potentially interesting paths that depend on these values. To identify the least constrained region, MAYHEM splits memory into symbolic regions, and sorts them based on the complexity of constraints associated with each region.

3) If all of the above checks fail, MAYHEM concretizes the index to a valid memory address and continues execution.

     1  typedef struct {
     2      int value;
     3      char *bar;
     4  } foo;
     5  int vulnerable(char *input)
     6  {
     7      foo *ptr = init;
     8      char buffer[100];
     9      strcpy(buffer, input);
    10      buffer[0] = ptr->bar[0];
    11      return 0;
    12  }

Figure 6: MAYHEM reconstructing symbolic data structures. (The memory layout panel shows buffer, ptr, bar and value placed inside symbolic region 1.)

The above steps infer whether a symbolic expression is a pointer, and if so, whether it is valid or not (e.g., NULL). For example, Figure 6 contains a buffer overflow at line 9. However, an attacker is not guaranteed to hijack control even if strcpy overwrites the return address. The program needs to reach the return instruction to actually transfer control. However, at line 10 the program performs two dereferences, both of which need to succeed (i.e., avoid crashing the program) to reach line 11 (note that pointer ptr is already overwritten with user data). MAYHEM with prioritized concretization will generate 3 distinct test cases: 1) a crash test case for an invalid dereference of pointer ptr, 2) a crash test case where dereferencing pointer bar fails after successfully redirecting ptr to symbolic data, and 3) an exploit test case, where both dereferences succeed and user input hijacks control of the program. Figure 6 shows the memory layout for the third test case.

VI. EXPLOIT GENERATION

MAYHEM checks for two exploitable properties: a symbolic (tainted) instruction pointer, and a symbolic format string. These properties correspond to a buffer overflow and a format string attack, respectively. Whenever any of the two

Figure 5: Figure (a) shows the to_lower conversion table, (b) shows the generated IST, and (c) the IST after linearization.

optimization techniques MAYHEM includes to tackle some of the caveats of the original algorithm:

Querying the solver on every symbolic memory dereference is expensive. Even with binary search, identifying both bounds of a 32-bit index required about 54 queries on average (§VIII) (§V-B1, §V-B2, §V-B3).

The memory region may not be continuous. Even though many values between the bounds may be infeasible, they are still included in the memory object, and consequently, in the formula (§V-B2).

The values within the memory object might have structure. By modeling the object as a single byte array we are missing opportunities to optimize our formulas based on the structure (§V-B4, §V-B5).

In the worst case, a symbolic index may access any possible location in memory (§V-C).

1) Value Set Analysis (VSA): MAYHEM employs an online version of VSA to reduce the solver load when resolving the bounds of a symbolic index ($i$). VSA returns a strided interval for the given symbolic index. A strided interval represents a set of values in the form $S[L, U]$, where $S$ is the stride and $L$, $U$ are the bounds. For example, the interval $2[1, 5]$ represents the set $\{1, 3, 5\}$. The strided interval output by VSA will be an over-approximation of all possible values the index might have. For instance, $i = (1 + byte) \ll 1$, where $byte$ is a symbolic byte with an interval $1[0, 255]$, results in an interval: $VSA(i) = 2[2, 512]$.

The strided interval produced by VSA is then refined by the solver (using the same binary-search strategy) to get the tight lower and upper bounds of the memory object. For instance, if the path predicate asserts that $byte < 32$, then the interval for the index $(1 + byte) \ll 1$ can be refined to $2[2, 64]$. Using VSA as a preprocessing step has a cascading effect on our memory modeling: a) we perform 70% fewer queries to resolve the exact bounds of the memory object (§VIII), b) the strided interval can be used to eliminate impossible values in the $[L, U]$ region, thus making formulas simpler, and c) the elimination can trigger other optimizations (see §V-B5).

2) Refinement Cache: Every VSA interval is refined using solver queries. The refinement process can still be expensive (for instance, the over-approximation returned by VSA might be too coarse). To avoid repeating the process for the same intervals, MAYHEM keeps a cache mapping intervals to potential refinements. Whenever we get a cache hit, we query the solver to check whether the cached refinement is accurate for the current symbolic index, before resorting to binary search for refinement. The refinement cache can reduce the number of bounds-resolution queries by 82% (§VIII).

3) Lemma Cache: Checking an entry of the refinement cache still requires solver queries. MAYHEM uses another level of caching to avoid repeatedly querying α-equivalent formulas, i.e., formulas that are structurally equivalent up to variable renaming. To do so, MAYHEM converts queried formulas to a canonical representation (F) and caches the query results (Q) in the form of a lemma mapping F to Q. The answer for any formula mapping to the same canonical representation is retrieved immediately from the cache. The lemma cache can reduce the number of bounds-resolution queries by up to 96% (§VIII). The effectiveness of this cache depends on the independent formulas optimization (§IV-E). The path predicate has to be represented as a set of independent formulas, otherwise any new formula added to the current path predicate would invalidate all previous entries of the lemma cache.

4) Index Search Trees (ISTs): Any value loaded from a memory object $M$ is symbolic. To resolve constraints involving a loaded value ($M[i]$), the solver needs to both find an entry in the object that satisfies the constraints and ensure that the index to the object entry is realizable. To lighten the burden on the solver, MAYHEM replaces memory object lookup expressions with index search trees (ISTs). An IST is a binary search tree where the symbolic index is the key and the leaf nodes contain the entries of the object. The entire tree is encoded in the formula representation of the load expression.

More concretely, given a (sorted by address) list of

simplifies symbolic expressions and formulas by applying algebraic simplifications.

Recall from §IV-C that MAYHEM uses taint analysis [23] to selectively execute instruction blocks that deal with symbolic data. This optimization gives a speedup on average over executing all instruction blocks (see §VIII).

V. INDEX-BASED MEMORY MODELING

MAYHEM introduces an index-based memory model as a practical approach to handling symbolic memory loads. The index-based model allows MAYHEM to adapt its treatment of symbolic memory based on the value of the index. In this section we present the entire memory model of MAYHEM.

MAYHEM models memory as a map $\mu$ from 32-bit indices ($i$) to expressions ($e$). In a $load(\mu, i)$ expression, we say that index $i$ indexes memory $\mu$, and the loaded value $e$ represents the contents of the $i$-th memory cell. A load with a concrete index $i$ is directly translated by MAYHEM into an appropriate lookup in $\mu$ (i.e., $\mu[i]$). A $store(\mu, i, e)$ instruction results in a new memory $\mu[i \leftarrow e]$ where $i$ is mapped to $e$.

A. Previous Work & Symbolic Index Modeling

A symbolic index occurs when the index used in a memory lookup is not a number, but an expression. This pattern appears very frequently in binary code. For example, a C switch statement is compiled down to a jump-table lookup where the input character is used as the index. Standard string conversion functions (such as ASCII to Unicode and vice versa, to_lower, etc.) are all in this category.

Handling arbitrary symbolic indices is notoriously hard, since a symbolic index may (in the worst case) reference any cell in memory. Previous research and state-of-the-art tools indicate that there are two main approaches for handling a symbolic index: a) concretizing the index and b) allowing memory to be fully symbolic.

First, concretizing means that instead of reasoning about all possible values that could be indexed in memory, we concretize the index to a single specific address. This concretization can reduce the complexity of the produced formulas and improve solving/exploration times. However, constraining the index to a single value may cause us to miss paths, for instance, if they depend on the value of the index. Concretization is the natural choice for offline executors, such as SAGE [13] or BitBlaze [5], since only a single memory address is accessed during concrete execution.

Reasoning about all possible indices is also possible by treating memory as fully symbolic. For example, tools such as McVeto [27], BAP and BitBlaze [5] offer capabilities to handle symbolic memory. The main tradeoff, when compared with the concretization approach, is performance. Formulas involving symbolic memory are more expressive, thus solving/exploration times are usually higher.

B. Memory Modeling in MAYHEM

The first implementation of MAYHEM followed the simple concretization approach and concretized all memory indices. This decision proved to be severely limiting in that selecting a single address for the index usually did not allow us to satisfy the exploit payload constraints. Our experiments show that 40% of the examples require us to handle symbolic memory; simple concretization was insufficient (see §VIII).

The alternative approach was symbolic memory. To avoid the scalability problems associated with fully symbolic memory, MAYHEM models memory such that writes are always concretized, but symbolic reads are allowed to be modeled symbolically. In the rest of this section we describe the index-based memory model of MAYHEM in detail, as well as some of the key optimizations.

Memory Objects. To model symbolic reads, MAYHEM introduces memory objects. Similar to the global memory $\mu$, a memory object $M$ is also a map from 32-bit indices to expressions. Unlike the global memory, however, a memory object is immutable. Whenever a symbolic index is used to read memory, MAYHEM generates a fresh memory object $M$ that contains all values that could be accessed by the index; $M$ is a partial snapshot of the global memory.

Using the memory object, MAYHEM can reduce the evaluation of a $load(\mu, i)$ expression to $M[i]$. Note that this is semantically equivalent to returning $\mu[i]$. The key difference is in the size of the symbolic array we introduce in the formula. In most cases, the memory object $M$ will be orders of magnitude smaller than the entire memory $\mu$.

Memory Object Bounds Resolution. Instantiating the memory object requires MAYHEM to find all possible values of a symbolic index $i$. In the worst case, this may require up to $2^{32}$ queries to the solver (for 32-bit memory addresses). To tackle this problem MAYHEM exchanges some accuracy for scalability by resolving the bounds $[L, U]$ of the memory region, where $L$ is the lower and $U$ is the upper bound of the index. The bounds need to be conservative, i.e., all possible values of the index should be within the $[L, U]$ interval. Note that the memory region does not need to be continuous; for example, $i$ might have only two realizable values.

To obtain these bounds MAYHEM uses the solver to perform binary search on the value of the index in the context of the current path predicate. For example, initially for the lowest bound of a 32-bit $i$: $L \in [0, 2^{32} - 1]$. If the assertion that $i$ is below the midpoint of this interval is satisfiable, then $L$ lies in the lower half of the interval, while unsatisfiability indicates that $L$ lies in the upper half. We repeat the process until we recover both bounds. Using the bounds we can now instantiate the memory object (using a fresh symbolic array $M$) as follows: $\forall i \in [L, U] : M[i] = \mu[i]$.

The bounds resolution algorithm described above is sufficient to generate a conservative representation of memory objects and allow MAYHEM to reason about symbolic memory reads. In the rest of the section we detail the main

(unlike standard concolic execution), and the re-execution happens concretely. Figure 3 shows the intuition behind hybrid execution. We provide a detailed comparison between online, offline, and hybrid execution in §VIII-C.

C. Design and Implementation of the CEC

The CEC takes in the binary program, a list of input sources to be considered symbolic, and an optional checkpoint input that contains execution state information from a previous run. The CEC concretely executes the program, hooks input sources and performs taint analysis on input variables. Every basic block that contains tainted instructions is sent to the SES for symbolic execution. As a response, the CEC receives the address of the next basic block to be executed and whether to save the current state as a restoration point. Whenever an execution path is complete, the CEC context-switches to an unexplored path selected by the SES and continues execution. The CEC terminates only if all possible execution paths have been explored or a threshold is reached. If we provide a checkpoint, the CEC first executes the program concretely until the checkpoint and then continues execution as before.

Virtualization Layer. During an online execution run, the CEC handles multiple concrete execution states of the analyzed program simultaneously. Each concrete execution state includes the current register context, memory and OS state (the OS state contains a snapshot of the virtual filesystem, network and kernel state). Under the guidance of the SES and the path selector, the CEC context-switches between different concrete execution states depending on the symbolic executor that is currently active. The virtualization layer mediates all system calls to the host OS and emulates them. Keeping separate copies of the OS state ensures there are no side-effects across different executions. For instance, if one executor writes a value to a file, this modification will only be visible to the current execution state; all other executors will have a separate instance of the same file.

Efficient State Snapshot. Taking a full snapshot of the concrete execution state at every fork is very expensive. To mitigate the problem, the CEC shares state across execution states, similar to other systems. Whenever execution forks, the new execution state reuses the state of the parent execution. Subsequent modifications to the state are recorded in the current execution.

D. Design and Implementation of the SES

The SES manages the symbolic execution environment and decides which paths are executed by the CEC. The environment consists of a symbolic executor for each path, a path selector which determines which feasible path to run next, and a checkpoint manager.

The SES caps the number of symbolic executors to keep in memory. When the cap is reached, MAYHEM stops generating new interpreters and produces checkpoints: execution states that will explore program paths that MAYHEM was unable to explore in the first run due to the memory cap. Each checkpoint is prioritized and used by MAYHEM to continue exploration of these paths at a subsequent run. Thus, when all pending execution paths terminate, MAYHEM selects a new checkpoint and continues execution, until all checkpoints are consumed and MAYHEM exits.

Each symbolic executor maintains two contexts (as state): a variable context containing all symbolic register values and temporaries, and a memory context keeping track of all symbolic data in memory. Whenever execution forks, the SES clones the current symbolic state (to keep memory low, we keep the execution state immutable to take advantage of copy-on-write optimizations, similar to previous work [28]) and adds a new symbolic executor to a priority queue. This priority queue is regularly updated by our path selector to include the latest changes (e.g., which paths were explored, instructions covered, and so on).

Preconditioned Symbolic Execution: MAYHEM implements preconditioned symbolic execution as in AEG [2]. In preconditioned symbolic execution, a user can optionally give a partial specification of the input, such as a prefix or length of the input, to reduce the range of the search space. If a user does not provide a precondition, then the SES tries to explore all feasible paths. This corresponds to the user providing the minimum amount of information to the system.

Path Selection: MAYHEM applies path prioritization heuristics, as found in systems such as SAGE [13] and KLEE [9], to decide which path should be explored next. Currently, MAYHEM uses three heuristic ranking rules: a) executors exploring new code (e.g., instead of executing known code more times) have high priority, b) executors that identify symbolic memory accesses have higher priority, and c) execution paths where symbolic instruction pointers are detected have the highest priority. The heuristics are designed to prioritize paths that are most likely to contain a bug. For instance, the first heuristic relies on the assumption that previously explored code is less likely to contain a bug than new code.

E. Performance Tuning

MAYHEM employs several optimizations to speed up symbolic execution. We present the three optimizations that were most effective: 1) independent formulas, 2) algebraic simplifications, and 3) taint analysis.

Similar to KLEE [9], MAYHEM splits the path predicate into independent formulas to optimize solver queries. A small implementation difference compared to KLEE is that MAYHEM keeps a map from input variables to formulas at all times. It is not constructed only for querying the solver (this representation allows more optimizations, see §V). MAYHEM applies other standard optimizations as proposed by previous systems, such as the constraint subsumption optimization, a counter-example cache and others. MAYHEM

Figure 3: Hybrid execution tries to combine the speed of online execution and the memory use of offline execution to efficiently explore the input space. (Panels: Offline, Online, Hybrid; each panel annotates path segments in millions of instructions.)
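The independent-formulas optimization described under Performance Tuning can be illustrated with a small sketch. This is not MAYHEM's implementation (which the paper says is written in OCaml); it is a toy Python version that assumes each constraint is given as a hypothetical (label, variables) pair with at least one variable, and it groups constraints that transitively share input variables using a union-find structure. A solver query then only needs the groups that mention the variables it asks about.

```python
def split_independent(constraints):
    """Partition constraints into groups that share no input variables.

    `constraints` is a list of (label, variables) pairs; every constraint
    is assumed to mention at least one variable (illustrative format).
    """
    parent = {}

    def find(v):
        # Union-find lookup with path halving.
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(a, b):
        parent[find(a)] = find(b)

    # Variables that occur in the same constraint belong to the same group.
    for _, variables in constraints:
        variables = list(variables)
        for other in variables[1:]:
            union(variables[0], other)

    # Collect constraint labels per group representative.
    groups = {}
    for label, variables in constraints:
        root = find(next(iter(variables)))
        groups.setdefault(root, []).append(label)
    return sorted(groups.values())


# Toy path predicate over three symbolic input bytes.
example = [
    ("x0 < 10", {"x0"}),
    ("x0 + x1 == 5", {"x0", "x1"}),
    ("x2 != 0", {"x2"}),
]
independent_groups = split_independent(example)
```

In this toy predicate the first two constraints end up in one group and the constraint on x2 in another, so a query about x2 alone could be answered without sending the x0/x1 constraints to the solver.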
Figure 4: Online execution throughput versus memory use. (Axes: test case generation throughput (num/sec.) versus memory use (KBytes).)

the second step, they symbolically execute the instructions in the recorded trace. This approach is called concolic execution, a juxtaposition of concrete and symbolic execution. Offline execution is attractive because of its simplicity and low resource requirements; we only need to handle a single execution path at a time.

The top-left diagram of Figure 3 highlights an immediate drawback of this approach. For every explored execution path, we need to first re-execute a (potentially) very large number of instructions until we reach the symbolic condition where execution forked, and then begin to explore new instructions.

Online symbolic execution avoids this re-execution cost by forking two interpreters at branch points, each one having a copy of the current execution state. Thus, to explore a different path, online execution simply needs to perform a context switch to the execution state of a suspended interpreter. S2E [28], KLEE [9] and AEG [2] follow this approach by performing online symbolic execution on LLVM bytecode.

However, forking off a new executor at each branch can quickly strain the memory, causing the entire system to grind to a halt. State-of-the-art online executors try to address this problem with aggressive copy-on-write optimizations. For example, KLEE has an immutable state representation and S2E shares common state between snapshots of physical memory and disks. Nonetheless, since all execution states are kept in memory simultaneously, eventually all online executors will reach the memory cap. The problem can be mitigated by using DFS (Depth-First Search); however, this is not a very useful strategy in practice. To demonstrate the problem, we downloaded S2E [28] and ran it on a coreutils application with 2 symbolic arguments, each one 10 bytes long. Figure 4 shows how the symbolic execution throughput (number of test cases generated per second) slows down as the memory use increases.

B. Hybrid Symbolic Execution

MAYHEM introduces hybrid symbolic execution to actively manage memory without constantly re-executing the same instructions. Hybrid symbolic execution alternates between online and offline modes to maximize the effectiveness of each mode. MAYHEM starts analysis in online mode. When the system reaches a memory cap, it switches to offline mode and does not fork any more executors. Instead, it produces checkpoints to start new online executions later on. The crux of the system is to distribute the online execution tasks into subtasks without losing potentially interesting paths. The hybrid execution algorithm employed by MAYHEM is split into four main phases:

1. Initialization: The first time MAYHEM is invoked for a program, it initializes the checkpoint manager, the checkpoint database, and test case directories. It then starts online execution of the program and moves to the next phase.

2. Online Exploration: During the online phase, MAYHEM symbolically executes the program in an online fashion, context-switching between current active execution states, and generating test cases.

3. Checkpointing: The checkpoint manager monitors online execution. Whenever the memory utilization reaches a cap, or the number of running executors exceeds a threshold, it will select and generate a checkpoint for an active executor. A checkpoint contains the symbolic execution state of the suspended executor (path predicate, statistics, etc.) and replay information. The concrete execution state is discarded. When the online execution eventually finishes all active execution paths, MAYHEM moves to the next phase.

4. Checkpoint Restoration: The checkpoint manager selects a checkpoint based on a ranking heuristic (§IV-D) and restores it in memory. Since the symbolic execution state was saved in the checkpoint, MAYHEM only needs to reconstruct the concrete execution state. To do so, MAYHEM concretely executes the program using one satisfiable assignment of the path predicate as input, until the program reaches the instruction where the execution state was suspended. At that point, the concrete state is restored and the online exploration (phase 2) restarts. Note that phase 4 avoids symbolically re-executing instructions during the checkpoint restoration phase

Note that the term "checkpoint" differs from an offline execution "seed", which is just a concrete input.
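The four phases above can be sketched as a small scheduling loop. The Python toy below is only an illustration under strong assumptions: a "program" is modeled as a full binary tree of `depth` symbolic branches, an executor state is just the tuple of branch decisions taken so far, and the names `hybrid_explore` and `max_live` are hypothetical. It shows the control flow (fork while below the cap, checkpoint otherwise, replay a checkpoint's prefix concretely before resuming), not MAYHEM's actual data structures.

```python
from collections import deque


def hybrid_explore(depth, max_live):
    """Explore all 2**depth paths of a toy program with `depth` branches.

    Returns the completed paths and the number of branch decisions that
    were replayed concretely while restoring checkpoints.
    """
    # Phase 1 (initialization): the empty prefix stands in for the
    # initial program state.
    checkpoints = deque([()])
    completed = []
    replayed_branches = 0

    while checkpoints:
        # Phase 4 (checkpoint restoration): rebuild the concrete state by
        # replaying the saved prefix; here we only count that work.
        prefix = checkpoints.popleft()
        replayed_branches += len(prefix)
        live = deque([prefix])

        # Phase 2 (online exploration): switch between live executors.
        while live:
            path = live.popleft()
            if len(path) == depth:
                completed.append(path)  # one test case per finished path
                continue
            for taken in (False, True):
                child = path + (taken,)
                # Phase 3 (checkpointing): at the cap, save the state
                # instead of forking a new live executor.
                if len(live) < max_live:
                    live.append(child)
                else:
                    checkpoints.append(child)

    return completed, replayed_branches
```

With a very large `max_live` the loop degenerates to purely online exploration (no replay), while `max_live = 1` checkpoints at nearly every branch and pays the most replay work, which mirrors the trade-off the evaluation measures when varying the maximum number of running executors.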
384 input.TheCECsendstheinstructionstotheSESandtheSESdetermineswhichbranchesarefeasible.TheCECwilllaterreceivethenextbranchtargettoexplorefromtheSES.TheSES,runninginparallelwiththeCEC,receivesastreamoftaintedinstructionsfromtheCEC.TheSESjitstheinstructionstoanintermediatelanguage(andsymbolicallyexecutesthecorrespondingIL.TheCECprovidesanyconcretevalueswheneverneeded,e.g.,whenaninstructionoperatesonasymbolicoperandandaconcreteoperand.TheSESmaintainstwotypesofPathFormula:Thepathformulare”ectstheconstraintstoreachaparticularlineofcode.Eachconditionaljumpaddsanewconstraintontheinput.Forexample,lines32-33createtwonewpaths:onewhichisconstrainedsothatthereadinputendsinanandline35isexecuted,andonewheretheinputdoesnotendinline28willbeexecuted.ExploitabilityFormula:Theexploitabilityformuladeter-mineswhetheri)theattackercangaincontroloftheinstructionpointer,andii)executeapayload.WhenMAYHEMhitsataintedbranchpoint,theSESdecideswhetherweneedtoforkexecutionbyqueryingtheSMTsolver.Ifweneedtoforkexecution,allthenewforksaresenttothepathselectortobeprioritized.Uponpickingapath,theSESnoti“estheCECaboutthechangeandthecorrespondingexecutionstateisrestored.Ifthesystemresourcecapisreached,thenthecheckpointmanagerstartsgeneratingcheckpointsinsteadofforkingnewexecutors(IV).Attheendoftheprocess,testcasesaregeneratedfortheterminatedexecutorsandtheSESinformstheCECaboutwhichcheckpointshouldcontinueexecutionnext.Duringtheexecution,theSESswitchescontextbetweenexecutorsandtheCECcheckpoints/restorestheprovidedexecutionstateandcontinuesexecution.Todoso,theCECmaintainsavirtualizationlayertohandletheprograminter-actionwiththeunderlyingsystemandcheckpoint/restorebetweenmultipleprogramexecutionstates(IV-C).WhenMAYHEMdetectsataintedjumpinstruction,itbuildsanexploitabilityformula,andqueriesanSMTsolvertoseeifitissatis“able.Asatisfyinginputwillbe,byconstruction,anexploit.Ifnoexploitisfoundonthetaintedbranchinstruction,theSESkeepsexploringexecutionpaths.Theabovestepsareperformedateachbranchuntilanexploitablebu
gisfound,MAYHEMhitsauser-speci“edmaximumruntime,orallpathsareexhausted.III.BACKGROUNDBinaryRepresentationinourlanguage.Basicsymbolicexecutionisperformedonassemblyinstructionsastheyexecute.IntheoverallsystemthestreamcomesfromtheCECasexplainedearlier;hereweassumetheyaresimplygiventous.WeleverageBAP[],anopen-sourcebinaryanalysisframeworktoconvertx86assemblytoanintermediatelanguagesuitableforsymbolicexecution.Foreachinstructionexecuted,thesymbolicexecutorjitstheinstructiontotheBAPIL.TheSESperformssymbolicexecutiondirectlyontheIL,introducesadditionalconstraintsrelatedtospeci“cattackpayloads,andsendstheformulatoanSMTsolvertochecksatis“ability.Forexample,theILforaconsistsoftwostatements:onethatloadsanaddressfrommemory,andonethatjumpstothataddress.SymbolicExecutionontheIL.Inconcreteexecution,theprogramisgivenaconcretevalueasinput,itexecutesstatementstoproducenewvalues,andterminateswith“nalvalues.Insymbolicexecutionwedonotrestrictexecutiontoasinglevalue,butinsteadprovideasymbolicinputvariablethatrepresentsthesetofallpossibleinputvalues.Thesymbolicexecutionengineevaluatesexpressionsforeachstatementintermsoftheoriginalsymbolicinputs.Whensymbolicexecutionhitsabranch,itconsiderstwopossibleworlds:onewherethetruebranchtargetisfollowedandonewherethefalsebranchtargetisfollowed.Itdoessobyforkingoffaninterpreterforeachbranchandassertinginthegeneratedformulathatthebranchguardmustbesatis“ed.The“nalformulaencapsulatesallbranchconditionsthatmustbemettoexecutethegivenpath,thusiscalledthepathformulapathpredicateInMAYHEM,eachILstatementtypehasacorrespondingsymbolicexecutionrule.AssertionsintheILareimmediatelyappendedtotheformula.Conditionaljumpstatementscreatetwoformulas:onewherethebranchguardisassertedtrueandthetruebranchisfollowed,andonewhichassertsthenegationoftheguardandthefalsebranchisfollowed.Forexample,ifwealreadyhaveformulaandexecuteisthebranchguardandarejumptargets,thenwecreatethetwoformulas:FSEFSEFSEstandsforforwardsymbolicexecutionofthejumptarget.Duetospace,wegivetheexactsemantics
inacompanionpaper[15],[24].IV.HAYHEMisahybridsymbolicexecutionsystem.Insteadofrunninginpureonlineorof”ineexecutionmode,Mcanalternatebetweenmodes.Inthissectionwepresentthemotivationandmechanicsofhybridexecution.A.PreviousSymbolicExecutionSystemsOf”inesymbolicexecution„asfoundinsystemssuchasSAGE[]„requirestwoinputs:thetargetprogramandaninitialseedinput.Inthe“rststep,of”inesystemsconcretelyexecutetheprogramontheseedinputandrecordatrace.In 383 ,eachHTTPconnectionispassed.Thisroutineinturncallsaspartofthelooponline29togettheuserrequeststring.Theuserinputisplacedintothe4096-bytebufferonline30.Eachreadincrementsthevariablethenumberofbytesreadsofarinordertopreventabufferover”ow.Thereadloopcontinuesuntilisfound,checkedonline34.Iftheuserpassesinmorethan4096byteswithoutanHTTPend-of-linecharacter,thereadloopabortsandtheserverreturnsa400errorstatusmessageonline41.Eachnon-errorrequestgetsloggedviatheThevulnerabilityitselfisin,whichcallswithauserspeci“edformatstring(anHTTPrequest).Variadicfunctionssuchasuseaformatstringspeci“ertodeterminehowtowalkthestacklookingforarguments.Anexploitforthisvulnerabilityworksbysupplyingformatstringsthatcausetowalkthestacktouser-controlleddata.Theexploitthenusesadditionalformatspeci“erstowritetothedesiredlocation[Figure1bshowsthestacklayoutofwhentheformatstringvulnerabilityisdetected.Thereisacalltoandtheformattingargumentisastringofuser-controlledbytes.Wehighlightseveralkeypointsfor“ndingexploitablebugs:Low-leveldetailsmatter:Determiningexploitabilityre-quiresthatwereasonaboutlow-leveldetailslikereturnaddressesandstackpointers.Thisisourmotivationforfocusingonbinary-leveltechniques.Thereareanenormousnumberofpaths:Intheexample,thereisanewpathoneveryencounterofanwhichcanleadtoanexponentialpathexplosion.Additionally,thenumberofpathsinmanyportionsofthecodeisrelatedtothesizeoftheinput.Forexample,unfoldsaloop,creatinganewpathforsymbolicexecutiononeachiteration.Longerinputsmeanmoreconditions,moreforks,andharderscalabilitychallenges.Unfortunatelymostexploits
are not short strings, e.g., in a buffer overflow typical exploits are hundreds or thousands of bytes long.

The more checked paths, the better: To reach the exploitable bug in the example, MAYHEM needs to reason through the loop, read input, fork a new interpreter for every possible path and check for errors. Without careful resource management, an engine can get bogged down with too many symbolic execution threads because of the huge number of possible execution paths.

Execute as much natively as possible: Symbolic execution is slow compared to concrete execution since the semantics of an instruction are simulated in software. In orzhttpd, millions of instructions set up the basic server before an attacker can even connect to a socket. We want to execute these instructions concretely and then switch to symbolic execution.

[Figure 2: MAYHEM architecture. Diagram labels: Target, Input Spec., Test Binary, Taint Tracker, Dynamic Binary Instrumentator (DBI), Virtualization, Operating System, Hardware, Symbolic Execution Server (SES), Symbolic Evaluator, Path Selector, Checkpoint Manager, Exploit Generator, Check Points, Buggy Inputs, Exploits.]

The MAYHEM architecture for finding exploitable bugs is shown in Figure 2. The user starts MAYHEM by running:

    mayhem -sym-net 80 400 ./orzhttpd

The command line tells MAYHEM to symbolically execute orzhttpd, and to open sockets on port 80 to receive symbolic 400-byte long packets. All remaining steps to create an exploit are performed automatically.

MAYHEM consists of two concurrently running processes: a Concrete Executor Client (CEC), which executes code natively on a CPU, and a Symbolic Executor Server (SES). Both are shown in Figure 2. At a high level, the CEC runs on a target system, and the SES runs on any platform, waiting for connections from the CEC. The CEC takes in a binary program along with the potential symbolic sources (input specification) as an input, and begins communication with the SES. The SES then symbolically executes blocks that the CEC sends, and outputs several types of test cases including normal test cases, crashes, and exploits. The steps followed by MAYHEM to find the vulnerable code and generate an exploit are:

The --sym-net 80 400 argument tells MAYHEM to perform symbolic execution on data read in from a socket on port 80. Effectively this is specifying which input sources are potentially under attacker control. MAYHEM can handle attacker input from environment variables, files, and the network.

The CEC loads the vulnerable program and connects to the SES to initialize all symbolic input sources. After the initialization, MAYHEM executes the binary concretely on the CPU in the CEC. During execution, the CEC instruments the code and performs dynamic taint analysis []. Our taint tracking engine checks if a block contains tainted instructions, where a block is a sequence of instructions that ends with a conditional jump or a call instruction.

When the CEC encounters a tainted branch condition or jump target, it suspends concrete execution. A tainted jump means that the target may be dependent on attacker

use. Such executors satisfy principle #1 but not principle #3 (interesting paths are potentially eliminated).

MAYHEM combines the best of both worlds by introducing hybrid symbolic execution, where execution alternates between online and offline symbolic execution runs. Hybrid execution acts like a memory manager in an OS, except that it is designed to efficiently swap out symbolic execution engines. When memory is under pressure, the hybrid engine picks a running executor and saves its current execution state and path formula. The thread is restored by restoring the formula, concretely running the program up to the previous execution state, and then continuing. Caching the path formulas prevents the symbolic re-execution of instructions, which is the bottleneck in offline execution, while managing memory more efficiently than online execution.

MAYHEM also proposes techniques for efficiently reasoning about symbolic memory. A symbolic memory access occurs when a load or store address depends on input. Symbolic pointers are very common at the binary level, and being able to reason about them is necessary to generate control-hijack exploits. In fact, our experiments show that 40% of the generated exploits would have been impossible due to concretization constraints (§VIII). To overcome this problem, MAYHEM employs an index-based memory model (§V) to avoid constraining the index whenever possible.

Results are encouraging. While there is ample room for new research, MAYHEM currently generates exploits for several security vulnerabilities: buffer overflows, function pointer overwrites, and format string vulnerabilities for 29 different programs. MAYHEM also demonstrates 2-10× speedup over offline symbolic execution without having the memory constraints of online symbolic execution.

Overall, MAYHEM makes the following contributions:

1) Hybrid execution. We introduce a new scheme for symbolic execution, which we call hybrid symbolic execution, that allows us to find a better balance between speed and memory requirements. Hybrid execution enables MAYHEM to explore multiple paths faster than existing approaches.

2) Index-based memory modeling. We propose an index-based memory model as a practical approach to dealing with symbolic indices at the binary level (see §V).

3) Binary-only exploit generation. We present the first end-to-end binary-only exploitable bug finding system that demonstrates exploitability by outputting working control hijack exploits.

II. OVERVIEW OF MAYHEM

In this section we describe the overall architecture, usage scenario, and challenges for finding exploitable bugs. We use an HTTP server, orzhttpd [1] (shown in Figure 1a), as an example to highlight the main challenges and present how MAYHEM works. Note that we show source for clarity and simplicity; MAYHEM runs on binary code.

     1  #define BUFSIZE 4096
     3  typedef struct {
     4    char buf[BUFSIZE];
     5    int used;
     6  } STATIC_BUFFER_t;
     8  typedef struct {
     9    STATIC_BUFFER_t read_buf;
    10    ... // omitted
    11  } CONN;
    13  static void serverlog(LOG_TYPE_t type,
    14                        const char *format, ...)
    15  {
    16    ... // omitted
    17    if (format != NULL) {
    18      va_start(ap, format);
    19      vsprintf(buf, format, ap);
    20      va_end(ap);
    21    }
    22    fprintf(log, buf); // vulnerable point
    23    fflush(log);
    24  }
    26  HTTP_STATE_t http_read_request(CONN *conn)
    27  {
    28    ... // omitted
    29    while (conn->read_buf.used < BUFSIZE) {
    30      sz = static_buffer_read(conn, &conn->read_buf);
    31      if (sz ...
    32        ...
    33      conn->read_buf.used += sz;
    34      if (memcmp(&conn->read_buf.buf[conn->read_buf.used] - 4, "\r\n\r\n", 4) == 0)
    35        ...
    39    if (conn->read_buf.used >= BUFSIZE) {
    40      conn->status.st = STATUS_400;
    41      return STATE_ERROR;
    42    }
    43    ...
    44    serverlog(ERROR_LOG,
    45              "%s\n",
    46              conn->read_buf.buf);
    47    ...

Figure 1(a): Code snippet.

[Stack diagram labels: ..., buf ptr, log (file pointer), fprintf frame pointer, return addr to serverlog, ..., buf (in serverlog), serverlog frame pointer, old ebp, ...,
\x5c\xca\xff\xbf\x5e\xca\xff]

Figure 1(b): Stack diagram of the vulnerable program.

Unleashing MAYHEM on Binary Code

Sang Kil Cha, Thanassis Avgerinos, Alexandre Rebert and David Brumley
Carnegie Mellon University
Pittsburgh, PA
{sangkilc, thanassis, alexandre.rebert, dbrumley}@cmu.edu

Abstract: In this paper we present MAYHEM, a new system for automatically finding exploitable bugs in binary (i.e., executable) programs. Every bug reported by MAYHEM is accompanied by a working shell-spawning exploit. The working exploits ensure soundness and that each bug report is security-critical and actionable. MAYHEM works on raw binary code without debugging information. To make exploit generation possible at the binary level, MAYHEM addresses two major technical challenges: actively managing execution paths without exhausting memory, and reasoning about symbolic memory indices, where a load or a store address depends on user input. To this end, we propose two novel techniques: 1) hybrid symbolic execution for combining online and offline (concolic) execution to maximize the benefits of both techniques, and 2) index-based memory modeling, a technique that allows MAYHEM to efficiently reason about symbolic memory at the binary level. We used MAYHEM to find and demonstrate 29 exploitable vulnerabilities in both Linux and Windows programs, 2 of which were previously undocumented.

Keywords: hybrid execution, symbolic memory, index-based memory modeling, exploit generation

I. INTRODUCTION

Bugs are plentiful. For example, the Ubuntu Linux bug management database currently lists over 90,000 open bugs []. However, bugs that can be exploited by attackers are typically the most serious, and should be patched first. Thus, a central question is not whether a program has bugs, but which bugs are exploitable.

In this paper we present MAYHEM, a sound system for automatically finding exploitable bugs in binary (i.e., executable) programs. MAYHEM produces a working control-hijack exploit for each bug it reports, thus guaranteeing each bug report is actionable and security-critical. By working with binary code MAYHEM enables even those without source code access to check the (in)security of their software.

MAYHEM detects and generates exploits based on the basic principles introduced in our previous work on AEG [2]. At a high level, MAYHEM finds exploitable paths by augmenting symbolic execution [16] with additional constraints at potentially vulnerable program points. The constraints include details such as whether an instruction pointer can be redirected, whether we can position attack code in memory, and ultimately, whether we can execute the attacker's code. If the resulting formula is satisfiable, then an exploit is possible.

A main challenge in exploit generation is exploring enough of the state space of an application to find exploitable paths. In order to tackle this problem, MAYHEM's design is based on four main principles: 1) the system should be able to make forward progress for arbitrarily long times (ideally run "forever") without exceeding the given resources (especially memory), 2) in order to maximize performance, the system should not repeat work, 3) the system should not throw away any work, i.e., previous analysis results of the system should be reusable on subsequent runs, and 4) the system should be able to reason about symbolic memory where a load or store address depends on user input. Handling memory addresses is essential to exploit real-world bugs. Principle #1 is necessary for running complex applications, since most non-trivial programs will contain a potentially infinite number of paths to explore.

Current approaches to symbolic execution, e.g., CUTE [26], BitBlaze [5], KLEE [9], SAGE [13], McVeto [27], AEG [2], S2E [28], and others [3], [21], do not satisfy all the above design points. Conceptually, current executors can be divided into two main categories: offline executors, which concretely run a single execution path and then symbolically execute it (also known as trace-based or concolic executors, e.g., SAGE), and online executors, which try to execute all possible paths in a single run of the system (e.g., S2E). Neither online nor offline executors satisfy principles #1-#3. In addition, most symbolic execution engines do not reason about symbolic memory, and thus do not meet principle #4.

Offline symbolic executors [], [] reason about a single execution path at a time. Principle #1 is satisfied by iteratively picking new paths to explore. Further, every run of the system is independent from the others and thus results of previous runs can be immediately reused, satisfying principle #3. However, offline execution does not satisfy principle #2. Every run of the system needs to restart execution of the program from the
very beginning. Conceptually, the same instructions need to be executed repeatedly for every execution trace. Our experimental results show that this re-execution can be very expensive (see §VIII).

Online symbolic execution [], [] forks at each branch point. Previous instructions are never re-executed, but the continued forking puts a strain on memory, slowing down the execution engine as the number of branches increases. The result is no forward progress and thus principles #1 and #3 are not met. Some online executors such as KLEE stop forking to avoid being slowed down by their memory

2012 IEEE Symposium on Security and Privacy. DOI 10.1109/SP.2012.31
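The trade-off between offline re-execution and online forking described in the introduction can be sketched with a toy cost model. This is only an illustration under simplifying assumptions, not MAYHEM's implementation: the program is assumed to be a full binary tree of `depth` feasible symbolic branches with `seg` instructions before each branch, and the names `offline_cost` and `online_cost` are hypothetical. Each function returns the number of symbolically executed instructions and the peak number of executor states held in memory at once.

```python
from collections import deque


def offline_cost(depth, seg):
    """Offline (trace-based) model: every run restarts from the program entry.

    Assumption: 2**depth feasible paths, each made of `depth` segments of
    `seg` instructions. Returns (instructions executed, peak live states).
    """
    executed = 0
    for _path in range(2 ** depth):   # one run per branch-outcome vector
        executed += depth * seg       # the whole trace is executed again
    return executed, 1                # only one executor is alive at a time


def online_cost(depth, seg):
    """Online model: fork at every branch, never re-execute a shared prefix.

    Explores the same tree breadth-first, keeping every pending state in
    memory. Returns (instructions executed, peak live states).
    """
    executed = 0
    peak_live = 1
    frontier = deque([0])             # each entry: branches already taken
    while frontier:
        peak_live = max(peak_live, len(frontier))
        taken = frontier.popleft()
        if taken == depth:
            continue                  # this path has finished
        executed += seg               # run the segment up to the next branch
        frontier.append(taken + 1)    # fork: branch taken
        frontier.append(taken + 1)    # fork: branch not taken
    return executed, peak_live


if __name__ == "__main__":
    for name, model in (("offline", offline_cost), ("online", online_cost)):
        instructions, live = model(depth=3, seg=10)
        print(f"{name}: {instructions} instructions, {live} peak live states")
```

With `depth=3` and `seg=10`, the offline model executes 240 instructions with a single live state, while the online model executes 70 instructions but holds up to 8 states at once. Under these assumptions, the offline cost grows with repeated work and the online cost grows with memory, which is the tension that hybrid execution is meant to balance.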