Disassembly of Executable Code Revisited

Benjamin Schwarz, Saumya Debray, Gregory Andrews
Department of Computer Science, University of Arizona, Tucson, AZ 85721
{bschwarz, debray, greg}@cs.arizona.edu

This work was supported in part by the National Science Foundation under grants CCR-0073394, EIA-0080123, and CCR-0113633.

Abstract

Machine code disassembly routines form a fundamental component of software systems that statically analyze or modify executable programs. The task of disassembly is complicated by indirect jumps and the presence of non-executable data (jump tables, alignment bytes, etc.) in the instruction stream. Existing disassembly algorithms are not always able to cope successfully with executable files containing such features and fail silently, i.e., produce incorrect disassemblies without any indication that the results they are producing are incorrect. This can be a serious problem, since it can compromise the correctness of a binary rewriting tool. In this paper we examine two commonly used disassembly algorithms and illustrate their shortcomings. We propose a hybrid approach that performs better than these algorithms in the sense that it is able to detect situations where the disassembly may be incorrect and limit the extent of such disassembly errors. Experimental results indicate that the algorithm is quite effective: the amount of code flagged as incurring disassembly errors is usually quite small.

1 Introduction

There has been a significant amount of attention focused on binary rewriting and link-time code optimization in recent years [5, 6, 15, 17, 19]. A fundamental requirement of any software system that aims to statically analyze or modify an executable program is accurate disassembly of its machine code instructions. The task of recovering these instructions is often complicated by the presence of non-executable data (jump tables, alignment bytes, etc.) in the instruction stream. This poses a chicken-and-egg problem: we cannot identify the instructions without knowing what is data, and vice versa. The fact that link-time binary modification tools have to be prepared to deal with hand-coded assembly routines, e.g., due to statically linked libraries, complicates the problem further because it means that we cannot always assume that the code follows familiar source-level conventions (e.g., that a function has a single entry point) or uses recognizable compiler idioms. The presence of variable-length instructions, commonly found in CISC architectures such as the widely used Intel x86, results in an additional degree of complexity, and renders simple heuristics for extracting instruction sequences ineffective.

In this paper we examine techniques currently used for disassembly, discuss their drawbacks, and introduce an improved method for the extraction of instructions from a statically linked binary that contains relocation information. Our algorithm is capable of identifying jump tables embedded within the text segment, offset tables for position-independent code (PIC) sequences, and data inserted for alignment purposes, e.g., to align loop headers. Most importantly, it is able to avoid some disassembly errors that can occur when using existing disassembly techniques.

We have implemented our approach in PLTO, a post-link-time optimizer for the Intel x86 architecture. Experimental results indicate that our algorithm is able to cope with statically linked executables containing highly optimized hand-coded assembly code with a high degree of precision, identifying potential disassembly problems rather than failing silently and limiting the extent of such problems to a small portion of the input executables.

2 Preliminaries

2.1 Relocation Information

Linkers are capable of producing relocation tables at each stage during the linking process. By default, the final executables do not contain relocation information because it is not needed by the loader to re-map the program. However, many binary rewriting frameworks that carry out translation or optimization utilize such information. The tables are used to identify the bit sequences in the executable that correspond to addresses of the program. A single
entry in the table usually contains: (i) a section offset, (ii) a bit that specifies whether the relocation is PC-relative or absolute, and (iii) the width (typically the size of an address on the architecture) of the relocation.

Systems that analyze and transform machine code programs use this information in much the same way that linkers do. After the code has been moved around, references to addresses have changed, and they need to be updated to reflect their new position in the executable. Without knowledge about the locations of addresses, a binary modification system has to be fairly conservative in the kinds of code transformations it is able to effect.

The remainder of this paper assumes that relocation tables are available in the executable. We do not feel this is unnecessarily onerous: a user who is sufficiently concerned about performance to use a link-time optimizer seems likely to be willing to invoke the compiler with the additional flags needed to retain relocation information. Other binary rewriting systems, notably OM [19] and Atom [18], have the same requirement, and most linkers are capable of producing these tables.

2.2 Position-Independent Code

Many compilers can be instructed to emit code that does not rely on being bound to any particular position in the program's address space. These code sequences are often referred to as position-independent code (PIC). In particular, PIC sequences do not contain any relocatable addresses embedded in the instructions. This property enables the code to work regardless of its memory location at runtime. Furthermore, PIC does not need to be patched by the loader, enabling it to be mapped as read-only data, which is useful for shared code such as dynamically linked libraries [14].

When a compiler is emitting position-independent code it typically creates jump tables that are also position-independent. These tables are usually embedded in the text segment of the executable and consist of a sequence of offsets rather than virtual addresses. A jump that uses the offset table first loads a nearby address, then uses this to index into the table and retrieve an offset. The offset is added to the address that was previously loaded and then used in an indirect jump to reach the desired destination.

(Footnote, on loading a nearby address: on the Intel x86 this is done using a `call 0' instruction followed by a `pop %eax' instruction, which has the effect of storing the latter instruction's address into register %eax.)

The problems posed by position-independent jump tables are three-fold: the offset tables, which are really no different than data, appear in the instruction stream; the code sequences that perform the indirect jumps are often complicated and may not adhere to a single pattern that is easily recognizable; and it is entirely possible that an offset table does not contain relocation entries. Taken together, these properties make the task of disassembling PIC sequences involving jump tables more difficult than the standard case.

3 Two Methods for Instruction Disassembly

3.1 Linear Sweep

A straightforward approach to disassembly is to decode everything appearing in sections of the executable that are typically reserved for machine code. This method is used by programs such as the GNU utility objdump [9] as well as by link-time optimizers such as alto [15], OM [19], and Spike [6]. Its main advantage is simplicity. However, it has the disadvantage that any data that is embedded in the instruction stream is misinterpreted as code and disassembled. Only under special circumstances (such as when an invalid opcode is decoded) can these situations be discovered.

The problem is illustrated by the code fragment shown in Figure 1, taken from the machine code for the function strrchr in the standard C library (libc) under RedHat Linux on a Pentium III processor. Immediately after the jump at address 0x809ef45, three NULL bytes of data (shown highlighted in the original figure) were inserted to push the loop header at address 0x809ef4a forward, presumably for alignment purposes. The NULL bytes and subsequent instructions are misinterpreted by the utility objdump, as it uses the scheme described above to decode instructions. By inspection, we can figure out that the jump at address 0x809efaa targets the middle of what objdump believes to be an instruction. In addition, the instructions it decoded are rather suspicious in their current context (the instruction `add %al,0xee8304ee(%ebx)' references an absolute memory location that does not even lie within the executable). The instruction sequence is clearly invalid, but the linear sweep algorithm is unable to discern data from code.

The problem in this case arises because on the Intel x86 architecture, a NULL byte can be a valid opcode; it would not have arisen if the programmer had used no-op instructions to force alignment. However, the larger point illustrated by this example remains valid: data embedded in the text segment can be misidentified as code by the linear sweep algorithm, and this can cause disassembly errors in some or all of the remainder of the instruction stream.

Figure 1: An Example of Disassembly Problems using Linear Sweep
  Location and Memory Contents (as recovered): 0x809ef45: eb 3c | 00 00 | 0x809ef4a: 83 ee 04 83 ee | 0x809ef4f: 04 83 | 0x809efaa: 73 9e
  Disassembly Results (in order): jmp 0x809ef83 | add %al,(%eax) | add %al,0xee8304ee(%ebx) | add $0x83,%al | jae 0x809ef4a

3.2 Recursive Traversal

The problem with the linear sweep algorithm, illustrated by the example in Figure 1, is that it does not take into account the control flow behavior of the program: in particular, the jump instruction immediately before the three NULL bytes inserted for alignment. As a result, it is unable to discern that these alignment bytes are not reachable during execution, and mistakenly interprets them as executable code. An obvious fix would be to take into account the control flow behavior of the program being disassembled in order to determine what to disassemble. Intuitively, whenever we encounter a branch instruction during disassembly, we determine the possible control flow successors of that instruction, i.e., addresses where execution could continue, and proceed with disassembly at those addresses (e.g., for a conditional branch instruction we would consider the branch target and the fall-through address). Variations on this basic approach to disassembly, which we term recursive traversal, are used by a number of binary translation and optimization systems [3, 20]. A virtue of the algorithm is its simplicity and effectiveness in avoiding disassembly of data. The basic algorithm for recursive traversal is:

    proc Disassemble(addr, instrList)
    {
      if (addr has already been visited) return;
      do {
        instr = DecodeInstr(addr);
        addr.visited = true;
        add instr to instrList;
        if (instr is a branch or function call) {
          T = set of possible control flow successors of instr;
          for each (target in T) do {
            Disassemble(target, instrList);
          }
          return;  /* successors, including any fall-through, are handled above */
        }
        else addr += instr.length;  /* addr of next instruction */
      } while (addr is a valid instruction address);
    }

Each executable contains an entry point, which is usually specified in the program header. The routine Disassemble is initially invoked with this entry point. Under the assumption that we are able to identify all possible control flow successors of each branch and function call operation in the program, this ensures that any instruction that is reachable from the program entry is correctly disassembled.

This method is able to handle the code fragment shown in Figure 1. Upon decoding the jump instruction at address 0x809ef45, disassembly continues at address 0x809ef83, the (only) control flow successor for this instruction. Eventually the instruction at address 0x809efaa is reached by a path from this point, and this in turn causes disassembly to proceed from the instruction at 0x809ef4a. The three NULL bytes are never disassembled, since they are not reachable by any execution path through the program.

Figure 2: An Example of Disassembly Problems using Recursive Traversal

  Location     Memory Contents         Disassembly Results
  0x80b1d8b:   8d 84 c0 95 1d 0b 08    lea 0x80b1d95(%eax,%eax,8),%eax
  0x80b1d92:   ff e0                   jmp *%eax
  0x80b1d94:   8d                      lea 0x0(%esi,1),%esi
  0x80b1d95:   74 26 00
  0x80b1d98:   8b 06                   mov (%esi),%eax
  0x80b1d9a:   13 02                   adc (%edx),%eax
  0x80b1d9c:   89 07                   mov %eax,(%edi)

The key assumption in this algorithm is that we can identify all possible control flow successors of each control transfer operation in the program. This may not always be straightforward in the case of indirect jumps. For jump tables appearing in the text segment, this poses a correctness issue: any imprecision in determining the size of such a jump table will result either in a failure to disassemble some reachable code (if the table size is overestimated) or erroneous disassembly of data (if its size is underestimated). The problem is complicated by the fact that the structure of the code generated for switch statements can differ widely from one instance of a switch to another, even for a specific compiler and target architecture.

Existing proposals for identifying the targets of indirect jumps usually resort to nontrivial program analyses such as program slicing [4] or constant propagation [8]. We need a control flow graph for the function in order to carry out such analyses. Unfortunately, the construction of a control flow graph for a function before all of its instructions have been disassembled does not seem straightforward. Instead, we resort to a simpler technique based on relocation information. When disassembling the code for a function $f$, let $T_f$ be the set of relocatable text segment addresses $a$ such that $a$ lies between the start address for $f$ and the start address of the function following $f$, and let $J_f$ be the set of addresses $a$ such that $a \in T_f$ and location $a$ itself contains a relocatable text segment address. Intuitively, we expect an indirect jump to an address $a$ to be implemented by loading $a$ (which must be a text segment address, under the assumption that all code is in the text segment) into a register and then jumping indirectly through that register, and in this case the address $a$ has to be relocatable; the set $T_f$ consists of all such addresses that lie within the function $f$, and hence might be possible targets for an indirect jump in $f$. The set $J_f$ specifies those elements of $T_f$ that are jump table entries, i.e., which do not contain code and hence cannot be the target of a jump. The set of possible targets of an indirect jump within $f$ is then taken to be the set of addresses $T_f \setminus J_f$.
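The candidate-target computation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PLTO's implementation: the `relocs` dictionary (mapping the location of each relocatable word to the text segment address stored there), the function name `indirect_jump_targets`, and the function bounds used in the example are all hypothetical.

```python
from typing import Dict, Set

def indirect_jump_targets(relocs: Dict[int, int],
                          func_start: int, next_func_start: int) -> Set[int]:
    """Over-estimate the targets of indirect jumps inside one function.

    relocs is a hypothetical view of the relocation table: it maps the
    location of each relocatable word to the text segment address it holds.
    """
    # Relocatable address values that fall inside the function.
    t_f = {value for value in relocs.values()
           if func_start <= value < next_func_start}
    # Members of t_f whose own location holds a relocatable text address:
    # these are taken to be jump table entries, not code.
    j_f = {addr for addr in t_f if addr in relocs}
    return t_f - j_f

# Example modeled on Figure 2: the lea at 0x80b1d8b carries the relocatable
# displacement 0x80b1d95 (stored three bytes into the instruction).
# The function bounds below are made up for the example.
fig2 = indirect_jump_targets({0x80b1d8e: 0x80b1d95}, 0x80b1d80, 0x80b1e00)

# A small in-text jump table at 0x120 whose two entries point at 0x140, 0x148.
table = indirect_jump_targets({0x100: 0x120, 0x120: 0x140, 0x124: 0x148},
                              0x100, 0x200)
```

In the first example the only candidate is 0x80b1d95, an address that Figure 2 shows to lie inside an instruction; in the second, the table's own address 0x120 is excluded while its two entries remain candidates.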
This approach seems plausible, in that it uses a conservative over-estimate of the set of possible targets of each indirect jump, which means that every address that could in fact be a target of the jump is considered and all reachable code is disassembled. The problem is that we may also consider addresses that are not in fact targets. This can produce incorrect disassembly results, as illustrated by an example from a C library routine under RedHat Linux, mpn add, shown in Figure 2. In the Intel x86 instruction set, an lea ("load effective address") instruction of the form `lea baseAddr(r0, r1, k), r2' has the effect

  r2 := baseAddr + contentsOf(r0) + k × contentsOf(r1)

The instruction at address 0x80b1d8b in Figure 2 therefore computes an address into register %eax whose value depends on the contents of %eax before this instruction. An inspection of the hand-coded assembly routine for this function reveals that a loop begins at address 0x80b1d98, and the address computed by this instruction is somewhere in the middle of this loop; exactly where is determined by the contents of %eax. (The instruction `lea 0x0(%esi,1),%esi' at address 0x80b1d94 serves as a 4-byte no-op whose purpose is to align the first instruction in the loop on an 8-byte boundary.) It turns out that this register always takes on a value that results in a valid instruction address being computed. However, during a static examination of the instruction stream during disassembly, we cannot guarantee that %eax ≠ 0, since such guarantees in general require nontrivial analyses such as constant propagation or program slicing, which in turn require the control flow graph for the function, which is not available during disassembly. Since the address 0x80b1d95 appears as a relocatable text segment address within the function, and this location does not itself contain a relocatable text segment address, it is considered as a possible target of the indirect jump at location 0x80b1d92 during recursive traversal disassembly (this corresponds to the possibility that register %eax could have the value 0 when this instruction is executed). As a result, we continue disassembling the input starting at location 0x80b1d95. The problem is that this address is in the middle of an instruction, i.e., recursive traversal produces an incorrect disassembly in this case.

(Footnote, on identifying the targets of indirect jumps: accurate identification of the possible targets of an indirect jump through a jump table can be difficult even if we assume that a control flow graph is available, since we cannot in general count on the jump in a program being accompanied by a bounds check that would enable us to identify the extent of the jump table. Such checks may be excised from hand-crafted assembly code by a careful programmer who is aware of specific invariants that hold in the program; an aggressive optimizing compiler may be able to elide the check based on program analyses to identify the range of values for a variable [10] or using optimizations analogous to the elimination of array bounds checks [11, 16]. We may also encounter indirect jumps that don't involve a jump table and hence don't have a bounds check.)

4 An Improved Algorithm

The linear sweep and recursive traversal disassembly algorithms discussed in the previous section have complementary strengths and weaknesses. The former does not rely on the precise identification of targets of indirect jumps for correct disassembly, but it has trouble coping with data embedded in the instruction stream; the latter is able to decode around data embedded in the text segment, but it may have problems with indirect jumps if their targets cannot be precisely identified. This section discusses how these two algorithms can be combined to exploit the strengths of each.

4.1 Extending the Linear Sweep Algorithm

The simple linear sweep algorithm discussed in Section 3.1 has the disadvantage that any data appearing in the text segment causes disassembly errors. In particular, this means that this algorithm cannot deal with jump tables embedded in the text segment. In this section we discuss how the linear sweep algorithm can be extended to handle jump tables embedded in the instruction stream.

As mentioned in Section 2.1, we assume that relocation information is available in the file being disassembled. We can take advantage of such information to identify jump tables embedded in the text segment (note that jump tables in the data segment do not pose a problem: our primary goal here is to identify the extent of jump tables in the text segment so that we can avoid misinterpreting them as code). Each address $a$ appearing in a jump table embedded in the text segment has the following properties: the memory locations containing $a$ are marked relocatable; and the address $a$ itself points into the text segment.

These properties, while necessary for jump table entries, may not be sufficient: depending on the architecture, relocatable addresses, possibly pointing into the text segment, may also appear as immediate operands in an instruction. However, the instruction sets of typical modern architectures impose an (architecture-specific) upper bound $k$ on the number of such immediate operands that can appear adjacent to each other in an instruction (e.g., for the Intel x86 architecture, $k = 2$). Thus, if the text segment contains $n$ adjacent relocatable addresses each of which points into the text segment ($n > k$), at most the first $k$ of these may be part of an instruction; the remaining $n - k$ addresses must be data. We can use this information to modify the linear sweep algorithm so that, during disassembly, it goes around any such data blocks identified in the text segment. Of course, this does not resolve the status of the first $k$ entries in the sequence, i.e., determine whether they are part of the jump table or immediate operands of an instruction. We will return to this point shortly.

A crucial property of this approach is that it allows us to identify the end of a jump table that appears in the text segment. The text segment therefore becomes divided into "chunks" of code separated by jump tables. Each chunk starts either at the entry point of a function or at the end of the previous jump table. We use the simple linear sweep algorithm of Section 3.1 to disassemble each such chunk, then examine the last instruction in the disassembled chunk. Suppose that the last instruction contains $m$ addresses ($0 \le m \le k$) as immediate operands appearing at the end of the instruction. Then we know that of the $k$ contiguous relocatable addresses appearing at the end of that chunk, $m$ addresses are part of instructions and the remaining $k - m$ addresses constitute jump table entries. The resulting algorithm is as follows:

1. For each sequence of $n$ contiguous relocatable text segment addresses appearing in the program ($n > k$), mark the last $n - k$ addresses in the sequence as data.
2. For each sequence of unmarked addresses in the text segment do:
   (a) Disassemble using the simple linear sweep algorithm of Section 3.1. Stop when disassembly reaches a marked location.
   (b) If the last instruction being disassembled was incompletely disassembled when the marked location was reached, discard this instruction.
   (c) Examine the last correctly disassembled instruction; let $m$ be the number of relocatable text segment addresses appearing at its end ($0 \le m \le k$). There must be $k - m$ unmarked relocatable text segment addresses between the end of this instruction and the next marked location. Mark each of these addresses as data.

The resulting algorithm is able to handle jump tables appearing in the text segment. However, because it relies on relocation information, it is still unable to deal with data embedded in the text segment that does not have any relocation information associated with it, such as the NULL bytes in the example of Figure 1. We next discuss how we can combine our enhanced linear sweep algorithm and recursive traversal to address this problem.

4.2 A Hybrid Disassembly Algorithm

The biggest problem with both the recursive traversal algorithm discussed in Section 3.2 and the extended linear sweep algorithm described in the previous section is that they can result in undetected disassembly errors that can compromise the correctness of the overall binary rewriting system. The basic idea behind our approach is to combine these two algorithms in a way that allows us to detect, and identify the extent of, such disassembly errors.

Our approach is straightforward. We disassemble the program using the extended linear sweep algorithm described in Section 4.1, then verify the results of this disassembly a function at a time using the recursive traversal algorithm. The verification process checks that the instruction sequence obtained for each function is self-consistent, i.e., does not contain errors such as a branch into the middle of an instruction. Any function for which verification fails, i.e., for which the linear sweep and recursive traversals disagree, is precluded from subsequent optimization. A function is verified as follows:

- Use recursive traversal to disassemble each instruction in the function.
- For each instruction so obtained at address $a$, check that the original disassembly using linear sweep has also obtained the same instruction at address $a$. If not, report failure.
- If no failure is encountered while processing the instructions in the function, report success.

As a practical measure, the verification step does not actually construct a second copy of the disassembled instruction sequence for the function, since this would be wasteful of memory. Instead it simply checks that the instructions that it encounters as it goes along match the disassembly results obtained using the linear sweep.

If verification fails for a function, the code for that function is marked "problematic" and is precluded from subsequent optimization. We retain the original machine code sequence for such functions, and insert it back into the program after optimization of the remainder of the program. This may require updates to addresses within the machine code for such problematic functions, since they may not be reinserted at their original addresses. Such addresses are identified from the original relocation information associated with them.

One could imagine extending this approach so that, if verification fails for a function because of a disagreement between the linear sweep and the recursive traversal algorithms, we might try to determine whether one of them is correct. In this case, we could use the results of the disassembly algorithm deemed to have produced a correct result, instead of simply giving up on the function and marking it as problematic. For example, if a function does not contain any indirect jumps, we can be guaranteed that the recursive traversal algorithm is correct. Our current system does not implement such extensions.
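The verification step of Section 4.2 can be sketched as follows. This is a minimal illustration, not PLTO's code: it uses a made-up five-opcode toy instruction set (`TOY_ISA`) so that it runs without a real x86 decoder, it checks a whole byte string rather than one function at a time, it uses a plain linear sweep rather than the extended one of Section 4.1, and the policy of skipping undecodable bytes during the sweep is an assumption of the sketch.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy ISA: opcode byte -> (mnemonic, instruction length in bytes).
# As on the x86, a NULL byte is a valid opcode here.
TOY_ISA = {0x00: ("add", 2), 0x90: ("nop", 1), 0xC3: ("ret", 1),
           0xEB: ("jmp", 2), 0x73: ("jae", 2)}

def decode(code: bytes, addr: int) -> Optional[Tuple[str, int]]:
    """Return (mnemonic, length) for the instruction at addr, or None."""
    if not 0 <= addr < len(code):
        return None
    entry = TOY_ISA.get(code[addr])
    if entry is None or addr + entry[1] > len(code):
        return None
    return entry

def successors(code: bytes, addr: int, mnem: str, length: int) -> List[int]:
    """Control flow successors of the instruction at addr."""
    fall_through = addr + length
    if mnem == "ret":
        return []
    if mnem in ("jmp", "jae"):
        rel = int.from_bytes(code[addr + 1:addr + 2], "little", signed=True)
        target = fall_through + rel
        return [target] if mnem == "jmp" else [target, fall_through]
    return [fall_through]

def linear_sweep(code: bytes) -> Dict[int, str]:
    """Decode everything in order; map instruction start -> mnemonic."""
    result, addr = {}, 0
    while addr < len(code):
        ins = decode(code, addr)
        if ins is None:
            addr += 1  # assumption of this sketch: skip undecodable bytes
            continue
        result[addr] = ins[0]
        addr += ins[1]
    return result

def verify(code: bytes, entry: int, sweep: Dict[int, str]) -> bool:
    """Recursive traversal from entry, checked against the sweep result.

    Only a set of visited addresses is kept; no second instruction list.
    """
    seen, work = set(), [entry]
    while work:
        addr = work.pop()
        if addr in seen:
            continue
        seen.add(addr)
        ins = decode(code, addr)
        if ins is None or sweep.get(addr) != ins[0]:
            return False  # the two disassemblies disagree: "problematic"
        work.extend(successors(code, addr, *ins))
    return True

# jmp over three NULL padding bytes, as in Figure 1: the sweep swallows the
# nop at offset 5 into a bogus two-byte "add", so verification fails.
bad = bytes([0xEB, 0x03, 0x00, 0x00, 0x00, 0x90, 0xC3])
# With two padding bytes the sweep happens to resynchronize at the nop.
good = bytes([0xEB, 0x02, 0x00, 0x00, 0x90, 0xC3])
```

Calling `verify(bad, 0, linear_sweep(bad))` reports a disagreement, which is the situation in which a function would be marked "problematic", while the same call on `good` succeeds even though the padding bytes were decoded as an instruction, because no execution path reaches them.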
5 Experimental Results

We tested and evaluated the various disassembly algorithms described here within the context of PLTO, a link-time optimizer we have developed for the Intel x86 architecture [17], using the SPECint-95 and SPECint-2000 benchmark suites. Our experiments were run on an otherwise unloaded 550 MHz Pentium III system with 1 GB of main memory running RedHat Linux 7.1. The programs were compiled with optimization using version egcs-2.96 of the compiler, with additional flags instructing the linker to retain relocation information and to produce statically linked executables. The use of statically linked executables results from our requirement that the input binaries contain relocation information; the linker refuses to retain relocation information for executables that are not statically linked. It turns out to be useful because it forces us to deal with highly optimized library code, including hand-crafted assembly code, that presents interesting disassembly challenges. Of these programs, one from the SPECint-2000 suite contains jump tables in the text segment resulting from fragments of position-independent code.

We measured the disassembly time (which includes the time taken to read the text segment into memory) for the three different algorithms (extended linear sweep, recursive traversal, and hybrid) as well as the "precision" of our hybrid disassembly algorithm as given by the amount of code that is marked as "problematic." The execution times of the linear sweep and recursive traversal algorithms are given for reference purposes only, since neither algorithm produces correct disassembly results (each of them fails silently on some portions of the program, as described earlier).

The results are shown in Table 1. As one would expect, the time taken by the hybrid algorithm is roughly equal to the sum of the times for the linear sweep and recursive traversal algorithms. On average, the hybrid is about 66% slower than the linear sweep scheme and about twice as slow as the recursive traversal scheme. For our purposes, the disassembly time accounts for only a relatively small fraction of the total processing time, so the additional time taken by the hybrid disassembly algorithm does not pose a performance issue overall.

Table 1: Performance: Disassembly Speed

(a) SPECint-95

  Program          T_Linear  T_Recursive  T_Hybrid  T_Hybrid/T_Linear  T_Hybrid/T_Recursive
  compress             1.16         1.02      2.06               1.78                  2.02
  gcc                 10.63         7.47      16.4               1.54                  2.20
  go                   2.64         2.16      4.40               1.67                  2.04
  ijpeg                1.87         1.54      3.10               1.66                  2.01
  li                   1.61         1.34      2.67               1.66                  1.99
  m88ksim              1.96         1.63      3.29               1.68                  2.02
  perl                 2.84         2.32      4.73               1.66                  2.04
  vortex               4.40         3.24      7.07               1.61                  2.18
  GEOMETRIC MEAN                                                 1.66                  2.06

(b) SPECint-2000

  Program          T_Linear  T_Recursive  T_Hybrid  T_Hybrid/T_Linear  T_Hybrid/T_Recursive
  bzip2                1.44         1.18      2.45               1.70                  2.08
  crafty               2.32         1.88      3.82               1.65                  2.03
  eon                  5.71         4.19      9.28               1.62                  2.22
  gcc                 14.59        10.82     23.94               1.64                  2.21
  gzip                 1.45         1.19      2.41               1.66                  2.02
  mcf                  1.18         1.00      1.98               1.68                  1.98
  parser               1.71         1.38      2.83               1.66                  2.05
  twolf                2.10         1.73      3.52               1.68                  2.04
  vortex               3.91         2.87      6.28               1.61                  2.19
  vpr                  1.72         1.46      2.91               1.69                  1.99
  GEOMETRIC MEAN                                                 1.66                  2.08

Key: T_Linear: disassembly time using the extended linear sweep algorithm. T_Recursive: disassembly time using recursive traversal. T_Hybrid: disassembly time using the hybrid algorithm.

Table 2: Performance: Precision of Disassembly

(a) SPECint-95

                   No. of Functions           No. of Text Bytes
  Program            Nf   Pf  Pf/Nf (%)         Nb    Pb  Pb/Nb (%)
  compress          570    4       0.70     291552   792       0.27
  gcc              2418    3       0.12    1146304   736       0.06
  go                919    4       0.44     485472   792       0.16
  ijpeg             968    4       0.41     403664   800       0.20
  li                928    4       0.43     334992   800       0.24
  m88ksim           832    4       0.48     394656   800       0.20
  perl              887    4       0.45     502768   800       0.16
  vortex           1506    4       0.27     671936   792       0.12
  GEOMETRIC MEAN                   0.38                        0.16

(b) SPECint-2000

                   No. of Functions           No. of Text Bytes
  Program            Nf   Pf  Pf/Nf (%)         Nb    Pb  Pb/Nb (%)
  bzip2             634    3       0.47     339216   736       0.22
  crafty            673    4       0.59     449632   792       0.18
  eon              2288    4       0.17     810256   800       0.10
  gcc              2607    3       0.12    1384176   736       0.05
  gzip              663    3       0.45     344464   736       0.21
  mcf               572    4       0.70     294880   792       0.27
  parser            884    4       0.45     385280   792       0.21
  twolf             751    4       0.53     457184   792       0.17
  vortex           1506    4       0.27     671936   792       0.12
  vpr               832    4       0.48     391440   800       0.20
  GEOMETRIC MEAN                   0.38                        0.16

Key: Nf: total no. of functions. Pf: no. of functions inferred to be "problematic". Nb: total no. of bytes in the text segment. Pb: no. of bytes in "problematic" functions.

Table 2 shows the "precision" of disassembly, in the sense of the proportion of code in a program that is properly disassembled and passes verification. All of the problematic functions identified result from highly optimized library routines. Three programs have 3 problematic functions each, among them the routines mpn add and mpn sub. The other programs have 4 problematic functions each: the same three plus a further mpn routine. In the latter case, the problem is that during verification, the recursive traversal incorrectly disassembles what it thinks is a conditional jump in mpn add that goes to the middle of another valid instruction in that fourth routine. A single function accounts for the majority (448 bytes) of the problematic code.

It can be seen that the amount of code found to be problematic is very small: on average, fewer than 0.4% of the functions, comprising less than 0.2% of the program's text segment. In other words, over 99.8% of the text segment is verified to have been correctly disassembled and eligible for subsequent processing. This results in effective optimization of these binaries, with significant performance improvements [17].

6 Related Work

The simple linear sweep disassembly algorithm described in Section 3.1 is used by a number of systems that analyze or modify executable files. These include the GNU objdump utility [9]; the profiling tool of Larus and Ball [12] and its successor, EEL [13]; the link-time optimizer alto [15]; as well as the OM [19] and Spike [6] link-time optimizers and the Atom binary instrumentation tool [18] from Compaq. All of these systems can produce incorrect disassemblies for input binaries whose text segments contain data. As it happens, most of these systems, e.g., alto, OM, Spike, and Atom,
target RISC architectures, where the fixed-size instructions make it easier to detect disassembly errors.

Examples of binary rewriting systems that use recursive traversal for disassembly include UQBT [5] and the work of Theiling [20]. Neither of these relies on relocation information to identify addresses. UQBT handles indirect jumps and indirect function calls using "speculative disassembly," i.e., disassembly of areas that appear to be code, in the expectation that they might be the targets of such control transfers [2]. The system keeps track of how much of the text it has disassembled, and explores gaps in coverage as possible code. When disassembling such gaps, a "speculative" bit is set, which means that if an invalid instruction is disassembled, that disassembly is abandoned. Disassembly can then be restarted at the next word (for RISC machines) or byte (for machines such as the Pentium).

Theiling describes a system that relies on knowledge of the specific compiler used to generate an executable to guide its disassembly [20]. A problem with this approach is that we cannot always guarantee that all of the code in an executable will have been produced using the same compiler, e.g., in the case of statically linked binaries where different libraries may have been compiled with different compilers (or different versions of a compiler). Theiling's algorithm assumes the existence of a module that identifies the targets of indirect jumps; however, the paper does not specify how this is carried out.

There has also been a lot of work on dynamic binary rewriting and dynamic optimization (see, e.g., [1, 7]). The disassembly issues for such systems are very different from those discussed in this paper, since at runtime we can examine an indirect jump operation just before it is executed in order to identify the actual address of the jump target.

7 Conclusions

Correct disassembly of an executable is a fundamental requirement of any tool that intends to modify executable programs. Existing algorithms for static disassembly suffer from the disadvantage that they can "fail silently" and produce incorrectly disassembled code. This, in turn, can compromise the correctness of the entire binary rewriting tool. In this paper we discussed some of the reasons why these algorithms can fail, and proposed a hybrid disassembly algorithm that is able to check the disassembled instruction sequence it produces. This allows it to discover disassembly errors and limit the scope of such errors. Code fragments that are found to possibly contain disassembly errors in this way are precluded from subsequent optimizations. Experiments using the SPECint-95 and SPECint-2000 benchmark suites indicate that it is able to successfully decode over 99.8% of the text segment of the input binaries.

References

[1] V. Bala, E. Duesterwald, and S. Banerjia, "Dynamo: A transparent dynamic optimization system", Proc. SIGPLAN '00 Conference on Programming Language Design and Implementation, June 2000, pp. 1–12.
[2] C. Cifuentes, personal communication, May 2001.
[3] C. Cifuentes and K. J. Gough, "Decompilation of Binary Programs", Software: Practice and Experience, 25(9), Jul. 1995.
[4] C. Cifuentes and M. Van Emmerik, "Recovery of Jump Table Case Statements from Binary Code", Proceedings of the International Workshop on Program Comprehension, May 1999.
[5] C. Cifuentes, M. Van Emmerik, D. Ung, D. Simon, and T. Washington, "Preliminary Experiences with the UQBT Binary Translation Framework", Proc. Workshop on Binary Translation, Oct. 1999.
[6] R. Cohn, D. Goodwin, P. G. Lowney, and N. Rubin, "Optimizing Alpha Executables on Windows NT with Spike", Technical Journal, Vol. 9, No. 4, 1997, pp. 3–20.
[7] C. Consel, L. Hornof, J. Lawall, R. Marlet, G. Muller, J. Noyé, S. Thibault, and E.-N. Volanschi, "Tempo: Specializing systems applications and beyond", in ACM Computing Surveys, Symposium on Partial Evaluation (SOPE '98), 30(3), Sep.
[8] B. De Sutter, B. De Bus, K. De Bosschere, P. Keyngnaert, and B. Demoen, "On the Static Analysis of Indirect Control Transfers in Binaries", Proc. International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), 2000.
[9] GNU Project, Free Software Foundation, GNU Manuals Online. http://www.gnu.org/manual/binutils-2.10.1/html_chapter/binutils_4.html
[10] W. Harrison, "Compiler Analysis of the Value Ranges for Variables", IEEE Transactions on Software Engineering, 3(3), pp. 243–250, May 1977.
[11] P. Kolte and M. Wolfe, "Elimination of Redundant Array Subscript Range Checks", Proc. SIGPLAN '95 Conference on Programming Language Design and Implementation, June 1995, pp. 270–278.
[12] J. R. Larus and T. Ball, "Rewriting Executable Files to Measure Program Behavior", Software: Practice and Experience, 24(2), pp. 197–218, Feb. 1994.
[13] J. R. Larus and E. Schnarr, "EEL: Machine-Independent Executable Editing", Proc. SIGPLAN '95 Conference on Programming Language Design and Implementation, June 1995, pp. 291–300.
[14] J. R. Levine, Linkers and Loaders, Morgan Kaufmann, 2000.
[15] R. Muth, S. K. Debray, S. Watterson, and K. De Bosschere, "alto: A Link-Time Optimizer for the Compaq Alpha", Software: Practice and Experience, pp. 67–101, Jan. 2001.
[16] R. Rugina and M. C. Rinard, "Symbolic Bounds Analysis of Pointers, Array Indices, and Accessed Memory Regions", Proc. SIGPLAN '00 Conference on Programming Language Design and Implementation, June 2000, pp. 182–195.
[17] B. Schwarz, S. Debray, G. Andrews, and M. Legendre, "PLTO: A Link-Time Optimizer for the Intel IA-32 Architecture", 3rd Workshop on Binary Translation, Sept. 2001.
[18] A. Srivastava and A. Eustace, "ATOM: A System for Building Customized Program Analysis Tools", Proc. SIGPLAN '94 Conference on Programming Language Design and Implementation, June 1994, pp. 196–205.
[19] A. Srivastava and D. W. Wall, "A Practical System for Intermodule Code Optimization at Link-Time", Journal of Programming Languages, March 1993, pp. 1–18.
[20] H. Theiling, "Extracting Safe and Precise Control Flow from Binaries", Proceedings of the 7th Conference on Real-Time Computing Systems and Applications, Dec. 2000.