Path-Exploration Lifting: Hi-Fi Tests for Lo-Fi Emulators

Stephen McCamant (UC Berkeley, smcc@cs.berkeley.edu), Pongsin Poosankam (CMU and UC Berkeley, ppoosank@cmu.edu), Dawn Song (UC Berkeley, Berkeley, CA, USA, dawnsong@cs.berkeley.edu), Petros Maniatis (Intel Labs, Berkeley, CA, USA, petros.maniatis@intel.com)


Interestingly, although in this paper we use path-exploration lifting from a Hi-Fi emulator to a Lo-Fi emulator, hoping to “rub off” some of the higher fidelity of the Hi-Fi emulator to the Lo-Fi one, the technique is more general. It can be used in the opposite direction, from Lo-Fi to Hi-Fi, to see how the Hi-Fi emulator would behave for the distinct cases implemented by the Lo-Fi emulator developers. Beyond emulation, for any two implementations of the same precise specification (e.g., SQL query engines, SSL implementations, etc.), it makes sense to analyze one implementation to generate test cases for comparison to the other implementation.

Certainly neither cross-validation nor path exploration via symbolic execution are new. However, path exploration on an artifact to test that same artifact can at best trigger its own corner cases, but not capture those unimplemented in it, which our approach can achieve. More importantly, cross-validation alone can, at best, do random directed testing (“fuzzing”) without capitalizing on the fundamental differences between the different tested artifacts. In contrast, applying systematic program analysis for path exploration, such as symbolic execution, to amplify the benefits of cross-validation is novel to our knowledge.

Ideally, one would want to apply path-exploration lifting to a hardware specification (e.g., the register-transfer language specification of a circuit); unfortunately, such specifications for commodity hardware are proprietary and extremely well guarded. By applying the methodology to a Hi-Fi emulator, we capture corner cases from the next best thing. Conveniently, the Hi-Fi emulator need not be perfect, only complete: we use the actual hardware to test our emulators against, so correctness bugs in the Hi-Fi emulator do not impact our results, and can be discovered through our approach as side effects.

Although path-exploration lifting is a general concept, its implementation is challenging, often losing generality. In this paper we apply and customize the technique for Bochs and QEMU, dealing with several fundamental challenges. While in the past others have used symbolic execution to generate high-coverage test cases for programs, those programs were applications with scalar or other simple input types. In contrast, PokeEMU must generate test cases for emulators, whose input is a starting machine state and a test instruction, a staggeringly large state space to explore. Furthermore, even after PokeEMU makes sense out of the state space and generates some test cases with starting states for a given test instruction, we must figure out how to lead the emulated system to the desired start state: how to get it to set its registers, configuration, program counter, and execution mode to the values required, which is not straightforward since certain parts of the machine state cannot be set directly and most instructions have multiple side effects that may undo prior state setup. Finally, it may not always be possible to analyze the source code of an emulator due to intellectual property restrictions and, even without such restrictions, the emulator may manipulate system state through multiple intermediate representations, via just-in-time compilers, etc.; operating on the binary executable may be the only option for testing system emulators.

Contributions. This paper proposes path-exploration lifting, a new methodology for exhaustively exploiting the correctness of one artifact to improve the correctness of another. The paper presents the design, implementation, and evaluation of the methodology in the PokeEMU system for processor emulators. PokeEMU consists of several key components. First, the paper presents a symbolic execution engine for x86 binaries, FuzzBALL, used to explore paths from binaries, as opposed to source code. Second, it describes a novel exploration strategy for processor emulators, which starts with the instruction decoders, generating instructions to iterate over, and then explores the instruction emulator to identify paths per instruction, with optimizations to reduce the state space. Third, it details an essential tool for processor testing: an input-state generator that, given an input state for a test case, automatically creates a program to bring an emulator or physical system to that state, so that the test can take place. Finally, it conducts the first systematic study of the approach using symbolic execution, assessing QEMU's emulation fidelity using test cases lifted from Bochs and comparing it to both Bochs and real hardware, identifying several deviations from expected behavior.

Figure 1. Overview.

Our evaluation establishes four key points. First, for more than 95% of the instructions, PokeEMU achieved complete path coverage. Second, PokeEMU found a large number of deviations among the emulators tested and real hardware: out of about 610,000 programs, more than 72,000 triggered differences, and we have identified a number of root causes, some of which affect many instructions. Third, many of the deviations found could not have been found by prior emulator-testing approaches, such as random testing. Finally, at least two of the deviations found could lead to significant security problems, when those emulators are used as the basis for a security tool.

2. Overview

A CPU or instruction-set emulator is a program that runs on one architecture (host), but whose functionality is to simulate the processor of a potentially different architecture (guest). Our goal in this work is to discover differences between the behavior of a CPU emulator, and the behavior of another emulator or real hardware: such a difference enables us to find bugs in emulators.

To be precise, we define the (machine) state of a CPU emulator or hardware system to be all the values, such as in general purpose registers, control registers, flags, or memory, that can affect the execution of a future instruction. We say that systems show a behavioral difference if we start them in the same machine state (called the test state), and they execute the same instruction (called the test instruction), but after the instruction executes they are in different machine states (called the final states). Example differences include having a different value in a register, or not raising an exception when the hardware would.

Approach. At a high level, our approach is to discover differences by constructing high-coverage tests that trigger them, using the
methodology of path-exploration lifting. Furthermore, unlike traditional program testing where the tests are simply scalar test input values, a test for the emulator would specify a test state and a test instruction. And in practice, we generate test programs: stand-alone programs that can run on an emulator to set up the test states and then execute the test instructions.

Our design goals are (1) to maximize the coverage of our testing, subject to the constraints of (2) producing a practical number of tests, while (3) requiring relatively little human guidance in modifying the emulators or configuring the tests. Next we discuss how we apply these principles to the key technical challenges.

Challenges and Techniques. First, the space of possible instructions and machine states is astronomically large, so it would not be practical to individually test every possible instruction and initial state. However, this space also has complex structure, so choosing instructions and states at random, or based on typical usage, would miss differences that occur in corner cases. We address this challenge by using symbolic execution to explore the space based on how components of the state are used by a tested emulator. First, we apply symbolic execution on the instruction decoder of an emulator to select test instructions. Then for each test instruction, we apply symbolic execution to the emulator's implementation of that instruction. Specifically, we choose a subset of the machine state as relevant (this choice is discussed in detail in Section 3.3.1 and Figure 3). Symbolic execution determines a test state for each execution path through the implementation that can be triggered by some assignment of values to the selected state components.

A second challenge is that CPU architectures do not provide a uniform interface to initialize all the components of their state. For instance, control registers must be initialized using specialized instructions, and some kinds of initialization are either prerequisites for or conflict with other kinds. To address these challenges, we write a fixed piece of code to initialize a machine to a baseline state. Then our tool automatically instantiates a sequence of test state initializers to set the remaining parts of the state. Thus the test program consists of the baseline state initializer, then the test state initializers, then the test instruction.

A third challenge is that many emulators use inline assembly code or perform just-in-time (JIT) compilation, so they cannot be properly analyzed at the source code level. We address this challenge by using binary-level symbolic execution, which applies uniformly to an interpreter compiled from source code and the machine code created by a just-in-time compiler. Although we did have source code for the emulators we studied, we took on this challenge so as to prepare for also studying closed-source, commercial emulators and VMMs.

System Architecture. Our approach has four steps: exploration, test program generation, test program execution, and difference analysis (Figure 1).

In the first step, exploration, we use symbolic execution to explore an emulator. To efficiently partition the space of possible instruction executions, we do the exploration in two steps: we first explore to generate legal instructions (Figure 1 (1)), and then explore the execution of each instruction separately (Figure 1 (2)). We progressively explore all the execution paths of an instruction implementation, given a selected set of symbolic state components, and generate one test for each path. Thus the output of this step is a set of pairs of test instructions and test states. The symbolic execution is implemented using our tool FuzzBALL, which we have extended with optimizations for this problem domain.

The second step, test program generation, constructs complete test programs from the results of exploration. For each input pair of a test instruction and a test state found in the exploration phase, we construct as output a test program consisting of the baseline initializer, the test state initializer for the test state, and the test instruction (Figure 1 (3)).

In the third step, test program execution, we take the test programs as input and run them on emulators and real hardware (Figure 1 (4)). We instrument the emulators and a hardware-based virtual machine to save as output the machine state after executing the test program (the final state).

In the fourth step, difference analysis, we compare the final states from the different executions of a test (Figure 1 (5)). If the results differ between emulators or between an emulator and the real hardware, we have triggered a behavior difference.

For our evaluation, we have selected two emulators that support x86 guest code: Bochs is an interpreter, and QEMU is a JIT compiler. We compile the emulators for the Linux/x86 host platform. In our experiments, we apply symbolic execution to Bochs, and then use the generated tests for a three-way behavior comparison of Bochs, QEMU, and an Intel® Core™ i5 workstation virtualized by KVM. We test the processor's 32-bit protected mode.

3. Path-Exploration Lifting

In this section we describe the main technical aspects of how our system explores the space of possible instruction executions in an emulator. We start by describing our core technology of lightweight binary-level symbolic execution (Section 3.1). Then we describe the two ways we apply it: first, to discover possible instructions (Section 3.2), and then to find machine states that trigger various behaviors in an emulated instruction (Section 3.3). Finally, we describe a difference-minimization technique that we use to simplify the machine states discovered by symbolic execution (Section 3.4).

3.1 Lightweight Symbolic Execution

The core of our system's state-space exploration is a lightweight engine for binary-level symbolic execution, named FuzzBALL. We start our description with a review of the key concepts of symbolic execution in general, then describe the online approach our tool takes, and some of the particular challenges that arise when operating on binaries.

At a high level, FuzzBALL implements similar functionality to previous symbolic execution systems such as KLEE [6]. But in contrast, it takes a simpler approach in some areas that can be performance or code complexity challenges in other systems, and it is designed for a binary-level, rather than a source-level, program representation.

3.1.1 Background: Symbolic Execution

The basic principle of symbolic execution is to replace certain concrete values in a program's state with symbolic variables. As these symbolic values are used in later computations, they produce more complex symbolic expressions. These symbolic expressions are valuable because they can summarize the effect of many concrete executions.

When a symbolic expression is used in a control-flow instruction, we call the formula that controls the target a branch condition. On a complete program run, the conjunction of the conditions for all the symbolic branches is the path condition. Thus the values for the symbolic variables that satisfy a path condition are ones that would cause the program to execute the same control-flow path as the one executed symbolically. Similarly, by taking a prefix of the path condition with the final branch condition negated, we obtain a condition corresponding to a different control-flow path. Solving such a path condition lets us obtain a new set of concrete values that would cause the corresponding path to be executed.

Intel® Core™ i5 is a trademark of Intel Corporation in the U.S. and/or other countries. Other names and brands may be claimed as the property of others.

On each execution, FuzzBALL examines the decision tree to choose a random path within the part of the tree that has not been completely explored, then adds on to the tree for the part of the path being explored that is new. When creating a new node, FuzzBALL checks whether both the false and true branch directions are feasible, and if so, it can choose arbitrarily (either randomly or according to a supplied heuristic). After reaching the end of the path, FuzzBALL propagates the bit indicating that a subtree has been fully explored back up the tree until it reaches a node with an unexplored branch. The decision tree grows as more paths are explored, so FuzzBALL uses a compact in-memory representation and can optionally store it on disk instead, but this is not needed for runs of the length used in this project.

Branches that come from if statements and branches for loop exit conditions are treated uniformly, since at the instruction level they look the same. Thus FuzzBALL considers a different number of executions of a loop as distinguishing a different execution path. In other applications, this can lead to a significant state-space explosion to manage, but it is not a major obstacle to PokeEMU because instructions usually do not contain unbounded input-dependent loops.

Thus the decision tree ensures that each path FuzzBALL explores is different, and that exploration stops if no further paths are possible. Similarly to systems that duplicate execution state at a symbolic branch [6, 8], the decision tree saves (expensive) invocations of the decision procedure when the tool already knows which branch direction is feasible. As a tradeoff, our approach repeats (relatively inexpensive) concrete and symbolic execution on the repeated path, to avoid keeping multiple states at once, which would increase memory usage and implementation complexity.

Extension to Word-sized Values. When execution requires a concrete value for a word-sized expression, like a switch statement argument or an array index, FuzzBALL applies the same mechanisms described above for two-way branches, once for each bit in the word, most-significant first. This reduction carries over the key properties from two-way branches: FuzzBALL chooses only feasible values, and eventually tries all feasible values.

3.1.3 Operating at the Binary Level

Since FuzzBALL targets binaries rather than source, it must address challenges including instruction-set complexity and variable-sized memory accesses.

To factor out instruction-set complexity, FuzzBALL uses the BitBlaze⁴ Vine library [27], which in turn builds on the VEX library, which is also used by the Valgrind debugging tool [25]. First VEX translates an x86 instruction into the VEX intermediate representation, and then Vine translates from this into its own language, which is even simpler; these translations are cached for efficiency. To handle memory accesses of different operand sizes (bytes, words, etc.), FuzzBALL tries when possible to represent values in their natural size, so that splitting and reassembly are required only when the program itself accesses memory in an inconsistent way. To achieve this, FuzzBALL's representation of memory can contain symbolic values of differing sizes. We describe some additional implementation challenges that, in particular, are inspired by use with emulators in Section 3.3.2.

3.1.4 Impact of FuzzBALL's Correctness

At this point, one might worry about seemingly circular reasoning in our approach. Our goal is to check the correctness of one x86 interpreter (that in an emulator), but our technique relies on another
x86 interpreter (that inside FuzzBALL). What if FuzzBALL has bugs similar to those we find (Section 6) in other emulators?

In fact there are several reasons why our approach is still effective. First, any such bugs in FuzzBALL would be unlikely to significantly affect our results, because emulators use in their own implementation a much smaller and better-exercised subset of processor functionality than they emulate. Second, the differences we discover are real, independent of the test generation process. We use symbolic execution to improve coverage, but the behavior differences are validated by test cases that run on their own. Third, FuzzBALL can be used to validate many emulators, so efforts towards strengthening or verifying the correctness of FuzzBALL would be amplified through its use in a tool such as PokeEMU.

⁴ http://bitblaze.cs.berkeley.edu/

3.2 Instruction Set Exploration

The x86 instruction set is complex enough that even just enumerating all the possible instructions is non-trivial. But we would like exactly such an enumeration, in order to partition the later exploration so that we consider each instruction separately and exactly once. Therefore our first, and relatively simpler, application of symbolic execution is to discover a set of byte sequences representing instructions to test.

We observe that emulators contain instruction-decoding functionality to parse a byte sequence, check whether the sequence is a legal instruction, and if so, decide which code in the emulator will process it. This latter code might be the implementation itself in an interpreter, or a code-generation routine in an IR-based or JIT-compiler emulator; we will refer to it as per-instruction code. By exploring the instruction decoder with symbolic execution, we can discover which byte sequences the emulator considers to be instructions, and group byte sequences that are the “same” instruction in the sense that they have common per-instruction code in the emulator. In particular, we start symbolic execution at the entry point of the emulator's instruction parser, mark the bytes that are the input to this parser as symbolic, and explore execution paths up to the selection of the per-instruction code.

An x86 instruction is between 1 and 15 bytes, consisting of optional prefix bytes, an opcode that is usually 1 or 2 bytes, and trailing fields. Those trailing fields can specify a sub-opcode, register operands, addressing modes, and immediate values. The total number of possible instruction byte sequences is astronomical (though less than $2^{8 \cdot 15} \approx 1.3 \cdot 10^{36}$, because not all instructions allow all possible prefixes and operands). To select a more manageable number of byte sequences, we conceptually partition the byte sequences according to which per-instruction code they trigger, and select a bounded number of byte sequences (currently 1) from each cell of the partition. Intuitively, we select one byte sequence per instruction, for the definition of “instruction” given by the emulator's per-instruction code. Selecting more byte sequences per instruction would slightly improve our coverage of functionality selected by flags within the instruction, such as different addressing modes, but we estimate that the incremental benefit would be relatively low.

The instructions defined by emulator implementations are not in one-to-one correspondence with the 1-2 byte instruction opcode field: some opcode values correspond to multiple implementations depending on prefixes or an extra sub-opcode field, and some distinct opcodes share a single implementation. But we observe that generally at most either a single prefix byte or the sub-opcode within the next byte after the opcode is also relevant, and any other prefix bytes are optional, so every implementation has a unique representative based on the first three bytes of an instruction byte sequence. As shown in the results of Section 6, this approach allows us to cut down an original space of $2^{24}$ (16.8 million) three-byte sequences into less than 1000 unique instructions.
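The partitioning idea of Section 3.2 can be illustrated with a small sketch. It is not the PokeEMU implementation: the paper explores a real decoder with symbolic execution, whereas this sketch assumes a hypothetical toy decoder (`toy_decode`) and simply enumerates a small candidate set by brute force. The names `toy_decode`, `toy_candidates`, and `pick_representatives` are all invented for illustration, and the toy encoding is not the x86 encoding.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Optional

# Search-space sizes quoted in the text.
ALL_15_BYTE_SEQUENCES = 2 ** (8 * 15)  # upper bound on 15-byte strings
ALL_3_BYTE_PREFIXES = 2 ** 24          # 16,777,216 three-byte prefixes


def toy_decode(seq: bytes) -> Optional[str]:
    """Hypothetical decoder: return the name of the per-instruction code
    that would handle `seq`, or None if `seq` is not a legal instruction.
    This only mimics the shape of a real decoder."""
    opcode = seq[0]
    if opcode == 0x01:
        return "add"
    if opcode in (0x10, 0x11):
        # Two distinct opcodes that share a single implementation.
        return "mov"
    if opcode == 0x20:
        # One opcode whose implementation depends on a sub-opcode field
        # in the low three bits of the next byte.
        return {0: "inc", 1: "dec"}.get(seq[1] & 0x07)
    return None


def toy_candidates() -> Iterator[bytes]:
    """Brute-force stand-in for symbolic exploration: vary the first two
    bytes and keep the third byte fixed at zero."""
    for first in range(256):
        for second in range(256):
            yield bytes([first, second, 0])


def pick_representatives(
    candidates: Iterable[bytes],
    decode: Callable[[bytes], Optional[str]],
    per_cell: int = 1,
) -> Dict[str, List[bytes]]:
    """Partition candidates by per-instruction code and keep at most
    `per_cell` byte sequences from each cell of the partition."""
    cells: Dict[str, List[bytes]] = defaultdict(list)
    for seq in candidates:
        handler = decode(seq)
        if handler is not None and len(cells[handler]) < per_cell:
            cells[handler].append(seq)
    return dict(cells)


if __name__ == "__main__":
    reps = pick_representatives(toy_candidates(), toy_decode)
    for handler, seqs in reps.items():
        print(handler, [s.hex() for s in seqs])
```

In this toy setting, opcodes 0x10 and 0x11 fall into the same cell while opcode 0x20 contributes two cells, which mirrors the observation that opcodes and implementations are not in one-to-one correspondence. Raising `per_cell` corresponds to selecting more byte sequences per instruction.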
Figure 3. Symbolic machine state (grayed bits are symbolic, the remaining ones are concrete). Components shown: CR0, CR3, CR4, EFLAGS, EAX...ESP, GDT entries, PD entries, PT entries, and CS...GS.

but not constrained by checks on the explored path, the decision procedure will choose values for them arbitrarily. However, this flexibility is inconvenient for two reasons. First, it makes the generated tests harder to understand, because they contain extra state differences that are irrelevant to the execution of the emulator, and caused only by the decision procedure's arbitrary choices. Second, these irrelevant differences can cause test execution to fail when they affect state that is checked in the test execution but not during the symbolic execution. As an example, we start symbolic execution in Bochs after it has fetched and decoded an instruction, so the permissions on the code segment CS are not relevant for most instructions. But in a real execution, the test instruction must be fetched using CS, so a change that makes that segment inaccessible will cause the test to fail before executing the instruction.

To avoid these problems, we wish to base test states not on an assignment where unconstrained bits are arbitrary, but on one where unconstrained bits are left the same as in a baseline machine state that “just works.” In other words we want to find an assignment that is minimally different from the baseline state. We implement this minimization using a simple and efficient greedy approach. Starting with a working assignment equal to the one produced by the decision procedure, we iterate over each of the bits that are different from the baseline state. For each bit that is different, we check whether setting it to its value in the baseline state still satisfies the path condition; if so, we make the change in the working assignment. Potentially making multiple passes could further reduce the size of the difference, but a single pass is sufficient for the problem of unconstrained variables, which is our main motivation.

We also explored implementing this minimization by excluding variables from the assignment that do not appear in the path condition. However, particularly in the presence of bitwise operations, FuzzBALL's symbolic expressions sometimes retain irrelevant variables. It would have required a complex additional analysis to reliably remove such variables. By comparison our current approach based on evaluation was simple to implement and requires no approximation.

4. Generating Test Programs

Figure 4 shows the execution of a test program, which is a standalone disk image that boots an emulator, initializes a test state, executes a test instruction, and either halts normally or raises an exception.

To simplify the process of constructing code to set up the test state, we divide it into two steps. First we write a baseline state initializer, code that sets up a single baseline state that is a starting point for any state in a given processor mode. Then we use an automated code generation process to construct, for each specific test state, the additional initializations needed to reach the test state from the baseline state: we call these the test state initializers. The advantage of this two-step approach is that because the test states are similar to the baseline state, we require relatively little new code specific to each test.

Figure 4. Execution of a test (the black circles denote when we take a snapshot of the CPU state and of the physical memory; the rectangle delimits the test program). Stages shown: bootstrap, baseline state init., test init., test insn., then halt or exception.

We choose a bootable disk image as the easiest way to load and run code in an emulator. So in summary, a test consists of a bootable disk image containing an off-the-shelf boot loader, the fixed baseline state initializer, the test state initializers for a particular test state, and the test instruction. Next we describe in detail baseline-state initialization and test-state initializer generation.

4.1 Baseline State Initialization

The baseline state is a minimalist execution environment necessary for successfully running all possible tests in a specific operating mode. This baseline state corresponds to the concrete state used during the exploration stage (described in Section 3.3). We now describe specifically the baseline initializer we use for 32-bit protected mode with paging enabled, the most common mode for x86 processors and the one used in our evaluation. We could construct similar baseline initializers for other modes.

The off-the-shelf boot loader we use happens to already configure the machine in 32-bit protected mode. The remainder of the initialization consists of populating the global descriptor table, the page table, the interrupt descriptor table, and enabling paging and interrupts. More precisely we initialize the global descriptor table to use a flat segmentation model. That is, the code, data, and stack segments have a zero base and a 4-GByte limit. We configure the page table to map the 4-GByte virtual address space linearly to a 4-MByte physical memory, repeating every 4-MBytes so that each physical page backs 1024 virtual pages. All pages are initially marked as readable and writable and accessible to both user and kernel mode. This configuration ensures that, unless the global de-

(a)
    %esp: 0x002007dc
    00208055: 0x13 (gdt 10)
    00208056: 0xcf (gdt 10)

(b)
    movl $0x002007dc, %esp
    movb $0x13, 0x00208055      // modify segment type and
    movb $0xcf, 0x00208056      // default operation size (gdt 10)
    movw $0x0050, %ax           // force reload of stack segment
    movw %ax, %ss
    movl $0x00000000, %eax      // restore killed %eax
    .byte 0xff, 0xf0            // push %eax
    hlt                         // the end
Figure 5. Sample test-case generated by FuzzBALL (a) and corresponding x86 code of the test program (b), for the instruction push %eax.

and is triggered by traps. Hardware interrupts, exceptions, and halt requests that occur while executing guest code directly on the hardware can be intercepted by configuring the CPU to trap into the virtual machine monitor whenever they occur. When a trap occurs, the virtual machine monitor, having complete visibility to the state of the guest virtual machine, can create a snapshot of the state of the CPU and of the physical memory. Finally, the hardware guarantees a separation of the guest from the virtual machine monitor. Thus, the virtual machine monitor is always able to regain control of the execution, it can reset the state of the guest, and multiple tests can be run without having to reset the machine physically.

All the guest instructions in the test program that can be directly executed on the hardware are guaranteed to be correct. In other words, the state at the end of their execution corresponds to the state we would obtain if we executed the same instructions without the virtualization layer. On the other hand, for the instructions that require the mediation of the virtual machine monitor we do not have the same guarantee. However the number of such instructions is very small (just those that load and store a few privileged control registers), and their semantics simple, so we have checked by hand that the code in the virtual machine monitor responsible for the mediation complies with the real semantics.

Our implementation is based on KVM [19] (Kernel-based Virtual Machine), a virtual machine monitor for GNU/Linux. Only a few modifications were necessary to the original KVM code base in order to intercept all traps that occur after the baseline state has been initialized. We handle different types of traps differently. If the trap originates from an exception or a halt request, we take a snapshot of the guest CPU state and physical memory and terminate the guest. If the trap originates from a hardware interrupt, we ignore the trap and resume the execution of the guest. Another class of traps are used to simulate exceptions: these occur when an instruction that would normally cause an exception (in the absence of the virtualization layer), instead generates a virtualization trap. Thus for all other types of trap, we let the virtual machine monitor handle the trap, but, before resuming the execution of the guest, we check whether an exception will be injected into the guest at the next resume. If so, this indicates that the trap was simulating an exception, so we take a snapshot and terminate as for a direct exception.

6. Evaluation

We evaluated PokeEMU by comparing the behaviors of the latest versions of QEMU (0.14.0) and Bochs (2.4.6), with the behavior of an Intel® Core™ i5 processor. On the latter we used a customized version of KVM (2.6.37) to automate the execution of the experiments. Since the i5 processor has hardware support for memory virtualization (extended, or nested, page tables), the vast majority of the instructions could be executed natively by the hardware without the need for software emulation. As the Hi-Fi emulator we used a slightly earlier version of Bochs (2.4.5), the latest available at the time we started working on this project. We slightly customized this emulator to ease symbolic execution (e.g., we disabled the devices and the user interface).

We generated test cases using virtual machines running on Amazon EC2. We then used the same virtual machines to run the test cases in QEMU and Bochs and to compare their behaviors. The generation of the test cases required 545.4 CPU hours on 3 8-core instances on EC2 (total cost was about 135 US dollars in Amazon EC2 charges during the summer of 2011). Generation is highly parallelizable, since the bulk of its execution cost lies in the invocations of the solver, and multiple paths can be explored at the same time. We estimate that, with proper scheduling, test-case generation would take about 33.0 hours on 3 instances.

Test-case execution took totals of 198.7, 391.9, and 48.5 CPU hours on QEMU, Bochs, and the real hardware, respectively, and results comparison took 175.9 CPU hours. Test execution is also highly parallel, but our real-hardware testing approach is incompatible with EC2's para-virtualization; for the present results we used a local workstation. By combining 13 EC2 instances and 3 bare-metal instances from another provider, and accounting for the network transfer between them, we estimate that a complete set of test executions and the comparison of their results would take 7.8 hours and $100.19. This is already fast enough to use for nightly regression testing, so we believe that execution time is not a limiting factor for our approach or the PokeEMU prototype.

Our system was able to identify several differences in the behaviors of the emulators, some of which were not known before. We argue that our system can successfully be used in the future to validate the implementation of the currently missing security features in QEMU (i.e., the enforcement of segments' limits and rights) and the other issues (such as those caused by the lack of atomicity during emulation) we found.

6.1 Completeness of the Testing

To generate test instructions we explored the instruction set using a 15 byte input buffer. The first three bytes of this buffer were made symbolic (for the reasons explained in Section 3.2) and the remaining ones were set to zero. We identified 68,977 candidate byte sequences encoding valid instructions and then selected 880 unique instructions. This set of instructions covered all the unique instructions supported by the emulator, with the exception of a few SIMD instructions whose opcodes are longer than three bytes; we also excluded floating point instructions since our symbolic execution engine does not support them.⁵ We used each of these instructions to explore the machine state-space and to generate test programs. For the exploration we treated the entire machine state as symbolic, with the exception of the bytes in memory representing pointers (as shown in Figure 3), the FPU state, the MMX registers, and the contents of the interrupt descriptor table. As concrete inputs we used a snapshot of the

⁵ Some of the techniques used for floating-point equivalence checking by Collingbourne et al. [11] might help us remove the floating-point restriction from PokeEMU in our future work.
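To put the instruction-selection figures of Section 6.1 in proportion, the following worked arithmetic uses only the numbers quoted in the text. The interpretation in the comments is ours: the ratios compare counts, and are not a claim about how many three-byte prefixes are valid instructions.

```latex
\begin{align*}
2^{8 \cdot 3} &= 2^{24} = 16{,}777{,}216
  && \text{(three-byte prefixes a brute-force enumeration would try)} \\
\frac{68{,}977}{2^{24}} &\approx 0.0041
  && \text{(candidate count is about } 0.41\% \text{ of that number)} \\
\frac{68{,}977}{880} &\approx 78.4
  && \text{(candidates per selected unique instruction, on average)}
\end{align*}
```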
In practice, however, emulators may themselves compose individual instructions incorrectly, especially in the case of QEMU, which performs dynamic binary translation for multi-instruction sequences. In our future work, we plan on studying how multi-instruction sequences are treated by emulators.

Symbolic Execution of JIT Compilers and Hardware Specifications. We have based our system on binary-level symbolic execution so that in the future we can apply it to emulators based on just-in-time compilation, such as QEMU. For example, it would be interesting to perform the converse of the comparison in Section 6 by generating tests from QEMU and using them to evaluate Bochs. Since Bochs is generally more complete, our expectation is that this would produce only a few more differences than our current experiments, but it is important if there are cases where QEMU implements a check and Bochs fails to.

In the limit, it may be possible to apply our path-exploration lifting methodology to the highest-Fi emulator there is: the hardware specification itself. Although we have no hope of obtaining (and publishing about) specifications of commercial hardware, it might be possible to apply this methodology to open-source hardware architectures, like the SPARC Leon processor. Before we reach that desirable remote limit, we hope to study higher-level interpreters, e.g., for high-level languages such as Java.

Other Virtual Machines. We currently make some use of source code to simplify the workflow of our study, but our binary approach allows us to tackle emulators for which we have no source code at all, e.g., commercial virtual machine monitors that incorporate emulation in one or more execution modes. To facilitate this, we would like to further automate the process of determining which host locations hold guest machine state. For instance the location of %eax is the one where the emulator writes 42 when executing the instruction mov $42, %eax.

Equivalence Checking. Despite its promise, our approach only provides tests, not proofs of correctness. A further direction to improve the completeness of our emulator checking would be to perform a complete equivalence check between our set of symbolic execution results. Starting with a single Hi-Fi emulator path, we could identify all paths in the Lo-Fi emulator exercised by the same input states. Then we could symbolically combine the results for all Lo-Fi paths into a single large formula (as in the summary-building technique described in Section 3.3.2). Then we would check with a decision procedure whether the formula for the single Hi-Fi path is equivalent to the formula for the few Lo-Fi paths on all possible inputs. It may be difficult to make such an approach scale to all instructions, but when it works it provides a very strong statement about the absence of differences. This has been tried successfully for smaller, restricted programs, like processor microcode [2].

8. Related Work

Next we discuss two classes of previous research that are related to our work here: first, other projects that have searched for bugs in emulators, and then other systems for symbolic execution.

Testing of Emulators. Emulator authors presumably perform testing internally, but there has been relatively little research on techniques to make that testing more automated and effective. A series of two recent papers by Martignoni et al. show the practical value of third-party comparative testing of emulators. They first tested CPU emulators specifically, with randomly generated instructions [20]. Later they tested whole-system virtual machines (based on emulation and other technologies) using hand-written templates that were then automatically expanded to create a larger number of instruction sequences [21]. To generate a set of legal instruction byte sequences (the same challenge we face in Section 3.2), they perform a concrete exploration using the CPU as a black-box correctness oracle. They also execute tests using techniques similar to the ones we describe in Section 4: either with a user-space program [20] or a custom-written kernel [21]. However, random testing on its own does not provide the same kind of coverage guarantees that symbolic execution does. First, PokeEMU completed test generation with measurable path coverage: complete path coverage for 95% of the tested instructions, a precise quantitative measure of coverage, which random-testing methods cannot provide. Second, as shown by the comparison of Section 6, our approach revealed some bugs that these previously state-of-the-art studies based on random testing did not find. Therefore, we consider PokeEMU a demonstrated improvement over the state of the art.

Symbolic Executio
n.Thoughourprimarymotivationinthisworkisthepracticalproblemoftrustworthyemulation,ourresultstherearemadepossibleinpartbyimprovementsintheunderlyingtech-nologyofsymbolicexecution.Symbolicexecutionwasrstproposedinthe1970s[18].Ithasbeenthesubjectofrenewedinterestinthelastdecadethankstoanewgenerationofapproaches[7,15]andadvancesinconstraintsolvingandincreasedcomputingpowerthathaveallowedittobemorewidelyapplied.Wecanclassifysymbolicexecutionsystemsaccordingtotherelationshipbetweenconcreteandsymbolicex-ecution.Insystemsthatarecalledtrace-based,dynamic,orcon-colic[26],theprogramchoosesbranchdirectionsbasedonacon-creteinput,butrecordsapathsothatitcangenerateandifferentinputlater.Bycontrastonlinesystems,ofwhichFuzzBALLisanexample,maintainsymbolicvalueswithoutacorrespondingcon-cretevalue,andsocanbefreetochooseeitherdirectionatabranch.AnotheronlinesymbolicexecutiontoolisKLEE[6],whichgeneratestestcasesforCprogramsusingasymbolicinterpreterforLLVMbytecode.KLEEissimilartoFuzzBALLinmanyways,buthastwokeydesigndifferences.First,KLEE“forks”andmain-tainsmultipleexecutionstatesatoncewhenbothsidesofabrancharefeasible,whereasFuzzBALLexecutesjustonepathtocom-pletionandreturnstootherpathslater.Second,KLEE'ssymbolicconstraintscancontainarrayexpressions,whileFuzzBALLavoidsthembychoosingconcretevaluesforindexes.KLEE'sapproachproducesfewerexecutionpaths,butitrequiresadditionalknowl-edgeandassumptionsaboutthewayaprogrammanagesmemory.Also,decisionprocedurequeriesthatcontainlargearrayscanbesignicantlydifculttosolve.ThoughamoresymbolicapproachcouldbeaddedtoFuzzBALL,ourcurrentapproachworkssuf-cientlywellformanyapplications,includingthepresentone.Particularlyforsecurityapplications,itisimportanttobeabletoperformsymbolicexecutionatthebinarylevel,aswedo.SAGE[16]isatrace-basedsymbolicexecutionsystemforx86thatisusedforextensivetestingwithinMicrosoft,butisnotpubliclyavailable;SmartFuzz[22]isopen-sourceandbasedonValgrind.Howevertrace-basedsystemstendtobegearedtoexploringjustafewpathsinaprogram,ratherthanth
eexhaustiveexplorationweperform.Anothercapabilitythatisimportantinsomesecurityapplicationsistobeabletosymbolicallyexecuteaprograminthecontextofacompleteoperatingsystem.Inatrace-basedtoolonecancollecttraceswithawholesystememulator,butmaintainsymbolicinformationforasingleprocess,asintheBitFuzz[5]system,basedonQEMU.Mostrecently,S2E[10]isanonlinesystemthatintegratesKLEEwithQEMU,allowingmoreexiblecombinationofsymbolicandconcreteexecutionacrossmultiplecomponents.However,ouremulatorsdonotmakesignicantuseoftheoperatingsystemwhenexecutinginstructions,soalighter-weightsingle-processapproachwasappropriateforus.
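
As an illustration of the exploration strategy contrasted with KLEE above (run one path to completion, then return to the other paths later), the following is a minimal Python sketch. It is not FuzzBALL's implementation. The names explore, choose, and toy_instruction are hypothetical. The sketch also assumes every branch is feasible in both directions, whereas a real symbolic executor would consult a decision procedure before taking a side.

```python
from typing import Callable, List, Tuple

Chooser = Callable[[], bool]
Path = Tuple[bool, ...]

def explore(program: Callable[[Chooser], str]) -> List[Tuple[Path, str]]:
    """Enumerate all paths of `program`, running one path to completion at a time.

    `program` calls `choose()` wherever it would branch on a symbolic value.
    ASSUMPTION: both directions of every branch are feasible (no solver here).
    """
    results: List[Tuple[Path, str]] = []
    worklist: List[Path] = [()]  # decision prefixes that still need exploring
    while worklist:
        prefix = worklist.pop()
        taken: List[bool] = []

        def choose() -> bool:
            i = len(taken)
            if i < len(prefix):
                # Replay a decision recorded on an earlier run.
                decision = prefix[i]
            else:
                # New branch: take one side now, remember the other for later.
                decision = True
                worklist.append(tuple(taken) + (False,))
            taken.append(decision)
            return decision

        outcome = program(choose)
        results.append((tuple(taken), outcome))
    return results

def toy_instruction(choose: Chooser) -> str:
    """Hypothetical stand-in for the checks an emulator makes for one instruction."""
    if choose():          # e.g. an operand check passes
        if choose():      # e.g. a privilege check passes
            return "ok"
        return "fault"
    return "exception"

if __name__ == "__main__":
    for path, outcome in explore(toy_instruction):
        print(path, outcome)
```

Each loop iteration re-executes the program from the start along a recorded prefix, so only one execution state is live at a time. A forking design would instead keep a copy of the state for each pending branch.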