Saturn: A Scalable Framework for Error Detection using Boolean Satisfiability

Yichen Xie and Alex Aiken
Stanford University

This article presents Saturn, a general framework for building precise and scalable static error detection systems. Saturn exploits recent advances in boolean satisfiability (SAT) solvers and is path sensitive, precise down to the bit level, and models pointers and heap data. Our approach is also highly scalable, which we achieve using two techniques. First, for each program function, several optimizations compress the size of the boolean formulas that model the control- and data-flow and the heap locations accessed by a function. Second, summaries in the spirit of type signatures are computed for each function, allowing inter-procedural analysis without a dramatic increase in the size of the boolean constraints to be solved. We have experimentally validated our approach by conducting two case studies involving a Linux lock checker and a memory leak checker. Results from the experiments show that our system scales well, parallelizes well, and finds more errors with fewer false positives than previous static error detection systems.

Categories and Subject Descriptors: D.2.4 [Software Engineering]: Software/Program Verification; D.2.3 [Software Engineering]: Coding Tools and Techniques; D.2.5 [Software Engineering]: Testing and Debugging

General Terms: Algorithms, Experimentation, Languages, Verification.

Additional Key Words and Phrases: Program analysis, error detection, boolean satisfiability.

This research is supported by National Science Foundation grant CCF-1234567.
This article combines techniques and algorithms presented in two previous conference papers by the authors, published respectively in Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2005) and Proceedings of the 5th Joint Meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE).
Authors' Address: Yichen Xie and Alex Aiken, Computer Science Department, Stanford University, Stanford, CA 94305; E-mail: {yxie,aiken}@cs.stanford.edu.
Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee.
© 2005 ACM 0164-0925/05/XXXX-XXXX $5.00
ACM Transactions on Programming Languages and Systems, Vol. TBD, No. TDB, Month Year, Pages 1-??.

1. INTRODUCTION

This article presents Saturn¹, a software error-detection framework based on exploiting recent advances in solving boolean satisfiability (SAT) constraints.

At a high level, Saturn works by transforming commonly used program constructs into boolean constraints and then using a SAT solver to infer and check program properties. Compared to previous error detection tools based on data flow analysis or abstract interpretation, our approach has the following advantages:

(1) Precision: Saturn's modeling of loop-free code is faithful down to the bit level, and is therefore considerably more precise than most abstraction-based approaches where immediate information loss occurs at abstraction time. In the context of error detection, the extra precision translates into added analysis power with less confusion, which we demonstrate by finding many more errors with significantly fewer false positives than previous approaches.

(2) Flexibility: Traditional techniques rely on a combination of carefully chosen abstractions to focus on a class of properties effectively. Saturn, by exploiting the expressive power of boolean constraints, uniformly models many language features and can therefore serve as a general framework for a wider range of analyses. We demonstrate the flexibility of our approach by encoding two property checkers in Saturn that traditionally require distinct sets of techniques.

¹ SATisfiability-based failURe aNalysis.

However, SAT-solving is NP-complete, and therefore incurs a worst-case exponential time cost. Since Saturn aims at checking large programs with millions of lines of code, we employ two techniques to make our approach scale.

Intraprocedurally, our encoding of program constructs as boolean formulas is substantially more compact than previous approaches (Section 2). While we model each bit path sensitively as in [Xie and Chou 2002; Kroening et al. 2003; Clarke et al. 2004], several techniques achieve a substantial reduction in the size of the SAT formulas Saturn must solve (Section 3).

Interprocedurally, Saturn computes a concise summary, similar to a type signature, for each analyzed function. The summary-based approach enables Saturn to analyze much larger programs than previous error checking systems based on SAT, and in fact, the scaling behavior of Saturn is at least competitive with, if not better than, other non-SAT approaches to bug finding and verification. In addition, Saturn is able to infer and apply summaries that encode a form of interprocedural path sensitivity, lending itself well to checking complex program behaviors (see Section 5.2 for an example).

Summary-based interprocedural analysis also enables parallelization. Saturn processes each function separately and the analysis can be carried out in parallel, subject only to the ordering dependencies of the function call graph. In Section 6.8, we describe a simple distributed architecture that harnesses the processing power of a heterogeneous cluster of roughly 80 unloaded CPUs. Our implementation dramatically reduces the running time of the leak checker on the Linux kernel (5 MLOC) from over 23 hours to 50 minutes.

We present experimental results to validate our approach (Sections 5 and 6). Section 5 describes the encoding of temporal safety properties in Saturn and presents an interprocedural analysis that automatically infers and checks such properties. We show one such specification in detail: checking that a single thread correctly manages locks, i.e., does not perform two lock or unlock operations in a row on any lock (Section 5.5). Section 6 gives a context- and path-sensitive escape analysis of dynamically allocated objects. Both checkers find more errors than previous approaches with significantly fewer false positives.
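The ordering constraint on parallel analysis mentioned in the introduction (each function is processed separately, subject only to call-graph dependencies) can be sketched as follows. This is only an illustration under stated assumptions, not the distributed architecture of Section 6.8: the function name `analysis_batches` and the example call graph are hypothetical, the call graph is assumed to be acyclic, and how recursion is treated is not covered here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def analysis_batches(calls):
    """Group functions into batches so that every callee is summarized
    in an earlier batch than any of its callers.

    `calls` maps each function name to the names of the functions it calls.
    Functions within one batch do not depend on each other and could be
    handed to separate workers.
    """
    sorter = TopologicalSorter(calls)  # callees act as predecessors
    sorter.prepare()                   # raises graphlib.CycleError on a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical call graph, used only for illustration.
CALLS = {
    "main": ["open_dev", "close_dev"],
    "open_dev": ["lock"],
    "close_dev": ["unlock"],
    "lock": [],
    "unlock": [],
}
```

With this example graph, `lock` and `unlock` form the first batch, `open_dev` and `close_dev` the second, and `main` the last, so a caller is never analyzed before the summaries of its callees exist.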
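One thing that Saturn is not, at least in its current form, is a verification framework. Tools such as CQual [Foster et al. 2002] are capable of verification (proving the absence of bugs, or at least as close as one can reasonably come to that goal for C programs). In this paper, Saturn is used as a bug finding framework in the spirit of MC [Hallem et al. 2002], which means it is designed to find as many bugs as possible with a low false positive rate, potentially at the cost of missing some bugs.

The rest of the article is organized as follows: Section 2 presents the Saturn language and its encoding into boolean constraints. Section 3 discusses a number of key improvements to the encoding that enable efficient checking of open programs. Section 4 gives a brief outline of how we use the Saturn framework to build modular checkers for software. Sections 5 and 6 are two case studies where we present the details of the design and implementation of two property checkers. We describe sources of unsoundness for both checkers in Section 7. Related work is discussed in Section 8 and we conclude with Section 9.

2. THE SATURN FRAMEWORK

In this section, we present a low-level programming language and its translation into our error detection framework. Because our implementation targets C programs, our language models integers, structures, pointers, and handles the arbitrary control flow² found in C. We begin with a language and encoding that handles only integer program values (Section 2.1) and gradually add features until we have presented the entire framework: intraprocedural control flow including loops (Section 2.2), structures (Section 2.3), pointers (Section 2.4), and finally attributes (Section 2.5). In Section 3 we consider some techniques that substantially improve the performance of our encoding.

² The current implementation of Saturn handles reducible flow-graphs, which are by far the most common form even in C code. Irreducible flow-graphs can be converted to reducible ones by node-splitting [Aho et al. 1986].

2.1 Modeling Integers

Figure 1 presents a grammar for a simple imperative language with integers. The parenthesized symbol on the left hand side of each production is a variable ranging over elements of its syntactic category. The language is statically and explicitly typed; the type rules are completely standard and for the most part we elide types for brevity. There are two base types: booleans (bool) and n-bit signed or unsigned integers (int). Note the base types are syntactically separated in the language as expressions, which are integer-valued, and conditions, which are boolean-valued. We use $\tau$ to range solely over different types of integer values.

The integer expressions include constants (const), integer variables (v), unary and binary operations, integer casts, and lifting from conditionals. We give the list of operators that we model precisely using boolean formulas (e.g. +, -, bitwise-and, etc.); for other operators (e.g., division, remainder, etc.), we make approximations. We use a special expression unknown to model unknown values (e.g., in the environment) and the result of operations that we do not model precisely.

\[
\begin{array}{llcl}
\textbf{Language} \\
\text{Type} & (\tau) & ::= & (n, \mathsf{signed} \mid \mathsf{unsigned}) \\
\text{Obj} & (o) & ::= & v \\
\text{Expr} & (e) & ::= & \mathsf{unknown}(\tau) \mid \mathsf{const}(n, \tau) \mid o \mid \mathit{unop}\ e \mid e_1\ \mathit{binop}\ e_2 \mid (\tau)\, e \mid \mathsf{lifte}(c, \tau) \\
\text{Cond} & (c) & ::= & \mathsf{false} \mid \mathsf{true} \mid \neg c \mid e_1\ \mathit{comp}\ e_2 \mid c_1 \wedge c_2 \mid c_1 \vee c_2 \mid \mathsf{liftc}(e) \\
\text{Stmt} & (s) & ::= & o \leftarrow e \mid \mathsf{assert}(c) \mid \mathsf{assume}(c) \mid \mathsf{skip} \\
 & & & \mathit{comp} \in \{=, >, \geq, <, \leq, \neq\} \\
 & & & \mathit{unop} \in \{-, !\} \\
 & & & \mathit{binop} \in \{+, -, *, /, \mathsf{mod}, \mathsf{band}, \mathsf{bor}, \mathsf{xor}, \ll, \gg_l, \gg_a\} \\
\textbf{Representation} \\
\text{Rep} & (\beta) & ::= & [b_{n-1} \ldots b_0]_s \quad \text{where } s \in \{\mathsf{signed}, \mathsf{unsigned}\} \\
\text{Bit} & (b) & ::= & 0 \mid 1 \mid x \mid b_1 \wedge b_2 \mid b_1 \vee b_2 \mid \neg b
\end{array}
\]

Fig. 1. Modeling integers in Saturn.

Expressions

\[
\frac{\beta = \psi(v)}{\psi \vdash v \stackrel{E}{\Rightarrow} \beta}\ \textsf{scalar}
\qquad
\frac{(n, s) = \tau \qquad x_0, \ldots, x_{n-1} \text{ are fresh boolean variables}}{\psi \vdash \mathsf{unknown}(\tau) \stackrel{E}{\Rightarrow} [x_{n-1} \ldots x_0]_s}\ \textsf{unknown}
\]

\[
\frac{\psi \vdash e \stackrel{E}{\Rightarrow} [b_{n-1} \ldots b_0]_x \qquad \tau = (m, s) \qquad
b'_i = \begin{cases}
b_i & \text{if } 0 \leq i < n \\
0 & \text{if } s = \mathsf{unsigned} \text{ and } n \leq i < m \\
b_{n-1} & \text{if } s = \mathsf{signed} \text{ and } n \leq i < m
\end{cases}}
{\psi \vdash (\tau)\, e \stackrel{E}{\Rightarrow} [b'_{m-1} \ldots b'_0]_s}\ \textsf{cast}
\]

\[
\frac{(n, s) = \tau \qquad \psi \vdash c \stackrel{C}{\Rightarrow} b}{\psi \vdash \mathsf{lifte}(c, \tau) \stackrel{E}{\Rightarrow} [\underbrace{0 \ldots 0}_{n-1}\, b]_s}\ \textsf{lifte}
\qquad
\frac{\psi \vdash e \stackrel{E}{\Rightarrow} [b_{n-1} \ldots b_0]_s \qquad \psi \vdash e' \stackrel{E}{\Rightarrow} [b'_{n-1} \ldots b'_0]_s}{\psi \vdash e\ \mathsf{band}\ e' \stackrel{E}{\Rightarrow} [b_{n-1} \wedge b'_{n-1} \ldots b_0 \wedge b'_0]_s}\ \textsf{and}
\]

Conditionals

\[
\frac{\psi \vdash e \stackrel{E}{\Rightarrow} [b_{n-1} \ldots b_0]_s}{\psi \vdash \mathsf{liftc}(e) \stackrel{C}{\Rightarrow} \bigvee_i b_i}\ \textsf{liftc}
\]

Statements

\[
\frac{\psi \vdash e \stackrel{E}{\Rightarrow} \beta}{G, \psi \vdash (v \leftarrow e) \stackrel{S}{\Rightarrow} \langle G;\ \psi[v \mapsto \beta] \rangle}\ \textsf{assign}
\qquad
\frac{\psi \vdash c \stackrel{C}{\Rightarrow} b}{G, \psi \vdash \mathsf{assume}(c) \stackrel{S}{\Rightarrow} \langle G \wedge b;\ \psi \rangle}\ \textsf{assume}
\]

\[
\frac{\psi \vdash c \stackrel{C}{\Rightarrow} b \qquad (G \wedge \neg b) \text{ not satisfiable}}{G, \psi \vdash \mathsf{assert}(c) \stackrel{S}{\Rightarrow} \langle G;\ \psi \rangle}\ \textsf{assert-ok}
\]

Fig. 2. The translation of integers.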
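A few of the expression rules in Figure 2 (unknown, cast, and, liftc) can be sketched in Python as follows. This is a minimal illustration rather than Saturn's implementation: the tuple-based bit representation, the least-significant-bit-first ordering, and all function names are assumptions made for the example, and the `signed` flag of `cast` is passed in explicitly by the caller.

```python
# A bit is False/True, a variable name (str), or a tuple:
# ("and", a, b), ("or", a, b) or ("not", a).
# A vector of bits is a Python list, least significant bit first.


def evaluate(bit, env):
    """Evaluate a bit expression under env (variable name -> bool)."""
    if isinstance(bit, bool):
        return bit
    if isinstance(bit, str):
        return env[bit]
    if bit[0] == "not":
        return not evaluate(bit[1], env)
    left, right = evaluate(bit[1], env), evaluate(bit[2], env)
    return (left and right) if bit[0] == "and" else (left or right)


def unknown(name, n):
    """Rule unknown: one fresh boolean variable per bit."""
    return [f"{name}{i}" for i in range(n)]


def const(value, n):
    """An n-bit constant is a vector of concrete bits."""
    return [bool((value >> i) & 1) for i in range(n)]


def band(e1, e2):
    """Rule and: conjoin the two vectors bit by bit."""
    assert len(e1) == len(e2)
    return [("and", a, b) for a, b in zip(e1, e2)]


def cast(bits, m, signed):
    """Rule cast: keep the low bits, then pad up to m bits with 0
    (zero extension) or with the top bit (sign extension)."""
    n = len(bits)
    fill = bits[n - 1] if signed else False
    return [bits[i] if i < n else fill for i in range(m)]


def liftc(bits):
    """Rule liftc: the condition is the disjunction of all bits."""
    result = False
    for b in bits:
        result = ("or", result, b)
    return result
```

For example, `liftc(band(unknown("x", 2), const(1, 2)))` builds the condition "x band 1 is non-zero" as a formula over the fresh variables `x0` and `x1`; `evaluate` then reduces it to a concrete truth value once an assignment is supplied.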
Objects in the scalar language are n-bit signed or unsigned integers, where n and the signedness are determined by the type $\tau$. As shown at the bottom of Figure 1, a separate boolean expression models each bit of an integer and thus tracking the width is important for our encoding. The signed/unsigned distinction is needed to precisely model low-level type casts, bit shift operations, and arithmetic operations.

The class of objects (Obj) ultimately includes variables, pointers, and structures, which encompass all the entities that can be the target of an assignment. For the moment we describe only integer variables.

The encoding for a representative selection of constructs is shown in Figure 2; omitted cases introduce no new ideas. The rules for expressions have the form

\[ \psi \vdash e \stackrel{E}{\Rightarrow} \beta \]

which means that under the environment $\psi$ mapping variables to vectors of boolean expressions (one for each bit in the variable's type), the expression $e$ is encoded as a vector of boolean expressions $\beta$. The encoding scheme for conditionals $\psi \vdash c \stackrel{C}{\Rightarrow} b$ is similar, except the target is a single boolean expression $b$ modeling the condition.

The most interesting rules are for statements:

\[ G, \psi \vdash s \stackrel{S}{\Rightarrow} \langle G';\ \psi' \rangle \]

means that under guard $G$ and variable environment $\psi$ the statement $s$ results in a new guard/environment pair $\langle G';\ \psi' \rangle$. In our system, guards express path sensitivity; every statement is guarded by a boolean expression expressing the conditions under which that statement may execute. Most statements do not affect guards (the exception is assume); the important operations on guards are discussed in Section 2.2. Without going into details, we explain the conceptual meaning of a guard using the following example:

    if (c) { s1; s2 } else s3; s4;

Statements s1 and s2 are executed if c is true, so the guard for both statements is the boolean encoding of c. Similarly, s3's guard is the encoding of $\neg c$. Statement s4 is reached from both branches of the if statement and therefore its guard is the disjunction of the guards from the two branches: $(c \vee \neg c) = \mathsf{true}$.

A key statement in our language is assert, which we use to express points at which satisfiability queries must be checked. A statement assert(c) checks that $\neg c$ cannot be true at that program point by computing the satisfiability of $G \wedge \neg b$, where $G$ is the guard of the assert and $b$ is the encoding of the condition c.

The overall effect of the encoding is to perform symbolic execution, cast in terms of boolean expressions. Each statement transforms an environment into a new environment (and guard) that captures the effect of the statement. If all bits in the initial environment $\psi_0$ are concrete 0's and 1's and there are no unknown expressions in the program being analyzed, then in fact this encoding is straightforward interpretation and all modeled bits can themselves be reduced to 0's and 1's. However, bits may also be boolean variables (unknowns). Thus each bit $b$ represented in our encoding may be an arbitrary boolean expression over such variables.
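The assert check described above, which asks whether the guard conjoined with the negated condition is satisfiable, can be sketched as follows. The sketch makes several assumptions for illustration: formulas are plain Python functions over an assignment, exhaustive enumeration stands in for the SAT solver, and every name below (`satisfiable`, `assert_ok`, the example guards) is hypothetical.

```python
from itertools import product


def satisfiable(formula, variables):
    """Brute-force stand-in for a SAT solver: try every assignment.
    Exponential in len(variables), so suitable for tiny examples only."""
    for values in product([False, True], repeat=len(variables)):
        if formula(dict(zip(variables, values))):
            return True
    return False


def assert_ok(guard, cond, variables):
    """assert(c) passes when G and (not b) has no satisfying assignment."""
    return not satisfiable(lambda env: guard(env) and not cond(env), variables)


# x is a 2-bit unknown with bits x1 x0.
BITS = ["x0", "x1"]


def nonzero(env):
    """Encoding of the condition x != 0: some bit of x is set."""
    return env["x0"] or env["x1"]


def after_assume(env):
    """Guard after assume((x band 1) != 0): bit x0 must be set."""
    return env["x0"]


def at_entry(env):
    """Guard of the entry statement, which is simply true."""
    return True
```

Here `assert_ok(after_assume, nonzero, BITS)` holds, because no assignment sets `x0` while leaving x equal to zero, whereas `assert_ok(at_entry, nonzero, BITS)` fails with x = 0 as the counterexample. The guard of s4 in the if example can be checked the same way: the negation of (c or not c) is unsatisfiable, so that guard is equivalent to true.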
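2.2 Control Flow

We represent function bodies as control-flow graphs, which we define informally. For the purpose of this section, we assume loop-free programs. Loops are handled in a variety of ways which are described at the end of this section. Each statement $s$ is a node in the control-flow graph, and each edge $(s, s')$ represents an unconditional transfer of control from $s$ to $s'$. If a statement has multiple successors, then execution may be transferred to any successor non-deterministically.

To model the deterministic semantics of conventional programs, we require that if a node has multiple successors, then each successor is an assume statement, and furthermore, that the conditions in those assumes are mutually exclusive and that their disjunction is equivalent to true. Thus a conditional branch with predicate $p$ is modeled by a statement with two successors: one successor assumes $p$ (the true branch) and the other assumes $\neg p$ (the false branch).

The other important issue is assigning a guard and environment to each statement $s$. Assume $s$ has an ordered list of predecessors $\overline{s_i}$.³ The encoding of $s_i$ produces an environment $\psi_i$ and guard $G_i$. The initial guard and environment for $s$ is then a combination of the final guards and environments of its predecessors. The desired guard is simply the disjunction of the predecessor guards; as we may arrive at $s$ from any of the predecessors, $s$ may be executed if any predecessor's guard is true. Note that due to the mutual exclusion assumption for branch conditions, at most one predecessor's guard can be true at a time.

The desired environment is more complex, as we wish to preserve the path-sensitivity of our analysis down to the bit level. Thus, the value of each bit of each variable in the environment for each predecessor $s_i$ of $s$ must include the guard for $s_i$ as well. This motivates the function MergeScalar in Figure 3, which implements a multiplexer circuit that selects the appropriate bits from the input environments ($\psi_i(v)$) based on the predecessor guards ($G_i$). Finally, MergeEnv combines the two components together to define the initial environment and guard for $s$.

\[
\begin{array}{l}
\mathrm{MergeScalar}(v, \overline{(G_i, \psi_i)}) = [b'_m \ldots b'_0]_s \\
\qquad \text{where } [b_{im} \ldots b_{i0}]_s = \psi_i(v) \text{ and } b'_j = \bigvee_i (G_i \wedge b_{ij}) \\[4pt]
\mathrm{MergeEnv}(\overline{(G_i, \psi_i)}) = \langle \bigvee_i G_i;\ \psi \rangle \\
\qquad \text{where } \psi(v) = \mathrm{MergeScalar}(v, \overline{(G_i, \psi_i)})
\end{array}
\]

Fig. 3. Merging control-flow paths.

Preserving path sensitivity for every modeled bit is clearly expensive and it is easy to construct realistic examples where the number of modeled paths is exponential in the size of the control-flow graph. In Section 3.3 we present an optimization that enables us to make this approach work in practice.

Finally, every control-flow graph has a distinguished entry statement with no predecessors. The guard for this initial statement is true. We postpone discussion of the initial environment $\psi_0$ to Section 3.2 where we describe the lazy modeling of the external execution environment.

³ We use the notation $\overline{X_i}$ as a shorthand for a vector of similar entities: $X_1 \ldots X_n$.

As mentioned in Section 1, the two checkers described in this paper treat loops unsoundly. One technique we adopt is to simply unroll a loop a fixed number of times and remove backedges from the control-flow graph. Thus, every function body is represented by an acyclic control-flow graph. Another transformation is called havoc'ing, which we discuss in detail in the context of the memory leak checker (Section 6). While our handling of loops is unsound, we have found it to be useful in practice (see Sections 5.6 and 6.9).

2.3 Structures

The program syntax and the encoding of structures is given in Figure 4. A structure is a data type with named fields, which we represent as a set of (field name, object) pairs. We extend the syntax of types (resp. objects) with sets of types (resp. objects) labeled by field names, and similarly the representation of a struct in C is the representation of the fields also labeled by the field names. The shorthand notation $o.f_i$ selects the object of field $f_i$ from object $o$.

The function RecAssign does the work of structure assignment. As expected, assignment of structures is defined in terms of assignments of its fields. Because structures may themselves be fields of structures, RecAssign is recursively defined.

\[
\begin{array}{llcl}
\textbf{Language} \\
\text{Type} & (\tau) & ::= & \{(f_1, \tau_1), \ldots, (f_n, \tau_n)\} \mid \ldots \\
\text{Obj} & (o) & ::= & \{(f_1, o_1), \ldots, (f_n, o_n)\} \mid \ldots \\
\textbf{Representation} \\
\text{Rep} & (\beta) & ::= & \{(f_1, \beta_1), \ldots, (f_n, \beta_n)\} \mid \ldots
\end{array}
\]

Shorthand

\[
\frac{o = \{(f_1, o_1), \ldots, (f_n, o_n)\}}{o.f_i \stackrel{\mathrm{def}}{=} o_i}\ \textsf{field-access}
\]

Translation

\[
\frac{o = \{(f_1, o_1), \ldots, (f_n, o_n)\} \qquad \psi \vdash o_i \stackrel{E}{\Rightarrow} \beta_i \ \text{ for } i \in 1..n}{\psi \vdash o \stackrel{E}{\Rightarrow} \{(f_1, \beta_1), \ldots, (f_n, \beta_n)\}}\ \textsf{object-str}
\]

\[
\begin{array}{l}
\mathrm{RecAssign}(\psi, v, \beta) = \psi[v \mapsto \beta] \\[4pt]
\mathrm{RecAssign}(\psi, o, \beta) = \psi_n \quad \text{where }
\left\{ \begin{array}{l}
o = \{(f_1, o_1), \ldots, (f_n, o_n)\} \\
\beta = \{(f_1, \beta_1), \ldots, (f_n, \beta_n)\} \\
\psi_0 = \psi \\
\psi_i = \mathrm{RecAssign}(\psi_{i-1}, o_i, \beta_i) \quad (\forall i \in 1..n)
\end{array} \right.
\end{array}
\]

\[
\frac{\psi \vdash e \stackrel{E}{\Rightarrow} \beta \qquad \psi' = \mathrm{RecAssign}(\psi, o, \beta)}{G, \psi \vdash (o \leftarrow e) \stackrel{S}{\Rightarrow} \langle G;\ \psi' \rangle}\ \textsf{assign-struct}
\]

Fig. 4. The translation of structures.

2.4 Pointers

The final and technically most involved construct in our encoding is pointers. The discussion is divided into three parts: in Section 2.4.1, we introduce a concept called Guarded Location Set (GLS) to capture path-sensitive points-to information. We extend the representation with type casts and polymorphic locations in Section 2.4.2 and discuss the rules in detail in Section 2.4.3.

\[
\begin{array}{llcl}
\textbf{Language} \\
\text{Type} & (\tau) & ::= & \tau{*} \mid \mathsf{void}{*} \mid \ldots \\
\text{Obj} & (o) & ::= & p \mid \ldots \\
\text{Deref} & (m) & ::= & ({*}p).f_1 \ldots f_n \quad (n \geq 0) \\
\text{Expr} & (e) & ::= & \mathsf{null} \mid \&o \mid \&m \mid \ldots \\
\text{Stmt} & (s) & ::= & \mathsf{load}(m, o) \mid \mathsf{store}(m, e) \mid \mathsf{newloc}(p) \mid \ldots \\
\textbf{Address} \\
\text{Addr} & (\sigma) & ::= & \hat{1} \mid \hat{2} \mid \ldots \\
\multicolumn{4}{l}{\mathrm{AddrOf} : \text{Obj} \mapsto \text{Addr} \quad \text{(Constraint: no two objects of the same type share the same address)}} \\
\textbf{Representation} \\
\text{Loc} & (l) & ::= & \mathsf{null} \mid o \mid \sigma \\
\text{Rep} & (\beta) & ::= & \{|\,(G_0, l_0), \ldots, (G_k, l_k)\,|\} \mid \ldots
\end{array}
\]

Translation

\[
\frac{\beta = \psi(p)}{\psi \vdash p \stackrel{E}{\Rightarrow} \beta}\ \textsf{pointer}
\qquad
\frac{}{\psi \vdash \&o \stackrel{E}{\Rightarrow} \{|\,(\mathsf{true}, o)\,|\}}\ \textsf{getaddr-obj}
\]

\[
\frac{m = ({*}p).f_1 \ldots f_n \qquad \psi \vdash p \stackrel{E}{\Rightarrow} \{|\,(G_0, \mathsf{null}), \overline{(G_i, o_i)}\,|\} \qquad \beta = \{|\,(G_0, \mathsf{null}), \overline{(G_i, o_i.f_1 \ldots f_n)}\,|\}}{\psi \vdash \&m \stackrel{E}{\Rightarrow} \beta}\ \textsf{getaddr-mem}
\]

\[
\frac{\psi(p) = \{|\,(G_0, \mathsf{null}), \overline{(G_i, l_i)}\,|\}}{\psi \vdash \mathsf{liftc}(p) \stackrel{C}{\Rightarrow} \bigvee_{i \neq 0} G_i}\ \textsf{liftc-pointer}
\]

\[
\frac{l = \begin{cases} o & \text{if } p \text{ is of type } \tau{*} \\ \sigma & \text{if } p \text{ is of type } \mathsf{void}{*} \end{cases} \qquad \beta = \{|\,(\mathsf{true}, l)\,|\} \text{ and } o \text{ or } \sigma \text{ fresh}}{G, \psi \vdash \mathsf{newloc}(p) \stackrel{S}{\Rightarrow} \langle G;\ \psi[p \mapsto \beta] \rangle}\ \textsf{newloc}
\]

\[
\frac{\psi(p) = \{|\,(G_0, \mathsf{null}), \overline{(G_i, \sigma_i)}\,|\} \qquad \text{type of } o_i = \tau \text{ and } \mathrm{AddrOf}(o_i) = \sigma_i}{\psi \vdash (\tau{*})\, p \stackrel{E}{\Rightarrow} \{|\,(G_0, \mathsf{null}), \overline{(G_i, o_i)}\,|\}}\ \textsf{cast-from-void*}
\]

\[
\frac{\psi(p) = \{|\,(G_0, \mathsf{null}), \overline{(G_i, o_i)}\,|\} \qquad \mathrm{AddrOf}(o_i) = \sigma_i}{\psi \vdash (\mathsf{void}{*})\, p \stackrel{E}{\Rightarrow} \{|\,(G_0, \mathsf{null}), \overline{(G_i, \sigma_i)}\,|\}}\ \textsf{cast-to-void*}
\]

\[
\frac{\begin{array}{c}
m = ({*}p).f_1 \ldots f_n \qquad \psi \vdash p \stackrel{E}{\Rightarrow} \{|\,(G_0, \mathsf{null}), (G_1, o_1), \ldots, (G_k, o_k)\,|\} \qquad G' = G \wedge \neg G_0 \\
G' \wedge G_i, \psi \vdash (o_i.f_1 \ldots f_n \leftarrow e) \stackrel{S}{\Rightarrow} \langle G_i;\ \psi_i \rangle \quad (\text{for } i \in 1..k)
\end{array}}{G, \psi \vdash \mathsf{store}(m, e) \stackrel{S}{\Rightarrow} \mathrm{MergeEnv}(\overline{(G_i, \psi_i)})}\ \textsf{store}
\]

Fig. 5. Pointers and guarded location sets.

\[
\begin{array}{l}
\mathrm{AddGuard}(G, \{|\,(G_1, l_1), \ldots, (G_k, l_k)\,|\}) = \{|\,(G \wedge G_1, l_1), \ldots, (G \wedge G_k, l_k)\,|\} \\[4pt]
\mathrm{MergePointer}(p, \overline{(G_i, \psi_i)}) = \bigcup_i \mathrm{AddGuard}(G_i, \psi_i(p)) \\[4pt]
\mathrm{MergeEnv}(\overline{(G_i, \psi_i)}) = \langle \bigvee_i G_i;\ \psi \rangle \quad \text{where }
\left\{ \begin{array}{l}
\psi(v) = \mathrm{MergeScalar}(v, \overline{(G_i, \psi_i)}) \\
\psi(p) = \mathrm{MergePointer}(p, \overline{(G_i, \psi_i)})
\end{array} \right.
\end{array}
\]

Fig. 6. Control-flow merges with pointers.

2.4.1 Guarded Location Sets. Pointers in Saturn are modeled with Guarded Location Sets (GLS). A GLS represents the set of locations a pointer could reference at a particular program point. To maintain path-sensitivity, a boolean guard is associated with each location in the GLS and represents the condition under which the points-to relationship holds. We write a GLS as $\{|\,(G_0, l_0), \ldots, (G_n, l_n)\,|\}$. Special braces ($\{|\ |\}$) distinguish GLSs from other sets. We illustrate GLS with an example, but delay technical discussion until Section 2.4.3.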
    1 if (c) p = &x;    /* p : {| (true, x) |} */
    2 else p = &y;      /* p : {| (true, y) |} */
    3 *p = 3;           /* p : {| (c, x), (¬c, y) |} */

In the true branch, the GLS for p is $\{|\,(\mathsf{true}, x)\,|\}$, meaning p always points to x. Similarly, $\psi(p)$ evaluates to $\{|\,(\mathsf{true}, y)\,|\}$ in the false branch. At the merge point, branch guards are added to the respective GLSs and the representation for p becomes $\{|\,(c, x), (\neg c, y)\,|\}$. Finally, the store at line 3 makes a parallel assignment to x and y under their respective guards (i.e., if (c) x = 3; else y = 3;).

To simplify technical discussion, we assume locations in a GLS occur at most once; redundant entries $(G, l)$ and $(G', l)$ are merged into $(G \vee G', l)$. Also, we assume the first location $l_0$ is always null (we use the false guard for $G_0$ if necessary).

2.4.2 Polymorphic Locations and Type Casts. The GLS representation models pointers to concrete objects with a single known type. However, it is common for heap objects to go through multiple types in C. For example, in the following code,

    1 void *malloc(int size);
    2 p = (int *)malloc(len);
    3 q = (char *)p;
    4 return q;

the memory block allocated at line 2 goes through three different types. These types all have different representations (i.e., different numbers of bits) and thus need to be modeled separately, but the analysis must understand that they refer to the same location. We need to model: 1) the polymorphic pointer type void*, and 2) cast operations to and from void*. Casts between incompatible pointer types (e.g. from int* to char*) can then be modeled via an intermediate cast to void*.

We solve this problem by introducing addresses (Addr), which are symbolic identifiers associated with each unique memory location. We use a mapping $\mathrm{AddrOf} : \text{Obj} \rightarrow \text{Addr}$ to record the addresses of objects. Objects of different types share the same address if they start at the same memory location. In the example above, p and q point to different objects, say $o_1$ of type int and $o_2$ of type char, and $o_1$ and $o_2$ must share the same address (i.e. $\mathrm{AddrOf}(o_1) = \mathrm{AddrOf}(o_2)$). Furthermore, an address may have no associated concrete objects if it is referenced only by a pointer of type void* and never dereferenced at any other types. In other words, the inverse mapping $\mathrm{AddrOf}^{-1}$ may not be defined for some addresses.

Using guarded location sets and addresses, we can now describe the encoding of pointers in detail.

2.4.3 Encoding Rules. Figure 5 defines the language and encoding rules for pointers. Locations in the GLS can be 1) null, 2) a concrete object $o$, or 3) an address $\sigma$ of a polymorphic pointer (void*). We maintain a global mapping AddrOf from objects to their addresses and use it in the cast rules to convert pointers to and from void*.

The rules work as follows. Taking the address of an object (getaddr-{obj,mem}) constructs a GLS with a single entry: the object itself with guard true. The newloc rule creates a fresh object or address depending on the type of the target pointer and binds the GLS containing that location to the target pointer in the environment $\psi$. Notice that Saturn does not have a primitive modeling explicit deallocation. Type casts to void* lift entries in the GLS to their addresses using the AddrOf mapping, and casts from void* find the concrete object of the appropriate type in the AddrOf mapping to replace addresses in the GLS. Finally, the store rule models indirect assignment through a pointer, possibly involving field dereferences, by combining the results for each possible location the pointer could point to. The pointer is assumed to be non-null by adding $\neg G_0$ to the current guard (recall $G_0$ is the guard of null in every GLS). Notice that the store rule requires concrete locations in the GLS as one cannot assign through a pointer of type void*. Loading from a pointer is similar.

2.5 Attributes

Another feature in Saturn is attributes, which are simply annotations associated with non-null Saturn locations (i.e. structs, integer variables, pointers, and addresses). We use the syntax o#attrname to denote the attrname attribute of object o. The definition and encoding of attributes is similar to struct fields except that it does not require predeclaration, and attributes can be added during the analysis as needed. Similar to struct fields, attributes can also be accessed indirectly through pointers. We omit the formal definition and encoding rules because of their similarity to field accesses. Instead, we use an example to illustrate attribute usage in analysis.

    1 (*p)#escaped ← true;
    2 q ← (void*) p;
    3 assert((*q)#escaped == (*p)#escaped);

In the example above, we use the store statement at line 1 to model the fact that the location pointed to by p has escaped. The advantage of using attributes here is that they are attached to addresses and preserved through pointer casts; thus the assertion at line 3 holds.

3. DISCUSSION AND IMPROVEMENTS

In this section, we discuss how our encoding reduces the size of satisfiability queries by achieving a form of program slicing (Section 3.1). We also discuss two improvements to our approach. The first (Section 3.2) concerns how we treat inputs of unknown shape to functions and the second (Section 3.3) is an optimization that greatly reduces the cost of guards.

3.1 Automatic Slicing

Program slicing is a technique to simplify a program by removing the parts that are irrelevant to the property of concern. Slicing is commonly done by computing control and data dependencies and preserving only the statements that the property depends on. We show that our encoding automatically slices a program and only uses clauses that the current SAT query requires.

Consider the following program snippet:

    if (x) y = a; else y = b;
    z = /* complex computation here */;
    if (z) ... else ...;
    assert(y < 5);
The computation of z is irrelevant to the property we are checking (y < 5). The variable y is data dependent on a and b and control dependent on x. Using the encoding rules in Section 2, we see that the encoding of y < 5 only involves the bits in x, a, and b, but not z, because the assign rule accounts for the data dependencies and the merge rule pulls in the control dependency. No extra constraints are included. In large programs, properties of interest often depend on a small portion of the code analyzed, therefore this design helps keep the size of SAT queries under control.

3.2 Lazy Construction of the Environment

A standard problem in modular program analysis systems is the modeling of the external environment. In particular, we need a method to model and track data structures used, but not created, by the code fragment being analyzed.

There is no consensus on the best solution to this problem. To the best of our knowledge, SLAM [Ball and Rajamani 2001] and Blast [Henzinger et al. 2003] require manual construction of the environment. For example, to analyze a module that manipulates a linked list of locks defined elsewhere, these systems likely require a harness that populates an input list with locks. The problem is reduced as the target code-bases (e.g., Windows drivers in the case for SLAM) can often share a carefully crafted harness (e.g., a model for the Windows kernel) [Ball et al. 2004]. Nevertheless, the need to "close" the environment represents a substantial manual effort in the deployment of such systems.

Because we achieve scalability by computing function summaries, we must analyze a function independent of its calling context and still model its arguments. Our solution is similar in spirit to the lazy initialization algorithm described in [Khurshid et al. 2003] and, conceptually, to lazy evaluation in languages such as Haskell. Recall in Section 2, values of variables referenced but not created in the code, i.e., those from the external environment, are defined in the initial evaluation environment $\psi_0$. Saturn lazily constructs $\psi_0$ by calling a special function DefVal, which is supplied by the analysis designer and maps all external objects to a checker-specific estimation of their default values; $\psi_0$ is then defined as DefVal(v) for all v. Operationally, DefVal is applied on demand, when uninitialized objects are first accessed during symbolic evaluation. This allows us to model potentially unbounded data structures in the environment. Besides its role in defining the initial environment $\psi_0$, DefVal is also used to provide an approximation of the return values and side-effects of function calls (Section 5.3).

In our implementation, we model integers from the environment with a vector of unconstrained boolean variables. For pointers, we use the common assumption that distinct pointers from the environment do not alias each other. This can be modeled by a DefVal that returns a fresh location for each previously unseen pointer dereference.⁴ A sound alternative would be to use a separate global alias analysis as part of the definition of $\psi_0$.

⁴ In the implementation, DefVal(p) returns $\{|\,(G, \mathsf{null}), (\neg G, o)\,|\}$, where $G$ is an unconstrained boolean variable, and $o$ is a fresh object of the appropriate type. This allows us to model common data structures like linked lists and trees of arbitrary length or depth. A slightly smarter variant handles doubly linked lists and trees with parent pointers knowing one node in such a data structure.

Note once a pointer is initialized, Saturn
performs an accurate path-sensitive intraprocedural analysis, including respecting alias relationships, on that pointer.

3.3 Using BDDs for Guards

Consider the following code fragment:

    if (c) { ... } else { ... }
    s;

After conversion to a control-flow graph, there are two paths reaching the statement s with guards $c$ and $\neg c$. Thus the guard of s is $c \vee \neg c$. Since guards are attached to every bit of every modeled location at every program point, it is important to avoid growth in the size of guards at every control-flow merge. One way to accomplish this task is to decompile the unrolled control flow graph into structured programs with only if statements, so that we know exactly where branch conditionals cancel. However, this approach requires code duplication in the presence of goto, break, and continue statements commonly found in C.

Our solution is to introduce an intermediate representation of guards using binary decision diagrams [Bryant 1986]. We give each condition (which may be a complex expression) a name and use a BDD to represent the boolean combination of all condition names that enable a program path. At control-flow merges we join the corresponding BDDs. The BDD join operation can simplify the representation of the boolean formula to a canonical form; for example, the join of the BDDs for $c$ and $\neg c$ is represented by true. In our encoding of a statement, we convert the BDD representing the set of conditions at that program point to the appropriate guard.

The simplification of guards also eliminates trivial control dependencies in the automatic slicing scheme described in Section 3.1. In the small example in that section, had we not simplified guards, the assertion would have been checked under the guard $(x \vee \neg x) \wedge (z \vee \neg z)$, which pulls in the otherwise irrelevant computation of z.

4. BUILDING MODULAR PROPERTY CHECKERS UNDER SATURN

The Saturn framework we have described so far can be applied directly to checking simple properties such as assertions. While other program behavior can be encoded and checked under the current scheme, there are two main limitations that prevent it from being applied to complex properties in large systems.

(1) Function calls. Saturn, like many other SAT-based techniques, does not directly model function calls. A common solution among SAT-based assertion checkers is inlining. However, although we employ a number of optimizations in our transformation such as slicing, the exponential time cost of SAT-solving means that inlining will not be practical for large software systems.

(2) Execution environment. Assertion checking commonly requires a closed program. However, many software systems are open programs whose environment is a complex combination of user input and component interdependencies. Modeling the environment for such programs often requires extensive manual effort that is both costly and error prone.

Our solution is based on Saturn's ability to not only check program properties, but also infer them by formulating SAT queries that can be solved efficiently. The latter ability solves the two problems mentioned above.

First, inference enables modular analyses⁵ that scale. With appropriate abstractions, the checker can summarize a function's behavior with respect to a property into a concise summary. This summary, in turn, can be used in lieu of the full function body at the function's call sites, which prevents the exponential growth in the cost of analysis.

Secondly, by making general enough assumptions about the execution environment, summaries capture the behavior of a function under all (or, for error detection purposes, a common subset of) runtime scenarios. This alleviates the requirement of having to close the environment.

An added benefit of the modular approach is that it enables local reasoning during error inspection. Instead of following long error traces which may involve multiple function calls, human-readable function summaries give information about the assumptions made for each of the callees in the current function. Therefore, the user can focus on one function at a time when inspecting error reports. In our experience, we find it much easier to confirm errors and identify false positives with the help of function summaries.

Based on the modular approach, we briefly outline a four step process by which we construct property checkers under Saturn:

(1) First of all, we model the property we intend to check with program constructs in Saturn. For example, finite state machines (FSM) can be encoded by attaching integer state fields to program objects to track their current states. State transitions are subsequently modeled with conditional assignments, and checking is done by ensuring that the error state is not reached at the end of the program, a task easily accomplished with SAT queries on the final program state.

(2) The next step is to design the function summary representation. A good summary is one that is both concise for scalability and expressive enough to adequately describe the relevant properties of function behavior. Striking the right balance often takes several iterations of design. For example, in designing the FSM checking framework, we started with a simple summary that records the set of feasible state transitions across the function, but found it to be inadequate for Linux lock checking because of interprocedural data dependencies. We observed that the output lock state often correlates with the return value of the function and remedied the situation by simply including the return value in our summary design.

(3) The third step is to design an algorithm that infers and applies function summaries. As mentioned above, inference is done by automatically inserting SAT queries at appropriate program points. For example, we can infer the set of possible state transitions by querying, at the end of each function, the satisfiability of all possible combinations of input and output states. The feasible (i.e., satisfiable) subset is included in the function summary.⁶

(4) Finally, we run the checker on a number of real-world applications, and inspect the analysis results. During early design iterations, the results often point to inaccuracies in the property encoding (Step 1), inadequacies in the summary design (Step 2), or inefficiencies in the inference algorithm (Step 3). We use that as feedback to improve the checker in the next iteration.

⁵ Here, modular analysis is defined in two senses: 1) the ability to infer and check open program modules independent of their usage; and 2) the ability to summarize results of analyzed modules so as to avoid redundant analysis.

Following the four step process, we designed and implemented two property checkers for large open source software: a Linux lock checker and a memory leak checker. We present the details of the construction and experiments in the following two sections.

5. CASE STUDY I: CHECKING FINITE STATE PROPERTIES

Finite state properties are a class of specifications that can be described as certain program values passing through a finite set of states, over time, under specific conditions. Locking, where a lock can legally only go from the unlocked state to the locked state and then back to the unlocked state, is a canonical example. These properties are also referred to as temporal safety properties.

In this section, we focus on finite state properties, and describe a summary-based interprocedural analysis that uses the Saturn framework to automatically check such properties. We start by defining a common namespace for shared objects between the caller and the callee (Section 5.1), which we use to define a general summary representation for finite state properties (Section 5.2). We then describe algorithms for applying (Section 5.3) and inferring (Section 5.4) function summaries in the Saturn framework. We describe our implementation of an interprocedural lock checker (Section 5.5) and end with experimental results (Section 5.6).

5.1 Interface Objects

In C, the two sides of a function invocation share the global namespace but have separate local namespaces. Thus we need a common namespace for objects referred to in the summary. Barring external channels and unsafe memory accesses, the two parties share values through global variables, parameters, and the function's result. Therefore, shared objects can be named using a path from one of these three roots. We formalize this idea using interface objects (IObj) as common
names for objects shared between caller and callee:

\[ \mathit{IObj}\ (l) ::= \mathit{param}_i \mid \mathit{global\_var} \mid \mathit{ret\_val} \mid {*l} \mid l.f \]

Dependencies across function calls are expressed by interface expressions (IExpr) and conditions (ICond), which are defined respectively by replacing references to objects with interface objects in the definition of Expr and Cond (as defined in Figure 1 and extended in Figure 5).

To perform interprocedural analysis of a function, we must map input interface objects to the names used in the function body, perform symbolic evaluation of the function, and map the final function state to the final state of the interface objects. Thus, we need two mappings to convert between interface objects and those in the native namespace of a function:

\[ [\![\cdot]\!]_{\mathit{args}} : \mathit{IObj} \rightarrow \mathit{Obj}_{\mathit{ext}} \quad \text{and} \quad [\![\cdot]\!]^{-1}_{\mathit{args}} : \mathit{Obj} \rightarrow \mathit{IObj} \]

⁶ This is a simplification of the actual summary inference algorithm, which takes into account function side-effects and return-value state-transition correlations. We describe the full algorithm in Section 5.4.

FSM States: $S = \{\mathit{Error}, s_1, \ldots, s_n\}$
Summaries $= \langle P_{in}, P_{out}, M, R \rangle$, where
$P_{in} = \{p_1, \ldots, p_n\}$, $p_i \in \mathit{ICond}$;
$P_{out} = \{q_1, \ldots, q_n\}$, $q_i \in \mathit{ICond}$;
$M \subseteq \mathit{IObj}$, and
$R \subseteq \mathit{IObj} \times 2^{|P_{in}|} \times S \times 2^{|P_{out}|} \times S$

Fig. 7. Function summary representation.

Converting IObj's to native objects is straightforward. For function call $r = f(a_0, \ldots, a_n)$,

\begin{align*}
[\![\mathit{global}]\!]_{a_0 \ldots a_n} &= \mathit{global} \\
[\![\mathit{param}_i]\!]_{a_0 \ldots a_n} &= a_i \\
[\![\mathit{ret\_val}]\!]_{a_0 \ldots a_n} &= r \\
[\![{*l}]\!]_{a_0 \ldots a_n} &= {*}([\![l]\!]_{a_0 \ldots a_n}) \\
[\![l.f]\!]_{a_0 \ldots a_n} &= ([\![l]\!]_{a_0 \ldots a_n}).f
\end{align*}

Note that the result of the conversion is in $\mathit{Obj}_{\mathit{ext}}$, which is defined as Obj (Section 2) extended with pointer dereferences. The extra dereference operations can be transformed away by introducing temporary variables and explicit load/store operations. We omit the details of this transformation for brevity.

The inverse conversion is more involved, since there may be multiple aliases of the same object in the program. We incrementally construct the $[\![\cdot]\!]^{-1}_{\mathit{args}}$ mapping for objects accessed through global variables and parameters. For example, in void f(struct str *p) { spin_lock(&p