Improved Error Reporting for Software that Uses Black-Box Components

Jungwoo Ha, Christopher J. Rossbach, Jason V. Davis, Indrajit Roy, Hany E. Ramadan, Donald E. Porter, David L. Chen, Emmett Witchel
Department of Computer Sciences, The University of Texas at Austin
{habals, rossbach, jdavis, indrajit, ramadan, porterde, dlcc, witchel}@cs.utexas.edu

Abstract

An error occurs when software cannot complete a requested action as a result of some problem with its input, configuration, or environment. A high-quality error report allows a user to understand and correct the problem. Unfortunately, the quality of error reports has been decreasing as software becomes more complex and layered. End-users take the cryptic error messages given to them by programs and struggle to fix their problems using search engines and support web sites. Developers cannot improve their error messages when they receive an ambiguous or otherwise insufficient error indicator from a black-box software component.

We introduce Clarify, a system that improves error reporting by classifying application behavior. Clarify uses minimally invasive monitoring to generate a behavior profile, which is a summary of the program's execution history. A machine learning classifier uses the behavior profile to classify the application's behavior, thereby enabling a more precise error report than the output of the application itself.

We evaluate a prototype Clarify system on ambiguous error messages generated by large, modern applications like gcc, LaTeX, and the Linux kernel. For a performance cost of less than 1% on user applications and 4.7% on the Linux kernel, the prototype correctly disambiguates at least 85% of application behaviors that result in ambiguous error reports. This accuracy does not degrade significantly with more behaviors: a Clarify classifier for 81 LaTeX error messages is at most 2.5% less accurate than a classifier for 27 LaTeX error messages. Finally, we show that without any human effort to build a classifier, Clarify can provide nearest-neighbor software support, where users who experience a problem are told about 5 other users who might have had the same problem. On average 2.3 of the 5 users that Clarify identifies have experienced the same problem.

Categories and Subject Descriptors D.2.7 [Distribution, Maintenance, and Enhancement]: Enhancement

General Terms Documentation, Management, Reliability

Keywords Software support, Error report, Profiling, Classification, Machine learning

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright © ACM, (2007). This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version will appear in the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI'07). June 11–13, 2007, San Diego, California, USA.

1. Introduction

Bad error reporting is more than an inconvenience for most users. A large part of modern software support cost comes from time wasted with bad error messages, which we define as any message that does not provide sufficient information for a user to fix the problem in a timely fashion. One recent study concluded that up to 25 percent of a system administrator's time may be spent following blind alleys suggested by poorly constructed and unclear messages [6]. The time and expertise required to administer modern computing systems is causing the cost of administrating, configuring and updating a machine to surpass the cost of the hardware [22]. Improving error reporting will keep down computer ownership costs and improve end-user satisfaction.

An error or error behavior is any program behavior that is not a successful completion of a task specified by a user. Errors include bugs, which are program behaviors that do not match a program's specification. It is also an error when a program fails a consistency check on its inputs, possibly because the user entered bad input, or misconfigured the system. Errors cause programs to produce error reports, which are usually text messages or dialog boxes that inform the user that the requested action will not complete. Crashes and hangs cause the program to output the null error message. The user must interpret an error report to figure out how to get the program to complete her request, often resorting to search engines and support web sites (like support.microsoft.com) for more information.

Consider the following model of error reporting. A given application has a set E of errors, and a set R of error reports. Unfortunately, one element r ∈ R can correspond to multiple elements e ∈ E because an error report is often ambiguous across multiple causes. For example, the Linux operating system uses the return code EEXIST to signal diverse error conditions, such as an attempt to create a file whose name already exists in a directory, or an attempt to put a rule in a routing table that conflicts with the routing table's current state. Define S as a set of vectors of runtime statistics about an application. Then the tuple (r, s) | r ∈ R, s ∈ S could uniquely determine the proper e ∈ E, even though r alone fails. In fact, r might not be needed at all, s alone might suffice.

We introduce Clarify, a system to improve error reporting. Clarify consists of two parts: a runtime to monitor a black-box software component, and a classifier to interpret the output of the runtime. Clarify monitors the program using minimally invasive techniques like reading the program's memory or counting function calls. The Clarify runtime outputs a behavior profile (the s ∈ S). Clarify's users collect the behavior profiles generated when the program experiences a particular error, and train a machine learning classifier that recognizes the application's error behaviors.
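
The model of error reporting with the sets E, R, and S can be made concrete with a toy lookup. The sketch below is only an illustration under assumed names: the `diagnose` helper, the `routing_table_calls` feature, and the hand-written rule are hypothetical stand-ins and are not part of Clarify, which trains a classifier instead. It shows how the report r = EEXIST alone is ambiguous, while the pair (r, s) selects a single error e.

```python
# Toy illustration of the E / R / S model; all names here are hypothetical.

# One error report r can correspond to several underlying errors e.
ERRORS_BY_REPORT = {
    "EEXIST": {"file name already exists", "routing rule conflicts with table"},
}


def diagnose(report, profile):
    """Pick an error e from the pair (r, s).

    `profile` stands in for s: a mapping from a runtime statistic to its count.
    The rule below is a hand-written stand-in for a trained classifier.
    """
    candidates = ERRORS_BY_REPORT[report]
    if len(candidates) == 1:
        return next(iter(candidates))
    # Assumed feature: how often routing-table code ran during the execution.
    if profile.get("routing_table_calls", 0) > 0:
        return "routing rule conflicts with table"
    return "file name already exists"
```

Two executions that both end in EEXIST are told apart only by their profiles, which is the role the behavior profile plays in the model.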
These users also write an improved error report that describes the error behavior and how to fix or work around it (the e ∈ E). Classifier training is done by a small minority of technically-savvy Clarify users such as support engineers who reproduce user problems in-house. End-users get the improved error reports by classifying their behavior profile. Clarify reduces the problem of improving error reports to the problem of classifying error behaviors.

Figure 1. Clarify consists of a runtime monitor and a machine-learning classifier. The rectangles represent processes consuming and producing data. The Clarify runtime monitors a black-box component to generate a behavior profile that summarizes the execution history of the component. Section (A) shows a machine learning classifier, trained offline from behavior profiles. Section (B) shows the trained classifier classifying behavior profiles to produce improved error reports.

Figure 1 shows the major components of Clarify: the runtime and the machine-learning classifier. Each time the black-box component executes, the Clarify runtime generates a behavior profile. The behavior profile includes information about the control flow or data values of the program execution. Simple examples of a behavior profile would include counts of each function execution, or counts of how often each function returned zero.

Training the machine learning classifier happens in section (A) of Figure 1. A machine learning algorithm takes labeled behavior profiles as input and produces a classifier. The classifier takes a behavior profile as input and outputs a label. A label could be a non-ambiguous error code, or a lengthy description of the problem and how to resolve it. The classifier improves error reports because users can train the classifier to recognize very specific errors that have a generic error report. In settings where labeled data is not available, Clarify employs a nearest-neighbor software support method. Here, users are paired with others who have experienced similar errors.

Non-technical end-users get improved error reports from Clarify in section (B) of Figure 1. Clarify classifies an end-user's behavior profile, giving them more precise information about their error and how to resolve it. The machine learning classifier uses features from the behavior profile to determine the error classification. A feature is the value of a particular statistic, like the number of times the function decode_audio_frame was called in an execution of an mp3 player application. A value of zero can indicate an error where no audio frames were ever played.

The contributions of this paper are:

- A system that combines runtime monitoring and machine learning in a novel way to improve error reports of black-box software components.
- A new profiling technique called call-tree profiling, that represents software behaviors more accurately, on average, than existing profiling techniques such as function profiling, or path profiling.
- Evaluation of a Clarify prototype on large, mature programs that currently produce unclear error messages and confusing error behavior, such as the gcc compiler, and the Linux operating system. Our evaluation includes an in-lab deployment of Clarify.
- Introduction of nearest-neighbor software support, where users are paired with other users who have experienced the same problem.

The next section provides an example use of Clarify that motivates the design presented in Sections 3–5. Section 6 describes our benchmarks and the ambiguous errors they report and Section 7 contains the evaluation of the Clarify prototype. Section 8 reviews related work and Section 9 concludes.

2. Improving error reports with Clarify: an example

This section provides an example to elucidate the motivation for Clarify and the benefit to its users. The example also provides motivation for the sections that discuss Clarify's design (Sections 3–5).

2.1 Clarify scenario

mpg321 is a popular command-line mp3 player written by Joe Drew that is included in many Linux distributions. Software support options for mpg321 are limited. There is a support forum on SourceForge, and a mailing list for notification of new releases. Additionally, users are invited to email Joe Drew directly. The support forum has many requests for help with zero replies. Recent requests include some with titles, “Problem playing mp3's” and “no sound.” mpg321 tends to fail without printing diagnostic messages.

Imagine two users, SmartyP and Grandpa, who will use Clarify to improve the error reports of mpg321. We will assume that SmartyP posted the message about problems playing mp3s and that Grandpa posted the no sound problem (Note, user names have been changed, but posting subject lines have not). SmartyP has figured out his problem, and wants to donate his solution to the mpg321 support community using Clarify. SmartyP found that his mp3 audio data was corrupt, which does indeed cause mpg321 to run without producing audible output. Such a problem could occur if SmartyP were storing his files on a flash drive that was failing. Clarify enables Grandpa, a non-technical user, to benefit from the diagnosis of a technical user, SmartyP.

Step 1: The Clarify runtime. We assume that SmartyP and Grandpa have Clarify-enabled versions of mpg321, i.e., the binaries are already linked with the Clarify runtime. Modifying an application to make it produce a behavior profile does not require source code, so it is reasonable to assume that Clarify-enabled binaries can be distributed along standard software distribution channels.

Step 2: Collect behavior profiles. With the Clarify runtime, SmartyP runs mpg321 on a few corrupt mp3 files. The interface is simple: when the software fails it queries the user about what went wrong. SmartyP can enter, “mpg321 produces no sound output due to corrupted audio frame data in the source mp3 files. Check your mp3 files because their contents are probably corrupt.” The Clarify-enabled binary then uploads the problem description and behavior profiles generated by the executions that fail to the SourceForge support site.

Step 3: Train a classifier. A moderator for the SourceForge support site would read SmartyP's error description and group SmartyP's behavior profiles with the profiles of other users who experienced the same problem. Having a human in the loop ensures that the language in the error report is clear and understandable and
guards against malicious or inept users. If less central authority is desired, a peer reputation system can replace a human moderator. The support site software (or moderator) will build a new classifier from the behavior profiles submitted by users. Users only need upload their profiles, they do not build classifiers. There are many policies for managing the classifier build such as doing it for every new error report, or building it once a day.

Step 4: Use the classifier. Instead of posting “no sound” to the SourceForge support forum, Grandpa runs his Clarify-enabled binary. When Grandpa fails to hear any sound from mpg321, he hits a special help key which uploads his behavior profile to the SourceForge support web site. The site classifies Grandpa's behavior profile and provides him with SmartyP's detailed error description, telling him that his mp3 file has corrupt data.

2.2 Discussion

By classifying program behavior, Clarify enables a user community to improve software support. It also enables software vendors to improve software support. Microsoft has built distributed labeling of problem reports into Windows Vista. In the documentation for the new “Problem reports and solutions” control panel item, Microsoft says it can ask end-users to provide additional details about their problem to create a solution that can be provided to other users [18].

The Clarify scenario presented has most computation occurring on the server, but classification can happen on a client, if the client has the latest classifier definition from the server. Clients might periodically connect to the server to download classifier updates, like modern virus checkers update virus definition files. Once the information is cached locally, a client can diagnose errors without connecting to the network.

SmartyP treats mpg321 like a black box. He does not change the error reports generated by the source code. Maybe such changes would be accepted by Joe Drew in a timely fashion, but maybe not. Programs developed by more people or commercial organizations are difficult or impossible for an end-user to change.

The example motivates the following questions, which we address in succeeding sections.

Feature collection (Section 3). What information does the Clarify runtime collect? This section describes alternatives for the contents of behavior profiles.

Deployment and security (Section 4). Can
Grandpa run a minimally instrumented executable that is fast enough for daily use, but produces behavior profiles of sufficient detail to disambiguate currently known errors? Does SmartyP's contribution to the support site mean that Grandpa can figure out what kind of mp3s SmartyP listens to?

Minimizing human effort (Section 5). Clarify can give SmartyP's email address to Grandpa, even before SmartyP contributed his labeled examples (or even figured out what his problem is), because it can detect similarity between user executions even without a trained machine learning classifier. SmartyP only uploads a few behavior profiles, because he assumes that others will also upload profiles. A classifier trained on diverse examples usually generalizes better than one trained on homogeneous examples. Section 7.4 measures how many labeled profiles are necessary to train an accurate classifier (in our experiments, mpg321 requires 38 profiles per error type).

3. Behavior profiles

The Clarify runtime should collect the most expressive runtime features at the lowest cost. Expressive features are those that a machine learning algorithm can use to discriminate different error behaviors robustly. Intuitively, expressive features capture details of control flow or important data values that are caused by a particular error behavior. For instance, an incorrectly formatted URL passed to a web browser can be correlated with the execution of functions that attempt every possible interpretation of the input URL before declaring the error.

| Behavior profile | Key | Value |
|---|---|---|
| FP | ⟨function addr⟩ | # of times called |
| CSP | ⟨call-site addr⟩ | # of times invoked |
| PP | ⟨path in a func⟩ | # of times occurred |
| CTP | ⟨bit vector of caller, bit vector of callee⟩ | # of times executed |
| CSRV | ⟨call-site addr, predicated return value⟩ | # of times executed |
| SS | ⟨predicate⟩ | # of counts |

Table 1. Summary of the type of feature that is collected by the Clarify runtime.

Programs often have error-reporting routines, so one might think that the execution of such routines is a sure-fire indication of an error behavior. However, highly mature and factored programs, like gcc, reuse error-reporting code for other purposes, such as producing warnings during correct compilation. In every non-trivial program we have examined, simple correlations between an error condition and the execution of a given function or the presence of a given return code do not hold.

Clarify collects feature counts from black-box components using code instrumentation that does not require source code. Recent binary-to-binary translators like Traceback [3] (static) and Dynamo(RIO) [4, 11] (dynamic), and fine-grained instrumentation systems like the OS-level instrumentation tool KernInst [37], Sun's DTrace [13], or Linux's kprobes [24] (all dynamic) provide the opportunity to insert a small amount of instrumentation code to user-level applications or the operating system with very low execution-time cost.

Clarify must limit the number of features it collects. Error behavior is usually correlated with a small number of features, so collecting large numbers of features requires the machine learning algorithm to winnow the large set of features down to the relevant few. Having more than about 70,000 features pushes the limits of many machine learning algorithms, often causing address space exhaustion and unreasonable run times. This section discusses Clarify's strategy for collecting information about control flow and data values.

3.1 Control flow

Clarify counts features that are related to control flow because control flow is a good indicator of program behavior. In general, the more information Clarify collects about control flow, the more accurate its model of program behavior, but this accuracy comes at the price of greater CPU and memory overhead.

One form of behavior profiling counts the execution of function call sites. Another counts intra-procedural paths using path profiling [5]. Paths encode more information about control
flow, but they are more expensive to collect than function counts. Clarify also introduces a new profiling method called call-tree profiling that summarizes the calling behavior of a function and its caller. The calling behavior contains some of the intraprocedural control flow that program paths represent, but it is less computationally intensive to gather.

Clarify uses counts because counts preserve rare events. Often a program will make a unique sequence of function calls before outputting a cryptic error report or crashing. Clarify uses those unique calls as the signature of the behavior. Some systems use event probabilities [8], which penalize the importance of rare code paths, especially for programs that run for long periods of time.

Figure 2. An example of call-tree profiling. The left side of the diagram is the rightmost subtree of the dynamic call tree with arrows pointing in the direction of function calls. The right side is the CTP feature that is collected when function C returns. The CTP feature is a combination of call sequences of function C and its caller A.

We evaluate a number of approaches to behavior profiles that have different tradeoffs for performance overhead and level of execution detail.

3.1.1 Function and call-site profiling

The first method uses function profiling (FP) (sometimes called function call profiling [31]). Each function has a counter that is incremented when the function is executed. The order in which the functions are called is not retained. Function profiling is efficient and tends to be accurate when each behavior has a set of unique functions associated with it.

The second method is call-site profiling (CSP). This is similar to FP but the counter is associated with each call site, rather than with the call target. For direct calls, CSP differentiates among call sites, while FP does not.

3.1.2 Path profiling

The third control-flow based behavior profiling method is path profiling (PP) as described by Ball and Larus [5]. Each program path within a procedure (unique sequence of basic blocks) has a counter that is incremented when the path is executed. Path profiling distinguishes amongst program behaviors that result in different control flow within a function (intra-procedural control flow), something that function profiling cannot do.

3.1.3 Call-tree profiling

The fourth control-flow based profiling technique is call-tree profiling (CTP).
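
As a rough preview of what CTP counts, the sketch below replays a trace of call and return events and, at each return, increments a counter keyed by the call pattern of the returning function together with that of its caller. It uses a small example in which A calls B and then C, and C calls D and then E. The trace format and the name `call_tree_profile` are assumptions made for illustration; the sketch builds pattern strings directly, ignores loop back edges, and is not the bit-vector implementation used by Clarify.

```python
from collections import Counter


def call_tree_profile(trace):
    """Count CTP-style patterns from a list of ("call", f) / ("return", f) events."""
    counts = Counter()
    stack = []  # each frame is [function name, names of callees so far]
    for event, name in trace:
        if event == "call":
            if stack:
                stack[-1][1].append(name)  # record the call in the caller's frame
            stack.append([name, []])
        elif event == "return":
            func, callees = stack.pop()
            assert func == name, "trace must be properly nested"
            if stack:  # the outermost function has no caller to pair with
                caller, caller_callees = stack[-1]
                pattern = "({}({})),({}({}))".format(
                    func, "".join(callees), caller, "".join(caller_callees))
                counts[pattern] += 1
    return counts


# Trace for the example: A calls B, then C; C calls D, then E.
TRACE = [
    ("call", "A"), ("call", "B"), ("return", "B"),
    ("call", "C"), ("call", "D"), ("return", "D"),
    ("call", "E"), ("return", "E"), ("return", "C"),
    ("return", "A"),
]
PROFILE = call_tree_profile(TRACE)
```

For this trace the return of C produces the pattern (C(DE)),(A(BC)), and each of B, D, and E contributes one pattern of its own.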
Since each function in a program is one processing step, the dynamic call tree is a good representation of the program behavior. However, the size of the whole dynamic call tree is enormous, so it is impractical to use it for the classification directly. CTP counts the number of times a particular calling sequence occurs in the current function and its caller. It counts the sequence at every function return or loop back edge. CTP is an approximation to the subtree of depth 2 in the call tree.

Figure 2 shows an example of a dynamic call tree and the CTP pattern that is counted. Each arrow indicates the call direction and the left sibling is called before the right sibling. The tree shown is where A calls B (B may call other functions but that behavior is ignored by CTP) and then calls C, and C calls D and then E. When function C returns the call pattern for C is (C(DE)), and the call pattern for C's caller A is (A(BC)). Therefore, CTP increments by one a counter for the entire pattern of C and its caller, “(C(DE)),(A(BC)).”

To implement CTP efficiently, each function gets a CTP bit vector, where each bit corresponds to a call site in the function. To reduce the number of bits used, a bit is assigned only once per basic block because calls in a basic block happen in the same order. Some bits are shared for basic blocks that cannot be called together in a single path. When a function returns, CTP increments a counter for the concatenation of the function's and its caller's bit vector. CTP also increments the counter on loop back edges, clearing the current function's bit vector. In this way, CTP bit vectors remain compact.

Path profiling is able to preserve more fine-grained information about paths within the function than CTP's bit vector, but CTP preserves more information about calling context by concatenating the caller's bit vector. Because the order of the function calls can be decoded offline with the control-flow graph and the bit vector, CTP is distinct from calling context trees [2, 42] which are lossy with respect to calling sequence. Experimental results in section 7 show that CTP supports high classification accuracy.

3.2 Data

Data values can provide robust characterization of error behavior, though a naïve implementation can greatly increase the number of features thereby canceling any benefit. For instance, to associate return values with their call sites, Clarify can count call site, return value pairs. A function that has 100 distinct return values will increase the number of features by 100. Such encodings increase the complexity of the classification task considerably as machine learning algorithms have performance and accuracy problems when confronted with large numbers of features.

Predication is a standard technique to reduce the feature space of data values [27]. We define nine predicates which are applied to Clarify data and return values; the predicates map raw values to feature values. The predicates indicate whether the raw value is equal to zero, equal to 1 or -1, is a small or large positive or negative integer, or is a pointer to the stack or heap. The thresholds for small and large positive and negative integers are arbitrary: any value with absolute value less than 100 is small, any value with absolute value greater than 100 that is neither a stack nor a heap pointer is “large”.

3.2.1 Call-site profiling with predicated return values

Call-site profiling with predicated return values (CSRV) counts pairs of call sites and predicated return values. If call-site A returns 255 one hundred times and returns -1 once, then the feature ⟨A, large int⟩ has a count of 100 and the feature ⟨A, equals -1⟩ has a count of 1.

3.2.2 Stack scraping

Stack scraping (SS) is a behavior profile that relies only on the dynamic data values from an execution instance, rather than on control flow. The insight behind stack scraping is that the stack contains control flow history in the form of return addresses (some of them residual in memory below the current stack pointer) and status information like function return codes.

At the moment the program returns an error code, its execution is paused, the range of memory allocated to the program stack is traversed, and a feature vector representing that instance of execution is created by applying predication to each word in the stack range. The representation trades some fidelity for convenience and compactness, compared to instrumentation-based control flow histories. The scraper obtains the stack and heap bounds dynamically (from /proc/pid/maps on Linux) so it can differentiate pointers to the stack and pointers to the heap. Stack scraping is unique in Clarify feature sources in that it does not require instrumentation of the source program. It imposes very little runtime overhead, but it is also the least accurate feature source.
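
The predication of Section 3.2 and the CSRV counts of Section 3.2.1 can be sketched in a few lines. This is an illustration, not Clarify's implementation: the predicate names, the handling of values whose absolute value is exactly 100 (treated as large here), and the way stack and heap address ranges are passed in are all assumptions. The paper's example abbreviates the predicate for 255 as “large int”; the sketch calls it "large positive".

```python
from collections import Counter

SMALL_LIMIT = 100  # threshold from the text; exactly 100 is treated as large (assumption)


def predicate(value, stack=range(0), heap=range(0)):
    """Map a raw integer to one of nine coarse predicate names (names are assumptions)."""
    if value == 0:
        return "equals 0"
    if value == 1:
        return "equals 1"
    if value == -1:
        return "equals -1"
    if abs(value) < SMALL_LIMIT:
        return "small positive" if value > 0 else "small negative"
    if value in stack:
        return "stack pointer"
    if value in heap:
        return "heap pointer"
    return "large positive" if value > 0 else "large negative"


def csrv_profile(returns):
    """Count (call site, predicated return value) pairs from (site, raw value) events."""
    return Counter((site, predicate(value)) for site, value in returns)


# Call site A returns 255 one hundred times and -1 once.
EVENTS = [("A", 255)] * 100 + [("A", -1)]
PROFILE = csrv_profile(EVENTS)
```

With these events the profile holds two features for call site A, with counts 100 and 1, instead of one feature per distinct raw return value.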
4. Deployment issues

This section discusses the different deployment scenarios for Clarify, and addresses security issues of a Clarify deployment.

4.1 Forensic vs. live deployments

Clarify can be deployed in two ways: to improve any error report a program can give (live deployment), or to improve a fixed set of error reports (forensic deployment). A live deployment will instrument an entire executable, sacrificing some performance to collect data about the entire application's behavior. A forensic deployment only collects data that is known to help disambiguate a fixed set of error reports.

4.2 Security

Clarify improves software support, but raises security issues for users and software vendors. Users would like to keep their data and the way they use software private. Vendors do not want to divulge information about the structure, control flow, and support history of their product to users or competitors.

Current software support systems suffer from this problem. In Windows Vista, there is a new control panel item called, “Problem reports and solutions,” [17] which is a refinement of the current Windows support dialog. When a program malfunctions, it can send a partial memory dump to Microsoft and Microsoft can send the user a better error report. However the dump sent to Microsoft can contain arbitrarily sensitive data (e.g., passwords, credit card information, etc.). Microsoft's privacy statement currently discourages users who are concerned about privacy from using their service [16].

Clarify decision trees can be evaluated on Clarify behavior profiles in such a way that the end-user learns only the information related to his error (and nothing about the software support history or control flow of the application). The software vendor who provides the decision tree learns nothing about the user's execution. The security details are in a separate paper [9], but the system allows trees with 255 nodes and 1,000 attributes to be securely evaluated for 28 seconds of online computation and 4.5 MB of bandwidth for the vendor, and 48 seconds of online computation time and 1.5 MB of bandwidth for the user.

5. Minimizing human effort

Clarify requires humans to label or generate examples of faulty error reports in order to train a machine learning classifier. Even without a classifier, Clarify should help users.
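
Without a trained classifier, similar behavior profiles can still be matched by nearest-neighbor search, the mode described in Section 5.1. The sketch below is a minimal illustration under stated assumptions: profiles are sparse feature-count dictionaries, similarity is L1 distance (a choice made here for simplicity, not necessarily the metric Clarify uses), and the names `distance` and `nearest_users` are hypothetical. It returns the contacts of the k closest previously submitted profiles, with k = 5 as in the text.

```python
import heapq


def distance(p, q):
    """L1 distance between two sparse profiles (feature -> count); the metric is an assumption."""
    return sum(abs(p.get(f, 0) - q.get(f, 0)) for f in set(p) | set(q))


def nearest_users(query, submitted, k=5):
    """Return the contacts of the k submitted profiles closest to `query`.

    `submitted` is a list of (contact, profile) pairs from users who opted in.
    """
    closest = heapq.nsmallest(k, submitted, key=lambda item: distance(query, item[1]))
    return [contact for contact, _ in closest]
```

A support site following this scheme would store one entry per opted-in user and call `nearest_users` on each uploaded profile; limits on how often a contact is handed out would be layered on top.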
We describe nearest-neighbor software support, an execution mode Clarify uses when it has no classifier. The section then describes distributing the work of labeling profiles among a software support community.

5.1 Nearest neighbor software support

Clarify needs a certain number of labeled examples to build an accurate machine learning classifier (exact numbers are problem-dependent and quantified in Section 7.4). Before it has trained a classifier, Clarify uses nearest-neighbor search to match similar behavior profiles. For instance, users of mpg321 can give their email addresses to a support web site. If a user has a problem that she does not understand, she sends her behavior profile to the site which runs Clarify. The site returns the emails of 5 other users who opted into the system and who likely experienced the same application behavior. (The system might give out a particular email address only 3 times and take other steps to make sure participants are not overwhelmed with email or put on spam lists.) As the results in Section 7.6 show, nearest-neighbor search is sometimes highly effective, but it is not as accurate in general as building a classifier.

5.2 Labeling behavior profiles

The Clarify classifier must be built from labeled behavior profiles. There are three ways this labeling can be done.

- Members of a support organization can do all labeling. This approach is human resource intensive, but provides high-quality labeling.
- End-users can label their profiles, distributing the work across many more people, but enabling malicious or inept users to add noise in the form of incorrect labels. End-user contributions can be graded by support staff or by peer reputation (like what is done on current support web sites).
- Support engineers can write scripts to generate many variant inputs for each problem. All inputs exercise the same problem, so they all share the same label. We use this method to evaluate Clarify. It requires the most expertise, and the inputs are not guaranteed to accurately model real-life inputs.

6. Benchmarks

Clarify is intended to improve the error reporting of complex, black-box software components. To evaluate Clarify, we choose benchmarks that are common, heavily-used programs for which non-exotic error conditions lead to misleading or non-existent error messages. That common utilities provide shoddy error reporting makes clear the motivation for Clarify.

We also use Clarify on programs that span the kernel/user boundary, containing user-level code that interacts with kernel modules. Interaction across a protection boundary creates challenges for error reporting due to fixed interfaces and the difficulty of passing memory objects across the boundary.

This section summarizes the benchmarks and the kinds of problematic errors they report. We explain the behavior underlying the error reports; it is this underlying behavior that Clarify is intended to discover.

6.1 User-level programs

gcc. The GNU C compiler is a popular compiler, containing both hand-written and automatically generated source code. Our experiments use version 3.1, executing only the compiler (the cc1 phase), using the “.i” file output of the pre-processor, drawn from a pool of 4,070 files pre-processed from the Linux kernel 2.6.13 distribution. A corruptor script randomly modifies correct source code to exhibit mistakes from four error classes: adding a semicolon after an if() that has an else clause, causing the compiler to fail on the else; omitting the closing curly bracket of a switch block causing an “end of file” error; deletion of a semicolon, yielding a generic syntax error, often on a very different line from the removed semicolon; misspelling a keyword which also generates a generic syntax error. All error classes result in confusing and imprecise error messages.

mpg321. mpg321 is an mp3 player for Linux. This benchmark has three failure modes: file format error (e.g. trying to play a wav file as if it were an mp3), corrupted tag (mp3 metadata is stored in ID3 format tags, e.g., artist name), corrupted frames (mp3 frame data is corrupt). The Clarify classifier distinguishes between these three failure modes and normal execution. The application itself does not give any consistent error message for any of these error cases.

LaTeX. LaTeX is a typesetting program widely used by the research community. Its error reporting is known to be obscure. Rubber [34] is a tool that filters LaTeX's output to make it more comprehensible to the user. However, many of LaTeX's error messages are generic and many have varied root causes, making it difficult for users to understand what went wrong and fix it.

Our LaTeX benchmark has 26 ambiguous error cases, too many to summarize here, so we describe one illustrative
example. A web site [15] contains an explanation of all the classes.

If a table, array or eqnarray has more separator characters (ampersands) than columns, LaTeX prints the obscure error message, “! Extra alignment tab has been changed to \cr”. Most LaTeX books and most LaTeX support web sites recommend checking the number of ampersands if a user receives this error. Some web sites and books are helpful enough to suggest a missing end of row symbol \\ on the previous line. While forgetting the double backslash will cause the error report, the error report is not unique: misuse of the \cline command (a directive that draws a horizontal line in the table) will result in the same message if one of the arguments to \cline refers to a non-existent column in the table. Users who make the \cline mistake get an error message that almost all support options say is due to one of two possible causes, even though there is a third possible cause. Error reports are biased to their most likely cause, leaving a user who executes a less likely scenario scratching her head, potentially for a long time.

6.2 Kernel benchmarks

To evaluate the applicability of Clarify across the user/kernel boundary, we chose three benchmarks that depend on both user-space applications and kernel modules: iptables, iproute, and mount.

iptables. iptables is a popular open source Linux application that does packet filtering, network address translation, and other packet mangling. The policies for these operations are in kernel-space data structures, while the user application is an interface for the end-user. The error reporting interface between the kernel and the user is netlink. netlink simplifies the interaction between the kernel and user space, allowing anyone to create a kernel module and use the error reporting infrastructure. But netlink makes the error reporting interface rigid, forcing the kernel to reuse error codes like EEXIST. The EEXIST code means both that a file the user tried to create already exists, and that a new packet handling rule creates a conflict with the current rules. This ambiguity is especially confusing when an attempt to add a new packet handling rule returns the string, “File exists” because that is the default string for the EEXIST error code in the C runtime library.
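
The generic string attached to EEXIST can be observed from user space. The sketch below, which assumes a POSIX-like system and uses hypothetical helper names, triggers EEXIST through an ordinary file-system operation and looks up the C library's text for the code. It does not use netlink; it only illustrates that a caller who sees nothing but the numeric code gets the same generic text regardless of which subsystem reported it.

```python
import errno
import os
import tempfile


def describe(code):
    """Return the C library's generic text for an errno value."""
    return os.strerror(code)


def create_twice():
    """Create the same directory twice and return the errno of the second attempt."""
    with tempfile.TemporaryDirectory() as parent:
        path = os.path.join(parent, "example")
        os.mkdir(path)
        try:
            os.mkdir(path)
        except OSError as err:
            return err.errno
    return None
```

On Linux with glibc, `describe(errno.EEXIST)` yields the “File exists” text quoted above, which is accurate for `create_twice` but misleading when the same code comes back from a packet-handling rule.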
The first behavior class for this benchmark includes the misuse of table SNAT, DNAT, and SAME, all of which produce the generic “Invalid arguments” error. The second class is the misuse of MARK as a jump target, the third class is absence of the kernel module that is necessary to handle the user's request, and the fourth error class is using a forwarding chain name that does not exist. The kernel returns the same error code for the last three classes, which causes the application to print, “No chain/target/match by that name”.

iproute. iproute controls the contents of the kernel routing tables. It has similar problems reporting errors as does iptables because it also uses the netlink error reporting standard. The first error class is adding routing rules that conflict with existing rules; the second is adding an IP address that conflicts with an existing IP address; the third is entry of a conflicting routing table entry that should produce an error, but does not due to a bug in the kernel module. The error message for both the first and the second error classes is “RTNETLINK answers: File exists” due to the use of the overloaded return code EEXIST.

nfsmount. Mounting a remote NFS server is a complicated operation involving different kernel subsystems and cross-machine communication. It is no wonder that the error reports generated from mount can be cryptic. The first error class is specifying the wrong port number, which produces the unrelated error messages “NFSv3 not supported!” or “Can't read superblock”. The second error class is a TCP/UDP mismatch between the server and the client, and the third error class is when the server is down. In both cases, the mount program prints “RPC: Program not registered”. This error message makes some sense because the remote procedure call daemon cannot find the proper program to handle the user's request, but this might not be obvious to normal end-users. The fourth error class is NFS version configuration mismatch between the server and the client. We tested with NFSv2 and NFSv3. The error message is “RPC: Program/version mismatch; low version = 1, high version = 2”. While the program detected the NFS version mismatch, the error message reports the wrong version numbers which is likely to confuse a user diagnosing the problem.

| App. | Instances | Error classes | FP | CSP | CSRV | PP | CTP |
|---|---|---|---|---|---|---|---|
| latex81 | 34,677 | 81 | 395 | 6,802 | 61,202 | 1,504 | 23,296 |
| latex27 | 11,528 | 27 | 395 | 2,191 | 21,425 | 1,504 | 20,761 |
| mpg321 | 263 | 4 | 128 | 1,162 | 11,495 | 21,954 | 1,318 |
| gcc | 1,582 | 5 | 2,920 | 57,221 | 514,973 | 40,513 | 93,246 |
| iptables | 131 | 5 | 56 | 70 | N/A | N/A | N/A |
| iproute2 | 146 | 4 | 146 | 475 | N/A | N/A | N/A |
| mount | 1,920 | 5 | 292 | 292 | N/A | N/A | N/A |

Table 2. Sizes of the Clarify behavior profiles for each benchmark. The second and third columns show the number of instances (program executions) and the number of error classes for each benchmark. The remaining columns show the number of features for each behavior representation. SS is not shown in the table since it always has 9 features. Kernel utilities only generate the first two behavior profiles due to limitations in how the kernel can be instrumented.

6.3 Complexity of Clarify benchmark dataset

Table 2 summarizes the complexity of the Clarify benchmark dataset. Each program has at least three ambiguous or misleading error classes and one normal class. Latex27 has 26 ambiguous error classes and 1 normal class. In general, more accurate profiles have more features. For instance, there are 533 functions in latex, but 6,802 call sites, and call-site profiling is more accurate than function profiling.

Our benchmarks all have approximately equal numbers of instances per error type. This distribution is not intended to model the frequency of bugs occurring in the field, but rather trains the classifier to distinguish among the given cases.

7. Evaluation

We evaluate Clarify according to four criteria: accuracy, performance, training cost, and scalability. First, Clarify must correctly classify program behaviors that share ambiguous error messages. Accuracy is summarized by the ratio of behavior profiles correctly classified to th
etotalnumberofproles(Section7.1).Aperfectclassierwouldcorrectlyidentifyeacherrorscenariofromthebe-haviorproleforeachbenchmark.Asfurthervalidationofourclas-sicationmodels,weexaminethedecisiontreesgeneratedbyClar-ifyinSection7.3.Weshowthatthetreetestsprogramfeaturesthatintuitivelycorrelatewiththeobservedbehavior.TheaccuracyofClarifymustcomeatanacceptableperfor-mancecost,whichismeasuredinSection7.2.Asuccessfuldeploy-mentoftheClarifysystemshouldincurminimaloverheadcosts.Labeledexamplescanbeexpensivetocollect,asdeterminingtheerrortypeofagiveninstancecanrequireconsiderablehumaneffort.Section7.4showshowmanylabeledbehaviorprolesarerequiredtogenerateaClarifyclassier.Intheabsenceofanylabeleddata,Clarifyemploysanearest-neighboralgorithm,whereusersarepairedwithotheruserswhohaveexperiencedthesameproblem(Section7.6).Section7.7showsdataabouttheuseofClarifyasdeployedinourlab. App. CSP CTP Forensic Live Forensic Live latex 0.6% 5.3% 1.1% 97% mpg321 0.3% 1.2% 1.3% 67% gcc 1.0% 7.0% 9.9% 110% iptables 1.1% 3.2% N/A N/A iproute2 4.7% 7.6% N/A N/A mount 1.1% 3.1% N/A N/A 
Table 3. Slowdown of programs running under the Clarify runtime using CSP and CTP for a forensic deployment (which can only classify errors known during training), and a live deployment (which can classify new errors found after deployment).

Finally, Section 7.5 examines how the accuracy of Clarify's classifiers scales with the number of error classes. The robustness of the Clarify classifiers is demonstrated by the relatively high accuracy obtained for the latex benchmark with 81 classes.

7.1 Classification accuracy

Clarify uses decision trees to classify. Decision trees are nested if-then-else statements where each leaf corresponds to a single class prediction. An advantage of decision trees (over more continuous methods like support vector machines) is their ease of interpretation. It is possible for a software engineer to validate the classifier based on knowledge of program structure. Further, in the context of our experiments with Clarify, decision trees are as accurate as other machine-learning methods. Although Clarify's instrumentation computes thousands of features that describe each program execution, the task of classifying error messages can be accomplished by analyzing only a few features. This can be seen through the relatively small size and high accuracy of Clarify's decision trees. In contrast, methods that optimize over the entire feature set (e.g., logistic regression or support vector machines) tend to yield overfitted models with lower accuracy. Other algorithms that optimize over only a subset of features, such as rule learning and boosted decision stumps, yield classifiers we found to be competitive with decision trees.

Figure 3 shows the accuracy of user and kernel benchmarks, for several different behavior representations. The charts report accuracy using 5-fold cross validation, a standard technique for evaluating classifiers. The data set is partitioned into five sections, and the classifier is trained and tested five times; each time it is trained on four sections of the data and its accuracy is tested on the remaining fifth. The average of these five tests is the reported accuracy of the classifier. The decision trees are built using an implementation of the C4.5 algorithm [32] found in the WEKA machine learning package [39].

Call-tree profiling (CTP) demonstrates the best overall accuracy. Call-site profiling (CSP), path profiling (PP) and CTP have an accuracy of over 85% on every user-level benchmark, and call-site profiling has over 85% accuracy for kernel benchmarks. 85% accuracy is a significant help for improving error reports.

To evaluate sampling, we present results for sampling FP and CSP, with a sampling rate of 10% (which is generous for systems that use sampling [27]). For example, the sampled function counts record one of every ten function calls, uniformly at random. The sampled results are the stippled part of each bar, achieving lower classification accuracy than non-sampled data for almost every benchmark. The poor accuracy of sampling confirms our intuition that sampling is the wrong approach for classifying program behavior, because Clarify must be sensitive to rare events.

7.2 Performance

Table 3 shows the performance of live and forensic deployments of call-site profiling. All timing runs are on a dual-processor Intel Xeon 3.0GHz with 2GB of RAM. Because there is no freely available static binary translator for the x86 architecture, the experiment modifies the assembly code of the programs to count call sites in exactly the way a binary modification tool would do it. On the x86 a count with a known address can be incremented with a single instruction. The counters reside in a memory mapped file, so the results can be collected after program termination. Each benchmark runs several inputs to obtain a running time that is long enough to measure accurately: gcc compiles the 23 largest .i files from the Linux 2.6.16 distribution, mpg321 decodes 256 frames of 200 mp3 files, and LaTeX processes 5 files with a total of 27,587 lines. We average the user time of three executions.

The remaining rows in Table 3 show benchmarks run on the 2.6.17 version of the Linux kernel. The kernel behavior profile is built using kprobes [24], a dynamic instrumentation package that is standard in Linux. Kprobes uses breakpoints so it is a more expensive form of instrumentation. We use it to collect only function profiling and call-site profiling.

Performance overhead is low for call-site profiling, both for forensic and live deployments. The live deployment overhead for call-site profiling is modest, less than 7.6%. The live deployment overhead for CTP is much higher. Live deployment requires instrumenting the entire binary, while forensic deployment chooses features that training runs indicate are necessary to disambiguate a known set of problems and that are cheap to collect, e.g., they reside in functions that are called infrequently. We use a published machine learning algorithm [19] that uses training data to find the minimum cost tree whose accuracy is within 1% of our cost-oblivious tree. The increase in performance from live to forensic for CTP is dramatic. The overhead of CSP is smaller to begin with, so the reduction is smaller, but the forensic overhead of CSP for user-level programs is less than 1%. The high cost of breakpoints in the kernel accounts for the higher overhead relative to user-level programs. Forensic deployment is an effective means of deploying richer behavior profiles like CTP at reasonable levels of performance cost.

7.3 Verifying the machine learning model

Machine learning algorithms train classifiers without any domain knowledge regarding the underlying semantics of the program's behavior. It is possible for a classifier to fail miserably on unseen data because the classifier examines features that are semantically unrelated to the behavior it classifies. To make sure that Clarify classifiers use program features that intuitively relate to the behaviors they classify, we examined several classifiers by hand.

Classifiers trained using function profiling and call-tree profiling for the mp3 player mpg321 are shown in Figure 4. The trees show how each behavior profile provides different clues to the classifier about the same underlying behavior.

The function profiling tree is composed of a simpler set of rules that depict differences in control flow across the four error classes. At the root of the tree, the function mad_layer_III provides near perfect discriminative information for the 'wav' error class: the mad_layer_III routine is part of the libmad library and is called when the audio frame decoder runs. Since the wav format is among the formats not supported by mpg321, it will not successfully decode any audio frames, and the libmad library will never call mad_layer_III. The id3_tag_delete routine differentiates between the corrupted tag and other classes. The ID3 tag parser in the libid3tag library dynamically allocates memory to represent tags and frees them with id3_tag_delete. If tag parsing fails, the memory for a tag is not allocated. Since no tag parsing succeeds in the corrupted frames case, id3_tag_delete is never called to free the tag memory, making its absence discriminative for that class. The libmad audio library's default error handler error_default is used if the application does not specify one. mpg321 does not specify its own error handler, so the presence of the function indicates corrupted audio frames, and its absence indicates the corrupted id3 tags case. Finally, III_freqinver, which performs subband frequency inversion for odd sample lines, is called very frequently as part of the normal process of decoding audio frame data. When there are corrupted frames, this function is called less frequently, and the decision tree algorithm finds an appropriate threshold value to separate the normal from the corrupted case.

[Figure 3: bar charts of classification accuracy (%) for latex81, latex26, gcc, and mpg321 (FP, CSP, PP, CTP, CSRV, SS) and for iptables, iproute2, and mount (FP, CSP).]

Figure 3. The figure shows the accuracy of the classifier used to distinguish the error cases, based on behavior profiles, for each benchmark. For each benchmark a classifier is built using different behavior profiles: function profiling (FP), call-site profiling (CSP), path profiling (PP), call-tree profiling (CTP), call-site profiling with predicated return values (CSRV), and stack scraping (SS). The figure also presents sampled versions of function profiling and call-site profiling with a sampling rate of 10% (in the stippled, lower bar in the stacked FP or CSP entry).

The decision tree built on call-tree profiling data has a richer combination of data sources than function profiling. Call-tree profiling uses the presence of the libmad library function III_sideinfo (which decodes frame side information from a bitstream) calling the utility function mad_bit_read as an indicator of successful audio frame decoding. The lack of that calling pattern reliably indicates a file format error. The corrupted frames class is once again differentiated from the normal class by a threshold value on a subtree of libmad functions that will only be called during successful decoding of audio frame data, such as III_scalefactors, the discrete cosine transform function fastsdct, III_huffdecode, and so on. The libmad function scan encapsulates the process of reading mp3 files. A CTP rule (decoded bit vector) wherein scan calls a function that calls a number of low-level stream manipulation routines such as mad_bit_read, mad_timer_set, and so on, provides discriminative power in combination with a similarly complex control flow pattern in main for the corrupted tags error class. The decision tree node whose CTP rule involves main, id3_get_tag, and so on differentiates between normal and error conditions for the handling of ID3 tags, while the decision tree node whose CTP rule involves scan discriminates between successful and unsuccessful audio decoding. The high level pattern exposed by these rules is the combination of failed ID3 tag parsing with successful audio decoding, which precisely describes the corrupted tag error class.

7.4 How many labeled behavior profiles are needed?

The classifiers used by Clarify are trained with labeled behavior profiles. Labeling profiles generally requires human effort, so it should be minimized. In general, classifiers trained with fewer labeled training instances are less accurate. In this section, we investigate the tradeoff between classification accuracy and the amount of training data used in building the classifier.

#Classes  Accuracy  Creation time
10        97.8%     25 min
20        97.5%     1 hr 37 min
35        94.9%     6 hr 2 min
50        94.3%     10 hr 26 min
65        93.9%     11 hr 50 min
81        93.6%     18 hr 28 min

Table 4. The accuracy and time to create the classifier as the number of behaviors is increased in the LaTeX benchmark.

Figure 5 plots the classification accuracy of the latex benchmark as a function of the number of instances used in training (the benchmark includes 75 of the 81 distinct error classes). The C4.5 algorithm used to build the decision tree is surprisingly robust: with as few as 15 examples per class, the algorithm achieves an accuracy of 86%. Looking at the legend in Figure 5, we can see that to achieve accuracy within 1% of the maximum, only a small subset of the training data is required. For example, gcc needs only 105 instances to attain the accuracy level of 88.9%, which is within 1% of the accuracy reached when we use all of the 300 available examples per class.

A human does not need to label each behavior profile individually. For our training sets we use a script to induce errors in the program input, producing large numbers of training examples with little human involvement. However, inducing errors by a script is not necessarily an accurate model for the errors that Clarify would see in deployment.

7.5 Scalability

In this section we analyze how Clarify scales as the number of error classes increases. LaTeX has 247 unique error messages, and we evaluate the scalability on 81 behavior classes, about one third of all possible LaTeX errors.

Table 4 shows how model creation time and classification accuracy scale as the number of error classes increases. We consider subsets of error classes with varying sizes. For each size, we picked 10 random subsets of error classes and ran our experiments. The values reported are the average of the results for each size. We can see that as the number of error classes increases the accuracy drops from 97.8% to 93.6%. This decrease is acceptable considering that the number of behaviors has increased by a factor of 8. The training time of the model increases from under 30 minutes to more than 18 hours as the number of error classes increases from 10 to 81. This increase does not hinder scalability since the model is trained offline. Of greater practical concern is the execution time needed to evaluate the decision tree, as this largely determines the amount of processing done at the client end. Our experiments show that the time to execute the models averaged 10ns, with a maximum of 21ns, which is imperceptible for almost any application. Clarify scales to nearly one hundred error behaviors without much loss in accuracy or substantial increase in processing time.

[Figure 4: two decision trees, one labeled "Function profiling" and one labeled "Call-tree profiling".]

Figure 4. Decision trees produced for the mpg321 benchmark. Dotted lines are taken when the normalized count of the feature value is less than or equal to a threshold, while the solid line is taken when it is greater than the threshold. The threshold is determined automatically for each benchmark by the decision tree algorithm, and can be different for each node in the tree. Clear boxes are features. FP features are normalized function counts, and call-tree profiling features are normalized counts of CTP subtrees (represented by the symbolic tree names in brackets, with function names for nodes in each call tree). Shaded boxes are error classes.

7.6 Nearest neighbor software support

In contrast to the decision trees used by Clarify's classifiers, which rely on only a small subset of all features, nearest-neighbor algorithms rely on averages over all features. For example, the Euclidean distance between two instances, a popular metric used for nearest-neighbor searches, is a function of the average of the squares of the difference between each pair of feature values. Such distance functions are particularly susceptible to differences of scale among the various features. In Clarify, features take on vastly different scales: some features may have a count under ten, while others may have upwards of one million occurrences. Furthermore, for some features, the count is a function of the length of the program execution, and for others it is independent of program execution. For example, a parsing-related feature for gcc will be called many more times for a longer file that contains many repetitions of a particular construct than for a shorter file. Some sections of code (e.g., initialization functions) will be called a (roughly) constant number of times and thus will take on values independent of the program execution length.

[Figure 5: plot with both axes ticked from 0 to 100. Text on the graph: iptables: 10, iproute: 30, nfs: 35, mpg321: 38, gcc: 105, latex81: 110.]

Figure 5. The curve shows how the accuracy increases as the number of training instances per error class is increased. The data set is the latex benchmark with 75 classes. The text on the graph gives the minimum number of training instances needed for a benchmark to achieve accuracy within 1% of accuracy obtained using all the training data.

App.      FP    CSP   CTP   CSRV
latex26   0.75  1.01  0.52  0.93
mpg321    2.22  4.21  1.52  1.40
gcc       2.73  2.67  2.19  0.97
iptables  1.12  1.03  N/A   N/A
iproute2  3.10  2.77  N/A   N/A
mount     2.71  2.40  N/A   N/A

Table 5. In the absence of labeled training data, Clarify uses a nearest-neighbor algorithm with linear-regression based feature scaling. This table shows the expected number of correctly classified neighbors for a five-nearest-neighbor search.

To overcome such scaling challenges, Clarify employs a linear regression-based feature scaling method. For each feature y, a least-squares line is fitted to correlate each feature value with its corresponding program execution length x (defined as the sum of all feature values of a given execution instance). The feature value is normalized to be the scaled difference between the feature value y and the fitted feature value f(x). The scaling factor is determined such that the variance of each resulting feature is one. We note that for features that have no correlation with program length, the linear regression step will have no effect on the final normalized feature values.

Table 5 gives the expected number of correctly classified neighbors for a nearest neighbor search returning five neighbors. Euclidean distance is used. For some benchmarks with many classes (the LaTeX benchmark in Table 5 has 27 classes), accuracy of the nearest-neighbor search is somewhat lower. In such cases, a larger number of neighbors should be returned.

7.7 Deployment

To begin understanding the performance of Clarify in a deployed environment, we created a version of LaTeX that includes static instrumentation and a small runtime to generate call-site information. We deployed the version of LaTeX to a user base of 6 users over a period of 3 weeks. Our deployed version of LaTeX encountered 57 distinct error inputs ranging over 17 error classes and was able to classify 46% (26/57) of them correctly. LaTeX has 247 error messages; the experiment was not limited to unclear or ambiguous messages. Classifying nearly 50% of a program's behavior correctly is much more difficult than disambiguating a small number of error behaviors.

8. Related work

We first contrast Clarify to several systems that appear similar. Clarify improves error reporting by classifying program behavior; it does not find program bugs [20, 27, 1, 14]. An ambiguous error message or return code might meet the specification for a program (e.g., the netlink standard for error reporting). Clarify does not attempt to find the root cause of program faults [12, 31], misconfigurations [38, 25], or program crashes [10, 29, 30, 7]. Its aim is to classify the application behavior to help the developer or end-user get better error reports when these events happen.

The remainder of this section compares Clarify with problem diagnosis systems, and systems that classify program behavior. Clarify does help software problem diagnosis and it classifies program behavior.

8.1 Problem diagnosis systems

A group at Microsoft Research correlates low-level system events with error reports to automate problem diagnosis [40, 41], just as Clarify does. They currently focus only on forensic deployments (in our terminology), and on building models from sequences of system calls. Clarify uses control-flow and data from the program, which allows it to deal with errors that involve only user code. Ph [36] also uses sequences of system calls to build a model, though their model detects host intrusions. While system calls are a good representation of certain types of program behavior, many programs make few system calls (e.g., SPEC). Because every named system call has wrapper functions from user-space libraries, Clarify can detect system calls by detecting function calls to the wrapper functions, giving it a richer input source to determine program behavior.

Statistical bug isolation [27, 26] correlates low-level application behavior with application behavior (bugs) and builds a model, as Clarify does. Statistical bug isolation requires a special compiler to insert invariant checks into the program, while Clarify records a small amount of control-flow and data continuously. Statistical bug isolation samples the invariants it inserts to get good performance. Section 7.1 demonstrates a sharp loss of accuracy if Clarify uses sampling. Statistical bug isolation must eliminate sub-bug and super-bug predictors; Clarify has an analogous struggle to gain enough training instances to isolate the program behavior created by the error condition. The systems could be used together to gather statistical data on crashes and provide better error messages for crashes and other misbehaviors.

DIDUCE [20] uses dynamic program invariants to detect program behavioral anomalies. The anomalies can indicate program bugs, but at a performance slowdown of 6–20×. Clarify is much faster and can classify program behavior that is not anomalous.

Stack backtraces are used by many remote diagnostic systems like Dr. Watson [29], Microsoft's online crash analysis [30] and GNOME's bug-buddy [7]. IBM has a system to classify stack backtraces harvested on a crash [10], and the technology has been deployed in their Trap Finder tool. Their motivation is similar to Clarify's: reduce the human effort needed to match problems from different program executions. Clarify diagnoses a wider range of problems than crashes, and it operates on behavior profiles, which are a richer source of data than stack backtraces.

8.2 Classifying program behavior

Classifying program behavior has received attention in the software engineering literature. Podgurski et al. [31] identify a similar motivation to Clarify and they also investigate gcc behavior. Clarify is more accurate (85–100% accurate, as compared to 24–96%), and is more of a complete system, designed to address the problem of improving error reporting. Bowring et al. [8] model software behavior as Markov models using control flow between basic blocks and then use active learning to cluster the models. Markov models use probabilities, which make them insensitive to rare events. Clarify needs sensitivity to rare events because rare events often characterize an error behavior (see the sampling results in Section 7.1). Bowring et al. evaluate their method on 33 versions of SPACE, which is a very small 6,200 line program.

Liu et al. [28] use program behavior graphs as features for a machine-learning model just as Clarify uses data related to program control flow. The number of program behavior graphs grows quickly with program size, and can become computationally intractable even for the small Siemens programs [23] used to evaluate their method.

SimPoint [35] characterizes the phase behavior of applications using basic block execution counts to maintain the accuracy of architectural simulation while executing fewer instructions. The types of program behavior it detects are coarse-grained and occur over much longer time windows than the errors that Clarify detects. SimPoint can reduce its data set to 15 dimensions and maintain phase-detection accuracy. Clarify's classifiers must be sensitive to small, localized changes in behavior that form the signature of an error behavior. As seen in Table 2, Clarify's representations have tens of thousands of features. We verified that using random projection to reduce the feature count, like SimPoint does, dramatically reduces Clarify's accuracy.

Program paths [5] have been used to analyze runtime program behavior. Path Spectra [33] approximate an execution's behavior with the occurrence (or frequency) of the individual paths. Spectral differences have been used to identify the portions of a program's execution that differ with different inputs, notably, during Y2K testing [21]. Path Spectra focused on identifying path differences between several program runs, whereas Clarify's novel use of path profiling uses machine learning to identify which paths are common to each error class. Clarify's call-site profiling is much more efficient and nearly as accurate as path profiling.

9. Conclusion

We present Clarify, a system that improves the error reporting of black-box systems, e.g., third-party libraries, the operating system, and external programs. Our Clarify prototype accurately and efficiently classifies the behavior of all of these systems, enabling improved error reporting.

Acknowledgments

Thanks to William Cook for help with writing. Thanks to Peter Stone, Raymond Mooney and Kathryn McKinley for feedback on earlier drafts of the paper. This research has been supported by a gift from Microsoft's Phoenix compiler group, a DARPA grant from the architectures for cognitive information processing program, and by NSF grant CNS-0615104.

References

[1] M. K. Aguilera, J. C. Mogul, J. L. Wiener, P. Reynolds, and A. Muthitacharoen. Performance debugging for distributed systems of black boxes. In SOSP, Bolton Landing, NY, Oct. 2003.
[2] G. Ammons, T. Ball, and J. R. Larus. Exploiting hardware performance counters with flow and context sensitive profiling. In PLDI '97, pages 4–16, June 1997.
[3] Andrew Ayers, Christopher Metcalf, Junghwan Rhee, Richard Schooler, Anant Agarwal, and Emmett Witchel. Traceback: First fault diagnosis by reconstruction of distributed control flow. In PLDI, June 2005.
[4] V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: a transparent dynamic optimization system. In PLDI, pages 1–12, 2000.
[5] T. Ball and J. R. Larus. Efficient path profiling. In MICRO, 1996.
[6] R. Barrett, E. Haber, E. Kandogan, P. P. Maglio, M. Prabaker, and L. A. Takayama. Field studies of computer system administrators: Analysis of system management tools and practices. In ACM CSCW (Computer-supported Cooperative Work), 2004.
[7] J. Berkman. Bug-buddy: GNOME bug-reporting utility, 2004. http://directory.fsf.org/All_Packages_in_Directory/bugbuddy.html.
[8] J. F. Bowring, J. M. Rehg, and M. J. Harrold. Active learning for automatic classification of software behavior. In ISSTA, Jul 2004.
[9] Justin Brickell, Donald E. Porter, Vitaly Shmatikov, and Emmett Witchel. Secure remote software diagnostics, Under review.
[10] M. Brodie, Sheng Ma, G. Lohman, L. Mignet, N. Modani, M. Wilding, J. Champlin, and P. Sohn. Quickly finding known software problems via automated symptom matching. In ICAC '05, pages 101–110, 2005.
[11] Derek Bruening, Timothy Garnett, and Saman Amarasinghe. An infrastructure for adaptive dynamic optimization. In CGO-03, 2003.
[12] Y. Brun and M. D. Ernst. Finding latent code errors via machine learning over program executions. In ICSE, 2004.
[13] Bryan Cantrill and Mike Shapiro and Adam Leventhal. Dtrace, 2006. http://www.genunix.org/wiki/index.php/DTrace_FAQ.
[14] Trishul M. Chilimbi and Vinod Ganapathy. Heapmd: Identifying heap-based bugs using anomaly detection. In ASPLOS '06, 2006.
[15] Latex Error Classes. http://www.cs.utexas.edu/users/habals/clarify/latex_errors.html, 2006.
[16] Microsoft Corporation. Privacy statement for the Microsoft error reporting service, 2006.
[17] Microsoft Corporation. Reporting and solving computer problems, 2006.
[18] Microsoft Corporation. What information is sent to Microsoft when I report a problem?, 2006.
[19] Jason V. Davis, Jungwoo Ha, Christopher J. Rossbach, Hany E. Ramadan, and Emmett Witchel. Cost-sensitive decision tree learning for forensic classification. In ECML, 2006.
[20] S. Hangal and M. S. Lam. Tracking down software bugs using automatic anomaly detection. In ICSE, 2002.
[21] M. J. Harrold, G. Rothermel, K. Sayre, R. Wu, and L. Yi. An empirical investigation of the relationship between fault-revealing test behavior and differences in program spectra. In Journal of Software Testing, Verification and Reliability, vol 10, no 3, 2000.
[22] J. Humphreys and V. Turner. On-demand enterprises and utility computing: A current market assessment and outlook. Technical report, IDC, Jul 2004.
[23] M. Hutchins, H. Foster, T. Goradia, and T. Ostrand. Experiments on the effectiveness of dataflow- and controlflow-based test adequacy criteria. In ICSE, 1994.
[24] Jim Keniston and Prasanna S Panchamukhi. Kernel Probes (Kprobes), 2006. Documentation/kprobes.txt.
[25] N. Lao, J. Wen, W. Ma, and Y. Wang. Combining high level symptom descriptions and low level state information for configuration fault diagnosis. In LISA, 2004.
[26] B. Liblit, A. Aiken, A. X. Zheng, and M. I. Jordan. Bug isolation via remote program sampling. In PLDI, 2003.
[27] B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan. Scalable statistical bug isolation. In PLDI, 2005.
[28] C. Liu, X. Yang, H. Yu, J. Han, and P. S. Yu. Mining behavior graphs for “backtrace” of noncrashing bugs. In Proc. of 2005 SIAM Int. Conf. on Data Mining (SDM05), 2005.
[29] Microsoft Corporation. Dr. Watson Overview, 2002. http://www.microsoft.com/TechNet/prodtechnol/winxppro/proddocs/drwatson_overview.asp.
[30] Microsoft Corporation. Online Crash Analysis, 2004. http://oca.microsoft.com/.
[31] A. Podgurski, D. Leon, P. Francis, W. Masri, M. Minch, J. Sun, and B. Wang. Automated support for classifying software failure reports. In ICSE, 2003.
[32] R. Quinlan. C4.5: programs for machine learning. Morgan Kaufmann Publishers, 1992.
[33] T. Reps, T. Ball, M. Das, and J. Larus. The use of program profiling for software maintenance with applications to the year 2000 problem. In M. Jazayeri and H. Schauer, editors, ESEC/FSE 97, pages 432–449. Springer-Verlag, 1997.
[34] Rubber. http://www.pps.jussieu.fr/~beffara/soft/rubber, 2007.
[35] Timothy Sherwood, Erez Perelman, Greg Hamerly, and Brad Calder. Automatically characterizing large scale program behavior. In ASPLOS, Oct 2002.
[36] A. Somayaji and S. Forrest. Automated response using system-call delays. In Proceedings of 9th Usenix Security Symposium, August 2000.
[37] Ariel Tamches and Barton P. Miller. Fine-grained dynamic instrumentation of commodity operating system kernels. In OSDI, pages 117–130, 1999.
[38] H. J. Wang, J. C. Platt, Y. Chen, R. Zhang, and Y. Wang. Automatic misconfiguration troubleshooting with PeerPressure. In OSDI, 2004.
[39] I. Witten and E. Frank. Data Mining: Practical machine learning tools with Java implementations. Morgan Kaufmann, San Francisco, 2000.
[40] C. Yuan, N. Lao, J. Wen, J. Li, Z. Zhang, Y. Wang, and W. Ma. Automated known problem diagnosis with event traces. MSR-TR-2005-81, 2005.
[41] C. Yuan, N. Lao, J. Wen, J. Li, Z. Zhang, Y. Wang, and W. Ma. Automated known problem diagnosis with event traces. In EuroSys, 2006.
[42] Xiaotong Zhuang, Mauricio J. Serrano, Harold W. Cain, and Jong-Deok Choi. Accurate, efficient, and adaptive calling context profiling. In PLDI, 2006.
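
Illustrative sketch for Section 7.6. The feature scaling and nearest-neighbor search described in Section 7.6 can be made concrete with a small example. The code below is a minimal sketch under stated assumptions, not the Clarify implementation: the function names scale_features and nearest_neighbors are hypothetical, the population variance is assumed for the unit-variance scaling, and a feature whose residual is identically zero is assumed to map to 0. It follows the text in taking the execution length x to be the sum of all feature values of an instance, fitting a least-squares line per feature, and comparing instances by Euclidean distance on the scaled residuals.

```python
import math


def scale_features(profiles):
    """Scale raw feature counts as Section 7.6 describes (sketch).

    profiles: list of equal-length lists of raw counts, one per execution.
    Returns a list of lists of scaled residuals y - f(x), unit variance.
    """
    n = len(profiles)
    num_features = len(profiles[0])
    # Execution length x: sum of all feature values of the instance.
    lengths = [float(sum(p)) for p in profiles]
    mean_x = sum(lengths) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)

    scaled = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        ys = [float(p[j]) for p in profiles]
        mean_y = sum(ys) / n
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, ys))
        # Least-squares line f(x) = slope * x + intercept.
        slope = sxy / sxx if sxx > 0 else 0.0
        intercept = mean_y - slope * mean_x
        residuals = [y - (slope * x + intercept) for x, y in zip(lengths, ys)]
        # Residuals have zero mean, so this is their population std. dev.
        std = math.sqrt(sum(r * r for r in residuals) / n)
        if std > 1e-9:  # assumption: leave constant features at 0
            for i, r in enumerate(residuals):
                scaled[i][j] = r / std
    return scaled


def nearest_neighbors(scaled, query_index, k=5):
    """Return indices of the k instances closest to the query (Euclidean)."""
    query = scaled[query_index]
    distances = []
    for i, row in enumerate(scaled):
        if i == query_index:
            continue
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(query, row)))
        distances.append((d, i))
    distances.sort()
    return [i for _, i in distances[:k]]


if __name__ == "__main__":
    # Made-up data: feature 0 grows with input size, feature 1 is constant,
    # feature 2 is nonzero only in the second group of executions.
    profiles = [
        [100, 5, 0], [200, 5, 0], [300, 5, 0],
        [100, 5, 3], [200, 5, 3], [300, 5, 3],
    ]
    scaled = scale_features(profiles)
    print(nearest_neighbors(scaled, 0, k=2))
```

In this made-up data set, raw Euclidean distance would pair execution 0 with execution 3 (same input size, different behavior), whereas after scaling the two nearest neighbors of execution 0 are the other two executions in its own group.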