Are Automated Debugging Techniques Actually Helping Programmers Chris Parnin and - PDF document

Are Automated Debugging Techniques Actually Helping Programmers Chris Parnin and
Are Automated Debugging Techniques Actually Helping Programmers Chris Parnin and

Embed / Share - Are Automated Debugging Techniques Actually Helping Programmers Chris Parnin and


Presentation on theme: "Are Automated Debugging Techniques Actually Helping Programmers Chris Parnin and"— Presentation transcript


Figure2:NanoXMLTask:IdentifythecauseofthefailureinNanoXMLand xtheproblem.Jonesandcolleagues[11],forseveralreasons.First,Taran-tulais,likemoststate-of-the-artdebuggingtechniques,basedonsomeformofstatisticalrankingofpotentiallyfaultystatements.Second,athoroughempiricalevaluationofTaran-tulahasshownthatitcanoutperformothertechniques[12].(Morerecently,moretechniqueshavebeenproposed,buttheirimprovements,whenpresent,areforthemostpartmarginalanddependentonthecontext.)Finally,Tarantulaiseasytoexplainandteachtodevelopers.4.4.2PluginTomakeiteasyfortheparticipantstousetheselectedstatisticaldebuggingtechnique,wecreatedanEclipseplu-ginthatprovidestheuserswiththerankedlinkedofstate-mentsthatwouldbeproducedbyTarantula.Wedecidedtokeeptheplugin'sinterfaceassimpleaspossible:alistofstatements,orderedbysuspiciousness,whereclickingonastatementinthelistopensthecorrespondingsource leinEclipseandnavigatestothatlineofcode.Webelievethatthisapproachhasthetwofoldadvantageof(1)lettingusinvestigateourresearchquestionsdirectly,byhavingtheparticipantsoperateonarankedlistofstatements,and(2)clearlyseparatingthebene tsprovidedbytherankingbasedapproachfromthoseprovidedbytheuseofamoresophis-ticatedinterface,suchasTarantula'svisualization[11].Theplugin,showninFigure3,worksasfollows.First,theuserinputsacon guration leforataskbypressingtheload leicon.Oncethe leisloaded,theplugindisplaysatablewithseveralrows,whereeachrowsshowsastatementandthecorresponding lename,linenumber,andsuspi-ciousnessscore.Besidesclickingonastatementtojumptoit,asdiscussedabove,userscanalsouseapreviousandnextbuttontonavigatethroughthestatements.Tocomputetherankedlistofstatementsfortheplugin,weusedtheTarantulaformulasprovidedinReference[11],whichrequirecoveragedataandpass/failinformationforasetoftestinputs.ForbothTetrisandNanoXML,wecollectedcoveragedatausingEmma(http://www.eclemma.org/).ForNanoXML,weusedthetestcasesandpass/failstatusforsuchtestcasesavailablefromtheSIRrepository.ForTetris,forwhichnotestcaseswereavailable,wewroteacapture-replaysystemthatcouldrecordthekeyspressedwhenplayingTetrisandreplaythemastestcases.Overall,wecollected10gamesessions,2ofwhichexecutedthefaultystatement(i.e.,rotatedasquareblock). Figure4:Participantsaresplitintodi erentgroupshavingdi erentconditions.Eachboxrepresentsatask:thelabelintheboxindicatesthesoftwaresub-jectforthetask;thepresenceofawrenchindicatestheuseoftheautomateddebuggingtoolforthattask;theiconsrepresentinganarrowindicatetasksforwhichtherankofthefaultystatementhasbeenincreased(up)ordecreased(down).4.4.3DataAvailabilityOurEclipseplugin,programsubjects,andinstructionssheetsareavailableforresearcherswishingtoreplicateourstudyathttp://www.cc.gatech.edu/~vector/study/.4.5MethodToevaluateourHypothesis1,andassesswhetherpartici-pantscouldcompletetasksfasterwhenusinganautomateddebuggingtool(tool,hereafter),wecreatedtwoexperimen-talgroups:AandB.ParticipantsingroupAwereinstructedtousethetooltosolvetheTetristask.Conversely,partici-pantsingroupBhadtocompletetheTetristaskusingonlytraditionaldebuggingcapabilitiesavailablewithinEclipse.Ifthetoolweree ective,thereshouldbeasigni cantdif-ferencebetweenthetwogroup'staskcompletiontime.WeinvestigatedourHypothesis2,andassessedwhetherparticipantsbene tedmorefromusingthetoolonhardertasks,bygivingtheexperimentalgroupsasecondtask: xafaultinNanoXML.IngroupA,participantswerelimitedtouseonlytraditionaldebuggingtechniquesavailablewithinEclipse,whereasingroupB,participantscouldalsousethetooltosolvethetask.Inthiscase,wecomparedthedif-ferenceinperformanceforthegroupsusingthetoolfortheTetrisandtheNanoXMLtasks.Ifthetoolweremoree ec-tiveforhardertasks,theperformancegainofparticipantsusingthetoolfortheNanoXMLtaskshouldbebetterthanthatofparticipantsusingthetoolontheTetristask.OurHypothesis3aimstounderstandthee ectsoftherankofthefaultystatementontaskperformance.Toin-vestigatethishypothesis,wecreatedtwonewexperimentalgroups:CandD.BothgroupsweregivenboththeTetrisandtheNanoXMLtasksandwereinstructedtousethetooltocompletethetasks.Thedi erencebetweenthetwogroupswasthat,forgroupD,weloweredtherankofthefaultystatementforTetris(i.e.,moveditdownthelist)andincreasedtherankofthefaultystatementforNanoXML(i.e.,moveditupthelist).Ifrankwereanimportantfac-tor,thereshouldbeadecreaseinperformancefortheTetristaskandanincreaseinperformancefortheNanoXMLtaskforgroupD.AsummaryofthemethodweusedforinvestigatingourhypothesescanbeseeninFigure4. groupAperformedtheTetristask2.5timesfasterthantheNanoXMLtask.SubjectsingroupBperformedtheTetristask1.3timesfasterthantheNanoXMLtask.Thesevaluesaresigni cantlydi erent(p0:02)byatwo-tailedt-test.Accordingtotheseresults,statisticaldebuggingwiththetoolwasnomoree ectivethantraditionaldebuggingforsolvingahardertask.Therefore,wefoundnosupportforHypothesis2.Overall,theresultssuggestthattheremightbeseveralfactorsthatcanexplainwhytheautomateddebuggingtooldidnothelptheNanoXMLtask.Inthediscussionahead,wespeculatewhatthesefactorsmaybe.5.3ChangesinRankHavenoSignicantEf-fectsForHypothesis3,wewantedtoexplorethee ectofrankonthee ectivenessofthetool.Toassessthishypothesis,wemeasuredthee ectofarti ciallydecreasingandincreasingtherankofthefaultystatements.Ifthishypothesisweretrue,wewouldexpectthee ectivenesstodecreasewhendroppingtherankandincreasewhenraisingtherank.AsdiscussedinSection4.5,wetestedthishypothesisbyconductinganexperimentwith10newparticipantssplitintogroupsCandD.ForgroupC,therankoffaultystatementswaskeptintact.ForgroupD,therankforthefaultystate-mentinTetriswasloweredfromposition7toposition35.Similarly,therankforthefaultystatementinNanoXMLwasraisedfromposition83toposition16.(Thenewrankswereselectedinarandomfashion.)Thismodi cationoftheranksshouldhaveimprovedthee ectivenessofthetoolfortheNanoXMLtask,andhurtthee ectivenessofthetoolfortheTetristask,forgroupDincomparisontogroupC.ComparingtheaveragecompletiontimeoftheTetristaskforgroupsCandD,wedidobservethatgroupD(12:36)wasalittleslowerthangroupC(10:12).Surprisingly,fortheNanoXMLtaskgroupDwasnotanyfasterthangroupCdespitethemuchlowerrankofthefaultystatement(16ver-sus83).Infact,groupDactuallyperformedtheNanoXMLtaskslowerthangroupC|15:12forgroupCversus18:30forgroupD.However,thedi erencesinperformancebetweenthegroupswerenotstatisticallysigni cant.Infact,acomparisonofthecompletiontimeratioofTetristoNanoXMLyieldsthesameexactaveragefraction(.79)forbothgroups.Thissuggeststhatbothgroupswereverysimilarinperformance.Lower-ingrankdidnothurttheperformanceofgroupDontheTetristask,nordidraisingtherankforNanoXMLhelpim-provegroupD'sperformance.Therefore,overall,theresultsprovidenosupportforHy-pothesis3.Thissuggeststhattherankofthefaultystate-ment(s)maynotbeasimportantasotherfactorsorstrate-gies.Theparticipantsmaybeusingthetoolto ndotherstatementsthatarenearthefault,butrankedhigherthanthefault.Ortheymaybesearchingthroughthestatementsbasedonsomeintuition,thuscancelingthee ectofchang-ingtherelativepositionofthefaultystatement.Forex-ample,fourparticipantsingroupD,whohadtherankofthefaultyTetrisstatementlowered,wereabletoovercomethishandicapbyvisitinganotherstatementinposition3(previously8)thatwasinthesamefunctionasthebug.Thissuggeststhatprogrammersmayusethetooltoiden-tifystartingpointsfortheirinvestigation,someofwhichmaybenearthefault.Thiswouldlessentheimportanceofcorrectlyrankingtheexactlineofcodewiththefault.5.4ProgrammersSearchStatementsToanswerourResearchQuestion1onpatternsusedbydeveloperswheninspectingstatements,weanalyzedthelogsproducedduringtheusageofthetool.Speci cally,wewantedtoassesswhetherdevelopersinspectedstatementsinorderofranking,onebyone,orfollowedsomeotherstrategy.Weusedthenavigationdatacollectedfromthe24participantsingroupAandB,ofwhich22hadusablenavigationdata.Wealsoexaminedthequestionnaireofall34participantstogaininsightintotheirstrategiesforusingthetool.Basedonthisdata,wehavedeterminedthatprogrammersdonotvisiteachstatementinalinearfashion.Thereareseveralsourcesofsupportforthisobservation.First,foreachvisit,wemeasuredthedeltabetweenthepositionsoftwostatementsvisitedinsequence.Allpartic-ipantsexhibitedsomeformofjumpingbetweenpositions.Speci cally,37%ofthevisitsjumpedmorethanoneposi-tionand,onaverage,eachjumpskipped10positions.Theonlyexceptionwerelowperformers(thosewhodidnotcom-pleteanytask),whosemajority(95%)cycledthroughthestatementsandveryrarelyskippedpositions.Observingthenumberofpositionsskippedduringallthevisits,wehypoth-esizethatsmallerjumpsmaycorrespondtotheskippingofblocksofstatements;conversely,largerjumpsseemtocorre-spondtosomeformofsearchingor lteringofstatements|ahypothesisalsosupportedbytheresponsesinthepartici-pants'questionnaires.Second,thenavigationpatternwasnotlinear.Partici-pantsconsistentlychangeddirections(i.e.,theystartedde-scendingthelist, ippedaround,andstartedascendingthelist).Wemeasuredthenumberof\zigzags"inapartici-pant'snavigationpatternanytimetherewasachangeindirection.Onaverage,eachparticipanthad10.3zigzags,withanoverallrangebetween1and36zigzags.Finally,onourquestionnairegiventoallparticipants,manyparticipantsindicatedthatsometimestheywouldscantherankedlistto ndastatementthatmightcon rmtheirhypothesisaboutthecauseofthefailure,whereasothertimestheyskippedstatementsthatdidnotappearrelevant.5.5NoPerfectBugUnderstandingToinvestigateourResearchQuestion2ontheassumptionofperfectbugunderstanding,wemeasuredthetoolusagepatterns.Welookedatthe rsttimeaparticipantclickedonthefaultystatementinthetool,andthenexaminedtheparticipant'ssubsequentactivity.Ifthefaultynatureofastatementwereapparenttothedevelopersbyjustlookingatit,toolusageshouldstopassoonastheygettothatstatementinthelist.Weusedthelogdatafromthe24participantsingroupsAandBandexcludeddataforparticipantsthatneverclickedonthefaultystatement,whichleftuswithdatafor10par-ticipants.Only1participantoutof10stoppedusingthetoolafterclickingonthefault.Theremainingparticipants,onaverage,spentanothertenminutesusingthetoolafterthey rstexaminedthefaultystatement.Thatis,partici-pantsspent(orwasted)onaverage61%oftheirtimecon-tinuingtoinspectstatementswiththetoolaftertheyhadalreadyencounteredthefault.Thissuggeststhatsimplygivingthestatementwasnotenoughfortheparticipantstounderstandtheproblemandthatmorecontextwasneeded,whichmadeusconcludethatperfectbugunder-standingisgenerallynotarealisticassumption. modeltowhichthedevelopercanrelate.Whenusingthesetools,insteadofworkingwiththefamiliarandreliablestep-by-stepapproachofatraditionaldebugger,developersarecurrentlypresentedwithasetofapparentlydisconnectedstatementsandnoadditionalsupport.Observation2-Providingoverviewsthatclusterresultsandexplanationsthatincludedatavalues,testcaseinforma-tion,andinformationaboutslicescouldmakefaultseasiertoidentifyandtoolsultimatelymoree ective.6.2ResearchImplications6.2.1PercentagesWillNotCutItAstandardevaluationmetricforautomateddebuggingtechniquesistonormalizetherankoffaultystatementswithrespecttothesizeoftheprogram.Forexample,assigningthefaultystatementinNanoXML(4,408LOC)witharankof83,whenexpressedasapercentage,suggeststhatonly1.8%ofstatementswouldneedtobeinspected.Althoughthisresult,at rstglance,mayappearquitepositive,inprac-ticeweobservedthatdeveloperswerenotabletotranslatethisintoasuccessfuldebuggingactivity.Basedonourdata,werecommendthattechniquesfocusonimprovingabsoluterankratherthanpercentagerank,fortworeasons.First,thecollecteddatasuggeststhatpro-grammerswillstopinspectingstatements,andtransitiontotraditionaldebugging,iftheydonotgetpromisingresultswithinthe rstfewstatementstheyinspect.Forexample,evenwhenwechangedtherankofthefaultystatementinNanoXMLfrom83to16,therewasnoobservedbene t.Thisisconsistentwithotherresearchinsearchtasks,whereitisclearlyshownthatmostusersdonotinspectresultsbeyondthe rstpageandreformulatetheirsearchqueryin-stead[7].Second,theuseofpercentagesunderscoreshowdiculttheproblembecomeswhenmovingtolargerpro-grams.Percentageswouldsuggestthatwewouldnothavetochangeourtechniques,nomatterwhetherwearedealingwitha400LOCora4millionLOCprogram.Fromdirectexperiencewithscalingprogramanalysesfromtoyprogramstoindustrial-sizedprograms,weknowthatthisistypicallyunlikelytobetrue.Bettermeasurescanmakesureweareusingtheappro-priatemetricforevaluatingwhat,andtowhatextent,willhelpdevelopersinpractice.Otherwise,whatmayappearasasuccessfulnewdebuggingtechniqueinthelaboratory,couldinrealitybenomoree ectivethantraditionaldebug-gingapproaches.Implication1-Techniquesshouldfocusonimprovingab-soluterankratherthanpercentagerank.6.2.2FocusMoreOnSearchIfcurrentresearchisunabletoachievegoodvaluesforabsolutestatementranks,analternativedirectionmaybetoenrichthedebuggingtechniquesbyleveragingsomeofthesuccessfulstrategiesdeveloperswereobservedtouse.Inparticular,itmaybepromisingtofocusresearche ortsonhowsearchofstatementscanbeimproved.Weobservedthatacommoncauseoffrustrationduringdebuggingistheinabilitytodistinguishirrelevantstate-mentsfromrelevantones.Accordingtoreportsfromtheparticipants,somedeveloperssuccessfullyovercamethisprob-lemby lteringtheresultsbasedonkeywordsinthestate-ments.Wefoundthistobekey,astheremaybesomefunda-mentalinformationinthedeveloper'smindthat,whencom-binedwiththeautomateddebuggingalgorithm,mayyieldexcellentresults.Forexample,intheNanoXMLtask,developersnotedthatusingtermssuchas\index"or\colon"to lterthroughtheresultscouldhelpthem ndaresultthatmatchedtheirsus-picion.Infact,hadthedeveloperssearchedforanyoftheterms\index",\colon",\pre x",or\substring",theycouldhave lteredthestatementssothatthefaultyonewaswithinthe rst verankedresults.Unfortunately,performingthissearchmanuallyamongmanyresultscanbedicultinprac-tice,whereasthecombinationofrankingandsearchcouldbeapromisingdirection.Besidescombiningsearchandranking,futureresearchcouldalsoinvestigatewaystoautomaticallysuggestorhigh-lighttermsthatmayberelatedtoafailure.Thiswouldhelpincaseswhereadeveloperdoesnotknowtherighttermstosearchforandcouldbedone,forinstance,basedonthetypeoftheexceptionraisedorothercontextualclues.Itmayevenbethat,givengoodsearchtools,developerscoulddiscoverthattherankofafaultystatementdoesnotmatterasmuchasthesearchrank.Implication2-Debuggingtoolsmaybemoresuccessfuliftheyfocusedonsearchingthroughorautomaticallyhigh-lightingcertainsuspiciousstatements.6.2.3GrowtheEcosystemThewayperformance(withrespecttotime)iscomputedinmanystudiesmakesassumptionsthatdonotholdinprac-tice.Liketestsuiteprioritization,withautomatedfaultlo-calizationthetotaltimesavedbycon guringandusingthetoolshouldbelessthanthetimespentusingtraditionalde-buggingalone.Inpractice,atleastinsomescenarios,thetimetocollectcoverageinformation,manuallylabelthetestcasesasfailingorpassing,andrunthecalculationsmayex-ceedtheactualtimesavedusingthetool.Ingeneral,foratooltobee ective,itshouldseamlesslyintegratethedi erentpartsofthedebuggingtechniquecon-sideredandprovideend-to-endsupportforit.Althoughsomeoftheseissuescanbeaddressedwithcarefulengineer-ing,itmaybeusefultofocusresearche ortsonwaystostreamlineandintegrateactivitiessuchascoveragecollec-tion,test-caseclassi cationandrerunning,codeinspection,andsoon.Implication3-Researchshouldfocusonprovidinganecosystemthatsupportstheentiretoolchainforfaultlo-calization,includingmanagingandorchestratingtestcases.6.3ThreatstoValidityWechooseatimelimitfortasksthatmadeitpossibletoconductourexperimentwithinatwo-hourtimeframe,soastoavoidexhaustingparticipants.However,thistimelimitmighthaveexcludedlessexperiencedparticipantswhomayneedmoretimetocompletethetasks.Ourstudyhasfo-cusedonmoreexperienceddevelopers,manyofwhichcouldcompletethetaskswithinthetimelimit,andmaynotgen-eralizetonoviceusers.

By: debby-jeon
Views: 73
Type: Public

Download Section


Download Pdf - The PPT/PDF document "Are Automated Debugging Techniques Actua..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.

View more...

If you wait a while, download link will show on top.Please download the presentation after loading the download link.

Are Automated Debugging Techniques Actually Helping Programmers Chris Parnin and - Description


parninorsogatechedu ABSTRACT Debugging is notoriously di64259cult and extremely time con suming Researchers have therefore invested a considerable amount of e64256ort in developing automated techniques and tools for supporting various debugging tasks ID: 3303 Download Pdf

Related Documents