7:2 Binkley et al.

[...] in isolation and therefore difficult to take any action to ameliorate their potentially harmful effects without major restructuring. This motivated the study of linchpin edges and vertices [Binkley and Harman 2009]. A linchpin is a single edge or vertex in a program's dependence graph through which so much dependence flows that the linchpin holds together a large cluster.

One obvious and natural way to identify a linchpin is to remove it, re-construct the dependence graph, and then compare the 'before' and 'after' graphs to see if the large dependence cluster has either disappeared or reduced in size. This naïve approach was implemented as a proof of concept to demonstrate that such linchpins do indeed exist [Binkley and Harman 2009]. That is, there are single vertices and edges in real world systems, the removal of which causes large dependence clusters to essentially disappear.

This naïve algorithm is useful as a demonstration that linchpins exist, but it must consider all vertices as potential linchpins. Unfortunately, this limits its applicability as a useful research tool. That is, the computational resources required for even mid-sized programs are simply too great for the approach to be practical.

In this paper we improve the applicability of the analysis from thousands of lines of code to tens of thousands of lines of code by developing a graph-pattern based theory that provides a foundation for more efficient linchpin detection. Finally, we introduce a new linchpin detection algorithm based on this theory and report on its performance with an empirical study of 38 programs containing a total of 494K lines of code.

The theory includes a ratio which we term the 'risk ratio'. If the risk ratio is sufficiently small then we know that the impact (as captured by the ratio) of a set of vertices on large dependence clusters will be negligible. On this basis we are able to define a predicate that guards whether or not we are able to prune vertices from linchpin consideration.

The theory establishes the required properties of the risk ratio, but only an empirical study can answer whether or not the guarding predicate that uses this ratio is satisfied sufficiently often to be useful for performance improvement. We therefore complement the theoretical study with an empirical study that investigates this question. We find that the predicate is satisfied in all but four of over a million cases. Furthermore, these four all occur with the strictest configuration in very small programs. This provides empirical evidence to support the claim that the theory is highly applicable in practice.

Our empirical study also reports the performance improvement obtained by the new algorithm. Finally, we introduce and empirically study a tuning parameter (the search depth in what we call 'fringe look ahead'). Our empirical study investigates the additional performance increase obtained using various values of the tuning parameter.

Using the new algorithm we were able to study several mid-sized programs ranging up to 66 KLoC. This provides a relatively robust set of results on non-trivial systems upon which we draw evidence to support our empirical findings regarding the improved execution efficiency of the new algorithm. The results support our claim that the theoretical improvement of our algorithm is borne out in practice. For instance, to analyze the mid-sized program go, which has 29,246 lines of code, using the naïve approach takes 101 days. Using the tuned version of the new algorithm this time is reduced to just 8 days.

The primary contributions of the paper are as follows:

(1) Three theorems, proved in Section 3, identify situations in which it is possible to effectively exclude vertices from consideration as linchpins. These theoretical findings highlight opportunities for pruning the search for linchpins.

(2) Based on this theory, Section 4 introduces a more efficient linchpin search algorithm that exploits the pruning opportunities to reduce search time. Section 4 also proves the algorithm's correctness with respect to the theory introduced in Section 3.

(3) To empirically investigate the improvement achieved in practice using our new algorithm, Section 5 presents the results of an empirical study using a collection of 38 programs. The results from this study reveal that the basic algorithm can be used, with no tuning at all, to achieve at least an order of magnitude speedup in execution time. This means that offline linchpin identification becomes feasible where it was previously infeasible. We also present results that analyse [...]

ACM Journal Name, Vol. 35, No. 2, Article 7, Publication date: July 2013.
[Figure 2: SDG of a small program with two procedures, inc(x) (x = x + 1; return x) and double(d) (d = d * 2; return d), and a caller containing a = inc(a), t = 42, t = inc(t), and t = double(t).]

Fig. 2. An example SDG with a slice taken with respect to the vertex labeled d = d * 2. The vertices of the slice are shown in bold.

[...] summarize paths through the called procedure, allow Pass 1 to include the initialization of t (t = 42) without descending into procedure inc. The second pass, which excludes parameter-in and call edges, starts from all the vertices encountered in the first pass. In particular, when starting from the vertex labeled x = x_out the slice descends into procedure inc and thus includes the body of the procedure. Combined, the two passes respect calling context and thus correctly omit the first call on procedure inc. The vertices of the slice are shown in bold.

It is possible to compute the slice with respect to any SDG vertex. However, in the experiments only the vertices representing source code are considered as slice starting points. Furthermore, slice size is defined as the number of vertices representing source code encountered while slicing. Restricting attention to the vertices representing source code excludes several kinds of 'internal' vertices introduced by CodeSurfer [Grammatech Inc. 2002] (the tool used to build the SDGs). For example, an SDG includes pseudo-parameter vertices representing global variables potentially accessed by a called procedure.

While alternate definitions are possible, dependence clusters can be defined as maximal sets of SDG vertices that all have the same backward slice. That is, two vertices that have the same backward slice are deemed to reside in the same cluster. In practice, it turns out that same backward slice can be very closely approximated by same backward slice size [Binkley and Harman 2005]. This is a conservative approximation because two backward slices may differ, yet coincidentally have the same size. However, two identical backward slices must have the same size. The 'same size' [...]
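The cluster definition above, and its same-slice-size approximation, can be illustrated with a small sketch. This is not the paper's implementation: it assumes a plain directed graph and computes backward slices by simple reverse reachability, ignoring the two-pass, calling-context-sensitive traversal that a real SDG slicer uses. The graph, the vertex names, and the helper names `backward_slice` and `clusters_by_slice` are all hypothetical.

```python
from collections import defaultdict


def backward_slice(graph, v):
    """Vertices that can reach v (v included) by plain reverse reachability.

    Assumption: calling context is ignored, so this is only a stand-in
    for a real two-pass SDG backward slice.
    """
    preds = defaultdict(set)
    for src, dsts in graph.items():
        for dst in dsts:
            preds[dst].add(src)
    seen, stack = {v}, [v]
    while stack:
        n = stack.pop()
        for p in preds[n]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def clusters_by_slice(graph, vertices, key):
    """Group vertices whose backward slices agree under `key`."""
    groups = defaultdict(set)
    for v in vertices:
        groups[key(backward_slice(graph, v))].add(v)
    return sorted(groups.values(), key=lambda group: sorted(group))


# Hypothetical dependence graph: b and c depend on each other and on a.
toy = {"a": ["b"], "b": ["c"], "c": ["b"], "x": ["y"], "y": []}

exact = clusters_by_slice(toy, toy, frozenset)  # same backward slice
by_size = clusters_by_slice(toy, toy, len)      # same backward slice size
```

In this toy graph the exact grouping keeps a and x apart, while the size-based grouping merges them because both slices happen to contain one vertex. That is the coincidence the text describes as making the approximation conservative.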
[Figure 3 panels: MSG (a) original; MSG (b) drop only; MSG (c) broken cluster.]

Fig. 3. The area under the MSG drops under two conditions: the slices of the cluster get smaller (center MSG), or the cluster breaks (rightmost MSG). Thus, while a reduction in area is necessary, it is not a sufficient condition for cluster breaking.

[...] paper use the percentage of the backward slices taken on the x-axis and the percentage of the entire program on the y-axis. In general the definitions laid out in the next section will work with an MSG constructed from any set of vertices. As mentioned above, for the empirical investigation presented in Section 5 the set of source-code representing vertices is used as both the slice starting points and when determining the size of a slice. Under this arrangement a cluster appears as a rectangle that is taller than it is wide.

The search considers changes in the area under the MSG, denoted $A_{MSG}$. This area is the sum of all the slice sizes that make up the MSG. Formally, if $SC$ is the set of SDG vertices representing source code then

$$A_{MSG} = \sum_{v \in SC} |b(v)|$$

As illustrated in Figure 3, a reduction in area is a necessary but not a sufficient condition for identifying a linchpin vertex. This is because there are two possible outcomes: a drop and a break. These two are illustrated by the center and right-most MSGs shown in Figure 3. Both show a reduction in area; however, the center MSG reflects only a reduction in the size of the backward slices that make up a cluster. Only the right-most MSG shows a true breaking of the cluster. These two are clear-cut extreme examples meant to illustrate the concepts of a drop and a break. In reality there are reductions that incorporate both effects. In the end, the decision whether a reduction represents a drop or a break is subjective. The detection algorithm presented in Section 4 reports all cases in which the reduction is greater than a threshold. These must then be inspected to determine if the area reduction represents a true breaking of a cluster.

From the three example MSGs shown in Figure 3, it is clear that a reduction in area must accompany the breaking of a cluster, but does not imply the breaking of a cluster. Thus, to test if a Vertex l is a linchpin, the MSG for the program is constructed while ignoring l's incoming dependence edges. If a significant reduction in area occurs, the resulting MSG can then be inspected to see if the cluster is broken. The search for linchpins is thus conducted by computing the MSG while ignoring the dependence associated with a specific vertex.

Previous work [Binkley and Harman 2009] has shown that ignoring dependence associated with vertices could identify linchpin vertices in real programs. However, this implementation is rather naïve.

[...]

$b(v) = \{u \in G \mid u \rightarrow v \text{ is a valid path}\}$
$b1(v) = \{u \in G \mid u \rightarrow v \text{ is a b1f2-valid path}\}$
$b2(v) = \{u \in G \mid u \rightarrow v \text{ is a b2f1-valid path}\}$
$f(v) = \{u \in G \mid v \rightarrow u \text{ is a valid path}\}$
$f1(v) = \{u \in G \mid v \rightarrow u \text{ is a b2f1-valid path}\}$
$f2(v) = \{u \in G \mid v \rightarrow u \text{ is a b1f2-valid path}\}$

Example. In Figure 2 let v be the vertex labeled d = d * 2, u be the vertex labeled x = x + 1, and w be the vertex labeled t = x_out. In the SDG there is a b1f2-valid path from w to v and a b2f1-valid path from u to w. This places $w \in b1(v)$, $u \in b2(w)$, and $u \in b(v)$. Symmetrically, $w \in f1(u)$, $v \in f2(w)$, and $v \in f(u)$.

As noted before, the notation is overloaded such that each of the above slicing operators can be applied to a set of vertices V. The result is the union of the slices taken with respect to each vertex of V. For example, $b(V) = \bigcup_{v \in V} b(v)$; thus $f(v) = f2(f1(v))$ and $b(v) = b2(b1(v))$.

Finally, the search for linchpin vertices makes use of path composition, denoted $p_1 \circ p_2$, where path $p_1$'s final vertex is the same as path $p_2$'s first vertex. Some path compositions yield invalid paths. The following table describes the legal and illegal compositions.
Path Combinations

1  b1f2-valid path  ∘  b1f2-valid path  →  b1f2-valid path
2  b1f2-valid path  ∘  b2f1-valid path  →  invalid path
3  b1f2-valid path  ∘  valid path       →  invalid path
4  b2f1-valid path  ∘  b1f2-valid path  →  valid path
5  b2f1-valid path  ∘  b2f1-valid path  →  b2f1-valid path
6  b2f1-valid path  ∘  valid path       →  valid path
7  valid path       ∘  b1f2-valid path  →  valid path
8  valid path       ∘  b2f1-valid path  →  invalid path
9  valid path       ∘  valid path       →  invalid path

Example. A valid path has two sections: the first (matching b2f1-valid path) includes only unmatched $)_i$'s, while the second (matching b1f2-valid path) includes only unmatched $(_i$'s. The first composition rule notes that composing two paths with only unmatched $(_i$'s leaves a path with only unmatched $(_i$'s. The second and third rules observe that following a path that includes unmatched $(_i$'s with a path that includes unmatched $)_i$'s does not yield a valid path. For example, in Figure 2 composing the b1f2-valid path that connects the vertices labeled x_in = a and x = x + 1 with the b2f1-valid path that connects the vertices labeled x = x + 1 and t = x_out results in a path that enters inc through one call site but exits through the other; this path is not a valid path. However, as seen in the table, it is always legal to prefix a path with a path that contains unmatched $)_i$'s (rules 4 and 6) and it is always legal to suffix a path with a path that contains unmatched $(_i$'s (rules 4 and 7).

Building on these definitions, Theorem 1 identifies a condition in which the strong small-impact property holds.

Theorem 1 (SMALL SLICE). Let l be a vertex from SDG G. If $|b(l)| \le \epsilon A_{MSG}/|G|$ or $|f(l)| \le \epsilon A_{MSG}/|G|$ then l satisfies the strong small-impact property.

PROOF. For both cases, the worst case situation (where there are no other connections between the vertices of $b(l)$ and $f(l)$ except through l) is illustrated in the left of Figure 6. Typically there are other connections between these vertices and thus the theorem is a conservative over-approximation of the actual reduction.
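The nine composition rules in the table above can be encoded as a lookup, which makes it easy to check a chain of path segments mechanically. The following is a minimal sketch under stated assumptions: the constants `B1F2`, `B2F1`, `VALID`, the dictionary `COMPOSE`, and the function `compose` are hypothetical names, and `None` stands for an invalid path.

```python
# Path kinds from the table; None marks an invalid composition.
B1F2 = "b1f2-valid"   # only unmatched "(" (call) edges
B2F1 = "b2f1-valid"   # only unmatched ")" (return) edges
VALID = "valid"       # a b2f1-valid section followed by a b1f2-valid section

COMPOSE = {
    (B1F2, B1F2): B1F2,    # rule 1
    (B1F2, B2F1): None,    # rule 2
    (B1F2, VALID): None,   # rule 3
    (B2F1, B1F2): VALID,   # rule 4
    (B2F1, B2F1): B2F1,    # rule 5
    (B2F1, VALID): VALID,  # rule 6
    (VALID, B1F2): VALID,  # rule 7
    (VALID, B2F1): None,   # rule 8
    (VALID, VALID): None,  # rule 9
}


def compose(first, *rest):
    """Compose path kinds left to right; return None once a step is invalid."""
    result = first
    for kind in rest:
        if result is None:
            return None
        result = COMPOSE[(result, kind)]
    return result


# The Figure 2 example: entering inc at one call site (a b1f2-valid path)
# and leaving through another (a b2f1-valid path) is rejected by rule 2.
figure2_example = compose(B1F2, B2F1)

# Unmatched returns followed by unmatched calls form a valid path (rule 4).
returns_then_calls = compose(B2F1, B2F1, B1F2, B1F2)
```

Treating rule 9 as invalid follows the table as printed; the sketch does not try to decide whether two particular valid paths could be joined in some special case.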
In the preceding theorem the area reduction caused by slices from Set 3 is not tightly bound. To establish a bound the following corollary makes use of average reduction by balancing vertices whose backward slices cannot be tightly bound with backward slices that do not change (i.e., those of Set 1). This average reduction is formalized in the first of three corollaries.

Corollary 2.1 (STRONG IMPACT COROLLARY TO THEOREM 2). Let $S_i = |\text{Set } i|/|V|$ denote the proportion of slices in Set i. If $S_3 (1-\epsilon)/\epsilon \le S_1$ then the reduction in the area under $A_{MSG \setminus \emptyset}$ is bound by $\epsilon$ and l satisfies the strong small-impact property.

This is a foundational result that underpins the algorithm's performance improvement. If the guarding predicate ($S_3 (1-\epsilon)/\epsilon \le S_1$) holds, then the strong small-impact property holds and, therefore, all vertices ignored in the search for linchpins will have little impact on dependence clusters. The term $S_3 (1-\epsilon)/\epsilon$ in the guarding predicate for Corollary 2.1 is referred to as the 'risk ratio,' because when the ratio is sufficiently small (thereby satisfying the guarding predicate) there is no risk in ignoring the associated vertices in the linchpin search. The proof establishes when the corollary holds, but empirical research is needed to determine how often the guard is satisfied, indicating that the risk ratio is sufficiently low. If this does not happen sufficiently often then the performance improvements will be purely theoretical. This empirical question therefore forms the first research question addressed in Section 5.

PROOF. The strong small-impact property requires the total area reduction to be less than $\epsilon$ percent. To show that the reduction is at most $\epsilon$, consider the largest reduction possible for each partition. Each reduction is given as a percentage of V. This yields the inequality $0 \cdot S_1 + \epsilon S_2 + 1 \cdot S_3 \le \epsilon$ (because Set 1 slices are unchanged, from Theorem 2 slices in Set 2 are bound by $\epsilon$, and slices from Set 3 can, in the worst case, include (no more than) the entire graph). The corollary follows from simplifying and rearranging this inequality as follows:

$$
\begin{aligned}
\epsilon S_2 + S_3 &\le \epsilon \\
S_3 &\le \epsilon - \epsilon S_2 \\
S_3 &\le \epsilon (1 - S_2) = \epsilon (S_1 + S_3) \quad \text{as } 1 = S_1 + S_2 + S_3 \\
S_3 &\le \epsilon S_1 + \epsilon S_3 \\
S_3 - \epsilon S_3 &\le \epsilon S_1 \\
S_3 (1-\epsilon)/\epsilon &\le S_1
\end{aligned}
$$
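The guard derived above is simple arithmetic, so a short sketch can check it on concrete proportions. The function names (`risk_ratio`, `guard_holds`, `worst_case_reduction`) and the sample proportions are illustrative assumptions, not values from the paper; `fractions.Fraction` is used so that the boundary case compares exactly.

```python
from fractions import Fraction


def risk_ratio(s3, eps):
    """Left-hand side of the guard: S3 * (1 - eps) / eps."""
    return s3 * (1 - eps) / eps


def guard_holds(s1, s3, eps):
    """Guarding predicate of Corollary 2.1: S3 * (1 - eps) / eps <= S1."""
    return risk_ratio(s3, eps) <= s1


def worst_case_reduction(s1, s2, s3, eps):
    """Worst-case average reduction from the proof: 0*S1 + eps*S2 + 1*S3."""
    return 0 * s1 + eps * s2 + 1 * s3


# Hypothetical proportions (they sum to 1) with eps = 5%.
eps = Fraction(5, 100)
s1, s2, s3 = Fraction(76, 100), Fraction(20, 100), Fraction(4, 100)

ratio = risk_ratio(s3, eps)                    # 4% * 19 = 76%
ok = guard_holds(s1, s3, eps)                  # 76% <= 76%, guard just holds
bound = worst_case_reduction(s1, s2, s3, eps)  # 1% + 4% = 5% = eps
```

With these numbers the guard holds with equality and the worst-case reduction lands exactly on $\epsilon$, which matches the final line of the derivation: the guard is the rearranged form of the worst-case inequality.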
Thus, for every vertex in Set 3 there need to be $(1-\epsilon)/\epsilon$ vertices in Set 1. For example, if $\epsilon = 5\%$ then Set 1 must be 19 times larger than Set 3. In this case, having 19 backward slices showing zero reduction and one backward slice showing (potentially) 100% reduction gives an average reduction of at most 5%. Empirically, if Set 3 is kept below a size of about 20, then the corollary holds for all but the smallest of programs.

The statement of Theorem 2 requires the existence of a single Vertex u. It is useful to extend this definition from a single vertex u to a set of vertices U. Figure 8 shows an SDG fragment where $dpp(l, v_1, u_1)$ and $dpp(l, v_2, u_2)$ hold but neither $dpp(l, v_1, u_2)$ nor $dpp(l, v_2, u_1)$ does. Thus slices $b(x)$ that include a but not b require using $u_1$, while those that include b but not a require $u_2$ (those that include both a and b can use either). However, as the following corollary shows, it is possible to use the set $U = \{u_1, u_2\}$ in place of a single vertex u.

The following corollary generalizes this requirement from a single vertex u to a collection of vertices, U. The proof makes use of the following subsets of the backward slices of G. Again Set 2 is expanded, this time to take a particular $u \in U$ into account. Note that the original subsets were partitions. This was observed to simplify the presentation of Theorem 2 and its proof. It is not strictly necessary. When considering vertex x, the sets make use of the following subset of U: $U'(x) = \{u \in U \mid u \in b(x) \text{ and there is a valid path from } u \text{ to } x \text{ that does not include } l\}$.

Set 1 - vertices x where $l \notin b(x)$
Set 2 - vertices x where $l \in b(x)$ and $U'(x) \ne \emptyset$
[...]
Efficient Identification of Linchpin Vertices in Dependence Clusters 7:17

    boolean fringe_search(Vertex l, Vertex v, depth k, percent ε)
    {
        let success = true and fail = false
        if v marked as core
            return success            // already processed v
        mark v is core
        foreach edge v → u
            if area_reduction_bound(l, u) > ε AMSG / V
                if k == 0
                    return fail
                else
                    if fringe_search(l, u, k − 1, ε) == fail
                        return fail
        return success
    }

    boolean exclude(Vertex l, depth k, percent ε)
    {
        if l is a declaration vertex
           or |b(l)| ≤ ε AMSG / V
           or |f(l)| ≤ ε AMSG / V
            return true
        clear_all_marks()
        Mark l poison
        return fringe_search(l, l, k, ε) == success
    }

Fig. 9. The linchpin exclusion algorithm.

[...] stop at l. Vertices in the set core are reachable from l along paths that contain no more than k edges (k is the function's second parameter). Finally, the vertices of the set fringe have an incoming edge from a core vertex but are not core vertices. The intent is that the fringe vertices play the role of U from Corollary 2.2 of Theorem 2.

Function area_reduction_bound shown in Figure 10 computes an upper bound on the area reduction for Vertex l where Vertex u is one of the vertices from the fringe (the set U in Corollary 2.2 of Theorem 2). In the computation, the size of Set i measures the width of the MSG impacted (i.e., the number of backward slices impacted). This is multiplied by a bound on the height (slice size) of the impact (the second multiplicand of each product). To ensure the strong version of the small-impact property, the impact of Set 3 must be included (the last line of the function). The impact of this set is ignored when using the weak small-impact property and thus the contribution of Set 3 is ignored.

Two examples are used to illustrate the algorithm. First, consider $l_1$ from Figure 4 with depth k = 0. This makes $l_1$ the only core vertex and u the only fringe vertex. In this case, Set 1 = {c, v}, Set 2.11 = {u, a, b}, Sets 2.12, 2.21, and 2.22 are all empty, and Set 3 = {$l_1$}. Furthermore $b(l) = \{l, v, c\}$, and $pb(u)$ is the same set. Thus $|b(l) - pb(u)|$ is zero. Function area_reduction_bound returns $2 \cdot 0 + 3 \cdot 0 + 0 \cdot 0 + 0 \cdot 0 + 0 \cdot 0$ when considering $A_{MSG \setminus Set3}$, and adds $1 \cdot 3$ when considering $A_{MSG}$. Thus the change in $A_{MSG \setminus Set3}$ is 0 vertices and the change in $A_{MSG \setminus \emptyset}$ is 3 vertices (15% of $A_{MSG}$). The 15% reduction for $A_{MSG \setminus \emptyset}$ is comparatively large because the SDG is very small.

As a second example, consider $l_1$ from Figure 5 with depth k = 0. This makes $l_1$ the only core vertex and $l_2$ the only fringe vertex. In this case, there are no dual paths connecting the fringe to v and c, causing a potentially large reduction. However, using depth k = 1 places $l_1$ and $l_2$ in the [...]

[...] tion to the weak small-impact property. Thus all excluded vertices are guaranteed to have a small impact and are consequently not linchpins.

Theorem 4 (ALGORITHM CORRECTNESS). If function exclude from Figure 9 returns true for Vertex l, then l satisfies the weak small-impact property.

PROOF. There are two steps. The first shows that $\{v \in G \mid dpp(l, v, u)\} \subseteq pb(u)$ and that $\{v \in G \mid dpp2(l, v, u)\} \subseteq pb2(u)$. Then the remainder of the proof shows that the fringe satisfies the requirements of the set U from the Multi-Path Impact Corollary (Corollary 2.2 of Theorem 2).

To begin with, observe that $dpp(l, v, u)$ requires a valid path from v to u that excludes l. By Definition 6 this valid path implies that $v \in b(u)$ and furthermore, because the path excludes l, $v \in pb(u)$. Thus, $\{v \in G \mid dpp(l, v, u)\} \subseteq pb(u)$. The argument that $\{v \in G \mid dpp2(l, v, u)\} \subseteq pb2(u)$ is the same except that b2f1-valid paths are used in place of (full) valid paths. These two subset containments imply that

$$
\begin{aligned}
|b(l) - pb(u)| &\le |b(l) - \{v \in G \mid dpp(l, v, U)\}| \\
|b(l) - pb2(u)| &\le |b(l) - \{v \in G \mid dpp2(l, v, U)\}| \\
|b2(l) - pb(u)| &\le |b2(l) - \{v \in G \mid dpp(l, v, U)\}| \\
|b2(l) - pb2(u)| &\le |b2(l) - \{v \in G \mid dpp2(l, v, U)\}|
\end{aligned}
$$

The second step of the proof establishes that the six sets (Set 1, Set 2.11, Set 2.12, Set 2.21, Set 2.22, and Set 3) used in Theorem 2 are equivalent to those computed at the top of function area_reduction_bound of Figure 10. For each set the argument centers on the observation that when v is in $b(x)$ then x is in $f(v)$.

For Set 1, observe that backward slices with l (i.e., those that include l) are those in $f(l)$; thus backward slices without l are those not in $f(l)$, which is the set of vertices $V - f(l)$.

For Set 2.11, first observe that f2 is the dual of b1; thus if $v \in b1(u)$ then $u \in f2(v)$. This means that all vertices whose slices include l and u during Pass 1 are in the forward Pass 2 slice of both l and u and thus in $f2(l) \cap pf2(u)$.

As with Set 2.11, for Set 2.12 all vertices whose backward slices include l during Pass 1 are in the forward Pass 2 slice of l. Set 2.12 also includes backward slices where u is included during Pass 2 but not Pass 1. These are backward slices taken with respect to the vertices in $pf(u) - pf2(u)$; thus Set 2.12 includes the vertices in $f2(l) \cap (pf(u) - pf2(u))$. The arguments for Set 2.21 and 2.22 are similar.

Finally, for a Set 3 vertex, x, the backward slice $b(x)$ includes l but not u. The vertices that include l in their slice are those of $f(l)$. Those that also include u are in Set 2; thus Set 3 is efficiently computed as $f(l) - \bigcup_i \text{Set } 2.i$.

The final step in the proof is to observe that by construction the function fringe_search identifies a set of fringe vertices that fulfill the role of the set U from the multi-path impact corollary (Corollary 2.2 of Theorem 2). Thus the average impact corollary (Corollary 2.3) of Theorem 2 implies that the average reduction for $u \in U$ from ignoring the incoming edges of l is bounded by $\epsilon$.

Corollary 4.1 (STRONG ALGORITHM). If $|core| (1-\epsilon)/\epsilon \le |\text{Set 1}|$ and exclude(l) then l satisfies the strong small-impact property.

PROOF. Theorem 4 proves that the weak small-impact property holds for $A_{MSG \setminus Set3}$. Thus only Set 3 need be considered. By construction all backward slices that encounter a core vertex, except those taken with respect to core vertices, also encounter a fringe vertex. This implies that Set 3 includes at most the core vertices. Consequently, under the assumption that $|core|$ is bound by $|core| (1-\epsilon)/\epsilon \le |\text{Set 1}|$, the strong small-impact property holds.
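The control flow of the exclusion algorithm in Figure 9 can be sketched in Python as follows. This is an illustration under several assumptions rather than the authors' implementation: the function `area_reduction_bound` of Figure 10 is not reproduced here, so the sketch takes a caller-supplied `bound(l, u)` callable in its place; the comparison operators follow the reconstruction of Figure 9 above; the 'poison' mark on l is omitted because it only matters inside the real bound computation; and the toy graph and numbers are invented.

```python
from fractions import Fraction


def fringe_search(edges, bound, l, v, k, limit, core):
    """Grow the core from v; succeed if every over-limit successor can be
    absorbed into the core within the remaining look-ahead depth k."""
    if v in core:
        return True                  # already processed v
    core.add(v)
    for u in edges.get(v, ()):
        if bound(l, u) > limit:      # u is not acceptable as a fringe vertex
            if k == 0:
                return False
            if not fringe_search(edges, bound, l, u, k - 1, limit, core):
                return False
    return True


def exclude(edges, bound, l, k, limit, b_size, f_size, is_declaration=False):
    """Return True when l can be pruned from the linchpin search.

    `limit` is the caller-computed threshold eps * A_MSG / V.
    """
    if is_declaration or b_size <= limit or f_size <= limit:
        return True                  # declaration or small-slice exclusion
    return fringe_search(edges, bound, l, l, k, limit, set())


# Hypothetical data: l -> u1 -> u2, with a large bound at u1 only.
edges = {"l": ["u1"], "u1": ["u2"], "u2": []}
toy_bounds = {"u1": 50, "u2": 1}


def toy_bound(l, u):
    """Stand-in for area_reduction_bound; values are made up."""
    return toy_bounds[u]


limit = Fraction(1, 10) * 100 / 10   # eps = 10%, A_MSG = 100, V = 10

shallow = exclude(edges, toy_bound, "l", 0, limit, b_size=5, f_size=5)
deeper = exclude(edges, toy_bound, "l", 1, limit, b_size=5, f_size=5)
```

With depth 0 the search fails at u1, whose bound exceeds the limit, so l stays a linchpin candidate. With depth 1 the search moves u1 into the core and accepts u2 as the fringe, so l is excluded, which is the effect the fringe look-ahead parameter is meant to have.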
Program             LoC     SLoC   Vertices      Edges  SVertices
fass              1,140      978      4,980     12,230        922
interpreter       1,560    1,192      3,921      9,463        947
lottery           1,365    1,249      5,456     13,678      1,004
time-1.7          6,965    4,185      4,943     12,315      1,044
compress          1,937    1,431      5,561     13,311      1,085
which             5,407    3,618      5,247     12,015      1,163
pc2c              1,238      938      7,971     11,185      1,749
wdiff.0.5         6,256    4,112      8,291     17,095      2,421
termutils         7,006    4,908     10,382     23,866      3,113
barcode           5,926    3,975     13,424     35,919      3,909
copia             1,170    1,112     43,975    128,116      4,686
bc               16,763   11,173     20,917     65,084      5,133
indent            6,724    4,834     23,558    107,446      6,748
acct-6.3         10,182    6,764     21,365     41,795      7,250
gcc.cpp           6,399    5,731     26,886     96,316      7,460
gnubg-0.0        10,316    6,988     36,023    104,711      9,556
byacc             6,626    5,501     41,075     80,410     10,151
ex2-4-7          15,813   10,654     49,580    105,954     11,104
space             9,564    6,200     26,841     74,690     11,277
prepro           14,814    8,334     27,415     75,901     11,745
oracolo2         14,864    8,333     27,494     76,085     11,812
tile-forth-2.1    4,510    2,986     90,135    365,467     12,076
EPWIC-1           9,597    5,719     26,734     56,068     12,492
userv-0.95.0      8,009    6,132     71,856    192,649     12,517
ex2-5-4          21,543   15,283     55,161    234,024     14,114
findutils        18,558   11,843     38,033    174,162     14,445
gnuchess         17,775   14,584     56,265    165,933     15,069
cadp             12,930   10,620     45,495    122,792     15,672
ed               13,579    9,046     69,791    108,470     16,533
diffutils        19,811   12,705     52,132    104,252     17,092
ctags            18,663   14,298    188,856    405,383     20,578
wpst             20,499   13,438    140,084    382,603     20,889
ijpeg            30,505   18,585    289,758    822,198     24,029
ftpd             19,470   15,361     72,906    138,630     25,018
espresso         22,050   21,780    157,828    420,576     29,362
go               29,246   25,665    144,299    321,015     35,863
ntpd             47,936   30,773    285,464  1,160,625     40,199
csurf-pkgs       66,109   38,507    564,677  1,821,811     43,044
sum             494,025  342,949  2,694,603  7,953,166    465,914
Fig. 11. Characteristics of the subject programs studied. LoC and SLoC (non-blank, non-comment Lines of Code) are source code line counts as reported by the linux utilities wc and sloc. Vertices and Edges are counts from the resulting SDG while SVertices is a count of the source-code-representing vertices. (In this and the remaining figures, programs are shown ordered by size based on SVertices.)

The subset of the vertices in the SDG that represent source code are considered as potential linchpins and counted when determining slice size. An SDG includes "pseudo" vertices that do not directly represent source code. For example, when a call to procedure P may reference a global variable, the SDG includes vertices representing the passing of the global to P. Finally, vertices are [...]

partition \ ε             1%     10%     20%
included               19.6%   12.3%   10.0%
excluded dual paths    31.2%   34.1%   35.1%
excluded small slice    9.4%   13.7%   15.0%
declarations           39.9%   39.9%   39.9%
total excluded         80.4%   87.7%   90.0%

Fig. 12. Weighted average percentage of vertices in four categories for the three values of $\epsilon$. The figures do not sum to 100% due to rounding errors.

[...] large dependence cluster [Harman et al. 2009]. Because of this large cluster, ed's slices are either large (for those vertices in the cluster) or small; thus, there is no benefit to increasing $\epsilon$ within a reasonable range. Perhaps because there are so few vertices with small slices, the search for vertices with dual paths finds the most success in ed, which also shows the most improvement with an increase in $\epsilon$.

Finally, Figures 14 and 15 show the runtime speedup. Figure 14 shows the speedups for all programs with the averages shown on the far right. The averages are shown alone in Figure 15. The average speedup for $\epsilon$ of 1%, 10%, and 20% is 8x, 14x, and 18x, respectively. This speedup directly parallels the reduction in the number of vertices that must be considered. Most programs show a similar pattern, where increasing the value of $\epsilon$ brings a clear benefit. The program pc2c (a Pascal to C converter) is an outlier as it does not gain much speedup for larger values of $\epsilon$. Looking at the source code for this program, 70% of the code is in main and the remaining functions have few parameters or local variables. The implication of this is that declaration vertices account for only 10% of the excluded vertices. This is dramatically less than the average of 40% and accounts for the overall difference.

Statistically all three versions provide a significant improvement in runtime over the naïve search (p < 0.0001 for all three tests). Because the runtimes are not normally distributed, the non-parametric Friedman's paired test is used. A paired test is appropriate because the same population of programs is used with each algorithm. Head to head the runtimes for $\epsilon$ of 10% and 20% are significantly less than that for $\epsilon$ of 1% (p = 0.0003 and p < 0.0001, respectively), and finally, the [...]

[...] and clustering of dependence at higher levels of abstraction, such as whole functions, modules, and files [Eisenbarth et al. 2003; Praditwong et al. 2011; Mitchell and Mancoridis 2006].

The study of clustering is not restricted to dependence nor to programs. There is also work on clustering of test cases [Yoo et al. 2009], but this work uses clustering to find commonality and reduce test effort. In such work, clustering is a choice and it has positive benefits. In the present paper, clustering is considered to be potentially harmful and it is not constructed through choice, but emerges from a program's dependence structure.

A specialized form of mutually dependent clusters, coherent dependence clusters, was recently introduced [Islam et al. 2010b]. Such clusters extend dependence clusters to include not only internal dependence (each statement of a cluster must depend on all the other statements of the cluster) but also external dependence. Analysis of 16 open-source programs found that 15 of them had a coherent cluster that was over 5% of the program. Visualization of coherent clusters [Islam et al. 2010a] has also been used to locate structural problems within programs.

Looking beyond programs, dependence analysis and dependence clusters are also interesting to researchers studying other dependence networks (as construed in the broadest sense). The search for linchpins is akin to rarity measurements and anomaly detection used, for example, in social networking research [Madey et al. 2003]. In graph theory terms, a social network is a directed graph composed of vertices that most often represent people and edges that represent relationships such as shared experience or common interests. An example is a graph of telephone calls across multiple customers.

Eberle and Holder [Eberle and Holder 2009] describe two related approaches. In the first, Lin and Chalupsky used rarity measurements to discover unusual links within a graph [Lin and Chalupsky 2003]. This approach assumes that a graph is built from a pattern that repeats itself over and over. It looks for subgraphs that are different. This approach is local in nature (similar to tree-based pattern matching code generators). In contrast, a dependence graph vertex (or edge) being a linchpin is not a local property and thus pattern based matching is ineffective. However, some patterns might be used to filter from consideration elements that cannot play the linchpin role. This is a topic for future investigation.

In the second approach, Rattigan and Jensen seek to identify outliers in any data set that can be represented as a graph using a statistical approach to anomalous link detection [Rattigan and Jensen 2005]. They observe that "Relational learning techniques seem especially suited to the anomaly detection problem, because structured data lend themselves to a host of possible methods for finding interesting instances in a data set." For example, in an authorship graph where vertices represent authors and links represent coauthored papers, it is useful to know when an interesting collaboration exists. The technique used finds outliers based on the Katz measure. This measure is a weighted sum of the number of paths in the graph that connect two nodes, with shorter paths being given higher weight. A similar approach could be applied to dependence graphs with all paths being given equal weight. However, paths capture transitive dependence and thus overstate the connectedness in a dependence graph.

7. FUTURE WORK

Future work will consider the following: criteria that help separate refactorable clusters from unavoidable clusters, techniques for aiding a programmer in breaking dependence clusters into smaller, more manageable clusters, empirical assessment of dependence clusters' impact on programmer comprehension, other potential causes of dependence clusters, and more efficient detection techniques. Some of these are considered in more detail in this section.

To begin with, whether linchpins can be removed from code without affecting behaviour remains to be studied. Clearly some human guided intervention will be required. It may be considered more trouble than the perceived accrued benefit in some cases. However, the knowledge of the existence and location of linchpins may be useful information in itself. Future work will explore the extent to which a tool can guide, support, and reduce human effort in refactoring code to remove the need for linchpins, thereby breaking up clusters.

[...]

MADEY, G., FREEH, V., TYNAN, R., AND HOFFMAN, C. 2003. An analysis of open source software development using social network theory and agent-based modeling. In Arrowhead Conference on Human Complex Systems. Lake Arrowhead, CA, USA.

MITCHELL, B. S. AND MANCORIDIS, S. 2006. On the automatic modularization of software systems using the bunch tool. IEEE Transactions on Software Engineering 32, 3, 193–208.

PRADITWONG, K., HARMAN, M., AND YAO, X. 2011. Software module clustering as a multi-objective search problem. IEEE Transactions on Software Engineering 37, 2, 264–282.

RATTIGAN, M. AND JENSEN, D. 2005. The case for anomalous link discovery. ACM SIGKDD Expl. News 7, 2.

REPS, T., HORWITZ, S., AND SAGIV, M. 1995. Precise interprocedural dataflow analysis via graph reachability. In ACM Symposium on Principles of Programming Languages. San Francisco, CA, Jan. 23-25.

REPS, T. AND ROSAY, G. 1995. Precise interprocedural chopping. In SIGSOFT '95: Proceedings of the Third ACM SIGSOFT Symposium on the Foundations of Software Engineering, G. E. Kaiser, Ed. ACM Press, 41–52.

REPS, T. AND YANG, W. 1988. The semantics of program slicing. Tech. Rep. Technical Report 777, University of Wisconsin.

SAVERNIK, L. 2007. Entwicklung eines automatischen Verfahrens zur Auflösung statischer zyklischer Abhängigkeiten in Softwaresystemen (in German). In Software Engineering 2007 - Beiträge zu den Workshops, Fachtagung des GI-Fachbereichs Softwaretechnik, 27.-30.3.2007 in Hamburg, W.-G. Bleek, H. Schwentner, and H. Züllighoven, Eds. LNI Series, vol. 106. GI, 357–360.

SHARIR, M. AND PNUELI, A. 1981. Two approaches to interprocedural dataflow analysis. Prentice-Hall, Englewood Cliffs, NJ.

SZEGEDI, A., GERGELY, T., BESZÉDES, Á., GYIMÓTHY, T., AND TÓTH, G. 2007. Verifying the concept of union slices on Java programs. In 11th European Conference on Software Maintenance and Reengineering (CSMR '07). 233–242.

TONELLA, P. 2003. Using a concept lattice of decomposition slices for program understanding and impact analysis. IEEE Transactions on Software Engineering 29, 6, 495–509.

WEISER, M. 1984. Program slicing. IEEE Transactions on Software Engineering 10, 4, 352–357.

YONG, S. H., HORWITZ, S., AND REPS, T. May 1999. Pointer analysis for programs with structures and casting. In Proceedings of the SIGPLAN 99 Conference on Programming Language Design and Implementation (Atlanta, GA), 91–103.

YOO, S., HARMAN, M., TONELLA, P., AND SUSI, A. 2009. Clustering test cases to achieve effective and scalable prioritisation incorporating expert knowledge. In ACM International Conference on Software Testing and Analysis (ISSTA 09). Chicago, Illinois, USA, 201–212.
