Comparing Software Metrics Tools

ABSTRACT
This paper shows that existing software metric tools interpret and implement the definitions of object-oriented software metrics differently. This delivers tool-dependent metrics results and has even im
some practical issues and sharpens the research questions. Section 4 presents the setup of our experiments. Sections 5 and 6 describe the experimental results and our interpretations for the two main questions respectively. In Section 7, we discuss threats to the validity of our study. Finally, in Section 8, we conclude our findings and discuss future work.

2. BACKGROUND

Eurocontrol develops, together with its partners, a high level design of an integrated Air Traffic Management (ATM) system across all ECAC States.¹ It will supersede the current collection of individual national systems [9]. The system architecture, called Overall ATM/CNS Target Architecture (OATA), is a UML specification.

As external consultants, we supported the structural assessment of the architecture using a metrics-based approach using our software metrics tool VizzAnalyzer. A second, internal assessment team used NTools, another software metrics tool. This redundant assessment provided the chance to uncover and avoid errors in the assessment method and tools. The pilot validation focused only on a subsystem of the complete architecture, which consisted of 8 modules and 70 classes.

We jointly defined the set of metrics which quantify the architecture quality, the subset of the UML specification (basically class and sequence diagrams) as well as a quality model for maintainability and usability.

The definitions of the metrics we first selected refer to standard literature. During implementation, we found that the metrics definitions are ambiguous, too ambiguous to be implemented in a straight-forward way, and even too ambiguous to (always) be interpreted the same way by other participants in the assessment team. We therefore jointly created a Metrics Definition Document defining the metrics and their variants that should be used, the relevant software entities, attributes and relations (actually we defined a UML and project specific meta-model), and the exact scope of the analysis.

Among other things, we learned some lessons related to software metrics. Several issues with the metrics definitions exist: Unclear and inexact definitions of metrics open up the possibility for different interpretations and implementations. Different variants of the same metric are not distinguished by name, which makes it difficult to refer to a particular variant. Well known metrics from literature are used with slight deviations or interpreted differently than suggested originally, which partially changes the meaning of the metric.

Consequently, deviations of metrics implementations in the metrics tools exist and, hence, metrics values are not comparable. More specifically, even though our VizzAnalyzer and NTools refer to the same informal metrics definitions, the results are not comparable. Despite creating a Metrics Definition Document for fixing variants in the metrics definition, it still did not solve the problem since it did not formally map the UML language to the meta-model, and the metrics definitions still used natural language and semi-formal approaches.

Most issues could be solved with the next iteration of the assessment. The UML to meta-model mapping was included, and the metrics definitions were improved. However, this required quite an effort (unrelated to the client's analysis questions but rather to the analysis method). Hence, two questions attracted our attention even after the project:

Q1 Do different software metric tools in general calculate different metrics values for the same metrics and the same input?

Q2 If yes, does this matter? More specifically, are these differences irrelevant measurement inaccuracies, or do they lead to different conclusions?

¹ European Civil Aviation Conference; an intergovernmental organization with more than 40 European states.

3. HYPOTHESES & PRACTICAL ISSUES

We want to know if the differences we observed between VizzAnalyzer and NTools in the context of the Eurocontrol project are just coincidental, or if they can also be observed in other contexts and with other software metrics tools. However, we want our approach to be both conservative wrt. the scientific method and practically relevant.

The set of tools, metrics, and test systems is determined by practical considerations. A detailed discussion of the final selection is provided in Section 4. But beforehand, we discuss the selection process. Ideally, we would install all metrics tools available, measure a random selection of software systems, and compare the results for all known metrics. But reality implies yet a number of practical limitations.

First, we cannot measure each metric with each tool, since the selection of implemented metrics differs from tool to tool. Hence, maximizing the set of metrics would reduce
the set of comparable tools and vice versa. We need to compromise and select the metrics which appear practically interesting to us. The metrics we focus our experiment on include mainly object-oriented metrics as described in the metrics suites of, e.g., Chidamber & Kemerer, Li and Henry, et al. [5,24].

Second, the availability of the metrics tools is limited, and we cannot know of all available tools. We found the tools after a thorough search on the internet, using the standard search engines and straight-forward search terms. We also followed references from related work. Legal restrictions, the programming languages the tools can analyze, the metrics they are capable to calculate, the size of the systems they can be applied on, and the data export functions pose further restrictions on the selection of tools. As a consequence, we selected only tools available without legal restrictions and which were meaningful to compare, i.e., those tools can analyze the same systems with the same metrics.

Third, further limitations apply to the software systems analyzed. We obviously cannot measure all available systems; there are simply too many. Also, legal restrictions limit the number of suitable systems. Most metrics tools need the source code and, therefore, we restricted ourselves to open source software as available on SourceForge.NET.² Additionally, the available software metrics tools limited the programming languages virtually to either Java or C/C++.

Finally, we cannot compare all metrics values of all classes of all systems to a "gold standard" deciding on the correctness of the values. Such a "gold standard" simply does not exist and it is impossible to compute it since the original metrics definitions are too imprecise. Thus, we restrict ourselves to test whether or not there are tool dependent differences of the metrics values.

Considering the limitations, we scientifically assess our research question Q1 by invalidating the following hypothesis:

² http://sourceforge.net
analysis application that extracts dependency graphs and mines them for useful information. This application comes as a command-line tool, a Swing-based application, a web application, and a set of Ant tasks.⁷

Eclipse Metrics Plug-in 1.3.6 by Frank Sauer is an open source metrics calculation and dependency analyzer plugin for the Eclipse IDE. It measures various metrics and detects cycles in package and type dependencies.⁸

Eclipse Metrics Plug-in 3.4 by Lance Walton is open source. It calculates various metrics during build cycles and warns, via the Problems View, of metrics 'range violations'.⁹

OOMeter is an experimental software metrics tool developed by Alghamdi et al. It accepts Java/C# source code and UML models in XMI and calculates various metrics [1].

Semmle is an Eclipse plug-in. It provides an SQL-like querying language for object-oriented code, which allows one to search for bugs, measure code metrics, etc.¹⁰

Understand for Java is a reverse engineering, code exploration and metrics tool for Java source code.¹¹

VizzAnalyzer is a quality analysis tool. It reads software code and other design specifications as well as documentation and performs a number of quality analyses.¹²

⁷ http://depfind.sourceforge.net
⁸ http://sourceforge.net/projects/metrics
⁹ http://eclipse-metrics.sourceforge.net
¹⁰ http://semmle.com
¹¹ http://www.scitools.com
¹² http://www.arisa.se

4.2 Metrics Selection

The metrics we selected are basically the "least common denominator", the largest common subset of the metrics assessable by all selected software metrics tools.

We created a list of all metrics which can be calculated by any of the tools considered. It turned out that the total number of different metrics (different by name) is almost 200. After carefully reading the metrics descriptions, we found that these different names seem to describe 47 different metrics. Matching them was not always straightforward, and in some cases it is nothing but a qualified guess. Those 47 metrics work on different program entities, e.g., method, class, package, program, etc.

We considered metrics as comparable only when we were certain that the same concepts were meant. Further, we selected "class" metrics only, since the class is the natural unit of object-oriented software systems and most metrics have been defined and calculated on class level. This left 17 object-oriented metrics which (i) we could rather securely assign to the same concept, (ii) are known and defined in literature, and (iii) work on class level. Of these metrics, we selected 9 which most of the 10 remaining software metric tools can calculate. The tools and metrics are shown in Figure 1. A cross ("x") marks that a metric can be calculated by the corresponding metrics tool.

Figure 1: Tools and metrics used in evaluation

A brief description of the metrics finally selected follows:

CBO (Coupling Between Object classes) is the number of classes to which a class is coupled [5].

DIT (Depth of Inheritance Tree) is the maximum inheritance path from the class to the root class [5].

LCOM-CK (Lack of Cohesion of Methods) (as originally proposed by Chidamber & Kemerer) describes the lack of cohesion among the methods of a class [5].

LCOM-HS (Lack of Cohesion of Methods) (as proposed by Henderson-Sellers) describes the lack of cohesion among the methods of a class [12].

LOC (Lines Of Code) counts the lines of code of a class [13].

NOC (Number Of Children) is the number of immediate subclasses subordinated to a class in the class hierarchy [5].

NOM (Number Of Methods) is the number of methods in a class [12].

RFC (Response For a Class) is the set of methods that can potentially be executed in response to a message received by an object of the class [5].

WMC (Weighted Methods per Class) (using Cyclomatic Complexity [34] as method weight) is the sum of weights for the methods of a class [5].

Providing an unambiguous definition of these metrics goes beyond the scope of this paper. For details about the metric definitions, please refer to the original sources of the metrics. These often do not go far beyond the description given above, making it difficult to infer the complexity of the metrics and what it takes to compute them. This situation is part of the problem we try to illuminate. We discuss the ambiguity of metrics, its consequences and possible solutions in [25] and provide unambiguous metrics definitions in [26].

4.3 Software Systems Selection

With the selection of software metrics tools, we limited ourselves to test systems written in Java (source and byte code). SourceForge.NET provides a large variety of open source software projects. Over 30,000 are written in Java, and it is possible to search directly for Java programs of all kinds. Thus, we downloaded about 100 software projects which we selected more or less randomly. We tried to get a large variety of projects from different categories in the SourceForge classification. We preferred programs with a high ranking according to SourceForge, since we assumed that these programs have a larger user base, hence relevance.

We chose to analyze projects in different size categories. Because of the limited licenses of some commercial tools, we quite arbitrarily selected sizes of about 5, 50 and 500 source files. From the samples we downloaded, we randomly selected the final sample of three Java programs, one for each of our size categories. We do not expect that the actual program size affects the results of our study, but we prefer to work on diverse samples. The programs selected are:

Jaim implements the AOL IM TOC protocol as a Java library. The primary goal of JAIM is to simplify writing AOL bots in Java, but it could also be used to create Java based AOL clients. It consists of 46 source files. Java 1.5.¹³

jTcGUI is a Linux tool for managing TrueCrypt volumes. It has 5 source files. Java 1.6.¹⁴

ProGuard is a free Java class file shrinker, optimizer, and obfuscator. It removes unused classes, fields, methods, and attributes. It then optimizes the byte-code and renames the remaining classes, fields, and methods using short meaningless names. It consists of 465 source files. Java 1.5.¹⁵

¹³ http://sourceforge.net/projects/jaimlib

4.4 Selected Client Analysis

We reuse some of the metrics selected to define a client analysis answering question Q2. We apply a software quality model for abstracting from the single metrics values to a maintainability value, which can be used to rank the classes in a software system according to their maintainability. As basis for the software quality model we use Maintainability as one of the six factors defined in ISO 9126 [16,17]. We use four of its five criteria: Analyzability, Changeability, Stability, and Testability, and omit Compliance.

In order to be able to use the software quality model with all tools, we can only include metrics which are calculated by all tools. We should also have as many metrics as possible: we should have at least one coupling, one cohesion, one size, and one inheritance metric included to address the biggest areas of quality-influencing properties, as already suggested by Bär et al. in [3]. We further involve as many tools as possible. Maximizing the number of tools and metrics involved, we came to include 4 tools and 5 metrics. The tools are: Analyst4j, C&K Java Metrics, VizzAnalyzer, and Understand for Java. The metrics involved are: CBO, a coupling metric; LCOM-CK, a cohesion metric; NOM, an (interface) size metric; and DIT and NOC, inheritance metrics.

The composition of the quality model should not have a large influence on the results, as long as it is the same for each tool and project. The relations and weighting of metrics to criteria (Figure 2) can be regarded as arbitrary.

Figure 2: ISO 9126 based software quality model

The table can be interpreted in the following way: The factor Maintainability (first row) is described by its four criteria: Analyzability, Changeability, Stability and Testability to equal parts (weight 1, second row). The individual criteria (third row) depend on the assigned metrics (last row) according to the specified weights (weight 1 or 2, fourth row). The mapping from the metrics values to the factors is by the percentage of classes being outliers according to the metrics values. Being an outlier means that the value is within the highest/lowest 15% of the value range defined by all classes in the system (self referencing model). Thus, the metrics values are aggregated and abstracted by the factors and the applied weights to the maintainability criteria, which describe the percentage of classes being outliers in the system, thus having bad maintainability. The
value range is from 0.0 to 1.0 (0-100%), meaning that 0.0 is the best possible maintainability, since there are no outliers (metric values exceeding the given threshold relative to the other classes in the system), and 1.0 being the worst possible maintainability, since all metrics values for a class exceed their thresholds.

For example, if a class A has a value for CBO which is within the upper 15% (85%-100%) of the CBO values for all other classes in the system, and the other 4 metrics are not exceeding their thresholds, this class would have an Analyzability of 2/9, Changeability of 2/10, Stability of 2/7, and Testability of 2/9. This would result in a maintainability of 23.3% ((2/9 + 2/10 + 2/7 + 2/9)/4).

¹⁴ http://sourceforge.net/projects/jtcgui
¹⁵ http://sourceforge.net/projects/proguard

5. ASSESSMENT OF Q1/H1

5.1 Measurement and Data Collection

For collecting the data, we installed all 10 software metrics tools following the provided instructions. There were no particular dependencies or side effects to consider. Some tools provide a graphical user interface, some are stand-alone tools or plug-ins to an integrated development environment, others were command-line tools. For each of the tools being plug-ins to the Eclipse IDE we chose to create a fresh installation of the latest Eclipse IDE (3.3.1.1) to avoid confusing the different tools in the same Eclipse installation.

The test software systems were stored in a designated area so that all tools were applied on the same source code. In order to avoid unwanted modifications by the analyzing programs or measurement errors because of inconsistent code, we set the source code files to read-only, and we made sure that the software compiled without errors.

Once the tools were installed and the test software systems ready, we applied each tool to each system. We used the tool specific export features to generate intermediate files containing the raw analysis data. In most cases, the exported information contained the analyzed entities id plus the calculated attributes, which were the name and path to the class, and the corresponding metrics values. In most cases, the exported information could not be adjusted by configuring the metrics tools, so we had to filter the information prior to data analysis. Some tools also exported summaries about the metric values and other analysis results which
we ignored.

Most tools generated an HTML or XML report, others presented the results in tables, which could be copied or dumped into comma separated files. We imported the generated reports into MS Excel 2002. We stored the results for each test system in a separate Excel workbook, and the results for each tool in a separate Excel sheet. All this required mainly manual work.

All the tables containing the (raw) data have the same layout. The header specifies the properties stored in each column: Class and Metrics. Class stores the name of the class for which metrics have been calculated. We removed package information because it is not important, since there are no classes with the same name, and we could match the classes unambiguously to the sources. Metrics contains the metrics values calculated for the class as described in the previous section (CBO, DIT, LCOM-CK, LCOM-HS, LOC, NOC, NOM, TCC, WMC).

Figure 3: Differences between metrics tools for project jTcGUI

Figure 4: Differences between metrics tools for project Jaim

5.2 Evaluation

Looking at some of the individual metrics values per class, it is easily visible that there are differences in how the tools calculate these values. To get a better overview, we created pivot tables showing the average, minimum and maximum values per test system and metrics tool. If all tools delivered the same values, these aggregates would be identical across tools. Looking at Figure 3, Figure 4, and Figure 5, we can recognize that there are significant differences for some metrics between some of the tools in all test systems.

Looking at jTcGUI (Figure 3), we see that the average of the 5 classes of the system for the metric CBO varies between 1.0 as the lowest value (VizzAnalyzer) and 17.6 as the highest value (Understand for Java). This can be observed in a similar manner in the other two software systems. Thus, the tools calculate different values for these metrics. On the other hand, looking at the NOC metric, we observe that all tools calculate the same values for the classes in this project. This can also be observed in Jaim (Figure 4), but not in ProGuard (Figure 5), where we observed some differences. C&K Java Metrics and Dependency Finder average to 0.440, CCCC to 1.495, Eclipse Metrics Plug-in 1.3.6 to 0.480, and Semmle, Understand for Java, and VizzAnalyzer to 1.489.

Our explanation for the differences between the results for the CBO and the NOC metrics is that the CBO metric is much more complex in its description, and therefore it is easier to implement variants of one and the same metric, which leads to different results. The NOC metric is pretty straightforward to describe and to implement, thus the results are much more similar. Yet, this does not explain the differences in the ProGuard project.

Summarizing, we can reject our hypothesis H1, and our research question Q1 should therefore be answered with: Yes, there are differences between the metrics measured by different tools given the same input.

Figure 5: Differences between metrics tools for project ProGuard

5.3 Analysis

As shown in the previous section, there are a number of obvious differences among the results of the metrics tools. It would be interesting to understand why there are differences, i.e., what are the most likely interpretations of the metrics tool developers that lead to the different results (assuming that all results are intentional, not due to bugs in the tools). Therefore, we try to explain some of the differences found. For this purpose, we picked the class TableModel from the jTcGUI project. This class is small enough to manually apply the metrics definitions and variants thereof. We ignored TCC and LCOM-HS because they were only calculated by 2 and 3 tools, respectively. For the remaining 7 metrics and for each metrics tool, we give the metrics values and provide our explanation.

Coupling metrics (CBO, RFC) calculate the coupling between classes. Decisive factors are the entities and relations in the scope and their types, e.g., class, method, constructor, call, access, etc. Analyst4j calculates CBO 4 and RFC 12. These values can be explained by API classes being part of the scope. These are all imported classes, excluding classes from java.lang (String and Object). Constructors count as methods, and all relations count (including method and constructor invocations). Understand for Java and CCCC calculate CBO 5 and 8, resp. It appears to be the same as for Analyst4j, but they seem to include both String and Object as referenced classes. Additionally, CCCC also seems to include the primitive types int and long. C&K Java Metrics calculates CBO 1 and RFC 14. The CBO value can be explained if the API classes are not in the scope. This means that only the coupling to the source class TrueCrypt is considered. On the other hand, for an RFC of 14, the API classes as well as the default constructor, which is present in the byte code analyzed, need to be included. Semmle calculates RFC 8. This value can be explained if the API is not in scope, and if the constructor is also counted as a method. VizzAnalyzer calculates CBO 1 and RFC 6, meaning that the API is not in scope, and the constructor does not count as a method.

Cohesion metrics (LCOM) calculate the internal cohesion of classes. Decisive factors are the entities and relations within the class and their types, e.g., method, constructor, field, invokes, accesses, etc. Analyst4j, C&K Java Metrics, Eclipse Metrics Plug-in 3.4, and Understand for Java calculate LCOM-CK 0.8, 1.0, 0, and 73, resp. We cannot explain how these values are calculated. Understand for Java calculates some kind of percentage. Semmle calculates LCOM-CK 7. This does not match our interpretation of the metric definition provided by the tool vendor, and we cannot explain how this value is calculated. VizzAnalyzer calculates LCOM-CK 4. This value can be explained if the API is not in scope, and LCOM is calculated as the number of method pairs not sharing fields minus the number of method pairs sharing fields, considering unordered method pairs.

Inheritance metrics (DIT) quantify the inheritance hierarchy of classes. Decisive factors are the entities and relations in the scope and their types, e.g., class, interface, implements, extends, etc. Analyst4j, C&K Java Metrics, Eclipse Metrics Plug-in 1.3.6, Semmle, and Understand for Java calculate DIT 2. These values can be explained if the API classes (Object and AbstractTableModel) are in scope, starting counting at 0 at Object and calculating DIT 2 for TableModel, which is source code. CCCC and Dependency Finder calculate DIT 1. These values can be explained if the API classes are not in scope, starting counting with 1 (TableModel, DIT 1). VizzAnalyzer calculates DIT 0. This value can be explained if the API classes are not in scope, starting counting with 0 (TableModel, DIT 0).

Size and Complexity metrics (LOC, NOM, WMC) quantify structural and textual elements of classes. Decisive factors are the entities and relations in the scope and their types, e.g., source code, class, method, loops and conditions, contains relations, etc. The compilation unit implementing the class TableModel has 76 lines. Dependency Finder calculates LOC 30. This can be explained if it counts only lines with statements, i.e., field declarations and method bodies, from the beginning of the class declaration (line 18) to the end of the class declaration (closing }, line 76), excluding method declarations or any closing }. Semmle calculates LOC 50. This can be explained if it counts non-empty lines from the beginning of the class declaration (line 18) to the end of the class declaration (closing }). Understand for Java calculates LOC 59, meaning it counts all lines from line 18 to 76. VizzAnalyzer calculates LOC 64 and thus counts from line 13 (class comment) to line 76, i.e., the full class declaration plus class comments.

Analyst4j, C&K Java Metrics, CCCC, Dependency Finder, Eclipse Metrics Plug-in 1.3.6, Semmle, and Understand for Java all calculate NOM 6. The values can be explained if all methods and constructors are counted. VizzAnalyzer calculates NOM 5, thus it counts all methods excluding constructors.

Analyst4j calculates WMC 17. We cannot explain it, but we assume it includes constructors and might count each if and else. VizzAnalyzer, Eclipse Metrics Plug-in 3.4 and Eclipse Metrics Plug-in 1.3.6 calculate WMC 13, 15 and 14, resp. These values can be explained if they include the constructor (not VizzAnalyzer) and count 1 for every method, if, do, for, while, and switch. Eclipse Metrics Plug-in 3.4 might count, in addition, the default statements.

Although we cannot exclude bugs in the tools, we recognized two main reasons for differences in the calculated values: First, the tools operate on different scopes, that is, some consider only the source code, others include the surrounding libraries or APIs. Second, there are differences in how metrics definitions are interpreted, e.g., some tools count constructors as methods, others do not; some start counting with 1, others with 0; some express values as percentage, others as absolute values, etc.

6. ASSESSMENT OF Q2/H2

In Section 5.2, we answered our first research question with yes. We now proceed with answering research question Q2: are the observed differences really a problem?

6.1 Measuring and Data Collection

We can reuse the data collected by the metrics tools and the metrics and systems from stage one of our case study as input to our client analysis (see Section 4.4). We just add new columns for the factors and criteria of the software quality model and sort according to maintainability. If several classes receive the same value, we sort using the CBO and LCOM-CK values as the second and third sorting criteria. For jTcGUI, we select all 5 classes for comparison; for Jaim and ProGuard, we select the "top 10" classes.

6.2 Evaluation and Analysis

The "top 10 (5)" classes identified by the different tools in each project show tool dependent differences. Figures 6, 7, and 8 present the results as tables. Since there is no correct ranking or "gold standard", we compared each tool with all other tools. Once more, there is no "right or wrong"; we just observe differences in the rankings due to the different input metrics values computed by the different metrics tools.

Figures 6, 7, and 8 describe the "top 5 or 10" classes for jTcGUI, Jaim and ProGuard as selected/ranked, based on the metrics data collected by each tool. Rank describes the order of the classes as described in the previous section, i.e., Rank 1 has the lowest maintainability (the highest maintainability, CBO, and LCOM-CK values), Rank 2 the second lowest, and so on. The Code substitutes the class names with letters a-z for easier reference. The names of the classes are presented next to the substitution code. The first row is labeled with the tool name and sort reference.

Looking at Figure 6, we can recognize some small variations in the ranking for jTcGUI. Tools A and D get the same result. Tools B and C get the same result, which is slightly different from the ranking proposed by Tools A and D.
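The aggregation and ranking procedure of Sections 4.4 and 6.1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-metric weights in `WEIGHTS` are hypothetical (Figure 2 is not reproduced here) and were chosen only so that CBO weighs 2 and the criterion totals are 9, 10, 7 and 9, as in the worked example of Section 4.4. The function names and the dictionary layout are likewise assumptions.

```python
# Hypothetical weights (assumption): only the CBO weight of 2 and the row
# totals 9, 10, 7, 9 are taken from the worked example in Section 4.4.
WEIGHTS = {
    "Analyzability": {"CBO": 2, "LCOM-CK": 2, "NOM": 2, "DIT": 2, "NOC": 1},
    "Changeability": {"CBO": 2, "LCOM-CK": 2, "NOM": 2, "DIT": 2, "NOC": 2},
    "Stability":     {"CBO": 2, "LCOM-CK": 1, "NOM": 1, "DIT": 2, "NOC": 1},
    "Testability":   {"CBO": 2, "LCOM-CK": 2, "NOM": 2, "DIT": 2, "NOC": 1},
}


def maintainability(outlier_metrics):
    """Return a value in [0.0, 1.0]; 0.0 is best, 1.0 is worst.

    outlier_metrics is the set of metric names for which the class is an
    outlier. Each criterion is the weighted share of outlier metrics, and
    the four criteria contribute in equal parts.
    """
    scores = []
    for weights in WEIGHTS.values():
        exceeded = sum(w for metric, w in weights.items()
                       if metric in outlier_metrics)
        scores.append(exceeded / sum(weights.values()))
    return sum(scores) / len(scores)


def rank_classes(classes):
    """Sort classes so that Rank 1 (index 0) is the least maintainable.

    classes is a list of dicts with the keys 'name', 'maintainability',
    'CBO' and 'LCOM-CK'. Ties in maintainability are broken by CBO, then
    by LCOM-CK, highest value first.
    """
    return sorted(
        classes,
        key=lambda c: (c["maintainability"], c["CBO"], c["LCOM-CK"]),
        reverse=True,
    )


# Class A from Section 4.4: only CBO is an outlier.
example = maintainability({"CBO"})
print(f"{example:.1%}")
```

With these assumed weights, a class whose only outlier metric is CBO obtains (2/9 + 2/10 + 2/7 + 2/9)/4, which matches the 23.3% of the worked example after rounding.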
Figure 9: Distance between rankings, jTcGUI

To further analyze this observation, we use the "Code" for each class to form a string describing the ranking of the classes. Thus, "abcde" corresponds to the ranking "Gui, TrueCryptGui, Password, TrueCrypt, and TableModel". In the context of the client analysis, this means that one should start refactoring the class with the lowest maintainability, which is "Gui", then "TrueCryptGui", etc. Using these substitution strings, we can easily compare them and describe their difference as numeric values, i.e., as edit distance and disjunct sets. We selected the Damerau-Levenshtein Distance [4,7,23] for expressing the edit distance between two strings, thus quantifying differences in the ranking over the same classes. A value of 0 means the strings are identical; a value larger than 0 describes the number of operations necessary to transform one string into another, and thus the difference between the two provided strings (in our case, the order of the given classes). The higher the value, the more different are the calculated rankings. The maximum edit distance is the length of the strings (in our cases 5 or 10), meaning that the compared sets of classes have almost nothing in common regarding contained classes or order.

We also measure how disjunct the provided rankings are as the percentage of classes which the two rankings do not have in common. More formally, c is the number of classes which are in both sets being compared (ranking 1 and ranking 2), and n is the number of classes which they can possibly have in common:

Disjunct = (1 - (c/n)) × 100%

Figures 9, 10, and 11 provide an overview of the differences between the rankings provided by the four tools per project. For jTcGUI (Figure 9), we observe just small differences in the ranking of the classes. The biggest differences between tools correspond to a Damerau-Levenshtein Distance of 2. The disjunct set is always 0%, since all classes of the system are considered.
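The two comparison measures described above can be sketched in a few lines. The paper does not state which variant of the Damerau-Levenshtein Distance was used; the sketch below assumes the restricted variant (often called optimal string alignment), in which an adjacent transposition counts as one operation and no substring is edited twice. The function names and the sample ranking strings are hypothetical and are not taken from Figures 6 to 8.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions and transpositions of two
    adjacent characters needed to turn a into b.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def disjunct(ranking1: str, ranking2: str) -> float:
    """Percentage of classes the two rankings do not have in common.

    Assumes both rankings have the same length n and contain no repeated
    class codes, so n is the number of classes they could share.
    """
    n = len(ranking1)
    c = len(set(ranking1) & set(ranking2))
    return (1 - c / n) * 100


# Hypothetical rankings: the second one swaps the classes ranked 3 and 4.
print(osa_distance("abcde", "abdce"))  # one adjacent transposition
print(disjunct("abcde", "abdce"))      # same five classes in both rankings
```

Because every class is encoded as a single letter, a swap of two neighbouring ranks costs 1 rather than 2, which is the practical reason to prefer a transposition-aware distance over the plain Levenshtein distance for comparing rankings.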
Figure 10: Distance between rankings, Jaim

For Jaim (Figure 10), we observe much bigger differences in the rankings of the classes. The biggest differences are between the tools having a distance value of 9 and a disjunct set of 80%. Since the system has 46 classes of which we include 10 in our "top 10", it is possible that not only the order changes, but that other classes are considered in comparison to other tools. It is noticeable that all metrics tools select the same least maintainable class, JaimConnection.

Figure 6: Ranking of jTcGUI classes according to maintainability per tool

Figure 7: Ranking of Jaim classes according to maintainability per tool

For ProGuard (Figure 11), we again observe differences in the rankings of the classes. The biggest differences are between the tools having a distance value of 10 and a disjunct set of 70%. Since the system has 486 classes of which we include 10 in our "top 10", it is possible that not only the order changes, but that other classes are considered in comparison to other tools. It is notable that three of the four metrics tools select the same least maintainable class, SimplifiedVisitor. Understand for Java ranks it second.

Figure 11: Distance between rankings, ProGuard

More precisely, we found differences in the order and composition of the classes selected as least maintainable for all four tools in all three projects. The differences between the tool pairs varied, but they are significant especially in the larger projects. Regarding our fictive task, the software engineers and managers would have been presented with different sets of classes to focus their efforts on. We can only speculate about the consequences of such tool-dependent decisions.

Summarizing, we can reject our hypothesis H2, and our research question Q2 should therefore be answered with: Yes, it does matter and might lead to different conclusions.

7. VALIDITY EVALUATION

We have followed the design and methods recommended by Robert Yin [35]. For supporting the validity, we now discuss possible threats to:

Construct Validity is about establishing correct operational measures for the concepts being studied. To ensure construct validity, we assured that there are no varying factors other than the software metrics tools which influence the outcome of the study. We selected an appropriate set of metrics and brought only those metrics into relation where we had a high confidence that other experienced software engineers or researchers would come to the same conclusion, given that metrics expressing the same concept might have different names. We assured that we ran the metrics tools on identical source code. Further, we assumed that the limited selection of three software projects of the same programming language still possesses enough statistical power to generalize our conclusions. We randomized the test system selection.

Internal Validity is about establishing a causal relationship, whereby certain conditions are shown to lead to certain other conditions, as distinguished from spurious relationships. We believe that there are no threats to internal validity, because we did not try to explain causal relationships, but rather dealt with an exploratory study. The possibility for interfering was limited in our setting. There were no human subjects which could have been influenced, which could have led to different results depending on the time or person of the study. The influence on the provided test systems and the investigated software metrics tools was limited. The variation points like data extraction and analysis left only very little room for changes.

External Validity deals with the problem of knowing if our findings are generalizable beyond the immediate case study. We included the most obvious software metrics tools available on the internet. These should represent a good deal of the tools used in practice. We are aware that there is likely a much larger body of tools, and many companies might have developed their own tools. It was necessary to greatly reduce the number of tools and metrics considered in order to obtain results that could allow for reasonable comparisons. Four tools and five metrics applied to three different systems are, frankly speaking, not very representative of the space of possibilities. Yet, we think the selection and problems uncovered are representative enough to indicate a general problem, which should stimulate additional research including tests of statistical significance. The same holds for the selection of software projects measured. We see no reason why other projects should allow for different conclusions than the three systems we analyzed, and the programming language should have no impact. The selected metrics could include a potential threat. As we have seen in Section 5, some metrics, like NOC, tend to be rather stable over the used tools. We only investigated object-oriented metrics. Other metrics, like the Halstead metrics [10] implemented by some of the tools, might behave differently. Yet, object-oriented metrics are among the most important metrics in use nowadays. The imaginary task and the software quality model used for abstracting the metrics values could be irrelevant in practice. We spent quite some thought on defining our fictive task, and considering the experiences we had, e.g., with Eurocontrol, and the reengineering tasks described by Bär et al. in the FAMOOS Handbook of Re-engineering [3], we consider it as quite relevant. The way we applied software quality models is nothing new; it has been described in one form or another in literature [21,19,22,8,20].

Figure 8: Ranking of ProGuard classes according to maintainability per tool

Reliability assures that the operations of a study (such as the data collection procedures) can be repeated yielding the same results. The reliability of a case study is important. It shall allow a later investigator to come to the same findings and conclusions when following the same procedure. We followed a straightforward design, thus simplicity should support reliability. We documented all important decisions and intermediate results, like the tool selection, the mapping from the tool specific metrics names to our conceptual metrics names, as well as the procedures for the analysis. We minimized our impact on the used artifacts and documented any modifications. We described the design of the experiments including the subsequent selection process.

8. CONCLUSION AND FUTURE WORK

Software engineering practitioners (architects, developers, managers) must be able to rely on scientific results. Especially research results on software quality engineering and metrics should be reliable. They are used during forward-engineering, to take early measures if parts of a system deviate from the given quality specifications, or during maintenance, to predict effort for maintenance activities and to identify parts of a system needing attention.

In order to provide these reliable scientific results, quite some research has been conducted in the area of software metrics. Some of the metrics have been discussed and reasoned about for years, but only few metrics have even been validated experimentally to have correlations with certain software qualities, e.g., maintainability [24]. Refer to [25] for an overview of software quality metrics and quality models.

Moreover, software engineering practitioners should be able to rely on the tools implementing these metrics, to support them in quality assessment and assurance tasks, to allow them to quantify software quality, and to deliver the information needed as input for their decision making and engineering processes. Nowadays a large body of software metrics tools exists. But these are not the tools which have been used to evaluate the software metrics. In order to rest on the scientific discussions and validations, i.e., to safely apply the results and to use them in practice, it would be necessary that all metrics tools implement the suggested metrics the way they have been validated.

Yet, we showed that metrics tools deliver different results given the same input and, hence, at least some tools do not implement the metrics as intended. To show this, we collected output for a set of nine metrics calculated by ten different metric tools on the same three software systems. We found that, at least for these investigated software metrics, tool-dependent differences exist. Still, for certain metrics, the tools delivered similar results. For rather simple metrics, like the Number of Children (NOC), most tools computed the same or very similar results. For other metrics, e.g., the Coupling Between Object Classes (CBO) or Lack of Cohesion of Methods (LCOM), the results showed a much bigger variation. Overall, we can conclude that most tools provided different results for the same metrics on the same input.

In an attempt to explain our observations, we carefully analyzed the differences for selected classes and found (in most cases) reasonable explanations. Variations in the results were often related to different scopes that metrics were applied to and differences in mapping the extracted programming language constructs to a meta-model used in measurement. E.g., the tools in- or excluded library classes or inherited features in their measurements. Hence, it could be concluded that metrics definitions should include exact scope and
language mapping definitions.

Minor differences in the metrics values would not be a problem if the interpretation of the values led to the same conclusions, i.e., if software engineering practitioners would be advised to act in a similar way. Since interpretation is an abstraction, this could still be possible. Actually, our assumption was that the differences observed in metrics values would be irrelevant after this abstraction.

To confirm our assumption, we defined a client analysis, which abstracted from the metrics values using a software quality model. The resulting maintainability values were interpreted to create a ranking among the measured classes. Software engineers could have been advised to attend to these classes according to their order. We found that even after abstraction, the two larger projects showed considerable differences in the suggested ordering of classes. The lists of the top 10 ranked classes differed by up to 80% for some tool pairs and the same software systems.

Our final conclusions are that, from a practical point of view, software engineers need to be aware that the metrics results are tool dependent, and that these differences change the advice the results imply. In particular, metrics-based results cannot be compared when using different metrics tools. From a scientific point of view, validations of software metrics turn out to be even more difficult. Since metrics results are strongly dependent on the implementing tools, a validation only supports the applicability of some metrics as implemented by a certain tool. More effort would be needed in specifying the metrics and the measurement process to make the results comparable and generalizable.

Regarding future
work, more case studies should repeat our study for additional metrics, e.g., Halstead metrics [10], and for further programming languages. Moreover, a larger base of software systems should be measured to increase the practical relevance of our results. Additionally, an in-depth study should seek to explain the differences in the measurement results, possibly describing the metrics variants implemented by the different tools. Furthermore, with the insights gained, metrics definition should be revised.

Finally, we or other researchers should revise our experimental hypotheses, which have been stated very narrowly. We expected that all the tools provide the same metrics values and same results for client analyses, so that they can be literally interpreted in such a way that they do not require tests of statistical significance. Restating the hypotheses to require such tests, in order to get a better sense of how bad the numbers for the different tools really are, is additional future work supporting the generalization of our results.

9. ACKNOWLEDGMENTS

We would like to thank the following companies and individuals for kindly supplying us with evaluation licenses for the tools provided by their companies: CodeSWAT Support for Analyst4j. Oliver Wihler, Aqris Software AS, for RefactorIT, even though the tool could not be made available in time. Rob Stuart, Customer Support M Squared Technologies, for Resource Standard Metrics Tool (Java). Olavi Poutannen, Testwell Ltd, for CMTJava. ARiSA AB for the VizzAnalyzer tool. We also thank our colleague Tobias Gutzmann for reviewing our paper.

10. REFERENCES

[1] J. Alghamdi, R. Rufai, and S. Khan. Oometer: A software quality assurance tool. Software Maintenance and Reengineering, 2005. CSMR 2005. 9th European Conference on, pages 190-191, 21-23 March 2005.
[2] Aqris software. http://www.aqris.com/.
[3] H. Bär, M. Bauer, O. Ciupke, S. Demeyer, S. Ducasse, M. Lanza, R. Marinescu, R. Nebbe, O. Nierstrasz, M. Przybilski, T. Richner, M. Rieger, C. Riva, A. Sassen, B. Schulz, P. Steyaert, S. Tichelaar, and J. Weisbrod. The FAMOOS Object-Oriented Reengineering Handbook, Oct. 1999.
[4] G. V. Bard. Spelling-error tolerant, order-independent pass-phrases via the Damerau-Levenshtein string-edit distance metric. In ACSW '07: Proc. of the 5th Australasian symposium on ACSW frontiers, pages 117-124, Darlinghurst, Australia, 2007. ACS, Inc.
[5] S. R. Chidamber and C. F. Kemerer. A Metrics Suite for Object-Oriented Design. IEEE Transactions on Software Engineering, 20(6):476-493, 1994.
[6] Clarkware consulting inc. http://www.clarkware.com/.
[7] F. Damerau. A technique for computer detection and correction of spelling errors. Comm. of the ACM, 1964.
[8] R. G. Dromey. Cornering the Chimera. IEEE Softw., 13(1):33-43, 1996.
[9] EUROCONTROL. Overall Target Architecture Activity (OATA). http://www.eurocontrol.be/oca/public/standard page/overall arch.html, Jan 2007.
[10] M. H. Halstead. Elements of Software Science (Operating and programming systems series). Elsevier Science Inc., New York, NY, USA, 1977.
[11] hello2morrow. http://www.hello2morrow.com/.
[12] B. Henderson-Sellers. Object-oriented metrics: measures of complexity. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1996.
[13] W. S. Humphrey. Introduction to the personal software process. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1997.
[14] hypercision inc. http://hypercision.com/.
[15] instantiations inc. http://www.instantiations.com/.
[16] ISO. ISO/IEC 9126-1 "Software engineering - Product Quality - Part 1: Quality model", 2001.
[17] ISO. ISO/IEC 9126-3 "Software engineering - Product Quality - Part 3: Internal metrics", 2003.
[18] Andrew Cain. http://www.it.swin.edu.au/projects/jmetric/products/jmetric/default.htm.
[19] E.-A. Karlsson, editor. Software Reuse: A Holistic Approach. John Wiley & Sons, Inc., New York, NY, USA, 1995.
[20] N. Kececi and A. Abran. Analysing, Measuring and Assessing Software Quality In a Logic Based Graphical Model, 2001. QUALITA 2001, Annecy, France, 2001, pp. 48-55.
[21] B. Lague and A. April. Mapping of Datrix(TM) Software Metrics Set to ISO 9126 Maintainability Sub-Characteristics, October 1996. SES '96, Forum on Software Eng. Standards Issues, Montreal, Canada.
[22] Y. Lee and K. H. Chang. Reusability and Maintainability Metrics for Object-Oriented Software. In ACM-SE 38: Proc. of the 38th annual on Southeast regional conference, pages 88-94, 2000.
[23] V. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 1966.
[24] W. Li and S. Henry. Maintenance Metrics for the Object Oriented Paradigm. In IEEE Proc. of the 1st Int. Sw. Metrics Symposium, pages 52-60, May 1993.
[25] R. Lincke. Validation of a Standard- and Metric-Based Software Quality Model - Creating the Prerequisites for Experimentation. Licentiate thesis, MSI, Växjö University, Sweden, Apr 2007.
[26] R. Lincke and W. Löwe. Compendium of Software Quality Standards and Metrics. http://www.arisa.se/compendium/, 2005.
[27] J. A. McCall, P. G. Richards, and G. F. Walters. Factors in Software Quality. Technical Report Vol. I, NTIS Springfield, VA, 1977. NTIS AD/A-049014.
[28] M squared technologies. http://www.msquaredtechnologies.com/.
[29] Power software. http://www.powersoftware.com/.
[30] Semantic designs inc. http://www.semdesigns.com/.
[31] W. Tichy. Should computer scientists experiment more? Computer, 31(5):32-40, May 1998.
[32] Verifysoft technology. http://www.verifysoft.com/.
[33] Virtual machinery. http://www.virtualmachinery.com/.
[34] A. H. Watson and T. J. McCabe. Structured Testing: A Testing Methodology Using the Cyclomatic Complexity Metric. NIST Special Pub. 500-235, 1996.
[35] R. K. Yin. Case Study Research: Design and Methods (Applied Social Research Methods). SAGE Publications, December 2002.
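
Illustrative sketch. The comparison of top-10 class rankings summarized in Section 8 can be illustrated with a few lines of Python. This is not the analysis procedure of the study: the function names, the tool names, the class names, and the scores are hypothetical, and plain set overlap is an assumed, deliberately simple way to express how much two top-k lists disagree.

```python
def rank_classes(scores):
    """Return class names ordered from highest to lowest score.

    `scores` maps a class name to the value one (hypothetical) tool
    assigned to it; ties are broken by class name so the order is stable.
    """
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ordered]


def top_k_disagreement(ranking_a, ranking_b, k=10):
    """Share of top-k positions not filled by classes common to both lists.

    0.0 means both tools put the same classes in their top k (in any order),
    1.0 means the two top-k lists have no class in common.
    """
    top_a = set(ranking_a[:k])
    top_b = set(ranking_b[:k])
    size = max(len(top_a), len(top_b))
    if size == 0:
        return 0.0
    return 1.0 - len(top_a & top_b) / size


# Hypothetical scores from two tools for the same four classes.
tool_a = {"Parser": 9.0, "Lexer": 7.5, "Cache": 5.0, "Util": 1.0}
tool_b = {"Parser": 2.0, "Lexer": 8.0, "Cache": 1.0, "Util": 6.0}

disagreement = top_k_disagreement(rank_classes(tool_a), rank_classes(tool_b), k=2)
print(disagreement)  # 0.5: only "Lexer" appears in both top-2 lists
```

A set-based measure ignores the order inside the top k; an order-sensitive measure, such as an edit distance over the two lists, would report disagreement even when both tools select the same classes in a different order.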