A Refactoring Technique for Large Groups of Software Clones


Presentation Transcript

1. A Refactoring Technique for Large Groups of Software Clones (Master thesis defense)
Department of Computer Science and Software Engineering
Faculty of Engineering and Computer Science
Concordia University
Asif AlWaqfi
Supervised by: Dr. Nikolaos Tsantalis

2. Introduction
Software maintenance is the last phase of the System Development Life Cycle (SDLC).
Duplicate code:
- Increases maintenance effort and cost [LozanoICSM2008]
- Is error-prone when clones are updated inconsistently [JuergensICSE2009]
- Causes code instability [MondalACM2012]
Software refactoring

3. Motivation
Amount of clones in systems: researchers have reported that clones make up between 5% and 50% of a system's code base.
Lack of mature and reliable clone refactoring tools:
- Existing tools support only specific clone types
- Other limitations
Developers care about duplicate code and try to avoid it when performing maintenance tasks [YamashitaUSER2013, SilvaFSE2016].

4. Clone Types
Clone Type I: identical code fragments, except for differences in whitespace, layout, and comments.

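5. Clone Types
Clone Type II: syntactically identical fragments, except for differences in identifiers, literals, and types (in addition to the Type I differences).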

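6. Clone Types
Clone Type III: copied fragments with further modifications, such as changed, added, or removed statements (in addition to the Type II differences).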

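7. Clone Types
Clone Type IV: fragments that perform the same computation but are implemented with different syntax, for example:
void loopOver(int var) {
    while (var > 0) {
        System.out.println(var);
        var--;
    }
}

void loopOver(int var) {
    if (var > 0) {
        System.out.println(var);
        loopOver(--var);
    }
}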
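To make the Type IV example concrete, the sketch below turns the two `loopOver` variants into methods that record the values they would print into a list instead of writing to `System.out`. That change is an assumption made only so the two outputs can be compared. The names `TypeFourClones`, `loopOverIterative`, `loopOverRecursive`, and `sameBehavior` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TypeFourClones {
    // Iterative variant: same logic as the while-loop version on the slide,
    // but appends each value to 'out' instead of printing it.
    static void loopOverIterative(int var, List<Integer> out) {
        while (var > 0) {
            out.add(var);
            var--;
        }
    }

    // Recursive variant: same logic as the if/recursion version on the slide.
    static void loopOverRecursive(int var, List<Integer> out) {
        if (var > 0) {
            out.add(var);
            loopOverRecursive(--var, out);
        }
    }

    // Runs both variants on the same input and reports whether they produce the same output.
    static boolean sameBehavior(int start) {
        List<Integer> iterative = new ArrayList<>();
        List<Integer> recursive = new ArrayList<>();
        loopOverIterative(start, iterative);
        loopOverRecursive(start, recursive);
        return iterative.equals(recursive);
    }
}
```

`TypeFourClones.sameBehavior(5)` is true because both variants record 5, 4, 3, 2, 1. The equivalence holds only for ordinary inputs. The recursive variant uses one stack frame per value, so a very large input can throw `StackOverflowError` where the loop would not.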

8. Thesis goal
We ran 4 clone detection tools on 9 open-source projects: 31% of the reported clone groups contain more than 2 clone instances.
Goal: refactoring clone groups.

9. Approach

10. Approach
Inputs: the project and the clone groups reported by a clone detection tool.
Steps: Clone Parsing, Project Parsing, Information Extraction, Clustering (Common Structure, Differences), Pairwise Matching, Statement Alignment, Refactorability Assessment.
Output: sub groups.

11. Approach (Information Extraction)

12. Information extraction (Data type, Example)
url = getItemURLGenerator(row, column).generateURL(dataset, row, column);
Assignment left side:
- Identifier name: url
- Data types (including super types): CharSequence, String
Assignment right side (method call):
- Method name: getItemURLGenerator(row, column).generateURL
- Return data types (including super types): CharSequence, String
- Parameter data types: ({IntervalCategoryDataset, KeyedValues2D, Dataset, CategoryDataset, GanttCategoryDataset, Values2D}, {int}, {int})
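The slide records each data type together with its super types. The sketch below shows that idea with Java reflection. The actual tool works on parsed source code, so using `Class` objects here is an assumption made to keep the example self-contained, and `SuperTypes.of` is a hypothetical helper. For `String`, reflection also returns types such as `Object`, `Serializable`, and `Comparable`, none of which appear on the slide. The tool may therefore filter some super types.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

class SuperTypes {
    // Collects a type plus all of its superclasses and (transitively) implemented interfaces.
    static Set<Class<?>> of(Class<?> type) {
        Set<Class<?>> result = new LinkedHashSet<>();
        Deque<Class<?>> work = new ArrayDeque<>();
        work.push(type);
        while (!work.isEmpty()) {
            Class<?> current = work.pop();
            if (!result.add(current)) {
                continue; // already visited through another path
            }
            Class<?> parent = current.getSuperclass(); // null for interfaces and Object
            if (parent != null) {
                work.push(parent);
            }
            for (Class<?> iface : current.getInterfaces()) {
                work.push(iface); // interface hierarchies are followed transitively
            }
        }
        return result;
    }
}
```

The transitive walk over interfaces matters for the `dataset` parameter on the slide. Its six types form an interface hierarchy, and a shallow lookup would miss most of them.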

13. Approach (Clustering)

14. Clustering
Clone detection tools might report clone groups containing dissimilar clone instances, which affects the refactorability of the group as a whole. For instance, token-based and text-based detectors do not validate the control statements in the clones, so one clone could contain an if statement where another contains a for loop.
The goal of the clustering step is to create smaller groups whose clone instances:
- have a common control structure, and
- differ less from one another.

15. Example: a clone group with four clone instances, Clone (1) to Clone (4)
if (state.getInfo() != null) {
    EntityCollection entities = state.getEntityCollection();
    if (entities != null) {
        String tip = null;
        CategoryToolTipGenerator tipster = getToolTipGenerator(row, column);
        if (tipster != null) {
            tip = tipster.generateToolTip(dataset, row, column);
        }
        String url = null;
        if (getItemURLGenerator(row, column) != null) {
            url = getItemURLGenerator(row, column).generateURL(dataset, row, column);
        }
        CategoryItemEntity entity = new CategoryItemEntity(bar, tip, url, dataset,
                dataset.getRowKey(row), dataset.getColumnKey(column));
        entities.add(entity);
    }
}

if (state.getInfo() != null) {
    EntityCollection entities = state.getEntityCollection();
    if (entities != null) {
        String tip = null;
        if (getToolTipGenerator(row, column) != null) {
            tip = getToolTipGenerator(row, column).generateToolTip(dataset, row, column);
        }
        String url = null;
        if (getItemURLGenerator(row, column) != null) {
            url = getItemURLGenerator(row, column).generateURL(dataset, row, column);
        }
        CategoryItemEntity entity = new CategoryItemEntity(bar, tip, url, dataset,
                dataset.getRowKey(row), dataset.getColumnKey(column));
        entities.add(entity);
    }
}

if (state.getInfo() != null) {
    EntityCollection entities = state.getEntityCollection();
    if (entities != null) {
        String tip = null;
        CategoryToolTipGenerator tipster = getToolTipGenerator(row, column);
        if (tipster != null) {
            tip = tipster.generateToolTip(data, row, column);
        }
        String url = null;
        if (getItemURLGenerator(row, column) != null) {
            url = getItemURLGenerator(row, column).generateURL(data, row, column);
        }
        CategoryItemEntity entity = new CategoryItemEntity(bar, tip, url, data,
                data.getRowKey(row), data.getColumnKey(column));
        entities.add(entity);
    }
}

if (state.getInfo() != null) {
    EntityCollection entities = state.getEntityCollection();
    if (entities != null) {
        String tip = null;
        CategoryToolTipGenerator tipster = getToolTipGenerator(row, column);
        if (tipster != null) {
            tip = tipster.generateToolTip(data, row, column);
        }
        String url = null;
        if (getItemURLGenerator(row, column) != null) {
            url = getItemURLGenerator(row, column).generateURL(data, row, column);
        }
        CategoryItemEntity entity = new CategoryItemEntity(bar, tip, url, data,
                data.getRowKey(row), data.getColumnKey(column));
        entities.add(entity);
    }
}

16. Clustering (common structure)
Clone (1): 1358
Clone (2): 1369
if (state.getInfo() != null) {
    EntityCollection entities = state.getEntityCollection();
    if (entities != null) {
        String tip = null;
        CategoryToolTipGenerator tipster = getToolTipGenerator(row, column);
        if (tipster != null) {
            tip = tipster.generateToolTip(data, row, column);
        }
        String url = null;
        if (getItemURLGenerator(row, column) != null) {
            url = getItemURLGenerator(row, column).generateURL(data, row, column);
        }
        CategoryItemEntity entity = new CategoryItemEntity(bar, tip, url, data,
                data.getRowKey(row), data.getColumnKey(column));
        entities.add(entity);
    }
}

if (state.getInfo() != null) {
    EntityCollection entities = state.getEntityCollection();
    if (entities != null) {
        String tip = null;
        if (getToolTipGenerator(row, column) != null) {
            tip = getToolTipGenerator(row, column).generateToolTip(dataset, row, column);
        }
        String url = null;
        if (getItemURLGenerator(row, column) != null) {
            url = getItemURLGenerator(row, column).generateURL(dataset, row, column);
        }
        CategoryItemEntity entity = new CategoryItemEntity(bar, tip, url, dataset,
                dataset.getRowKey(row), dataset.getColumnKey(column));
        entities.add(entity);
    }
}

17. Clustering (common structure)
Clone (1): 1358
Clone (2): 1369

18. Clustering (common structure)
- Clone 1, Clone 2: Clone 1 (1358) =? Clone 2 (1369)
- Clone 1, Clone 2, Clone 3: Clone 2 (1369) =? Clone 3 (1369)
- Clone 1, Clone 2, Clone 3, Clone 4: Clone 3 (1369) =? Clone 4 (1369)
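One reading of the numbers 1358 and 1369 is that they are the positions of the control statements when the statements of each fragment are numbered in order. This reading is consistent with the fragments on slides 15 and 16. In the fragment that tests `getToolTipGenerator(row, column) != null` directly, the `if` statements are statements 1, 3, 5, and 8. In the fragments that first declare `tipster`, they are statements 1, 3, 6, and 9. Under that assumption, the sketch below groups clones whose control-structure signatures are equal. The class `StructureClustering`, the statement-kind strings, and the signature format are illustrative assumptions, not the thesis implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StructureClustering {
    // Statement kinds treated as control statements in this sketch (assumption).
    static boolean isControl(String kind) {
        return kind.equals("if") || kind.equals("for") || kind.equals("while")
                || kind.equals("do") || kind.equals("switch") || kind.equals("try");
    }

    // Signature = "position:kind" for every control statement, with 1-based positions
    // in the flattened statement sequence (nested statements counted in order).
    static List<String> signature(List<String> statementKinds) {
        List<String> sig = new ArrayList<>();
        for (int i = 0; i < statementKinds.size(); i++) {
            String kind = statementKinds.get(i);
            if (isControl(kind)) {
                sig.add((i + 1) + ":" + kind);
            }
        }
        return sig;
    }

    // Groups clone indices whose signatures are identical, preserving first-seen order.
    static List<List<Integer>> cluster(List<List<String>> clones) {
        Map<List<String>, List<Integer>> groups = new LinkedHashMap<>();
        for (int c = 0; c < clones.size(); c++) {
            groups.computeIfAbsent(signature(clones.get(c)), k -> new ArrayList<>()).add(c);
        }
        return new ArrayList<>(groups.values());
    }
}
```

The signature keeps the statement kind as well as its position, so an `if` in one clone and a `for` in another at the same position end up in different groups. That is the situation slide 14 describes for token-based and text-based detectors.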

19. Approach (Clustering: Differences)

20. Clustering (Differences)
This clustering step works as follows:
- Compute a distance matrix between the clone instances.
- Apply hierarchical clustering.
- In each round of the hierarchical clustering, perform one merge and compute the Silhouette Coefficient.
- Select the clusters with the highest Silhouette Coefficient.
Silhouette Coefficient: a measure of the consistency and quality of clusters. The closer the Silhouette Coefficient is to 1, the smaller the dissimilarity within each cluster and the larger the dissimilarity to the other clusters, i.e., the better the clustering.
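The steps above can be sketched as follows. The sketch makes two assumptions that the slides do not specify. First, it uses average linkage, because the linkage criterion is not named. Second, it uses the standard silhouette definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), with s(i) = 0 for a clone that is alone in its cluster. Clusterings with a single cluster are not scored, since the silhouette is undefined there. `DifferenceClustering` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DifferenceClustering {
    // Mean distance from clone i to the other members of cluster 'label' (NaN if there are none).
    static double meanDistance(double[][] d, int[] labels, int i, int label) {
        double sum = 0.0;
        int count = 0;
        for (int j = 0; j < labels.length; j++) {
            if (j != i && labels[j] == label) {
                sum += d[i][j];
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    // Mean silhouette over all clones; labels[i] is the cluster id of clone i.
    // Assumes at least two distinct clusters; singletons contribute s(i) = 0.
    static double silhouette(double[][] d, int[] labels) {
        double total = 0.0;
        int[] clusterIds = Arrays.stream(labels).distinct().toArray();
        for (int i = 0; i < labels.length; i++) {
            double a = meanDistance(d, labels, i, labels[i]);
            if (Double.isNaN(a)) {
                continue; // clone i is alone in its cluster
            }
            double b = Double.POSITIVE_INFINITY;
            for (int other : clusterIds) {
                if (other != labels[i]) {
                    b = Math.min(b, meanDistance(d, labels, i, other));
                }
            }
            double denom = Math.max(a, b);
            total += denom == 0.0 ? 0.0 : (b - a) / denom;
        }
        return total / labels.length;
    }

    // Average-linkage distance between two clusters of clone indices.
    static double averageLinkage(double[][] d, List<Integer> a, List<Integer> b) {
        double sum = 0.0;
        for (int i : a) {
            for (int j : b) {
                sum += d[i][j];
            }
        }
        return sum / (a.size() * b.size());
    }

    static int[] toLabels(List<List<Integer>> clusters, int n) {
        int[] labels = new int[n];
        for (int c = 0; c < clusters.size(); c++) {
            for (int member : clusters.get(c)) {
                labels[member] = c;
            }
        }
        return labels;
    }

    // Merges the two closest clusters one round at a time, scores every
    // intermediate clustering (down to two clusters), and keeps the best one.
    static int[] bestClustering(double[][] d) {
        int n = d.length;
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            List<Integer> singleton = new ArrayList<>();
            singleton.add(i);
            clusters.add(singleton);
        }
        int[] best = toLabels(clusters, n);
        double bestScore = n == 0 ? 0.0 : silhouette(d, best);
        while (clusters.size() > 2) {
            int bi = 0;
            int bj = 1;
            double closest = Double.POSITIVE_INFINITY;
            for (int x = 0; x < clusters.size(); x++) {
                for (int y = x + 1; y < clusters.size(); y++) {
                    double link = averageLinkage(d, clusters.get(x), clusters.get(y));
                    if (link < closest) {
                        closest = link;
                        bi = x;
                        bj = y;
                    }
                }
            }
            clusters.get(bi).addAll(clusters.remove(bj)); // bj > bi, so index bi stays valid
            int[] labels = toLabels(clusters, n);
            double score = silhouette(d, labels);
            if (score > bestScore) {
                bestScore = score;
                best = labels;
            }
        }
        return best;
    }
}
```

Applied to the distance matrix of slide 21, this sketch selects {Clone (1)}, {Clone (2), Clone (3), Clone (4)}, which has a mean silhouette of 3/7 ≈ 0.43.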

21. Example
Distance matrix for Clone (1) to Clone (4):
    | (1)  | (2)  | (3)  | (4)
(1) | 0.0  | 14.0 | 14.0 | 14.0
(2) | 14.0 | 0.0  | 6.0  | 6.0
(3) | 14.0 | 6.0  | 0.0  | 6.0
(4) | 14.0 | 6.0  | 6.0  | 0.0
Two candidate clusterings from the hierarchical merging are shown, with Silhouette Coefficients of ~0.43 and 0. The clustering {Clone (1)}, {Clone (2), Clone (3), Clone (4)} has the Silhouette Coefficient of ~0.43 and is selected.
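As a check on the ~0.43 value, the calculation below assumes the standard silhouette definition, because the slides do not spell out the formula. Here a(i) is the mean distance from clone i to the other clones in its cluster, and b(i) is the smallest mean distance from clone i to the clones of another cluster. For the clustering {(1)}, {(2), (3), (4)}:

```latex
\begin{aligned}
s(i) &= \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad S = \frac{1}{n}\sum_{i=1}^{n} s(i) \\
s(1) &= 0 \quad \text{(clone (1) is alone in its cluster)} \\
a(2) &= \tfrac{6 + 6}{2} = 6, \quad b(2) = 14, \quad s(2) = \tfrac{14 - 6}{14} = \tfrac{4}{7} \\
s(3) &= s(4) = \tfrac{4}{7} \quad \text{(by symmetry of the matrix)} \\
S &= \tfrac{1}{4}\left(0 + 3 \cdot \tfrac{4}{7}\right) = \tfrac{3}{7} \approx 0.43
\end{aligned}
```

By the same singleton convention, placing every clone in its own cluster gives S = 0.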

22. Approach (Pairwise Matching)

23. Pairwise matching
Input: a clone pair. Decision: does it have a control statement? (Yes / No)
Mapping steps:
(1) Map statements based on control dependencies
(2) Map statements based on data dependencies
(3) Map statements based on dependencies from the method signature
(4) Map statements that have no incoming dependencies
(5) Statements not matched
Output: the mapping result.
Children are mapped using (in order):
- String similarity = 1.0 and vector similarity = 1.0
- Vector similarity = 1.0
- String similarity = 1.0
- Data types
Statements are mapped using (in order):
- String similarity = 1.0
- Vector similarity = 1.0
- Data types

24. Pairwise matching (Example)
Clone (1):
System.out.println("Start");
double x = 4.1;
double y = 2.0;
double z1 = y + 3.0;
double z2 = x + 5.0;
String str = "Do nothing";
System.out.println("End");
if (x > 0) {
    System.out.println("Inside If");
}
Clone (2):
System.out.println("Start");
double y = 2.0;
double x = 4.1;
double z1 = y + 3.0;
String str = "Do nothing";
String str2 = "String 2" + x;
System.out.println("End");
if (x > 0) {
    System.out.println("Inside If");
}
double z2 = x + 9.0;

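25. Pairwise matching (Example)
Rows: statements of Clone (2); columns: statements of Clone (1).
String similarity:
   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
1  | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
2  | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
3  | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
4  | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0
5  | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0
6  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
7  | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0
8  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0
9  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Vector similarity:
   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
1  | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1
2  | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
3  | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
4  | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0
5  | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0
6  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
7  | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1
8  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0
9  | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1
10 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0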
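Both matrices can be reproduced with a simple model. The model is an assumption inferred from the matrices, not a definition taken from the thesis. String similarity is 1.0 when the token sequences are identical. Vector similarity is 1.0 when the token counts are identical after string and numeric literals are replaced by placeholders. Identifiers are kept, which is why `double y = 2.0;` does not match `double x = 4.1;`. The greedy matcher then applies the statement criteria in the order listed on slide 23 (string similarity first, then vector similarity). It deliberately ignores the dependency-driven steps (1) to (5), so it is only a partial sketch. `StatementMatcher` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StatementMatcher {
    // Tokens: string literal | numeric literal | identifier/keyword | single symbol.
    private static final Pattern TOKEN =
            Pattern.compile("\"[^\"]*\"|\\d+(?:\\.\\d+)?|\\w+|[^\\s\\w]");

    static List<String> tokens(String statement) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(statement);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Replaces string and numeric literals with placeholders; identifiers are kept.
    static String abstractToken(String token) {
        if (token.startsWith("\"")) return "STR";
        if (Character.isDigit(token.charAt(0))) return "NUM";
        return token;
    }

    // Token-count vector of a statement after literal abstraction.
    static Map<String, Integer> vector(String statement) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens(statement)) {
            counts.merge(abstractToken(t), 1, Integer::sum);
        }
        return counts;
    }

    static boolean stringSimilar(String s1, String s2) {
        return tokens(s1).equals(tokens(s2));
    }

    static boolean vectorSimilar(String s1, String s2) {
        return vector(s1).equals(vector(s2));
    }

    // Greedy mapping: pass 0 uses string similarity, pass 1 vector similarity.
    // Returns 1-based indices: statement of 'second' -> statement of 'first'.
    static Map<Integer, Integer> match(List<String> first, List<String> second) {
        Map<Integer, Integer> mapping = new LinkedHashMap<>();
        boolean[] used = new boolean[first.size()];
        for (int pass = 0; pass < 2; pass++) {
            for (int j = 0; j < second.size(); j++) {
                if (mapping.containsKey(j + 1)) continue;
                for (int i = 0; i < first.size(); i++) {
                    if (used[i]) continue;
                    boolean similar = pass == 0
                            ? stringSimilar(first.get(i), second.get(j))
                            : vectorSimilar(first.get(i), second.get(j));
                    if (similar) {
                        used[i] = true;
                        mapping.put(j + 1, i + 1);
                        break;
                    }
                }
            }
        }
        return mapping;
    }
}
```

On the slide 24 fragments, with the `if` header and the statement inside it counted as separate statements, this sketch maps nine statements. The reordered `x`/`y` declarations are matched by string similarity. `double z2 = x + 9.0;` is matched to `double z2 = x + 5.0;` only by vector similarity, and `String str2 = "String 2" + x;` stays unmatched.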

26. Pairwise matching (Example)
Clone (1) and Clone (2) are the fragments from slide 24, matched with the process from slide 23 (has control statement? then steps (1) to (5), producing the mapping result).
Figure: mapping between the statements of Clone (2) and Clone (1); statement labels shown: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 8, 9, 3, 2, 4, 5, 1, 6, 7

27. Pairwise matching (Example)

28. Approach (Statement Alignment)

29. Statement alignment
The results of the previous step (pairwise matching) are matched pairs. The goal of this step is to connect the mapped statements in these pairs and to find all statements common to the fragments in the cluster. The alignment is transitive, so in our example:
- The cluster contains three clones: Clone (2), Clone (3), and Clone (4).
- Pairwise matching returns two pairs. Pair 1 is Clone (2) and Clone (3), and Pair 2 is Clone (3) and Clone (4).
The alignment result is shown on the next slide.
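A minimal sketch of the transitive alignment is shown below. It assumes that each pairwise result is a map from statement numbers in one clone to statement numbers in the next clone (Clone (2) to Clone (3), then Clone (3) to Clone (4)). A statement of the first clone is common to the whole cluster only if it can be followed through every map. `StatementAlignment.align` is a hypothetical helper, and the statement numbers in the test are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class StatementAlignment {
    // Follows each statement of the first clone through a chain of pairwise mappings
    // (clone k -> clone k+1). Returns one row per fully aligned statement, where
    // row[k] is that statement's number in the k-th clone of the chain.
    static List<int[]> align(List<Map<Integer, Integer>> chain) {
        List<int[]> rows = new ArrayList<>();
        if (chain.isEmpty()) {
            return rows;
        }
        for (int start : chain.get(0).keySet()) {
            int[] row = new int[chain.size() + 1];
            row[0] = start;
            boolean complete = true;
            for (int k = 0; k < chain.size(); k++) {
                Integer next = chain.get(k).get(row[k]);
                if (next == null) { // no counterpart in the next clone: not common to all
                    complete = false;
                    break;
                }
                row[k + 1] = next;
            }
            if (complete) {
                rows.add(row);
            }
        }
        return rows;
    }
}
```

Chaining works in the slide's example because the two pairs share Clone (3). A cluster whose pairs do not form such a chain would need a different way of connecting them.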

30. Statement alignment (result)

31. Approach (Refactorability Assessment)

32. Refactorability Assessment
Goal: validate whether the clones within the same cluster can be refactored.
We extended the work of Tsantalis et al. [TsantalisTSE2015] to accept more than two fragments and return a refactorability status.
A cluster is refactorable if it passes all eight preconditions proposed by Tsantalis et al. [TsantalisTSE2015].

33.

34. Qualitative study

35. Qualitative study (Setup)
Project: JFreeChart 1.0.10
Clone set: clones detected by Deckard, in production code only
Total clone instances: 2306
Total groups: 847
Pairwise matching is applied to all pair combinations of clones within the same cluster.
Group size | Number of groups
2 | 591
3 | 92
4 | 98
5 | 21
>5 | 45
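Matching every pair combination within a cluster of n clones produces n(n - 1)/2 pairs, for example 6 pairs for a cluster of four. The number of pairs therefore grows quadratically with cluster size. A small sketch that enumerates the pairs follows. `PairEnumeration.allPairs` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

class PairEnumeration {
    // Returns every unordered pair (i, j) with i < j over clone indices 0..n-1.
    static List<int[]> allPairs(int n) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                pairs.add(new int[] {i, j});
            }
        }
        return pairs;
    }
}
```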

36. Qualitative study (Discussion)
- Accuracy evaluation: whether pairs are matched in the same way by our work and by the work of Tsantalis et al. [TsantalisTSE2015].
- Performance evaluation: comparison of our execution time with that of Tsantalis et al. [TsantalisTSE2015].
- Clustering evaluation: the impact of the second clustering step (Differences) on improving group refactorability.
- Clone group level evaluation: we assess group refactorability along with the execution time of the whole approach. Group time is compared to that of Tsantalis et al. [TsantalisTSE2015].

37. Accuracy Evaluation
For Type I clones, our work produces mappings identical to those of Tsantalis et al. For Type II and Type III clones there are differences:
Clone type | Number of clone pairs | Identical mapping at pair level | Identical mapping at statement level
Type I | 326 | 100% | 100%
Type II | 732 | 94% | 98%
Type III | 24 | 62.5% | 93%
Clone type | Number of clone pairs | Different mapping | More mapped statements | Fewer mapped statements | Different mapping & more mapped statements
Type II | 44 | 24 | 15 | 1 | 4
Type III | 8 | 3 | 4 | 1 | 0

38. Different statement mapping
Our mapping vs. Tsantalis et al. mapping

39. More statements mapped
Our mapping vs. Tsantalis et al. mapping

40. Fewer statements mapped
Our mapping vs. Tsantalis et al. mapping

41. Accuracy Evaluation
There are differences in the statement mappings, but:
- These differences did not affect the refactorability of the pairs.
- For the refactorable pairs, our work needs to be extended to perform the actual refactoring.

42. Performance Evaluation
Mean or median? The choice depends on the distribution of the data.
- Skewness describes the symmetry of the data points around the mean (skewness = 0 for symmetric data).
- Kurtosis describes whether the shape of the data matches a Gaussian distribution (excess kurtosis = 0) or has heavier tails.
Measure | Our work | Tsantalis et al.
Skewness | 0.9 | 6.6
Kurtosis | 6.3 | 82.5
Median | 69.1 ms | 72.3 ms
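For reference, the sketch below computes the three statistics on the slide. It uses the population moment estimators: g1 = m3 / m2^(3/2) for skewness and g2 = m4 / m2^2 - 3 for excess kurtosis, so a Gaussian scores 0. The slides do not say which estimator was used, and sample-size-corrected versions can give slightly different values. `TimingStats` is a hypothetical name.

```java
import java.util.Arrays;

class TimingStats {
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    // k-th central moment: (1/n) * sum of (x_i - mean)^k.
    static double centralMoment(double[] x, int k) {
        double m = mean(x);
        double sum = 0.0;
        for (double v : x) sum += Math.pow(v - m, k);
        return sum / x.length;
    }

    // Population skewness g1 = m3 / m2^(3/2); 0 for symmetric data, > 0 for a long right tail.
    static double skewness(double[] x) {
        return centralMoment(x, 3) / Math.pow(centralMoment(x, 2), 1.5);
    }

    // Excess kurtosis g2 = m4 / m2^2 - 3; 0 for a Gaussian, larger for heavier tails.
    static double excessKurtosis(double[] x) {
        double m2 = centralMoment(x, 2);
        return centralMoment(x, 4) / (m2 * m2) - 3.0;
    }

    // Median of a copy of the data (the input array is not modified).
    static double median(double[] x) {
        double[] s = x.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }
}
```

High skewness and kurtosis, such as 6.6 and 82.5, indicate a few very slow runs that pull the mean upward. In that case the median is the more representative summary, which is the reasoning behind comparing medians on this slide.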

43. Performance Evaluation
Figure: execution time distribution (milliseconds)

44. Performance Evaluation
Medians: our work 69.1 ms; Tsantalis et al. 72.3 ms
Time range: our work 5.1 to 170 ms; Tsantalis et al. 6.2 to 225 ms
The medians are almost the same, but the time distribution favors our work: its range is narrower and its maximum is lower (170 ms versus 225 ms).

45. Clustering Evaluation
In this evaluation we compared the clusters produced by clustering based on Common Structure with the clusters obtained after applying clustering based on Differences. We found:
Change | # Cases
No changes to the clusters resulting from clustering based on Common Structure | 25
Removing clone fragments from the clusters resulting from clustering based on Common Structure increased the number of refactorable clusters
Clustering based on Differences produced more and/or smaller clusters than clustering based on Common Structure
Removing a clone fragment from the clusters resulting from clustering based on Common Structure increased the number of mapped statements
10191

46. Clustering Evaluation (Example 1)

47.

48.

49.

50. Clone Group Level Evaluation
In terms of refactorability:
Initial groups:
- 256 groups containing 3 or more clones
- 60 groups excluded (class-level or repeated groups)
- 196 groups in the comparison
Results:
- 98 clusters (containing 3 or more clone instances)
- 48 (out of 98) refactorable clusters
- 41 refactorable clone groups (~21% of the 196 groups)
Cluster size | # of clusters | # of refactorable clusters
3 | 44 | 22
4 | 37 | 14
5 | 8 | 5
6 | 2 | 2
7 | 4 | 3
8 | 1 | 1
9 | 1 | 1
>9 | 1 | 0
In terms of group execution time

51.

52. Clone Group Level Evaluation
In terms of refactorability: 21% of the 196 groups were found to be refactorable.
In terms of time:
- For groups containing 2 to 5 clone instances, both approaches take almost the same time.
- For groups containing 6 or more clone instances, our approach is considerably faster.

53. Empirical Study

54. Experiment Setup
Goal: evaluate the approach on large-scale projects from different domains.
We ran the experiment on clones detected by NiCad, CCFinder, Deckard, and CloneDR in the 9 projects: 44k clone groups, of which 13.6k contain 3 or more clone instances.
Total pairs in the comparison:
Clone type | CCFinder | CloneDR | Deckard | NiCad Blind | NiCad Consistent
Type I | 4,875 | 16,156 | 2,456 | 3,278 | 3,844
Type II | 94,354 | 51,927 | 68,231 | 84,320 | 65,286
Type III | 986 | 35 | 1,821 | 30,174 | 6,996
Total pairs | 100,215 | 68,118 | 72,508 | 117,772 | 76,126

55. Accuracy and Performance Evaluation
Accuracy evaluation
P: pairs for which our work and Tsantalis et al. produce the same mapping
S: statements for which our work and Tsantalis et al. produce the same mapping
Tool | Type I P | Type I S | Type II P | Type II S | Type III P | Type III S
CCFinder | 100% | 100% | 87.5% | 94.8% | 85.8% | 92.9%
CloneDR | 100% | 100% | 97% | 98.4% | 70.4% | 87.5%
Deckard | 100% | 100% | 89.9% | 96.8% | 60.9% | 89.4%
NiCad Blind | 100% | 100% | 85.2% | 95.2% | 77.6% | 90.8%
NiCad Consistent | 100% | 100% | 89% | 96.3% | 77.6% | 93.3%
Performance evaluation
O: our execution time in milliseconds
T: Tsantalis et al. execution time in milliseconds
Tool | Type I O | Type I T | Type II O | Type II T | Type III O | Type III T | Average O | Average T
CCFinder | 59.69 | 46.78 | 51.79 | 50.32 | 55.44 | 42.45 | 55.64 | 46.52
CloneDR | 52.6 | 41.95 | 52.54 | 44.77 | 54.53 | 45.31 | 53.22 | 44.01
Deckard | 52.71 | 46.67 | 43.92 | 40.21 | 48.41 | 67.47 | 48.35 | 51.45
NiCad Blind | 46.82 | 49.2 | 46.96 | 47.67 | 44.08 | 47.24 | 45.95 | 48.04
NiCad Consistent | 43.81 | 46.11 | 43.16 | 40.48 | 28.97 | 32 | 38.65 | 39.53

56. Clone Group Refactorability
In total we found 7,217 subgroups. Of these, 2,833 subgroups were refactorable, containing a total of 13,398 clone instances.
Each cell is T / R, where T is the total number of subgroups and R is the percentage of those subgroups that are refactorable.
Project | CCFinder | CloneDR | Deckard | NiCad Blind | NiCad Consistent
Apache Ant | 115 / 50.4% | 195 / 72.3% | 67 / 38.81% | 96 / 35.42% | 121 / 38.84%
Columba | 100 / 45% | 185 / 55.7% | 82 / 69.51% | 114 / 48.25% | 111 / 54.95%
EMF | 186 / 24.2% | 232 / 59.9% | 71 / 8.45% | 185 / 27.57% | 229 / 24.0%
Hibernate | 201 / 40.8% | 220 / 61.8% | 81 / 37.04% | 137 / 24.09% | 113 / 35.4%
JEdit | 26 / 26.9% | 54 / 51.9% | 16 / 25% | 34 / 32.35% | 24 / 29.17%
JFreechart | 596 / 18.8% | 436 / 56% | 346 / 24.28% | 266 / 25.56% | 293 / 27.65%
JMeter | 69 / 49.3% | 21 / 38.1% | 67 / 41.79% | 79 / 37.97% | 82 / 41.46%
JRuby | 151 / 29.1% | 153 / 47.1% | 96 / 29.17% | 163 / 17.79% | 143 / 23.78%
SQuirreL SQL | 205 / 37.1% | 394 / 64.5% | 96 / 45.83% | 274 / 39.42% | 292 / 41.1%
Average (tool) | 34.4% | 59.5% | 33.3% | 31.1% | 34.0%

57. Threats to Validity
Internal threats:
- Clone detection configurations
- No actual refactoring is performed
- The clustering step might create redundant clusters or discard some refactorable fragments
External threats:
- Inability to generalize our findings beyond the 9 projects we examined and the four clone detection tools we used

58. Conclusion and Future Work

59. Conclusion
Pairwise matching:
- Clone Type I: 100% at pair and statement level
- Clone Type II: 85.2% to 97% at pair level, and 94.8% to 98.4% at statement level
- Clone Type III: 60.9% to 85.8% at pair level, and 87.5% to 93.3% at statement level
- Reordered statements are mapped
- No thresholds were used in any step of our work
Subgroup refactorability:
- We achieved 59.5% for clones detected by CloneDR, and around 31.1% to 34.4% for the other tools.
- For some groups, removing a single fragment through clustering made them refactorable.

60. Future work
- Address some of the internal threats to validity.
- Add support for actual refactoring.
- Extend our work to support clone refactoring using lambda expressions.
- Improve the steps of our approach.
- Create an Eclipse plug-in for clone group refactoring.
Thanks