1. Comparing and Combining Test-Suite Reduction and Regression Test Selection
August Shi, Tifany Yung, Alex Gyori, and Darko Marinov
NSF Grant Nos. CCF-1012759, CCF-1421503, CCF-1434590, CCF-1439957
FSE 2015, Bergamo, Italy, 09/02/2015
2. Testing is Important but Slow
[Figure: code under test at V0, run against tests test0, test1, test2, test3, …, testN-1, testN]
3. Regression Testing is Slow(er)
[Figure: code under test at V0, V1, and V2, each run against tests test0, test1, test2, test3, …, testN-1, testN]
4. Speeding up Regression Testing
- Test-Suite Reduction
- Regression Test Selection
- Test-Suite Parallelization
- Refactoring Tests
- Many More
6. Test-Suite Reduction (TSR)
[Figure: code under test at V0, V1, and V2, each with tests test0, test1, test2, test3, …, testN-1, testN]
7. Regression Test Selection (RTS)
[Figure: change Δ between code under test at V0 and V1, each with tests test0, test1, test2, test3, …, testN-1, testN]
8. Regression Test Selection (RTS)
[Figure: code under test at V0, V1, and V2, each with tests test0, test1, test2, test3, …, testN-1, testN]
9. TSR versus RTS (Known Qualitative Comparison)
- Can it miss failing tests from the original test suite? Test-Suite Reduction: Yes. Regression Test Selection: No (if safe).
- How often is analysis performed? Test-Suite Reduction: Infrequently. Regression Test Selection: Every revision.
- How are tests chosen to run? Test-Suite Reduction: Redundancy (one revision). Regression Test Selection: Changes (two revisions).
10. How do TSR and RTS compare quantitatively? How can TSR and RTS be combined?
12. TSR Background
T = Tests, S = Statements
[Figure: coverage matrix with tests T1 to T5 as rows and statements S1 to S5 as columns]
13. TSR Background
Reduced Test Suite R = {T3, T5}
14. TSR Background
R = {T3, T5}: Size
16. TSR Background
R = {T3, T5}: Size, Fault-Detection Capability
17. TSR Background
T = Tests, S = Statements, M = Mutants
[Figure: the coverage matrix alongside a mutant-kill matrix with columns M1 to M4]
R = {T3, T5}: Size, Fault-Detection Capability
18. RTS Background
T = Tests, S = Statements
[Figure: coverage matrices for revision V_{i-1} (tests T1 to T5 against statements S1 to S5) and revision V_i (tests T1 to T5 against statements S1 to S6). The change Δ between them: S2 changed, S6 added]
19. RTS Background
Selected Tests S_{i,Δ} = {T1, T2, T3}
20. RTS Background
S_{i,Δ} = {T1, T2, T3}: Size
21. RTS Background
S_{i,Δ} = {T1, T2, T3}: Size, Fault-Detection Capability
Safe RTS does not fail to detect change-related faults
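The selection step on the RTS Background slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coverage matrix below is hypothetical (the positions of the marks in the slide's matrix are not recoverable from the transcript), and `select_tests` is an illustrative name, not part of any tool. The sketch selects every test whose coverage on the old revision touches a changed statement.

```python
def select_tests(old_coverage, changed):
    """Return the tests whose old-revision coverage intersects the changed statements."""
    return {test for test, stmts in old_coverage.items() if stmts & changed}


# Hypothetical coverage on revision V_{i-1}; NOT the matrix from the slides.
old_coverage = {
    "T1": {"S1", "S2"},
    "T2": {"S2", "S3"},
    "T3": {"S2", "S4"},
    "T4": {"S5"},
    "T5": {"S1", "S3"},
}

# S2 changed between V_{i-1} and V_i, as on the slide.
selected = select_tests(old_coverage, changed={"S2"})
print(sorted(selected))  # ['T1', 'T2', 'T3']
```

With this hypothetical matrix the result agrees with the slide's S_{i,Δ} = {T1, T2, T3}. A statement added in the new revision, such as S6, has no coverage on the old revision, so this sketch only reacts to modified statements.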
22. How can TSR and RTS be combined?
23. Applying RTS after TSR
T = Tests, S = Statements
[Figure: coverage matrix for revision V_{i-1}, tests T1 to T5 against statements S1 to S5]
24. Applying RTS after TSR
R = {T3, T5}
25. Applying RTS after TSR
R_i = {T3, T5}
[Figure: coverage matrices for revisions V_{i-1} and V_i. The change Δ between them: S2 changed, S6 added]
27. Applying RTS after TSR
Selection of Reduction (SeRe) = {T3}
- Size
- Fault-Detection Capability: if RTS is safe, then as good as the reduced test suite
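The SeRe result on this slide follows from intersecting the reduced suite with the selected tests. A minimal sketch using the sets given on the slides (the function name `selection_of_reduction` is illustrative, not from the talk):

```python
def selection_of_reduction(reduced, selected):
    """Tests that are both kept by TSR and selected by RTS for this change."""
    return reduced & selected


reduced = {"T3", "T5"}          # R_i from the slides
selected = {"T1", "T2", "T3"}   # S_{i,Δ} from the slides

print(sorted(selection_of_reduction(reduced, selected)))  # ['T3']
```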
28. Metrics to compare between approaches
- Size Decrease, defined for each of TSR, RTS, and SeRe [formulas not recoverable from the transcript]
- Fault-Detection Capability Decrease: currently, NO metric for fault-detection capability between approaches
- We need a metric that takes CHANGE into account
29. Map Tests to Faults
T = Tests, F = Faults
[Figure: fault-detection matrices for revision V_{i-1} (tests T1 to T5 against faults F1 to F5) and revision V_i (tests T1 to T5 against faults F1 to F6)]
Need criteria that include these change-related faults
30. Detect all Faults?
31. Which is Better?
- One test suite: detects 5 faults, detects 1 change-related fault
32. Which is Better?
- One test suite: detects 5 faults, detects 1 change-related fault
- Other test suite: detects 4 faults, detects 2 change-related faults
If the criterion is to detect all faults, we can get misleading comparisons with respect to these change-related faults!
33. Finding Change-Related Faults
T = Tests, F = Faults
[Figure: fault-detection matrices for revisions V_{i-1} and V_i]
Safe RTS will not fail to select tests whose behavior differs after the change
34. Finding Change-Related Faults
S_{i,Δ} = {T1, T2, T3}
35. Finding Change-Related Faults
S_{i,Δ} = {T1, T2, T3}
Faults(S_{i,Δ}) = {F1, F2, F3, F4, F6}
36. Finding Change-Related Faults
Faults detected by non-selected tests cannot be change-related!
ChangeRelatedFaults_{i,Δ} = Faults(S_{i,Δ}) \ Faults(O_i \ S_{i,Δ})
                          = Faults({T1, T2, T3}) \ Faults({T4, T5})
                          = {F1, F2, F3, F4, F6} \ {F1, F3, F5}
                          = {F2, F4, F6}
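The set difference on this slide can be reproduced with a short Python sketch. The per-test fault map `detects` below is an assumption: the transcript only preserves the aggregate sets Faults({T1,T2,T3}) and Faults({T4,T5}), so the individual rows are hypothetical values chosen to be consistent with those aggregates. The function names are illustrative.

```python
def faults(tests, detects):
    """Union of the faults detected by the given tests."""
    return set().union(*(detects[t] for t in tests))


def change_related_faults(original, selected, detects):
    """Faults detected by selected tests but by no non-selected test."""
    return faults(selected, detects) - faults(original - selected, detects)


# Hypothetical per-test rows, consistent with the aggregate sets on the slide.
detects = {
    "T1": {"F1", "F2"},
    "T2": {"F2", "F3", "F4"},
    "T3": {"F3", "F4", "F6"},
    "T4": {"F5"},
    "T5": {"F1", "F3"},
}
original = set(detects)           # O_i
selected = {"T1", "T2", "T3"}     # S_{i,Δ}

print(sorted(change_related_faults(original, selected, detects)))
# ['F2', 'F4', 'F6']
```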
37. Change-Related Requirements (CRR)
- Use testing requirements (statements covered or mutants killed) to approximate the fault-detection capability of a test suite T chosen from O_i
- Evaluate the loss in change-related fault-detection capability of the reduced test suite
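One way to turn the CRR idea into a number is sketched below. The slide's formulas did not survive in the transcript, so the loss definition here (the percentage of change-related requirements that the chosen suite fails to satisfy) is an assumption rather than the authors' exact metric, and the requirement map `satisfies` is hypothetical.

```python
def reqs(tests, satisfies):
    """Union of the requirements satisfied by the given tests."""
    return set().union(*(satisfies[t] for t in tests))


def crr_loss(original, selected, chosen, satisfies):
    """Percentage of change-related requirements NOT satisfied by `chosen`.

    Change-related requirements are those satisfied by the selected tests
    and by no non-selected test (assumed definition, mirroring the
    change-related-faults construction).
    """
    crr = reqs(selected, satisfies) - reqs(original - selected, satisfies)
    if not crr:
        return 0.0
    lost = crr - reqs(chosen, satisfies)
    return 100.0 * len(lost) / len(crr)


# Hypothetical requirement map (e.g. statements covered or mutants killed).
satisfies = {
    "T1": {"r1", "r2"},
    "T2": {"r2", "r3", "r4"},
    "T3": {"r3", "r4", "r6"},
    "T4": {"r5"},
    "T5": {"r1", "r3"},
}
original = set(satisfies)
selected = {"T1", "T2", "T3"}
reduced = {"T3", "T5"}

print(round(crr_loss(original, selected, reduced, satisfies), 2))             # 33.33
print(round(crr_loss(original, selected, selected, satisfies), 2))            # 0.0
print(round(crr_loss(original, selected, reduced & selected, satisfies), 2))  # 33.33
```

In this toy data the selected suite loses nothing and the selection of the reduced suite loses exactly what the reduced suite loses. The numbers themselves are artifacts of the hypothetical map.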
38. Evaluation Setup
39. Projects
40. Experimental Setup
- Use Greedy heuristic to perform TSR
  - Remove redundant tests with respect to statement coverage
  - Statement coverage/mutants killed collected using PIT (http://pitest.org)
- Use Ekstazi to perform (safe) RTS
  - Select tests based on file-level dependencies
  - Tests selected at test class level
  - http://www.ekstazi.org
- Simulate evolving reduced test suite and selection of reduction
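A greedy reduction heuristic of the kind named above can be sketched as follows. This is the textbook greedy set-cover approach, offered as an illustration and not as the authors' implementation. The coverage data is hypothetical, and in the actual setup coverage would come from PIT rather than a hand-written dictionary.

```python
def greedy_reduce(coverage):
    """Greedy set-cover heuristic.

    Returns a list of tests that together cover every statement covered
    by the full suite, picking at each step the test that covers the most
    still-uncovered statements.
    """
    uncovered = set().union(*coverage.values())
    reduced = []
    while uncovered:
        # Iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(coverage), key=lambda t: len(coverage[t] & uncovered))
        reduced.append(best)
        uncovered -= coverage[best]
    return reduced


# Hypothetical statement coverage per test.
coverage = {
    "T1": {"S1", "S2"},
    "T2": {"S3", "S4"},
    "T3": {"S1", "S2", "S3"},
    "T4": {"S4"},
    "T5": {"S4", "S5"},
}

print(greedy_reduce(coverage))  # ['T3', 'T5']
```

The reduced suite keeps the same statement coverage as the original, which is the adequacy criterion the slide describes. Greedy selection is not guaranteed to return the smallest such suite.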
41. Evolving Reduced Test Suite
[Figure: code under test at V0 with tests test0 to test3, drawn twice]
43. Evolving Reduced Test Suite
[Figure: as on slide 41, plus code under test at V1 with tests test0 to test4]
44. Evolving Reduced Test Suite
[Figure: code under test at V0 (test0 to test3) and at V1 (test0 to test4), each drawn twice]
46. Evolving Reduced Test Suite
[Figure: as on slide 44, plus code under test at V2 with tests test0 to test4]
47. Evolving Reduced Test Suite
[Figure: code under test at V0 (test0 to test3), V1 (test0 to test4), and V2 (test0 to test4), each drawn twice]
50. Selection of Reduction
- Can see the result of selection of reduction by looking at the tests chosen by TSR and RTS
- Given the reduced test suite R_i and the selected tests S_{i,Δ}, can intersect the two to see which tests from R_i are selected due to changes
51. Size Comparison
52. Evaluation: Size Comparisons
[Figure: Apache Commons-Lang]
53. Evaluation: Size Comparisons
[Figure: LA4J]
54. Evaluation: Size Comparison (Aggregated)
[Figure: aggregated size comparison for Apache Commons-Lang and LA4J]
- RTS runs fewer tests than TSR (difference in median of 40.15pp)
- SeRe runs even fewer tests (difference in median of 5.34pp)
55. Change-Related Fault-Detection Capability Comparison
56. Evaluation: Fault-Detection Capability Comparison
- TSR has small loss in change-related fault-detection capability (greatest median loss 5.93%)
- RTS has no loss
- SeRe has same loss as TSR
57. Discussion
- CRR is not an optimal way of measuring change-related fault-detection capability
- But better than only looking at changed portions of code
- Future work in finding better criteria
58. Conclusions
- Regression testing is slow, but there are approaches to speed it up
- Test-suite reduction (TSR) and regression test selection (RTS) are such approaches, and we compare them quantitatively
- RTS performs better than TSR: it runs fewer tests (40.15pp) with no loss in change-related fault-detection capability
- Selection of Reduction (SeRe) runs even fewer tests (5.34pp) with small loss in change-related fault-detection capability (5.93%)
awshi2@illinois.edu
59. BACKUP
60. Threats to Validity
- Results for the projects used for evaluation may not generalize to all projects
- RTS tracks dependencies at file level and selects at test class level; TSR tracks dependencies at statement level and reduces at test method level
  - RTS selects at a coarser granularity level, yet our findings show that it selects fewer tests on average than TSR
- CRR relies on RTS being safe and precise
  - Although the RTS tool is safe, it is imprecise, meaning possibly more requirements are considered change-related than actually should be
61. Evaluation: SeRe Selection Ratio
- Ratios are very similar (mean ratio difference only 0.72pp)
- Reduced test suite is representative of the original test suite
62. LA4J Re-Reduction
63. Evaluation: Size Comparisons
[Figure: Joda-Time and Apache Commons-Lang]
64. Evaluation: Size Comparisons
[Figure: LA4J (Reduced Early) and LA4J (Reduced Late)]
65. Evaluation: Size Comparisons
[Figure: LA4J (Reduced Late) and LA4J (Reduced Early)]