When Tests Collide: Evaluating and Coping with the Impact of Test Dependence
Wing Lam, Sai Zhang, Michael D. Ernst
University of Washington
Two tests:
  createFile("foo") ...
  readFile("foo") ...

Executing them in the default order: the intended test results.
Executing them in a different order: different results, i.e., an order-dependent (dependent) test.
Why should we care about test dependence?
- It makes test behavior inconsistent.
- It affects downstream testing techniques: test parallelization (across CPU 1, CPU 2, ...), test prioritization, and test selection.
Test independence is assumed by: test selection, test prioritization, test parallel execution, test factoring, test generation, and more.

Conventional wisdom: test dependence is not a significant issue.

31 papers in ICSE, FSE, ISSTA, ASE, ICST, TSE, and TOSEM (2000-2013).
Of those 31 papers:
- 27 assume test independence without justification
- 3 mention test dependence only as a threat to validity
- 1 considers test dependence
Recent work
- Illinois MIR work on flaky tests and test dependence [Luo FSE'14, Gyori ISSTA'15, Gligoric ISSTA'15]: tests revealing inconsistent results; a dependent test is a special type of flaky test.
- UW PLSE work empirically revisiting the test independence assumption [Zhang et al. ISSTA'14]: test dependence should not be ignored.
Is the test independence assumption valid? No!
Does test dependence arise in practice? Yes, in both human-written and automatically-generated suites.
What repercussions does test dependence have? It affects downstream testing techniques.
How can we nullify the impact of test dependence? A general algorithm adds/reorders tests for techniques such as prioritization.
Does test dependence arise in practice? Yes, in both human-written and automatically-generated suites.
Methodology
- Reported dependent tests: from 5 issue tracking systems
- New dependent tests: from 5 real-world projects
Methodology: reported dependent tests (5 issue tracking systems)
- Search for 4 key phrases: "dependent test", "test dependence", "test execution order", "different test outcome"
- Manually inspect 450 matched bug reports
- Identify 96 distinct dependent tests
- Characterize each by root cause and developers' action
Root cause (96 dependent tests):
- 59 static variable
- 23 file system
- 10 database
- 4 unknown

At least 61% are due to side-effecting access to static variables.
Developers' action
- 98% of the reported tests are marked as major or minor issues.
- 91% of the dependences have been fixed, by improving documentation or by fixing test code or source code.
Methodology: new dependent tests (5 real-world projects)
- Human-written test suites: 6413 tests, of which 37 (0.6%) are dependent
- Automatically-generated test suites (via Randoop [Pacheco'07]): 11002 tests, of which 608 (5.5%) are dependent
- Subjects were selected from a previous project [Zhang et al. ISSTA'14] that identified dependent tests in them.
What repercussions does test dependence have? It affects downstream testing techniques.
Test prioritization: transforms a test execution order into a new test execution order, to achieve coverage faster and improve the fault-detection rate. Each test should yield the same result in either order.
Four test prioritization techniques [Elbaum et al. ISSTA 2000]:
- Prioritize on coverage of statements
- Prioritize on coverage of statements not yet covered
- Prioritize on coverage of methods
- Prioritize on coverage of methods not yet covered

For each technique, record the number of dependent tests yielding different results.
Total: 37 human-written and 608 automatically-generated dependent tests, from 5 real-world projects.
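The "total" and "not yet covered" heuristics behind these techniques can be sketched as follows (a simplified illustration, not Elbaum et al.'s implementation; `tests` maps each test name to the set of statements or methods it covers):

```python
def prioritize_total(tests):
    """Order tests by total number of covered elements, descending."""
    return sorted(tests, key=lambda t: len(tests[t]), reverse=True)

def prioritize_additional(tests):
    """Greedily pick the test covering the most not-yet-covered elements."""
    remaining = dict(tests)
    covered, order = set(), []
    while remaining:
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        order.append(best)
        covered |= tests[best]
        del remaining[best]
    return order
```

Both heuristics reorder the suite purely by coverage, so neither preserves any hidden before/after relationship between dependent tests, which is why dependent tests can yield different results afterwards.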
Evaluating test prioritization techniques (out of 37 human-written dependent tests):
- Prioritize on coverage of statements: 5 (13.5%)
- Prioritize on coverage of statements not yet covered: 9 (24.3%)
- Prioritize on coverage of methods: 7 (18.9%)
- Prioritize on coverage of methods not yet covered: 6 (16.2%)

Implication: on average, an 18% chance that test dependence affects test prioritization of human-written tests.
Evaluating test prioritization techniques (out of 608 automatically-generated dependent tests):
- Prioritize on coverage of statements: 372 (61.2%)
- Prioritize on coverage of statements not yet covered: 331 (54.4%)
- Prioritize on coverage of methods: 381 (62.3%)
- Prioritize on coverage of methods not yet covered: 357 (58.7%)

Implication: on average, a 59% chance that test dependence affects test prioritization of automatically-generated tests.
Test selection: picks a subset of the test execution order so the suite runs faster. Each selected test should yield the same result.
Six test selection techniques [Harrold et al. OOPSLA 2001], varying selection granularity and ordering:
- Statement granularity, ordered by test id (no re-ordering)
- Statement granularity, ordered by number of elements tests cover
- Statement granularity, ordered by number of uncovered elements tests cover
- Function granularity, ordered by test id (no re-ordering)
- Function granularity, ordered by number of elements tests cover
- Function granularity, ordered by number of uncovered elements tests cover

For each technique, record the number of dependent tests yielding different results.
Total: 37 human-written and 608 automatically-generated dependent tests, from 5 real-world projects.
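The selection step itself can be illustrated with a simplified sketch (not Harrold et al.'s actual algorithm): keep each test that covers at least one affected element, at whichever granularity is chosen.

```python
def select(tests, affected):
    """Coverage-based selection sketch.

    tests: dict name -> set of covered elements (statements or functions).
    affected: set of elements changed since the last run.
    Returns selected test names in test-id order (no re-ordering).
    """
    return [t for t in sorted(tests) if tests[t] & affected]
```

Because selection can drop a test that another test silently depends on, a dependent test that survives selection may then yield a different result, which is what the next two slides quantify.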
Evaluating test selection techniques (out of 37 human-written dependent tests):
- Statement granularity, test id (no re-ordering): 1 (2.7%)
- Statement granularity, number of elements tests cover: 1 (2.7%)
- Statement granularity, number of uncovered elements tests cover: 1 (2.7%)
- Function granularity, test id (no re-ordering): 1 (2.7%)
- Function granularity, number of elements tests cover: 1 (2.7%)
- Function granularity, number of uncovered elements tests cover: 2 (5.4%)

Implication: on average, a 3.2% chance that test dependence affects test selection of human-written tests.
Evaluating test selection techniques (out of 608 automatically-generated dependent tests):
- Statement granularity, test id (no re-ordering): 95 (15.6%)
- Statement granularity, number of elements tests cover: 109 (17.9%)
- Statement granularity, number of uncovered elements tests cover: 109 (17.9%)
- Function granularity, test id (no re-ordering): 266 (44.0%)
- Function granularity, number of elements tests cover: 294 (48.4%)
- Function granularity, number of uncovered elements tests cover: 297 (48.8%)

Implication: on average, a 32% chance that test dependence affects test selection of automatically-generated tests.
Test parallelization: schedules the test execution order across multiple CPUs (CPU 1, CPU 2, ...) to reduce test latency. Each test should yield the same result under any schedule.
Two test parallelization techniques [1]:
- Parallelize on test id
- Parallelize on test execution time

For each technique, record the number of dependent tests yielding different results.
Total: 37 human-written and 608 automatically-generated dependent tests, from 5 real-world projects.

[1] Executing unit tests in parallel on a multi-CPU/core machine in Visual Studio. http://blogs.msdn.com/b/vstsqualitytools/archive/2009/12/01/executing-unit-tests-in-parallel-on-a-multi-cpu-core-machine.aspx
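The two scheduling policies can be sketched as follows (assumed semantics for illustration only; the internals of the cited Visual Studio feature are not described in this deck):

```python
def parallelize_by_id(tests, n_cpus):
    """tests: list of (name, time). Round-robin assignment by test id/position."""
    cpus = [[] for _ in range(n_cpus)]
    for i, (name, _) in enumerate(tests):
        cpus[i % n_cpus].append(name)
    return cpus

def parallelize_by_time(tests, n_cpus):
    """Greedy longest-processing-time-first: assign the longest-running
    remaining test to the currently least-loaded CPU."""
    cpus = [[] for _ in range(n_cpus)]
    loads = [0.0] * n_cpus
    for name, t in sorted(tests, key=lambda x: -x[1]):
        k = loads.index(min(loads))
        cpus[k].append(name)
        loads[k] += t
    return cpus
```

Either policy can place two dependent tests on different CPUs, or on the same CPU in the wrong relative order, so dependent tests may yield different results than in the original serial run.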
Evaluating test parallelization techniques (out of 37 human-written dependent tests):
- Parallelize on test id: 2 (5.4%) with 2 CPUs; 13 (35.1%) with 16 CPUs
- Parallelize on test execution time: 14 (37.8%) with 2 CPUs; 14 (37.8%) with 16 CPUs

Implication: with 2 CPUs, on average a 27% chance that test dependence affects parallelization of human-written tests; with 16 CPUs, a 36% chance.
(4 and 8 CPUs were also evaluated but are omitted for space reasons.)
Evaluating test parallelization techniques (out of 608 automatically-generated dependent tests):
- Parallelize on test id: 194 (31.9%) with 2 CPUs; 349 (57.4%) with 16 CPUs
- Parallelize on test execution time: 360 (59.2%) with 2 CPUs; 433 (71.2%) with 16 CPUs

Implication: with 2 CPUs, on average a 46% chance that test dependence affects parallelization of automatically-generated tests; with 16 CPUs, a 64% chance.
(4 and 8 CPUs were also evaluated but are omitted for space reasons.)
Impact of test dependence, by technique and test-suite type:
- Prioritization, human-written: low
- Prioritization, automatically-generated: high
- Selection, human-written: low
- Selection, automatically-generated: moderate
- Parallelization, human-written: moderate
- Parallelization, automatically-generated: high

(Low: 0-25% of dependent tests exposed on average; moderate: 25-50%; high: over 50%.)

Dependent tests do affect downstream testing techniques, especially for automatically-generated test suites!
How can we nullify the impact of test dependence? A general algorithm adds/reorders tests for techniques such as prioritization.
General algorithm to nullify test dependence.
Pipeline: a test suite + known test dependences -> general algorithm -> reordered/amended test suite.
- The input test suite is the product of test prioritization, selection, or parallelization.
- Known test dependences can be generated through approximate algorithms [Zhang et al. ISSTA'14] or start empty; they are reusable across different testing techniques and when developers change their code.
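One way such a general algorithm could work is a dependence-aware fixup pass. The sketch below is my own minimal version, not the paper's exact algorithm; it assumes known dependences are given as acyclic "must run after" sets, and it inserts any missing prerequisite immediately before its dependent test.

```python
def nullify(order, depends_on):
    """Amend a proposed test order to respect known dependences.

    order: list of test names produced by prioritization/selection/
           parallelization.
    depends_on: dict mapping a test to the list of tests that must run
                before it (assumed acyclic).
    Returns an order containing every test in `order` plus any missing
    prerequisites, each placed before its dependent test.
    """
    result = []

    def emit(t):
        for pre in depends_on.get(t, ()):
            if pre not in result:
                emit(pre)          # recursively satisfy prerequisites
        if t not in result:
            result.append(t)

    for t in order:
        emit(t)
    return result
```

Because the pass only inserts or locally reorders tests, it largely preserves the order chosen by the underlying technique, which is consistent with the small quality differences reported on the next slides.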
Prioritization algorithm to nullify test dependence.
Pipeline: test suite -> prioritization -> prioritized test suite -> general algorithm (with known test dependences) -> reordered test suite.
Measured the average area under the curve of the percentage of faults detected over the life of the test suite (APFD).
APFD of the original prioritization algorithms: 89.1%. APFD of the dependence-aware algorithm: 88.1%, a negligible difference.
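APFD here is the standard weighted-average fault-detection metric: APFD = 1 - (TF1 + ... + TFm)/(n*m) + 1/(2n), where n is the number of tests, m the number of faults, and TFi the position of the first test that detects fault i. A minimal sketch of computing it (the fault-detection matrix `detects` is a hypothetical input, not data from the talk):

```python
def apfd(order, detects):
    """order: list of test names, in execution order.
    detects: dict fault -> set of tests that detect it.
    Returns the APFD of this order (higher is better)."""
    n, m = len(order), len(detects)
    pos = {t: i + 1 for i, t in enumerate(order)}  # 1-based positions
    # Sum, over faults, of the position of the first detecting test.
    total = sum(min(pos[t] for t in tests if t in pos)
                for tests in detects.values())
    return 1 - total / (n * m) + 1 / (2 * n)
```

An order that detects all faults early scores close to 1, so the 89.1% vs. 88.1% figures above mean the dependence-aware reordering barely hurts fault-detection speed.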
Selection algorithm to nullify test dependence.
Pipeline: test suite -> selection -> selected test suite -> general algorithm (with known test dependences) -> reordered/amended test suite.
Measured the number of tests selected.
The original selection algorithms selected 41.6% of tests on average; the dependence-aware algorithm selected 42.2%, a negligible difference.
Parallelization algorithm to nullify test dependence.
Pipeline: test suite -> parallelization -> subsequences of the test suite -> general algorithm (with known test dependences) -> reordered/amended subsequences.
Measured the time taken by the slowest machine, and the average speedup compared to unparallelized suites.
Average speedup of the original parallelization algorithms: 41%. The dependence-aware algorithm's speedup: 55%.
Future work
- For test selection, measure the time our dependence-aware test suites take to run, compared to the dependence-unaware test suites.
- Evaluate our effectiveness at incrementally recomputing test dependences when developers make code changes.
Contributions: evaluating and coping with the impact of test dependence.
- Test dependence arises in practice.
- Test dependence does affect downstream testing techniques.
- Our general algorithm is effective in practice at nullifying the impact of test dependence.
- Our tools, experiments, etc.: https://github.com/winglam/dependent-tests-impact/
[Backup slides]
Why are there more dependent tests in automatically-generated test suites?
- Manual test suites: the developer's understanding of the code and their testing goals help build well-structured tests, and developers often try to initialize and destroy the shared objects each unit test may use.
- Auto test suites: most tools are not "state-aware"; the generated tests often "misuse" APIs, e.g., setting up the environment incorrectly; and most tools cannot generate environment setup/teardown code.
Dependent tests vs. nondeterministic tests
- Nondeterminism does not imply dependence: a program may execute nondeterministically, yet its tests may deterministically succeed.
- Test dependence does not imply nondeterminism: a program may have no sources of nondeterminism, yet its tests can still depend on each other.