Presentation Transcript

Slide1

When Tests Collide: Evaluating and Coping with the Impact of Test Dependence

Wing Lam, Sai Zhang, Michael D. Ernst, University of Washington

Slide2

Two tests:
  createFile("foo") ...
  readFile("foo") ...

Executing them in the default order: the intended test results.
Executing them in a different order: a different result. readFile is order-dependent, a dependent test.
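To make the shared file-system state concrete, here is a minimal JUnit 4 sketch (illustrative only; the class and test names are hypothetical, not from the slides):

```java
import static org.junit.Assert.assertEquals;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.junit.Test;

// Illustrative sketch: testReadFile silently depends on testCreateFile
// through the file system, so it passes in the default order but fails
// (NoSuchFileException) when the two tests are reordered.
public class FileTests {
    private static final Path FOO = Paths.get("foo");

    @Test
    public void testCreateFile() throws IOException {
        Files.write(FOO, "contents".getBytes());  // side effect: creates "foo"
    }

    @Test
    public void testReadFile() throws IOException {
        assertEquals("contents", new String(Files.readAllBytes(FOO)));
    }
}
```

Nothing in the code declares the ordering requirement, so any technique that reorders the suite or drops testCreateFile silently changes testReadFile's result.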

Slide3

Why should we care about test dependence?
It makes test behaviors inconsistent, and it affects downstream testing techniques:
- Test parallelization (e.g., across CPU 1 and CPU 2)
- Test prioritization
- Test selection

Slide4

Test independence is assumed by:
- Test selection
- Test prioritization
- Test parallel execution
- Test factoring
- Test generation
- …

Conventional wisdom: test dependence is not a significant issue.

Surveyed: 31 papers in ICSE, FSE, ISSTA, ASE, ICST, TSE, and TOSEM (2000–2013).

Slide5

Test independence is assumed by:
- Test selection
- Test prioritization
- Test parallel execution
- Test factoring
- Test generation
- …

Conventional wisdom: test dependence is not a significant issue.

Of the 31 papers in ICSE, FSE, ISSTA, ASE, ICST, TSE, and TOSEM (2000–2013):
- 27 assume test independence without justification
- 3 treat it as a threat to validity
- 1 considers test dependence

Slide6

Recent work
- Illinois MIR work on flaky tests and test dependences [Luo FSE'14, Gyori ISSTA'15, Gligoric ISSTA'15]: tests revealing inconsistent results; a dependent test is a special type of flaky test.
- UW PLSE work empirically revisiting the test independence assumption [Zhang et al. ISSTA'14]: test dependence should not be ignored.

Slide7

Is the test independence assumption valid? No!
Does test dependence arise in practice? Yes, in both human-written and automatically-generated suites.
What repercussions does test dependence have? It affects downstream testing techniques.
How can we nullify the impact of test dependence? A general algorithm adds/reorders tests for techniques such as prioritization.

Slide8

Is the test independence assumption valid?
Does test dependence arise in practice? Yes, in both human-written and automatically-generated suites.
What repercussions does test dependence have? It affects downstream testing techniques.
How can we nullify the impact of test dependence? A general algorithm adds/reorders tests for techniques such as prioritization.

Slide9

Methodology
- Reported dependent tests: 5 issue tracking systems
- New dependent tests: 5 real-world projects

Slide10

Methodology: reported dependent tests, from 5 issue tracking systems
- Search for 4 key phrases: "dependent test", "test dependence", "test execution order", "different test outcome"
- Manually inspect 450 matched bug reports
- Identify 96 distinct dependent tests
- Characteristics studied: root cause, developers' action

Slide11

Root cause: 96 dependent tests

Slide12

Root cause of the 96 dependent tests:
- static variable: 59
- file system: 23
- database: 10
- unknown: 4

At least 61% are due to side-effecting access to static variables.

Slide13

Developers' action
- 98% of the reported tests are marked as major or minor issues.
- 91% of the dependences have been fixed, by improving documentation or by fixing test code or source code.

Slide14

Methodology: new dependent tests, from 5 real-world projects
- Human-written test suites: 6413 tests, of which 37 (0.6%) are dependent tests
- Automatically-generated test suites (using Randoop [Pacheco'07]): 11002 tests, of which 608 (5.5%) are dependent tests

These subjects were selected from a previous project [Zhang et al. ISSTA'14] that identified dependent tests in them.

Slide15

Is the test independence assumption valid?
Does test dependence arise in practice? Yes, in both human-written and automatically-generated suites.
What repercussions does test dependence have? It affects downstream testing techniques.
How can we nullify the impact of test dependence? A general algorithm adds/reorders tests for techniques such as prioritization.

Slide16

Test prioritization: reorders a test execution order into a new test execution order, aiming to achieve coverage faster and improve the fault detection rate. Each test should yield the same result.

Slide17

Four test prioritization techniques [Elbaum et al. ISSTA 2000]:
- Prioritize on coverage of statements
- Prioritize on coverage of statements not yet covered
- Prioritize on coverage of methods
- Prioritize on coverage of methods not yet covered

For each technique, record the number of dependent tests yielding different results. Total: 37 human-written and 608 automatically-generated dependent tests, from 5 real-world projects. (A sketch of the two coverage strategies follows.)
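As a rough sketch of the two coverage strategies (assuming each test's coverage is available as a set of element ids; an illustration, not the authors' implementation):

```java
import java.util.*;

// Sketch (assumed data model, not the authors' code): greedy "total" and
// "additional" coverage-based prioritization. Each test name maps to the
// set of statement (or method) ids it covers.
public class Prioritizer {

    // "Total" strategy: order tests by how many elements each covers.
    static List<String> byTotalCoverage(Map<String, Set<Integer>> cov) {
        List<String> order = new ArrayList<>(cov.keySet());
        order.sort((a, b) -> cov.get(b).size() - cov.get(a).size());
        return order;
    }

    // "Additional" strategy: repeatedly pick the test that covers the most
    // not-yet-covered elements.
    static List<String> byAdditionalCoverage(Map<String, Set<Integer>> cov) {
        Set<String> remaining = new HashSet<>(cov.keySet());
        Set<Integer> covered = new HashSet<>();
        List<String> order = new ArrayList<>();
        while (!remaining.isEmpty()) {
            String best = null;
            int bestGain = -1;
            for (String t : remaining) {
                Set<Integer> gain = new HashSet<>(cov.get(t));
                gain.removeAll(covered);              // only new coverage counts
                if (gain.size() > bestGain) { bestGain = gain.size(); best = t; }
            }
            remaining.remove(best);
            covered.addAll(cov.get(best));
            order.add(best);
        }
        return order;
    }
}
```

Both strategies permute the suite, which is exactly how a dependent test can end up running before the test it depends on.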

Slide18

Evaluating test prioritization techniques (out of 37 human-written dependent tests)

Number of tests that yield different results, per technique:
- Prioritize on coverage of statements: 5 (13.5%)
- Prioritize on coverage of statements not yet covered: 9 (24.3%)
- Prioritize on coverage of methods: 7 (18.9%)
- Prioritize on coverage of methods not yet covered: 6 (16.2%)

Implication: on average, an 18% chance that test dependence would affect test prioritization on human-written tests.

Slide19

Evaluating test prioritization techniques (out of 608 automatically-generated dependent tests)

Number of tests that yield different results, per technique:
- Prioritize on coverage of statements: 372 (61.2%)
- Prioritize on coverage of statements not yet covered: 331 (54.4%)
- Prioritize on coverage of methods: 381 (62.3%)
- Prioritize on coverage of methods not yet covered: 357 (58.7%)

Implication: on average, a 59% chance that test dependence would affect test prioritization on automatically-generated tests.

Slide20

Test selection: picks a subset of a test execution order that runs faster. Each test should yield the same result.

Slide21

Six test selection techniques [Harrold et al. OOPSLA 2001], varying selection granularity and ordering:
- Statement granularity, ordered by test id (no re-ordering)
- Statement granularity, ordered by number of elements tests cover
- Statement granularity, ordered by number of uncovered elements tests cover
- Function granularity, ordered by test id (no re-ordering)
- Function granularity, ordered by number of elements tests cover
- Function granularity, ordered by number of uncovered elements tests cover

For each technique, record the number of dependent tests yielding different results. Total: 37 human-written and 608 automatically-generated dependent tests, from 5 real-world projects. (A selection sketch follows.)
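For intuition, a minimal coverage-based selection sketch (the data model and names are assumed; the actual technique of Harrold et al. is considerably more involved): keep the tests whose covered elements overlap the changed elements, preserving the original test-id order; the "ordered by" variants then re-sort this subset.

```java
import java.util.*;

// Sketch (assumed data model, not the authors' code): coverage-based test
// selection at statement or function granularity. Element ids are opaque
// integers; "changed" holds the elements affected by a code edit.
public class Selector {
    static List<String> select(List<String> suiteOrder,
                               Map<String, Set<Integer>> coverage,
                               Set<Integer> changed) {
        List<String> selected = new ArrayList<>();
        for (String test : suiteOrder) {              // keeps test-id order
            if (!Collections.disjoint(coverage.get(test), changed)) {
                selected.add(test);                   // test exercises changed code
            }
        }
        return selected;  // the "ordered by coverage" variants re-sort this list
    }
}
```

If a selected test depends on an unselected one, the subset can yield a different result; the next two slides count how often that happens.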

Slide22

Evaluating test selection techniques (out of 37 human-written dependent tests)

Number of tests that yield different results, per technique:
- Statement granularity, test id (no re-ordering): 1 (2.7%)
- Statement granularity, number of elements tests cover: 1 (2.7%)
- Statement granularity, number of uncovered elements tests cover: 1 (2.7%)
- Function granularity, test id (no re-ordering): 1 (2.7%)
- Function granularity, number of elements tests cover: 1 (2.7%)
- Function granularity, number of uncovered elements tests cover: 2 (5.4%)

Implication: on average, a 3.2% chance that test dependence would affect test selection on human-written tests.

Slide23

Evaluating test selection techniques (out of 608 automatically-generated dependent tests)

Number of tests that yield different results, per technique:
- Statement granularity, test id (no re-ordering): 95 (15.6%)
- Statement granularity, number of elements tests cover: 109 (17.9%)
- Statement granularity, number of uncovered elements tests cover: 109 (17.9%)
- Function granularity, test id (no re-ordering): 266 (44.0%)
- Function granularity, number of elements tests cover: 294 (48.4%)
- Function granularity, number of uncovered elements tests cover: 297 (48.8%)

Implication: on average, a 32% chance that test dependence would affect test selection on automatically-generated tests.

Slide24

Test parallelization: schedules a test execution order across multiple CPUs (e.g., CPU 1 and CPU 2) to reduce test latency. Each test should yield the same result.

Slide25

Two test parallelization techniques [1]:
- Parallelize on test id
- Parallelize on test execution time

For each technique, record the number of dependent tests yielding different results. Total: 37 human-written and 608 automatically-generated dependent tests, from 5 real-world projects. (A scheduling sketch follows.)

[1] Executing unit tests in parallel on a multi-CPU/core machine in Visual Studio. http://blogs.msdn.com/b/vstsqualitytools/archive/2009/12/01/executing-unit-tests-in-parallel-on-a-multi-cpu-core-machine.aspx
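A sketch of the two schedulers (the assignment policies here, round-robin and greedy longest-first, are assumptions about reasonable implementations, not details taken from the cited Visual Studio feature):

```java
import java.util.*;

// Sketch (assumed scheduling details, not the authors' code) of the two
// parallelization strategies: round-robin over test ids, and greedy
// longest-processing-time-first over measured execution times.
public class ParallelScheduler {

    // Parallelize on test id: deal tests out to CPUs in their original order.
    static List<List<String>> byTestId(List<String> tests, int cpus) {
        List<List<String>> schedule = new ArrayList<>();
        for (int c = 0; c < cpus; c++) schedule.add(new ArrayList<>());
        for (int i = 0; i < tests.size(); i++) {
            schedule.get(i % cpus).add(tests.get(i));
        }
        return schedule;
    }

    // Parallelize on test execution time: assign the longest-running test
    // to the currently least-loaded CPU (greedy LPT makespan heuristic).
    static List<List<String>> byExecutionTime(Map<String, Long> timeMs, int cpus) {
        List<String> tests = new ArrayList<>(timeMs.keySet());
        tests.sort((a, b) -> Long.compare(timeMs.get(b), timeMs.get(a)));
        List<List<String>> schedule = new ArrayList<>();
        long[] load = new long[cpus];
        for (int c = 0; c < cpus; c++) schedule.add(new ArrayList<>());
        for (String t : tests) {
            int min = 0;
            for (int c = 1; c < cpus; c++) if (load[c] < load[min]) min = c;
            schedule.get(min).add(t);
            load[min] += timeMs.get(t);
        }
        return schedule;
    }
}
```

Either way, tests that used to run consecutively on one machine may land on different CPUs, breaking any hidden dependence between them.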

Slide26

Evaluating test parallelization techniques (out of 37 human-written dependent tests)

Number of tests that yield different results:
- Parallelize on test id: #CPUs = 2: 2 (5.4%); #CPUs = 16: 13 (35.1%)
- Parallelize on test execution time: #CPUs = 2: 14 (37.8%); #CPUs = 16: 14 (37.8%)

Implication: when #CPUs = 2, on average a 27% chance that test dependence would affect test parallelization for human-written tests; when #CPUs = 16, a 36% chance.

(#CPUs = 4 and 8 were evaluated but omitted for space reasons.)

Slide27

Evaluating test parallelization techniques (out of 608 automatically-generated dependent tests)

Number of tests that yield different results:
- Parallelize on test id: #CPUs = 2: 194 (31.9%); #CPUs = 16: 349 (57.4%)
- Parallelize on test execution time: #CPUs = 2: 360 (59.2%); #CPUs = 16: 433 (71.2%)

Implication: when #CPUs = 2, on average a 46% chance that test dependence would affect test parallelization for automatically-generated tests; when #CPUs = 16, a 64% chance.

(#CPUs = 4 and 8 were evaluated but omitted for space reasons.)

Slide28

Impact of test dependence

Chance of impact by test dependence, per technique and test suite type:
- Prioritization, human-written: Low
- Prioritization, automatically-generated: High
- Selection, human-written: Low
- Selection, automatically-generated: Moderate
- Parallelization, human-written: Moderate
- Parallelization, automatically-generated: High

(Chance = average % of dependent tests exposed: Low 0-25%, Moderate 25-50%, High over 50%.)

Dependent tests do affect downstream testing techniques, especially for automatically-generated test suites!

Slide29

Is the test independence assumption valid?
Does test dependence arise in practice? Yes, in both human-written and automatically-generated suites.
What repercussions does test dependence have? It affects downstream testing techniques.
How can we nullify the impact of test dependence? A general algorithm adds/reorders tests for techniques such as prioritization.

Slide30

General algorithm to nullify test dependence

Inputs:
- A test suite: the product of test prioritization, selection, or parallelization.
- Known test dependences: can be generated through approximate algorithms [Zhang et al. ISSTA'14] or left empty; reusable across different testing techniques and when developers change their code.

Output: a reordered/amended test suite. (A sketch follows.)
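One plausible reading of the general algorithm, as a sketch (the data model, names, and insertion policy are assumptions, not the paper's implementation): before emitting each test of the proposed order, schedule any known prerequisite that has not yet run, re-adding prerequisites that were dropped (e.g., by selection).

```java
import java.util.*;

// Sketch (assumed data model, not the authors' implementation) of the
// dependence-nullifying fix-up: walk the proposed order and, before
// emitting each test, emit any known prerequisite that has not run yet,
// adding it back if the technique had dropped it.
public class DependenceAwareReorderer {

    // deps maps a test to the tests that must execute before it.
    static List<String> nullify(List<String> proposedOrder,
                                Map<String, List<String>> deps) {
        List<String> result = new ArrayList<>();
        Set<String> emitted = new HashSet<>();
        for (String test : proposedOrder) {
            emit(test, deps, result, emitted);
        }
        return result;
    }

    private static void emit(String test, Map<String, List<String>> deps,
                             List<String> result, Set<String> emitted) {
        if (!emitted.add(test)) return;                       // already scheduled
        for (String pre : deps.getOrDefault(test, Collections.emptyList())) {
            emit(pre, deps, result, emitted);                 // prerequisites first
        }
        result.add(test);
    }
}
```

For example, with deps = {readFile -> [createFile]}, nullify([readFile], deps) yields [createFile, readFile].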

Slide31

Prioritization algorithm to nullify test dependence

Pipeline: a test suite → prioritization → prioritized test suite → general algorithm (with known test dependences) → reordered test suite.

Measured the average area under the curve (APFD) for the percentage of faults detected over the life of the test suite. The APFD of the original prioritization algorithms was 89.1%; the dependence-aware algorithm's was 88.1%, a negligible difference.
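For reference, the standard APFD definition (not spelled out on the slide): for an order of n tests that detects m faults, APFD = 1 - (TF1 + ... + TFm)/(n*m) + 1/(2n), where TFi is the position of the first test that reveals fault i. Higher is better, so the 89.1% vs. 88.1% comparison means faults are detected essentially as early as before.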

Slide32

Selection algorithm to nullify test dependence

Pipeline: a test suite → selection → selected test suite → general algorithm (with known test dependences) → reordered/amended test suite.

Measured the number of tests selected. The original selection algorithms selected 41.6% of tests on average; the dependence-aware algorithm selected 42.2%, a negligible difference.

Slide33

Parallelization algorithm to nullify test dependence

Pipeline: a test suite → parallelization → subsequences of the test suite → general algorithm (with known test dependences) → reordered/amended test suites.

Measured the time taken by the slowest machine and the average speedup compared to unparallelized suites. The original parallelization algorithms' average speedup was 41%; the dependence-aware algorithm's speedup was 55%.

Slide34

Future work
- For test selection, measure the time our dependence-aware test suites take to run compared to the dependence-unaware test suites.
- Evaluate our effectiveness at incrementally recomputing test dependences when developers make code changes.

Slide35

Contributions: evaluating and coping with the impact of test dependence.
- Test dependence arises in practice.
- Test dependence does affect downstream testing techniques.
- Our general algorithm effectively nullifies the impact of test dependence in practice.
- Our tools, experiments, etc.: https://github.com/winglam/dependent-tests-impact/

Slide36

[Backup slides]

Slide37

Why are there more dependent tests in automatically-generated test suites?

Manual test suites:
- Developers' understanding of the code and their testing goals helps build well-structured tests.
- Developers often try to initialize and destroy the shared objects each unit test may use. (See the sketch below.)

Automatically-generated test suites:
- Most tools are not "state-aware".
- The generated tests often "misuse" APIs, e.g., setting up the environment incorrectly.
- Most tools cannot generate environment setup/destroy code.
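A sketch of the setup/teardown discipline the slide alludes to (illustrative JUnit 4 code with hypothetical names, not from the presentation):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.junit.After;
import org.junit.Before;

// Illustrative sketch: the setup/teardown discipline human-written suites
// typically follow, which most generation tools cannot produce. Each test
// gets a fresh "foo" file and leaves no state behind for later tests.
public class WellStructuredFileTests {
    private static final Path FOO = Paths.get("foo");

    @Before
    public void createSharedState() throws IOException {
        Files.write(FOO, "contents".getBytes());  // initialize shared object
    }

    @After
    public void destroySharedState() throws IOException {
        Files.deleteIfExists(FOO);                // destroy shared object
    }
}
```

A generated suite typically lacks such @Before/@After structure, so each test inherits whatever state earlier tests left behind.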

Slide38

Dependent tests vs. nondeterministic tests
- Nondeterminism does not imply dependence: a program may execute nondeterministically, but its tests may deterministically succeed.
- Test dependence does not imply nondeterminism: a program may have no sources of nondeterminism, but its tests can still be dependent on each other.