Slide1
An Analysis of the Search Space of Generate and Validate Patch Generation Systems
Fan Long and Martin Rinard
MIT EECS & CSAIL
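Slide2
Context
[Figure: an application and its test suite of inputs with correct outputs. The application's outputs match (=) on the positive inputs and differ (≠) on the negative inputs.]
Goal
[Figure: an automatic patch generation system produces a patched application whose outputs match (=) on every test suite input.]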
Slide3
Generate and Validate Patching
Generate a search space of candidate patches
[Figure: test suite results on the negative and positive inputs]
…
p->f1 = y;
…
Slide4
Generate and Validate Patching
Validate each candidate patch against the test suite
[Figure: test suite results on the negative and positive inputs; one output still differs (≠)]
…
p->f1 = z;  /* candidate patch: y replaced with z */
…
Slide5
Generate and Validate Patching
Validate each candidate patch against the test suite
[Figure: test suite results on the negative and positive inputs; one output still differs (≠)]
…
p->f2 = y;  /* candidate patch: f1 replaced with f2 */
…
Slide6
Generate and Validate Patching
Collect all of the patches that validate
[Figure: test suite results; all outputs match (=) on the negative and positive inputs]
…
if (p != 0) return;
p->f1 = y;
…
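The generate-and-validate steps on the preceding slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SPR or Prophet implementation: the toy programs `original`, `guarded`, and `wrong_field`, the record format, and the two-test suite are all hypothetical stand-ins for real candidate patches and a real test suite.

```python
def validates(program, test_suite):
    """Return True if the program yields the expected output on every test."""
    for test_input, expected in test_suite:
        try:
            if program(test_input) != expected:
                return False
        except Exception:
            return False  # a crash counts as a failed test
    return True


def generate_and_validate(candidates, test_suite):
    """Return the names of all candidates that pass the whole test suite."""
    return [name for name, program in candidates if validates(program, test_suite)]


# Hypothetical toy programs: each takes a record (dict) or None.
def original(p):
    return {**p, "f1": 1}  # raises TypeError when p is None


def guarded(p):
    if p is None:  # candidate patch: guard the update
        return None
    return {**p, "f1": 1}


def wrong_field(p):
    return {**p, "f2": 1}  # candidate patch: writes the wrong field


TEST_SUITE = [
    ({}, {"f1": 1}),  # positive input: the original already passes
    (None, None),     # negative input: the original crashes
]

CANDIDATES = [
    ("original", original),
    ("guarded", guarded),
    ("wrong_field", wrong_field),
]
VALIDATED = generate_and_validate(CANDIDATES, TEST_SUITE)
```

In this toy setting only the guarded candidate passes both tests, so it is the only patch collected.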
Slide7
Are all validated patches correct?
No!
Slide8
A Validated but Incorrect Patch
Because the test suite is incomplete!
[Figure: test suite results; all outputs match (=) on the negative and positive inputs]
…
exit(0);
p->f1 = y;
…
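A small Python sketch shows how an incomplete test suite lets an incorrect patch validate. Everything here is an assumption made for illustration: the `(status, output)` result format, the two toy patches, and both test suites are hypothetical. The weak suite only checks the exit status, so a patch that stops early with status 0 passes it.

```python
def correct_patch(p):
    # Skips the update only when the record is missing.
    if p is None:
        return (0, "no record")
    return (0, "f1 updated")


def exit_patch(p):
    # Toy analogue of inserting exit(0): stop with status 0 before any work.
    return (0, "")


def validated(program, tests):
    """True if every check accepts the program's result on its input."""
    return all(check(program(arg)) for arg, check in tests)


# Incomplete suite: only the exit status is checked.
WEAK_SUITE = [
    (None, lambda result: result[0] == 0),
    ({"f1": 0}, lambda result: result[0] == 0),
]

# Stronger suite: additionally checks the output on a positive input.
STRONG_SUITE = WEAK_SUITE + [
    ({"f1": 0}, lambda result: result[1] == "f1 updated"),
]
```

Both patches validate against `WEAK_SUITE`; only `correct_patch` survives `STRONG_SUITE`.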
Slide9
How can we make such a system generate correct patches?
Slide10
Possible Code Space
[Figure: a plane of possible code with two marked points, "Original code" and "Correct patch"]
Other points in the plane correspond to candidate patches (modified code).
…
if (p != 0) return;
p->f1 = y;
…
Slide11
Possible Code Space
[Figure: the plane of possible code with "Original code", "Correct patch", and a region labeled "Search space"]
1. The search space needs to contain correct patches
Slide12
Possible Code Space
[Figure: the plane of possible code with "Original code", "Correct patch", and a region labeled "Search space"]
2. The search algorithm can explore the search space and find correct patches
Slide13
Possible Code Space
[Figure: the plane of possible code with "Original code", "Correct patch", "Search space", and points labeled "Validated but incorrect patch"]
Some points in the plane correspond to validated but incorrect patches!
…
exit(0);
p->f1 = y;
…
Slide14
Possible Code Space
[Figure: the plane of possible code with "Original code", "Correct patch", "Search space", and points labeled "Validated but incorrect patch"]
3. The search space does not contain too many validated but incorrect patches
Slide15
Research Questions
How many correct patches are in the search space for a defect?
How many candidate patches are in the search space for a defect?
How many validated but incorrect patches are in the search space for a defect?
Slide16
An Inherent Search Space Tradeoff
Coverage: the space needs to contain enough useful patches
Tractability: the space needs to be small enough to explore effectively
  Find the correct patch in time
  Rank the correct patch first among validated patches
Slide17
An empirical quantitative analysis of the search space of two patch generation systems, SPR and Prophet
Slide18
SPR and Prophet Overview
SPR [FSE’15]:
  Works with a search space derived from a productive set of expression-level modifications
  Uses staged program repair and condition synthesis techniques to explore the space efficiently
Prophet [POPL’16]:
  Works with the same search space as SPR
  Learns a patch correctness model from past successful human patches and uses it to guide the search toward correct patches
Slide19
SPR Search Space: Anatomy of a Modification
Statement in Original Unpatched Program: if (C) {…} else {…}
Statement in Patched Program: if (C && E) {…} else {…}
Instantiate E to get a patch
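To make "instantiate E" concrete, here is a naive Python sketch that enumerates comparison clauses for E and builds the patched condition text. It is an illustration only: the helper name, the variable and constant lists, and the operator set are hypothetical, and SPR itself relies on condition synthesis rather than this kind of blind enumeration.

```python
from itertools import product


def instantiate_conditions(original_condition, variables, constants,
                           operators=("==", "!=")):
    """Build 'if (C && (E))' strings for every simple comparison E."""
    patched = []
    for var, op, const in product(variables, operators, constants):
        clause = f"{var} {op} {const}"
        patched.append(f"if ({original_condition} && ({clause}))")
    return patched


# Hypothetical in-scope variables and constants at the patch location.
PATCHES = instantiate_conditions("C", ["p", "n"], [0])
```

Even this tiny clause grammar produces one candidate per (variable, operator, constant) triple, which hints at how quickly the space grows when more operators are allowed.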
Slide20
SPR Search Space: Other Modifications
S → if (E) { S }
S → if (E) return c; S
if (C) {…} else {…} → if (C || E) {…} else {…}
Replace: S → S[replace v1 with v2]
Copy & Replace: S → Q[replace v1 with v2]; S
Initialize: S → memset(&e, 0, sizeof(e)); S
Slide21
Prophet
[Figure: a patch correctness model takes the patches in the search space and ranks them with probability scores (0.2, 0.05, 0.02, 0.01, …)]
Slide22
Prophet: Key Ideas
Correct patches share universal features that hold across applications
These features capture interactions between the patch and the surrounding code
Use program analysis to extract features
Obtain a corpus of patches from open source software development efforts
Learn a model to prioritize correct patches
Slide23
Setup for Model
Modification: S → if (E) { S }
Location: a statement in the program
A patch is a modification applied to a location (the location identifies a statement in the program)
Goal: estimate the probability that a patch is correct, given its modification and its location in the program
Use the estimate to rank the patches
Slide24
Probabilistic Model
A patch is a modification applied to a location (the location identifies a statement in the program)
The model gives the probability that a modification applied at a location in the program produces a correct patch, given the learned model. It combines:
  A geometric distribution that encodes error localization
  A log-linear distribution based on extracted features
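The slide's formula did not survive extraction, so the following Python sketch only illustrates one way to combine the two named ingredients. The functional form, the names `beta` and `theta`, the feature vectors, and all numeric values are assumptions for illustration, not the published Prophet model.

```python
import math


def patch_probabilities(candidates, theta, beta=0.02):
    """Score candidate patches and normalize the scores to sum to 1.

    candidates: list of (rank, features) pairs, where rank is the
    error-localization rank of the patch location (0 = most suspicious)
    and features is a list of numbers extracted from the patch.
    """
    weights = []
    for rank, features in candidates:
        # Geometric term: locations ranked lower by localization get less mass.
        geometric = beta * (1.0 - beta) ** rank
        # Log-linear term: exponential of a weighted feature sum.
        log_linear = math.exp(sum(t * f for t, f in zip(theta, features)))
        weights.append(geometric * log_linear)
    total = sum(weights)  # shared normalizer over all candidates
    return [w / total for w in weights]


# Hypothetical candidates: (localization rank, feature vector).
PROBS = patch_probabilities(
    [(0, [1.0, 0.0]), (5, [1.0, 0.0]), (0, [0.0, 0.0])],
    theta=[2.0, -1.0],
)
```

Sorting candidates by these probabilities in descending order gives a validation order in the spirit of "rank patches with probability scores". Because the product is normalized once at the end, any constant factor in either term cancels out.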
Slide25
Experimental Setup
Slide26
Benchmark
Application   LoC        Tests   Defects
libtiff       77 K       78      8
lighttpd      62 K       295     7
php           1,046 K    8,471   31
gmp           145 K      146     2
gzip          491 K      12      4
python        407 K      35      9
wireshark     2,814 K    63      6
fbc           97 K       773     2
Total                            69
GenProg benchmark set by Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer [ICSE 2012]
Slide27
Search Space Configurations
Consider different numbers of candidate statements
  Top 100, 200, 300, or 2000 instead of just the top 200 from the error localizer
Consider more operators for synthesizing conditions (CExt)
Consider more complicated expressions for replacement modifications (RExt)
  e.g., replace v1 with (v2 + v3)
Slide28
Experimental Setup
Run SPR and Prophet with 16 different search spaces, including the default one.
There are 19 defects whose correct patches are in the default search space.
There are 24 defects whose correct patches are in some extended search space.
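The count of 16 follows from combining four candidate-statement limits with the two on/off extensions (4 × 2 × 2). The short Python sketch below enumerates them; the function name and the label format (modeled on the "2000+CExt+RExt" label used in the slides) are assumptions.

```python
from itertools import product


def search_space_configs():
    """Enumerate every combination of statement limit, CExt, and RExt."""
    names = []
    for limit, cext, rext in product((100, 200, 300, 2000),
                                     (False, True), (False, True)):
        name = str(limit) + ("+CExt" if cext else "") + ("+RExt" if rext else "")
        names.append(name)
    return names


CONFIGS = search_space_configs()
```

The plain "200" entry corresponds to the default configuration: the top 200 statements with both extensions off.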
Slide29
Experimental Setup
For each of the 24 defects and each search space configuration, we record:
  The number of candidate patches in 12 hours
  The number of validated patches in 12 hours
  The number of correct patches in 12 hours
  The rank of the first correct patch among all validated patches
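The four recorded quantities can be computed from a simple log of explored patches. The sketch below assumes a hypothetical log format, one `(validated, correct)` pair per candidate in exploration order; the actual experiment scripts are not described in the slides.

```python
def summarize_run(patches):
    """Summarize one run from (validated, correct) pairs in exploration order."""
    candidates = validated = correct = 0
    first_correct_rank = None
    for is_validated, is_correct in patches:
        candidates += 1
        if not is_validated:
            continue
        validated += 1
        if is_correct:
            correct += 1
            if first_correct_rank is None:
                # 1-based position among validated patches only
                first_correct_rank = validated
    return {
        "candidates": candidates,
        "validated": validated,
        "correct": correct,
        "first_correct_rank": first_correct_rank,
    }


# Hypothetical log: five candidates, three validate, two of those are correct.
SUMMARY = summarize_run([
    (False, False),
    (True, False),
    (True, True),
    (False, False),
    (True, True),
])
```

A `first_correct_rank` greater than 1 means a validated but incorrect patch would be reported before any correct one.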
Slide30
Finding 1: Validated patches are abundant in the search space
Slide31
1. SPR and Prophet generate 700-4000 validated patches if CExt is off
2. SPR and Prophet generate 5900-12000 validated patches if CExt is on
CExt: Condition synthesis extension turned on
RExt: Replacement extension turned on
100, 200, 300, or 2000: The number of candidate statements to consider
Slide32
Identify Correct Patches
Manually analyze the root cause of each defect
Annotate modifications that produce correct patches for each case
Run a script to match validated patches against annotations
All developer fixes in later revisions are correct
Slide33
Finding 2: Correct patches are sparse in the search space
Slide34
1. SPR and Prophet generate 11 to 25 correct patches for all 24 defects.
2. Prophet generates 25 correct patches with the default search space.
3. SPR and Prophet generate only 11 correct patches with the largest search space, 2000+CExt+RExt.
CExt: Condition synthesis extension turned on
RExt: Replacement extension turned on
100, 200, 300, or 2000: The number of candidate statements to consider
Slide35
Validated but incorrect patches are much more abundant than correct patches!
Slide36
Will a Stronger Test Suite Reduce the Number of Validated Patches?
PHP has 10x more test cases than any other application in the benchmark set
13 PHP defects, 11 non-PHP defects
SPR and Prophet generate 5x-20x more validated patches on non-PHP defects
However, SPR and Prophet still generate hundreds of validated patches on PHP defects
Slide37
Finding 3: A larger search space may produce correct patches for fewer defects
Slide38
Slide39
Explanation
More candidate patches:
  Unable to find a correct patch within the time budget
Relatively many more validated patches than correct patches:
  Find an incorrect patch first
The second case occurs more often!
Slide40
Conclusion
Search Space Tradeoff: Coverage vs. tractability
Correct patches are sparse; validated patches are relatively abundant.
Challenge: How to distinguish correct patches among many validated patches?
How to move forward:
Use additional information other than test suite to recognize correct patches (Prophet)
More productive and focused search space