Slide1
Level by Level: Making Flow- and Context-Sensitive Pointer Analysis Scalable for Millions of Lines of Code
Hongtao Yu, Zhaoqing Zhang, Xiaobing Feng, Wei Huo
Institute of Computing Technology, Chinese Academy of Sciences
{ htyu, zqzhang, fxb, huowei }@ict.ac.cn
Jingling Xue
University of New South Wales
jingling@cse.unsw.edu.au
Slide2
Outline
  Introduction
  Framework
  Analyzing a Level
  Experiments
  Conclusion
Slide3
Introduction
Motivation
  Who needs flow- and context-sensitive (FSCS) pointer analysis?
    Software checking tools
    Program understanding
    Parallelization tools
    Hardware synthesis
  Existing methods cannot scale to large real programs
  Aiming at millions of lines of C code
Slide4
Improve scalability
For flow-sensitivity
  Decreasing iterations in dataflow analysis
  Saving space of points-to graph
For context-sensitivity
  Summary-based
  Low storage penalty
  Low apply penalty
Slide5
Idea
Level-by-level analysis
  Analyze the pointers in decreasing order of their points-to levels
  Suppose int **q, *p, x; then q has level 2, p has level 1 and x has level 0.
  Fast flow-sensitive analysis on full sparse SSA
  Fast and accurate context-sensitive analysis using a full transfer function
Slide6
Contribution
  Performs a full-sparse flow-sensitive pointer analysis using a flow-insensitive algorithm
  Performs a context-sensitive pointer analysis efficiently with precise full transfer functions
  Yields flow- and context-sensitive interprocedural may/must mod/ref information on a compact SSA form
  Analyzes millions of lines of code in minutes, faster than the state-of-the-art FSCS pointer analysis algorithms
Slide7
Framework
Figure 1. Level-by-level pointer analysis (LevPA).
  Compute points-to levels
  For each points-to level, from the highest to the lowest:
    Bottom-up: evaluate transfer functions
    Top-down: propagate points-to sets
  Incrementally build the call graph
Slide8
Points-to level
Property 1. If a variable x is possibly pointed to by a pointer y, then ptl(x) ≤ ptl(y).
Property 2. If a variable y is possibly assigned to x, then ptl(x) = ptl(y).
Compute points-to levels with a unification-based pointer analysis
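The slide does not show how the unification-based analysis assigns levels, so the following is only a minimal sketch under stated assumptions. It uses a Steensgaard-style union-find in which every class of locations has at most one pointee class, and it takes ptl of a variable to be the length of the points-to chain starting at its class. The class name Unifier, its methods, and the "$loc" placeholder locations are hypothetical; the statements fed in are those of the example program used in these slides (with the two parameter bindings of foo modelled as p = x and q = y).

```python
import itertools


class Unifier:
    """Steensgaard-style unification: each class has at most one pointee class."""

    def __init__(self):
        self.parent = {}    # union-find forest over locations
        self.pointee = {}   # class representative -> a member of its pointee class
        self._fresh = itertools.count()

    def find(self, v):
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def deref(self, v):
        """Class pointed to by v's class; a placeholder is created on demand."""
        r = self.find(v)
        if r not in self.pointee:
            self.pointee[r] = self.find(("$loc", next(self._fresh)))
        return self.find(self.pointee[r])

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        self.parent[rb] = ra
        tb = self.pointee.pop(rb, None)
        if tb is not None:
            ta = self.pointee.get(ra)
            if ta is None:
                self.pointee[ra] = tb
            else:
                self.union(ta, tb)  # merged classes must share one pointee class

    def ptl(self, v):
        """Length of the points-to chain that starts at v's class."""
        level, r, seen = 0, self.find(v), set()
        while r in self.pointee and r not in seen:  # `seen` guards against cycles
            seen.add(r)
            r = self.find(self.pointee[r])
            level += 1
        return level


u = Unifier()
# ptr = &target
for ptr, target in [("x", "a"), ("y", "b"), ("x", "c"), ("y", "e"),
                    ("x", "d"), ("y", "d"), ("c", "t")]:
    u.union(u.deref(ptr), target)
# parameter passing at the calls to foo: formal = actual
for formal, actual in [("p", "x"), ("q", "y")]:
    u.union(u.deref(formal), u.deref(actual))
u.union(u.deref(u.deref("p")), u.deref(u.deref("q")))  # *p = *q
u.union(u.deref(u.deref("q")), "obj")                  # *q = &obj

levels = {v: u.ptl(v) for v in ["x", "y", "p", "q", "a", "b", "c", "d", "e", "t", "obj"]}
```

Under these assumptions, levels maps x, y, p, q to 2, the variables a to e to 1, and t and obj to 0. Integer stores such as *b = 5 are left out because they create no points-to relation.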
Slide9
Example
int obj, t;
main() {
L1:   int **x, **y;
L2:   int *a, *b, *c, *d, *e;
L3:   x = &a; y = &b;
L4:   foo(x, y);
L5:   *b = 5;
L6:   if ( … ) { x = &c; y = &e; }
L7:   else { x = &d; y = &d; }
L8:   c = &t;
L9:   foo(x, y);
L10:  *e = 10;
}
void foo(int **p, int **q) {
L11:  *p = *q;
L12:  *q = &obj;
}
ptl(x, y, p, q) = 2
ptl(a, b, c, d, e) = 1
ptl(t, obj) = 0
Analyze first { x, y, p, q }, then { a, b, c, d, e }, last { t, obj }
Slide10
Bottom-up analyze level 2
void foo(int **p, int **q) {
L11:  *p = *q;
L12:  *q = &obj;
}
main() {
L1:   int **x, **y;
L2:   int *a, *b, *c, *d, *e;
L3:   x = &a; y = &b;
L4:   foo(x, y);
L5:   *b = 5;
L6:   if ( … ) { x = &c; y = &e; }
L7:   else { x = &d; y = &d; }
L8:   c = &t;
L9:   foo(x, y);
L10:  *e = 10;
}
Slide11
Bottom-up analyze level 2
void foo(int **p, int **q) {
L11:  *p1 = *q1;
L12:  *q1 = &obj;
}
main() {
L1:   int **x, **y;
L2:   int *a, *b, *c, *d, *e;
L3:   x = &a; y = &b;
L4:   foo(x, y);
L5:   *b = 5;
L6:   if ( … ) { x = &c; y = &e; }
L7:   else { x = &d; y = &d; }
L8:   c = &t;
L9:   foo(x, y);
L10:  *e = 10;
}
p1's points-to set depends on formal-in p
q1's points-to set depends on formal-in q
Slide12
Bottom-up analyze level 2
void foo(int **p, int **q) {
L11:  *p1 = *q1;
L12:  *q1 = &obj;
}
main() {
L1:   int **x, **y;
L2:   int *a, *b, *c, *d, *e;
L3:   x1 = &a; y1 = &b;
L4:   foo(x1, y1);
L5:   *b = 5;
L6:   if ( … ) { x2 = &c; y2 = &e; }
L7:   else { x3 = &d; y3 = &d; }
      x4 = ϕ(x2, x3); y4 = ϕ(y2, y3)
L8:   c = &t;
L9:   foo(x4, y4);
L10:  *e = 10;
}
p1's points-to set depends on formal-in p
q1's points-to set depends on formal-in q
x1 → { a }      y1 → { b }
x2 → { c }      y2 → { e }
x3 → { d }      y3 → { d }
x4 → { c, d }   y4 → { e, d }
Slide13
Full-sparse Analysis
Achieve flow-sensitivity flow-insensitively
  Regard each SSA name as a unique variable
  Set constraint-based pointer analysis
Full sparse
  Saving time
  Saving space
Slide14
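As a sketch of the full-sparse idea on the preceding slide, the snippet below runs a plain flow-insensitive set-constraint solver twice over the level-2 assignments of main: once with every SSA name treated as its own variable, and once with the original names. The solver, its name solve, and the constraint encoding are illustrative assumptions, not the solver used in LevPA.

```python
from collections import defaultdict


def solve(addr_of, copies):
    """Flow-insensitive inclusion solver.

    addr_of: (lhs, obj) pairs for `lhs = &obj`
    copies:  (lhs, rhs) pairs for `lhs = rhs` (a phi contributes one pair per operand)
    """
    pts = defaultdict(set)
    for lhs, obj in addr_of:
        pts[lhs].add(obj)
    changed = True
    while changed:  # iterate to a fixed point; statement order is irrelevant
        changed = False
        for lhs, rhs in copies:
            before = len(pts[lhs])
            pts[lhs] |= pts[rhs]
            changed |= len(pts[lhs]) != before
    return pts


# Level-2 assignments of main in SSA form: each SSA name is a unique variable.
ssa = solve(
    addr_of=[("x1", "a"), ("y1", "b"), ("x2", "c"), ("y2", "e"),
             ("x3", "d"), ("y3", "d")],
    copies=[("x4", "x2"), ("x4", "x3"), ("y4", "y2"), ("y4", "y3")],
)

# The same assignments without SSA renaming.
plain = solve(
    addr_of=[("x", "a"), ("y", "b"), ("x", "c"), ("y", "e"), ("x", "d"), ("y", "d")],
    copies=[],
)
```

With SSA names the solver keeps x1 → { a } at L4 separate from x4 → { c, d } at L9, which is the flow-sensitive answer shown for the example. Without renaming, the single variable x collapses to { a, c, d }.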
Top-down analyze level 2
main: Propagate to call site
  L4:  foo.p → { a }         foo.q → { b }
  L9:  foo.p → { c, d }      foo.q → { d, e }
  Merged: foo.p → { a, c, d }   foo.q → { b, d, e }
void foo(int **p, int **q) {
L11:  *p = *q;
L12:  *q = &obj;
}
main() {
L1:   int **x, **y;
L2:   int *a, *b, *c, *d, *e;
L3:   x = &a; y = &b;
L4:   foo(x, y);
L5:   *b = 5;
L6:   if ( … ) { x = &c; y = &e; }
L7:   else { x = &d; y = &d; }
L8:   c = &t;
L9:   foo(x, y);
L10:  *e = 10;
}
Slide15
Top-down analyze level 2
foo: Expand pointer dereferences
void foo(int **p, int **q) {
        μ(b, d, e)
L11:  *p1 = *q1;
        χ(a, c, d)
L12:  *q1 = &obj;
        χ(b, d, e)
}
Merging calling contexts here
void foo(int **p, int **q) {
L11:  *p = *q;
L12:  *q = &obj;
}
main() {
L1:   int **x, **y;
L2:   int *a, *b, *c, *d, *e;
L3:   x = &a; y = &b;
L4:   foo(x, y);
L5:   *b = 5;
L6:   if ( … ) { x = &c; y = &e; }
L7:   else { x = &d; y = &d; }
L8:   c = &t;
L9:   foo(x, y);
L10:  *e = 10;
}
Slide16
Context Condition
To be context-sensitive
Points-to relation c_i
  p ⟹ v (p → v): p must (may) point to v, where p is a formal parameter.
Context condition ℂ(c1,…,ck)
  A Boolean function consisting of higher-level points-to relations
Context-sensitive μ and χ
  μ(v_i, ℂ(c1,…,ck))
  v_{i+1} = χ(v_i, M, ℂ(c1,…,ck))
  M ∈ {may, must} indicates a weak/strong update
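One plausible reading of the χ operator, offered only as a sketch: if the context condition does not hold, the variable keeps its old points-to set; if it holds with M = must, the store overwrites the old set (strong update); if it holds with M = may, the new targets are added to the old ones (weak update). The function name chi and its argument layout are hypothetical and are not taken from the slides.

```python
def chi(old, new, mode, cond_holds):
    """Points-to set of v_{i+1} given that of v_i.

    old:        points-to set of v_i
    new:        targets written by the store that the chi annotates
    mode:       "must" (strong update) or "may" (weak update)
    cond_holds: whether the context condition holds in the current context
    """
    if not cond_holds:
        return set(old)           # v is not touched in this context
    if mode == "must":
        return set(new)           # strong update: the old targets are killed
    return set(old) | set(new)    # weak update: the old targets survive


# `*q = &obj` when q must point to b: b is overwritten.
strong = chi({"t"}, {"obj"}, "must", True)
# `*q = &obj` when q may point to d: d keeps its old targets as well.
weak = chi({"t"}, {"obj"}, "may", True)
# The condition fails in this context: nothing changes.
untouched = chi({"t"}, {"obj"}, "may", False)
```

The old set { t } used here is an arbitrary illustrative value.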
Slide17
Context-sensitive μ and χ
void foo(int **p, int **q) {
        μ(b, q⟹b)
        μ(d, q→d)
        μ(e, q→e)
L11:  *p1 = *q1;
        a = χ(a, must, p⟹a)
        c = χ(c, may, p→c)
        d = χ(d, may, p→d)
L12:  *q1 = &obj;
        b = χ(b, must, q⟹b)
        d = χ(d, may, q→d)
        e = χ(e, may, q→e)
}
Slide18
Bottom-up analyze level 1
void foo(int **p, int **q) {
        μ(b1, q⟹b)
        μ(d1, q→d)
        μ(e1, q→e)
L11:  *p1 = *q1;
        a2 = χ(a1, must, p⟹a)
        c2 = χ(c1, may, p→c)
        d2 = χ(d1, may, p→d)
L12:  *q1 = &obj;
        b2 = χ(b1, must, q⟹b)
        d3 = χ(d2, may, q→d)
        e2 = χ(e1, may, q→e)
}
Slide19
Points-to Set
Local Points-to Set
  Loc(p) = { <v, ℂ(c1,…,ck)> | ℂ(c1,…,ck) is a context condition }
  p can point to v if and only if ℂ(c1,…,ck) holds.
  Loc(p) is computed explicitly during the bottom-up analysis.
Dependence Set
  Dep(p) = { <q, ℂ(c1,…,ck)> | q is a formal-in parameter of level lev and ℂ(c1,…,ck) is a context condition }
  Ptr(p) includes Ptr(q) if and only if ℂ(c1,…,ck) holds.
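To make the two sets concrete, here is a minimal sketch that resolves Ptr(p) in one calling context: it collects every Loc entry whose condition holds and adds Ptr(q) for every Dep entry whose condition holds. The encoding is an assumption made for illustration: a condition is a tuple of atoms read as a conjunction, "p->d" stands for p → d and "q=>b" for q ⟹ b, and a context is the set of atoms that hold at a call site. The Loc and Dep entries are the ones these slides list for variable d of foo; the formal-in sets o1, o2, o3 are hypothetical placeholders.

```python
def holds(conj, ctx):
    """A conjunction of points-to atoms holds if every atom is in the context."""
    return all(atom in ctx for atom in conj)


def ptr(loc, dep, formal_in, ctx):
    """Resolve Ptr(v) in one calling context from Loc(v) and Dep(v)."""
    result = {obj for obj, cond in loc if holds(cond, ctx)}
    for q, cond in dep:
        if holds(cond, ctx):
            result |= formal_in[q]   # Ptr(v) includes Ptr(q)
    return result


# Loc(d) and Dep(d) for foo, as listed in Trans(foo, d).
LOC_D = [("obj", ("q->d",))]
DEP_D = [("b", ("p->d", "q=>b")), ("d", ("p->d",)), ("e", ("p->d", "q->e"))]

# Hypothetical points-to sets of the formal-ins at the call site.
FORMAL_IN = {"b": {"o1"}, "d": {"o2"}, "e": {"o3"}}

ptr_L9 = ptr(LOC_D, DEP_D, FORMAL_IN, {"p->c", "p->d", "q->e", "q->d"})
ptr_L4 = ptr(LOC_D, DEP_D, FORMAL_IN, {"p=>a", "q=>b"})
```

In the L9 context the result contains obj, o2 and o3. In the L4 context no condition holds and the result is empty, which matches d not being modified by that call.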
Slide20
Transfer function
Trans(proc, v) = < Loc(v), Dep(v), ℂ(c1,…,ck), M >
  v is a formal-out parameter
  ℂ(c1,…,ck) is a context condition: v can be modified at a call site invoking proc only if ℂ(c1,…,ck) holds at the call site
  M ∈ {may, must} indicates a may/must mod effect
Trans(proc)
  The set of all individual transfer functions Trans(proc, v)
Slide21
Bottom-up analyze level 1
void foo(int **p, int **q) {
        μ(b1, q⟹b)
        μ(d1, q→d)
        μ(e1, q→e)
L11:  *p1 = *q1;
        a2 = χ(a1, must, p⟹a)
        c2 = χ(c1, may, p→c)
        d2 = χ(d1, may, p→d)
L12:  *q1 = &obj;
        b2 = χ(b1, must, q⟹b)
        d3 = χ(d2, may, q→d)
        e2 = χ(e1, may, q→e)
}
Trans(foo, a) = < { }, { <b, q⟹b>, <d, q→d>, <e, q→e> }, p⟹a, must >
Trans(foo, c) = < { }, { <b, q⟹b>, <d, q→d>, <e, q→e> }, p→c, may >
Trans(foo, b) = < { <obj, q⟹b> }, { }, q⟹b, must >
Trans(foo, e) = < { <obj, q→e> }, { }, q→e, may >
Trans(foo, d) = < { <obj, q→d> }, { <b, p→d ∧ q⟹b>, <d, p→d>, <e, p→d ∧ q→e> }, p→d ∨ q→d, may >
Slide22
Bottom-up analyze level 1
int obj, t;
main() {
L1:   int **x, **y;
L2:   int *a, *b, *c, *d, *e;
L3:   x1 = &a; y1 = &b;
        μ(b1, true)
L4:   foo(x1, y1);
        a2 = χ(a1, must, true)
        b2 = χ(b1, must, true)
L5:   *b1 = 5;
L6:   if ( … ) { x2 = &c; y2 = &e; }
L7:   else { x3 = &d; y3 = &d; }
        x4 = ϕ(x2, x3)
        y4 = ϕ(y2, y3)
L8:   c1 = &t;
        μ(d1, true)
        μ(e1, true)
L9:   foo(x4, y4);
        c2 = χ(c1, may, true)
        d2 = χ(d1, may, true)
        e2 = χ(e1, may, true)
L10:  *e1 = 10;
}
At L4, p ⟹ a and q ⟹ b hold.
At L9, p → c, p → d, q → e and q → d hold.
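The χ annotations at the two call sites can be reproduced with a small sketch that evaluates the context condition of each Trans(foo, v) against the points-to relations that hold at the call. The encoding is an assumption made for illustration: a condition is a list of conjunctions (disjunctive normal form), "p=>a" stands for p ⟹ a and "p->c" for p → c, and only the condition and the may/must flag of each transfer function are kept.

```python
def holds(dnf, ctx):
    """A DNF condition holds if all atoms of at least one conjunction are in ctx."""
    return any(all(atom in ctx for atom in conj) for conj in dnf)


# Condition and mod effect of each Trans(foo, v), copied from the slides.
TRANS_FOO = {
    "a": ([("p=>a",)], "must"),
    "b": ([("q=>b",)], "must"),
    "c": ([("p->c",)], "may"),
    "d": ([("p->d",), ("q->d",)], "may"),   # p->d OR q->d
    "e": ([("q->e",)], "may"),
}


def mod_effects(trans, ctx):
    """Variables modified by a call in context ctx, with their may/must flag."""
    return {v: mode for v, (cond, mode) in trans.items() if holds(cond, ctx)}


at_L4 = mod_effects(TRANS_FOO, {"p=>a", "q=>b"})
at_L9 = mod_effects(TRANS_FOO, {"p->c", "p->d", "q->e", "q->d"})
```

Under this encoding at_L4 contains a and b as must effects and at_L9 contains c, d and e as may effects, in line with the χ functions attached to L4 and L9. Once a condition has been evaluated at a call site, the inserted χ carries the condition true.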
Slide23
BDD and context condition
Context conditions are implemented using BDDs
  Compactly represented
  Boolean operations are efficient
[Figure: BDD with decision nodes x1, x2, x3 and terminal nodes 0 and 1]
  variable x1 represents p → a
  variable x2 represents q → a
  variable x3 represents p → b
BDD for ℂ = (p → a ∧ q → a) ∨ p → b
If only p → b holds at a call site, we can evaluate ℂ|x1=0;x2=0;x3=1 to see whether ℂ holds at the call site.
Slide24
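The call-site check on the BDD slide, evaluating ℂ|x1=0;x2=0;x3=1, can be sketched with a hand-built decision diagram. The node layout below is an assumption: it is one ordered BDD for (x1 ∧ x2) ∨ x3 with variable order x1, x2, x3, written out by hand rather than transcribed from the figure, and it uses plain tuples instead of a BDD library.

```python
# A decision node is (variable, low_child, high_child); terminals are booleans.
X3 = ("x3", False, True)   # x3 decides the result on its own
X2 = ("x2", X3, True)      # reached when x1 = 1: x2 = 1 gives true, otherwise ask x3
X1 = ("x1", X3, X2)        # root: x1 = 0 skips x2 and goes straight to x3


def evaluate(node, assignment):
    """Follow one path from the root to a terminal under a full assignment."""
    while not isinstance(node, bool):
        var, low, high = node
        node = high if assignment[var] else low
    return node


# Only p -> b holds at the call site: x1 = 0, x2 = 0, x3 = 1.
holds_at_site = evaluate(X1, {"x1": 0, "x2": 0, "x3": 1})
```

Here holds_at_site is True, so ℂ holds at that call site. The node X3 is shared by both X1 and X2, which is the kind of sharing that keeps BDDs compact.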
Experiment
  Analyzes millions of lines of code in minutes
  Faster than the state-of-the-art FSCS pointer analysis algorithms
Table 2. Performance (secs).
Benchmark          KLOC    LevPA (64bit)   LevPA (32bit)   Bootstrapping (PLDI’08, 32bit)
Icecast-2.3.1        22         2.18            5.73            29
sendmail            115        72.63          143.68           939
httpd               128        16.32           35.42           161
445.gobmk           197        21.37           40.78             /
wine-0.9.24        1905       502.29          891.16             /
wireshark-1.2.2    2383       366.63          845.23             /
Slide25
Conclusion
We present a scalable method for flow- and context-sensitive pointer analysis
  Analyzes the pointers in a program level by level in terms of their points-to levels
  Fast flow-sensitive analysis on full-sparse SSA form
  Fast and accurate context-sensitive analysis using full transfer functions represented by BDDs
  Can analyze millions of lines of C code in minutes, faster than the state-of-the-art methods
Slide26
Thanks