Slide1
Statistical Similarity of Binaries
Yaniv David, Nimrod Partush, Eran Yahav
*The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7) under grant agreement n° 615688 – ERC-COG-PRIME.
Slide2
Motivation
Network time protocol (ntpd)
RedHat’s Linux distribution
Apple’s OSX
’s 5900x (switches)
(table source: https://queue.acm.org/detail.cfm?id=2717320)
🔥 🔥? 🔥? 🔥?
Slide3
Semantic Similarity Wish List
Given a query q and a set of targets T, rank the targets by their similarity to q
Precise – avoid false positives
Flexible – find similarities across
  Different compiler versions
  Different compiler vendors
  Different versions of the same code
Work on stripped binaries
Slide4
Challenge: Finding Similar Procedures
Heartbleed, gcc v.4.9
shr eax, 8
lea r14d, [r12+13h]
mov r13, rbx
lea rcx, [r13+3]
mov [r13+1], al
mov [r13+2], r12b
mov rdi, rcx
?
Heartbleed, clang v.3.5
mov r9, 13h
mov r12, rbx
add rbp, 3
mov rsi, rbp
lea rdi, [r12+3]
mov [r12+2], bl
lea r13d, [rcx+r9]
shr eax, 8
Slide5
[Figure: image panels a, b, c. Images courtesy of Irani et al.]
Slide6
Similarity by Composition - Irani et al. [2006]
image1 is similar to image2 if you can compose image1 from the segments of image2
[Figure: example pairs labeled "similar" and "less similar"]
Segments can be transformed
  rotated, scaled, moved
Segments of (statistical) significance give more evidence
  a black background should count for much less
Slide7
Similarity of Binaries: 3 Step Recipe
1. Decomposition
2. Pairwise Semantic Similarity
3. Statistical Similarity Evidence
[Diagram]
Heartbleed, gcc v.4.9 -O3 (procedure):
shr eax, 8
lea r14d, [r12+13h]
mov r13, rbx
lea rcx, [r13+3]
mov [r13+1], al
mov [r13+2], r12b
Heartbleed, gcc v.4.9 -O3 (fragments):
shr eax, 8
mov r13, rbx
lea rcx, [r13+3]
Heartbleed, clang v.3.5 -O3 (fragments):
shr eax, 8
mov r12, rbx
lea rdi, [r12+3]
CORPUS
?
Slide8
Similarity of Binaries: 3 Step Recipe
[Recipe diagram repeated from Slide 7]
Slide9
Step 1 - Procedure Decomposition
We need to decompose procedures into comparable units
shr eax, 8
lea r14d, [r12+13h]
mov r13, rbx
lea rcx, [r13+3]
mov [r13+1], al
mov [r13+2], r12b
mov rdi, rcx
Slide10
Step 1 - Procedure Decomposition
[Slides 10-18: animation frames repeating the listing from Slide 9]
Slide19
Step 1 - Procedure Decomposition
mov r13, rbx
lea rcx, [r13+3]
mov rdi, rcx
Inputs: rbx
Vars: rdi, rcx, r13
Slide20
Step 1 - Procedure Decomposition
1: shr eax, 8
2: lea r14d, [r12+13h]
3: mov r13, rbx
4: lea rcx, [r13+3]
5: mov [r13+1], al
6: mov [r13+2], r12b
7: mov rdi, rcx
Slide21
Step 1 - Procedure Decomposition
Apply program slicing at the basic-block level until all variables are covered
We call these basic-block slices Strands
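The slicing loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Inst` record, the function name `extract_strands`, and the hand-written def/use sets are all hypothetical. Sub-registers such as `al` and `r12b` are folded into their full registers, and each store is modelled as defining its own memory cell.

```python
from typing import NamedTuple

class Inst(NamedTuple):
    text: str
    defs: frozenset   # locations written by the instruction
    uses: frozenset   # locations read by the instruction

def inst(text, defs, uses):
    return Inst(text, frozenset(defs), frozenset(uses))

# The seven-instruction basic block from the slides, annotated by hand (assumption).
BLOCK = [
    inst("shr eax, 8",          {"rax"},         {"rax"}),
    inst("lea r14d, [r12+13h]", {"r14"},         {"r12"}),
    inst("mov r13, rbx",        {"r13"},         {"rbx"}),
    inst("lea rcx, [r13+3]",    {"rcx"},         {"r13"}),
    inst("mov [r13+1], al",     {"mem[r13+1]"},  {"r13", "rax"}),
    inst("mov [r13+2], r12b",   {"mem[r13+2]"},  {"r13", "r12"}),
    inst("mov rdi, rcx",        {"rdi"},         {"rcx"}),
]

def extract_strands(block):
    """Backward-slice the block until every instruction belongs to a strand.

    Returns a list of (instruction texts, input locations) pairs.
    """
    unused = set(range(len(block)))
    strands = []
    while unused:
        last = max(unused)              # start from the last uncovered instruction
        unused.discard(last)
        picked = [last]
        live = set(block[last].uses)    # values the slice still needs
        for i in range(last - 1, -1, -1):
            if block[i].defs & live:    # instruction i produces a needed value
                picked.append(i)
                live -= block[i].defs
                live |= block[i].uses
                unused.discard(i)
        picked.reverse()
        # Whatever is still live at the top of the block is an input of the strand.
        strands.append(([block[i].text for i in picked], live))
    return strands

for texts, inputs in extract_strands(BLOCK):
    print("; ".join(texts), "| inputs:", sorted(inputs))
```

With these annotations, the first strand produced is `mov r13, rbx; lea rcx, [r13+3]; mov rdi, rcx` with input `rbx`. An instruction such as `mov r13, rbx` can appear in several strands, since more than one slice depends on it.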
1: shr eax, 8
2: lea r14d, [r12+13h]
3: mov r13, rbx
4: lea rcx, [r13+3]
5: mov [r13+1], al
6: mov [r13+2], r12b
7: mov rdi, rcx
Slide22
Similarity of Binaries: 3 Step Recipe
[Recipe diagram repeated from Slide 7]
Slide23
Step 2 – Pairwise Semantic Similarity
Strand 3 @ Heartbleed, gcc v.4.9 -O3
mov r13, rbx
lea rcx, [r13+3]
mov rdi, rcx
Strand 3 in Boogie representation (BAP + Smack)
v1 := rbx
r13 := v1
v2 := r13 + 3
v3 := int_to_ptr(v2)
rcx := v3
v4 := rcx
rdi := v4
Slide24
Step 2 – Pairwise Semantic Similarity
Heartbleed, gcc v.4.9 -O3, Strand 6
v1 := r12
v2 := 13h + v1
v3 := int_to_ptr(v2)
r14 := v3
v4 := 18h
rsi := v4
v5 := v4 + v3
rax := v5
?
Heartbleed, clang v.3.5 -O3, Strand 11
v1 := 13h
r9 := v1
v2 := rbx
v3 := v2 + v1
v4 := int_to_ptr(v3)
r13 := v4
v5 := v1 + 5
rsi := v5
v6 := v5 + v4
rax := v6
Slide25
Step 2 – Pairwise Semantic Similarity
[Strand 6 (Heartbleed, gcc v.4.9) and Strand 11 (Heartbleed, clang v.3.5) listings repeated from Slide 24]
?
Slide26
Step 2 – Pairwise Semantic Similarity
Strand (query, q)
v1_q := r12_q
v2_q := 13h + v1_q
v3_q := int_to_ptr(v2_q)
r14_q := v3_q
v4_q := 18h
rsi_q := v4_q
v5_q := v4_q + v3_q
rax_q := v5_q
Inputs: r12_q
Variables: v1_q, v2_q, v3_q, r14_q, v4_q, rsi_q, v5_q, rax_q
Strand (target, t)
v1_t := 13h
r9_t := v1_t
v2_t := rbx_t
v3_t := v2_t + v1_t
v4_t := int_to_ptr(v3_t)
r13_t := v4_t
v5_t := v1_t + 5
rsi_t := v5_t
v6_t := v5_t + v4_t
rax_t := v6_t
Inputs: rbx_t
Variables: v1_t, r9_t, v2_t, v3_t, v4_t, r13_t, v5_t, rsi_t, v6_t, rax_t
Slide27
Step 2 – Pairwise Semantic Similarity
assume: r12_q == rbx_t
<query strand>; <target strand>;
assert: …
assert: …
assert: …
…
Max number of equal variables
Slide28
Step 2 – Pairwise Semantic Similarity
[Query and target strand listings repeated from Slide 26]
assume r12_q == rbx_t
assert v1_q == v2_t, v2_q == v3_t, v3_q == v4_t, r14_q == r13_t,
       v4_q == v5_t, rsi_q == rsi_t, v5_q == v6_t, rax_q == rax_t
Slide29
Step 2 - Quantify Semantic Similarity
Variable Containment Proportion
VCP(s_q, s_t) = MaxEqualVars(s_q, s_t) / |Var(s_q)|
An asymmetric relation
Dataflow information and optimizations make this calculation feasible
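The asymmetry can be checked on the two Heartbleed strands with a small sketch. It swaps in a much weaker technique than the one the slides describe: instead of a program verifier proving the asserted equalities, it treats two variables as equal when they agree on a handful of sample inputs. The function names, the sample inputs, modelling `int_to_ptr` as the identity, and counting a variable as matched whenever any variable on the other side agrees with it are all assumptions made for illustration. The target's `v3` is taken to be `v2 + v1`, which is what the asserted equalities require.

```python
MASK = (1 << 64) - 1  # emulate 64-bit register arithmetic

def query_strand(r12):
    """Strand 6 (gcc): returns the value of every variable for one input."""
    v1 = r12
    v2 = (0x13 + v1) & MASK
    v3 = v2                      # int_to_ptr modelled as identity (assumption)
    r14 = v3
    v4 = 0x18
    rsi = v4
    v5 = (v4 + v3) & MASK
    rax = v5
    return {"v1": v1, "v2": v2, "v3": v3, "r14": r14,
            "v4": v4, "rsi": rsi, "v5": v5, "rax": rax}

def target_strand(rbx):
    """Strand 11 (clang): returns the value of every variable for one input."""
    v1 = 0x13
    r9 = v1
    v2 = rbx
    v3 = (v2 + v1) & MASK
    v4 = v3                      # int_to_ptr modelled as identity (assumption)
    r13 = v4
    v5 = (v1 + 5) & MASK
    rsi = v5
    v6 = (v5 + v4) & MASK
    rax = v6
    return {"v1": v1, "r9": r9, "v2": v2, "v3": v3, "v4": v4,
            "r13": r13, "v5": v5, "rsi": rsi, "v6": v6, "rax": rax}

# Hypothetical sample inputs; the same value is fed to both strands,
# mirroring "assume r12_q == rbx_t".
SAMPLES = [0, 1, 0x13, 0x18, 0xDEADBEEF, MASK]

def signatures(strand):
    """Map each variable to the tuple of values it takes over SAMPLES."""
    runs = [strand(x) for x in SAMPLES]
    return {name: tuple(run[name] for run in runs) for name in runs[0]}

def vcp(a, b):
    """Return (matched, total): variables of `a` that have an equal variable in `b`."""
    sig_a, sig_b = signatures(a), signatures(b)
    values_b = set(sig_b.values())
    matched = sum(1 for sig in sig_a.values() if sig in values_b)
    return matched, len(sig_a)

print("VCP(q, t) = %d/%d" % vcp(query_strand, target_strand))
print("VCP(t, q) = %d/%d" % vcp(target_strand, query_strand))
```

Agreement on samples is evidence of equality, not a proof, so a sketch like this can overcount matches where a verifier would not. On these two strands it reports 8/8 in one direction and 8/10 in the other: the target's constant `13h` (held in `v1` and `r9`) has no counterpart in the query.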
Slide30
Step 2 – Pairwise Semantic Similarity
[Strand listings and assume/assert repeated from Slide 28]
VCP(s_q, s_t) = 8/8
Slide31
Step 2 – Pairwise Semantic Similarity
[Strand listings and assume/assert repeated from Slide 28]
VCP(s_q, s_t) = 8/8
VCP(s_t, s_q) = 8/10
Slide32
Similarity of Binaries: 3 Step Recipe
[Recipe diagram repeated from Slide 7]
Slide33
Step 3 – Statistical Evidence
We need to turn VCP into a probability that one strand is input-output equivalent to the other
[Formula not preserved in the extracted text]
“Throw” bad results close to 0
“Throw” good results close to 1
Slide34
Step 3 – Statistical Evidence
We need to know how significant a strand match is
To do that we use all the comparison data available
[Formula not preserved in the extracted text]
Slide35
Step 3 – Statistical Evidence
Define a Local Evidence Score to quantify the statistical significance of matching each strand
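One way to sketch such a score is below. The slide formulas did not survive extraction, so every concrete choice here is an assumption: a logistic curve (steepness `k=10`, midpoint `0.5`) to push VCP values toward 0 or 1, and a log-ratio between the best match inside one target and the average match over the whole corpus. The function names are hypothetical.

```python
import math

def match_probability(vcp, k=10.0, midpoint=0.5):
    """Squash a VCP in [0, 1] so low values land near 0 and high values near 1.

    The logistic shape and its parameters are illustrative assumptions.
    """
    return 1.0 / (1.0 + math.exp(-k * (vcp - midpoint)))

def local_evidence_score(vcps_in_target, vcps_in_corpus):
    """Score one query strand against one target procedure.

    vcps_in_target: VCP of the query strand against each strand of the target.
    vcps_in_corpus: VCP of the query strand against every strand in the corpus.
    """
    best = max(match_probability(v) for v in vcps_in_target)
    background = sum(match_probability(v) for v in vcps_in_corpus) / len(vcps_in_corpus)
    # A strand that matches everywhere has best ~= background, so its score is ~0.
    return math.log(best / background)

# Illustrative inputs only, not measured data.
rare = local_evidence_score([1.0, 0.2], [1.0, 0.2, 0.1, 0.1, 0.0])
common = local_evidence_score([1.0], [1.0, 1.0, 1.0, 1.0])
print("rare strand:", rare)
print("common strand:", common)
```

Under these assumptions a strand that matches well in the target but rarely elsewhere gets a clearly positive score, while a strand that matches almost everything in the corpus (a trivial fragment such as a lone `shr eax, 8`, for instance) contributes close to nothing.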
Slide36
Step 3 – Statistical Evidence
shr eax, 8
mov r13, rbx
lea rcx, [r13+3]
mov r12, rbx
lea rdi, [r12+3]
shr eax, 8
Slide37
Step 3 - Global Similarity
Procedures are similar if one can be composed using non-trivial, significantly similar parts of the other
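A sketch of how per-strand evidence might be combined into a ranking follows. It assumes the global score is the plain sum of the local evidence scores of the query's strands; the slide text does not preserve the actual formula, and the function names, target names, and numbers below are made up for illustration.

```python
def global_evidence_score(local_scores):
    """Combine the local evidence of all query strands for one target (assumed: sum)."""
    return sum(local_scores)

def rank_targets(les_by_target):
    """Order targets from most to least similar to the query.

    les_by_target maps a target name to the list of local evidence scores,
    one per strand of the query procedure.
    """
    scored = {name: global_evidence_score(scores)
              for name, scores in les_by_target.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

# Hypothetical targets with illustrative scores (not results from the talk).
ranking = rank_targets({
    "target_a": [1.5, 0.0, 1.2],   # two significant strands match
    "target_b": [0.1, 0.0, -0.3],  # only trivial strands match
})
for name, score in ranking:
    print(name, round(score, 2))
```

Summation means many weak, trivial matches add little, while a few statistically significant strands dominate the ranking, which is the behaviour the composition idea calls for.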
Slide38
Similarity of Binaries: Recap
[Recipe diagram repeated from Slide 7]
Slide39
Evaluation - Vulnerabilities
Corpus
  Real-world code packages
    open-ssl, bash, qemu, wget, ws-snmp, ffmpeg, coreutils
  Spanning across product versions
    e.g. openssl-1.0.1{e,f,g}
  Compiled with clang 3.{4,5}, gcc 4.{6,8,9} and icc {14,15}
  1500 procedures picked at random
Queries
  Focused on vulnerabilities (for motivation’s sake)
Slide40
Results - Finding Heartbleed
Query: OpenSSL 1.0.1f, compiled using clang 3.5, Heartbleed procedure
VS
Full 1500 corpus
Slide41
Results - Finding Heartbleed
[Figure not preserved in the extracted text]
Slide42
Results - Vulnerabilities
Low FP rate
  Crucial to the vulnerability search scenario
  Previous methods fail in the cross-{version,compiler} scenario or produce too many FPs (see paper)

#  Vulnerability     False positives  False positive rate
1  Heartbleed        0                0
2  Shellshock        3                0.002
3  Venom             0                0
4  Clobberin' Time   19               0.0126
5  Shellshock #2     0                0
6  ws-snmp           1                0.0006
7  wget              0                0
8  ffmpeg            0                0
Slide43
Evaluation – All vs All
Verified with randomly picked procedures
For example – when ff_rev34_decode@ffmpeg-2.4.6 is selected
[Figure: similarity matrix over compilers clang, gcc, icc; three cells shown, each 1.0]
Slide44
Evaluation – All vs All
Verified with randomly picked procedures
For example – when ff_rev34_decode@ffmpeg-2.4.6 is selected
[Figure: similarity matrix over compilers clang, gcc, icc; all nine cells are 1.0]
Slide45
Results – All vs All
All v. All comparison
[Figure labels:]
wget-1.8: ftp_syst()
ff_rv34_decode_init_thread_copy()
compare_nodes()
i_write()
create_hard_link()
cached_umask()
print_stat()
default_format()
dev_ino_compare()
parse_integer()
Slide46
www.binsim.com (code + demo)
Slide47
Summary
Clear motivation
  Finding vulnerable code, detecting clones, etc.
Challenging scenario
  Finding similarity cross-{compiler, version} in stripped binaries
Applied to real-world code
Take home:
  A semantic approach, yet feasible
  Accuracy achieved with statistical framework