Uploaded On 2018-01-22


Slide1

Statistical Similarity of Binaries

Yaniv David, Nimrod Partush, Eran Yahav

*The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7) under grant agreement n° 615688 – ERC-COG-PRIME.

Slide2

Motivation

Network time protocol (ntpd)

RedHat’s Linux distribution

Apple’s OSX

’s 5900x (switches)

(table source: https://queue.acm.org/detail.cfm?id=2717320)

🔥 🔥? 🔥? 🔥?

Slide3

Semantic Similarity Wish List

Given q (query) and set T (targets) rank targets based on similarity to q

Precise: avoid false positives

Flexible: find similarities across

Different compiler versions

Different compiler vendors

Different versions of the same code

Work on stripped binaries

Slide4

Challenge: Finding Similar Procedures

Heartbleed, gcc v.4.9

shr eax, 8
lea r14d, [r12+13h]
mov r13, rbx
lea rcx, [r13+3]
mov [r13+1], al
mov [r13+2], r12b
mov rdi, rcx

?

Heartbleed, clang v.3.5

mov r9, 13h
mov r12, rbx
add rbp, 3
mov rsi, rbp
lea rdi, [r12+3]
mov [r12+2], bl
lea r13d, [rcx+r9]
shr eax, 8

Slide5

Images courtesy of Irani et al.

[Figure: image panels a, b and c]

Slide6

Similarity by Composition - Irani et al. [2006]

image1 is similar to image2 if you can compose image1 from the segments of image2

[Figure labels: similar, less similar]

Segments can be transformed: rotated, scaled, moved

Segments of (statistical) significance give more evidence: a black background should count for much less

Slide7

Similarity of Binaries: 3 Step Recipe

1. Decomposition

2. Pairwise Semantic Similarity

3. Statistical Similarity Evidence

[Figure: strands of Heartbleed, gcc v.4.9 -O3 ("shr eax, 8" and "mov r13, rbx; lea rcx, [r13+3]") compared with strands of Heartbleed, clang v.3.5 -O3 ("mov r12, rbx; lea rdi, [r12+3]" and "shr eax, 8") and with the CORPUS]

Heartbleed, gcc v.4.9 -O3

shr eax, 8
lea r14d, [r12+13h]
mov r13, rbx
lea rcx, [r13+3]
mov [r13+1], al
mov [r13+2], r12b


Slide9

Step 1 - Procedure Decomposition

We need to decompose procedures into comparable units

shr eax, 8
lea r14d, [r12+13h]
mov r13, rbx
lea rcx, [r13+3]
mov [r13+1], al
mov [r13+2], r12b
mov rdi, rcx

Slide19

Step 1 - Procedure Decomposition

mov r13, rbx
lea rcx, [r13+3]
mov rdi, rcx

Inputs: rbx

Vars: rdi, rcx, r13

Slide20

Step 1 - Procedure Decomposition

1: shr eax, 8
2: lea r14d, [r12+13h]
3: mov r13, rbx
4: lea rcx, [r13+3]
5: mov [r13+1], al
6: mov [r13+2], r12b
7: mov rdi, rcx

Slide21

Step 1 - Procedure Decomposition

Applying program slicing on the basic-block level until all variables are covered

We call these basic-block slices Strands
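The slicing loop can be sketched as follows. This is a simplified illustration, not the authors' implementation: the `Inst` record, the hand-written def/use sets, the folding of sub-registers (eax, al, r14d, r12b) into their 64-bit names, and the `mem[...]` pseudo-variables are all assumptions made for the example, and the backward scan assumes each variable is written at most once in the block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    text: str
    defs: frozenset  # variables written by the instruction
    uses: frozenset  # variables read by the instruction


def inst(text, defs, uses):
    return Inst(text, frozenset(defs), frozenset(uses))


# The seven-instruction basic block from the slides, with assumed def/use sets.
BLOCK = [
    inst("shr eax, 8",          {"rax"},        {"rax"}),
    inst("lea r14d, [r12+13h]", {"r14"},        {"r12"}),
    inst("mov r13, rbx",        {"r13"},        {"rbx"}),
    inst("lea rcx, [r13+3]",    {"rcx"},        {"r13"}),
    inst("mov [r13+1], al",     {"mem[r13+1]"}, {"r13", "rax"}),
    inst("mov [r13+2], r12b",   {"mem[r13+2]"}, {"r13", "r12"}),
    inst("mov rdi, rcx",        {"rdi"},        {"rcx"}),
]


def extract_strands(block):
    """Slice the block backwards until every instruction is covered."""
    uncovered = set(range(len(block)))
    strands = []
    while uncovered:
        last = max(uncovered)            # start from the last uncovered instruction
        uncovered.discard(last)
        picked = [last]
        needed = set(block[last].uses)   # variables the slice still depends on
        for i in range(last - 1, -1, -1):
            if block[i].defs & needed:   # instruction i produces a needed value
                picked.append(i)
                needed |= block[i].uses
                uncovered.discard(i)
        strands.append(sorted(picked))
    return strands


def strand_inputs(block, strand):
    """Variables a strand reads without having written them first."""
    written, inputs = set(), set()
    for i in strand:
        inputs |= block[i].uses - written
        written |= block[i].defs
    return inputs


STRANDS = extract_strands(BLOCK)
for strand in STRANDS:
    print([BLOCK[i].text for i in strand],
          "inputs:", sorted(strand_inputs(BLOCK, strand)))
```

On this block the first strand returned is `mov r13, rbx; lea rcx, [r13+3]; mov rdi, rcx` with `rbx` as its only input. Note that `mov r13, rbx` ends up in several strands, since a slice takes whatever earlier instructions it depends on, whether or not they were already covered.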


Slide23

Step 2 – Pairwise Semantic Similarity

Strand 3 @ Heartbleed, gcc v.4.9 -O3

mov r13, rbx
lea rcx, [r13+3]
mov rdi, rcx

Strand 3 in Boogie representation (BAP + Smack)

v1 := rbx
r13 := v1
v2 := r13 + 3
v3 := int_to_ptr(v2)
rcx := v3
v4 := rcx
rdi := v4

Slide24

Step 2 – Pairwise Semantic Similarity

Heartbleed, gcc v.4.9 -O3, Strand 6

v1 := r12
v2 := 13h + v1
v3 := int_to_ptr(v2)
r14 := v3
v4 := 18h
rsi := v4
v5 := v4 + v3
rax := v5

?

Heartbleed, clang v.3.5 -O3, Strand 11

v1 := 13h
r9 := v1
v2 := rbx
v3 := v2 + v1
v4 := int_to_ptr(v3)
r13 := v4
v5 := v1 + 5
rsi := v5
v6 := v5 + v4
rax := v6

Slide26

Step 2 – Pairwise Semantic Similarity

v1_q := r12_q
v2_q := 13h + v1_q
v3_q := int_to_ptr(v2_q)
r14_q := v3_q
v4_q := 18h
rsi_q := v4_q
v5_q := v4_q + v3_q
rax_q := v5_q

Strand

Inputs: r12_q

Variables: v1_q, v2_q, v3_q, r14_q, v4_q, rsi_q, v5_q, rax_q

v1_t := 13h
r9_t := v1_t
v2_t := rbx_t
v3_t := v2_t + v1_t
v4_t := int_to_ptr(v3_t)
r13_t := v4_t
v5_t := v1_t + 5
rsi_t := v5_t
v6_t := v5_t + v4_t
rax_t := v6_t

Strand

Inputs: rbx_t

Variables: v1_t, r9_t, v2_t, v3_t, v4_t, r13_t, v5_t, rsi_t, v6_t, rax_t

Slide27

Step 2 – Pairwise Semantic Similarity

assume: r12_q == rbx_t

assert:

assert:

assert:

Max number of equal variables

Slide28

Step 2 – Pairwise Semantic Similarity

assume r12_q == rbx_t

assert v1_q == v2_t, v2_q == v3_t, v3_q == v4_t, r14_q == r13_t, v4_q == v5_t, rsi_q == rsi_t, v5_q == v6_t, rax_q == rax_t

Slide29

Step 2 - Quantify Semantic Similarity

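Variable Containment Proportion

VCP(s_q; s_t) = MaxEqualVars(s_q, s_t) / |Var(s_q)|

where s_q is the query strand, s_t is the target strand and Var(s_q) is the set of variables of s_q

An asymmetric relation

Using dataflow information and optimizations makes this calculation feasible

Slide30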

Step 2 – Pairwise Semantic Similarity

assume r12_q == rbx_t

assert v1_q == v2_t, v2_q == v3_t, v3_q == v4_t, r14_q == r13_t, v4_q == v5_t, rsi_q == rsi_t, v5_q == v6_t, rax_q == rax_t

VCP(s_q; s_t) = 8/8 (all 8 variables of the query strand s_q have an equal variable in the target strand s_t)

VCP(s_t; s_q) = 8/10 (8 of the 10 variables of the target strand s_t have an equal variable in s_q)
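The two proportions can be reproduced with a small sketch. It swaps in a different technique from the one on the slides: instead of discharging the assume/assert query with a program verifier, it runs both strands on random inputs and treats two variables as equal when they agree on every sample, which only approximates equivalence. It also assumes the single input correspondence r12 == rbx rather than searching for the best one, and models `int_to_ptr` as the identity. The function names are hypothetical.

```python
import random

MASK64 = (1 << 64) - 1


def query_strand(r12):
    """Strand 6 of the gcc build; returns the value of every variable."""
    env = {}
    env["v1"] = r12
    env["v2"] = (0x13 + env["v1"]) & MASK64
    env["v3"] = env["v2"]            # int_to_ptr modelled as the identity
    env["r14"] = env["v3"]
    env["v4"] = 0x18
    env["rsi"] = env["v4"]
    env["v5"] = (env["v4"] + env["v3"]) & MASK64
    env["rax"] = env["v5"]
    return env


def target_strand(rbx):
    """Strand 11 of the clang build; returns the value of every variable."""
    env = {}
    env["v1"] = 0x13
    env["r9"] = env["v1"]
    env["v2"] = rbx
    env["v3"] = (env["v2"] + env["v1"]) & MASK64
    env["v4"] = env["v3"]            # int_to_ptr modelled as the identity
    env["r13"] = env["v4"]
    env["v5"] = env["v1"] + 5
    env["rsi"] = env["v5"]
    env["v6"] = (env["v5"] + env["v4"]) & MASK64
    env["rax"] = env["v6"]
    return env


def value_traces(strand, samples):
    """Map each variable to the tuple of values it takes over all samples."""
    runs = [strand(x) for x in samples]
    return {name: tuple(run[name] for run in runs) for name in runs[0]}


def vcp(first, second, samples):
    """Return (matched, total): variables of `first` that equal some variable of `second`."""
    first_traces = value_traces(first, samples)
    second_values = set(value_traces(second, samples).values())
    matched = sum(1 for trace in first_traces.values() if trace in second_values)
    return matched, len(first_traces)


rng = random.Random(0)
SAMPLES = [rng.getrandbits(64) for _ in range(32)]  # same value fed to r12 and rbx

print(vcp(query_strand, target_strand, SAMPLES))  # (8, 8)
print(vcp(target_strand, query_strand, SAMPLES))  # (8, 10)
```

The two unmatched target variables are `v1` and `r9`, which hold the constant 13h: the query strand folds that constant into an addition and never stores it in a variable of its own. This is why the relation is asymmetric.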


Slide33

Step 3 – Statistical Evidence

We need to turn VCP into a probability that the query strand is input-output equivalent to the target strand

“Throw” bad results close to 0

“Throw” good results close to 1

Slide34

Step 3 – Statistical Evidence

We need to know how significant a match is

To do that we use all the comparison data available

Slide35

Step 3 – Statistical Evidence

Define a Local Evidence Score to quantify the statistical significance of matching each strand
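The formulas for this step did not survive in the transcript, so the following is only a sketch under stated assumptions: a logistic function (with a hypothetical steepness `k` and midpoint) pushes VCP values toward 0 or 1, the significance baseline is the average match probability over every strand in the corpus, and the local evidence score is the log-ratio of the best match inside one target to that baseline. All function names and numbers are illustrative.

```python
import math


def match_probability(vcp, k=10.0, midpoint=0.5):
    """Assumed logistic squashing: low VCP -> near 0, high VCP -> near 1."""
    return 1.0 / (1.0 + math.exp(-k * (vcp - midpoint)))


def local_evidence_score(vcps_in_target, vcps_in_corpus):
    """Assumed form: log(best match in the target / average match over the corpus)."""
    best = max(match_probability(v) for v in vcps_in_target)
    baseline = sum(match_probability(v) for v in vcps_in_corpus) / len(vcps_in_corpus)
    return math.log(best / baseline)


# Hypothetical VCP values of one query strand against every strand in a corpus.
# Only one corpus strand matches well, and it belongs to the target.
rare_corpus = [1.0, 0.2, 0.1, 0.3, 0.2, 0.1]
rare = local_evidence_score([1.0, 0.2], rare_corpus)

# A trivial strand (think "shr eax, 8") matches almost everywhere in the corpus.
common_corpus = [1.0, 1.0, 0.9, 1.0, 1.0, 0.9]
common = local_evidence_score([1.0, 0.9], common_corpus)

print(round(rare, 3), round(common, 3))
```

Both strands have a perfect match in the target, but only the rare one earns a high score; a strand that matches everywhere contributes almost no evidence.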

Slide36

Step 3 – Statistical Evidence

shr eax, 8
mov r13, rbx
lea rcx, [r13+3]

mov r12, rbx
lea rdi, [r12+3]
shr eax, 8

Slide37

Step 3 - Global Similarity

Procedures are similar if one can be composed using non-trivial, significantly similar parts of the other
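As a rough sketch of how per-strand evidence could be combined into a ranking, assume the global score of a target is simply the sum of the local evidence scores of the query's strands against it. The aggregation rule, the function names and the scores below are all hypothetical; the transcript does not preserve the actual formula.

```python
def global_similarity(local_scores):
    """Assumed aggregation: add up the per-strand local evidence scores."""
    return sum(local_scores)


def rank_targets(scores_by_target):
    """Sort target names from most to least similar to the query."""
    return sorted(scores_by_target,
                  key=lambda name: global_similarity(scores_by_target[name]),
                  reverse=True)


# Made-up local evidence scores for three query strands against each target.
scores = {
    "target_a": [1.6, 1.2, 0.0],   # two rare strands match
    "target_b": [0.0, 0.0, 0.0],   # only ubiquitous strands match
    "target_c": [1.6, -0.4, 0.0],  # one rare strand matches
}
print(rank_targets(scores))  # ['target_a', 'target_c', 'target_b']
```

Under this assumption a target built only from strands that appear everywhere scores near zero, however many of them match.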

 Slide38

Similarity of Binaries: Recap

1. Decomposition

2. Pairwise Semantic Similarity

3. Statistical Similarity Evidence

Slide39

Evaluation - Vulnerabilities

Corpus

Real-world code packages: open-ssl, bash, qemu, wget, ws-snmp, ffmpeg, coreutils

Spanning across product versions, e.g. openssl-1.0.1{e,f,g}

Compiled with clang 3.{4,5}, gcc 4.{6,8,9} and icc {14,15}

1500 procedures picked at random

Queries

Focused on vulnerabilities (for motivation’s sake)

Slide40

Results - Finding Heartbleed

Query: OpenSSL 1.0.1f, compiled using clang 3.5, Heartbleed procedure

VS

Full 1500 corpus

Slide42

Results - Vulnerabilities

Low FP rate

Crucial to the vulnerability search scenario

Previous methods fail at cross-{version,compiler} scenario or produce too many FPs (see paper)

#  Vulnerability    False Positives  False positives rate
1  Heartbleed       0                0
2  Shellshock       3                0.002
3  Venom            0                0
4  Clobberin' Time  19               0.0126
5  Shellshock #2    0                0
6  ws-snmp          1                0.0006
7  wget             0                0
8  ffmpeg           0                0

Slide43

Evaluation – All vs All

Verified with randomly picked procedures

For example – when ff_rev34_decode@ffmpeg-2.4.6 is selected

[Figure: pairwise similarity between the clang, gcc and icc builds; every score shown is 1.0]

Slide45

Results – All vs All

All v. All comparison

wget-1.8: ftp_syst()
ff_rv34_decode_init_thread_copy()
compare_nodes()
i_write()
create_hard_link()
cached_umask()
print_stat()
default_format()
dev_ino_compare()
parse_integer()

Slide46

www.binsim.com (code + demo)

Slide47

Summary

Clear motivation: finding vulnerable code, detecting clones, etc.

Challenging scenario: finding similarity cross-{compiler, version} in stripped binaries

Applied to real-world code

Take home:

A semantic approach, yet feasible

Accuracy achieved with statistical framework