Static Analysis


faustina-dinatale
Uploaded On 2016-10-17



Presentation Transcript

Slide1

Static and Dynamic Analysis

Automatic exploit detection: attack and defense

Static Analysis: Towards Automatic Signature Generation of Vulnerability-based Signature
Dynamic Analysis: Unleashing Mayhem on Binary Code

Slide2

Defense: Static Analysis

Towards Automatic Signature Generation of Vulnerability-based Signature

Slide3

Background

Definition
Vulnerability: a type of bug that an attacker can use to alter the intended operation of the software in a malicious way.
Exploit: an actual input that triggers a software vulnerability, typically with malicious intent and devastating consequences.

Slide4

Motivation

Zero-day attacks that exploit unknown vulnerabilities represent a serious threat
No patch or signature is available
Symantec: 20 unknown vulnerabilities exploited between 07/2005 and 06/2007
In current practice, new vulnerability analysis and protection generation are mostly manual

Our goal: automate the process of protection generation for unknown vulnerabilities

Slide5

How to protect a vulnerable application?

Software Patch: patch the binary of the vulnerable application
Input Filter: a network firewall or a module on the I/O path
Data Patch: patch the data input instead of the binary
Signature: signature-based input filtering

[Diagram labels: Data Input, Input Filter, Vulnerable application, Dropped]

Slide6

Our Goal

Automatic signature generation

Reason:
Manual signature generation is slow and error-prone
Fast generation is important: previously unknown or unpatched vulnerabilities can be exploited orders of magnitude faster than a human can respond
More accurate

Slide7

Challenges

There are usually several different polymorphic exploit variants that can trigger a software vulnerability
Exploit variants may differ syntactically but be semantically equivalent
To be effective, the signature should be constructed based on the properties of the vulnerability rather than those of a particular exploit

Slide8

Limitations of previous approaches

Require manual steps
Employ heuristics which may fail in many settings
Rely on specific properties of an exploit (e.g., return addresses)
Only work for specific vulnerabilities in specific circumstances

Slide9

Our approach

At a high level, our main contribution is a new class of signature that does not depend on details such as whether an exploit successfully hijacks control of the program, but instead on whether executing an input will (potentially) result in an unsafe execution state.

Slide10

Overview

Vulnerability signature: whether executing an input potentially results in an unsafe program state
T(P, x): the execution trace obtained by executing a program P on input x
Vulnerability condition
Representation (how to express a vulnerability as a signature)
Coverage (measured by false positive rate)

Slide11

Vulnerability Signature

Vulnerability signature: a representation for the set of inputs that define a specified vulnerability condition
Trade-offs
  Representation: matching accuracy vs. efficiency
  Signature creation: creation time vs. coverage

Tuple {P, T, x, c}: binary program (P), instruction trace (T), exploit string (x), vulnerability condition (c)

Slide12

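Vulnerability Signature Notation

A vulnerability is a pair $(P, c) = (\langle i_1, \ldots, i_k \rangle, c)$
$T(P, x)$ is the execution trace of running $P$ with input $x$
$T \models c$ means $T$ satisfies vulnerability condition $c$
$L_{P,c}$ consists of the set of all inputs $x$ to a program $P$ such that $T(P, x) \models c$
Formally: $L_{P,c} = \{\, x \mid T(P, x) \models c \,\}$
An exploit for a vulnerability $(P, c)$ is an input $x \in L_{P,c}$

Slide13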

Example

P: given in box
x = g/AAAA
T = {1, 2, 3, 4, 6, 7, 8, 9, 8, 10, 11, 10, 11, 10, 11, 10, 11, 10, 11}
c = heap overflow (on 5th iteration of line 11)

Slide14

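Vulnerability Signature Definition

A vulnerability signature is a matching function MATCH which, for an input x, returns either EXPLOIT or BENIGN for a program P without running the program
A perfect vulnerability signature satisfies
Completeness: $\forall x : x \in L_{P,c} \Rightarrow \text{MATCH}(x) = \text{EXPLOIT}$
Soundness: $\forall x : x \notin L_{P,c} \Rightarrow \text{MATCH}(x) = \text{BENIGN}$

Slide15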
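Completeness and soundness of a MATCH function can be probed empirically on labeled inputs: a missed exploit is a completeness violation (false negative), and a flagged benign input is a soundness violation (false positive). The following is only a sketch under stated assumptions; `struct sample`, `evaluate_signature`, and the toy matcher `first_char_is_g` are hypothetical names, not part of the presented system.

```c
#include <assert.h>
#include <stddef.h>

enum verdict { BENIGN = 0, EXPLOIT = 1 };

/* A labeled input: `truth` says whether it really lies in L_{P,c}. */
struct sample { const char *input; enum verdict truth; };

struct sig_errors { int false_negatives; int false_positives; };

/* Toy MATCH function (assumption): flags any input starting with 'g' or 'G'. */
enum verdict first_char_is_g(const char *input)
{
    assert(input != NULL);
    return (input[0] == 'g' || input[0] == 'G') ? EXPLOIT : BENIGN;
}

/* Counts completeness violations (missed exploits) and soundness
 * violations (benign inputs flagged) of `match` over `n` samples. */
struct sig_errors evaluate_signature(enum verdict (*match)(const char *),
                                     const struct sample *samples, size_t n)
{
    struct sig_errors e = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        enum verdict got = match(samples[i].input);
        if (samples[i].truth == EXPLOIT && got == BENIGN)
            e.false_negatives++;
        if (samples[i].truth == BENIGN && got == EXPLOIT)
            e.false_positives++;
    }
    return e;
}
```

With illustrative labels such as {"g/AAAA", EXPLOIT} and {"gab", BENIGN}, the toy matcher is complete on the sample set but not sound, since it flags the short benign input.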

Vulnerability Condition

$c : \Gamma \times D \times M \times K \times I \to \{\text{BENIGN}, \text{EXPLOIT}\}$
$\Gamma$ is a memory
$D$ is the set of variables defined
$M$ is the program's map from memory to values
$K$ is the continuation stack
$I$ is the next instruction to execute

Slide16

Signature Representation Classes

Turing machine signatures
  precise (no false positives or negatives)
  may not terminate (e.g., in the presence of loops)
Symbolic constraint signatures
  approximate looping and aliasing
  guaranteed to terminate
Regular expression signatures
  approximate elementary constructs (counting)
  very efficient

Slide17

Turing Machine Sig.

Can provide a precise, even exact, characterization of the vulnerability condition in a particular program
A TM that exactly emulates the program has no error rate

Slide18

Symbolic Constraint Sig.

Says that for a 10-char input, the first char is 'g' or 'G', up to four of the next chars may be spaces, and at least 5 chars are non-spaces

Slide19

Regular Expression Sig.

Says 'g' or 'G' followed by 0 or more spaces and at least 5 non-spaces
E.g.: [g|G][ ]*[^ ]{5,}

Slide20
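The regular expression signature above can be exercised with the POSIX `regcomp`/`regexec` interface. This is a minimal sketch, not the presented tool: the function name `regex_sig_match` is hypothetical, the pattern is anchored at the start of the input (an assumption based on "the first char is 'g' or 'G'"), and `[gG]` is used instead of `[g|G]` because inside a bracket expression `|` is a literal character.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the input matches the signature (treated as EXPLOIT),
 * 0 if it does not (BENIGN), and -1 if the pattern fails to compile. */
int regex_sig_match(const char *input)
{
    /* 'g' or 'G', zero or more spaces, then at least five non-spaces. */
    static const char *pattern = "^[gG][ ]*[^ ]{5,}";
    regex_t re;
    int status;

    assert(input != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    status = regexec(&re, input, 0, NULL, 0);
    regfree(&re);
    return status == 0 ? 1 : 0;
}
```

For instance, `regex_sig_match("g/AAAA")` reports a match for the sample exploit string from the Example slide, while an input with only four non-space characters after the 'g' is reported as benign.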

Accuracy vs. Efficiency

TM: inlining the vulnerability condition takes poly time
Symb. Constraint: poly-time transformations on TM
Regexp: solve constraint (exp time; PSPACE-complete) or data-flow on TM (poly time)

Slide21

Algorithm Overview

Input:
  Vulnerable program P
  Vulnerability condition c
  Sample exploit x
  Instruction trace T
Output:
  TM sig
  Symbolic constraint sig
  RegEx sig

Slide22

MEP and PEP

MEP is a straight-line program, e.g., the path that the exploit took to reach the vulnerability
PEP includes different paths to the vulnerability
A complete PEP coverage signature accepts all inputs in $L_{P,c}$
Complete coverage is obtained through a chop of the program, which includes all paths from the input read ($v_{init}$) to the vulnerability point ($v_{final}$)

Slide23

TM -> Symbolic Constraint

Statically estimate the effects of memory updates and loops
Memory updates: SSA analysis
Loops: static unrolling

Slide24

Evaluation

9000 lines of C++ code
CBMC model checker to build/solve symbolic constraints and generate RegEx's
Disassembler based on Kruegel; IR new

ATPhttpd
  various vulnerabilities; sprintf-style string too long
  10 distinct subpaths to RegEx in 0.1216 sec

BIND
  stack overflow vulnerability; TSIG vulnerability
  10 distinct graphs in symbolic constraint
  30 ms for chopping
  88% of functions were reachable between entry and vulnerability

Slide25

Conclusion

Propose a framework to automatically generate vulnerability signatures
  Turing Machine
  Symbolic Constraints
  Regular Expressions
Preliminary work on the feasibility of a problem that has been a grand challenge for decades

Slide26

Attack: Dynamic Analysis

Unleashing Mayhem on Binary Code

Slide27

Automatic Exploit Generation Challenge

Automatically Find Bugs & Generate Exploits
Explore Program

Slide28

Ghostscript v8.62 Bug

int outprintf( const char *fmt, ... )
{
  int count; char buf[1024]; va_list args;
  va_start( args, fmt );
  count = vsprintf( buf, fmt, args );
  outwrite( buf, count ); // print out
}

int main( int argc, char *argv[] )
{
  const char *arg;
  while ( (arg = *argv++) != 0 ) {
    switch ( arg[0] ) {
    case '-': {
      switch ( arg[1] ) {
      case 0:
      default:
        outprintf( "unknown switch %s\n", arg[1] );
      }
    }
    default: …
  }

Reading user input from command line
Buffer overflow
CVE-2009-4270

Slide29
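The overflow in the Ghostscript listing arises because `vsprintf` writes into the fixed 1024-byte `buf` with no length check. As a hedged illustration of how such a write is commonly bounded (this is not Ghostscript's actual fix), the sketch below uses the standard `vsnprintf`; the function name `bounded_format` is hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Formats into a caller-supplied buffer without writing past `size` bytes.
 * Returns the number of characters stored, excluding the terminator,
 * or -1 on a formatting error. */
int bounded_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int needed;

    assert(buf != NULL && size > 0 && fmt != NULL);
    va_start(args, fmt);
    needed = vsnprintf(buf, size, fmt, args);
    va_end(args);

    if (needed < 0)
        return -1;
    /* vsnprintf reports the untruncated length; clamp to what was stored. */
    return (size_t)needed >= size ? (int)(size - 1) : needed;
}
```

Returning the stored length rather than the requested length matters for a caller like `outwrite( buf, count )`, which would otherwise read past the end of the buffer after a truncation.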

Multiple Paths

(Same outprintf / main listing as on the Ghostscript v8.62 Bug slide.)

Many Branches!

Slide30

Automatic Exploit Generation Challenge

Automatically Find Bugs & Generate Exploits
Transfer Control to Attacker Code (exec "/bin/sh")

Slide31

Generating Exploits

(Same outprintf / main listing as on the Ghostscript v8.62 Bug slide.)

[Stack diagram labels: outprintf frame (fmt, ret addr, count, args, buf), user input, main, esp]

Slide32

Generating Exploits

(Same outprintf / main listing as on the Ghostscript v8.62 Bug slide.)

Read Return Address from Stack Pointer (esp)

[Stack diagram labels: outprintf frame (fmt, ret addr, count, args, buf), user input, main, esp]

Control Hijack Possible

Slide33

Unleashing Mayhem

Source
  int main( int argc, char *argv[] ) { const char *arg; while ( (arg = *argv++) != 0 ) { …

Executables (Binary)
  01010010101010100101010010101010…

Automatically Find Bugs & Generate Exploits for Executables

Slide34

How Mayhem Works: Symbolic Execution

[Diagram: execution tree; each branch has a true (t) and a false (f) edge]
x = input()
if x > 42
if x*x = 0xffffffff
if x < 100
vuln()

Path conditions shown on the tree:
x can be anything
x > 42
(x > 42) ∧ (x*x == 0xffffffff)

Slide35

[Diagram: execution tree; each branch has a true (t) and a false (f) edge]
x = input()
if x > 42
if x*x = 0xffffffff
if x < 100
vuln()

Path Predicate = Π
x can be anything
x > 42
Π = (x > 42) ∧ (x*x == 0xffffffff)

Slide36

How Mayhem Works: Symbolic Execution

[Diagram: execution tree; each branch has a true (t) and a false (f) edge]
x = input()
if x > 42
if x*x = 0xffffffff
if x < 100
vuln()

Path conditions shown on the tree:
x can be anything
x > 42
(x > 42) ∧ (x*x == 0xffffffff)

Violates Safety Policy

Slide37
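A path predicate is the conjunction of the branch decisions taken along one execution. The sketch below records those decisions for a concrete run of the toy program from the symbolic execution slides and then checks whether another input satisfies the same predicate. It is an illustration only, not how Mayhem is implemented: the nesting of the three branches is an assumption read off the diagrams, arithmetic is taken to be 32-bit with wraparound, and the names `trace_path` and `satisfies` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_BRANCHES = 3 };

/* One recorded branch decision: which condition, and whether it held. */
struct branch { int cond; int taken; };

struct path {
    struct branch steps[MAX_BRANCHES];
    int len;
    int reached_vuln;
};

/* Branch conditions of the toy program, evaluated with 32-bit wraparound. */
static int eval_cond(int cond, uint32_t x)
{
    switch (cond) {
    case 0:  return x > 42u;
    case 1:  return (uint32_t)(x * x) == 0xffffffffu;
    default: return x < 100u;
    }
}

/* Runs the toy program on a concrete x and records its path predicate. */
struct path trace_path(uint32_t x)
{
    struct path p = { .len = 0, .reached_vuln = 0 };
    int c0 = eval_cond(0, x);

    p.steps[p.len++] = (struct branch){ 0, c0 };
    if (c0) {
        int c1 = eval_cond(1, x);
        p.steps[p.len++] = (struct branch){ 1, c1 };
        if (c1)
            p.reached_vuln = 1;          /* the vuln() call site */
        else
            p.steps[p.len++] = (struct branch){ 2, eval_cond(2, x) };
    }
    return p;
}

/* Returns 1 if input x satisfies every conjunct of the recorded predicate. */
int satisfies(const struct path *p, uint32_t x)
{
    for (int i = 0; i < p->len; i++)
        if (eval_cond(p->steps[i].cond, x) != p->steps[i].taken)
            return 0;
    return 1;
}
```

Tracing x = 50 records (x > 42) ∧ (x*x != 0xffffffff) ∧ (x < 100); an input such as 99 satisfies that predicate and follows the same path, while 100 does not. A symbolic executor would hand such a predicate to a constraint solver instead of testing candidate inputs one by one.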

Safety Policy in Mayhem

int outprintf( const char *fmt, ... )
{
  int count; char buf[1024]; va_list args;
  va_start( args, fmt );
  count = vsprintf( buf, fmt, args );
  outwrite( buf, count ); // print out
}

[Stack diagram labels: outprintf frame (fmt, ret addr, count, args, buf), user input, main, esp]

Instruction Pointer (EIP) level:
  Return to user-controlled address
  EIP not affected by user input

Slide38

Exploit Generation

Exploit Predicate
  Can position attack code? input[0-31] = attack code
  Can transfer control to attack code? input[1038-1042] = attack code address

Exploit is an input that satisfies the predicate: Π ∧ Exploit Predicate

Slide39

Challenges

Symbolic Execution
Exploit Generation

Efficient Resource Management: Hybrid Execution
Symbolic Index Challenge: Index-based Memory Model

Slide40

Challenge 1: Resource Management in Symbolic Execution

Slide41

Current Resource Management in Symbolic Execution

Online Symbolic Execution
Offline Symbolic Execution (a.k.a. Concolic)

Slide42

Offline Execution

One path at a time
Method 1: Re-run from scratch ⟹ Inefficient
Re-executed every time

Slide43

Online Execution

Fork at branches
Hit Resource Cap
Method 2: Stop forking ⟹ Miss paths
Method 3: Snapshot process ⟹ Huge disk image

Slide44

Mayhem: Hybrid Execution

Fork at branches
Hit Resource Cap
Our Method: Don't snapshot state; use path predicate to recreate state ("Checkpoint")
Ghostscript 8.62: 9.4M → 500K

Slide45

Hybrid Execution

Manage #executors in memory within resource cap
Minimize duplicated work
Lightweight checkpoints ✓

Slide46

Challenge 2: Symbolic Indices

Slide47

Symbolic Indices

x = user_input();
y = mem[x];
assert (y == 42);

x can be anything
Which memory cell contains 42?
2^32 cells to check

[Diagram: Memory, addresses 0 to 2^32 - 1]

Slide48

One Cause: Table Lookups

Table lookups in standard APIs:
  Parsing: sscanf, vfprintf, etc.
  Character test: isspace, isalpha, etc.
  Conversion: toupper, tolower, mbtowc, etc.
  …

Slide49

Method 1: Concretization

Π ∧ mem[x] = 42
Π': Π ∧ x = 17 ∧ mem[x] = 42

Over-constrained
Misses 40% of exploits in our experiments

[Diagram labels: Solvable, Exploits, Π, Π']

Slide50

Method 2: Fully Symbolic

Π ∧ mem[x] = 42
Π': Π ∧ mem[x] = 42 ∧ mem[0] = v_0 ∧ … ∧ mem[2^32 - 1] = v_{2^32 - 1}

[Diagram labels: Solvable, Exploits, Π, Π']

Slide51

Our Observation

Path predicate (Π) constrains the range of symbolic memory accesses

[Diagram: x can be anything; branches on x <= 42 and x >= 50 (t/f edges) lead to y = mem[x] under Π ∧ 42 < x < 50]

Use symbolic execution state to:
  Step 1: Bound memory addresses referenced
  Step 2: Make search tree for memory address values

Slide52

Step 1: Find Bounds

mem[x & 0xff]
Lowerbound = 0, Upperbound = 0xff

Value Set Analysis [1] provides initial bounds
  Over-approximation
Query solver to refine bounds

[1] Balakrishnan et al., Analyzing memory accesses in x86 executables, ICCC 2004

Slide53
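Refining the bounds of a symbolic index with solver queries can be pictured as a binary search: each query asks whether the index can take any value in a candidate interval. The sketch below is an assumption-laden illustration, not Mayhem's code. The function `index_feasible_in` is a stand-in for a real solver call and hard-codes the predicate 42 < x < 50 from the Our Observation slide; `refine_lower_bound` and `refine_upper_bound` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a solver query: can the index take a value in [lo, hi]?
 * The path predicate is hard-coded here as 42 < x < 50. */
static int index_feasible_in(uint32_t lo, uint32_t hi)
{
    return lo <= hi && lo <= 49u && hi >= 43u;
}

/* Smallest feasible index within [lo, hi], found by binary search.
 * Precondition: some index in [lo, hi] is feasible. */
uint32_t refine_lower_bound(uint32_t lo, uint32_t hi)
{
    assert(index_feasible_in(lo, hi));
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (index_feasible_in(lo, mid))
            hi = mid;            /* a feasible value exists in the lower half */
        else
            lo = mid + 1;
    }
    return lo;
}

/* Largest feasible index within [lo, hi], found the same way. */
uint32_t refine_upper_bound(uint32_t lo, uint32_t hi)
{
    assert(index_feasible_in(lo, hi));
    while (lo < hi) {
        /* Round the midpoint up, without overflow, so the range shrinks. */
        uint32_t mid = lo + (hi - lo) / 2 + (hi - lo) % 2;
        if (index_feasible_in(mid, hi))
            lo = mid;            /* a feasible value exists in the upper half */
        else
            hi = mid - 1;
    }
    return lo;
}
```

Starting from initial bounds of 0 and 0xff, the two searches narrow the range to 43 and 49, using a number of queries that is logarithmic in the width of the initial range.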

Step 2: Index Search Tree Construction

y = mem[x]

if x = 1 then y = 10
if x = 2 then y = 12
if x = 3 then y = 22
if x = 4 then y = 20

ite( x < 3, left, right )
ite( x < 2, left, right )

[Plot: Memory Value against Index, with values 10, 12, 22, 20]

Slide54

Exploit Generation

Slide55

[Results figure: Linux (22), Windows (7)]

Slide56

2 Unknown Bugs: FreeRadius, GnuGol

Slide57

Limitations

We do not claim to find all exploitable bugs
Given an exploitable bug, we do not guarantee we will always find an exploit
Lots of room for improving symbolic execution, generating other types of exploits (e.g., info leaks), etc.
We do not consider defenses, which may defend against otherwise exploitable bugs
  Q [Schwartz et al., USENIX 2011]

But Every Report is Actionable

Slide58

Related Work

APEG [Brumley et al., IEEE S&P 2008]
  Uses patch to locate bug, no shellcode executed
Automatic Generation of Control Flow Hijacking Exploits for Software Vulnerabilities [Heelan, MS Thesis, U. of Oxford 2009]
  Creates control flow hijack from crashing input
AEG [Avgerinos et al., NDSS 2011]
  Find and generate exploits from source code
BitBlaze, KLEE, Sage, S2E, etc.
  Symbolic execution frameworks

Slide59

Conclusion

Mayhem automatically generated 29 exploits against Windows and Linux programs
Hybrid Execution
  Efficient resource management for symbolic execution
Index-based Memory Modeling
  Handle symbolic memory in real-world applications

Slide60

Backup Slides

Slide61

Algorithm Overview

Pre-process
  Disassemble binary
  Convert to an intermediate representation (IR)
Chop
  A chop is a partial program P' that starts at T_0 and ends at the exploit point
  Call-graph level
Compute the sig
  Get TM sig
  TM -> Symbolic constraint
  Symbolic constraint -> RegEx

Slide62

Chopping

Chopping reduces the size of the program to be analyzed
Performed at the call-graph level
No function pointer support yet

Slide63
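Chopping at the call-graph level can be approximated by keeping only the functions that are reachable from the entry point and from which the vulnerable function is reachable. The sketch below computes that intersection on a small adjacency matrix. It is an assumption-based illustration rather than the presented implementation; `chop_call_graph`, `reach`, and the `MAX_FUNCS` limit are hypothetical, and, as on the slide, indirect calls through function pointers are not modeled.

```c
#include <assert.h>
#include <string.h>

enum { MAX_FUNCS = 16 };

/* Marks every function reachable from `start`; when `reverse` is nonzero,
 * call edges are followed backwards (from callee to caller). */
static void reach(int n, unsigned char calls[MAX_FUNCS][MAX_FUNCS],
                  int start, int reverse, unsigned char seen[MAX_FUNCS])
{
    int stack[MAX_FUNCS];
    int top = 0;

    memset(seen, 0, MAX_FUNCS);
    seen[start] = 1;
    stack[top++] = start;
    while (top > 0) {
        int f = stack[--top];
        for (int g = 0; g < n; g++) {
            int edge = reverse ? calls[g][f] : calls[f][g];
            if (edge && !seen[g]) {
                seen[g] = 1;             /* each function is pushed once */
                stack[top++] = g;
            }
        }
    }
}

/* Sets in_chop[f] = 1 for functions on some call path from `entry` to
 * `vuln`, and returns how many functions the chop keeps. */
int chop_call_graph(int n, unsigned char calls[MAX_FUNCS][MAX_FUNCS],
                    int entry, int vuln, unsigned char in_chop[MAX_FUNCS])
{
    unsigned char fwd[MAX_FUNCS], bwd[MAX_FUNCS];
    int kept = 0;

    assert(n > 0 && n <= MAX_FUNCS);
    assert(entry >= 0 && entry < n && vuln >= 0 && vuln < n);
    reach(n, calls, entry, 0, fwd);
    reach(n, calls, vuln, 1, bwd);
    for (int f = 0; f < MAX_FUNCS; f++) {
        in_chop[f] = (unsigned char)(f < n && fwd[f] && bwd[f]);
        kept += in_chop[f];
    }
    return kept;
}
```

Here `calls[f][g]` is nonzero when function f calls function g. Functions that the entry point can reach but that never lead to the vulnerable function are dropped from the chop, which is what shrinks the program handed to the later signature-generation steps.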

Get TM Sig

Replace outgoing JMP with RET BENIGN

Slide64

Symbolic Constraint -> RegEx

Solution 1: Solve constraint system S and OR together all members
Solution 2: Data-flow analysis optimization

Slide65

How Mayhem Works: Symbolic Execution

[Diagram: execution tree; each branch has a true (t) and a false (f) edge]
x = input()
if x > 42
if x*x = 0xffffffff
if x < 100
vuln()

Path conditions shown on the tree:
x can be anything
x > 42
(x > 42) ∧ (x*x != 0xffffffff)
(x > 42) ∧ (x*x != 0xffffffff) ∧ (x >= 100)

Slide66

One Cause: Overwritten Pointers

assert(*ptr == 42);
return;

[Stack diagram labels: arg, ret addr, ptr, buf, user input]
[Memory diagram labels: ptr = 0x11223344 (ptr address 11223344), mem[0x11223344], mem[input], 42]

Slide67

Index Search Tree Optimization: Piecewise Linear Approximation

y = 2*x + 10
y = - 2*x + 28

[Plot: Memory Value against Index]

Slide68

Piecewise Linear Approximation

[Chart: Time; 2x faster on atphttpd v0.4b]