Slide1
Automatic exploit detection: attack and defense
Static Analysis: Towards Automatic Signature Generation of Vulnerability-based Signatures
Dynamic Analysis: Unleashing Mayhem on Binary Code
Static and Dynamic Analysis

Slide2
Towards Automatic Signature Generation of Vulnerability-based Signatures
Defense: Static Analysis

Slide3
Definition
Vulnerability: a type of bug that an attacker can use to alter the intended operation of the software in a malicious way.
Exploit: an actual input that triggers a software vulnerability, typically with malicious intent and devastating consequences.
Background

Slide4
Zero-day attacks that exploit unknown vulnerabilities represent a serious threat:
No patch or signature is available
Symantec: 20 unknown vulnerabilities were exploited between 07/2005 and 06/2007
In current practice, analysis of new vulnerabilities and generation of protection are mostly manual
Our goal: automate the process of protection generation for unknown vulnerabilities
Motivation

Slide5
Software Patch: patch the binary of the vulnerable application
Input Filter: a network firewall or a module on the I/O path
Data Patch: patch the data input instead of the binary
Signature: signature-based input filtering
How to protect a vulnerable application?
[Diagram: Data / Input → Input Filter → Vulnerable application; filtered inputs are Dropped]

Slide6
Automatic signature generation
Reasons:
Manual signature generation is slow and error-prone
Fast generation is important: previously unknown or unpatched vulnerabilities can be exploited orders of magnitude faster than a human can respond
More accurate
Our Goal

Slide7
There are usually several different polymorphic exploit variants that can trigger a software vulnerability
Exploit variants may differ syntactically but be semantically equivalent
To be effective, the signature should be constructed from properties of the vulnerability, not of a particular exploit
Challenges

Slide8
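To make the exploit-versus-vulnerability distinction from the Challenges slide concrete, here is a minimal C sketch under assumed conditions: a hypothetical routine copies a request field into char buf[16] with no bounds check. exploit_sig mimics a signature derived from one captured exploit's bytes, while vuln_sig tests the vulnerability condition itself. Both function names and the 16-byte buffer are illustrative assumptions, not taken from the paper.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical vulnerable routine: strcpy(buf, field) with char buf[16].
 * Any field of 16 or more characters overflows buf, because the
 * terminating NUL no longer fits. */
#define BUF_SIZE 16

/* Exploit-specific signature: looks for the 16-byte run of 'A' padding
 * seen in one captured exploit. */
static bool exploit_sig(const char *field) {
    return strstr(field, "AAAAAAAA" "AAAAAAAA") != NULL;
}

/* Vulnerability-based signature: accepts every field that reaches the
 * unsafe state, regardless of which bytes it contains. */
static bool vuln_sig(const char *field) {
    assert(field != NULL);
    return strlen(field) >= BUF_SIZE;
}
```

A polymorphic variant that pads with 'B' instead of 'A' slips past exploit_sig but is still caught by vuln_sig, while a short benign field is matched by neither.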
Require manual steps
Employ heuristics that may fail in many settings
Rely on specific properties of an exploit, such as return addresses
Only work for specific vulnerabilities in specific circumstances
Limitations of previous approaches

Slide9
At a high level, our main contribution is a new class of signature that does not depend on details such as whether an exploit successfully hijacks control of the program, but instead on whether executing an input will (potentially) result in an unsafe execution state.
Our approach

Slide10
Vulnerability signature: whether executing an input potentially results in an unsafe program state
T(P, x): the execution trace obtained by executing a program P on input x
Vulnerability condition
Representation (how to express a vulnerability as a signature)
Coverage (measured by false positive rate)
Overview

Slide11
Vulnerability signature: a representation of the set of inputs that satisfy a specified vulnerability condition
Trade-offs:
Representation: matching accuracy vs. efficiency
Signature creation: creation time vs. coverage
Tuple {P, T, x, c}: binary program (P), instruction trace (T), exploit string (x), vulnerability condition (c)
Vulnerability Signature

Slide12
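Formally, an exploit for a vulnerability $(P, c) = (\langle i_1, \ldots, i_k \rangle, c)$ is an input $x$ such that $T(P, x) \models c$, where $T(P, x)$ is the execution trace of running $P$ with input $x$ and $\models$ means that the trace satisfies the vulnerability condition $c$.
$L_{P,c}$ consists of the set of all inputs $x$ to a program $P$ such that $T(P, x) \models c$:
$$L_{P,c} = \{\, x \mid T(P, x) \models c \,\}$$
Vulnerability Signature Notation

Slide13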
P: given in box
x = g/AAAA
T = {1, 2, 3, 4, 6, 7, 8, 9, 8, 10, 11, 10, 11, 10, 11, 10, 11, 10, 11}
c = heap overflow (on the 5th iteration of line 11)
Example

Slide14
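A vulnerability signature is a matching function MATCH that, for an input x, returns either EXPLOIT or BENIGN for a program P without running the program. A perfect vulnerability signature satisfies:
Completeness: $\forall x \in L_{P,c}: \mathrm{MATCH}(x) = \mathrm{EXPLOIT}$ (no false negatives)
Soundness: $\forall x \notin L_{P,c}: \mathrm{MATCH}(x) = \mathrm{BENIGN}$ (no false positives)
Vulnerability Signature Definition

Slide15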
$C : \Gamma \times D \times M \times K \times I \to \{\mathrm{BENIGN}, \mathrm{EXPLOIT}\}$
$\Gamma$ is a memory
D is the set of variables defined
M is the program's map from memory to values
K is the continuation stack
I is the next instruction to execute
Vulnerability Condition

Slide16
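As a rough illustration of the vulnerability condition C, the sketch below collapses the state (Γ, D, M, K, I) into the three quantities a buffer-overflow condition needs: the size of the object the next instruction writes to, the offset of the write, and its length. The state struct and its fields are hypothetical simplifications, not the paper's formal machine state.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { BENIGN, EXPLOIT } verdict;

/* Hypothetical, heavily simplified machine state: only the pieces a
 * buffer-overflow condition needs. */
typedef struct {
    size_t obj_size;   /* size of the object the next write targets */
    size_t write_off;  /* offset of the write within that object */
    size_t write_len;  /* number of bytes the next instruction writes */
} state;

/* Vulnerability condition c: EXPLOIT iff the next instruction writes past
 * the end of its target object. Written to avoid overflow in off + len. */
static verdict overflow_condition(const state *s) {
    assert(s != NULL);
    if (s->write_off > s->obj_size || s->write_len > s->obj_size - s->write_off)
        return EXPLOIT;
    return BENIGN;
}
```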
Turing machine signatures
precise (no false positives or negatives)
may not terminate (e.g., in the presence of loops)
Symbolic constraint signatures
approximate looping and aliasing
guaranteed to terminate
Regular expression signatures
approximate elementary constructs (counting)
very efficient
Signature Representation Classes

Slide17
Can provide a precise, even exact, characterization of the vulnerability condition in a particular program
A TM that exactly emulates the program has no error rate
Turing Machine Sig.

Slide18
Says that for a 10-character input, the first character is 'g' or 'G', up to four of the next characters may be spaces, and at least 5 characters are non-spaces
Symbolic Constraint Sig.

Slide19
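One way to read the symbolic-constraint signature from the previous slide is as a direct predicate over a concrete 10-character input. The C sketch below is a hand-written encoding of the slide's wording (counting over positions 1 to 9 is an assumption), not the constraint system the tool would generate.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hand-written encoding of the slide's constraint: exactly 10 characters,
 * first is 'g' or 'G', at most 4 spaces and at least 5 non-spaces among
 * the remaining 9 positions (this positional reading is an assumption). */
static bool constraint_sig(const char *x) {
    assert(x != NULL);
    if (strlen(x) != 10) return false;
    if (x[0] != 'g' && x[0] != 'G') return false;
    int spaces = 0;
    for (int i = 1; i < 10; i++)
        if (x[i] == ' ') spaces++;
    int non_spaces = 9 - spaces;
    return spaces <= 4 && non_spaces >= 5;
}
```

Over the nine positions after the first, the two counting conditions are equivalent; both are kept only to mirror the slide.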
Says 'g' or 'G' followed by 0 or more spaces and at least 5 non-spaces
E.g.: [gG][ ]*[^ ]{5,}
Regular Expression Sig.

Slide20
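The regular-expression signature from the previous slide can be checked with the POSIX <regex.h> API. This sketch anchors the pattern with ^ (an assumption, so that matching starts at the first byte) and uses extended syntax for the {5,} bound; regex_sig is a hypothetical helper name.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if input matches the signature [gG][ ]*[^ ]{5,},
 * anchored at the start of the input. */
static bool regex_sig(const char *input) {
    regex_t re;
    if (regcomp(&re, "^[gG][ ]*[^ ]{5,}", REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    bool hit = regexec(&re, input, 0, NULL, 0) == 0;
    regfree(&re);
    return hit;
}
```

Unlike the 10-character symbolic constraint, the regex cannot count: it also accepts "g" followed by six spaces and "AAAAA". That loss of counting is the approximation the Signature Representation Classes slide refers to.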
TM: inlining the vulnerability condition takes polynomial time
Symbolic constraint: polynomial-time transformations on the TM
Regexp: solve the constraint (exponential time; PSPACE-complete) or data-flow analysis on the TM (polynomial time)
Accuracy vs. Efficiency

Slide21
Algorithm Overview
Input:
Vulnerable program P
Vulnerability condition c
Sample exploit x
Instruction trace T
Output:
TM sig
Symbolic constraint sig
RegEx sig

Slide22
MEP (monomorphic execution path): a straight-line program, e.g. the path that the exploit took to reach the vulnerability
PEP (polymorphic execution path): includes different paths to the vulnerability
A complete PEP coverage signature accepts all inputs in $L_{P,c}$
Complete coverage is obtained through a chop of the program, which includes all paths from the input read ($v_{init}$) to the vulnerability point ($v_{final}$)
MEP and PEP

Slide23
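The chop from the MEP and PEP slide, all paths from $v_{init}$ to $v_{final}$, can be computed as the intersection of the nodes reachable forward from $v_{init}$ and the nodes that reach $v_{final}$. The C sketch below does this on a small adjacency-matrix call graph; the graph type and the MAX_FN limit are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FN 16

/* Call graph as an adjacency matrix: edge[a][b] means a may transfer to b. */
typedef struct {
    int n;
    bool edge[MAX_FN][MAX_FN];
} graph;

/* Mark every node reachable from start, following edges forward
 * (reverse == false) or backward (reverse == true). */
static void reach(const graph *g, int start, bool reverse, bool seen[MAX_FN]) {
    int stack[MAX_FN], top = 0;   /* each node is pushed at most once */
    memset(seen, 0, MAX_FN * sizeof seen[0]);
    seen[start] = true;
    stack[top++] = start;
    while (top > 0) {
        int u = stack[--top];
        for (int v = 0; v < g->n; v++) {
            bool e = reverse ? g->edge[v][u] : g->edge[u][v];
            if (e && !seen[v]) { seen[v] = true; stack[top++] = v; }
        }
    }
}

/* Chop: nodes lying on some path from v_init to v_final. Returns its size. */
static int chop(const graph *g, int v_init, int v_final, bool in_chop[MAX_FN]) {
    bool fwd[MAX_FN], bwd[MAX_FN];
    reach(g, v_init, false, fwd);
    reach(g, v_final, true, bwd);
    int count = 0;
    for (int v = 0; v < g->n; v++) {
        in_chop[v] = fwd[v] && bwd[v];
        count += in_chop[v];
    }
    return count;
}
```

In a graph 0→1→2→4 with a side edge 0→3, the chop from 0 to 4 keeps {0, 1, 2, 4} and drops 3, which cannot reach the vulnerability point.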
TM -> Symbolic Constraint
Statically estimate the effects of memory updates and loops
Memory updates: SSA analysis
Loops: static unrolling

Slide24
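To illustrate the static unrolling and SSA steps of the TM -> Symbolic Constraint conversion, the sketch below shows a hypothetical space-counting loop next to the same loop unrolled for a fixed bound of 4 iterations, where every update to c gets a fresh name (c0 to c4) so that each line can become one constraint. Both functions are invented examples.

```c
#include <assert.h>

/* Loop as it might appear in the analysed program. */
static int count_spaces_loop(const char *s, int n) {
    int c = 0;
    for (int i = 0; i < n; i++)
        if (s[i] == ' ') c++;
    return c;
}

/* The same loop statically unrolled for n == 4, in SSA form: each
 * assignment defines a new variable, giving straight-line code that
 * maps one-to-one onto constraints. */
static int count_spaces_unrolled4(const char *s) {
    int c0 = 0;
    int c1 = c0 + (s[0] == ' ');
    int c2 = c1 + (s[1] == ' ');
    int c3 = c2 + (s[2] == ' ');
    int c4 = c3 + (s[3] == ' ');
    return c4;
}
```

Within the bound both versions agree; for loops that may run longer than the chosen bound, the unrolled form is only an approximation, which is why symbolic constraint signatures approximate looping.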
Evaluation
9000 lines of C++ code
CBMC model checker to build and solve symbolic constraints and to generate RegExs
Disassembler based on Kruegel et al.; new IR
ATPhttpd: various vulnerabilities; sprintf-style string too long; 10 distinct subpaths to RegEx in 0.1216 sec
BIND: stack overflow vulnerability; TSIG vulnerability; 10 distinct graphs in symbolic constraint; 30 ms for chopping; 88% of functions were reachable between entry and vulnerability

Slide25
Proposed a framework for automatically generating vulnerability signatures:
Turing machine
Symbolic constraints
Regular expressions
Preliminary work on the feasibility of a problem that has been a grand challenge for decades
Conclusion

Slide26
Attack: Dynamic Analysis
Unleashing Mayhem on Binary Code

Slide27
Automatic Exploit Generation Challenge
Automatically Find Bugs & Generate Exploits
Explore Program

Slide28
Ghostscript v8.62 Bug

int outprintf( const char *fmt, ... )
{
    int count;
    char buf[1024];
    va_list args;

    va_start( args, fmt );
    count = vsprintf( buf, fmt, args );
    va_end( args );
    outwrite( buf, count ); // print out
    return count;
}

int main( int argc, char *argv[] )
{
    const char *arg;

    while ( (arg = *argv++) != 0 ) {
        switch ( arg[0] ) {
        case '-': {
            switch ( arg[1] ) {
            case 0:
                …
            default:
                outprintf( "unknown switch %s\n", arg );
            }
        }
        default:
            …
        }
        …

Reading user input from command line
Buffer overflow
CVE-2009-4270

Slide29
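The overflow condition in outprintf is that the formatted string needs more than the 1024 bytes of buf. A minimal C sketch of checking that condition: vsnprintf with a NULL buffer and size 0 returns the number of characters vsprintf would write, without writing anything. The helper name would_overflow is hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define OUT_BUF_SIZE 1024  /* size of buf in outprintf */

/* Returns true if formatting fmt with the given arguments needs more than
 * OUT_BUF_SIZE bytes including the terminating NUL, i.e. if the unbounded
 * vsprintf in outprintf would write past the end of buf. */
static bool would_overflow(const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int needed = vsnprintf(NULL, 0, fmt, args);  /* length only, no write */
    va_end(args);
    return needed >= 0 && (size_t)needed + 1 > OUT_BUF_SIZE;
}
```

A bounded call such as vsnprintf(buf, sizeof buf, fmt, args) would truncate the output instead of overflowing.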
Multiple Paths
(Same outprintf/main listing as Slide28.)
Many Branches!

Slide30
Automatic Exploit Generation Challenge
Automatically Find Bugs & Generate Exploits
Transfer Control to Attacker Code
(exec "/bin/sh")

Slide31
Generating Exploits
(Same outprintf/main listing as Slide28.)
[Stack diagram labels: outprintf frame (…, fmt, ret addr, count, args, buf); user input written into buf; main; esp]

Slide32
Generating Exploits
(Same outprintf/main listing as Slide28.)
Read Return Address from Stack Pointer (esp)
[Stack diagram labels: outprintf frame (…, fmt, ret addr, count, args, buf); user input written into buf; main; esp]
Control Hijack Possible

Slide33
Unleashing Mayhem: Automatically Find Bugs & Generate Exploits for Executables
[Diagram: Source (int main( int argc, char *argv[] ) { const char *arg; while ( (arg = *argv++) != 0 ) { …) versus Executables (Binary)]

Slide34
How Mayhem Works: Symbolic Execution
[Branch tree with t/f edges: x = input(); if x > 42; if x*x == 0xffffffff; vuln(); if x < 100]
Constraints accumulated along the path to vuln():
x can be anything
x > 42
(x > 42) ∧ (x*x == 0xffffffff)

Slide35
[Same branch tree as Slide34]
Path Predicate = Π
Π = (x > 42) ∧ (x*x == 0xffffffff)

Slide36
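A path predicate is the conjunction of the branch conditions taken along a path. The C sketch below represents Π as an array of constraint functions over a 32-bit input and replaces the SMT solver with a brute-force scan of a small range. That substitution is an assumption made only to keep the example self-contained; it is not how Mayhem solves constraints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One branch condition on the symbolic input x, as taken along a path. */
typedef bool (*constraint)(uint32_t x);

static bool gt_42(uint32_t x)          { return x > 42; }
static bool ge_100(uint32_t x)         { return x >= 100; }
static bool sq_ne_ffffffff(uint32_t x) { return (uint32_t)(x * x) != 0xffffffffu; }

/* A path predicate Π holds for x iff every constraint on the path holds. */
static bool satisfies(const constraint *pi, int n, uint32_t x) {
    for (int i = 0; i < n; i++)
        if (!pi[i](x)) return false;
    return true;
}

/* Stand-in for an SMT solver: scan [lo, hi] for the first x satisfying Π. */
static bool naive_solve(const constraint *pi, int n, uint32_t lo, uint32_t hi,
                        uint32_t *out) {
    for (uint64_t x = lo; x <= hi; x++)   /* 64-bit counter avoids wraparound */
        if (satisfies(pi, n, (uint32_t)x)) { *out = (uint32_t)x; return true; }
    return false;
}
```

With Π = (x > 42) the scan returns 43; with the backup slide's Π = (x > 42) ∧ (x*x != 0xffffffff) ∧ (x >= 100) it returns 100.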
How Mayhem Works: Symbolic Execution
[Same branch tree as Slide34; the path x can be anything → x > 42 → (x > 42) ∧ (x*x == 0xffffffff) reaches vuln()]
Violates Safety Policy

Slide37
Safety Policy in Mayhem
(outprintf listing as on Slide28; stack diagram labels: outprintf frame (…, fmt, ret addr, count, args, buf); user input written into buf; main; esp)
Return to user-controlled address
Instruction Pointer (EIP) level: EIP not affected by user input

Slide38
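The EIP-level policy on the Safety Policy slide can be approximated with taint tracking: mark every byte derived from user input and check, at the return, whether the saved return address is tainted. The frame layout below (buf at offset 0, saved return address at offset 1032) is a made-up 32-bit example, not the real outprintf frame, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 32-bit frame layout, offsets from &buf[0]:
 * buf [0,1024), args [1024,1028), count [1028,1032),
 * saved return address [1032,1036). Real layouts depend on the compiler. */
enum { BUF_OFF = 0, RET_OFF = 1032, FRAME_SIZE = 1040 };

typedef struct {
    unsigned char bytes[FRAME_SIZE];
    bool tainted[FRAME_SIZE];   /* shadow memory: true if from user input */
} frame;

/* Unbounded copy of user input into buf (like the vsprintf call); every
 * byte written becomes tainted. Writes are clipped to the modelled frame. */
static void copy_user_input(frame *f, const unsigned char *in, size_t len) {
    for (size_t i = 0; i < len && BUF_OFF + i < FRAME_SIZE; i++) {
        f->bytes[BUF_OFF + i] = in[i];
        f->tainted[BUF_OFF + i] = true;
    }
}

/* Policy check at `ret`: violated if any byte of the saved return address
 * is influenced by user input. */
static bool eip_policy_violated(const frame *f) {
    for (int i = RET_OFF; i < RET_OFF + 4; i++)
        if (f->tainted[i]) return true;
    return false;
}
```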
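Exploit Generation
Exploit Predicate: an exploit is an input that satisfies the predicate
$\Pi \wedge \mathit{input}[0..31] = \text{attack code} \wedge \mathit{input}[1038..1042] = \text{attack code address}$
Can position attack code? (input[0..31] = attack code)
Can transfer control to attack code? (input[1038..1042] = attack code address)

Slide39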
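The two extra conjuncts of the exploit predicate are plain byte constraints on the input. The sketch below writes them into a buffer and checks them. It reads input[1038..1042] as the four bytes starting at offset 1038 holding a little-endian 32-bit address, which is an assumption, and it ignores Π, which a real tool must satisfy at the same time. The helper names and the example address are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Offsets from the exploit predicate; treat them as example values. */
enum { CODE_OFF = 0, CODE_LEN = 32, ADDR_OFF = 1038, INPUT_LEN = 1042 };

/* Write attack code at input[0..31] and the 4-byte little-endian
 * attack-code address at input[1038..1041]; other bytes are filler. */
static void build_exploit(unsigned char input[INPUT_LEN],
                          const unsigned char code[CODE_LEN], uint32_t code_addr) {
    memset(input, 'A', INPUT_LEN);
    memcpy(input + CODE_OFF, code, CODE_LEN);
    for (int i = 0; i < 4; i++)
        input[ADDR_OFF + i] = (unsigned char)(code_addr >> (8 * i));
}

/* Check the two non-Π conjuncts of the exploit predicate. */
static bool exploit_predicate_holds(const unsigned char input[INPUT_LEN],
                                    const unsigned char code[CODE_LEN],
                                    uint32_t code_addr) {
    if (memcmp(input + CODE_OFF, code, CODE_LEN) != 0) return false;
    uint32_t a = 0;
    for (int i = 0; i < 4; i++)
        a |= (uint32_t)input[ADDR_OFF + i] << (8 * i);
    return a == code_addr;
}
```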
Challenges
Symbolic Execution: Efficient Resource Management, addressed by Hybrid Execution
Exploit Generation: Symbolic Index Challenge, addressed by Index-based Memory Model

Slide40
Challenge 1: Resource Management in Symbolic Execution

Slide41
Current Resource Management in Symbolic Execution
Online Symbolic Execution
Offline Symbolic Execution (a.k.a. Concolic)

Slide42
Offline Execution
One path at a time
Method 1: Re-run from scratch ⟹ Inefficient
Re-executed every time

Slide43
Online Execution
Fork at branches until the resource cap is hit
Method 2: Stop forking ⟹ Miss paths
Method 3: Snapshot process ⟹ Huge disk image

Slide44
Mayhem: Hybrid Execution
Our Method: don't snapshot state; use the path predicate to recreate state
[Figure: Ghostscript 8.62; fork at branches until hitting the resource cap, then "checkpoint"; labels 9.4M and 500K]

Slide45
Hybrid Execution
Manage #executors in memory within resource cap ✓
Minimize duplicated work ✓
Lightweight checkpoints ✓

Slide46
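The idea behind the lightweight checkpoints on the Hybrid Execution slides is to store enough to recreate a state rather than the state itself. In the toy C sketch below, a checkpoint is just the recorded branch outcomes, standing in for the path predicate, and restoring means re-executing from the initial state along those outcomes. The exec_state, step, record, and restore names are hypothetical, and the toy omits the solving step a real tool would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEPTH 32

/* Hypothetical executor state; in a real tool this is memory, registers, etc. */
typedef struct {
    unsigned acc;
    int depth;
} exec_state;

/* Lightweight checkpoint: only the branch outcomes taken so far,
 * not a copy of the whole state. */
typedef struct {
    bool taken[MAX_DEPTH];
    int depth;
} checkpoint;

/* One step of a toy program: the branch outcome selects the update. */
static void step(exec_state *s, bool branch_taken) {
    s->acc = branch_taken ? s->acc * 3 + 1 : s->acc / 2;
    s->depth++;
}

static void record(checkpoint *c, bool branch_taken) {
    assert(c->depth < MAX_DEPTH);
    c->taken[c->depth++] = branch_taken;
}

/* Recreate the state by re-executing from the initial state, following the
 * recorded outcomes, instead of restoring a snapshot. */
static exec_state restore(const checkpoint *c, unsigned initial_acc) {
    exec_state s = { initial_acc, 0 };
    for (int i = 0; i < c->depth; i++)
        step(&s, c->taken[i]);
    return s;
}
```

The checkpoint costs one flag per branch instead of a copy of the executor's memory, at the price of redoing the execution on restore.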
Challenge 2: Symbolic Indices

Slide47
Symbolic Indices
x = user_input();
y = mem[x];
assert(y == 42);
x can be anything: which memory cell contains 42?
2^32 cells to check (memory from 0 to 2^32 - 1)

Slide48
One Cause: Table Lookups
Table lookups in standard APIs:
Parsing: sscanf, vfprintf, etc.
Character test: isspace, isalpha, etc.
Conversion: toupper, tolower, mbtowc, etc.
…

Slide49
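To see why these APIs produce symbolic indices, consider a table-driven toupper in the style the Table Lookups slide describes. In this hypothetical sketch (upper_table and table_toupper are invented names, and contiguous ASCII letters are assumed), the input byte is used directly as the array index, so a symbolic input byte becomes a symbolic memory read mem[x].

```c
#include <assert.h>

/* Hypothetical lookup table: upper_table[c] is the uppercase form of c. */
static unsigned char upper_table[256];

static void init_upper_table(void) {
    for (int c = 0; c < 256; c++)
        upper_table[c] = (c >= 'a' && c <= 'z') ? (unsigned char)(c - 'a' + 'A')
                                                : (unsigned char)c;
}

/* The input byte is the index: with symbolic c this is a read mem[x]. */
static int table_toupper(unsigned char c) {
    return upper_table[c];
}
```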
Method 1: Concretization
$\Pi \wedge mem[x] = 42 \wedge \Pi'$ becomes $\Pi \wedge x = 17 \wedge mem[x] = 42 \wedge \Pi'$
✓ Solvable  ✗ Exploits
Over-constrained: misses 40% of exploits in our experiments

Slide50
Method 2: Fully Symbolic
$\Pi \wedge mem[x] = 42 \wedge \Pi'$ becomes
$\Pi \wedge mem[x] = 42 \wedge mem[0] = v_0 \wedge \dots \wedge mem[2^{32}-1] = v_{2^{32}-1} \wedge \Pi'$
✗ Solvable  ✓ Exploits

Slide51
Our Observation
Path predicate (Π) constrains the range of symbolic memory accesses
[Diagram: x can be anything; the false branch of x <= 42 and then the false branch of x >= 50 reach y = mem[x], with Π: 42 < x < 50]
Use symbolic execution state to:
Step 1: Bound memory addresses referenced
Step 2: Make search tree for memory address values

Slide52
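The observation can be sketched in C: instead of asking which of 2^32 cells holds 42, only the indices the path predicate allows (42 < x < 50, so 43 to 49) are examined. The memory contents and the helper indices_holding are hypothetical; a real implementation encodes this range in the solver query instead of scanning.

```c
#include <assert.h>
#include <stddef.h>

#define MEM_SIZE 256

/* Collect every index in [lo, hi] whose cell holds `want`; returns the
 * count. lo/hi are the bounds implied by the path predicate. */
static size_t indices_holding(const int mem[MEM_SIZE], int lo, int hi, int want,
                              int out[], size_t out_cap) {
    size_t n = 0;
    for (int x = lo; x <= hi && x < MEM_SIZE; x++)
        if (x >= 0 && mem[x] == want && n < out_cap)
            out[n++] = x;
    return n;
}
```

A cell holding 42 outside the bounds (index 200 in the usage below) is never considered, because no input on this path can reach it.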
Step 1: Find Bounds
mem[x & 0xff]
Value Set Analysis [1] provides initial bounds (an over-approximation)
Query the solver to refine the bounds
Lower bound = 0, Upper bound = 0xff
[1] Balakrishnan et al., Analyzing memory accesses in x86 executables, ICCC 2004

Slide53
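Refining the Value Set Analysis bounds by querying the solver can be done with a binary search over monotone yes/no queries such as "is there an x satisfying Π whose address is at least k?". This sketch pairs the address expression x & 0xff from the Step 1 slide with the path predicate 42 < x < 50 from the Observation slide (that pairing is an assumption) and answers each query by brute force over x in [0, 0xffff] instead of calling a real solver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Path predicate: 42 < x < 50. */
static bool path_pred(uint32_t x) { return x > 42 && x < 50; }

/* Symbolic address expression: x & 0xff. */
static uint32_t addr_of(uint32_t x) { return x & 0xffu; }

/* Stand-ins for solver queries, answered by brute force (an assumption). */
static bool exists_addr_at_least(uint32_t k) {
    for (uint32_t x = 0; x <= 0xffffu; x++)
        if (path_pred(x) && addr_of(x) >= k) return true;
    return false;
}

static bool exists_addr_at_most(uint32_t k) {
    for (uint32_t x = 0; x <= 0xffffu; x++)
        if (path_pred(x) && addr_of(x) <= k) return true;
    return false;
}

/* Largest feasible address in the VSA range [lo, hi].
 * Assumes at least one feasible address exists. */
static uint32_t refine_upper(uint32_t lo, uint32_t hi) {
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo + 1) / 2;
        if (exists_addr_at_least(mid)) lo = mid; else hi = mid - 1;
    }
    return lo;
}

/* Smallest feasible address in [lo, hi]. */
static uint32_t refine_lower(uint32_t lo, uint32_t hi) {
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (exists_addr_at_most(mid)) hi = mid; else lo = mid + 1;
    }
    return lo;
}
```

Starting from the VSA bounds [0, 0xff], the refinement tightens the range to [43, 49].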
Step 2: Index Search Tree Construction
y = mem[x]
Index | Memory Value
1 | 10
2 | 12
3 | 22
4 | 20
if x = 1 then y = 10; if x = 2 then y = 12; if x = 3 then y = 22; if x = 4 then y = 20
Search tree nodes: ite( x < 3, left, right ), ite( x < 2, left, right )

Slide54
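The index search tree can be built by recursively splitting the sorted (index, value) table at its midpoint. This C sketch prints the resulting if-then-else expression for the table on the Step 2 slide and evaluates it for a concrete index. emit and eval are hypothetical helper names, and the midpoint rule is one reasonable choice rather than Mayhem's exact construction.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Index -> value pairs for the memory region, sorted by index. */
static const int idx[] = { 1, 2, 3, 4 };
static const int val[] = { 10, 12, 22, 20 };

/* Append s to out, checking that it fits. */
static void append(char *out, size_t cap, const char *s) {
    assert(strlen(out) + strlen(s) < cap);
    strcat(out, s);
}

/* Print a balanced ite tree over table entries lo..hi into out. */
static void emit(int lo, int hi, char *out, size_t cap) {
    char tmp[32];
    if (lo == hi) {
        snprintf(tmp, sizeof tmp, "%d", val[lo]);
        append(out, cap, tmp);
        return;
    }
    int mid = (lo + hi + 1) / 2;
    snprintf(tmp, sizeof tmp, "ite(x < %d, ", idx[mid]);
    append(out, cap, tmp);
    emit(lo, mid - 1, out, cap);
    append(out, cap, ", ");
    emit(mid, hi, out, cap);
    append(out, cap, ")");
}

/* Evaluate the same tree for a concrete index x. */
static int eval(int lo, int hi, int x) {
    if (lo == hi) return val[lo];
    int mid = (lo + hi + 1) / 2;
    return x < idx[mid] ? eval(lo, mid - 1, x) : eval(mid, hi, x);
}
```

For this table, emit(0, 3, ...) produces ite(x < 3, ite(x < 2, 10, 12), ite(x < 4, 22, 20)), which contains the two ite nodes listed on the slide.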
Exploit Generation

Slide55
Linux (22)
Windows (7)

Slide56
2 Unknown Bugs: FreeRadius, GnuGol

Slide57
Limitations
We do not claim to find all exploitable bugs
Given an exploitable bug, we do not guarantee we will always find an exploit
Lots of room for improving symbolic execution, generating other types of exploits (e.g., info leaks), etc.
We do not consider defenses, which may defend against otherwise exploitable bugs (cf. Q [Schwartz et al., USENIX 2011])
But Every Report is Actionable

Slide58
Related Work
APEG [Brumley et al., IEEE S&P 2008]: uses a patch to locate the bug; no shellcode executed
Automatic Generation of Control Flow Hijacking Exploits for Software Vulnerabilities [Heelan, MS Thesis, U. of Oxford 2009]: creates a control flow hijack from a crashing input
AEG [Avgerinos et al., NDSS 2011]: finds bugs and generates exploits from source code
BitBlaze, KLEE, Sage, S2E, etc.: symbolic execution frameworks

Slide59
Conclusion
Mayhem automatically generated 29 exploits against Windows and Linux programs
Hybrid Execution: efficient resource management for symbolic execution
Index-based Memory Modeling: handle symbolic memory in real-world applications

Slide60
Backup Slides

Slide61
Algorithm Overview
Pre-process: disassemble the binary and convert it to an intermediate representation (IR)
Chop: a chop is a partial program $P'$ that starts at $T_0$ and ends at the exploit point; computed at the call-graph level
Compute the sig: get the TM sig; TM -> symbolic constraint; symbolic constraint -> RegEx

Slide62
Chopping
Chopping reduces the size of the program to be analyzed
Performed at the call-graph level
No function pointer support yet

Slide63
Get TM Sig
Replace each outgoing JMP (a jump leaving the chop) with RET BENIGN

Slide64
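A TM signature built as on the Get TM Sig slide keeps the program's own logic inside the chop, turns jumps that leave the chop into "return BENIGN", and turns the vulnerability point into a check of the vulnerability condition. The vulnerable code in the comment and the 8-byte buffer are hypothetical, chosen only to make the transformation visible.

```c
#include <assert.h>
#include <string.h>

typedef enum { BENIGN, EXPLOIT } verdict;

/* Hypothetical vulnerable code, for reference:
 *   if (x[0] != 'g' && x[0] != 'G') goto other_command;  // leaves the chop
 *   p = x + 1; while (*p == ' ') p++;
 *   strcpy(buf8, p);                                      // vulnerability point
 *
 * TM signature derived from it: same control flow inside the chop, the
 * outgoing jump becomes "return BENIGN", and the copy becomes a check of
 * the vulnerability condition (string too long for char buf8[8]). */
static verdict tm_sig(const char *x) {
    assert(x != NULL);
    if (x[0] != 'g' && x[0] != 'G')
        return BENIGN;                         /* was: goto other_command */
    const char *p = x + 1;
    while (*p == ' ')
        p++;
    return strlen(p) + 1 > 8 ? EXPLOIT : BENIGN;  /* 7 chars + NUL fit */
}
```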
Symbolic Constraint -> RegEx
Solution 1: solve the constraint system S and OR together all satisfying members
Solution 2: data-flow analysis optimization

Slide65
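Solution 1 on the Symbolic Constraint -> RegEx slide can be sketched for a toy constraint: enumerate every satisfying input and OR the members together into one anchored alternation. The constraint (two characters, first 'g' or 'G', second not a space), the four-character alphabet, and the helper names are assumptions. The output grows with the number of members, which is why this route is exponential in general.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy constraint over 2-character inputs: first character 'g' or 'G',
 * second character not a space. */
static bool member(char a, char b) {
    return (a == 'g' || a == 'G') && b != ' ';
}

/* Enumerate every satisfying input over `alphabet` (which must not contain
 * regex metacharacters) and build "^(m1|m2|...)$". Returns the member count. */
static int constraint_to_regex(const char *alphabet, char *out, size_t cap) {
    assert(cap >= 5);                  /* room for "^()$" and the NUL */
    size_t len = 0;
    int count = 0;
    out[len++] = '^';
    out[len++] = '(';
    for (const char *a = alphabet; *a; a++) {
        for (const char *b = alphabet; *b; b++) {
            if (!member(*a, *b))
                continue;
            assert(len + 6 <= cap);    /* "|ab" now, ")$" and NUL later */
            if (count++ > 0)
                out[len++] = '|';
            out[len++] = *a;
            out[len++] = *b;
        }
    }
    out[len++] = ')';
    out[len++] = '$';
    out[len] = '\0';
    return count;
}

/* Match input against an extended POSIX regular expression. */
static bool regex_matches(const char *pattern, const char *input) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    bool hit = regexec(&re, input, 0, NULL, 0) == 0;
    regfree(&re);
    return hit;
}
```

With the alphabet "gG A" this yields six members and the pattern ^(gg|gG|gA|Gg|GG|GA)$.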
How Mayhem Works: Symbolic Execution
[Branch tree with t/f edges: x = input(); if x > 42; if x*x == 0xffffffff; vuln(); if x < 100]
Constraints accumulated along a path that avoids vuln():
x can be anything
x > 42
(x > 42) ∧ (x*x != 0xffffffff)
(x > 42) ∧ (x*x != 0xffffffff) ∧ (x >= 100)

Slide66
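One Cause: Overwritten Pointers
assert(*ptr == 42);
return;
[Stack diagram labels: …, arg, ret addr, ptr, buf, user input, …; user input overflowing buf overwrites ptr (e.g., ptr = 0x11223344), so the check reads mem[0x11223344], i.e. mem[input], which must equal 42]

Slide67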
Index Search Tree Optimization: Piecewise Linear Approximation
[Plot of memory value against index, approximated by two lines: y = 2*x + 10 and y = -2*x + 28]

Slide68
Piecewise Linear Approximation
[Chart: time on atphttpd v0.4b; 2x faster]
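A piecewise linear approximation replaces a run of table entries whose values change by a constant step with one formula y = slope * x + intercept, which is far smaller than one ite node per entry. The greedy C sketch below splits a hypothetical table built from the two lines on the previous slide (y = 2*x + 10 on indices 0 to 2 and y = -2*x + 28 on indices 3 to 5) into such segments; the segment type and the greedy strategy are assumptions, not Mayhem's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* y = slope * x + intercept for x in [lo, hi]. */
typedef struct { int lo, hi, slope, intercept; } segment;

/* Greedily split the table (index i -> value v[i]) into maximal runs with a
 * constant step between consecutive values. Returns the number of segments
 * found; at most `cap` of them are stored in out. */
static size_t piecewise_linear(const int *v, int n, segment *out, size_t cap) {
    size_t count = 0;
    int start = 0;
    while (start < n) {
        int end = start;
        int slope = (start + 1 < n) ? v[start + 1] - v[start] : 0;
        while (end + 1 < n && v[end + 1] - v[end] == slope)
            end++;
        if (count < cap)
            out[count] = (segment){ start, end, slope, v[start] - slope * start };
        count++;
        start = end + 1;
    }
    return count;
}
```

For the table {10, 12, 14, 22, 20, 18} the sketch recovers exactly the slide's two lines.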