Slide1
Model-Assisted Machine-Code Synthesis
Venkatesh Srinivasan*, Ara Vartanian, Thomas Reps
{venk, aravart, reps}@cs.wisc.edu
* Venkatesh Srinivasan is now at Google
Slide2
Software is Pervasive
[Motivation]
Software is everywhere
Everyday systems: phones, laptops, watches, cars, etc.
Critical systems: aircraft, space shuttles, medical devices, etc.
Binaries: no source code, no documentation
Analyze binaries
Rewrite binaries
Slide3
Why rewrite binaries?
[Motivation]
[Figure: Binary → Binary Rewriter → Binary’]
Uses: superoptimization, partial evaluation, program repair, obfuscation, slicing
Slide4
Inside a Semantics-Based Binary Rewriter *
[Motivation]
Components: Semantic Representation, Binary Analyses, Transform, Transformed Semantics, McSynth++
Applications: offline binary optimization, partial evaluation, slicing, binary obfuscation, binary repair
* V. Srinivasan, T. Sharma, and T. Reps. Speeding up Machine-Code Synthesis. In OOPSLA, 2016
Disassembly of section .text:
080482e0 <.text>:
 80482e0: xor ebp, ebp
 80482e2: pop esi
 80482e3: mov ecx, esp
 80482e5: and esp, 0xfffffff0
 80482e8: push eax
 80482e9: push esp
 80482ea: push edx
 80482eb: push 0x80483d0
 80482f0: push 0x80483e0
 80482f5: push ecx
 80482f6: push esi
 80482f7: push 0x80483a5
 80482fc: call 80482c4
 .
 .
 8048394: push ebp
 8048395: mov ebp, esp
 8048397: sub esp, 0x10
 804839a: mov eax, [ebp+8]
 804839d: mov [ebp-4], eax
 80483a0: mov eax, [ebp-4]
 .
 .
 80483c0: mov eax, [ebp-4]
 80483c3: leave
 80483c4: ret
Rewritten Binary
080482e0 <.text>:
 80482e0: xor ebp, ebp
 80482e2: pop esi
 80482e3: mov ecx, esp
 80482e5: and esp, 0xfffffff0
 80482e8: push 1
 80482e9: push 2
 80482ea: push 3
 80482eb: push 0x80483d0
 80482f0: push 0x80483e0
 80482f5: push ecx
 80482f6: push esi
 80482f7: push 0x80483a5
 80482fc: call 80482c4
 .
 .
 8048394: lea esp, [esp-4]
 8048395: mov ebp, esp
 8048397: sub esp, 0x10
 804839a: mov eax, 1
 804839d: mov [ebp-4], eax
 80483a0: mov eax, 20
 .
 .
 80483c0: mov [ebp-4], eax
 80483c3: leave
 80483c4: ret
[Figure: Semantic Representation → Transform → Transformed Semantics (logical formula) → McSynth++ (machine-code synthesizer)]
Slide5
McSynth++ is not fast enough!
[Motivation]
[Figure: Binary → Binary Rewriter (Analysis, Transformation, Synthesis) → Binary’; uses: superoptimization, partial evaluation, program repair, obfuscation, slicing]
Rewriter makes many calls to McSynth++
Each call to McSynth++ takes minutes
Time taken to rewrite a binary: several hours or days!
We need to speed up McSynth++!
McSynth++ → McSynth-ML
Slide6
Outline
Motivation
Intel x86 (IA-32) Primer
McSynth++
McSynth-ML
Experiments
Conclusion
Slide7
Intel x86 (IA-32) Primer
Registers and stack before: ESP = 1000, EAX = 20
push eax             ESP = 996, Mem[996] = 20
mov ebx, eax         EBX = 20
mov [esp], 40        Mem[996] = 40
lea esp, [esp+4]     ESP = 1000
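The four primer instructions can be replayed with a very small state model. The sketch below is illustrative only: the function name `run_primer`, the dictionary-based memory, and the initial value of EBX (taken to be 0) are assumptions, not part of the slides.

```python
MASK = 0xFFFFFFFF  # IA-32 general-purpose registers are 32 bits wide

def run_primer(esp=1000, eax=20):
    """Replay the four primer instructions; return final registers, memory, and a trace."""
    regs = {"esp": esp, "eax": eax, "ebx": 0}  # EBX start value is an assumption
    mem = {}                                   # address -> value, one entry per stack slot
    trace = []

    def snapshot(instr):
        trace.append((instr, dict(regs), dict(mem)))

    # push eax: ESP drops by 4, then EAX is stored at the new top of stack
    regs["esp"] = (regs["esp"] - 4) & MASK
    mem[regs["esp"]] = regs["eax"]
    snapshot("push eax")

    # mov ebx, eax: register-to-register copy
    regs["ebx"] = regs["eax"]
    snapshot("mov ebx, eax")

    # mov [esp], 40: overwrite the word at the top of stack
    mem[regs["esp"]] = 40
    snapshot("mov [esp], 40")

    # lea esp, [esp+4]: pure address arithmetic, no memory access
    regs["esp"] = (regs["esp"] + 4) & MASK
    snapshot("lea esp, [esp+4]")
    return regs, mem, trace

for instr, regs_after, mem_after in run_primer()[2]:
    print(f"{instr:18} {regs_after} {mem_after}")
```

Each trace entry holds the instruction text and a copy of the state right after it, so the ESP values 996 and 1000 from the slide can be read off directly.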
[Background]
Slide8
Outline
Motivation
Intel x86 (IA-32) Primer
McSynth++
McSynth-ML
Experiments
Conclusion
Slide9
Synthesis of Machine Code*
[McSynth++]
Synthesize machine-code instructions from a semantic specification
Specification: QFBV formula
Parameterized by ISA
⟪push eax⟫ : ESP’ = ESP – 4 ∧ Mem’ = Mem[ESP – 4 ↦ EAX]
⟪lea esp, [esp – 4]; mov [esp], eax⟫ : ESP’ = ESP – 4 ∧ Mem’ = Mem[ESP – 4 ↦ EAX]
McSynth++: formula → instruction sequence (push eax, or lea esp, [esp – 4]; mov [esp], eax)
Formula with unchanged locations written out:
ESP’ = ESP – 4 ∧ Mem’ = Mem[ESP – 4 ↦ EAX] ∧ EBP’ = EBP ∧ EAX’ = EAX ∧ EBX’ = EBX ∧ . . . ∧ CF’ = CF ∧ OF’ = OF
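That both `push eax` and `lea esp, [esp – 4]; mov [esp], eax` implement the same formula can be illustrated by running them against the specification on random states. This is a testing sketch, not the solver-based check a synthesizer would use; the helper names, the four-register state, and the word-granular memory dictionary are assumptions, and flag frame conditions (CF’ = CF, etc.) are omitted.

```python
import random

MASK = 0xFFFFFFFF

def push_eax(regs, mem):
    regs, mem = dict(regs), dict(mem)
    regs["esp"] = (regs["esp"] - 4) & MASK
    mem[regs["esp"]] = regs["eax"]
    return regs, mem

def lea_then_mov(regs, mem):
    regs, mem = dict(regs), dict(mem)
    regs["esp"] = (regs["esp"] - 4) & MASK   # lea esp, [esp - 4]
    mem[regs["esp"]] = regs["eax"]           # mov [esp], eax
    return regs, mem

def mov_then_lea(regs, mem):
    """Deliberately wrong order: stores at ESP instead of ESP - 4."""
    regs, mem = dict(regs), dict(mem)
    mem[regs["esp"]] = regs["eax"]
    regs["esp"] = (regs["esp"] - 4) & MASK
    return regs, mem

def satisfies_spec(pre, post):
    """ESP' = ESP - 4 and Mem' = Mem[ESP - 4 -> EAX], other registers unchanged."""
    (regs, mem), (regs2, mem2) = pre, post
    addr = (regs["esp"] - 4) & MASK
    expected_mem = {**mem, addr: regs["eax"]}
    unchanged = all(regs2[r] == regs[r] for r in regs if r != "esp")
    return regs2["esp"] == addr and mem2 == expected_mem and unchanged

def agrees_on_random_states(candidate, trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        regs = {r: rng.getrandbits(32) for r in ("esp", "eax", "ebx", "ebp")}
        mem = {regs["esp"]: rng.getrandbits(32)}
        if not satisfies_spec((regs, mem), candidate(regs, mem)):
            return False
    return True

print(agrees_on_random_states(push_eax),
      agrees_on_random_states(lea_then_mov),
      agrees_on_random_states(mov_then_lea))
```

Passing random tests only suggests equivalence; it does not prove it. The wrong-order variant is included to show that the check can reject a candidate.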
* V. Srinivasan and T. Reps. Synthesis of machine code from semantics. In PLDI, 2015
  V. Srinivasan, T. Sharma, and T. Reps. Speeding up machine-code synthesis. In OOPSLA, 2016
Slide10
Enumerative Synthesis: Challenges
43,000 unique IA-32 instruction schemas
+ Exponential cost of enumeration
= Synthesis of an instruction sequence of length 2 takes several hours or days!
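The growth behind this can be made concrete with a few lines of arithmetic. The sketch counts schema sequences only, using the 43,000 figure from the slide; it ignores operand instantiation and any pruning, and the function name is illustrative.

```python
SCHEMAS = 43_000  # unique IA-32 instruction schemas, as stated on the slide

def candidate_sequences(length, pool=SCHEMAS):
    """Number of schema sequences of exactly `length` instructions."""
    return pool ** length

for k in (1, 2, 3):
    print(f"length {k}: {candidate_sequences(k):,} candidate sequences")
```

Going from length 1 to length 2 multiplies the number of candidates by 43,000, which is why unguided enumeration becomes impractical so quickly.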
[McSynth++]
Slide11
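McSynth++ Design
[McSynth++]
[Figure: the Master splits ϕ into sub-formulas ϕ1,1, ϕ1,2, . . . , ϕm-1,1, . . . , ϕm-1,k, ϕm,1, ϕm,2, . . . , ϕm,n; Slaves synthesize instruction sequences I1, I2, . . . , In, which are combined into I. Other labels in the figure: T, ϕ'm,1, ϕ'm,2, I'1, I'2.]
Slide12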
McSynth++ Design
[McSynth++]
Master
φ : EAX’ = Mem(ESP + 10) ∧ ESP’ = ESP + 18 ∧ ZF’ = (ESP + 10 = 0)
Slave
φ1 : EAX’ = Mem(ESP + 10)
  mov eax, [esp+10]
Slave
φ2 : ESP’ = ESP + 18 ∧ ZF’ = (ESP + 10 = 0)
  add esp, 10
  lea esp, [esp+8]
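How the two slave results fit together can be checked with a small interpreter: running the instructions for φ1 first and those for φ2 second satisfies φ, while other orders do not. The interpreter, the state layout, and the test harness below are assumptions made for illustration; only the three instructions and the formula come from the slide.

```python
import random

MASK = 0xFFFFFFFF

def run(seq, state):
    s = dict(state)  # registers and flags are copied; memory is only read
    for op, k in seq:
        if op == "mov eax, [esp+k]":
            s["eax"] = s["mem"].get((s["esp"] + k) & MASK, 0)
        elif op == "add esp, k":
            s["esp"] = (s["esp"] + k) & MASK
            s["zf"] = int(s["esp"] == 0)       # add sets ZF from its result
        elif op == "lea esp, [esp+k]":
            s["esp"] = (s["esp"] + k) & MASK   # lea leaves the flags untouched
        else:
            raise ValueError(op)
    return s

def phi(pre, post):
    """EAX' = Mem(ESP + 10) and ESP' = ESP + 18 and ZF' = (ESP + 10 = 0)."""
    return (post["eax"] == pre["mem"].get((pre["esp"] + 10) & MASK, 0)
            and post["esp"] == ((pre["esp"] + 18) & MASK)
            and post["zf"] == int(((pre["esp"] + 10) & MASK) == 0))

I1 = [("mov eax, [esp+k]", 10)]                        # implements phi1
I2 = [("add esp, k", 10), ("lea esp, [esp+k]", 8)]     # implements phi2

def holds(seq, trials=500, seed=1):
    rng = random.Random(seed)
    # Include the edge case ESP + 10 = 0 (mod 2^32) so the ZF conjunct is exercised.
    esps = [(-10) & MASK] + [rng.getrandbits(32) for _ in range(trials)]
    for esp in esps:
        mem = {(esp + 10) & MASK: rng.getrandbits(32) | 1, (esp + 28) & MASK: 0}
        pre = {"esp": esp, "eax": rng.getrandbits(32), "zf": 0, "mem": mem}
        if not phi(pre, run(seq, pre)):
            return False
    return True

print(holds(I1 + I2), holds(I2 + I1), holds(I1 + I2[::-1]))
```

The second call fails because `mov` then reads memory through the already-updated ESP; the third fails on the edge case because ZF would reflect ESP + 18 rather than ESP + 10.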
Few seconds
10 minutes
Slide13
McSynth++ Slave
[McSynth++]
Slave Synthesizer: φ1 → I
Instruction Enumerator: search space (instruction sequences)
Footprint-Based Pruner, Bits-Lost-Based Pruner: prune useless candidates
CEGIS: instantiation of counterexample-guided inductive synthesis framework for machine code
Slide14
McSynth++ Slave
[McSynth++]
Space of instruction sequences
Linear search: one-instruction sequences, two-instruction sequences, three-instruction sequences, . . .
Slave performs no prioritization of candidates
Not all candidates are equally likely to implement φ
Slide15
Outline
Motivation
Intel x86 (IA-32) Primer
McSynth++
McSynth-ML
Experiments
Conclusion
Slide16
McSynth-ML Key Insight
[McSynth-ML]
Prioritize candidates
Retain completeness properties
Linear search → Best-first search assisted by models
Slide17
McSynth-ML Design
[McSynth-ML]
[Figure: McSynth++: Master (ϕ) → Slave, Slave, Slave, . . . → I1, I2, . . . , Im. McSynth-ML: Master (ϕ) → Slave-ML, Slave-ML, Slave-ML, . . . → I1, I2, . . . , In.]
Model-assisted slave: faster enumerative synthesis
Slide18
Training Corpus
[McSynth-ML]
Corpus of 〈Specification, Implementation〉 pairs
Machine-code synthesizer: QFBV → (search) → instruction sequence
InstrToQFBV: instruction sequence → (symbolic execution) → QFBV
Corpus of 〈φ, I〉 pairs
Slide19
Learning Models
Corpus of 〈φ, I〉 pairs
Language model: n-gram
“push ebp; mov ebp, esp” is more common than “push ebp; xor ebp, esp”
Regression model: k-nearest neighbors
If φ contains a “+₈/₁₆/₃₂” operator, I most likely will contain “add”, “sub”, or “lea” opcodes
If φ contains a “+₈” operator, I most likely will contain 8-bit operands
[McSynth-ML]
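The n-gram idea can be sketched with a bigram model over instruction schemas. The four-sequence corpus, the add-one smoothing, and all function names below are hypothetical choices made for illustration; the slides do not specify the order of the n-gram model or its smoothing.

```python
from collections import Counter
import math

# Hypothetical toy corpus of instruction sequences (not from the talk).
CORPUS = [
    ["push ebp", "mov ebp, esp", "sub esp, <imm>"],
    ["push ebp", "mov ebp, esp", "mov eax, [ebp+<imm>]"],
    ["push ebp", "mov ebp, esp", "push ebx"],
    ["xor ebp, ebp", "pop esi", "mov ecx, esp"],
]

def train_bigrams(sequences):
    """Count how often each instruction follows each other instruction."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = ["<s>"] + seq            # <s> marks the start of a sequence
        vocab.update(seq)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab

def log_prob(seq, model, alpha=1.0):
    """Log-probability of seq under the bigram model with add-alpha smoothing."""
    context_counts, bigram_counts, vocab = model
    v = len(vocab) + 1                    # one extra slot for unseen instructions
    padded = ["<s>"] + seq
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigram_counts[(prev, cur)] + alpha)
                          / (context_counts[prev] + alpha * v))
    return total

model = train_bigrams(CORPUS)
common = log_prob(["push ebp", "mov ebp, esp"], model)
rare = log_prob(["push ebp", "xor ebp, esp"], model)
print(common > rare)
```

A score of this kind only says how typical a candidate looks; whether it implements φ still has to be checked by the synthesizer.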
Slide20
Model-Assisted Best-First Search
[McSynth-ML]
Space of instruction sequences
[Figure: search tree over instruction schemas such as mov esp, <imm>; mov esp, [esp]; add esp, <imm>. The n-gram and k-NN model scores decide the next prefix to expand, e.g. add esp, <imm>; add esp, <imm>.]
N-gram score: How common is I?
K-NN score: How well do features of I correlate with features of φ?
Slide21
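The switch from linear search to model-assisted best-first search can be sketched with a priority queue. Everything concrete here is an assumption for illustration: the three-schema pool, the hand-picked weights standing in for model scores, and the equality test standing in for the real correctness check.

```python
import heapq
import itertools

def linear_search(pool, is_solution, max_len):
    """Enumerate by length, then in pool order; no prioritization."""
    checked = 0
    for length in range(1, max_len + 1):
        for cand in itertools.product(pool, repeat=length):
            checked += 1
            if is_solution(cand):
                return cand, checked
    return None, checked

def best_first_search(pool, score, is_solution, max_len):
    """Always expand the highest-scoring prefix next."""
    tie = itertools.count()               # tie-breaker so the heap never compares tuples of strings
    frontier = [(0.0, next(tie), ())]
    checked = 0
    while frontier:
        _, _, prefix = heapq.heappop(frontier)
        if prefix:
            checked += 1
            if is_solution(prefix):
                return prefix, checked
        if len(prefix) < max_len:
            for instr in pool:
                cand = prefix + (instr,)
                heapq.heappush(frontier, (-score(cand), next(tie), cand))
    return None, checked

# Hypothetical pool and per-schema log-scores (stand-ins for the model scores).
POOL = ["mov esp, <imm>", "mov esp, [esp]", "add esp, <imm>"]
WEIGHTS = {"add esp, <imm>": -0.1, "mov esp, <imm>": -1.0, "mov esp, [esp]": -2.0}

def score(seq):
    return sum(WEIGHTS[i] for i in seq)

target = ("add esp, <imm>", "add esp, <imm>")
print(linear_search(POOL, lambda c: c == target, 2))
print(best_first_search(POOL, score, lambda c: c == target, 2))
```

Scores change only the order in which prefixes are expanded. Every sequence up to `max_len` is still reachable, so a poorly scored solution is found later rather than missed, which is the sense in which this sketch keeps the search complete.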
Optimization: Instruction-Pool Truncation
[McSynth-ML]
[Figure: space of instruction sequences; the k-NN model, given φ, truncates the instruction pool; the n-gram and k-NN models guide the search.]
Slide22
Outline
Motivation
Intel x86 (IA-32) Primer
McSynth++
McSynth-ML
Experiments
Conclusion
Slide23
Test Suite
[Experiments]
“Important” instruction sequences from real-world programs
SPECINT 2006 benchmarks (10 binaries)
Instruction sequences of length 5 through 10
50 instruction sequences in total
No overlaps
No restriction on source of QFBV formula
[Figure: I → Symbolic Execution → φ → McSynth++ → I’]
Slide24
Training Corpus
[Experiments]
Test input: Binary1 → It → (Sym Exec) → ϕt
Training corpus: Binary2, Binary3, . . . , Binary10 → {I1, I2, I3, …, In} → (Sym Exec) → {ϕ1, ϕ2, ϕ3, …, ϕn}
Corpus of 〈φ, I〉 pairs
Slide25
McSynth-ML vs. McSynth++
[Experiments]
No. of timeouts (out of 50): 6 for McSynth++ vs. 0 for McSynth-ML
Avg. speedup for formulas that timed out in McSynth++: over 526X
Avg. speedup for remaining formulas: 4.55X (12.6X for formulas with baseline synthesis time > 100s)
Slide26
Experiments Summary
[Experiments]
Instruction-pool       Search strategy             No. of     Speedup for formulas   Speedup for
truncation via k-NN                                timeouts   that timeout           remaining formulas
No                     Linear                      6          --                     --
No                     Model-assisted best-first   0          At least 38            1.78
Yes                    Linear                      0          At least 200           4
Yes                    Model-assisted best-first   0          At least 526           4.55
Slide27
Conclusion
Use of machine learning to make machine-code synthesis smarter and faster
Best-first search assisted by models
  n-gram language model
  k-NN regression model
Prioritization of candidates
Retains completeness
Over 526X speedup for formulas that timed out in McSynth++
4.55X speedup for remaining formulas
Slide28
Questions?
Slide29
Master in McSynth++
φ : EAX’ = Mem(ESP + 4) ∧ EBX’ = ((EAX * 2) >> 2) + EAX ∧ Mem’ = Mem[ESP ↦ EAX]
Legal split ⇔ flow independence
φ1 : Mem’ = Mem[ESP ↦ EAX]
φ2 : EAX’ = Mem(ESP + 4) ∧ EBX’ = ((EAX * 2) >> 2) + EAX
φ3 : EBX’ = ((EAX * 2) >> 2) + EAX
φ4 : EAX’ = Mem(ESP + 4)
Mem(ESP) and Mem(ESP + 4) could never alias
Slide30
Flattening “Deep” Terms
φ3 : EBX’ = ((EAX * 2) >> 2) + EAX
Equisatisfiable:
φ’3 : m = EAX * 2 ∧ n = m >> 2 ∧ EBX’ = n + EAX
φ5 : m = EAX * 2
φ6 : n = m >> 2
φ7 : EBX’ = n + EAX
Scratch Locations (Dead Registers) = {EBX}
φ5 : EBX’ = EAX * 2
φ6 : EBX’ = EBX >> 2
φ7 : EBX’ = EBX + EAX
Slide31
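The flattening of φ3 into the three shallow steps φ5, φ6, φ7 (with EBX as the scratch location) can be sanity-checked by evaluating both forms on 32-bit values. The sketch assumes that `>>` is a logical shift and that arithmetic wraps modulo 2^32; the function names are illustrative.

```python
import random

MASK = 0xFFFFFFFF

def phi3(eax):
    """Deep term: EBX' = ((EAX * 2) >> 2) + EAX, evaluated on 32-bit values."""
    doubled = (eax * 2) & MASK
    return ((doubled >> 2) + eax) & MASK

def flattened(eax, ebx):
    """phi5; phi6; phi7 in sequence, using EBX as the scratch register."""
    ebx = (eax * 2) & MASK      # phi5: EBX' = EAX * 2
    ebx = ebx >> 2              # phi6: EBX' = EBX >> 2 (logical shift assumed)
    ebx = (ebx + eax) & MASK    # phi7: EBX' = EBX + EAX
    return ebx

rng = random.Random(0)
samples = [0, 1, 0x7FFFFFFF, 0x80000000, MASK] + [rng.getrandbits(32) for _ in range(1000)]
print(all(flattened(eax, rng.getrandbits(32)) == phi3(eax) for eax in samples))
```

The incoming EBX value never influences the result, which is what makes EBX usable as a scratch location here: φ3 overwrites it anyway.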
Master in McSynth++
φ : EAX’ = Mem(ESP + 4) ∧ EBX’ = ((EAX * 2) >> 2) + EAX ∧ Mem’ = Mem[ESP ↦ EAX]
Master++
φ1 : Mem’ = Mem[ESP ↦ EAX]
φ5 : EBX’ = EAX * 2
φ6 : EBX’ = EBX >> 2
φ7 : EBX’ = EBX + EAX
φ4 : EAX’ = Mem(ESP + 4)
Slide32
Footprint-Based Search-Space Pruning
φ1 : Mem’ = Mem[ESP – 4 ↦ EBP]
Instruction Enumerator candidates: mov eax, <imm>; mov esp, <imm>; add esp, <imm>; push eax
Abstract semantic footprint: sound abstraction of locations that might be used or modified by a QFBV formula
SFP#use(φ1) = {ESP, EBP}
SFP#kill(φ1) = {Mem’}
Candidate: mov eax, <imm>
Ψ : EAX’ = imm
SFP#use(Ψ) = { }
SFP#kill(Ψ) = {EAX’}
Candidate: mov eax, <imm>; mov ebx, ecx
Ψ : EAX’ = imm ∧ EBX’ = ECX
SFP#use(Ψ) = {ECX}
SFP#kill(Ψ) = {EAX’, EBX’}
Useless prefix: mov eax, <imm>
Slide33
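One way to read the footprint example is as a set-containment test. The sketch below assumes a simplified rule, which is not stated on the slide and may differ from the actual criterion: a candidate is flagged useless when it uses or kills a location outside the footprint of the specification. The footprint of the last candidate (`mov [esp-4], ebp`) is also an assumption added for contrast.

```python
def is_useless(cand_use, cand_kill, spec_use, spec_kill):
    """Simplified, assumed rule: useless if the candidate touches locations the spec does not."""
    return not (cand_use <= spec_use and cand_kill <= spec_kill)

# Footprints of phi1 : Mem' = Mem[ESP - 4 -> EBP], as given on the slide.
SPEC_USE, SPEC_KILL = {"ESP", "EBP"}, {"Mem'"}

candidates = {
    # (use set, kill set); the first two are from the slide
    "mov eax, <imm>": (set(), {"EAX'"}),
    "mov eax, <imm>; mov ebx, ecx": ({"ECX"}, {"EAX'", "EBX'"}),
    # hypothetical candidate with an assumed footprint, for contrast
    "mov [esp-4], ebp": ({"ESP", "EBP"}, {"Mem'"}),
}

for text, (use, kill) in candidates.items():
    verdict = "useless" if is_useless(use, kill, SPEC_USE, SPEC_KILL) else "kept"
    print(f"{text:32} {verdict}")
```

Under this rule both slide candidates are rejected without any solver call, because `mov eax, <imm>` already kills EAX’, which φ1 never modifies.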
Bits-Lost-Based Pruning
φ7 : EBX’ = EBX + EAX
φ7 requires pre-state bits in EBX
Candidate from the Instruction Enumerator: mov ebx, eax
Ψ : EBX’ = EAX
Ψ loses the pre-state bits in EBX when it transforms the state
Prunes a candidate if Ψ does not retain enough bits to implement φ7
Candidate: sub ebx, eax
Ψ : EBX’ = EBX - EAX
Retains a candidate if pre-state bits required to implement φ7 are possibly latent in Ψ
Slide34
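The difference between the two candidates can be seen by brute force on a toy register width. This is an illustration of the idea only, not the pruner itself: it uses 4-bit registers, and it treats "bits lost" as "two pre-states that differ only in EBX reach the same post-state". Both choices, and the function names, are assumptions.

```python
BITS = 4
MASK = (1 << BITS) - 1

def loses_ebx_bits(transform):
    """True if some pair of pre-states differing only in EBX maps to the same post-state."""
    for eax in range(1 << BITS):
        seen = set()
        for ebx in range(1 << BITS):
            post = transform(eax, ebx)
            if post in seen:
                return True
            seen.add(post)
    return False

def mov_ebx_eax(eax, ebx):
    return (eax, eax)                    # Psi: EBX' = EAX

def sub_ebx_eax(eax, ebx):
    return (eax, (ebx - eax) & MASK)     # Psi: EBX' = EBX - EAX

print(loses_ebx_bits(mov_ebx_eax), loses_ebx_bits(sub_ebx_eax))
```

After `mov ebx, eax` the old EBX cannot be reconstructed, so no suffix can compute EBX + EAX; after `sub ebx, eax` it is still recoverable as EBX’ + EAX.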
Synthesis vs. Compilation
Synthesis:
  Finds several implementations of varying “qualities”
  Can handle more general formulas (implicit form), e.g., EAX’ + EBX’ = EAX + EBX + 4
Compilation:
  Finds a single implementation
  Only handles state-transformation formulas (explicit form)
Slide35
Synthesis vs. Superoptimization
Machine-code synthesis is more general than superoptimization: superopt(I) = synthOpt(InstrToQFBV(I))
Input to synthesizer is QFBV formula, not an instruction sequence
Superoptimizer cannot handle more general QFBV formulas such as EAX’ + EBX’ = EAX + EBX + 4
Slide36
Effect of McSynth++ on a Client
WiPEr: Machine-code partial evaluator *
Total time to synthesize residual code: McSynth vs. McSynth++
Application   LOC   No. of calls     Synthesis time using   Synthesis time using   Speedup
                    to synthesizer   McSynth (seconds)      McSynth++ (seconds)
power         19    6                16                     13.5                   1.19
interpreter   71    19               30                     22.8                   1.32
sha1          140   23               25.4                   21                     1.21
filter        107   212              241                    177                    1.36
dotproduct    29    306              312                    267                    1.17
Average speedup: 1.25X
Effect of McSynth-ML: no significant speedup
  Small input formulas
* V. Srinivasan and T. Reps. Partial Evaluation of machine code. In OOPSLA, 2015
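The speedup column and the reported average can be recomputed from the two timing columns. The per-application times below are read off the flattened table, so the digit grouping is an assumption; it was chosen so that each ratio matches the printed speedup.

```python
# (McSynth seconds, McSynth++ seconds) per application; digit grouping is assumed
TIMES = {
    "power": (16, 13.5),
    "interpreter": (30, 22.8),
    "sha1": (25.4, 21),
    "filter": (241, 177),
    "dotproduct": (312, 267),
}

speedups = {app: round(old / new, 2) for app, (old, new) in TIMES.items()}
average = round(sum(speedups.values()) / len(speedups), 2)

for app, s in speedups.items():
    print(f"{app:12} {s:.2f}X")
print(f"average      {average:.2f}X")
```

Each ratio rounds to the speedup printed on the slide, and their mean rounds to the reported 1.25X.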