Slide 1
PROSE: Inductive Program Synthesis for the Mass Markets
Alex Polozov (polozov@cs.washington.edu)
Microsoft PROSE team (prose-contact@microsoft.com)
UC Berkeley, Jan 20, 2017
https://microsoft.github.io/prose

Slide 2
PROgram Synthesis using Examples
Team: Vu Le, Daniel Perelman, Danny Simmons, Mark Plesko, Mohammad Raza, Abhishek Udupa, Sumit Gulwani, Ranvijay Kumar, Alex Polozov, Prateek Jain

Slide 3
Hackathon
Build a programming-by-examples app with PROSE. Win an Xbox!
Saturday & Sunday, Immersion Room @ SWARM Lab
http://tiny.cc/prose-hack

Slide 4
Outline

Slide 5
Motivation
- 99% of spreadsheet users do not know programming.
- Data scientists spend 80% of their time extracting & cleaning data.

Slide 6
Flash Fill

Slide 7
PBE Architecture
[Diagram: an example-based intent spec, a DSL, and a ranking function feed a Program Synthesizer, which emits a ranked program set. Debugging against test inputs yields refined intent; the intended program goes through a translator to produce the intended program in Python/C#/C++/….]

Slide 8
PBE Timeline
- 2010-2012: FlashFill (text transformations) [POPL 11]
- 2012-2014: FlashExtract (text extraction) [PLDI 14]
- 2012-2015: FlashRelate (table transformations) [PLDI 15]
- …

Slide 9
“Project FlashMeta”
- Challenge №1: Program synthesis in a real-life domain is hard.
  - Guided by a domain-specific language
  - Exploits domain-specific knowledge about operator properties
- Challenge №2: Must be accessible to end users.
  - Spec: input/output examples
- Challenge №3: Robust & agile industrial deployment.
  - Each prior tool took person-years of PhD-caliber work, one-off.

Slide 10
PBE Timeline
- 2010-2012: FlashFill (text transformations) [POPL 11]
- 2012-2014: FlashExtract (text extraction) [PLDI 14]
- 2012-2015: FlashRelate (table transformations) [PLDI 15]
- 2014-2015: FlashMeta (PBE framework) [OOPSLA 15]
- 2015-present: PROSE SDK
- …

Slide 11
Outline

Slides 12-13
Meta-synthesizer Framework
[Diagram: an app supplies a DSL definition and synthesis strategies to PROSE; PROSE generates a synthesizer; given an I/O specification, the synthesizer turns inputs and outputs into programs for the app.]

Slide 14
Key Insights
- Earlier PBE tools entangled the divide-and-conquer search algorithm with domain-specific insights.
- Refactor out the generic divide-and-conquer algorithm.
- Domain-specific insights are accessible to non-logicians.
- Domain-specific insights depend only on specific operator semantics.
- Modularity: define once, reuse in multiple DSLs.

Slide 15
Backpropagation in one slide [Polozov & Gulwani 15]
Examples for Substring(s, P1, P2) on the input "Seattle, WA" backpropagate into examples for the position subprograms P1 and P2: each occurrence of the desired output within "Seattle, WA" yields one candidate pair of start/end position examples.

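To make the idea concrete, here is a minimal Python sketch of that backpropagation step (our illustration, not the PROSE SDK API; `witness_substring_positions` is a hypothetical helper name):

```python
# Minimal sketch of backpropagation for Substring(s, P1, P2); not the PROSE
# SDK API. witness_substring_positions is a hypothetical helper.

def witness_substring_positions(s: str, output: str):
    """Each occurrence of `output` in `s` yields one candidate pair of
    examples: P1 must evaluate to the start index, P2 to the end index."""
    candidates = []
    start = s.find(output)
    while start != -1:
        candidates.append((start, start + len(output)))
        start = s.find(output, start + 1)
    return candidates

# "Seattle" occurs once in "Seattle, WA": P1 must return 0, P2 must return 7.
print(witness_substring_positions("Seattle, WA", "Seattle"))  # [(0, 7)]
```
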
Slides 16-25
Backpropagation, a.k.a. Inverse Semantics

Input   Output
100     76-100
51      51-75

Goal: find P1 and P2 such that Concat(P1, P2) satisfies the examples.
Method: invert the semantics of Concat.

Every split of an output string into a prefix and a suffix is a candidate:
- "76-100": prefixes 7, 76, 76-, 76-1, …
- "51-75": prefixes 5, 51, 51-, 51-7, …

Each choice of splits backpropagates into example specs for P1 and P2:
- P1: {100 → 7, 51 → 5}; P2: {100 → 6-100, 51 → 1-75}
- P1: {100 → 76, 51 → 51}; P2: {100 → -100, 51 → -75}
- P1: {100 → 76-10, 51 → 51-7}; P2: {100 → 0, 51 → 5}
- …

Conditional backpropagation applies background knowledge to prune the candidates.
Ex. BK: breaking numbers is unlikely; this discards splits such as P1: {100 → 7, 51 → 5} and P2: {100 → 0, 51 → 5}, keeping P1: {100 → 76, 51 → 51} with P2: {100 → -100, 51 → -75}.

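A runnable sketch of this inverse semantics for Concat, with the number-breaking background knowledge as a filter (our illustration; `splits`, `breaks_number`, and the dict-based spec encoding are assumptions, not the PROSE API):

```python
# Sketch of inverse semantics for Concat(P1, P2); the helpers and the
# dict-based spec encoding are our assumptions, not the PROSE SDK.
from itertools import product

def splits(output: str):
    """All ways to write `output` as a non-empty prefix + suffix."""
    return [(output[:i], output[i:]) for i in range(1, len(output))]

def breaks_number(prefix: str, suffix: str) -> bool:
    """Background knowledge: splitting inside a run of digits is unlikely."""
    return prefix[-1].isdigit() and suffix[0].isdigit()

examples = {"100": "76-100", "51": "51-75"}

# One split per example; keep only choices that respect the BK filter.
per_example = [
    [(inp, pre, suf) for pre, suf in splits(out) if not breaks_number(pre, suf)]
    for inp, out in examples.items()
]
for choice in product(*per_example):
    print({inp: pre for inp, pre, _ in choice},   # spec for P1
          {inp: suf for inp, _, suf in choice})   # spec for P2
# {'100': '76', '51': '51'} {'100': '-100', '51': '-75'}   ...and 3 more
```
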
Slide 26
Backpropagation, a.k.a. Inverse Semantics
- Efficient: a deductive "divide-and-conquer" descent over the DSL.
  - Always operates over a partially correct program.
  - A user-facing app must respond within seconds.
- Modular: define backprop functions once per operator, reuse them in all DSLs.
- Accessible: a compact domain-specific insight plus background knowledge.
  - The search strategy is refactored out.
- Development time: months → weeks!

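The refactored meta-algorithm itself fits in a few lines. Below is a toy, self-contained sketch of the deductive descent for a micro-DSL E := Input | ConstStr(s) | Concat(E, E) (our illustration; the real framework works over version spaces and arbitrary grammars):

```python
# Toy sketch of the deductive "divide-and-conquer" descent; illustrative
# only. The PROSE SDK works over version spaces and arbitrary grammars.
# Micro-DSL: E := Input | ConstStr(s) | Concat(E, E).
# A spec maps each example input to its required output.
from itertools import product

def witness_concat(spec):
    """Inverse semantics of Concat: every choice of one split per example."""
    per_ex = [[(i, o[:k], o[k:]) for k in range(1, len(o))]
              for i, o in spec.items()]
    for choice in product(*per_ex):
        yield ({i: p for i, p, _ in choice}, {i: s for i, _, s in choice})

def learn(spec, depth=2):
    programs = []
    if all(i == o for i, o in spec.items()):
        programs.append(("Input",))
    if len(set(spec.values())) == 1:           # a constant fits all examples
        programs.append(("ConstStr", next(iter(spec.values()))))
    if depth > 0:
        for spec1, spec2 in witness_concat(spec):  # backpropagate & recurse
            for p1 in learn(spec1, depth - 1):
                for p2 in learn(spec2, depth - 1):
                    programs.append(("Concat", p1, p2))
    return programs

print(learn({"WA": "WA!", "CA": "CA!"}))
# [('Concat', ('Input',), ('ConstStr', '!'))]
```
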
Slide 27
Performance & Number of Examples

Slide 28
Outline

Slide 29
Ambiguity Resolution

Slide 30
Example: Isaac Asimov → IA

Candidate programs consistent with the example:
- P1 = Concat(first capital letter, second capital letter)
- P2 = Concat(first capital letter, first letter of last word)
- P3 = Concat("I", first letter of last word)
- P4 = Concat(first capital letter, "A")
- P5 = Concat(first capital letter, SubStr(second capital letter, last lowercase word))
- P6 = Concat(first capital letter, SubStr(last capital letter followed by lowercase word, +1))

Input               P1  P2  P3  P4  P5        P6
Isaac Asimov        IA  IA  IA  IA  IA        IA
Kyokutei Bakin      KB  KB  IB  KA  KB        KB
Howard Roger Garis  HR  HG  IG  HA  HRoger G  HG
Enid Blyton         EB  EB  IB  EA  EB        EB
Edwy S. Brooks      ES  EB  IB  EA  ES. B     EB
Barbara Cartland    BC  BC  IC  BA  BC        BC
Margaret Atwood     MA  MA  IA  MA  MA        MA
Iain M. Banks       IM  IB  IB  IA  IM. B     IB
John Smith III      JS  JI  II  JA  JS        JS

Slide 31
Anecdotes
- Flash Fill was not accepted into Excel until it solved the most common scenarios from one example.
- Some users still do not know you can give two!

Input           Output
Adam Smith      Adam
Alice Williams  Alic

Slide 32
Ambiguity Resolution

Input  Output
100    76-100
51     51-75
86     ?

x ↦ Concat(Round(x, Down, 25), Const("-"), Round(x, Up, 25))

- Option 1: machine-learned robustness-based ranking [Singh & Gulwani 15]
  - Idioms/patterns from the test data can influence search & ranking.
  - E.g.: bucketing
- Option 2: interactive clarification
  - Pick an input or a subset of inputs to use for disambiguation.

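A quick Python rendering of that bucketing program (the Round/Const operators are from the slide; the bucket arithmetic below is our reading of "round down/up to a 25-wide bucket"):

```python
# Python rendering of x ↦ Concat(Round(x, Down, 25), Const("-"),
# Round(x, Up, 25)); the bucket arithmetic is our interpretation.

def bucket(x: int, width: int = 25) -> str:
    low = (x - 1) // width * width + 1   # round down to the bucket's low end
    high = low + width - 1               # round up to the bucket's high end
    return f"{low}-{high}"

print(bucket(100))  # 76-100
print(bucket(51))   # 51-75
print(bucket(86))   # 76-100: the held-out input 86 probes this hypothesis
```
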
Slide 33
Distinguishing Inputs
[Same candidate programs P1-P6 and output table as on Slide 30; inputs on which the candidates disagree (e.g., "Howard Roger Garis": HR under P1 vs. HG under P2) are the useful ones to show the user.]

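In code, the naive version of this idea is a scan for inputs where top-ranked candidates diverge (a brute-force stand-in; per the next slide, the real approach picks inputs with Z3 as a set-cover problem):

```python
# Brute-force sketch of finding distinguishing inputs; the talk's actual
# approach uses Z3 and a set-cover formulation.

def distinguishing_inputs(programs, inputs):
    for x in inputs:
        outputs = {p(x) for p in programs}
        if len(outputs) > 1:          # candidates disagree on x, so asking
            yield x, sorted(outputs)  # the user about x narrows the set

p1 = lambda s: "".join(c for c in s if c.isupper())[:2]  # first two capitals
p2 = lambda s: s[0] + s.split()[-1][0]    # first letter + last word's first

names = ["Isaac Asimov", "Howard Roger Garis", "John Smith III"]
for x, outs in distinguishing_inputs([p1, p2], names):
    print(x, outs)
# Howard Roger Garis ['HG', 'HR']
# John Smith III ['JI', 'JS']
```
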
Slide 34
Ambiguity Resolution – Summary
- Level 1: manual (syntactic) ranking scheme.
- Level 2: machine-learned (syntactic + data-based) ranking scheme.
  - Major boost from semantic data features.
- Level 3: interactive disambiguation.
  - Use Z3 to pick the best distinguishing inputs (a set-cover problem).
  - Focus only on top-ranked programs (for sensibility & performance).
Ambiguity resolution is a learning problem.

Slide 35
Development
- Initial DSL design: weeks. This is not a bottleneck!*
- Ranking: the bulk of the effort.
  - Designing a score for an operator takes several times longer than designing the operator itself (incl. synthesis!).
  - E.g.: rock-paper-scissors among string-processing operators.
    Should I process the string "25-06-11" with regexes? Treat it as a numeric computation? A date?

* Once you learn the skill…

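As an illustration of why these scores are hard to design, here is a schematic weight-based ranker over program trees (the operator names and weights are invented for the sketch):

```python
# Schematic ranking function over program trees; operator names and weights
# are invented for illustration. Tuning such trade-offs is the hard part:
# operators can beat each other cyclically, like rock-paper-scissors.

WEIGHTS = {"ConstStr": -3.0, "Input": 1.0, "Concat": -0.5}

def score(program) -> float:
    op, *args = program
    return WEIGHTS.get(op, 0.0) + sum(
        score(a) for a in args if isinstance(a, tuple))

candidates = [("Concat", ("Input",), ("ConstStr", "!")),
              ("ConstStr", "WA!")]
print(max(candidates, key=score))  # prefers the input-dependent program
```
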
Slide 36
Noise

Input      Output
2/3/2011   Thu
1/11/2017  Wed
10/4/2016  thu

It is easier to prevent a mistake in a spec than to fix it.
How did you know it was a Thursday in the first place?
- Autocompletion of examples
- Noise detection

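One plausible way to mechanize noise detection, sketched below (our illustration, not necessarily what PROSE ships): check each example against a program learned from the others. Here the intended date-to-weekday program is written out directly as a stand-in for the learned one:

```python
# Sketch of noise detection: flag an example if the program consistent with
# the other examples disagrees with it. `weekday` stands in for "the
# program learned from the remaining examples".
from datetime import datetime

def weekday(s: str) -> str:
    return datetime.strptime(s, "%m/%d/%Y").strftime("%a")

examples = {"2/3/2011": "Thu", "1/11/2017": "Wed", "10/4/2016": "thu"}

for inp, out in examples.items():
    if weekday(inp) != out:
        print(f"suspect example: {inp} -> {out} (expected {weekday(inp)})")
# suspect example: 10/4/2016 -> thu (expected Tue)
```
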
Slide 37
Outline

Slide 38
Predictive Program Synthesis
- Goal: zero-example program synthesis.
  - No output examples; rely solely on the input data.
- Benefits: reduce manual effort, enable batch data processing.
- Idea: introduce domain-specific lifting functions into classic bottom-up enumerative synthesis.
- Details: [Raza & Gulwani]

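For reference, the "classic bottom-up enumerative synthesis" that the lifting functions extend looks roughly like this toy version (three terminals plus string concatenation; pruning by observed behavior is the standard trick, and the lifting functions themselves are not shown):

```python
# Toy bottom-up enumerative synthesis: grow programs level by level and
# prune by observed behavior on the inputs. This is the classic baseline;
# the slide's contribution (domain-specific lifting functions) is not shown.

def bottom_up(inputs, target_outputs, rounds=3):
    seen = {}                                 # behavior tuple -> program
    for term in (lambda x: x, lambda x: x.lower(), lambda x: x[:1]):
        seen[tuple(term(i) for i in inputs)] = term
    concat = lambda f, g: lambda x: f(x) + g(x)
    for _ in range(rounds):
        pairs = [(f, g) for f in list(seen.values())
                        for g in list(seen.values())]
        for f, g in pairs:
            h = concat(f, g)
            seen.setdefault(tuple(h(i) for i in inputs), h)
    return seen.get(tuple(target_outputs))

p = bottom_up(["WA", "CA"], ["WAwa", "CAca"])
print(p("TX") if p else "not found")  # TXtx
```
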
Slide 39
Example: Text Splitting
- Any number of arbitrary delimiter strings.
- A string may be used as a delimiter in some places but not in others.
- A delimiter may be empty.

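To give a flavor of input-only reasoning on this task, here is a naive stand-in for proposing delimiter candidates from the data alone (the sample rows and the frequency heuristic are ours, not the paper's algorithm):

```python
# Naive input-only delimiter proposal: treat non-alphanumeric runs as
# candidate delimiters and rank them by frequency. Sample rows and the
# heuristic are our illustration, not the paper's algorithm.
import re
from collections import Counter

rows = ["PE5 Leonard Robledo (Australia)",
        "U109 Adam Jay Lucas (New Zealand)",
        "R342 Carrie Dodson (United States)"]

counts = Counter(tok for row in rows
                 for tok in re.findall(r"[^A-Za-z0-9]+", row))
print(counts.most_common(3))  # [(' ', 9), (' (', 3), (')', 3)]
```
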
Slide 40
Interactive Program Synthesis
- Goal: leverage the iterative nature of a PBE session.
- Idea: convert the version space algebra representation of the learned program set into a finite expansion of a DSL, and learn in it.
- Details: paper under preparation; see also [SYNT 16].

Given a program set S that satisfies the currently accumulated spec φ, and a new constraint φ′, learn the subset of S whose programs satisfy the new spec φ ∧ φ′.

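In explicit-set form (real implementations refine version spaces rather than enumerating programs), one refinement step of the session is just a filter; the candidates below echo the Adam Smith anecdote from Slide 31:

```python
# One refinement step of an interactive PBE session, in explicit-set form;
# real systems refine version spaces instead of enumerating programs.

def refine(program_set, new_example):
    inp, out = new_example
    return [p for p in program_set if p(inp) == out]

candidates = [lambda s: s.split()[0],          # first word
              lambda s: s[:4],                 # first four characters
              lambda s: s.split()[0][:4]]      # first word, capped at four

candidates = refine(candidates, ("Adam Smith", "Adam"))       # all 3 survive
candidates = refine(candidates, ("Alice Williams", "Alice"))  # two produce
print(len(candidates))  # 1                    # "Alic" and are eliminated
```
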
Slide 41
Interactive Program Synthesis
[Diagram: the Program Synthesis Framework takes a domain-specific language, a program ranking function, and an intent spec (input-output examples) and returns a ranked set of valid programs. A Hypothesizer poses interactive questions over test inputs; the user's 😊/😠 answers become refined intent, and each iteration learns in a refined DSL induced by the current program set. The best program is translated into deployable code in Python/R/C#/….]

Slide 42
Summary
- Decomposition of PBE into a meta-algorithm & backprop functions.
- PROSE: modular and accessible for industrial software development.
- Deductive reasoning ensures real-time response on wrangling tasks.
- Key challenges of industrial PBE: ambiguity resolution and debugging.
- Interactive clarification is the most effective disambiguation model.
  - It should be a first-class citizen in synthesis frameworks.
- Come try it for yourself tomorrow!

Thank you!
https://microsoft.github.io/prose