Slide 1
Building a Semantic Parser Overnight
Slide 2
Overnight framework
Slide 3
Which country has the highest CO2 emissions?
Which had the highest increase since last year?
What fraction is from the five countries with highest GDP?
Slide 4
Slide 5
Slide 6
Slide 7
Training data
Slide 8
Slide 9
The data problem:
The main dataset has only 600 training examples (GEO880)
For comparison:
Labeled photos: millions
Slide 10
Not only quantity:
The data can lack critical functionality
Slide 11
The process:
Domain → Seed lexicon → Logical forms and canonical utterances → Paraphrases → Semantic parser
Slide 12
The database:
Triples (e1, p, e2)
e1 and e2 are entities (e.g., article1, 2015)
p is a property (e.g., publicationDate)
Slide 13
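The triple representation just described can be sketched in Python. This is a toy index of my own design, not the authors' code:

```python
from collections import defaultdict

# The database as a set of (e1, p, e2) triples, indexed by property
# so that all (e1, e2) pairs of a property can be retrieved at once.
class KB:
    def __init__(self, triples):
        self.by_property = defaultdict(set)
        for e1, p, e2 in triples:
            self.by_property[p].add((e1, e2))

    def pairs(self, p):
        """All (e1, e2) pairs standing in property p."""
        return self.by_property[p]

kb = KB([("article1", "publicationDate", 2015),
         ("article2", "publicationDate", 2016),
         ("article1", "cites", "article2")])
```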
Seed lexicon
For every property, a lexical entry of the form <t → s[p]>
t is a natural language phrase and s is a syntactic category
e.g., <“publication date” → RELNP[publicationDate]>
Slide 14
Seed lexicon
In addition, L contains two typical entities for each semantic type in the database
e.g., <alice → NP[alice]>
Slide 15
Unaries:
TYPENP
ENTITYNP
Verb phrases VP (“has a private bath”)
Binaries:
RELNP: functional properties (e.g., “publication date”)
VP/NP: transitive verbs (“cites”, “is the president of”)
Slide 16
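One plausible encoding of these seed lexicon entries <t → s[p]> is a phrase-to-(category, symbol) map. The dict layout and the `hasPrivateBath` symbol are my own illustrative choices, not the paper's data format:

```python
# Seed lexicon L: each natural language phrase t maps to its syntactic
# category s and the logical symbol it denotes.
seed_lexicon = {
    "article":            ("TYPENP", "article"),
    "alice":              ("NP", "alice"),
    "publication date":   ("RELNP", "publicationDate"),
    "cites":              ("VP/NP", "cites"),
    "has a private bath": ("VP", "hasPrivateBath"),  # hypothetical symbol
}
```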
Grammar
Rules of the form <α1 . . . αn → s[z]>
α1 . . . αn are tokens or categories,
s is a syntactic category,
z is the logical form constructed
Slide 17
Grammar
<RELNP[r] of NP[x] → NP[R(r).x]>
z: R(publicationDate).article1
c: “publication date of article 1”
Slide 18
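Hand-applying the rule above shows how the logical form z and the canonical utterance c are built in lockstep. Function and variable names here are illustrative, not the authors' API:

```python
# One grammar rule, <RELNP[r] of NP[x] -> NP[R(r).x]>, applied directly:
# each constituent carries a (phrase, logical form) pair.
def apply_relnp_of_np(relnp, np):
    (r_phrase, r_logic), (x_phrase, x_logic) = relnp, np
    c = f"{r_phrase} of {x_phrase}"   # canonical utterance
    z = f"R({r_logic}).{x_logic}"     # logical form
    return c, z

c, z = apply_relnp_of_np(("publication date", "publicationDate"),
                         ("article 1", "article1"))
# c == "publication date of article 1"
# z == "R(publicationDate).article1"
```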
Crowdsourcing
x: “when was article 1 published?”
D = {(x, c, z)} for each (z, c) ∈ GEN(G ∪ L) and x ∈ P(c)
Slide 19
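The construction of D above can be sketched as a join between generated (z, c) pairs and their crowdsourced paraphrases P(c). The example data is made up for illustration:

```python
# Assemble D = {(x, c, z)}: every generated (logical form, canonical
# utterance) pair is combined with each paraphrase collected for c.
def build_dataset(generated, paraphrases):
    return [(x, c, z)
            for (z, c) in generated
            for x in paraphrases.get(c, [])]

gen = [("R(publicationDate).article1", "publication date of article 1")]
para = {"publication date of article 1": ["when was article 1 published?"]}
D = build_dataset(gen, para)
```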
Training
log-linear distribution pθ(z, c | x, w)
Slide 20
Under the hood
Slide 21
Lambda DCS
Entity: singleton set {e}
Property: set of pairs (e1, e2)
Slide 22
Lambda DCS
Join of binary b and unary u: b.u
Slide 23
Lambda DCS
Negation: ¬u
Slide 24
Lambda DCS
Reversal: R(b)
(e1, e2) ∈ [b] → (e2, e1) ∈ [R(b)]
Slide 25
Lambda DCS
Aggregation: count(u), sum(u), average(u, b), argmax(u, b)
Slide 26
Lambda DCS
λx.u is the set of pairs (e1, e2) with e1 ∈ [u[x/e2]]
Example: R(λx.count(R(cites).x)) denotes pairs (e1, e2), where e2 is the number of entities that e1 cites.
Slide 27
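The lambda DCS operators above admit a toy denotational implementation over sets and sets of pairs. The function names are mine; the semantics follow the slides:

```python
# Unaries are sets of entities; binaries are sets of (e1, e2) pairs.
def join(b, u):
    """[b.u] = {e1 : (e1, e2) in [b], e2 in [u]}"""
    return {e1 for (e1, e2) in b if e2 in u}

def reverse(b):
    """[R(b)] = {(e2, e1) : (e1, e2) in [b]}"""
    return {(e2, e1) for (e1, e2) in b}

def count(u):
    """count(u) denotes the singleton set {|u|}."""
    return {len(u)}

def lam(domain, f):
    """[λx.u] = {(e1, e2) : e1 in [u[x/e2]]}, x ranging over a finite domain."""
    return {(e1, e2) for e2 in domain for e1 in f(e2)}

cites = {("article1", "article2"), ("article1", "article3"),
         ("article2", "article3")}
articles = {"article1", "article2", "article3"}

# R(λx.count(R(cites).x)): pairs (e1, n), n = number of entities e1 cites.
num_cited = reverse(lam(articles, lambda x: count(join(reverse(cites), {x}))))
```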
Seed lexicon for the SOCIAL domain
Slide 28
Seed lexicon
article
publication date
cites
won an award
Slide 29
Grammar
Assumption 1 (Canonical compositionality):
Using a small grammar, all logical forms expressible in natural language can be realized compositionally based on the logical form.
Slide 30
Grammar
Functionality-driven:
Generate superlatives, comparatives, negation, and coordination
Slide 31
Grammar
Slide 32
Grammar
From seed: types, entities, and properties
noun phrases (NP)
verb phrases (VP)
complementizer phrases (CP)
“that cites Building a Semantic Parser Overnight”
“that cites more than three articles”
Slide 33
Grammar
Slide 34
Slide 35
Slide 36
Paraphrasing
“meeting whose attendee is alice” ⇒ “meeting with alice”
“author of article 1” ⇒ “who wrote article 1”
“player whose number of points is 15” ⇒ “player who scored 15 points”
Slide 37
Paraphrasing
“article that has the largest publication date” ⇒ “newest article”
“housing unit whose housing type is apartment” ⇒ “apartment”
“university of student alice whose field of study is music” ⇒
“At which university did Alice study music?”, “Which university did Alice attend?”
Slide 38
Sublexical compositionality
“parent of alice whose gender is female” ⇒ “mother of alice”
“person that is author of paper whose author is X” ⇒ “co-author of X”
“person whose birthdate is birthdate of X” ⇒ “person born on the same day as X”
“meeting whose start time is 3pm and whose end time is 5pm” ⇒ “meetings between 3pm and 5pm”
“that allows cats and that allows dogs” ⇒ “that allows pets”
“author of article that article whose author is X cites” ⇒ “who does X cite”
Slide 39
Crowdsourcing in numbers
Each Turker paraphrased 4 utterances
28 seconds on average per paraphrase
38,360 responses
26,098 examples remained
Slide 40
Paraphrasing
Noise in the data: 17%
(“player that has the least number of team” ⇒ “player with the lowest jersey number”)
(“restaurant whose star rating is 3 stars” ⇒ “hotel which has a 3 star rating”)
Slide 41
Model and Learning
Numbers, dates, and database entities are resolved first
Slide 42
Model and Learning
(z, c) ∈ GEN(G ∪ Lx)
pθ(z, c | x, w) ∝ exp(φ(c, z, x, w)⊤θ)
Slide 43
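The log-linear distribution above is a softmax over candidate (z, c) pairs scored by a feature dot product. The feature map and candidate set below are placeholders, not the paper's feature set:

```python
import math

# p(z, c | x, w) proportional to exp(phi(c, z, x, w) . theta), normalized
# over the candidate set; sparse features as dicts, weights in theta.
def log_linear(candidates, phi, theta):
    scores = [sum(theta.get(f, 0.0) * v for f, v in phi(cand).items())
              for cand in candidates]
    m = max(scores)                        # stabilize the softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

cands = ["z1", "z2"]
phi = lambda cand: {"is_z1": 1.0 if cand == "z1" else 0.0}
probs = log_linear(cands, phi, {"is_z1": math.log(3.0)})
# z1 is weighted e^(log 3) = 3 against 1, so probs == [0.75, 0.25]
```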
Floating parser
Slide 44
Slide 45
Slide 46
Slide 47
Model and Learning
Features
Slide 48
Model and Learning
AdaGrad (Duchi et al., 2010)
Slide 49
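A minimal AdaGrad update for sparse feature weights looks as follows: each coordinate's step size shrinks with its accumulated squared gradient. This is a generic sketch, not the authors' training code:

```python
import math

# One AdaGrad step: per-feature learning rates lr / sqrt(sum of g^2).
def adagrad_step(theta, grad, accum, lr=0.1, eps=1e-8):
    for f, g in grad.items():
        accum[f] = accum.get(f, 0.0) + g * g
        theta[f] = theta.get(f, 0.0) - lr * g / (math.sqrt(accum[f]) + eps)

theta, accum = {}, {}
adagrad_step(theta, {"feat": 1.0}, accum)  # first step moves by about -lr
```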
Experimental Evaluation