Building a Semantic Parser Overnight

Uploaded by karlyn-bohler on 2017-08-04



Presentation Transcript

Slide1

Building a Semantic Parser Overnight

Slide2

Overnight framework

Slide3

Which country has the highest CO2 emissions?

Which had the highest increase since last year?

What fraction is from the five countries with the highest GDP?

Slide4

Slide5

Slide6

Slide7

Training data

Slide8

Slide9

The data problem:

The main dataset, GEO880, has only 880 examples (600 for training)

For comparison:

Labeled photos: millions

Slide10

Not only quantity:

The data can lack critical functionality

Slide11

The process

Domain Seed lexicon

Logical forms and canonical utterances

Paraphrases

Semantic parser

Slide12

The database:

Triples (e1, p, e2)

e1 and e2 are entities (e.g., article1, 2015)

p is a property (e.g., publicationDate)

Slide13
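The triple format above can be sketched as a toy store. The entities and property names follow the slide's examples; the data and the lookup function are illustrative, not the paper's code.

```python
# Toy triple store in the (e1, p, e2) format above.
# Entities/properties follow the slide's examples; the data is made up.
triples = [
    ("article1", "publicationDate", "2015"),
    ("article1", "cites", "article2"),
    ("article2", "publicationDate", "2013"),
]

def objects(entity, prop):
    """All e2 such that (entity, prop, e2) is in the database."""
    return {e2 for (e1, p, e2) in triples if e1 == entity and p == prop}

print(objects("article1", "publicationDate"))  # {'2015'}
```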

Seed lexicon

For every property, a lexical entry of the form

<t → s[p]>

t is a natural language phrase and s is a syntactic category

<"publication date" → RELNP[publicationDate]>

Slide14

Seed lexicon

In addition, L contains two typical entities for each semantic type in the database

<alice → NP[alice]>

Slide15

Unaries:

TYPENP (types)

ENTITYNP (entities)

VP verb phrases ("has a private bath")

Binaries:

RELNP functional properties (e.g., "publication date")

VP/NP transitive verbs ("cites", "is the president of")

Slide16
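The seed lexicon entries above can be sketched as a small table mapping phrases to (category, predicate) pairs. The dict encoding is an assumption for illustration, not the paper's data structure.

```python
# Hypothetical seed lexicon L: phrase t -> (syntactic category s, predicate p),
# mirroring entries of the form <t -> s[p]> from the slides.
seed_lexicon = {
    "publication date": ("RELNP", "publicationDate"),  # functional property
    "cites":            ("VP/NP", "cites"),            # transitive verb
    "alice":            ("NP", "alice"),               # typical entity
}

category, predicate = seed_lexicon["publication date"]
```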

Grammar

Rules have the form <α1 . . . αn → s[z]>

α1 . . . αn are tokens or categories

s is a syntactic category

z is the logical form constructed

Slide17

Grammar

<RELNP[r] of NP[x] → NP[R(r).x]>

Z: R(publicationDate).article1

C: "publication date of article 1"

Slide18
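The rule above builds the logical form and the canonical utterance together; a minimal sketch, where the function name and the string encoding of grammar objects are illustrative:

```python
# Sketch of applying <RELNP[r] of NP[x] -> NP[R(r).x]>: construct the
# canonical utterance and the logical form in lockstep.
def apply_relnp_of_np(relnp, np):
    phrase_r, r = relnp   # e.g. ("publication date", "publicationDate")
    phrase_x, x = np      # e.g. ("article 1", "article1")
    canonical = f"{phrase_r} of {phrase_x}"
    logical_form = f"R({r}).{x}"
    return canonical, logical_form

c, z = apply_relnp_of_np(("publication date", "publicationDate"),
                         ("article 1", "article1"))
print(c)  # publication date of article 1
print(z)  # R(publicationDate).article1
```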

Crowdsourcing

X: "when was article 1 published?"

D = {(x, c, z)} for each (z, c) ∈ GEN(G ∪ L) and x ∈ P(c)

Slide19
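Assembling D from the generated (z, c) pairs and the crowdsourced paraphrases P(c) might look like the following sketch; the paraphrase table is hypothetical.

```python
# D = {(x, c, z)}: pair each generated (logical form z, canonical utterance c)
# with its crowdsourced paraphrases x in P(c). The data is illustrative.
generated = [("R(publicationDate).article1", "publication date of article 1")]
paraphrases = {  # P(c), as collected from workers (hypothetical)
    "publication date of article 1": ["when was article 1 published?"],
}

D = [(x, c, z) for (z, c) in generated for x in paraphrases.get(c, [])]
```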

Training

A log-linear distribution pθ(z, c | x, w)

Slide20

Under the hood

Slide21

Lambda DCS

Entity: a singleton set {e}

Property: a set of pairs (e1, e2)

Slide22

Lambda DCS

Join of a binary b and a unary u: b.u

Slide23

Lambda DCS

Negation: ¬u

Slide24

Lambda DCS

Reverse: R(b)

(e1, e2) ∈ [b] ⇒ (e2, e1) ∈ [R(b)]

Slide25

Lambda DCS

Aggregation: count(u), sum(u), average(u, b), argmax(u, b)

Slide26

Lambda DCS

Lambda abstraction: λx.u is the set of pairs (e1, e2) such that e1 ∈ [u[x/e2]]

Example: R(λx.count(R(cites).x)) denotes the pairs (e1, e2) where e2 is the number of entities that e1 cites.

Slide27
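Toy denotation functions for the operators above (property denotation, reverse, join, count) over a triple list; function names and data are illustrative.

```python
# Toy lambda DCS denotations over (e1, p, e2) triples. Data is made up.
triples = [
    ("article1", "cites", "article2"),
    ("article1", "cites", "article3"),
    ("article2", "cites", "article3"),
]

def binary(p):
    """Denotation of property p: the set of pairs (e1, e2)."""
    return {(e1, e2) for (e1, q, e2) in triples if q == p}

def reverse(b):
    """R(b): swap each pair."""
    return {(e2, e1) for (e1, e2) in b}

def join(b, u):
    """b.u: entities e1 with (e1, e2) in b for some e2 in u."""
    return {e1 for (e1, e2) in b if e2 in u}

# cites.{article3}: things that cite article3 -> article1 and article2
citers = join(binary("cites"), {"article3"})
# count of entities that article1 cites -> 2
n = len(join(reverse(binary("cites")), {"article1"}))
```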

Seed lexicon for the SOCIAL domain

Slide28

Seed lexicon

article

publication date

cites

won an award

Slide29

Grammar

Assumption 1 (Canonical compositionality):

Using a small grammar, all logical forms expressible in natural language can be realized compositionally based on the logical form.

Slide30

Grammar

Functionality-driven

Generate superlatives, comparatives, negation, and coordination

Slide31

Grammar

Slide32

Grammar

From seed: types, entities, and properties

noun phrases (NP)

verb phrases (VP)

complementizer phrases (CP), e.g., "that cites more than three articles"

Slide33

Grammar

Slide34

Grammar

Slide35

Grammar

Slide36

Paraphrasing

"meeting whose attendee is alice" ⇒ "meeting with alice"

"author of article 1" ⇒ "who wrote article 1"

"player whose number of points is 15" ⇒ "player who scored 15 points"

Slide37

Paraphrasing

"article that has the largest publication date" ⇒ "newest article"

"housing unit whose housing type is apartment" ⇒ "apartment"

"university of student alice whose field of study is music" ⇒ "At which university did Alice study music?", "Which university did Alice attend?"

Slide38

Sublexical compositionality

"parent of alice whose gender is female" ⇒ "mother of alice"

"person that is author of paper whose author is X" ⇒ "co-author of X"

"person whose birthdate is birthdate of X" ⇒ "person born on the same day as X"

"meeting whose start time is 3pm and whose end time is 5pm" ⇒ "meetings between 3pm and 5pm"

"that allows cats and that allows dogs" ⇒ "that allows pets"

"author of article that article whose author is X cites" ⇒ "who does X cite"

Slide39

Crowdsourcing in numbers

Each Turker paraphrased 4 utterances

28 seconds on average per paraphrase

38,360 responses

26,098 examples remained

Slide40

Paraphrasing

Noise in the data: 17%

("player that has the least number of team" ⇒ "player with the lowest jersey number")

("restaurant whose star rating is 3 stars" ⇒ "hotel which has a 3 star rating")

Slide41

Model and Learning

Numbers, dates, and database entities are resolved first

Slide42

Model and Learning

(z, c) ∈ GEN(G ∪ Lx)

pθ(z, c | x, w) ∝ exp(φ(c, z, x, w)ᵀθ)

Slide43
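The log-linear distribution above can be sketched as a softmax over feature scores; the feature names and weights below are made up for illustration.

```python
import math

# p_theta(z, c | x, w) ∝ exp(phi(c, z, x, w) · theta), normalized over the
# candidate (z, c) pairs in GEN(G ∪ Lx). Features here are hypothetical.
def dot(features, theta):
    return sum(theta.get(f, 0.0) * v for f, v in features.items())

def p_theta(candidates, theta):
    """candidates: list of (id, feature dict); returns a distribution."""
    exps = [math.exp(dot(f, theta)) for _, f in candidates]
    total = sum(exps)
    return {cid: e / total for (cid, _), e in zip(candidates, exps)}

theta = {"paraphrase_match": 1.2, "skip_bigram": -0.3}
cands = [("z1", {"paraphrase_match": 1.0}), ("z2", {"skip_bigram": 1.0})]
probs = p_theta(cands, theta)
```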

Floating parser

Slide44

Slide45

Slide46

Slide47

Model and Learning

Features

Slide48

Model and Learning

Optimization with AdaGrad (Duchi et al., 2010)

Slide49

Experimental Evaluation