Uploaded On 2018-11-04

Presentation Transcript

Slide1

Policy Gradient as a Proxy for Dynamic Oracles in Constituency Parsing

Daniel Fried and Dan Klein

Slide4

Parsing by Local Decisions

Sentence: The cat took a nap .

Constituents in the tree: NP, NP, VP, S

Action sequence: (S (NP The cat ) (VP …

Slide5

Non-local Consequences

Exposure Bias

True Parse: (S (NP The cat

Prediction: (S (VP (NP ??

[Ranzato et al. 2016; Wiseman and Rush 2016]

Loss-Evaluation Mismatch

Figure: two trees over "The cat took a nap ." One has the constituents NP, NP, VP, S; the other has VP, NP, VP, S, NP.

Cost: -F1
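To make the -F1 cost concrete, here is a minimal sketch of labeled-bracket F1 between two trees, each represented as a set of (label, start, end) spans. The span positions and the extra wrong constituent are assumptions made up for illustration (the slide only shows labels), and standard evaluation tools apply further conventions, such as punctuation handling, that this sketch ignores.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (label, start, end), end exclusive


def labeled_f1(predicted: Set[Span], gold: Set[Span]) -> float:
    """Labeled-bracket F1 between two sets of constituents."""
    matched = len(predicted & gold)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical spans for "The cat took a nap ." (word indices 0..5).
gold = {("S", 0, 6), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)}

# Hypothetical prediction: all true constituents plus one wrong NP.
predicted = gold | {("NP", 4, 5)}

# Sequence-level cost as named on the slide: negative F1.
cost = -labeled_f1(predicted, gold)
```

In this example every true constituent is recovered, yet the single extra bracket still lowers F1. A per-action training loss does not measure that tree-level effect directly.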

 

 

Slide6

Dynamic Oracle Training

Figure: partial action sequences for the prediction (sample, or greedy) and for the true parse of "The cat …", with the oracle's action marked at the predicted states.

Explore at training time. Supervise each state with an expert policy.

Choose the expert's action to maximize achievable F1 (typically).

addresses loss mismatch

addresses exposure bias

[Goldberg & Nivre 2012; Ballesteros et al. 2016; inter alia]

Slide7

Dynamic Oracles Help!

Expert Policies / Dynamic Oracles (mostly dependency parsing):
Daumé III et al., 2009; Ross et al., 2011; Choi and Palmer, 2011; Goldberg and Nivre, 2012; Chang et al., 2015; Ballesteros et al., 2016; Stern et al. 2017

PTB Constituency Parsing F1

System                                          Static Oracle   Dynamic Oracle
Coavoux and Crabbé, 2016                        88.6            89.0
Cross and Huang, 2016                           91.0            91.3
Fernández-González and Gómez-Rodríguez, 2018    91.5            91.7

Slide8

What if we don’t have a dynamic oracle?

Use reinforcement learning

Slide9

Reinforcement Learning Helps! (in other tasks)

Auli and Gao, 2014; Ranzato et al., 2016; Shen et al., 2016: machine translation

Xu et al., 2016; Wiseman and Rush, 2016; Edunov et al. 2017: machine translation; several, including dependency parsing; CCG parsing

Slide10

Policy Gradient Training [Williams, 1992]

Minimize expected sequence-level cost.

addresses exposure bias (compute by sampling)

addresses loss mismatch (compute F1)

compute in the same way as for the true tree

Figure: Prediction and True Parse trees over "The man had an idea ." One tree has the constituents NP, NP, VP, S; the other has NP, NP, VP, S, NP.
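The formula for the expected sequence-level cost did not survive in the transcript. One standard way to write it, following REINFORCE (Williams, 1992), is sketched below. All symbols are assumptions introduced here, not notation recovered from the slide: x is the input sentence, y* the true tree, p(y | x; θ) the parser's distribution over trees, Δ(y, y*) the cost (negative F1), and y_1, …, y_k sampled trees.

```latex
% Expected sequence-level cost (assumed notation)
R(\theta) = \mathbb{E}_{y \sim p(y \mid x;\, \theta)} \big[ \Delta(y, y^*) \big]
          = \sum_{y} p(y \mid x;\, \theta)\, \Delta(y, y^*)

% Its gradient, and a Monte Carlo estimate from k sampled trees
\nabla_\theta R(\theta)
  = \mathbb{E}_{y \sim p(y \mid x;\, \theta)} \big[ \Delta(y, y^*)\, \nabla_\theta \log p(y \mid x;\, \theta) \big]
  \approx \frac{1}{k} \sum_{i=1}^{k} \Delta(y_i, y^*)\, \nabla_\theta \log p(y_i \mid x;\, \theta)
```

Read against the slide's annotations, the sampling of y_i is presumably the part that addresses exposure bias, the cost Δ is the part that addresses loss mismatch, and ∇ log p(y_i | x; θ) is the term computed in the same way as for the true tree.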

 

 

Slide11

Policy Gradient Training

Input: The cat took a nap.

Figure: k candidate trees over the input. The trees drawn carry the constituent labels (NP, NP, VP, S, NP), (NP, NP, VP, S-INV), (NP, NP, ADJP, S), and (NP, NP, VP, S).

Cost of a candidate: (negative F1)

A gradient is computed for each candidate.
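The k-candidate procedure can be sketched with a toy policy. This is a minimal illustration under strong assumptions, not the parsers' actual training code: the policy is a softmax over a fixed, hypothetical list of three candidate trees (real parsers sample trees action by action), the per-candidate costs are made-up negative F1 values, and no baseline or variance reduction is used. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient(logits: Sequence[float],
                    sampled: Sequence[int],
                    costs: Sequence[float]) -> List[float]:
    """Monte Carlo estimate of d E[cost] / d logits from sampled candidates."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for y in sampled:
        for j in range(len(logits)):
            # For a softmax policy: d log p(y) / d logit_j = 1[j == y] - p_j
            indicator = 1.0 if j == y else 0.0
            grad[j] += costs[y] * (indicator - probs[j])
    return [g / len(sampled) for g in grad]


def train_step(logits: Sequence[float], costs: Sequence[float],
               k: int, lr: float, rng: random.Random) -> List[float]:
    """Sample k candidates from the policy and take one gradient-descent step."""
    probs = softmax(logits)
    sampled = rng.choices(range(len(logits)), weights=probs, k=k)
    grad = policy_gradient(logits, sampled, costs)
    return [t - lr * g for t, g in zip(logits, grad)]


# Hypothetical costs (negative F1) for three candidate trees.
COSTS = [-0.89, -0.75, -0.75]
logits = [0.0, 0.0, 0.0]
rng = random.Random(0)
for _ in range(100):
    logits = train_step(logits, COSTS, k=4, lr=0.5, rng=rng)
```

Because the cost multiplies the gradient of the log-probability, a descent step raises the probability of sampled candidates with high F1 more than it raises that of candidates with low F1.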

Slide12

Experiments

Slide13

Setup

Parsers

Span-Based [Cross & Huang, 2016]

Top-Down [Stern et al. 2017]

RNNG [Dyer et al. 2016]

In-Order [Liu and Zhang, 2017]

x

Training

Static oracle

Dynamic oracle

Policy gradient

Slide14

English PTB F1

Slide15

Training Efficiency

PTB learning curves for the Top-Down parser

Slide16

French Treebank F1

Slide17

Chinese Penn Treebank v5.1 F1

Slide18

Conclusions

Local decisions can have non-local consequences

Loss mismatch

Exposure bias

How to deal with the issues caused by local decisions?

Dynamic oracles: efficient, model specific

Policy gradient: slower to train, but general purpose

Slide19

Thank you!

Slide20

For Comparison: A Novel Oracle for RNNG

Example state: (S (NP The man

1. Close current constituent if it’s a true constituent, or it could never be a true constituent.

2. Otherwise, open the outermost unopened true constituent at this position.

3. Otherwise, shift the next word.

Example action sequences:

(S (NP The man ) (VP had )

(S (NP The man ) (VP )

(S (NP The man ) (VP

(S (NP The man ) (VP had …

Slide21

What if we don’t have a dynamic oracle?

Define one

Slide22

For Comparison: A Novel Oracle for RNNG

Example state: (S (NP The man

1. Close current constituent if it’s a true constituent, or it could never be a true constituent.

2. Otherwise, open the outermost unopened true constituent at this position.

3. Otherwise, shift the next word.

Example action sequences:

(S (NP The man ) (VP had )

(S (NP The man ) (VP )

(S (NP The man ) (VP

(S (NP The man ) (VP had …
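The three rules can be sketched in code under one possible reading. Everything below is an assumption for illustration rather than the authors' implementation: the state is a stack of open constituents (label, start) plus the number of words shifted so far, the true tree is a set of (label, start, end) spans, a constituent "could never be true" when no true span has the same label and start with a later end, empty constituents are never closed, and no two true constituents share both label and start. The names `oracle_action` and `follow_oracle` and the spans in `GOLD` are hypothetical.

```python
from typing import List, Set, Tuple

Gold = Set[Tuple[str, int, int]]  # (label, start, end), end exclusive
Open = List[Tuple[str, int]]      # open constituents: (label, start)


def oracle_action(open_stack: Open, pos: int, n_words: int, gold: Gold) -> str:
    """Pick the next action under one reading of the three rules."""
    if open_stack:
        label, start = open_stack[-1]
        nonempty = start < pos
        is_true = (label, start, pos) in gold
        can_become_true = any(
            l == label and s == start and e > pos for l, s, e in gold
        )
        # Rule 1: close a completed true constituent, or one that can never be true.
        if nonempty and (is_true or not can_become_true):
            return "CLOSE"
    # Rule 2: open the outermost true constituent starting here that is not yet open.
    opened_here = [c for c in open_stack if c[1] == pos]
    candidates = sorted(
        (c for c in gold if c[1] == pos and (c[0], c[1]) not in opened_here),
        key=lambda c: -c[2],
    )
    if candidates:
        return "OPEN(" + candidates[0][0] + ")"
    # Rule 3: shift the next word.
    if pos < n_words:
        return "SHIFT"
    # Fallback (not one of the three rules): nothing left to shift.
    return "CLOSE"


def follow_oracle(n_words: int, gold: Gold) -> List[str]:
    """Run the oracle from the initial state until the tree is complete."""
    open_stack: Open = []
    pos = 0
    actions: List[str] = []
    while open_stack or pos < n_words:
        action = oracle_action(open_stack, pos, n_words, gold)
        actions.append(action)
        if action == "SHIFT":
            pos += 1
        elif action == "CLOSE":
            open_stack.pop()
        else:
            open_stack.append((action[5:-1], pos))  # strip "OPEN(" and ")"
    return actions


# Hypothetical true spans for "The man had an idea ." (word indices 0..5).
GOLD = {("S", 0, 6), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)}
actions = follow_oracle(6, GOLD)
```

For the example state (S (NP The man, this reading returns CLOSE by rule 1, since the NP over "The man" is a true constituent in the assumed tree. From a state the true tree never visits, such as an ADJP opened before "had", it also returns CLOSE once the constituent is nonempty, because no true ADJP starts there.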