Presentation Transcript

Slide1

Lecture 8: Learning to Search – Imitation Learning in NLP

Kai-Wei Chang, CS @ UCLA, kw@kwchang.net
Course webpage: https://uclanlp.github.io/CS269-17/


Slide2

Learning to search approaches: Shift-Reduce parser
- Maintain a buffer and a stack
- Make predictions from left to right
- Three (four) types of actions: Shift, Reduce, Left, Right
(A minimal transition sketch follows below.)


Credit: Google research blog
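To make the action set concrete, here is a minimal sketch of the three arc-standard shift-reduce transitions in Python (arc-eager parsing adds the separate Reduce action). The state representation and the `predict_action` oracle are illustrative assumptions, not the lecture's code.

```python
# Minimal arc-standard shift-reduce skeleton (illustrative sketch).
# predict_action(stack, buffer) is a placeholder for a learned classifier.

def shift_reduce_parse(words, predict_action):
    buffer = list(words)   # words not yet processed, left to right
    stack = []             # partially processed words
    arcs = []              # dependency arcs as (head, dependent) pairs

    while buffer or len(stack) > 1:
        action = predict_action(stack, buffer)
        if action == "Shift" and buffer:
            stack.append(buffer.pop(0))        # consume the next word
        elif action == "Left" and len(stack) >= 2:
            dep = stack.pop(-2)                # second-from-top depends on top
            arcs.append((stack[-1], dep))
        elif action == "Right" and len(stack) >= 2:
            dep = stack.pop()                  # top depends on second-from-top
            arcs.append((stack[-1], dep))
        else:
            break                              # no valid action: stop
    return arcs
```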

Slide3

Structured Prediction as a Search Problem
Decomposition of y often implies an ordering: a sequential decision-making process.

[figure: POS tagging as a sequence of decisions: x = "I can can a can" is tagged Pro Md Vb Dt Nn, one word at a time]

Slide4

Notations
Input: x
Truth: y
Predicted: ŷ
Loss: loss(y, ŷ)

[figure: x = "I can can a can" with candidate tag sequences: Pro Md Vb Dt Nn, Pro Md Nn Dt Vb, Pro Md Nn Dt Md, Pro Md Md Dt Nn, Pro Md Md Dt Vb]

Goal: make a joint prediction to minimize a joint loss:
find h such that ŷ = h(x) minimizes loss(y, ŷ) in expectation, based on training samples {(x_n, y_n)}.
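Written out, this is the standard structured-prediction objective (a reconstruction; the slide's original math did not survive extraction):

```latex
h^{*} = \arg\min_{h \in \mathcal{H}} \; \mathbb{E}_{(x,y) \sim D}\big[\mathrm{loss}(y, h(x))\big]
      \;\approx\; \arg\min_{h \in \mathcal{H}} \; \frac{1}{N} \sum_{n=1}^{N} \mathrm{loss}\big(y_n, h(x_n)\big)
```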

Slide5

Credit Assignment Problem
When making a mistake, which local decision should be blamed?


Slide6

An Analogy from Playing Mario
(From the Mario AI competition, 2009)
Input: 27K+ binary features extracted from the last 4 observations (14 binary features for every cell)
Output: Jump in {0,1}, Right in {0,1}, Left in {0,1}, Speed in {0,1}
High-level goal: watch an expert play and learn to mimic her behavior
Video credit: Stéphane Ross, Geoff Gordon, and Drew Bagnell

Slide7

Example of Search Space
[figure: a search tree for tagging "I can can a can"; each decision chooses a tag (Pro, Md, Vb, Dt, Nn) for the next word, and each action extends the partial output]

Slide8

Example of Search Space
[figure: the same search tree; a complete sequence of decisions reaches an end state e]
An end state e encodes an output ŷ = ŷ(e), from which loss(y, ŷ) can be computed (at training time).

Slide9

Policies
A policy maps observations to actions: a = p(obs), where the observation can include the input x, the timestep t, the partial trajectory τ, … anything else. (A small interface sketch follows.)
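In code, a policy is just a function from observation to action. A minimal sketch (the `Observation` fields mirror the slide; all names here are illustrative assumptions):

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class Observation:
    x: Any                    # the input (e.g., the sentence)
    t: int                    # current timestep
    partial_traj: List[Any]   # actions taken so far (tau)
    # ... anything else the policy wants to condition on

def greedy_policy(score: Callable[[Observation, Any], float]):
    """Turn a scoring function score(obs, action) into a policy p(obs) = a."""
    def p(obs: Observation, actions: List[Any]):
        return max(actions, key=lambda a: score(obs, a))
    return p
```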

Slide10

Labeled data → Reference policy
Given a partial trajectory τ and the true label y, the minimum achievable loss is the smallest loss(y, ŷ(e)) over end states e reachable from τ.

Slide11

Labeled data → Reference policy
Given a partial trajectory τ and the true label y, the minimum achievable loss is the smallest loss(y, ŷ(e)) over end states e reachable from τ.
The optimal action is the one leading toward that corresponding end state.
The optimal policy is the policy that always selects the optimal action.
A reference policy can be constructed from the gold label in the training phase (see the sketch below).
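For sequence labeling under Hamming loss, the reference policy is especially simple: the loss-minimizing action at timestep t is the gold tag at t, regardless of earlier mistakes. A minimal sketch (assuming Hamming loss; not the lecture's code):

```python
def reference_policy(gold_tags, t, partial_traj):
    """Optimal action under Hamming loss: predict the gold tag at position t.
    Mistakes already made in partial_traj are sunk cost; they do not change
    which continuation minimizes the remaining loss."""
    return gold_tags[t]
```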

Slide12

Ingredients for learning to search

Structured Learning
- Input: x
- Truth: y
- Outputs: ŷ
- Loss: loss(y, ŷ)

Learning to Search
- Search space: states, actions, end states
- Policies
- Reference policy

Slide13

A Simple Approach
- Collect trajectories from the expert (reference policy)
- Store them as a dataset D of (observation, expert action) pairs
- Train a classifier on D
- Let the learned classifier play the game!
(A sketch follows below.)
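A minimal, self-contained sketch of this approach (commonly called behavior cloning); the toy task, featurizer, and expert rule are illustrative assumptions:

```python
# Behavior cloning sketch: imitate an expert from recorded (obs, action) pairs.
from sklearn.linear_model import LogisticRegression

def featurize(obs):
    return [obs]                       # toy 1-d feature vector

def expert_action(obs):
    return 1 if obs > 0 else 0        # toy expert: a threshold rule

# 1-2. Collect expert trajectories and store them as a dataset D.
observations = [-3, -1, -2, 2, 1, 4, -5, 3]
X = [featurize(o) for o in observations]
Y = [expert_action(o) for o in observations]

# 3. Train a classifier pi on D.
pi = LogisticRegression().fit(X, Y)

# 4. Let pi play the game (act on fresh observations).
print(pi.predict([featurize(-4), featurize(5)]))   # -> [0 1]
```

The catch, which the roll-in/roll-out analysis on the following slides addresses, is that the learned policy's own mistakes lead it into states the expert never visited.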

Slide14

Learning a Policy [Chang+ 15, Ross+ 15]
[figure: roll-in with one policy to a "?" state; take each one-step deviation there and roll out to an end state E, observing losses .2, 0, and .8]
At the "?" state, we construct a cost-sensitive multi-class example (?, [0, .2, .8]).

Slide15

Example: Sequence Labeling
Receive input: x = "the monster ate the sandwich", y = Dt Nn Vb Dt Nn
Make a sequence of predictions: ŷ = Dt Dt Dt Dt Dt
Pick a timestep and try all perturbations there:
  ŷ(Dt) = Dt Dt Vb Dt Nn, l = 1
  ŷ(Nn) = Dt Nn Vb Dt Nn, l = 0
  ŷ(Vb) = Dt Vb Vb Dt Nn, l = 1
Compute losses and construct the example: ({ w=monster, p=Dt, … }, [1, 0, 1])
(See the sketch below.)
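A sketch of how such a cost-sensitive example is constructed with roll-in and roll-out, assuming Hamming loss. The roll-in/roll-out policies are passed in as functions; everything here is illustrative, not the lecture's code:

```python
def hamming(y, y_hat):
    """Hamming loss: number of positions where prediction differs from gold."""
    return sum(a != b for a, b in zip(y, y_hat))

def make_cost_sensitive_example(x, y, t, actions, roll_in, roll_out):
    """Construct one cost-sensitive multi-class example at timestep t.

    roll_in(x, t)       -> list of t predictions (positions 0..t-1)
    roll_out(x, prefix) -> full-length completion of a partial trajectory
    """
    prefix = roll_in(x, t)
    costs = []
    for a in actions:                      # one-step deviations at t
        y_hat = roll_out(x, prefix + [a])  # complete the trajectory
        costs.append(hamming(y, y_hat))    # cost of this deviation
    features = {"w": x[t], "p": prefix[-1] if prefix else None}
    return features, costs

# Reproducing the slide: the roll-in policy predicted Dt so far, and the
# roll-out policy is the reference (completes with gold tags).
x = "the monster ate the sandwich".split()
y = ["Dt", "Nn", "Vb", "Dt", "Nn"]
roll_in = lambda x, t: ["Dt"] * t        # learned policy's predictions so far
roll_out = lambda x, p: p + y[len(p):]   # reference roll-out: gold completion
print(make_cost_sensitive_example(x, y, 1, ["Dt", "Nn", "Vb"], roll_in, roll_out))
# -> ({'w': 'monster', 'p': 'Dt'}, [1, 0, 1])
```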

Slide16

Learning a Policy [Chang+ 15, Ross+ 15]
[the roll-in/roll-out figure from Slide 14, repeated: at the "?" state, construct a cost-sensitive multi-class example (?, [0, .2, .8])]

Slide17

Analysis
[ICML 15]: Learning to Search Better than Your Teacher

Slide18

Analysis
Roll-in with Ref: unbounded structured regret

Slide19

Analysis
Roll-out with Ref: not locally optimal if the reference is suboptimal

Slide20

Analysis
Roll-in & roll-out with the current policy: ignores Ref

Slide21

Analysis

- Minimizes a combination of regret to Ref and regret to its own one-step deviations.
- Competes with Ref when Ref is good.
- Competes with local deviations to improve on a suboptimal Ref.

Slide22

How to Program?
Sample code is available in VW (Vowpal Wabbit); a sketch follows.
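As a pointer, here is a sequence labeler in the style of VW's Python learning-to-search tutorial. Treat the exact class and argument names as assumptions; the pyvw API has varied across VW versions:

```python
# Sketch of a sequence labeler with VW's learning-to-search ("search") module,
# following the pyvw tutorial; API details may differ in your VW version.
from vowpalwabbit import pyvw

class SequenceLabeler(pyvw.SearchTask):
    def __init__(self, vw, sch, num_actions):
        pyvw.SearchTask.__init__(self, vw, sch, num_actions)
        # Hamming loss and conditioning bookkeeping handled by the library.
        sch.set_options(sch.AUTO_HAMMING_LOSS | sch.AUTO_CONDITION_FEATURES)

    def _run(self, sentence):              # sentence: list of (tag, word) pairs
        output = []
        for n, (tag, word) in enumerate(sentence):
            ex = self.example({'w': [word]})
            # oracle=tag lets VW derive the reference policy from gold labels
            pred = self.sch.predict(examples=ex, my_tag=n + 1,
                                    oracle=tag, condition=[(n, 'p')])
            output.append(pred)
        return output

# Toy tag encoding: 1=Dt, 2=Nn, 3=Vb
training_set = [[(1, 'the'), (2, 'monster'), (3, 'ate'),
                 (1, 'the'), (2, 'sandwich')]]

vw = pyvw.vw("--search 3 --search_task hook --quiet")
labeler = vw.init_search_task(SequenceLabeler)
labeler.learn(training_set)
```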