Slide 1
Lecture 8: Learning to Search – Imitation Learning in NLP
Kai-Wei Chang, CS @ UCLA, kw@kwchang.net
Course webpage: https://uclanlp.github.io/CS269-17/
Slide 2
Learning to search approaches
Shift-Reduce parser:
- Maintain a buffer and a stack
- Make predictions from left to right
- Three (four) types of actions: Shift, Reduce, Left, Right (a transition-loop sketch follows below)
Credit: Google research blog
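To make these transitions concrete, here is a minimal sketch, not from the slides, of an arc-standard-style transition loop in Python (the arc-eager variant adds the separate Reduce action, hence "three (four)"); `choose_action` is a hypothetical stand-in for the learned policy.

```python
# Minimal sketch of a shift-reduce dependency parser loop (arc-standard style).
# choose_action(stack, buffer) -> "Shift" | "Left" | "Right" would be the
# learned policy in a learning-to-search setup.

def parse(words, choose_action):
    stack, buffer, arcs = [], list(words), []
    while buffer or len(stack) > 1:
        action = choose_action(stack, buffer)
        if action == "Shift" and buffer:
            stack.append(buffer.pop(0))      # move the next word onto the stack
        elif action == "Left" and len(stack) >= 2:
            dep = stack.pop(-2)              # second-from-top becomes a dependent
            arcs.append((stack[-1], dep))    # record the (head, dependent) arc
        elif action == "Right" and len(stack) >= 2:
            dep = stack.pop()                # top becomes a dependent
            arcs.append((stack[-1], dep))
        else:
            break                            # illegal action: stop early
    return arcs
```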
Slide 3
Structured Prediction as a Search Problem
Decomposition of y often implies an ordering: a sequential decision-making process.
[Figure: POS-tagging example. x = "I can can a can"; y = Pro Md Vb Dt Nn.]
Slide 4
Notations
- Input: x
- Truth: y
- Predicted: ŷ
- Loss: loss(y, ŷ)
[Figure: x = "I can can a can"; truth y = Pro Md Vb Dt Nn; candidate outputs: Pro Md Nn Dt Vb, Pro Md Nn Dt Md, Pro Md Md Dt Nn, Pro Md Md Dt Vb.]
Goal: make a joint prediction to minimize a joint loss;
find h ∈ H such that h(x) ∈ Y(x), minimizing the expected loss E[loss(y, h(x))], based on training samples (x_n, y_n).
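As a concrete instance of such a joint loss, the sketch below computes Hamming loss (the number of mispredicted tags) for the candidate outputs listed in the figure above; the tag sequences come from the slide, the code itself is illustrative.

```python
# Hamming loss: count positions where the predicted tag differs from the truth.
def hamming_loss(y, y_hat):
    return sum(t != p for t, p in zip(y, y_hat))

truth = ["Pro", "Md", "Vb", "Dt", "Nn"]          # tags for "I can can a can"
candidates = [
    ["Pro", "Md", "Nn", "Dt", "Vb"],             # loss = 2
    ["Pro", "Md", "Nn", "Dt", "Md"],             # loss = 2
    ["Pro", "Md", "Md", "Dt", "Nn"],             # loss = 1
    ["Pro", "Md", "Md", "Dt", "Vb"],             # loss = 2
]
for c in candidates:
    print(c, hamming_loss(truth, c))
```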
Slide 5
Credit Assignment Problem
When making a mistake, which local decision should be blamed?
Slide 6
An Analogy from Playing Mario
High-level goal: watch an expert play and learn to mimic her behavior.
Input: game observations from the Mario AI competition 2009; 27K+ binary features extracted from the last 4 observations (14 binary features for every cell).
Output: Jump in {0,1}, Right in {0,1}, Left in {0,1}, Speed in {0,1}.
Video credit: Stéphane Ross, Geoff Gordon and Drew Bagnell
Slide 7
Example of Search Space
[Figure: search tree for tagging "I can can a can". Each decision point offers one action per tag (Pro, Md, Vb, Dt, Nn) for the next word; taking an action leads to the next decision point.]
Slide 8
Example of Search Space (continued)
[Figure: a full action sequence down the tree reaches an end state e.]
An end state e encodes an output ŷ = ŷ(e) from which loss(y, ŷ) can be computed (at training time).
Slide 9
Policies
A policy maps observations to actions: p(obs.) = a.
An observation can include the input x, the timestep t, the partial trajectory τ, … anything else.
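A minimal sketch of this interface in Python (all names are illustrative, not from the slides):

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

@dataclass
class Observation:
    x: Any                                         # the input, e.g. the sentence
    t: int                                         # current timestep
    tau: List[Any] = field(default_factory=list)   # partial trajectory: actions so far

# A policy is anything that maps an observation to an action.
Policy = Callable[[Observation], Any]

def constant_policy(obs: Observation) -> str:
    """A trivial policy that always predicts the same tag."""
    return "Dt"
```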
Slides 10–11
Labeled data → Reference policy
Given a partial traj. and the true label y, the minimum achievable loss is the smallest loss(y, ŷ(e)) over end states e still reachable from the current state.
The optimal action is the first action of the corresponding completion; the optimal policy is the policy that always selects the optimal action.
The reference policy can be constructed from the gold label in the training phase.
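For sequence labeling under Hamming loss, this reference policy is especially simple: the optimal action at timestep t is the gold tag at position t, since past mistakes are sunk cost and the best completion gets every remaining tag right. A sketch, reusing the illustrative Observation type from above:

```python
def reference_policy(obs, y):
    """Optimal action for sequence labeling under Hamming loss.

    Whatever mistakes sit on the partial trajectory obs.tau, the minimum
    achievable loss is attained by predicting the gold tag from here on.
    """
    return y[obs.t]
```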
Slide 12
Ingredients for learning to search
Structured Learning:
- Input: x
- Truth: y
- Outputs: ŷ
- Loss: loss(y, ŷ)
Learning to Search:
- Search space: states, actions, end states
- Policies: p(obs.) = a
- Reference policy
Slide 13
A Simple Approach
- Collect trajectories from an expert.
- Store them as a dataset of (observation, expert action) pairs.
- Train a classifier on this dataset.
- Let the classifier play the game! (See the sketch below.)
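This recipe is plain supervised imitation (behavior cloning). A minimal runnable sketch, assuming a scikit-learn-style classifier; the toy expert and trajectories are illustrative stand-ins:

```python
from sklearn.linear_model import LogisticRegression

# Toy stand-ins: observations are feature vectors; the "expert"
# labels an observation 1 when its feature sum is even.
def expert(obs):
    return int(sum(obs) % 2 == 0)

episodes = [[(0, 1), (1, 1)], [(1, 0), (0, 0)]]   # two short expert trajectories

# 1) Collect trajectories from the expert; 2) store as a dataset.
X = [obs for traj in episodes for obs in traj]
A = [expert(obs) for obs in X]

# 3) Train a classifier on the dataset.
policy = LogisticRegression().fit(X, A)

# 4) Let the learned policy "play": it now picks actions on its own.
print(policy.predict([(1, 1)]))
```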
Slide 14
Learning a Policy [Chang+ 15, Ross+ 15]
At the "?" state, we construct a cost-sensitive multi-class example (?, [0, .2, .8]).
[Figure: roll-in to the "?" state; from there, each one-step deviation is completed by a roll-out (marked E), yielding losses 0, .2, and .8.]
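A sketch of how such an example is generated; `rollin_policy`, `rollout_policy`, and `loss` are illustrative stand-ins for the pieces in the figure:

```python
def make_cost_sensitive_example(x, y, t, actions,
                                rollin_policy, rollout_policy, loss):
    """Roll in to timestep t, try each one-step deviation, roll out, score.

    Returns (state, costs); costs is the vector of losses of the full
    outputs reached through each deviating action, e.g. [0, .2, .8].
    """
    prefix = rollin_policy(x, steps=t)           # roll-in: reach the "?" state
    costs = []
    for a in actions:                            # one-step deviations
        y_hat = rollout_policy(x, prefix + [a])  # roll-out to an end state
        costs.append(loss(y, y_hat))             # loss of the completed output
    return (x, t, prefix), costs
```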
Slide 15
Example: Sequence Labeling
Receive input:
    x = the monster ate the sandwich
    y = Dt Nn Vb Dt Nn
Make a sequence of predictions:
    x = the monster ate the sandwich
    ŷ = Dt Dt Dt Dt Dt
Pick a timestep and try all perturbations there:
    ŷ^Dt = Dt Dt Vb Dt Nn   (l = 1)
    ŷ^Nn = Dt Nn Vb Dt Nn   (l = 0)
    ŷ^Vb = Dt Vb Vb Dt Nn   (l = 1)
Compute losses and construct example:
    ({w=monster, p=Dt, …}, [1, 0, 1])
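The same computation in code (an illustrative sketch; the deviation position and roll-out completion follow the slide):

```python
def hamming(y, y_hat):
    return sum(a != b for a, b in zip(y, y_hat))

y = ["Dt", "Nn", "Vb", "Dt", "Nn"]   # truth for "the monster ate the sandwich"
rollin = ["Dt"]                      # prediction so far (deviate at "monster")
rollout = ["Vb", "Dt", "Nn"]         # completion after the deviation

costs = []
for a in ["Dt", "Nn", "Vb"]:         # try all perturbations at this timestep
    y_hat = rollin + [a] + rollout
    costs.append(hamming(y, y_hat))

print(costs)                         # [1, 0, 1], as on the slide
```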
Slide 16
Learning a Policy [Chang+ 15, Ross+ 15] (repeat of Slide 14)
Slide 17
Analysis
[ICML 15]: Learning to Search Better than Your Teacher
Slide 18
Analysis
Roll-in with Ref: unbounded structured regret.
Slide 19
Analysis
Roll-out with Ref: not locally optimal if the reference is sub-optimal.
Slide 20
Analysis
Roll-in & roll-out with the current policy: ignores Ref.
Slide 21
Analysis
LOLS minimizes a combination of regret to Ref and regret to its own one-step deviations: it competes with Ref when Ref is good, and competes with local deviations to improve on a suboptimal Ref.
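Per the LOLS paper cited on Slide 17, this behavior comes from rolling in with the current policy and rolling out with a stochastic mixture of the reference and the current policy. The sketch below assumes the mixing probability β is a tunable parameter and that the draw is made once per roll-out:

```python
import random

def pick_rollout_policy(reference, current, beta=0.5):
    """Mixture roll-out: use the reference for the whole roll-out with
    probability beta, otherwise use the current learned policy."""
    return reference if random.random() < beta else current
```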
Slide 22
How to Program?
Sample code is available in VW (Vowpal Wabbit).
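For reference, VW exposes learning to search through its Python bindings (pyvw). The sketch below is adapted from the sequence-labeling demo that ships with VW; exact class and option names vary across VW versions, so treat it as an outline rather than version-exact code.

```python
from vowpalwabbit import pyvw

DET, NOUN, VERB = 1, 2, 3                     # VW action ids start at 1

class SequenceLabeler(pyvw.SearchTask):
    def __init__(self, vw, sch, num_actions):
        pyvw.SearchTask.__init__(self, vw, sch, num_actions)
        # Hamming loss and conditioning features handled automatically.
        sch.set_options(sch.AUTO_HAMMING_LOSS | sch.AUTO_CONDITION_FEATURES)

    def _run(self, sentence):                 # sentence: list of (tag, word)
        output = []
        for n, (pos, word) in enumerate(sentence):
            with self.vw.example({'w': [word]}) as ex:
                # oracle=pos plays the reference policy (the gold tag);
                # condition makes the prediction depend on the previous tag.
                pred = self.sch.predict(examples=ex, my_tag=n + 1,
                                        oracle=pos, condition=[(n, 'p')])
                output.append(pred)
        return output

vw = pyvw.vw("--search 3 --search_task hook --ring_size 1024 --quiet")
labeler = vw.init_search_task(SequenceLabeler)
data = [[(DET, 'the'), (NOUN, 'monster'), (VERB, 'ate'),
         (DET, 'the'), (NOUN, 'sandwich')]]
for _ in range(5):
    labeler.learn(data)
print(labeler.predict([(1, w) for w in 'the monster ate the sandwich'.split()]))
```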