Confidence-Based Autonomy: Policy Learning by Demonstration
Manuela M. Veloso
Thanks to Sonia Chernova
Computer Science Department
Carnegie Mellon University
Grad AI – Spring 2013
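
Task Representation
- Robot state: $s$, a vector of features ($f_1, f_2, \ldots$) computed from sensor data
- Robot actions: $a \in A$, a discrete set
- Training dataset: $D = \{(s_i, a_i)\}$
- Policy as classifier (e.g., Gaussian Mixture Model, Support Vector Machine); for a query state $s$ it returns:
  - the policy action $a_p$
  - the decision boundary with greatest confidence for the query
  - the classification confidence w.r.t. that decision boundary
[Figure: training states from sensor data plotted in feature space (axes $f_1$, $f_2$), with decision boundaries and a query state $s$]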
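The slide leaves the classifier abstract (a GMM or an SVM). As a minimal sketch, assuming a simple stand-in classifier (one isotropic Gaussian per action with equal priors, not the GMM or SVM named on the slide), the query below returns the three outputs the slide lists: the policy action, its confidence, and the decision boundary. Representing a boundary as the sorted pair of the two most likely actions is a hypothetical convention, and `fit` and `query` are illustrative names.

```python
import math
from collections import defaultdict

def fit(dataset):
    """Estimate one mean state vector per action from (state, action) pairs."""
    sums = {}
    counts = defaultdict(int)
    for state, action in dataset:
        acc = sums.setdefault(action, [0.0] * len(state))
        for k, x in enumerate(state):
            acc[k] += x
        counts[action] += 1
    return {a: [v / counts[a] for v in acc] for a, acc in sums.items()}

def query(means, state, sigma=1.0):
    """Return (policy action a_p, confidence, decision boundary) for a query state."""
    # Log-likelihood of the state under each action's isotropic Gaussian (equal priors assumed).
    logp = {a: -sum((x - m) ** 2 for x, m in zip(state, mu)) / (2 * sigma ** 2)
            for a, mu in means.items()}
    # Normalize into posteriors; subtracting the max keeps exp() numerically stable.
    top = max(logp.values())
    z = sum(math.exp(v - top) for v in logp.values())
    post = {a: math.exp(v - top) / z for a, v in logp.items()}
    ranked = sorted(post, key=post.get, reverse=True)
    a_p = ranked[0]
    # Hypothetical convention: the relevant boundary separates the two most likely actions.
    boundary = tuple(sorted(ranked[:2]))
    return a_p, post[a_p], boundary

D = [((0.0, 0.0), "stay"), ((0.2, 0.1), "stay"),
     ((3.0, 3.0), "left"), ((3.1, 2.9), "left")]
means = fit(D)
a_p, conf, boundary = query(means, (0.1, 0.0))  # "stay", with confidence close to 1
```

A query halfway between the two action means gets a confidence near 0.5, which is exactly the kind of state the later slides flag for a demonstration.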

Confidence-Based Autonomy Assumptions
- Teacher understands and can demonstrate the task
- High-level task learning
- Discrete actions
- Non-negligible action duration
- State space contains all information necessary to learn the task policy
- Robot is able to stop to request a demonstration
  - … however, the environment may continue to change
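
Confident Execution
[Figure: timeline of states $s_1, s_2, s_3, s_4, \ldots, s_i, \ldots, s_t$, with the current state $s_i$ passed to the policy]
At the current state $s_i$, the policy decides: request a demonstration?
- No: execute the policy action $a_p$.
- Yes: request a demonstration $a_d$, add the training point $(s_i, a_d)$, relearn the classifier, and execute $a_d$.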
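The Confident Execution flowchart can be sketched as a single step function. This is a minimal sketch, assuming the policy is a callable that returns `(a_p, confidence, boundary)` as in the task representation; `needs_demo`, `ask_teacher`, `relearn`, and `execute` are hypothetical callbacks standing in for the demonstration-selection test, the teacher interface, classifier retraining, and the robot's actuators.

```python
def confident_execution_step(state, policy, dataset, needs_demo, ask_teacher, relearn, execute):
    """Run one pass of the Confident Execution flowchart for the current state s_i.

    `policy(state)` returns (a_p, confidence, boundary). The other arguments are
    hypothetical callbacks; the (possibly relearned) policy is returned.
    """
    a_p, confidence, boundary = policy(state)
    if needs_demo(state, confidence, boundary):
        a_d = ask_teacher(state)        # Request Demonstration
        dataset.append((state, a_d))    # Add Training Point (s_i, a_d)
        policy = relearn(dataset)       # Relearn Classifier
        execute(a_d)                    # Execute Action a_d
    else:
        execute(a_p)                    # Execute Action a_p
    return policy
```

Called once per timestep, this keeps the cost of teacher involvement confined to the states where `needs_demo` fires; everywhere else the robot acts on its own.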
Demonstration Selection
When should the robot request a demonstration?
- To obtain useful training data
- To restrict autonomy in areas of uncertainty

Fixed Confidence Threshold
Why not apply a fixed classification confidence threshold? Example: $\mathit{conf} = 0.5$
- Simple
- But how do we select a good threshold value?
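To make the threshold-selection question concrete, the sketch below applies one global threshold to a handful of hypothetical confidence values; the values and the function name are illustrative only. Moving the threshold directly changes how often the robot asks for help, and nothing in the rule itself says which value is right.

```python
def request_demo_fixed(confidence, threshold=0.5):
    """Fixed rule: ask for a demonstration whenever confidence falls below one global threshold."""
    return confidence < threshold

# Hypothetical classification confidences for six query states.
confidences = [0.35, 0.48, 0.52, 0.61, 0.90, 0.97]
for threshold in (0.4, 0.5, 0.6):
    n = sum(request_demo_fixed(c, threshold) for c in confidences)
    print(f"threshold={threshold}: {n} of {len(confidences)} queries request a demonstration")
```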

Confident Execution Demonstration Selection
- Distance parameter $\mathit{dist}$: used to identify outliers and unexplored regions of the state space
- Set of confidence parameters $\mathit{conf}$: used to identify ambiguous state regions in which more than one action is applicable
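
Confident Execution Distance Parameter
Distance parameter $\mathit{dist}$
Given the training dataset $D = \{(s_j, a_j)\}$ and a state query $s$, request a demonstration if $s$ is farther than $\mathit{dist}$ from its nearest neighbor in $D$:
$$\min_{(s_j, a_j) \in D} \lVert s - s_j \rVert > \mathit{dist}$$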
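A minimal sketch of the distance test, assuming Euclidean distance between state vectors (Python 3.8+ for `math.dist`). How $\mathit{dist}$ itself is chosen is not recoverable from this slide; purely as an assumption, the hypothetical `distance_threshold` below sets it to a multiple (`scale=3.0`, also assumed) of the average nearest-neighbor distance among the training states.

```python
import math

def nn_distance(state, states):
    """Euclidean distance from `state` to its nearest neighbor in `states`."""
    return min(math.dist(state, s) for s in states)

def distance_threshold(states, scale=3.0):
    """Assumed heuristic: scale x mean nearest-neighbor distance among training states."""
    if len(states) < 2:
        raise ValueError("need at least two training states")
    total = sum(nn_distance(s, states[:i] + states[i + 1:]) for i, s in enumerate(states))
    return scale * total / len(states)

def request_demo_dist(state, states, dist):
    """Request a demonstration if the query is farther than `dist` from every training state."""
    return nn_distance(state, states) > dist

train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dist = distance_threshold(train)                     # 3.0 for this unit square
print(request_demo_dist((0.5, 0.5), train, dist))    # False: inside the explored region
print(request_demo_dist((10.0, 10.0), train, dist))  # True: far from all training data
```

Tying $\mathit{dist}$ to the data's own spacing, rather than a fixed number, lets the same rule work whatever the scale of the state features.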
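
Confident Execution Confidence Parameters
Set of confidence parameters $\mathit{conf}$, one for each decision boundary $db$
Given a state query $s$, the classifier returns the policy action $a_p$, the classification confidence $c$, and the decision boundary $db$ with greatest confidence for the query. Request a demonstration if
$$c < \mathit{conf}_{db}$$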
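A minimal sketch of the per-boundary confidence test. The slide does not say how each $\mathit{conf}_{db}$ is set, so the threshold values below are hypothetical, and treating a boundary with no stored threshold as always uncertain (`default=1.0`) is our own conservative assumption. Boundaries are keyed by the sorted pair of actions they separate, another illustrative convention. The same confidence of 0.7 triggers a request near one boundary but not near the other, which is the point of having one parameter per boundary.

```python
def request_demo_conf(confidence, boundary, conf_thresholds, default=1.0):
    """Request a demonstration if confidence w.r.t. `boundary` is below that boundary's threshold."""
    return confidence < conf_thresholds.get(boundary, default)

# Hypothetical thresholds, one per decision boundary.
conf_thresholds = {
    ("left", "stay"): 0.8,    # assumed ambiguous region: demand more confidence
    ("right", "stay"): 0.6,   # assumed well-separated region
}
print(request_demo_conf(0.7, ("left", "stay"), conf_thresholds))   # True
print(request_demo_conf(0.7, ("right", "stay"), conf_thresholds))  # False
```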
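
Confident Execution
At the current state $s_i$, request a demonstration if the distance criterion or the confidence criterion is triggered.
- Neither triggered: execute the policy action $a_p$.
- Either triggered: request a demonstration $a_d$, add the training point $(s_i, a_d)$, relearn the classifier, and execute $a_d$.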
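
Confidence-Based Autonomy
Confident Execution + Corrective Demonstration
- Confident Execution: at the current state $s_i$, if a demonstration is needed, request a demonstration $a_d$, add the training point $(s_i, a_d)$, relearn the classifier, and execute $a_d$; otherwise execute the policy action $a_p$.
- Corrective Demonstration: after the robot executes $a_p$, the teacher may provide a corrective action $a_c$; add the training point $(s_i, a_c)$ and relearn the classifier.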
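The Corrective Demonstration branch can be sketched as a separate step that runs after the robot has executed $a_p$ autonomously. A minimal sketch, assuming `teacher_correction` is a hypothetical callback that returns `None` when the teacher does not intervene, and that a "correction" equal to $a_p$ carries no new information and is ignored.

```python
def corrective_demonstration(state, a_p, policy, dataset, teacher_correction, relearn):
    """After executing a_p in state s_i, fold in the teacher's correction a_c, if any.

    Returns the policy to use from now on (relearned only when a correction arrived).
    """
    a_c = teacher_correction(state, a_p)
    if a_c is None or a_c == a_p:
        return policy                    # teacher accepted a_p: keep the current classifier
    dataset.append((state, a_c))         # Add Training Point (s_i, a_c)
    return relearn(dataset)              # Relearn Classifier
```

This branch covers the states that Confident Execution cannot catch by itself: the robot was confident, yet wrong.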

Evaluation in Driving Domain
Introduced by Abbeel and Ng, 2004
Task: teach the agent to drive on the highway
- Fixed driving speed
- Pass slower cars and avoid collisions
State: current lane; nearest car in lane 1; nearest car in lane 2; nearest car in lane 3
Actions: merge left, merge right, stay in lane
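One way to encode the driving domain's state and actions for a classifier-based policy. This is a sketch under assumptions the slide does not state: lanes are numbered 1 to 3, and each "nearest car" feature is taken to be a distance to the nearest car in that lane. `DrivingState` and `features` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple

class Action(Enum):
    MERGE_LEFT = "merge left"
    MERGE_RIGHT = "merge right"
    STAY_IN_LANE = "stay in lane"

@dataclass(frozen=True)
class DrivingState:
    current_lane: int                         # assumed to be 1, 2 or 3
    nearest_car: Tuple[float, float, float]   # assumed distance to nearest car in lanes 1, 2, 3

    def __post_init__(self):
        if self.current_lane not in (1, 2, 3):
            raise ValueError("current_lane must be 1, 2 or 3")
        if len(self.nearest_car) != 3:
            raise ValueError("need one nearest-car value per lane")

    def features(self):
        """Flatten into the 4-dimensional vector a classifier policy would see."""
        return (float(self.current_lane),) + tuple(float(d) for d in self.nearest_car)

s = DrivingState(current_lane=2, nearest_car=(12.0, 4.5, 30.0))
print(s.features())   # (2.0, 12.0, 4.5, 30.0)
```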
Evaluation in Driving Domain

Demonstration Selection Method | # Demonstrations | Collision Timesteps
--- | --- | ---
“Teacher knows best” | 1300 | 2.7%
Confident Execution, fixed conf | 1016 | 3.8%
Confident Execution, dist & multiple conf | 504 | 1.9%
CBA | 703 | 0%

[Figure: CBA final policy]

Demonstrations Over Time
[Figure: number of demonstrations over time, showing Total Demonstrations, Confident Execution, and Corrective Demonstration]

Summary
- Confidence-Based Autonomy algorithm
  - Confident Execution demonstration selection
  - Corrective Demonstration

What did we do today?
- (PO)MDPs: need to generate a good policy
  - Assumes the agent has some method for estimating its state (given the current belief state, action, and observation, where do I think I am now?)
- How do we estimate this?
  - Discrete latent states: HMMs (the simplest DBNs)
  - Continuous latent states, observed states drawn from a Gaussian, linear dynamical system: Kalman filters (assumptions relaxed by the Extended Kalman Filter, etc.)
  - Not analytic: particle filters, which take weighted samples (“particles”) of an underlying distribution
- We’ve mainly looked at policies for discrete state spaces
  - For continuous state spaces, we can use LfD: ML gives us a good-guess action based on past actions
  - If we’re not confident enough, ask for help!
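The particle-filter idea from the recap (represent a belief with weighted samples) fits in a few lines. A minimal 1-D sketch, assuming a random-walk motion model and a Gaussian sensor model, neither of which comes from these slides; `particle_filter_step` and `estimate` are hypothetical names. Each step predicts, weights every particle by the observation likelihood, and resamples in proportion to the weights.

```python
import math
import random

def particle_filter_step(particles, observation, motion_std, obs_std, rng):
    """One predict-weight-resample cycle of a 1-D particle filter."""
    # Predict: move every particle with the assumed random-walk motion model.
    moved = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Weight: Gaussian likelihood of the observation given each particle's position.
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2) for p in moved]
    if sum(weights) == 0.0:
        weights = [1.0] * len(moved)   # all weights underflowed: fall back to uniform
    # Resample: draw a new particle set with probability proportional to weight.
    return rng.choices(moved, weights=weights, k=len(moved))

def estimate(particles):
    """Point estimate of the state: the mean of the particle set."""
    return sum(particles) / len(particles)

rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(1000)]   # uninformed initial belief
for _ in range(10):
    particles = particle_filter_step(particles, 3.0, motion_std=0.1, obs_std=0.5, rng=rng)
print(round(estimate(particles), 2))   # close to the observed position 3.0
```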