
Confidence Based - PowerPoint Presentation

min-jolicoeur
Uploaded On 2016-03-07

Presentation Transcript

Slide1

Confidence-Based Autonomy: Policy Learning by Demonstration

Manuela M. Veloso

Thanks to Sonia Chernova

Computer Science Department

Carnegie Mellon University

Grad AI – Spring 2013

Slide2

Task Representation

Robot state

Robot actions

Training dataset:

Policy as classifier (e.g., Gaussian Mixture Model, Support Vector Machine), with outputs:

policy action

decision boundary with greatest confidence for the query

classification confidence w.r.t. decision boundary
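A minimal sketch of a policy represented as a classifier that reports a confidence alongside its action. A k-nearest-neighbour vote stands in here for the Gaussian Mixture Model or Support Vector Machine named on the slide. The function name `classify`, the toy dataset, and the use of the vote fraction as confidence are all assumptions for illustration, and the decision-boundary output is omitted.

```python
import math
from collections import Counter


def classify(dataset, s, k=3):
    """Return (action, confidence) for query state s.

    dataset: list of (state, action) pairs, each state a tuple of floats.
    Confidence is the fraction of the k nearest neighbours that voted
    for the winning action (an assumed, simplified confidence measure).
    """
    neighbours = sorted(dataset, key=lambda pair: math.dist(pair[0], s))[:k]
    votes = Counter(action for _, action in neighbours)
    action, count = votes.most_common(1)[0]
    return action, count / len(neighbours)


# Hypothetical training dataset of (state, action) demonstrations.
demo_data = [
    ((0.0, 0.0), "left"), ((0.1, 0.2), "left"), ((0.2, 0.1), "left"),
    ((1.0, 1.0), "right"), ((0.9, 1.1), "right"), ((1.1, 0.9), "right"),
]

print(classify(demo_data, (0.05, 0.1)))    # query deep inside the "left" cluster
print(classify(demo_data, (0.55, 0.55)))   # query between the two clusters
```

The second query sits between the clusters, so its neighbours disagree and the reported confidence drops; this is the signal the later slides use to decide when to ask for help.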

[Figure: state s plotted in a sensor-data feature space with axes f1 and f2]

Slide3

Confidence-Based Autonomy Assumptions

Teacher understands and can demonstrate the task

High-level task learning

Discrete actions

Non-negligible action duration

State space contains all information necessary to learn the task policy

Robot is able to stop to request demonstration

… however, the environment may continue to change

Slide4

Confident Execution

[Flowchart: over time the robot passes through states s_1, s_2, s_3, s_4, …, s_i, …, s_t. The current state s_i is passed to the Policy, which answers "Request Demonstration?". No: Execute Action a_p. Yes: Request Demonstration a_d, Add Training Point (s_i, a_d), Relearn Classifier, Execute Action a_d.]

Slide5

Demonstration Selection

When should the robot request a demonstration?

To obtain useful training data

To restrict autonomy in areas of uncertainty

Slide6

Fixed Confidence Threshold

Why not apply a fixed classification confidence threshold?

Example: conf = 0.5

Simple

How to select a good threshold value?
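The fixed-threshold rule, and why its value is hard to choose, can be illustrated with a short sketch. The function name `should_request_demo` and the confidence values are made up for illustration; only the example threshold of 0.5 comes from the slide.

```python
def should_request_demo(confidence, threshold=0.5):
    """Fixed-threshold rule: ask the teacher when confidence falls below threshold."""
    return confidence < threshold


# Hypothetical classifier confidences for five query states.
confidences = [0.95, 0.62, 0.48, 0.51, 0.30]

request_counts = {}
for threshold in (0.4, 0.5, 0.7):
    request_counts[threshold] = sum(
        should_request_demo(c, threshold) for c in confidences
    )
    print(f"threshold={threshold}: "
          f"{request_counts[threshold]} of {len(confidences)} states trigger a request")
```

Small changes in the threshold change how often the teacher is interrupted, and a single value applies everywhere in the state space regardless of how well each region is covered by demonstrations.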

Slide7

Confident Execution Demonstration Selection

Distance parameter dist Used to identify outliers and unexplored regions of state spaceSet of confidence parameters conf

Used to identify ambiguous state regions in which more than one action is applicableSlide8

Confident Execution Distance Parameter

Distance parameter dist, where [definition not preserved in transcript]

Given state query s, request demonstration if [condition not preserved in transcript]

Slide9

Confident Execution Confidence Parameters

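Set of confidence parameters conf

One for each decision boundary, where [definition not preserved in transcript]

Given [symbols not preserved in transcript] and classifier [symbol not preserved in transcript]

Given state query s, request demonstration if [condition not preserved in transcript]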
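Since the formulas on these slides did not survive in the transcript, the following is only a sketch of how a distance test and per-boundary confidence tests might be combined. Everything specific here is an assumption: the distance threshold is taken to be a multiple (`scale`) of the mean nearest-neighbour distance in the training data, the confidence thresholds are hand-picked numbers, and the boundary labels and function names are hypothetical.

```python
import math


def distance_threshold(states, scale=3.0):
    """Assumed rule: scale times the mean nearest-neighbour distance in the data."""
    nearest = [
        min(math.dist(s, other) for j, other in enumerate(states) if j != i)
        for i, s in enumerate(states)
    ]
    return scale * sum(nearest) / len(nearest)


def request_demonstration(s, states, confidence, boundary, tau_conf, tau_dist):
    """Request a demonstration if s is far from all training states
    OR the confidence is below the threshold of its decision boundary."""
    nearest = min(math.dist(s, other) for other in states)
    return nearest > tau_dist or confidence < tau_conf[boundary]


# Hypothetical training states and per-boundary confidence thresholds.
train_states = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
tau_dist = distance_threshold(train_states)
tau_conf = {"left|right": 0.6, "right|stay": 0.8}

print(request_demonstration((0.5, 0.5), train_states, 0.9, "left|right", tau_conf, tau_dist))
print(request_demonstration((10.0, 10.0), train_states, 0.9, "left|right", tau_conf, tau_dist))
print(request_demonstration((0.5, 0.5), train_states, 0.7, "right|stay", tau_conf, tau_dist))
```

Keeping one confidence threshold per decision boundary lets ambiguous regions demand more confidence than well-separated ones, while the distance test catches queries that lie outside anything the teacher has demonstrated.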

Slide10

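Confident Execution

[Flowchart: the current state s_i is passed to the Policy. The "Request Demonstration?" test combines two conditions joined by "or" (the conditions themselves are not preserved in the transcript). No: Execute Action a_p. Yes: Request Demonstration a_d, Add Training Point (s_i, a_d), Relearn Classifier, Execute Action a_d.]

Slide11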

Confidence-Based Autonomy

[Flowchart with two components. Confident Execution: the current state s_i is passed to the Policy, which answers "Request Demonstration?". No: Execute Action a_p. Yes: Request Demonstration a_d, Add Training Point (s_i, a_d), Relearn Classifier, Execute Action a_d. Corrective Demonstration: the Teacher provides a corrective action a_c, Add Training Point (s_i, a_c), Relearn Classifier.]

Slide12
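Before turning to the evaluation, the two components of Confidence-Based Autonomy from the preceding flowchart can be sketched as a loop. This is an illustration under stated assumptions, not the authors' implementation: a k-nearest-neighbour vote stands in for the classifier (so "relearning" is just appending a point), the thresholds are fixed hand-picked numbers, and `toy_teacher` is a hypothetical teacher that reviews every autonomous action.

```python
import math
import random
from collections import Counter


def classify(data, s, k=3):
    """k-NN stand-in for the policy: (action, confidence, nearest distance)."""
    neighbours = sorted(data, key=lambda pair: math.dist(pair[0], s))[:k]
    votes = Counter(action for _, action in neighbours)
    action, count = votes.most_common(1)[0]
    return action, count / len(neighbours), math.dist(neighbours[0][0], s)


def confident_execution(data, s, teacher, conf_threshold=0.7, dist_threshold=0.5):
    """Return (executed action, whether a demonstration was requested)."""
    if data:
        a_p, conf, nearest = classify(data, s)
        if conf >= conf_threshold and nearest <= dist_threshold:
            return a_p, False        # confident: execute policy action a_p
    a_d = teacher(s)                 # request demonstration a_d
    data.append((s, a_d))            # add training point (s_i, a_d)
    return a_d, True                 # execute the demonstrated action a_d


def corrective_demonstration(data, s, executed, teacher):
    """Teacher reviews an autonomous action; returns True if it was corrected."""
    a_c = teacher(s)
    if a_c == executed:
        return False
    data.append((s, a_c))            # add training point (s_i, a_c)
    return True


def toy_teacher(s):
    """Hypothetical teacher: go left on the left half of the unit square."""
    return "left" if s[0] < 0.5 else "right"


def run_demo(steps=200, seed=0):
    rng = random.Random(seed)
    data, requests, corrections = [], 0, 0
    for _ in range(steps):
        s = (rng.random(), rng.random())
        action, requested = confident_execution(data, s, toy_teacher)
        if requested:
            requests += 1
        elif corrective_demonstration(data, s, action, toy_teacher):
            corrections += 1
    return requests, corrections, len(data)


requests, corrections, dataset_size = run_demo()
print(f"requested: {requests}, corrected: {corrections}, dataset size: {dataset_size}")
```

Every training point comes either from a demonstration the robot requested or from a correction the teacher volunteered, which is why the dataset size equals the sum of the two counts.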

Evaluation in Driving Domain

Introduced by Abbeel and Ng, 2004

Task: Teach the agent to drive on the highway

Fixed driving speed

Pass slower cars and avoid collisions

State: current lane, nearest car lane 1, nearest car lane 2, nearest car lane 3

Actions: merge left, merge right, stay in lane

Slide13
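The state and action representation of the driving domain described on the preceding slide could be encoded along the following lines. The class and field names, the lane numbering, and the choice of a distance in arbitrary units for each "nearest car" feature are assumptions; the slide only names the four state features and three actions.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    MERGE_LEFT = "merge left"
    MERGE_RIGHT = "merge right"
    STAY_IN_LANE = "stay in lane"


@dataclass(frozen=True)
class DrivingState:
    current_lane: int            # assumed to be numbered 1 to 3
    nearest_car_lane_1: float    # assumed: distance to the nearest car in lane 1
    nearest_car_lane_2: float    # assumed: distance to the nearest car in lane 2
    nearest_car_lane_3: float    # assumed: distance to the nearest car in lane 3

    def as_vector(self):
        """Flatten the state into a feature tuple for a classifier."""
        return (
            float(self.current_lane),
            self.nearest_car_lane_1,
            self.nearest_car_lane_2,
            self.nearest_car_lane_3,
        )


# One hypothetical demonstration: a slow car is close ahead in the agent's lane.
demonstration = (DrivingState(2, 40.0, 5.0, 60.0), Action.MERGE_RIGHT)
print(demonstration[0].as_vector(), demonstration[1].value)
```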

Evaluation in Driving Domain

Demonstration Selection Method            # Demonstrations   Collision Timesteps

Teacher knows best                        1300               2.7%

Confident Execution (fixed conf)          1016               3.8%

Confident Execution (dist & mult. conf)   504                1.9%

CBA                                       703                0%
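The demonstration counts above can be compared against the "Teacher knows best" baseline with a little arithmetic. The numbers are taken from the results as listed; the helper name `reduction_percent` and the choice of baseline are assumptions made for this illustration.

```python
# (number of demonstrations, collision timesteps in percent), as listed above.
results = {
    "Teacher knows best": (1300, 2.7),
    "Confident Execution (fixed conf)": (1016, 3.8),
    "Confident Execution (dist & mult. conf)": (504, 1.9),
    "CBA": (703, 0.0),
}


def reduction_percent(count, baseline):
    """Percentage of demonstrations saved relative to the baseline count."""
    return 100.0 * (baseline - count) / baseline


baseline = results["Teacher knows best"][0]
for method, (demos, collisions) in results.items():
    saved = reduction_percent(demos, baseline)
    print(f"{method}: {demos} demonstrations "
          f"({saved:.1f}% fewer than baseline), {collisions}% collision timesteps")
```

Read this way, CBA uses more demonstrations than Confident Execution with the dist and multiple conf parameters alone, but it is the only listed method with no collision timesteps.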

CBA Final Policy

Slide14

[Plot: Demonstrations Over Time; series: Total Demonstrations, Confident Execution, Corrective Demonstration]

Slide15
Slide16

Summary

Confidence-Based Autonomy algorithm

Confident Execution demonstration selection

Corrective Demonstration

Slide17

What did we do today?

(PO)MDPs: need to generate a good policy

Assumes the agent has some method for estimating its state (given current belief state and action, observation, where do I think I am now?)

How do we estimate this?

Discrete latent states → HMMs (simplest DBNs)

Continuous latent states, observed states drawn from Gaussian, linear dynamical system → Kalman filters (assumptions relaxed by Extended Kalman Filter, etc.)

Not analytic → particle filters

Take weighted samples (“particles”) of an underlying distribution

We’ve mainly looked at policies for discrete state spaces

For continuous state spaces, can use LfD: ML gives us a good-guess action based on past actions

If we’re not confident enough, ask for help!
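As a reminder of the weighted-sample idea behind particle filters mentioned in the recap above, here is a minimal one-dimensional sketch. The motion model (move by a known control plus Gaussian noise), the Gaussian observation model, the noise levels, and all names are assumptions chosen for illustration.

```python
import math
import random


def particle_filter_step(particles, control, observation, rng,
                         motion_noise=0.1, obs_noise=0.5):
    """One predict / weight / resample cycle for a 1-D position estimate."""
    # Predict: move every particle by the control input plus motion noise.
    moved = [p + control + rng.gauss(0.0, motion_noise) for p in particles]
    # Weight: Gaussian likelihood of the observation given each particle.
    weights = [math.exp(-0.5 * ((observation - p) / obs_noise) ** 2) for p in moved]
    if sum(weights) == 0.0:
        weights = [1.0] * len(moved)   # degenerate case: fall back to uniform
    # Resample: draw particles in proportion to their weights.
    return rng.choices(moved, weights=weights, k=len(moved))


rng = random.Random(0)
particles = [rng.uniform(-5.0, 5.0) for _ in range(200)]
true_position = 0.0
for _ in range(5):
    true_position += 1.0                                  # robot moves one unit
    observation = true_position + rng.gauss(0.0, 0.5)     # noisy sensor reading
    particles = particle_filter_step(particles, 1.0, observation, rng)

estimate = sum(particles) / len(particles)
print(f"true position {true_position:.2f}, estimate {estimate:.2f}")
```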