
Presentation Transcript

Slide 1

Discovering Optimal Training Policies: A New Experimental Paradigm

Robert V. Lindsey, Michael C. Mozer

Institute of Cognitive Science

Department of Computer Science

University of Colorado, Boulder

Harold Pashler

Department of Psychology

UC San Diego

Slide 2

Common Experimental Paradigm In Human Learning Research

Propose several instructional conditions to compare based on intuition or theory

E.g., spacing of study sessions in fact learning

Equal: 1 – 1 – 1

Increasing: 1 – 2 – 4

Run many participants in each condition

Perform statistical analyses to establish a reliable difference between conditions

Slide 3

What Most Researchers Interested In Improving Instruction Really Want To Do

Find the best training policy (study schedule)

Abscissa: space of all training policies

Performance function defined over policy space

Slide 4

Approach

Perform single-participant experiments at selected points in policy space (o)

Use function approximation techniques to estimate the shape of the performance function

Given the current estimate, select promising policies to evaluate next

promising = has potential to be the optimum policy

[Figure: estimates of the performance function via linear regression and Gaussian process regression]

Slide 5

Gaussian Process Regression

Assumes only that functions are smooth

Uses data efficiently

Accommodates noisy data

Produces estimates of both function shape and uncertainty
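As a concrete illustration of these properties, here is a minimal GP-regression sketch on noisy single-participant data, using scikit-learn (library choice, kernel, and all values are assumptions, not taken from the talk):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical data: each x is a policy setting, each y one
# participant's (noisy) test score at that policy.
x = np.array([0.1, 0.3, 0.4, 0.7, 0.9]).reshape(-1, 1)
y = np.array([0.42, 0.55, 0.60, 0.71, 0.52])

# The RBF kernel encodes the smoothness assumption; WhiteKernel
# absorbs participant-level noise so the fit is not forced through
# every data point.
kernel = RBF(length_scale=0.2) + WhiteKernel(noise_level=0.05)
gp = GaussianProcessRegressor(kernel=kernel).fit(x, y)

# The GP returns both a mean estimate of the performance function
# and an uncertainty (standard deviation) at every candidate policy.
grid = np.linspace(0, 1, 100).reshape(-1, 1)
mean, std = gp.predict(grid, return_std=True)
```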

Slide 6

Simulated Experiment

Slide 7

Slide 8

Embellishments On Off-The-Shelf GP Regression

Active selection heuristic: upper confidence bound

GP is embedded in a generative task model

GP represents skill level (-∞ → +∞)

Mapped to population mean accuracy on test (0 → 1)

Mapped to individual's mean accuracy, allowing for interparticipant variability

Mapped to # correct responses via binomial sampling

Hierarchical Bayesian approach to parameter selection: interparticipant variability, GP smoothness (covariance function)
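A minimal sketch of this generative chain in Python (my reconstruction; the logistic link, the beta model of interparticipant variability, and all parameter values are assumptions, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_test_score(skill, n_trials=24, concentration=20.0):
    """Hypothetical generative model from latent skill to observed score."""
    # Latent skill level (-inf, +inf) -> population mean accuracy (0, 1).
    pop_acc = 1.0 / (1.0 + np.exp(-skill))

    # Population mean -> this participant's mean accuracy, allowing for
    # interparticipant variability (beta distribution around the mean;
    # `concentration` is an assumed parameter, higher = less variability).
    indiv_acc = rng.beta(concentration * pop_acc,
                         concentration * (1.0 - pop_acc))

    # Individual accuracy -> # correct responses via binomial sampling.
    return rng.binomial(n_trials, indiv_acc)

print(simulate_test_score(skill=0.5))  # number correct out of 24 test trials
```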

Slide 9

Concept Learning Task

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

GLOPNOR = Graspability

Ease of picking up & manipulating an object with one hand

Based on norms from Salmon, McMullen, & Filliter (2010)

Slide 16

Two-Dimensional Policy Space

Fading policy

Repetition/alternation policy

Slide 17

Two-Dimensional Policy Space

Slide 18

Policy Space

[Figure axes: fading policy vs. repetition/alternation policy]

Slide 19

Experiment

Training

25-trial sequence generated by chosen policy

Balanced positive / negative

Testing

24 test trials, ordered randomly, balanced

No feedback, forced choice

Amazon Mechanical Turk

$0.25 / participant
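For concreteness, here is one hypothetical way a point in the two-dimensional policy space could generate such a 25-trial training sequence (the transcript does not give the actual parameterization; the function and both parameters are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def make_training_sequence(fading, rep_alt, n_trials=25):
    """Hypothetical 2-D policy -> training sequence.

    fading: 0 = constant difficulty throughout, 1 = full easy-to-hard ramp.
    rep_alt: 0 = repeat the same category, 1 = alternate categories freely.
    """
    trials = []
    category = 0
    for t in range(n_trials):
        frac = t / (n_trials - 1)      # position within the sequence
        difficulty = frac * fading     # fading policy: ramp up difficulty
        if rng.random() < rep_alt:     # repetition/alternation policy
            category = 1 - category    # switch between the two categories
        trials.append((category, difficulty))
    return trials

sequence = make_training_sequence(fading=0.7, rep_alt=0.3)
```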

Slide 20

Results

# correct of 25

Slide 21

Best Policy

Fade from easy to semi-difficult

Repetitions initially, alternations later

Slide 22

Results

Slide 23

Final Evaluation

65.7% (N = 49)

60.9% (N = 53)

66.6% (N = 50)

68.6% (N = 48)

Slide 24

Novel Experimental Paradigm

Instead of running a few conditions, each with many participants, ...

...run many conditions, each with a different participant.

Although individual participants provide a very noisy estimate of the population mean, optimization techniques allow us to recover the shape of the performance function over the policy space.

Slide 25

What Next?

Plea for more interesting policy spaces!

Other optimization problems

Abstract concepts from examples (e.g., irony)

Motivation

Manipulations: rewards/points, trial pace, task difficulty, time pressure

Measure: voluntary time on task

Slide 26

Leftovers

Slide 27

Optimization

E.g., time-varying repetition/alternation policy

Slide 28

How To Do Optimization?

Reinforcement Learning

POMDPs

Function Approximation

Gaussian Process Surrogate-Based Optimization

Slide 29

Approach

(1) Using the current estimate of the policy function, choose a promising next policy to evaluate

(2) Conduct a small experiment using that policy to obtain a (noisy) estimate of population mean performance for that policy

(3) Use the data collected so far to re-estimate the shape of the policy function

(4) Go to step 1
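Here is a minimal sketch of that loop as Gaussian process surrogate-based optimization with the upper-confidence-bound selection heuristic named earlier; the scikit-learn setup, the simulated `run_experiment`, and all constants are assumptions for illustration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def run_experiment(policy):
    """Stub for a real single-participant experiment: a hypothetical
    smooth performance function plus participant-level noise."""
    return 0.5 + 0.2 * np.sin(3.0 * policy[0]) + rng.normal(0.0, 0.1)

candidates = np.linspace(0.0, 1.0, 200).reshape(-1, 1)  # discretized policy space
X, y = [], []

# Seed with a few randomly chosen policies.
for p in candidates[rng.choice(len(candidates), size=3, replace=False)]:
    X.append(p)
    y.append(run_experiment(p))

gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=0.2) + WhiteKernel(noise_level=0.05))

for _ in range(50):  # experiment budget
    # (3) Re-estimate the shape of the policy function from all data so far.
    gp.fit(np.array(X), np.array(y))
    # (1) Choose a promising next policy: upper confidence bound.
    mean, std = gp.predict(candidates, return_std=True)
    next_policy = candidates[np.argmax(mean + 2.0 * std)]
    # (2) Run a small experiment at that policy (noisy estimate).
    X.append(next_policy)
    y.append(run_experiment(next_policy))

best_policy = candidates[np.argmax(gp.predict(candidates))]
```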