Discovering Optimal Training Policies: A New Experimental Paradigm

Robert V. Lindsey, Michael C. Mozer
Institute of Cognitive Science
Department of Computer Science
University of Colorado, Boulder

Harold Pashler
Department of Psychology
UC San Diego
Common Experimental Paradigm in Human Learning Research

Propose several instructional conditions to compare, based on intuition or theory
  E.g., spacing of study sessions in fact learning
    Equal: 1 – 1 – 1
    Increasing: 1 – 2 – 4
Run many participants in each condition
Perform statistical analyses to establish a reliable difference between conditions (sketched below)
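For concreteness, here is a minimal sketch of that conventional analysis in Python, with invented data: two spacing conditions, many participants each, and a t-test on the condition means. The accuracy levels, sample sizes, and 24-item test are all hypothetical.

```python
# Conventional paradigm: many participants per condition, then a
# significance test between condition means. All data here are invented.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# Hypothetical test accuracies for 100 participants per condition.
equal_spacing = rng.binomial(n=24, p=0.62, size=100) / 24       # schedule 1-1-1
increasing_spacing = rng.binomial(n=24, p=0.67, size=100) / 24  # schedule 1-2-4

res = ttest_ind(increasing_spacing, equal_spacing)
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.3f}")
```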
What Most Researchers Interested in Improving Instruction Really Want to Do

Find the best training policy (study schedule)

[Figure: a performance function defined over policy space; abscissa = space of all training policies]
Approach

Perform single-participant experiments at selected points (o) in policy space
Use function approximation techniques to estimate the shape of the performance function
Given the current estimate, select promising policies to evaluate next
  promising = has potential to be the optimal policy

[Figure: linear regression vs. Gaussian process regression fits to the sampled points]
Gaussian Process Regression

Assumes only that functions are smooth
Uses data efficiently
Accommodates noisy data
Produces estimates of both function shape and uncertainty (see the sketch below)
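A minimal sketch of this kind of GP fit, using scikit-learn over a one-dimensional policy space. The policies, scores, kernel choice, and hyperparameters are all assumptions for illustration, not the authors' settings.

```python
# GP regression over a 1-D policy space: each y is one participant's
# noisy score at a policy X. All values here are invented.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X = np.array([[0.1], [0.3], [0.5], [0.8]])  # policies evaluated so far
y = np.array([0.55, 0.70, 0.65, 0.40])      # one noisy score per policy

# RBF encodes the smoothness assumption; WhiteKernel absorbs per-participant noise.
kernel = RBF(length_scale=0.2) + WhiteKernel(noise_level=0.05)
gp = GaussianProcessRegressor(kernel=kernel).fit(X, y)

grid = np.linspace(0.0, 1.0, 100).reshape(-1, 1)
mean, std = gp.predict(grid, return_std=True)  # estimated shape + uncertainty
```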
Simulated Experiment

[Figure slides]
Embellishments on Off-the-Shelf GP Regression

Active selection heuristic: upper confidence bound
GP is embedded in a generative task model
  GP represents skill level (-∞, +∞)
  Mapped to population mean accuracy on test (0, 1)
  Mapped to individual's mean accuracy, allowing for interparticipant variability
  Mapped to # correct responses via binomial sampling
Hierarchical Bayesian approach to parameter selection
  Interparticipant variability
  GP smoothness (covariance function)
(Both embellishments are sketched below.)
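The slides name these ingredients but not the equations, so the following sketch fills in plausible choices: a UCB rule over the GP posterior, and a logistic link from latent skill to accuracy. The logistic link, noise level, exploration constant, and helper names are assumptions.

```python
# Sketch of the two embellishments above. The slides specify only that
# latent skill in (-inf, +inf) maps to accuracy in (0, 1) and then to
# # correct via binomial sampling; everything else here is assumed.
import numpy as np

def ucb_choose(gp, candidates, kappa=2.0):
    """Active selection: pick the policy with the highest upper confidence bound."""
    mean, std = gp.predict(candidates, return_std=True)
    return candidates[np.argmax(mean + kappa * std)]

def simulate_test_score(skill, n_trials=24, sigma=0.5, rng=None):
    """Generative task model: latent skill -> individual accuracy -> # correct."""
    rng = rng or np.random.default_rng()
    individual = skill + sigma * rng.normal()      # interparticipant variability
    p_correct = 1.0 / (1.0 + np.exp(-individual))  # squash (-inf, +inf) to (0, 1)
    return rng.binomial(n_trials, p_correct)       # binomial response sampling
```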
Concept Learning Task

[Figure slides]
GLOPNOR = Graspability
Ease of picking up & manipulating an object with one hand
Based on norms from Salmon, McMullen, & Filliter (2010)
Two-Dimensional Policy Space

Fading policy
Repetition/alternation policy
Two-Dimensional Policy Space

[Figure slide]

Policy Space

[Figure: axes are the fading policy and the repetition/alternation policy]
Experiment

Training
  25-trial sequence generated by the chosen policy (a hypothetical generator is sketched below)
  Balanced positive / negative
Testing
  24 test trials, ordered randomly, balanced
  No feedback, forced choice
Amazon Mechanical Turk
  $0.25 / participant
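The slides do not say how the two policy parameters translate into a trial sequence, so the following is a purely hypothetical instantiation of a 25-trial generator; the parameter names and ramp functions are invented.

```python
# Hypothetical generator for a 25-trial training sequence from the two
# policy parameters. Illustration only; not the authors' parameterization.
import numpy as np

def make_training_sequence(fading, rep_to_alt, n_trials=25, rng=None):
    """fading in [0, 1]: how far difficulty ramps from easy toward hard.
    rep_to_alt in (0, 1]: how early repetitions give way to alternations."""
    rng = rng or np.random.default_rng()
    trials = []
    for t in range(n_trials):
        progress = t / (n_trials - 1)
        difficulty = fading * progress                 # fade easy -> harder
        p_alternate = min(1.0, progress / rep_to_alt)  # repeat early, alternate late
        trials.append((difficulty, rng.random() < p_alternate))
    return trials
```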
Results

[Figure: # correct of 25 across policy space]
Best Policy

Fade from easy to semi-difficult
Repetitions initially, alternations later
Results

[Figure slide]
Final Evaluation

[Figure: mean test accuracy for the four conditions]
  65.7%  (N = 49)
  60.9%  (N = 53)
  66.6%  (N = 50)
  68.6%  (N = 48)
Novel Experimental Paradigm

Instead of running a few conditions, each with many participants, ...
...run many conditions, each with a different participant.

Although individual participants provide a very noisy estimate of the population mean, optimization techniques allow us to determine the shape of the policy space.
What Next?

Plea for more interesting policy spaces!
Other optimization problems
  Abstract concepts from examples
    E.g., irony
  Motivation
    Manipulations: rewards/points, trial pace, task difficulty, time pressure
    Measure: voluntary time on task
Leftovers
Optimization

E.g., time-varying repetition/alternation policy
How To Do Optimization?

Reinforcement learning
  POMDPs
Function approximation
  Gaussian process surrogate-based optimization
Approach

(1) Using the current policy function estimate, choose a promising next policy to evaluate
(2) Conduct a small experiment using that policy to obtain a (noisy) estimate of population mean performance for that policy
(3) Use the data collected so far to re-estimate the shape of the policy function
(4) Go to step 1 (the full loop is sketched below)
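A minimal sketch of steps (1)-(4) as a loop, under the same assumptions as the earlier sketches (UCB selection, scikit-learn GP). Here run_experiment is a hypothetical callback that runs one small experiment with a given policy and returns a noisy performance score.

```python
# Steps (1)-(4) as a surrogate-based optimization loop. Kernel settings,
# the UCB constant, and run_experiment are all assumptions for illustration.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def optimize_policy(run_experiment, candidates, n_rounds=100, kappa=2.0):
    """candidates: array of shape (n_policies, n_policy_dims)."""
    gp = GaussianProcessRegressor(kernel=RBF(0.2) + WhiteKernel(0.05))
    X = [candidates[0]]                  # seed with an arbitrary first policy
    y = [run_experiment(candidates[0])]
    for _ in range(n_rounds):
        gp.fit(np.array(X), np.array(y))                 # (3) re-estimate shape
        mean, std = gp.predict(candidates, return_std=True)
        nxt = candidates[np.argmax(mean + kappa * std)]  # (1) promising policy
        X.append(nxt)
        y.append(run_experiment(nxt))                    # (2) small experiment
    gp.fit(np.array(X), np.array(y))
    return candidates[np.argmax(gp.predict(candidates))]
```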