Slide 1: Transfer Learning
Lisa Torrey
University of Wisconsin – Madison
CS 540
Slide 2: Transfer Learning in Humans
- Education: hierarchical curriculum; learning tasks share common stimulus-response elements
- Abstract problem-solving: learning tasks share general underlying principles
- Multilingualism: knowing one language affects learning in another; transfer can be both positive and negative
Slide 3: Transfer Learning in AI
- Given: Task S
- Learn: Task T
Slide 4: Goals of Transfer Learning
[Figure: learning curves of performance vs. training, showing three potential benefits of transfer: a higher start, a higher slope, and a higher asymptote]
Slide 5: Inductive Learning
[Figure: learning as a search through the allowed hypotheses, a subset of the space of all hypotheses]
Slide 6: Transfer in Inductive Learning
[Figure: the search through the allowed hypotheses within the space of all hypotheses, guided by transfer]
- Thrun and Mitchell 1995: Transfer slopes for gradient descent
Slide 7: Transfer in Inductive Learning
- Bayesian methods
- Bayesian learning: prior distribution + data = posterior distribution
- Bayesian transfer: the source task informs the prior
- Raina et al. 2006: Transfer a Gaussian prior
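The prior + data = posterior picture above can be made concrete with a conjugate Gaussian update, in the spirit of transferring a Gaussian prior. This is a minimal sketch, not the method of Raina et al.; the source data, target data, and observation noise are all illustrative assumptions:

```python
import numpy as np

# Source task: estimate a parameter and reuse it as a Gaussian prior.
source_data = np.array([4.8, 5.1, 5.3, 4.9, 5.0])
prior_mean = source_data.mean()
prior_var = source_data.var(ddof=1)

# Target task: only a small sample, with assumed known noise variance.
target_data = np.array([5.6, 5.9])
sigma2 = 1.0

# Conjugate Gaussian update: posterior precision = sum of precisions.
post_var = 1.0 / (1.0 / prior_var + len(target_data) / sigma2)
post_mean = post_var * (prior_mean / prior_var + target_data.sum() / sigma2)

print(post_mean, post_var)
```

The posterior mean lands between the transferred prior mean and the target sample mean, which is exactly the "higher start" a good prior buys.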
Slide 8: Transfer in Inductive Learning
- Hierarchical methods
- [Figure: concept hierarchy in which simpler concepts (Line, Curve) are components of more complex ones (Surface, Circle, Pipe)]
- Stracuzzi 2006: Learn Boolean concepts that can depend on each other
Slide 9: Transfer in Inductive Learning
- Dealing with missing data or labels
- [Figure: Task S supplying data or labels that Task T lacks]
- Shi et al. 2008: Transfer via active learning
Slide 10: Reinforcement Learning
[Figure: agent-environment interaction loop]
- The agent starts with uninformed values, e.g. Q(s1, a) = 0, and takes the action π(s1) = a1
- The environment returns the next state δ(s1, a1) = s2 and reward r(s1, a1) = r2
- The agent updates its estimate: Q(s1, a1) ← Q(s1, a1) + Δ
- The loop repeats: π(s2) = a2, δ(s2, a2) = s3, r(s2, a2) = r3, ...
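The interaction loop above can be sketched as tabular Q-learning. The tiny deterministic chain environment, learning rate, discount, and exploration rate are illustrative assumptions:

```python
import random

random.seed(0)

# Tiny deterministic chain: states 0..3, actions 0 (stay) and 1 (advance).
# Reaching the terminal state 3 yields reward 1; everything else yields 0.
def step(s, a):
    s_next = min(s + a, 3)
    reward = 1.0 if s_next == 3 else 0.0
    return s_next, reward

alpha, gamma, epsilon = 0.5, 0.9, 0.1
Q = {(s, a): 0.0 for s in range(4) for a in (0, 1)}

for episode in range(200):
    s = 0
    while s != 3:
        # Epsilon-greedy policy pi(s)
        if random.random() < epsilon:
            a = random.choice((0, 1))
        else:
            a = max((0, 1), key=lambda act: Q[(s, act)])
        s_next, r = step(s, a)
        # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
        best_next = max(Q[(s_next, act)] for act in (0, 1))
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

print(Q[(2, 1)], Q[(1, 1)], Q[(0, 1)])
```

After training, the "advance" values approach 1, γ, and γ², reflecting the discounted distance to the reward.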
Slide 11: Transfer in Reinforcement Learning
- Starting-point methods
- Hierarchical methods
- Alteration methods
- Imitation methods
- New RL algorithms
Slide 12: Transfer in Reinforcement Learning
- Starting-point methods
- [Figure: with no transfer, target-task training begins from an all-zero initial Q-table; with transfer, it begins from a Q-table of values carried over from the source task]
- Taylor et al. 2005: Value-function transfer
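A starting-point method can be sketched as initializing the target task's Q-table from source-task values instead of zeros. This is a toy illustration, not Taylor et al.'s algorithm; the table sizes and the default identity state/action mapping are assumptions:

```python
import numpy as np

n_states, n_actions = 4, 3

# Source-task Q-table, assumed already learned elsewhere.
q_source = np.arange(12, dtype=float).reshape(n_states, n_actions)

# No transfer: the target task starts from all zeros.
q_no_transfer = np.zeros((n_states, n_actions))

# Starting-point transfer: copy source values through a state/action
# mapping (identity by default), then continue ordinary Q-learning.
def transfer_init(q_src, state_map=None, action_map=None):
    state_map = state_map or {s: s for s in range(q_src.shape[0])}
    action_map = action_map or {a: a for a in range(q_src.shape[1])}
    q_tgt = np.zeros_like(q_src)
    for s_tgt, s_src in state_map.items():
        for a_tgt, a_src in action_map.items():
            q_tgt[s_tgt, a_tgt] = q_src[s_src, a_src]
    return q_tgt

q_transfer = transfer_init(q_source)
print(q_transfer)
```

Training then proceeds identically in both cases; only the starting point differs, which is what produces the "higher start" in the transfer curve.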
Slide 13: Transfer in Reinforcement Learning
- Hierarchical methods
- [Figure: task hierarchy in which Soccer decomposes into subtasks such as Run, Kick, Pass, and Shoot]
- Mehta et al. 2008: Transfer a learned hierarchy
Slide 14: Transfer in Reinforcement Learning
- Alteration methods
- [Figure: Task S's original states, actions, and rewards are altered into new states, new actions, and new rewards]
- Walsh et al. 2006: Transfer aggregate states
Slide 15: Transfer in Reinforcement Learning
- New RL algorithms
- Torrey et al. 2006: Transfer advice about skills
[Figure: the same agent-environment interaction loop as in the Reinforcement Learning slide]
Slide 16: Transfer in Reinforcement Learning
- Imitation methods
- [Figure: training timeline showing which policy is used at each stage, alternating between the source policy and the target policy]
- Torrey et al. 2007: Demonstrate a strategy
Slide 17: My Research
- Skill Transfer and Macro Transfer
- [Figure: both draw on the method families above: starting-point methods, hierarchical methods, imitation methods, and new RL algorithms]
Slide 18: RoboCup Domain
- 3-on-2 BreakAway
- 3-on-2 KeepAway
- 3-on-2 MoveDownfield
- 2-on-1 BreakAway
Slide 19: Inductive Logic Programming
Candidate rules for pass(Teammate) explored during the search:

IF [ ]
THEN pass(Teammate)

IF distance(Teammate) ≤ 5
   angle(Teammate, Opponent) ≥ 15
THEN pass(Teammate)

IF distance(Teammate) ≤ 5
   angle(Teammate, Opponent) ≥ 30
THEN pass(Teammate)

IF distance(Teammate) ≤ 5
THEN pass(Teammate)

IF distance(Teammate) ≤ 10
THEN pass(Teammate)

…
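One way to see the trade-off this search navigates is to score candidate rules by coverage: general rules cover more positive examples but also more negatives. A minimal sketch with made-up game states (the distances, angles, and labels are illustrative assumptions, not RoboCup data):

```python
# Score ILP-style candidate rules for pass(Teammate) by how many
# positive and negative examples each one covers.
examples = [
    # (distance to teammate, angle past opponent, was passing good?)
    (4, 35, True), (4, 20, True), (9, 40, True),
    (4, 10, False), (12, 35, False),
]

rules = {
    "IF [] THEN pass": lambda d, a: True,
    "IF distance<=10 THEN pass": lambda d, a: d <= 10,
    "IF distance<=5 AND angle>=30 THEN pass": lambda d, a: d <= 5 and a >= 30,
}

coverage = {}
for name, body in rules.items():
    pos = sum(1 for d, a, y in examples if y and body(d, a))
    neg = sum(1 for d, a, y in examples if not y and body(d, a))
    coverage[name] = (pos, neg)

print(coverage)
```

Specializing the rule trades positive coverage for precision, which is why the search refines conditions one literal at a time.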
Slide 20: Advice Taking
- Batch Reinforcement Learning via Support Vector Regression (RL-SVR)
- The agent alternates between collecting a batch of experience in the environment and computing Q-functions from the batches so far
- Find Q-functions that minimize: ModelSize + C × DataMisfit
Slide 21: Advice Taking
- Batch Reinforcement Learning with Advice (KBKR)
- Advice is supplied as an additional input when computing Q-functions from each batch
- Find Q-functions that minimize: ModelSize + C × DataMisfit + µ × AdviceMisfit
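The objective above can be illustrated with a toy quadratic version: penalize model size, data misfit, and advice misfit, with advice expressed as a soft linear constraint on the weights. This is a sketch of the idea only, not the KBKR formulation (which uses support-vector machinery); the data, the advice, and the constants C and µ are all assumptions:

```python
import numpy as np

# Toy objective: ||w||^2 + C*||Xw - y||^2 + mu*||Aw - b||^2,
# whose minimizer has a closed form via the normal equations.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))                                # state features
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=20)   # noisy Q targets

# Advice "the first feature's weight should be about 3" as a soft constraint.
A = np.array([[1.0, 0.0]])
b = np.array([3.0])

def solve(C, mu):
    n = X.shape[1]
    lhs = np.eye(n) + C * X.T @ X + mu * A.T @ A
    rhs = C * X.T @ y + mu * A.T @ b
    return np.linalg.solve(lhs, rhs)

w_no_advice = solve(C=10.0, mu=0.0)
w_advice = solve(C=10.0, mu=100.0)
print(w_no_advice, w_advice)
```

With µ = 0 the fit follows the data alone; raising µ pulls the solution toward the advice without forcing it to hold exactly, which is the point of treating advice as a soft constraint.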
Slide 22: Skill Transfer Algorithm
- In the source task, ILP learns a skill rule, e.g.:
  IF distance(Teammate) ≤ 5
     angle(Teammate, Opponent) ≥ 30
  THEN pass(Teammate)
- A mapping carries the rule to the target task, where it is combined with human advice and applied via advice taking
Slide 23: Selected Results
- Skill transfer to 3-on-2 BreakAway from several tasks
Slide 24: Macro-Operators
[Figure: a macro-operator as a sequence of action nodes: pass(Teammate), move(Direction), shoot(goalRight), shoot(goalLeft), with each choice governed by learned rules:]
IF [ ... ] THEN pass(Teammate)
IF [ ... ] THEN move(ahead)
IF [ ... ] THEN shoot(goalRight)
IF [ ... ] THEN shoot(goalLeft)
IF [ ... ] THEN pass(Teammate)
IF [ ... ] THEN move(left)
IF [ ... ] THEN shoot(goalRight)
IF [ ... ] THEN shoot(goalRight)
Slide 25: Demonstration
- An imitation method
- [Figure: training timeline showing which policy is used: the source policy during an initial demonstration period, then the target policy]
Slide 26: Macro Transfer Algorithm
- In the source task, ILP learns a macro
- The macro is applied in the target task via demonstration
Slide 27: Macro Transfer Algorithm
Learning structures:
- Positive examples: BreakAway games that scored
- Negative examples: BreakAway games that did not score
- ILP learns structures such as:
IF actionTaken(Game, StateA, pass(Teammate), StateB)
   actionTaken(Game, StateB, move(Direction), StateC)
   actionTaken(Game, StateC, shoot(goalRight), StateD)
   actionTaken(Game, StateD, shoot(goalLeft), StateE)
THEN isaGoodGame(Game)
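The learned structure above amounts to requiring that a game's trace contain the macro's actions as an ordered subsequence. A minimal sketch, with illustrative game traces:

```python
# Check whether a game's action sequence matches the learned macro
# structure pass -> move -> shoot(goalRight) -> shoot(goalLeft),
# appearing in order (other actions may occur in between).
MACRO = ["pass", "move", "shoot(goalRight)", "shoot(goalLeft)"]

def isa_good_game(actions):
    it = iter(actions)
    # "x in it" consumes the iterator up to the first match, so the
    # loop enforces the macro's ordering.
    return all(step in it for step in MACRO)

good = ["pass", "pass", "move", "shoot(goalRight)", "shoot(goalLeft)"]
bad = ["move", "shoot(goalLeft)", "pass"]
print(isa_good_game(good), isa_good_game(bad))
```

Games matching the structure serve as the positive set when rules for the individual arcs are learned next.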
Slide 28: Macro Transfer Algorithm
Learning rules for arcs:
- Positive examples: states in good games that took the arc
- Negative examples: states in good games that could have taken the arc but did not
- [Figure: a macro node between pass(Teammate) and shoot(goalRight), with rules of the form:]
IF [ … ] THEN enter(State)
IF [ … ] THEN loop(State, Teammate)
Slide 29: Selected Results
- Macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway
Slide 30: Summary
- Machine learning is often designed for standalone tasks
- Transfer is a natural learning ability that we would like to incorporate into machine learners
- There are some successes, but challenges remain, such as avoiding negative transfer and automating the mapping between tasks