Lisa Torrey – PowerPoint Presentation
Uploaded by kittie-lecroy on 2017-06-27

Presentation Transcript

Slide 1

Lisa Torrey
University of Wisconsin – Madison
CS 540

Transfer Learning

Slide 2

Education
Hierarchical curriculum: learning tasks share common stimulus-response elements
Abstract problem-solving: learning tasks share general underlying principles
Multilingualism: knowing one language affects learning in another
Transfer can be both positive and negative

Transfer Learning in Humans

Slide 3

Transfer Learning in AI

Given: Task S
Learn: Task T

Slide 4

Goals of Transfer Learning

[Plot: performance vs. training, with and without transfer]
Higher start
Higher slope
Higher asymptote

Slide 5

Inductive Learning

[Diagram: search through the allowed-hypotheses subset of all hypotheses]

Slide 6

Transfer in Inductive Learning

[Diagram: transfer changes the allowed-hypotheses region that is searched]
Thrun and Mitchell 1995: Transfer slopes for gradient descent

Slide 7

Transfer in Inductive Learning

Bayesian learning: prior distribution + data = posterior distribution
Bayesian transfer: the source task provides the prior distribution
Bayesian methods
Raina et al. 2006: Transfer a Gaussian prior

Slide 8
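The prior + data = posterior pipeline can be sketched with a conjugate Gaussian update. The function name, the toy numbers, and the choice of a known-variance Gaussian model are illustrative assumptions, not details from Raina et al.:

```python
import numpy as np

def posterior_mean_update(prior_mu, prior_var, data, noise_var):
    """Conjugate Gaussian update: prior + data -> posterior.

    In Bayesian transfer, (prior_mu, prior_var) would come from a
    model learned on the source task rather than being hand-picked.
    """
    n = len(data)
    # Precisions (inverse variances) add; the data pulls the mean
    # away from the prior in proportion to how much data there is.
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mu = post_var * (prior_mu / prior_var + np.sum(data) / noise_var)
    return post_mu, post_var

# Source task suggests the parameter is near 2.0; target data pulls it up.
mu, var = posterior_mean_update(prior_mu=2.0, prior_var=1.0,
                                data=np.array([3.0, 3.2, 2.8]),
                                noise_var=0.5)
```

With only three target-task observations, the posterior mean lands between the transferred prior and the data average, which is exactly the behavior transfer is meant to buy.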

Transfer in Inductive Learning

[Diagram: concept hierarchy building from line and curve up to surface, circle, and pipe]
Hierarchical methods
Stracuzzi 2006: Learn Boolean concepts that can depend on each other

Slide 9

Transfer in Inductive Learning

Dealing with missing data or labels
Shi et al. 2008: Transfer via active learning
[Diagram: a model from Task S guides label queries in Task T]

Slide 10

Reinforcement Learning

[Diagram: agent-environment loop]
The agent starts with Q(s1, a) = 0 and acts by a1 = π(s1); the environment returns s2 = δ(s1, a1) and reward r2 = r(s1, a1); the agent updates Q(s1, a1) ← Q(s1, a1) + Δ, then takes a2 = π(s2), receives s3 = δ(s2, a2) and r3 = r(s2, a2), and so on.

Slide 11
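The loop on this slide can be sketched as tabular Q-learning. The helper names (`delta`, `reward`), the toy single-state task, and all parameter values are assumptions for illustration:

```python
import random

def q_learning(Q, start, actions, delta, reward,
               alpha=0.1, gamma=0.9, epsilon=0.1, steps=50):
    """Tabular Q-learning, mirroring the slide's loop: observe s,
    pick a = pi(s), receive s' = delta(s, a) and r = r(s, a),
    then update Q(s, a) <- Q(s, a) + Delta."""
    s = start
    for _ in range(steps):
        # epsilon-greedy policy pi(s)
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(s, act)])
        s2 = delta(s, a)   # next state from the environment
        r = reward(s, a)   # reward from the environment
        best_next = max(Q[(s2, act)] for act in actions)
        # Delta = alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2
    return Q

# Toy single-state task: 'go' pays 1 per step, 'stay' pays 0.
actions = ['stay', 'go']
Q = {(0, a): 0.0 for a in actions}
Q = q_learning(Q, start=0, actions=actions,
               delta=lambda s, a: 0,
               reward=lambda s, a: 1.0 if a == 'go' else 0.0,
               epsilon=1.0)  # pure exploration for this toy run
```

After a short run the table already prefers the rewarding action, which is the Q-table that the transfer methods on the following slides copy, aggregate, or advise over.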

Transfer in Reinforcement Learning

Starting-point methods
Hierarchical methods
Alteration methods
Imitation methods
New RL algorithms

Slide 12

Transfer in Reinforcement Learning

Starting-point methods
[Diagram: the initial Q-table for target-task training is all zeros without transfer, but holds values copied from the source task with transfer]
Taylor et al. 2005: Value-function transfer

Slide 13
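A starting-point method in this spirit might initialize the target task's Q-table from source-task values. The `transfer_q_table` helper, the mapping interface, and the toy state names are hypothetical, not the actual algorithm of Taylor et al.:

```python
def transfer_q_table(source_Q, target_states, target_actions, mapping):
    """Starting-point sketch: build the target task's initial Q-table,
    copying values from the source task wherever `mapping` gives a
    counterpart (state, action) pair, and starting at 0 elsewhere."""
    target_Q = {}
    for s in target_states:
        for a in target_actions:
            src = mapping(s, a)
            target_Q[(s, a)] = source_Q.get(src, 0.0) if src is not None else 0.0
    return target_Q

# Hypothetical mapping: 'close' in the target corresponds to 'near'
# in the source; 'far' has no source counterpart.
source_Q = {('near', 'pass'): 5.0}
target_Q = transfer_q_table(
    source_Q,
    target_states=['close', 'far'],
    target_actions=['pass'],
    mapping=lambda s, a: ('near', a) if s == 'close' else None)
```

The point of the sketch is the mapping: learning then proceeds exactly as without transfer, only from a better starting point.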

Transfer in Reinforcement Learning

Hierarchical methods
[Diagram: task hierarchy in which Soccer decomposes into Run, Kick, Pass, and Shoot]
Mehta et al. 2008: Transfer a learned hierarchy

Slide 14

Transfer in Reinforcement Learning

Alteration methods
Walsh et al. 2006: Transfer aggregate states
[Diagram: Task S's original states, actions, and rewards are mapped to new states, actions, and rewards]

Slide 15
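State aggregation in this spirit can be sketched by averaging Q-values over a partition of the original states. The `aggregate_q` helper and the toy partition are illustrative, not Walsh et al.'s method:

```python
from collections import defaultdict

def aggregate_q(Q, partition):
    """Alteration-style sketch: collapse original states into abstract
    states (via `partition`, a state -> abstract-state map) and average
    their Q-values, yielding a smaller table for the altered task."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (s, a), q in Q.items():
        key = (partition[s], a)
        sums[key] += q
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}

# States 0 and 1 merge into abstract state 'A'; state 2 becomes 'B'.
abstract_Q = aggregate_q(
    {(0, 'x'): 1.0, (1, 'x'): 3.0, (2, 'x'): 5.0},
    partition={0: 'A', 1: 'A', 2: 'B'})
```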

Transfer in Reinforcement Learning

New RL Algorithms

Torrey et al. 2006: Transfer advice about skills

[Diagram: the agent-environment loop from Slide 10, shown again]

Slide 16

Transfer in Reinforcement Learning

Imitation methods
[Diagram: the source policy is used early in training, then the target policy takes over]
Torrey et al. 2007: Demonstrate a strategy

Slide 17

My Research

Starting-point methods
Imitation methods
Hierarchical methods
New RL algorithms
Skill Transfer
Macro Transfer

Slide 18

RoboCup Domain

3-on-2 BreakAway
3-on-2 KeepAway
3-on-2 MoveDownfield
2-on-1 BreakAway

Slide 19

Inductive Logic Programming

IF [ ]

THEN pass(Teammate)

IF distance(Teammate) ≤ 5

angle(Teammate, Opponent) ≥ 15

THEN pass(Teammate)

IF distance(Teammate) ≤ 5

angle(Teammate, Opponent) ≥ 30

THEN pass(Teammate)

IF distance(Teammate) ≤ 5

THEN pass(Teammate)

IF distance(Teammate) ≤ 10

THEN pass(Teammate)

…

Slide 20
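One of the learned clauses above can be read directly as a Python predicate. The teammate data and names here are made up for illustration:

```python
def pass_rule(distance, angle):
    """One clause from the slide as a predicate:
    IF distance(Teammate) <= 5 AND angle(Teammate, Opponent) >= 15
    THEN pass(Teammate)."""
    return distance <= 5 and angle >= 15

# Hypothetical teammates: (distance, angle to the blocking opponent).
teammates = {'t1': (4, 20), 't2': (4, 10), 't3': (8, 40)}
passable = sorted(t for t, (d, ang) in teammates.items()
                  if pass_rule(d, ang))
```

ILP's search is over exactly such clauses, from the overly general (empty body) to the overly specific, keeping those that best separate positive from negative examples.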

Advice Taking

Find Q-functions that minimize: ModelSize + C × DataMisfit
Batch Reinforcement Learning via Support Vector Regression (RL-SVR)
[Diagram: the agent alternates between collecting a batch of experience in the environment and recomputing its Q-functions]

Slide 21

Advice Taking

Find Q-functions that minimize: ModelSize + C × DataMisfit + µ × AdviceMisfit
Batch Reinforcement Learning with Advice (KBKR)
[Diagram: as in RL-SVR, but advice enters the Q-function computation between batches]

Slide 22
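The objective on this slide can be written out for a linear Q-function approximator. The L1 penalties and the `advice_violation` callback are illustrative stand-ins, not the actual KBKR formulation:

```python
import numpy as np

def advice_objective(w, X, y, advice_violation, C=1.0, mu=0.5):
    """Sketch of the trade-off on the slide,
    ModelSize + C * DataMisfit + mu * AdviceMisfit,
    for a linear approximator y_hat = X @ w. `advice_violation` is a
    hypothetical callback measuring how far w's predictions stray
    from the advice constraints."""
    model_size = np.sum(np.abs(w))            # penalize large models
    data_misfit = np.sum(np.abs(X @ w - y))   # fit the observed Q targets
    return model_size + C * data_misfit + mu * advice_violation(w)

# Toy check: perfect fit, no advice violated -> only the size term remains.
w = np.array([1.0, 1.0])
obj = advice_objective(w, X=np.eye(2), y=np.array([1.0, 1.0]),
                       advice_violation=lambda w: 0.0)
```

The design point is the µ term: advice is a soft constraint, so the learner can overrule transferred advice when the target-task data contradicts it, which limits negative transfer.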

Skill Transfer Algorithm

[Diagram: ILP extracts a skill from the source task, a mapping translates it, and advice taking (together with human advice) applies it in the target task]
IF distance(Teammate) ≤ 5
angle(Teammate, Opponent) ≥ 30
THEN pass(Teammate)

Slide 23

Selected Results

Skill transfer to 3-on-2 BreakAway from several tasks

Slide 24

Macro-Operators

[Diagram: a macro-operator as the action sequence pass(Teammate) → move(Direction) → shoot(goalRight) → shoot(goalLeft), where each step fires under its own learned conditions]
IF [ ... ] THEN pass(Teammate)
IF [ ... ] THEN move(ahead)
IF [ ... ] THEN shoot(goalRight)
IF [ ... ] THEN shoot(goalLeft)

Slide 25

Demonstration

[Diagram: the source policy is demonstrated early in training, then the target policy takes over]
An imitation method

Slide 26

Macro Transfer Algorithm

[Diagram: ILP learns macros from the source task; demonstration applies them in the target task]

Slide 27

Macro Transfer Algorithm

Learning structures

Positive: BreakAway games that score

Negative: BreakAway games that didn’t score

ILP

IF actionTaken(Game, StateA, pass(Teammate), StateB)

actionTaken(Game, StateB, move(Direction), StateC)

actionTaken(Game, StateC, shoot(goalRight), StateD)

actionTaken(Game, StateD, shoot(goalLeft), StateE)

THEN isaGoodGame(Game)

Slide 28
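The structure-learning step can be sketched as an ordered-subsequence test over a game's action trace. The `matches_macro` helper and the string action names are assumptions for illustration:

```python
def matches_macro(actions,
                  macro=('pass', 'move', 'shoot_goalRight', 'shoot_goalLeft')):
    """A game is a positive example of the macro if its action trace
    contains the macro's steps in order (not necessarily adjacent).
    Consuming a single iterator enforces the ordering."""
    it = iter(actions)
    return all(step in it for step in macro)

# A scoring game with an extra pass still matches; a short game doesn't.
good = matches_macro(['pass', 'move', 'pass',
                      'shoot_goalRight', 'shoot_goalLeft'])
bad = matches_macro(['pass', 'shoot_goalRight'])
```

Traces from scoring games that match become positive examples and non-scoring games negative ones, which is the separation ILP needs to learn the isaGoodGame structure.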

Macro Transfer Algorithm

Learning rules for arcs

Positive: states in good games that took the arc

Negative: states in good games that could have taken the arc but didn’t

ILP

[Diagram: an arc of the macro, from pass(Teammate) to shoot(goalRight)]
IF [ … ] THEN enter(State)
IF [ … ] THEN loop(State, Teammate)

Slide 29

Selected Results

Macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway

Slide 30

Machine learning is often designed for standalone tasks. Transfer is a natural learning ability that we would like to incorporate into machine learners.
There are some successes, but challenges remain, such as avoiding negative transfer and automating the mapping between tasks.

Summary