
Reinforcement Learning

Reinforcement Learning: An Introduction

Human-level control through deep reinforcement learning

Dueling Network Architectures for Deep Reinforcement Learning

Reinforcement Learning

Reinforcement learning is a computational approach to understanding and automating goal-directed learning and decision making. It learns by interacting with the environment.

RL components
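As a minimal sketch, these components interact in the standard agent-environment loop below. The `ToyEnv` and `ToyAgent` classes are illustrative stand-ins, not anything from the slides:

```python
import random

class ToyEnv:
    """Minimal stand-in for the environment component."""
    def reset(self):
        self.t = 0
        return 0                               # initial state
    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0   # reward signal
        return self.t, reward, self.t >= 10    # next state, reward, done

class ToyAgent:
    """Minimal stand-in for the agent: random policy, no learning yet."""
    def act(self, state):
        return random.choice([0, 1])           # policy picks an action
    def learn(self, s, a, r, s_next):
        pass                                   # value/policy update goes here

env, agent = ToyEnv(), ToyAgent()
state, done = env.reset(), False
while not done:                                # the agent-environment loop
    action = agent.act(state)
    next_state, reward, done = env.step(action)
    agent.learn(state, action, reward, next_state)
    state = next_state
```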

RL in practice

Exploitation – exploration trade-off

Short sight vs. far sight

Efficient methods for estimating values

For example: Tic-Tac-Toe

Initial value estimates: win 1, lose/draw 0, everything else 0.5
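A hedged sketch of the value-table idea behind this example, in the standard tabular TD(0) form from Sutton & Barto (the state encoding and helper names are assumptions):

```python
# Value table for tic-tac-toe (states are hashable board encodings,
# e.g. 9-tuples). Initialized as on the slide: win 1.0, lose/draw 0.0,
# everything else 0.5.
values = {}

def init_value(state, outcome=None):
    """outcome: 'win', 'lose', 'draw', or None for non-terminal states."""
    if outcome == 'win':
        values[state] = 1.0
    elif outcome in ('lose', 'draw'):
        values[state] = 0.0
    else:
        values[state] = 0.5

def td_update(state, next_state, alpha=0.1):
    # Classic temporal-difference step: nudge V(s) toward the value
    # of the state actually reached.
    values[state] += alpha * (values[next_state] - values[state])
```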

So what's the problem?

Real-world complexity

Deriving efficient representations of the environment from high-dimensional sensory input

Generalizing past experience to new situations

Developing a wide range of competencies across a varied set of challenging tasks

Solution?

DQN Letter (Nature 2015, Google DeepMind)

DQN

Experience Replay

ε-greedy exploration as the training policy

Store the agent's experience at each step

Keep a data set of past transitions

Update on minibatches of stored experience

Optimal action-value function

Loss function (iteration i)

Gradient for SGD
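In the notation of the Nature DQN paper, these take the following standard forms (θᵢ⁻ are the frozen target-network parameters and D is the replay memory, sampled uniformly):

```latex
\[
Q^{*}(s,a) \;=\; \mathbb{E}_{s'}\!\left[\, r + \gamma \max_{a'} Q^{*}(s',a') \;\middle|\; s,a \right]
\]
\[
L_i(\theta_i) \;=\; \mathbb{E}_{(s,a,r,s')\sim U(D)}\!\left[ \Big( r + \gamma \max_{a'} Q(s',a';\theta_i^{-}) - Q(s,a;\theta_i) \Big)^{2} \right]
\]
\[
\nabla_{\theta_i} L_i(\theta_i) \;=\; \mathbb{E}_{(s,a,r,s')\sim U(D)}\!\left[ \Big( r + \gamma \max_{a'} Q(s',a';\theta_i^{-}) - Q(s,a;\theta_i) \Big)\, \nabla_{\theta_i} Q(s,a;\theta_i) \right]
\]
```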
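A minimal sketch of how experience replay and ε-greedy exploration fit together; the buffer capacity, batch size, and the `q_values` callable are assumptions for illustration, not the paper's implementation:

```python
import random
from collections import deque

memory = deque(maxlen=100_000)      # replay memory D with finite capacity

def act(state, q_values, n_actions, epsilon=0.1):
    """epsilon-greedy behaviour policy: explore with probability epsilon."""
    if random.random() < epsilon:
        return random.randrange(n_actions)                 # explore
    qs = q_values(state)                                   # assumed Q estimator
    return max(range(n_actions), key=lambda a: qs[a])      # exploit

def store(state, action, reward, next_state, done):
    """Append one experience tuple to the replay memory."""
    memory.append((state, action, reward, next_state, done))

def sample_minibatch(batch_size=32):
    """Uniform minibatch over stored experience, as in the DQN letter."""
    return random.sample(memory, batch_size)
```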

Method – Atari games

Preprocessing

210 × 160 pixels with a 128-colour palette

Flicker avoidance: take the per-pixel max over the previous frame and the current one

Rescale to 84 × 84; stack 4 frames for an 84 × 84 × 4 input

Model architecture

32 filters of 8 × 8 with stride 4

64 filters of 4 × 4 with stride 2

64 filters of 3 × 3 with stride 1

Fully connected – 512 units

Fully connected – # outputs = # valid actions (4 to 18, depending on the game)
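The layer stack above, rendered as a hedged PyTorch sketch (the original was implemented in Torch7; the ReLU activations follow the paper, and the spatial sizes in the comments follow from the strides):

```python
import torch.nn as nn

class DQN(nn.Module):
    """Conv net from the slide: 84 x 84 x 4 input -> one Q-value per action."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4),   # 84 -> 20
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),  # 20 -> 9
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),  # 9 -> 7
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512),                  # fully connected, 512 units
            nn.ReLU(),
            nn.Linear(512, n_actions),                   # one output per valid action
        )

    def forward(self, x):
        return self.net(x)
```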

t-SNE

Last hidden layer – t-SNE

Some Results

Can we do better???

Dueling Network (2016, Google)

Neural network architecture for model-free RL

Applicable to existing algorithms

Two separate estimators:

state value function

state-dependent action advantage function

Comparison between old and new

Basic definitions

Mostly the same setup and notation as in the previous paper, except for introducing the advantage function:
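In the dueling paper's standard notation:

```latex
\[
A^{\pi}(s,a) \;=\; Q^{\pi}(s,a) - V^{\pi}(s),
\qquad
V^{\pi}(s) \;=\; \mathbb{E}_{a \sim \pi(s)}\!\left[ Q^{\pi}(s,a) \right]
\]
```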

Illustration

Combined output – implementation

A and V are each off by a constant, but that does not change the relative ranking of actions (decision making stays the same)

Increased optimization stability
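Concretely, the paper's combining module subtracts the mean advantage, so the constant offsets in V and A cancel:

```latex
\[
Q(s,a;\theta,\alpha,\beta) \;=\; V(s;\theta,\beta) + \Big( A(s,a;\theta,\alpha) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s,a';\theta,\alpha) \Big)
\]
```

A hedged PyTorch sketch of such a dueling head (the input feature width is an assumption):

```python
import torch.nn as nn

class DuelingHead(nn.Module):
    """Combines the V(s) and A(s, a) streams into Q(s, a)."""
    def __init__(self, in_features: int, n_actions: int):
        super().__init__()
        self.value = nn.Linear(in_features, 1)               # state value stream
        self.advantage = nn.Linear(in_features, n_actions)   # advantage stream

    def forward(self, x):
        v = self.value(x)                       # shape (batch, 1)
        a = self.advantage(x)                   # shape (batch, n_actions)
        # Subtracting the mean advantage pins down the constant offset
        # without changing the relative ranking of actions.
        return v + a - a.mean(dim=1, keepdim=True)
```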

Algorithmic change: Prioritized Replay

Instead of sampling uniformly from past experience, increase the sampling probability of experience tuples with high expected learning progress (measured by absolute TD error)

Idea – faster learning
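A minimal sketch of proportional prioritization under these assumptions; the priority exponent and the importance-sampling correction from the prioritized-replay paper are omitted for brevity:

```python
import random

def sample_prioritized(memory, td_errors, batch_size=32, eps=1e-6):
    """Sample transitions with probability proportional to |TD error|.

    `memory` is a list of transitions; `td_errors[i]` is the last absolute
    TD error recorded for memory[i]. This simplified version is O(n) per
    sample; the paper uses a sum-tree for efficiency.
    """
    priorities = [abs(d) + eps for d in td_errors]   # eps keeps every tuple sampleable
    total = sum(priorities)
    weights = [p / total for p in priorities]
    return random.choices(memory, weights=weights, k=batch_size)
```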

Results – Atari games

Conclusions

More resources devoted to V – a better approximation of the state values (which is good for several well-known algorithms, e.g. TD-based methods)

The improvement becomes more evident as the number of possible actions grows

The new state-of-the-art (in this domain)

Live demo

Questions?