Slide1
Reinforcement Learning
Reinforcement Learning: An Introduction
Human-level control through deep reinforcement learning
Dueling Network Architectures for Deep Reinforcement Learning
Slide2
Reinforcement Learning
Reinforcement learning is a computational approach to understanding and automating goal-directed learning and decision making. It learns by interacting with the environment.
Slide3
RL components
Slide4
RL in practice
Exploitation – exploration trade-off
Short sight vs. far sight
Efficient methods for estimating values
For example: Tic-Tac-Toe
Starting value estimates: win 1, draw/lose 0, all other states 0.5
Slide5
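A minimal sketch of how the value estimates in the tic-tac-toe example above could be initialised and updated, using a simple temporal-difference rule together with epsilon-greedy move selection for the exploitation/exploration trade-off. The step size `alpha`, the exploration rate `epsilon`, the state labels, and all function names are assumptions made for illustration; they do not come from the slides.

```python
import random

def initial_value(outcome):
    """Starting estimate: 1.0 for a won position, 0.0 for a draw or loss, 0.5 otherwise."""
    if outcome == "win":
        return 1.0
    if outcome in ("draw", "lose"):
        return 0.0
    return 0.5

def td_update(values, state, next_state, alpha=0.1):
    """Move V(state) a fraction alpha toward V(next_state)."""
    values[state] += alpha * (values[next_state] - values[state])
    return values[state]

def choose_move(values, candidates, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise exploit the best-valued next state."""
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return max(candidates, key=lambda s: values[s])

# Hypothetical state labels standing in for real board positions.
values = {"mid_game": initial_value(None), "won": initial_value("win")}
td_update(values, "mid_game", "won")  # 0.5 + 0.1 * (1.0 - 0.5)
```

With `epsilon=0` the agent always exploits; larger values trade some immediate value for information about less-visited positions.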
So what's the problem?
Real-world complexity
Derivation of an efficient representation of the environment from high-dimensional sensory input
Generalize past experience to new situations
Developing a wide range of competencies on a varied range of challenging tasks
Solution?
Slide6
DQN-LETTER (2014, Google)
Slide7
DQN
Experience Replay
Epsilon-greedy exploration as the training policy
Store agent’s experience
Keep data set
Update on minibatches of experience sampled uniformly from the data set
Optimal action-value function
Loss function (iteration i)
Gradient for SGD
Slide8
Slide9
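The formulas named on the DQN slide (stored experience, data set, optimal action-value function, loss, gradient) did not survive in this text. As a reference sketch, they can be written in the form used in the DQN letter. The notation here is assumed from that paper rather than taken from the slide: $D$ is the replay data set, $\theta_i$ the online network parameters at iteration $i$, $\theta_i^-$ the target-network parameters (held fixed when differentiating), and $\gamma$ the discount factor.

```latex
\begin{align}
e_t &= (s_t, a_t, r_t, s_{t+1}), \qquad D_t = \{e_1, \dots, e_t\} \\
Q^*(s,a) &= \max_\pi \mathbb{E}\left[ r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots \mid s_t = s,\ a_t = a,\ \pi \right] \\
L_i(\theta_i) &= \mathbb{E}_{(s,a,r,s') \sim U(D)}\left[ \left( r + \gamma \max_{a'} Q(s',a';\theta_i^-) - Q(s,a;\theta_i) \right)^2 \right] \\
\nabla_{\theta_i} L_i(\theta_i) &= -2\, \mathbb{E}_{(s,a,r,s') \sim U(D)}\left[ \left( r + \gamma \max_{a'} Q(s',a';\theta_i^-) - Q(s,a;\theta_i) \right) \nabla_{\theta_i} Q(s,a;\theta_i) \right]
\end{align}
```

In practice the constant factor is folded into the learning rate, and the expectation is replaced by an average over one sampled minibatch per SGD step.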
Method – Atari games
Preprocessing
210 X 160 pixels with 128-colour palette
Flickering avoidance: take the pixel-wise maximum over the previous frame and the current one
Rescale to 84 X 84 and stack the 4 most recent frames (84 X 84 X 4)
Model architecture
32 filters of 8 X 8 with stride 4
64 filters of 4 X 4 with stride 2
64 filters of 3 X 3 with stride 1
Fully connected – 512 units
Fully connected – # output = # valid actions (4-18 possible moves, depending on the game)
Slide10
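To make the model architecture listed above concrete, the following sketch traces the spatial size of the feature maps through the three convolutional layers, starting from the 84 X 84 input. It assumes convolutions without padding, which the slides do not state; the helper names are illustrative only.

```python
def conv_output_size(size, kernel, stride):
    """Spatial size after a convolution with no padding ('valid')."""
    return (size - kernel) // stride + 1

# (filters, kernel, stride) for the three convolutional layers on the slide.
CONV_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]

def feature_map_shapes(input_size=84, layers=CONV_LAYERS):
    """Return (height, width, channels) after each convolutional layer."""
    shapes = []
    size = input_size
    for filters, kernel, stride in layers:
        size = conv_output_size(size, kernel, stride)
        shapes.append((size, size, filters))
    return shapes

shapes = feature_map_shapes()
# Number of inputs feeding the 512-unit fully connected layer.
height, width, channels = shapes[-1]
flattened = height * width * channels
```

Under the no-padding assumption the maps shrink from 84 to 20, 9 and 7 pixels per side, so the first fully connected layer receives 7 * 7 * 64 = 3136 inputs.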
t-SNE
Slide11
Last hidden layer – t-SNE
Slide12
Some Results
Can we do better?
Slide13
Dueling Network (2016, Google)
Neural network architecture for model-free RL
Applicable to existing algorithms
Two separate estimators:
state value function
state-dependent action advantage function
Slide14
Comparison between old and new
Slide15
Basic definitions
Mostly the same setup and notation as in the previous paper, except for the introduction of the advantage function: A^π(s, a) = Q^π(s, a) - V^π(s)
Slide16
Illustration
Slide17
Combined output – implementation
A and V are off-target by a constant, but that does not change the relative rank of actions (decision making stays the same)
Increased optimization stability
Slide18
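The combining formula from the "Combined output" slide is not preserved in this text. The sketch below assumes the mean-subtracted form described in the dueling network paper, Q(s, a) = V(s) + (A(s, a) - mean over a' of A(s, a')), and shows why a constant offset in the advantage stream does not change the ranking of actions. Function names and the numbers are illustrative assumptions.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) as Q = V + (A - mean(A))."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + (a - mean_advantage) for a in advantages]

def greedy_action(q_values):
    """Index of the action with the highest Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])

advantages = [0.5, 2.0, -1.0]
q = dueling_q_values(3.0, advantages)

# Shifting every advantage by the same constant leaves Q, and hence the
# greedy decision, unchanged because the mean is subtracted out.
shifted_q = dueling_q_values(3.0, [a + 10.0 for a in advantages])
```

Subtracting the mean also pins the average of the Q-values to V(s), which is one way to see the stability benefit mentioned on the slide.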
Algorithmic change: Prioritized Replay
Instead of sampling uniformly from past experience, increase the sampling probability of experience tuples with high expected learning progress (measured by absolute TD-error)
Idea – faster learning
Slide19
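A minimal sketch of the prioritized sampling idea above: each stored experience is drawn with probability proportional to a power of its absolute TD-error. The exponent `alpha`, the small constant `eps` (so that zero-error tuples can still be drawn), and the function names are assumptions for illustration, loosely following the proportional variant of prioritized experience replay; the importance-sampling correction used there is omitted.

```python
import random

def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Probability of each experience, proportional to (|TD-error| + eps) ** alpha."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def sample_indices(td_errors, batch_size, rng=random, **kwargs):
    """Draw a minibatch of buffer indices (with replacement) by priority."""
    probs = sampling_probabilities(td_errors, **kwargs)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)

td_errors = [0.1, -2.0, 0.5, 0.0]  # hypothetical TD-errors of four stored tuples
probs = sampling_probabilities(td_errors)
batch = sample_indices(td_errors, batch_size=3)
```

Setting `alpha=0` recovers uniform sampling, so the exponent controls how strongly the replay is skewed toward surprising transitions.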
Results – Atari games
Slide20
Conclusions
More resources to V – better approximation of the state values (which is good for several well-known algorithms, e.g. TD-based methods)
Improvement becomes more evident for a large number of possible actions
The new state-of-the-art (in this domain)
Live demo
Slide21
Questions?