Neural Architectures with Memory
Nitish Gupta, Shreya Rajpal
25th April 2017
Story Comprehension
Joe went to the kitchen. Fred went to the kitchen. Joe picked up the milk. Joe travelled to his office. Joe left the milk. Joe went to the bathroom.

Questions from Joe's angry mother:
Q1: Where is Joe?
Q2: Where is the milk now?
Q3: Where was Joe before the office?
Dialogue System
Hello! What can I do for you today?
I’d like to reserve a table for 6.
Sure! When would you like that reservation?
At 7 PM, please.
Okay. What cuisine would you like?
Actually, make that 7:30 PM.
Updated! What cuisine?
Is there anything better than a medium-rare steak?
Nothing at all! Blackdog has a 4.7 on Yelp.
Sounds perfect! Also, add one more person.
Reservation done for 7, 7:30 PM at Blackdog. Enjoy!
(Human and machine turns alternate.)
ML models need memory!
Deeper AI tasks require explicit memory and multi-hop reasoning over it.
RNNs have short memory.
Memory cannot be increased without increasing the number of parameters.
Need for compartmentalized memory.
Read/write should be asynchronous.
Memory Networks (MemNN)
A class of models with memory: an array of objects, where each memory is a dense vector.
Four components:
I - Input feature map: input manipulation
G - Generalization: memory manipulation
O - Output feature map: output representation generator
R - Response: response generator
Memory Networks, Weston et al., ICLR 2015
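As a rough illustration of the I/G/O/R decomposition (a sketch, not the authors' code; the vocabulary size, embedding dimension and dot-product scoring below are made-up placeholders):

import numpy as np

rng = np.random.default_rng(0)
EMB = rng.normal(size=(1000, 20))          # hypothetical vocabulary of 1000 words, dim 20

class MemNN:
    def __init__(self):
        self.memories = []                 # array of objects: each memory is a dense vector

    def I(self, word_ids):
        # Input feature map: bag-of-words sum of word embeddings
        return EMB[word_ids].sum(axis=0)

    def G(self, feat):
        # Generalization: store the new input in the next free memory slot
        self.memories.append(feat)

    def O(self, feat):
        # Output feature map: return the best-scoring supporting memory (dot-product score)
        return max(self.memories, key=lambda m: feat @ m)

    def R(self, out):
        # Response: map the output representation to an answer (here, the best-matching word id)
        return int(np.argmax(EMB @ out))

    def answer(self, word_ids):
        feat = self.I(word_ids)
        self.G(feat)
        return self.R(self.O(feat))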
MemNN
Input feature map: treat the input as a sequence of sentences.
Update memories with each new input.
Memory Networks, Weston et al., ICLR 2015
MemNN
Output representation
If the input is a question, compute the output representation and generate the answer response.
Memory Networks, Weston et al., ICLR 2015
Simple MemNN for Text
Input feature map: bag-of-words representation of each sentence.
Memory Networks, Weston et al., ICLR 2015
Simple MemNN for Text
Generalization: store the input in a new memory slot.
Memories so far (i = 4); memories after 5 inputs.
Memory Networks, Weston et al., ICLR 2015
Simple MemNN for Text
Output: use memory hops with the query (two hops here).
Hop 1: score all memories against the input; take the max-scoring memory index.
Hop 2: score all memories against the input and the first supporting memory; take the max-scoring memory index.
Response (single-word answer): score all words against the query and the two supporting memories; output the max-scoring word.
Memory Networks, Weston et al., ICLR 2015
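A minimal sketch of this two-hop inference, assuming a plain dot-product score over summed vectors (names and shapes are illustrative, not from the paper's code):

import numpy as np

def score(query_vecs, memory_vec):
    # s([x, m_o1, ...], m): dot product between the summed query-side vectors and the memory
    return np.sum(query_vecs, axis=0) @ memory_vec

def respond(query_vec, memories, word_vecs):
    # Hop 1: best supporting memory against the question alone
    o1 = max(range(len(memories)), key=lambda i: score([query_vec], memories[i]))
    # Hop 2: best supporting memory against the question plus the first supporting memory
    o2 = max(range(len(memories)), key=lambda i: score([query_vec, memories[o1]], memories[i]))
    # Response: best single-word answer against the question and both supporting memories
    a = max(range(len(word_vecs)),
            key=lambda w: score([query_vec, memories[o1], memories[o2]], word_vecs[w]))
    return o1, o2, a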
Scoring Function
The scoring function is an embedding model: it is just a dot product between sums of word embeddings!
Memory Networks, Weston et al., ICLR 2015
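Written out, the paper's scoring function has the form (Phi maps text to a bag-of-words vector and U is the learned embedding matrix):

s(x, y) = \Phi_x(x)^{\top} U^{\top} U \, \Phi_y(y)

so U \Phi(x) is simply the sum of the word embeddings of x.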
Input sentences (stored as memories): Joe went to the kitchen. Fred went to the kitchen. Joe picked up the milk. Joe travelled to his office. Joe left the milk. Joe went to the bathroom.
Question: Where is the milk now?
Hop 1 retrieves the 1st supporting memory; hop 2 retrieves the 2nd supporting memory.
Response: Office
Memory Networks, Weston et al., ICLR 2015
Training Objective
Score for the true 1st supporting memory vs. score for a negative memory.
Memory Networks, Weston et al., ICLR 2015
Training Objective
Score for the true 2nd supporting memory vs. score for a negative memory.
Memory Networks, Weston et al., ICLR 2015
Training Objective
Score for the true response vs. score for a negative response.
Memory Networks, Weston et al., ICLR 2015
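Putting the three slides together, the objective is a margin ranking loss over the two supporting memories and the response (gamma is the margin and the barred variables are sampled negatives), as in the paper:

\sum_{\bar f \neq m_{o_1}} \max\big(0,\ \gamma - s_O(x, m_{o_1}) + s_O(x, \bar f)\big)
+ \sum_{\bar f' \neq m_{o_2}} \max\big(0,\ \gamma - s_O([x, m_{o_1}], m_{o_2}) + s_O([x, m_{o_1}], \bar f')\big)
+ \sum_{\bar r \neq r} \max\big(0,\ \gamma - s_R([x, m_{o_1}, m_{o_2}], r) + s_R([x, m_{o_1}, m_{o_2}], \bar r)\big)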
Experiment
Large-scale QA
14M statements of the form (subject, relation, object), stored as memories.
Memory hops; only re-ranked candidates from another system. The output is the highest-scoring memory.

Method: F1
Fader et al. 2013: 0.54
Bordes et al. 2014b: 0.73
Memory Networks (this work): 0.72

Why does the Memory Network perform essentially the same as the previous model?
Memory Networks, Weston et al., ICLR 2015
Experiment
Large-scale QA
14M statements of the form (subject, relation, object).
Memory hops; only re-ranked candidates from another system.

Method: F1
Fader et al. 2013: 0.54
Bordes et al. 2014b: 0.73
Memory Networks (this work): 0.72

Why do Memory Networks not perform as well?
USELESS EXPERIMENT
Useful Experiment
Simulated world QA
4 characters, 3 objects, 5 rooms.
7k statements and 3k questions for training, and the same for testing.
Difficulty 1 (5): the entity in the question is mentioned in the last 1 (5) sentences.
The annotation also provides the intermediate best (supporting) memories.
Memory Networks, Weston et al., ICLR 2015
Limitations
Simple BOW representation.
The simulated question answering dataset is too trivial.
Strong supervision, i.e. labels for the intermediate supporting memories, is needed.
Memory Networks, Weston et al., ICLR 2015
End-to-End Memory Networks (MemN2N)
What if the annotation is only: input sentences, query, answer?
The model performs by:
Generating memories from the inputs
Transforming the query into a suitable representation
Processing the query and memories jointly, using multiple hops, to produce the answer
Backpropagating through the whole procedure
Example: Joe went to the kitchen. Fred went to the kitchen. Joe picked up the milk. Joe travelled to his office. Joe left the milk. Joe went to the bathroom.
Where is the milk now?
Office
End-To-End Memory Networks, Sukhbaatar et al., NIPS 2015
MemN2N
Convert the input to memories: take the BOW input, apply a word-embedding matrix, and sum the word embeddings.
Transform the query into the same representation space.
Also compute output vectors for each input.
End-To-End Memory Networks, Sukhbaatar et al., NIPS 2015
MemN2N
Score the memories against the (transformed) query: each input/memory gets a score.
Generate the output as a weighted average of all the (transformed) inputs.
End-To-End Memory Networks, Sukhbaatar et al., NIPS 2015
MemN2N
Generating the response: combine the averaged output with the query to produce a distribution over response words.
Training objective: maximum likelihood / cross-entropy.
End-To-End Memory Networks, Sukhbaatar et al., NIPS 2015
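A single-hop forward pass can be sketched in a few lines of numpy (the dimensions, initialisation and names here are illustrative assumptions, not the released code):

import numpy as np

d, V = 20, 50                                  # embedding dim and vocabulary size (made up)
rng = np.random.default_rng(0)
A, B, C = (rng.normal(size=(d, V)) for _ in range(3))   # memory, query and output embeddings
W = rng.normal(size=(V, d))                              # final response matrix

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x_bows, q_bow):
    # x_bows: (N, V) bag-of-words inputs; q_bow: (V,) bag-of-words query
    m = x_bows @ A.T             # memories          m_i = A x_i
    c = x_bows @ C.T             # output vectors    c_i = C x_i
    u = B @ q_bow                # transformed query u = B q
    p = softmax(m @ u)           # attention (scores) over memories
    o = p @ c                    # weighted average of the output vectors
    return softmax(W @ (o + u))  # distribution over response words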
Putting it together: generate memories, transform the query, generate outputs, score the memories, form the averaged output, and produce the response.
End-To-End Memory Networks, Sukhbaatar et al., NIPS 2015
Multi-hop MemN2N
The same read operation is stacked over hop 1, hop 2, hop 3, with different memory and output embeddings for each hop.
End-To-End Memory Networks, Sukhbaatar et al., NIPS 2015
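Between hops the query representation is updated with the hop's output, and the answer is read off after the last hop:

u^{k+1} = u^{k} + o^{k}, \qquad \hat{a} = \mathrm{softmax}\big(W\, u^{K+1}\big)

The paper also ties the embedding matrices across hops, either "adjacent" or "layer-wise (RNN-like)".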
Experiments
Simulated world QA: 20 tasks from the bAbI dataset, with 1K and 10K instances per task.
Vocabulary of only 177 words!
60 epochs, learning-rate annealing, linear start with a different learning rate.
"Model diverged very often, hence trained multiple models."
End-To-End Memory Networks, Sukhbaatar et al., NIPS 2015
MemNN vs. MemN2N error rates:
               MemNN   MemN2N
Error % (1k)    6.7     12.4
Error % (10k)   3.2      7.5
End-To-End Memory Networks, Sukhbaatar et al., NIPS 2015
Movie Trivia Time!
Q: Which was Stanley Kubrick's first movie?  A: Fear and Desire
Q: When did 2001: A Space Odyssey release?  A: 1968
Q: After The Shining, which movie did its director direct?  A: Full Metal Jacket

Knowledge base of (subject, relation, object) triples:
(2001:a_space_odyssey, directed_by, stanley_kubrick)
(fear_and_dark, directed_by, stanley_kubrick)
...
(fear_and_dark, released_in, 1953)
(full_metal_jacket, released_in, 1987)
...
(2001:a_space_odyssey, released_in, 1968)
...
(the_shining, directed_by, stanley_kubrick)
...
(AI:artificial_intelligence, written_by, stanley_kubrick)
Knowledge Base?
Incomplete! Too challenging!
Textual knowledge? Combine them using memory networks?
(Same knowledge-base triples as on the previous slide.)
Key-Value MemNNs for Reading Documents
Structured memories as key-value pairs; regular MemNNs have a single vector for each memory.
Keys are more related to the question, values more related to the answer.
Keys and values can be words, sentences, vectors, etc.
Example key-value pair: ("Kubrick's first movie was ...", Fear and Dark)
Key-Value Memory Networks for Directly Reading Documents, Miller et al., EMNLP 2016
KV-MemNN
Retrieve relevant memories using hashing techniques: an inverted index, locality-sensitive hashing, or something sensible, to go from all memories to a set of retrieved relevant memories.
Key-Value Memory Networks for Directly Reading Documents, Miller et al., EMNLP 2016
KV-MemNN
Score the memory keys: embed the query and each key (BOW sum of embeddings), take dot products, and normalise into a distribution over memory keys.
Generate the output as the weighted average of the corresponding memory values.
Key-Value Memory Networks for Directly Reading Documents, Miller et al., EMNLP 2016
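One hop of key addressing and value reading, sketched in numpy (the embedding step is assumed to have already produced dense query/key/value vectors; names are illustrative):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kv_hop(q_emb, key_embs, value_embs):
    # q_emb: (d,) query embedding; key_embs, value_embs: (n, d) embedded memory keys/values
    p = softmax(key_embs @ q_emb)   # address memories by scoring the query against the keys
    return p @ value_embs           # read out the weighted average of the corresponding values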
KV-MemNN - Multiple Hops
In the j-th hop: update the query representation, then perform key addressing again.
In the final hop, generate the response.
Key-Value Memory Networks for Directly Reading Documents, Miller et al., EMNLP 2016
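Concretely, with o_j the value read-out of hop j and R_j a learned matrix, the query is updated as in the paper:

q_{j+1} = R_j \,(q_j + o_j)

and after the final hop the response is the candidate answer that scores highest against the last query representation.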
KV-MemNN – What to store in memories?
KB-based:
Key: (subject, relation); Value: object
K: (2001:a_space_odyssey, directed_by); V: stanley_kubrick
Document-based:
For each entity in a document, extract a 5-word window around it.
Key: window; Value: entity
K: "screenplay written by and"; V: Hampton
Key-Value Memory Networks for Directly Reading Documents, Miller et al., EMNLP 2016
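A sketch of building the document-based memories (tokenisation and entity matching are assumed to be done already; the function name and window handling are illustrative):

def window_memories(tokens, entities, window=5):
    # For each entity mention, the surrounding word window becomes the key
    # and the entity itself becomes the value.
    half = window // 2
    memories = []
    for i, tok in enumerate(tokens):
        if tok in entities:
            key = tokens[max(0, i - half): i + half + 1]
            memories.append((key, tok))
    return memories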
KV-MemNN – Experiments
WikiMovies benchmark: 100K QA pairs in total, 10% for testing.

Method                       KB     Doc
E2E Memory Network           78.5   69.9
Key-Value Memory Network     93.9   76.2
Key-Value Memory Networks for Directly Reading Documents, Miller et al., EMNLP 2016
KV-MemNN
Retrieve the relevant memories, score the relevant memory keys, generate an output from the averaged memory values, and generate the response.
Key-Value Memory Networks for Directly Reading Documents, Miller et al., EMNLP 2016
KV-MemNN
Key-Value Memory Networks for Directly Reading Documents, Miller et al., EMNLP 2016
CNN : Computer Vision :: ________ : NLP
RNN
Key-Value Memory Networks for Directly Reading Documents, Miller et al., EMNLP 2016
Dynamic Memory Networks – The Beast
Use RNNs, specifically GRUs, for every module.
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing, Kumar et al., ICML 2016
DMN
Input module: the final GRU output for each sentence is taken as that sentence's fact representation.
(Slides 41-45 step through the DMN architecture diagram: input module, question module, episodic memory module with attention over the facts, and answer module.)
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing, Kumar et al., ICML 2016
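A sketch of the input module only (an illustrative numpy GRU with made-up parameter shapes; in the real model all modules are learned jointly end to end):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GRUCell:
    def __init__(self, d_in, d_hid, rng):
        self.Wz, self.Wr, self.Wh = (rng.normal(scale=0.1, size=(d_hid, d_in)) for _ in range(3))
        self.Uz, self.Ur, self.Uh = (rng.normal(scale=0.1, size=(d_hid, d_hid)) for _ in range(3))

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h)               # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h)               # reset gate
        h_tilde = np.tanh(self.Wh @ x + self.Uh @ (r * h))   # candidate state
        return z * h_tilde + (1 - z) * h

def extract_facts(word_vecs, sentence_ends, d_hid=20, rng=np.random.default_rng(0)):
    # Run the GRU over the story's word vectors; the hidden state at each
    # end-of-sentence position becomes that sentence's fact vector.
    gru = GRUCell(word_vecs.shape[1], d_hid, rng)
    h, facts = np.zeros(d_hid), []
    for t, x in enumerate(word_vecs):
        h = gru.step(x, h)
        if t in sentence_ends:
            facts.append(h)
    return facts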
DMN
How many GRUs were used with 2 hops?
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing, Kumar et al., ICML 2016
DMN – Qualitative Results
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing, Kumar et al., ICML 2016
Algorithm Learning
Neural Turing Machine
Copy task: implement the algorithm "given a list of numbers at the input, reproduce the list at the output".
The Neural Turing Machine learns:
What to write to memory
When to write to memory
When to stop writing
Which memory cell to read from
How to convert the result of a read into the final output
Neural Turing Machines
A controller receives external input and produces external output, and interacts with a memory matrix through 'blurry' read heads and write heads.
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machines
'Blurry' memory addressing (at time instant t): the memory M_t is an N x M matrix, and the attention weights w_t form a distribution over the N locations, e.g. w_t = (0.1, 0.2, 0.5, 0.1, 0.1).
Soft attention (Lectures 2, 3, 20, 24).
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machines
More formally, the blurry read operation.
Given: M_t (memory matrix) of size N x M, w_t (weight vector) of length N, t (time index).
Neural Turing Machines, Graves et al., arXiv:1410.5401
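The read-out is simply the attention-weighted sum of the memory rows:

r_t = \sum_{i=1}^{N} w_t(i)\, M_t(i), \qquad \sum_i w_t(i) = 1, \quad 0 \le w_t(i) \le 1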
Neural Turing Machines: Blurry Writes
Blurry write operation: decomposed into a blurry erase plus a blurry add.
Given: M_t (memory matrix) of size N x M, w_t (weight vector) of length N, t (time index), e_t (erase vector) of length M, a_t (add vector) of length M.
Erase component, then add component.
Neural Turing Machines, Graves et al., arXiv:1410.5401
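In numpy, the erase-then-add write looks like this (the weight, erase and add vectors are the example values from the following slides; the memory contents are placeholders):

import numpy as np

N, M = 5, 5
M_prev = np.arange(N * M, dtype=float).reshape(N, M)   # placeholder memory (slide's M_0 values omitted)
w = np.array([0.1, 0.2, 0.5, 0.1, 0.1])                # attention weights over the N locations
e = np.array([1.0, 0.7, 0.2, 0.5, 0.0])                # erase vector, length M
a = np.array([3.0, 4.0, -2.0, 0.0, 2.0])               # add vector, length M

M_erased = M_prev * (1.0 - np.outer(w, e))   # erase:  M~_t(i) = M_{t-1}(i) * (1 - w_t(i) e_t)
M_next = M_erased + np.outer(w, a)           # add:    M_t(i)  = M~_t(i) + w_t(i) a_t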
Neural Turing Machines: Erase
Worked example: memory matrix M_0 with N = 5 locations, attention weights w_1 = (0.1, 0.2, 0.5, 0.1, 0.1), and erase vector e_1 = (1.0, 0.7, 0.2, 0.5, 0.0) (matrix values shown on the slide).
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machines: Erase
After the erase step, each memory location i has been scaled elementwise by (1 - w_1(i) e_1) (resulting values shown on the slide).
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machines: Addition
The add step then adds w_1(i) a_1 to each location i, with add vector a_1 = (3, 4, -2, 0, 2) (values shown on the slide).
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machines: Blurry Writes
The resulting memory matrix M_1 after the blurry erase and add (values shown on the slide).
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machines: Demo
Demonstration: training on the copy task.
Figure from Snips AI's Medium post.
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machines: Attention Model
Generating w_t:
Content based. Example: a QA task; score sentences by similarity with the question, and take the weights as a softmax of the similarity scores.
Location based. Example: the copy task; move to address (i+1) after writing to index (i), so the weights act like transition probabilities.
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machine: Attention Model
Steps for generating w_t from the previous state and the controller outputs:
Content Addressing (CA)
Peaking
Interpolation (I)
Convolutional Shift (CS) (location addressing)
Sharpening (S)
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machine: Attention Model
The controller produces a key vector k_t of length M, used together with the previous state for addressing.
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machine: Attention Model
Step 1: Content Addressing (CA).
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machine: Attention Model
Step 2: Peaking.
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machine: Attention Model
Step 3: Interpolation (I).
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machine: Attention Model
Step 4: Convolutional Shift (CS).
The controller outputs a normalized distribution over all N possible shifts, and the rotation-shifted weights are computed by convolving the interpolated weights with it.
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machine: Attention Model
Step 5: Sharpening (S).
The controller also outputs a sharpening exponent that is used to sharpen the shifted weights into the final w_t.
Neural Turing Machines, Graves et al., arXiv:1410.5401
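Collecting the five steps in the paper's notation (beta_t is the key strength from the peaking step, g_t the interpolation gate, s_t the shift distribution, gamma_t the sharpening exponent, and K a cosine similarity):

w_t^{c}(i) = \frac{\exp\big(\beta_t\, K[k_t, M_t(i)]\big)}{\sum_j \exp\big(\beta_t\, K[k_t, M_t(j)]\big)}
w_t^{g} = g_t\, w_t^{c} + (1 - g_t)\, w_{t-1}
\tilde{w}_t(i) = \sum_{j=0}^{N-1} w_t^{g}(j)\, s_t(i - j) \quad (\text{indices taken mod } N)
w_t(i) = \frac{\tilde{w}_t(i)^{\gamma_t}}{\sum_j \tilde{w}_t(j)^{\gamma_t}}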
Neural Turing Machine: Controller Design
Feed-forward controller: faster, more transparency and interpretability about the function learnt.
LSTM controller: more expressive power, does not limit the number of computations per time step.
Both are end-to-end differentiable: reading/writing are convex sums, and the generation of w_t is smooth.
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machine: Network Overview
Unrolled feed-forward controller.
Figure from Snips AI's Medium post.
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machines vs. MemNNs
MemNNs: memory is static, with the focus on retrieving (reading) information from memory.
NTMs: memory is continuously written to and read from, with the network learning when to perform memory reads and writes.
Neural Turing Machines: Experiments
Task            Network size (NTM w/ LSTM)   Network size (LSTM)   Parameters (NTM w/ LSTM)   Parameters (LSTM)
Copy            3 x 100                      3 x 256               67K                        1.3M
Repeat Copy     3 x 100                      3 x 512               66K                        5.3M
Associative     3 x 100                      3 x 256               70K                        1.3M
N-grams         3 x 100                      3 x 128               61K                        330K
Priority Sort   2 x 100                      3 x 128               269K                       385K
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machines: ‘Copy’ Learning Curve
Trained on 8-bit sequences with 1 <= sequence length <= 20.
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machines: ‘Copy’ Performance
LSTM vs. NTM outputs on the copy task.
Neural Turing Machines, Graves et al., arXiv:1410.5401
Neural Turing Machines triggered an outbreak of Memory Architectures!
Dynamic Neural Turing Machines
Experimented with addressing schemes:
Dynamic addresses: addresses of memory locations are learnt in training, allowing non-linear location-based addressing.
Least recently used weighting: prefer the least recently used memory locations, interpolated with content-based addressing.
Discrete addressing: sample the memory location from the content-based distribution to obtain a one-hot address.
Multi-step addressing: allows multiple hops over memory.

Results (bAbI QA task):
          Location NTM   Content NTM   Soft DNTM   Discrete DNTM
1-step    31.4%          33.6%         29.5%       27.9%
3-step    32.8%          32.7%         24.2%       21.7%

Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes, Gulcehre et al., arXiv:1607.00036
Stack Augmented Recurrent Networks
Learn algorithms based on stack implementations (e.g. learning fixed sequence generators).
Uses a stack data structure to store memory (as opposed to a memory matrix).
Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets, Joulin et al., arXiv:1503.01007
Stack Augmented Recurrent Networks
Blurry 'push' and 'pop' operations on the stack (a sketch follows below).
Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets, Joulin et al., arXiv:1503.01007
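A sketch of one blurry stack update in numpy, in the spirit of the paper (the parametrisation of the pushed value and the action probabilities is simplified; a_push + a_pop = 1):

import numpy as np

def soft_stack_update(stack, a_push, a_pop, new_top):
    shifted_down = np.roll(stack, 1)       # what the stack would look like after a push
    shifted_down[0] = new_top
    shifted_up = np.roll(stack, -1)        # what the stack would look like after a pop
    shifted_up[-1] = 0.0
    return a_push * shifted_down + a_pop * shifted_up   # blurry mixture of push and pop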
Differentiable Neural Computers
Advanced addressing mechanisms:
Content-based addressing.
Temporal addressing: maintains a notion of sequence in addressing via a temporal link matrix L (size N x N), where L[i, j] is the degree to which location i was written to after location j.
Usage-based addressing.
Hybrid computing using a neural network with dynamic external memory, Graves et al., Nature vol. 538
DNC: Usage Based Addressing
Writing to a cell increases its usage; reading from it decreases its usage.
The least-used location gets the highest usage-based weighting.
The final write weights interpolate between the usage-based and content-based weights.
Hybrid computing using a neural network with dynamic external memory, Graves et al., Nature vol. 538
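Roughly, in the paper's notation: the read heads emit free gates f_t^i that can release the locations they just read, giving a retention vector psi_t, and usage then grows wherever the write head wrote:

\psi_t = \prod_{i=1}^{R} \big(1 - f_t^{i}\, \mathbf{w}_{t-1}^{r,i}\big), \qquad \mathbf{u}_t = \big(\mathbf{u}_{t-1} + \mathbf{w}_{t-1}^{w} - \mathbf{u}_{t-1} \circ \mathbf{w}_{t-1}^{w}\big) \circ \psi_t

Allocation weights then favour the locations with the lowest usage.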
DNC: Example
Hybrid computing using a neural network with dynamic external memory, Graves et al., Nature vol. 538
DNC: Improvements over NTMs
NTM: large contiguous blocks of memory are needed, and there is no way to free up memory cells after writing.
DNC: memory locations are non-contiguous and usage-based, with regular de-allocation based on usage tracking.
Hybrid computing using a neural network with dynamic external memory, Graves et al., Nature vol. 538
DNC: Experiments
Graph tasks. Graph representation: (source, edge, destination) tuples.
Types of tasks:
Traversal: perform a walk on the graph given a source and a list of edges.
Shortest path: given a source and a destination.
Inference: given a source and a relation over edges, find the destination.
Hybrid computing using a neural network with dynamic external memory, Graves et al., Nature vol. 538
DNC: Experiments
Graph tasks: training over 3 phases.
Graph description phase: (source, edge, destination) tuples are fed into the network.
Query phase: shortest path (source, ____, destination); inference (source, hybrid relation, ___); traversal (source, relation, relation, ..., ___).
Answer phase: target responses are provided at the output.
Trained on random graphs of maximum size 1000.
Hybrid computing using a neural network with dynamic external memory, Graves et al., Nature vol. 538
DNC: Experiments
Graph tasks: London Underground.
Hybrid computing using a neural network with dynamic external memory, Graves et al., Nature vol. 538
DNC: Experiments
Graph tasks: London Underground (input phase).
Hybrid computing using a neural network with dynamic external memory, Graves et al., Nature vol. 538
DNC: Experiments
Graph tasks: London Underground. Traversal task: query phase and answer phase.
Hybrid computing using a neural network with dynamic external memory, Graves et al., Nature vol. 538
DNC: Experiments
Graph tasks: London Underground. Shortest path task: query phase and answer phase.
Hybrid computing using a neural network with dynamic external memory, Graves et al., Nature vol. 538
DNC: Experiments
Graph tasks: Freya's family tree.
Hybrid computing using a neural network with dynamic external memory, Graves et al., Nature vol. 538
Conclusion
Machine learning models require memory and multi-hop reasoning to perform AI tasks better.
Memory Networks for text are an interesting direction, but are still very simple.
Generic architectures with memory, such as the Neural Turing Machine, have so far been demonstrated only on limited applications.
Future directions should focus on applying generic neural models with memory to more AI tasks.
Reading List
Karol Kurach, Marcin Andrychowicz & Ilya Sutskever, Neural Random-Access Machines, ICLR 2016
Emilio Parisotto & Ruslan Salakhutdinov, Neural Map: Structured Memory for Deep Reinforcement Learning, arXiv 2017
Pritzel et al., Neural Episodic Control, arXiv 2017
Oriol Vinyals, Meire Fortunato, Navdeep Jaitly, Pointer Networks, arXiv 2017
Jack W. Rae et al., Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes, arXiv 2016
Antoine Bordes, Y-Lan Boureau, Jason Weston, Learning End-to-End Goal-Oriented Dialog, ICLR 2017
Junhyuk Oh, Valliappa Chockalingam, Satinder Singh, Honglak Lee, Control of Memory, Active Perception, and Action in Minecraft, ICML 2016
Wojciech Zaremba, Ilya Sutskever, Reinforcement Learning Neural Turing Machines, arXiv 2016