Slide1
Considerations for Evaluating Models of Language Understanding and Reasoning
Gabriel Recchia
University of Cambridge
Slide2
Background: The bAbI dataset
Introducing the GABITS dataset (http://nowin2d.com/gabits/)
Slide3
Some history
Slide4
Some history
(slide from Bordes, Weston, Chopra, Mikolov, Joulin & Bottou, 2015)
Slide5
Generating Process
Training Set
Test Set
Slide6
Facebook’s bAbI dataset
(slide from Bordes, Weston, Chopra, Mikolov, Joulin & Bottou, 2015)
Slide7
The bAbI dataset
(slide from Bordes, Weston, Chopra, Mikolov, Joulin & Bottou, 2015)
Slide8
Introducing GABITS
The Grounded and bAbI-Inspired Task Set
Slide9
Each training instance consists of:
A narrative
A group of questions and associated answers
An image illustrating the state of the world at every point when something changes state
A symbolic representation of the state of the world at every point when something changes state (optional)
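As a rough illustration of the four components listed above, a training instance could be held in a structure like the following. This is only a sketch: the class and field names (`TrainingInstance`, `Question`, `image_paths`, `symbolic_states`) are hypothetical, and storing images as file paths is an assumption, not something the slides specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Question:
    line_no: int   # position within the instance, e.g. 14
    task_id: str   # task label as shown on the slides, e.g. "T1.a"
    text: str      # the question itself
    answer: str    # multi-item answers are comma-separated, e.g. "Eve,Frank"


@dataclass
class TrainingInstance:
    narrative: List[str]                         # numbered sentences, in order
    questions: List[Question]                    # questions with their answers
    image_paths: List[str]                       # assumption: one image file per state change
    symbolic_states: Optional[List[str]] = None  # optional, one dump per state change


example = TrainingInstance(
    narrative=["The lamp is in the kitchen.", "Carol is in the kitchen."],
    questions=[Question(3, "T2.a", "Where is Carol?", "kitchen")],
    image_paths=["state_000.png"],  # hypothetical file name
)
```

Leaving `symbolic_states` as `None` mirrors the "(optional)" note on the last component.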
Introducing GABITS
The Grounded and bAbI-Inspired Task Set
Slide10
1 The lamp is in the kitchen.
2 The ball is in the dining room.
3 Eve is in the hall.
4 Carol is in the kitchen.
5 Frank is in the hall.
6 Carol got the lamp.
7 Eve went to the kitchen.
8 Eve travelled to the billiard room.
9 Frank travelled to the kitchen.
10 Eve went to the kitchen.
11 Carol travelled to the billiard room.
12 Carol discarded the lamp.
13 Carol grabbed the lamp.
Narrative
Slide11
14 (T1.a) Who is in the kitchen? Eve,Frank
15 (T2.a) Where is Eve? kitchen
16 (T3.a) What is Carol holding? lamp
17 (T12.3) How many objects is Carol holding? one
18 (T3.b) Who is holding the lamp? Carol
19 (T3.b) Who is holding the ball? no one
20 (T4.a) What has Carol held? lamp
21 (T4.b) Who has held the lamp? Carol
22 (T6) Who moved the lamp to the billiard room? Carol
23 (T7.a) Where has Eve been? billiard room,hall,kitchen
24 (T7.a) Where has Frank been? hall,kitchen
25 (T7.c) Where has the lamp been? billiard room,kitchen
26 (T7.b) Where has Eve not been? dining room
29 (T8.a) Who has been in the hall? Eve,Frank
30 (T8.c) What has been in the kitchen? lamp
32 (T8.c) What has been in the billiard room? lamp
33 (T8.b) Who has not been in the billiard room? Frank
Questions
Slide12
27 (T12.8) How many people have been in the kitchen? three
28 (T13.8) Have fewer than four people been in the kitchen? yes
31 (T13.8) Have fewer than three objects been in the kitchen? yes
37 (T11) Who has been in the hall or the dining room (but not both)? Eve,Frank
38 (T9.a) Who has been in the dining room or the kitchen (or both)? Carol,Eve,Frank
42 (T13.9) Have more than five people been in the billiard room or the hall or both? no
Questions (cont.)
Slide13
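To make the link between the example narrative and its answers concrete, here is a minimal sketch that replays the 13 narrative sentences and recomputes several of the answers shown on the question slides. It is not the GABITS generator: the regular expressions, the `replay` function, and the rule that a held object moves with its holder are assumptions inferred from this one example.

```python
import re
from collections import defaultdict

NARRATIVE = [
    "The lamp is in the kitchen.", "The ball is in the dining room.",
    "Eve is in the hall.", "Carol is in the kitchen.", "Frank is in the hall.",
    "Carol got the lamp.", "Eve went to the kitchen.",
    "Eve travelled to the billiard room.", "Frank travelled to the kitchen.",
    "Eve went to the kitchen.", "Carol travelled to the billiard room.",
    "Carol discarded the lamp.", "Carol grabbed the lamp.",
]

# Sentence patterns (assumed from the example; the real grammar may be larger).
PLACE = re.compile(r"^(The )?(\w+) is in the (.+)\.$")
MOVE = re.compile(r"^(\w+) (?:went|travelled|moved) to the (.+)\.$")
TAKE = re.compile(r"^(\w+) (?:got|grabbed) the (\w+)\.$")
DROP = re.compile(r"^(\w+) discarded the (\w+)\.$")


def replay(narrative):
    """Replay the sentences and return the final state plus visit history."""
    room, holder, visited, agents = {}, {}, defaultdict(set), set()

    def put(entity, where):
        room[entity] = where
        visited[entity].add(where)

    for sentence in narrative:
        m = PLACE.match(sentence)
        if m:
            article, name, where = m.groups()
            if article:                      # "The lamp ..." introduces an object
                holder.setdefault(name, None)
            else:                            # a bare name introduces a person
                agents.add(name)
            put(name, where)
            continue
        m = MOVE.match(sentence)
        if m:
            agent, where = m.groups()
            put(agent, where)
            for item, owner in holder.items():
                if owner == agent:           # assumption: held objects move too
                    put(item, where)
            continue
        m = TAKE.match(sentence)
        if m:
            holder[m.group(2)] = m.group(1)
            continue
        m = DROP.match(sentence)
        if m:
            holder[m.group(2)] = None
            continue
        raise ValueError("unrecognised sentence: " + sentence)
    return {"room": room, "holder": holder, "visited": dict(visited), "agents": agents}


def fmt(names):
    """Render a set of names the way the answers are written: sorted, comma-joined."""
    return ",".join(sorted(names))


world = replay(NARRATIVE)


def been_in(place):
    """People who have been in the given room at any point."""
    return {a for a in world["agents"] if place in world["visited"][a]}


in_kitchen = {a for a in world["agents"] if world["room"][a] == "kitchen"}
print(fmt(in_kitchen))                               # question 14
print(fmt(world["visited"]["lamp"]))                 # question 25
print(len(been_in("kitchen")))                       # question 27
print(fmt(been_in("hall") ^ been_in("dining room"))) # question 37 (exclusive or)
```

The later questions are plain set operations over the same visit history: `^` for "but not both", `|` for "or both", and `len` for the counting questions.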
Visual representation of world
Slide14
1 The lamp is in the kitchen.
2 The ball is in the dining room.
3 Eve is in the hall.
4 Carol is in the kitchen.
5 Frank is in the hall.
Slide15
6 Carol got the lamp.
Slide16
7 Eve moved to the kitchen.
Slide17
Symbolic representation of world
time: 65
(Carol took the lamp)
agent0.name Eve
agent0.x 149
agent0.y 324
(agent0.room hall)
agent1.name Carol
agent1.x 284
agent1.y 414
(agent1.room kitchen)
agent2.name Frank
agent2.x 170
agent2.y 414
(agent2.room hall)
item0.name lamp
item0.x 278
item0.y 408
(item0.room kitchen)
(item0.owner Carol)
item1.name ball
item1.x 118
item1.y 52
(item1.room dining room)
(item1.owner null)
Slide18
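The symbolic state dump shown above is regular enough to load with a few lines of code. The following is a sketch only, assuming one `key value` pair per line as in the example. The function name `parse_state` is hypothetical, and because the slides do not say what the parentheses mean, this sketch simply strips them and treats a parenthesised line without a dotted key as the event description.

```python
SAMPLE = """\
time: 65
(Carol took the lamp)
agent0.name Eve
agent0.x 149
agent0.y 324
(agent0.room hall)
item0.name lamp
(item0.room kitchen)
(item0.owner Carol)
item1.name ball
(item1.room dining room)
(item1.owner null)
"""


def parse_state(text):
    """Parse one symbolic state dump into a nested dictionary."""
    state = {"time": None, "event": None, "entities": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("time:"):
            state["time"] = int(line.split(":", 1)[1])
            continue
        if line.startswith("(") and line.endswith(")"):
            line = line[1:-1]                     # drop the parentheses
            if "." not in line.split(" ", 1)[0]:  # no "entity.attr" key: an event
                state["event"] = line
                continue
        key, value = line.split(" ", 1)           # values may contain spaces
        entity, attr = key.split(".", 1)
        if value == "null":
            value = None
        elif attr in ("x", "y"):
            value = int(value)                    # pixel coordinates
        state["entities"].setdefault(entity, {})[attr] = value
    return state


state = parse_state(SAMPLE)
print(state["time"], state["event"])
print(state["entities"]["item0"])
```

Splitting on the first space only keeps multi-word room names such as "dining room" intact.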
self-contained: all or nearly all of the information necessary to perform well at the task is present within the training data
It should be obviously possible for a human to solve the task even if they do not speak the language in which the task is rendered
Advantages
Slide19
incremental and compositional: questions build on each other
Advantages
Who is in the hall?
Who has been in the hall?
Who has not been in the hall?
Who has been in the hall and the lounge?
How many people have been in the hall?
How many people have been in the hall and the lounge?
Slide20
wide-coverage: the tasks in the dataset correspond to diverse abilities
For even wider coverage - even more tasks!
In recent years, there has been an increasing number of papers with (mostly) self-contained tasks involving two- or three-dimensional spatial representations
Contact me for our list so far!
Or to let me know about more tasks to add to the collection!
Advantages