Slide1
Dhruv Batra (Virginia Tech)
Larry Zitnick (Facebook AI Research)
Devi Parikh (Virginia Tech)
Stanislaw Antol (Virginia Tech)
Aishwarya Agrawal (Virginia Tech)
Overview of Challenge
Slide2
Outline
Overview of Task and Dataset
Overview of Challenge
Winner Announcements
Analysis of Results
Slide7
VQA Task
What is the mustache made of?
AI System
bananas
Slide11
Real images (from COCO)
Tsung-Yi Lin et al. “Microsoft COCO: Common Objects in COntext.” ECCV 2014.
http://mscoco.org/
Slide12
and abstract scenes.
Slide13
Questions
Stump a smart robot!
Ask a question that a human can answer,
but a smart robot probably can’t!
Slide14
VQA Dataset
Slide15
Dataset Stats
>250K images (COCO + 50K Abstract Scenes)
>750K questions (3 per image)
~10M answers (10 w/ image + 3 w/o image)
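The three counts above are consistent with each other, which a few lines of arithmetic make visible. This is only a sanity check on the rounded figures from the slide; the variable names are illustrative and not part of any released tooling.

```python
# Rounded figures from the slide; names are illustrative only.
images = 250_000
questions_per_image = 3
answers_with_image = 10      # answers from annotators who saw the image
answers_without_image = 3    # answers from annotators who did not see it

questions = images * questions_per_image
answers = questions * (answers_with_image + answers_without_image)

print(f"questions: {questions:,}")
print(f"answers:   {answers:,}")
```

With these rounded inputs the answer count comes out just under ten million, in line with the "~10M" on the slide.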
Slide16
Two modalities of answering
Open Ended
Multiple Choice (18 choices)
1 correct answer
3 plausible choices
10 most popular answers
Rest random answers
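The multiple-choice composition listed above (18 choices made of the correct answer, plausible answers, popular answers, and random fill) can be sketched as follows. The function name, the duplicate handling, and the toy inputs are assumptions made for illustration; the slides do not describe how the actual lists were assembled.

```python
import random

def build_choices(correct, plausible, popular, answer_pool, total=18, seed=0):
    """Assemble a choice list: correct + plausible + popular + random fill."""
    rng = random.Random(seed)
    choices = []
    # Add candidates in priority order, skipping duplicates (an assumption).
    for candidate in [correct, *plausible[:3], *popular[:10]]:
        if candidate not in choices:
            choices.append(candidate)
    # Fill the remaining slots with random answers drawn from the pool.
    remaining = [a for a in answer_pool if a not in choices]
    rng.shuffle(remaining)
    choices.extend(remaining[: total - len(choices)])
    rng.shuffle(choices)  # hide the position of the correct answer
    return choices

# Toy inputs, invented for this example.
popular = ["yes", "no", "2", "1", "white", "3", "red", "blue", "4", "green"]
pool = [f"answer_{i}" for i in range(50)]
choices = build_choices("bananas", ["hair", "fruit", "plastic"], popular, pool)
print(len(choices), "bananas" in choices)
```

With 1 correct, 3 plausible, and 10 popular answers, four slots are left for random answers.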
Slide17
Accuracy Metric
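The formula on this slide was an image and did not survive extraction. The metric defined in the VQA paper (Antol et al., ICCV 2015) scores an answer as min(number of humans who gave that answer / 3, 1). A minimal sketch follows, assuming exact string matching; the official evaluation code also normalizes answers before comparing, which is omitted here, and the annotator answers below are made up.

```python
def vqa_accuracy(predicted, human_answers):
    """Return min(#humans who gave `predicted` / 3, 1)."""
    matches = sum(1 for answer in human_answers if answer == predicted)
    return min(matches / 3.0, 1.0)

# Ten hypothetical annotator answers for one question.
human_answers = ["banana"] * 7 + ["bananas"] * 2 + ["fruit"]

for guess in ["banana", "bananas", "hair"]:
    print(guess, round(vqa_accuracy(guess, human_answers), 2))
```

An answer given by at least three annotators gets full credit, while an answer given by one or two gets partial credit.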
Slide18
Human Accuracy (Real)
                 Overall  Yes/No  Number  Other
Open Ended       83.30    95.77   83.39   72.67
Multiple Choice  91.54    97.40   86.97   87.91
Slide20
Human Accuracy (Abstract)
                 Overall  Yes/No  Number  Other
Open Ended       87.49    95.96   95.04   75.33
Multiple Choice  93.57    97.78   96.71   88.73
Slide22
Outline
Overview of Task and Dataset
Overview of Challenge
Winner Announcements
Analysis of Results
Slide23
VQA Challenges on www.codalab.org
Real: Open Ended
Real: Multiple Choice
Abstract: Open Ended
Abstract: Multiple Choice
Slide25
Real Image Challenges: Dataset
            Images  Questions  Answers
Training    80K     240K       2.4M
Validation  40K     120K       1.2M
Test        80K     240K
Dataset size is approximate
Slide28
Real Image Challenges: Test Dataset
80K test images
Four splits of 20K images each
Test-dev (development): Debugging and validation. Unlimited submissions to the evaluation server.
Test-standard (publications): Used to score entries for the Public Leaderboard.
Test-challenge (competitions): Used to rank challenge participants.
Test-reserve (check overfitting): Used to estimate overfitting. Scores on this set are never released.
Slide adapted from: MSCOCO Detection/Segmentation Challenge, ICCV 2015
Dataset size is approximate
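One way to picture the four test splits is a seeded shuffle of the image ids dealt into equal, disjoint parts. This is a hypothetical sketch: the slides do not say how the real splits were drawn, and the function name and seed are assumptions.

```python
import random

SPLIT_NAMES = ["test-dev", "test-standard", "test-challenge", "test-reserve"]

def split_test_images(image_ids, seed=0):
    """Shuffle image ids and deal them into four equal, disjoint splits."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    size = len(ids) // len(SPLIT_NAMES)
    return {
        name: ids[i * size:(i + 1) * size]
        for i, name in enumerate(SPLIT_NAMES)
    }

# 80K placeholder ids standing in for the real test images.
splits = split_test_images(range(80_000))
for name, ids in splits.items():
    print(name, len(ids))
```

Keeping the splits disjoint is what lets test-reserve act as an overfitting check: no submission is ever scored publicly on those images.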
Slide30
Abstract Scene Challenges: Dataset
            Images  Questions  Answers
Training    20K     60K        0.6M
Validation  10K     30K        0.3M
Test        20K     60K
Slide33
Outline
Overview of Task and Dataset
Overview of Challenge
Winner Announcements
Analysis of Results
Slide34
Award GPUs!!!
Slide35
Abstract Scene Challenges
Open-Ended Challenge: 5 teams, 5 institutions, 3 countries
Multiple-Choice Challenge: 4 teams, 4 institutions, 3 countries
Top 3 teams are the same for Open Ended and Multiple Choice
Slide36
Abstract Scene Challenges
Winner Team: MIL-UT
Andrew Shin*, Kuniaki Saito*, Yoshitaka Ushiku, Tatsuya Harada
Open Ended Challenge Accuracy: 67.39
Multiple Choice Challenge Accuracy: 71.18
Slide37
Real Image Challenges
Open-Ended Challenge: 25 teams, 26 institutions, 8 countries
Multiple-Choice Challenge: 15 teams, 17 institutions, 6 countries
Top 5 teams are the same for Open Ended and Multiple Choice
Slide38
Real Image Challenges
Honorable Mention: Brandeis
Aaditya Prakash
Open Ended Challenge Accuracy: 62.80
Multiple Choice Challenge Accuracy: 65.17
Slide39
Real Image Challenges
Runner-Up Team: Naver Labs
Hyeonseob Nam, Jeonghee Kim
Open Ended Challenge Accuracy: 64.89
Multiple Choice Challenge Accuracy: 69.37
Slide40
Real Image Challenges
Winner Team: UC Berkeley & Sony
Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach
Open Ended Challenge Accuracy: 66.90
Multiple Choice Challenge Accuracy: 70.52
Slide41
Outline
Overview of Task and Dataset
Overview of Challenge
Winner Announcements
Analysis of Results
Slide42
Real Open-Ended Challenge
ICCV15
arXiv v6
Slide43
Real Open-Ended Challenge
+12.76% absolute
Slide44
Statistical Significance
Bootstrap samples 5000 times
@ 99% confidence
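A bootstrap test with 5000 resamples at 99% confidence can be sketched as below. The slide does not say which statistic was resampled or whether the resampling was paired across methods, so this example assumes a paired bootstrap over questions and a percentile confidence interval on the accuracy difference. The function names and the toy scores are illustrative.

```python
import random

def bootstrap_diff_ci(scores_a, scores_b, n_resamples=5000,
                      confidence=0.99, seed=0):
    """Percentile confidence interval for mean(scores_a) - mean(scores_b).

    Both lists hold per-question accuracies, aligned so that index i
    refers to the same question for both methods.
    """
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = []
    for _ in range(n_resamples):
        # Resample questions with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(scores_a[i] - scores_b[i] for i in idx) / n)
    diffs.sort()
    alpha = 1.0 - confidence
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

def significantly_different(scores_a, scores_b, **kwargs):
    """True when the confidence interval excludes zero."""
    lo, hi = bootstrap_diff_ci(scores_a, scores_b, **kwargs)
    return lo > 0 or hi < 0

# Toy per-question scores for two hypothetical methods.
method_a = [1.0] * 80 + [0.0] * 20
method_b = [1.0] * 50 + [0.0] * 50
print(significantly_different(method_a, method_b))
```

Pairing the resamples by question removes the variance that both methods share from question difficulty, which is why it is a common choice for leaderboard comparisons.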
Slide45
Real Open-Ended Challenge
Slide46
Easy vs. Difficult Questions
(Real Open-Ended Challenge)
Slide48
Easy vs. Difficult Questions
(Real Open-Ended Challenge)
80.6% of questions can be answered by at least 1 method!
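The "answered by at least 1 method" figure is a union over methods, computed per question. A minimal sketch, assuming each method's result on a question has already been reduced to correct or incorrect; the slides do not state the threshold used for that, and the method names and data below are invented.

```python
def fraction_answered_by_any(correct_by_method):
    """Fraction of questions that at least one method answers correctly.

    correct_by_method maps a method name to a list of booleans, one per
    question, with all lists in the same question order.
    """
    per_question = zip(*correct_by_method.values())
    solved = [any(flags) for flags in per_question]
    return sum(solved) / len(solved)

# Toy results for three hypothetical methods on five questions.
toy = {
    "method_a": [True, False, False, True, False],
    "method_b": [False, False, True, True, False],
    "method_c": [True, False, False, False, False],
}
fraction = fraction_answered_by_any(toy)
print(fraction)
```

Questions that no method gets right are the "difficult" ones in this analysis.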
Difficult Questions
Slide49
Easy vs. Difficult Questions
(Real Open-Ended Challenge)
Difficult Questions
Easy Questions
Slide50
Difficult Questions with Rare Answers
What is the name of …
What is the number on …
What is written on the …
What does the sign say?
What time is it?
What kind of …
What type of …
Why …
Slide52
Easy vs. Difficult Questions
(Real Open-Ended Challenge)
Difficult Questions with Frequent Answers
Easy Questions
Slide54
Success Cases
Q: What is the woman holding?
GT A: laptop
Machine A: laptop
Q: Is this a casino?
GT A: no
Machine A: no
Q: Is it going to rain soon?
GT A: yes
Machine A: yes
Q: What room is the cat located in?
GT A: kitchen
Machine A: kitchen
Slide55
Failure Cases
Q: What is the woman holding?
GT A: book
Machine A: knife
Q: Is the hydrant painted a new color?
GT A: yes
Machine A: no
Q: Why is there snow on one side of the stream and clear grass on the other?
GT A: shade
Machine A: yes
Q: Where is the blue and white umbrella?
GT A: on left
Machine A: right
Slide56
Easy vs. Difficult Questions
(Real Open-Ended Challenge)
Difficult Questions
Easy Questions
Slide58
Answer Type and Question Type Analyses
Per Answer Type: No team statistically significantly better than winner
Per Question Type: No team statistically significantly better than winner
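A per-answer-type breakdown like the one summarized above groups per-question accuracies by type and averages within each group. The sketch below assumes the three answer types used in the human accuracy tables (Yes/No, Number, Other); the function name and the toy records are illustrative.

```python
from collections import defaultdict

def accuracy_by_type(records):
    """Average accuracy (in percent) per answer type.

    records is an iterable of (answer_type, accuracy) pairs, one per question.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for answer_type, accuracy in records:
        totals[answer_type] += accuracy
        counts[answer_type] += 1
    return {t: 100.0 * totals[t] / counts[t] for t in totals}

# Toy per-question results, invented for this example.
records = [
    ("yes/no", 1.0), ("yes/no", 1.0),
    ("number", 0.0), ("number", 1.0),
    ("other", 1.0), ("other", 0.0), ("other", 0.0), ("other", 1.0),
]
breakdown = accuracy_by_type(records)
print(breakdown)
```

The same grouping works for question types by swapping the key, for example the first few words of each question.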
Slide59
Results of the Poll
25 responses
Slide60
Image Modelling
Slide61
Question Modelling
Slide62
Question Word Modelling
Slide63
Attention on Images
Slide64
Attention on Questions
Slide65
Use of Ensemble
Slide66
Use of External Data Sources
Slide67
Question Type Specific Mechanisms
Slide68
Classification vs. Generation of Answers
Slide69
Future Plans
VQA Challenge 2017?
What changes do you want?
Sub tasks?
More difficult/easy dataset?
Dialogue/conversational QA?
New evaluation metric?
Other annotations?
Slide70
Thanks!
Questions?