Dhruv Batra (Virginia Tech) - PowerPoint Presentation


Presentation Transcript

Slide1

Dhruv Batra (Virginia Tech)

Larry Zitnick (Facebook AI Research)

Devi Parikh (Virginia Tech)

Stanislaw Antol (Virginia Tech)

Aishwarya Agrawal (Virginia Tech)

Overview of Challenge

Slide2

Outline

Overview of Task and Dataset

Overview of Challenge

Winner Announcements

Analysis of Results

Slide7

VQA Task

Slide8

VQA Task

What is the mustache made of?

Slide9

VQA Task

What is the mustache made of?

AI System

Slide10

VQA Task

What is the mustache made of?

bananas

AI System

Slide11

Real images (from COCO)

Tsung-Yi Lin et al. “Microsoft COCO: Common Objects in Context.” ECCV 2014.

http://mscoco.org/

Slide12

and abstract scenes.

Slide13

Questions

Stump a smart robot!

Ask a question that a human can answer,

but a smart robot probably can’t!

Slide14

VQA Dataset

Slide15

Dataset Stats

>250K images (COCO + 50K Abstract Scenes)

>750K questions (3 per image)

~10M answers (per question: 10 collected with the image shown + 3 collected without the image)
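As a rough consistency check, the headline totals follow from the approximate per-split image counts on the later dataset slides (80K/40K/80K real, 20K/10K/20K abstract) together with 3 questions per image and 13 answers per question. A minimal Python sketch; the constant and function names are illustrative only:

```python
# Approximate split sizes in thousands of images, as listed on the
# "Real Image Challenges: Dataset" and "Abstract Scene Challenges: Dataset" slides.
REAL_IMAGES_K = {"train": 80, "val": 40, "test": 80}
ABSTRACT_IMAGES_K = {"train": 20, "val": 10, "test": 20}

QUESTIONS_PER_IMAGE = 3
ANSWERS_WITH_IMAGE = 10     # annotators who saw the image
ANSWERS_WITHOUT_IMAGE = 3   # annotators who saw only the question


def totals():
    """Return (images, questions, answers), all in thousands."""
    images_k = sum(REAL_IMAGES_K.values()) + sum(ABSTRACT_IMAGES_K.values())
    questions_k = images_k * QUESTIONS_PER_IMAGE
    answers_k = questions_k * (ANSWERS_WITH_IMAGE + ANSWERS_WITHOUT_IMAGE)
    return images_k, questions_k, answers_k


images_k, questions_k, answers_k = totals()
print(f"{images_k}K images, {questions_k}K questions, {answers_k / 1000:.2f}M answers")
# → 250K images, 750K questions, 9.75M answers
```

The per-split tables count only the 10 with-image answers per question (e.g. 240K training questions × 10 = 2.4M), which is why their answer columns are smaller than 13× the question counts.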

Slide16

Two modalities of answering

Open Ended

Multiple Choice (18 choices):
- 1 correct answer
- 3 plausible choices
- 10 most popular answers
- Rest (4) random answers
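The 18 candidates can be assembled by taking the union of the correct, plausible and popular answers and then padding with random answers until 18 unique choices exist. This follows the construction described for the VQA dataset as we understand it; `build_choices`, its arguments, and the assumption that the random pool holds answers to other questions are illustrative, not the official tooling:

```python
import random

NUM_CHOICES = 18  # 1 correct + 3 plausible + 10 popular, padded with random answers


def build_choices(correct, plausible, popular, random_pool, seed=0):
    """Return 18 unique, shuffled candidate answers for one question (sketch)."""
    rng = random.Random(seed)
    choices = []
    # Union of correct, plausible and popular answers, keeping first occurrences.
    for ans in [correct, *plausible[:3], *popular[:10]]:
        if ans not in choices:
            choices.append(ans)
    # Hypothetical random pool (e.g. answers to other questions), de-duplicated.
    pool = [a for a in dict.fromkeys(random_pool) if a not in choices]
    rng.shuffle(pool)
    while len(choices) < NUM_CHOICES and pool:
        choices.append(pool.pop())
    if len(choices) < NUM_CHOICES:
        raise ValueError("random_pool too small to reach 18 unique choices")
    rng.shuffle(choices)
    return choices


popular = ["yes", "no", "2", "1", "white", "3", "red", "blue", "4", "green"]
print(build_choices("bananas", ["hair", "fur", "paint"], popular,
                    ["dog", "cat", "pizza", "tennis", "kitchen", "red", "left"]))
```

Because the correct answer may itself be a popular answer, the number of random fillers varies per question; padding to a fixed count of unique answers keeps every question at exactly 18 options.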

Slide17

Accuracy Metric
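The formula on this slide did not survive extraction. The VQA accuracy metric counts a predicted answer as fully correct if at least 3 of the 10 human annotators gave it, Acc(ans) = min(#humans that said ans / 3, 1), and the official evaluation averages this over all 10 subsets of 9 annotators. A minimal sketch, assuming answer strings are already normalized (the official script also normalizes answers before matching, which is omitted here); the function names are hypothetical:

```python
from itertools import combinations


def vqa_accuracy_simple(pred, human_answers):
    """min(#annotators who gave `pred` / 3, 1)."""
    matches = sum(1 for a in human_answers if a == pred)
    return min(matches / 3.0, 1.0)


def vqa_accuracy(pred, human_answers):
    """Average of the simple metric over every leave-one-annotator-out subset."""
    n = len(human_answers)
    scores = [vqa_accuracy_simple(pred, subset)
              for subset in combinations(human_answers, n - 1)]
    return sum(scores) / len(scores)


humans = ["bananas", "bananas"] + ["banana"] * 8
print(round(vqa_accuracy_simple("bananas", humans), 4))  # 0.6667
print(round(vqa_accuracy("bananas", humans), 4))         # 0.6
```

The averaged version is slightly stricter: dropping one of the two matching annotators leaves only one match (1/3), which pulls the mean below the simple 2/3.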

Slide18

Human Accuracy (Real)

| | Overall | Yes/No | Number | Other |
|---|---|---|---|---|
| Open Ended | 83.30 | 95.77 | 83.39 | 72.67 |
| Multiple Choice | 91.54 | 97.40 | 86.97 | 87.91 |

Slide20

Human Accuracy (Abstract)

| | Overall | Yes/No | Number | Other |
|---|---|---|---|---|
| Open Ended | 87.49 | 95.96 | 95.04 | 75.33 |
| Multiple Choice | 93.57 | 97.78 | 96.71 | 88.73 |

Slide22

Outline

Overview of Task and Dataset

Overview of Challenge

Winner Announcements

Analysis of Results

Slide23

VQA Challenges on www.codalab.org

Real, Open Ended

Real, Multiple Choice

Abstract, Open Ended

Abstract, Multiple Choice

Slide25

Real Image Challenges: Dataset

| | Images | Questions | Answers |
|---|---|---|---|
| Training | 80K | 240K | 2.4M |

Dataset size is approximate

Slide26

Real Image Challenges: Dataset

| | Images | Questions | Answers |
|---|---|---|---|
| Training | 80K | 240K | 2.4M |
| Validation | 40K | 120K | 1.2M |

Dataset size is approximate

Slide27

Real Image Challenges: Dataset

| | Images | Questions | Answers |
|---|---|---|---|
| Training | 80K | 240K | 2.4M |
| Validation | 40K | 120K | 1.2M |
| Test | 80K | 240K | |

Dataset size is approximate

Slide28

Real Image Challenges: Test Dataset

80K test images, in four splits of 20K images each:

- Test-dev (development): debugging and validation; unlimited submissions to the evaluation server.
- Test-standard (publications): used to score entries for the public leaderboard.
- Test-challenge (competitions): used to rank challenge participants.
- Test-reserve (check overfitting): used to estimate overfitting; scores on this set are never released.

Slide adapted from: MSCOCO Detection/Segmentation Challenge, ICCV 2015

Dataset size is approximate

Slide29

VQA Challenges on www.codalab.org

Real, Open Ended

Real, Multiple Choice

Abstract, Open Ended

Abstract, Multiple Choice

Slide30

Abstract Scene Challenges: Dataset

| | Images | Questions | Answers |
|---|---|---|---|
| Training | 20K | 60K | 0.6M |

Slide31

Abstract Scene Challenges: Dataset

| | Images | Questions | Answers |
|---|---|---|---|
| Training | 20K | 60K | 0.6M |
| Validation | 10K | 30K | 0.3M |

Slide32

Abstract Scene Challenges: Dataset

| | Images | Questions | Answers |
|---|---|---|---|
| Training | 20K | 60K | 0.6M |
| Validation | 10K | 30K | 0.3M |
| Test | 20K | 60K | |

Slide33

Outline

Overview of Task and Dataset

Overview of Challenge

Winner Announcements

Analysis of Results

Slide34

Award GPUs!!!

Slide35

Abstract Scene Challenges

Open-Ended Challenge: 5 teams, 5 institutions, 3 countries

Multiple-Choice Challenge: 4 teams, 4 institutions, 3 countries

The top 3 teams are the same for Open Ended and Multiple Choice.

Slide36

Abstract Scene Challenges: Winner Team

MIL-UT

Andrew Shin*, Kuniaki Saito*, Yoshitaka Ushiku, Tatsuya Harada

Open Ended Challenge Accuracy: 67.39

Multiple Choice Challenge Accuracy: 71.18

Slide37

Real Image Challenges

Open-Ended Challenge: 25 teams, 26 institutions, 8 countries

Multiple-Choice Challenge: 15 teams, 17 institutions, 6 countries

The top 5 teams are the same for Open Ended and Multiple Choice.

Slide38

Real Image Challenges: Honorable Mention

Brandeis

Aaditya Prakash

Open Ended Challenge Accuracy: 62.80

Multiple Choice Challenge Accuracy: 65.17

Slide39

Real Image Challenges: Runner-Up Team

Naver Labs

Hyeonseob Nam, Jeonghee Kim

Open Ended Challenge Accuracy: 64.89

Multiple Choice Challenge Accuracy: 69.37

Slide40

Real Image Challenges: Winner Team

UC Berkeley & Sony

Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach

Open Ended Challenge Accuracy: 66.90

Multiple Choice Challenge Accuracy: 70.52

Slide41

Outline

Overview of Task and Dataset

Overview of Challenge

Winner Announcements

Analysis of Results

Slide42

Real Open-Ended Challenge

[Chart labels: ICCV15, arXiv v6]

Slide43

Real Open-Ended Challenge

+12.76% absolute

Slide44

Statistical Significance

5000 bootstrap samples @ 99% confidence
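The slide gives only the settings (5000 bootstrap samples, 99% confidence). One common way to apply them, assumed here rather than taken from the slides, is a paired bootstrap over test questions: resample questions with replacement, recompute the mean accuracy difference between two entries, and call the first entry significantly better if the lower end of the 99% percentile interval is above zero. The function names are hypothetical:

```python
import random


def bootstrap_diff_ci(acc_a, acc_b, n_boot=5000, confidence=0.99, seed=0):
    """Percentile CI for mean(acc_a - acc_b), resampling questions with replacement."""
    if len(acc_a) != len(acc_b) or not acc_a:
        raise ValueError("need two equal-length, non-empty per-question accuracy lists")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    boot_means = []
    for _ in range(n_boot):
        # Paired resample: the same question indices are used for both entries.
        total = sum(diffs[rng.randrange(n)] for _ in range(n))
        boot_means.append(total / n)
    boot_means.sort()
    alpha = 1.0 - confidence
    lo_idx = int(round(alpha / 2 * n_boot))
    hi_idx = int(round((1.0 - alpha / 2) * n_boot)) - 1
    return boot_means[lo_idx], boot_means[hi_idx]


def significantly_better(acc_a, acc_b, **kwargs):
    """True if the whole confidence interval for the difference is above zero."""
    lo, _ = bootstrap_diff_ci(acc_a, acc_b, **kwargs)
    return lo > 0.0
```

Pairing matters because two entries are scored on the same questions; resampling each entry independently would ignore that shared difficulty and widen the interval.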

Slide45

Real Open-Ended Challenge

Slide46

Easy vs. Difficult Questions

(Real Open-Ended Challenge)

Slide48

Easy vs. Difficult Questions

(Real Open-Ended Challenge)

80.6% of questions can be answered by at least 1 method!
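A figure like this is an "any method" coverage over all submissions. A minimal sketch of that computation, assuming each method has a per-question accuracy list in the same question order and that a question counts as answered when some method reaches a chosen accuracy threshold (the slide does not say which threshold was used); the names are hypothetical:

```python
def fraction_answered_by_any(per_method_acc, threshold=1.0):
    """Fraction of questions on which at least one method scores >= threshold."""
    score_lists = list(per_method_acc.values())
    if not score_lists:
        raise ValueError("need at least one method")
    n = len(score_lists[0])
    if n == 0 or any(len(s) != n for s in score_lists):
        raise ValueError("all methods must score the same, non-empty question set")
    covered = sum(
        1 for i in range(n) if any(scores[i] >= threshold for scores in score_lists)
    )
    return covered / n


results = {
    "method_a": [1.0, 0.0, 0.0, 0.3],
    "method_b": [0.0, 1.0, 0.0, 0.0],
}
print(fraction_answered_by_any(results))                 # 0.5
print(fraction_answered_by_any(results, threshold=0.3))  # 0.75
```

The remaining questions, those no method gets right, are the "difficult" ones the following slides examine.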

Difficult Questions

Slide49

Easy vs. Difficult Questions

(Real Open-Ended Challenge)

Difficult Questions

Easy Questions

Slide50

Difficult Questions with Rare Answers

Slide51

Difficult Questions with Rare Answers

What is the name of …

What is the number on …

What is written on the …

What does the sign say?

What time is it?

What kind of …

What type of …

Why …
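The prefixes above can be used to bucket questions by type, for example to count how many difficult questions fall into each group. A minimal sketch that matches on leading words; the prefix list is copied from the slide with the trailing "…" dropped, it is not the official VQA question-type list, and the helper names are hypothetical:

```python
import re
from collections import Counter

# Prefixes from the "Difficult Questions with Rare Answers" slide.
RARE_ANSWER_PREFIXES = [
    "what is the name of",
    "what is the number on",
    "what is written on the",
    "what does the sign say",
    "what time is it",
    "what kind of",
    "what type of",
    "why",
]


def tokenize(text):
    """Lowercase and split into word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


# Try longer prefixes first so the most specific match wins.
_PREFIX_TOKENS = sorted(
    (tokenize(p) for p in RARE_ANSWER_PREFIXES), key=len, reverse=True
)


def question_type(question):
    """Return the matching prefix, or "other" if none applies."""
    words = tokenize(question)
    for prefix in _PREFIX_TOKENS:
        if words[: len(prefix)] == prefix:
            return " ".join(prefix)
    return "other"


def count_types(questions):
    """Histogram of question types over a list of question strings."""
    return Counter(question_type(q) for q in questions)


print(count_types([
    "What is the name of this restaurant?",
    "Why is there snow on one side of the stream?",
    "What time is it?",
    "Is this a casino?",
]))
```

Matching whole word tokens rather than raw string prefixes avoids false hits such as "whyever" matching "why".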

Slide52

Easy vs. Difficult Questions

(Real Open-Ended Challenge)

Slide53

Easy vs. Difficult Questions

(Real Open-Ended Challenge)

Difficult Questions

with Frequent Answers

Easy Questions

Slide54

Success Cases

Q: What is the woman holding?
GT A: laptop
Machine A: laptop

Q: Is this a casino?
GT A: no
Machine A: no

Q: Is it going to rain soon?
GT A: yes
Machine A: yes

Q: What room is the cat located in?
GT A: kitchen
Machine A: kitchen

Slide55

Failure Cases

Q: What is the woman holding?
GT A: book
Machine A: knife

Q: Is the hydrant painted a new color?
GT A: yes
Machine A: no

Q: Why is there snow on one side of the stream and clear grass on the other?
GT A: shade
Machine A: yes

Q: Where is the blue and white umbrella?
GT A: on left
Machine A: right

Slide56

Easy vs. Difficult Questions

(Real Open-Ended Challenge)

Difficult Questions

Easy Questions

Slide57

Easy vs. Difficult Questions

(Real Open-Ended Challenge)

Slide58

Answer Type and Question Type Analyses

Per answer type: no team is statistically significantly better than the winner.

Per question type: no team is statistically significantly better than the winner.

Slide59

Results of the Poll

25 responses

Slide60

Image Modelling

Slide61

Question Modelling

Slide62

Question Word Modelling

Slide63

Attention on Images

Slide64

Attention on Questions

Slide65

Use of Ensemble

Slide66

Use of External Data Sources

Slide67

Question Type Specific Mechanisms

Slide68

Classification vs. Generation of Answers

Slide69

Future Plans

VQA Challenge 2017?

What changes do you want?
- Sub-tasks?
- More difficult/easy dataset?
- Dialogue/conversational QA?
- New evaluation metric?
- Other annotations?

Slide70

Thanks!

Questions?