Slide1
Studying Relationships Between Human Gaze, Description, and Computer Vision
Kiwon Yun (1), Yifan Peng (1), Dimitris Samaras (1), Gregory J. Zelinsky (1, 2), Tamara L. Berg (3)
(1) Department of Computer Science, Stony Brook University
(2) Department of Psychology, Stony Brook University
(3) Department of Computer Science, University of North Carolina
Computer Vision and Pattern Recognition (CVPR) 2013
Slide2
Overview
User behavior while freely viewing images contains an abundance of information about user intent and depicted scene content.
Human gaze: “where” the important things are in an image.
Description: “what” is in an image, and which parts of an image are important to the viewer.
Computer vision: “what” might be “where” in an image. However, it will always be noisy and have no knowledge of importance.
Slide3
Overview
User behavior while freely viewing images contains an abundance of information about user intent and depicted scene content.
1. We conduct several experiments to better understand the relationship between gaze, description, and image content.
2. From these exploratory analyses, we build prototype applications for gaze-enabled object detection and annotation.
[Figure: an example image shown with its human gaze, description, and image content]
Description: "An old black woman wearing a turban and a headdress and a dog next to her wearing the same red headdress"
Slide4
Datasets
SUN09: 104 images, free-viewing for 5 seconds; eye movements from 8 observers, each of whom provided a scene description.
PASCAL VOC: 1,000 images, free-viewing for 3 seconds; eye movements from 3 observers; 5 natural language descriptions per image from different observers.
Slide5
Experiments and Analyses
Gaze vs. Object Type
People are more likely to look at people, other animals, televisions, and vehicles.
People are less likely to look at chairs, bottles, potted plants, drawers, and rugs.
Animate objects are much more likely to be fixated than inanimate objects (0.636 vs. 0.495).
[Figure: probability of being fixated when present for various object categories (top: PASCAL, bottom: SUN09)]
Slide6
Experiments and Analyses
Gaze vs. Location on Objects
[Figure: gaze location on objects, with panels for person, horse, bird, bus, train, tv, bicycle, chair, table, cabinet, curtain, and plant]
What objects do people describe?
Animate objects are much more likely to be described than inanimate objects (0.843 vs. 0.545).
Slide7
Experiments and Analyses
What is the relationship between gaze and description?
          P(fixated | described)   P(described | fixated)
PASCAL    0.866                    0.952
SUN09     0.737                    0.725
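The two conditional probabilities above can be illustrated with a minimal sketch. It assumes each record holds the set of fixated object labels and the set of described object labels for one image, and that counts are pooled over all records. The slides do not state the exact counting protocol, so the function name `gaze_description_agreement`, the pooling choice, and the toy records are illustrative assumptions. Only the first record mirrors the beer-bottle example on this slide.

```python
from typing import Iterable, Set, Tuple


def gaze_description_agreement(
    records: Iterable[Tuple[Set[str], Set[str]]],
) -> Tuple[float, float]:
    """Return (P(fixated | described), P(described | fixated)).

    Each record is (fixated_objects, described_objects) for one image.
    Counts are pooled over all records (an assumption, not the paper's
    documented protocol).
    """
    fixated_total = 0
    described_total = 0
    both_total = 0
    for fixated, described in records:
        fixated_total += len(fixated)
        described_total += len(described)
        # Objects that were both looked at and mentioned.
        both_total += len(fixated & described)
    if fixated_total == 0 or described_total == 0:
        raise ValueError("need at least one fixated and one described object")
    return both_total / described_total, both_total / fixated_total


# Hypothetical toy data; only the first record follows the slide's example.
records = [
    ({"bottle", "person"}, {"bottle", "person"}),
    ({"person", "chair"}, {"person"}),
    ({"dog"}, {"dog", "sofa", "table"}),
]

p_fixated_given_described, p_described_given_fixated = gaze_description_agreement(records)
print(f"P(fixated | described) = {p_fixated_given_described:.3f}")
print(f"P(described | fixated) = {p_described_given_fixated:.3f}")
```

In this toy data, four objects are both fixated and described, out of six described and five fixated, so the two probabilities differ. The same asymmetry appears in the PASCAL row of the table.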
S1: A man is reading the label on a beverage bottle.
S2: A man looking at the bottle of beer that he is holding.
S3: The man in a white tee shirt is holding a beer bottle and looking at it.
S4: The scraggly haired man is holding up and admiring his bottle of beer.
S5: Young man with curly black hair holding a beer bottle.
Fixated objects: bottle, person.
Described objects: bottle, person.
Slide8
Gaze-Enabled Computer Vision
Analysis of Human Gaze with Object Detectors
Potential for gaze to increase the performance of object detectors varies by object category.
Slide9
Gaze-Enabled Computer Vision
Gaze-Enabled Object Detection and Annotation
Combine gaze and automated object detection methods to create a collaborative system for detection and annotation.
Slide10
Gaze-Enabled Computer Vision
Gaze-Enabled Object Detection and Annotation
Slide11
Conclusion
Through a series of behavioral studies and experimental evaluations, we explored the information contained in eye movements and description, and analyzed their relationship with image content. We also examined the complex relationships between human gaze and the outputs of current visual detection methods. In future work, we will build on these results to develop more intelligent human-computer interactive systems for image understanding.
[1] Studying Relationships Between Human Gaze, Description, and Computer Vision. Kiwon Yun, Yifan Peng, Dimitris Samaras, Gregory J. Zelinsky, and Tamara L. Berg. Computer Vision and Pattern Recognition (CVPR) 2013 (Oregon, USA).
[2] Specifying the Relationships Between Objects, Gaze, and Descriptions for Scene Understanding. Kiwon Yun, Yifan Peng, Hossein Adeli, Tamara L. Berg, Dimitris Samaras, and Gregory J. Zelinsky. Vision Sciences Society (VSS) 2013 (Florida, USA).