Collections for Automatic Image Annotation and Photo Tag Recommendation
Philip McParlane, Yashar Moshfeghi and Joemon M. Jose
University of Glasgow, UK
http://www.dcs.gla.ac.uk/~philip/
p.mcparlane.1@research.gla.ac.uk
Outline
Motivation for annotating images
Problems with existing automatic image annotation collections
Problems with existing photo tag recommendation collections
Flickr-AIA (we introduce)
Flickr-PTR (we introduce)
Conclusions
With the amount of multimedia data rapidly increasing, it becomes important to organize this content effectively.
Social image sharing websites depend on manual annotation of their images. This has a large human cost, however.
Plus, humans often tag with irrelevant tags (e.g. girl) or tags which are opinionated (e.g. cool).
Therefore, research has focussed on the automatic annotation of images.
Automatic image annotation (AIA) considers the pixels; photo tag recommendation (PTR) considers those tags already added.
AIA collections: many public collections used. Evaluated on Corel5k, Corel30k, ESP Game, IAPR, Google Images, LabelMe, Washington Collection, Caltech, TrecVid 2007, Pascal 2007, MiAlbum and 4 other small collections. The 20 most cited AIA papers on CiteSeerX revealed that at least 15 collections had been used.
PTR collections: mostly non-public collections used. For photo tag recommendation, the most popular works use their own collections.
Motivation for annotating images
Problems with existing automatic image annotation collections
Problems with existing photo tag recommendation collections
Flickr-AIA
Flickr-PTR
Conclusions
We introduce
We introduceSlide11
Automatic Image Annotation
In this work we consider 3 popular AIA evaluation collections used by recent work [4]:
Corel5k [1], IAPR TC-12 [2], ESP Game [3]
[1] Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary. P. Duygulu et al. ECCV '02.
[2] The IAPR TC-12 Benchmark: A New Evaluation Resource. M. Grubinger et al. Visual Information Systems, 2006.
[3] Labeling images with a computer game. L. von Ahn and L. Dabbish. CHI '04.
[4] Baselines for Image Annotation. A. Makadia, V. Pavlovic and S. Kumar. IJCV 2010.
What are the problems with previous automatic image annotation collections?
Too many collections: there needs to be a single, openly available collection to reproduce experiments.
Annotation ambiguity: collections use many synonyms in the annotation of images, e.g. usa/america.
Unnormalized: models are able to "exploit" popular tags by promoting them, increasing performance measures.
Low image quality: models are often tested on small, low-quality image collections.
Lack of meta-data: despite the increase in research considering time, location etc., these collections don't include such meta-data.
Lack of diversity: collections often contain images taken with the same camera, at the same place, by the same user.
Location tags: locations, such as "usa", are impossible to identify from pixels; however, these tags are often included in ground truths.
Copyright: Corel is bound by copyright, making distribution difficult.
1. Problems: Annotation Ambiguity
All three collection ground truths contain synonyms (e.g. america/usa) and visually identical classes (e.g. sea/ocean).
To demonstrate this, we cluster tags which share a common WordNet "synonym set" (removing irrelevant matches manually).
Corel (36 of 374 tags): polar/arctic, ocean/sea, ice/frost
ESP (37 of 291 tags): baby/child, child/kid, home/house
IAPR (26 of 269 tags): woman/adult, bush/shrub, rock/stone
31% of photos in the Corel collection contain at least 1 ambiguous tag.
25% of photos in the ESP collection contain at least 1 ambiguous tag.
63% of photos in the IAPR collection contain at least 1 ambiguous tag.
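These percentages are simple to compute once the ambiguous tags are known. A minimal sketch in Python, using a toy photo collection and an illustrative ambiguous-tag set standing in for the WordNet-derived clusters:

```python
# Fraction of photos containing at least one ambiguous tag.
# AMBIGUOUS is illustrative only; in the paper these tags come from
# clustering the vocabulary by shared WordNet synonym sets.
AMBIGUOUS = {"sea", "ocean", "ice", "frost", "polar", "arctic"}

def ambiguous_fraction(photos):
    """photos: list of tag lists, one per photo."""
    hits = sum(1 for tags in photos if AMBIGUOUS & set(tags))
    return hits / len(photos)

photos = [
    ["sea", "sky", "chair"],  # ambiguous ("sea")
    ["mountain", "tree"],     # clean
    ["ice", "bear"],          # ambiguous ("ice")
    ["city", "night"],        # clean
]
```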
Test image #1. Annotations: sea, usa, sky, chair.
Suggestions:
Annotation Model #1: sea, usa, blue, water, red. Precision: 0.4
Annotation Model #2: ocean, america, blue, water, red. Precision: 0
So why do we penalize a system which treats these concepts differently? It is impossible to tell from an image's pixels whether it is of the sea or of the ocean.
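The penalty above can be made concrete with a small precision@5 computation. A sketch, assuming a hand-built synonym map in place of the WordNet synonym sets (tags and scores are those from the slide):

```python
# Precision@5 with and without synonym merging.
# SYNONYMS is a toy stand-in for WordNet synonym sets.
SYNONYMS = {"ocean": "sea", "america": "usa"}

def normalize(tag):
    return SYNONYMS.get(tag, tag)

def precision_at_k(suggested, ground_truth, k=5, merge_synonyms=False):
    if merge_synonyms:
        suggested = [normalize(t) for t in suggested]
        ground_truth = [normalize(t) for t in ground_truth]
    truth = set(ground_truth)
    hits = sum(1 for t in suggested[:k] if t in truth)
    return hits / k

truth = ["sea", "usa", "sky", "chair"]
model1 = ["sea", "usa", "blue", "water", "red"]
model2 = ["ocean", "america", "blue", "water", "red"]
```

Without merging, Model #2 scores 0 despite suggesting semantically identical tags; with merging, both models score 0.4.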
2. Problems: Unnormalised Collections
By nature, the classes used in image collections follow a long-tail distribution, i.e. there exist a few popular tags and many unpopular tags. This causes problems:
Selection bias: popular tags exist in more training and test images. Therefore, annotation models are more likely to be tested on popular classes.
Prediction bias: popular tags occur in more test images. Therefore, annotation models can potentially "cheat" by promoting only popular tags, instead of making predictions based purely on the pixels.
[Figure: long-tail tag-frequency distribution (# images per tag), divided into popular, medium-frequency and unpopular regions.]
To demonstrate this "prediction bias" (i.e. where annotation models can "cheat" by promoting popular tags), we annotate each collection using the annotation model described in [6]. We split the vocabulary into 3 subsets of popular, medium-frequency and unpopular tags; for each subset, we suggest only the tags it contains.
[6] Baselines for Image Annotation. A. Makadia, V. Pavlovic and S. Kumar. IJCV 2010.
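The vocabulary split can be sketched as follows; a minimal illustration, assuming tags are ranked by collection-wide frequency and cut into equal-sized thirds (the paper's exact cut points are not stated here):

```python
from collections import Counter

def split_vocabulary(image_tags, n_bins=3):
    """Rank tags by frequency across the collection and cut the
    ranking into n_bins subsets (popular, medium, unpopular)."""
    counts = Counter(t for tags in image_tags for t in tags)
    ranked = [t for t, _ in counts.most_common()]
    size = -(-len(ranked) // n_bins)  # ceiling division
    return [ranked[i * size:(i + 1) * size] for i in range(n_bins)]

images = [["sky", "sea"], ["sky", "usa"], ["sky"], ["sea", "chair"]]
```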
Ultimately, higher annotation accuracy can be achieved by suggesting only popular tags.
3. Problems: Image quality/size
Despite the increase in Hadoop clusters & computational power, many works still test on small collections of low-quality images.
Collection | Size (avg dimension) | # Images
Corel | 160px | 5,000
ESP | 156px | 22,000
IAPR | 417px | 20,000
4. Problems: Lack of meta-data
Many recent works have focussed on the exploitation of various meta-data [7,8], e.g. time, location, camera, user.
Collection | Time | Location
Corel | ✗ | ✗
ESP | ✗ | ✗
IAPR | ✓ | ✓
[7] On contextual photo tag recommendation. P. McParlane, Y. Moshfeghi and J. Jose. SIGIR 2013.
[8] Beyond co-occurrence: discovering and visualizing tag relationships from geo-spatial. H. Zhang et al. WSDM 2012.
5. Problems: Lack of diversity
Images in each collection are often taken by the same user, in the same place, of the same scene/object, using the same camera. This leads to natural clustering in image collections, making annotation easier due to high intra-cluster visual similarity.
Further, there are duplicate images in the test and train sets, also making annotation easier.
6. Problems: Identifying Location
Identifying location (even at a high level) within an image is often difficult or sometimes impossible. Despite this, two of the three collections contain images annotated with locations (e.g. usa).
Given this image, would you know where it was taken? Annotations: sea, usa, sky, chair.
If not, how can we expect an annotation model to predict the annotation "usa"?
7. Problems: Copyright
An evaluation collection should at least be free and distributable. Unfortunately, the Corel collection is commercial and bound by copyright.
Flickr-AIA contains 312,000 images from Flickr, built with AIA evaluation in mind.
Openly available: uses Flickr images under the Creative Commons license.
Meta-data: includes extensive location, user and time meta-data.
Diverse image set: we search images for 2,000 WordNet categories & limit the number of images per user.
High quality: the dimension of each image is 719px on average.
No location tags: we use WordNet to remove "location" tags from image ground truths (e.g. scotland).
Resolved ambiguity: tags which are synonyms (e.g. usa/america) are "merged" based on WordNet synonym sets.
Normalized: aside from the normal ground truth, we include a "normalised" ground truth containing only medium-frequency tags.
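The synonym-merging step can be sketched as follows; a minimal version in which a small hand-built list of synonym sets stands in for the WordNet synsets, and each tag is rewritten to a canonical member of its set:

```python
# Merge synonymous tags in a ground truth.
# SYNSETS is illustrative; Flickr-AIA derives these from WordNet.
SYNSETS = [{"usa", "america"}, {"sea", "ocean"}, {"ice", "frost"}]
CANONICAL = {tag: min(s) for s in SYNSETS for tag in s}

def merge_tags(tags):
    """Rewrite each tag to its canonical synonym, dropping duplicates
    while preserving the original order."""
    seen, merged = set(), []
    for t in tags:
        c = CANONICAL.get(t, t)
        if c not in seen:
            seen.add(c)
            merged.append(c)
    return merged
```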
Photo Tag Recommendation
In this work we consider the collections used in 2 popular photo tag recommendation works: Sigurbjornsson [5] and Garg [6].
[5] Flickr Tag Recommendation based on Collective Knowledge. B. Sigurbjornsson and R. van Zwol. WWW '08.
[6] Personalized, Interactive Tag Recommendation for Flickr. N. Garg and I. Weber. ACM RecSys '08.
1. Problems: Ground Truth
Sigurbjornsson use a small collection of images which have their ground truths crowdsourced.
For photo tag recommendation, however, many aspects that users would tag are often not explicit (e.g. locations, dates etc.). Therefore, these annotations are missed when crowdsourcing.
Comparing Annotations: crowdsourced tags vs. tags added by the user.
Crowdsourced: football, red, team, blue, england, grass, saturday
User tags: football, red, team, blue, scotland, hamilton, accies, artificial grass, dunfermline, sunday, new douglas park
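The gap can be quantified as simple set overlap between the two annotation sources; a sketch using the tag lists above (multi-word tags kept as single strings):

```python
# How many of the user's own tags did crowd workers recover?
crowd = {"football", "red", "team", "blue", "england", "grass", "saturday"}
user = {"football", "red", "team", "blue", "scotland", "hamilton", "accies",
        "artificial grass", "dunfermline", "sunday", "new douglas park"}

def crowd_recall(crowd, user):
    """Fraction of user tags that also appear in the crowdsourced set."""
    return len(crowd & user) / len(user)
```

Crowd workers recover only the visually obvious tags (4 of 11 here); context-dependent tags such as locations and dates are missed.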
2. Problems: Synonymous Tags
One of the problems with using user tags, however, is that users often use many synonyms to annotate images.
Test image #1. Annotations: newyork, ny, nyc, newyorkcity, york, timessquare.
Suggestions:
Annotation Model #1: ny, nyc, newyork, york, city. Precision: 0.8
Annotation Model #2: ny, timessquare, people, cab, empire. Precision: 0.4
3. Problems: Free distribution
Existing collections [5,6] for photo tag recommendation were never released, making comparable experiments difficult.
[5] Flickr Tag Recommendation based on Collective Knowledge. B. Sigurbjornsson and R. van Zwol. WWW '08.
[6] Personalized, Interactive Tag Recommendation for Flickr. N. Garg and I. Weber. ACM RecSys '08.
Flickr-PTR contains details of 2,000,000 images from Flickr, built with PTR evaluation in mind.
Openly available: uses Flickr images under the Creative Commons license.
Clustered user tags: to overcome the problem of synonyms in image annotations, we carried out a crowdsourced experiment which asked users to group related tags, i.e. synonyms or tags which refer to the same aspect.
Conclusions
This work highlighted:
7 problems with existing AIA collections (Corel, ESP, IAPR)
3 problems with existing PTR collections (Sigurbjornsson, Garg)
With this in mind, we introduce two new, freely available image collections:
Flickr-AIA: 312,000 Flickr images, for automatic image annotation evaluation.
Flickr-PTR: 2,000,000 Flickr images, for photo tag recommendation evaluation.
These collections are available at: http://dcs.gla.ac.uk/~philip/
Thanks for listening!
[1] Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary. P. Duygulu et al. ECCV '02.
[2] Labeling images with a computer game. L. von Ahn and L. Dabbish. CHI '04.
[3] The IAPR TC-12 Benchmark: A New Evaluation Resource. M. Grubinger et al. Visual Information Systems, 2006.
[4] Flickr Tag Recommendation based on Collective Knowledge. B. Sigurbjornsson and R. van Zwol. WWW '08.
[5] Personalized, Interactive Tag Recommendation for Flickr. N. Garg and I. Weber. ACM RecSys '08.
[6] Baselines for Image Annotation. A. Makadia, V. Pavlovic and S. Kumar. IJCV 2010.
[7] On contextual photo tag recommendation. P. McParlane, Y. Moshfeghi and J. Jose. SIGIR 2013.
[8] Beyond co-occurrence: discovering and visualizing tag relationships from geo-spatial. H. Zhang et al. WSDM 2012.