
Collections for Automatic Image Annotation and Photo Tag Recommendation

Philip McParlane, Yashar Moshfeghi and Joemon M. Jose
University of Glasgow, UK

http://www.dcs.gla.ac.uk/~philip/

p.mcparlane.1@research.gla.ac.uk

Motivation for annotating images
Problems with existing automatic image annotation collections
Problems with existing photo tag recommendation collections
Flickr-AIA (which we introduce)
Flickr-PTR (which we introduce)
Conclusions


With the amount of multimedia data rapidly increasing, it becomes important to organize this content effectively.

Social image sharing websites depend on manual annotation of their images. This has a large human cost, however. Moreover, humans often tag with irrelevant tags (e.g. girl) or opinionated tags (e.g. cool).


Therefore, research has focussed on the automatic annotation of images.

Automatic image annotation (AIA) "considers the pixels"; photo tag recommendation (PTR) "considers those tags already added".

AIA collections: many public collections used. Evaluated on Corel5k, Corel30k, ESP Game, IAPR, Google Images, LabelMe, Washington Collection, Caltech, TrecVid 2007, Pascal 2007, MiAlbum & 4 other small collections. The 20 most cited AIA papers on CiteSeerX revealed that at least 15 collections had been used.

PTR collections: mostly non-public collections used. For photo tag recommendation, the most popular works use their own collections.



Automatic Image Annotation

In this work we consider 3 popular AIA evaluation collections used by recent work [4]: Corel5k [1], ESP Game [2], IAPR TC-12 [3].

[1] Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary. P. Duygulu et al. ECCV '02.
[2] Labeling Images with a Computer Game. L. von Ahn and L. Dabbish. CHI '04.
[3] The IAPR TC-12 Benchmark: A New Evaluation Resource. M. Grubinger et al. Visual Information Systems, 2006.
[4] Baselines for Image Annotation. A. Makadia, V. Pavlovic and S. Kumar. IJCV 2010.


What are the problems with previous automatic image annotation collections?

Too many collections: there needs to be a single, openly available collection to reproduce experiments.

Annotation ambiguity: collections use many synonyms in the annotation of images (e.g. usa/america).

Unnormalised: models are able to "exploit" popular tags by promoting them, increasing performance measures.

Low image quality: models are often tested on small, low quality image collections.

Lack of meta-data: despite the increase in research considering time, location etc., these collections don't include such meta-data.

Lack of diversity: collections often contain images taken on the same camera, at the same place, by the same user.

Location tags: locations, such as "usa", are impossible to identify from an image alone; however, these tags are often included in ground truths.

Copyright: Corel is bound by copyright, making distribution difficult.


1. Problems: Annotation Ambiguity

All three collection ground truths contain:
Synonyms (e.g. america/usa)
Visually identical classes (e.g. sea/ocean)

To demonstrate this, we cluster tags which share a common WordNet "synonym set" (removing irrelevant matches manually).
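The clustering step is straightforward to reproduce. Below is a minimal sketch, assuming NLTK and its WordNet corpus are installed (nltk.download('wordnet')); it is our illustration of the idea, not the authors' exact procedure, which also removed irrelevant matches by hand.

from collections import defaultdict
from nltk.corpus import wordnet as wn
def cluster_by_synset(tags):
    """Group tags that share at least one WordNet noun synset."""
    clusters = defaultdict(set)
    for tag in tags:
        for synset in wn.synsets(tag, pos=wn.NOUN):
            clusters[synset.name()].add(tag)
    # Keep only synsets matched by two or more distinct tags.
    return {name: group for name, group in clusters.items() if len(group) > 1}
vocab = ["usa", "america", "rock", "stone", "sky", "chair"]
for synset_name, group in cluster_by_synset(vocab).items():
    print(synset_name, sorted(group))
# e.g. united_states.n.01 ['america', 'usa']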


Collection  Example clusters                      Ambiguous tags
Corel       polar/arctic, ocean/sea, ice/frost    36 of 374
ESP         baby/child, child/kid, home/house     37 of 291
IAPR        woman/adult, bush/shrub, rock/stone   26 of 269


31% of photos in the Corel collection contain at least 1 ambiguous tag.
25% of photos in the ESP collection contain at least 1 ambiguous tag.
63% of photos in the IAPR collection contain at least 1 ambiguous tag.


Test image #1. Annotations: sea, usa, sky, chair.

Suggestions:
Annotation Model #1: sea, usa, blue, water, red (precision 0.4)
Annotation Model #2: ocean, america, blue, water, red (precision 0)

It is impossible to tell from an image's pixels whether it is of the sea or of the ocean. So why do we penalize a system which treats these concepts differently?
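To make the cost concrete, here is a minimal sketch (ours, not the authors' evaluation code) reproducing the 0.4 vs 0 precision scores above; the SYNONYMS map is a hypothetical stand-in for WordNet-derived clusters like those used later in Flickr-AIA.

SYNONYMS = {"ocean": "sea", "america": "usa"}  # hypothetical synonym clusters
def normalise(tag):
    """Map a tag to the canonical representative of its synonym cluster."""
    return SYNONYMS.get(tag, tag)
def precision_at_k(suggested, ground_truth, k=5, merge_synonyms=False):
    """Fraction of the top-k suggested tags that appear in the ground truth."""
    if merge_synonyms:
        suggested = [normalise(t) for t in suggested]
        ground_truth = {normalise(t) for t in ground_truth}
    truth = set(ground_truth)
    return sum(1 for t in suggested[:k] if t in truth) / k
truth = {"sea", "usa", "sky", "chair"}
model1 = ["sea", "usa", "blue", "water", "red"]
model2 = ["ocean", "america", "blue", "water", "red"]
print(precision_at_k(model1, truth))                       # 0.4
print(precision_at_k(model2, truth))                       # 0.0: penalised for synonyms
print(precision_at_k(model2, truth, merge_synonyms=True))  # 0.4: penalty disappears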


2. Problems: Unnormalised Collections

By nature, the classes used in image collections follow a long tail distribution, i.e. there exist a few popular tags and many unpopular tags. This causes problems:

Selection bias: popular tags exist in more training and test images. Therefore, annotation models are more likely to be tested on popular classes.

Prediction bias: popular tags occur in more test images. Therefore, annotation models can potentially "cheat" by promoting only popular tags, instead of making predictions based purely on the pixels.


To demonstrate this "prediction bias" (i.e. that annotation models can "cheat" by promoting popular tags), we annotate each collection using the annotation model described in [4]. We split the vocabulary into 3 subsets of popular, medium frequency and unpopular tags; for each subset, we suggest only the tags it contains.

[Chart: tag frequency distribution per collection (x-axis: tags, y-axis: # images), divided into popular, medium frequency and unpopular subsets.]

[4] Baselines for Image Annotation. A. Makadia, V. Pavlovic and S. Kumar. IJCV 2010.
Ultimately, higher annotation accuracy can be achieved by suggesting only popular tags.
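A minimal sketch of the vocabulary split (our illustration; the paper does not give code for this step):

from collections import Counter
def split_vocabulary(tagged_images, n_bins=3):
    """Rank tags by the number of images they occur in, then cut into n_bins."""
    freq = Counter(tag for tags in tagged_images for tag in set(tags))
    ranked = [tag for tag, _ in freq.most_common()]
    size = -(-len(ranked) // n_bins)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]
# Toy collection: 'sky' is the head of the long tail here.
images = [["sky", "sea", "beach"], ["sky", "city"], ["sky", "sea"], ["dog"]]
popular, medium, unpopular = split_vocabulary(images)
# Restricting suggestions to `popular` alone can still score well, because
# popular tags appear in so many test images; that is the prediction bias.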

3. Problems: Image Quality/Size

Despite the increase in Hadoop clusters & computational power, many works still test on small collections of low quality images.

Collection  Size (avg dimension)  # Images
Corel       160px                 5,000
ESP         156px                 22,000
IAPR        417px                 20,000

4. Problems: Lack of Meta-data

Many recent works have focussed on the exploitation of various meta-data [7,8], e.g. time, location, camera, user.

Collection  Time  Location
Corel       ✗     ✗
ESP         ✗     ✗
IAPR        ✓     ✓

[7] On Contextual Photo Tag Recommendation. P. McParlane, Y. Moshfeghi and J. Jose. SIGIR 2013.
[8] Beyond co-occurrence: discovering and visualizing tag relationships from geo-spatial. H. Zhang et al. WSDM 2012.

5. Problems: Lack of Diversity

Images in each collection are often taken by the same user, in the same place, of the same scene/object, using the same camera. This leads to natural clustering in image collections, making annotation easier due to high intra-cluster visual similarity.

Further, there are duplicate images across the test and train sets, also making annotation easier.


6. Problems: Identifying Location

Identifying location (even at a high level) within an image is often difficult or sometimes impossible. Despite this, two of the three collections contain images annotated with locations (e.g. usa).

Given this image, would you know where it was taken?

Annotations: sea, usa, sky, chair

If not, how can we expect an annotation model to predict the annotation "usa"?

7. Problems: Copyright

An evaluation collection should at least be free and distributable. Unfortunately, the Corel collection is commercial and bound by copyright.



Flickr-AIA contains 312,000 images from Flickr, built with AIA evaluation in mind.

Openly available: uses Flickr images released under the Creative Commons license.

Meta-data: includes extensive location, user and time meta-data.

Diverse image set: we search for images covering 2,000 WordNet categories & limit the number of images per user.

High quality: the dimension of each image is 719px on average.

No location tags: we use WordNet to remove "location" tags from image ground truths (e.g. scotland).

Resolved ambiguity: tags which are synonyms (e.g. usa/america) are "merged" based on WordNet synonym sets.

Normalized: aside from the normal ground truth, we include a "normalised" ground truth containing only medium frequency tags.
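The location-tag filter can be approximated with WordNet hypernyms. A minimal sketch, assuming NLTK's WordNet corpus; this is our reading of the approach, not the released pipeline:

from nltk.corpus import wordnet as wn
LOCATION = wn.synset("location.n.01")
def is_location(tag):
    """True if any noun sense of the tag is a kind (or instance) of a location."""
    for synset in wn.synsets(tag, pos=wn.NOUN):
        # Follow both ordinary and instance hypernyms up the hierarchy.
        hypernyms = synset.closure(lambda s: s.hypernyms() + s.instance_hypernyms())
        if LOCATION in hypernyms:
            return True
    return False
tags = ["scotland", "usa", "dog", "chair"]
print([t for t in tags if not is_location(t)])  # drops 'scotland' and 'usa'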


Photo Tag Recommendation

In this work we consider the collections used in 2 popular photo tag recommendation works: Sigurbjornsson [5] and Garg [6].

[5] Flickr Tag Recommendation based on Collective Knowledge. B. Sigurbjornsson and R. van Zwol. WWW '08.
[6] Personalized, Interactive Tag Recommendation for Flickr. N. Garg and I. Weber. ACM RecSys '08.

1. Problems: Ground Truth

Sigurbjornsson use a small collection of images whose ground truths are crowdsourced.

For photo tag recommendation, however, many aspects that users tag are often not explicit in the image (e.g. locations, dates etc.). Therefore, these annotations are missed by crowdsourcing.


Comparing Annotations: crowdsourced annotations vs tags added by the user.

Crowdsourced: football, red, team, blue, england, grass, saturday

User tags: football, red, team, blue, scotland, hamilton, accies, artificial grass, dunfermline, sunday, new douglas park

2. Problems: Synonymous Tags

One problem with using user tags, however, is that users often annotate images with many synonyms.


Test image #1. Annotations: newyork, ny, nyc, newyorkcity, york, timessquare.

Suggestions:
Annotation Model #1: ny, nyc, newyork, york, city (precision 0.8)
Annotation Model #2: ny, timessquare, people, cab, empire (precision 0.4)

Model #1 achieves higher precision simply by suggesting several synonyms for the same concept, while Model #2's more diverse suggestions are penalised.

3. Problems: Free Distribution

Existing collections [5,6] for photo tag recommendation were never released, making comparable experiments difficult.

[5] Flickr Tag Recommendation based on Collective Knowledge. B. Sigurbjornsson and R. van Zwol. WWW '08.
[6] Personalized, Interactive Tag Recommendation for Flickr. N. Garg and I. Weber. ACM RecSys '08.



Flickr-PTR contains details of 2,000,000 images from Flickr, built with PTR evaluation in mind.

Openly available: uses Flickr images released under the Creative Commons license.

Clustered user tags: a crowdsourced experiment asked users to group related tags, overcoming the problem of synonyms.

To overcome synonyms in image annotations, we carried out a crowdsourced experiment which "grouped" synonyms, i.e. tags which refer to the same aspect.

Conclusions

This work highlighted:
7 problems with existing AIA collections (Corel, ESP, IAPR)
3 problems with existing PTR collections (Sigurbjornsson, Garg)

With this in mind, we introduce two new, freely available image collections:
Flickr-AIA: 312,000 Flickr images, for automatic image annotation evaluation.
Flickr-PTR: 2,000,000 Flickr images, for photo tag recommendation evaluation.

These collections are available at: http://dcs.gla.ac.uk/~philip/

Thanks for listening!

[1] Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary. P. Duygulu et al. ECCV '02.
[2] Labeling Images with a Computer Game. L. von Ahn and L. Dabbish. CHI '04.
[3] The IAPR TC-12 Benchmark: A New Evaluation Resource. M. Grubinger et al. Visual Information Systems, 2006.
[4] Baselines for Image Annotation. A. Makadia, V. Pavlovic and S. Kumar. IJCV 2010.
[5] Flickr Tag Recommendation based on Collective Knowledge. B. Sigurbjornsson and R. van Zwol. WWW '08.
[6] Personalized, Interactive Tag Recommendation for Flickr. N. Garg and I. Weber. ACM RecSys '08.
[7] On Contextual Photo Tag Recommendation. P. McParlane, Y. Moshfeghi and J. Jose. SIGIR 2013.
[8] Beyond co-occurrence: discovering and visualizing tag relationships from geo-spatial. H. Zhang et al. WSDM 2012.