Presentation Transcript

Slide1

Accounting for the relative importance of objects in image retrieval

Sung Ju Hwang and Kristen Grauman

University of Texas at Austin

Slide2

Image retrieval

Query image

Image Database

Image 1

Image 2

Image k

Content-based retrieval from an image database

…

Slide3

Relative importance of objects

Query image

Image Database

Which image is more relevant to the query?

?

Slide4

Relative importance of objects

Query image

cow

bird

water

cow

bird

water

Image Database

cow

fence

mud

Which image is more relevant to the query?

?

sky

Slide5

Relative importance of objects

An image can contain many different objects, but some are more “important” than others.

sky

water

mountain

architecture

bird

cow

Slide6

Relative importance of objects

Some objects are background

sky

water

mountain

architecture

bird

cow

Slide7

Relative importance of objects

Some objects are less salient

sky

water

mountain

architecture

bird

cow

Slide8

Relative importance of objects

Some objects are more prominent or perceptually define the scene

sky

water

mountain

architecture

bird

cow

Slide9

Our goal

Goal: Retrieve those images that share important objects with the query image.

versus

How to learn a representation that accounts for this?

Slide10

Idea: image tags as importance cue

The order in which a person assigns tags provides implicit cues about the objects' importance to the scene.

TAGS: Cow, Birds, Architecture, Water, Sky

Slide11

Idea: image tags as importance cue

TAGS: Cow, Birds, Architecture, Water, Sky

The order in which a person assigns tags provides implicit cues about the objects' importance to the scene. Learn this connection to improve cross-modal retrieval and CBIR. Then query with untagged images to retrieve the most relevant images or tags.

Slide12

Related work

Previous work using tagged images focuses on the noun ↔ object correspondence:

Duygulu et al. 02, Fergus et al. 05, Li et al. 09, Berg et al. 04, Lavrenko et al. 2003, Monay & Gatica-Perez 2003, Barnard et al. 2004, Schroff et al. 2007, Gupta & Davis 2008, …

Related work building richer image representations from “two-view” text+image data (slide example of paired text: “height: 6-11, weight: 235 lbs, position: forward, croatia, college:”):

Bekkerman & Jeon 07, Qi et al. 09, Quack et al. 08, Quattoni et al. 07, Yakhnenko & Honavar 09, Gupta et al. 08, Blaschko & Lampert 08, Hardoon et al. 04, …

Slide13

Approach overview:

Building the image database

Extract visual and tag-based features

Cow

Grass

Horse

Grass

Car

House

Grass

Sky

Learn projections from each feature space into a common “semantic space”

Tagged training images

…

Slide14

Approach overview:

Retrieval from the database

Query the image database in three ways:

Image-to-image retrieval: untagged query image → retrieved images
Tag-to-image retrieval: tag-list query (Cow, Tree, Grass) → retrieved images
Image-to-tag auto annotation: untagged query image → retrieved tag-list (Cow, Tree)

Slide15

Dual-view semantic space

Visual features and tag-lists are two views generated by the same concept.

Semantic space

Slide16

Learning mappings to semantic space

Canonical Correlation Analysis (CCA): choose projection directions that maximize the correlation of views projected from the same instance.

Semantic space: new common feature space

View 1

View 2

Slide17

Kernel Canonical Correlation Analysis

Linear CCA

Given paired data $\{(x_i, y_i)\}_{i=1}^{n}$, select directions $w_x, w_y$ so as to maximize:

$\rho = \max_{w_x, w_y} \dfrac{w_x^\top C_{xy} w_y}{\sqrt{(w_x^\top C_{xx} w_x)\,(w_y^\top C_{yy} w_y)}}$

Kernel CCA

Given a pair of kernel functions $k_x(\cdot,\cdot)$, $k_y(\cdot,\cdot)$: same objective, but projections in kernel space, $w_x = \sum_i \alpha_i \phi_x(x_i)$ and $w_y = \sum_i \beta_i \phi_y(y_i)$.

[Akaho 2001, Fyfe et al. 2001, Hardoon et al. 2004]
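The slides give only the objective; below is a minimal numpy sketch of the regularized KCCA eigenproblem in the style of Hardoon et al. 2004, assuming precomputed, centered kernel matrices for the two views. Names and the regularization value are illustrative.

```python
import numpy as np

def kcca(Kx, Ky, reg=0.1, n_components=10):
    """Regularized kernel CCA (after Hardoon et al. 2004).

    Kx, Ky : (n, n) centered kernel matrices for the two views
             (e.g., visual kernel and tag kernel) over the same n images.
    Returns alpha, beta: kernel-space weights whose projections
    Kx @ alpha and Ky @ beta are maximally correlated.
    """
    n = Kx.shape[0]
    I = np.eye(n)
    # alpha solves (Kx + rI)^-1 Ky (Ky + rI)^-1 Kx alpha = rho^2 alpha
    Rx = np.linalg.solve(Kx + reg * I, Ky)
    Ry = np.linalg.solve(Ky + reg * I, Kx)
    evals, evecs = np.linalg.eig(Rx @ Ry)
    top = np.argsort(-evals.real)[:n_components]
    alpha = evecs[:, top].real
    beta = Ry @ alpha   # matching weights for the second view
    return alpha, beta
```

Slide18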

Semantic space

Building the kernels for each view

Word frequency, rank kernels

Visual kernels

Slide19

Visual features

Color Histogram: captures the HSV color distribution

Gist [Torralba et al.]: captures the total scene structure

Visual Words (k-means on DoG+SIFT): captures local appearance

Average the component χ² kernels to build a single visual kernel.
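As one concrete reading of this step, the sketch below builds an exponentiated χ² kernel per feature channel and averages the three; the feature matrices and gamma are placeholders, not values stated in the slides.

```python
import numpy as np

def chi2_kernel(A, B, gamma=1.0):
    """Exponentiated chi-squared kernel between histogram rows of A and B."""
    diff = A[:, None, :] - B[None, :, :]
    summ = A[:, None, :] + B[None, :, :] + 1e-10   # avoid division by zero
    return np.exp(-gamma * (diff ** 2 / summ).sum(axis=-1))

# color_hists, gists, visual_words: (n, d) features per image, one matrix
# per channel named on the slide (placeholders).
# K_visual = (chi2_kernel(color_hists, color_hists)
#             + chi2_kernel(gists, gists)
#             + chi2_kernel(visual_words, visual_words)) / 3.0
```

Slide20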

Tag features

Traditional bag-of-(text)words

Word Frequency

Cow

Bird

Water

Architecture

Mountain

Sky

tag          count
Cow          1
Bird         1
Water        1
Architecture 1
Mountain     1
Sky          1
Car          0
Person       0

Slide21

Tag features

Absolute Rank

Cow

Bird

Water

Architecture

Mountain

Sky

Absolute rank in this image’s tag-list:

tag          value
Cow          1
Bird         0.63
Water        0.50
Architecture 0.43
Mountain     0.39
Sky          0.36
Car          0
Person       0

Slide22

Tag features

Relative Rank

Cow

Bird

Water

Architecture

Mountain

Sky

tag          value
Cow          0.9
Bird         0.6
Water        0.8
Architecture 0.5
Mountain     0.8
Sky          0.8
Car          0
Person       0

Percentile rank, compared to the word’s typical rank in all tag-lists.
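The three tag-view features fit in one sketch. The 1/log2(1+rank) form reproduces the absolute-rank values shown above (1, 0.63, 0.50, 0.43, 0.39, 0.36); the percentile computation is one plausible reading of “compared to the word’s typical rank”, so treat it as an assumption.

```python
import numpy as np

def tag_feature_vectors(tags, vocab, corpus_tag_lists):
    """Word-frequency, absolute-rank, and relative-rank vectors for one
    ordered tag list. corpus_tag_lists holds every image's ordered tags."""
    freq = np.zeros(len(vocab))
    abs_rank = np.zeros(len(vocab))
    rel_rank = np.zeros(len(vocab))
    for r, tag in enumerate(tags, start=1):
        i = vocab.index(tag)
        freq[i] = 1.0                        # bag-of-words presence
        abs_rank[i] = 1.0 / np.log2(1 + r)   # decays with list position
        # percentile: fraction of this word's corpus ranks no earlier than r
        ranks = [t.index(tag) + 1 for t in corpus_tag_lists if tag in t]
        rel_rank[i] = np.mean([rk >= r for rk in ranks])
    return freq, abs_rank, rel_rank

vocab = ["Cow", "Bird", "Water", "Architecture", "Mountain", "Sky",
         "Car", "Person"]
tags = ["Cow", "Bird", "Water", "Architecture", "Mountain", "Sky"]
f, a, rel = tag_feature_vectors(tags, vocab, corpus_tag_lists=[tags])
print(np.round(a, 2))   # [1. 0.63 0.5 0.43 0.39 0.36 0. 0.]
```

Slide23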

Semantic space

Building the kernels for each view

Word frequency, rank kernels

Visual kernels

Slide24

Experiments

We compare the retrieval performance of our method with two baselines:

Visual-Only Baseline: query image → 1st retrieved image

Words+Visual Baseline [Hardoon et al. 2004, Yakhnenko et al. 2009]: query image → 1st retrieved image

Ours: KCCA semantic space

Slide25

Evaluation

We use Normalized Discounted Cumulative Gain at top K (NDCG@K) to evaluate retrieval performance:

$\mathrm{NDCG@K} = \frac{1}{Z} \sum_{p=1}^{K} \frac{s(p)}{\log_2(1+p)}$

where $s(p)$ is the reward term (the score for the p-th ranked example) and $Z$ is the sum of all the scores for the perfect ranking (normalization). Doing well in the top ranks is more important.

[Kekäläinen & Järvelin, 2002]
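The formula itself was a graphic on the slide; this sketch implements the usual NDCG@K form consistent with the labels above: a reward score per rank p, discounted logarithmically, normalized by the perfect ranking's sum.

```python
import numpy as np

def ndcg_at_k(rewards, k):
    """NDCG@K: discounted sum of the rewards of the returned ranking,
    normalized by the same sum for the perfect ranking."""
    rewards = np.asarray(rewards, dtype=float)   # reward per ranked item
    k = min(k, len(rewards))
    disc = 1.0 / np.log2(np.arange(2, k + 2))    # discounts for ranks 1..k
    dcg = float((rewards[:k] * disc).sum())
    ideal = float((np.sort(rewards)[::-1][:k] * disc).sum())  # normalization
    return dcg / ideal if ideal > 0 else 0.0
```

Slide26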

Evaluation

We present the NDCG@K scores using two different reward terms:

Object presence/scale (presence, scale): rewards similarity of the query’s objects/scales and those in the retrieved image(s).

Ordered tag similarity (absolute rank, relative rank): rewards similarity of the query’s ground-truth tag ranks and those in the retrieved image(s). Example ordered tag lists: Cow, Tree, Grass, Person vs. Cow, Tree, Fence, Grass.

Slide27

Dataset

                 LabelMe                              PASCAL
Images           6352                                 9963
Database         3799 images                          5011 images
Query            2553 images                          4952 images
Character        Scene-oriented                       Object-central
Tag lists        Ordered lists via the labels added   Collected on Mechanical Turk
Unique taggers   56                                   758
Tags/image       ~23                                  ~5.5

Slide28

Image database

Image-to-image retrieval

We want to retrieve images most similar to the given query image in terms of object importance.

Tag-list kernel space

Visual kernel space

Untagged query image

Retrieved images
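Assuming the KCCA weights from Slide 17 have been learned, retrieval reduces to projecting the query from the visual view and ranking database points in the common space; the cosine similarity here is an illustrative choice, not one stated in the slides.

```python
import numpy as np

def to_semantic(k_vec, weights):
    """Project one image into the semantic space.
    k_vec: kernel values between this image and the n training images;
    weights: alpha (visual view) or beta (tag view) from KCCA."""
    return k_vec @ weights

def rank_database(query_pt, db_pts, topk=5):
    """Indices of the topk database images by cosine similarity."""
    q = query_pt / np.linalg.norm(query_pt)
    db = db_pts / np.linalg.norm(db_pts, axis=1, keepdims=True)
    return np.argsort(-(db @ q))[:topk]
```

Slide29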

Image-to-image retrieval results

Query Image

Our method

Words+Visual

Visual only

Slide30

Image-to-image retrieval results

Query Image

Our method

Words+Visual

Visual only

Slide31

Image-to-image retrieval results

Our method better retrieves images that share the query’s important objects, by both measures.

Retrieval accuracy measured by object+scale similarity

Retrieval accuracy measured by ordered tag-list similarity

39% improvement

Slide32

Tag-to-image retrieval

We want to retrieve the images that are best described by the given tag list.

Image database

Tag-list kernel space

Visual kernel space

Query tags

Cow

Person

Tree

Grass

Retrieved images
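Tag-to-image retrieval reuses the same machinery as the image-to-image sketch above, only projecting the query from the tag view (names carried over from that earlier illustrative code):

```python
# Build the query's tag-view kernel vector against the training tag lists,
# project it with beta, then rank database images exactly as before.
# query_pt = to_semantic(k_tag_query, beta)
# top = rank_database(query_pt, db_pts, topk=5)
```

Slide33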

Tag-to-image retrieval results

Our method better respects the importance cues implied by the user’s keyword query.

31% improvement

Slide34

Image-to-tag auto annotation

We want to annotate the query image with ordered tags that best describe the scene.

Image database

Tag-list kernel space

Visual kernel space

Untagged query image

Output tag-lists:

Cow, Tree, Grass

Cow, Grass, Field

Cow, Fence
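One simple realization, consistent with the nearest-neighbor table on the next slide, is to transfer tags from the query's k nearest database images in the semantic space; the rank-weighted voting below is an assumption, not the paper's stated rule.

```python
import numpy as np
from collections import Counter

def annotate(query_pt, db_pts, db_tag_lists, k=3, n_tags=5):
    """Pool the ordered tag lists of the k nearest database neighbors
    (in semantic space) into one output tag list for the query."""
    dists = np.linalg.norm(db_pts - query_pt, axis=1)
    votes = Counter()
    for idx in np.argsort(dists)[:k]:
        for r, tag in enumerate(db_tag_lists[idx], start=1):
            votes[tag] += 1.0 / r        # earlier tags carry more weight
    return [t for t, _ in votes.most_common(n_tags)]
```

Slide35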

Image-to-tag auto annotation results

Example output tag-lists:

Boat, Person, Water, Sky, Rock

Bottle, Knife, Napkin, Light, Fork

Person, Tree, Car, Chair, Window

Tree, Boat, Grass, Water, Person

Method        k=1      k=3      k=5      k=10
Visual-only   0.0826   0.1765   0.2022   0.2095
Word+Visual   0.0818   0.1712   0.1992   0.2097
Ours          0.0901   0.1936   0.2230   0.2335

k = number of nearest neighbors used

Slide36

Implicit tag cues as localization prior

Example tag lists: Woman, Table, Mug, Ladder · Mug, Key, Keyboard, Toothbrush, Pen, Photo, Post-it · Computer, Poster, Desk, Screen, Mug, Poster · Mug, Eiffel, Desk · Mug, Office · Mug, Coffee

Training: Learn an object-specific connection between localization parameters and implicit tag features.

Testing: Given a novel image, localize objects based on both tags and appearance, combining an object detector with implicit tag features: P(location, scale | tags)

[Hwang & Grauman, CVPR 2010]
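The slide only names P(location, scale | tags); as a hedged illustration of learning such a prior, one could regress localization parameters from implicit tag features per object class. This is a stand-in sketch, not the model of the CVPR 2010 paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_localization_prior(tag_feats, boxes):
    """Illustrative stand-in for P(location, scale | tags): learn an
    object-specific mapping from implicit tag features to (x, y, scale).
    tag_feats: (n, d) implicit tag features of training images
    boxes:     (n, 3) ground-truth (x, y, scale) of the object."""
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(tag_feats, boxes)
    return model

# At test time the prediction narrows the object detector's search:
# prior = fit_localization_prior(train_tag_feats, train_boxes)
# x, y, scale = prior.predict(query_tag_feats.reshape(1, -1))[0]
```

Slide37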

Conclusion

We want to learn what is implied (beyond the objects present) by how a human provides tags for an image.

The approach requires minimal supervision to learn the connection between importance conveyed by tags and visual features.

Consistent gains over:

content-based visual search

tag+visual approach that disregards importance