Reading Between The Lines: Object Localization PowerPoint Presentation, PPT - DocSlides


Slide1

Reading Between The Lines: Object Localization Using Implicit Cues from Image Tags

Sung Ju Hwang and Kristen Grauman

University of Texas at Austin

Presented by Jingnan Li and Ievgeniia Gutenko

Slide2

Baby, Infant, Kid, Child, Headphones, Red, Cute, Laughing

Boy, Dog, Grass, Blue, Sky, Puppy, River, Stream, Sun, Colorado, Nikon

Slide3

Weakly labeled images

Each image comes only with a list of tags, with no bounding boxes or object locations:

- Lamp, Chair, Painting, Table
- Lamp, Chair
- Baby, Table, Chair
- Bicycle, Person

Slide4

Object detection approaches

Sliding window object detector:

- Reduce the number of windows scanned: cascades, branch-and-bound.
- Improve accuracy by priming the detector based on context:
  - inter-object co-occurrence (occlusions)
  - spatial relationships (on, to the left, to the right)

Slide5

Object detection approaches

This work, on top of a sliding window (appearance-based) detector that needs its number of scanned windows reduced:

- Prioritize search windows within the image, based on a learned distribution over tags, for speed.
- Combine models based on both tags and images, for accuracy.

Slide6

Motivation

Idea: what can be predicted about an image, before even looking at it, from its tags alone?

Both sets of tags suggest that a mug appears in the image; but once we account for the fact that taggers name what "catches the eye" first, the area the object detector has to search can be narrowed.

Slide7

Implicit Tag Feature Definitions

What implicit features can be obtained from tags?

- Relative prominence of each object, based on the order in the list.
- Scale cues implied by unnamed objects.
- The rough layout and proximity between objects, based on the sequence in which tags are given.

Slide8

Implicit Tag Feature Definitions

Word presence and absence – a bag-of-words representation.

- wi denotes the number of times tag-word i occurs in the image's associated keyword list, for a vocabulary of N total possible words.
- For most tag lists, this vector is effectively binary, indicating only whether each tag has been named or not.
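The bag-of-words feature can be sketched as follows; the vocabulary and tag list here are illustrative, not taken from the paper's datasets:

```python
# Illustrative vocabulary of N possible tag words (invented for this sketch).
vocabulary = ["baby", "chair", "dog", "grass", "lamp", "table"]

def bag_of_words(tags, vocabulary):
    """Return the vector w, where w[i] counts occurrences of word i in the tag list."""
    return [tags.count(word) for word in vocabulary]

# Taggers rarely repeat a word, so the vector is binary in practice.
w = bag_of_words(["lamp", "chair", "table"], vocabulary)
```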

Slide9

Implicit Tag Feature Definitions

Tag rank – the prominence of each object: certain things will be named before others.

- ri denotes the percentile rank of word i's position, relative to the ranks observed for that word in the training data (computed over the entire vocabulary).
- Some objects have context-independent "noticeability", such as baby or fire truck, and are often named first regardless of their scale or position.
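A minimal sketch of such a percentile-rank feature, assuming training data that records each word's position in past tag lists; the example ranks are invented:

```python
def percentile_rank(observed_rank, training_ranks):
    """Fraction of this word's training occurrences ranked at or before
    observed_rank; a low value means the word was named unusually early
    in this image (suggesting unusual prominence)."""
    return sum(1 for r in training_ranks if r <= observed_rank) / len(training_ranks)

# Invented training positions: "baby" is usually named first, "table" later.
training = {"baby": [1, 1, 2, 1], "table": [3, 5, 4, 6]}

r_baby = percentile_rank(1, training["baby"])    # 0.75: rank 1 is typical for "baby"
r_table = percentile_rank(1, training["table"])  # 0.0: rank 1 would be unusually early
```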

Slide10

Implicit Tag Feature Definitions

Mutual tag proximity – a tagger names prominent objects first, then moves his or her eyes to other nearby objects.

- pi,j denotes the (signed) rank difference between tag words i and j for the given image.
- The entry is 0 when the pair is not present.
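One way this feature could be computed, sticking to the slide's definition (the sign convention and example tags are this sketch's own choices):

```python
def mutual_proximity(tags, vocabulary):
    """p[i][j] = signed rank difference between vocabulary words i and j
    in this image's tag list; 0 when either word is absent."""
    rank = {w: k + 1 for k, w in enumerate(tags)}  # 1-based position in the tag list
    n = len(vocabulary)
    p = [[0] * n for _ in range(n)]
    for i, wi in enumerate(vocabulary):
        for j, wj in enumerate(vocabulary):
            if i != j and wi in rank and wj in rank:
                p[i][j] = rank[wi] - rank[wj]  # positive: word j was named first
    return p

# "lamp" was named before "chair"; "baby" is absent, so its entries stay 0.
p = mutual_proximity(["lamp", "chair"], ["baby", "chair", "lamp"])
```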

Slide11

Modeling the localization distributions

Relate the defined tag-based features to object detection (individually or in combination).

Model the conditional probability density that a window contains the object of interest, given only the image tags: P(X | T), where X = (x, y, s) is the window's position and scale and T denotes the tag features for the target object category.

Slide12

Modeling the localization distributions

Use a mixture of Gaussians model, with the parameters of the mixture obtained from a trained Mixture Density Network (MDN).

Training: images with tags and ground-truth bounding boxes.

Classification: a novel image with tags but no bounding boxes; the MDN provides the mixture model representing the most likely locations for the target object.

[Figure: example image tagged Computer, Bicycle, Chair.]
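To illustrate how the predicted mixture would be used, the sketch below samples candidate windows from a two-component Gaussian mixture; the component weights, means, and standard deviations are invented for illustration, whereas a real MDN would output them from the tag features:

```python
import random

# Hypothetical mixture an MDN might output for one object class: each component
# has a weight, a mean over (x, y, scale), and per-dimension standard deviations.
# All numbers here are invented for illustration.
components = [
    {"weight": 0.7, "mean": (0.5, 0.6, 0.2), "std": (0.10, 0.05, 0.05)},
    {"weight": 0.3, "mean": (0.2, 0.7, 0.1), "std": (0.05, 0.05, 0.02)},
]

def sample_window(components):
    """Draw one (x, y, scale) candidate from the Gaussian mixture."""
    u, acc = random.random(), 0.0
    for c in components:
        acc += c["weight"]
        if u <= acc:
            return tuple(random.gauss(m, s) for m, s in zip(c["mean"], c["std"]))
    # Guard against floating-point round-off in the cumulative weights.
    return components[-1]["mean"]

# Most samples fall near the dominant component
# (cf. the "top 30 most likely places" figure on the next slide).
candidates = [sample_window(components) for _ in range(30)]
```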

Slide13

The top 30 most likely places for a car, sampled according to the modeled distribution based only on the images' tags.

Slide14

Modulating or Priming the detector

Use P(X | T) from the previous step and combine it with the predictions of an object detector based on appearance.

- Appearance cues A: a HOG detector, or a part-based detector (deformable part model).
- Use the model to rank sub-windows and run the detector only on the most probable locations ("priming").
- The detector's decision value is mapped to a probability.
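A sketch of the score mapping and a simple combination; the sigmoid parameters are illustrative (in practice they would be fit, e.g. Platt-style, on held-out data), and the unweighted product stands in for the learned weighting over tag cues described on the next slide:

```python
import math

def detector_probability(decision_value, a=-1.5, b=0.0):
    """Platt-style sigmoid mapping a raw detector decision value to a
    probability; a and b are illustrative and would be fit on held-out data."""
    return 1.0 / (1.0 + math.exp(a * decision_value + b))

def combined_score(p_appearance, p_tags):
    """Unweighted product of appearance- and tag-based probabilities
    (the paper instead learns weights over the individual tag cues)."""
    return p_appearance * p_tags
```

With a < 0, larger decision values map to higher probabilities, so the detector's ranking of windows is preserved while its scores become comparable with the tag-based prior.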

Slide15

Modulating the detector

Balance appearance- and tag-based predictions, using all tag cues.

- Learn the weights w using detection scores for true detections and a number of randomly sampled windows from the background.
- A Gist descriptor can be added to compare against global visual scene context.
- Goal: improve accuracy.

Slide16

Priming the detector

Prioritize the search windows according to P(X | T), under the assumption that the object is present and only the localization parameters (x, y, s) have to be estimated.

- Stop the search when a confident detection is found (probability > 0.5).
- Goal: improve efficiency.
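The priming loop can be sketched as follows; `run_detector` and the window list are placeholders for the appearance detector and the tag-ranked windows:

```python
def primed_detection(windows_by_priority, run_detector, threshold=0.5):
    """Scan windows in tag-predicted priority order and stop at the first
    confident detection, instead of exhaustively scanning every window."""
    for window in windows_by_priority:
        p = run_detector(window)  # detection probability for this window
        if p > threshold:
            return window, p
    return None  # no confident detection found
```

Because the most probable windows are visited first, a confident detection typically terminates the scan after only a fraction of the windows.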

Slide17

Results

Datasets:

- LabelMe – uses the HOG detector.
- PASCAL – uses the part-based detector.

Note: the last three columns of the dataset table show the ranges of positions/scales present in the images, averaged per class, as a percentage of image size.

Slide18

LabelMe Dataset

Priming Object Search: Increasing Speed

For a detection rate of 0.6, the proposed method considers only 1/3 of the windows scanned by the sliding window approach.

Modulating the Detector: Increasing Accuracy

The proposed features make noticeable improvements in accuracy over the raw detector.

Slide19

Example detections on LabelMe

Each image shows the best detection found. Scores denote the overlap ratio with ground truth.

The detectors modulated according to the visual or tag-based context are more accurate.

Slide20

PASCAL Dataset

Priming Object Search: Increasing Speed – adopting the latent SVM (LSVM) part-based windowed detector, priming yields a larger speedup here than the HOG detector saw on LabelMe.

Modulating the Detector: Increasing Accuracy – augmenting the LSVM detector with the tag features noticeably improves accuracy, increasing the average precision by 9.2% overall.

Slide21

Example detections on PASCAL VOC

- Red dotted boxes denote the most confident detections according to the raw detector (LSVM).
- Green solid boxes denote the most confident detections when modulated by the proposed method (LSVM + tags).
- The first two rows show good results; the third row shows failure cases.

Slide22

Conclusions

A novel approach to using the information "between the lines" of image tags.

Utilizing this implicit tag information makes search both faster and more accurate.

The method complements, and can even exceed, the performance of methods using visual cues.

Shows potential for learning the tendencies of real taggers.

Slide23

Thank you!
