
Supervised ranking hash for semantic similarity search

Kai Li, Guo-Jun Qi, Jun Ye, Tuoerhongjiang Yusuph, Kien A. Hua
Department of Computer Science, University of Central Florida
ISM 2016
Presented by Tuoerhongjiang Yusuph

Introduction

Massive amounts of high-dimensional data, high computational costs …

3.8 trillion images by 2010!

Similarity Search Challenges

Semantic gap.

There is a huge semantic gap between the feature representations and the semantic concepts.

High dimensionality.

Traditional space-partitioning-based indexing techniques fail due to the curse of dimensionality.

Massive storage.

The high-dimensional floating-point vectors require massive storage.

Slow search.

Distance computation is slow, and exhaustive search is not scalable.


What Is Hashing

Learning-based hashing has recently been proposed for large-scale similarity search. The goal is to learn compact hash codes that preserve certain similarity measures.
Unsupervised hashing: the learned similarities are w.r.t. Euclidean distances between feature vectors.
Supervised hashing: the similarities are defined with respect to semantic labels.

Example hash codes: 11011, 10011, 01001, 01011, …
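As a toy illustration of unsupervised hashing (not the method proposed in this paper), the sketch below thresholds random linear projections to get compact binary codes, so codes of nearby feature vectors (in Euclidean distance) tend to agree in many bits. All names and parameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_random_projections(dim, n_bits):
    """Unsupervised (data-agnostic) hashing: draw random projection directions."""
    return rng.standard_normal((dim, n_bits))

def hash_codes(X, W):
    """Threshold the projections to obtain binary codes (one row per sample)."""
    return (X @ W > 0).astype(np.uint8)

# Toy data: two nearby points and one distant point in feature space.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [1.1, 2.1, 2.9, 4.2],
              [-5.0, 0.5, 7.0, -3.0]])

W = fit_random_projections(dim=4, n_bits=16)
B = hash_codes(X, W)

# Nearby feature vectors tend to disagree in few bits; distant ones in many.
print((B[0] != B[1]).sum(), (B[0] != B[2]).sum())
```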

Why Supervised Hashing

Similarity computation based on Hamming distance is fast.
Binary codes can be used to build lookup tables for constant-time search.
Compact binary codes require far less storage, e.g. 1 billion 64-bit hash codes take only 8 GB and fit in RAM.

[Figure: a query x is mapped by the hash function h(x) to a binary code (e.g. 1010100101), which indexes into a hash table of codes such as 1000000000, 1000000001, ….]
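A minimal sketch of why binary codes make search cheap: Hamming distance is just an XOR plus a popcount, and exact-match lookups can use an ordinary hash table keyed by the code. The helper names and the toy codes below are illustrative, not from the paper.

```python
from collections import defaultdict

def hamming(a: int, b: int) -> int:
    """Hamming distance between two codes stored as ints: XOR + popcount."""
    return (a ^ b).bit_count()  # Python 3.10+; use bin(a ^ b).count("1") otherwise

# Build a lookup table: code -> list of item ids sharing that code.
database = {0: 0b1000000000, 1: 0b1000000001, 2: 0b1010100100, 3: 0b1010100101}
table = defaultdict(list)
for item_id, code in database.items():
    table[code].append(item_id)

query = 0b1010100101
exact_hits = table[query]  # constant-time bucket lookup
ranked = sorted(database, key=lambda i: hamming(query, database[i]))  # Hamming ranking
print(exact_hits, ranked)
```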

Related Work

Supervised hashing has received a lot of attention recently:
KSH (Liu et al., 2012): Supervised Hashing with Kernels
TSH (Lin et al., 2013): Two-Step Hashing
LFH (Zhang et al., 2014): Supervised Hashing with Latent Factor Models
SDH (Shen et al., 2015): Supervised Discrete Hashing
FastHash (Lin et al., 2015): Fast Hashing with Boosted Decision Trees and Graph Cuts
COSDISH (Kang et al., 2016): Column Sampling based Discrete Supervised Hashing
and more …

Existing Methods

In general, existing hashing algorithms follow two steps.
Step 1: Learn model coefficients by minimizing some pairwise error w.r.t. ground-truth similarity labels.
Step 2: Threshold the linear/nonlinear feature projections, i.e. h(x) = sgn(w^T x).
Different hashing algorithms usually differ in the first step, where different objective functions are used.
The second step, i.e. the form of the hash function h(x) = sgn(w^T x), is the same.

Motivation

We explore a new family of hash functions based on features' relative ranking orders: h(x) = argmax_k w_k^T x.
Here, W = [w_1, …, w_K] defines a linear subspace for ranking the projected features.
A special case of such ranking-based hashing schemes has been explored in Winner-Take-All (WTA) hashing (Yagnik et al., 2011).
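A small numpy sketch of a ranking-based hash function of this form, assuming each hash function outputs the index of the largest of K linear projections; the random W below is only a placeholder for a learned projection matrix.

```python
import numpy as np

def ranking_hash(x, W):
    """Ranking-based hash: return argmax_k w_k^T x, i.e. which projection wins."""
    return int(np.argmax(W.T @ x))

rng = np.random.default_rng(0)
d, K = 512, 4                    # feature dimension and number of projection directions
W = rng.standard_normal((d, K))  # placeholder for a learned projection matrix

x = rng.standard_normal(d)
print(ranking_hash(x, W))        # an integer in {0, ..., K-1}, i.e. log2(K) bits per function
```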

Revisiting WTA Hash

WTA is an ordinal feature embedding using partial order statistics

Resilient to numeric perturbations, scaling, constant offset

Non-linear feature embedding

Limitations

WTA is data-agnostic and requires long codes to get good performance

WTA is a special case of the ranking-based hash function in which W is restricted to axis-aligned directions generated through random permutations.
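For comparison, a sketch of WTA hashing in the style of Yagnik et al.: permute the feature vector at random, keep the first K entries, and output the index of the maximum. This is the axis-aligned, data-agnostic special case mentioned above; parameter names are illustrative.

```python
import numpy as np

def wta_hash(x, permutations, K):
    """WTA hash: for each random permutation, take the argmax of the first K entries."""
    codes = []
    for perm in permutations:
        window = x[perm[:K]]           # axis-aligned "projections": just permuted coordinates
        codes.append(int(np.argmax(window)))
    return codes

rng = np.random.default_rng(0)
d, K, n_hashes = 512, 4, 8
permutations = [rng.permutation(d) for _ in range(n_hashes)]

x = rng.standard_normal(d)
print(wta_hash(x, permutations, K))    # n_hashes codes, each in {0, ..., K-1}
```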

Optimization Problem

The objective is to minimize the total pair-wise error over all training pairs.
The pair-wise error for a pair of samples is defined in terms of the hash codes obtained with the ranking-based hash function, the pair's similarity label, and a trade-off parameter.
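The transcript does not preserve the actual formula for the pair-wise error. Purely to illustrate how pairwise semantic labels typically enter such objectives, the sketch below uses a KSH-style squared inner-product loss (from Liu et al.'s KSH, listed under Related Work); the paper's own error term and trade-off parameter may look different.

```python
import numpy as np

def pairwise_error(B, S):
    """Illustrative KSH-style loss: codes' inner products should match similarity labels.

    B: n x q matrix of codes in {-1, +1};  S: n x n matrix of labels in {-1, +1}.
    """
    q = B.shape[1]
    return np.sum((B @ B.T / q - S) ** 2)

# Toy example: items 0 and 1 share a semantic label, item 2 does not.
B = np.array([[ 1,  1, -1,  1],
              [ 1,  1, -1, -1],
              [-1, -1,  1,  1]])
S = np.array([[ 1,  1, -1],
              [ 1,  1, -1],
              [-1, -1,  1]])
print(pairwise_error(B, S))
```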

Optimization

The argmax term makes direct optimization very hard. To solve the problem, we first reformulate the hash function in matrix form, representing the output code by an indicator vector y with y_k ∈ {0, 1} and Σ_k y_k = 1.
The constraints on y enforce a 1-of-K coding scheme that selects the maximum projected entry, which is equivalent to the previous definition.
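A quick numerical check, under the assumed argmax form of the hash function, that the 1-of-K indicator formulation selects the same entry as the original argmax; shapes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 4, 16
W = rng.standard_normal((d, K))
x = rng.standard_normal(d)

# Original ranking-based hash: index of the largest projection.
h = int(np.argmax(W.T @ x))

# Matrix (1-of-K) reformulation: pick the indicator vector y maximizing y^T W^T x,
# subject to y_k in {0, 1} and sum_k y_k = 1.
candidates = np.eye(K)            # all valid 1-of-K indicator vectors
scores = candidates @ (W.T @ x)   # y^T W^T x for each candidate
y = candidates[np.argmax(scores)]

assert h == int(np.argmax(y))     # both formulations select the same entry
```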

Optimization

The objective function can also be reformulated accordingly, in terms of the hash code matrix (which stacks the indicator vectors of all samples) and a matrix of pairwise entries.

Optimization

The problem then boils down to minimizing a continuous objective function.
This problem can be solved with stochastic gradient descent, using the update rule W ← W − η ∇f(W), where η is the learning rate and ∇f(W) is the gradient of the objective with respect to the projection matrix.

Optimization

The problem is still hard to solve because of the discrete constraints on the indicator vectors. We propose to approximate the ranking-based hash function with the softmax function.
Here, the indicator vector is replaced by the softmax vector, whose k-th entry is exp(w_k^T x) / Σ_j exp(w_j^T x).
Note: when the maximum entry of the projection is sufficiently large, the approximation becomes an equivalence.
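A sketch of the softmax relaxation described here: the hard argmax over projections is replaced by a softmax vector, which approaches the 1-of-K indicator when the winning projection dominates. This shows only the approximation; the loss and gradient from the previous slides are not reproduced.

```python
import numpy as np

def ranking_hash(W, x):
    """Hard ranking-based hash: index of the largest projection."""
    return int(np.argmax(W.T @ x))

def softmax_relaxation(W, x):
    """Soft (differentiable) approximation: softmax over the projections."""
    z = W.T @ x
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()       # close to one-hot when the max projection dominates

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 4))
x = rng.standard_normal(16)

y_soft = softmax_relaxation(W, x)
assert ranking_hash(W, x) == int(np.argmax(y_soft))
```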

Hash function learning

The previous algorithm learns one hash function at a time. We use AdaBoost to learn multiple hash functions sequentially:
Initially assign equal weights to each training pair.
After learning each hash function, increase the weights of wrongly handled pairs and decrease the weights of correctly handled pairs.
The resulting update rule for the pairwise weights is simple.
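A hedged sketch of the boosting-style outer loop described on this slide: pair weights start uniform and are re-weighted after each hash function is learned. The exact weight-update rule is not preserved in the transcript, so the exponential AdaBoost-style update and the `learn_hash_function` / `pair_is_preserved` helpers below are placeholders.

```python
import numpy as np

def boost_hash_functions(pairs, n_functions, learn_hash_function, pair_is_preserved, alpha=0.5):
    """Boosting-style loop: learn hash functions sequentially on re-weighted training pairs."""
    weights = np.full(len(pairs), 1.0 / len(pairs))      # start with equal weights per pair
    learned = []
    for _ in range(n_functions):
        h = learn_hash_function(pairs, weights)          # placeholder: fit one ranking-based hash
        correct = np.array([pair_is_preserved(h, p) for p in pairs], dtype=bool)
        # Placeholder AdaBoost-style update: up-weight wrongly handled pairs, down-weight correct ones.
        weights *= np.exp(np.where(correct, -alpha, alpha))
        weights /= weights.sum()                         # renormalize
        learned.append(h)
    return learned
```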

Experiment

Datasets
Labelme: 22,000 images with a semantic affinity matrix; images represented as 512-D Gist vectors.
Peekaboom: 60,000 images with a semantic affinity matrix; images represented as 512-D Gist vectors.
NUS-WIDE: 186,577 images with class labels; images represented as 500-D bag-of-visual-words (BOVW) vectors.
Baselines
KSH, TSH, LFH, SDH, FastHash, COSDISH (see Related Work).

Experiment

Performance Metrics
Top-k precision: proportion of ground-truth neighbors among the k nearest neighbors ranked by Hamming distance.
Precision-recall: precision at different recall levels.
Mean Average Precision (mAP): first compute the average precision of each query as the area under its precision-recall curve, then compute mAP as the average of these areas over the query set.
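For concreteness, a small sketch of these metrics computed from a Hamming-distance ranking; function names are illustrative, and average precision uses the standard discrete approximation of the area under the precision-recall curve.

```python
import numpy as np

def top_k_precision(ranked_relevance, k):
    """Fraction of ground-truth neighbors among the k nearest items (by Hamming distance)."""
    return float(np.mean(ranked_relevance[:k]))

def average_precision(ranked_relevance):
    """Average precision of one query (discrete area under its precision-recall curve)."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    cum_hits = np.cumsum(rel)
    precision_at_i = cum_hits / np.arange(1, len(rel) + 1)
    return float((precision_at_i * rel).sum() / rel.sum())

# Toy example: relevance (1 = true neighbor) of database items sorted by Hamming distance.
ranked_relevance = np.array([1, 0, 1, 1, 0, 0])
print(top_k_precision(ranked_relevance, k=3))   # 2/3
print(average_precision(ranked_relevance))      # mAP = mean of this value over all queries
```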

Experiment

Results of mean Average Precision (mAP).

Experiment

More results on Labelme and Peekaboom.

Experiment

More results on NUS-WIDE.

Conclusion and Future Work

Key Contributions
The first supervised hashing scheme to exploit ranking-based hash functions.
An effective approximation leads to a continuous problem that can be solved efficiently.
Superior semantic similarity search performance on real-world datasets.
Future Work
Extend to kernel subspace ranking.
Incorporate feature learning stages and develop a deep ranking framework.