Supervised Ranking Hash for Semantic Similarity Search
Kai Li, Guo-Jun Qi, Jun Ye, Tuoerhongjiang Yusuph, Kien A. Hua
Department of Computer Science, University of Central Florida
ISM 2016
Presented by Tuoerhongjiang Yusuph
Introduction
Massive amounts of high-dimensional data lead to high computational costs …
3.8 trillion images by 2010!
Similarity Search Challenges
Semantic gap.
There is a huge semantic gap between feature representations and semantic concepts.
High dimensionality.
Traditional space-partitioning indexing techniques fail due to the curse of dimensionality.
Massive storage.
High-dimensional floating-point feature vectors consume massive amounts of storage.
Slow search.
Distance computation is expensive, and exhaustive search does not scale.
What Is Hashing?
Learning-based hashing has been proposed lately for large-scale similarity search. The goal is to learn compact hash codes that preserve certain similarity measures.
Unsupervised hashing. The learned similarities are defined w.r.t. Euclidean distances between feature vectors.
Supervised hashing. The similarities are defined with respect to semantic labels.
[Figure: images mapped to compact binary codes, e.g. 11011, 10011, 01001, 01011, …]
Why Supervised Hashing
Similarity computation based on Hamming distance is fast.
Binary codes can be used to build lookup tables for constant-time search.
Compact binary codes require far less storage, e.g. 1 billion 64-bit hash codes take only 8 GB and can fit in RAM.
(A short code sketch follows the figure below.)
[Figure: a query is mapped by the hash function h(x) to a binary code (e.g. 1010100101), which indexes into a hash table of codes such as 1000000000, 1000000001, …]
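To illustrate why Hamming-space search is cheap, here is a minimal Python sketch (not part of the paper): a distance is just an XOR plus a popcount, and codes can double as hash-table keys for constant-time lookup. The sizes and variable names are illustrative only.

```python
import numpy as np
from collections import defaultdict

def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of differing bits between two 64-bit codes: XOR, then popcount."""
    return bin(code_a ^ code_b).count("1")

rng = np.random.default_rng(0)
database = rng.integers(0, 2**63, size=100_000, dtype=np.uint64)  # toy 64-bit codes
query = int(database[42])                                         # pretend query code

# Exhaustive Hamming scan: only bitwise operations, no floating-point distances.
dists = np.array([hamming_distance(int(c), query) for c in database])
top10 = np.argsort(dists)[:10]

# Constant-time lookup: bucket database items by their exact code.
table = defaultdict(list)
for idx, c in enumerate(database):
    table[int(c)].append(idx)
exact_matches = table[query]   # O(1) retrieval of items with an identical code
```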
Related Work
Supervised hashing has received a lot of attention lately:
KSH (Liu et al., 2012): Supervised Hashing with Kernels
TSH (Lin et al., 2013): Two-Step Hashing
LFH (Zhang et al., 2014): Supervised Hashing with Latent Factor Models
SDH (Shen et al., 2015): Supervised Discrete Hashing
FastHash (Lin et al., 2015): Fast Hashing with Boosted Decision Trees and Graph Cuts
COSDISH (Kang et al., 2016): Column Sampling Based Discrete Supervised Hashing
and more …
Existing Methods
In general, existing hashing algorithms follow two steps:
Step 1: Learn model coefficients by minimizing some pairwise error w.r.t. ground-truth similarity labels.
Step 2: Threshold linear/nonlinear feature projections with the sign function, i.e. h(x) = sgn(f(x)) for a learned projection f(·).
Different hashing algorithms usually differ in the first step, where different objective functions are used.
The second step, i.e. the form of the hash function, is the same.
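For illustration, a minimal sketch of this conventional form: threshold a linear projection with the sign function, one bit per projection direction. The coefficients here are random placeholders, not learned values.

```python
import numpy as np

def sign_hash(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Conventional hashing: threshold linear projections, one bit per column of W."""
    return (x @ W + b > 0).astype(np.uint8)   # shape: (num_bits,)

rng = np.random.default_rng(0)
x = rng.standard_normal(512)          # e.g. a 512-D GIST descriptor
W = rng.standard_normal((512, 64))    # stand-in for learned projection coefficients
b = rng.standard_normal(64)
code = sign_hash(x, W, b)             # 64-bit binary code
```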
Motivation
We explore a new family of hash functions based on the relative ranking orders of projected features:
h(x; W) = argmax_k w_k^T x
Here, W = [w_1, …, w_K] defines a linear subspace for ranking the projected features.
A special case of such ranking-based hashing schemes has been explored in Winner-Take-All Hash (WTA) (Yagnik et al., 2011).
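A minimal sketch of a ranking-based hash function of this form, assuming the hash value is the index of the largest projected feature, h(x; W) = argmax_k w_k^T x; the dimensions and names are illustrative.

```python
import numpy as np

def ranking_hash(x: np.ndarray, W: np.ndarray) -> int:
    """Return the index of the largest projected feature, i.e. argmax_k w_k^T x."""
    return int(np.argmax(W.T @ x))    # one K-ary symbol per subspace W

rng = np.random.default_rng(1)
x = rng.standard_normal(512)
W = rng.standard_normal((512, 4))     # K = 4 ranking directions
symbol = ranking_hash(x, W)           # value in {0, 1, 2, 3}, i.e. 2 bits
```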
Revisiting WTA Hash
WTA is an ordinal feature embedding using partial order statistics:
Resilient to numeric perturbations, scaling, and constant offsets
Non-linear feature embedding
Limitations:
WTA is data-agnostic and requires long codes to achieve good performance
WTA is a special case of the ranking hash function in which W is restricted to axis-aligned directions generated through random permutations
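A hypothetical sketch of how WTA fits this view, following Yagnik et al. (2011) only loosely: restricting W to axis-aligned directions chosen by a random permutation reduces the ranking hash to comparing K raw feature values.

```python
import numpy as np

def wta_hash(x: np.ndarray, perm: np.ndarray, K: int) -> int:
    """Winner-Take-All: permute the features and output the argmax of the first K."""
    return int(np.argmax(x[perm[:K]]))

rng = np.random.default_rng(2)
x = rng.standard_normal(512)
perm = rng.permutation(512)           # data-agnostic, fixed when the codes are generated
symbol = wta_hash(x, perm, K=4)       # same as a ranking hash with axis-aligned (one-hot) columns of W
```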
Optimization Problem
The objective is to minimize the sum of pairwise errors over all labeled training pairs.
The pairwise error is defined on the hash codes of each pair together with its similarity label.
Here h_i and h_j are the hash codes obtained by applying the ranking-based hash function to a pair of samples (x_i, x_j), s_ij is their similarity label, and λ is a trade-off parameter.
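The exact formulas are omitted on this slide, so purely as an illustration of the structure (not the paper's actual loss), a pairwise objective of this kind could look like the following, with `lam` playing the role of the trade-off parameter λ.

```python
import numpy as np

def pairwise_error(h_i: np.ndarray, h_j: np.ndarray, s_ij: float) -> float:
    """Illustrative pairwise loss: penalize disagreement between code similarity
    and the semantic label s_ij in {0, 1}. Not the paper's exact formula."""
    code_agreement = float(np.mean(h_i == h_j))   # fraction of matching symbols, in [0, 1]
    return (code_agreement - s_ij) ** 2

def objective(codes: np.ndarray, S: np.ndarray, W: np.ndarray, lam: float) -> float:
    """Sum of pairwise errors over all labeled pairs plus a trade-off-weighted regularizer."""
    n = codes.shape[0]
    loss = sum(pairwise_error(codes[i], codes[j], S[i, j])
               for i in range(n) for j in range(i + 1, n))
    return loss + lam * float(np.sum(W ** 2))
```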
Optimization
The argmax term makes direct optimization very hard. In order to solve it, we first reformulate the hash function in matrix form, representing each code as a binary indicator vector e ∈ {0, 1}^K with Σ_k e_k = 1.
The constraints on e enforce a 1-of-K coding scheme that selects the maximum projected entry, which is equivalent to the previous definition.
Optimization
The objective function can also be reformulated accordingly. Here H is the hash code matrix, and the second matrix collects the pairwise entries of the objective.
Optimization
Then the problem boils down to minimizing a continuous function.
This problem can be solved using stochastic gradient descent with the update rule W ← W − η ∂L/∂W, where η is the learning rate and the gradient ∂L/∂W has a closed-form expression.
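As a generic illustration of this step (the gradient expression itself is omitted here), a minimal SGD skeleton might look as follows; `pair_gradient` is a hypothetical stand-in for the closed-form gradient, not an implementation of it.

```python
import numpy as np

def sgd_step(W: np.ndarray, grad_W: np.ndarray, eta: float) -> np.ndarray:
    """One stochastic gradient descent update: W <- W - eta * dL/dW."""
    return W - eta * grad_W

def train(X, S, W, eta, num_epochs, pair_gradient, rng=np.random.default_rng(0)):
    """Loop over randomly sampled labeled pairs. `pair_gradient` is a hypothetical
    callback returning dL/dW for one pair under the current W."""
    n = X.shape[0]
    for _ in range(num_epochs):
        for _ in range(n):
            i, j = rng.integers(0, n, size=2)
            W = sgd_step(W, pair_gradient(X[i], X[j], S[i, j], W), eta)
    return W
```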
Optimization
The problem is still hard to solve due to the 1-of-K constraints on the indicator vectors.
We propose to approximate the ranking-based hash function with the softmax function.
Here, the softmax vector is defined as σ_k(W^T x) = exp(w_k^T x) / Σ_j exp(w_j^T x).
Note: when the maximum entry of W^T x is sufficiently large, the approximation becomes exact.
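A sketch of this relaxation: the hard 1-of-K argmax indicator is replaced by a softmax vector, which approaches a one-hot vector as the largest projected entry dominates (the `scale` factor below is an illustrative knob, not part of the slide).

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / np.sum(e)

def hard_code(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Exact 1-of-K indicator: 1 at the argmax of the projections, 0 elsewhere."""
    e = np.zeros(W.shape[1])
    e[np.argmax(W.T @ x)] = 1.0
    return e

def soft_code(x: np.ndarray, W: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Differentiable surrogate: softmax of the (scaled) projections."""
    return softmax(scale * (W.T @ x))

rng = np.random.default_rng(3)
x, W = rng.standard_normal(512), rng.standard_normal((512, 4))
print(hard_code(x, W))                 # e.g. [0. 0. 1. 0.]
print(soft_code(x, W, scale=10.0))     # close to the one-hot vector above
```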
Hash Function Learning
The previous algorithm learns one hash function at a time. We use AdaBoost to learn hash functions sequentially:
Initially, assign equal weights to each training pair.
Decrease the weights of correctly preserved pairs and increase the weights of wrongly preserved pairs.
The update rule for the pairwise weights is simple (a rough sketch of the boosting loop is shown below).
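A rough, hypothetical sketch of the boosting-style outer loop: learn one hash function per round on weighted pairs, then reweight. The multiplicative factors and the `learn_one` helper are illustrative, not the paper's exact rule.

```python
import numpy as np

def boost_hash_functions(X, S, num_functions, learn_one):
    """Learn hash functions sequentially, reweighting training pairs after each round.

    X: (n, d) features; S: (n, n) binary similarity labels;
    learn_one(X, S, weights) -> (W, predict) is a stand-in for the single-function learner.
    """
    n = X.shape[0]
    weights = np.ones((n, n)) / (n * n)        # start with equal weights on every pair
    functions = []
    for _ in range(num_functions):
        W, predict = learn_one(X, S, weights)  # weighted single-function training
        functions.append(W)
        codes = predict(X)                     # one K-ary symbol per sample
        correct = (codes[:, None] == codes[None, :]) == (S > 0)
        weights = np.where(correct, weights * 0.5, weights * 2.0)  # illustrative factors
        weights /= weights.sum()
    return functions
```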
Experiment
Datasets
LabelMe: 22,000 images with a semantic affinity matrix; images represented as 512-D GIST vectors
Peekaboom: 60,000 images with a semantic affinity matrix; images represented as 512-D GIST vectors
NUS-WIDE: 186,577 images with class labels; images represented as 500-D bag-of-visual-words (BoVW) vectors
Baselines
KSH, TSH, LFH, SDH, FastHash, COSDISH (see Related Work)
Experiment
Performance Metrics
Top-k precision: proportion of ground-truth neighbors among the k nearest neighbors ranked by Hamming distance
Precision-recall: precision at different recall levels
Mean Average Precision (mAP): first compute the average precision of each query as the area under its precision-recall curve, then compute mAP as the average over the query set
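A minimal sketch of the mAP computation under the definition above, assuming binary ground-truth relevance and a ranking by Hamming distance; the helper names are illustrative.

```python
import numpy as np

def average_precision(relevant: np.ndarray) -> float:
    """AP for one query: `relevant` is a boolean array ordered by increasing Hamming distance."""
    if relevant.sum() == 0:
        return 0.0
    hits = np.cumsum(relevant)
    precision_at_k = hits / (np.arange(len(relevant)) + 1)
    return float(np.sum(precision_at_k * relevant) / relevant.sum())

def mean_average_precision(query_codes, db_codes, relevance) -> float:
    """mAP over a query set: rank the database by Hamming distance for each query."""
    aps = []
    for q, rel in zip(query_codes, relevance):
        dists = np.count_nonzero(db_codes != q, axis=1)   # Hamming distance on 0/1 codes
        order = np.argsort(dists, kind="stable")
        aps.append(average_precision(rel[order]))
    return float(np.mean(aps))
```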
Experiment
Results of Mean Average Precision (mAP)
Experiment
More results on LabelMe and Peekaboom
Experiment
More results on NUS-WIDE
Conclusion and Future Work
Key Contributions
The first supervised hashing scheme to exploit ranking-based hash functions
An effective approximation that leads to a continuous problem which can be solved efficiently
Superior semantic similarity search performance on real-world datasets
Future Work
Extend to kernel subspace ranking
Incorporate feature learning stages and develop a deep ranking framework