Approximate Near Neighbors for General Symmetric Norms

Ilya Razenshteyn (MIT CSAIL)

Joint with Alexandr Andoni (Columbia University), Aleksandar Nikolov (University of Toronto), and Erik Waingarten (Columbia University)

arXiv:1611.06222
Motivation

Data → data analysis → similarity search.

A similarity search problem is modeled as a feature vector space together with a distance function, which opens the toolbox of geometry, linear algebra, and optimization. The core algorithmic primitive is Nearest Neighbor Search.
An example: word embeddings

High-dimensional vectors that capture semantic similarity between words (and more). GloVe [Pennington, Socher, Manning 2014]: 400K words, 300 dimensions.

Ten nearest neighbors for "NYU": Yale, Harvard, graduate, faculty, undergraduate, Juilliard, university, undergraduates, Cornell, MIT.
Approximate Near Neighbors (ANN)

Dataset: n points in a metric space (denote the space by X and the dataset by P).
Approximation c > 1, distance threshold r > 0.
Query: a point q ∈ X such that there is p ∈ P with d_X(p, q) ≤ r.
Want: any point p' ∈ P such that d_X(p', q) ≤ c·r.
Parameters: space, query time.
FAQ

Q: Why approximation?
A: The exact case is hard for the high-dimensional problem.

Q: What does "high-dimensional" mean?
A: When d ≫ log n, where d is the dimension of the metric.

Q: How is the dimension defined?
A: A metric is typically defined on R^d; alternatively, via doubling dimension, etc.

This talk: a metric on R^d with d ≫ log n. Space and query time must depend on d polynomially, ideally linearly. This regime is the focus of the talk.
Which distance function to use?

A distance function:
- must capture semantic similarity well;
- must be algorithmically tractable.

Word embeddings, etc.: cosine similarity.

The goal: classify metrics according to the complexity of high-dimensional ANN.
For theory: a poorly understood property of a metric.
For practice: a universal algorithm for ANN.
High-dimensional norms

An important case: X = (R^d, ‖·‖) is a normed space, where ‖·‖ satisfies ‖x‖ = 0 iff x = 0, ‖αx‖ = |α|·‖x‖, and ‖x + y‖ ≤ ‖x‖ + ‖y‖.
Lots of tools (linear functional analysis).
[Andoni, Krauthgamer, R 2015] characterizes norms that allow efficient sketching (succinct summarization), which implies efficient ANN.
Approximation O(√d) is easy (John's theorem).
Unit balls

A norm can be given by its unit ball B = {x : ‖x‖ ≤ 1}.
Claim: B is a symmetric (B = −B) convex body.
Claim: any such body can be a unit ball.
What property of a convex body makes ANN w.r.t. it tractable?
John's theorem: any symmetric convex body is √d-close to an ellipsoid (gives approximation O(√d)).
Our result

If X = (R^d, ‖·‖) is a symmetric normed space (the norm is invariant under permutations of the coordinates and changes of signs) and d = n^{o(1)}, can solve ANN with:
- Approximation (log log n)^{O(1)}
- Space n^{1+o(1)}
- Query time n^{o(1)}
Examples

- Usual ℓ_p norms.
- Top-k norm: sum of the k largest absolute values of the coordinates. Interpolates between ℓ_∞ (k = 1) and ℓ_1 (k = d).
- Orlicz norms: the unit ball is {x : Σ_i G(|x_i|) ≤ 1}, where G is convex, non-negative, and G(0) = 0. Taking G(t) = t^p gives the ℓ_p norms.
- k-support norm, box-norm, K-functional (arise in probability and machine learning).
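As a quick illustration of the definitions above, here is a minimal Python sketch (the function name is mine, not from the talk) of the top-k norm and how it interpolates between ℓ_∞ and ℓ_1:

```python
def top_k_norm(x, k):
    # Sum of the k largest absolute values of the coordinates of x.
    return sum(sorted((abs(v) for v in x), reverse=True)[:k])

x = [3.0, -1.0, 4.0, 1.0, -5.0]

# k = 1 recovers the l_inf norm; k = d recovers the l_1 norm.
assert top_k_norm(x, 1) == max(abs(v) for v in x)        # 5.0
assert top_k_norm(x, len(x)) == sum(abs(v) for v in x)   # 14.0
print(top_k_norm(x, 2))  # 9.0: the two largest magnitudes, 5 + 4
```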
Prior work: symmetric norms

[Blasiok, Braverman, Chestnut, Krauthgamer, Yang 2015]: a classification of symmetric norms according to their streaming complexity. It depends on how well the norm concentrates on the Euclidean ball.
Unlike streaming, ANN is always tractable.
Prior work: ANN

Mostly, the focus has been on the ℓ_1 (Hamming/Manhattan) and ℓ_2 (Euclidean) norms, which work for many applications and allow efficient algorithms based on hashing:
- Locality-Sensitive Hashing [Indyk, Motwani 1998], [Andoni, Indyk 2006]
- Data-dependent LSH [Andoni, Indyk, Nguyen, R 2014], [Andoni, R 2015]
- [Andoni, Laarhoven, R, Waingarten 2017]: tight trade-off between space and query time for every approximation

Few results for other norms (ℓ_∞, general ℓ_p; will see later).
ANN for ℓ_∞

[Indyk 1998] ANN for d-dimensional ℓ_∞:
- Space n^{1+ε}
- Query time d · log^{O(1)} n
- Approximation O_ε(log log d)

Main idea: recursive partitioning.
- A "small" ball containing many points: easy.
- No such balls: there is a "good" cut w.r.t. some coordinate.

[Andoni, Croitoru, Patrascu 2008], [Kapralov, Panigrahy 2012]: approximation O(log log d) is tight for decision trees!
Metric embeddings

A map f: X → Y is an embedding with distortion D if for every x_1, x_2 ∈ X (up to a rescaling):
d_X(x_1, x_2) ≤ d_Y(f(x_1), f(x_2)) ≤ D · d_X(x_1, x_2)
Embeddings give reductions for geometric problems: ANN with approximation c for Y yields ANN with approximation cD for X.
Embedding norms into ℓ_∞

For any normed space X = (R^d, ‖·‖) and ε > 0, there exists an embedding of X into ℓ_∞^N with distortion 1 + ε.
Proof idea: take all directions and discretize (more details later).
Can we combine it with ANN for ℓ_∞ and obtain ANN for any norm? No! The discretization requires N = 2^{Θ(d)}, and this is tight even for ℓ_2.
The strategy

What: any norm. Where: ℓ_∞. Dimension: 2^{Θ(d)}.
What: symmetric norm. Where: a product of top-k norms. Dimension: d^{O(1)}.

Bypass non-embeddability into low-dimensional ℓ_∞ by allowing a more complicated host space, which is still tractable.
ℓ_p-direct sums of metric spaces

For metrics M_1, M_2, …, M_k, define ⊕_{ℓ_p} M_i as follows:
- the ground set is M_1 × M_2 × … × M_k;
- the distance is the ℓ_p norm of the vector of block distances: ‖(d_{M_1}(x_1, y_1), …, d_{M_k}(x_k, y_k))‖_p.

Example: ⊕_{ℓ_p} ℓ_q gives the cascaded norms.
Our host space: ⊕_{ℓ_∞} ⊕_{ℓ_1} X_k, where X_k is R^d equipped with the top-k norm; both the outer and the inner sums have size d^{O(1)}.
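A small Python sketch of the direct-sum distance (the helper names are mine): each block carries its own metric, and the vector of block distances is aggregated by an outer norm, here ℓ_∞ as in the host space above:

```python
def top_k_norm(x, k):
    # Sum of the k largest absolute values of the coordinates.
    return sum(sorted((abs(v) for v in x), reverse=True)[:k])

def top_k_metric(k):
    # Metric on R^d induced by the top-k norm.
    return lambda x, y: top_k_norm([a - b for a, b in zip(x, y)], k)

def linf_direct_sum_distance(p, q, metrics):
    # Distance in the l_inf-direct sum: the max of the block distances.
    return max(m(x, y) for m, x, y in zip(metrics, p, q))

# Two points of a toy direct sum of (R^3, top-1) and (R^3, top-3).
p = ([1.0, 0.0, 0.0], [1.0, 1.0, 1.0])
q = ([0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
metrics = [top_k_metric(1), top_k_metric(3)]
print(linf_direct_sum_distance(p, q, metrics))  # max(1.0, 3.0) = 3.0
```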
Two necessary steps

1. Embed a symmetric norm into ⊕_{ℓ_∞} ⊕_{ℓ_1} X_k.
2. Solve ANN for ⊕_{ℓ_∞} ⊕_{ℓ_1} X_k.

Prior work on ANN via product spaces: for Frechet distance [Indyk 2002], edit distance [Indyk 2004], and Ulam distance [Andoni, Indyk, Krauthgamer 2009].
ANN for ⊕_{ℓ_∞} M_i

[Indyk 2002], [Andoni 2009]: if for M_1, M_2, …, M_k there are data structures for c-ANN, then for ⊕_{ℓ_∞} M_i one can get c · (log log n)^{O(1)}-ANN with almost the same time and space.
A powerful generalization of ANN for ℓ_∞ [Indyk 1998], and extensions handle the other ℓ_p-direct sums as well.
Thus, it is enough to handle ANN for X_k (top-k norms)!
ANN for top-k norms

Top-k norms include ℓ_1 and ℓ_∞; thus, we need a unified approach.
Idea: embed a top-k norm into ℓ_∞ and use [Indyk 1998]. Approximation: O(log log d).
Problem: a deterministic embedding requires 2^{Θ(d)}-dimensional ℓ_∞.
Solution: use randomized embeddings.
Embedding the top-k norm into ℓ_∞

The case k = d (that is, ℓ_1). The embedding uses min-stability of the exponential distribution:
- sample u_1, u_2, …, u_d i.i.d. standard exponentials;
- embed x ↦ (x_1/u_1, x_2/u_2, …, x_d/u_d).

Constant distortion w.h.p.: max_i |x_i|/u_i is distributed as ‖x‖_1/u for a single standard exponential u.
In reality: slightly different parameters.
General k: sample truncated exponentials.
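A Monte Carlo sanity check of the k = d case in Python (a sketch of the slide's simplified version, with names of my choosing; constants are not tuned, matching the "slightly different parameters" caveat, so the estimate is only correct up to a constant factor):

```python
import random

random.seed(0)

def embed_l1_into_linf(x):
    # Divide each coordinate by an independent standard exponential.
    return [c / random.expovariate(1.0) for c in x]

def linf(v):
    return max(abs(c) for c in v)

x = [1.0, -2.0, 3.0, -4.0]   # ||x||_1 = 10

# By min-stability of exponentials, linf(embedding) is distributed as
# ||x||_1 / u for a single standard exponential u, so its median is
# ||x||_1 / ln 2, i.e. roughly 14.4 here.
samples = sorted(linf(embed_l1_into_linf(x)) for _ in range(10001))
median = samples[len(samples) // 2]
print(median)  # within a constant factor of ||x||_1 = 10
```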
Detour: ANN for Orlicz norms

Reminder: for a convex, non-negative G with G(0) = 0, define a norm whose unit ball is {x : Σ_i G(|x_i|) ≤ 1} (e.g., G(t) = t^p gives the ℓ_p norms).
Embedding into ℓ_∞ (as before, constant distortion w.h.p.):
- sample u_1, u_2, …, u_d i.i.d. standard exponentials;
- embed x ↦ (x_i / G^{-1}(u_i))_i.

A special case for ℓ_p norms appeared in [Andoni 2009].
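The same trick for an Orlicz norm, as a hedged Python sketch: with G(t) = t^2 the Orlicz norm equals ℓ_2 and G^{-1}(u) = √u, so we can check the embedding against the Euclidean norm (again only correct up to a constant factor):

```python
import math
import random

random.seed(1)

def orlicz_linf_sample(x, G_inv):
    # One sample of the randomized embedding into l_inf.
    return max(abs(c) / G_inv(random.expovariate(1.0)) for c in x)

x = [3.0, 4.0]   # for G(t) = t^2, the Orlicz norm of x is ||x||_2 = 5

# P(sample <= t) = exp(-sum_i G(|x_i|/t)), so a sample concentrates around
# the Orlicz norm up to a constant factor (median 5/sqrt(ln 2), about 6).
samples = sorted(orlicz_linf_sample(x, math.sqrt) for _ in range(10001))
print(samples[len(samples) // 2])
```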
Where are we?

Can solve ANN for ⊕_{ℓ_∞} ⊕_{ℓ_1} X_k, where X_k is R^d equipped with a top-k norm.
What remains to be done? Embed a d-dimensional symmetric norm into the (d^{O(1)}-dimensional) space ⊕_{ℓ_∞} ⊕_{ℓ_1} X_k.
Starting point: embedding any norm into ℓ_∞

For a normed space X = (R^d, ‖·‖) and ε > 0, there is a linear map f: X → ℓ_∞^N with distortion 1 + ε.
The normed space dual to X is X* = (R^d, ‖·‖_*), where ‖y‖_* = sup_{‖x‖ ≤ 1} ⟨x, y⟩. The dual to ℓ_p is ℓ_q with 1/p + 1/q = 1 (ℓ_1 vs. ℓ_∞, ℓ_2 vs. ℓ_2, etc.).
Claim: for every x, we have ‖x‖ ≈_{1±ε} max_{y ∈ N} ⟨x, y⟩, where N is an ε-net of the unit ball of X* (w.r.t. ‖·‖_*).
Immediately gives an embedding x ↦ (⟨x, y⟩)_{y ∈ N}.
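A concrete Python instance of this claim for X = ℓ_1 (names are mine): the dual ball is the ℓ_∞ ball, whose 2^d extreme points (the sign vectors) form a perfect net, so the max of inner products recovers the norm exactly:

```python
from itertools import product

def l1_via_dual_net(x):
    # ||x||_1 = max over sign vectors sigma of <x, sigma>; this is the
    # embedding x -> (<x, y>)_{y in N} with N the extreme points of the
    # dual (l_inf) unit ball.
    return max(sum(s * c for s, c in zip(sigma, x))
               for sigma in product([-1, 1], repeat=len(x)))

x = [1.0, -2.0, 3.0]
assert l1_via_dual_net(x) == 6.0  # |1| + |-2| + |3|
```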
Proof

For every y ∈ N, we have ⟨x, y⟩ ≤ ‖x‖·‖y‖_* ≤ ‖x‖; thus, max_{y ∈ N} ⟨x, y⟩ ≤ ‖x‖.
There exists y* such that ‖y*‖_* = 1 and ⟨x, y*⟩ = ‖x‖ (non-trivial; requires the Hahn–Banach theorem).
Move y* to the closest y' ∈ N. Get ⟨x, y'⟩ ≥ ⟨x, y*⟩ − ‖x‖·‖y* − y'‖_* ≥ (1 − ε)·‖x‖.
Thus, max_{y ∈ N} ⟨x, y⟩ ≥ (1 − ε)·‖x‖.
Can take |N| ≤ (3/ε)^d by the volume argument.
Better embeddings for symmetric norms

Recap: we can't embed even ℓ_2 into ℓ_∞^N unless N = 2^{Ω(d)}.
Instead, we aim at embedding a symmetric norm into ⊕_{ℓ_∞} ⊕_{ℓ_1} X_k.
High-level idea: the new space is more forgiving and allows us to use an ε-net of the dual unit ball up to symmetry.
Show that there is an ε-net that is the result of applying symmetries to merely d^{O(1)} base vectors!
Exploiting symmetry

For a vector x ∈ R^d, a permutation π, and signs σ ∈ {−1, 1}^d, denote by x_{π,σ} the vector x with coordinates permuted according to π and signs flipped according to σ; for a symmetric norm, ‖x_{π,σ}‖ = ‖x‖.
Recap: ‖x‖ ≈ max_{y ∈ N} ⟨x, y⟩ for an ε-net N of the dual unit ball B_*.
Suppose that N_0 is an ε-net for B_* intersected with the cone {y : y_1 ≥ y_2 ≥ … ≥ y_d ≥ 0}. Then {y_{π,σ} : y ∈ N_0, π, σ} is an ε-net for all of B_*.
Small nets

What remains to be done: an ε-net for B_* intersected with the sorted cone, generated by d^{O(1)} base vectors.
Will see a weaker bound of d^{O(log d)}, still non-trivial.
The volume bound fails here.
Instead, a simple explicit construction.
Small nets: continued

Want to approximate a vector y with y_1 ≥ y_2 ≥ … ≥ y_d ≥ 0:
- zero out all y_i's that are smaller than ε·y_1/d;
- round all remaining coordinates down to the nearest power of 1 + ε.

This leaves O(log(d/ε)/ε) scales, and only the cardinality of each scale matters.
d^{O(log d)} vectors total (for constant ε); can be improved to d^{O(1)} by one more trick.
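A Python sketch of the rounding step (function and parameter names are mine; the "one more trick" is not implemented): zeroing the tiny coordinates and rounding the rest down to powers of 1 + ε moves every surviving coordinate by at most a 1 + ε factor, so any symmetric norm changes by only 1 + O(ε):

```python
import math

def round_to_net(y, eps):
    # y is sorted non-increasing and non-negative (the "sorted cone").
    # Zero coordinates below eps * y[0] / d; round the rest down to
    # y[0] times the nearest power of (1 + eps).
    d = len(y)
    out = []
    for v in y:
        if v < eps * y[0] / d:
            out.append(0.0)
        else:
            scale = math.floor(math.log(v / y[0], 1.0 + eps))
            out.append(y[0] * (1.0 + eps) ** scale)
    return out

y = [0.9, 0.5, 0.3, 0.001]
approx = round_to_net(y, 0.1)
for v, a in zip(y, approx):
    if a == 0.0:
        assert v < 0.1 * y[0] / len(y)              # only tiny coordinates drop
    else:
        assert a <= v <= 1.1 * a * (1 + 1e-9)       # per-coordinate 1+eps loss
print(approx)
```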
Quick summary

- Embed a symmetric norm into a d^{O(1)}-dimensional product space of top-k norms.
- Use known techniques to reduce the ANN problem on the product space to ANN for the top-k norm.
- Use truncated exponential random variables to embed the top-k norm into ℓ_∞, and use a known ANN data structure there.
Two immediate open questions

- Improve the dependence on the dimension d: a better ε-net for the dual ball intersected with the sorted cone looks doable.
- Improve the approximation beyond (log log n)^{O(1)}: going below Θ(log log d) is hard due to ℓ_∞, so one needs to bypass ANN for product spaces. Maybe a randomized embedding into low-dimensional ℓ_∞ exists for any symmetric norm?
General norms

There exists an embedding of every d-dimensional symmetric norm into our host space with constant distortion: a universal d^{O(1)}-dimensional space that can host all d-dimensional symmetric norms.
This is impossible for general norms, even for randomized embeddings: even a modest distortion requires dimension exponential in d.
Stronger hardness results? Implied by the following: there is a family of spectral expanders that embeds with small distortion into some low-dimensional norm, where the dimension is sub-polynomial in the number of nodes [Naor 2016].
The main open question

Is there an efficient ANN algorithm for general high-dimensional norms with approximation, say, polylogarithmic in the dimension?
There is hope…
Thanks!