Approximate Near Neighbors for General Symmetric Norms
Presentation Transcript

Slide1

Approximate Near Neighbors for General Symmetric Norms

Ilya Razenshteyn (MIT CSAIL)

joint with

Alexandr Andoni (Columbia University)
Aleksandar Nikolov (University of Toronto)
Erik Waingarten (Columbia University)

arXiv:1611.06222

Slide2

Motivation

Data
Data analysis
Similarity search
Feature vector space + distance function
Geometry / Linear Algebra / Optimization
Nearest Neighbor Search

Slide3

An example

Word embeddings: high-dimensional vectors that capture semantic similarity between words (and more)

GloVe [Pennington, Socher, Manning 2014], 400K words, 300 dimensions

Ten nearest neighbors for "NYU"?
Yale, Harvard, graduate, faculty, undergraduate, Juilliard, university, undergraduates, Cornell, MIT

Slide4

Approximate Near Neighbors (ANN)

Dataset: $n$ points in a metric space (denote the point set by $P$)
Approximation factor $c > 1$, distance threshold $r > 0$
Query: a point $q$ such that there is $p^* \in P$ with $d(q, p^*) \le r$
Want: any $\tilde{p} \in P$ such that $d(q, \tilde{p}) \le cr$
Parameters: space, query time
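As a baseline, here is a minimal Python sketch of the $(c, r)$-ANN decision problem as a brute-force linear scan (names and parameters are illustrative, not from the talk); the data structures below aim to beat its $O(nd)$ query time.

```python
# Reference (c, r)-ANN by linear scan; purely illustrative and exponentially
# far from the space/query-time guarantees discussed in this talk.
import numpy as np

def ann_query(P, q, r, c, dist):
    """If some p* in P has dist(q, p*) <= r, return a point within c*r of q.
    Any point at distance <= c*r is an acceptable answer."""
    for p in P:
        if dist(p, q) <= c * r:
            return p
    return None  # allowed only when no point lies within distance r

rng = np.random.default_rng(0)
P = rng.normal(size=(1000, 30))
q = P[17] + 0.05 * rng.normal(size=30)
found = ann_query(P, q, r=1.0, c=2.0, dist=lambda p, q: np.linalg.norm(p - q))
```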

 

 

 

Slide5

FAQ

Q: why approximation?
A: the exact case is hard for the high-dimensional problem.

Q: what does "high-dimensional" mean?
A: when $d \gg \log n$, where $d$ is the dimension of the metric.

Q: how is the dimension defined?
A: a metric is typically defined on $\mathbb{R}^d$; alternatively, doubling dimension, etc.

The focus of this talk: a metric on $\mathbb{R}^d$ with $d \gg \log n$, where the complexity must depend on $d$ as $d^{O(1)}$, ideally as $\tilde{O}(d)$.

Slide6

Which distance function to use?

A distance function:
Must capture semantic similarity well
Must be algorithmically tractable

Word embeddings, etc.: cosine similarity

The goal: classify metrics according to the complexity of high-dimensional ANN
For theory: a poorly-understood property of a metric
For practice: a universal algorithm for ANN

Slide7

High-dimensional norms

An important case: $X = (\mathbb{R}^d, \|\cdot\|_X)$ is a normed space, where $\|\cdot\|_X$ satisfies the norm axioms ($\|x\|_X = 0$ iff $x = 0$, homogeneity, and the triangle inequality)

Lots of tools (linear functional analysis)

[Andoni, Krauthgamer, R 2015] characterizes norms that allow efficient sketching (succinct summarization), which implies efficient ANN

Approximation $O(\sqrt{d})$ is easy (John's theorem)

Slide8

Unit balls

A norm can be given by its unit ball $B_X = \{x \in \mathbb{R}^d : \|x\|_X \le 1\}$

Claim: $B_X$ is a symmetric (i.e., $B_X = -B_X$) convex body
Claim: any such body can be a unit ball

What property of a convex body makes ANN wrt it tractable?

John's theorem: any symmetric convex body is close to an ellipsoid (gives approximation $O(\sqrt{d})$)

Slide9

Our result

If $X = (\mathbb{R}^d, \|\cdot\|_X)$ is a symmetric normed space, and $d = n^{o(1)}$, we can solve ANN with:
Approximation $(\log \log n)^{O(1)}$
Space $n^{1+o(1)}$
Query time $n^{o(1)}$

Symmetric: invariant under permutations of coordinates and changes of signs of coordinates

Slide10

Examples

Usual $\ell_p$ norms

Top-$k$ norm: the sum of the $k$ largest absolute values of coordinates; interpolates between $\ell_\infty$ ($k = 1$) and $\ell_1$ ($k = d$)

Orlicz norms: the unit ball is $\{x : \sum_i G(|x_i|) \le 1\}$, where $G$ is convex, non-negative, and $G(0) = 0$; $G(t) = t^p$ gives the $\ell_p$ norms

$k$-support norm, box-$\Theta$ norm, $K$-functional (arise in probability and machine learning)
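A small Python sketch of two of these norms (a hedged illustration, not the paper's code; the bisection assumes $G$ is convex with $G(0) = 0$ and $G(1) \le 1$, and the tolerance is an arbitrary choice):

```python
import numpy as np

def top_k_norm(x, k):
    """Sum of the k largest absolute values: k=1 is l_inf, k=len(x) is l_1."""
    return float(np.sort(np.abs(x))[-k:].sum())

def orlicz_norm(x, G, tol=1e-9):
    """Smallest t > 0 with sum_i G(|x_i| / t) <= 1, found by bisection."""
    lo, hi = tol, max(float(np.abs(x).sum()), tol)  # l_1 upper-bounds the norm when G(1) <= 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if np.sum(G(np.abs(x) / mid)) <= 1:
            hi = mid
        else:
            lo = mid
    return hi

x = np.array([3.0, -1.0, 4.0, -1.0, 5.0])
print(top_k_norm(x, 2))                  # 9.0 = 5 + 4
print(orlicz_norm(x, lambda t: t ** 2))  # ~7.2111, the l_2 norm of x
```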

 

 

Slide11

Prior work: symmetric norms

[Blasiok, Braverman, Chestnut, Krauthgamer, Yang 2015]: classification of symmetric norms according to their streaming complexity
Depends on how well the norm concentrates on the Euclidean ball

Unlike streaming, ANN is always tractable

Slide12

Prior work: ANN

Mostly, the focus is on the $\ell_1$ (Hamming/Manhattan) and $\ell_2$ (Euclidean) norms
They work for many applications
They allow efficient algorithms based on hashing

Locality-Sensitive Hashing [Indyk, Motwani 1998] [Andoni, Indyk 2006]
Data-dependent LSH [Andoni, Indyk, Nguyen, R 2014] [Andoni, R 2015]
[Andoni, Laarhoven, R, Waingarten 2017]: tight trade-off between space and query time for every approximation

Few results for other norms ($\ell_\infty$, general $\ell_p$; will see later)

Slide13

ANN for $\ell_\infty$

[Indyk 1998] ANN for $d$-dimensional $\ell_\infty$:
Space $n^{1+\varepsilon}$
Query time $\mathrm{poly}(d, \log n)$
Approximation $O_\varepsilon(\log \log d)$

Main idea: recursive partitioning
A "small" ball containing a large fraction of the points ― easy
No such balls ― there is a "good" cut wrt some coordinate

[Andoni, Croitoru, Patrascu 2008] [Kapralov, Panigrahy 2012]: approximation $O(\log \log d)$ is tight for decision trees!

Slide14

Metric embeddings

A map $f \colon M_1 \to M_2$ is an embedding with distortion $D$ if, for all $x, y \in M_1$:
$$d_{M_1}(x, y) \le d_{M_2}\big(f(x), f(y)\big) \le D \cdot d_{M_1}(x, y)$$

Embeddings give reductions for geometric problems:
ANN with approximation $c$ for $M_2$ gives ANN with approximation $cD$ for $M_1$.
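A sketch that estimates the distortion of a map on sampled pairs; the example map is a classical one and is my illustration, not from this talk: a scaled Gaussian matrix embedding $\ell_2$ into $\ell_1$ with distortion close to $1$.

```python
import numpy as np

def empirical_distortion(f, d1, d2, pts):
    """Max expansion times max contraction of f over the given points."""
    ratios = [d2(f(x), f(y)) / d1(x, y)
              for i, x in enumerate(pts) for y in pts[i + 1:]]
    return max(ratios) / min(ratios)

rng = np.random.default_rng(1)
d, m = 20, 2000
A = rng.normal(size=(m, d)) * np.sqrt(np.pi / 2) / m   # E|<g, x>| = sqrt(2/pi) |x|_2
pts = list(rng.normal(size=(40, d)))
D = empirical_distortion(lambda x: A @ x,
                         lambda x, y: np.linalg.norm(x - y),  # distance in l_2
                         lambda u, v: np.abs(u - v).sum(),    # distance in l_1
                         pts)
print(D)  # close to 1 for large m
```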

Slide15

Embedding norms into $\ell_\infty$

For a normed space $X = (\mathbb{R}^d, \|\cdot\|_X)$ and $\varepsilon > 0$ there exists an embedding of $X$ into $\ell_\infty^N$ with distortion $1 + \varepsilon$
Proof idea: take all directions and discretize (more details later)

Can we combine it with ANN for $\ell_\infty$ and obtain ANN for any norm? No!
Discretization requires $N = 2^{\Omega(d)}$. Tight even for $\ell_2$.
Approximation $O(\log \log N) = O(\log d)$.

Slide16

The strategy

What             Where                                                    Dimension
Any norm         $\ell_\infty$                                            $2^{\Theta(d)}$
Symmetric norm   $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_k$         $d^{O(1)}$

Bypass the non-embeddability into low-dimensional $\ell_\infty$ by allowing a more complicated host space, which is still tractable

Slide17

$\ell_p$-direct sums of metric spaces

For metrics $M_1$, $M_2$, …, $M_k$, define $\bigoplus_{\ell_p} M_i$ as follows:
The ground set is $M_1 \times M_2 \times \cdots \times M_k$
The distance is $d\big((x_1, \ldots, x_k), (y_1, \ldots, y_k)\big) = \big\| \big( d_{M_1}(x_1, y_1), \ldots, d_{M_k}(x_k, y_k) \big) \big\|_p$

Example: $\bigoplus_{\ell_p} \ell_q$ ― cascaded norms

Our host space: $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_k$, where $X_k$ is $\mathbb{R}^d$ equipped with the top-$k$ norm
Outer sum is of size $d^{O(1)}$
Inner sum is of size $d^{O(1)}$
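A small illustration of the definition (mine, not the paper's code): the distance between tuples of points is the $\ell_p$ norm of the vector of component distances.

```python
import numpy as np

def direct_sum_dist(xs, ys, dists, p):
    """Distance in the l_p-direct sum of the metric spaces M_1, ..., M_k."""
    component = [d(x, y) for d, x, y in zip(dists, xs, ys)]
    return np.linalg.norm(np.array(component), ord=p)

l1 = lambda x, y: float(np.abs(np.asarray(x) - np.asarray(y)).sum())
l2 = lambda x, y: float(np.linalg.norm(np.asarray(x) - np.asarray(y)))

# l_infty-sum of an l_1 plane and a Euclidean plane:
print(direct_sum_dist([(0, 0), (0, 0)], [(1, 2), (3, 4)], [l1, l2], np.inf))  # max(3, 5) = 5
```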

Slide18

Two necessary steps

Embed a symmetric norm into $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_k$
Solve ANN for $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_k$

Prior work on ANN via product spaces: for the Fréchet distance [Indyk 2002], edit distance [Indyk 2004], and Ulam distance [Andoni, Indyk, Krauthgamer 2009]

Slide19

ANN for $\bigoplus_{\ell_p} M_i$

[Indyk 2002], [Andoni 2009]: if for $M_1$, $M_2$, …, $M_k$ there are data structures for $c$-ANN, then for $\bigoplus_{\ell_p} M_i$ one can get $c \cdot (\log \log n)^{O(1)}$-ANN with almost the same time and space

A powerful generalization of ANN for $\ell_\infty$ [Indyk 1998]
Trivially implies ANN for general $\ell_p$ (a product of one-dimensional spaces)
Thus, it is enough to handle ANN for $X_k$ (top-$k$ norms)!

Slide20

ANN for top-$k$ norms

Top-$k$ norms include $\ell_1$ ($k = d$) and $\ell_\infty$ ($k = 1$); thus, we need a unified approach

Idea: embed a top-$k$ norm into $\ell_\infty$ and use [Indyk 1998]
Approximation: $O(\log \log d)$
Problem: a deterministic embedding requires $2^{\Omega(d)}$-dimensional $\ell_\infty$
Solution: use randomized embeddings

Slide21

Embedding the top-$k$ norm into $\ell_\infty$

The case $k = d$ (that is, $\ell_1$)
Embedding (uses the min-stability of the exponential distribution):
Sample $u_1, u_2, \ldots, u_d \sim \mathrm{Exp}(1)$ i.i.d.
Embed $x \mapsto \left( \tfrac{x_1}{u_1}, \ldots, \tfrac{x_d}{u_d} \right)$
Constant distortion w.h.p. (in reality: slightly different parameters)

General $k$: sample truncated exponential random variables instead
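A numerical sketch of the $k = d$ case (the median calibration and the number of copies are my illustrative choices): min-stability means $\max_i |x_i| / u_i$ is distributed as $\|x\|_1 / \mathrm{Exp}(1)$, so a median over independent copies concentrates around a fixed multiple of $\|x\|_1$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, copies = 100, 201
x = rng.normal(size=d)

u = rng.exponential(scale=1.0, size=(copies, d))  # u_ij ~ Exp(1), i.i.d.
estimates = np.max(np.abs(x) / u, axis=1)         # one l_inf-style max per copy
print(np.log(2) * np.median(estimates))           # median of 1/Exp(1) is 1/ln 2
print(np.abs(x).sum())                            # compare with |x|_1
```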

Slide22

Detour: ANN for Orlicz norms

Reminder: for convex, non-negative $G$ with $G(0) = 0$, define a norm whose unit ball is $\{x : \sum_i G(|x_i|) \le 1\}$ (e.g., $G(t) = t^p$ gives the $\ell_p$ norms)

Embedding into $\ell_\infty$ (as before, constant distortion w.h.p.):
Sample $u_1, u_2, \ldots, u_d \sim \mathrm{Exp}(1)$ i.i.d.
Embed $x \mapsto \left( \tfrac{x_i}{G^{-1}(u_i)} \right)_i$

A special case for $\ell_p$ norms appeared in [Andoni 2009]
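A sketch of why this works, with illustrative calibration (mine, not from the talk): with $u_i \sim \mathrm{Exp}(1)$, $\Pr[\max_i |x_i| / G^{-1}(u_i) \le t] = \exp(-\sum_i G(|x_i|/t))$, which equals a universal constant exactly at $t = \|x\|_G$. Shown for $G(t) = t^2$, where the Orlicz norm is the $\ell_2$ norm.

```python
import numpy as np

rng = np.random.default_rng(3)
d, copies = 100, 201
x = rng.normal(size=d)

G_inv = np.sqrt                                   # inverse of G(t) = t^2
u = rng.exponential(size=(copies, d))
estimates = np.max(np.abs(x) / G_inv(u), axis=1)
print(np.sqrt(np.log(2)) * np.median(estimates))  # calibrated median ~ |x|_2
print(np.linalg.norm(x))
```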

Slide23

Where are we?

Can solve ANN for $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_k$, where $X_k$ is $\mathbb{R}^d$ equipped with a top-$k$ norm

What remains to be done?
Embed a $d$-dimensional symmetric norm into the ($d^{O(1)}$-dimensional) space $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_k$

Slide24

Starting point: embedding any norm into $\ell_\infty$

For a normed space $X = (\mathbb{R}^d, \|\cdot\|_X)$ and $\varepsilon > 0$ there is a linear map $f \colon X \to \ell_\infty^N$ with distortion $1 + \varepsilon$

A normed space dual to $X$: $X^* = (\mathbb{R}^d, \|\cdot\|_{X^*})$, where $\|y\|_{X^*} = \sup_{\|x\|_X \le 1} \langle x, y \rangle$
Dual to $\ell_p$ is $\ell_q$ with $1/p + 1/q = 1$ ($\ell_1$ vs. $\ell_\infty$, $\ell_2$ vs. $\ell_2$, etc.)

Claim: for every $x \in \mathbb{R}^d$, we have $(1 - \varepsilon) \|x\|_X \le \max_{y \in N} \langle x, y \rangle \le \|x\|_X$, where $N$ is an $\varepsilon$-net of $B_{X^*}$ (wrt $\|\cdot\|_{X^*}$)

Immediately gives an embedding $f(x) = (\langle x, y \rangle)_{y \in N}$
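A tiny concrete instance of the claim (the example is mine): for $X = \ell_1$ the dual ball is the $\ell_\infty$ ball, and its $2^d$ sign vectors form an (admittedly huge) net for which the maximum of $\langle x, y \rangle$ recovers $\|x\|_1$ exactly.

```python
import itertools
import numpy as np

d = 4
x = np.array([0.5, -2.0, 1.0, 3.0])
net = np.array(list(itertools.product([-1.0, 1.0], repeat=d)))  # vertices of the l_inf ball
print(np.max(net @ x))   # 6.5
print(np.abs(x).sum())   # 6.5 = |x|_1
```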

Slide25

Proof

For every $y \in N$, we have $\|y\|_{X^*} \le 1$, thus $\langle x, y \rangle \le \|x\|_X$.
There exists $y^*$ such that $\|y^*\|_{X^*} \le 1$ and $\langle x, y^* \rangle = \|x\|_X$
(non-trivial, requires the Hahn–Banach theorem)
Move $y^*$ to the closest $y \in N$
Get $\langle x, y \rangle \ge \langle x, y^* \rangle - \varepsilon \|x\|_X = (1 - \varepsilon) \|x\|_X$
Thus, $\max_{y \in N} \langle x, y \rangle \ge (1 - \varepsilon) \|x\|_X$
Can take $|N| = (1/\varepsilon)^{O(d)}$ by the volume argument

Slide26

Better embeddings for symmetric norms

Recap: can't embed even $\ell_2$ into $\ell_\infty^N$ unless $N = 2^{\Omega(d)}$
Instead, aim at embedding a symmetric norm into $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_k$

High-level idea: the new host space is more forgiving and allows us to consider an $\varepsilon$-net of $B_{X^*}$ only up to symmetry
Show that there is an $\varepsilon$-net that is the result of applying symmetries to merely $d^{O(1)}$ vectors!

Slide27

Exploiting symmetry

For a vector $y \in \mathbb{R}^d$, a permutation $\pi$, and signs $\sigma \in \{-1, +1\}^d$, denote by $y_{\pi, \sigma}$ the vector $y$ with coordinates permuted according to $\pi$ and signs flipped according to $\sigma$

Recap: $\|x\|_X \approx \max_{y \in N} \langle x, y \rangle$
Suppose that $N_0$ is an $\varepsilon$-net for $B_{X^*}$ intersected with the cone of non-negative, non-increasing vectors
Then, $\|x\|_X \approx \max_{y \in N_0} \max_{\pi, \sigma} \langle x, y_{\pi, \sigma} \rangle$

(By the rearrangement inequality, $\max_{\pi, \sigma} \langle x, y_{\pi, \sigma} \rangle = \sum_i x^*_i y_i$, where $x^*$ is the non-increasing rearrangement of $|x|$; by Abel summation this is a non-negative combination of top-$k$ norms of $x$, which is where the host space $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_k$ comes from.)
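A quick numerical check of the symmetrization step (a sketch of mine, brute-forcing a small dimension):

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
x, y = rng.normal(size=4), rng.normal(size=4)

# Maximum of <x, y_{pi, sigma}> over all permutations pi and sign flips sigma:
best = max(np.dot(x, np.array(sigma) * y[list(pi)])
           for pi in itertools.permutations(range(4))
           for sigma in itertools.product([-1.0, 1.0], repeat=4))

# Inner product of the non-increasing rearrangements of |x| and |y|:
xs, ys = np.sort(np.abs(x))[::-1], np.sort(np.abs(y))[::-1]
print(best, np.dot(xs, ys))  # the two values coincide
```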

Slide28

Small nets

What remains to be done: an $\varepsilon$-net for $B_{X^*}$ (up to symmetry) of size $d^{O(1)}$
Will see a weaker bound of $d^{O(\log d)}$, still non-trivial

The volume bound fails
Instead, a simple explicit construction

Slide29

Small nets: continued

Want to approximate a vector $y \in B_{X^*}$ (WLOG with non-negative, non-increasing coordinates):
Zero out all $y_i$'s that are smaller than a threshold (of order $\varepsilon / d$)
Round all coordinates down to the nearest power of $1 + \varepsilon$: $O_\varepsilon(\log d)$ scales
Only the cardinality of each scale matters (by symmetry)
$d^{O_\varepsilon(\log d)}$ vectors total
Can be improved to $d^{O(1)}$ by one more trick
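A sketch of the explicit rounding step (the threshold and $\varepsilon$ are illustrative choices of mine): drop tiny coordinates, round the rest down to powers of $1 + \varepsilon$, and keep only the sorted result, since by symmetry only how many coordinates land on each scale matters.

```python
import numpy as np

def net_representative(y, eps=0.5):
    """Canonical net vector approximating y up to permutations and signs."""
    a = np.abs(np.asarray(y, dtype=float))
    a[a < eps * a.max() / len(a)] = 0.0            # zero out tiny coordinates
    nz = a[a > 0]
    scale = np.floor(np.log(nz / nz.max()) / np.log1p(eps))
    rounded = nz.max() * (1.0 + eps) ** scale      # round down to powers of 1 + eps
    return np.sort(rounded)[::-1]                  # canonical order: non-increasing

print(net_representative([0.93, -0.2, 0.001, 0.18, -0.77]))
```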

Slide30

Quick summary

Embed a symmetric norm into a $d^{O(1)}$-dimensional product space of top-$k$ norms
Use known techniques to reduce the ANN problem on the product space to ANN for the top-$k$ norm
Use truncated exponential random variables to embed the top-$k$ norm into $\ell_\infty$, and use a known ANN data structure there

Slide31

Two immediate open questions

Improve the dependence on $d$ from $d^{O(1)}$ to nearly linear
Needs a better $\varepsilon$-net for $B_{X^*}$
Looks doable

Improve the approximation from $(\log \log n)^{O(1)}$ to $O(\log \log d)$
Beyond $O(\log \log d)$ is hard due to $\ell_\infty$
Need to bypass ANN for product spaces
Maybe a randomized embedding into low-dimensional $\ell_\infty$ for any symmetric norm?

Slide32

General norms

There exists an embedding of every $d$-dimensional symmetric norm into $\bigoplus_{\ell_\infty} \bigoplus_{\ell_1} X_k$ with constant distortion:
a universal $d^{O(1)}$-dimensional space that can host all $d$-dimensional symmetric norms

Impossible for general norms, even for randomized embeddings: even a modest distortion requires dimension exponential in $d$

Stronger hardness results?
Would be implied by: a family of spectral expanders that embeds with distortion $O(1)$ into some $n^{o(1)}$-dimensional norm, where $n$ is the number of nodes [Naor 2016]

Slide33

The main open question

Is there an efficient ANN algorithm for general high-dimensional norms with approximation $(\log \log n)^{O(1)}$?

There is hope…

Thanks!