Presentation Transcript

Slide 1

Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing

Alexandr Andoni (Columbia)
Ilya Razenshteyn (MIT CSAIL)

Slide 2

Near Neighbor Search

Dataset: $n$ points in a metric space (e.g., $\mathbb{R}^d$).
Goal: given a query, return a data point within distance $r$ from it.
Measures of efficiency: space and query time.
Example: for $d = 2$ with the Euclidean distance, near-linear space and logarithmic query time are achievable.
Infeasible for large $d$ ("curse of dimensionality"): known exact solutions use space exponential in the dimension.
Lots of high-dimensional applications.

Slide 3

Approximate Near Neighbor Search (ANN)

Given: $n$ points in a metric space, a distance threshold $r > 0$, and an approximation factor $c > 1$.
Query: a point within distance $r$ from some data point.
Want: a data point within distance $cr$ from the query.
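
For concreteness, the guarantee above can be written out as follows (the notation $P$, $q$, and $\mathrm{dist}$ is chosen here and is not from the slides):

    Dataset: $P \subseteq X$ with $|P| = n$, in a metric space $(X, \mathrm{dist})$.
    Promise on the query $q \in X$: $\min_{p \in P} \mathrm{dist}(q, p) \le r$.
    Required output: any $p' \in P$ with $\mathrm{dist}(q, p') \le c \cdot r$.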

Slide 4

Locality-Sensitive Hashing (LSH)

Until recently: the only approach to ANN with subquadratic memory, sublinear query time, and polynomial dependence on the dimension (Indyk-Motwani 1998).
Main idea: random partitions of the space such that closer pairs of points collide more often.
A distribution over partitions is $(r, cr, p_1, p_2)$-sensitive if for every pair of points $p, q$:
  - if $\mathrm{dist}(p, q) \le r$, then $\Pr[p \text{ and } q \text{ collide}] \ge p_1$;
  - if $\mathrm{dist}(p, q) \ge cr$, then $\Pr[p \text{ and } q \text{ collide}] \le p_2$.
The thresholds $r$ and $cr$ come from the definition of ANN; useful families have $p_1 > p_2$.
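
The classic family of this kind for the Hamming distance is bit sampling, the family behind (Indyk-Motwani 1998). Below is a minimal Python sketch of it with an empirical check of the collision probabilities; the helper names and the parameters ($d = 100$, $r = 10$, $c = 2$) are illustrative choices, not from the slides.

    import random

    def sample_bit_hash(d):
        """Sample one partition from the bit-sampling family on {0,1}^d:
        points are grouped by the value of a single random coordinate."""
        i = random.randrange(d)
        return lambda x: x[i]

    def collision_probability(partitions, p, q):
        """Empirically estimate Pr[p and q collide] over the sampled partitions."""
        return sum(1 for h in partitions if h(p) == h(q)) / len(partitions)

    d, r, c = 100, 10, 2
    p = [random.randint(0, 1) for _ in range(d)]
    close, far = p.copy(), p.copy()
    for i in random.sample(range(d), r): close[i] ^= 1       # distance r from p
    for i in random.sample(range(d), c * r): far[i] ^= 1     # distance cr from p

    partitions = [sample_bit_hash(d) for _ in range(10000)]
    # For bit sampling Pr[collision] = 1 - dist/d, so here p1 = 0.9 and p2 = 0.8,
    # and rho = log(1/p1)/log(1/p2) ~ 0.47, which tends to 1/c as r/d -> 0.
    print(collision_probability(partitions, p, close))   # ~ 0.9
    print(collision_probability(partitions, p, far))     # ~ 0.8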

Slide 5

From LSH to ANN

Efficient $(r, cr, p_1, p_2)$-sensitive partitions imply $c$-ANN with space $n^{1+\rho}$ and query time $n^{\rho}$, where $\rho = \frac{\log(1/p_1)}{\log(1/p_2)}$.
Sample $k$ partitions; given a query, retrieve the data points that collide with it w.r.t. all $k$ partitions.
Repeat $L \approx n^{\rho}$ times to boost the success probability.
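
A minimal sketch of this reduction for the Hamming case, reusing the bit-sampling family from the previous sketch: each of the $L$ repetitions concatenates $k$ sampled coordinates into one hash key. The class and parameter names are illustrative, not from the slides.

    import random
    from collections import defaultdict

    class LSHIndex:
        """Toy LSH-based ANN index for {0,1}^d under the Hamming distance."""

        def __init__(self, points, k, L):
            self.points = points
            d = len(points[0])
            # For each of the L repetitions, fix k random coordinates; the hash of
            # a point is the tuple of its values on those coordinates.
            self.coords = [[random.randrange(d) for _ in range(k)] for _ in range(L)]
            self.tables = [defaultdict(list) for _ in range(L)]
            for idx, x in enumerate(points):
                for t, cs in enumerate(self.coords):
                    self.tables[t][tuple(x[i] for i in cs)].append(idx)

        def query(self, q, cr):
            # Retrieve the data points colliding with q in each repetition and
            # return the first one within distance cr (the c-approximate guarantee).
            for t, cs in enumerate(self.coords):
                for idx in self.tables[t][tuple(q[i] for i in cs)]:
                    if sum(a != b for a, b in zip(self.points[idx], q)) <= cr:
                        return idx
            return None

With the standard setting $k \approx \log_{1/p_2} n$ and $L \approx n^{\rho}$, a far point lands in the query's bucket with probability about $1/n$ per repetition, which gives the $n^{1+\rho}$ space and $n^{\rho}$ query time quoted above.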

Slide 6

Bounds on LSH

Space $n^{1+\rho}$, query time $n^{\rho}$.

Euclidean ($\ell_2$):
  Upper bound: $\rho = 1/c^2 + o(1)$, i.e. $1/4$ for $c = 2$ (Andoni-Indyk 2006)
  Lower bound: $\rho \ge 1/c^2 - o(1)$ (O'Donnell-Wu-Zhou 2011)
Manhattan, Hamming ($\ell_1$):
  Upper bound: $\rho = 1/c$, i.e. $1/2$ for $c = 2$ (Indyk-Motwani 1998)
  Lower bound: $\rho \ge 1/c - o(1)$ (O'Donnell-Wu-Zhou 2011)

Can one improve upon LSH?
Yes! (Andoni-Indyk-Nguyen-Razenshteyn 2014), (Andoni-Razenshteyn 2015)
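
Plugging $c = 2$ into the exponents above gives the concrete bounds behind the $1/4$ and $1/2$ entries (a worked example, not part of the original slide):

    Euclidean ($\ell_2$): $\rho = 1/c^2 = 1/4$, i.e. space $n^{1.25}$ and query time $n^{0.25}$.
    Hamming ($\ell_1$): $\rho = 1/c = 1/2$, i.e. space $n^{1.5}$ and query time $n^{0.5}$.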

Slide 7

How to do better than LSH?

Main idea: data-dependent space partitions.
A distribution over partitions is $(r, cr, p_1, p_2)$-sensitive if for every pair $p, q$:
  - if $\mathrm{dist}(p, q) \le r$, then $\Pr[p \text{ and } q \text{ collide}] \ge p_1$;
  - if $\mathrm{dist}(p, q) \ge cr$, then $\Pr[p \text{ and } q \text{ collide}] \le p_2$.
Too strong! It is enough to satisfy these conditions when $p$ is a data point and $q$ is a query.
Exploit the geometry of the dataset to design better partitions.
Able to obtain an improvement for every $c > 1$:
  - (Andoni-Indyk-Nguyen-Razenshteyn 2014): slightly bypass the LSH lower bound for large $c$;
  - (Andoni-Razenshteyn 2015): significant improvement upon LSH for all $c$.

Slide 8

Bounds on data-dependent LSH

Space $n^{1+\rho}$, query time $n^{\rho}$.

Euclidean ($\ell_2$):
  Classical LSH: $\rho = 1/c^2$, i.e. $1/4$ for $c = 2$ (Andoni-Indyk 2006; tight by O'Donnell-Wu-Zhou 2011)
  Data-dependent: $\rho = 1/(2c^2 - 1)$, i.e. $1/7$ for $c = 2$ (Andoni-Razenshteyn 2015). Optimal!
Manhattan, Hamming ($\ell_1$):
  Classical LSH: $\rho = 1/c$, i.e. $1/2$ for $c = 2$ (Indyk-Motwani 1998; tight by O'Donnell-Wu-Zhou 2011)
  Data-dependent: $\rho = 1/(2c - 1)$, i.e. $1/3$ for $c = 2$ (Andoni-Razenshteyn 2015). Optimal!
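
At $c = 2$ the improvement of data-dependent partitions over classical LSH reads (again a worked comparison, not part of the original slide):

    Euclidean ($\ell_2$): $\rho$ drops from $1/c^2 = 1/4 = 0.25$ to $1/(2c^2 - 1) = 1/7 \approx 0.143$.
    Hamming ($\ell_1$): $\rho$ drops from $1/c = 1/2 = 0.5$ to $1/(2c - 1) = 1/3 \approx 0.333$.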

Slide 9

The main result

The data-dependent space partitions for the Euclidean and Manhattan/Hamming distances from (Andoni-Razenshteyn 2015) are optimal*.
* After proper formalization.

Slide 10

Hard instance

Hamming distance.
Dataset: $n$ random vectors from $\{0,1\}^d$; think $d = \omega(\log n)$.
Queries: flip each bit of a random data point independently with probability $\approx 1/(2c)$.
Distances: $\approx d/(2c)$ to the original data point, $\approx d/2$ to the other data points.
Any $c$-approximate data structure must recover the original data point.
The goal: show that even a data structure that sees the dataset in advance cannot beat the exponent of (Andoni-Razenshteyn 2015) on this instance.
(This implies the lower bound for the Euclidean distance as well.)
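
A small Python sketch of how such a hard instance can be generated for experiments; the flip probability $1/(2c)$ and the concrete sizes are assumptions made here to match the distances stated above.

    import random

    def hard_instance(n, d, c):
        """n random points in {0,1}^d plus a query that flips each bit of a random
        ("planted") data point with probability 1/(2c): the planted point ends up
        at distance ~ d/(2c), while all other points stay at distance ~ d/2."""
        points = [[random.randint(0, 1) for _ in range(d)] for _ in range(n)]
        planted = random.randrange(n)
        query = [b ^ (random.random() < 1 / (2 * c)) for b in points[planted]]
        return points, query, planted

    points, query, planted = hard_instance(n=1000, d=400, c=2)
    dists = sorted(sum(a != b for a, b in zip(x, query)) for x in points)
    print(dists[:3])   # the smallest distance (~100) is well separated from the rest (~200)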

Slide 11

Fine print

The Voronoi diagram of the dataset is a "perfect" partition, yet useless: it is hard to locate points in it.
We need to define what is allowed carefully enough to rule it out.

Slide 12

Formalizing the model

Restricted computational complexity: amounts to data structure lower bounds.
Bounded number of parts: one can tweak the Voronoi diagram example.

Slide 13

The main result

Any data structure for the hard instance that is based on data-dependent LSH:
  - either has exponent $\rho \ge \frac{1}{2c - 1} - o(1)$,
  - or its space partitions themselves occupy too much space (this branch captures Voronoi diagrams).
The point location time is proportional to the space a partition occupies (for all known constructions).
For small dimension, $d = O(\log n)$, a better exponent is possible (Becker-Ducas-Gama-Laarhoven 2016).

Slide 14

Proof outline

Improve (Motwani-Naor-Panigrahy 2007) to get a tight exponent for data-independent partitions of the hard instance: hypercontractivity estimates.
From data-independent to data-dependent lower bound: concentration and a union bound (there are not too many distinct partitions).

Slide 15

Conclusions

Tight lower bound for data-dependent LSH.
(Andoni-Laarhoven-Razenshteyn-Waingarten 2016): space/query-time trade-offs for the Hamming distance.
Can prove matching data-independent lower bounds for the hard instance (in an appropriate model).
What about data-dependent trade-off lower bounds?
Questions?