


Slide 1: Approximate nearest neighbor for ℓp-spaces (2 < p < ∞) via embeddings

Yair Bartal (Hebrew U.), Lee-Ad Gottlieb (Ariel University)

Slide 2: Nearest neighbor search

Problem definition: Given a set of points S, preprocess S so that the following query can be answered efficiently:
- Exact (NNS): given query point q, what is the closest point to q in S?
- Approximate (ANN): given query point q, find a point x in S whose distance from q is within some approximation factor of the closest point

Goal: polynomial space, polylog query time, o(log n) approximation.

Slide 3: Approx. nearest neighbor search

- Not possible in general metrics
- More restrictive spaces?
- Good news: Euclidean space, normed spaces

Slide 4: ℓp normed spaces

Norms: recall that for d-dimensional vectors x, y,
ǁx − yǁp = (|x1 − y1|^p + … + |xd − yd|^p)^(1/p)

Familiar special cases: ℓ1, ℓ2, ℓ∞ (the figure shows their unit balls).
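As a quick plain-Python sketch of the definition above (the helper name lp_distance is ours, not from the talk), with p = ∞ handled as the max-norm:

```python
import math

def lp_distance(x, y, p):
    """l_p distance (|x1-y1|^p + ... + |xd-yd|^p)^(1/p); p = inf gives the max-norm."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

x, y = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(lp_distance(x, y, 1))         # 7.0  (3 + 4 + 0)
print(lp_distance(x, y, 2))         # 5.0  (sqrt(9 + 16))
print(lp_distance(x, y, math.inf))  # 4.0  (max of 3, 4, 0)
```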

Slide 5: Approx. nearest neighbor search

An efficient ANN structure features:
- polynomial space
- polylog query time
- o(log n), o(d) approximation

Efficient ANN structures exist for:
- Euclidean (p = 2): (1+ɛ)-ANN [IM '98, KOR '98]. Reduce dimension via JL, then brute force in the lower dimension.
- ℓ∞: O(log log d)-ANN [Indyk '98]

What about other norms?
- 1 ≤ p < 2: (1+ɛ)-ANN, same as Euclidean
- 2 < p < ∞: ? (previous work: Andoni); the subject of this paper

Slide 6: Summary of results

Combine two algorithms for ℓp (2 < p < ∞):
- Andoni: O(log log d · (log_d n)^(1/p))-ANN
- New result: 2^O(p)-ANN

Analysis:
- The two bounds coincide at roughly p = log log log d + (log log_d n)^(1/2)
- Worst-case approximation: O(log log d · exp((log log_d n)^(1/2)))
- Andoni's bound is better for larger p, the new bound for smaller p
- Improved bounds in metrics of low doubling dimension

Slide 7: Embeddings

An embedding of set X into Y with distortion D is a mapping f : X → Y such that for all x, y ∈ X:
1 ≤ c · d_Y(f(x), f(y)) / d_X(x, y) ≤ D
where c is any scaling constant.

Relaxed version: one side of the guarantee need only hold with constant probability.

- If an embedding is non-expansive and has small contraction: the nearest neighbor stays close, and far points are still relatively far.
- If an embedding is non-contractive and has small expansion: the nearest neighbor is only a little farther away, and far points remain far.

Slide 8: Andoni's algorithm

Basic idea [Andoni '09]:
- Embed ℓp space into ℓ∞
- Run Indyk's ANN algorithm for ℓ∞
- The embedding uses Frechet random variables (a max-stable distribution)

Slide 9: Frechet distribution

For a random variable X ∼ Frechet, Pr[X < x] = e^(−x^(−p)) for x > 0.

Max-stability: let random variables X and Z1, …, Zd be ∼ Frechet, and let v = (v1, …, vd) be a non-negative valued vector. Then the random variable Y := max_i v_i·Z_i is distributed as ǁvǁp·X (that is, Y ∼ ǁvǁp·X).

Slide 10: Frechet distribution

Proof of max-stability. Recall Y := max_i v_i·Z_i. Then
Pr[Y ≤ x] = Pr[max_i v_i·Z_i ≤ x]
          = Π_i Pr[v_i·Z_i ≤ x]
          = Π_i Pr[Z_i ≤ x/v_i]
          = Π_i e^(−(v_i/x)^p)
          = e^(−(∑_i v_i^p)/x^p)
          = e^(−(ǁvǁp/x)^p)
Similarly, Pr[ǁvǁp·X ≤ x] = Pr[X ≤ x/ǁvǁp] = e^(−(ǁvǁp/x)^p).
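The max-stability identity above is easy to check numerically. A minimal Monte Carlo sketch (the function names are ours; Frechet variables are drawn by inverse transform of Pr[X ≤ x] = e^(−x^(−p))): the median of Y = max_i v_i·Z_i should match ǁvǁp times the Frechet median (ln 2)^(−1/p).

```python
import random
import math

def sample_frechet(p, rng):
    # Inverse-transform sample of X with Pr[X <= x] = exp(-x^(-p))
    return (-math.log(rng.random())) ** (-1.0 / p)

def lp_norm(v, p):
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def median_of_max(v, p, trials, rng):
    # Empirical median of Y = max_i v_i * Z_i over independent Frechet draws
    samples = sorted(max(vi * sample_frechet(p, rng) for vi in v)
                     for _ in range(trials))
    return samples[trials // 2]

rng = random.Random(0)
p, v = 3.0, [1.0, 2.0, 3.0]

# Max-stability says Y ~ ||v||_p * X, so the median of Y should be
# ||v||_p times the Frechet median (ln 2)^(-1/p).
predicted = lp_norm(v, p) * math.log(2) ** (-1.0 / p)
observed = median_of_max(v, p, 200000, rng)
print(predicted, observed)
```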

Slide 11: Review of Andoni's embedding

Define the embedding f_b : V → ℓ∞ (b > 0):
- Draw Frechet random variables Z1, …, Zd
- f_b maps v = (v1, …, vd) to (v1·b·Z1, …, vd·b·Zd)
- The resulting set is V′ ⊂ ℓ∞

Theorem: set b = (3 ln n)^(1/p). Then f_b satisfies:
- Non-contractive (for all points) with probability > 1 − 1/n
- Expansion: for any u, w ∈ V, with constant probability, ǁf_b(u) − f_b(w)ǁ∞ ≤ b·ǁu − wǁp

The expansion guarantee is needed for only one inter-point distance: between the query point and its nearest neighbor.
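A hedged simulation of the theorem (names and parameters are ours, and this is a sanity check, not the paper's data structure): for a fixed difference vector v, we estimate how often ǁf_b(v)ǁ∞ drops below ǁvǁp (contraction, predicted rate n^(−3)) and how often it stays within b·ǁvǁp (the expansion bound, predicted rate 1/e).

```python
import random
import math

def sample_frechet(p, rng):
    # Pr[Z <= x] = exp(-x^(-p)), sampled via inverse transform
    return (-math.log(rng.random())) ** (-1.0 / p)

def lp_norm(v, p):
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

rng = random.Random(1)
p, n, d = 2.5, 5, 10
b = (3 * math.log(n)) ** (1.0 / p)       # b = (3 ln n)^(1/p)
v = [rng.random() for _ in range(d)]     # a fixed inter-point difference vector
target = lp_norm(v, p)

trials = 2000
contracted = within_b = 0
for _ in range(trials):
    Z = [sample_frechet(p, rng) for _ in range(d)]
    norm = max(b * vi * zi for vi, zi in zip(v, Z))   # ||f_b(v)||_inf
    if norm < target:          # contraction below the original l_p distance
        contracted += 1
    if norm <= b * target:     # within the expansion bound b * ||v||_p
        within_b += 1

contract_rate = contracted / trials   # theory: n^(-3) = 0.008
expand_rate = within_b / trials       # theory: 1/e ~ 0.368
print(contract_rate, expand_rate)
```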

Slide 12: Analysis of Andoni's embedding

Theorem: set b = (3 ln n)^(1/p). Then f_b satisfies:
- Non-contractive (all points) with probability > 1 − 1/n
- Expansion: for any u, w ∈ V, with constant probability, ǁf_b(u) − f_b(w)ǁ∞ ≤ b·ǁu − wǁp

Proof of the contraction bound: take v with ǁvǁp = 1. By max-stability, ǁf_b(v)ǁ∞ ∼ b·ǁvǁp·X = b·X. By the definition of the Frechet distribution,
Pr[ǁf_b(v)ǁ∞ < 1] = Pr[b·X < 1] = e^(−b^p) = e^(−3 ln n) = n^(−3).
Since the embedding is linear, v may be taken to be any (normalized) difference between two vectors in V, so the probability that any of the n² inter-point distances decreases is less than n²·n^(−3) = 1/n.

Proof of the expansion bound: by the same approach, Pr[ǁf_b(v)ǁ∞ ≤ b] = Pr[b·X ≤ b] = Pr[X ≤ 1] = e^(−1), so the expansion is at most b with constant probability.

Slide 13: Summary

- Embed ℓp space into ℓ∞: distortion O(b) = O(ln n)^(1/p)
- Run Indyk's ANN algorithm for ℓ∞: O(log log d)-ANN
- Final guarantee: O(log log d · ln^(1/p) n)-ANN

Slide 14: An improvement

We can improve the guarantees of Andoni's algorithm by considering the doubling dimension of the space.
- Doubling constant: the number of half-radius balls needed to cover a ball
- Doubling dimension (ddim): log(doubling constant)
- For example, d-dimensional Euclidean space has doubling dimension Θ(d)

Slide 15: Improvement outline

- Nearest neighbor search can be reduced to a series of subproblems: searches on spaces with small aspect ratio
- So we can take a net on each subspace and run Andoni's algorithm on the nets instead
- Size of net: ddim^O(ddim)
- Approximation: Andoni gives O(log log d · (log_d n)^(1/p)); the improved bound is O(log log d · (ddim · log_d ddim)^(1/p))

Slide 16: New algorithm

Basic idea:
- Embed ℓp space into ℓ2
- Run an ANN algorithm for ℓ2
- Embed using the Mazur map

Slide 17: Mazur map

The Mazur map is a mapping from ℓp to ℓq, for any 0 < p, q < ∞. The mapping of a vector v ∈ ℓp is defined as
M(v) = (|v1|^(p/q), |v2|^(p/q), …, |vd|^(p/q))

For a set V, let C satisfy C ≥ ǁvǁp for all v ∈ V. Our embedding f is the Mazur map from ℓp to ℓ2, scaled down by a factor (p/2)·C^(p/2−1).
- f is non-expansive
- Contraction: if ǁx − yǁp = u, then ǁf(x) − f(y)ǁ2 ≥ (2/p)·(2C)^(1−p/2)·u^(p/2) [Benyamini & Lindenstrauss '00]
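A small sketch of the scaled Mazur-map embedding into ℓ2 (the function names are ours; we use the standard sign-preserving coordinate rule sgn(v_i)·|v_i|^(p/2), whereas the slide writes absolute values only). It checks non-expansiveness empirically on random points of the radius-C ℓp sphere:

```python
import random
import math

def lp_norm(v, p):
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def scaled_mazur(v, p, C):
    # Sign-preserving Mazur map into l_2, scaled down by (p/2) * C^(p/2 - 1)
    scale = (p / 2.0) * C ** (p / 2.0 - 1.0)
    return [math.copysign(abs(x) ** (p / 2.0), x) / scale for x in v]

rng = random.Random(2)
p, d, C = 3.0, 8, 4.0

def random_sphere_point():
    # A random point on the radius-C l_p sphere
    v = [rng.uniform(-1, 1) for _ in range(d)]
    norm = lp_norm(v, p)
    return [C * x / norm for x in v]

worst_ratio = 0.0
for _ in range(500):
    x, y = random_sphere_point(), random_sphere_point()
    u = lp_norm([a - b for a, b in zip(x, y)], p)
    fx, fy = scaled_mazur(x, p, C), scaled_mazur(y, p, C)
    d2 = math.sqrt(sum((a - b) ** 2 for a, b in zip(fx, fy)))
    worst_ratio = max(worst_ratio, d2 / u)   # non-expansiveness: should stay <= 1

print(worst_ratio)
```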

Slide 18: ANN via the Mazur map

The distortion of our embedding is large: it depends on the diameter C of the space through the contraction bound (2/p)·(2C)^(1−p/2)·u^(p/2). But we can show that this guarantee suffices to solve a specific case of nearest neighbor in ℓp: the c-bounded nearest neighbor problem.

Slide 19: c-bounded nearest neighbor

Define the c-bounded near neighbor problem, where c ≥ ǁvǁp for all v ∈ V:
- If there is a point in V within distance 1 of query q, return it or some point in V within distance c/9 of q. (This is a c/9-ANN.)
- If there is no point in V within distance 1 of query q, return null or some point in V within distance c/9 of q.

Slide 20: c-bounded nearest neighbor

Approximately solve the c-bounded nearest neighbor problem in ℓp, for c = p·18^(p/2):
- Embed from ℓp to ℓ2
- Compute a 2-ANN in ℓ2

Analysis: the Mazur map ensures that inter-point distances of c/9 or greater map to distances of at least (2/p)·(2c)^(1−p/2)·(c/9)^(p/2) = 4. If q has a neighbor in the original space at distance 1 or less, then its embedded distance is at most 1, so the 2-ANN finds a neighbor at distance at most 2 in the embedded space, and hence less than c/9 in the original space.
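The two steps above can be sketched end to end. This is our own illustration, not the paper's construction: exact ℓ2 nearest neighbor stands in for the 2-ANN structure (exact search is at least as accurate, so the guarantee still holds), the dataset is a planted neighbor plus random far decoys, and c = p·18^(p/2) as on the slide.

```python
import random
import math

def lp_norm(v, p):
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def scaled_mazur(v, p, C):
    # Scaled Mazur map into l_2: sgn(v_i)|v_i|^(p/2), divided by (p/2) * C^(p/2 - 1)
    scale = (p / 2.0) * C ** (p / 2.0 - 1.0)
    return [math.copysign(abs(x) ** (p / 2.0), x) / scale for x in v]

def l2(u, w):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, w)))

rng = random.Random(3)
p, d = 3.0, 6
c = p * 18 ** (p / 2.0)              # c = p * 18^(p/2), as on the slide

# One planted neighbor at l_p distance 0.5 from q, plus 20 random decoys,
# all with l_p norm at most c as the problem requires.
q = [1.0] * d
near = [qi + 0.5 / d ** (1.0 / p) for qi in q]
lim = c / (2 * d ** (1.0 / p))       # keeps every decoy's l_p norm below c/2
decoys = [[rng.uniform(-lim, lim) for _ in range(d)] for _ in range(20)]
V = [near] + decoys

# Embed query and dataset, then answer with exact l_2 nearest neighbor
# (a stand-in for the 2-ANN structure).
fq = scaled_mazur(q, p, c)
fV = [scaled_mazur(v, p, c) for v in V]
best = min(range(len(V)), key=lambda i: l2(fq, fV[i]))
answer = V[best]

# Non-expansiveness puts the planted neighbor within embedded distance 0.5,
# so the returned point is within c/9 of q in the original space.
print(l2(fq, fV[best]), lp_norm([a - b for a, b in zip(answer, q)], p))
```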

Slide 21: c-bounded nearest neighbor

We can show that the c-bounded nearest neighbor problem can be used to give a c-ANN for the regular (unbounded) problem.

Final result: 2^O(p)-ANN

Slide 22: Conclusion

Combine two algorithms for ℓp (2 < p < ∞):
- Andoni: O(log log d · (log_d n)^(1/p))-ANN
- New result: 2^O(p)-ANN
- Worst-case approximation: O(log log d · exp((log log_d n)^(1/2)))
- Improved bounds in metrics of low doubling dimension
