Approximate nearest neighbor for $\ell_p$-spaces ($2 < p < \infty$) via embeddings
Yair Bartal (Hebrew U.), Lee-Ad Gottlieb (Ariel University)
Nearest neighbor search
Problem definition: Given a set of points $S$, preprocess $S$ so that the following query can be answered efficiently:
Exact (NNS): Given query point $q$, what is the closest point to $q$ in $S$?
Approximate (ANN): Given query point $q$, find a point $x$ in $S$ whose distance from $q$ is within some approximation factor of the distance to the closest point.
Goal: polynomial space, polylog query time, $o(\log n)$ approximation.
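As a baseline for the definitions above, here is a minimal sketch of exact search by linear scan, together with a check of whether a candidate answer qualifies as a c-approximate nearest neighbor. The function names and the choice of Euclidean distance as the example metric are assumptions made for illustration; the slides do not prescribe an implementation.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]

def euclidean(x: Point, y: Point) -> float:
    """Euclidean distance, used here only as an example metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def exact_nn(S: Sequence[Point], q: Point,
             dist: Callable[[Point, Point], float] = euclidean) -> Point:
    """Exact NNS by linear scan: one distance evaluation per point of S."""
    return min(S, key=lambda x: dist(x, q))

def is_c_approximate(S: Sequence[Point], q: Point, x: Point, c: float,
                     dist: Callable[[Point, Point], float] = euclidean) -> bool:
    """True if dist(x, q) is at most c times the distance to the true nearest neighbor."""
    best = dist(exact_nn(S, q, dist), q)
    return dist(x, q) <= c * best
```

The linear scan uses no preprocessing and linear query time, which is exactly what an efficient ANN structure is meant to avoid; it is useful mainly as a correctness reference.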
Approx. nearest neighbor search
Not possible in general metrics.
More restrictive spaces? Good news!
Euclidean space
Normed spaces
$L_p$ normed spaces
Norms: Recall that for $d$-dimensional vectors $x, y$:
$\|x-y\|_p = (|x_1-y_1|^p + \dots + |x_d-y_d|^p)^{1/p}$
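The ℓp distance between two d-dimensional vectors is the p-th root of the sum of p-th powers of the coordinate differences, with the maximum coordinate difference as the limiting case p = ∞. A small sketch (the function name `lp_distance` is chosen here for illustration):

```python
import math

def lp_distance(x, y, p):
    """l_p distance between equal-length vectors; p = math.inf gives l_infinity."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    if p == math.inf:
        return max(abs(a - b) for a, b in zip(x, y))
    if p < 1:
        raise ValueError("p must be at least 1 for a norm")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)
```

For a fixed pair of vectors the value is non-increasing in p, so the ℓ∞ distance is never larger than any ℓp distance.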
Approx. nearest neighbor search
An efficient ANN structure features:
polynomial space
polylog query time
$o(\log n)$, $o(d)$ approximation
Efficient ANN structures exist for:
Euclidean ($p=2$): $(1+\varepsilon)$-ANN [IM '98, KOR '98]. Reduce dimension via JL, brute force in lower dimension.
$\ell_\infty$: $O(\log\log d)$-ANN [Indyk '98]
What about other norms?
$1 \le p < 2$: $(1+\varepsilon)$-ANN, same as Euclidean
$2 < p < \infty$: ? (previous: Andoni); subject of this paper
Summary of results
Combine two algorithms for $\ell_p$ ($2 < p < \infty$):
Andoni: $O(\log\log d \cdot (\log dn)^{1/p})$-ANN
New result: $2^{O(p)}$-ANN
Analysis: Equality at $p = \log\log\log d + (\log\log dn)^{1/2}$
Worst-case approximation: $(\log\log d) \cdot \exp((\log\log dn)^{1/2})$
Andoni's algorithm is better for larger values of $p$, the new algorithm for smaller values.
Improved bounds in metrics of low doubling dimension
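One way to see where the equality point comes from, as a rough sketch only: assume every constant hidden in the O(·) notation is 1, take logarithms base 2, and write L for the quantity the slide denotes "log dn" (the shorthand L is introduced here and is not in the original).

```latex
\begin{align*}
\log\log d \cdot L^{1/p} &= 2^{p}
  && \text{(Andoni's bound equals the new bound)} \\
\log\log\log d + \tfrac{1}{p}\log L &= p
  && \text{(take logarithms of both sides)} \\
p &\le \log\log\log d + \sqrt{\log L}
  && \text{(if } p \ge \sqrt{\log L} \text{ then } \tfrac{1}{p}\log L \le \sqrt{\log L}\text{)} \\
2^{p} &\le \log\log d \cdot 2^{\sqrt{\log L}}
  && \text{(value of both bounds at the crossover)}
\end{align*}
```

The third line also holds trivially when p is below the square root of log L. Under these assumptions the better of the two bounds never exceeds the last line, which has the same shape as the worst-case approximation stated above.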
An embedding of set $X$ into $Y$ with distortion $D$ is a mapping $f : X \to Y$ such that for all $x, y \in X$:
$1 \le c \cdot d_Y(f(x), f(y)) / d_X(x, y) \le D$
where $c$ is any scaling constant.
Relaxed: one side may be preserved only with constant probability.
If an embedding is non-expansive and has small contraction: the nearest neighbor stays close, and far points are still relatively far.
If an embedding is non-contractive and has small expansion: the nearest neighbor is only a little bit farther away, and far points remain far.
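For a finite point set, the distortion in the definition above can be measured directly: with the best scaling constant c, it equals the largest ratio of embedded to original distance divided by the smallest such ratio. The sketch below assumes the caller supplies the map and both distance functions; all names are illustrative.

```python
import itertools
import math

def distortion(points, f, d_X, d_Y):
    """Distortion of f on a finite set: max ratio / min ratio of d_Y(f(x), f(y)) to d_X(x, y)."""
    ratios = []
    for x, y in itertools.combinations(points, 2):
        dx = d_X(x, y)
        if dx == 0:
            continue  # coincident points impose no constraint
        ratios.append(d_Y(f(x), f(y)) / dx)
    if not ratios:
        return 1.0  # fewer than two distinct points: nothing to distort
    if min(ratios) == 0:
        return math.inf  # two distinct points were mapped to the same image
    return max(ratios) / min(ratios)
```

Choosing c as the reciprocal of the smallest ratio makes the left inequality tight, which is why the quotient of the extreme ratios is the smallest valid D.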
Andoni's algorithm
Basic idea [Andoni '09]:
Embed $\ell_p$ space into $\ell_\infty$.
Run Indyk's ANN algorithm for $\ell_\infty$.
Embedding using Frechet random variables (max-stable distribution…)
Frechet distribution
For random variable $X \sim$ Frechet: $\Pr[X < x] = e^{-x^{-p}}$ for $x > 0$.
Max-stability: Let independent random variables $X$ and $Z_1, \dots, Z_d$ be $\sim$ Frechet, and let $v = (v_1, \dots, v_d)$ be a non-negative valued vector. Then the random variable $Y := \max_i v_i Z_i$ is distributed as $\|v\|_p \cdot X$ ($Y \sim \|v\|_p \cdot X$).
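Max-stability can be checked numerically. The sketch below samples Frechet variables by inverting the CDF exp(-x^(-p)) and estimates Pr[max_i v_i Z_i < t], which max-stability predicts to equal the Frechet CDF evaluated at t / ‖v‖p. The function names, the seed, and the trial count are illustrative assumptions.

```python
import math
import random

def frechet_cdf(x, p):
    """Pr[X < x] = exp(-x^(-p)) for x > 0, and 0 otherwise."""
    return math.exp(-(x ** -p)) if x > 0 else 0.0

def frechet_from_uniform(u, p):
    """Inverse CDF: the x with exp(-x^(-p)) = u, for 0 < u < 1."""
    return (-math.log(u)) ** (-1.0 / p)

def sample_frechet(p, rng):
    u = rng.random()
    while u == 0.0:  # random() may return 0.0; log(0) is undefined
        u = rng.random()
    return frechet_from_uniform(u, p)

def lp_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def empirical_max_stability(v, p, t, trials, seed=0):
    """Monte Carlo estimate of Pr[max_i v_i * Z_i < t] for non-negative v."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y = max(vi * sample_frechet(p, rng) for vi in v)
        if y < t:
            hits += 1
    return hits / trials
```

Comparing `empirical_max_stability(v, p, t, trials)` with `frechet_cdf(t / lp_norm(v, p), p)` for a few choices of v and t gives a quick sanity check of the identity.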
The embedding $f_b$ maps each $v \in V$ to $f_b(v) = (v_1 b Z_1, \dots, v_d b Z_d)$.
The resulting set is $V' \subset \ell_\infty$.
Theorem: Set $b = (3 \ln n)^{1/p}$. Then $f_b$ satisfies:
Non-contractive (for all points) with prob. $> 1 - 1/n$.
Expansion: For any $u, w \in V$, with constant prob. $\|f_b(u) - f_b(w)\|_\infty \le b\,\|u - w\|_p$.
The expansion guarantee is needed for only one inter-point distance: between the query point and its nearest neighbor.
Analysis of Andoni's embedding
Theorem: Set $b = (3 \ln n)^{1/p}$. Then $f_b$ satisfies:
Non-contractive (all points) with prob. $> 1 - 1/n$.
Expansion: For any $u, w \in V$, with constant prob. $\|f_b(u) - f_b(w)\|_\infty \le b\,\|u - w\|_p$.
Proof of non-contraction: Take $v$ with $\|v\|_p = 1$. By max-stability, $\|f_b(v)\|_\infty \sim b\,\|v\|_p \cdot X = b \cdot X$. By definition of the Frechet distribution, $\Pr[\|f_b(v)\|_\infty < 1] = \Pr[b \cdot X < 1] = e^{-(1/b)^{-p}} = e^{-b^p} = n^{-3}$. Since the embedding is linear, $v$ may be taken to be the normalized difference of any two vectors in $V$, so the probability that any of the fewer than $n^2$ inter-point distances decreases is less than $n^2 \cdot n^{-3} = 1/n$.
Proof of expansion: Same approach: $\Pr[\|f_b(v)\|_\infty \le b] = \Pr[b \cdot X < b] = \Pr[X < 1] = e^{-1}$. So with constant probability the expansion is bounded by $b$.
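The embedding analyzed here can be sketched in a few lines, assuming (as the proof indicates) that each coordinate v_i is multiplied by b and by a Frechet variable Z_i shared across all points, with b = (3 ln n)^(1/p). The helper names, the seed handling, and the inverse-CDF sampler are assumptions made for illustration, not the authors' code.

```python
import math
import random

def frechet_sample(p, rng):
    """Draw X with Pr[X < x] = exp(-x^(-p)) by inverting the CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / p)

def make_embedding(n, d, p, seed=0):
    """Return (f_b, b): a linear map into l_infinity and its scale b = (3 ln n)^(1/p)."""
    rng = random.Random(seed)
    b = (3.0 * math.log(n)) ** (1.0 / p)
    Z = [frechet_sample(p, rng) for _ in range(d)]  # one Z_i per coordinate, reused for every point
    def f_b(v):
        return [b * vi * zi for vi, zi in zip(v, Z)]
    return f_b, b

def lp_distance(x, y, p):
    return sum(abs(a - c) ** p for a, c in zip(x, y)) ** (1.0 / p)

def linf_distance(x, y):
    return max(abs(a - c) for a, c in zip(x, y))
```

With this choice of b, exp(-b^p) equals n^(-3), which is the per-pair failure probability used in the union bound over inter-point distances.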
Summary
Embed $\ell_p$ space into $\ell_\infty$: distortion $O(b) = O((\ln n)^{1/p})$.
Run Indyk's ANN algorithm for $\ell_\infty$: $O(\log\log d)$-ANN.
An improvement
We can improve the guarantees of Andoni's algorithm by considering the doubling dimension of the space.
Doubling constant: the smallest number of half-radius balls sufficient to cover any ball.
Doubling dimension: $\log$(doubling constant).
For example, $d$-dimensional Euclidean space has doubling dimension $\Theta(d)$.
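To make the definition concrete, the sketch below greedily covers one ball of a finite point set with balls of half the radius, centered at points of the set. This only gives an upper estimate for that single ball; the doubling constant is the worst case over all balls, and greedy covers need not be minimal. The function name and the restriction to centers from the data set are assumptions made here.

```python
import math

def greedy_half_radius_cover(points, center, r, dist):
    """Centers of radius-r/2 balls that together cover all points within r of `center`."""
    uncovered = [x for x in points if dist(x, center) <= r]
    centers = []
    while uncovered:
        c = uncovered[0]  # any uncovered point can serve as a new center
        centers.append(c)
        uncovered = [x for x in uncovered if dist(x, c) > r / 2]
    return centers
```

Taking `math.log2(len(centers))` of the largest cover found over many balls gives a rough empirical estimate of the doubling dimension of the data set.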
Improvement outline
Nearest neighbor search can be reduced to a series of subproblems: searches on spaces with small aspect ratio.
So we can take a net on the subspaces, and run Andoni's algorithm on the nets instead.
Size of net: $\mathrm{ddim}^{O(\mathrm{ddim})}$
Andoni: $O(\log\log d \cdot (\log dn)^{1/p})$
Improved: O(log log d
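The nets mentioned in the outline can be built greedily. The sketch below is the standard greedy construction of an r-net (net points are pairwise more than r apart, and every input point lies within r of some net point); it is offered as an illustration under that definition, not as the specific construction used in the talk.

```python
def greedy_net(points, r, dist):
    """Greedy r-net: keep a point only if it is farther than r from every net point so far."""
    net = []
    for x in points:
        if all(dist(x, y) > r for y in net):
            net.append(x)
    return net
```

Because a net keeps only well-separated points, its size in a region of bounded aspect ratio is governed by the doubling dimension rather than by the number of input points, which is what makes running the embedding on the net attractive.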
New algorithm
Basic idea:
Embed $\ell_p$ space into $\ell_2$.
Run ANN algorithm for $\ell_2$.
Embedding using the Mazur map.
Mazur map
The Mazur map is a mapping from $\ell_p$ to $\ell_q$, for any $0 < p, q < \infty$. The mapping of vector $v \in \ell_p$ is defined as $M(v) = (|v_0|^{p/q}, \dots, |v_{m-1}|^{p/q})$.
For set $V$, let $C$ satisfy $C \ge \|v\|_p$ for all $v \in V$.
Our embedding $f$ is the Mazur map from $\ell_p$ to $\ell_2$, scaled down by a factor $(p/2)\,C^{p/2-1}$.
$f$ is non-expansive.
Contraction: If $\|x - y\|_p = u$, then $\|f(x) - f(y)\|_2 \ge 2p^{-1}(2C)^{1-p/2}u^{p/2}$ [Benyamini & Lindenstrauss '00].
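A minimal sketch of the scaled embedding follows. Two assumptions are made and should be read as such: the code uses the sign-preserving form of the Mazur map (each coordinate keeps its sign, which is the usual definition and is needed for distinct points to stay distinct), and the constant written "2p-1" on the slide is read as 2/p, the reading under which the later "= 4" computation works out. Function names are illustrative.

```python
import math

def mazur_map(v, p, q):
    """Sign-preserving Mazur map: coordinate v_i becomes sign(v_i) * |v_i|^(p/q)."""
    return [math.copysign(abs(vi) ** (p / q), vi) for vi in v]

def scaled_embedding(v, p, C):
    """Mazur map from l_p to l_2, scaled down by (p/2) * C^(p/2 - 1)."""
    scale = (p / 2.0) * C ** (p / 2.0 - 1.0)
    return [t / scale for t in mazur_map(v, p, 2.0)]

def lp_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def contraction_lower_bound(u, p, C):
    """Claimed lower bound (2/p) * (2C)^(1 - p/2) * u^(p/2) on the embedded l_2 distance."""
    return (2.0 / p) * (2.0 * C) ** (1.0 - p / 2.0) * u ** (p / 2.0)
```

For example, with p = 4 and C = 1, the points (1, 0) and (-1, 0) are at ℓ4 distance 2 and their images are at ℓ2 distance 1, which meets the lower bound with equality while staying below the original distance.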
ANN via the Mazur map
The distortion of our embedding is large and depends on the diameter $C$ of the space, since a distance $u$ may contract to $2p^{-1}(2C)^{1-p/2}u^{p/2}$.
But we can show that this guarantee is sufficient to solve a specific case of nearest neighbor in $\ell_p$: the $c$-bounded nearest neighbor problem.
C-bounded nearest neighbor
Define the $c$-bounded near neighbor problem, where $c \ge \|v\|_p$ for all $v \in V$:
If there is a point in $V$ within distance 1 of query $q$, return it or some point in $V$ within distance $c/9$ of $q$. This is a $c/9$-ANN.
If there is no point in $V$ within distance 1 of query $q$, return null or some point in $V$ within distance $c/9$ of $q$.
C-bounded nearest neighbor
Approximately solve the $c$-bounded nearest neighbor problem in $\ell_p$, for $c = p \cdot 18^{p/2}$:
Embed from $\ell_p$ to $\ell_2$.
Compute a 2-ANN in $\ell_2$.
Analysis: the Mazur map ensures that inter-point distances of $c/9$ or greater map to at least $2p^{-1}(2c)^{1-p/2}(c/9)^{p/2} = 4$. If $q$ possesses a neighbor in the original space at distance 1 or less, then, because the embedding is non-expansive, the 2-ANN finds a neighbor at distance at most 2 in the embedded space, which is therefore at distance less than $c/9$ in the original space.
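The arithmetic behind the threshold can be checked numerically. The sketch below reads the slide's "2p-1" as 2/p and "c=p18p/2" as c = p · 18^(p/2) (both readings are assumptions, chosen because they make the stated value 4 come out exactly): the powers of c cancel to leave 2c / 18^(p/2) = 2p, and multiplying by 2/p gives 4 for every p.

```python
def embedded_threshold(p):
    """Evaluate (2/p) * (2c)^(1 - p/2) * (c/9)^(p/2) for c = p * 18^(p/2)."""
    c = p * 18.0 ** (p / 2.0)
    return (2.0 / p) * (2.0 * c) ** (1.0 - p / 2.0) * (c / 9.0) ** (p / 2.0)
```

Since a 2-ANN in the embedded space returns a point at distance at most 2 whenever a neighbor within distance 1 exists, and any point at original distance c/9 or more lands at embedded distance at least 4, the returned point cannot be one of the far points.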
C-bounded nearest neighbor
We can show that the $c$-bounded nearest neighbor problem can be used to give a $c$-ANN for the regular (unbounded) problem.
Final result: $2^{O(p)}$-ANN
Conclusion
Combine two algorithms for $\ell_p$ ($2 < p < \infty$):
Andoni: $O(\log\log d \cdot (\log dn)^{1/p})$-ANN
New result: $2^{O(p)}$-ANN
Worst-case approximation: $(\log\log d) \cdot \exp((\log\log dn)^{1/2})$
Improved bounds in metrics of low doubling dimension