Slide1
Approximate nearest neighbor for $\ell_p$ spaces ($2 < p < \infty$) via embeddings
Yair Bartal (Hebrew U.)
Lee-Ad Gottlieb (Ariel University)
Slide2
Nearest neighbor search
Problem definition: Given a set of points S, preprocess S so that the following query can be answered efficiently:
  Exact (NNS): Given a query point q, what is the closest point to q in S?
  Approximate (ANN): Given a query point q, find a point x in S whose distance from q is within some approximation factor of the distance to the closest point.
Desired:
  polynomial space, polylog query time
  o(log n) approximation
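As a point of reference for the problem definition, the following minimal sketch answers the exact query by linear scan and checks whether a candidate answer is a valid c-approximate answer. The function names and the use of Euclidean distance as the default metric are assumptions made for illustration; the slides do not fix a metric at this point.

```python
import math

def dist(x, y):
    """Euclidean distance, used here only as a stand-in metric (assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def nearest_neighbor(S, q, d=dist):
    """Exact NNS by linear scan: one distance evaluation per point of S."""
    return min(S, key=lambda x: d(x, q))

def is_c_approximate(S, q, x, c, d=dist):
    """True if x is within factor c of the distance from q to its nearest neighbor."""
    best = d(nearest_neighbor(S, q, d), q)
    return d(x, q) <= c * best

S = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
q = (1.0, 0.0)
```

The linear scan uses no preprocessing and linear query time, which is exactly the cost that an ANN structure with polylog query time is meant to avoid.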
Slide3
Approx. nearest neighbor search
Not possible in general metrics
More restrictive spaces?
Good news!
  Euclidean space
  Normed spaces
Slide4
Lp Normed Spaces
Norms: Recall that for d-dimensional vectors x, y:
  $\|x-y\|_p = \left(|x_1-y_1|^p + \dots + |x_d-y_d|^p\right)^{1/p}$
[Figure: panels labeled $\ell_1$, $\ell_2$, $\ell_\infty$]
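The norm formula above translates directly into code. This is a small sketch; the function name `lp_distance` and the handling of $p = \infty$ through `math.inf` are choices made for this example.

```python
import math

def lp_distance(x, y, p):
    """Distance ||x - y||_p between two d-dimensional vectors, for p >= 1 or p = math.inf."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == math.inf:
        # The l_infinity norm is the largest coordinate difference.
        return max(diffs, default=0.0)
    return sum(t ** p for t in diffs) ** (1.0 / p)
```

For the same pair of points the value is non-increasing in p: for (0, 0) and (3, 4) it is 7 for p = 1, 5 for p = 2, and 4 for p = ∞.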
Slide5
Approx. nearest neighbor search
An efficient ANN structure features:
  polynomial space
  polylog query time
  o(log n), o(d) approximation
Efficient ANN structures exist for:
  Euclidean (p=2): $(1+\varepsilon)$-ANN [IM '98, KOR '98]
    Reduce dimension via JL, brute force in lower dimension.
  $\ell_\infty$: $O(\log\log d)$-ANN [Indyk '98]
What about other norms?
  $1 \le p < 2$: $(1+\varepsilon)$-ANN, same as Euclidean
  $2 < p < \infty$: ? (previous: Andoni), subject of this paper
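The "reduce dimension via JL" step can be illustrated with a random Gaussian projection. This is only a sketch of the dimension-reduction idea, not the data structures of [IM '98, KOR '98]; the helper names are hypothetical, and the target dimension k (on the order of $\varepsilon^{-2} \log n$ in the JL lemma) is left to the caller.

```python
import math
import random

def jl_matrix(d, k, rng):
    """k x d matrix with i.i.d. N(0, 1/k) entries, so squared norms are preserved in expectation."""
    s = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * s for _ in range(d)] for _ in range(k)]

def project(A, x):
    """Apply the linear map A to the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

rng = random.Random(0)          # fixed seed only to make the example repeatable
A = jl_matrix(d=50, k=10, rng=rng)
y = project(A, [1.0] * 50)      # a 50-dimensional point mapped to 10 dimensions
```

Because the map is linear, the projection of a difference equals the difference of the projections, so all inter-point distances are handled by the same matrix.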
Slide6
Summary of results
Combine two algorithms for $\ell_p$ ($2 < p < \infty$):
  Andoni: $O(\log\log d \cdot (\log_d n)^{1/p})$-ANN
  New result: $2^{O(p)}$-ANN
Analysis:
  Equality at $p = \log\log\log d + (\log\log_d n)^{1/2}$
  Worst-case approximation: $(\log\log d) \cdot \exp((\log\log_d n)^{1/2})$
  Andoni is better for larger values of p, the new result for smaller values
Improved bounds in metrics of low doubling dimension
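The crossover point can be checked with a short calculation. The sketch below assumes that every constant hidden in the O-notation equals 1, that logarithms are base 2, and that d and n are large enough for both A and B to be non-negative; the symbols A and B are introduced here only for this derivation.

```latex
\begin{align*}
\text{Let } A &= \log\log\log d, \qquad B = \log\log_d n. \\
\log\log d \cdot (\log_d n)^{1/p} = 2^{p}
  &\iff A + \frac{B}{p} = p
  \iff p^2 - A p - B = 0
  \iff p = \frac{A + \sqrt{A^2 + 4B}}{2}. \\
\max\{A, \sqrt{B}\} &\le \frac{A + \sqrt{A^2 + 4B}}{2} \le A + \sqrt{B}
  \quad\Longrightarrow\quad p = \Theta\!\left(A + \sqrt{B}\right). \\
2^{p} &\le 2^{A + \sqrt{B}} = \log\log d \cdot 2^{\sqrt{\log\log_d n}}.
\end{align*}
```

Under these assumptions the two bounds meet at $p = \Theta(\log\log\log d + (\log\log_d n)^{1/2})$, and the approximation factor at that point has the form $(\log\log d) \cdot \exp((\log\log_d n)^{1/2})$ up to the base of the exponential.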
Slide7
Embeddings
An embedding of set X into Y with distortion D is a mapping f : X → Y such that for all x, y ∈ X:
  $1 \le c \cdot d_Y(f(x), f(y)) / d_X(x, y) \le D$
where c is any scaling constant.
Relaxed: one side may be preserved with constant probability
If an embedding is non-expansive and has small contraction:
  The nearest neighbor stays close, and far points are still relatively far
If an embedding is non-contractive and has small expansion:
  The nearest neighbor is only a little bit farther away, and far points remain far
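For a finite point set, the distortion in the definition above can be computed directly: choosing the scaling constant c as the reciprocal of the smallest distance ratio puts all ratios in the range [1, D], so D is the largest ratio divided by the smallest. The sketch below assumes distinct points and an injective f (so no ratio is zero or undefined); the function names are illustrative.

```python
import itertools

def distortion(points, f, d_X, d_Y):
    """Smallest D such that 1 <= c * d_Y(f(x), f(y)) / d_X(x, y) <= D for a suitable scaling c."""
    ratios = [d_Y(f(x), f(y)) / d_X(x, y)
              for x, y in itertools.combinations(points, 2)]
    return max(ratios) / min(ratios)

def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def linf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# Identity map from (pts, l_1) to (pts, l_infinity).
D = distortion(pts, lambda x: x, l1, linf)
```

Here the pair (1, 0), (0, 1) has ratio 1/2 while the other pairs have ratio 1, so D = 2. Rescaling f by a constant leaves D unchanged, which is why the definition allows any c.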
Slide8
Andoni's algorithm
Basic idea: [Andoni '09]
  Embed $\ell_p$ space into $\ell_\infty$
  Run Indyk's ANN algorithm for $\ell_\infty$
Embedding using Fréchet random variables
  max-stable distribution…
Slide9
Fréchet distribution
For random variable $X \sim$ Fréchet:
  $\Pr[X < x] = e^{-x^{-p}}$ for $x > 0$
Max-stability:
  Let X and $Z_1, \dots, Z_d$ be independent random variables $\sim$ Fréchet, and let $v = (v_1, \dots, v_d)$ be a non-negative valued vector.
  Then the random variable $Y := \max_i v_i Z_i$ is distributed as $\|v\|_p \cdot X$ ($Y \sim \|v\|_p \cdot X$).
Slide10
Fréchet distribution
Proof of max-stability: Recall $Y := \max_i v_i Z_i$.
  $\Pr[Y \le x] = \Pr[\max_i v_i Z_i \le x] = \prod_i \Pr[v_i Z_i \le x] = \prod_i \Pr[Z_i \le x/v_i] = \prod_i e^{-(v_i/x)^p} = e^{-(\sum_i v_i^p)/x^p} = e^{-(\|v\|_p/x)^p}$
Similarly,
  $\Pr[\|v\|_p \cdot X \le x] = \Pr[X \le x/\|v\|_p] = e^{-(\|v\|_p/x)^p}$
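Max-stability can also be checked empirically. The sketch below samples Fréchet variables by inverting the CDF $e^{-x^{-p}}$ (so $X = (-\ln U)^{-1/p}$ for uniform U) and compares the empirical distribution of $\max_i v_i Z_i$ with the predicted CDF of $\|v\|_p \cdot X$. The function names, the seed, and the test vector are assumptions chosen for this illustration.

```python
import math
import random

def frechet_sample(p, rng):
    """Draw X with Pr[X < x] = exp(-x^(-p)) by inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:          # avoid log(0); rng.random() lies in [0, 1)
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / p)

def frechet_cdf(x, p):
    """Pr[X < x] for x > 0."""
    return math.exp(-(x ** (-p)))

def empirical_max_cdf(v, p, x, trials, rng):
    """Fraction of trials in which max_i v_i * Z_i <= x, with fresh Z_i each trial."""
    hits = 0
    for _ in range(trials):
        y = max(vi * frechet_sample(p, rng) for vi in v)
        if y <= x:
            hits += 1
    return hits / trials

rng = random.Random(1)
v, p, x = [1.0, 2.0, 2.0], 3, 3.0
norm_v = sum(vi ** p for vi in v) ** (1.0 / p)     # ||v||_p
predicted = frechet_cdf(x / norm_v, p)             # Pr[||v||_p * X <= x]
observed = empirical_max_cdf(v, p, x, 20000, rng)
```

With these values the predicted probability is $e^{-17/27} \approx 0.53$, and the observed frequency should agree with it up to sampling error.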
Slide11
Review of Andoni's embedding
Define embedding $f_b : V \to \ell_\infty$ ($b > 0$):
  Draw Fréchet random variables $Z_1, \dots, Z_d$.
  $f_b$ maps $v = (v_1, \dots, v_d)$ to $(v_1 b Z_1, \dots, v_d b Z_d)$.
  The resulting set is $V' \subset \ell_\infty$.
Theorem: Set $b = (3 \ln n)^{1/p}$. Then $f_b$ satisfies
  Non-contractive (for all points) with prob. $> 1 - 1/n$
  Expansion: For any $u, w \in V$, with constant prob. $\|f_b(u) - f_b(w)\|_\infty \le b \|u - w\|_p$
Expansion guarantee needed for only one inter-point distance: between the query point and nearest neighbor.
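The embedding $f_b$ is a coordinate-wise rescaling by shared random factors, which the following sketch makes concrete. It draws the $Z_i$ once, fixes $b = (3 \ln n)^{1/p}$ as in the theorem, and returns a closure that maps any vector. The helper names are hypothetical, and the Fréchet sampler uses inverse-CDF sampling as an implementation choice.

```python
import math
import random

def frechet_sample(p, rng):
    """Draw X with Pr[X < x] = exp(-x^(-p)) by inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / p)

def make_embedding(d, n, p, rng):
    """Return (f_b, b) for n >= 2 points in d dimensions; the Z_i are shared by all points."""
    b = (3.0 * math.log(n)) ** (1.0 / p)
    Z = [frechet_sample(p, rng) for _ in range(d)]
    def f_b(v):
        return [vi * b * zi for vi, zi in zip(v, Z)]
    return f_b, b

def linf_distance(x, y):
    return max(abs(a - c) for a, c in zip(x, y))

f_b, b = make_embedding(d=4, n=100, p=3, rng=random.Random(0))
u, w = [1.0, 2.0, 0.0, -1.0], [0.5, 2.0, 1.0, 3.0]
embedded_dist = linf_distance(f_b(u), f_b(w))
```

Since the same $Z_i$ are used for every point, $f_b$ is linear, and the embedded distance between u and w is the $\ell_\infty$ norm of $f_b(u - w)$.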
Slide12
Analysis of Andoni's embedding
Theorem: Set $b = (3 \ln n)^{1/p}$. Then $f_b$ satisfies
  Non-contractive (all points) with prob. $> 1 - 1/n$
  Expansion: For any $u, w \in V$, with constant prob. $\|f_b(u) - f_b(w)\|_\infty \le b \|u - w\|_p$
Proof of contraction: Take v with $\|v\|_p = 1$. By max-stability, $\|f_b(v)\|_\infty \sim b\|v\|_p \cdot X = b \cdot X$. By definition of the Fréchet distribution,
  $\Pr[\|f_b(v)\|_\infty < 1] = \Pr[b \cdot X < 1] = e^{-(1/b)^{-p}} = e^{-b^p} = e^{-3 \ln n} = n^{-3}$.
Since the embedding is linear, v may be taken to be the normalized difference between any two vectors in V, so the probability that any of the $n^2$ inter-point distances decreases is less than $n^2 \cdot n^{-3} = 1/n$.
Proof of expansion: Same approach:
  $\Pr[\|f_b(v)\|_\infty \le b] = \Pr[b \cdot X < b] = \Pr[X < 1] = e^{-1}$
So expansion is bounded by b.
Slide13
Summary
Embed $\ell_p$ space into $\ell_\infty$
  distortion $O(b) = O((\ln n)^{1/p})$
Run Indyk's ANN algorithm for $\ell_\infty$
  $O(\log\log d)$-ANN
Final guarantee
  $O(\log\log d \cdot \ln^{1/p} n)$-ANN
Slide14
An improvement
We can improve the guarantees of Andoni's algorithm by considering the doubling dimension of the space.
Doubling constant: number of half-radius balls necessary to cover a big ball.
Doubling dimension: log(doubling constant)
For example, d-dimensional Euclidean space has doubling dimension $\Theta(d)$
[Figure: labels 1 to 8]
Slide15
Improvement outline
Nearest neighbor search can be reduced to a series of subproblems
  Searches on spaces with small aspect ratio
So we can take a net on the subspaces, and run Andoni's algorithm on the nets instead
  Size of net: $\mathrm{ddim}^{O(\mathrm{ddim})}$
Approximation:
  Andoni: $O(\log\log d \cdot (\log_d n)^{1/p})$
  Improved: $O(\log\log d \cdot (\mathrm{ddim} \log_d \mathrm{ddim})^{1/p})$
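The slides do not say how the net is built. One standard way to obtain an r-net of a finite point set is the greedy construction sketched below, given here as an assumption for illustration: a point joins the net only if it is farther than r from every point already chosen. The result is both a packing (net points are pairwise more than r apart) and a covering (every input point is within r of some net point).

```python
def greedy_net(points, r, d):
    """Greedy r-net of a finite point set under the metric d (quadratic time)."""
    net = []
    for x in points:
        if all(d(x, y) > r for y in net):
            net.append(x)
    return net

def abs_dist(a, b):
    return abs(a - b)

line = list(range(11))              # the points 0, 1, ..., 10 on a line
net = greedy_net(line, 2, abs_dist)
```

On this input the net is [0, 3, 6, 9]. In a space of small aspect ratio and low doubling dimension the net is small, which is what allows the point count n in Andoni's bound to be replaced by the net size.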
Slide16
New algorithm
Basic idea:
  Embed $\ell_p$ space into $\ell_2$
  Run ANN algorithm for $\ell_2$
Embedding using the Mazur map
Slide17
Mazur map
Mazur map is a mapping from $\ell_p$ to $\ell_q$, for any $0 < p, q < \infty$.
The mapping of vector $v \in \ell_p$ is defined as
  $M(v) = (|v_0|^{p/q}, |v_1|^{p/q}, \dots, |v_{m-1}|^{p/q})$
For set V, let C satisfy $C \ge \|v\|_p$ for all $v \in V$.
Our embedding f is the Mazur map from $\ell_p$ to $\ell_2$, scaled down by a factor $(p/2) C^{p/2-1}$
  f is non-expansive.
  Contraction: If $\|x - y\|_p = u$, then $\|f(x) - f(y)\|_2 \ge 2p^{-1} (2C)^{1-p/2} u^{p/2}$ [Benyamini & Lindenstrauss '00]
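The scaled Mazur map is simple to write down. The sketch below follows the formula on the slide with one labeled assumption: each coordinate keeps its sign, i.e. the map is $\mathrm{sign}(v_i)|v_i|^{p/q}$, which is the usual form of the Mazur map and coincides with the slide's formula for non-negative coordinates. The function names are hypothetical.

```python
import math

def mazur_map(v, p, q):
    """Coordinate-wise map v_i -> sign(v_i) * |v_i|^(p/q) (sign handling is an assumption)."""
    return [math.copysign(abs(t) ** (p / q), t) for t in v]

def mazur_embedding(v, p, C):
    """Mazur map from l_p to l_2, scaled down by (p/2) * C^(p/2 - 1), where C >= ||v||_p on the set."""
    scale = (p / 2.0) * C ** (p / 2.0 - 1.0)
    return [t / scale for t in mazur_map(v, p, 2.0)]

def l2_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

fx = mazur_embedding([1.0, 0.0], p=4, C=1.0)
fy = mazur_embedding([0.0, 0.0], p=4, C=1.0)
```

For p = 4 and C = 1 the scale factor is 2, so (1, 0) maps to (0.5, 0). The points (1, 0) and (0, 0) are at $\ell_4$ distance u = 1, and their images are at $\ell_2$ distance 0.5, which sits between the contraction bound $2p^{-1}(2C)^{1-p/2}u^{p/2} = 0.25$ and the non-expansive bound 1.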
Slide18
ANN via the Mazur map
The distortion of our embedding is large and depends on the diameter C of the space:
  $2p^{-1} (2C)^{1-p/2} u^{p/2}$
But we can show that this guarantee is sufficient to solve a specific case of nearest neighbor in $\ell_p$: the c-bounded nearest neighbor problem.
Slide19
C-bounded nearest neighbor
Define the c-bounded near neighbor problem, where $c \ge \|v\|_p$ for all $v \in V$:
  If there is a point in V within distance 1 of query q, return it or some point in V within distance c/9 of q. This is a c/9-ANN.
  If there is no point in V within distance 1 of query q, return null or some point in V within distance c/9 of q.
Slide20
C-bounded nearest neighbor
Approximately solve the c-bounded nearest neighbor problem in $\ell_p$, for $c = p \cdot 18^{p/2}$
  Embed from $\ell_p$ to $\ell_2$
  Compute a 2-ANN in $\ell_2$
Analysis: the Mazur map ensures that inter-point distances of c/9 or greater map to at least $2p^{-1} (2c)^{1-p/2} (c/9)^{p/2} = 4$. If q possesses a neighbor in the original space at distance 1 or less, the 2-ANN finds a neighbor at distance at most 2 in the embedded space and less than c/9 in the original space.
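The value 4 in the analysis follows from the choice of c. The short derivation below reads the factor written on the slides as $2p^{-1} = 2/p$, which is the reading under which the stated equality holds exactly.

```latex
\begin{align*}
\frac{2}{p}\,(2c)^{1-p/2}\left(\frac{c}{9}\right)^{p/2}
  &= \frac{2}{p}\cdot 2^{1-p/2}\, c^{1-p/2}\cdot c^{p/2}\, 9^{-p/2} \\
  &= \frac{4c}{p}\cdot 2^{-p/2}\, 9^{-p/2}
   = \frac{4c}{p \cdot 18^{p/2}} \\
  &= 4 \qquad \text{for } c = p \cdot 18^{p/2}.
\end{align*}
```

Since f is non-expansive, a neighbor at original distance at most 1 is at embedded distance at most 1, so a 2-ANN in $\ell_2$ returns a point at embedded distance at most 2. That is below 4, so the returned point cannot be at original distance c/9 or more.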
Slide21
C-bounded nearest neighbor
We can show that the c-bounded nearest neighbor problem can be used to give a c-ANN for the regular (unbounded) problem.
Final result: $2^{O(p)}$-ANN
Slide22
Conclusion
Combine two algorithms for $\ell_p$ ($2 < p < \infty$):
  Andoni: $O(\log\log d \cdot (\log_d n)^{1/p})$-ANN
  New result: $2^{O(p)}$-ANN
Worst-case approximation: $(\log\log d) \cdot \exp((\log\log_d n)^{1/2})$
Improved bounds in metrics of low doubling dimension