Slide1
Dimension reduction techniques for lp (1<p<2), with applications
Yair Bartal (Hebrew U.), Lee-Ad Gottlieb (Ariel University)
Slide2
Introduction
Fundamental result in dimension reduction: the Johnson-Lindenstrauss Lemma (JL-84) for Euclidean space.
Given: a set $S$ of $n$ points in $\mathbb{R}^d$.
There exists: $f : \mathbb{R}^d \to \mathbb{R}^k$ with $k = O(\ln(n)/\varepsilon^2)$ such that for all $u, v \in S$,
$\|u-v\|_2 \le \|f(u)-f(v)\|_2 \le (1+\varepsilon)\|u-v\|_2$
Slide3
Introduction
The JL Lemma is specific to l2.
Dimension reduction for other lp spaces? Impossible (no comparable guarantee exists) for l∞ and l1; not known for the other lp spaces.
This paper: dimension reduction techniques for lp $(1<p<2)$, specifically single-scale and snowflake embeddings.
Slide4
JL transform
Given: a set $S$ of $n$ points in $\mathbb{R}^d$.
There exists: $f : \mathbb{R}^d \to \mathbb{R}^k$ with $k = O(\ln(n)/\varepsilon^2)$ such that for all $u, v \in S$,
$\|u-v\|_2 \le \|f(u)-f(v)\|_2 \le (1+\varepsilon)\|u-v\|_2$
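Slide5
JL transform
Proof by (randomized) construction:
$f : \mathbb{R}^d \to \mathbb{R}^k$: multiply vectors by a random $d \times k$ matrix. Matrix entries can be $\{-1, 1\}$ or Gaussians.
[Figure: a vector multiplied by a random matrix with entries $g_1, g_2, \dots, g_6$.]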
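A minimal sketch of the randomized construction, assuming Gaussian entries and the common $1/\sqrt{k}$ normalization, which makes the map unbiased so that distortion shows up as pairwise ratios near 1 rather than in the one-sided form above. The helpers `jl_project`, `l2` and `distortion_range` are hypothetical names; the matrix is stored as $k$ rows of length $d$ (the transpose of the slide's $d \times k$ layout), and replacing `rng.gauss(0.0, 1.0)` with `rng.choice((-1.0, 1.0))` gives the $\{-1, 1\}$ variant.

```python
import math
import random


def jl_project(points, k, seed=0):
    """Map each d-dimensional point to R^k with a random Gaussian matrix.

    Row j of the matrix is an independent N(0, 1) vector g_j, and the j-th
    output coordinate is <g_j, v> / sqrt(k), so E||f(w)||_2^2 = ||w||_2^2.
    The same seed and the same d reproduce the same matrix.
    """
    rng = random.Random(seed)
    d = len(points[0])
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [[scale * sum(g * x for g, x in zip(row, v)) for row in matrix]
            for v in points]


def l2(u, v):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distortion_range(points, images):
    """Min and max of ||f(u) - f(v)||_2 / ||u - v||_2 over all distinct pairs."""
    ratios = [l2(images[i], images[j]) / l2(points[i], points[j])
              for i in range(len(points)) for j in range(i + 1, len(points))]
    return min(ratios), max(ratios)


rng = random.Random(1)
pts = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(30)]
lo, hi = distortion_range(pts, jl_project(pts, k=110, seed=2))
print(lo, hi)  # both should be near 1
```

Dividing $f$ by the smallest observed ratio `lo` recovers the one-sided form $\|u-v\|_2 \le \|f(u)-f(v)\|_2$, with the expansion bounded by `hi / lo`.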
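Slide6
JL transform
Prove: with constant probability, for all $u, v \in S$,
$\|u-v\|_2 \le \|f(u)-f(v)\|_2 \le (1+\varepsilon)\|u-v\|_2$
Observation: $f$ is linear, so if $w = u-v$ then $f(w) = f(u-v) = f(u)-f(v)$.
It therefore suffices to prove, for every such difference vector $w$,
$\|w\|_2 \le \|f(w)\|_2 \le (1+\varepsilon)\|w\|_2$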
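The step from one fixed $w$ to all pairs is a union bound over the at most $\binom{n}{2}$ difference vectors. In this sketch, $\delta$ denotes the failure probability for a single fixed $w$ (a symbol introduced here, not on the slides):

```latex
\Pr\bigl[\exists\,\{u,v\}\subseteq S:\ \|f(u)-f(v)\|_2 \notin [\,\|u-v\|_2,\ (1+\varepsilon)\|u-v\|_2\,]\bigr]
  \;\le\; \sum_{\{u,v\}\subseteq S} \Pr\bigl[\,w = u-v \text{ fails}\,\bigr]
  \;\le\; \binom{n}{2}\,\delta \;<\; \frac{n^2}{2}\,\delta .
```

So a per-vector failure probability $\delta \le 1/n^2$ already gives success for all pairs simultaneously with probability at least $1/2$.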
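Slide7
JL transform
Consider an embedding into $\mathbb{R}^1$, with $G = N(0,1)$.
Normals are 2-stable: if $X, Y \sim N(0,1)$ are independent, then $aX \sim N(0, a^2)$, and also $aX + bY \sim N(0, a^2+b^2) \sim \sqrt{a^2+b^2}\,N(0,1)$.
So $\sum_i w_i g_i \sim \sqrt{\sum_i w_i^2}\,N(0,1) = \|w\|_2\,N(0,1)$.
[Figure: the vector $(a, b, c)$ combined with Gaussians $g_1, g_2, g_3$ gives $a g_1 + b g_2 + c g_3$.]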
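A quick empirical check of 2-stability, as a sketch rather than anything from the slides: summing $w_i g_i$ with independent standard normals should give a sample standard deviation close to $\|w\|_2$. The helper names `gaussian_projection_samples` and `sample_std` are hypothetical.

```python
import math
import random


def gaussian_projection_samples(w, trials, seed=0):
    """Draw `trials` independent samples of sum_i w_i * g_i with g_i ~ N(0, 1)."""
    rng = random.Random(seed)
    return [sum(wi * rng.gauss(0.0, 1.0) for wi in w) for _ in range(trials)]


def sample_std(xs):
    """Unbiased sample standard deviation."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))


w = [3.0, 4.0]  # ||w||_2 = 5
samples = gaussian_projection_samples(w, 20000, seed=42)
print(sample_std(samples))  # should be close to 5.0
```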
Slide8
JL transform
Even a single coordinate preserves magnitude: each coordinate is distributed as $\|w\|_2\,N(0,1)$, so (up to scaling) $E[\|f(w)\|_2] = \|w\|_2$.
This must hold simultaneously for all point pairs, so we use multiple coordinates:
$\|f(w)\|_2^2 \sim \|w\|_2^2 \sum_{j=1}^{k} N_j(0,1)^2 \sim \|w\|_2^2\,\chi^2(k)$
A sum of $k$ squared coordinates is tightly concentrated around its mean; one can show that with $k = O(\ln(n)/\varepsilon^2)$ all point pairs are preserved simultaneously.
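A small simulation of the concentration claim, assuming the $1/k$ normalization: for one fixed $w$, the chance that $\|f(w)\|_2^2/\|w\|_2^2$ leaves $[1-\varepsilon, 1+\varepsilon]$ drops quickly as $k$ grows, which is what the all-pairs argument needs. `chi2_ratio` and `failure_rate` are hypothetical helpers, and $\varepsilon = 0.25$ is an arbitrary choice.

```python
import random


def chi2_ratio(k, rng):
    """Return ||f(w)||_2^2 / ||w||_2^2 for one random map with k coordinates.

    Each coordinate is ||w||_2 * N(0, 1) (2-stability), so after dividing
    by k the ratio is chi^2(k) / k, whose mean is 1.
    """
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) / k


def failure_rate(k, eps, trials, seed=0):
    """Fraction of trials where the squared-norm ratio leaves [1 - eps, 1 + eps]."""
    rng = random.Random(seed)
    bad = sum(1 for _ in range(trials) if abs(chi2_ratio(k, rng) - 1.0) > eps)
    return bad / trials


for k in (10, 100, 400):
    print(k, failure_rate(k, eps=0.25, trials=1000, seed=k))
```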
Slide9
Dimension reduction for lp?
JL works well for l2. Let's try to do the same thing for lp $(1<p<2)$. Hint: it won't work… but it will be instructive.
p-stable distributions: if $X, Y \sim F_p$ are independent, with $p \le 2$, then $aX + bY \sim (|a|^p + |b|^p)^{1/p}\,F_p$.
[Johnson-Schechtman 82, Datar-Immorlica-Indyk-Mirrokni 04, Mendel-Naor 04]
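One way to draw p-stable samples is the Chambers-Mallows-Stuck method, a standard sampler that the slides do not specify. The sketch below assumes the symmetric case and checks stability empirically by comparing the medians of $|aX + bY|$ and $(|a|^p + |b|^p)^{1/p}|X|$. `symmetric_p_stable` and `stability_ratio` are hypothetical names.

```python
import math
import random
import statistics


def symmetric_p_stable(p, rng):
    """One draw from a standard symmetric p-stable law, 0 < p <= 2.

    Chambers-Mallows-Stuck with skewness 0: U ~ Uniform(-pi/2, pi/2),
    W ~ Exp(1).  For p = 1 the formula reduces to tan(U), a Cauchy draw.
    """
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = 0.0
    while w == 0.0:  # expovariate can (very rarely) return 0
        w = rng.expovariate(1.0)
    return (math.sin(p * u) / math.cos(u) ** (1.0 / p)
            * (math.cos((1.0 - p) * u) / w) ** ((1.0 - p) / p))


def stability_ratio(p, a, b, n=40000, seed=0):
    """median|aX + bY| / ((|a|^p + |b|^p)^(1/p) * median|X|); near 1 if p-stable."""
    rng = random.Random(seed)
    combo = [abs(a * symmetric_p_stable(p, rng) + b * symmetric_p_stable(p, rng))
             for _ in range(n)]
    single = [abs(symmetric_p_stable(p, rng)) for _ in range(n)]
    scale = (abs(a) ** p + abs(b) ** p) ** (1.0 / p)
    return statistics.median(combo) / (scale * statistics.median(single))


print(stability_ratio(1.5, 1.0, 2.0))  # should be close to 1
```

Medians are used instead of means because the tails are heavy: for $p < 2$ the variance is infinite, so sample moments would converge slowly or not at all.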
Slide10
Dimension reduction for lp?
Suppose we embedded into $\mathbb{R}^1$, with $G = F_p$. Then $\|f(w)\|_p$ is distributed as $\|w\|_p\,|F_p|$, so (up to scaling) $E[\|f(w)\|_p] = \|w\|_p$.
Multiple coordinates, from lp into lp or into lq $(q \le p)$:
$\|f(w)\|_p^p \sim \|w\|_p^p \sum_j |g_j|^p, \qquad \|f(w)\|_q^q \sim \|w\|_p^q \sum_j |g_j|^q$
Looks good! But what are $E[|g|^p]$ and $E[|g|^q]$?
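The multiple-coordinates calculation can be tried numerically. This is a sketch under two assumptions: a Chambers-Mallows-Stuck p-stable sampler, and $q < p$ so that $E[|g|^q]$ is finite. Using the same random matrix (same seed) for every vector, the $q$-th power mean of the coordinates is proportional to $\|w\|_p$, so ratios between vectors can be compared without knowing $E[|g|^q]$. `p_stable_sketch` and `power_mean` are hypothetical names.

```python
import math
import random


def symmetric_p_stable(p, rng):
    """Chambers-Mallows-Stuck draw from a standard symmetric p-stable law."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = 0.0
    while w == 0.0:  # expovariate can (very rarely) return 0
        w = rng.expovariate(1.0)
    return (math.sin(p * u) / math.cos(u) ** (1.0 / p)
            * (math.cos((1.0 - p) * u) / w) ** ((1.0 - p) / p))


def p_stable_sketch(w, p, k, seed=0):
    """k coordinates sum_i g_ji * w_i; each is distributed as ||w||_p * F_p.

    The same seed (and the same len(w)) reproduces the same random matrix.
    """
    rng = random.Random(seed)
    rows = [[symmetric_p_stable(p, rng) for _ in w] for _ in range(k)]
    return [sum(g * x for g, x in zip(row, w)) for row in rows]


def power_mean(coords, q):
    """(mean |c|^q)^(1/q): proportional to ||w||_p when q < p."""
    return (sum(abs(c) ** q for c in coords) / len(coords)) ** (1.0 / q)


p, q, k = 1.5, 0.5, 20000
base = power_mean(p_stable_sketch([1.0, 0.0, 0.0], p, k), q)  # ||w||_p = 1
ones = power_mean(p_stable_sketch([1.0, 1.0, 1.0], p, k), q)  # ||w||_p = 3^(1/p)
print(ones / base, 3 ** (1 / p))  # the two numbers should be close
```

With $q \ge p$ the average of $|c|^q$ has infinite expectation and does not settle down as $k$ grows, which is the problem raised next.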
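Slide11
p-stable distribution
Familiar examples: Gaussian (2-stable), Cauchy (1-stable).
Density function: unimodal [SY-78, Y-78, H-84], bell-shaped [G-84], heavy-tailed when $p<2$:
$h(x) \approx 1/(1+x^{p+1})$
When $p<2$,
$E[|g|^q] = \int_0^\infty x^q h(x)\,dx \approx \int_0^\infty \frac{x^q}{1+x^{p+1}}\,dx \approx \int_0^1 x^q\,dx + \int_1^\infty x^{q-(p+1)}\,dx$
For $0<q<p$: $\int_1^\infty x^{q-(p+1)}\,dx = \left.-\frac{x^{-(p-q)}}{p-q}\right|_1^\infty = \frac{1}{p-q}$, so $E[|g|^q] \approx 1/(p-q)$ ← OK
For $q \ge p$: $E[|g|^q] = \infty$ ← Problem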
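The heuristic integral can also be evaluated exactly with the standard identity $\int_0^\infty x^{s-1}/(1+x^m)\,dx = \pi/(m\sin(\pi s/m))$ for $0<s<m$, taking $s = q+1$ and $m = p+1$. This checks the approximate density $1/(1+x^{p+1})$ only, not the true p-stable density:

```latex
\int_0^\infty \frac{x^{q}}{1+x^{p+1}}\,dx
  = \frac{\pi}{(p+1)\sin\bigl(\frac{\pi(q+1)}{p+1}\bigr)}
  = \frac{\pi}{(p+1)\sin\bigl(\frac{\pi(p-q)}{p+1}\bigr)},
  \qquad -1 < q < p,

\text{and since } \sin x \approx x \text{ for small } x:\qquad
\int_0^\infty \frac{x^{q}}{1+x^{p+1}}\,dx \approx \frac{1}{p-q} \quad (q \to p).
```

For $q \ge p$ the integrand decays like $x^{q-p-1}$ with $q-p-1 \ge -1$, so the integral diverges, matching the slide.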
Slide12
Dimension reduction for lp?
Problems using p-stables for dimension reduction: heavy tails for $p<2$, so $E[|g|^p] = \infty$. When $q<p$, $E[|g|^q]$ is finite, but how many coordinates are needed?
Slide13
Dimension reduction for lp?
What's known for non-Euclidean space?
For l1: bounded range dimension reduction [OR-02]
Dimension: $O(R \log n / \varepsilon^3)$
Distortion: distances in the range $[1, R]$ retained to within $(1+\varepsilon)$
Expansion: distances $<1$ remain smaller
Contraction: distances $>R$ remain larger
Used as a subroutine for clustering and ANNS (approximate nearest neighbor search).
Slide14
Dimension reduction for lp?
Our contributions for lp $(1<p<2)$:
Bounded range dimension reduction (lp → lq, $q \le p$)
Dimension: $O_\varepsilon(R \log n)$
Distortion: distances in $[1, R]$ retained to within $(1+\varepsilon)$
Expansion: distances $<1$ remain smaller
Contraction: distances $>R$ remain larger
Snowflake embedding: $\|x-y\|_p$ is mapped to $\|x-y\|_p^\alpha$ up to a $(1+\varepsilon)$ factor, $\alpha \le 1$
Dimension: $O(\mathrm{ddim}^2)$, where ddim is the doubling dimension of the point set. Previously known only for l1, with dimension $O(2^{2\,\mathrm{ddim}})$.
Both embeddings have applications to clustering.
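Slide15
Single scale dimension reduction
Our single-coordinate embedding is as follows:
$f : \mathbb{R}^d \to \mathbb{R}^1$
$s$: upper distance threshold ($\approx R$)
$\varphi$: random angle
$F(v) = F_{\varphi,s}(v) = s \sin\bigl(\varphi + \tfrac{1}{s}\sum_i g_i v_i\bigr)$, with $g_i$ i.i.d. p-stable
Motivated by [Mendel-Naor 04].
Intuition: $\sin(\varepsilon) \approx \varepsilon$, so small values are retained and large values are truncated.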
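A minimal sketch of the single-coordinate map, assuming i.i.d. symmetric p-stable $g_i$ (drawn here with the Chambers-Mallows-Stuck method) and $\varphi$ uniform on $[0, 2\pi)$. `make_single_scale_map` is a hypothetical helper, not the authors' code. Two properties follow directly from the formula: $|F(u)-F(v)| \le 2s$ (truncation), and, because sine is 1-Lipschitz, $|F(u)-F(v)| \le |\sum_i g_i(u_i - v_i)|$, so the map never expands beyond the p-stable projection.

```python
import math
import random


def symmetric_p_stable(p, rng):
    """Chambers-Mallows-Stuck draw from a standard symmetric p-stable law."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = 0.0
    while w == 0.0:  # expovariate can (very rarely) return 0
        w = rng.expovariate(1.0)
    return (math.sin(p * u) / math.cos(u) ** (1.0 / p)
            * (math.cos((1.0 - p) * u) / w) ** ((1.0 - p) / p))


def make_single_scale_map(d, p, s, seed=0):
    """Build F(v) = s * sin(phi + (1/s) * sum_i g_i v_i) with random g and phi.

    Returns (F, g, phi) so the random parameters can be inspected.
    """
    rng = random.Random(seed)
    g = [symmetric_p_stable(p, rng) for _ in range(d)]
    phi = rng.uniform(0.0, 2.0 * math.pi)

    def F(v):
        return s * math.sin(phi + sum(gi * vi for gi, vi in zip(g, v)) / s)

    return F, g, phi


F, g, phi = make_single_scale_map(d=3, p=1.5, s=10.0, seed=1)
origin = [0.0, 0.0, 0.0]
print(abs(F([0.01, 0.0, 0.0]) - F(origin)))  # about |g[0] * cos(phi)| * 0.01
print(abs(F([1e6, 0.0, 0.0]) - F(origin)))   # at most 2 * s = 20
```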
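Slide16
Single scale dimension reduction
$F(v) = F_{\varphi,s}(v) = s \sin\bigl(\varphi + \tfrac{1}{s}\sum_i g_i v_i\bigr)$
Using $\sin A - \sin B = 2 \sin\frac{A-B}{2} \cos\frac{A+B}{2}$:
$E[|F(u)-F(v)|^q] = s^q\,E\Bigl[\Bigl|\sin\bigl(\varphi + \tfrac{1}{s}\sum_i g_i u_i\bigr) - \sin\bigl(\varphi + \tfrac{1}{s}\sum_i g_i v_i\bigr)\Bigr|^q\Bigr]$
$= (2s)^q\,E\Bigl[\Bigl|\sin\bigl(\tfrac{1}{2s}\sum_i g_i (u_i - v_i)\bigr)\cos\bigl(\varphi + \tfrac{1}{2s}\sum_i g_i (u_i + v_i)\bigr)\Bigr|^q\Bigr]$
$= c\,(2s)^q\,E\Bigl[\Bigl|\sin\bigl(\tfrac{1}{2s}\sum_i g_i (u_i - v_i)\bigr)\Bigr|^q\Bigr]$
where $c = E_\varphi[|\cos\varphi|^q]$: for a uniformly random angle $\varphi$, the cosine factor averages to this constant whatever the value of $g$.
Multiple dimensions: repeat $k = s^{O(1)} \log n$ times; tight bounds follow from Bernstein's inequality.
Final embedding (suitably normalized, with $w = \|u-v\|_p$):
Threshold: $\|F(u)-F(v)\|_q = O(s)$
Distortion: when $1 < w < \varepsilon s$, $\|F(u)-F(v)\|_q$ is within a $(1+\varepsilon)$ factor of $\|u-v\|_p$
Expansion: when $w < 1$, $\|F(u)-F(v)\|_q \le (1+\varepsilon)\|u-v\|_p$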
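The repetition step can be sketched by concatenating $k$ independent copies and scaling by $k^{-1/q}$, so that $\|F(u)-F(v)\|_q$ becomes a $q$-th power mean over coordinates. This is a hedged illustration: it assumes a Chambers-Mallows-Stuck p-stable sampler (redefined here) and uniform angles, uses an arbitrary $k$ rather than the $s^{O(1)}\log n$ bound, and `single_scale_embedding` and `lq_distance` are hypothetical names. The threshold $\|F(u)-F(v)\|_q \le 2s$ holds deterministically. For distances far above $s$ the sine and cosine factors behave like random phases, so for $q = 1$ the value is roughly $(2/\pi)^2 \cdot 2s \approx 0.8s$.

```python
import math
import random


def symmetric_p_stable(p, rng):
    """Chambers-Mallows-Stuck draw from a standard symmetric p-stable law."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = 0.0
    while w == 0.0:  # expovariate can (very rarely) return 0
        w = rng.expovariate(1.0)
    return (math.sin(p * u) / math.cos(u) ** (1.0 / p)
            * (math.cos((1.0 - p) * u) / w) ** ((1.0 - p) / p))


def single_scale_embedding(points, p, q, s, k, seed=0):
    """Concatenate k independent copies of F_{phi,s}, scaled by k**(-1/q).

    Copy j uses its own p-stable vector g_j and angle phi_j, so that
    ||F(u) - F(v)||_q equals (mean_j |F_j(u) - F_j(v)|^q)^(1/q).
    """
    rng = random.Random(seed)
    d = len(points[0])
    gs = [[symmetric_p_stable(p, rng) for _ in range(d)] for _ in range(k)]
    phis = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(k)]
    scale = k ** (-1.0 / q)
    return [
        [scale * s * math.sin(phi + sum(gi * x for gi, x in zip(g, v)) / s)
         for g, phi in zip(gs, phis)]
        for v in points
    ]


def lq_distance(a, b, q):
    """l_q distance between two equal-length sequences."""
    return sum(abs(x - y) ** q for x, y in zip(a, b)) ** (1.0 / q)


s = 20.0
pts = [[0.0, 0.0], [1.0, 0.0], [2000.0, 0.0]]
imgs = single_scale_embedding(pts, p=1.5, q=1.0, s=s, k=2000, seed=5)
print(lq_distance(imgs[0], imgs[1], 1.0))  # small distance: stays small
print(lq_distance(imgs[0], imgs[2], 1.0))  # far distance: truncated to O(s)
```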
Slide17
Snowflake embedding
The snowflake embedding is created by concatenating many single-scale embeddings, an idea due to Assouad (84).
It needs many properties of the single-scale embedding: threshold, smoothness, fidelity.
Thank you!