Dimension reduction techniques for lp (1<p<2), with applications - PowerPoint Presentation


Presentation Transcript

Slide1

Dimension reduction techniques for lp (1<p<2), with applications

Yair Bartal (Hebrew U.)   Lee-Ad Gottlieb (Ariel University)

Slide2

Introduction
Fundamental result in dimension reduction: the Johnson-Lindenstrauss Lemma (JL-84) for Euclidean space.
Given: a set S of n points in R^d.
There exists: f : R^d → R^k with k = O(ln(n) / ε^2), such that for all u,v in S,
║u-v║2 ≤ ║f(u)-f(v)║2 ≤ (1+ε)║u-v║2
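To get a feel for the target dimension, here is a minimal sketch that evaluates k = c·ln(n)/ε^2 for a concrete instance; the constant c = 8 is an assumption, since the slide only states the O(·) bound.

```python
import math

def jl_target_dimension(n: int, eps: float, c: float = 8.0) -> int:
    """Target dimension k = c * ln(n) / eps^2; the constant c is an assumption,
    the lemma only guarantees k = O(ln(n) / eps^2)."""
    return math.ceil(c * math.log(n) / eps ** 2)

# One million points with 10% distortion: k is about 11,000,
# independent of the original dimension d.
print(jl_target_dimension(n=1_000_000, eps=0.1))
```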

Slide3

Introduction
The JL Lemma is specific to l2.
Dimension reduction for other lp spaces?
Impossible for l∞ and l1.
Not known for other lp spaces.
This paper: dimension reduction techniques for lp (1<p<2).
Specifically, single scale and snowflake embeddings.

Slide4

JL transform
Given: a set S of n points in R^d.
There exists: f : R^d → R^k with k = O(ln(n) / ε^2), such that for all u,v in S,
║u-v║2 ≤ ║f(u)-f(v)║2 ≤ (1+ε)║u-v║2

Slide5

JL transform
Proof by (randomized) construction.
f : R^d → R^k : multiply vectors by a random d x k matrix.
Matrix entries can be {-1,1} or Gaussians.
[Figure: a vector multiplied by a random matrix with Gaussian entries g_1, …, g_6.]
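A minimal sketch of this construction (not the authors' code; the Gaussian entries and the 1/√k scaling are the standard textbook choice):

```python
import numpy as np

def jl_transform(points: np.ndarray, k: int, rng=None) -> np.ndarray:
    """Map n points in R^d to R^k via a random Gaussian matrix.

    Entries are i.i.d. N(0,1); dividing by sqrt(k) makes
    E[ ||f(w)||_2^2 ] = ||w||_2^2, matching the analysis on the next slides.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = points.shape[1]
    A = rng.normal(size=(k, d)) / np.sqrt(k)
    return points @ A.T

# Example: 1000 points in R^500 mapped down to R^200; with high probability
# all pairwise distances are preserved up to a small multiplicative error.
X = np.random.default_rng(0).normal(size=(1000, 500))
Y = jl_transform(X, k=200)
```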

Slide6

JL transform
Prove: with constant probability, for all u,v in S,
║u-v║2 ≤ ║f(u)-f(v)║2 ≤ (1+ε)║u-v║2
Observation: f is linear, so if w = u-v then f(w) = f(u-v) = f(u)-f(v).
It suffices to prove ║w║2 ≤ ║f(w)║2 ≤ (1+ε)║w║2.
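Since f is just multiplication by a fixed random matrix, linearity is immediate; a quick check (illustration only):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(20, 100))                    # a random projection matrix
u, v = rng.normal(size=100), rng.normal(size=100)

# f(u) - f(v) = Au - Av = A(u - v) = f(u - v), so bounding ||f(w)||_2 for
# w = u - v bounds the distortion of the pair (u, v).
print(np.allclose(A @ u - A @ v, A @ (u - v)))    # True
```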

Slide7

JL transform
Consider an embedding into R^1, with G = N(0,1).
Normals are 2-stable:
If X,Y ~ N(0,1), then aX ~ N(0, a^2), and aX + bY ~ N(0, a^2+b^2) ~ √(a^2+b^2) N(0,1).
So ∑_i w_i g_i ~ √(∑_i w_i^2) N(0,1) = ║w║2 N(0,1).
[Figure: a single output coordinate computed as a·g_1 + b·g_2 + c·g_3.]
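A small empirical check of this 2-stability claim (a sketch for illustration, not from the paper): the sample standard deviation of ∑_i w_i g_i should match ║w║2.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([3.0, -1.0, 2.0, 0.5])        # a fixed vector w = u - v
g = rng.normal(size=(100_000, w.size))     # i.i.d. N(0,1) entries g_i
samples = g @ w                            # each row is one draw of sum_i w_i g_i

# 2-stability: sum_i w_i g_i ~ ||w||_2 * N(0,1), so the std should be ||w||_2.
print(np.linalg.norm(w), samples.std())    # both are approximately 3.78
```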

Slide8

JL transform
Even a single coordinate preserves magnitude.
Each coordinate is distributed ~ ║w║2 N(0,1), so (up to scaling) E[║f(w)║2] = ║w║2.
We need this to hold simultaneously for all point pairs.
Multiple coordinates: ║f(w)║2^2 ~ ║w║2^2 ∑ N(0,1)^2, and ∑ N(0,1)^2 ~ χ^2(k).
The sum of k squared coordinates is tightly concentrated around its mean.
One can show that when k = ln(n) / ε^2, all point pairs are preserved simultaneously.
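A sketch of the concentration claim (illustration only; the constant 8 in k is an assumption, as before): with k on the order of ln(n)/ε^2, the worst pairwise distortion over all pairs stays close to 1.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
n, d, eps = 200, 1000, 0.5
k = int(8 * np.log(n) / eps ** 2)     # about 169 dimensions for these parameters

X = rng.normal(size=(n, d))
A = rng.normal(size=(k, d)) / np.sqrt(k)
Y = X @ A.T

ratios = pdist(Y) / pdist(X)          # distortion of every one of the ~20,000 pairs
print(k, ratios.min(), ratios.max())  # ratios concentrate around 1 (roughly within 1 +/- eps)
```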

Slide9

Dimension reduction for lp?

JL works well for l2. Let's try to do the same thing for lp (1<p<2).
Hint: it won't work… but it will be instructive.
p-stable distributions: if X,Y ~ Fp (p ≤ 2), then aX + bY ~ (a^p + b^p)^(1/p) Fp.

[Johnson-Schechtman 82, Datar-Immorlica-Indyk-Mirrokni 04, Mendel-Naor 04]
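An empirical sketch of the p-stability property (illustration only; scipy's levy_stable with beta = 0 draws symmetric p-stable samples). Since the variance is infinite for p < 2, the scales are compared with a median-based statistic.

```python
import numpy as np
from scipy.stats import levy_stable

p, a, b, N = 1.5, 2.0, 3.0, 50_000
rng = np.random.default_rng(2)

X = levy_stable.rvs(alpha=p, beta=0, size=N, random_state=rng)
Y = levy_stable.rvs(alpha=p, beta=0, size=N, random_state=rng)
Z = levy_stable.rvs(alpha=p, beta=0, size=N, random_state=rng)

# p-stability: aX + bY should have the same distribution as (a^p + b^p)^(1/p) Z.
scale = (a ** p + b ** p) ** (1 / p)
print(np.median(np.abs(a * X + b * Y)), scale * np.median(np.abs(Z)))  # nearly equal
```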

Slide10

Dimension reduction for lp?

Suppose we embed into R^1, with G = Fp.
║f(w)║p is distributed as ║w║p Fp, so (up to scaling) E[║f(w)║p] = ║w║p.
Multiple coordinates, from lp into lp or into lq (q ≤ p):
║f(w)║p^p = ║w║p^p ∑ g^p
║f(w)║q^q = ║w║p^q ∑ g^q
Looks good! But what are E[g^p] and E[g^q]?

Slide11

p-stable distributions
Familiar examples: Gaussian (2-stable), Cauchy (1-stable).
Density function: unimodal [SY-78, Y-78, H-84], bell-shaped [G-84], heavy-tailed when p < 2: h(x) ≈ 1/(1 + x^(p+1)).
When p < 2,
E[g^q] = ∫_0^∞ x^q h(x) dx ≈ ∫_0^∞ x^q / (1 + x^(p+1)) dx ≈ ∫_0^1 x^q dx + ∫_1^∞ x^(q-(p+1)) dx, where the tail term equals -x^(-(p-q)) / (p-q) |_1^∞.
0 < q < p: E[g^q] ≈ 1/(p-q) ← OK
q ≥ p: E[g^q] = ∞ ← Problem
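A quick numerical illustration of the heavy-tail problem (a sketch, not from the paper), using the Cauchy distribution, the familiar 1-stable example: E[|g|^q] is finite for q < 1 but infinite for q ≥ 1, so the sample average of |g|^q keeps growing with the sample size once q ≥ 1.

```python
import numpy as np

rng = np.random.default_rng(3)
for n in (10**3, 10**5, 10**7):
    g = np.abs(rng.standard_cauchy(n))    # |g| with g ~ Cauchy, i.e. 1-stable
    # q = 0.5 < p = 1: the estimate settles near a constant (finite moment).
    # q = 1  >= p = 1: the estimate keeps growing -- E[|g|] is infinite.
    print(n, np.mean(g ** 0.5), np.mean(g))
```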

Slide12

Dimension reduction for lp?

Problems using p-stables for dimension reduction:
Heavy tails for p < 2 ⇒ E[g^p] = ∞.
When q < p, E[g^q] is finite, but how many coordinates are needed?

Slide13

Dimension reduction for lp?

What's known for non-Euclidean spaces?
For l1: bounded range dimension reduction [OR-02]
Dimension: O(R log n / ε^3)
Distortion: distances in the range [1,R] are retained to within (1+ε)
Expansion: distances < 1 remain smaller
Contraction: distances > R remain larger
Used as a subroutine for clustering and ANNS (approximate nearest neighbor search).

Slide14

Dimension reduction for lp?

Our contributions for lp (1<p<2):
Bounded range dimension reduction (lp → lq, q ≤ p)
Dimension: Oε(R log n)
Distortion: distances in [1,R] are retained to within (1+ε)
Expansion: distances < 1 remain smaller
Contraction: distances > R remain larger
Snowflake embedding: ║x-y║p → (1±ε) ║x-y║p^α, α ≤ 1
Dimension: O(ddim^2)
Previously known only for l1, with dimension O(2^(2·ddim))
Both embeddings have applications to clustering.

Slide15

Single scale dimension reduction
Our single-coordinate embedding is as follows:
f : R^d → R^1
s: upper distance threshold (~ R)
φ: random angle
F(v) = F_{φ,s}(v) = s · sin(φ + (1/s) ∑_i g_i v_i)
Motivated by [Mendel-Naor 04].
Intuition: sin(ε) ≈ ε, so small values are retained while large values are truncated.
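A minimal sketch of one coordinate of this map (illustration only; the g_i are p-stable, drawn here with scipy's levy_stable, and none of the constants are the paper's):

```python
import numpy as np
from scipy.stats import levy_stable

def single_scale_coordinate(v: np.ndarray, g: np.ndarray, phi: float, s: float) -> float:
    """One coordinate F_{phi,s}(v) = s * sin(phi + (1/s) * sum_i g_i v_i)."""
    return s * np.sin(phi + (g @ v) / s)

# Example with p = 1.5 and threshold s = 10: each coordinate uses a fresh
# p-stable vector g and a fresh random phase phi; the full embedding repeats
# this for many coordinates (next slide).
p, s, d = 1.5, 10.0, 50
rng = np.random.default_rng(4)
g = levy_stable.rvs(alpha=p, beta=0, size=d, random_state=rng)
phi = rng.uniform(0, 2 * np.pi)
u, v = rng.normal(size=d), rng.normal(size=d)
print(single_scale_coordinate(u, g, phi, s) - single_scale_coordinate(v, g, phi, s))
```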

Slide16

Single scale dimension reduction
F(v) = F_{φ,s}(v) = s · sin(φ + (1/s) ∑_i g_i v_i)
E[|F(u)-F(v)|^q] = s^q E[ |sin(φ + (1/s) ∑_i g_i u_i) - sin(φ + (1/s) ∑_i g_i v_i)|^q ]
 = c (2s)^q E[ |sin((1/(2s)) ∑_i g_i (u_i-v_i)) · cos(φ + (1/(2s)) ∑_i g_i (u_i+v_i))|^q ]
 = c (2s)^q E[ |sin((1/(2s)) ∑_i g_i (u_i-v_i))|^q ]
(using sin A - sin B = 2 sin((A-B)/2) cos((A+B)/2); the cosine factor averages out over the random phase φ)
Multiple dimensions: repeat s^O(1) · log n times; tight bounds using Bernstein's inequality.
Final embedding (writing w = ║u-v║p):
Threshold: ║F(u)-F(v)║q = O(s)
Distortion: when 1 < w < εs, ║F(u)-F(v)║q ≈ (1+ε) ║u-v║p
Expansion: when w < 1, ║F(u)-F(v)║q < (1+ε) ║u-v║p
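A qualitative sketch of the full single-scale map (illustration only; the number of coordinates, the normalization, and the choice q = 1.2 < p are arbitrary, not the paper's): stacking many such coordinates, the embedded distance grows with the distance between the points for small distances and saturates around the threshold s for large ones.

```python
import numpy as np
from scipy.stats import levy_stable

def single_scale_embedding(V, p, s, k, rng):
    """k coordinates of F_{phi,s}: each uses a fresh p-stable vector g and phase phi."""
    G = levy_stable.rvs(alpha=p, beta=0, size=(k, V.shape[1]), random_state=rng)
    phi = rng.uniform(0, 2 * np.pi, size=k)
    return s * np.sin(phi + (V @ G.T) / s)

p, q, s, k, d = 1.5, 1.2, 10.0, 2000, 30
rng = np.random.default_rng(6)
u = rng.normal(size=d)
for t in (0.5, 2.0, 5.0, 50.0, 500.0):            # rough scale of ||u - v||
    v = u + t * rng.normal(size=d) / np.sqrt(d)
    F = single_scale_embedding(np.vstack([u, v]), p, s, k, rng)
    # Per-coordinate l_q average of |F(u) - F(v)|: grows with t while t << s,
    # then saturates at O(s) once t >> s (every coordinate is bounded by 2s).
    dist = np.mean(np.abs(F[0] - F[1]) ** q) ** (1 / q)
    print(t, round(float(dist), 3))
```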

Slide17

Snowflake embedding
The snowflake embedding is created by concatenating many single-scale embeddings, an idea due to Assouad (1984).
This requires several properties of the single-scale embedding: threshold, smoothness, fidelity.
Thank you!