Sublinear Algorithmic Tools

Uploaded On 2018-11-09



Presentation Transcript

Slide1

Sublinear Algorithmic Tools 2

Alex Andoni

Slide2

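Plan

Dimension reduction
Application: Numerical Linear Algebra
Sketching
Application: Streaming
Application: Nearest Neighbor Search
and more…

Dimension reduction: linear map $\varphi: \mathbb{R}^d \to \mathbb{R}^k$ s.t. for any points $p, q$: $\|\varphi(p) - \varphi(q)\|_2 = (1 \pm \varepsilon)\,\|p - q\|_2$ with high probability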
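As a rough illustration of the dimension reduction guarantee, the sketch below projects two points with a random Gaussian matrix and compares distances before and after. The function name `gaussian_projection`, the dimensions, and the seeds are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def gaussian_projection(d, k, seed=0):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((k, d)) / np.sqrt(k)

d, k = 1000, 400                      # assumed sizes, chosen only for the demo
G = gaussian_projection(d, k, seed=1)

rng = np.random.default_rng(2)
p = rng.standard_normal(d)
q = rng.standard_normal(d)

# The map is linear, so G @ p - G @ q equals G @ (p - q).
ratio = np.linalg.norm(G @ p - G @ q) / np.linalg.norm(p - q)
print(f"projected / original distance: {ratio:.3f}")
```

The same matrix is reused for every point and is drawn without looking at the data, which is what "doesn’t depend on the dataset" refers to on the next slide.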

Slide3

Dimension reduction in other norms/distances?

E.g., $\ell_1$? Essentially no [CS’02, BC’03, LN’04, JN’10…]

For $n$ points, $D$ approximation: dimension between $n^{\Omega(1/D^2)}$ and $O(n/D)$ [BC03, NR10, ANN10…], even if the map depends on the dataset!

In contrast to $\ell_2$: [JL] gives $O(\varepsilon^{-2} \log n)$, and doesn’t depend on the dataset

Generalize the notion of dimension reduction!

Slide4

Computational view

Arbitrary computation

Cons:
Less geometric structure (e.g., not metric)

Pros:
More expressibility: better trade-off of approximation vs “dimension”

Sketch: “functional compression scheme”
for estimating distances
almost all lossy ($(1+\varepsilon)$ distortion or more) and randomized

Slide5

Sketching for $\ell_1$

Analog of Euclidean projections?

For $\ell_2$, we used: Gaussian distribution
has stability property: $g_1 z_1 + g_2 z_2 + \dots + g_d z_d$ is distributed as $g \cdot \|z\|_2$

Is there something similar for 1-norm?
Yes: Cauchy distribution!
1-stable: $c_1 z_1 + c_2 z_2 + \dots + c_d z_d$ is distributed as $c \cdot \|z\|_1$

What’s wrong then?
Cauchy are heavy-tailed…
doesn’t even have finite expectation (of abs)

Slide6

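Sketching for $\ell_1$ [Indyk’00]

Still, can consider a similar random map
Consider $\varphi(z) = Sz$, where $S$ is a $k \times d$ matrix with each coordinate distributed as Cauchy

Take 1-norm: $\|\varphi(z)\|_1$?
does not have finite expectation, but…

Can estimate $\|z\|_1$ by: median$(|\varphi_1(z)|, \dots, |\varphi_k(z)|)$

Correctness claim: for each $i$:
$\Pr[\,|\varphi_i(z)| > (1-\varepsilon)\,\|z\|_1\,] > 1/2 + \Omega(\varepsilon)$
$\Pr[\,|\varphi_i(z)| < (1+\varepsilon)\,\|z\|_1\,] > 1/2 + \Omega(\varepsilon)$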
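The median estimator can be tried numerically. The following is a minimal sketch, assuming NumPy's `standard_cauchy` sampler for the matrix entries; the helper names, sizes, and seeds are hypothetical and chosen only for the demo.

```python
import numpy as np

def cauchy_sketch_matrix(k, d, seed=0):
    """k x d matrix of i.i.d. standard Cauchy entries (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return rng.standard_cauchy((k, d))

def estimate_l1(sketch):
    """Median of absolute sketch coordinates, used as the estimate of the 1-norm."""
    return float(np.median(np.abs(sketch)))

d, k = 500, 2000                      # assumed sizes for the demo
S = cauchy_sketch_matrix(k, d, seed=3)
z = np.random.default_rng(4).uniform(-1.0, 1.0, size=d)

true_l1 = float(np.abs(z).sum())
est_l1 = estimate_l1(S @ z)
print(f"true 1-norm: {true_l1:.2f}, median estimate: {est_l1:.2f}")
```

Taking the mean of the absolute coordinates instead of the median would be unstable here, since each coordinate is a scaled Cauchy variable with no finite expectation.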

Slide7

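Estimator for $\ell_1$

Estimator: median$(|\varphi_1(z)|, \dots, |\varphi_k(z)|)$

Correctness claim: for each $i$:
$\Pr[\,|\varphi_i(z)| > (1-\varepsilon)\,\|z\|_1\,] > 1/2 + \Omega(\varepsilon)$
$\Pr[\,|\varphi_i(z)| < (1+\varepsilon)\,\|z\|_1\,] > 1/2 + \Omega(\varepsilon)$

Proof:
$\varphi_i(z) = \sum_j S_{ij} z_j$ is distributed as $c \cdot \|z\|_1$, where $c$ is Cauchy
Hence claim equivalent to
$\Pr[\,|c| > 1-\varepsilon\,] > 1/2 + \Omega(\varepsilon)$
$\Pr[\,|c| < 1+\varepsilon\,] > 1/2 + \Omega(\varepsilon)$
Matter of checking the pdf of the Cauchy vars…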
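One way to carry out that check, written out as a worked calculation. It assumes the standard Cauchy density $\frac{1}{\pi(1+s^2)}$ and $0 < \varepsilon \le 1$; the explicit constants are one possible choice and are not taken from the slides.

```latex
\begin{align*}
\Pr[|c| \le t] &= \int_{-t}^{t} \frac{ds}{\pi(1+s^2)} = \frac{2}{\pi}\arctan t,
  \qquad\text{so}\qquad \Pr[|c| \le 1] = \tfrac{1}{2}, \\
\Pr[|c| > 1-\varepsilon] &= \frac{1}{2} + \frac{2}{\pi}\int_{1-\varepsilon}^{1} \frac{ds}{1+s^2}
  \;\ge\; \frac{1}{2} + \frac{\varepsilon}{\pi}
  && \text{since } \tfrac{1}{1+s^2} \ge \tfrac{1}{2} \text{ on } [0,1], \\
\Pr[|c| < 1+\varepsilon] &= \frac{1}{2} + \frac{2}{\pi}\int_{1}^{1+\varepsilon} \frac{ds}{1+s^2}
  \;\ge\; \frac{1}{2} + \frac{2\varepsilon}{5\pi}
  && \text{since } \tfrac{1}{1+s^2} \ge \tfrac{1}{5} \text{ on } [1,2].
\end{align*}
```

Both bounds are of the form $1/2 + \Omega(\varepsilon)$: the median of $|c|$ is exactly 1, and the density of $|c|$ is bounded away from zero around it.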

Slide8

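Estimator for $\ell_1$: high probability bound

Estimator: median$(|\varphi_1(z)|, \dots, |\varphi_k(z)|)$

Claim: for each $i$:
$\Pr[\,|\varphi_i(z)| > (1-\varepsilon)\,\|z\|_1\,] > 1/2 + \Omega(\varepsilon)$ (set $X_i = 1$ if holds)
$\Pr[\,|\varphi_i(z)| < (1+\varepsilon)\,\|z\|_1\,] > 1/2 + \Omega(\varepsilon)$ (set $Y_i = 1$ if holds)

Take $k = O(1/\varepsilon^2)$ with a large enough constant
Hence $\Pr[\sum_i X_i \le k/2] \le e^{-\Omega(\varepsilon^2 k)}$ (CLT: Chernoff bound)
Similarly with $\sum_i Y_i$

The above means that median$(|\varphi_1(z)|, \dots, |\varphi_k(z)|) \notin (1 \pm \varepsilon)\,\|z\|_1$ with probability at most $2e^{-\Omega(\varepsilon^2 k)}$, i.e., a small constant

Slide9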

Yesterday’s Application: $\ell_1$ regression

Problem: $\min_x \|Ax - b\|_1$
+structured, +preconditioner

More: other norms ($\ell_p$, M-estimator, Orlicz norms), low-rank approximation & optimization, matrix multiplication, see [Woodruff, FnTTCS’14,…]

Weak DR: linear map s.t. for any point:

Weak(er) OSE: linear map s.t. for any linear subspace of a given dimension:

Cauchy distribution
[I’00]
[SW’11, MM’13, WZ’13, WW’18]

Slide10

Today’s Application: Streaming 1

Stream of IPs: 131.107.65.14, 131.107.65.14, 131.107.65.14, 18.0.1.12, 18.0.1.12, 80.97.56.20, 80.97.56.20

IP              Frequency
131.107.65.14   3
18.0.1.12       2
80.97.56.20     2
127.0.0.1       9
192.168.0.1     8
257.2.5.7       0
16.09.20.11     1

Challenge: log statistics of the data, using small space

Slide11

Streaming statistics

Let $x_i$ = frequency of IP $i$

1st moment (sum): $\sum_i x_i$
Trivial: keep a total counter

2nd moment (variance): $\sum_i x_i^2 = \|x\|_2^2$
Trivially: $n$ counters, too much space
Can’t do better if exact
Small space via (approximate) dimension reduction in $\ell_2$

IP              Frequency
131.107.65.14   3
18.0.1.12       2
80.97.56.20     2

Slide12

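2nd frequency moment via DR

$x_i$ = frequency of IP $i$

2nd moment: $\sum_i x_i^2 = \|x\|_2^2$

Store sketch $\varphi(x)$
Estimator: $\|\varphi(x)\|_2^2$

Updating the sketch:
Use linearity of the sketching function: $\varphi(x + e_i) = \varphi(x) + \varphi(e_i)$

Correctness from dimension reduction guarantee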
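The update rule can be sketched as a small streaming class. This is an illustrative sketch under stated assumptions: the class name `SecondMomentSketch`, the Gaussian choice of the linear map, and the explicit storage of the matrix are all assumptions made for readability (a space-efficient version would generate the needed column pseudorandomly instead of storing it).

```python
import numpy as np

class SecondMomentSketch:
    """Maintains phi(x) = G x for a frequency vector x under item arrivals."""

    def __init__(self, n, k, seed=0):
        rng = np.random.default_rng(seed)
        # Explicit k x n Gaussian matrix, stored only to keep the demo simple.
        self.G = rng.standard_normal((k, n)) / np.sqrt(k)
        self.sketch = np.zeros(k)

    def update(self, i, count=1):
        # Linearity: phi(x + count * e_i) = phi(x) + count * phi(e_i).
        self.sketch += count * self.G[:, i]

    def estimate(self):
        # Squared Euclidean norm of the sketch estimates sum_i x_i^2.
        return float(self.sketch @ self.sketch)

# Stream from the running example, with the three IPs mapped to indices 0, 1, 2.
stream = [0, 0, 0, 1, 1, 2, 2]
sk = SecondMomentSketch(n=3, k=2000, seed=5)
for ip_index in stream:
    sk.update(ip_index)

exact = 3**2 + 2**2 + 2**2            # second moment of the frequencies (3, 2, 2)
approx = sk.estimate()
print(f"exact: {exact}, sketch estimate: {approx:.2f}")
```

The sketch never materializes the frequency vector; each arrival touches only the $k$ stored coordinates.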

 

 

 

 

Slide13

Streaming Scenario 2

Router 1:
IP              Frequency
131.107.65.14   1
18.0.1.12       1
80.97.56.20     1

Router 2:
IP              Frequency
131.107.65.14   1
18.0.1.12       2

Question: difference in traffic

Similar Qs: average delay/variance in a network, differential statistics between logs at different servers, etc.

Slide14

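Sketching for Difference

Use sketching!
Using a random sketching map $\varphi$ (common for 2 routers)
Each router holds only its own short sketch (e.g., 010110 and 010101)
Estimator: computed from the two sketches, using linearity: $\varphi(x) - \varphi(y) = \varphi(x - y)$
Already proved: can get $(1+\varepsilon)$ approximation with a small sketch
Estimate the difference between $x$ and $y$

Router 1:
IP              Frequency
131.107.65.14   1
18.0.1.12       1
80.97.56.20     1

Router 2:
IP              Frequency
131.107.65.14   1
18.0.1.12       2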
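A minimal sketch of the two-router setting, assuming the difference is measured in the 1-norm and sketched with a Cauchy matrix derived from a shared seed. The choice of norm, the helper names, and the sizes are assumptions for illustration; the slides do not fix them in the recovered text.

```python
import numpy as np

def shared_cauchy_matrix(k, n, seed):
    """Both routers call this with the same seed to obtain the same matrix."""
    return np.random.default_rng(seed).standard_cauchy((k, n))

def local_sketch(S, freq):
    """Sketch computed locally by one router from its own frequency vector."""
    return S @ np.asarray(freq, dtype=float)

def estimate_l1_difference(sketch_a, sketch_b):
    # Linearity: S x - S y = S (x - y), then apply the median estimator.
    return float(np.median(np.abs(sketch_a - sketch_b)))

SHARED_SEED = 7                       # hypothetical value agreed on in advance
k, n = 2000, 3                        # n = number of distinct IPs in the example

x = [1, 1, 1]                         # router 1 frequencies
y = [1, 2, 0]                         # router 2 frequencies

sk_x = local_sketch(shared_cauchy_matrix(k, n, SHARED_SEED), x)
sk_y = local_sketch(shared_cauchy_matrix(k, n, SHARED_SEED), y)

true_diff = float(np.abs(np.array(x) - np.array(y)).sum())
est_diff = estimate_l1_difference(sk_x, sk_y)
print(f"true difference: {true_diff}, estimate: {est_diff:.2f}")
```

Only the two sketches need to be exchanged; with independent randomness at each router the subtraction step would not cancel the shared coordinates.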

 

 

 

 

Slide15

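Sketching for $\ell_p$ norms

$p$-moment: $\sum_i |x_i|^p = \|x\|_p^p$

$p \le 2$: a small number of counters (independent of $n$ up to logarithmic factors) is enough
works via $p$-stable distributions [Indyk’00]

$p > 2$: can do (and need) $\tilde{O}(n^{1-2/p})$ counters
[AMS’96, SS’02, BYJKS’02, CKS’03, IW’05, BGKS’06, BO10, AKO’11, G’11, BKSV’14,…]

Slide16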

Streaming 3: # distinct elements

Problem: compute the number of distinct elements in the stream

Trivial solution: $O(m)$ space for $m$ distinct elements
Will see: $O(\log n)$ space (approximate)

IP              Frequency
131.107.65.14   1
18.0.1.12       2

Slide17

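Distinct Elements

Algorithm:
Hash function $h: U \to [0,1]$
Compute minHash $= \min_{i \in \text{stream}} h(i)$
Output is $1/\text{minHash} - 1$

Main claim: $\mathbb{E}[\text{minHash}] = \frac{1}{m+1}$, for $m$ distinct elements

Proof:
repeats of the same element don’t matter
minHash = minimum of $m$ random numbers in [0,1]
Pick another random number $a \in [0,1]$
What’s the probability $a < \text{minHash}$?
1) exactly $\mathbb{E}[\text{minHash}]$
2) probability it is smallest among $m+1$ reals: $\frac{1}{m+1}$

    Initialize:
        minHash = 1
        hash function h into [0,1]
    Process(int i):
        if (h(i) < minHash)
            minHash = h(i);
    Output: 1/minHash - 1

Example stream: 2, 7, 5

[Flajolet-Martin’85, Alon-Matias-Szegedy’96]

Take majority of repetitions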
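A runnable sketch of the minHash idea follows. Two choices here are assumptions and not from the slides: the hash into [0,1] is simulated with salted SHA-256, and the repetitions are combined by averaging the minimum hash values before inverting, which is a swapped-in variant rather than the majority rule named above.

```python
import hashlib

def hash01(item, salt):
    """Map an item to a pseudo-random number in (0, 1] (hypothetical helper)."""
    digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64

def min_hash(stream, salt):
    """Process the stream once, keeping only the smallest hash value seen."""
    smallest = 1.0
    for item in stream:
        h = hash01(item, salt)
        if h < smallest:
            smallest = h
    return smallest

def estimate_distinct(stream, repetitions=400):
    """Average minHash over independent salts, then invert E[minHash] = 1/(m+1)."""
    items = list(stream)
    avg = sum(min_hash(items, salt) for salt in range(repetitions)) / repetitions
    return 1.0 / avg - 1.0

# 50 distinct elements, each repeated three times.
stream = [i % 50 for i in range(150)]
estimate = estimate_distinct(stream)
print(f"estimated number of distinct elements: {estimate:.1f}")
```

A single repetition has high variance, which is why several independent hash functions are combined; each one still needs only one stored value.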