Presentation Transcript

Slide 1

An Optimal Algorithm for Finding Heavy Hitters

David Woodruff, IBM Almaden

Based on works with Vladimir Braverman, Stephen R. Chestnut, Nikita Ivkin, Jelani Nelson, and Zhengyu Wang

Slide 2

Streaming Model

Stream of elements a_1, …, a_m in [n] = {1, …, n}; assume m = poly(n)
Arbitrary order
One pass over the data
Minimize memory usage (space complexity) in bits for solving a task
Let f_j be the number of occurrences of item j
Heavy Hitters Problem: find those j for which f_j is large

Example stream: 2, 1, 1, 3, 7, 3, 4

Slide 3

Guarantees

l1-guarantee:
output a set containing all items j for which f_j ≥ φ m
the set should not contain any j with f_j ≤ (φ-ε) m

l2-guarantee:
output a set containing all items j for which f_j² ≥ φ F_2, where F_2 = Σ_j f_j²
the set should not contain any j with f_j² ≤ (φ-ε) F_2

The l2-guarantee can be much stronger than the l1-guarantee:
Suppose the frequency vector is (n^{1/2}, 1, 1, 1, …, 1)
Item 1 is an l2-heavy hitter for constant φ, ε, but not an l1-heavy hitter
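To make the gap concrete, here is a small Python check of this example (added for illustration; the constant φ = 1/4 is an arbitrary choice, not from the slides):

```python
# Numerically check the (n^(1/2), 1, 1, ..., 1) example: item 1 is an
# l2-heavy hitter but not an l1-heavy hitter. phi = 1/4 is illustrative.
import math

n = 1_000_000
phi = 0.25

f = [math.isqrt(n)] + [1] * (n - 1)  # frequency vector (sqrt(n), 1, ..., 1)
m = sum(f)                           # l1 mass (stream length)
F2 = sum(fj * fj for fj in f)        # l2 mass squared

print("l1-heavy:", f[0] >= phi * m)        # False: sqrt(n) << phi * n
print("l2-heavy:", f[0] ** 2 >= phi * F2)  # True:  n >= phi * (2n - 1)
```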

Slide 4

CountSketch achieves the l2-guarantee [CCFC]

Assign each coordinate i a random sign σ(i) ∈ {-1,1}
Randomly partition coordinates into B buckets; maintain c_j = Σ_{i: h(i) = j} σ(i)·f_i in the j-th bucket

[Figure: f_1, …, f_10 hashed into B buckets; e.g., the second bucket holds Σ_{i: h(i) = 2} σ(i)·f_i]

Estimate f_i as σ(i)·c_{h(i)}, since E[σ(i)·c_{h(i)}] = σ(i) Σ_{i': h(i') = h(i)} σ(i')·f_{i'} = f_i
Repeat this hashing scheme O(log n) times and output the median of the estimates
The noise in a bucket is σ(i) Σ_{i' ≠ i: h(i') = h(i)} σ(i')·f_{i'}
Ensures every f_j is approximated up to an additive (F_2/B)^{1/2}
Gives O(log² n) bits of space
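For concreteness, here is a compact Python sketch of the CountSketch structure just described. It is an illustrative toy: Python's built-in tuple hash stands in for the random hash and sign functions, which a real implementation would draw from small (pairwise or 4-wise independent) hash families to keep the space at O(log² n) bits.

```python
# A minimal CountSketch sketch (illustrative, not the paper's code): each of
# `reps` repetitions hashes items into `buckets` signed counters; a point
# query returns the median of the per-repetition estimates sigma(i)*c_{h(i)}.
import statistics

class CountSketch:
    def __init__(self, buckets: int, reps: int):
        self.buckets = buckets
        self.reps = reps
        self.tables = [[0] * buckets for _ in range(reps)]

    # Built-in tuple hashing stands in for random hash/sign functions.
    def _bucket(self, r: int, i: int) -> int:
        return hash(("bucket", r, i)) % self.buckets

    def _sign(self, r: int, i: int) -> int:
        return 1 if hash(("sign", r, i)) & 1 else -1

    def insert(self, i: int) -> None:
        """Process one stream insertion of item i."""
        for r in range(self.reps):
            self.tables[r][self._bucket(r, i)] += self._sign(r, i)

    def estimate(self, i: int) -> float:
        """Estimate f_i up to additive (F_2/buckets)^(1/2)."""
        return statistics.median(
            self._sign(r, i) * self.tables[r][self._bucket(r, i)]
            for r in range(self.reps)
        )
```

With B buckets and O(log n) repetitions this realizes the guarantee above; the median step is what drives the failure probability down enough to union-bound over all n items.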

Slide 5

Known Space Bounds for l2-heavy hitters

CountSketch achieves O(log² n) bits of space
If the stream is allowed to have deletions, this is optimal [DPIW]
What about insertion-only streams? This is the model originally introduced by Alon, Matias, and Szegedy
Models internet search logs, network traffic, databases, scientific data, etc.
The only known lower bound is Ω(log n) bits, just to report the identity of the heavy hitter

Slide 6

Our Results [BCIW]

We give an algorithm using O(log n log log n) bits of space!
Same techniques give a number of other results:
(F_2 at all times) Estimate F_2 at all times in a stream with O(log n log log n) bits of space
Improves the union bound, which would take O(log² n) bits of space
(L_∞-estimation) Compute max_i f_i up to additive (ε F_2)^{1/2} using O(log n log log n) bits of space (resolves IITK Open Question 3)

Slide 7

Simplifications

Output a set containing all items i for which f_i² ≥ φ F_2, for constant φ
There are at most O(1/φ) = O(1) such items i
Hash the items into O(1) buckets: all items i with f_i² ≥ φ F_2 go to different buckets with good probability
Problem reduces to having a single i* in {1, 2, …, n} with f_{i*} ≥ (φ F_2)^{1/2}

Slide 8

Intuition

Suppose first that f_{i*} ≥ n^{1/2} log n and f_i ∈ {0,1} for all i in {1, 2, …, n} \ {i*}
For the moment, let us also not count the space to store random hash functions
Assign each coordinate i a random sign σ(i) ∈ {-1,1}
Randomly partition items into 2 buckets
Maintain c_1 = Σ_{i: h(i) = 1} σ(i)·f_i and c_2 = Σ_{i: h(i) = 2} σ(i)·f_i
Suppose h(i*) = 1. What do the values c_1 and c_2 look like?

Slide 9

c_1 = σ(i*)·f_{i*} + Σ_{i ≠ i*: h(i) = 1} σ(i)·f_i and c_2 = Σ_{i: h(i) = 2} σ(i)·f_i

c_1 - σ(i*)·f_{i*} and c_2 evolve as random walks as the stream progresses
(Random Walks) There is a constant C > 0 so that with probability 9/10, at all times, |c_1 - σ(i*)·f_{i*}| < C n^{1/2} and |c_2| < C n^{1/2}
Eventually f_{i*} > 2 C n^{1/2}, at which point |c_1| ≥ f_{i*} - C n^{1/2} > C n^{1/2} > |c_2|, revealing that h(i*) = 1
Only gives 1 bit of information. Can't repeat log n times in parallel, but can repeat log n times sequentially!

Slide 10

Repeating Sequentially

Wait until either |c_1| or |c_2| exceeds C n^{1/2}
If |c_1| > C n^{1/2} then h(i*) = 1, otherwise h(i*) = 2
This gives 1 bit of information about i*
(Repeat) Initialize 2 new counters to 0 and perform the procedure again!
Assuming f_{i*} = Ω(n^{1/2} log n), we will have at least 10 log n repetitions, and we will be correct in a 2/3 fraction of them
(Chernoff) With high probability there is only a single value of i* whose hash values match a 2/3 fraction of the repetitions
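The following toy simulation illustrates this sequential scheme under the slide's simplifying assumptions (light items have f_i ∈ {0,1}); the constant C = 3 and the stream mix are hypothetical choices for the demo, not from the paper:

```python
# Toy simulation of the two-bucket random-walk scheme (a sketch, not the
# paper's algorithm): light items occur at most once while the planted
# heavy hitter i* keeps arriving, so the counter in bucket h(i*) drifts.
import math
import random

n = 20_001
C = 3.0
threshold = C * math.sqrt(n)

sigma = [random.choice((-1, 1)) for _ in range(n)]  # random signs sigma(i)
h = [random.randrange(2) for _ in range(n)]         # random 2-bucket hash h(i)
i_star = 0

c = [0.0, 0.0]
votes = []
next_light = 1
for _ in range(20_000):
    if random.random() < 0.5 or next_light >= n:
        i = i_star                  # the heavy hitter keeps arriving
    else:
        i = next_light              # each light item arrives exactly once
        next_light += 1
    c[h[i]] += sigma[i]
    if max(abs(c[0]), abs(c[1])) > threshold:
        # One repetition done: record which bucket crossed, then restart.
        votes.append(0 if abs(c[0]) > abs(c[1]) else 1)
        c = [0.0, 0.0]

print("h(i*) =", h[i_star], "| majority of", len(votes), "votes:",
      max(set(votes), key=votes.count))
```

Each repetition consumes about C n^{1/2} occurrences of i*, which is why f_{i*} = Ω(n^{1/2} log n) buys roughly log n repetitions.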

Slide 11

Gaussian Processes

We don't actually have f_{i*} ≥ n^{1/2} log n and f_i ∈ {0,1} for all i in {1, 2, …, n} \ {i*}
Fix both problems using Gaussian processes
(Gaussian Process) Collection {X_t}_{t in T} of random variables, for an index set T, for which every finite linear combination of the variables is Gaussian
Assume E[X_t] = 0 for all t
The process is entirely determined by the covariances E[X_s X_t]
The distance function d(s,t) = (E[|X_s - X_t|²])^{1/2} is a pseudo-metric on T
(Connection to Data Streams) Suppose we replace the signs σ(i) with normal random variables g(i), and consider a counter c at time t: c(t) = Σ_i g(i)·f_i(t)
f_i(t) is the frequency of item i after processing t stream insertions
c(t) is a Gaussian process!

Slide 12

Chaining Inequality [Fernique, Talagrand]

Let {X_t}_{t in T} be a Gaussian process and let T_0 ⊆ T_1 ⊆ T_2 ⊆ … ⊆ T be such that |T_0| = 1 and |T_k| ≤ 2^{2^k} for k ≥ 1. Then

E[sup_{t in T} X_t] ≤ O(1) · sup_{t in T} Σ_{k ≥ 0} 2^{k/2} · d(t, T_k)

How can we apply this to c(t) = Σ_i g(i)·f_i(t)?
Let F_2(t) be the value of F_2 after t stream insertions
Let the T_k be a recursive partitioning of the stream where F_2(t) changes by a factor of 2

Slide 13

[Figure: the stream a_1, a_2, a_3, a_4, a_5, …, a_t, …, a_m, with the time t marked]

a_t is the first point in the stream for which F_2(t) ≥ F_2/2
Let T_k be the set of times t_j in the stream such that t_j is the first point in the stream with F_2(t_j) ≥ j · F_2/2^{2^k}
Then |T_k| ≤ 2^{2^k} and d(t, T_k)² ≤ F_2/2^{2^k} for every time t

Apply the chaining inequality!

Slide 14

Applying the Chaining Inequality

Let {X_t}_{t in T} be a Gaussian process and let T_0 ⊆ T_1 ⊆ … ⊆ T be such that |T_0| = 1 and |T_k| ≤ 2^{2^k} for k ≥ 1. Then

E[sup_{t in T} X_t] ≤ O(1) · sup_{t in T} Σ_{k ≥ 0} 2^{k/2} · d(t, T_k)

Here d(t, T_k) = min_{t_j in T_k} (E[|c(t) - c(t_j)|²])^{1/2} ≤ (F_2/2^{2^k})^{1/2}

Hence, E[sup_t |c(t)|] ≤ O(1) · Σ_{k ≥ 0} 2^{k/2} · (F_2/2^{2^k})^{1/2} = O(F_2^{1/2})

Same behavior as for random walks!
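For completeness, the last sum can be evaluated directly (a short derivation added here; it uses 2^{k/2 - 2^{k-1}} ≤ 2^{-k/2}, i.e., k ≤ 2^{k-1}, which holds for all k ≥ 0):

```latex
\sum_{k \ge 0} 2^{k/2} \left( \frac{F_2}{2^{2^k}} \right)^{1/2}
  = F_2^{1/2} \sum_{k \ge 0} 2^{\,k/2 - 2^{k-1}}
  \le F_2^{1/2} \sum_{k \ge 0} 2^{-k/2}
  = \frac{F_2^{1/2}}{1 - 2^{-1/2}}
  = O\!\left( F_2^{1/2} \right)
```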

Slide 15

Removing Frequency Assumptions

We don't actually have f_{i*} ≥ n^{1/2} log n and f_j ∈ {0,1} for all j in {1, 2, …, n} \ {i*}
The Gaussian process removes the restriction that f_j ∈ {0,1} for all j in {1, 2, …, n} \ {i*}: the random-walk bound of C n^{1/2} we needed on the counters holds without it
But we still need f_{i*} ≥ F_2^{1/2} log n to learn log n bits about the heavy hitter
How to replace this restriction with f_{i*} ≥ (φ F_2)^{1/2}?
Assume φ ≥ 1/log log n, by hashing into log log n buckets and incurring a log log n factor in space

Slide 16

Amplification

Create O(log log n) pairs of streams from the input stream:
(streamL_1, streamR_1), (streamL_2, streamR_2), …, (streamL_{log log n}, streamR_{log log n})
For each j, choose a hash function h_j: {1, …, n} -> {0,1}
streamL_j is the original stream restricted to items i with h_j(i) = 0; streamR_j is the remaining part of the input stream
Maintain counters c_L = Σ_{i: h_j(i) = 0} g(i)·f_i and c_R = Σ_{i: h_j(i) = 1} g(i)·f_i
(Chaining Inequality + Chernoff) the larger counter usually belongs to the substream containing i*, and the larger counter stays larger forever if the Chaining Inequality holds
Run the algorithm on the items corresponding to the larger counters
The expected F_2 value of those items, excluding i*, is F_2/poly(log n), so i* is heavier
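A hedged sketch of the amplification step follows. It simplifies the scheme in several ways (the counters are compared once at the end of the stream rather than maintained at all times, survivors must agree with the winning side in every round, and the frequency vector, the planted weight 16·n^{1/2}, and the number of rounds are hypothetical demo choices), but it shows the mechanism: each pair of Gaussian counters votes on one hash bit of i*, and the surviving items' residual F_2 drops by a factor of about 2 per round while i* survives with good probability.

```python
# Illustrative amplification sketch (not the paper's algorithm): O(log log n)
# pairs of Gaussian counters, each pair voting for the side holding i*.
import math
import random

n = 1 << 16
rounds = max(1, round(math.log2(math.log2(n))))  # O(log log n) pairs
i_star = 0

f = [1.0] * n                     # unit-frequency light items...
f[i_star] = 16 * math.sqrt(n)     # ...plus one planted heavy hitter

hs = [[random.randrange(2) for _ in range(n)] for _ in range(rounds)]
g = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(rounds)]

winners = []
for j in range(rounds):
    cL = sum(g[j][i] * f[i] for i in range(n) if hs[j][i] == 0)
    cR = sum(g[j][i] * f[i] for i in range(n) if hs[j][i] == 1)
    winners.append(0 if abs(cL) >= abs(cR) else 1)  # larger counter's side

# Keep only items whose hash bits match the winning side in every round.
survivors = [i for i in range(n)
             if all(hs[j][i] == winners[j] for j in range(rounds))]
noise_F2 = sum(f[i] ** 2 for i in survivors if i != i_star)
print("i* survives:", i_star in survivors,
      "| residual noise F2:", noise_F2, "vs f_{i*}^2:", f[i_star] ** 2)
```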

Slide 17

Derandomization

We have to account for the randomness in our algorithm. We need to:
(1) derandomize the Gaussian process
(2) derandomize the hash functions used to sequentially learn the bits of i*
We achieve (1) by:
(Derandomized Johnson-Lindenstrauss) defining our counters by first applying a Johnson-Lindenstrauss (JL) transform [KMN] to the frequency vector, reducing n dimensions to log n, then taking the inner product with fully independent Gaussians
(Slepian's Lemma) the counters don't change much, because a Gaussian process is determined by its covariances and all covariances are roughly preserved by JL
For (2), we derandomize an auxiliary algorithm via Nisan's pseudorandom generator [I]
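The counter construction in (1) is easy to sketch. The code below is an assumption-laden toy: a plain random-sign (Achlioptas-style) JL map stands in for the derandomized JL transform of [KMN], and k = 128 is chosen for concentration in this small demo, whereas the point of the derandomization is that k can be taken O(log n) with a pseudorandom S. The counter is the inner product of k fully independent Gaussians with the compressed vector S·f(t), and its covariance structure is governed by ||S f(t) - S f(s)||², which JL keeps close to ||f(t) - f(s)||².

```python
# Hedged sketch of the JL-compressed Gaussian counter (simplified).
import math
import random

n, k = 5_000, 128                        # ambient and compressed dimensions
S = [[random.choice((-1.0, 1.0)) / math.sqrt(k) for _ in range(n)]
     for _ in range(k)]                  # JL matrix with +-1/sqrt(k) entries
g = [random.gauss(0.0, 1.0) for _ in range(k)]  # fully independent Gaussians

f = [0.0] * n                            # frequency vector
sketch = [0.0] * k                       # S f, maintained incrementally

def insert(i: int) -> None:
    """Process one insertion of item i: update f and the compressed S f."""
    f[i] += 1.0
    for r in range(k):
        sketch[r] += S[r][i]

def counter() -> float:
    """The Gaussian counter c(t) = <g, S f(t)>."""
    return sum(g[r] * sketch[r] for r in range(k))

for _ in range(5_000):
    insert(random.randrange(n))

norm_f = sum(v * v for v in f)
norm_Sf = sum(v * v for v in sketch)
print("||f||^2 =", norm_f, " ||Sf||^2 =", round(norm_Sf, 1))  # close, by JL
```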

Slide 18

An Optimal Algorithm [BCINWW]

Want O(log n) bits instead of O(log n log log n) bits
There are multiple sources of the O(log log n) factor:
Amplification: use a tree-based scheme and the fact that the heavy hitter becomes heavier!
Derandomization: show that 4-wise independence suffices for derandomizing a Gaussian process!

Slide 19

Conclusions

Beat CountSketch for finding l2-heavy hitters in a data stream
Achieve O(log n) bits of space instead of O(log² n) bits
New results for estimating F_2 at all points and for L_∞-estimation
Questions:
Is this a significant practical improvement over CountSketch as well?
Can we use Gaussian processes for other insertion-only stream problems?