Presentation Transcript

Slide 1

New Algorithms for Heavy Hitters in Data Streams

David Woodruff, IBM Almaden

Joint works with Arnab Bhattacharyya, Vladimir Braverman, Stephen R. Chestnut, Palash Dey, Nikita Ivkin, Jelani Nelson, and Zhengyu Wang

Slide 2

Streaming Model

Stream of elements a_1, …, a_m in [n] = {1, …, n}. Assume m = poly(n)
Arbitrary order
One pass over the data
Minimize the space complexity (in bits) for solving a task
Let f_j be the number of occurrences of item j
Heavy Hitters Problem: find those j for which f_j is large

Example stream: 2, 1, 1, 3, 7, 3, 4
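For reference, a minimal Python sketch (not part of the talk) of the exact-counting baseline; it finds the l1-heavy hitters of the example stream but keeps one counter per distinct item, which is exactly the space cost the streaming algorithms below avoid:

    from collections import Counter

    stream = [2, 1, 1, 3, 7, 3, 4]   # the example stream above
    m = len(stream)
    phi = 0.25
    freq = Counter(stream)           # exact counts: space grows with #distinct items
    heavy = [j for j, f in freq.items() if f >= phi * m]
    print(heavy)                     # [1, 3]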

Slide 3

Guarantees

l1-guarantee:
output a set containing all items j for which f_j ≥ φ m
the set should not contain any j with f_j ≤ (φ − ε) m

l2-guarantee:
output a set containing all items j for which f_j² ≥ φ F_2, where F_2 = Σ_j f_j²
the set should not contain any j with f_j² ≤ (φ − ε) F_2

The l2-guarantee can be much stronger than the l1-guarantee
Suppose the frequency vector is (√n, 1, 1, 1, …, 1)
Item 1 is an l2-heavy hitter for constant φ, ε, but not an l1-heavy hitter: f_1² = n is a constant fraction of F_2 ≈ 2n, while f_1 = √n is a vanishing fraction of m ≈ n

Slide 4

Outline

Optimal algorithm in all parameters φ, ε for the l1-guarantee
Optimal algorithm for the l2-guarantee for constant φ, ε

Slide 5

Misra-Gries

Maintain a list L of c = O(1/ε) pairs of the form (key, value)
Given an update to item i:
If i is in L, increment its value by 1
If i is not in L, and there are fewer than c pairs in L, put (i, 1) in L
Otherwise, subtract 1 from all values in L, and remove pairs with value 0
If an item i is never a key, charge each of its updates to c − 1 distinct updates of other items: f_i ≤ m/c = O(εm)
More generally, charge each update not included in the value f'_i of a key i to c − 1 updates of other items: f_i − f'_i ≤ m/c = O(εm)
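A minimal Python sketch of the Misra-Gries routine just described (the decrement loop is written naively for clarity; real implementations amortize it):

    def misra_gries(stream, c):
        """Estimate every frequency up to additive m/c with c (key, value) pairs."""
        L = {}
        for i in stream:
            if i in L:
                L[i] += 1
            elif len(L) < c:
                L[i] = 1
            else:
                for k in list(L):       # charge this update: decrement everyone
                    L[k] -= 1
                    if L[k] == 0:
                        del L[k]
        return L                        # f_i - m/c <= L.get(i, 0) <= f_i

With c = O(1/ε), every item with f_i ≥ εm survives with a nonzero counter, which yields the l1-guarantee above.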

Slide 6

Space Complexity of Misra-Gries

Misra-Gries uses O((1/ε) log n) bits, assuming stream length m = poly(n)
Optimal if ε = Θ(φ), since the output size is already Ω((1/φ) log n) bits
But what if, say, φ = ½ and ε = 1/log n?
Misra-Gries uses O(log² n) bits, but the lower bound is only Ω(log n) bits

Slide 7

Our Results

Obtain an optimal algorithm using O((1/ε) log(1/φ) + (1/φ) log n) bits
If φ = ½ and ε = 1/log n, we obtain the optimal O(log n) bits!
For general stream lengths m, there is an additive O(log log m) in the upper and lower bounds, so this is also optimal
O(1) update and reporting times for a suitable range of the parameters φ, ε

Slide 8

A Simple Initial Improvement

First show an O((1/ε) log(1/ε) + (1/φ) log n) bit algorithm, then improve it to the optimal O((1/ε) log(1/φ) + (1/φ) log n) bits

Idea: use the same number c of (key, value) pairs, but compress each pair

Compress the values by sampling random stream positions. If we sample each position with probability p = Θ((1/ε²) log(1/ε)) / m, then for all i in [n], the sampled frequency f'_i satisfies f'_i / p = f_i ± (ε/2) m with good probability

There are at most poly(1/ε) distinct keys after sampling, so hash the identities to a universe of size poly(1/ε)

Slide 9

Why Does Sampling Work?

Compress the values by sampling random stream positions. If we sample each position with probability p = Θ((1/ε²) log(1/ε)) / m, then for all i in [n], the sampled frequency f'_i satisfies f'_i / p = f_i ± (ε/2) m, and in particular this holds for every i for which f_i ≥ (φ − ε) m

(figure: the example stream 2, 1, 1, 3, 7, 3 with the sampled positions highlighted)
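A short Python sketch of this compression step (the rate p follows the slide; the rescaled count c/p then approximates f_i up to additive (ε/2)m):

    import math, random

    def sample_stream(stream, eps):
        """Keep each position with probability p = Theta((1/eps^2) log(1/eps)) / m."""
        m = len(stream)
        p = min(1.0, (1 / eps**2) * math.log(1 / eps) / m)
        sampled = [a for a in stream if random.random() < p]
        return sampled, p    # estimate f_i by (count of i in sampled) / p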

Slide 10

Misra-Gries after Hashing

The stream length is O((1/ε²) log(1/ε)) after sampling
There are at most poly(1/ε) distinct keys after sampling, so hash the identities pairwise-independently to a universe of size poly(1/ε)
Misra-Gries on the (key, value) pairs then takes O((1/ε) log(1/ε)) bits of space
Heavy hitters in the sampled stream correspond to heavy hitters in the original stream, and frequencies are preserved up to an additive (ε/2)m
Problem: we want the original (non-hashed) identities of the heavy hitters!

Slide 11

Maintaining Identities

For the O(1/φ) items with the largest counts, as reported by our data structure, maintain their actual log n bit identities
Always possible to maintain: if we sample an insertion of an item i, we have its actual identity in hand

(figure: a table of (hashed key, value) pairs, where the keys with the largest values also store their actual identities)

Slide 12

Summary of Initial Improvement

An O((1/ε) log(1/ε) + (1/φ) log n) bit algorithm
Update and reporting time can be made O(1) for a suitable range of the parameters
Most stream updates are not sampled, so for them we do nothing!
Spread the computation of expensive operations over future updates for which we do nothing

Slide 13

An Optimal Algorithm

Have O((1/ε) log(1/ε) + (1/φ) log n) space, but want O((1/ε) log(1/φ) + (1/φ) log n)
Too much space for the (key, value) pairs in Misra-Gries!
Instead, run Misra-Gries to find the items with frequency > (φ − ε)m, then use a separate data structure to estimate their frequencies up to an additive εm
The Misra-Gries data structure takes O((1/φ) log n) bits of space
The separate data structure will be O(log(1/φ)) independent repetitions of a data structure using O(1/ε) bits. What can you do with O(1/ε) bits?

Slide 14

An Optimal Algorithm

Want to use O(1/ε) bits so that, for any given item i, we can report an additive εm approximation to f_i with probability > 2/3
The median of the estimates across O(log(1/φ)) repetitions is an additive εm approximation with probability 1 − φ/100. Union bound over the 1/φ candidate items (see the sketch after this list)
Keep O(1/ε) counters as in Misra-Gries, but each uses O(1) bits on average!
Can't afford to keep item identifiers, even hashed ones…
Can't afford to keep exact counts, even on the sampled stream…

Slide 15

Dealing with Item Identifiers

Choose a pairwise-independent hash function h: [n] -> {1, 2, …, 1/ε}
Don't keep item identifiers; just treat all items that go to the same hash bucket as one item
Expected "noise" in a bucket is ε · (sampled stream length) ≈ 1/ε
Solves the problem with item identifiers, but what about the counts?

Slide 16

Dealing with Item Counts

We have r = O(1/ε) counters c_1, …, c_r, with Σ_j c_j = O(1/ε²), and want to store each c_j up to an additive error of O(1/ε)
Round each c_j to its nearest integer multiple of 1/ε
Gives O(1/ε) bits of space: the rounded values are k_j · (1/ε) with Σ_j k_j = O(1/ε), so they can be encoded in O(1/ε) bits in total (see the sketch below)
But how do we maintain this as the stream progresses?
classic "probabilistic counters" do not work
design "accelerated counters", which become more accurate as the count increases
For more details, please see the paper!
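To see why the rounded counters fit in O(1/ε) bits overall, here is a tiny (hypothetical) unary-style encoding in Python:

    def encode(ks):
        """ks[j] = counter c_j rounded to a multiple of 1/eps, in units of 1/eps.
        Length is sum(ks) + len(ks) - 1 bits = O(1/eps) when sum(ks) = O(1/eps)."""
        return "0".join("1" * k for k in ks)

    print(encode([3, 0, 2, 1]))   # '111001101': 9 bits for the counts 3, 0, 2, 1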

Slide 17

Conclusions on l1-guarantee

O((1/ε) log(1/φ) + (1/φ) log n + log log m) bits of space
For a suitable range of the parameters, the update and reporting times are O(1)
We show a matching lower bound
Is this also a significant practical improvement over Misra-Gries?

Slide 18

Outline

Optimal algorithm in all parameters φ, ε for the l1-guarantee
Optimal algorithm for the l2-guarantee for constant φ, ε

Slide 19

CountSketch achieves the l2-guarantee [CCFC]

Assign each coordinate i a random sign σ(i) ∈ {−1, 1}
Randomly partition the coordinates into B buckets, and maintain c_j = Σ_{i: h(i) = j} σ(i)·f_i in the j-th bucket

(figure: the frequency vector f_1, …, f_10 hashed into buckets; e.g., the second bucket holds Σ_{i: h(i) = 2} σ(i)·f_i)

Estimate f_i as σ(i)·c_{h(i)}
E[σ(i)·c_{h(i)}] = σ(i) · Σ_{i': h(i') = h(i)} σ(i')·f_{i'} = f_i
Repeat this hashing scheme O(log n) times
Output the median of the estimates
Ensures every f_j is approximated up to an additive (F_2/B)^{1/2}
Gives O(log² n) bits of space
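A compact Python sketch of CountSketch as just described; Python's builtin hash stands in for the pairwise-independent hash functions the analysis assumes:

    import statistics

    class CountSketch:
        def __init__(self, buckets, reps):
            self.B, self.reps = buckets, reps
            self.tables = [[0] * buckets for _ in range(reps)]

        def _h(self, r, i):                  # bucket of item i in repetition r
            return hash((r, 'h', i)) % self.B

        def _sigma(self, r, i):              # random sign of item i in repetition r
            return 1 - 2 * (hash((r, 's', i)) & 1)

        def update(self, i, delta=1):
            for r in range(self.reps):
                self.tables[r][self._h(r, i)] += self._sigma(r, i) * delta

        def estimate(self, i):               # median over the O(log n) repetitions
            return statistics.median(
                self._sigma(r, i) * self.tables[r][self._h(r, i)]
                for r in range(self.reps))

Each single estimate is within (F_2/B)^{1/2} with constant probability; the median over the repetitions makes this hold for all items simultaneously.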

Slide 20

Known Space Bounds for l2-heavy hitters

CountSketch achieves O(log² n) bits of space
If the stream is allowed to have deletions, this is optimal [DPIW]
What about insertion-only streams? This is the model originally introduced by Alon, Matias, and Szegedy
Models internet search logs, network traffic, databases, scientific data, etc.
The only known lower bound is Ω(log n) bits, just to report the identity of the heavy hitter

Slide 21

Our Results [BCIW]

We give an algorithm using O(log n log log n) bits of space!
The same techniques give a number of other results:
(F_2 at all times) Estimate F_2 at all times in a stream with O(log n log log n) bits of space
Improves the union bound, which would take O(log² n) bits of space
Improves an algorithm of [HTY], which requires m >> poly(n) to achieve savings
(L∞-estimation) Compute max_i f_i up to an additive (ε F_2)^{1/2} using O(log n log log n) bits of space (resolves IITK Open Question 3)

Slide 22

Simplifications

Output a set containing all items i for which f_i² ≥ φ F_2, for constant φ
There are at most O(1/φ) = O(1) such items i
Hash the items into O(1) buckets
All items i for which f_i² ≥ φ F_2 will go to different buckets with good probability
The problem reduces to having a single i* in {1, 2, …, n} with f_{i*} ≥ (φ F_2)^{1/2}

Slide 23

Intuition

Suppose first that f_{i*} ≥ n^{1/2} log n, and f_i ∈ {0, 1} for all i in {1, 2, …, n} \ {i*}
For the moment, also assume that we have an infinitely long random tape
Assign each coordinate i a random sign σ(i) ∈ {−1, 1}
Randomly partition the items into 2 buckets
Maintain c_1 = Σ_{i: h(i) = 1} σ(i)·f_i and c_2 = Σ_{i: h(i) = 2} σ(i)·f_i
Suppose h(i*) = 1. What do the values c_1 and c_2 look like?

Slide 24

c_1 = σ(i*)·f_{i*} + Σ_{i ≠ i*: h(i) = 1} σ(i)·f_i and c_2 = Σ_{i: h(i) = 2} σ(i)·f_i

c_1 − σ(i*)·f_{i*} and c_2 evolve as random walks as the stream progresses
(Random Walks) There is a constant C > 0 so that, with probability 9/10, at all times |c_1 − σ(i*)·f_{i*}| < Cn^{1/2} and |c_2| < Cn^{1/2}
Eventually f_{i*} > 2Cn^{1/2}, so the bucket of i* is identified by the larger |counter|
Only gives 1 bit of information. Can't repeat log n times in parallel, but can repeat log n times sequentially!

Slide 25

Repeating Sequentially

Wait until either |c_1| or |c_2| exceeds Cn^{1/2}
If |c_1| > Cn^{1/2} then guess h(i*) = 1, otherwise guess h(i*) = 2
This gives 1 bit of information about i*
(Repeat) Initialize 2 new counters to 0 and perform the procedure again!
Assuming f_{i*} = Ω(n^{1/2} log n), we will have at least 10 log n repetitions, and we will be correct in a 2/3 fraction of them
(Chernoff) With high probability, only a single value of i* has hash values matching a 2/3 fraction of the repetitions
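A toy Python simulation of this sequential scheme under the slide's assumptions (one heavy item, all other items appearing at most once; the constant C and the stream composition are illustrative, not from the paper):

    import math, random
    from itertools import count

    def learn_one_bit(stream_iter, i_star, n, C=1.0):
        """One round: fresh random 1-bit hash h and signs sigma. Consume the stream
        until a counter escapes the +-C*sqrt(n) noise band, then guess h(i_star)."""
        h = {i_star: random.randrange(2)}
        sigma = {}
        c = [0, 0]
        for a in stream_iter:
            b = h.setdefault(a, random.randrange(2))
            c[b] += sigma.setdefault(a, random.choice((-1, 1)))
            if max(abs(c[0]), abs(c[1])) > C * math.sqrt(n):
                return (0 if abs(c[0]) > abs(c[1]) else 1) == h[i_star]
        return None   # stream exhausted before a counter escaped

    n = 10_000
    light = count(1)   # fresh light items, each appears only once
    stream = iter([0 if random.random() < 0.12 else next(light)
                   for _ in range(11_000)])       # i* = 0, f_{i*} ~ sqrt(n) log n
    bits = [learn_one_bit(stream, 0, n) for _ in range(13)]
    print(bits)        # mostly True; None once the stream runs out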

Slide 26

Gaussian Processes

We don't actually have f_{i*} ≥ n^{1/2} log n and f_i ∈ {0, 1} for all i in {1, 2, …, n} \ {i*}
Fix both problems using Gaussian processes
(Gaussian Process) A collection {X_t}_{t in T} of random variables, for an index set T, for which every finite linear combination of the random variables is Gaussian
Assume E[X_t] = 0 for all t
The process is entirely determined by the covariances E[X_s X_t]
The distance function d(s, t) = (E[|X_s − X_t|²])^{1/2} is a pseudo-metric on T
(Connection to Data Streams) Suppose we replace the signs σ(i) with standard normal random variables g(i), and consider a counter c at time t: c(t) = Σ_i g(i)·f_i(t), where f_i(t) is the frequency of item i after processing t stream insertions
c(t) is a Gaussian process!

Slide 27

Chaining Inequality [Fernique, Talagrand]

Let {X_t}_{t in T} be a Gaussian process, and let T_0 ⊆ T_1 ⊆ T_2 ⊆ … ⊆ T be such that |T_0| = 1 and |T_k| ≤ 2^{2^k} for k ≥ 1. Then

E[sup_{t in T} X_t] ≤ O(1) · Σ_{k ≥ 0} 2^{k/2} · sup_{t in T} d(t, T_k)

How can we apply this to c(t) = Σ_i g(i)·f_i(t)?
Let F_2(t) be the value of F_2 after t stream insertions
Let the T_k be a recursive partitioning of the stream in which F_2(t) changes by a factor of 2

Slide 28

(figure: the stream a_1, a_2, a_3, …, a_m with the times t_j marked)

Let T_k be the set of 2^{2^k} times t_j in the stream such that t_j is the first point in the stream with F_2(t_j) ≥ j · F_2 / 2^{2^k}
Then |T_k| ≤ 2^{2^k} and d(t, T_k)² ≤ F_2 / 2^{2^k} for all t

Apply the chaining inequality!

Slide 29

Applying the Chaining Inequality

Let {X_t}_{t in T} be a Gaussian process, and let T_0 ⊆ T_1 ⊆ T_2 ⊆ … ⊆ T be such that |T_0| = 1 and |T_k| ≤ 2^{2^k} for k ≥ 1. Then

E[sup_{t in T} X_t] ≤ O(1) · Σ_{k ≥ 0} 2^{k/2} · sup_{t in T} d(t, T_k)

Here d(t, T_k) = (min_{t_j in T_k} E[|c(t) − c(t_j)|²])^{1/2} ≤ (F_2 / 2^{2^k})^{1/2}

Hence E[sup_t |c(t)|] ≤ O(1) · Σ_{k ≥ 0} 2^{k/2} · (F_2 / 2^{2^k})^{1/2} = O(F_2^{1/2}), since the terms 2^{k/2} / 2^{2^{k-1}} decay doubly exponentially

Same behavior as for the random walks!

Slide 30

Removing Frequency Assumptions

We don't actually have f_{i*} ≥ n^{1/2} log n and f_j ∈ {0, 1} for all j in {1, 2, …, n} \ {i*}
The Gaussian process removes the restriction that f_j ∈ {0, 1} for all j in {1, 2, …, n} \ {i*}
The random walk bound of Cn^{1/2} that we needed on the counters holds without this restriction
But we still need f_{i*} ≥ (F_2^{−i*})^{1/2} · log n to learn log n bits about the heavy hitter, where F_2^{−i*} = Σ_{j ≠ i*} f_j²
How do we replace this restriction with f_{i*} ≥ (φ F_2^{−i*})^{1/2}?
Hash into log log n buckets, incurring a log log n factor in space, so that we may assume φ is not too small

Slide 31

Amplification

Create O(log log n) pairs of streams from the input stream:
(stream_L1, stream_R1), (stream_L2, stream_R2), …, (stream_L{log log n}, stream_R{log log n})
For each j in O(log log n), choose a hash function h_j: {1, …, n} -> {0, 1}
stream_Lj is the original stream restricted to the items i with h_j(i) = 0; stream_Rj is the remaining part of the input stream
Maintain the counters c_L = Σ_{i: h_j(i) = 0} g(i)·f_i and c_R = Σ_{i: h_j(i) = 1} g(i)·f_i
(Chaining Inequality + Chernoff) the larger counter usually corresponds to the substream containing i*
The larger counter stays larger forever if the Chaining Inequality holds
Run the algorithm on the items corresponding to the larger counts (see the sketch below)
The expected F_2 value of these items, excluding i*, is F_2/poly(log n), so i* is heavier
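A schematic Python rendering of one amplification pair; fully random g and h stand in for the derandomized versions the paper actually uses:

    import random
    from collections import defaultdict

    def amplify(stream):
        """Split items by a random bit h(i) and keep one Gaussian counter per side.
        The side holding the heavy item i* typically ends with the larger |counter|."""
        h = defaultdict(lambda: random.randrange(2))      # item -> side (0 or 1)
        g = defaultdict(lambda: random.gauss(0.0, 1.0))   # item -> Gaussian "sign"
        c = [0.0, 0.0]
        for i in stream:
            c[h[i]] += g[i]                               # c_side = sum_i g(i) * f_i
        keep = 0 if abs(c[0]) > abs(c[1]) else 1
        return [i for i in h if h[i] == keep]             # survivors for next round

Repeating this O(log log n) times keeps i* among the survivors while shrinking the F_2 of the other surviving items by a poly(log n) factor, as on the slide.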

Slide 32

Derandomization

We don't have an infinitely long random tape. We need to:
(1) derandomize the Gaussian process
(2) derandomize the hash functions used to sequentially learn the bits of i*
We achieve (1) as follows:
(Derandomized Johnson-Lindenstrauss) define the counters by first applying a Johnson-Lindenstrauss (JL) transform [KMN] to the frequency vector, reducing n dimensions to log n, then taking the inner product with fully independent Gaussians
(Slepian's Lemma) the counters don't change much, because a Gaussian process is determined by its covariances and all covariances are roughly preserved by JL
For (2), we derandomize an auxiliary algorithm via Nisan's PRG [I]

Slide 33

An Optimal Algorithm [BCINWW]

Want O(log n) bits instead of O(log n log log n) bits
Sources of the O(log log n) factor:
Amplification: use a tree-based scheme, and use that the heavy hitter becomes heavier!
Derandomization: show that 6-wise independence suffices for derandomizing a Gaussian process!

Slide 34

Conclusions on l2-guarantee

Beat CountSketch for finding l2-heavy hitters in a data stream
Achieve O(log n) bits of space instead of O(log² n) bits
New results for estimating F_2 at all points and for L∞-estimation
Questions:
Is this a significant practical improvement over CountSketch as well?
Can we use Gaussian processes for other insertion-only stream problems?

Slide 35

Accelerated Counters

What if we update a counter c for an item i with probability p = ε? Then E[c/p] = f_i
The sum of the counts is expected to be O(1/ε)
We have O(1/ε) counters whose values sum to O(1/ε), so O(1/ε) bits
Problem: very inaccurate if f_i = Θ(1/ε²)

Slide 36

Accelerated Counters

Instead, suppose you knew a value r with r ≤ f_i ≤ 2r. Update a counter c with probability p = ε²r, and output c/p
Var[c/p] = f_i (1 − p)/p ≤ O(1/ε²), i.e., a standard deviation of O(1/ε)
Problem: we don't know f_i in advance!

Slide 37

Accelerated Counters

Solution: increase the sampling probability as the counter increases!
The opposite of standard probabilistic counters
A frequency f will in expectation have a count value of about ε²f², taking O(log(1 + εf)) bits
With counters subject to Σ_j f_j = O(1/ε²), the total space Σ_j O(log(1 + ε f_j)) is maximized when all f_j = Θ(1/ε), giving O(1/ε) bits
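A toy Python sketch of the accelerated-counter idea; the update rule p = ε²·(current estimate) is a reading of the slides rather than the paper's exact counter:

    import random

    class AcceleratedCounter:
        """Sample increments with a probability that grows with the running estimate,
        so the counter gets MORE accurate as the count increases (the opposite of
        Morris-style probabilistic counters)."""
        def __init__(self, eps):
            self.eps = eps
            self.est = 0.0                      # running estimate of the true count

        def increment(self):
            p = min(1.0, max(self.eps, self.eps ** 2 * self.est))
            if random.random() < p:
                self.est += 1.0 / p             # inverse-probability update: unbiased

        def estimate(self):
            return self.est

    ac = AcceleratedCounter(eps=0.05)
    for _ in range(4000):
        ac.increment()
    print(round(ac.estimate()))                 # typically within a few percent of 4000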