An Optimal Algorithm for Finding Heavy Hitters
David Woodruff IBM Almaden
Based on works with Vladimir Braverman, Stephen R. Chestnut, Nikita Ivkin, Jelani Nelson, and Zhengyu Wang

Streaming Model
Stream of elements a1, …, am in [n] = {1, …, n}. Assume m = poly(n).
Arbitrary order; one pass over the data.
Minimize memory usage (space complexity, in bits) for solving a task.
Let fj be the number of occurrences of item j.
Heavy Hitters Problem: find those j for which fj is large.

Example stream: 2, 1, 1, 3, 7, 3, 4, …

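Ignoring the space constraint for a moment, the task can be stated as an exact-counting baseline; this uses O(n) words of memory, which is precisely what the streaming algorithms below avoid. The function and parameter names are illustrative:

```python
from collections import Counter

def heavy_hitters_exact(stream, phi):
    """Exact baseline: O(n) space, i.e. what a streaming algorithm must avoid."""
    m = len(stream)
    freq = Counter(stream)                      # f_j = occurrences of item j
    return {j for j, f in freq.items() if f >= phi * m}

# The example stream above: items 1 and 3 each occur twice
print(heavy_hitters_exact([2, 1, 1, 3, 7, 3, 4], phi=0.25))
```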
Guarantees
l1-guarantee: output a set containing all items j for which fj ≥ φm; the set should not contain any j with fj ≤ (φ-ε)m.
l2-guarantee: output a set containing all items j for which fj^2 ≥ φF2; the set should not contain any j with fj^2 ≤ (φ-ε)F2. (Here F2 = Σj fj^2.)
The l2-guarantee can be much stronger than the l1-guarantee.
Suppose the frequency vector is (n^{1/2}, 1, 1, 1, …, 1). Item 1 is an l2-heavy hitter for constant φ, ε, but not an l1-heavy hitter.

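A quick numeric check of the example above (n and φ are illustrative choices):

```python
import math

n = 10_000
f = [math.isqrt(n)] + [1] * (n - 1)      # frequency vector (n^{1/2}, 1, ..., 1)

m = sum(f)                               # l1 mass, roughly n
F2 = sum(fj * fj for fj in f)            # l2 mass squared, roughly 2n

phi = 0.25
print(f[0] >= phi * m)        # item 1 is not an l1-heavy hitter
print(f[0] ** 2 >= phi * F2)  # but it is an l2-heavy hitter
```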
CountSketch achieves the l2-guarantee [CCFC].
Assign each coordinate i a random sign σ(i) ∈ {-1,1}.
Randomly partition coordinates into B buckets with a hash function h, and maintain cj = Σ_{i: h(i) = j} σ(i)·fi in the j-th bucket.
Estimate fi as σ(i)·c_{h(i)}.
σ(i)·c_{h(i)} = σ(i) Σ_{i': h(i') = h(i)} σ(i')·fi', so E[σ(i)·c_{h(i)}] = fi.
Repeat this hashing scheme O(log n) times and output the median of the estimates.
The noise in a bucket is σ(i) Σ_{i' ≠ i: h(i') = h(i)} σ(i')·fi'.
This ensures every fj is approximated up to an additive (F2/B)^{1/2}.
Gives O(log^2 n) bits of space.

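A minimal sketch of the scheme just described, assuming fully random signs and bucket choices per repetition (a real implementation would use pairwise-independent hash families to keep the seed small); all names are illustrative:

```python
import random
from statistics import median

class CountSketch:
    """R repetitions of B buckets; estimates every f_i to within ~(F2/B)^(1/2)."""
    def __init__(self, repetitions, buckets, n, seed=0):
        rng = random.Random(seed)
        # sign[r][i] = sigma(i) in {-1,+1}; bucket[r][i] = h(i) for repetition r
        self.sign = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(repetitions)]
        self.bucket = [[rng.randrange(buckets) for _ in range(n)] for _ in range(repetitions)]
        self.c = [[0] * buckets for _ in range(repetitions)]

    def update(self, i, delta=1):
        """Process one occurrence of item i (delta = -1 would model a deletion)."""
        for r in range(len(self.c)):
            self.c[r][self.bucket[r][i]] += self.sign[r][i] * delta

    def estimate(self, i):
        """Median over repetitions of sigma(i) * c_{h(i)}."""
        return median(self.sign[r][i] * self.c[r][self.bucket[r][i]]
                      for r in range(len(self.c)))

# toy stream: item 7 occurs 50 times amid 200 uniformly random light items
cs = CountSketch(repetitions=9, buckets=16, n=100, seed=1)
rng = random.Random(2)
for _ in range(50):
    cs.update(7)
for _ in range(200):
    cs.update(rng.randrange(100))
print(cs.estimate(7))   # close to 50, up to the (F2/B)^(1/2) noise
```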
Known Space Bounds for l2-heavy hitters

CountSketch achieves O(log^2 n) bits of space.
If the stream is allowed to have deletions, this is optimal [DPIW].
What about insertion-only streams? This is the model originally introduced by Alon, Matias, and Szegedy.
It models internet search logs, network traffic, databases, scientific data, etc.
The only known lower bound is Ω(log n) bits, just to report the identity of the heavy hitter.

Our Results [BCIW]
We give an algorithm using O(log n log log n) bits of space!
The same techniques give a number of other results:
(F2 at all times) Estimate F2 at all times in a stream with O(log n log log n) bits of space. This improves over a union bound, which would take O(log^2 n) bits of space.
(l∞-Estimation) Compute max_i fi up to additive (εF2)^{1/2} using O(log n log log n) bits of space. (Resolves IITK Open Question 3.)

Simplifications
Output a set containing all items i for which fi^2 ≥ φF2, for constant φ.
There are at most O(1/φ) = O(1) such items i.
Hash items into O(1) buckets: all items i for which fi^2 ≥ φF2 will go to different buckets with good probability.
The problem reduces to having a single i* in {1, 2, …, n} with fi* ≥ (φF2)^{1/2}.

Intuition
Suppose first that fi* = Ω(n^{1/2} log n) and fi ∈ {0,1} for all i in {1, 2, …, n} \ {i*}.
For the moment, let us also not count the space to store random hash functions.
Assign each coordinate i a random sign σ(i) ∈ {-1,1}.
Randomly partition the items into 2 buckets.
Maintain c1 = Σ_{i: h(i) = 1} σ(i)·fi and c2 = Σ_{i: h(i) = 2} σ(i)·fi.
Suppose h(i*) = 1. What do the values c1 and c2 look like?

c1 = σ(i*)·fi* + noise and c2 = noise.
c1 - σ(i*)·fi* and c2 evolve as random walks as the stream progresses.
(Random Walks) There is a constant C > 0 so that with probability 9/10, at all times, |c1 - σ(i*)·fi*| < Cn^{1/2} and |c2| < Cn^{1/2}.
Eventually fi* > 2Cn^{1/2}, at which point |c1| > Cn^{1/2} > |c2|, revealing h(i*).
This only gives 1 bit of information. We can't repeat log n times in parallel, but we can repeat log n times sequentially!

Repeating Sequentially
Wait until either |c1| or |c2| exceeds Cn^{1/2}.
If |c1| > Cn^{1/2} then h(i*) = 1, otherwise h(i*) = 2.
This gives 1 bit of information about i*.
(Repeat) Initialize 2 new counters to 0 and perform the procedure again!
Assuming fi* = Ω(n^{1/2} log n), we will have at least 10 log n repetitions, and we will be correct in a 2/3 fraction of them.
(Chernoff) With high probability there is only a single value of i* whose hash values match a 2/3 fraction of the repetitions.

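Under the simplifying assumptions above (signs rather than Gaussians, each light item appearing at most once), the sequential procedure can be simulated directly; every name and constant below is illustrative:

```python
import random

def learn_hash_bits(stream, n, max_rounds, threshold, seed=0):
    """Each round: fresh signs and 2-bucket hash, wait until one counter
    passes the threshold, record which bucket won (1 noisy bit of i*)."""
    rng = random.Random(seed)
    votes = []                    # (bucket assignment, winning bucket) per round
    it = iter(stream)
    for _ in range(max_rounds):
        sign = [rng.choice((-1, 1)) for _ in range(n)]
        side = [rng.randrange(2) for _ in range(n)]
        c = [0, 0]
        for i in it:
            c[side[i]] += sign[i]
            if abs(c[0]) > threshold or abs(c[1]) > threshold:
                votes.append((side, 0 if abs(c[0]) > threshold else 1))
                break
        else:
            break                 # stream exhausted mid-round
    return votes

# i* = 0 is very heavy; each light item 1..20000 appears exactly once
n, i_star = 20_001, 0
stream = []
for light in range(1, n):
    stream += [i_star, light]

votes = learn_hash_bits(stream, n, max_rounds=10, threshold=4 * int(n ** 0.5))
agree = sum(winner == side[i_star] for side, winner in votes)
print(f"{agree}/{len(votes)} rounds vote for i*'s bucket")
```

Each round consumes roughly 2·threshold stream elements of the heavy hitter, which is why the intuition needs fi* = Ω(n^{1/2} log n) to support Ω(log n) rounds.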
Gaussian Processes
We don't actually have fi* = Ω(n^{1/2} log n) and fi ∈ {0,1} for all i in {1, 2, …, n} \ {i*}.
We fix both problems using Gaussian processes.
(Gaussian Process) A collection {Xt}t in T of random variables, for an index set T, for which every finite linear combination of the random variables is Gaussian.
Assume E[Xt] = 0 for all t. The process is entirely determined by the covariances E[Xs Xt].
The distance function d(s,t) = (E[|Xs - Xt|^2])^{1/2} is a pseudo-metric on T.
(Connection to Data Streams) Suppose we replace the signs σ(i) with standard normal random variables g(i), and consider a counter c at time t: c(t) = Σ_i g(i)·fi(t), where fi(t) is the frequency of item i after processing t stream insertions.
Then c(t) is a Gaussian process!

Chaining Inequality [Fernique, Talagrand]

Let {Xt}t in T be a Gaussian process and let T0 ⊆ T1 ⊆ T2 ⊆ … ⊆ T be such that |T0| = 1 and |Tk| ≤ 2^{2^k} for k ≥ 1. Then,
E sup_t Xt ≤ O(1) · sup_t Σ_{k ≥ 0} 2^{k/2} · d(t, Tk).
How can we apply this to c(t) = Σ_i g(i)·fi(t)?
Let F2(t) be the value of F2 after t stream insertions.
Let the Tk be a recursive partitioning of the stream in which F2(t) changes by a factor of 2.

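Filling in the elided statement, the standard form of this bound (Fernique's upper bound; Talagrand's majorizing measures theorem shows it is tight) reads:

```latex
% {X_t}_{t \in T}: mean-zero Gaussian process,
% d(s,t) = (E[|X_s - X_t|^2])^{1/2} the associated pseudo-metric.
\[
  T_0 \subseteq T_1 \subseteq T_2 \subseteq \cdots \subseteq T,
  \qquad |T_0| = 1, \qquad |T_k| \le 2^{2^k} \ (k \ge 1)
\]
\[
  \Longrightarrow \quad
  \mathbb{E}\sup_{t \in T} X_t
  \;\le\; O(1)\cdot \sup_{t \in T}\ \sum_{k \ge 0} 2^{k/2}\, d(t, T_k).
\]
```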
[Figure: stream a1, a2, a3, a4, a5, …, at, …, am, where at is the first point in the stream at which F2(t) crosses a given threshold.]
Let Tk be the set of 2^{2^k} times tj in the stream such that tj is the first point in the stream with F2(tj) ≥ j · F2/2^{2^k}.
Then |Tk| ≤ 2^{2^k} and d(t, Tk)^2 ≤ F2/2^{2^k} for all t.
Apply the chaining inequality!

Applying the Chaining Inequality
Let {Xt}t in T be a Gaussian process and let T0 ⊆ T1 ⊆ … ⊆ T be such that |T0| = 1 and |Tk| ≤ 2^{2^k} for k ≥ 1. Then,
E sup_t Xt ≤ O(1) · sup_t Σ_{k ≥ 0} 2^{k/2} · d(t, Tk).
Here d(t, Tk) = min_{tj in Tk} (E[|c(t) – c(tj)|^2])^{1/2} ≤ (F2/2^{2^k})^{1/2}.
Hence, E sup_t |c(t)| ≤ O(1) · Σ_{k ≥ 0} 2^{k/2} · (F2/2^{2^k})^{1/2} = O(F2^{1/2}).
This is the same behavior as for the random walks!

Removing Frequency Assumptions
We don't actually have fi* = Ω(n^{1/2} log n) and fj in {0,1} for all j in {1, 2, …, n} \ {i*}.
The Gaussian process removes the restriction that fj in {0,1} for all j in {1, 2, …, n} \ {i*}:
the random walk bound of Cn^{1/2} we needed on the counters holds without this restriction.
But we still need fi* = Ω(n^{1/2} log n) to learn log n bits about the heavy hitter.
How can we replace this restriction with fi* ≥ (φF2)^{1/2}?
Assume 1/φ ≤ log log n, by hashing into log log n buckets and incurring a log log n factor in space.

Amplification
Create O(log log n) pairs of streams from the input stream:
(streamL1, streamR1), (streamL2, streamR2), …, (streamL_{log log n}, streamR_{log log n}).
For each j, choose a hash function hj: {1, …, n} -> {0,1}.
streamLj is the original stream restricted to items i with hj(i) = 0; streamRj is the remaining part of the input stream.
Maintain counters cL = Σ_{i: hj(i) = 0} g(i)·fi and cR = Σ_{i: hj(i) = 1} g(i)·fi.
(Chaining Inequality + Chernoff) The larger counter usually belongs to the substream containing i*, and the larger counter stays larger forever if the Chaining Inequality holds.
Run the algorithm on the items corresponding to the larger counters.
The expected F2 value of the surviving items, excluding i*, is F2/poly(log n), so i* is comparatively much heavier.

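A toy version of one such pair, assuming Gaussian weights and a fresh random split per trial (names and sizes are illustrative, and only a single pair is simulated rather than the full O(log log n) cascade):

```python
import random

def larger_counter_side(freqs, seed):
    """Split items in two by a hash bit, form the two Gaussian counters,
    and return (side of the larger counter, side holding item 0)."""
    rng = random.Random(seed)
    bit = [rng.randrange(2) for _ in range(len(freqs))]
    g = [rng.gauss(0.0, 1.0) for _ in range(len(freqs))]
    c = [0.0, 0.0]
    for i, f in enumerate(freqs):
        c[bit[i]] += g[i] * f
    return (0 if abs(c[0]) >= abs(c[1]) else 1), bit[0]

# item 0 carries most of the l2 mass: f_0^2 = 250,000 vs residual F2 = 10,000
freqs = [500] + [1] * 10_000

wins = 0
for s in range(20):
    larger, heavy_side = larger_counter_side(freqs, seed=s)
    if larger == heavy_side:
        wins += 1
print(wins, "of 20 splits put the larger counter on the heavy item's side")
```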
Derandomization
We have to account for the randomness in our algorithm. We need to:
(1) derandomize the Gaussian process, and
(2) derandomize the hash functions used to sequentially learn the bits of i*.
We achieve (1) as follows:
(Derandomized Johnson-Lindenstrauss) Define the counters by first applying a Johnson-Lindenstrauss (JL) transform [KMN] to the frequency vector, reducing n dimensions to log n, then taking the inner product with fully independent Gaussians.
(Slepian's Lemma) The counters don't change much, because a Gaussian process is determined by its covariances and all covariances are roughly preserved by JL.
For (2), we derandomize an auxiliary algorithm via Nisan's pseudorandom generator [I].

An Optimal Algorithm [BCINWW]
Want O(log n) bits instead of O(log n log log n) bits.
There are multiple sources of the O(log log n) factor:
Amplification: use a tree-based scheme and the fact that the heavy hitter becomes heavier!
Derandomization: show that 4-wise independence suffices for derandomizing a Gaussian process!

Conclusions
We beat CountSketch for finding l2-heavy hitters in a data stream, achieving O(log n) bits of space instead of O(log^2 n) bits.
New results for estimating F2 at all points and for l∞-estimation.
Questions:
Is this a significant practical improvement over CountSketch as well?
Can we use Gaussian processes for other insertion-only stream problems?