Algorithms for Distributed Functional Monitoring
Author : phoebe-click | Published Date : 2025-08-04
Description: Algorithms for Distributed Functional Monitoring. Ke Yi (HKUST). Joint work with Graham Cormode (AT&T Labs) and S. Muthukrishnan (Google Inc.).
Transcript: Algorithms for Distributed Functional Monitoring
Algorithms for Distributed Functional Monitoring
  Ke Yi (HKUST)
  Joint work with Graham Cormode (AT&T Labs) and S. Muthukrishnan (Google Inc.)

The Story Begins with ...

The Model
  [Figure: two item streams over time t, one observed by Alice and one by Bob]
  Alice observes A(t) by time t
  Bob observes B(t) by time t
  A(t), B(t): multisets
  Carole tries to compute f(A(t) ∪ B(t)) for all t
  All parties have infinite computing power
  Goal is to minimize communication

The Model
  [Figure: item streams arriving at k sites]
  k sites
  Continuous Communication Model / Distributed Streaming Model

Combination of Two Models
  [Figure: example item streams for each model]
  Communication model
  Streaming model
  Continuous Communication Model
  Distributed Streaming Model
  One-shot Model

Other Models [Gibbons and Tirthapura, 2001]
  [Figure: two item streams over time t]
  Carole tries to compute f(A ∪ B) in the end
  All parties make one pass using small memory, small communication

Applied Motivation: Distributed Monitoring
  Large-scale querying/monitoring: inherently distributed!
  Streams physically distributed across remote sites
  E.g., stream of UDP packets through routers
  Challenge is “holistic” querying/monitoring
  Queries over the union of distributed streams Q(S1 ∪ S2 ∪ …)
  Streaming data is spread throughout the network
  [Figure: Network Operations Center (NOC)]
  Slide from the tutorial “Streaming in a connected world: Querying and tracking distributed data streams” at VLDB’06 and SIGMOD’07 [Cormode and Garofalakis]

Applied Motivation: Distributed Monitoring
  Traditional approach: “pull” based
  Query all nodes once in a while
  Expensive communication, most is wasted
  Inaccurate
  Current trend: moving towards a “push” based approach
  The remote sites alert the coordinator when something interesting happens
  [Figure: Network Operations Center (NOC)]

Theoretical Questions
  Upper bounds: Worst-case communication bounds for a given f?
  Lower bounds: Is there a gap in the communication complexity between the one-shot model and the continuous model?
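The pull versus push contrast from the Applied Motivation slides can be made concrete with a toy simulation. The sketch below is only an illustration under stated assumptions, not an algorithm from the talk: the function name `simulate`, its parameters (`num_sites`, `num_items`, `poll_interval`, `slack`), the random assignment of items to sites, and the push rule ("report after the local count grows by `slack`") are all hypothetical. Both monitors track the total number of items seen across all sites, and the code counts how many messages each one sends.

```python
import random


def simulate(num_sites=4, num_items=1000, poll_interval=50, slack=25, seed=0):
    """Count messages used by a pull-based and a push-based monitor.

    Illustrative assumptions: items arrive at uniformly random sites, a poll
    costs one request plus one reply per site, and a push costs one message.
    """
    rng = random.Random(seed)
    local = [0] * num_sites      # true number of items seen at each site
    reported = [0] * num_sites   # last count each site pushed to the coordinator
    pull_messages = 0
    push_messages = 0
    pull_estimate = 0

    for t in range(1, num_items + 1):
        site = rng.randrange(num_sites)
        local[site] += 1

        # Pull: every poll_interval arrivals the coordinator queries all sites.
        if t % poll_interval == 0:
            pull_messages += 2 * num_sites
            pull_estimate = sum(local)

        # Push: a site reports only once its count has grown by `slack`.
        if local[site] - reported[site] >= slack:
            reported[site] = local[site]
            push_messages += 1

    return {
        "true_total": sum(local),
        "pull_estimate": pull_estimate,
        "pull_messages": pull_messages,
        "push_estimate": sum(reported),
        "push_messages": push_messages,
    }


if __name__ == "__main__":
    print(simulate())
```

With the default parameters the pull monitor sends 160 messages (20 polls of 4 sites, two messages each), while the push monitor sends at most 40. The push estimate never exceeds the true total and lags it by at most `num_sites * (slack - 1)` items. The pull estimate is exact only at polling instants and can be up to `poll_interval - 1` items stale in between.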
The Frequency Moments
  Assume integer domain [n] = {1, …, n}
  i appears m_i times
  The p-th frequency moment: $F_p = \sum_{i=1}^{n} m_i^p$
  F_1 is the cardinality of A
  F_0 is the number of unique items in A (define 0^0 = 0)
  F_2 is Gini’s index of homogeneity in statistics, and the self-join size in databases
  Extensively studied since [Alon, Matias, and Szegedy, 1999]

Approximate Monitoring
  Must trigger alarm when F_p > τ
  Cannot trigger alarm when F_p < (1 −