WRS: Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams
Kijung Shin
Triangles in a Graph
Graphs are everywhere!
- Social networks, the Web, emails, etc.
Triangles are a fundamental primitive
- 3 nodes connected to each other
Counting triangles has many applications
- Community detection, spam detection, query optimization
Challenges in Real Graphs
Many algorithms exist for small and fixed graphs. However, real-world graphs are
- Large: they may not fit in main memory
- Growing: new nodes and edges are added over time
We need to consider realistic settings.
Streaming Model
- Stream of edges: edges are streamed one by one from sources
- Any (or adversarial) order: edges can be in any order in the stream; this assumption is too strong
- Limited memory: not every edge can be stored in memory
Relaxed Streaming Model
- Stream of edges: edges are streamed one by one from sources
- Chronological order: edges are streamed when they are created, which is natural for dynamic graphs
- Limited memory: not every edge can be stored in memory
"What temporal patterns exist?"
"How can we exploit them for accurate triangle counting?"
Roadmap
Introduction
Temporal Pattern <<
Proposed Algorithm
Experiments
Conclusion
Time Interval of a Triangle
Time interval of a triangle = (arrival order of its last edge) − (arrival order of its first edge)
[Figure: edges on an arrival-order axis (1 to 8), with a triangle's time interval marked]
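The definition above can be computed in one pass over the stream. This is a minimal sketch, assuming a simple undirected stream (no self-loops or duplicate edges) given as a list of (u, v) pairs in arrival order, with arrival order counted from 1 as in the figure; the function name `triangle_intervals` is hypothetical.

```python
def triangle_intervals(stream):
    """Return {triangle: time interval} for every triangle in an edge stream.

    stream: list of undirected edges (u, v) in arrival order (1-based).
    Assumes no self-loops and no duplicate edges.
    """
    arrival = {}      # frozenset({u, v}) -> arrival order of that edge
    neighbors = {}    # node -> set of neighbors seen so far
    intervals = {}
    for t, (u, v) in enumerate(stream, start=1):
        # Every common neighbor w closes a triangle {u, v, w}; (u, v) is its last edge.
        for w in neighbors.get(u, set()) & neighbors.get(v, set()):
            first = min(arrival[frozenset((u, w))], arrival[frozenset((v, w))])
            intervals[frozenset((u, v, w))] = t - first
        arrival[frozenset((u, v))] = t
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return intervals
```

For the stream [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)], triangle {1, 2, 3} spans arrivals 1 to 3 (interval 2) and triangle {2, 3, 4} spans arrivals 2 to 5 (interval 3).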
Time Interval Distribution
Temporal locality: the average time interval is 2X shorter in the chronological order than in a random order.
[Figure: time-interval distributions under random and chronological arrival order]
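The comparison above can be sketched by computing the average interval in the original order and in shuffled copies of the same stream. This is an illustration on a synthetic toy stream (our assumption, where each triangle's three edges arrive back to back); it does not reproduce the 2X measured on real graphs. The names `average_interval` and `locality_ratio` are hypothetical.

```python
import random
from statistics import mean

def average_interval(stream):
    """Average time interval over all triangles in an edge stream (1-based arrival order)."""
    arrival, nbrs, gaps = {}, {}, []
    for t, (u, v) in enumerate(stream, start=1):
        for w in nbrs.get(u, set()) & nbrs.get(v, set()):
            first = min(arrival[frozenset((u, w))], arrival[frozenset((v, w))])
            gaps.append(t - first)
        arrival[frozenset((u, v))] = t
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return mean(gaps) if gaps else float("nan")

def locality_ratio(stream, trials=10, seed=0):
    """Average interval in random orders divided by that in the given (chronological) order.

    A ratio well above 1 indicates temporal locality.
    """
    rng = random.Random(seed)
    chronological = average_interval(stream)
    shuffled = []
    for _ in range(trials):
        order = list(stream)
        rng.shuffle(order)
        shuffled.append(average_interval(order))
    return mean(shuffled) / chronological

# Toy stream: 100 disjoint triangles whose edges arrive consecutively.
toy = [e for i in range(0, 300, 3) for e in [(i, i + 1), (i + 1, i + 2), (i, i + 2)]]
print(average_interval(toy), locality_ratio(toy))
```

Shuffling keeps the same set of triangles, so any change in the ratio comes only from the arrival order.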
Temporal Locality (cont.)
One interpretation: edges are more likely to form triangles with edges close in time than with edges far in time.
Another interpretation: new edges are more likely to form triangles with recent edges than with old edges.
"How can we exploit temporal locality for accurate triangle counting?"
Roadmap
Introduction
Temporal Pattern
Proposed Algorithm <<
Experiments
Conclusion
Problem Definition
Given: a graph stream in the chronological order, and a memory budget (i.e., the maximum number of edges that can be stored)
Estimate: counts of global and local triangles
To minimize: estimation error
Global triangles: all triangles in the graph
Local triangles: the triangles incident to each node
[Figure: example graph with its global and local triangle counts]
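For reference, the exact global and local counts that the estimates target can be computed when the whole graph fits in memory. This is a minimal sketch, assuming orderable node IDs (so each triangle is counted once via u < v < w); the name `count_triangles` is hypothetical.

```python
from collections import defaultdict

def count_triangles(edges):
    """Exact global and local triangle counts of an undirected simple graph.

    Returns (global_count, {node: local_count}). Assumes orderable node IDs.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:                       # ignore self-loops
            nbrs[u].add(v)
            nbrs[v].add(u)
    global_count = 0
    local = {node: 0 for node in nbrs}   # triangles incident to each node
    for u in nbrs:
        for v in nbrs[u]:
            if v <= u:
                continue
            for w in nbrs[u] & nbrs[v]:
                if w <= v:
                    continue             # count each triangle once, as u < v < w
                global_count += 1
                for x in (u, v, w):
                    local[x] += 1
    return global_count, local
```

On a 4-cycle 1-2-3-4 with the chord (1, 3), this gives 2 global triangles and local counts {1: 2, 2: 1, 3: 2, 4: 1}. Each triangle touches three nodes, so the local counts always sum to three times the global count.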
Algorithm Overview
General approach used in [LK15] [DERU16]
∆: our estimate of global triangle counts
(1) Edge Arrival: a new edge arrives
(2) Counting Step: triangles formed by the new edge and edges in memory are discovered; each discovered triangle increases ∆ by 1 / (the probability that the triangle is discovered)
(3) Sampling Step: decides which edges are kept in memory (to be explained)
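The three steps can be sketched with the simplest sampler, a uniform reservoir over all edges, in the spirit of Triest-IMPR [DERU16] as we understand it; this is not WRS. Assumptions: a simple undirected stream, k >= 2, and the hypothetical name `estimate_with_reservoir`. With a uniform sample of the t − 1 earlier edges, both older edges of a triangle are in memory with probability k(k − 1) / ((t − 1)(t − 2)), capped at 1.

```python
import random

def estimate_with_reservoir(stream, k, seed=0):
    """Estimate the global triangle count of an edge stream using k edges of memory.

    Uniform-reservoir sketch of the arrival/counting/sampling framework.
    Assumes no self-loops or duplicate edges, and k >= 2.
    """
    rng = random.Random(seed)
    sample = []      # sampled edges; a list gives O(1) random replacement
    nbrs = {}        # adjacency sets over the sampled edges only
    estimate = 0.0   # running estimate (the slides' Delta)
    for t, (u, v) in enumerate(stream, start=1):
        # (1) Edge arrival: (u, v) is the t-th edge of the stream.
        # (2) Counting step: each common neighbor closes a triangle whose two older
        #     edges are both sampled; that happens with probability p.
        common = nbrs.get(u, set()) & nbrs.get(v, set())
        if common:
            p = min(1.0, (k / (t - 1)) * ((k - 1) / (t - 2)))
            estimate += len(common) / p
        # (3) Sampling step: standard reservoir sampling over all t edges.
        if len(sample) < k:
            sample.append((u, v))
        elif rng.random() < k / t:
            i = rng.randrange(k)
            old_u, old_v = sample[i]
            nbrs[old_u].discard(old_v)
            nbrs[old_v].discard(old_u)
            sample[i] = (u, v)
        else:
            continue  # the new edge is discarded
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return estimate
```

When k is at least the stream length, every p equals 1 and the estimate is exact. Otherwise, averaging many runs with different seeds should approach the true count.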
Bias and Variance Analyses
Theorem (Unbiasedness of our estimate): the expected value of ∆ equals the true count.
Theorem (Variance of our estimate): the variance depends on the discovering probabilities of triangles.
To reduce the variance, we should increase the discovering probabilities, which depend on the sampling algorithm.
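Both theorems follow the shape of the standard Horvitz-Thompson argument. The block below is our own sketch, not the paper's proof. It assumes each discovering probability is fixed by the stream order rather than random, and the notation (T, X, p, q) is ours. The first variance term shrinks as each p grows, which is why increasing discovering probabilities matters.

```latex
% T: set of triangles; X_\tau = 1 if triangle \tau is discovered, else 0
% p_\tau = \Pr[X_\tau = 1];  q_{\tau\tau'} = \Pr[X_\tau = 1,\; X_{\tau'} = 1]
\Delta = \sum_{\tau \in T} \frac{X_\tau}{p_\tau},
\qquad
\mathbb{E}[\Delta] = \sum_{\tau \in T} \frac{\mathbb{E}[X_\tau]}{p_\tau}
                   = \sum_{\tau \in T} \frac{p_\tau}{p_\tau} = |T|

\operatorname{Var}[\Delta]
  = \sum_{\tau \in T} \left( \frac{1}{p_\tau} - 1 \right)
  + \sum_{\tau \neq \tau'} \left( \frac{q_{\tau\tau'}}{p_\tau \, p_{\tau'}} - 1 \right)
```

The second sum runs over ordered pairs of distinct triangles, and it vanishes when discoveries are independent (q equal to the product of the two p values).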
Increasing Discovering Prob.
"How can we increase discovering probabilities of triangles?"
Recall temporal locality: new edges are more likely to form triangles with recent edges than with old edges.
Waiting-Room Sampling (WRS) exploits temporal locality by treating recent edges better than old edges.
Waiting-Room Sampling (WRS)
WRS divides the memory space into two parts:
- Waiting Room (FIFO): stores the latest edges; uses a fixed percentage of the memory budget
- Reservoir (random replace): stores samples from the remaining edges; uses the rest of the budget
New edges enter the waiting room.
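The memory split above can be sketched as follows. We assume a budget of k edges and a waiting-room fraction alpha; both names, and `make_wrs_memory`, are hypothetical. A deque fits the FIFO waiting room, and a list fits the reservoir because replacement happens at a random index.

```python
from collections import deque

def make_wrs_memory(k, alpha):
    """Create an empty WRS memory for a budget of k edges.

    alpha: fraction of the budget given to the waiting room (hypothetical name).
    Returns (waiting_room, reservoir, k_w, k_r).
    """
    k_w = round(alpha * k)   # waiting-room capacity: latest edges, FIFO
    k_r = k - k_w            # reservoir capacity: sampled older edges, random replace
    waiting_room = deque()   # append new edges on the right, pop the oldest from the left
    reservoir = []           # a list allows O(1) replacement at a random index
    return waiting_room, reservoir, k_w, k_r
```

With the settings used in the experiments (memory budget 10% of the edges, waiting room 10% of the budget), a stream of 1,000,000 edges would give k = 100,000, split into 10,000 waiting-room slots and 90,000 reservoir slots.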
WRS: Sampling Steps (Step 1)
The new edge enters the waiting room (FIFO). When the waiting room is full, its oldest edge is popped out and passed on to the reservoir (random replace).
WRS: Sampling Steps (Step 2)
The popped edge is offered to the reservoir (random replace). It is stored if the reservoir has room; otherwise it either replaces a randomly chosen edge in the reservoir or is discarded.
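Steps 1 and 2 can be sketched as one function. We assume the reservoir runs standard reservoir sampling (Vitter's Algorithm R) over the popped edges. This matches the store / replace / discard outcomes on the slide, but it is our assumption rather than a quote of the authors' code. The name `wrs_sampling_step` and its parameters are hypothetical.

```python
import random
from collections import deque

def wrs_sampling_step(edge, waiting_room, reservoir, k_w, k_r, n_popped, rng):
    """Apply the WRS sampling step to a new edge; return the updated popped-edge count.

    waiting_room: deque of the latest edges (FIFO, capacity k_w)
    reservoir: list of sampled older edges (capacity k_r)
    n_popped: number of edges popped from the waiting room so far
    """
    # Step 1: push the new edge into the waiting room (FIFO).
    waiting_room.append(edge)
    if len(waiting_room) <= k_w:
        return n_popped                         # nothing popped yet
    popped = waiting_room.popleft()
    n_popped += 1
    # Step 2: reservoir sampling over the popped edges.
    if len(reservoir) < k_r:
        reservoir.append(popped)                # store
    elif rng.random() < k_r / n_popped:
        reservoir[rng.randrange(k_r)] = popped  # replace a random edge
    # otherwise the popped edge is discarded
    return n_popped

rng = random.Random(0)
wr, res, n = deque(), [], 0
for e in [(i, i + 1) for i in range(10)]:
    n = wrs_sampling_step(e, wr, res, k_w=2, k_r=3, n_popped=n, rng=rng)
print(list(wr), n, len(res))  # the 2 latest edges, 8 popped, 3 in the reservoir
```

Under this assumption, each popped edge stays in the reservoir with probability min(1, k_r / n_popped). Edges still in the waiting room are kept with probability 1.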
Summary of Algorithm
(1) Arrival Step: a new edge arrives
(2) Discovery Step: triangles formed by the new edge and the edges in memory are discovered
(3) Sampling Step: Waiting-Room Sampling!
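Putting the three steps together, here is a sketch of a global-count estimator with WRS, under our own assumptions. The waiting room holds exactly the latest edges, so an older edge of a triangle is in the waiting room with probability 1. An edge already popped into the reservoir survives with probability min(1, k_r / n_popped) under reservoir sampling, and two such edges both survive with probability k_r(k_r − 1) / (n_popped(n_popped − 1)), capped at 1. This is not the authors' implementation (see the repository linked at the end). The names `wrs_estimate`, `k`, and `alpha` are hypothetical. Only the global count is tracked here; local counts would presumably add the same 1/p to each of the triangle's three nodes.

```python
import random
from collections import deque

def wrs_estimate(stream, k, alpha=0.1, seed=0):
    """Estimate the global triangle count of a chronological edge stream with WRS.

    Sketch only: k is the memory budget in edges, alpha the waiting-room fraction.
    Assumes no self-loops or duplicate edges.
    """
    rng = random.Random(seed)
    k_w = round(alpha * k)
    k_r = k - k_w
    waiting_room, reservoir = deque(), []
    where = {}          # frozenset edge -> "W" (waiting room) or "R" (reservoir)
    nbrs = {}           # adjacency over all edges currently in memory
    n_popped = 0        # edges popped from the waiting room so far
    estimate = 0.0

    def link(x, y):
        nbrs.setdefault(x, set()).add(y)
        nbrs.setdefault(y, set()).add(x)

    def unlink(x, y):
        nbrs[x].discard(y)
        nbrs[y].discard(x)
        del where[frozenset((x, y))]

    for u, v in stream:
        # (2) Discovery step: triangles closed by (u, v) with two edges in memory.
        for w in nbrs.get(u, set()) & nbrs.get(v, set()):
            in_res = (where[frozenset((u, w))] == "R") + (where[frozenset((v, w))] == "R")
            p = 1.0                                   # waiting-room edges are always kept
            if in_res >= 1:
                p *= min(1.0, k_r / n_popped)
            if in_res == 2:
                p *= min(1.0, (k_r - 1) / (n_popped - 1))
            estimate += 1.0 / p
        # (3) Sampling step: waiting room first, then reservoir sampling on popped edges.
        waiting_room.append((u, v))
        where[frozenset((u, v))] = "W"
        link(u, v)
        if len(waiting_room) > k_w:
            x, y = waiting_room.popleft()
            n_popped += 1
            if len(reservoir) < k_r:
                reservoir.append((x, y))
                where[frozenset((x, y))] = "R"
            elif rng.random() < k_r / n_popped:
                i = rng.randrange(k_r)
                unlink(*reservoir[i])                 # evict a random reservoir edge
                reservoir[i] = (x, y)
                where[frozenset((x, y))] = "R"
            else:
                unlink(x, y)                          # discard the popped edge
    return estimate
```

Setting alpha = 0 leaves only the reservoir, which reduces to uniform reservoir sampling over all edges. A positive alpha keeps the most recent edges with probability 1, which is where temporal locality is expected to raise discovering probabilities.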
Roadmap
Introduction
Temporal Pattern
Proposed Algorithm
Experiments <<
Conclusion
Experimental Settings
Competitors: MASCOT [LK15], Triest-IMPR [DERU16]
Datasets: citation, email, and friendship graphs
Memory budget: 10% of the edges
Size of the waiting room: 10% of the memory budget
Distribution of Estimates
Waiting Room Sampling (WRS) gives unbiased estimates with the smallest variance.
[Figure: distributions of estimates from WRS, Triest-IMPR, and MASCOT, compared with the true count]
Discovering Probability
WRS increases discovering probabilities: it discovers more triangles than Triest-IMPR and MASCOT.
[Figure: discovering probabilities of WRS, Triest-IMPR, and MASCOT]
Estimation Errors
WRS is the most accurate: it reduces estimation error compared with Triest-IMPR and MASCOT.
Roadmap
Introduction
Temporal Pattern
Proposed Algorithm
Experiments
Conclusion <<
Contributions
- Pattern: Temporal Locality. Triangles have short time intervals in real graph streams.
- Algorithm: Waiting-Room Sampling (WRS). It exploits temporal locality for accurate triangle counting.
- Analyses: Bias and Variance Analyses. WRS gives unbiased estimates with small variances.
Thank you!
Code and datasets: https://github.com/kijungs/waiting_room
References:
[LK15] Yongsub Lim and U Kang. "Mascot: Memory-efficient and accurate sampling for counting local triangles in graph streams." KDD 2015.
[DERU16] Lorenzo De Stefani, Alessandro Epasto, Matteo Riondato, and Eli Upfal. "TRIÈST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size." KDD 2016.