Slide1
Think before You Discard: Accurate Triangle Counting in Graph Streams with Deletions
Kijung Shin, Jisu Kim, Bryan Hooi, Christos Faloutsos
Slide2
Triangles in a Graph
Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin)
- Graphs are everywhere: social networks, the web, citation networks
- Triangles are a fundamental primitive: 3 nodes connected to each other
- Counting triangles has many applications: community detection, anomaly detection, query optimization
Introduction
Algorithm
Experiments
Problem
Conclusion
Slide3
Application: Anomaly Detection
[Figure: two plots of degree versus number of incident triangles, from [KMF11] and [LJK18]; one point is labeled "Telemarketer".]
Slide4
Remaining Challenges
Counting triangles in real-world graphs
- Real-world graphs are:
  - Large: not fitting in main memory
  - Fully dynamic: both growing and shrinking
- Examples: online social networks, the Web, citation networks, call networks
Slide5
Previous Work
- Given: a large and fully-dynamic graph
- Goal: accurately estimate the count of triangles
[Figure: methods compared on three requirements: large graph, fully dynamic graph (insertion and deletion), accurate.]
- Methods listed under insertion: MASCOT [LJK18], Triest-IMPR [DERU17], WRS [Shi17]
- Methods listed under deletion: ESD [HS17], Triest-FD [DERU17]
- ThinkD (Proposed)
Slide6
Our Contributions
We propose ThinkD (Think before You Discard):
- Fast and Accurate: outperforming competitors
- Scalable: linear data scalability
- Theoretically Sound: unbiased estimates
Slide7
Road Map
Problem Definition
Proposed Method: ThinkD
Experiments
Conclusions
Slide8
Fully Dynamic Graph Stream
- Model for a large and fully-dynamic graph
- Discrete time t, starting from 1 and ever increasing
- At each time t, a change in the input graph arrives
  - change: either an insertion or deletion of an edge
[Table: for each time step, the change (given) and the resulting graph (unmaterialized).]
Slide13
Fully Dynamic Graph Stream
- Model for a large and fully-dynamic graph
- Discrete time t, starting from 1 and ever increasing
- At each time t, a change in the input graph arrives
  - change: either an insertion or deletion of an edge
- Examples: …
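The stream model above can be illustrated with a short Python sketch. The tuple encoding `("+", u, v)` / `("-", u, v)` and the helper name `materialize` are assumptions made for this example only; the slides keep the graph unmaterialized, so replaying the stream into an edge set here is purely to show what the changes mean.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical encoding: ("+", u, v) is an edge insertion, ("-", u, v) a deletion.
Change = Tuple[str, int, int]


def materialize(changes: Iterable[Change]) -> Set[FrozenSet[int]]:
    """Replay a fully dynamic stream and return the resulting undirected edge set."""
    edges: Set[FrozenSet[int]] = set()
    for op, u, v in changes:
        edge = frozenset((u, v))
        if op == "+":
            edges.add(edge)
        elif op == "-":
            edges.discard(edge)
        else:
            raise ValueError(f"unknown change type: {op!r}")
    return edges


# One change arrives at each time step t = 1, 2, 3, ...
stream = [("+", 1, 2), ("+", 2, 3), ("+", 1, 3), ("-", 1, 2), ("+", 3, 4)]
graph = materialize(stream)
```

After these five changes the graph holds the edges {2,3}, {1,3}, and {3,4}; the triangle {1,2,3} that existed at time 3 is gone at time 4.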
Slide14
Problem Definition
- Given:
  - a fully-dynamic graph stream (possibly infinite)
  - memory space (finite)
- Estimate: the counts of global and local triangles
- To Minimize: estimation error at each time
[Table: at each time step, the changes are given and the number of triangles is estimated.]
Slide15
Problem Definition (cont.)
- Global triangles: all triangles in the graph
- Local triangles: the triangles incident to each node
[Figure: an example graph annotated with triangle counts.]
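To make the two quantities concrete, here is a minimal exact counter in Python. It is not part of ThinkD: it assumes the whole graph fits in memory, which is exactly what the streaming setting rules out, and the function name `count_triangles` is chosen for this example.

```python
from collections import defaultdict


def count_triangles(edges):
    """Return (global triangle count, {node: local triangle count}) for a simple graph."""
    adj = defaultdict(set)
    unique_edges = set()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj[u].add(v)
        adj[v].add(u)
        unique_edges.add((min(u, v), max(u, v)))

    total = 0
    local = defaultdict(int)
    for u, v in unique_edges:  # here u < v
        for w in adj[u] & adj[v]:
            if w > v:  # count each triangle once, as u < v < w
                total += 1
                local[u] += 1
                local[v] += 1
                local[w] += 1
    return total, dict(local)


example_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
global_count, local_counts = count_triangles(example_edges)
```

In this example the triangles are {1,2,3} and {2,3,4}, so the global count is 2, nodes 2 and 3 each have a local count of 2, and nodes 1 and 4 each have a local count of 1.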
Slide16
Road Map
Problem Definition
Proposed Method: ThinkD <<
Experiments
Conclusions
Slide17
Overview of ThinkD
- Maintains and updates the number of (non-deleted) triangles that it has observed
- How it processes an insertion:
  - arrive: an insertion of an edge arrives
  - count: count new triangles and increase the maintained count
  - test: toss a coin
  - store: if the test succeeds (Yes), store the edge in memory; otherwise (No), do not store it
Slide18
Overview of ThinkD (cont.)
- Maintains and updates the number of (non-deleted) triangles that it has observed
- How it processes a deletion:
  - arrive: a deletion of an edge arrives
  - count: count deleted triangles and decrease the maintained count
  - test: test whether the edge is stored in memory
  - delete: if it is (Yes), delete the edge from memory; otherwise (No), do nothing
Slide19
Comparison with Triest-FD
- ThinkD (Think before You Discard): every arrived change is used to update the count
  - Flow: arrive -> count (update) -> test -> Yes: store/delete; No: discard
- Triest-FD [DERU17]: some changes are discarded without being used to update the count
  - Flow: arrive -> test -> Yes: count (update) and store/delete; No: discard (information loss!)
Slide20
Two Versions of ThinkD
- Q1: How to test in the test step
- Q2: How to estimate the count of all triangles from the count of observed triangles
- ThinkD-FAST: simple and fast
  - using independent Bernoulli trials
- ThinkD-ACC: accurate and parameter-free
  - using random pairing [GLH08]
Slide21
ThinkD-FAST: Details
- Maintained count: number of (non-deleted) triangles observed so far
[Toy example: the slide tracks the time, the count, and the edges in memory.]
Slide22
ThinkD-FAST: Arrive Step
- A new change in the input graph arrives
  - change: either insertion or deletion of an edge
[Toy example (insertion): time, change, count, memory.]
Slide23
ThinkD-FAST: Count Step
- Count added or deleted triangles and update the count
[Toy example (insertion): time, change, count, number of added triangles, memory.]
Slide24
ThinkD-FAST: Test Step
- Simulate a Bernoulli trial with a given success probability
  - the probability is an input parameter
[Toy example (insertion): time, change, count, memory.]
Slide25
ThinkD-FAST: Store Step
- Store the new edge in memory
[Toy example (insertion): time, change, count, memory.]
Slide26
ThinkD-FAST: Arrive Step
- A new change in the input graph arrives
  - change: either insertion or deletion of an edge
[Toy example (deletion): time, change, count, memory.]
Slide27
ThinkD-FAST: Count Step
- Count added or deleted triangles and update the count
[Toy example (deletion): time, change, count, number of deleted triangles, memory.]
Slide28
ThinkD-FAST: Test Step
- Test if the arrived edge is in memory
[Toy example (deletion): time, change, count, memory.]
Slide29
ThinkD-FAST: Store Step
- Remove the arrived edge from memory
[Toy example (deletion): time, change, count, memory.]
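The arrive, count, test, and store/delete steps walked through above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The assumptions are: the class name `ThinkDFastSketch`, the parameter name `p` for the Bernoulli probability, the `("+"/"-", u, v)` change encoding, a well-formed stream (no duplicate insertions, deletions only of existing edges), and the estimator `count / p**2`, which follows from each triangle being observed only when its two other edges are both stored. Only the global count is tracked.

```python
import random
from collections import defaultdict


class ThinkDFastSketch:
    """Illustrative count-then-sample triangle estimator with Bernoulli sampling."""

    def __init__(self, p, seed=None):
        if not 0.0 < p <= 1.0:
            raise ValueError("p must be in (0, 1]")
        self.p = p                       # assumed name for the sampling probability
        self.count = 0                   # observed (non-deleted) triangles
        self.memory = defaultdict(set)   # adjacency sets of the stored edges
        self.rng = random.Random(seed)

    def process(self, op, u, v):
        # Count step: triangles formed by (u, v) and two edges stored in memory.
        # Every arriving change reaches this step, before any sampling decision.
        observed = len(self.memory.get(u, set()) & self.memory.get(v, set()))
        if op == "+":
            self.count += observed
            # Test step: Bernoulli trial; store step on success.
            if self.rng.random() < self.p:
                self.memory[u].add(v)
                self.memory[v].add(u)
        elif op == "-":
            self.count -= observed
            # Test step: is the edge in memory? Delete step if so.
            if v in self.memory.get(u, ()):
                self.memory[u].discard(v)
                self.memory[v].discard(u)
        else:
            raise ValueError(f"unknown change type: {op!r}")

    def estimate(self):
        # Assumed estimator: scale the observed count by 1 / p^2.
        return self.count / (self.p ** 2)


demo = ThinkDFastSketch(p=0.5, seed=42)
for change in [("+", 1, 2), ("+", 2, 3), ("+", 1, 3), ("-", 1, 2)]:
    demo.process(*change)
demo_estimate = demo.estimate()
```

With `p = 1.0` every edge is stored and the sketch degenerates to exact counting, which is a convenient sanity check.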
Slide30
Unbiased Estimation
- Quantities: the count of observed triangles, the estimated count of all triangles, and the true count of all triangles
- [Theorem 1] At any time, the estimated count is an unbiased estimate of the true count of all triangles
- Proof and the variance of the estimate: see the paper
Slide31
Disadvantages of ThinkD-FAST
- Parameter: setting the parameter is non-trivial
  - small value -> underutilized memory -> inaccurate estimation
  - large value -> out-of-memory error
Slide32
Disadvantages of ThinkD-FAST (cont.)
- Information loss: it may discard inserted edges even when memory is not full
[Toy example: time, change, estimate.]
Slide33
ThinkD-ACC: Even Better!
- Random pairing [GLH08] instead of Bernoulli trials
- Advantages:
  - no parameter
  - less information loss: utilizes memory as fully as possible -> more accurate estimation
  - still unbiased
- Disadvantages:
  - complicated
  - slower than ThinkD-FAST
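For readers unfamiliar with random pairing, the following Python sketch shows the general bounded-size sampling scheme as commonly described for [GLH08]: deletions are tracked by two counters, and later insertions "pair up" with them instead of going through the usual reservoir step. This is only the sampling component under stated assumptions (the class and attribute names are hypothetical, and deletions are assumed to target live items); it is not ThinkD-ACC itself, which additionally needs a matching estimator.

```python
import random


class RandomPairingSample:
    """Bounded-size uniform sample over a set that changes by insertions and deletions."""

    def __init__(self, k, seed=None):
        self.k = k              # maximum sample size (memory budget)
        self.sample = set()
        self.population = 0     # number of live items
        self.n_bad = 0          # uncompensated deletions of sampled items
        self.n_good = 0         # uncompensated deletions of non-sampled items
        self.rng = random.Random(seed)

    def insert(self, item):
        self.population += 1
        pending = self.n_bad + self.n_good
        if pending == 0:
            # No deletions to compensate: ordinary reservoir sampling.
            if len(self.sample) < self.k:
                self.sample.add(item)
            elif self.rng.random() < self.k / self.population:
                victim = self.rng.choice(list(self.sample))
                self.sample.remove(victim)
                self.sample.add(item)
        elif self.rng.random() < self.n_bad / pending:
            # Pair the insertion with an earlier deletion from the sample.
            self.sample.add(item)
            self.n_bad -= 1
        else:
            # Pair the insertion with an earlier deletion outside the sample.
            self.n_good -= 1

    def delete(self, item):
        self.population -= 1
        if item in self.sample:
            self.sample.remove(item)
            self.n_bad += 1
        else:
            self.n_good += 1


rp = RandomPairingSample(k=5, seed=0)
for edge in ["a", "b", "c"]:
    rp.insert(edge)
rp.delete("b")   # "b" was sampled, so n_bad becomes 1
rp.insert("d")   # compensates the deletion with probability n_bad / (n_bad + n_good) = 1
```

Unlike the Bernoulli scheme, nothing is discarded while the sample has room, which is the "utilizes memory as fully as possible" advantage listed on the slide.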
Slide34
Scalability of ThinkD
- Setting: a given memory size and a given number of changes processed from the input stream
- [Theorem 2] ThinkD-ACC takes time linear in the data size
- [Theorem 3] ThinkD-FAST, with its parameter set accordingly, takes time linear in the data size
Slide35
Advantages of ThinkD
- Fast and Accurate: outperforming competitors
- Scalable: linear data scalability (Theorems 2 & 3)
- Theoretically Sound: unbiased estimates (Theorem 1)
Slide36
Road Map
Problem Definition
Proposed Method: ThinkD
Experiments <<
Conclusions
Slide37
Experimental Settings
- Competitors: Triest-FD [DERU17] & ESD [HS17]
  - state-of-the-art algorithms for triangle counting in fully-dynamic graph streams
- Implementations: …
- Datasets: insertions (edges in graphs) + deletions (random 20%)
  - Trust (0.7M+)
  - Web (6M+)
  - Citation (16M+)
  - Social Networks (1.8B+ edges, …)
  - ER Synthetic (100B edges)
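One way to turn a static edge list into such a stream is sketched below. The slide only states that deletions cover a random 20% of the edges; placing each deletion at a random position after its insertion, the function name `build_stream`, and the `("+"/"-", u, v)` encoding are assumptions of this example. It also assumes the input edges are distinct.

```python
import random


def build_stream(edges, deletion_ratio=0.2, seed=0):
    """Insert every edge once, then add deletions for a random subset of the edges."""
    rng = random.Random(seed)
    edges = list(edges)
    stream = [("+", u, v) for u, v in edges]
    to_delete = rng.sample(edges, int(len(edges) * deletion_ratio))
    for u, v in to_delete:
        inserted_at = stream.index(("+", u, v))
        # Assumed placement: any position strictly after the matching insertion.
        position = rng.randint(inserted_at + 1, len(stream))
        stream.insert(position, ("-", u, v))
    return stream


path_edges = [(i, i + 1) for i in range(10)]
dynamic_stream = build_stream(path_edges, deletion_ratio=0.2, seed=1)
```

The linear search with `stream.index` keeps the sketch short; for graphs of the sizes listed above, a position index would be needed instead.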
Slide38
EXP1. Bias Analysis [THM1]
"Does ThinkD give unbiased estimates?"
[Figure: estimates of Triest-FD, ThinkD-ACC, and ThinkD-FAST compared with the true count. Settings: memory budget given as a fraction of the edges; #repeats: 10,000.]
Slide39
EXP2. Variance Analysis
"Does ThinkD maintain estimates with small variance?"
[Figure: estimates of Triest-FD, ThinkD-ACC, and ThinkD-FAST compared with the true count, plotted against the number of processed changes. Settings: memory budget given as a fraction of the edges; #repeats: 10,000.]
Slide40
EXP3. Scalability [THM 2 & 3]
"Does ThinkD scale linearly with the size of the input stream?" (THM 2 & 3)
[Figure: ThinkD-ACC and ThinkD-FAST plotted against the number of changes, showing linear scalability (slope = 1). Settings: dataset: ER; memory budget fixed.]
Slide41
Advantages of ThinkD
- Fast and Accurate: outperforming competitors
- Scalable: linear data scalability (Theorems 2 & 3)
- Theoretically Sound: unbiased estimates (Theorem 1)
Slide42
EXP4. Space & Accuracy
"Is ThinkD more accurate than its best competitors?"
[Figure: estimation error (ratio) versus memory budget (ratio) for ThinkD-FAST, ThinkD-ACC, Triest-FD, and ESD. Settings: #repeats: 1000.]
Slide43
EXP5. Speed & Accuracy
"Is ThinkD faster than its best competitors?"
[Figure: estimation error (ratio) versus running time (sec) for ThinkD-FAST, ThinkD-ACC, ESD, and Triest-FD. Settings: #repeats: 1000.]
Slide44
Advantages of ThinkD
- Fast and Accurate: outperforming competitors
- Scalable: linear data scalability
- Theoretically Sound: unbiased estimates
Slide45
Road Map
Problem Definition
Proposed Method: ThinkD
Experiments
Conclusions <<
Slide46
Conclusions
We propose ThinkD, which accurately estimates the count of triangles in large and fully-dynamic graphs:
- Fast and Accurate: outperforming competitors
- Scalable: linear data scalability
- Theoretically Sound: unbiased estimates
Slide47
References
- [GLH08] Rainer Gemulla et al., "Maintaining bounded-size sample synopses of evolving datasets." The VLDB Journal 2008
- [KMF11] U Kang et al., "Spectral analysis for billion-scale graphs: Discoveries and implementation." PAKDD 2011
- [DERU17] Lorenzo De Stefani et al., "TRIÈST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size." TKDD 2017
- [Shi17] Kijung Shin, "WRS: Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams." ICDM 2017
- [HS17] Guyue Han and Harish Sethu, "Edge sample and discard: A new algorithm for counting triangles in large dynamic graphs." ASONAM 2017
- [LJK18] Yongsub Lim et al., "Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams: From Simple to Multigraphs." TKDD 2018
Slide48
Backup Slides
Slide49
Proof of Unbiasedness
Proof Sketch:
- Consider the probability that each added triangle is observed
- Consider the probability that each removed triangle is observed
- Thus, the estimate is unbiased
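The formulas of this proof sketch did not survive extraction. The derivation below is a reconstruction for ThinkD-FAST under stated assumptions, not a quotation of the slide: each inserted edge is stored independently with probability $p$, $c$ denotes the count of observed triangles, $|T|$ the true count of all triangles, and the estimate is taken to be $c / p^2$. A triangle is observed when its last edge arrives (or when one of its edges is deleted) only if its two other edges are both in memory.

```latex
\begin{align*}
\Pr[\text{an added triangle is observed}]   &= p \cdot p = p^2, \\
\Pr[\text{a removed triangle is observed}] &= p \cdot p = p^2, \\
\mathbb{E}[c] &= p^2 \cdot \#\{\text{added triangles}\} - p^2 \cdot \#\{\text{removed triangles}\}
               = p^2 \, |T|, \\
\mathbb{E}\!\left[\frac{c}{p^2}\right] &= |T|.
\end{align*}
```

The third line uses only linearity of expectation, so it holds even though the events for different triangles are not independent.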