Think before You Discard: Accurate Triangle Counting in Graph Streams with Deletions (PowerPoint presentation, uploaded 2020-06-17)


Presentation Transcript

Slide1

Think before You Discard: Accurate Triangle Counting in Graph Streams with Deletions

Kijung Shin, Jisu Kim, Bryan Hooi, Christos Faloutsos

Slide2

Triangles in a Graph

Accurate Triangle Counting in Graph Streams with Deletions (by Kijung Shin)

- Graphs are everywhere: social networks, the web, citation networks
- Triangles (3 nodes connected to each other) are a fundamental primitive
- Counting triangles has many applications: community detection, anomaly detection, query optimization

Slide3

Application: Anomaly Detection

[Figure: two plots of # incident triangles vs. degree, from [KMF11] and [LJK18]; an outlying node is labeled "Telemarketer"]

Slide4

Remaining Challenges

Counting triangles in real-world graphs (online social networks, the Web, citation networks, call networks). Real-world graphs are:
- Large: they do not fit in main memory
- Fully dynamic: they both grow and shrink

Slide5

Previous Work

[Table: MASCOT [LJK18], Triest-IMPR [DERU17], WRS [Shi17], ESD [HS17], Triest-FD [DERU17], and ThinkD (proposed) compared on support for large graphs, fully dynamic graphs (insertion and deletion), and accuracy]

- Given: a large and fully-dynamic graph
- Goal: accurately estimate the count of triangles

Slide6

Our Contributions

We propose ThinkD (Think before You Discard):
- Fast and Accurate: outperforming competitors
- Scalable: linear data scalability
- Theoretically Sound: unbiased estimates

Slide7

Road Map

- Problem Definition
- Proposed Method: ThinkD
- Experiments
- Conclusions

Slide8

Fully Dynamic Graph Stream

- Model for a large and fully-dynamic graph
- Discrete time $t$, starting from 1 and ever increasing
- At each time $t$, a change in the input graph arrives
  - change: either an insertion or a deletion of an edge

[Figure: a timeline in which the change at each time is given, while the graph itself is unmaterialized]
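To make the stream model concrete, here is a minimal Python sketch that materializes the whole graph and keeps the exact global triangle count as changes arrive. The class name `ExactTriangleCounter` and the `("+" | "-", u, v)` change format are illustrative assumptions. Inserting or deleting edge (u, v) adds or removes exactly as many triangles as u and v have common neighbors. Because it stores every edge, this is only a ground-truth baseline for small inputs, not a streaming algorithm.

```python
from collections import defaultdict


class ExactTriangleCounter:
    """Exact global triangle count over a fully dynamic edge stream.

    Hypothetical helper: it keeps the whole graph in memory, so it only
    serves as ground truth on small examples.
    """

    def __init__(self):
        self.adj = defaultdict(set)  # node -> set of neighbors
        self.triangles = 0

    def process(self, op, u, v):
        """Apply one change: op is '+' (insertion) or '-' (deletion)."""
        if u == v:
            return  # ignore self-loops
        # Triangles closed (or broken) by (u, v) = common neighbors of u and v.
        common = len(self.adj[u] & self.adj[v])
        if op == "+":
            if v in self.adj[u]:
                return  # edge already present: ignore duplicate insertion
            self.triangles += common
            self.adj[u].add(v)
            self.adj[v].add(u)
        elif op == "-":
            if v not in self.adj[u]:
                return  # edge not present: ignore
            self.triangles -= common
            self.adj[u].discard(v)
            self.adj[v].discard(u)
        else:
            raise ValueError(f"unknown op: {op!r}")


if __name__ == "__main__":
    counter = ExactTriangleCounter()
    stream = [("+", 1, 2), ("+", 2, 3), ("+", 1, 3),
              ("+", 3, 4), ("+", 2, 4), ("-", 1, 3)]
    for op, u, v in stream:
        counter.process(op, u, v)
        print(op, (u, v), "->", counter.triangles)
```

On this stream the count goes 0, 0, 1, 1, 2, 1: the deletion of (1, 3) breaks triangle {1, 2, 3} and leaves only {2, 3, 4}.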


Slide14

Problem Definition

- Given:
  - a fully-dynamic graph stream (possibly infinite)
  - memory space (finite)
- Estimate: the counts of global and local triangles
- To Minimize: estimation error at each time $t$

[Figure: a timeline of changes (given) and the number of triangles after each change (estimated)]

Slide15

Problem Definition (cont.)

- Global triangles: all triangles in the graph
- Local triangles: the triangles incident to each node

[Figure: an example graph in which each node is labeled with its number of local triangles]
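The two definitions can be illustrated with a short exact computation on a static graph. This is a sketch under stated assumptions: the graph is undirected and simple, nodes are comparable (e.g., integers), and `count_triangles` is a hypothetical name. Each triangle is visited once by enforcing u < v < w, and each of its three nodes receives one local triangle, so the local counts always sum to three times the global count.

```python
from collections import defaultdict
from itertools import combinations


def count_triangles(edges):
    """Return (global_count, local_counts) for an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    local = {node: 0 for node in adj}
    global_count = 0
    for u in adj:
        # Only look at higher-numbered neighbors so each triangle u < v < w
        # is found exactly once.
        higher = sorted(n for n in adj[u] if n > u)
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                global_count += 1
                for node in (u, v, w):
                    local[node] += 1
    return global_count, local


if __name__ == "__main__":
    g, local = count_triangles([(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)])
    print(g, local)
```

For the example graph it finds 2 global triangles ({1, 2, 3} and {2, 3, 4}), with local counts 1, 2, 2, 1 for nodes 1 to 4.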

Slide16

Road Map

- Problem Definition
- Proposed Method: ThinkD <<
- Experiments
- Conclusions

Slide17

Overview of ThinkD

- Maintains and updates the number of (non-deleted) triangles that it has observed
- How it processes an insertion: arrive → count → test → store (if Yes)
  - arrive: an insertion of an edge arrives
  - count: count the new triangles formed with edges stored in memory, and increase the count
  - test: toss a coin
  - store: store the edge in memory (if the test says No, the edge is not stored)
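The insertion path can be sketched as below, assuming memory is an adjacency dictionary of stored edges and using a ThinkD-FAST-style coin with probability `p` as the test; the function name and signature are hypothetical, not the authors' code. The design choice worth noticing is the order: the new edge is used for counting before the coin decides whether to store it, so it is never discarded unused.

```python
import random
from collections import defaultdict


def process_insertion(memory, count, u, v, p, rng):
    """Handle one inserted edge (u, v); return the updated count.

    memory: defaultdict(set) mapping each node to its neighbors among stored edges
    count:  number of (non-deleted) triangles observed so far
    p:      probability of storing the edge (ThinkD-FAST-style test)
    rng:    a random.Random instance
    """
    # count: each triangle closed by (u, v) together with two stored edges
    # is observed now, before deciding whether to keep (u, v).
    count += len(memory[u] & memory[v])
    # test: a Bernoulli trial with success probability p.
    if rng.random() < p:
        # store: keep the edge in memory for future counting.
        memory[u].add(v)
        memory[v].add(u)
    return count


if __name__ == "__main__":
    memory, count, rng = defaultdict(set), 0, random.Random(0)
    for u, v in [(1, 2), (2, 3), (1, 3)]:
        count = process_insertion(memory, count, u, v, p=0.5, rng=rng)
    print("observed triangles:", count)
```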

Slide18

Overview of ThinkD (cont.)

- Maintains and updates the number of (non-deleted) triangles that it has observed
- How it processes a deletion: arrive → count → test → delete (if Yes)
  - arrive: a deletion of an edge arrives
  - count: count the deleted triangles, and decrease the count
  - test: test whether the edge is stored in memory
  - delete: delete the edge from memory
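A matching sketch of the deletion path, under the same assumptions (hypothetical `process_deletion`, adjacency-dictionary memory). Triangles destroyed by the deletion are subtracted whenever the other two edges are in memory, even if the deleted edge itself was never stored.

```python
from collections import defaultdict


def process_deletion(memory, count, u, v):
    """Handle one deleted edge (u, v); return the updated count.

    memory: defaultdict(set) mapping each node to its neighbors among stored edges
    count:  number of (non-deleted) triangles observed so far
    """
    # count: triangles that (u, v) formed with two stored edges disappear,
    # so they are subtracted whether or not (u, v) itself was stored.
    count -= len(memory[u] & memory[v])
    # test: is the deleted edge currently stored?
    if v in memory[u]:
        # delete: drop it so memory only holds edges of the current graph.
        memory[u].discard(v)
        memory[v].discard(u)
    return count


if __name__ == "__main__":
    memory = defaultdict(set)
    for a, b in [(1, 2), (2, 3), (1, 3)]:
        memory[a].add(b)
        memory[b].add(a)
    print("count after deleting (1, 3):", process_deletion(memory, 1, 1, 3))
```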

 

Slide19

Comparison with Triest-FD

- ThinkD (Think before You Discard): every arrived change is used to update the count
  - arrive → count: update → test → store/delete (Yes) or discard (No)
- Triest-FD [DERU17]: some changes are discarded without being used to update the count
  - arrive → test → store/delete and count: update (Yes) or discard (No): information loss!

Slide20

Two Versions of ThinkD

- ThinkD-FAST: simple and fast, using independent Bernoulli trials
- ThinkD-ACC: accurate and parameter-free, using random pairing [GLH08]

- Q1: How to test in the test step?
- Q2: How to estimate the count of all triangles from the count of observed triangles?

Slide21

ThinkD-FAST: Details

- count: number of (non-deleted) triangles observed so far

Toy example: [Figure: the time, the count, and the edges in memory, tracked step by step]

Slide22

ThinkD-FAST: Arrive Step

[Toy example: time, change, count, memory]

A new change in the input graph arrives.
- change: either an insertion or a deletion of an edge

Slide23

ThinkD-FAST: Count Step

[Toy example: time, change, count, # added triangles, memory]

Count added or deleted triangles and update the count.

Slide24

ThinkD-FAST: Test Step

[Toy example: time, change, count, memory]

Simulate a Bernoulli trial with probability $p$.
- $p$ is an input parameter

Slide25

ThinkD-FAST: Store Step

[Toy example: time, change, count, memory]

Store the new edge in memory.

Slide26

ThinkD-FAST: Arrive Step

[Toy example: time, change, count, memory]

A new change in the input graph arrives.
- change: either an insertion or a deletion of an edge

Slide27

ThinkD-FAST: Count Step

[Toy example: time, change, count, # deleted triangles, memory]

Count added or deleted triangles and update the count.

Slide28

ThinkD-FAST: Test Step

[Toy example: time, change, count, memory]

Test if the arrived edge is in memory.

Slide29

ThinkD-FAST: Store/Delete Step

[Toy example: time, change, count, memory]

Remove the arrived edge from memory.

Slide30

Unbiased Estimation

- $c$: count of observed triangles
- $c/p^2$: estimated count of all triangles
- $|\mathcal{T}^{(t)}|$: true count of all triangles at time $t$

[Theorem 1] At any time $t$, $\mathbb{E}[c/p^2] = |\mathcal{T}^{(t)}|$; that is, $c/p^2$ is an unbiased estimate of $|\mathcal{T}^{(t)}|$.

Proof, and the variance of $c/p^2$: see the paper.
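Putting the steps together, here is a minimal sketch of ThinkD-FAST for the global count, with a Monte Carlo check of Theorem 1 on a toy stream. It assumes a valid stream (no duplicate insertion of a present edge, deletions only of present edges) and that the estimate is the observed count divided by p², as in the unbiasedness theorem; the class name, the toy stream, and the seeding scheme are illustrative, not the authors' implementation.

```python
import random
from collections import defaultdict
from itertools import combinations


class ThinkDFast:
    """Sketch of ThinkD-FAST (global count only): Bernoulli(p) edge sampling."""

    def __init__(self, p, seed=None):
        self.p = p
        self.rng = random.Random(seed)
        self.memory = defaultdict(set)  # stored edges as adjacency sets
        self.count = 0                  # observed (non-deleted) triangles

    def process(self, op, u, v):
        common = len(self.memory[u] & self.memory[v])
        if op == "+":
            self.count += common            # count before test
            if self.rng.random() < self.p:  # test: Bernoulli trial
                self.memory[u].add(v)       # store
                self.memory[v].add(u)
        elif op == "-":
            self.count -= common            # count before test
            if v in self.memory[u]:         # test: is the edge stored?
                self.memory[u].discard(v)   # delete
                self.memory[v].discard(u)
        else:
            raise ValueError(f"unknown op: {op!r}")

    def estimate(self):
        return self.count / self.p ** 2


def k5_stream_with_one_deletion():
    """Insert all edges of K5 (10 triangles), then delete (0, 1), leaving 7."""
    stream = [("+", u, v) for u, v in combinations(range(5), 2)]
    stream.append(("-", 0, 1))
    return stream


if __name__ == "__main__":
    stream = k5_stream_with_one_deletion()
    runs = 3000
    total = 0.0
    for seed in range(runs):
        algo = ThinkDFast(p=0.5, seed=seed)
        for op, u, v in stream:
            algo.process(op, u, v)
        total += algo.estimate()
    print(f"mean estimate over {runs} runs: {total / runs:.2f} (true count: 7)")
```

Averaged over many seeds, the estimate should approach the true count of 7 (K5 has 10 triangles, and deleting (0, 1) removes 3), while a single run with p = 0.5 can be far off; that spread is what the variance experiments measure.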

Slide31

Disadvantages of ThinkD-FAST

- Parameter: setting the parameter $p$ is non-trivial
  - small $p$: memory is underutilized, so the estimation is inaccurate
  - large $p$: out-of-memory error

Slide32

Disadvantages of ThinkD-FAST (cont.)

- Information loss: it may discard inserted edges even when memory is not full

[Toy example: time, change, estimate]

Slide33

ThinkD-ACC: Even Better!

- Random pairing [GLH08] instead of Bernoulli trials
- Advantages:
  - no parameter
  - less information loss: utilizes memory as fully as possible, which gives more accurate estimation
  - still unbiased
- Disadvantages:
  - complicated
  - slower than ThinkD-FAST
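ThinkD-ACC replaces the coin toss with random pairing. The sketch below is our reading of random pairing [GLH08] for a bounded sample of at most `k` items, not the ThinkD-ACC implementation: a deletion leaves an uncompensated "hole" (counted in `n_b` if the deleted item was sampled, `n_g` otherwise), and each later insertion is paired with a hole, entering the sample with probability n_b / (n_b + n_g); with no pending holes it falls back to reservoir sampling. The class and attribute names are hypothetical, and ThinkD-ACC's reweighted triangle estimator is not shown.

```python
import random


class RandomPairing:
    """Bounded-size sample (at most k items) under insertions and deletions."""

    def __init__(self, k, seed=None):
        self.k = k                      # memory budget: maximum sample size
        self.rng = random.Random(seed)
        self.sample = set()
        self.size = 0                   # number of items currently in the dataset
        self.n_b = 0                    # uncompensated deletions of sampled items
        self.n_g = 0                    # uncompensated deletions of unsampled items

    def insert(self, item):
        self.size += 1
        pending = self.n_b + self.n_g
        if pending == 0:
            # No deletions to compensate: plain reservoir sampling.
            if len(self.sample) < self.k:
                self.sample.add(item)
            elif self.rng.random() < self.k / self.size:
                victim = self.rng.choice(tuple(self.sample))  # O(k), for brevity
                self.sample.remove(victim)
                self.sample.add(item)
        elif self.rng.random() < self.n_b / pending:
            # Pair with a deletion that hit the sample: the item fills that slot.
            self.sample.add(item)
            self.n_b -= 1
        else:
            # Pair with a deletion that missed the sample: the item is not kept.
            self.n_g -= 1

    def delete(self, item):
        self.size -= 1
        if item in self.sample:
            self.sample.remove(item)
            self.n_b += 1
        else:
            self.n_g += 1
```

Unlike the Bernoulli coin, this keeps the sample as full as the budget allows (the sample size plus `n_b` never exceeds `k`); the trade-off is extra bookkeeping, plus an O(k) eviction here chosen for brevity.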

Slide34

Scalability of ThinkD

For processing the changes in the input stream with a memory of fixed size:
- [Theorem 2] ThinkD-ACC takes time linear in the data size
- [Theorem 3] ThinkD-FAST, with its parameter set as in the theorem, takes time linear in the data size

 

Slide35

Advantages of ThinkD

- Fast and Accurate: outperforming competitors
- Scalable: linear data scalability (Theorems 2 & 3)
- Theoretically Sound: unbiased estimates (Theorem 1)

Slide36

Road Map

- Problem Definition
- Proposed Method: ThinkD
- Experiments <<
- Conclusions

Slide37

Experimental Settings

- Competitors: Triest-FD [DERU17] & ESD [HS17], state-of-the-art algorithms for triangle counting in fully-dynamic graph streams
- Datasets: insertions (edges in graphs) + deletions (random 20%)
  - Web (6M+)
  - Citation (16M+)
  - Social Networks (1.8B+ edges, …)
  - ER Synthetic (100B edges)
  - Trust (0.7M+)

Slide38

EXP1. Bias Analysis [THM1]

“Does ThinkD give unbiased estimates?”

[Figure: distributions of the estimates of Triest-FD, ThinkD-ACC, and ThinkD-FAST compared with the true count]

- memory budget: a fraction of the edges
- #repeats: 10,000

Slide39

EXP2. Variance Analysis

“Does ThinkD maintain estimates with small variance?”

[Figure: estimates of Triest-FD, ThinkD-ACC, and ThinkD-FAST vs. number of processed changes, compared with the true count]

- memory budget: a fraction of the edges
- #repeats: 10,000

Slide40

EXP3. Scalability [THM 2 & 3]

“Does ThinkD scale linearly with the size of the input stream?” (THM 2 & 3)

[Figure: ThinkD-ACC and ThinkD-FAST vs. number of changes, showing linear scalability (slope = 1)]

- dataset: ER
- memory budget: fixed

Slide41

Advantages of ThinkD

- Fast and Accurate: outperforming competitors
- Scalable: linear data scalability (Theorems 2 & 3)
- Theoretically Sound: unbiased estimates (Theorem 1)

Slide42

EXP4. Space & Accuracy

“Is ThinkD more accurate than its best competitors?”

[Figure: estimation error (ratio) vs. memory budget (ratio) for ThinkD-FAST, ThinkD-ACC, Triest-FD, and ESD]

- # repeats: 1000

Slide43

EXP5. Speed & Accuracy

“Is ThinkD faster than its best competitors?”

[Figure: estimation error (ratio) vs. running time (sec) for ThinkD-FAST, ThinkD-ACC, ESD, and Triest-FD]

- # repeats: 1000

Slide44

Advantages of ThinkD

- Fast and Accurate: outperforming competitors
- Scalable: linear data scalability
- Theoretically Sound: unbiased estimates

Slide45

Road Map

- Problem Definition
- Proposed Method: ThinkD
- Experiments
- Conclusions <<

Slide46

Conclusions

We propose ThinkD, which accurately estimates the count of triangles in large and fully-dynamic graphs:
- Fast and Accurate: outperforming competitors
- Scalable: linear data scalability
- Theoretically Sound: unbiased estimates

ThinkD

Download

Slide47

References

- [GLH08] Rainer Gemulla et al., “Maintaining bounded-size sample synopses of evolving datasets.” The VLDB Journal 2008
- [KMF11] U Kang et al., “Spectral analysis for billion-scale graphs: Discoveries and implementation.” PAKDD 2011
- [DERU17] Lorenzo De Stefani et al., “TRIÈST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size.” TKDD 2017
- [Shi17] Kijung Shin, “WRS: Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams.” ICDM 2017
- [HS17] Guyue Han and Harish Sethu, “Edge sample and discard: A new algorithm for counting triangles in large dynamic graphs.” ASONAM 2017
- [LJK18] Yongsub Lim et al., “Memory-efficient and Accurate Sampling for Counting Local Triangles in Graph Streams: From Simple to Multigraphs.” TKDD 2018

Slide48

Backup Slides

Slide49

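Proof of Unbiasedness

Proof Sketch (ThinkD-FAST, sampling probability $p$):
- Probability that each added triangle is observed: $p^2$ (its other two edges must both be in memory when the change arrives)
- Probability that each removed triangle is observed: $p^2$ (likewise)
- Thus, since every triangle in the current graph has been added exactly once more than it has been removed, $\mathbb{E}[c] = p^2 \cdot |\mathcal{T}^{(t)}|$, and hence $\mathbb{E}[c/p^2] = |\mathcal{T}^{(t)}|$.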