WRS: Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams

Uploaded On 2018-03-17


Kijung Shin. Triangles in a Graph: graphs are everywhere (social networks, the Web, emails, etc.). Triangles are a fundamental primitive: 3 nodes connected to each other. Counting triangles has many applications.



Presentation Transcript

Slide1

WRS: Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams

Kijung Shin

Slide2

Triangles in a Graph

Graphs are everywhere: social networks, the Web, emails, etc.

Triangles are a fundamental primitive: 3 nodes connected to each other.

Counting triangles has many applications: community detection, spam detection, query optimization.

WRS: Waiting Room Sampling (by Kijung Shin)

2/30

Introduction

Experiments

Conclusion

Algorithm

Pattern

Slide3

Challenges in Real Graphs

Many algorithms exist for small, fixed graphs. However, real-world graphs are:

Large: they may not fit in main memory.

Growing: new nodes and edges are added over time.

We need to consider realistic settings.

[Figure: snapshots of a growing graph at three points in time (time, time + 2, time + 6).]

Slide4

Streaming Model

Stream of edges: edges are streamed one by one from sources.

Any or adversarial order: edges can be in any order in the stream.

Limited memory: not every edge can be stored in memory.

[Figure: edges streamed from Source to Destination, with the annotation "too strong".]

Slide5

Relaxed Streaming Model

Stream of edges: edges are streamed one by one from sources.

Chronological order: natural for dynamic graphs, since edges are streamed when they are created.

Limited memory: not every edge can be stored in memory.

“What temporal patterns exist?”

“How can we exploit them for accurate triangle counting?”

Slide6

Roadmap

Introduction
Temporal Pattern <<
Proposed Algorithm
Experiments
Conclusion

Slide7

Time Interval of a Triangle

Time interval of a triangle: (arrival order of its last edge) - (arrival order of its first edge)

[Figure: the three edges of a triangle placed on an arrival-order axis numbered 1 to 8.]
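The definition above can be checked on a small stream with a short sketch. It assumes that the arrival order of an edge is its 1-based position in the stream and that the stream has no duplicate edges or self-loops; the function name `triangle_time_intervals` is hypothetical and not taken from the slides.

```python
def triangle_time_intervals(stream):
    """Map each triangle to its time interval in a chronological edge stream.

    Assumption of this sketch: the arrival order of an edge is its 1-based
    position in `stream`, and `stream` has no duplicate edges or self-loops.
    """
    arrival = {}     # frozenset({u, v}) -> arrival order of that edge
    neighbors = {}   # node -> set of nodes adjacent to it so far
    intervals = {}   # frozenset({u, v, w}) -> time interval
    for order, (u, v) in enumerate(stream, start=1):
        # The new edge is the last edge of every triangle it closes,
        # one triangle per common neighbor of u and v seen so far.
        for w in neighbors.get(u, set()) & neighbors.get(v, set()):
            first = min(arrival[frozenset((u, w))], arrival[frozenset((v, w))])
            intervals[frozenset((u, v, w))] = order - first
        arrival[frozenset((u, v))] = order
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return intervals


stream = [(1, 2), (3, 4), (2, 3), (1, 3), (2, 4)]
intervals = triangle_time_intervals(stream)
```

Here triangle {1, 2, 3} has edges arriving at orders 1, 3, and 4, so its time interval is 4 - 1 = 3.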

Slide8

Time Interval Distribution

Temporal locality: the average time interval is 2X shorter in the chronological order than in a random order.

[Figure: distribution of triangle time intervals under random arrival order versus chronological arrival order.]

Slide9

Temporal Locality (cont.)

One interpretation: edges are more likely to form triangles with edges close in time than with edges far in time.

Another interpretation: new edges are more likely to form triangles with recent edges than with old edges.

“How can we exploit temporal locality for accurate triangle counting?”

[Figure: chronological order versus random order.]

Slide10

Roadmap

Introduction
Temporal Pattern
Proposed Algorithm <<
Experiments
Conclusion

Slide11

Problem Definition

Given: a graph stream in the chronological order, and a memory budget (i.e., a maximum number of edges that can be stored).

Estimate: counts of global and local triangles.

To minimize: estimation error.

Global triangles: all triangles in the graph.

Local triangles: the triangles incident to each node.
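To make the two quantities concrete, here is a minimal exact counter. It is only a reference sketch: it stores every edge, which is exactly what the memory budget rules out in the streaming setting. The function name `count_triangles` is hypothetical, and the input is assumed to have no duplicate edges or self-loops.

```python
from collections import defaultdict


def count_triangles(edges):
    """Exact global and local triangle counts of an undirected graph.

    Reference sketch only: it keeps every edge in memory. Assumes no
    duplicate edges and no self-loops.
    """
    neighbors = defaultdict(set)      # node -> adjacent nodes seen so far
    global_count = 0                  # number of triangles in the graph
    local_counts = defaultdict(int)   # node -> triangles incident to it
    for u, v in edges:
        # Each common neighbor w closes one new triangle {u, v, w}.
        for w in neighbors[u] & neighbors[v]:
            global_count += 1
            for node in (u, v, w):
                local_counts[node] += 1
        neighbors[u].add(v)
        neighbors[v].add(u)
    return global_count, dict(local_counts)


total, per_node = count_triangles([(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)])
```

In this example the graph has two triangles, {1, 2, 3} and {2, 3, 4}; nodes 2 and 3 are each incident to both, nodes 1 and 4 to one each.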

[Figure: example graph illustrating global and local triangle counts.]

Slide12

Algorithm Overview

General approach used in [LK15] [DERU16]:

∆: our estimate of the global triangle count.

Discovering probability: the probability that a given triangle is discovered.

Slide13

Algorithm Overview

(1) Edge Arrival

[Figure: a new edge arrives; sampled edges are held in memory.]

Slide14

Algorithm Overview

(1) Edge Arrival

(2) Counting Step

[Figure: the new edge forms a triangle with edges in memory: “discover!”]

Slide15

Algorithm Overview

(1) Edge Arrival

(2) Counting Step

[Figure: the new edge forms a triangle with edges in memory: “discover!”]

Slide16

Algorithm Overview

(1) Edge Arrival

(2) Counting Step

(3) Sampling Step (to be explained)

Slide17

Bias and Variance Analyses

Theorem (unbiasedness of our estimate): the expected value of the estimate equals the true count.

Theorem (variance of our estimate): [formula]

To keep the variance small, we should increase the discovering probability, which depends on the sampling algorithm.
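A short derivation shows why an estimate of this kind is unbiased. It assumes inverse-probability weighting, i.e. that each discovered triangle adds the reciprocal of its discovering probability to the estimate. The symbols $T$ (set of all triangles), $X_t$ (indicator that triangle $t$ is discovered), $p_t$ (its discovering probability, assumed positive) and $\hat{\Delta}$ are notation chosen for this sketch, not taken from the slide.

```latex
\hat{\Delta} = \sum_{t \in T} \frac{X_t}{p_t},
\qquad
X_t =
\begin{cases}
1 & \text{if triangle } t \text{ is discovered}\\
0 & \text{otherwise}
\end{cases}

\mathbb{E}\bigl[\hat{\Delta}\bigr]
  = \sum_{t \in T} \frac{\mathbb{E}[X_t]}{p_t}
  = \sum_{t \in T} \frac{p_t}{p_t}
  = |T|
```

Each term of the estimate takes the value $1/p_t$ with probability $p_t$, so a larger $p_t$ means a smaller weight and less spread around the true count.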

 

 

Slide18

Increasing Discovering Prob.

Recall temporal locality: new edges are more likely to form triangles with recent edges than with old edges.

Waiting-Room Sampling (WRS) exploits temporal locality by treating recent edges better than old edges.

“How can we increase discovering probabilities of triangles?”

Slide19

Waiting-Room Sampling (WRS)

WRS divides the memory space into two parts:

Waiting Room: stores the latest edges (FIFO).

Reservoir: stores samples from the remaining edges (random replacement).

[Figure: a new edge enters the Waiting Room; the Waiting Room and the Reservoir each take a fixed share of the memory budget.]

Slide20

WRS: Sampling Steps (Step 1)

[Figure: the new edge enters the Waiting Room (FIFO), and an edge is popped from it; the Reservoir (Random Replace) is shown next to it.]

Slide21

WRS: Sampling Steps (Step 2)

[Figure: the edge popped from the Waiting Room (FIFO) is either stored in the Reservoir, used to replace an edge in the Reservoir, or discarded.]
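The two sampling steps can be sketched as a small class. This is a minimal illustration, not the author's implementation: it assumes the "Random Replace" rule is standard reservoir sampling (store while there is room, otherwise replace a uniformly random slot with probability reservoir size / number of popped edges, otherwise discard). The class name, parameter names, and the default 10% waiting-room share are assumptions made for this sketch.

```python
import random
from collections import deque


class WaitingRoomSampler:
    """Sketch of a waiting room (FIFO) in front of a reservoir.

    Assumption: the reservoir follows standard reservoir sampling.
    """

    def __init__(self, budget, waiting_room_fraction=0.1, seed=None):
        self.wr_size = int(budget * waiting_room_fraction)
        self.res_size = budget - self.wr_size
        self.waiting_room = deque()   # latest edges, oldest on the left
        self.reservoir = []           # uniform sample of popped edges
        self.popped = 0               # edges that have left the waiting room
        self.rng = random.Random(seed)

    def add(self, edge):
        # Step 1: the new edge always enters the waiting room.
        self.waiting_room.append(edge)
        if len(self.waiting_room) <= self.wr_size:
            return
        old = self.waiting_room.popleft()
        # Step 2: the popped edge is stored, replaces a sample, or is dropped.
        self.popped += 1
        if len(self.reservoir) < self.res_size:
            self.reservoir.append(old)                                # store
        elif self.rng.random() < self.res_size / self.popped:
            self.reservoir[self.rng.randrange(self.res_size)] = old  # replace
        # otherwise: discard
```

With this rule the reservoir is a uniform random sample of the edges popped so far, while the waiting room always holds the most recent edges with certainty.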

Slide22

Summary of Algorithm

(1) Arrival Step: a new edge arrives.

(2) Discovery Step: triangles that the new edge forms with edges in memory are discovered.

(3) Sampling Step: Waiting-Room Sampling!
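The three steps can be put together in one hedged sketch of a global triangle estimator. Several things here are assumptions rather than content of the slides: the estimate grows by the reciprocal of the discovering probability, the reservoir uses standard reservoir sampling, and the discovering probability is derived from the fact that such a reservoir is a uniform random subset of the popped edges. The function name `wrs_count` and its parameters are hypothetical, and the stream is assumed to have no duplicate edges or self-loops.

```python
import random
from collections import deque


def wrs_count(stream, budget, wr_fraction=0.1, seed=None):
    """Estimate the global triangle count with a waiting room and a reservoir."""
    rng = random.Random(seed)
    wr_size = int(budget * wr_fraction)
    res_size = budget - wr_size
    waiting_room, in_wr = deque(), set()   # FIFO of latest edges + lookup set
    reservoir = []                         # uniform sample of popped edges
    neighbors = {}                         # adjacency over edges in memory
    popped = 0                             # edges that left the waiting room
    estimate = 0.0

    def unlink(a, b):
        neighbors[a].discard(b)
        neighbors[b].discard(a)

    for u, v in stream:                    # (1) arrival step
        # (2) discovery step: weight each triangle by 1 / P(discovered).
        for w in neighbors.get(u, set()) & neighbors.get(v, set()):
            in_res = sum(frozenset(e) not in in_wr for e in ((u, w), (v, w)))
            if in_res == 0 or popped <= res_size:
                p = 1.0                    # both edges are kept with certainty
            elif in_res == 1:
                p = res_size / popped
            else:
                p = res_size * (res_size - 1) / (popped * (popped - 1))
            estimate += 1.0 / p
        # (3) sampling step: waiting room first, then the reservoir.
        waiting_room.append((u, v))
        in_wr.add(frozenset((u, v)))
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
        if len(waiting_room) > wr_size:
            a, b = waiting_room.popleft()
            in_wr.discard(frozenset((a, b)))
            popped += 1
            if len(reservoir) < res_size:
                reservoir.append((a, b))               # store
            elif rng.random() < res_size / popped:
                i = rng.randrange(res_size)
                unlink(*reservoir[i])                  # evict an old sample
                reservoir[i] = (a, b)                  # replace
            else:
                unlink(a, b)                           # discard
    return estimate


exact = wrs_count([(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)], budget=10)
```

When the budget is at least the stream length, nothing is ever discarded and the estimate equals the exact count. With a smaller budget the result is a random estimate whose spread depends on how often triangles are formed with recent edges.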

Slide23

Roadmap

Introduction
Temporal Pattern
Proposed Algorithm
Experiments <<
Conclusion

Slide24

Experimental Settings

Competitors: MASCOT [LK15], Triest-IMPR [DERU16]

Datasets:

Memory budget: 10% of the edges

Size of waiting room: 10% of the memory budget
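As a worked example of these two percentages, the sketch below splits a memory budget for a hypothetical graph with 1,000,000 edges. The edge count and the function name `memory_split` are illustrative assumptions, not figures from the experiments.

```python
def memory_split(num_edges, budget_percent=10, waiting_room_percent=10):
    """Return (budget, waiting room size, reservoir size) in edges.

    Integer arithmetic is used so the split is exact; the default
    percentages follow the settings listed on this slide.
    """
    budget = num_edges * budget_percent // 100
    waiting_room = budget * waiting_room_percent // 100
    return budget, waiting_room, budget - waiting_room


# Hypothetical graph with 1,000,000 edges.
budget, waiting_room, reservoir = memory_split(1_000_000)
```

Under these settings the waiting room is only 1% of the number of edges in the stream; the remaining 9% goes to the reservoir.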

 

Dataset types: citation, email, friendship

Slide25

Distribution of Estimates

Waiting Room Sampling (WRS) gives unbiased estimates with the smallest variance.

[Figure: distributions of estimates from WRS, Triest-IMPR, and MASCOT around the true count.]

Slide26

Discovering Probability

WRS increases discovering probability

WRS discovers up to

more triangles

 

[Figure: discovering probability of WRS, Triest-IMPR, and MASCOT.]

Slide27

Estimation Errors

WRS is the most accurate.

WRS reduces estimation error up to

 

Slide28

Roadmap

Introduction
Temporal Pattern
Proposed Algorithm
Experiments
Conclusion <<

Slide29

Contributions

Pattern: Temporal Locality. Triangles in real graph streams have short time intervals.

Algorithm: Waiting-Room Sampling (WRS). It exploits temporal locality for accurate triangle counting.

Analyses: Bias and Variance Analyses. WRS gives unbiased estimates with small variances.

Slide30

Thank you!

Code and datasets: https://github.com/kijungs/waiting_room

References:

[LK15] Yongsub Lim and U Kang. "Mascot: Memory-efficient and accurate sampling for counting local triangles in graph streams." KDD 2015.

[DERU16] Lorenzo De Stefani, Alessandro Epasto, Matteo Riondato, and Eli Upfal. "TRIÈST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size." KDD 2016.