Estimating Clique Composition and Size

Uploaded On 2017-09-28


Presentation Transcript

Slide1

Estimating Clique Composition and Size Distributions from Sampled Network Data

Minas Gjoka, Emily Smith, Carter T. Butts

University of California, Irvine

Slide2

Outline

Problem statement
Estimation methodology
Results with real-life graphs

Slide3

Cliques

A complete subgraph that contains i vertices is an order-i clique.

[Figure: example cliques of order 1, 2, 3, 4, 5 and, in general, order i]

A maximal clique is a clique that is not included in a larger clique.

Slide4

Cliques

A complete subgraph that contains i vertices is an order-i clique.

A maximal clique is a clique that is not included in a larger clique.

[Figure: an order-4 clique on vertices a, b, c, d, together with the 4 non-maximal order-3 cliques it contains: {a, b, d}, {b, c, d}, {a, b, c}, {a, c, d}]

Slide5

Counting of Cliques

[Figure: graph G with nodes 1 to 8, with example cliques of order 1, 2, 3 and 4]

C_i is the count of order-i cliques (maximal or non-maximal).

Clique Distribution of G

C = (C_1, C_2, C_3, C_4) = (0, 1, 2, 1)

Goal 1: Estimate C_i (for all i) in graph G from sampled network data.

Slide6
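To make the clique distribution C = (C_1, .., C_4) from the Counting of Cliques slide concrete, here is a minimal brute-force sketch. The edges of the slide's graph G are not recoverable from the transcript, so the toy graph below (a 4-clique on a, b, c, d plus a pendant node e) is a hypothetical stand-in, and the helper name `clique_counts` is ours. The sketch counts every clique, maximal or not.

```python
from itertools import combinations

def clique_counts(nodes, edges, max_order):
    """Return {i: number of order-i cliques} for i = 1..max_order.

    Counts all cliques (maximal or non-maximal) by brute force,
    which is only practical for very small graphs.
    """
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    counts = {}
    for i in range(1, max_order + 1):
        counts[i] = sum(
            1
            for combo in combinations(sorted(nodes), i)
            # a vertex set is a clique when every pair in it is adjacent
            if all(v in adj[u] for u, v in combinations(combo, 2))
        )
    return counts

# Hypothetical toy graph: a 4-clique on a, b, c, d plus a pendant node e.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "c"), ("a", "d"),
         ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]

print(clique_counts(nodes, edges, 4))  # {1: 5, 2: 7, 3: 4, 4: 1}
```

The 4 order-3 cliques in the output are the non-maximal triangles inside the single order-4 clique, as in the a, b, c, d example on the Cliques slide.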

Counting of Cliques: Vertex Attributes

[Figure: graph G with nodes 1 to 8, with vertex attributes and p = 3]

Vertex attribute vector X_j, j = 1..p, p <= N

Example composition vectors: u = [ 3 0 0 ], u = [ 2 1 0 ], u = [ 2 0 1 ]

Clique Composition Distribution of G

C_u is the count of order-u cliques.

Goal 2: Estimate C_u (for all u) in graph G from sampled network data.

Slide7

What type of cliques can we count?

Maximal cliques
Non-maximal cliques

Slide8

Motivation

Counting of cliques
  cliques describe local structure (clustering, cohesive subgroups)
  algorithmic implications of cliques in engineering contexts
  cliques used as input in network models
Sampled network data
  unknown graphs with access limitations
  massive known graphs

Slide9

Related Work

Model-based methods
  Do not scale
  Do not help with counting
Design-based methods
  Subgraph (or motif) counting tools that use sampling, e.g. MFinder, FANMOD, MODA
  No support for subgraphs of size larger than 10
  No support for vertex attributes
  Biased estimation

Slide10

Estimation

Slide11

Methodology

Collect an egocentric network sample H_1, .., H_n

Collect a probability sample of n nodes from the graph: V_j, X[V_j], j = 1..n
  uniform independence sampling
  weighted independence sampling
  link-trace sampling
  with replacement
  without replacement

Slide12

Methodology

Collect an egocentric network sample H_1, .., H_n

Collect a probability sample of n nodes from the graph: V_j, X[V_j], j = 1..n

[Figure: graph G(V,E) with nodes 1 to 8 and a sample of n = 2 nodes highlighted; target count C_3]

Slide13

Methodology

Collect an egocentric network sample H_1, .., H_n

Collect a probability sample of n nodes from the graph: V_j, X[V_j], j = 1..n

Fetch the egonet of each sampled node: G[V_j], j = 1..n

[Figure: graph G(V,E) with nodes 1 to 8 and the egonets of the n = 2 sampled nodes (node labels shown: 8, 6, 7, 3, 2, 5, 4); target count C_3]

Slide14

Methodology

Collect an egocentric network sample H_1, .., H_n

Collect a probability sample of n nodes from the graph: V_j, X[V_j], j = 1..n

Fetch the egonet of each sampled node: G[V_j], j = 1..n

Calculate the clique count C_i (or C_u) in each egonet H_j

[Figure: graph G(V,E) with nodes 1 to 8 and the egonets of the n = 2 sampled nodes (node labels shown: 8, 6, 7, 3, 2, 5, 4); target count C_3]

Slide15

Methodology

Collect an egocentric network sample H_1, .., H_n

Collect a probability sample of n nodes from the graph: V_j, X[V_j], j = 1..n

Fetch the egonet of each sampled node: G[V_j], j = 1..n

Calculate the clique count C_i (or C_u) in each egonet H_j
  can use existing exact clique counting algorithms
  clique type is determined by the counting algorithm

[Figure: graph G(V,E) with nodes 1 to 8 and the egonets of the n = 2 sampled nodes (node labels shown: 8, 6, 7, 3, 2, 5, 4); C_3 counts shown for the two egonets: 1 and 0]

Slide16

Methodology

Collect an egocentric network sample H_1, .., H_n

Collect a probability sample of n nodes from the graph: V_j, X[V_j], j = 1..n

Fetch the egonet of each sampled node: G[V_j], j = 1..n

Calculate the clique count C_i (or C_u) in each egonet H_j

Apply an estimation method that combines the calculations
  Clique Degree Sums (CDS)
  Distinct Clique Counting (CC)

[Figure: graph G(V,E) with nodes 1 to 8 and the egonets of the n = 2 sampled nodes (node labels shown: 8, 6, 7, 3, 2, 5, 4); C_3 counts shown for the two egonets: 1 and 0]

Slide17

Methodology

Collect an egocentric network sample H_1, .., H_n

Collect a probability sample of n nodes from the graph: V_j, X[V_j], j = 1..n

Fetch the egonet of each sampled node: G[V_j], j = 1..n

Calculate the clique count C_i (or C_u) in each egonet H_j

Apply an estimation method that combines the calculations
  Clique Degree Sums (CDS): labeling of neighbors not required, more space efficient
  Distinct Clique Counting (CC): higher accuracy

[Figure: graph G(V,E) with nodes 1 to 8 and the egonets of the n = 2 sampled nodes (node labels shown: 8, 6, 7, 3, 2, 5, 4); C_3 counts shown for the two egonets: 1 and 0]

Slide18
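The first three Methodology steps (sample nodes, fetch each egonet, count cliques per egonet) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the edge list `EDGES` is hypothetical (the edges of the slides' graph G are not recoverable from the transcript), the helper names are ours, and the sample is drawn uniformly without replacement. The final estimation step (CDS or CC) is not covered here.

```python
import random
from itertools import combinations

# Hypothetical 8-node edge list; NOT the actual graph G from the slides.
EDGES = [(6, 7), (6, 8), (7, 8), (4, 5), (4, 8), (5, 8),
         (1, 2), (1, 5), (2, 5), (2, 3)]

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def egonet(adj, ego):
    """Induced subgraph on the ego and its neighbors, as adjacency sets."""
    members = adj[ego] | {ego}
    return {v: adj[v] & members for v in members}

def count_cliques(sub_adj, order):
    """Brute-force count of order-`order` cliques (maximal or not)."""
    return sum(
        1
        for combo in combinations(sorted(sub_adj), order)
        if all(v in sub_adj[u] for u, v in combinations(combo, 2))
    )

adj = build_adjacency(EDGES)
rng = random.Random(0)                       # fixed seed for repeatability
sample = rng.sample(sorted(adj), 2)          # step 1: n = 2 nodes, uniform, without replacement
egonets = {v: egonet(adj, v) for v in sample}                   # step 2: fetch egonets
counts = {v: count_cliques(h, 3) for v, h in egonets.items()}   # step 3: C_3 per egonet
print(counts)
```

Any exact clique counting routine could replace `count_cliques`; brute force is used only to keep the sketch short.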

Labeling of neighbors

[Figure: graph G with nodes 1 to 9; target count C_3]

Slide19

Labeling of neighbors

[Figure: graph G with nodes 1 to 9 and a sample of n = 2 nodes, each collected as V_j, X[V_j], G[V_j]; target count C_3]

Slide20

Labeling of neighbors

Distinct Clique Counting (CC): labeled neighbors

[Figure: graph G with nodes 1 to 9 and the n = 2 sampled egonets drawn with labeled neighbors; step shown: calculate count C_3]

Slide21

Labeling of neighbors

Distinct Clique Counting (CC): labeled neighbors

Clique Degree Sums (CDS): unlabeled neighbors

[Figure: graph G with nodes 1 to 9 and the n = 2 sampled egonets drawn in two panels, "Unlabeled Neighbors" and "Labeled Neighbors"; step shown in each panel: calculate count C_3]

Slide22

Clique Degree Sums
unlabeled neighbors

The order-i clique degree d_ij is the number of i-cliques to which node j belongs.

Slide23

Clique Degree Sums
unlabeled neighbors

The order-i clique degree d_ij is the number of i-cliques to which node j belongs.

[Figure: graph G(V,E) and the egonet H_8 of node 8 (node labels shown: 8, 6, 7, 5, 4); for C_3, d_38 = 2]

Slide24

Clique Degree Sums
unlabeled neighbors

D_i is the order-i clique degree sum:

$D_i = \sum_{j \in V} d_{ij}$

The sum runs over all nodes, and d_ij is the number of i-cliques to which node j belongs.

Slide25

Clique Degree Sums
unlabeled neighbors

[Figure: graph G(V,E) with nodes 1 to 8; for C_3, d_38 = 2]

D_i is the order-i clique degree sum, taken over all nodes, where d_ij is the number of i-cliques to which node j belongs.

D_3 = d_31 + d_32 + d_33 + d_34 + d_35 + d_36 + d_37 + d_38
D_3 = 1 + 1 + 0 + 1 + 2 + 1 + 1 + 2
D_3 = 9
D_3 = 3 C_3

Slide26
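The identity D_3 = 3 C_3 from the preceding Clique Degree Sums slide holds because each order-3 clique is counted once by each of its 3 members. The sketch below checks it numerically. The edge list is hypothetical: it was chosen only so that the order-3 clique degrees match the values listed on the slide (1, 1, 0, 1, 2, 1, 1, 2), since the actual edges of G are not in the transcript. The function names are ours.

```python
from itertools import combinations

# Hypothetical edges, chosen only so the order-3 clique degrees match the slide.
EDGES = [(6, 7), (6, 8), (7, 8), (4, 5), (4, 8), (5, 8),
         (1, 2), (1, 5), (2, 5), (2, 3)]

def cliques_of_order(edges, order):
    """All order-`order` cliques (maximal or not), as sorted tuples."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return [
        combo
        for combo in combinations(sorted(adj), order)
        if all(v in adj[u] for u, v in combinations(combo, 2))
    ]

def clique_degrees(edges, order):
    """d_ij for every node j: the number of order-i cliques containing j."""
    cliques = cliques_of_order(edges, order)
    nodes = sorted({v for e in edges for v in e})
    return {j: sum(1 for c in cliques if j in c) for j in nodes}

d3 = clique_degrees(EDGES, 3)
D3 = sum(d3.values())                  # order-3 clique degree sum
C3 = len(cliques_of_order(EDGES, 3))   # true number of order-3 cliques

print(d3)      # {1: 1, 2: 1, 3: 0, 4: 1, 5: 2, 6: 1, 7: 1, 8: 2}
print(D3, C3)  # 9 3
```

The same argument gives D_i = i C_i for any order i, which is why an estimate of D_i can be turned into an estimate of C_i by dividing by i.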

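Clique Degree Sums
unlabeled neighbors

[Equations not preserved in the transcript. Surviving labels: a sum over all nodes of the number of i-cliques to which node j belongs, and a sum over sampled nodes that uses the node j inclusion probability.]

The sample-based estimate is a design-unbiased Horvitz-Thompson estimator.

Slide27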
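The equations on the preceding Clique Degree Sums slide did not survive extraction. Based on its labels (all nodes, sampled nodes, node j inclusion probability) and on D_3 = 3 C_3 from the earlier slide, a plausible reconstruction in the standard Horvitz-Thompson form is given below. The symbols S (the set of distinct sampled nodes) and \pi_j (the probability that node j is included in S) are notation assumed here, not taken from the slides.

```latex
D_i = \sum_{j \in V} d_{ij},
\qquad
\hat{D}_i = \sum_{j \in S} \frac{d_{ij}}{\pi_j},
\qquad
\hat{C}_i = \frac{\hat{D}_i}{i}
```

Design-unbiasedness follows from linearity of expectation: node j contributes d_ij / \pi_j with probability \pi_j, so E[\hat{D}_i] = D_i whenever every \pi_j is positive, and therefore E[\hat{C}_i] = D_i / i = C_i.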

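Clique Degree Sums
unlabeled neighbors

[Equations not preserved in the transcript. Surviving labels: a sum over all nodes, a sum over sampled nodes, the node j inclusion probability, the number of i-cliques to which node j belongs, and the number of u-cliques to which node j belongs.]

The sample-based estimate is a design-unbiased Horvitz-Thompson estimator.

Slide28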
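As a small numerical check of the Clique Degree Sums idea, the sketch below applies a Horvitz-Thompson style estimate to the order-3 clique degrees listed on the earlier D_3 slide. It assumes uniform independence sampling without replacement, so every node has inclusion probability n/N. The function name `cds_estimate` and the choice n = 3 are ours. Averaging the estimate over every possible sample recovers the true C_3 = 3, which is what design-unbiasedness means.

```python
from fractions import Fraction
from itertools import combinations

# Order-3 clique degrees d_3j for nodes 1..8, as listed on the D_3 slide.
D3_DEGREES = {1: 1, 2: 1, 3: 0, 4: 1, 5: 2, 6: 1, 7: 1, 8: 2}

def cds_estimate(sampled_degrees, inclusion_probs, order):
    """Estimate C_i as (sum of d_ij / pi_j over sampled nodes) / i."""
    d_hat = sum(Fraction(d) / p for d, p in zip(sampled_degrees, inclusion_probs))
    return d_hat / order

N, n = 8, 3
pi = Fraction(n, N)  # inclusion probability under uniform sampling without replacement

# Enumerate all C(8, 3) = 56 equally likely samples and average the estimates.
estimates = [
    cds_estimate([D3_DEGREES[j] for j in s], [pi] * n, order=3)
    for s in combinations(D3_DEGREES, n)
]
mean_estimate = sum(estimates) / len(estimates)
print(mean_estimate)  # 3
```

Only the clique degree of each sampled node is needed, not the identities of its neighbors, which is consistent with the slides' remark that this method works with unlabeled neighbors.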

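Clique Degree Sums
Estimator Variance

We can use Horvitz-Thompson theory to derive unbiased estimators of the variance of the two estimators (their symbols are not preserved in the transcript).

[Equation not preserved in the transcript. Surviving labels: node inclusion probability, joint node inclusion probability.]

Slide29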

Clique Degree Sums
Estimator Variance

We can use Horvitz-Thompson theory to derive unbiased estimators of the variance of the two estimators (their symbols are not preserved in the transcript).

Sampling designs listed:
  Uniform Independence Sampling
  Weighted Independence Sampling
  Link-trace Sampling
  Without replacement
  With replacement

Slide30

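Clique Degree Sums
Estimator Variance

We can use Horvitz-Thompson theory to derive unbiased estimators of the variance of the two estimators (their symbols are not preserved in the transcript).

Case shown: Uniform Independence Sampling, without replacement

[Equation not preserved in the transcript. Surviving labels: joint node inclusion probability, node inclusion probability, all nodes, sampled nodes.]

Slide31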
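The variance expressions on the preceding Estimator Variance slides were lost in extraction. For orientation only, the textbook Horvitz-Thompson variance estimator for a total, written for the clique degree sum, is shown below. It is the standard form and not necessarily the exact expression on the slide. The symbols S (sampled nodes), \pi_j (node inclusion probability) and \pi_{jk} (joint inclusion probability of nodes j and k) are assumed notation.

```latex
\widehat{\operatorname{Var}}\bigl(\hat{D}_i\bigr)
  = \sum_{j \in S} \frac{1 - \pi_j}{\pi_j^{2}}\, d_{ij}^{2}
  + \sum_{\substack{j, k \in S \\ j \neq k}}
      \frac{\pi_{jk} - \pi_j \pi_k}{\pi_{jk}\, \pi_j\, \pi_k}\, d_{ij}\, d_{ik},
\qquad
\widehat{\operatorname{Var}}\bigl(\hat{C}_i\bigr)
  = \frac{1}{i^{2}}\, \widehat{\operatorname{Var}}\bigl(\hat{D}_i\bigr)

% Uniform independence sampling without replacement, n of N nodes:
\pi_j = \frac{n}{N},
\qquad
\pi_{jk} = \frac{n(n-1)}{N(N-1)}
```

This estimator requires every joint inclusion probability to be positive, which under uniform sampling without replacement means n >= 2.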

Distinct Clique Counting
labeled neighbors

[Equation not preserved in the transcript. Surviving labels: the number of distinct i-cliques in H_1, .., H_n, and the i-clique inclusion probability.]

The resulting estimate is a design-unbiased Horvitz-Thompson estimator.

Sampling designs listed:
  Uniform Independence Sampling
  Weighted Independence Sampling
  Link-trace Sampling
  With replacement
  Without replacement

Slide32

Distinct Clique Counting
labeled neighbors

[Equation not preserved in the transcript. Surviving labels: the number of distinct i-cliques in H_1, .., H_n, and the i-clique inclusion probability.]

Case shown: Uniform Independence Sampling, with replacement

The resulting estimate is a design-unbiased Horvitz-Thompson estimator.

Slide33

Distinct Clique Counting
labeled neighbors

[Figure: graph G with N = 8 nodes (1 to 8) and cliques labeled a, b, c; sample of n = 4 nodes drawn by UIS with replacement; target count C_3]

Slide34

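Distinct Clique Counting
labeled neighbors

[Figure: graph G with N = 8 nodes (1 to 8) and cliques labeled a, b, c; sample of n = 4 nodes drawn by UIS with replacement; target count C_3. Two panels: "Observed order-3 cliques", in which the cliques on nodes {6, 7, 8} and {1, 2, 5} appear more than once, and "Distinct order-3 cliques", in which each appears once.]

Slide35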
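The Distinct Clique Counting idea from the preceding slides can be sketched for uniform independence sampling with replacement. Two assumptions are made here and are not confirmed by the transcript: a clique counts as observed only when one of its own vertices is a sampled ego, and the inclusion probability of an order-i clique is therefore 1 - (1 - i/N)^n. The edge list is hypothetical and the function names are ours. Averaging the estimate over all N^n equally likely samples recovers the true count.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical 8-node edge list with three order-3 cliques; NOT the slides' graph.
EDGES = [(6, 7), (6, 8), (7, 8), (4, 5), (4, 8), (5, 8),
         (1, 2), (1, 5), (2, 5), (2, 3)]

def build_adjacency(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def ego_cliques(adj, ego, order):
    """Order-`order` cliques that contain `ego`, as frozensets of labels."""
    return {
        frozenset((ego,) + rest)
        for rest in combinations(sorted(adj[ego]), order - 1)
        if all(v in adj[u] for u, v in combinations(rest, 2))
    }

def dcc_estimate(adj, egos, order, N):
    """Distinct observed cliques divided by the clique inclusion probability."""
    seen = set()
    for ego in egos:
        seen |= ego_cliques(adj, ego, order)  # labels let us drop duplicates
    # P(at least one of the clique's `order` nodes is drawn in len(egos) draws)
    pi = 1 - (1 - Fraction(order, N)) ** len(egos)
    return len(seen) / pi

adj = build_adjacency(EDGES)
N, n = 8, 4

# Exhaustive check over all 8**4 = 4096 equally likely ordered samples.
samples = list(product(sorted(adj), repeat=n))
mean_estimate = sum(dcc_estimate(adj, s, 3, N) for s in samples) / len(samples)
print(mean_estimate)  # 3
```

Deduplication relies on comparing node labels across egonets, which is why this method needs labeled neighbors and stores the set of cliques seen so far.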

Computational complexity

Space complexity to count C_i or C_u
  O(1) for the Clique Degree Sums method
  O(c_i) or O(c_u) for the Distinct Clique Counting method
Time complexity
  from O(3^(N/3)) to O(n * 3^(D/3)), where N is the graph size, D is the maximum degree, and n is the sample size
  from O(n * 3^(D/3)) to O(3^(D/3)) via parallel computations per egonet

Slide36

Benefits of our methodology

Full knowledge of graph not required
Fast estimation for massive known graphs
Estimation or exact computation easily parallelizable for massive known graphs
Estimation with or without neighbor labels
Supports vertex attributes
Supports a variety of sampling designs

Slide37

Results

Slide38

Simulation Results

Slide39

Simulation Results
Facebook New Orleans

Egonet sample size n = 1,000
Uniform independence sampling, without replacement
1000 simulations

[Figure panels: Clique Degree Sums; Distinct Clique Counting]

Slide40

Simulation Results

Error metric: Normalized Mean Absolute Error [formula not preserved in the transcript]

1000 simulations

[Figure panels: Distinct Clique Counting; Clique Degree Sums]

Slide41

Simulation Results

[Figure panels: Distinct Clique Counting; Clique Degree Sums]

Slide42

Which estimation method to use?

Heuristic

Average Edge Count = (all edges between egos and neighbors) / (unique edges between egos and neighbors)

[Figure: graph G with N = 8 nodes (1 to 8) and cliques labeled a, b, c; sample of n = 3 egos]

Average Edge Count = 9 / 6 = 1.5

Slide43

Which estimation method to use?

Heuristic

[Figure labels: Average Edge Count; Clique Degree Sums Error; Distinct Clique Counting Error]

Slide44

Estimation Results
Facebook ‘09

Facebook ‘09 crawled dataset [1]
36,628 unique egonets

[1] M. Gjoka, M. Kurant, C. T. Butts and A. Markopoulou, “Walking in Facebook: A Case Study of Unbiased Sampling of OSNs”, IEEE INFOCOM 2010.

Slide45

Estimation Results
vertex attributes, Facebook ‘09

Complemented dataset with gender attributes
about 6 million users

Slide46

References

[1] M. Gjoka, E. Smith, C. T. Butts, “Estimating Clique Composition and Size Distributions from Sampled Network Data”, IEEE NetSciCom '14.

[2] Facebook datasets: http://odysseas.calit2.uci.edu/research/osn.html

[3] Python code for Clique Estimators: http://tinyurl.com/clique-estimators

Thank you!

Unbiased estimation methods of clique distributions
  Clique Degree Sums
  Distinct Clique Counting
Facebook cliques
Future work
  support estimation of any subgraphs (beyond cliques)