Jure Leskovec (CMU); Kevin Lang, Anirban Dasgupta and Michael Mahoney (Yahoo! Research)
Slide1
Statistical properties of network community structure
Jure Leskovec, CMU
Kevin Lang, Anirban Dasgupta and Michael Mahoney, Yahoo! Research
Slide2
Network communities
Communities: sets of nodes with lots of connections inside and few to outside (the rest of the network)
Assumption: networks are (hierarchically) composed of communities (modules)
Communities, clusters, groups, modules
Our question:
Are large networks really like this?
Slide3
Community score (quality)
How community-like is a set of nodes?
Need a natural intuitive measure
Conductance (normalized cut)
Φ(S) = # edges cut / # edges inside
Small Φ(S) corresponds to more community-like sets of nodes
[Figure: node sets S and S’]
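The score on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it follows the slide's simplified form (# edges cut / # edges inside), whereas the usual textbook conductance normalizes by the sum of degrees in S. The helper names and the toy graph (two triangles joined by one edge) are assumptions made for the example.

```python
def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def community_score(adj, nodes):
    """Slide score: (# edges leaving the set) / (# edges inside the set)."""
    S = set(nodes)
    cut = 0
    inside_twice = 0  # every inside edge is seen from both endpoints
    for u in S:
        for v in adj[u]:
            if v in S:
                inside_twice += 1
            else:
                cut += 1
    inside = inside_twice // 2
    if inside == 0:
        return float("inf")  # no internal edges: treat as the worst score
    return cut / inside


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
EDGES = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
ADJ = build_adj(EDGES)

if __name__ == "__main__":
    print(community_score(ADJ, {0, 1, 2}))  # one cut edge, three inside edges
    print(community_score(ADJ, {0, 1}))     # two cut edges, one inside edge
```

A lower value means the set is more community-like, so in this toy graph the triangle scores better than the pair.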
Slide4
Community score (quality)
Score: Φ(S) = # edges cut / # edges inside
What is the “best” community of 5 nodes?
Slide5
Community score (quality)
Score: Φ(S) = # edges cut / # edges inside
Bad community: Φ = 5/6 = 0.83
What is the “best” community of 5 nodes?
Slide6
Community score (quality)
Score: Φ(S) = # edges cut / # edges inside
Better community: Φ = 5/7 = 0.7
Bad community: Φ = 2/5 = 0.4
What is the “best” community of 5 nodes?
Slide7
Community score (quality)
Score: Φ(S) = # edges cut / # edges inside
Better community: Φ = 5/7 = 0.7
Bad community: Φ = 2/5 = 0.4
Best community: Φ = 2/8 = 0.25
What is the “best” community of 5 nodes?
Slide8
Network Community Profile Plot
We define: Network community profile (NCP) plot
Plot the score of the best community of size k
Search over all subsets of size k and find the best: Φ(k=5) = 0.25
Computing the NCP plot exactly is intractable
Use approximation algorithms
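The definition above can be made concrete with an exhaustive search on a tiny graph. The sketch below is illustrative only: it enumerates every subset of size k, which is exactly the step that becomes intractable on real networks, and it uses the slide's simplified score. The function names and the toy graph are assumptions, and this is not the approximation algorithm the authors used.

```python
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def score(adj, nodes):
    """Slide score: (# edges cut) / (# edges inside); inf if no inside edges."""
    S = set(nodes)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    inside = sum(1 for u in S for v in adj[u] if v in S) // 2
    return cut / inside if inside else float("inf")


def ncp_brute_force(adj, max_k):
    """Best (lowest) score over all node subsets of each size k = 1..max_k."""
    nodes = sorted(adj)
    return {
        k: min(score(adj, subset) for subset in combinations(nodes, k))
        for k in range(1, max_k + 1)
    }


# Hypothetical toy graph: two triangles joined by a single edge.
ADJ = build_adj([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)])

if __name__ == "__main__":
    for k, phi in ncp_brute_force(ADJ, 3).items():
        print(k, phi)
```

The number of subsets grows combinatorially with k, so this is only usable on graphs with a handful of nodes.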
Slide9
NCP plot: Small social network
Dolphin social network
Two communities of dolphins
[Figure panels: NCP plot; Network]
Slide10
NCP plot: Zachary’s karate club
Zachary’s university karate club social network
During the study the club split into 2
The split (squares vs. circles) corresponds to cut B
[Figure panels: NCP plot; Network]
Slide11
NCP plot: Network Science
Collaborations between scientists in network science
[Figure panels: NCP plot; Network]
Slide12
Geometric and Hierarchical graphs
[Figure panels: Hierarchical network; Geometric (grid-like) network]
Small social networks, geometric networks and hierarchical networks all have a downward-sloping NCP plot
Slide13
Our work: Large networks
Previously, researchers examined the community structure of small networks (~100 nodes)
We examined more than 70 different large social and information networks
Large real-world networks look very different!
Slide14
Example of our findings
Typical example:
General relativity collaboration network (4,158 nodes, 13,422 edges)
Slide15
NCP: LiveJournal (N=5M, E=42M)
[Figure: NCP plot, community score vs. community size]
Better and better communities
Best communities get worse and worse
Best community has 100 nodes
Slide16
Explanation: Downward part
Whiskers are responsible for the downward slope of the NCP plot
A whisker is a set of nodes connected to the rest of the network by a single edge
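One way to find such sets on a small graph is to test each edge for being a bridge and take the side that falls off. The sketch below is a simple quadratic-time illustration under stated assumptions: the graph is connected, the smaller side of each bridge is treated as the whisker, and only maximal sets are kept. The function names and the toy graph are hypothetical, and this is not necessarily how the authors extracted whiskers.

```python
from collections import deque


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def reachable(adj, start, skipped_edge):
    """Nodes reachable from start when one undirected edge is ignored."""
    a, b = skipped_edge
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if (u, v) == (a, b) or (u, v) == (b, a):
                continue
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def whiskers(adj):
    """Maximal node sets hanging off a connected graph by a single edge."""
    all_nodes = set(adj)
    candidates = set()
    done = set()
    for u in adj:
        for v in adj[u]:
            if (v, u) in done:
                continue  # undirected edge already handled
            done.add((u, v))
            side = reachable(adj, u, (u, v))
            if v in side:
                continue  # removing the edge does not disconnect: not a bridge
            other = all_nodes - side
            # Assumption: the smaller side is the whisker.
            small = side if len(side) <= len(other) else other
            candidates.add(frozenset(small))
    # Drop candidates that are strictly contained in a larger candidate.
    return [set(c) for c in candidates if not any(c < d for d in candidates)]


# Hypothetical toy graph: a 4-clique core {0,1,2,3} with a small tree
# {4,5,6} attached through the single edge (3,4).
ADJ = build_adj([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
                 (3, 4), (4, 5), (4, 6)])

if __name__ == "__main__":
    print(whiskers(ADJ))
```

For large graphs a linear-time bridge-finding algorithm would be the natural replacement for the per-edge search used here.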
[Figure panels: NCP plot; Largest whisker]
Slide17
Explanation: Upward part
Each new edge inside the community costs more
Each node has twice as many children
Φ = 1/3 = 0.33
Φ = 2/4 = 0.5
Φ = 8/6 = 1.3
Φ = 64/14 = 4.6
[Figure: NCP plot]
Slide18
Suggested network structure
Network structure: Core-periphery (jellyfish, octopus)
Whiskers are responsible for good communities
Denser and denser core of the network
Core contains 60% of the nodes and 80% of the edges
Slide19
Caveat: Bag of whiskers
What if we allow cuts that give disconnected communities?
Cut all whiskers
Compose communities out of whiskers
How good a “community” do we get?
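The arithmetic behind a bag of whiskers is simple under the slide's score: each whisker contributes exactly one cut edge, so a union of m whiskers has m cut edges and the sum of their inside edges. The sketch below assumes the whiskers are disjoint and attach to the core rather than to each other; the function name and the example whisker sizes are hypothetical.

```python
def bag_of_whiskers_score(whiskers):
    """Size and slide score of a union of disjoint whiskers.

    whiskers: list of (n_nodes, n_inside_edges) pairs. Each whisker is
    assumed to be attached to the core by exactly one edge.
    """
    size = sum(n for n, _ in whiskers)
    cut = len(whiskers)  # one cut edge per whisker
    inside = sum(e for _, e in whiskers)
    phi = cut / inside if inside else float("inf")
    return size, phi


if __name__ == "__main__":
    # Hypothetical whiskers: 4 nodes / 5 edges and a triangle (3 nodes / 3 edges).
    print(bag_of_whiskers_score([(4, 5)]))
    print(bag_of_whiskers_score([(4, 5), (3, 3)]))
```

The combined set is disconnected, yet its score can still be low, which is why such unions compete with connected communities of the same size.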
Slide20
Communities made of whiskers
[Figure: NCP plot, community score vs. community size; curves: Connected communities, Bag of whiskers]
We get better community scores when composing disconnected sets of whiskers
Slide21
Comparison to a rewired network
Take a real network G
Rewire edges for a long time
We obtain a random graph with the same degree distribution as the real network G
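A common way to do this kind of rewiring is the double edge swap: pick two edges (a, b) and (c, d), and replace them with (a, d) and (c, b) when that creates no self-loop or duplicate edge. The slides do not say which procedure was used, so the sketch below is an assumption; the function names, the swap count and the toy graph are illustrative.

```python
import random
from collections import Counter


def rewire(edges, n_swaps, seed=0):
    """Degree-preserving rewiring of a simple undirected graph."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # give up eventually on tiny or dense graphs
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # try both orientations of the second edge
        if a == d or c == b:
            continue  # swap would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present or new1 == new2:
            continue  # swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def degrees(edges):
    """Degree of every node as a Counter."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


# Hypothetical toy graph: an 8-cycle with two chords.
EDGES = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4), (2, 6)]

if __name__ == "__main__":
    rewired = rewire(EDGES, n_swaps=20)
    print(degrees(rewired) == degrees(EDGES))
```

Every accepted swap leaves each node's degree unchanged, so the rewired graph keeps the degree distribution while losing most other structure.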
Slide22
Comparison to a rewired network
Rewired network: random network with the same degree distribution
Slide23
What is a good model?
What is a good model that explains such network structure?
None of the existing models work (NCP plot shape in parentheses):
Pref. attachment (Flat)
Small World (Down and Flat)
Geometric Pref. Attachment (Flat and Down)
Slide24
Forest Fire model works
Forest Fire: connections spread like a fire
New node joins the network
Selects a seed node
Connects to some of its neighbors
Continue recursively
As a community grows, it blends into the core of the network
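The steps above can be sketched as a small generator. This is a simplified, undirected variant written for illustration: each unvisited neighbor of a burning node catches fire independently with probability p, which is an assumption and differs from the full Forest Fire model's forward and backward burning parameters. The function name and default values are hypothetical.

```python
import random


def forest_fire(n, p=0.3, seed=0):
    """Grow an undirected graph of n nodes with a simplified forest-fire rule."""
    rng = random.Random(seed)
    adj = {0: set()}
    for new in range(1, n):
        seed_node = rng.randrange(new)  # new node selects a seed node
        burned = {seed_node}
        frontier = [seed_node]
        while frontier:  # the fire spreads recursively from burned nodes
            u = frontier.pop()
            for v in sorted(adj[u]):
                if v not in burned and rng.random() < p:
                    burned.add(v)
                    frontier.append(v)
        adj[new] = set()
        for v in burned:  # connect the new node to everything that burned
            adj[new].add(v)
            adj[v].add(new)
    return adj


if __name__ == "__main__":
    g = forest_fire(200, p=0.3, seed=42)
    n_edges = sum(len(nbrs) for nbrs in g.values()) // 2
    print(len(g), n_edges)
```

Raising p makes fires spread further, so new nodes attach to more of the existing graph and the result becomes denser.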
Slide25
Forest Fire NCP plot
[Figure: NCP plot; curves: rewired network, Bag of whiskers]
Slide26
Conclusion and connections
Whiskers:
Largest whisker has ~100 nodes
Independent of network size
Dunbar number: a person can maintain social relationships with at most 150 people
Bond vs. identity communities
Core:
Core has little structure (hard to cut)
Still more structure than the random network
Slide27
Conclusion and connections
NCP plot is a way to analyze network community structure
Our results agree with previous work on small networks (that are commonly used for testing community finding algorithms)
But large networks are different
Large networks: Whiskers + Core structure
Small well isolated communities blend into the core of the networks as they grow