Measuring and Analyzing Networks
Scott Kirkpatrick
Hebrew University of Jerusalem
April 12, 2011
Sources of data
Communications networks
Web links: URLs contained within surface pages
Internet physical network
Telephone CDRs (call detail records)
Social networks
Links through common activity
Movie actors, scientists publishing together
Opt-in networking in Facebook et al.
Properties to be considered
“3 degrees of separation” and small world effects.
Robustness/fragility of communications
Percolation under various modeled attacks
Spread of information, disease, etc.
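Percolation under attack can be illustrated with a small experiment: delete a fraction of the nodes and measure how much of the network still hangs together. This is a minimal sketch under stated assumptions, not the analysis behind these slides: the graph is a dict mapping each node to a set of neighbors, "targeted" means removing nodes in order of their initial degree (a static ranking; recomputing degrees after each removal is a common alternative), and the names `largest_component_size` and `attack` are hypothetical.

```python
import random
from collections import deque

def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component once the nodes in `removed` are deleted."""
    seen = set(removed)  # removed nodes are treated as already visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

def attack(adj, fraction, targeted, seed=0):
    """Delete a fraction of the nodes and return the giant-component fraction that survives.
    targeted=True removes the highest initial-degree nodes first; otherwise removal is random."""
    n_remove = int(fraction * len(adj))
    if targeted:
        order = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
    else:
        order = random.Random(seed).sample(list(adj), len(adj))
    removed = set(order[:n_remove])
    return largest_component_size(adj, removed) / len(adj)

# Example: a star with hub 0 and five leaves falls apart once the hub is targeted.
star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
print(attack(star, 0.2, targeted=True))   # hub removed: the largest component is one leaf
print(attack(star, 0.2, targeted=False))  # one randomly chosen node removed
```

Sweeping `fraction` from 0 to 1 for both modes gives the familiar contrast between robustness to random failure and fragility under targeted attack in hub-dominated graphs.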
Aggregates and Attributes
Degree distribution, betweenness distribution
Two-point distributions
Degree-degree correlations: “assortative” or “disassortative”
Cluster coefficient and triangle counting
Is the friend of my friend also my friend?
Variations on betweenness (not in the literature, but an attractive option)
Mark Newman’s SIAM Review paper: a great reference, but dated.
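The aggregates listed above can be computed directly from an adjacency structure. This is a minimal sketch, assuming a simple undirected graph with at least one edge, stored as a dict of neighbor sets; the helper names are hypothetical. The assortativity value is the Pearson correlation of the degrees at the two ends of each edge, in the spirit of Newman's coefficient r, and the local clustering coefficient is the fraction of a node's neighbor pairs that are themselves linked, which is exactly the "friend of my friend" question.

```python
from collections import Counter
from itertools import combinations

def degree_distribution(adj):
    """Map each degree k to the number of nodes with degree k."""
    return Counter(len(nbrs) for nbrs in adj.values())

def degree_assortativity(adj):
    """Pearson correlation of end-point degrees; each undirected edge is counted in
    both directions, so r > 0 is assortative and r < 0 is disassortative."""
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    n = len(xs)
    mean = sum(xs) / n  # same as the mean of ys, because both directions are included
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # also equals the variance of ys
    return cov / var if var > 0 else float("nan")  # undefined for regular graphs

def triangles_and_clustering(adj):
    """Per-node triangle counts and local clustering coefficients."""
    tri, clust = {}, {}
    for u, nbrs in adj.items():
        t = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        k = len(nbrs)
        tri[u] = t
        clust[u] = 2 * t / (k * (k - 1)) if k > 1 else 0.0
    return tri, clust

star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(degree_assortativity(star))  # -1.0: every edge joins the hub to a leaf
```

Each triangle is counted once at each of its three corners, so the total number of triangles is `sum(tri.values()) // 3`.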
K-Cores, Shells, Crusts and all that…
The K-core is almost as fundamental a graph property as the “giant component”:
Bollobás (1984) defined the K-core: the maximal subgraph in which all nodes have K or more edges. Corollaries: it is unique; with high probability it is K-connected; when it exists, it has size O(N).
Pittel, Spencer, and Wormald (1996) showed how to calculate its size and threshold.
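One standard way to find every K-core at once is to peel the graph: repeatedly delete a node of minimum remaining degree, and record the largest minimum degree seen so far as that node's core number. The K-core is then the set of nodes with core number at least K. A minimal sketch, assuming a dict-of-sets graph whose node labels are mutually comparable (they break ties inside the heap); the function names are hypothetical, and a bucket-based variant of the same peeling runs in time linear in the number of edges.

```python
import heapq

def core_numbers(adj):
    """Core number of every node, found by repeatedly removing a node of minimum degree."""
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    heap = [(d, u) for u, d in degree.items()]
    heapq.heapify(heap)
    removed, core, k = set(), {}, 0
    while heap:
        d, u = heapq.heappop(heap)
        if u in removed or d != degree[u]:
            continue  # stale heap entry left over from an earlier degree decrease
        k = max(k, d)  # the peeling level never goes down
        core[u] = k
        removed.add(u)
        for v in adj[u]:
            if v not in removed:
                degree[v] -= 1
                heapq.heappush(heap, (degree[v], v))
    return core

def k_core(core, K):
    """Nodes of the K-core: all nodes with core number >= K (empty if K is too large)."""
    return {u for u, c in core.items() if c >= K}

# Example: a triangle 0-1-2 with a pendant node 3 attached to node 2.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
core = core_numbers(g)
print(core)             # node 3 has core number 1, the triangle nodes have 2
print(k_core(core, 2))  # {0, 1, 2}
```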
K-Cores, Shells, Crusts and all that…
K-shell: All sites in the K-core but not in the (K+1)-core.
Nucleus: the non-vanishing core with largest K
K-crust: union of shells 1, …, K-1; equivalently, all sites outside the K-core.
A natural application is the analysis of networks: it replaces some ambiguous definitions with uniquely specified objects.
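Given the core number of each node (for example from a peeling procedure), shells, the nucleus and crusts follow directly from the definitions above. A minimal sketch with hypothetical function names; nodes of degree zero get core number 0, and this version counts them in the crust under the "all sites outside the K-core" reading.

```python
from collections import defaultdict

def shells(core):
    """K-shell for every K: nodes whose core number is exactly K."""
    out = defaultdict(set)
    for u, k in core.items():
        out[k].add(u)
    return dict(out)

def nucleus(core):
    """The non-vanishing core with the largest K, returned as (kmax, nodes)."""
    kmax = max(core.values())
    return kmax, {u for u, k in core.items() if k == kmax}

def crust(core, K):
    """K-crust: every node outside the K-core, i.e. in the shells below K."""
    return {u for u, k in core.items() if k < K}

# Core numbers for a triangle 0-1-2 with a pendant node 3 attached to node 2.
core = {0: 2, 1: 2, 2: 2, 3: 1}
print(shells(core))    # shell 2 holds the triangle, shell 1 the pendant
print(nucleus(core))   # (2, {0, 1, 2})
print(crust(core, 2))  # {3}
```

Because no node has a core number above kmax, the kmax-shell and the kmax-core coincide, which is why the nucleus is a single set comprehension.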
Faloutsos’ Jellyfish (Internet model)
Define the core in some way (“Tier 0”)
Layers built breadth-first around the core form the “mantle”, and the edge sites are the tendrils.
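The breadth-first layering around a chosen core can be sketched as a multi-source BFS. Which nodes form the core ("Tier 0") is an input here, an assumption rather than part of the Jellyfish definition; layer d holds the nodes that are d hops from the core, and nodes unreachable from it are left out. The function name is hypothetical.

```python
from collections import deque

def bfs_layers(adj, core_nodes):
    """Multi-source BFS: layer 0 is the core, layer d holds nodes at hop distance d from it."""
    dist = {u: 0 for u in core_nodes}
    queue = deque(core_nodes)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    layers = {}
    for u, d in dist.items():
        layers.setdefault(d, set()).add(u)
    return layers

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(bfs_layers(path, {0}))  # {0: {0}, 1: {1}, 2: {2}, 3: {3}}
```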
K-cores of a Barabási-like random network
The L,M model gives non-trivial K-shell structure (Shalit, Solomon, SK, 2000).
At each step in the construction, a new node makes L links to existing nodes, with probability proportional to their number of neighbors.
Then we add M links between existing nodes, also with preferential attachment.
Results for L = 1, M = 1, 2, 4, 8 (next slide) give lovely power laws (Rome conference on complex systems, 2000).
The nucleus is just the endpoint.
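The L,M construction can be sketched as follows. Details the slide leaves open are assumptions here: growth starts from a single edge, the L targets of a new node are distinct, both ends of each extra link are chosen preferentially, and self-loops or duplicate links are rejected and redrawn (giving up after 100 tries). Preferential choice uses a list in which each node appears once per incident edge, so a uniform pick from that list selects a node with probability proportional to its degree. The function name `lm_model` is hypothetical.

```python
import random

def lm_model(n, L, M, seed=0):
    """Grow an undirected graph (dict of neighbor sets) to n nodes with the L,M rule."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed seed graph: a single edge
    ends = [0, 1]           # one entry per edge end, so uniform picks are degree-weighted

    def add_edge(u, v):
        adj[u].add(v)
        adj[v].add(u)
        ends.extend((u, v))

    for new in range(2, n):
        # L distinct preferential targets among the existing nodes
        targets = set()
        while len(targets) < min(L, len(adj)):
            targets.add(rng.choice(ends))
        adj[new] = set()
        for t in targets:
            add_edge(new, t)
        # M extra links between existing nodes, both ends chosen preferentially
        for _ in range(M):
            for _attempt in range(100):
                u, v = rng.choice(ends), rng.choice(ends)
                if u != v and v not in adj[u]:
                    add_edge(u, v)
                    break
    return adj

g = lm_model(1000, L=1, M=2, seed=42)
```

Feeding the result to a core decomposition and histogramming shell sizes is how one would look for the power laws described above.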
Results: L,M models’ K-cores
Next apply to the real Internet
DIMES data used at the AS level (Shir, Shavitt, SK, Carmi, Havlin, Li)
2004 to the present day, with a relatively consistent experimental methodology
K-shell plots show power laws, with two surprises:
The nucleus is striking and different from the mantle of this “Medusa”
Percolation analysis identifies the tendrils as a subset connected only to the nucleus.
Does the degree of a site relate to its k-shell?
Distances and diameters in cores
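Distances and diameters inside a core amount to breadth-first searches restricted to the core's nodes. A minimal sketch, assuming the core is supplied as a node set (for instance, all nodes with core number at least K); only reachable pairs enter the average, and the function names are hypothetical. Running a BFS from every node costs O(nm), which is fine for a nucleus but expensive for the full graph.

```python
from collections import deque

def induced_distances(adj, nodes, source):
    """BFS hop distances from `source`, using only edges between members of `nodes`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in nodes and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter_and_mean_distance(adj, nodes):
    """Diameter and mean hop distance over reachable ordered pairs in the induced subgraph."""
    nodes = set(nodes)
    diam, total, pairs = 0, 0, 0
    for s in nodes:
        d = induced_distances(adj, nodes, s)
        diam = max(diam, max(d.values()))
        total += sum(d.values())
        pairs += len(d) - 1
    return diam, (total / pairs if pairs else 0.0)

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(diameter_and_mean_distance(path, path.keys()))  # diameter 3 on a 4-node path
```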
K-crusts show a percolation threshold
Data from 01.04.2005
These are the hanging tentacles of our (Red Sea) jellyfish
For subsequent analysis, we distinguish three components:
Core, Connected, Isolated
Largest cluster in each shell
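The crust percolation curve can be sketched by growing the K-crust one shell at a time and tracking the size of its largest connected cluster; a jump in that size as K increases is the kind of percolation signature the slide refers to. This assumes core numbers are already available as a dict, and the function names are hypothetical.

```python
from collections import deque

def largest_cluster(adj, nodes):
    """Largest connected component (a set) of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen, best = set(), set()
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        comp, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        if len(comp) > len(best):
            best = comp
    return best

def crust_percolation(adj, core):
    """For K = 2..kmax, return (size of the K-crust, size of its largest cluster)."""
    kmax = max(core.values())
    out = {}
    for K in range(2, kmax + 1):
        crust = {u for u, k in core.items() if k < K}
        out[K] = (len(crust), len(largest_cluster(adj, crust)))
    return out

# Triangle nucleus 0-1-2; chain 0-3-4 and a single pendant 5 on node 1.
med = {0: {1, 2, 3}, 1: {0, 2, 5}, 2: {0, 1}, 3: {0, 4}, 4: {3}, 5: {1}}
core = {0: 2, 1: 2, 2: 2, 3: 1, 4: 1, 5: 1}
print(crust_percolation(med, core))  # {2: (3, 2)}
```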
Meduza (Hebrew מדוזה, “jellyfish”) model
This picture has been stable from January 2005 (kmax = 30) to the present day, with little change in the nucleus composition. The precise definition of the tendrils: those sites and clusters isolated from the largest cluster in all the crusts; they connect only through the core.
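One reading of this definition, sketched under assumptions: a node outside the nucleus is "Connected" if it lies in the largest cluster of at least one crust that contains it, and "Isolated" (a tendril) otherwise; nucleus nodes are labeled "Core". The three labels follow the components named for the 2005 data; the function names and the tie-breaking between equally large clusters are hypothetical choices.

```python
from collections import deque

def largest_cluster(adj, nodes):
    """Largest connected component (a set) of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen, best = set(), set()
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        comp, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        if len(comp) > len(best):
            best = comp  # ties keep the first cluster found
    return best

def medusa_classes(adj, core):
    """Label every node Core / Connected / Isolated from its core number."""
    kmax = max(core.values())
    in_some_largest = set()
    for K in range(2, kmax + 1):
        crust = {u for u, k in core.items() if k < K}
        in_some_largest |= largest_cluster(adj, crust)
    labels = {}
    for u, k in core.items():
        if k == kmax:
            labels[u] = "Core"
        elif u in in_some_largest:
            labels[u] = "Connected"
        else:
            labels[u] = "Isolated"  # a tendril: reaches the rest only through the core
    return labels

med = {0: {1, 2, 3}, 1: {0, 2, 5}, 2: {0, 1}, 3: {0, 4}, 4: {3}, 5: {1}}
core = {0: 2, 1: 2, 2: 2, 3: 1, 4: 1, 5: 1}
print(medusa_classes(med, core))  # node 5 is the only tendril
```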
Willinger’s Objection to all this
Established network practitioners do not always welcome physicists’ model-making
They require first that real characteristics be incorporated
Finite connectivity at each router box
Length restrictions for connections
Include likely business relationships
Only then let the modeling begin…
But ASes are objects with a fractal distribution
From ISPs that support a neighborhood to global telcos and Google
How does the city data differ from the AS-graph information?
DIMES used commercial (error-filled) databases
Results available on website
Cities are local; ASes may be highly extended (AT&T, Level 3, Global Crossing, Google)
About 4,000 cities identified, cf. 25,000 ASes
Number of city-city edges about 2x AS edges
But similar features are seen
Wide spread of small-k shells
Distinct nucleus with high path redundancy
Many central sites participate with nucleus
A weaker Medusa structure
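Building a city-level graph from finer-grained measurements amounts to contracting nodes that share a label: links inside one city disappear and parallel city-city links merge. The sketch below assumes a hypothetical mapping from each measured node to its city (or AS) and that labels are mutually comparable strings; it is not the DIMES pipeline, only an illustration of why the same measurements can yield different edge counts at the two levels.

```python
def contract(edges, group_of):
    """Collapse a node-level edge list onto groups (e.g. the city of each node).
    Edges inside one group vanish; parallel group-group edges are merged."""
    grouped = set()
    for u, v in edges:
        a, b = group_of[u], group_of[v]
        if a != b:
            grouped.add((a, b) if a <= b else (b, a))  # store undirected edges in sorted order
    return grouped

# Hypothetical routers r1..r4 mapped to hypothetical cities and ASes.
edges = [("r1", "r2"), ("r2", "r3"), ("r3", "r4"), ("r1", "r4")]
city = {"r1": "city_a", "r2": "city_a", "r3": "city_b", "r4": "city_c"}
asn = {"r1": "AS1", "r2": "AS1", "r3": "AS1", "r4": "AS2"}
print(len(contract(edges, city)), len(contract(edges, asn)))  # 3 1
```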
K-shell size distribution
City K-crusts show percolation, with a smaller jump at the nucleus
City locations permit mapping the physical Internet
Are Social Networks Like Communications Networks?
Visual evidence that communications nets are more globally organized:
Indiana Univ. (Vespignani group) visualization tool
AS graph, ca. 2006
Movie actors’ collaborations
Diurnal variation suggests separating work from leisure periods
Telephone call graphs (“CDRs”) offer an intermediate case
Figure panels: full graph; reciprocated; reciprocated, > 4 calls; metro area PnLa only
7 billion calls over 28 days, Aug 2005 (Cebrian, Pentland, SK)
Data sets available
Raw CDRs NOT AVAILABLE: SECRET!!
Hadoop used to collect full data sets: total #calls aggregated for each link, with forward and reverse directions, work and leisure separated.
Analysis done for all links
Then for reciprocated links
Finally for major cities or metro areas.
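The aggregation described above can be sketched on a single machine (the original work used Hadoop for the full data set). Several choices here are assumptions: calls arrive as (caller, callee, timestamp) tuples, "work" means weekdays from 09:00 to 17:59, a reciprocated link needs at least one call in each direction, and a filter such as "> 4 calls" is read as more than four calls in total over both directions. Function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

def is_work_period(ts):
    """Hypothetical split: weekdays 09:00-17:59 count as work, everything else as leisure."""
    return ts.weekday() < 5 and 9 <= ts.hour < 18

def aggregate(calls):
    """Count calls per directed pair (caller, callee), separately for work and leisure."""
    counts = {"work": Counter(), "leisure": Counter()}
    for caller, callee, ts in calls:
        period = "work" if is_work_period(ts) else "leisure"
        counts[period][(caller, callee)] += 1
    return counts

def reciprocated(directed, min_total=0):
    """Undirected links called in both directions with more than min_total calls in total."""
    links = {}
    for (a, b), n_ab in directed.items():
        n_ba = directed.get((b, a), 0)
        if a < b and n_ab > 0 and n_ba > 0 and n_ab + n_ba > min_total:
            links[(a, b)] = n_ab + n_ba  # keyed once, with the smaller label first
    return links

calls = [
    ("a", "b", datetime(2005, 8, 1, 10)),  # Monday
    ("a", "b", datetime(2005, 8, 1, 11)),
    ("a", "b", datetime(2005, 8, 2, 10)),
    ("b", "a", datetime(2005, 8, 2, 14)),
    ("b", "a", datetime(2005, 8, 3, 9)),
    ("a", "c", datetime(2005, 8, 6, 20)),  # Saturday evening
]
counts = aggregate(calls)
print(reciprocated(counts["work"], min_total=4))  # {('a', 'b'): 5}
```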
How do work and leisure differ?
Diffusion of information from the edges
Faster in work than in leisure networks
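Diffusion from the edges can be explored with a simple discrete-time spreading model. This sketch is a hypothetical stand-in, not the method behind the slide: at each step every informed node passes the information to each uninformed neighbor independently with probability p, and the seeds might be low-shell "edge" sites. Comparing the returned curves for the work and leisure graphs is one way to quantify "faster". With string node labels, set iteration order (and therefore which edge gets which random draw) can vary between Python runs even for a fixed seed.

```python
import random

def si_spread(adj, seeds, p, steps, seed=0):
    """Discrete-time SI spreading; returns the informed fraction after each step."""
    rng = random.Random(seed)
    informed = set(seeds)
    history = [len(informed) / len(adj)]
    for _ in range(steps):
        # every informed-uninformed edge transmits independently with probability p
        newly = {v for u in informed for v in adj[u]
                 if v not in informed and rng.random() < p}
        informed |= newly
        history.append(len(informed) / len(adj))
    return history

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(si_spread(path, {0}, p=1.0, steps=4))  # [0.25, 0.5, 0.75, 1.0, 1.0]
```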
K-shell structure, full set, work period
Work characteristics persist on smaller scales
K-shell structure, full data set, leisure
Mysteries (work period, full, R1)
Mysteries, ctd.