Uploaded On 2016-07-26



Presentation Transcript

Slide1

Measuring and Analyzing Networks

Scott Kirkpatrick

Hebrew University of Jerusalem

April 12, 2011

Slide2

Sources of data

Communications networks

Web links: URLs contained within surface pages

Internet physical network

Telephone CDRs (call detail records)

Social networks

Links through common activity

Movie actors, scientists publishing together

Opt-in networking in Facebook et al.

Slide3

Properties to be considered

“3 degrees of separation” and small-world effects.

Robustness/fragility of communications

Percolation under various modeled attacks
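Percolation under attack can be explored with a small simulation. The sketch below is a minimal illustration, not the method used in this talk: it assumes an undirected graph stored as a dict mapping each node to a set of neighbours, removes either the highest-degree nodes first (a simple targeted attack, ranked by initial degree) or a random selection, and reports the fraction of the original N nodes left in the largest connected cluster.

```python
import random
from collections import deque


def giant_component_fraction(adj, removed):
    """Fraction of all N nodes in the largest cluster after deleting `removed`."""
    n = len(adj)
    seen = set(removed)  # removed nodes are treated as already visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search over the surviving nodes of one cluster.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / n


def attack_curve(adj, fractions, targeted=True, seed=0):
    """Giant-cluster fraction after removing each fraction f of the nodes."""
    nodes = list(adj)
    if targeted:
        # Assumed attack model: highest initial degree removed first.
        order = sorted(nodes, key=lambda u: len(adj[u]), reverse=True)
    else:
        order = nodes[:]
        random.Random(seed).shuffle(order)
    return [giant_component_fraction(adj, order[:round(f * len(nodes))])
            for f in fractions]


# Usage: a star (hub 0, leaves 1-4) collapses once its hub is removed.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(attack_curve(star, [0.0, 0.2]))  # [1.0, 0.2]
```

Ranking by initial degree keeps the sketch short; recomputing degrees after each removal is a common, more aggressive variant.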

Spread of information, disease, etc.

Slide4

Aggregates and Attributes

Degree distribution, betweenness distribution

Two-point distributions

Degree-degree: “assortative” or “disassortative”
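The degree distribution and the degree-degree correlation can both be read off the edge list. The sketch below assumes an undirected graph stored as a dict of neighbour sets and uses Newman's assortativity coefficient r (the Pearson correlation of the degrees at the two ends of each edge) as one common summary: r > 0 indicates an assortative network, r < 0 a disassortative one. It is an illustration, not the analysis code behind these slides.

```python
from collections import Counter


def degree_distribution(adj):
    """P(k): fraction of nodes with exactly k neighbours."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def degree_assortativity(adj):
    """Pearson correlation of the degrees at the two ends of each edge.

    Every undirected edge is counted in both directions, so the two
    degree sequences share the same mean and variance.
    """
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    m = len(xs)
    mean = sum(xs) / m
    var = sum((x - mean) ** 2 for x in xs) / m
    if var == 0:
        return float("nan")  # undefined when every edge joins equal degrees
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / m
    return cov / var


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(degree_distribution(star))   # {1: 0.8, 4: 0.2}
print(degree_assortativity(star))  # -1.0: the hub is always linked to leaves
```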

Cluster coefficient and triangle counting

Is the friend of my friend also my friend?
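Triangle counting answers the “friend of my friend” question directly. A minimal sketch, again assuming a dict of neighbour sets: for each node it counts linked pairs of neighbours, then forms the local clustering coefficient (closed pairs over all neighbour pairs) and the global transitivity (three times the number of triangles over the number of connected triples). The brute-force pair scan is fine for illustration but its cost grows with the sum of squared degrees.

```python
from itertools import combinations


def triangles_per_node(adj):
    """Number of linked neighbour pairs (triangles) at each node."""
    return {
        u: sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        for u, nbrs in adj.items()
    }


def local_clustering(adj):
    """C_i = triangles at i / (k_i (k_i - 1) / 2); 0 for degree < 2."""
    tri = triangles_per_node(adj)
    out = {}
    for u, nbrs in adj.items():
        k = len(nbrs)
        out[u] = tri[u] / (k * (k - 1) / 2) if k >= 2 else 0.0
    return out


def transitivity(adj):
    """3 x (number of triangles) / (number of connected triples)."""
    tri = triangles_per_node(adj)
    closed = sum(tri.values())  # each triangle is seen once from each corner
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return closed / triples if triples else 0.0


# Usage: a triangle 0-1-2 with a pendant node 3 attached to node 2.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(sum(triangles_per_node(g).values()) // 3)  # 1 triangle
print(local_clustering(g)[0])                   # 1.0
print(transitivity(g))                          # 0.6
```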

Variations on betweenness (not in the literature, but an attractive option)

Mark Newman’s SIAM Review paper: a great reference, but dated.

Slide5

K-Cores, Shells, Crusts and all that…

The K-core is almost as fundamental a graph property as the “giant component”:

Bollobas (1984) defined the K-core: the maximal subgraph in which all nodes have K or more edges. Corollaries: it is unique; with high probability it is K-connected; when it exists, it has size O(N).
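The definition suggests a direct construction: keep deleting nodes with fewer than K surviving neighbours until none remain. The sketch below assumes an undirected graph stored as a dict of neighbour sets. Because deletion only ever lowers degrees, the surviving set does not depend on the order of deletions, which is one way to see why the K-core is unique.

```python
def k_core(adj, k):
    """Node set of the K-core of an undirected graph {node: set(neighbours)}."""
    alive = set(adj)
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    stack = [u for u in adj if degree[u] < k]
    while stack:
        u = stack.pop()
        if u not in alive:
            continue  # already deleted via an earlier push
        alive.remove(u)
        for v in adj[u]:
            if v in alive:
                degree[v] -= 1
                if degree[v] < k:
                    stack.append(v)
    return alive


# Usage: a triangle 0-1-2 with a pendant node 3 attached to node 2.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(sorted(k_core(g, 2)))  # [0, 1, 2]
print(k_core(g, 3))          # set()
```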

Pittel, Spencer, Wormald (1996) showed how to calculate its size and threshold.

Slide6

K-Cores, Shells, Crusts and all that…

K-shell: All sites in the K-core but not in the (K+1)-core.

Nucleus: the non-vanishing core with largest K

K-crust: the union of shells 1, …, K-1, i.e., all sites outside the K-core.
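Shells, nucleus and crusts all follow from each node's core number (the largest K for which the node lies in the K-core). A minimal sketch, assuming a dict of neighbour sets with mutually comparable node labels (needed for tie-breaking in the heap): core numbers come from repeatedly peeling the current minimum-degree node, in the spirit of the Batagelj-Zaversnik algorithm but with a heap in place of their bucket arrays, and the other objects are simple filters on the result.

```python
import heapq


def core_numbers(adj):
    """Core number of every node, by peeling minimum-degree nodes."""
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    heap = [(d, u) for u, d in degree.items()]
    heapq.heapify(heap)
    removed, core, k = set(), {}, 0
    while heap:
        d, u = heapq.heappop(heap)
        if u in removed or d != degree[u]:
            continue  # stale entry: the degree has dropped since it was pushed
        k = max(k, d)  # the peeling level never decreases
        core[u] = k
        removed.add(u)
        for v in adj[u]:
            if v not in removed:
                degree[v] -= 1
                heapq.heappush(heap, (degree[v], v))
    return core


def k_shell(core, k):
    """Sites in the K-core but not in the (K+1)-core."""
    return {u for u, c in core.items() if c == k}


def nucleus(core):
    """(kmax, sites of the non-empty core with the largest K)."""
    kmax = max(core.values())
    return kmax, k_shell(core, kmax)


def k_crust(core, k):
    """All sites outside the K-core, i.e. the union of shells below K."""
    return {u for u, c in core.items() if c < k}


g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
core = core_numbers(g)
print(nucleus(core))   # (2, {0, 1, 2})
print(k_crust(core, 2))  # {3}
```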

A natural application is the analysis of networks

It replaces some ambiguous definitions with uniquely specified objects.

Slide7

Faloutsos’ Jellyfish (Internet model)

Define the core in some way (“Tier 0”)

Layers built breadth-first around the core form the “mantle”, and the edge sites are the tendrils

Slide8

K-cores of a Barabasi-like random network

The L,M model gives non-trivial K-shell structure (Shalit, Solomon, SK, 2000)

At each step in the construction, a new node makes L links to existing nodes, with probability proportional to their number of neighbors.

Then we add M links between existing nodes, also with preferential attachment.
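The construction can be sketched as a growth loop. This is a hypothetical reconstruction, not the authors' code: it assumes growth starts from a single link between nodes 0 and 1, that “preferential” means picking a node with probability proportional to its degree (implemented by sampling uniformly from a list of link ends), that the M extra links may involve the newly added node, and that duplicate links and self-links are redrawn rather than kept.

```python
import random


def lm_model(n_nodes, L, M, seed=0):
    """Hypothetical L,M growth model; returns {node: set(neighbours)}."""
    if L < 1:
        raise ValueError("this sketch assumes L >= 1")
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed seed graph: a single link
    ends = [0, 1]           # one entry per link end -> degree-weighted draws

    def add_edge(u, v):
        adj[u].add(v)
        adj[v].add(u)
        ends.extend((u, v))

    for new in range(2, n_nodes):
        # L preferential links from the new node to distinct existing nodes.
        targets = set()
        while len(targets) < min(L, len(adj)):
            targets.add(rng.choice(ends))
        adj[new] = set()
        for t in targets:
            add_edge(new, t)
        # M extra preferential links, capped by the number of missing pairs
        # so the loop cannot stall while the graph is still tiny.
        n = len(adj)
        missing = n * (n - 1) // 2 - len(ends) // 2
        added = 0
        while added < min(M, missing):
            u, v = rng.choice(ends), rng.choice(ends)
            if u != v and v not in adj[u]:
                add_edge(u, v)
                added += 1
    return adj


g = lm_model(1000, L=1, M=2, seed=42)
print(len(g))  # 1000
```

Sampling from the list of link ends is a standard trick for degree-proportional selection without maintaining explicit weights; the resulting graph can be fed to any core-decomposition routine to look at its K-shells.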

Results for L = 1, M = 1, 2, 4, 8 (next slide) give lovely power laws (Rome conference on complex systems, 2000).

The nucleus is just the endpoint.

Slide9

Results: L,M models’ K-cores

Slide10

Next apply to the real Internet

DIMES data used at AS level (Shir, Shavitt, SK, Carmi, Havlin, Li)

2004 to the present day, with a relatively consistent experimental methodology

K-shell plots show power laws, with two surprises:

The nucleus is striking and different from the mantle of this “Medusa”

Percolation analysis identifies the tendrils as a subset connected only to the nucleus

Slide11

Does the degree of a site relate to its k-shell?

Slide12

Distances and diameters in cores

Slide13

K-crusts show percolation threshold

Data from 01.04.2005

These are the hanging tentacles of our (Red Sea) Jellyfish

For subsequent analysis, we distinguish three components:

Core, Connected, Isolated
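Given core numbers for each site, the crust analysis reduces to finding clusters in the subgraph of sites below a chosen K. The sketch below is an illustration under assumptions: the graph is a dict of neighbour sets, `core` maps each site to its core number (computed elsewhere), and sites are labelled “core” (in the K-core), “connected” (in the largest cluster of the K-crust) or “isolated” (any other crust site). A site labelled “isolated” in every crust that contains it reaches the rest of the network only through the core, matching the tendril definition given for the Meduza model.

```python
from collections import deque


def clusters(adj, nodes):
    """Connected clusters of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen, found = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        cluster, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    cluster.add(v)
                    queue.append(v)
        found.append(cluster)
    return found


def classify(adj, core, k):
    """Label each site 'core', 'connected' or 'isolated' relative to the K-crust."""
    crust = {u for u, c in core.items() if c < k}
    largest = max(clusters(adj, crust), key=len, default=set())
    return {
        u: "core" if u not in crust else ("connected" if u in largest else "isolated")
        for u in core
    }


def crust_percolation(adj, core):
    """Fraction of all sites in the largest cluster of each K-crust."""
    n = len(core)
    out = {}
    for k in range(1, max(core.values()) + 1):
        crust = [u for u, c in core.items() if c < k]
        out[k] = len(max(clusters(adj, crust), key=len, default=set())) / n
    return out


# Usage: triangle nucleus {0, 1, 2}; chain 3-4 hangs off node 0, leaf 5 off node 1.
g = {0: {1, 2, 3}, 1: {0, 2, 5}, 2: {0, 1}, 3: {0, 4}, 4: {3}, 5: {1}}
core = {0: 2, 1: 2, 2: 2, 3: 1, 4: 1, 5: 1}  # core numbers, computed elsewhere
print(classify(g, core, 2))  # 0-2 'core', 3-4 'connected', 5 'isolated'
```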

Largest cluster in each shell

Slide14

Meduza (מדוזה, Hebrew for “jellyfish”) model

This picture has been stable from January 2005 (kmax = 30) to the present day, with little change in the composition of the nucleus. The precise definition of the tendrils: those sites and clusters isolated from the largest cluster in all the crusts; they connect only through the core.

Slide15

Willinger’s Objection to all this

Established network practitioners do not always welcome physicists’ model-making

They first require that real characteristics be incorporated:

Finite connectivity at each router box

Length restrictions for connections

Include likely business relationships

Only then let the modeling begin…

But ASes are objects with a fractal distribution

From ISPs that support a neighborhood to global telcos and Google

Slide16

How does the city data differ from the AS-graph information?

DIMES used commercial (error-filled) databases

Results available on website

Cities are local; ASes may be highly extended (AT&T, Level 3, Global Crossing, Google)

About 4000 cities identified, cf. 25,000 ASes

The number of city-city edges is about 2x the number of AS edges

But similar features are seen

Wide spread of small-k shells

Distinct nucleus with high path redundancy

Many central sites participate with nucleus

A weaker Medusa structure

Slide17

K-shell size distribution

Slide18

City K-crusts show percolation, with a smaller jump at the nucleus

Slide19

City locations permit mapping the physical Internet

Slide20

Are Social Networks Like Communications Networks?

Visual evidence that communications nets are more globally organized:

Indiana University (Vespignani group) visualization tool

AS graph, ca. 2006

Movie actors’ collaborations

Slide21

Diurnal variation suggests separating work from leisure periods

Slide22

Telephone call graphs (“CDRs”) offer an intermediate case

Panels: full graph; reciprocated; reciprocated, > 4 calls; metro area PnLa only

7 B calls over 28 days, Aug 2005

Cebrian, Pentland, SK

Slide23

Data sets available

Raw CDRs NOT AVAILABLE: SECRET!!

Hadoop used to collect full data sets: total number of calls aggregated for each link, with forward and reverse directions and work and leisure periods separated.
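The per-link aggregation can be sketched in plain Python in a map/reduce spirit (this is not Hadoop code). The record format (caller, callee, weekday, hour), the work window (weekdays, 09:00 to 17:59) and the `min_calls` threshold are hypothetical choices for illustration; the slides do not specify them. Each unordered pair gets separate forward and reverse counts for work and leisure periods, and reciprocated links are those with calls in both directions.

```python
from collections import defaultdict


def is_work(weekday, hour):
    # Hypothetical work window: Monday-Friday (weekday 0-4), 09:00 to 17:59.
    return weekday < 5 and 9 <= hour < 18


def aggregate_calls(records):
    """Aggregate (caller, callee, weekday, hour) records per unordered link.

    Returns {(a, b): {"work": [fwd, rev], "leisure": [fwd, rev]}} with a < b,
    where fwd counts calls a -> b and rev counts calls b -> a.
    """
    links = defaultdict(lambda: {"work": [0, 0], "leisure": [0, 0]})
    for caller, callee, weekday, hour in records:
        if caller == callee:
            continue  # ignore self-calls
        a, b = sorted((caller, callee))
        direction = 0 if caller == a else 1
        period = "work" if is_work(weekday, hour) else "leisure"
        links[(a, b)][period][direction] += 1
    return dict(links)


def reciprocated(links, period, min_calls=0):
    """Links called in both directions in `period`, with > min_calls calls in total."""
    return [
        link for link, counts in links.items()
        if counts[period][0] > 0 and counts[period][1] > 0
        and sum(counts[period]) > min_calls
    ]


records = [("A", "B", 0, 10), ("B", "A", 0, 11), ("A", "C", 6, 20), ("A", "B", 5, 12)]
links = aggregate_calls(records)
print(links[("A", "B")])            # {'work': [1, 1], 'leisure': [1, 0]}
print(reciprocated(links, "work"))  # [('A', 'B')]
```

Setting `min_calls=4` would mimic the “reciprocated, > 4 calls” panel if that threshold applies to total calls per link, which the slides do not state.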

Analysis done for all links

Then for reciprocated links

Finally for major cities or metro areas.

Slide24

How do work and leisure differ?

Slide25

Diffusion of information from the edges

Faster in work than in leisure networks

Slide26

K-shell structure, full set, work period

Slide27

Work characteristics persist on smaller scales

Slide28

K-shell structure, full data set, leisure

Slide29

Mysteries (work period, full, R1)

Slide30

Mysteries, ctd.