

Network Formation Processes: Power-law degree distributions and Preferential Attachment
CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu

Network Formation Processes
What do we observe that needs explaining?
  Small-world model: diameter, clustering coefficient
  Preferential attachment: node degree distribution
What fraction of all nodes have degree k (as a function of k)?
  Prediction from simple random graph models: $P(k)$ is an exponential function of $-k$
  Observation: power-law, $P(k) \propto k^{-\alpha}$
11/3/2011, Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Degree Distributions
[Figure: degree distribution expected based on $G_{np}$ vs. degree distribution found in data]

Node Degrees in Networks
Take a network, plot a histogram of P(k) vs. k.
Flickr social network: n = 584,207, m = 3,555,115 [Leskovec et al. KDD '08]
Plot: fraction of nodes with degree k: $P(k) = |\{u : d_u = k\}| / n$
[Figure axis: Probability, P(k) = P(X = k)]

Node Degrees in Networks
Plot the same data on log-log axes:
Flickr social network: n = 584,207, m = 3,555,115 [Leskovec et al. KDD '08]
How to distinguish $P(k) \propto e^{-k}$ vs. $P(k) \propto k^{-\alpha}$? Take logarithms:
  if $y = f(x) = e^{-x}$ then $\log(y) = -x$
  if $y = x^{-\alpha}$ then $\log(y) = -\alpha \log(x)$
So, on log-log axes a power-law looks like a straight line of slope $-\alpha$.
[Figure: Probability P(k) = P(X = k) on log-log axes, slope $= -\alpha$]
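
The log-log argument can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the exponent 2.5, the exponential scale 50, and all variable names are illustrative choices, not values from the slides. It fits a least-squares line to exact (noise-free) function values, so it only illustrates the straight-line property and is not a recommended way to estimate exponents from data.

```python
import numpy as np

alpha = 2.5                      # assumed exponent, for illustration only
k = np.arange(1, 1001, dtype=float)

p_power = k ** (-alpha)          # power-law: P(k) proportional to k^(-alpha)
p_exp = np.exp(-k / 50.0)        # exponential decay with an assumed scale of 50

# Line through (log k, log P(k)): on exact power-law values the slope is -alpha.
slope_power, _ = np.polyfit(np.log(k), np.log(p_power), 1)

# For the exponential, log P(k) is linear in k itself, not in log k.
slope_exp_semilog, _ = np.polyfit(k, np.log(p_exp), 1)

print(f"log-log slope of the power-law: {slope_power:.3f}")
print(f"semi-log slope of the exponential: {slope_exp_semilog:.3f}")
```

The power-law is straight on log-log axes, while the exponential is straight only on semi-log axes (x linear, y logarithmic).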

Node Degrees: Faloutsos3
Internet Autonomous Systems [Faloutsos, Faloutsos and Faloutsos, 1999]
[Figure: Internet domain topology]

Node Degrees: Web
The World Wide Web [Broder et al., 2000]

Node Degrees: Barabasi & Albert
Other networks [Barabasi-Albert, 1999]
[Figure panels: power grid, Web graph, actor collaborations]

Exponential vs. Power-Law
Above a certain x value, the power law is always higher than the exponential.
[Figure: f(x) vs. x]

Exponential vs. Power-Law
Power-law vs. exponential on log-log and log-lin scales [Clauset-Shalizi-Newman 2007]
[Figure panels: semi-log (x linear, y logarithmic); log-log (x logarithmic, y logarithmic)]

Exponential vs. Power-Law

Power-Law Degree Exponents
Power-law degree exponent is typically 2 < α < 3
  Web graph: α_in = 2.1, α_out = 2.4 [Broder et al. 00]
  Autonomous systems: α = 2.4 [Faloutsos3, 99]
  Actor collaborations: α = 2.3 [Barabasi-Albert 00]
  Citations to papers: α ≈ 3 [Redner 98]
  Online social networks: α ≈ 2 [Leskovec et al. 07]

Scale-Free Networks
Definition: Networks with a power-law tail in their degree distribution are called “scale-free networks”
Where does the name come from?
  Scale invariance: there is no characteristic scale
  Scale-free function: $f(ax) = a^{\lambda} f(x)$
  Power-law function: $f(x) = x^{\lambda}$, so $f(ax) = a^{\lambda} x^{\lambda} = a^{\lambda} f(x)$

Power-Laws are Everywhere [Skip!]
In social systems, lots of power-laws:
  Pareto, 1897: wealth distribution
  Lotka, 1926: scientific output
  Yule, 1920s: biological taxa and subtaxa
  Zipf, 1940s: word frequency
  Simon, 1950s: city populations

Power-Laws are Everywhere
Many other quantities follow heavy-tailed distributions [Clauset-Shalizi-Newman 2007]

Anatomy of the Long Tail [Skip!]
[Chris Anderson, Wired, 2004]

Not Everyone Likes Power-Laws
[Photo: CMU grad students at the G20 meeting in Pittsburgh in Sept 2009]

Mathematics of Power-Laws

Heavy Tailed Distributions [Skip!]
Degrees are heavily skewed.
Distribution P(X > x) is heavy tailed if: $\lim_{x \to \infty} \frac{P(X > x)}{e^{-\lambda x}} = \infty$
Note:
  Normal PDF: $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
  Exponential PDF: $f(x) = \lambda e^{-\lambda x}$, then $P(X > x) = 1 - P(X \le x) = e^{-\lambda x}$
  are not heavy tailed!

Heavy Tails [Skip!]
Various names, kinds and forms: long tail, heavy tail, Zipf's law, Pareto's law
Heavy-tailed distributions: P(x) is proportional to: [table of distributions, Clauset-Shalizi-Newman 2007]

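Mathematics of Power-laws
What is the normalizing constant? [Clauset-Shalizi-Newman 2007]
$P(x) = z x^{-\alpha}$, $z = ?$
$P(x)$ is a distribution: $\int P(x)\,dx = 1$
Continuous approximation:
  $1 = \int_{x_m}^{\infty} P(x)\,dx = z \int_{x_m}^{\infty} x^{-\alpha}\,dx = -\frac{z}{\alpha-1}\left[x^{-\alpha+1}\right]_{x_m}^{\infty} = \frac{z}{\alpha-1}\,x_m^{-\alpha+1}$
  $z = (\alpha-1)\,x_m^{\alpha-1}$
  $P(x) = \frac{\alpha-1}{x_m}\left(\frac{x}{x_m}\right)^{-\alpha}$
$P(x)$ diverges as $x \to 0$, so $x_m$ is the minimum value of the power-law distribution: $x \in [x_m, \infty)$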

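Mathematics of Power-laws
What's the expectation of a power-law random variable x? [Clauset-Shalizi-Newman 2007]
  $E[x] = \int_{x_m}^{\infty} x\,P(x)\,dx = z \int_{x_m}^{\infty} x^{-\alpha+1}\,dx = \frac{z}{2-\alpha}\left[x^{2-\alpha}\right]_{x_m}^{\infty} = \frac{(\alpha-1)\,x_m^{\alpha-1}}{\alpha-2}\,x_m^{2-\alpha}$
  $E[x] = \frac{\alpha-1}{\alpha-2}\,x_m$
Need: α > 2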

Mathematics of Power-Laws
Power-laws: infinite moments!
  If α ≤ 2: E[x] = ∞
  If α ≤ 3: Var[x] = ∞
Average is meaningless, as the variance is too high!
[Figure: sample average of n samples from a power-law with exponent α]
In real networks 2 < α < 3, so: E[x] = const, Var[x] = ∞
ADD: How to sample from a power law?
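
One common way to sample from a continuous power-law is inverse-transform sampling. For the density $\frac{\alpha-1}{x_m}(x/x_m)^{-\alpha}$ on $[x_m, \infty)$ the complementary CDF is $P(X > x) = (x/x_m)^{-(\alpha-1)}$, so setting it equal to $1-u$ for uniform $u$ gives $x = x_m (1-u)^{-1/(\alpha-1)}$. The sketch below assumes NumPy; the function name `sample_power_law`, the seed, and the parameter values are hypothetical choices for illustration, not part of the lecture.

```python
import numpy as np

def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n samples with density proportional to x^(-alpha) on [x_min, inf).

    Inverts the CCDF P(X > x) = (x / x_min)^(-(alpha - 1)); requires alpha > 1.
    """
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for the density to normalize")
    rng = np.random.default_rng(seed)
    u = rng.random(n)                       # uniform on [0, 1), so 1 - u is in (0, 1]
    return x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))

samples = sample_power_law(200_000, alpha=2.5)

# The median of this distribution is x_min * 2^(1/(alpha-1)).
print("sample median:", np.median(samples), "theory:", 2 ** (1 / 1.5))

# Running sample mean: for 2 < alpha < 3 the mean exists but the variance is infinite,
# so the running average keeps jumping whenever a very large sample arrives.
running_mean = np.cumsum(samples) / np.arange(1, samples.size + 1)
print("running mean after 1e3, 1e4, 2e5 samples:", running_mean[[999, 9_999, 199_999]])
```

The median is used as the sanity check because, unlike the mean, it is stable even when the variance is infinite.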

Estimating Power-Law Exponent α
Estimating α from data:
Fit a line on log-log axes using the least squares method:
  Solve $\arg\min_{\alpha, b} \sum_i \left(\log(y_i) + \alpha \log(x_i) - b\right)^2$
BAD!

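Estimating Power-Law Exponent α
Estimating α from data:
Plot the complementary CDF $P(X > x)$. Then $\alpha = 1 + \alpha'$, where $\alpha'$ is the magnitude of the log-log slope of $P(X > x)$.
  If $P(X = x) \propto x^{-\alpha}$ then $P(X > x) \propto x^{-(\alpha-1)}$:
  $P(X > x) = \sum_{j > x} P(j) \approx \int_x^{\infty} z\,j^{-\alpha}\,dj = \frac{z}{\alpha-1}\,x^{-(\alpha-1)}$
OK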

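Estimating Power-Law Exponent α
Estimating α from data:
Use MLE: $\hat{\alpha} = 1 + n \left[\sum_{i=1}^{n} \ln \frac{d_i}{x_m}\right]^{-1}$
  Power-law density: $P(x) = \frac{\alpha-1}{x_m}\left(\frac{x}{x_m}\right)^{-\alpha}$
  $L(\alpha) = \ln \prod_{i=1}^{n} P(d_i) = \sum_{i=1}^{n} \ln P(d_i) = \sum_{i=1}^{n} \left[\ln(\alpha-1) - \ln(x_m) - \alpha \ln \frac{d_i}{x_m}\right]$
  Want to find $\alpha$ that maximizes $L(\alpha)$: set $\frac{dL(\alpha)}{d\alpha} = 0$
  $\frac{n}{\alpha-1} - \sum_{i=1}^{n} \ln \frac{d_i}{x_m} = 0 \;\Rightarrow\; \hat{\alpha} = 1 + n \left[\sum_{i=1}^{n} \ln \frac{d_i}{x_m}\right]^{-1}$
OK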
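
The MLE formula can be sanity-checked on synthetic data with a known exponent. This is a minimal sketch, assuming NumPy and the continuous power-law density with a known minimum value $x_m$; the helper name `fit_alpha_mle`, the seed, and the chosen `true_alpha` are hypothetical. For integer degrees the continuous formula is only an approximation.

```python
import numpy as np

def fit_alpha_mle(data, x_min):
    """Continuous power-law MLE: 1 + n / sum(ln(d_i / x_min)) over d_i >= x_min."""
    d = np.asarray(data, dtype=float)
    d = d[d >= x_min]                       # the power-law is only assumed for d >= x_min
    return 1.0 + d.size / np.sum(np.log(d / x_min))

# Synthetic check: sample from a known exponent by inverse-transform sampling,
# then re-estimate the exponent from the samples.
rng = np.random.default_rng(42)
true_alpha, x_min = 2.5, 1.0                # assumed values for the demonstration
u = rng.random(200_000)
data = x_min * (1.0 - u) ** (-1.0 / (true_alpha - 1.0))

alpha_hat = fit_alpha_mle(data, x_min)
print(f"true alpha = {true_alpha}, MLE estimate = {alpha_hat:.3f}")
```

In practice $x_m$ is usually unknown and has to be chosen as well; this sketch treats it as given.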

Flickr: Fitting Degree Exponent
[Figure panels: linear scale; log scale, α = 1.75; CCDF, log scale, α = 1.75; CCDF, log scale, α = 1.75, exp. cutoff]
Show examples of the fitting: not the nice line but the fitted line

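Maximum Degree [Skip!]
What is the expected maximum degree K in a scale-free network?
The expected number of nodes with degree > K should be less than 1: $\int_K^{\infty} P(x)\,dx \approx \frac{1}{n}$
  Power-law density: $P(x) = \frac{\alpha-1}{x_m}\left(\frac{x}{x_m}\right)^{-\alpha}$
  $\int_K^{\infty} P(x)\,dx = (\alpha-1)\,x_m^{\alpha-1} \int_K^{\infty} x^{-\alpha}\,dx = \left(\frac{K}{x_m}\right)^{-(\alpha-1)} \approx \frac{1}{n}$
  $K = x_m\,n^{\frac{1}{\alpha-1}}$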

Maximum Degree: Consequence [Skip!]
Why don't we see networks with much larger exponents α?
To estimate α reliably, the degrees must span 2-3 orders of magnitude, that is, $K / x_m \approx 10^2$ to $10^3$.
Since $K = x_m\,n^{\frac{1}{\alpha-1}}$, measuring a degree exponent α requires a network size of the order of $n = (K / x_m)^{\alpha-1}$, which grows very quickly with α.
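
To make the size requirement concrete, the relation $K = x_m\,n^{1/(\alpha-1)}$ can be evaluated in both directions. The sketch below is only arithmetic on that formula; the function names and the list of exponents are hypothetical choices for illustration, not values taken from the slide.

```python
def expected_max_degree(n, alpha, x_min=1.0):
    """K = x_min * n^(1/(alpha-1)): degree above which fewer than one node is expected."""
    return x_min * n ** (1.0 / (alpha - 1.0))

def required_network_size(k_ratio, alpha):
    """Solve K = x_min * n^(1/(alpha-1)) for n, given k_ratio = K / x_min."""
    return k_ratio ** (alpha - 1.0)

# Illustrative exponents (assumed) and a degree range of three orders of magnitude.
for alpha in (2.1, 3.0, 4.0, 5.0):
    n_needed = required_network_size(1e3, alpha)
    print(f"alpha = {alpha}: need n of about {n_needed:.1e} nodes")
```

Under these assumptions the required number of nodes rises from thousands for exponents near 2 to far beyond the size of any measured network for exponents of 4 or more.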

Why are Power-Laws Surprising
Cannot arise from sums of independent events
Recall: in $G_{np}$ each pair of nodes is connected independently with prob. p
  $X$ … degree of node $v$, $X_w$ … event that w links to v
  $X = \sum_w X_w$
  Now, what is $P(X = k)$? Central limit theorem!
  $X_1, \ldots, X_n$: rnd. vars with mean μ, variance σ²
  $S_n = \sum_i X_i$: $E[S_n] = n\mu$, $\mathrm{Var}[S_n] = n\sigma^2$, $\mathrm{SD}[S_n] = \sigma\sqrt{n}$
  $P\left(S_n = E[S_n] + x \cdot \mathrm{SD}[S_n]\right) \sim \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$

Random vs. Scale-free network [Skip!]
Random network (Erdos-Renyi random graph): degree distribution is Binomial
Scale-free (power-law) network: degree distribution is Power-law

Model: Preferential Attachment

Model: Preferential attachment
Preferential attachment [Price '65, Albert-Barabasi '99, Mitzenmacher '03]
  Nodes arrive in order 1, 2, …, n
  At step j, let d_i be the degree of node i < j
  A new node j arrives and creates m out-links
  Prob. of j linking to a previous node i is proportional to the degree d_i of node i

Rich Get Richer
New nodes are more likely to link to nodes that already have high degree
Herbert Simon's result: power-laws arise from “rich get richer” (cumulative advantage)
Examples [Price '65]:
  Citations: new citations to a paper are proportional to the number it already has

The Exact Model
We will analyze the following model [Mitzenmacher '03]:
Nodes arrive in order 1, 2, 3, …, n
When node j is created it makes a single link to an earlier node i chosen:
  1) With prob. p, j links to i chosen uniformly at random (from among all earlier nodes)
  2) With prob. 1-p, node j chooses node i uniformly at random and links to a node i points to
Note this is the same as saying: with prob. 1-p, node j links to node u with prob. proportional to d_u (the degree of u)
Treat everything as a directed graph. We only care about in-degree, as every node has out-degree 1.
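
The two-rule model above can be simulated directly. The following is a rough sketch using only the Python standard library; the function name `simulate_model`, the zero-based node numbering, the seed, and especially the self-loop given to the first node (so that "a node i points to" is always defined) are assumptions of this sketch, not details stated on the slide.

```python
import random

def simulate_model(n, p, seed=0):
    """Simulate the single-link growth model; return (target, in_degree) lists.

    Nodes are numbered 0..n-1. Node 0 gets a self-loop so that every earlier
    node has an out-link to copy (a bootstrapping assumption of this sketch).
    """
    rng = random.Random(seed)
    target = [0] * n            # target[j] = node that j links to
    in_degree = [0] * n
    in_degree[0] = 1            # the self-loop of node 0
    for j in range(1, n):
        i = rng.randrange(j)    # uniformly random earlier node
        if rng.random() < p:
            chosen = i          # rule 1: link to i itself
        else:
            chosen = target[i]  # rule 2: link to the node that i points to
        target[j] = chosen
        in_degree[chosen] += 1
    return target, in_degree

_, deg_uniform = simulate_model(20_000, p=1.0)
_, deg_pref = simulate_model(20_000, p=0.1)
print("max in-degree, p = 1.0:", max(deg_uniform))
print("max in-degree, p = 0.1:", max(deg_pref))
```

Rule 2 never looks at degrees explicitly: following the out-link of a uniformly random node is the same as picking a uniformly random edge, so a node is reached with probability proportional to its in-degree.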

The Model Gives Power-Laws
Claim: The described model generates networks where the fraction of nodes with degree k scales as:
  $P(d_i = k) \propto k^{-\left(1 + \frac{1}{q}\right)}$, where q = 1-p

Continuous Approximation
Consider a deterministic and continuous approximation to the degree of node i as a function of time t
  t is the number of nodes that have arrived so far
  Degree d_i(t) of node i (i = 1, 2, …, n) is a continuous quantity and it grows deterministically as a function of time t
Plan: Analyze d_i(t), the continuous degree of node i at time t ≥ i

Continuous Degree: What We Know
Initial condition: d_i(t) = 0 when t = i (node i just arrived)
Expected change of d_i(t) over time:
  Node i gains an in-link at step t+1 only if a link from the newly created node t+1 points to it.
  What's the probability of this event?
    With prob. p node t+1 links randomly: links to our node i with prob. 1/t
    With prob. 1-p node t+1 links preferentially: links to our node i with prob. d_i(t)/t
  So: prob. node t+1 links to i is: $p\,\frac{1}{t} + (1-p)\,\frac{d_i(t)}{t}$

Continuous Degree
At t = 5 node i = 5 arrives and has a total degree of 1 to deterministically share with the other nodes:
[Table: for nodes i = 1, 2, 3, 4 the column d_i(t-1)+1 reads 1, 3, 1, 1; the new node i = 5 goes from 0 to 1]
How does d_i(t) change as t → ∞?

END!
Too long: cut at the beginning, where discussing the importance of power-laws and why they are surprising. Also cover how to sample from a power-law distribution and derive why that works. This lecture should finish with the proof of PA giving power-laws.

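What is the rate of growth of d_i?
  $\frac{d\,d_i(t)}{dt} = p\,\frac{1}{t} + (1-p)\,\frac{d_i(t)}{t} = \frac{p + q\,d_i(t)}{t}$, where q = 1-p
  Divide by $p + q\,d_i(t)$: $\frac{1}{p + q\,d_i(t)}\,\frac{d\,d_i(t)}{dt} = \frac{1}{t}$
  Integrate: $\int \frac{1}{p + q\,d_i(t)}\,d\,d_i(t) = \int \frac{1}{t}\,dt \;\Rightarrow\; \ln(p + q\,d_i(t)) = q \ln t + c$
  Exponentiate and let $A = e^c$: $p + q\,d_i(t) = A\,t^q \;\Rightarrow\; d_i(t) = \frac{1}{q}\left(A\,t^q - p\right)$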

What is the constant A?
We know: $d_i(i) = 0$
So: $d_i(i) = \frac{1}{q}\left(A\,i^q - p\right) = 0 \;\Rightarrow\; A = \frac{p}{i^q}$
Therefore: $d_i(t) = \frac{p}{q}\left(\left(\frac{t}{i}\right)^q - 1\right)$
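
The closed form $d_i(t) = \frac{p}{q}\left((t/i)^q - 1\right)$ can be cross-checked by integrating the growth equation $d'(t) = (p + q\,d)/t$ numerically from the initial condition $d_i(i) = 0$. This is a small sketch in plain Python; the forward-Euler scheme, the step count, and the values of `p`, `i`, and `t_end` are arbitrary illustrative choices.

```python
def closed_form_degree(t, i, p):
    """d_i(t) = (p/q) * ((t/i)^q - 1) with q = 1 - p (requires p < 1)."""
    q = 1.0 - p
    return (p / q) * ((t / i) ** q - 1.0)

def integrate_degree(t_end, i, p, steps=200_000):
    """Forward-Euler integration of d'(t) = (p + q*d) / t from d(i) = 0 up to t_end."""
    q = 1.0 - p
    dt = (t_end - i) / steps
    t, d = float(i), 0.0
    for _ in range(steps):
        d += dt * (p + q * d) / t
        t += dt
    return d

p, i, t_end = 0.1, 5, 1000        # assumed illustrative values
numeric = integrate_degree(t_end, i, p)
exact = closed_form_degree(t_end, i, p)
print(f"Euler: {numeric:.4f}, closed form: {exact:.4f}")
```

The two values should agree closely, which supports the integration constant $A = p / i^q$.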

Degree Distribution
What is F(d), the fraction of nodes that have degree at least d at time t?
How many nodes i have degree > d?
  $d_i(t) = \frac{p}{q}\left(\left(\frac{t}{i}\right)^q - 1\right) > d$, then: $i < t \left(\frac{q}{p}\,d + 1\right)^{-\frac{1}{q}}$
There are t nodes total at time t, so: $F(d) = \left(\frac{q}{p}\,d + 1\right)^{-\frac{1}{q}}$

Degree Distribution
What is the fraction of nodes with degree exactly d?
Take the derivative of F(d), with a minus sign because F(d) decreases in d:
  $-F'(d) = \frac{1}{p}\left(\frac{q}{p}\,d + 1\right)^{-\left(1 + \frac{1}{q}\right)} \propto d^{-\left(1 + \frac{1}{q}\right)}$ for large d
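
The counting argument can be verified by brute force: compute the deterministic degree $d_i(t) = \frac{p}{q}\left((t/i)^q - 1\right)$ of every node, count how many exceed d, and compare with $F(d) = \left(\frac{q}{p}\,d + 1\right)^{-1/q}$. The sketch assumes NumPy; the function names and the values of `t`, `p`, and the tested thresholds are hypothetical.

```python
import numpy as np

def continuous_degrees(t, p):
    """Deterministic degrees d_i(t) = (p/q) * ((t/i)^q - 1) for nodes i = 1..t."""
    q = 1.0 - p
    i = np.arange(1, t + 1, dtype=float)
    return (p / q) * ((t / i) ** q - 1.0)

def ccdf_formula(d, p):
    """F(d) = ((q/p) * d + 1)^(-1/q): predicted fraction of nodes with degree above d."""
    q = 1.0 - p
    return ((q / p) * d + 1.0) ** (-1.0 / q)

t, p = 100_000, 0.1                  # assumed illustrative values
degrees = continuous_degrees(t, p)
for d in (1.0, 5.0, 25.0):
    counted = np.mean(degrees > d)
    print(f"d = {d}: counted {counted:.5f}, formula {ccdf_formula(d, p):.5f}")

# The density -F'(d) decays with exponent 1 + 1/q for large d.
print("predicted density exponent:", 1.0 + 1.0 / (1.0 - p))
```

The counted fraction and the formula can differ by at most about 1/t, because the node index i is an integer.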

Preferential attachment: Reflections
Two changes from $G_{np}$:
  The network grows
  Preferential attachment
Do we need both? Yes!
Add growth to $G_{np}$ (assume p = 1):
  $x_j$ = degree of node j at the end
  $X_j(u)$ = 1 if u links to j, else 0
  $x_j = X_j(j+1) + X_j(j+2) + \ldots + X_j(n)$
  $E[X_j(u)] = P[u \text{ links to } j] = \frac{1}{u-1}$
  $E[x_j] = \sum_{u=j+1}^{n} \frac{1}{u-1} = \frac{1}{j} + \frac{1}{j+1} + \ldots + \frac{1}{n-1} = H_{n-1} - H_{j-1}$
  $E[x_j] \approx \log(n-1) - \log(j) = \log\frac{n-1}{j}$, NOT a power of $n/j$
$H_n$ … n-th harmonic number: $H_n = \sum_{k=1}^{n} \frac{1}{k}$
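
The growth-only calculation is easy to evaluate exactly. The sketch below, in plain Python, sums $1/j + 1/(j+1) + \ldots + 1/(n-1)$ and compares it with the logarithmic approximation $\log((n-1)/j)$; the function name and the chosen n and j values are illustrative assumptions.

```python
import math

def expected_degree_growth_only(j, n):
    """E[x_j] = sum over u = j+1..n of 1/(u-1), i.e. 1/j + ... + 1/(n-1)."""
    return sum(1.0 / (u - 1) for u in range(j + 1, n + 1))

n = 10_000                            # assumed network size for illustration
for j in (1, 10, 100, 1000):
    exact = expected_degree_growth_only(j, n)
    approx = math.log((n - 1) / j)
    print(f"j = {j}: harmonic sum {exact:.3f}, log((n-1)/j) {approx:.3f}")
```

Even the oldest node collects only about log n links under growth alone, whereas under preferential attachment the continuous analysis gives a degree that grows as a power of t/i.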

Preferential attachment: Good news
Preferential attachment gives power-law degrees
Intuitively reasonable process
Can tune p to get the observed exponent
  On the web, P[node has degree d] ~ $d^{-2.1}$
  $2.1 = 1 + \frac{1}{1-p} \;\Rightarrow\; p \approx 0.1$
There are also other network formation mechanisms that generate scale-free networks:
  Random surfer model
  Forest Fire model
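
Tuning p to match an observed exponent is a one-line inversion of $\alpha = 1 + \frac{1}{1-p}$. The helper names below are hypothetical, and the sketch only restates the arithmetic for the web exponent 2.1 quoted above.

```python
def exponent_from_p(p):
    """Degree exponent predicted by the model: 1 + 1/(1 - p), for 0 <= p < 1."""
    return 1.0 + 1.0 / (1.0 - p)

def p_from_exponent(alpha):
    """Invert alpha = 1 + 1/(1 - p); alpha >= 2 is needed so that 0 <= p < 1."""
    return 1.0 - 1.0 / (alpha - 1.0)

p_web = p_from_exponent(2.1)          # 2.1 is the web exponent quoted above
print(f"p for exponent 2.1: {p_web:.3f}")
print(f"round trip: {exponent_from_p(p_web):.3f}")
```

Because the predicted exponent is at least 2 for any p in [0, 1), this model cannot produce exponents below 2.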

PA-like Link Formation [Skip!]
Copying mechanism (directed network)
  select a node and an edge of this node
  attach to the endpoint of this edge
Walking on a network (directed network)
  the new node connects to a node, then to every first, second, … neighbor of this node
Attaching to edges
  select an edge
  attach to both endpoints of this edge
Node duplication
  duplicate a node with all its edges
  randomly prune edges of the new node

Preferential attachment: Bad news
Preferential attachment is not so good at predicting network structure
  Age-degree correlation
  Links among high degree nodes
    On the web nodes sometimes avoid linking to each other
Further questions:
  What is a reasonable probabilistic model for how people sample through web pages and link to them?
    Short + random walks
    Effect of search engines: reaching pages based on number of links to them
Give a picture of age correlation!

PA: Many Extensions & Variations [Skip!]
Preferential attachment is a key ingredient
Extensions:
  Early nodes have advantage: node fitness
  Geometric preferential attachment
Copying model:
  Picking a node proportional to the degree is the same as picking an edge at random (pick a node and then its neighbor)

Network resilience (1)
We observe how the connectivity (length of the paths) of the network changes as the vertices get removed [Albert et al. 00; Palmer et al. 01]
Vertices can be removed:
  Uniformly at random
  In order of decreasing degree
It is important for epidemiology
  Removal of vertices corresponds to vaccination

Network Resilience (2)
Real-world networks are resilient to random attacks
  One has to remove all web pages of degree > 5 to disconnect the web
  But this is a very small percentage of web pages
Random network has better resilience to targeted attacks
[Figure panels: mean path length vs. fraction of removed nodes, under random removal and preferential removal, for a random network and for the Internet (autonomous systems)]