Modularity and Community Structure in Networks*
* M.E.J. Newman, PNAS 2006

Networks
A network is represented by a graph G(V, E): V = nodes, E = edges (links between node pairs).
Examples of real-life networks:
- social networks (V = people)
- World Wide Web (V = webpages)
- protein-protein interaction networks (V = proteins)

Protein-protein Interaction Networks
Nodes: proteins (6K); edges: interactions (15K).
The network reflects the cell's machinery and signaling pathways.

Communities (clusters) in a network
A community (cluster) is a densely connected group of vertices, with only sparser connections to other groups.

Searching for communities in a network
There are numerous algorithms with different "target functions":
- "Homogeneity": densely connected clusters
- "Separation": graph partitioning, min-cut approach
Clustering is important for:
- understanding the structure of the network
- providing an overview of the network

Distilling Modules from Networks
Motivation: identifying protein complexes responsible for certain functions in the cell.

Modularity (Newman)

Modularity of a division (Q)
Q = #(edges within groups) - E[#(edges within groups in a RANDOM graph with the same node degrees)]
Trivial division: all vertices in one group ==> Q(trivial division) = 0
- $k_i$ = degree of node $i$
- $M = \sum_i k_i = 2|E|$
- $A_{ij} = 1$ if $(i,j) \in E$, 0 otherwise
- $E_{ij}$ = expected number of edges between $i$ and $j$ in a random graph with the same node degrees
Lemma: $E_{ij} \approx k_i k_j / M$
$Q = \sum_{i,j \text{ in the same group}} \left( A_{ij} - k_i k_j / M \right)$, where the $A_{ij}$ terms count the edges within groups.
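The definition of Q above can be evaluated directly from an adjacency matrix. The following is a minimal sketch, not the course's reference implementation: the function name `division_modularity`, the dense row-major `adj` layout and the `group_id` array are assumptions made for illustration, and the sum runs over ordered pairs (i,j) exactly as the formula is written on the slide.

```c
#include <assert.h>
#include <stdlib.h>

/* Degree of node i in a dense, row-major n x n 0/1 adjacency matrix. */
static double node_degree(const int *adj, size_t n, size_t i)
{
    double k = 0.0;
    size_t j;
    for (j = 0; j < n; j++) k += adj[i * n + j];
    return k;
}

/* Q = sum over ordered pairs (i,j) in the same group of (A_ij - k_i*k_j/M).
 * adj      : hypothetical dense layout, adj[i*n + j] = A_ij
 * group_id : group_id[i] is the group of node i
 * Runs in O(n^2); intended as a reference check, not as the fast path. */
double division_modularity(const int *adj, const int *group_id, size_t n)
{
    double *k;
    double M = 0.0, q = 0.0;
    size_t i, j;

    assert(adj != NULL && group_id != NULL && n > 0);
    k = malloc(n * sizeof *k);
    assert(k != NULL);
    for (i = 0; i < n; i++) {
        k[i] = node_degree(adj, n, i);
        M += k[i];                      /* M = sum of degrees = 2|E| */
    }
    if (M > 0.0) {
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
                if (group_id[i] == group_id[j])
                    q += adj[i * n + j] - k[i] * k[j] / M;
    }
    free(k);
    return q;
}
```

For two triangles joined by a single edge, the natural division into the two triangles gives Q = 5 under this definition, while the trivial division gives Q = 0.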

Modularity
Are the two definitions of modularity equivalent?

Methods to Optimize Q
- Fast modularity: greedy iterative agglomeration of small communities, choosing at each step the join that results in the greatest increase (or smallest decrease) in Q. Can be generalized to weighted networks.
- Extreme methods: simulated annealing, GA
- Heuristic algorithm
- Spectral partitioning

Important features of Newman's clustering algorithm
- The number and size of the clusters are determined by the algorithm.
- It attempts to find a division that maximizes a modularity score Q (a heuristic algorithm).
- It reports when the network is non-modular.

Algorithm 1: Division into two groups (1)
Suppose we have n vertices {1,...,n}.
s: a {±1} vector of size n that represents a 2-division: $s_i = s_j$ iff i and j are in the same group.
$\frac{1}{2}(s_i s_j + 1) = 1$ if $s_i = s_j$, 0 otherwise ==>
$Q = \sum_{i,j \text{ in the same group}} (A_{ij} - k_i k_j / M) = \frac{1}{2} \sum_{i,j} (A_{ij} - k_i k_j / M)(s_i s_j + 1)$

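Algorithm 1: Division into two groups (2)
Since $\sum_{i,j} (A_{ij} - k_i k_j / M) = 0$, we get $Q = \frac{1}{2} \sum_{i,j} (A_{ij} - k_i k_j / M)\, s_i s_j = \frac{1}{2} s^T B s$,
where B = the modularity matrix, $B_{ij} = A_{ij} - k_i k_j / M$:
- symmetric
- row sum = 0 ==> 0 is an eigenvalue of B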

Modularity matrix: example
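As a small illustration of the modularity matrix, the sketch below builds $B_{ij} = A_{ij} - k_i k_j / M$ from a dense adjacency matrix. The function name `modularity_matrix` and the dense row-major layout are assumptions for this example only; a real implementation would more likely keep A sparse and never store B explicitly.

```c
#include <assert.h>
#include <stdlib.h>

/* Fill B (dense, row-major, n x n) with B_ij = A_ij - k_i * k_j / M.
 * adj is a hypothetical dense 0/1 adjacency matrix with adj[i*n + j] = A_ij.
 * If the graph has no edges (M = 0), B is set to all zeros. */
void modularity_matrix(const int *adj, size_t n, double *B)
{
    double *k;
    double M = 0.0;
    size_t i, j;

    assert(adj != NULL && B != NULL && n > 0);
    k = malloc(n * sizeof *k);
    assert(k != NULL);
    for (i = 0; i < n; i++) {
        k[i] = 0.0;
        for (j = 0; j < n; j++) k[i] += adj[i * n + j];   /* degree of node i */
        M += k[i];
    }
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            B[i * n + j] = (M > 0.0) ? adj[i * n + j] - k[i] * k[j] / M : 0.0;
    free(k);
}
```

For the path 0 - 1 - 2 (degrees 1, 2, 1 and M = 4), the first row of B is (-0.25, 0.5, -0.25), which sums to zero as expected.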

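Algorithm 1: Division into two groups (3)
Which vector s maximizes Q?
- B is symmetric ==> B is diagonalizable (real eigenvalues).
- B's eigenvalues: $\beta_1 \ge \beta_2 \ge \dots \ge \beta_n$. B's corresponding (orthonormal) eigenvectors: $u_1, \dots, u_n$, with $B u_i = \beta_i u_i$.
- Write $s = \sum_i a_i u_i$. Then $n = \|s\|^2 = \sum_i a_i^2$ and $Q = \frac{1}{2} s^T B s = \frac{1}{2} \sum_i a_i^2 \beta_i$.
- Clearly s ~ $u_1$ maximizes Q, but $u_1$ may not be a {±1} vector.
- Greedy heuristic: choose s ~ $u_1$: $s_i = +1$ if $(u_1)_i > 0$, $s_i = -1$ otherwise.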
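The greedy rounding step can be sketched as follows. This is an illustrative fragment under stated assumptions: the names `sign_division` and `division_score` are hypothetical, B is assumed to be stored densely in row-major order, and the leading eigenvector `u` is assumed to have been computed elsewhere. The tolerance macro is the `IS_POSITIVE` check from the implementation notes later in this deck.

```c
#include <assert.h>
#include <stddef.h>

#define IS_POSITIVE(X) ((X) > 0.00001)

/* Greedy rounding: s_i = +1 if u_i is positive (up to tolerance), -1 otherwise. */
void sign_division(const double *u, int *s, size_t n)
{
    size_t i;
    assert(u != NULL && s != NULL);
    for (i = 0; i < n; i++) s[i] = IS_POSITIVE(u[i]) ? 1 : -1;
}

/* Score of a 2-division: (1/2) * s^T B s, with B dense and row-major. */
double division_score(const double *B, const int *s, size_t n)
{
    double sum = 0.0;
    size_t i, j;
    assert(B != NULL && s != NULL);
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            sum += B[i * n + j] * s[i] * s[j];
    return 0.5 * sum;
}
```

A caller would typically split the group only when `division_score` is positive by the same tolerance, since a non-positive score means the division does not improve Q.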


Example: a 2-division of a social network
A network showing relationships between people in a karate club, which eventually split into two. The division algorithm predicts exactly the two groups after the split.
[Figure: the karate club network, with the two known group leaders marked.]
Color matches the entries of the eigenvector $u_1$: light = positive entry ($s_i = 1$), dark = negative entry ($s_i = -1$).

Dividing into more than 2 (1)
How do we divide into more than two groups?
Idea: apply the algorithm recursively on every group.
Splitting a group ==> update Q. Only the {i,j} pairs inside the group being split need to be updated in Q.
In the sum for Q, each $B_{ij}$ is multiplied by a 0/1 indicator that equals 1 iff i and j are in the same group, 0 otherwise.

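Dividing into more than 2 (2)
g: a group of $n_g$ vertices
s: a {±1} vector of size $n_g$
Compute $\Delta Q$ for a 2-division of g:
$\Delta Q = \text{New} - \text{Old} = \frac{1}{2} \sum_{i,j \in g} B_{ij} (s_i s_j + 1) - \sum_{i,j \in g} B_{ij}$
- New: the elements of g are split into two subgroups (corresponding to s)
- Old: all the elements of g are within one group (g)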

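Dividing into more than 2 (3)
$\Delta Q = \frac{1}{2} s^T \hat{B}[g] s$, where
- B[g] = the submatrix of B defined by g
- $f_i(g)$ = sum of the i-th row of B[g]; $f_i(\{1,...,n\}) = 0$
- $\hat{B}[g]$ = the generalized modularity matrix, $\hat{B}[g]_{ij} = B[g]_{ij} - \delta_{ij} f_i(g)$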
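The generalized modularity matrix can be derived from B in a few lines. The sketch below assumes a dense row-major B and a list `g` of node indices; the function name and both layouts are assumptions for illustration, and the project itself may prefer not to materialize this matrix at all.

```c
#include <assert.h>
#include <stddef.h>

/* Bg (ng x ng, row-major) receives the generalized modularity matrix of g:
 *   Bg[a][b] = B[g[a]][g[b]] - (a == b ? f_a : 0),
 *   f_a      = sum over b of B[g[a]][g[b]]   (row sum of the submatrix B[g]).
 * B is the dense n x n modularity matrix; g lists the ng node indices in the group. */
void generalized_modularity_matrix(const double *B, size_t n,
                                   const size_t *g, size_t ng, double *Bg)
{
    size_t a, b;
    assert(B != NULL && g != NULL && Bg != NULL && ng > 0 && ng <= n);
    for (a = 0; a < ng; a++) {
        double f = 0.0;
        for (b = 0; b < ng; b++) {
            assert(g[a] < n && g[b] < n);
            Bg[a * ng + b] = B[g[a] * n + g[b]];   /* copy the submatrix entry */
            f += Bg[a * ng + b];                   /* accumulate the row sum f_a */
        }
        Bg[a * ng + a] -= f;                       /* subtract f_a on the diagonal */
    }
}
```

Every row of the result sums to zero, and when g is the whole vertex set the result equals B because all row sums of B are already zero.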

Generalized modularity matrix: example
g = {1, 4, 5} (1 is the minimal index)
What is $\hat{B}[\{1,...,5\}]$?

A "generalized" 2-division algorithm (divides a group in a network)


Further techniques for modularity maximization
(Combined with Newman's "generalized" 2-division algorithm)

A heuristic for 2-division
1. Start from {g1, g2}, an initial 2-division of g.
2. While there is an unmoved node:
   1. Let v be an unmoved node whose move between g1 and g2 maximizes $\Delta Q$.
   2. Move v between g1 and g2.
3. From the $n_g$ 2-divisions generated in the previous step, let {g1, g2} be the one with maximum Q.
4. If $\Delta Q > 0$ ==> go to 1.
Note: the last iteration of step 2 produces a 2-division which equals the initial 2-division (every node has switched sides).

Algorithm 4
2. While there is an unmoved node:
   1. Let v be an unmoved node whose move between g1 and g2 maximizes $\Delta Q$.
      (Computing $\Delta Q$ for each node; choosing the node j' with maximum $\Delta Q$.)
   2. Move v between g1 and g2.
      (Moving j' and storing its $\Delta Q$.)

Algorithm 4 (cont.)
3. From the $n_g$ 2-divisions generated in the previous step, let {g1, g2} be the one with maximum Q.
4. If $\Delta Q > 0$ ==> go to 1.
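The improvement heuristic can be sketched in C as below. This is one possible reading of the steps, not the course's reference solution: the function names are hypothetical, B is assumed to be the dense row-major (generalized) modularity matrix of the group, the division is encoded as a ±1 vector, and the score of a division is taken to be ½ sᵀBs, so flipping node k changes it by $-2 s_k \sum_{j \ne k} B_{kj} s_j$. Each score is recomputed in O(n) rather than updated incrementally.

```c
#include <assert.h>
#include <stdlib.h>

/* Change in (1/2) s^T B s when s[k] is flipped: -2 * s_k * sum_{j != k} B_kj * s_j. */
static double flip_gain(const double *B, const int *s, size_t n, size_t k)
{
    double sum = 0.0;
    size_t j;
    for (j = 0; j < n; j++)
        if (j != k) sum += B[k * n + j] * s[j];
    return -2.0 * s[k] * sum;
}

/* Repeats passes of "move every node once, keep the best intermediate division"
 * while a pass improves the score. s is updated in place; returns the total gain. */
double improve_division(const double *B, int *s, size_t n)
{
    const double tol = 0.00001;          /* same threshold as IS_POSITIVE */
    int *best, *moved;
    double total = 0.0, gain, best_gain, cum, best_cum;
    size_t step, k, pick;

    assert(B != NULL && s != NULL && n > 0);
    best = malloc(n * sizeof *best);
    moved = malloc(n * sizeof *moved);
    assert(best != NULL && moved != NULL);
    for (;;) {
        for (k = 0; k < n; k++) moved[k] = 0;
        cum = 0.0;
        best_cum = 0.0;
        for (step = 0; step < n; step++) {
            pick = n;
            best_gain = 0.0;
            for (k = 0; k < n; k++) {            /* unmoved node with maximum gain */
                if (moved[k]) continue;
                gain = flip_gain(B, s, n, k);
                if (pick == n || gain > best_gain) { pick = k; best_gain = gain; }
            }
            s[pick] = -s[pick];                  /* move it, even if the gain is negative */
            moved[pick] = 1;
            cum += best_gain;
            if (cum > best_cum) {                /* remember the best division of this pass */
                best_cum = cum;
                for (k = 0; k < n; k++) best[k] = s[k];
            }
        }
        if (best_cum > tol) {
            for (k = 0; k < n; k++) s[k] = best[k];
            total += best_cum;
        } else {
            /* every node was flipped exactly once, so s is the mirrored start: undo */
            for (k = 0; k < n; k++) s[k] = -s[k];
            break;
        }
    }
    free(best);
    free(moved);
    return total;
}
```

On two triangles joined by one edge, starting from a division that misplaces one node of the first triangle, a single pass recovers the two triangles and the second pass finds no further improvement.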

Finding the leading eigenpair: the power method

The Power Method (1)
A: a diagonalizable matrix.
Let $(\lambda_1, V_1), \dots, (\lambda_n, V_n)$ be n eigenpairs of A, where $|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \dots \ge |\lambda_n|$.
The power method finds the dominant eigenpair of A, i.e. $(\lambda_1, V_1)$.
(Note that $\lambda_1$ is not necessarily the leading eigenvalue.)
$X_0$ = any vector ==> $X_0 = c_1 V_1 + \dots + c_n V_n$, where $c_i = X_0 \cdot V_i$ (for orthonormal eigenvectors).

The Power Method (2)
$X_1 = A X_0 = A (c_1 V_1 + \dots + c_n V_n) = c_1 A V_1 + \dots + c_n A V_n = c_1 \lambda_1 V_1 + \dots + c_n \lambda_n V_n$
$X_2 = A^2 X_0 = A X_1 = A (c_1 \lambda_1 V_1 + \dots + c_n \lambda_n V_n) = c_1 \lambda_1^2 V_1 + \dots + c_n \lambda_n^2 V_n$
...
$X_m = A^m X_0 = A X_{m-1} = A (c_1 \lambda_1^{m-1} V_1 + \dots + c_n \lambda_n^{m-1} V_n) = c_1 \lambda_1^m V_1 + \dots + c_n \lambda_n^m V_n \approx c_1 \lambda_1^m V_1$
if m is large enough (and $c_1 \ne 0$).

Power Method (3)
$X_m = A X_{m-1} = A^m X_0$
Suppose $V_1 \cdot Y \ne 0$. For m large enough: $\lambda_1 \approx \dfrac{X_{m+1} \cdot Y}{X_m \cdot Y}$
For simplicity, $Y = X_m$.

Power method: example
We perform only matrix-vector multiplications!
Convergence usually occurs within O(n) iterations.

Power method: convergence condition
To avoid numerical problems due to large numbers, normalize $X_i$ before computing $X_{i+1} = A X_i$:
$X_0 = X / \|X\|$
$X_1 = A X_0 / \|A X_0\|$
$X_2 = A X_1 / \|A X_1\|$
...
Iterate until the result has converged to within the desired precision.
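A normalized power iteration might look like the sketch below. Several choices here are assumptions, since the slide does not fix them: the matrix is dense and row-major, vectors are normalized by their largest absolute entry (any norm works for the method), the loop stops when no entry changes by more than `eps` between iterations, and the dominant eigenvalue is assumed to be positive (for example after matrix shifting), because otherwise the iterate flips sign at every step and this stopping rule never fires.

```c
#include <assert.h>
#include <stdlib.h>

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* y = A * x for a dense, row-major n x n matrix. */
static void mat_vec(const double *A, const double *x, double *y, size_t n)
{
    size_t i, j;
    for (i = 0; i < n; i++) {
        y[i] = 0.0;
        for (j = 0; j < n; j++) y[i] += A[i * n + j] * x[j];
    }
}

/* Scale x so that its largest absolute entry becomes 1 (no-op for the zero vector). */
static void normalize_max(double *x, size_t n)
{
    double norm = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        if (abs_d(x[i]) > norm) norm = abs_d(x[i]);
    if (norm > 0.0)
        for (i = 0; i < n; i++) x[i] /= norm;
}

/* x: start vector on entry, approximate dominant eigenvector on return.
 * Returns the Rayleigh quotient (x . Ax) / (x . x) as the eigenvalue estimate. */
double power_method(const double *A, double *x, size_t n, double eps, int max_iter)
{
    double *y, diff, num = 0.0, den = 0.0;
    size_t i;
    int it;

    assert(A != NULL && x != NULL && n > 0 && eps > 0.0);
    y = malloc(n * sizeof *y);
    assert(y != NULL);
    normalize_max(x, n);                     /* X_0 = X / ||X|| */
    for (it = 0; it < max_iter; it++) {
        mat_vec(A, x, y, n);                 /* A X_i */
        normalize_max(y, n);                 /* X_{i+1} = A X_i / ||A X_i|| */
        diff = 0.0;
        for (i = 0; i < n; i++) {
            if (abs_d(y[i] - x[i]) > diff) diff = abs_d(y[i] - x[i]);
            x[i] = y[i];
        }
        if (diff < eps) break;               /* assumed convergence condition */
    }
    mat_vec(A, x, y, n);
    for (i = 0; i < n; i++) { num += x[i] * y[i]; den += x[i] * x[i]; }
    free(y);
    return den > 0.0 ? num / den : 0.0;
}
```

For the matrix [[2, 1], [1, 2]] and start vector (1, 0), the iterate approaches (1, 1) and the returned estimate approaches the dominant eigenvalue 3.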

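Finding the leading eigenpair using matrix shifting
Let $\lambda_1 \ge \dots \ge \lambda_n$ be the eigenvalues of A, and $U_1, \dots, U_n$ their corresponding eigenvectors.
Let $\|A\|_1 = \max_j \sum_i |A_{ij}|$ (the maximum absolute column sum); then $\|A\|_1 \ge |\lambda_i|$ for every i (exercise).
Q: What is the dominant eigenpair of $A + \|A\|_1 I$?
A: $(\lambda_1 + \|A\|_1, U_1)$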
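The shift itself is cheap to compute. The following sketch, with hypothetical function names and an assumed dense row-major layout, computes the 1-norm as the maximum absolute column sum and adds a multiple of the identity in place.

```c
#include <assert.h>
#include <stddef.h>

/* ||A||_1: maximum absolute column sum of a dense, row-major n x n matrix. */
double matrix_norm1(const double *A, size_t n)
{
    double best = 0.0;
    size_t i, j;
    assert(A != NULL && n > 0);
    for (j = 0; j < n; j++) {
        double col = 0.0;
        for (i = 0; i < n; i++) {
            double v = A[i * n + j];
            col += (v < 0.0) ? -v : v;
        }
        if (col > best) best = col;
    }
    return best;
}

/* In place: A <- A + shift * I. All eigenvalues move up by shift,
 * and the eigenvectors are unchanged. */
void shift_diagonal(double *A, size_t n, double shift)
{
    size_t i;
    assert(A != NULL);
    for (i = 0; i < n; i++) A[i * n + i] += shift;
}
```

After running a power iteration on the shifted matrix, subtracting the shift from the eigenvalue it finds gives the leading eigenvalue of the original matrix.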

Implementation: Robustness and Efficiency

Checking "positiveness"
    #define IS_POSITIVE(X) ((X) > 0.00001)
Instead of "x > 0" ==> use IS_POSITIVE(X)

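Efficient multiplications in the (extended) modularity matrix: O(n) instead of O(n²)
$(\hat{B}[g] + \|\hat{B}[g]\|_1 I)\, x = A[g]\, x - k[g]\, (k[g]^T x) / M - f(g) \circ x + \|\hat{B}[g]\|_1\, x$
- $A[g]\, x$: multiplication in a sparse matrix
- $k[g]^T x$: inner product
- $f(g) \circ x$: the elementwise products $f_i(g)\, x_i$
- $\|\hat{B}[g]\|_1\, x$: "matrix shifting"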

sparse_matrix_arr
    typedef struct {
        int n;         /* matrix size */
        elem* values;  /* the non-zero elements, ordered by rows */
        int* colind;   /* column indices */
        int* rowptr;   /* pointers to where rows begin in the values array */
    } sparse_matrix_arr;
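With this layout, multiplying by a vector touches only the stored entries, and the product with the shifted generalized modularity matrix needs just one sparse multiplication, one inner product and a few vector updates. The sketch below makes two assumptions the slides do not state: `elem` is taken to be `double`, and `rowptr` is assumed to hold n + 1 entries so that row i occupies positions rowptr[i] up to rowptr[i+1] - 1. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef double elem;   /* assumption: the slide does not define elem */

typedef struct {
    int   n;        /* matrix size */
    elem *values;   /* the non-zero elements, ordered by rows */
    int  *colind;   /* column index of each stored value */
    int  *rowptr;   /* assumed n + 1 entries: row i is [rowptr[i], rowptr[i+1]) */
} sparse_matrix_arr;

/* y = A * x, in time proportional to n plus the number of non-zeros. */
void sparse_mult(const sparse_matrix_arr *A, const double *x, double *y)
{
    int i, p;
    assert(A != NULL && x != NULL && y != NULL);
    for (i = 0; i < A->n; i++) {
        double sum = 0.0;
        for (p = A->rowptr[i]; p < A->rowptr[i + 1]; p++)
            sum += A->values[p] * x[A->colind[p]];
        y[i] = sum;
    }
}

/* y = A[g] x - k (k . x) / M - f o x + shift * x
 * Ag: sparse adjacency submatrix of the group, k: degrees of its nodes,
 * f: row sums f_i(g), shift: e.g. the 1-norm used for matrix shifting. */
void modularity_mult(const sparse_matrix_arr *Ag, const double *k, double M,
                     const double *f, double shift, const double *x, double *y)
{
    double kx = 0.0;
    int i;
    assert(Ag != NULL && k != NULL && f != NULL && M > 0.0);
    sparse_mult(Ag, x, y);                                /* sparse product   */
    for (i = 0; i < Ag->n; i++) kx += k[i] * x[i];        /* inner product    */
    for (i = 0; i < Ag->n; i++)
        y[i] += -k[i] * kx / M - f[i] * x[i] + shift * x[i];
}
```

The dense modularity matrix is never formed here, which is the point of the O(n) claim for sparse networks. A dense version remains useful in debug builds for cross-checking the result.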

Fast score computations (Algorithm 4)
- Computing $\Delta Q$ from scratch for each node ==> O(n²)
- Computing $\Delta Q$ for each node in O(n), before moving the 1st node
- Updating the score AFTER a move of a node k (s is already updated)

Project specifications

Programs
- sparse_mlpl < matrix_vec.in (for the power method)
- modularity_mat <adj_matrix> <group> (for the power method)
- spectral_div <adj_matrix> <group> <precision> (computing a 2-division)
- improve_div <adj_matrix> <group> <subgroup>
- cluster <adj_matrix> <precision> (the complete clustering algorithm, including the improvement)

Implementation process
- Read and understand the document
- Design ALL programs:
  - Data structures
  - Functions used by more than one program
- Check your code:
  - "Toy" examples on the website: easy to debug
  - Your own created LARGE examples
- Run your code on the yeast/fly networks

Analyzing clusters in yeast and fly protein-protein interaction networks
(Saccharomyces cerevisiae, Drosophila melanogaster)
Input: the true PPI network + 2 random networks
Task 1: infer the true network. Solution: the true network is more modular.
Task 2: compute associated functions (using Cytoscape + BiNGO)

Cytoscape, BiNGO
www.cytoscape.com (version 2.5.1)
- A framework for analyzing networks
- Provides visualization of networks and clusters
http://www.psb.ugent.be/cbd/papers/BiNGO/
- Finds functions associated with a gene cluster
- Runs from Cytoscape
- BiNGO version 2.3 is not suitable for our project!!! (due to a bug) ==> use version 2.4 (when available) or version 2.0 (available under ~ozery/public/cytoscape-v2.5.1/plugins/BiNGO.jar).

BiNGO output (GO = Gene Ontology)

Visualization with Cytoscape

How is the project checked?
- Most checks (points): "BLACK BOX"
  - The common checks in the "real world"
  - Running with fixed input files, comparing to fixed output files
  - Score = #(successful checks) / #(total checks)
- "WHITE BOX" checks: code review (10 points maximum)
  - Code simplicity / efficiency

A simple data structure for maintaining a division
    typedef struct Division_ {
        int n;           /* #nodes in the network */
        int* group_ids;  /* for each node: its group id (initially 0: all nodes within one group) */
        int numGroups;
        double Q;
    } Division;
Complexity:
- Finding all the elements of a group: O(n)
- Splitting a group into 2: O(n)
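Splitting a group in this structure is a linear scan over the group ids. The sketch below is illustrative only: the function `division_split` and its conventions are assumptions, namely that `s` holds one ±1 entry per member of the group in increasing node order, that members with a negative entry move to a new group whose id is the old `numGroups`, and that the caller supplies the resulting change in Q.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Division_ {
    int     n;          /* #nodes in the network */
    int    *group_ids;  /* for each node: its group id */
    int     numGroups;
    double  Q;
} Division;

/* Split `group` according to s. Returns the id of the new group,
 * or -1 (leaving d unchanged) if s puts all members on the same side. O(n). */
int division_split(Division *d, int group, const int *s, double delta_q)
{
    int i, m = 0, neg = 0, new_id;

    assert(d != NULL && s != NULL && group >= 0 && group < d->numGroups);
    for (i = 0; i < d->n; i++) {             /* first scan: count both sides */
        if (d->group_ids[i] != group) continue;
        if (s[m] < 0) neg++;
        m++;
    }
    if (neg == 0 || neg == m) return -1;     /* not a real split */

    new_id = d->numGroups;
    m = 0;
    for (i = 0; i < d->n; i++) {             /* second scan: relabel the -1 side */
        if (d->group_ids[i] != group) continue;
        if (s[m] < 0) d->group_ids[i] = new_id;
        m++;
    }
    d->numGroups++;
    d->Q += delta_q;
    return new_id;
}
```

Refusing a one-sided split keeps every group id non-empty, which matters when the recursive algorithm later iterates over groups by id.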

Maintaining the generalized modularity matrix
Should we maintain the modularity matrix?
No:
1) We do not use it explicitly.
2) It is a dense matrix and consumes a large amount of memory.
Yes:
1) Despite its large size, it can be kept in memory.
2) It can simplify the code (e.g. deriving B[g] from B, computing the L1-norm).
3) It can be used to validate the correctness of the optimized multiplications (debug mode only!).

Suggestion for modules
- Sparse matrices:
  - Data structure: sparse_matrix_lst
  - Reading a sparse matrix (file / stdin)
  - Multiplication by a vector
  - Computing A[g]
  - Methods hiding the inner structure (allows a simple replacement of sparse_matrix_lst with another data structure for holding sparse matrices)
- Division
- Group
- The spectral algorithm:
  - 2-division
  - full division
- The improvement algorithm
- The generalized modularity matrix:
  - Data structure: A[g], k[g], M, f[g], L1-norm
  - Multiplication by a vector
  - Computing $\Delta Q$
  - Printing the modularity matrix

Good luck! (and have fun...)