Introduction to Graph Cluster Analysis - PowerPoint Presentation

Uploaded by faustina-dinatale on 2015-11-08

Presentation Transcript

Introduction to Graph Cluster Analysis

Outline

Introduction to Cluster Analysis
Types of Graph Cluster Analysis
Algorithms for Graph Clustering
- k-Spanning Tree
- Shared Nearest Neighbor
- Betweenness Centrality Based
- Highly Connected Components
- Maximal Clique Enumeration
- Kernel k-means
Application


What is Cluster Analysis?

The process of dividing a set of input data into possibly overlapping subsets, where the elements in each subset are considered related by some similarity measure.

[Figure: the same set of points clustered two ways — 2 clusters by shape, 3 clusters by color]

Applications of Cluster Analysis

Summarization: provides a macro-level view of the data set, e.g., clustering precipitation in Australia.

From Tan, Steinbach, Kumar, Introduction to Data Mining, Addison-Wesley, 1st Edition.


What is Graph Clustering?

Types:
- Between-graph: clustering a set of graphs
- Within-graph: clustering the nodes/edges of a single graph

Between-graph Clustering

Between-graph clustering methods divide a set of graphs into different clusters. E.g., a set of graphs representing chemical compounds can be grouped into clusters based on their structural similarity.

Within-graph Clustering

Within-graph clustering methods divide the nodes of a graph into clusters. E.g., in a social networking graph, these clusters could represent people with the same or similar hobbies.

Note: In this chapter we will look at different algorithms to perform within-graph clustering.


k-Spanning Tree

The k-spanning tree method produces k groups of non-overlapping vertices.

STEPS:
1. Obtain the Minimum Spanning Tree (MST) of the input graph G.
2. Remove the k-1 edges with the highest weight from the MST.
3. The result is k clusters.

[Figure: a minimum spanning tree over vertices 1-5, split into k clusters]

What is a Spanning Tree?

A connected subgraph with no cycles that includes all vertices in the graph.

[Figure: a weighted graph G on vertices 1-5, and one of its spanning trees with total weight 17]

Note: Weight can represent either distance or similarity between two vertices.

What is a Minimum Spanning Tree (MST)?

The spanning tree of a graph with the minimum possible sum of edge weights, if the edge weights represent distance.

[Figure: graph G and three of its spanning trees, with total weights 11, 13, and 17; the tree of weight 11 is the MST]

Note: If the edge weights represent similarity, the MST is instead the spanning tree with the maximum possible sum of edge weights.

Algorithm to Obtain MST: Prim's Algorithm

1. Given the input graph G, select a vertex at random (e.g., vertex 5) and initialize an empty tree T with that vertex.
2. Select a list of edges L from G such that at most ONE vertex of each edge is in T.
3. From L, select the edge X with minimum weight, and add X to T.
4. Repeat steps 2-3 until all vertices are added to T.

[Figure: Prim's algorithm growing T from vertex 5 on the example graph]
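The steps above can be sketched in a short, self-contained Python version (alongside the R code used elsewhere in this chapter). The edge list below is hypothetical, chosen only to resemble the 5-vertex example, not copied from the figure:

```python
import heapq

def prim_mst(edges, start):
    """Prim's algorithm: grow a tree T from `start`, always adding the
    cheapest edge that has exactly one endpoint already in T."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((w, u, v))
        adj.setdefault(v, []).append((w, v, u))
    in_tree = {start}
    candidates = list(adj[start])      # edges leaving the current tree
    heapq.heapify(candidates)
    mst = []
    while candidates:
        w, u, v = heapq.heappop(candidates)
        if v in in_tree:               # both endpoints already in T: skip
            continue
        in_tree.add(v)
        mst.append((u, v, w))
        for cand in adj[v]:
            if cand[2] not in in_tree:
                heapq.heappush(candidates, cand)
    return mst

# Hypothetical weighted graph on vertices 1..5
edges = [(1, 2, 2), (1, 3, 4), (2, 3, 3), (2, 4, 6),
         (3, 4, 2), (3, 5, 7), (4, 5, 4), (1, 5, 5)]
mst = prim_mst(edges, start=5)
total = sum(w for _, _, w in mst)      # 4 MST edges, total weight 11
```

The heap always pops the globally cheapest frontier edge, which is exactly the "select the edge X with minimum weight" step above.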

k-Spanning Tree

From the Minimum Spanning Tree, remove the k-1 edges with the highest weight.

Note: k is the number of clusters. E.g., for k=3, removing the two heaviest edges from the MST leaves 3 clusters.

[Figure: the example MST on vertices 1-5 with its two heaviest edges removed, yielding 3 clusters]
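Once an MST is in hand, the k-spanning-tree step amounts to "drop the k-1 heaviest MST edges and read off the connected components". A minimal pure-Python sketch (the MST edge list here is hypothetical):

```python
def k_spanning_tree_clusters(mst_edges, k):
    """Drop the k-1 heaviest MST edges; the surviving forest then has
    k connected components, which are the clusters."""
    kept = sorted(mst_edges, key=lambda e: e[2])[:len(mst_edges) - (k - 1)]
    # Union-find over the kept forest edges
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for u, v, _ in kept:
        parent[find(u)] = find(v)
    # Include vertices that lost all their edges as singleton clusters
    vertices = {x for e in mst_edges for x in e[:2]}
    groups = {}
    for x in vertices:
        groups.setdefault(find(x), set()).add(x)
    return sorted(groups.values(), key=min)

# Hypothetical MST as (u, v, weight) triples, k = 3
mst = [(1, 2, 2), (2, 3, 3), (3, 4, 2), (4, 5, 4)]
clusters = k_spanning_tree_clusters(mst, 3)   # three clusters
```

Dropping the two heaviest edges (weights 4 and 3) leaves {1,2}, {3,4}, and the singleton {5}.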

k-Spanning Tree R code

library(GraphClusterAnalysis)
library(RBGL)
library(igraph)
library(graph)
data(MST_Example)
G = graph.data.frame(MST_Example, directed=FALSE)
E(G)$weight = E(G)$V3
MST_PRIM = minimum.spanning.tree(G, weights=G$weight, algorithm="prim")
OutputList = k_clusterSpanningTree(MST_PRIM, 3)
Clusters = OutputList[[1]]
outputGraph = OutputList[[2]]
Clusters


Shared Nearest Neighbor Clustering

Shared Nearest Neighbor Clustering produces groups of non-overlapping vertices.

STEPS:
1. Obtain the Shared Nearest Neighbor (SNN) graph of the input graph G.
2. Remove the edges from the SNN graph whose weight is less than τ.

[Figure: the SNN graph of an example graph on vertices 0-4, with edge weights giving shared-neighbor counts]

What is Shared Nearest Neighbor? (Refresher from the Proximity Chapter)

Shared nearest neighbor is a proximity measure: it denotes the number of neighbor nodes common between any given pair of nodes u and v.

Shared Nearest Neighbor (SNN) Graph

Given the input graph G, weight each edge (u,v) with the number of shared nearest neighbors between u and v.

[Figure: graph G on vertices 0-4 and its SNN graph; e.g., node 0 and node 1 have 2 neighbors in common (node 2 and node 3), so edge (0,1) gets weight 2]

Shared Nearest Neighbor Clustering: Jarvis-Patrick Algorithm

If u and v share at least τ neighbors, place them in the same cluster; equivalently, keep only the SNN edges with weight ≥ τ. E.g., with τ=3, only vertices joined by SNN edges of weight 3 or more end up in a common cluster.

[Figure: the SNN graph of input graph G thresholded at τ=3, and the resulting clusters]
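The two SNN steps above (build the SNN graph, then threshold it at τ) can be sketched in Python. The adjacency sets below are a hypothetical graph, not the exact one in the figures:

```python
from itertools import combinations

def snn_clusters(adj, tau):
    """Jarvis-Patrick sketch: weight each edge (u, v) of G by the number
    of shared neighbors, keep edges with weight >= tau, and return the
    connected components of what survives (via union-find)."""
    parent = {v: v for v in adj}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in combinations(adj, 2):
        if v in adj[u]:                        # SNN edges exist only where G has an edge
            snn_weight = len(adj[u] & adj[v])  # shared-neighbor count
            if snn_weight >= tau:
                parent[find(u)] = find(v)      # same cluster
    groups = {}
    for v in adj:
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=min)

# Hypothetical undirected graph as adjacency sets
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3, 4},
       3: {0, 1, 2, 4}, 4: {2, 3}}
clusters = snn_clusters(adj, tau=3)
```

In this toy graph only edge (2,3) reaches weight 3 (shared neighbors 0, 1, 4), so vertices 2 and 3 form a cluster and the rest stay singletons.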

SNN-Clustering R code

library(GraphClusterAnalysis)
library(RBGL)
library(igraph)
library(graph)
data(SNN_Example)
G = graph.data.frame(SNN_Example, directed=FALSE)
tkplot(G)
Output = SNN_Clustering(G, 3)
Output


What is Betweenness Centrality? (Refresher from the Proximity Chapter)

Betweenness centrality quantifies the degree to which a vertex (or edge) occurs on the shortest paths between all the other pairs of nodes. Two types:
- Vertex Betweenness
- Edge Betweenness

Vertex Betweenness

The number of shortest paths in the graph G that pass through a given node S.

[Figure: a social-network graph G; e.g., Sharon is likely a liaison between NCSU and DUKE, and hence many connections between DUKE and NCSU pass through Sharon]

Edge Betweenness

The number of shortest paths in the graph G that pass through a given edge (S, B).

[Figure: e.g., Sharon and Bob both study at NCSU, and they are the only link between the NY DANCE and CISCO groups]

Vertices and edges with high betweenness form good starting points to identify clusters.

Vertex Betweenness Clustering

Given the input graph G, repeat until the highest vertex betweenness is at most μ:
1. Compute the betweenness of each vertex.
2. Select the vertex v with the highest betweenness (e.g., vertex 3 with value 0.67).
3. Disconnect the graph at the selected vertex (e.g., vertex 3).
4. Copy the selected vertex to both components.

Vertex Betweenness Clustering R code

library(GraphClusterAnalysis)
library(RBGL)
library(igraph)
library(graph)
data(Betweenness_Vertex_Example)
G = graph.data.frame(Betweenness_Vertex_Example, directed=FALSE)
betweennessBasedClustering(G, mode="vertex", threshold=0.2)

Edge-Betweenness Clustering: Girvan and Newman Algorithm

Given the input graph G, repeat until the highest edge betweenness is at most μ:
1. Compute the betweenness of each edge.
2. Select the edge with the highest betweenness (e.g., edge (3,4) with value 0.571).
3. Disconnect the graph at the selected edge (e.g., remove edge (3,4)).
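A self-contained Python sketch of the Girvan-Newman loop: raw (unnormalized) Brandes edge betweenness on an unweighted graph, repeatedly removing the highest-scoring edge until no edge exceeds a threshold mu. The two-triangles-plus-bridge graph is hypothetical; with raw counts the bridge scores highest, and removing it splits the graph:

```python
from collections import deque

def edge_betweenness(adj):
    """Brandes' algorithm: raw shortest-path counts through each edge."""
    bet = {tuple(sorted((u, v))): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s, recording predecessors and shortest-path counts
        dist = {v: -1 for v in adj}; dist[s] = 0
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        pred = {v: [] for v in adj}
        order, q = [], deque([s])
        while q:
            v = q.popleft(); order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Back-propagate path dependencies onto edges
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in pred[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                bet[tuple(sorted((v, w)))] += c
                delta[v] += c
    return {e: b / 2 for e, b in bet.items()}  # each path counted from both ends

def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, q = {s}, deque([s])
        seen.add(s)
        while q:
            v = q.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w); comp.add(w); q.append(w)
        comps.append(comp)
    return comps

def girvan_newman(adj, mu):
    adj = {v: set(ns) for v, ns in adj.items()}    # work on a copy
    while True:
        bet = edge_betweenness(adj)
        if not bet:
            break
        (u, v), b = max(bet.items(), key=lambda kv: kv[1])
        if b <= mu:
            break
        adj[u].discard(v); adj[v].discard(u)       # disconnect at heaviest edge
    return sorted(components(adj), key=min)

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by bridge (2,3)
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
clusters = girvan_newman(adj, mu=4)
```

The bridge (2,3) carries all 9 cross-triangle shortest paths, so it is removed first; the remaining triangle edges each score 1, which is below mu, and the loop stops with the two triangles as clusters.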

Edge Betweenness Clustering R code

library(GraphClusterAnalysis)
library(RBGL)
library(igraph)
library(graph)
data(Betweenness_Edge_Example)
G = graph.data.frame(Betweenness_Edge_Example, directed=FALSE)
betweennessBasedClustering(G, mode="edge", threshold=0.2)


What is a Highly Connected Subgraph?

It requires the following definitions:
- Cut
- Minimum Edge Cut (MinCut)
- Edge Connectivity (EC)

Cut

A set of edges whose removal disconnects the graph.

[Figure: a 9-vertex graph (vertices 0-8) with two example cuts: Cut = {(0,1),(1,2),(1,3)} and Cut = {(3,5),(4,2)}]

Minimum Cut

The minimum set of edges whose removal disconnects the graph.

[Figure: for the same 9-vertex graph, MinCut = {(3,5),(4,2)}]

Edge Connectivity (EC)

The minimum NUMBER of edges that will disconnect the graph.

[Figure: MinCut = {(3,5),(4,2)}, so edge connectivity EC = |MinCut| = |{(3,5),(4,2)}| = 2]

Highly Connected Subgraph (HCS)

A graph G = (V,E) is highly connected if EC(G) > |V|/2.

[Figure: the 9-vertex example graph G]

Here EC(G) = 2 and |V|/2 = 9/2; since 2 > 9/2 is false, G is NOT a highly connected subgraph.

HCS Clustering

Given the input graph G:
1. Find the minimum cut MinCut(G), e.g., {(3,5),(4,2)}.
2. Is EC(G) > |V|/2? If YES, G is highly connected: return G as a cluster.
3. If NO, divide G into G1 and G2 using MinCut(G), then process G1 and G2 recursively.

[Figure: the example graph split at MinCut = {(3,5),(4,2)} into G1 = {0,1,2,3} and G2 = {4,5,6,7,8}]
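The recursion above can be sketched in Python. For brevity the minimum edge cut is found by brute force (trying ever-larger edge subsets), which is only viable for toy graphs; real implementations use max-flow. The example graph (two triangles joined by one edge) is hypothetical:

```python
from itertools import combinations
from collections import deque

def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, q = {s}, deque([s])
        seen.add(s)
        while q:
            v = q.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w); comp.add(w); q.append(w)
        comps.append(comp)
    return comps

def min_edge_cut(adj):
    """Brute-force MinCut: smallest edge subset whose removal disconnects G."""
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    for size in range(1, len(edges) + 1):
        for cut in combinations(edges, size):
            trial = {v: set(ns) for v, ns in adj.items()}
            for u, v in cut:
                trial[u].discard(v); trial[v].discard(u)
            if len(components(trial)) > 1:
                return cut
    return tuple(edges)

def hcs(adj):
    """Return G as a cluster if EC(G) > |V|/2, else split at MinCut and recurse."""
    if len(adj) == 1:
        return [set(adj)]
    cut = min_edge_cut(adj)
    if len(cut) > len(adj) / 2:          # EC(G) > |V|/2: highly connected
        return [set(adj)]
    rest = {v: set(ns) for v, ns in adj.items()}
    for u, v in cut:
        rest[u].discard(v); rest[v].discard(u)
    clusters = []
    for comp in components(rest):
        clusters.extend(hcs({v: rest[v] & comp for v in comp}))
    return sorted(clusters, key=min)

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by bridge (2,3)
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
clusters = hcs(adj)
```

The bridge is a size-1 cut (1 > 6/2 is false), so the graph splits; each triangle has EC = 2 > 3/2 and is returned as a highly connected cluster.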

HCS Clustering R code

library(GraphClusterAnalysis)
library(RBGL)
library(igraph)
library(graph)
data(HCS_Example)
G = graph.data.frame(HCS_Example, directed=FALSE)
HCSClustering(G, kappa=2)


What is a Clique?

A subgraph C of graph G with edges between all pairs of nodes.

[Figure: graph G on vertices 4-8; the subgraph C on vertices 5, 6, 7 is a clique]

What is a Maximal Clique?

A maximal clique is a clique that is not part of a larger clique.

[Figure: in the same graph, {5,6,7} is a clique but not maximal, while {5,6,7,8} is a maximal clique]

Maximal Clique Enumeration: Bron and Kerbosch Algorithm

BK(C, P, N), applied to the input graph G:
- C: vertices in the current clique
- P: vertices that can be added to C
- N: vertices that cannot be added to C

Condition: if both P and N are empty, output C as a maximal clique.
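The BK(C, P, N) recursion translates almost line-for-line into Python. The small test graph (a 4-clique {5,6,7,8} with an extra vertex 4 attached to 5 and 6) is hypothetical:

```python
def bron_kerbosch(C, P, N, adj, out):
    """Basic (pivotless) Bron-Kerbosch maximal-clique enumeration.
    C: current clique, P: candidate vertices, N: excluded vertices."""
    if not P and not N:
        out.append(set(C))          # nothing can extend C: it is maximal
    for v in list(P):
        bron_kerbosch(C | {v}, P & adj[v], N & adj[v], adj, out)
        P.remove(v)                 # v has been fully explored...
        N.add(v)                    # ...so exclude it from later branches

# Hypothetical graph: 4-clique {5,6,7,8} plus vertex 4 adjacent to 5 and 6
adj = {4: {5, 6}, 5: {4, 6, 7, 8}, 6: {4, 5, 7, 8},
       7: {5, 6, 8}, 8: {5, 6, 7}}
cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)
cliques = sorted(cliques, key=lambda c: (len(c), min(c)))
```

On this graph the recursion finds exactly two maximal cliques, {4,5,6} and {5,6,7,8}; smaller cliques like {5,7,8} are never emitted because P or N is non-empty when they are reached.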

Maximal Clique R code

library(GraphClusterAnalysis)
library(RBGL)
library(igraph)
library(graph)
data(CliqueData)
G = graph.data.frame(CliqueData, directed=FALSE)
tkplot(G)
maximalCliqueEnumerator(G)


What is k-means?

k-means is a clustering algorithm applied to vector data points. k-means recap:
1. Select k data points from the input as initial centroids.
2. Assign every other data point to the nearest centroid.
3. Recompute the centroid of each cluster.
4. Repeat Steps 2 and 3 until the centroids don't change.
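The recap above, as a tiny pure-Python sketch on 2-D points (the data is made up; a real implementation would pick the initial centroids randomly rather than just taking the first k points):

```python
def kmeans(points, k, iters=100):
    centroids = list(points[:k])               # step 1: initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                       # step 2: assign to nearest centroid
            j = min(range(k), key=lambda j: (p[0] - centroids[j][0]) ** 2
                                            + (p[1] - centroids[j][1]) ** 2)
            clusters[j].append(p)
        new = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[j]
               for j, cl in enumerate(clusters)]   # step 3: recompute centroids
        if new == centroids:                   # step 4: stop when stable
            break
        centroids = new
    return centroids, clusters

# Made-up data: two obvious blobs
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

After two updates the centroids settle on the two blob means and the loop terminates.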

k-means on Graphs: Kernel k-means

The basic algorithm is the same as k-means on vector data, but we utilize the "kernel trick" (recall the Kernel Chapter).

"Kernel trick" recap: we can use within-graph kernel functions to calculate the inner product of a pair of vertices in a user-defined feature space. We replace the standard distance/proximity measures used in k-means with this within-graph kernel function.
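A sketch of kernel k-means driven entirely by a precomputed kernel matrix K: the squared feature-space distance from point i to cluster c expands to K[i][i] − (2/|c|)·Σ_{j∈c} K[i][j] + (1/|c|²)·Σ_{j,l∈c} K[j][l], so assignments never need explicit feature vectors. The linear-kernel test data is made up (with a linear kernel this reduces to ordinary k-means):

```python
import math

def kernel_kmeans(K, k, labels, iters=50):
    """Kernel k-means: reassign each point to the cluster whose
    feature-space centroid is nearest, using only kernel evaluations."""
    n = len(K)
    labels = list(labels)
    for _ in range(iters):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Sum of K[j][l] over each cluster, reused for every point
        intra = [sum(K[j][l] for j in m for l in m) for m in members]
        new = []
        for i in range(n):
            best, best_d = labels[i], math.inf
            for c in range(k):
                m = members[c]
                if not m:
                    continue
                d = (K[i][i]
                     - 2 * sum(K[i][j] for j in m) / len(m)
                     + intra[c] / len(m) ** 2)
                if d < best_d:
                    best_d, best = d, c
            new.append(best)
        if new == labels:
            break
        labels = new
    return labels

# Made-up 1-D points with a linear kernel K[i][j] = x[i] * x[j]
x = [0, 1, 2, 10, 11, 12]
K = [[a * b for b in x] for a in x]
labels = kernel_kmeans(K, k=2, labels=[0, 1, 0, 1, 0, 1])
```

For a graph, K would instead be a within-graph vertex kernel; the loop itself is unchanged, which is exactly the point of the kernel trick.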


Application

Functional modules in protein-protein interaction networks: subgraphs with pairwise interacting nodes => maximal cliques.

R code:

library(GraphClusterAnalysis)
library(RBGL)
library(igraph)
library(graph)
data(YeasPPI)
G = graph.data.frame(YeasPPI, directed=FALSE)
Potential_Protein_Complexes = maximalCliqueEnumerator(G)
Potential_Protein_Complexes