High Density Clusters
June 2017

Idea Shift: Density-Based Clustering vs. Center-Based Clustering

Main Objective: find a clustering of tight-knit groups in a graph G.

Outline
Clustering Algorithm: Recursive Algorithm based on Sparse Cuts
Finding “Dense Submatrices”
Community Finding: Network Flow
Part 1: Recursive Clustering
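Recursive Clustering - Sparse Cuts
For two disjoint sets of nodes S, T, we define:
$\Phi(S,T) = \frac{\text{number of edges from } S \text{ to } T}{\text{total number of edges incident to } S \text{ in } G}$
[Figure: two disjoint node sets S and T and the edges between them]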
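Recursive Clustering - Sparse Cuts
For a set S, we define:
$d(S) = \sum_{i \in S} d(i)$, where $d(i)$ is the degree of vertex $i$.
[Figure: an example set S for which this quantity equals 3]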
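The two quantities can be computed directly from an edge list. The sketch below assumes that Φ(S,T) is the number of edges between S and T divided by the number of edges with at least one endpoint in S, and that d(S) is the sum of the degrees in S; the six-vertex graph is a hypothetical example, not one of the slide figures.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]

def degree_sum(edges: Iterable[Edge], S: Set[int]) -> int:
    """d(S): sum of the degrees of the vertices in S."""
    return sum((u in S) + (v in S) for u, v in edges)

def phi(edges: Iterable[Edge], S: Set[int], T: Set[int]) -> float:
    """Phi(S, T): edges between S and T, divided by edges with an endpoint in S."""
    edges = list(edges)
    crossing = sum(1 for u, v in edges if (u in S and v in T) or (u in T and v in S))
    incident = sum(1 for u, v in edges if u in S or v in S)
    return crossing / incident if incident else 0.0

# Hypothetical graph: two triangles joined by the single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
S, T = {1, 2, 3}, {4, 5, 6}
print(degree_sum(edges, S))  # 7
print(phi(edges, S, T))      # 0.25
```

A small Φ means that few of the edges touching S leave towards T, which is what makes (S, T) a sparse cut.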
Recursive Clustering - Sparse Cuts
[Figure: example graph on vertices 1 to 12 with a subset S marked]
[Figure: two examples on a graph with vertices 1 to 9: one with |S| = 3 and |W| = 9, one with |S| = 3 and |W| = 6]
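Recursive Clustering - Sparse Cuts
Clusters(G): list of current clusters
Initialization: Clusters(G) = {V} (one cluster: the whole graph)
Let $\varepsilon > 0$
Rec_Clustering(G, $\varepsilon$):
    for each cluster W in Clusters(G) do
        if there is $S \subset W$ with $d(S) \le \frac{1}{2} d(W)$ and $\Phi(S, W \setminus S) \le \varepsilon$ do
            replace W in Clusters(G) by S and $W \setminus S$
            Rec_Clustering(G, $\varepsilon$)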
Recursive Clustering - Sparse Cuts
Theorem 7.9: At the termination of Recursive Clustering, the total number of edges between vertices in different clusters is at most $O(\varepsilon m \ln n)$, where $m$ is the number of edges and $n$ the number of vertices of G.
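A minimal sketch of the recursive procedure on a toy graph follows. It assumes the split rule "some S inside W has d(S) ≤ d(W)/2 and Φ(S, W \ S) ≤ ε", with degrees and Φ measured in G. The search for S is a brute-force enumeration of subsets, which is an illustrative stand-in and only feasible for very small graphs; the function names and the graph are hypothetical.

```python
from itertools import combinations

def degree_sum(edges, S):
    """d(S): sum of degrees (in G) of the vertices in S."""
    return sum((u in S) + (v in S) for u, v in edges)

def phi(edges, S, T):
    """Edges between S and T, divided by edges with an endpoint in S."""
    crossing = sum(1 for u, v in edges if (u in S and v in T) or (u in T and v in S))
    incident = sum(1 for u, v in edges if u in S or v in S)
    return crossing / incident if incident else 0.0

def find_sparse_cut(edges, W, eps):
    """Brute force: first nonempty proper S of W that satisfies the split rule."""
    members = sorted(W)
    for size in range(1, len(members)):
        for combo in combinations(members, size):
            S = set(combo)
            if (2 * degree_sum(edges, S) <= degree_sum(edges, W)
                    and phi(edges, S, W - S) <= eps):
                return S
    return None

def rec_clustering(edges, vertices, eps):
    """Split clusters until no cluster contains a sparse cut."""
    pending, final = [set(vertices)], []
    while pending:
        W = pending.pop()
        S = find_sparse_cut(edges, W, eps)
        if S is None:
            final.append(W)           # W is consistent: keep it
        else:
            pending += [S, W - S]     # split W and revisit both parts
    return final

# Hypothetical graph: two triangles joined by the single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
clusters = rec_clustering(edges, range(1, 7), eps=0.3)
print(sorted(sorted(c) for c in clusters))  # [[1, 2, 3], [4, 5, 6]]
```

Only the bridge edge (3, 4) ends up between clusters, which is the kind of outcome the edge bound in the theorem is about.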
Part 2: Dense Submatrices
Dense Submatrices - Different Approach
Let n data points in d-space be represented as an $n \times d$ matrix A (we will assume that A is nonnegative).
Example: the document-term matrix. Let D1 be the statement “I really really like Clustering” and let D2 be the statement “I love Clustering”.

      Clustering  really  love  like  I
D1    1           2       0     1     1
D2    1           0       1     0     1
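The document-term matrix for the two statements can be built by counting words. This is a small sketch; the column order and the function name are choices made here for illustration.

```python
from collections import Counter

docs = {
    "D1": "I really really like Clustering",
    "D2": "I love Clustering",
}
terms = ["Clustering", "really", "love", "like", "I"]

def document_term_matrix(docs, terms):
    """One row per document, one column per term, entries are word counts."""
    rows = []
    for text in docs.values():
        counts = Counter(text.split())
        rows.append([counts[t] for t in terms])
    return rows

A = document_term_matrix(docs, terms)
print(A)  # [[1, 2, 0, 1, 1], [1, 0, 1, 0, 1]]
```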
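[Figure: the document-term matrix drawn as a bipartite graph, with term vertices I, Like, Love, Really and Clustering, edge weights 1 and 2, and a set T of term vertices marked]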
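Dense Submatrices
View A as a bipartite graph in which one side represents Rows(A) and the other Col(A), and the edge (i, j) is given weight $a_{ij}$.
We want $S \subseteq$ Rows(A) and $T \subseteq$ Col(A) such that $A(S,T) = \sum_{i \in S,\, j \in T} a_{ij}$ is large relative to the sizes of S and T.

Dense Submatrices
First try: maximize $\frac{A(S,T)}{|S|\,|T|}$ (the average entry in the submatrix).
Second try: $D(S,T) = \frac{A(S,T)}{\sqrt{|S|\,|T|}}$, and let $D(A) = \max_{S,T} D(S,T)$ be the density of A.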
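To see how the two candidate measures behave, the sketch below scores every pair (S, T) of the small document-term matrix by brute force. It assumes A(S,T) is the sum of the entries in rows S and columns T, that the first try divides by |S||T| and the second by sqrt(|S||T|); the helper names are illustrative.

```python
from itertools import combinations
from math import sqrt

# Rows D1, D2; columns Clustering, really, love, like, I.
A = [[1, 2, 0, 1, 1],
     [1, 0, 1, 0, 1]]

def weight(A, S, T):
    """A(S, T): sum of the entries in rows S and columns T."""
    return sum(A[i][j] for i in S for j in T)

def nonempty_subsets(n):
    for k in range(1, n + 1):
        yield from combinations(range(n), k)

def average(A, S, T):
    return weight(A, S, T) / (len(S) * len(T))

def density(A, S, T):
    return weight(A, S, T) / sqrt(len(S) * len(T))

def best(A, score):
    """Pair (S, T) of index tuples that maximizes the given score."""
    pairs = ((S, T) for S in nonempty_subsets(len(A))
                    for T in nonempty_subsets(len(A[0])))
    return max(pairs, key=lambda p: score(A, *p))

print(best(A, average))  # ((0,), (1,))
print(best(A, density))  # ((0, 1), (0, 1, 2, 3, 4))
```

The average is maximized by the single largest entry (D1, "really"), so it says nothing about groups. The square-root normalization rewards larger blocks; on a matrix this small the densest pair is simply the whole matrix.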
Dense Submatrices

      Clustering  Love  Like  I
D1    1           0     1     1
D2    1           1     0     1
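Dense Submatrices
Theorem 7.10: Let A be an $n \times d$ matrix with entries in $(0,1)$ and let $\sigma_1(A)$ be its largest singular value. Then there exist S, T with $D(S,T) \ge \frac{\sigma_1(A)}{4\sqrt{\log n \log d}}$. Furthermore, we can find such S, T using the top singular vector of A.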
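The link between density and the top singular value can be checked numerically on the 0/1 document-term matrix (columns Clustering, Love, Like, I). The sketch below only illustrates the easy direction, D(S,T) ≤ σ1(A), which holds because D(S,T) is the bilinear form of A on the normalized indicator vectors of S and T. It assumes D(S,T) = A(S,T)/sqrt(|S||T|), uses plain power iteration instead of a library SVD, and does not implement the rounding step that extracts S and T from the singular vector.

```python
from itertools import combinations
from math import sqrt

A = [[1, 0, 1, 1],
     [1, 1, 0, 1]]

def top_singular_value(A, iters=100):
    """Power iteration on A^T A; returns (sigma_1, top right singular vector)."""
    v = [1.0] * len(A[0])
    for _ in range(iters):
        Av = [sum(a * x for a, x in zip(row, v)) for row in A]
        w = [sum(row[j] * y for row, y in zip(A, Av)) for j in range(len(v))]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    return sqrt(sum(y * y for y in Av)), v

def max_density(A):
    """Brute-force maximum of A(S,T) / sqrt(|S||T|) over all nonempty S, T."""
    def subsets(n):
        for k in range(1, n + 1):
            yield from combinations(range(n), k)
    return max(
        sum(A[i][j] for i in S for j in T) / sqrt(len(S) * len(T))
        for S in subsets(len(A)) for T in subsets(len(A[0]))
    )

sigma1, v = top_singular_value(A)
print(round(sigma1, 4), round(max_density(A), 4))  # 2.2361 2.1213
```

In this example the largest entries of the singular vector sit on the columns shared by both documents (Clustering and I), which hints at why the vector is a useful starting point for choosing T.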
Part 3: Community Finding
Dense Submatrices
Special Case: Similarity of the Set
For S a subset of V, what does D(S,S) represent?

     1  2  3  4  5
1    0  0  1  1  0
2    0  0  1  0  1
3    1  1  0  1  1
4    1  0  1  0  0
5    0  1  1  0  0

Let S = {3,4,5}.
Dense Submatrices
[Figure: the same graph on vertices 1 to 5 next to its adjacency matrix, with S shown in green]
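For the five-vertex example with S = {3,4,5}, the value D(S,S) can be computed by hand or with the sketch below. It assumes D(S,S) = A(S,S)/|S|, where A(S,S) sums the adjacency entries with both indices in S (so each edge inside S is counted twice); the function names are illustrative.

```python
# Adjacency matrix of the example graph on vertices 1..5.
A = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
]

def density_within(A, S):
    """D(S, S) = A(S, S) / |S| for a set S of 1-based vertex labels."""
    idx = [v - 1 for v in S]
    return sum(A[i][j] for i in idx for j in idx) / len(idx)

def average_induced_degree(A, S):
    """Average degree of the subgraph induced by S."""
    idx = [v - 1 for v in S]
    degrees = [sum(A[i][j] for j in idx) for i in idx]
    return sum(degrees) / len(degrees)

S = [3, 4, 5]
print(density_within(A, S) == average_induced_degree(A, S))  # True
```

Inside S there are two edges, (3,4) and (3,5), so A(S,S) = 4 and D(S,S) = 4/3: the average degree of the subgraph induced by S.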
Community Finding - Similarity of the Set
Goal: find the subgraph with maximum average degree in the graph G.
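Community Finding
Let G = (V, E) be a weighted graph. We define:
$A(S,T) = \sum_{i \in S,\, j \in T} a_{ij}$, where S, T are two sets of nodes and $a_{ij}$ is the weight of the edge (i, j).
The density of S will be: $D(S,S) = \frac{A(S,S)}{|S|}$
What are we looking for in terms of density?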
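Flow technique
Sub-problem: let $\lambda > 0$. Find a subgraph of G with density greater than $\lambda$ (or claim it does not exist!).
[Figure: example graph with a candidate set S shown in green]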
Flow technique
[Figure: the flow network H for the path u - v - w - x: source s, one node per edge ((u,v), (v,w), (w,x)), one node per vertex (u, v, w, x) and sink t; each arc from s to an edge node has capacity 1]
What kinds of cuts exist in H?
Type 1 cut: C(S,T) = |E|
Type 2 cut: C(S,T) = $\lambda |V|$
Type 3 cut: C(S,T) = $(|E| - e_s) + \lambda v_s$, where $e_s$ and $v_s$ are the numbers of edge nodes and vertex nodes on the side of s
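The cut types can be explored by brute force on the path example. The sketch assumes the usual capacities for this construction: 1 on each arc from s to an edge node, infinite capacity from an edge node to its two endpoints, and λ on each arc from a vertex node to t. Under that assumption, the cheapest finite cut whose source side contains the vertex set U keeps every edge with both endpoints in U on the source side, so its capacity is (|E| - e(U)) + λ|U|. The function names are illustrative.

```python
from itertools import combinations

vertices = ["u", "v", "w", "x"]
edges = [("u", "v"), ("v", "w"), ("w", "x")]

def cut_capacity(U, lam):
    """Cheapest finite cut whose source side contains exactly the vertex set U."""
    U = set(U)
    kept = sum(1 for a, b in edges if a in U and b in U)  # edge nodes left with s
    return (len(edges) - kept) + lam * len(U)

def min_cut(lam):
    """Vertex set on the source side of a minimum cut (brute force)."""
    subsets = [c for k in range(len(vertices) + 1)
                 for c in combinations(vertices, k)]
    return min(subsets, key=lambda U: cut_capacity(U, lam))

print(cut_capacity((), 0.5))        # 3    (Type 1: everything with t)
print(cut_capacity(vertices, 0.5))  # 2.0  (Type 2: everything with s)
print(min_cut(0.5))                 # ('u', 'v', 'w', 'x')
print(min_cut(1.0))                 # ()
```

With λ = 0.5 the whole path (density 3/4) beats the trivial cut of capacity |E|, while with λ = 1 no subgraph is dense enough and the Type 1 cut is the minimum.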
Flow technique
Theorem:
Flow technique
Algorithm:
    Start with an initial value of $\lambda$
    Build the network and run MaxFlow
    If we get a Type 3 cut: look for a bigger $\lambda$
    Else: look for a smaller $\lambda$
Complexity:
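The search over λ can be sketched end to end as below. Several choices here are assumptions rather than the slides' own: the capacities (1 from s to edge nodes, infinite from edge nodes to endpoints, λ from vertex nodes to t), Edmonds-Karp as the MaxFlow routine, plain bisection on λ, and the stopping width 1/n², which relies on the fact that two different densities e1/v1 and e2/v2 with at most n vertices differ by at least 1/n². The sketch treats any max flow below |E| as evidence of a subgraph denser than λ, which also covers the case where the whole graph is the answer. The names "s" and "t" are reserved for the terminals.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp on a dict-of-dicts residual graph (modified in place)."""
    flow = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:          # BFS for a shortest augmenting path
            a = queue.popleft()
            for b, c in cap[a].items():
                if c > 1e-12 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if t not in parent:
            return flow
        path, b = [], t
        while parent[b] is not None:
            path.append((parent[b], b))
            b = parent[b]
        push = min(cap[a][b] for a, b in path)    # finite: every path starts with a capacity-1 arc
        for a, b in path:
            cap[a][b] -= push
            cap[b][a] += push
        flow += push

def build_network(vertices, edges, lam):
    """s -> edge node (1), edge node -> endpoints (inf), vertex -> t (lam)."""
    cap = {node: {} for node in ["s", "t", *vertices, *edges]}
    def arc(a, b, c):
        cap[a][b] = c
        cap[b].setdefault(a, 0.0)                 # reverse residual arc
    for e in edges:
        arc("s", e, 1.0)
        for endpoint in e:
            arc(e, endpoint, float("inf"))
    for v in vertices:
        arc(v, "t", lam)
    return cap

def has_denser_subgraph(vertices, edges, lam):
    """True if some subgraph has more than lam edges per vertex."""
    return max_flow(build_network(vertices, edges, lam), "s", "t") < len(edges) - 1e-9

def bracket_max_density(vertices, edges):
    """Bisection on lam; returns (lo, hi) with lo < max density <= hi."""
    lo, hi = 0.0, float(len(edges))
    n = len(vertices)
    while hi - lo >= 1.0 / (n * n):
        mid = (lo + hi) / 2
        if has_denser_subgraph(vertices, edges, mid):
            lo = mid
        else:
            hi = mid
    return lo, hi

vertices = ["u", "v", "w", "x"]
edges = [("u", "v"), ("v", "w"), ("w", "x")]
lo, hi = bracket_max_density(vertices, edges)
print(lo, hi)  # 0.703125 0.75
```

For the path u - v - w - x the densest subgraph is the whole path with density 3/4, and the final interval contains that value and no other achievable density.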
Flow technique - Questions
When do we stop?
> 0 (different stages of algorithm) and whole,