Slide1
Node Similarity, Graph Similarity and Matching: Theory and Applications
Danai Koutra (CMU), Tina Eliassi-Rad (Rutgers), Christos Faloutsos (CMU)
SDM 2014, Friday, April 25th, 2014, Philadelphia, PA
Slide2
Who we are
Danai Koutra, CMU: node and graph similarity, summarization, pattern mining
http://www.cs.cmu.edu/~dkoutra/
Tina Eliassi-Rad, Rutgers: data mining, machine learning, big complex networks analysis
http://eliassi.org/
Christos Faloutsos, CMU: graph and stream mining, …
http://www.cs.cmu.edu/~christos
Slide3
Part 2a
Similarity between Graphs: Known node correspondence
Slide4
Roadmap
- Known node correspondence
  - Motivation
  - Simple features
  - Complex features
  - Visualization
  - Summary
- Unknown node correspondence
Slide5
Problem Definition: Graph Similarity
Given: (i) 2 graphs with the same nodes and different edge sets, (ii) the node correspondence
Find: a similarity score $s \in [0, 1]$
[Figure: graphs $G_A$ and $G_B$]
Slide6
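Problem Definition: Graph Similarity
Given: (a) 2 graphs with the same nodes and different edge sets, (b) the node correspondence
Find: a similarity score $s \in [0, 1]$
$s = 0$: $G_A$ and $G_B$ are completely different
$s = 1$: $G_A$ and $G_B$ are identical ($G_A = G_B$)
[Figure: graphs $G_A$ and $G_B$]
Slide7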
Roadmap
- Known node correspondence
  - Motivation
  - Simple features
  - Complex features
  - Visualization
  - Summary
- Unknown node correspondence
Slide8
Applications
Danai Koutra (CMU)
1. Classification: different brain wiring?
2. Discontinuity detection
[Figure: graphs for Day 1 through Day 5]
Slide9
Applications
3. Behavioral patterns: FB message graph vs. wall-to-wall network
4. Intrusion detection
Slide10
Roadmap
- Known node correspondence
  - Motivation
  - Simple features
  - Complex features
  - Visualization
  - Summary
- Unknown node correspondence
Slide11
Is there any obvious solution?
Slide12
One Solution
Edge Overlap (EO): the number of common edges (normalized or not)
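A minimal Python sketch of edge overlap, assuming undirected edges given as node pairs over a shared node set. The slide leaves the normalization open; dividing by the size of the edge union (a Jaccard-style ratio) is our assumption, not the authors' definition.

```python
def edge_overlap(edges_a, edges_b, normalize=True):
    """Edge Overlap (EO) between two graphs with known node correspondence.

    Edges are undirected (u, v) pairs; frozensets make (u, v) equal to (v, u).
    Normalizing by the edge-union size is an assumption, not from the slide.
    """
    ea = {frozenset(e) for e in edges_a}
    eb = {frozenset(e) for e in edges_b}
    common = len(ea & eb)  # number of common edges
    if not normalize:
        return common
    union = len(ea | eb)
    return common / union if union else 1.0  # two empty graphs count as identical


g_a = [(0, 1), (1, 2), (2, 3)]
g_b = [(1, 0), (2, 3), (3, 4)]
print(edge_overlap(g_a, g_b, normalize=False))  # 2
print(edge_overlap(g_a, g_b))                   # 0.5
```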
[Figure: graphs $G_A$ and $G_B$]
Slide13
… but consider the “barbell” graph:
$EO(B_{10}, mB_{10}) = EO(B_{10}, mmB_{10})$
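The barbell example can be reproduced with a small sketch. We assume $B_{10}$ is two 5-node cliques joined by a single bridge and, hypothetically, that $mB_{10}$ drops an edge inside one clique while $mmB_{10}$ drops the bridge; the slide does not say which edges are removed. Edge overlap scores both changes the same, even though only the second one disconnects the graph.

```python
from itertools import combinations


def barbell(k):
    """Two k-cliques joined by one bridge edge (2k nodes in total)."""
    edges = {frozenset(e) for e in combinations(range(k), 2)}
    edges |= {frozenset(e) for e in combinations(range(k, 2 * k), 2)}
    edges.add(frozenset((k - 1, k)))  # the bridge between the two cliques
    return edges


def edge_overlap(ea, eb):
    """Unnormalized EO: number of common edges."""
    return len(ea & eb)


b10 = barbell(5)
m_b10 = b10 - {frozenset((0, 1))}   # hypothetical: remove an edge inside a clique
mm_b10 = b10 - {frozenset((4, 5))}  # hypothetical: remove the bridge
print(edge_overlap(b10, m_b10), edge_overlap(b10, mm_b10))  # 20 20
```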
[Figure: $G_A$ vs. $G_B$, and $G_A$ vs. $G_B'$]
Slide14
Other solutions?
Slide15
Vertex / Edge Overlap
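[Papadimitriou, Dasdan, Garcia-Molina ‘10]
IDEA: “Two graphs are similar if they share many vertices and/or edges.”
$VEO(G_A, G_B) = 2 \cdot \frac{|V_A \cap V_B| + |E_A \cap E_B|}{(|V_A| + |E_A|) + (|V_B| + |E_B|)}$
(numerator: common nodes + edges; denominator: nodes + edges in $G_A$ plus nodes + edges in $G_B$)
Example: $VEO = 2 \cdot \frac{5 + 4}{5 + 5 + 5 + 4} = \frac{18}{19} \approx 0.95$
[Figure: graphs $G_A$ and $G_B$]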
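The VEO formula above translates directly into set operations. In this sketch, the two example graphs (a 5-cycle and the same cycle with one edge removed) are hypothetical; we picked them only because they reproduce the counts in the slide's example (5 + 4 common, 5 + 5 and 5 + 4 in total).

```python
def veo(nodes_a, edges_a, nodes_b, edges_b):
    """Vertex/Edge Overlap: 2 * (common nodes + edges) / (all nodes + edges)."""
    va, vb = set(nodes_a), set(nodes_b)
    ea = {frozenset(e) for e in edges_a}  # undirected edges
    eb = {frozenset(e) for e in edges_b}
    common = len(va & vb) + len(ea & eb)
    total = len(va) + len(ea) + len(vb) + len(eb)
    return 2 * common / total if total else 1.0


nodes = range(5)
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # 5 nodes, 5 edges
path = cycle[:-1]                                  # drop one edge: 5 nodes, 4 edges
print(veo(nodes, cycle, nodes, path))  # 18/19, about 0.947
```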
Slide16
Vertex Ranking
IDEA: “Two graphs are similar if the rankings of their vertices are similar”
[Papadimitriou, Dasdan, Garcia-Molina ‘10]
Compute the PageRank scores of the nodes of $G_A$, sort them, and take the rank correlation with the scores of $G_B$.
Node | PageRank score in $G_A$
0 | .13
1 | .25
2 | .24
3 | .25
4 | .13
Sorted scores: .25, .25, .24, .13, .13
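A sketch of vertex ranking under stated assumptions: PageRank by power iteration with damping 0.85 on undirected graphs, and plain Spearman's ρ (with averaged ranks for ties) as the rank correlation. The paper's exact correlation measure may differ; treat both choices as ours.

```python
import math


def pagerank(n, edges, d=0.85, iters=100):
    """Power-iteration PageRank on an undirected graph with nodes 0..n-1."""
    nbrs = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    pr = [1.0 / n] * n
    for _ in range(iters):
        # mass sitting on nodes without neighbors is spread uniformly
        dangling = sum(pr[u] for u in range(n) if not nbrs[u])
        new = [(1 - d) / n + d * dangling / n] * n
        for u in range(n):
            for v in nbrs[u]:
                new[v] += d * pr[u] / len(nbrs[u])
        pr = new
    return pr


def average_ranks(xs):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    if den == 0:  # constant rankings: similar only if identical
        return 1.0 if rx == ry else 0.0
    return cov / den


def vertex_ranking_similarity(n, edges_a, edges_b):
    return spearman(pagerank(n, edges_a), pagerank(n, edges_b))


path5 = [(0, 1), (1, 2), (2, 3), (3, 4)]
print([round(x, 2) for x in pagerank(5, path5)])  # [0.13, 0.25, 0.24, 0.25, 0.13]
```

A 5-node path with damping 0.85 happens to give scores that round to the table's values; whether the slide actually used that graph is our guess.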
Slide17
Vector Similarity
IDEA: “Two graphs are similar if their node/edge weight vectors are close”
$sim(G_A, G_B)$ = similarity between the eigenvectors of the adjacency matrices $A$ and $B$
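The slide does not say which eigenvectors or which similarity measure are used. As an illustration only, this sketch compares the leading eigenvectors of the two adjacency matrices with cosine similarity; both choices are assumptions.

```python
import math


def principal_eigenvector(n, edges, iters=1000):
    """Unit-norm leading eigenvector of the adjacency matrix via power iteration.

    We iterate with (A + I): same eigenvectors as A, but it avoids the
    sign oscillation that plain power iteration shows on bipartite graphs.
    """
    nbrs = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    x = [1.0] * n  # positive start vector, so all iterates stay positive
    for _ in range(iters):
        y = [x[u] + sum(x[v] for v in nbrs[u]) for u in range(n)]
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]
    return x


def eigenvector_similarity(n, edges_a, edges_b):
    """Cosine similarity of the two leading eigenvectors (both unit-norm)."""
    xa = principal_eigenvector(n, edges_a)
    xb = principal_eigenvector(n, edges_b)
    return sum(a * b for a, b in zip(xa, xb))


star = [(0, 1), (0, 2), (0, 3)]
path = [(0, 1), (1, 2), (2, 3)]
print(eigenvector_similarity(4, star, star))  # 1.0 (up to rounding)
print(eigenvector_similarity(4, star, path))  # between 0 and 1
```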
[Papadimitriou, Dasdan, Garcia-Molina ‘10]
Slide18
Graph Edit Distance
# of operations to transform $G_A$ into $G_B$:
- insertion of nodes/edges
- deletion of nodes/edges
- edge label substitution
[Bunke+ ’98, ’06, Riesen ’09, Gao ’10, Fankhauser ’11]
NP-complete
BUT… ✗ for communications performance monitoring
Slide19
Graph Edit Distance
# of operations to transform $G_A$ into $G_B$:
- insertion of nodes/edges
- deletion of nodes/edges
Cost per operation: assigning it is a hard problem. How to assign it?
[Bunke+ ’98, ’06, Riesen ’09, Gao ’10, Fankhauser ’11]
Slide20
Graph Edit Distance
But if
- insertion of nodes/edges costs 1,
- deletion of nodes/edges costs 1, and
- changes in weights are not considered (topological changes only), then
$GED(G_A, G_B) = |V_A| + |V_B| - 2|V_A \cap V_B| + |E_A| + |E_B| - 2|E_A \cap E_B|$
[Bunke+ ’98, ’06, Riesen ’09, Gao ’10, Fankhauser ’11]
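With known node correspondence and unit costs, the formula above is just the size of the symmetric differences of the node and edge sets, which is why the comparison table later lists it as “Graph Edit Distance (XOR)”. A minimal sketch, assuming undirected edges given as node pairs:

```python
def ged(nodes_a, edges_a, nodes_b, edges_b):
    """Unit-cost graph edit distance under known node correspondence.

    Counts node and edge insertions/deletions only; weight changes are ignored.
    """
    va, vb = set(nodes_a), set(nodes_b)
    ea = {frozenset(e) for e in edges_a}  # undirected edges
    eb = {frozenset(e) for e in edges_b}
    node_ops = len(va) + len(vb) - 2 * len(va & vb)
    edge_ops = len(ea) + len(eb) - 2 * len(ea & eb)
    return node_ops + edge_ops  # equals |VA ^ VB| + |EA ^ EB|


cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(ged(range(5), cycle, range(5), cycle[:-1]))        # 1: one edge deleted
print(ged(range(5), cycle, range(6), cycle + [(4, 5)]))  # 2: one node, one edge inserted
```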
Slide21
Graph Edit Distance
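But if
- insertion of nodes/edges costs 1,
- deletion of nodes/edges costs 1, and
- changes in weights are also counted, then
$GED_w(G_A, G_B) = c\,[|V_A| + |V_B| - 2|V_A \cap V_B|] + |E_A| + |E_B| - 2|E_A \cap E_B| + \sum_{e \in E_A \setminus E_B} w_A(e) + \sum_{e \in E_B \setminus E_A} w_B(e) + \sum_{e \in E_A \cap E_B} |w_A(e) - w_B(e)|$
(the three sums run over edges only in $G_A$, edges only in $G_B$, and edges in both $G_A$ and $G_B$, respectively)
[Kapsabelis+ ’07]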
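A direct transcription of the $GED_w$ formula above, assuming undirected edges stored as a dict from frozenset pairs to weights; `as_weights` is a hypothetical helper for building that dict, and `c` is the node-operation constant from the formula.

```python
def as_weights(weighted_edges):
    """{(u, v): w} -> {frozenset({u, v}): w} for undirected edges."""
    return {frozenset(e): w for e, w in weighted_edges.items()}


def ged_weighted(nodes_a, w_a, nodes_b, w_b, c=1.0):
    """Weighted GED following the slide's formula; c is the node-operation cost."""
    va, vb = set(nodes_a), set(nodes_b)
    ea, eb = set(w_a), set(w_b)
    node_term = c * (len(va) + len(vb) - 2 * len(va & vb))
    edge_term = len(ea) + len(eb) - 2 * len(ea & eb)
    only_a = sum(w_a[e] for e in ea - eb)              # edges only in G_A
    only_b = sum(w_b[e] for e in eb - ea)              # edges only in G_B
    both = sum(abs(w_a[e] - w_b[e]) for e in ea & eb)  # edges in both graphs
    return node_term + edge_term + only_a + only_b + both


w_a = as_weights({(0, 1): 5, (1, 2): 1})
w_b = as_weights({(0, 1): 2, (0, 2): 3})
print(ged_weighted({0, 1, 2}, w_a, {0, 1, 2}, w_b))  # 0 + 2 + 1 + 3 + 3 = 9.0
```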
Slide22
Weight Distance
[Shoubridge+ ’02, Dickinson+ ‘04]
$d(G_A, G_B) = \frac{1}{|E_A \cup E_B|} \sum_{e \in E_A \cup E_B} \frac{|w_{G_A}(e) - w_{G_B}(e)|}{\max\{w_{G_A}(e), w_{G_B}(e)\}}$
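A sketch of the weight distance above. We assume positive edge weights and treat an edge that is missing from one graph as having weight 0 there, so each term is a relative difference in $[0, 1]$.

```python
def weight_distance(w_a, w_b):
    """Mean relative weight change over the union of the two edge sets.

    w_a, w_b map undirected edges (frozensets) to positive weights;
    an edge missing from one graph is treated as having weight 0 there.
    """
    edges = set(w_a) | set(w_b)
    if not edges:
        return 0.0
    total = 0.0
    for e in edges:
        x, y = w_a.get(e, 0.0), w_b.get(e, 0.0)
        total += abs(x - y) / max(x, y)  # relative difference, in [0, 1]
    return total / len(edges)


w_a = {frozenset((0, 1)): 4.0, frozenset((1, 2)): 2.0}
w_b = {frozenset((0, 1)): 2.0, frozenset((0, 2)): 1.0}
print(weight_distance(w_a, w_b))  # (0.5 + 1 + 1) / 3, about 0.833
```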
Takes into account relative differences in the edge weights.
Slide23
Maximum Common Subgraph
[Bunke+ ’06]
$d(G_A, G_B) = 1 - \frac{|mcs(G_A, G_B)|}{\max\{|G_A|, |G_B|\}}$
NP-complete!
MCS Edge Distance
$d(G_A, G_B) = 1 - \frac{|mcs(E_A, E_B)|}{\max\{|E_A|, |E_B|\}}$
MCS Node Distance
$d(G_A, G_B) = 1 - \frac{|mcs(V_A, V_B)|}{\max\{|V_A|, |V_B|\}}$
Slide24
Maximum Common Subgraph
[Bunke+ ’06]
$d(G_A, G_B) = 1 - \frac{|mcs(G_A, G_B)|}{\max\{|G_A|, |G_B|\}}$
Event Detection: [Figure: MCS distance (with $|G| = |V|$) plotted per day]
NP-complete!
Slide25
Roadmap
- Known node correspondence
  - Motivation
  - Simple features
  - Complex features
  - Visualization
  - Summary
- Unknown node correspondence
Slide26
Signature Similarity
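[Papadimitriou, Dasdan, Garcia-Molina ‘10]
Step 1: Compute the graph fingerprint ($b$ bits)
- each node/edge feature is mapped to $b$ numbers in $\{-1, 1\}$ and weighted using out-degree and PageRank
- after aggregating the weighted vectors, each entry becomes 1 if sign(entry) > 0 and 0 if sign(entry) < 0
Slide27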
Signature Similarity
[Papadimitriou, Dasdan, Garcia-Molina ‘10]
Step 2: Compute the Hamming distance between the graph fingerprints
[Figure: fingerprints of $G_A$ and $G_B$; Hamming distance: 4]
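Both steps can be sketched in SimHash style. Assumptions: features arrive as a dict of already-weighted node/edge identifiers (the out-degree/PageRank weighting is left to the caller), SHA-256 stands in for the paper's hash function, and similarity is reported as $1 - \text{Hamming}/b$.

```python
import hashlib


def feature_bits(feature, b):
    """Deterministically map a feature to b values in {-1, +1} via SHA-256."""
    assert b <= 256, "one SHA-256 digest provides 256 bits"
    digest = hashlib.sha256(repr(feature).encode()).digest()
    return [1 if (digest[i // 8] >> (i % 8)) & 1 else -1 for i in range(b)]


def fingerprint(weighted_features, b=64):
    """Step 1: weighted sum of +/-1 vectors, then sign -> bit."""
    acc = [0.0] * b
    for feature, weight in weighted_features.items():
        for i, s in enumerate(feature_bits(feature, b)):
            acc[i] += weight * s
    return [1 if x > 0 else 0 for x in acc]  # entries equal to 0 also map to 0 here


def signature_similarity(feats_a, feats_b, b=64):
    """Step 2: 1 - Hamming distance / b between the two fingerprints."""
    fa, fb = fingerprint(feats_a, b), fingerprint(feats_b, b)
    hamming = sum(x != y for x, y in zip(fa, fb))
    return 1 - hamming / b


# hypothetical weighted features: node ids and edge pairs mapped to weights
feats_a = {"n0": 0.4, "n1": 0.35, "n2": 0.25, ("n0", "n1"): 0.4, ("n1", "n2"): 0.35}
feats_b = {"n0": 0.4, "n1": 0.35, "n2": 0.25, ("n0", "n1"): 0.4}
print(signature_similarity(feats_a, feats_a))  # 1.0
print(signature_similarity(feats_a, feats_b))  # between 0 and 1
```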
Slide28
Application: Anomaly Detection
[Papadimitriou, Dasdan, Garcia-Molina ‘10]
Slide29
… Many similarity functions can be defined…
What properties should a good similarity function have?
Slide30
Axioms
A1. Identity property: $sim(G_1, G_1) = 1$
A2. Symmetric property: $sim(G_1, G_2) = sim(G_2, G_1)$
A3. Zero property: $sim(G_1, G_2) = 0$ when $G_1$ and $G_2$ are completely different
[Koutra, Faloutsos, Vogelstein ‘13]
Slide31
Desired Properties
Intuitiveness
- P1. Edge Importance
- P2. Weight Awareness
- P3. Edge-“Submodularity”
- P4. Focus Awareness
Scalability
[Koutra, Faloutsos, Vogelstein ‘13]
Slide32
Desired Properties
P1. Edge Importance: creating disconnected components matters more than small connectivity changes.
[Koutra, Faloutsos, Vogelstein ‘13]
Slide33
Desired Properties
P2. Weight Awareness: the bigger the edge weight, the more the edge change matters.
[Figure: removing an edge with w = 5 vs. removing an edge with w = 1]
[Koutra, Faloutsos, Vogelstein ‘13]
Slide34
Desired Properties
P3. Edge-“Submodularity” (“diminishing returns”): the sparser the graphs, the more important a “fixed” change is.
[Figure: the same change applied to a sparser and a denser pair $G_A$, $G_B$ with n = 5 nodes]
[Koutra, Faloutsos, Vogelstein ‘13]
Slide35
Desired Properties
P4. Focus Awareness: targeted changes are more important than random changes of the same extent.
[Figure: $G_A$ compared with a targeted change and with a random change of the same extent ($G_B$, $G_B'$)]
[Koutra, Faloutsos, Vogelstein ‘13]
Slide36
How do state-of-the-art methods fare?
Metric | P1 (edge importance) | P2 (weight awareness) | P3 (“returns”) | P4 (focus awareness)
Vertex/Edge Overlap | ✗ | ✗ | ✗ | ?
Graph Edit Distance (XOR) | ✗ | ✗ | ✗ | ?
Signature Similarity | ✗ | ✔ | ✗ | ?
λ-distance (adjacency matrix) | ✗ | ✔ | ✗ | ?
λ-distance (graph Laplacian) | ✗ | ✔ | ✗ | ?
λ-distance (normalized Laplacian) | ✗ | ✔ | ✗ | ?
(P4: later!)
[Koutra, Faloutsos, Vogelstein ‘13]
Slide37
Is there a method that satisfies the properties?
Yes! DeltaCon
Slide38
DeltaCon: Intuition
STEP 1: Compute the pairwise node influence, $S_A$ and $S_B$
[Figure: graphs $G_A$ and $G_B$ with their influence matrices $S_A$ and $S_B$]
[Koutra, Faloutsos, Vogelstein ‘13]
Slide39
DeltaCon
Details
1. Find the pairwise node influence, $S_A$ and $S_B$.
2. Find the similarity between $S_A$ and $S_B$.
[Figure: influence matrices $S_A$ and $S_B$]
[Koutra, Faloutsos, Vogelstein ‘13]
Slide40
How? Using FaBP.
- Sound theoretical background (MLE on marginals)
- Attenuating neighboring influence for small ε: 1-hop neighbors are weighted by ε, 2-hop neighbors by ε², …
Note: $\varepsilon > \varepsilon^2 > \dots$, since $0 < \varepsilon < 1$
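The influence formula itself did not survive extraction. As we understand the DeltaCon paper, FaBP gives the pairwise node influence matrix below, where $A$ is the adjacency matrix, $D$ the diagonal degree matrix and $I$ the identity. Dropping the small $\varepsilon^2 D$ term and expanding the inverse as a geometric (Neumann) series, valid while $\varepsilon$ times the largest eigenvalue of $A$ stays below 1, shows where the attenuation comes from: $A^k$ counts $k$-hop walks, and they are weighted by $\varepsilon^k$.

```latex
S = [I + \varepsilon^2 D - \varepsilon A]^{-1}
  \approx [I - \varepsilon A]^{-1}
  = I + \varepsilon A + \varepsilon^2 A^2 + \varepsilon^3 A^3 + \dots
```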
Intuition
Slide41
Our Solution: DeltaCon
Details
1. Find the pairwise node influence, $S_A$ and $S_B$.
2. Find the similarity between $S_A$ and $S_B$.
[Figure: $S_A$ and $S_B$ compared entry by entry]
$sim(S_A, S_B) = 0.3$
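A dense sketch of the two steps, meant for small graphs only since it inverts an $n \times n$ matrix. Assumptions to flag: one shared $\varepsilon = 1/(1 + \text{max degree})$ for both graphs, the root Euclidean (Matusita) distance between $S_A$ and $S_B$, and $sim = 1/(1 + d)$; this is how we recall the paper's choices, not a verified reproduction of its code.

```python
import numpy as np


def fabp_influence(adj, eps):
    """Pairwise node influence S = [I + eps^2 D - eps A]^(-1) (dense, O(n^3))."""
    n = adj.shape[0]
    degree = np.diag(adj.sum(axis=1))
    return np.linalg.inv(np.eye(n) + eps**2 * degree - eps * adj)


def deltacon0(adj_a, adj_b):
    """Similarity in (0, 1] from the root Euclidean distance of S_A and S_B."""
    # assumption: one shared eps = 1 / (1 + largest node degree in either graph)
    max_degree = max(adj_a.sum(axis=1).max(), adj_b.sum(axis=1).max())
    eps = 1.0 / (1.0 + max_degree)
    s_a = fabp_influence(adj_a, eps)
    s_b = fabp_influence(adj_b, eps)
    # S is entrywise non-negative in theory; clip round-off before the sqrt
    root_a = np.sqrt(np.clip(s_a, 0.0, None))
    root_b = np.sqrt(np.clip(s_b, 0.0, None))
    distance = np.sqrt(np.sum((root_a - root_b) ** 2))
    return 1.0 / (1.0 + distance)


path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
split = path.copy()
split[1, 2] = split[2, 1] = 0.0  # remove the middle edge
print(deltacon0(path, path))   # 1.0
print(deltacon0(path, split))  # strictly between 0 and 1
```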
[Koutra, Faloutsos, Vogelstein ‘13]
Slide42
… but $O(n^2)$ …
Faster? Details in the paper.
[Figure: diagram with parts labeled 1 to 4]
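Our reading of the faster variant: partition the nodes randomly into $g$ groups and solve for an $n \times g$ influence matrix (one linear system per group indicator vector) instead of the full $n \times n$ one. The grouping scheme, the shared $\varepsilon$ and the distance are assumptions here; the dense solve keeps the sketch readable, while a scalable version would use sparse matrices and an iterative solver.

```python
import numpy as np


def deltacon_grouped(adj_a, adj_b, g=2, seed=0):
    """DeltaCon-style similarity using g random node groups instead of all n nodes."""
    n = adj_a.shape[0]
    rng = np.random.default_rng(seed)
    groups = rng.integers(0, g, size=n)  # same random partition for both graphs
    seeds = np.zeros((n, g))
    seeds[np.arange(n), groups] = 1.0    # column k: indicator vector of group k
    # assumption: shared eps = 1 / (1 + largest node degree in either graph)
    eps = 1.0 / (1.0 + max(adj_a.sum(axis=1).max(), adj_b.sum(axis=1).max()))

    def influence(adj):
        system = np.eye(n) + eps**2 * np.diag(adj.sum(axis=1)) - eps * adj
        return np.linalg.solve(system, seeds)  # shape (n, g), not (n, n)

    root_a = np.sqrt(np.clip(influence(adj_a), 0.0, None))
    root_b = np.sqrt(np.clip(influence(adj_b), 0.0, None))
    return 1.0 / (1.0 + np.sqrt(np.sum((root_a - root_b) ** 2)))


ring = np.roll(np.eye(6), 1, axis=1) + np.roll(np.eye(6), -1, axis=1)  # 6-cycle
chain = ring.copy()
chain[0, 5] = chain[5, 0] = 0.0  # cut the cycle into a path
print(deltacon_grouped(ring, ring))   # 1.0
print(deltacon_grouped(ring, chain))  # strictly between 0 and 1
```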
[Koutra, Faloutsos, Vogelstein ‘13]
Slide43
Comparison of methods revisited
Metric | P1 (edge importance) | P2 (weight awareness) | P3 (“returns”) | P4 (focus awareness)
Vertex/Edge Overlap | ✗ | ✗ | ✗ | ?
Graph Edit Distance (XOR) | ✗ | ✗ | ✗ | ?
Signature Similarity | ✗ | ✔ | ✗ | ?
λ-distance (adjacency matrix) | ✗ | ✔ | ✗ | ?
λ-distance (graph Laplacian) | ✗ | ✔ | ✗ | ?
λ-distance (normalized Laplacian) | ✗ | ✔ | ✗ | ?
DeltaCon_0 | ✔ | ✔ | ✔ | ✔
DeltaCon | ✔ | ✔ | ✔ | ✔
[Koutra, Faloutsos, Vogelstein ‘13]
Slide44
Temporal Anomaly Detection
Nodes: employees; edges: email exchange
[Figure: email graphs for Day 1 through Day 5, with similarities sim1, sim2, sim3, sim4 between consecutive days]
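The consecutive-day comparison can be sketched generically. To keep the example short we pass a stand-in similarity (Jaccard overlap of edge sets) instead of DeltaCon; any similarity function from this section could be plugged in. Flagging the single least similar transition is a simplification of our own, and the five daily edge sets are made up.

```python
def jaccard(ea, eb):
    """Stand-in similarity: Jaccard overlap of two edge sets."""
    union = ea | eb
    return len(ea & eb) / len(union) if union else 1.0


def least_similar_transition(snapshots, sim=jaccard):
    """Similarity of each pair of consecutive days, plus the index of the lowest one."""
    sims = [sim(a, b) for a, b in zip(snapshots, snapshots[1:])]
    k = min(range(len(sims)), key=sims.__getitem__)
    return k, sims  # sims[k] compares day k+1 with day k+2 (days counted from 1)


def edges(*pairs):
    """Hypothetical helper: undirected edge set from (u, v) tuples."""
    return {frozenset(p) for p in pairs}


days = [edges(("a", "b"), ("b", "c")), edges(("a", "b"), ("b", "c")),
        edges(("a", "b"), ("b", "c"), ("c", "d")), edges(("x", "y")), edges(("x", "y"))]
print(least_similar_transition(days))  # (2, [1.0, 0.6666666666666666, 0.0, 1.0])
```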
[Koutra, Faloutsos, Vogelstein ‘13]
Slide45
Temporal Anomaly Detection
[Figure: similarity between consecutive days over time, annotated “Feb 4: Lay resigns”]
[Koutra, Faloutsos, Vogelstein ‘13]
Slide46
Brain Connectivity Graph Clustering
114 brain graphs
Nodes: 70 cortical regions; edges: connections; attributes: gender, IQ, age, …
[Koutra, Faloutsos, Vogelstein ‘13]
Slide47
Brain Connectivity Graph Clustering
[Figure: brain-graph clusters, High CCI vs. Low CCI]
t-test p-value = 0.0057
[Koutra, Faloutsos, Vogelstein ‘13]
Slide48
Roadmap
- Known node correspondence
  - Motivation
  - Simple features
  - Complex features
  - Visualization
  - Summary
- Unknown node correspondence
Slide49
Comparing Connectomes
For small graphs with 40-80 nodes and low sparsity
[Figure: functional MRI, connectome, and its weighted adjacency matrix]
[Alper+ ’13, CHI]
Slide50
Tested Visual Encodings
[Alper+ ’13, CHI]
1) Augmenting the graphs to show the differences
Slide51
Tested Visual Encodings
[Alper+ ’13, CHI]
2) Augmenting the adjacency matrices to show the differences
Slide52
Tested Visual Encodings
[Alper+ ’13, CHI]
2) Augmenting the adjacency matrices to show the differences
User study result: matrices are better than graphs as the size increases and the sparsity drops.
Slide53
More on visualization
For large graphs:
- HoneyComb [van Ham+ ’09]
- Reference graph [Andrews ’09]
- Interactive comparison [Hascoet+ ’12]
- General principles [Gleicher+ ’11]
…
Slide54
Roadmap
- Known node correspondence
  - Motivation
  - Simple features
  - Complex features
  - Visualization
  - Summary
- Unknown node correspondence
Slide55
Summary
- Numerous applications: network monitoring, anomaly detection, network intrusion, behavioral studies
- Although it seems an easy problem, it is not!
- There are multiple measures, but which one should we use? It depends on the application!
Slide56
Papers at http://www.cs.cmu.edu/~dkoutra/pub.htm
Slide57
What we will cover next
Slide58
References
Danai Koutra, Christos Faloutsos, and Joshua T. Vogelstein. 2013. DELTACON: A Principled Massive-Graph Similarity Function. In SDM 2013, 162-170.
Panagiotis Papadimitriou, Ali Dasdan, and Hector Garcia-Molina. 2010. Web Graph Similarity for Anomaly Detection. Journal of Internet Services and Applications 1(1), 19-30.
H. Bunke, P. J. Dickinson, M. Kraetzl, and W. D. Wallis. 2006. A Graph-Theoretic Approach to Enterprise Network Dynamics (PCS). Birkhäuser.
Kaspar Riesen and Horst Bunke. 2009. Approximate graph edit distance computation by means of bipartite graph matching.
Horst Bunke and Kim Shearer. 1998. A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters 19(3-4), 255-259.
Slide59
References
A. Kelmans. 1976. Comparison of graphs by their number of spanning trees. Discrete Mathematics 16(3), 241-261.
Stefan Fankhauser, Kaspar Riesen, and Horst Bunke. 2011. Speeding up graph edit distance computation through fast bipartite matching. In GbRPR '11.
Xinbo Gao, Bing Xiao, Dacheng Tao, and Xuelong Li. 2010. A survey of graph edit distance. Pattern Analysis and Applications 13(1), 113-129.
P. Shoubridge, M. Kraetzl, W. D. Wallis, and H. Bunke. 2002. Detection of Abnormal Change in a Time Series of Graphs. Journal of Interconnection Networks (JOIN) 3(1-2), 85-101.
Kelly Marie Kapsabelis, Peter John Dickinson, and Kutluyil Dogancay. 2007. Investigation of graph edit distance cost functions for detection of network anomalies. ANZIAM Journal 48 (CTAC2006), 436-449.
Slide60
References: Visualization
K. Andrews, M. Wohlfahrt, and G. Wurzinger. 2009. Visual graph comparison. In 13th International Conference on Information Visualisation, 62-67.
Frank van Ham, Hans-Jörg Schulz, and Joan M. Dimicco. 2009. Honeycomb: Visual Analysis of Large Scale Social Networks. In INTERACT '09 (12th IFIP TC 13 International Conference on Human-Computer Interaction, Part II).
Basak Alper, Benjamin Bach, Nathalie Henry Riche, Tobias Isenberg, and Jean-Daniel Fekete. 2013. Weighted graph comparison techniques for brain connectivity analysis. In CHI '13.
Mountaz Hascoët and Pierre Dragicevic. 2012. Interactive graph matching and visual comparison of graphs and clustered graphs. In AVI '12.
Slide61
References
Michael Gleicher, Danielle Albers, Rick Walker, Ilir Jusufi, Charles D. Hansen, and Jonathan C. Roberts. 2011. Visual comparison for information visualization.