Meta Structure: Computing Relevance in Large Heterogeneous Information Networks

Outline
1. Introduction
2. Meta Structure
3. Relevance Measures
4. Experiments

Introduction
Computing relevance on networks (social networks, co-author networks) supports many applications:
  similarity search
  recommendation
Many measures have been proposed:
  Jaccard coefficient, common neighbors, shortest path
  PageRank, Personalized PageRank, SimRank

Heterogeneous Information Network
HIN: a directed graph with multiple node types and edge types.

Meta Path [1]
Meta path: a sequence of node types with edge types in between.

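To make the definition concrete, the sketch below enumerates the path instances of a meta path on a tiny, made-up network. The node names, the type labels, and the helpers `neighbors_by_type` and `path_instances` are assumptions for this example only; edge types are taken to be implied by the endpoint types, and links are followed in both directions.

```python
from collections import defaultdict

# Hypothetical toy network: two authors, two papers, one venue.
NODE_TYPE = {"a1": "Author", "a2": "Author",
             "p1": "Paper", "p2": "Paper", "v1": "Venue"}
EDGES = [("a1", "p1"), ("a2", "p2"), ("p1", "v1"), ("p2", "v1")]

def neighbors_by_type(edges, node_type):
    """Index neighbors by (node, neighbor type); links are usable both ways."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[(u, node_type[v])].append(v)
        adj[(v, node_type[u])].append(u)
    return adj

def path_instances(start, meta_path, adj, node_type):
    """Return every path from `start` whose node types follow `meta_path`."""
    if node_type[start] != meta_path[0]:
        return []
    paths = [[start]]
    for next_type in meta_path[1:]:
        # Extend each partial path by every neighbor of the required type.
        paths = [p + [n] for p in paths for n in adj.get((p[-1], next_type), [])]
    return paths

ADJ = neighbors_by_type(EDGES, NODE_TYPE)
APVPA = ["Author", "Paper", "Venue", "Paper", "Author"]
for path in path_instances("a1", APVPA, ADJ, NODE_TYPE):
    print(" -> ".join(path))
```

In this toy network, a1 reaches itself and a2 through the shared venue v1.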
Meta Path-Based Relevance Measures
PathCount [1]: number of path instances
PathSim [1]: normalized version of PathCount
PCRW [2]: random walk
They measure how strongly two objects are connected by paths that conform to a given meta path.

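A minimal sketch of the three measures on a made-up network follows. It assumes the usual definitions: PathCount counts path instances, PathSim normalizes PathCount as 2·PC(x,y) / (PC(x,x) + PC(y,y)) for a symmetric meta path, and PCRW is the probability that a walk choosing uniformly among neighbors of the required type ends at the target. The data and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy network (A = author, P = paper, V = venue).
NODE_TYPE = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "p3": "P",
             "v1": "V", "v2": "V"}
EDGES = [("a1", "p1"), ("a2", "p2"), ("a2", "p3"),
         ("p1", "v1"), ("p2", "v1"), ("p3", "v2")]

ADJ = defaultdict(list)
for u, v in EDGES:  # links are followed in both directions
    ADJ[(u, NODE_TYPE[v])].append(v)
    ADJ[(v, NODE_TYPE[u])].append(u)

def propagate(start, meta_path, random_walk):
    """Push weight from `start` along the meta path, one node type per step."""
    if NODE_TYPE[start] != meta_path[0]:
        return {}
    weights = {start: 1.0}
    for next_type in meta_path[1:]:
        pushed = defaultdict(float)
        for node, w in weights.items():
            nbrs = ADJ.get((node, next_type), [])
            for n in nbrs:
                # A random walk splits the weight; counting copies it.
                pushed[n] += (w / len(nbrs)) if random_walk else w
        weights = pushed
    return weights

def path_count(x, y, meta_path):
    return propagate(x, meta_path, random_walk=False).get(y, 0.0)

def pcrw(x, y, meta_path):
    return propagate(x, meta_path, random_walk=True).get(y, 0.0)

def path_sim(x, y, meta_path):
    denom = path_count(x, x, meta_path) + path_count(y, y, meta_path)
    return 2 * path_count(x, y, meta_path) / denom if denom else 0.0

APVPA = ["A", "P", "V", "P", "A"]
print(path_count("a1", "a2", APVPA), path_sim("a1", "a2", APVPA),
      pcrw("a1", "a2", APVPA), pcrw("a2", "a1", APVPA))
```

Note that PCRW is not symmetric here: a2 has two papers, so a walk starting at a2 is less likely to reach a1 than the reverse.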
Linear Combination
R(a1, a2) = R(a1, a2 | P1) + R(a1, a2 | P2) = 1 + 1 = 2
R(a2, a3) = R(a2, a3 | P1) + R(a2, a3 | P2) = 1 + 1 = 2

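The combination above can be sketched as a weighted sum of per-meta-path scores. The function name and the optional `weights` argument are assumptions; with unit weights it reproduces the arithmetic on this slide.

```python
def linear_combination(scores, weights=None):
    """Combine relevance scores computed on several meta paths.

    scores[i] is R(x, y | P_i); weights default to 1 for every meta path.
    """
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("need exactly one weight per meta path")
    return sum(w * s for w, s in zip(weights, scores))

# R(a1, a2) with R(a1, a2 | P1) = 1 and R(a1, a2 | P2) = 1
print(linear_combination([1, 1]))
```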
Limitation
Meta path-based measures fail to capture common nodes shared across different meta paths.
Example: a researcher wants to find authors who have published papers in the same venue and on the same topic as his own papers.

Meta Structure
A directed acyclic graph (DAG) with a single source node n_s (i.e., with in-degree 0) and a single sink (target) node n_t (i.e., with out-degree 0).
More powerful: it contains more information than a meta path and can express more semantic meaning.

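The definition can be checked mechanically. The sketch below tests the three conditions (acyclic, exactly one source, exactly one sink) using Kahn's topological ordering; the function name and the example node ids (A1, P1, V, T, P2, A2, standing for author, paper, venue, topic) are assumptions for illustration.

```python
from collections import defaultdict, deque

def is_meta_structure(nodes, edges):
    """True if (nodes, edges) is a DAG with one source and one sink."""
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    sources = [n for n in nodes if indegree[n] == 0]
    sinks = [n for n in nodes if not successors[n]]

    # Kahn's algorithm: every node is visited only if there is no cycle.
    remaining = dict(indegree)
    queue = deque(sources)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in successors[u]:
            remaining[v] -= 1
            if remaining[v] == 0:
                queue.append(v)
    return len(sources) == 1 and len(sinks) == 1 and visited == len(nodes)

# Hypothetical structure: same venue AND same topic between two authors.
NODES = ["A1", "P1", "V", "T", "P2", "A2"]
EDGES = [("A1", "P1"), ("P1", "V"), ("P1", "T"),
         ("V", "P2"), ("T", "P2"), ("P2", "A2")]
print(is_meta_structure(NODES, EDGES))
```

A plain meta path passes the same check, which reflects that a meta path is a special case of a meta structure.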
Relevance Measures
StructCount: extension of PathCount
Biased to popular objects.

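A brute-force sketch of StructCount on a made-up network follows. It assumes that an instance of a meta structure assigns one object to every node of the structure so that types match and every edge of the structure is backed by a link in the network. The data, the node ids, and the function name are hypothetical, and the structure's nodes are expected in topological order.

```python
from collections import defaultdict

# Hypothetical toy network (A = author, P = paper, V = venue, T = topic).
NODE_TYPE = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "p3": "P",
             "v1": "V", "t1": "T", "t2": "T"}
HIN_EDGES = [("a1", "p1"), ("a2", "p2"), ("a2", "p3"),
             ("p1", "v1"), ("p2", "v1"), ("p3", "v1"),
             ("p1", "t1"), ("p2", "t1"), ("p3", "t2")]

# Meta structure: (node id, type) in topological order, source first, sink last.
MS_NODES = [("A1", "A"), ("P1", "P"), ("V", "V"), ("T", "T"), ("P2", "P"), ("A2", "A")]
MS_EDGES = [("A1", "P1"), ("P1", "V"), ("P1", "T"),
            ("V", "P2"), ("T", "P2"), ("P2", "A2")]

def struct_count(source_obj, target_obj):
    """Count instances of the meta structure between two objects."""
    linked = set(HIN_EDGES) | {(v, u) for u, v in HIN_EDGES}  # both directions
    objects_of = defaultdict(list)
    for obj, t in NODE_TYPE.items():
        objects_of[t].append(obj)
    parents = defaultdict(list)
    for u, v in MS_EDGES:
        parents[v].append(u)

    def extend(i, assignment):
        if i == len(MS_NODES):
            return 1  # every structure node has an object: one instance
        node_id, node_t = MS_NODES[i]
        if i == 0:
            candidates = [source_obj]
        elif i == len(MS_NODES) - 1:
            candidates = [target_obj]
        else:
            candidates = objects_of[node_t]
        total = 0
        for obj in candidates:
            if NODE_TYPE[obj] != node_t:
                continue
            # Each incoming structure edge must be matched by a network link.
            if all((assignment[p], obj) in linked for p in parents[node_id]):
                assignment[node_id] = obj
                total += extend(i + 1, assignment)
                del assignment[node_id]
        return total

    return extend(0, {})

print(struct_count("a1", "a2"), struct_count("a2", "a2"))
```

In this example, paper p3 shares a venue with p1 but not a topic, so it contributes to a venue-only path count but not to the structure count. The raw count also grows with the number of links an object has, which is the popularity bias noted above.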
Relevance Measures
Layer of meta structure

Relevance Measures
Structure Constrained Subgraph Expansion (SCSE)

Recursive Tree
To calculate SCSE(a2, a1)
[Figure: recursion tree for SCSE(a2, a1); node values shown: 1.0, 1.0, 1.0, 1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.5, 0.0]

Relevance Measures
Biased Structure Constrained Subgraph Expansion (BSCSE)
A combination of StructCount (SC) and SCSE, controlled by a bias value between 0 and 1: 0 gives SC, 1 gives SCSE.
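One way to read this interpolation is as a layer-by-layer recursion: expand the current partial instance to the next layer, sum the scores of the expansions, and divide by the number of expansions raised to a power `alpha`. This is a sketch under that assumption, not a statement of the authors' exact formula. With `alpha = 0` the recursion simply counts instances (SC), and with `alpha = 1` it averages over expansions (SCSE). The toy network, the layer layout, and all names are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy network (A = author, P = paper, V = venue, T = topic).
NODE_TYPE = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "p3": "P",
             "v1": "V", "t1": "T", "t2": "T"}
HIN_EDGES = [("a1", "p1"), ("a2", "p2"), ("a2", "p3"),
             ("p1", "v1"), ("p2", "v1"), ("p3", "v1"),
             ("p1", "t1"), ("p2", "t1"), ("p3", "t2")]

# Meta structure split into layers of (node id, type); edges go to later layers.
LAYERS = [[("A1", "A")], [("P1", "P")], [("V", "V"), ("T", "T")],
          [("P2", "P")], [("A2", "A")]]
MS_EDGES = [("A1", "P1"), ("P1", "V"), ("P1", "T"),
            ("V", "P2"), ("T", "P2"), ("P2", "A2")]

LINKED = set(HIN_EDGES) | {(v, u) for u, v in HIN_EDGES}
OBJECTS_OF = defaultdict(list)
for _obj, _t in NODE_TYPE.items():
    OBJECTS_OF[_t].append(_obj)
PARENTS = defaultdict(list)
for _u, _v in MS_EDGES:
    PARENTS[_v].append(_u)

def expansions(assignment, layer):
    """All object assignments for `layer` consistent with earlier layers."""
    per_node = [[o for o in OBJECTS_OF[t]
                 if all((assignment[p], o) in LINKED for p in PARENTS[node_id])]
                for node_id, t in layer]
    ids = [node_id for node_id, _ in layer]
    return [dict(zip(ids, combo)) for combo in product(*per_node)]

def bscse(source_obj, target_obj, alpha):
    """alpha = 0 behaves like StructCount, alpha = 1 like SCSE (assumed form)."""
    sink_id = LAYERS[-1][0][0]

    def score(assignment, depth):
        if depth == len(LAYERS):
            return 1.0 if assignment[sink_id] == target_obj else 0.0
        nxt = expansions(assignment, LAYERS[depth])
        if not nxt:
            return 0.0  # dead end: the partial instance cannot be completed
        total = sum(score({**assignment, **e}, depth + 1) for e in nxt)
        return total / (len(nxt) ** alpha)

    source_id, source_t = LAYERS[0][0]
    if NODE_TYPE[source_obj] != source_t:
        return 0.0
    return score({source_id: source_obj}, 1)

for a in (0.0, 0.5, 1.0):
    print(a, bscse("a1", "a2", a), bscse("a2", "a1", a))
```

With `alpha = 1`, the score from a2 to a1 is lower than from a1 to a2, because a2's second paper leads to a branch that never reaches a1.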
Experiments (Effectiveness)
Datasets
DBLP-4-Area: 20k entities and 50k edges; covers 4 research areas
YAGO-core: 2.1 million entities and 4 million edges; derived from Wikipedia, WordNet and GeoNames

Experiments (Effectiveness)
Entity Resolution
Relevance Ranking
Clustering
Case Study

Entity Resolution
YAGO contains duplicated entities, e.g., Barack_Obama and Presidency_Of_Barack_Obama.
Metric: area under the precision-recall curve (AUC)

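The slides do not say how the area under the precision-recall curve is computed. As one common step-wise estimate, the sketch below uses average precision: the mean of the precision values at the ranks where a true duplicate pair is retrieved. The function name and the example scores are assumptions, and ties between scores are not treated specially.

```python
def average_precision(scores, labels):
    """Step-wise estimate of the area under the precision-recall curve.

    scores: relevance score of each candidate pair (higher = more likely duplicate)
    labels: 1 if the pair is a true duplicate, else 0
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_pos

# Hypothetical scores for four candidate pairs; the 1st and 3rd are duplicates.
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```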
Entity Resolution
Results (AUC)
Setting                        Measure    AUC
P1                             PathCount  0.1324
P1                             PCRW       0.0120
P1                             PathSim    0.0097
P2                             PathCount  0.0003
P2                             PCRW       0.0014
P2                             PathSim    0.0002
Linear Combination (optimal)   PathCount  0.2898
Linear Combination (optimal)   PCRW       0.2606
Linear Combination (optimal)   PathSim    0.2920
Meta Structure S               SC         0.5556
Meta Structure S               SCSE       0.5640
Meta Structure S               BSCSE*     0.5640

Relevance Ranking
We label the relevance of venues in DBLP-4-Area: 0 for not relevant, 1 for relevant, and 2 for strongly relevant.
The labels consider both the scope and the level of the venues (e.g., SIGMOD and VLDB are labeled 2).
Metric: Normalized Discounted Cumulative Gain (nDCG)

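For reference, a small sketch of nDCG over graded labels such as the 0/1/2 scheme above. It assumes the linear-gain form DCG = Σ rel_i / log2(i + 1), with positions i counted from 1; the slides do not state which gain function was used, and the function names are hypothetical.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded labels."""
    # enumerate() starts at 0, so position i (1-based) is index + 1.
    return sum(rel / math.log2(index + 2) for index, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Labels of the venues in the order a measure ranked them (hypothetical).
print(ndcg([2, 1, 0]), ndcg([0, 2, 1]))
```

A ranking that already lists the labels in descending order scores 1; pushing a strongly relevant venue down the list lowers the score.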
Relevance Ranking
Results (nDCG)
Setting                        Measure    nDCG
P1                             PathCount  0.9004
P1                             PCRW       0.9047
P1                             PathSim    0.9083
P2                             PathCount  0.8224
P2                             PCRW       0.8901
P2                             PathSim    0.8834
Linear Combination (optimal)   PathCount  0.9004
Linear Combination (optimal)   PCRW       0.9100
Linear Combination (optimal)   PathSim    0.9083
Meta Structure S               SC         0.9056
Meta Structure S               SCSE       0.9104
Meta Structure S               BSCSE*     0.9130

Clustering
We cluster venues in YAGO.
Metrics: Normalized Mutual Information (NMI) and Purity

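As a reminder of what NMI measures, the sketch below computes it from two label lists. It assumes the arithmetic-mean normalization, NMI = 2·I(U;V) / (H(U) + H(V)); other normalizations exist and the slides do not say which one was used. The function name and example labels are hypothetical.

```python
import math
from collections import Counter

def nmi(predicted, truth):
    """Normalized mutual information between two clusterings of the same items."""
    n = len(predicted)
    count_pred = Counter(predicted)
    count_true = Counter(truth)
    joint = Counter(zip(predicted, truth))

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    # I(U;V) = sum over cluster pairs of p(u,v) * log(p(u,v) / (p(u) p(v)))
    mutual_info = sum(
        c / n * math.log(c * n / (count_pred[u] * count_true[v]))
        for (u, v), c in joint.items()
    )
    h_pred, h_true = entropy(count_pred), entropy(count_true)
    if h_pred + h_true == 0:
        return 1.0  # both clusterings put everything in one cluster
    return 2 * mutual_info / (h_pred + h_true)

# Same grouping under different cluster ids, then an unrelated grouping.
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]), nmi([0, 1, 0, 1], [0, 0, 1, 1]))
```

NMI ignores how clusters are numbered, so a relabeled but otherwise identical clustering scores 1.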
Clustering
Results (NMI and Purity)
Setting                        Measure    NMI     Purity
P1                             PathCount  0.4932  2.75
P1                             PCRW       0.6866  3.50
P1                             PathSim    0.6780  3.00
P2                             PathCount  0.3595  2.50
P2                             PCRW       0.6866  3.50
P2                             PathSim    0.5157  2.75
Linear Combination (optimal)   PathCount  0.4932  2.75
Linear Combination (optimal)   PCRW       0.6866  3.50
Linear Combination (optimal)   PathSim    0.6780  3.50
Meta Structure S               SC         0.3202  2.25
Meta Structure S               SCSE       0.8065  3.50
Meta Structure S               BSCSE*     0.8065  3.50

Case Study
Clint Eastwood, a famous actor and director

Case Study

Experiments (Efficiency)
Efficiency is comparable to that of meta path-based methods.

References
[1] Sun, Yizhou, et al. "PathSim: Meta path-based top-k similarity search in heterogeneous information networks." VLDB'11 (2011).
[2] Lao, Ni, and William W. Cohen. "Relational retrieval using a combination of path-constrained random walks." Machine Learning 81.1 (2010): 53-67.