Meta Structure: Computing Relevance in Large Heterogeneous Information Networks

Uploaded On 2018-03-07

Presentation Transcript

Slide1

Meta Structure: Computing Relevance in Large Heterogeneous Information Networks

Slide2

Outline

1. Introduction
2. Meta Structure
3. Relevance Measures
4. Experiments

Slide3

Introduction

Computing relevance on networks (social networks, co-author networks) supports many applications, such as similarity search and recommendation.

Many measures have been proposed: Jaccard coefficient, common neighbors, shortest path, PageRank, Personalized PageRank, SimRank.

Slide4

Heterogeneous Information Network

HIN: a directed graph with multiple node types and edge types.

Slide5
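The HIN definition above can be illustrated with a small typed directed graph. The following is a minimal sketch, not code from the presentation: the class name `HIN`, its methods, the edge type names, and the toy bibliographic objects are all assumptions chosen for illustration.

```python
from collections import defaultdict


class HIN:
    """Directed graph whose nodes and edges both carry a type label."""

    def __init__(self):
        self.node_type = {}                 # node id -> node type
        self.out_edges = defaultdict(list)  # node id -> [(edge type, target id)]

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, src, etype, dst):
        # Both endpoints must already be registered as typed nodes.
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError("add both endpoints before adding an edge")
        self.out_edges[src].append((etype, dst))

    def neighbors(self, node, etype=None, ntype=None):
        """Targets of `node`, optionally filtered by edge type and node type."""
        return [
            dst
            for (et, dst) in self.out_edges[node]
            if (etype is None or et == etype)
            and (ntype is None or self.node_type[dst] == ntype)
        ]


# Hypothetical toy network: authors (A), papers (P), venues (V).
g = HIN()
for node, ntype in [("a1", "A"), ("a2", "A"), ("p1", "P"), ("v1", "V")]:
    g.add_node(node, ntype)
g.add_edge("a1", "writes", "p1")
g.add_edge("a2", "writes", "p1")
g.add_edge("p1", "published_in", "v1")

print(g.neighbors("a1", ntype="P"))  # papers written by a1
```

Keeping the type on both nodes and edges is what lets later steps restrict a traversal to a given sequence of types.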

Meta Path [1]

Meta path: a sequence of node types with edge types in between.

Slide6

Meta Path-Based Relevance Measures

PathCount [1]: number of path instances
PathSim [1]: normalized version of PathCount
PCRW [2]: path-constrained random walk

They measure how strongly two objects are connected by paths that conform to a given meta path.

Slide7
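The three meta path-based measures above can be sketched in a few lines of Python. This is a minimal illustration rather than the original implementations: representing a meta path as a list of adjacency dictionaries (one per step), the function names, and the toy author/paper/venue data are all assumptions. The PathSim normalization follows the definition in [1] and is only meaningful for symmetric meta paths.

```python
from collections import defaultdict


def invert(adj):
    """Reverse an adjacency dict (node -> list of neighbors)."""
    inv = defaultdict(list)
    for src, targets in adj.items():
        for dst in targets:
            inv[dst].append(src)
    return dict(inv)


def path_count(steps, source):
    """Number of path instances from `source` to each reachable end node."""
    counts = {source: 1}
    for adj in steps:
        nxt = defaultdict(int)
        for node, c in counts.items():
            for nb in adj.get(node, []):
                nxt[nb] += c
        counts = dict(nxt)
    return counts


def pcrw(steps, source):
    """Probability that a walker constrained to the meta path ends at each node."""
    probs = {source: 1.0}
    for adj in steps:
        nxt = defaultdict(float)
        for node, p in probs.items():
            nbs = adj.get(node, [])
            for nb in nbs:
                nxt[nb] += p / len(nbs)  # uniform choice among allowed neighbors
        probs = dict(nxt)
    return probs


def path_sim(steps, x, y):
    """PathSim: 2 * PC(x, y) / (PC(x, x) + PC(y, y))."""
    cx, cy = path_count(steps, x), path_count(steps, y)
    denom = cx.get(x, 0) + cy.get(y, 0)
    return 2 * cx.get(y, 0) / denom if denom else 0.0


# Hypothetical toy data for the meta path author-paper-venue-paper-author.
writes = {"a1": ["p1"], "a2": ["p2", "p3"]}
published_in = {"p1": ["v1"], "p2": ["v1"], "p3": ["v2"]}
APVPA = [writes, published_in, invert(published_in), invert(writes)]

print(path_count(APVPA, "a1"))
print(path_sim(APVPA, "a1", "a2"))
print(pcrw(APVPA, "a2"))
```

PathCount and PCRW share the same propagation loop; the only difference is that PCRW divides the mass of a node by its number of allowed neighbors at every step.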

Linear Combination

$R(a_1, a_2) = R(a_1, a_2 \mid P_1) + R(a_1, a_2 \mid P_2) = 1 + 1 = 2$

$R(a_2, a_3) = R(a_2, a_3 \mid P_1) + R(a_2, a_3 \mid P_2) = 1 + 1 = 2$

Slide8
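The worked example above amounts to summing per-meta-path scores. A minimal sketch follows; the function name and the optional `weights` argument are assumptions (the slide uses an unweighted sum), and the per-path scores of 1 are taken directly from the slide.

```python
def linear_combination(scores_per_path, weights=None):
    """Combine relevance scores from several meta paths as a weighted sum."""
    if weights is None:
        weights = [1.0] * len(scores_per_path)  # unweighted sum, as on the slide
    if len(weights) != len(scores_per_path):
        raise ValueError("need exactly one weight per meta path")
    return sum(w * s for w, s in zip(weights, scores_per_path))


# Per-path scores R(.|P1) and R(.|P2) from the slide.
r_a1_a2 = linear_combination([1, 1])
r_a2_a3 = linear_combination([1, 1])
print(r_a1_a2, r_a2_a3)
```

Both pairs receive the same combined score, because the sum only sees the per-path totals and not how the underlying path instances overlap.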

Limitation

Meta path-based measures fail to discover common nodes in different meta paths.

Example: a researcher wants to find authors who have published papers in the same venue and on the same topic as his own papers.

Slide9

Slide10

Meta Structure

A directed acyclic graph (DAG) with a single source node n_s (i.e., with in-degree 0) and a single sink (target) node n_t (i.e., with out-degree 0).

More powerful: contains more information than a meta path and can express more semantic meaning.

Slide11
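The meta structure definition above (a DAG with one source and one sink) can be checked mechanically. The sketch below is an illustration under assumptions: the function name is hypothetical, and the example structure (author, paper, then venue and topic in parallel, paper, author) is modeled on the researcher example from the limitation slide, with node labels invented here.

```python
from collections import defaultdict, deque


def is_meta_structure(nodes, edges):
    """True if (nodes, edges) is a DAG with exactly one source and one sink."""
    indeg = {n: 0 for n in nodes}
    outdeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
        succ[u].append(v)

    sources = [n for n in nodes if indeg[n] == 0]
    sinks = [n for n in nodes if outdeg[n] == 0]
    if len(sources) != 1 or len(sinks) != 1:
        return False

    # Kahn's algorithm: the graph is acyclic iff every node can be removed.
    remaining = dict(indeg)
    queue = deque(sources)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in succ[u]:
            remaining[v] -= 1
            if remaining[v] == 0:
                queue.append(v)
    return removed == len(nodes)


# Hypothetical structure: A -> P -> {V, T} -> P -> A.
nodes = ["A_s", "P_1", "V", "T", "P_2", "A_t"]
edges = [("A_s", "P_1"), ("P_1", "V"), ("P_1", "T"),
         ("V", "P_2"), ("T", "P_2"), ("P_2", "A_t")]
print(is_meta_structure(nodes, edges))
```

A plain meta path is a chain, so it passes the same check; the branch through both V and T is what a single meta path cannot express.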

Relevance Measures

StructCount: extension of PathCount. Biased to popular objects.

Slide12
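To make the contrast with linearly combined meta paths concrete, the sketch below counts instances of one specific structure. It assumes that StructCount counts instances of the meta structure in the same way PathCount counts meta path instances; the structure (two papers sharing both venue and topic), the function names, and the toy data are hypothetical and not taken from the slides.

```python
# Hypothetical toy data: each paper has one venue and a set of topics.
papers_of = {"a1": ["p1"], "a2": ["p2"], "a3": ["p3", "p4"]}
venue_of = {"p1": "v1", "p2": "v1", "p3": "v1", "p4": "v2"}
topics_of = {"p1": {"t1"}, "p2": {"t1"}, "p3": {"t2"}, "p4": {"t1"}}


def count_apvpa(x, y):
    """PathCount for author-paper-venue-paper-author."""
    return sum(venue_of[p] == venue_of[q]
               for p in papers_of[x] for q in papers_of[y])


def count_aptpa(x, y):
    """PathCount for author-paper-topic-paper-author."""
    return sum(len(topics_of[p] & topics_of[q])
               for p in papers_of[x] for q in papers_of[y])


def struct_count(x, y):
    """Count paper pairs sharing the venue AND a topic, once per shared topic."""
    return sum(len(topics_of[p] & topics_of[q])
               for p in papers_of[x] for q in papers_of[y]
               if venue_of[p] == venue_of[q])


for x, y in [("a1", "a2"), ("a2", "a3")]:
    combined = count_apvpa(x, y) + count_aptpa(x, y)
    print(x, y, "linear combination:", combined,
          "StructCount:", struct_count(x, y))
```

In this toy data both pairs get the same linear-combination score, but only a1 and a2 have a pair of papers that agree on venue and topic at once, so only they have a non-zero structure count.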

Relevance Measures

Layer of meta structure

Slide13

Relevance Measures

Structure Constrained Subgraph Expansion (SCSE)

Slide14

Recursive Tree

To calculate SCSE(a2, a1)

[Figure: recursive tree for SCSE(a2, a1). Node values as extracted, tree layout lost: 1.0, 1.0, 1.0, 1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.5, 0.0]

Slide15

Relevance Measures

Biased Structure Constrained Subgraph Expansion (BSCSE)

A combination of SC and SCSE, controlled by a bias value between 0 and 1: at 0 it corresponds to SC, at 1 to SCSE.

Slide16

Experiments (Effectiveness)

Datasets

DBLP-4-Area: 20k entities and 50k edges; covers 4 areas
YAGO-core: 2.1 million entities and 4 million edges; derived from Wikipedia, WordNet and GeoNames

Slide17

Experiments (Effectiveness)

Entity Resolution
Relevance Ranking
Clustering
Case Study

Slide18

Entity Resolution

YAGO contains duplicated entities, e.g., Barack_Obama and Presidency_Of_Barack_Obama.

Metric: area under the precision-recall curve (AUC).

Slide19
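The area under the precision-recall curve used for entity resolution can be approximated in several ways. The sketch below uses the step-wise form known as average precision; this choice, the function name, and the example scores are assumptions, since the slides do not say how the area was computed.

```python
def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve.

    `scores` are relevance scores for candidate pairs and `labels` are
    1 for true duplicates, 0 otherwise. Ties in score are not treated specially.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    hits = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # precision at this cut-off times the recall gained by this hit
            area += (hits / rank) * (1 / total_pos)
    return area


# Hypothetical scores for four candidate pairs, two of them true duplicates.
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

A relevance measure that ranks all true duplicates first reaches 1.0; the lower a true duplicate falls in the ranking, the less it contributes.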

Entity Resolution

Setting                        Measure    AUC
P1                             PathCount  0.1324
P1                             PCRW       0.0120
P1                             PathSim    0.0097
P2                             PathCount  0.0003
P2                             PCRW       0.0014
P2                             PathSim    0.0002
Linear Combination (optimal)   PathCount  0.2898
Linear Combination (optimal)   PCRW       0.2606
Linear Combination (optimal)   PathSim    0.2920
Meta Structure S               SC         0.5556
Meta Structure S               SCSE       0.5640
Meta Structure S               BSCSE*     0.5640

Slide20

Relevance Ranking

We label the relevance of venues in DBLP-4-Area: 0 for not relevant, 1 for relevant, and 2 for strongly relevant.

The labels consider both the scope and the level of the venues (e.g., SIGMOD and VLDB are labeled 2).

Metric: Normalized Discounted Cumulative Gain (NDCG).

Slide21
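For reference, NDCG over graded labels such as the 0/1/2 scheme above can be computed as follows. This is a minimal sketch under an assumption: it uses the gain `rel / log2(position + 1)`, whereas another common variant uses `2**rel - 1` as the gain, and the slides do not state which one was used. The function names are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of labels listed in ranked order."""
    return sum(rel / math.log2(pos + 1)
               for pos, rel in enumerate(relevances, start=1))


def ndcg(ranked_relevances):
    """DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0


# Hypothetical ranking of four venues with labels 0, 1 or 2.
print(ndcg([2, 1, 0, 2]))
```

A ranking that lists strongly relevant venues first scores 1.0, and each swap that pushes a higher label further down lowers the score.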

Relevance Ranking

Setting                        Measure    nDCG
P1                             PathCount  0.9004
P1                             PCRW       0.9047
P1                             PathSim    0.9083
P2                             PathCount  0.8224
P2                             PCRW       0.8901
P2                             PathSim    0.8834
Linear Combination (optimal)   PathCount  0.9004
Linear Combination (optimal)   PCRW       0.9100
Linear Combination (optimal)   PathSim    0.9083
Meta Structure S               SC         0.9056
Meta Structure S               SCSE       0.9104
Meta Structure S               BSCSE*     0.9130

Slide22

Clustering

We cluster venues in YAGO.

Metrics: Normalized Mutual Information (NMI) and Purity.

Slide23
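The two clustering metrics named above have standard textbook definitions, sketched here. Assumptions are labeled: NMI is normalized by the arithmetic mean of the two entropies (other normalizations exist), the function names are hypothetical, and the standard purity shown here lies in [0, 1], so the purity values reported in the slides appear to use a different scale that the transcript does not explain.

```python
import math
from collections import Counter


def purity(clusters, classes):
    """Fraction of items that belong to the majority class of their cluster."""
    by_cluster = {}
    for c, t in zip(clusters, classes):
        by_cluster.setdefault(c, Counter())[t] += 1
    return sum(max(cnt.values()) for cnt in by_cluster.values()) / len(clusters)


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(clusters, classes):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(clusters)
    joint = Counter(zip(clusters, classes))
    pc, pt = Counter(clusters), Counter(classes)
    mi = sum((nij / n) * math.log(nij * n / (pc[c] * pt[t]))
             for (c, t), nij in joint.items())
    denom = (entropy(clusters) + entropy(classes)) / 2
    return mi / denom if denom > 0 else 1.0


# Hypothetical cluster assignments and ground-truth classes for four venues.
clusters = [0, 0, 1, 1]
classes = ["db", "db", "ai", "ai"]
print(purity(clusters, classes), nmi(clusters, classes))
```

Purity rewards clusters dominated by one class, while NMI also penalizes splitting a class across many clusters, which is why the two are often reported together.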

Clustering

Setting                        Measure    NMI     Purity
P1                             PathCount  0.4932  2.75
P1                             PCRW       0.6866  3.50
P1                             PathSim    0.6780  3.00
P2                             PathCount  0.3595  2.50
P2                             PCRW       0.6866  3.50
P2                             PathSim    0.5157  2.75
Linear Combination (optimal)   PathCount  0.4932  2.75
Linear Combination (optimal)   PCRW       0.6866  3.50
Linear Combination (optimal)   PathSim    0.6780  3.50
Meta Structure S               SC         0.3202  2.25
Meta Structure S               SCSE       0.8065  3.50
Meta Structure S               BSCSE*     0.8065  3.50

Slide24

Case Study

Clint Eastwood, a famous actor and director

Slide25

Case Study

Slide26

Experiment (Efficiency)

Efficiency is comparable to that of meta path-based methods.

Slide27

References

[1] Sun, Yizhou, et al. "PathSim: Meta path-based top-k similarity search in heterogeneous information networks." VLDB'11 (2011).

[2] Lao, Ni, and William W. Cohen. "Relational retrieval using a combination of path-constrained random walks." Machine Learning 81.1 (2010): 53-67.