Beyond Non-textual Linkage: Designing GNN for Text-rich Graphs

Uploaded by WannabeRockstar, 2022-08-04

Presentation Transcript

Slide1

Beyond Non-textual Linkage:

Designing GNN for Text-rich Graphs

Yanbang Wang, Jul 27, 2020 at UIUC DMG

Joint work with Carl Yang, Pan Li, and Prof. Jiawei Han

Slide2

Text-rich Graphs

Usually come with two things:

Node attributes: raw text (or Bag-of-Words / TF-IDF features); may or may not have extra numerical features

Pregiven linkage information among entities: citation, …

Examples:

Paper-paper: Cora, Citeseer, Pubmed, arXiv, DBLP

Webpage-webpage: WebKB

Person-person: Facebook

(Diagram: paper --cite--> paper)
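The two ingredients above (per-node text plus pregiven edges) can be sketched as a minimal data structure; all names here are illustrative, not from the actual codebase:

```python
# Minimal sketch of a text-rich graph: each node carries raw text,
# and pregiven (non-textual) linkage is stored as an edge set.

class TextRichGraph:
    def __init__(self):
        self.node_text = {}   # node id -> raw text (or BoW / TF-IDF features)
        self.edges = set()    # pregiven linkage, e.g. citations

    def add_node(self, nid, text):
        self.node_text[nid] = text

    def add_edge(self, u, v):
        self.edges.add((u, v))  # e.g. paper u cites paper v

# Example: a tiny citation network in the style of Cora / Citeseer.
g = TextRichGraph()
g.add_node("p1", "graph neural networks for node classification")
g.add_node("p2", "convolutional networks on citation graphs")
g.add_edge("p1", "p2")
```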

Slide3

Task

Prediction over text-rich graphs:

Node classification (theme categorization)

Link prediction (citation prediction)

Prototype for many important applications

Slide4

Opportunities & Challenges from Text: Structural Information beyond the Citation Network

The main advantage of GNNs (collective classification): utilization of the linkage between target entities

Almost all previous work simply follows the pregiven linkage information, or at most makes modifications primarily based on the pregiven linkage

However, in text-rich networks, the linkage information should not be confined to what is explicitly given

For example, the relationship between two papers is not just "citation": text attributes carry much richer interrelationships beyond the pregiven linkage

Slide5

Two Types of Linkage

Non-textual linkage: usually pregiven and clean

citation

co-authorship

same publication venue

Textual linkage: latent, complex, but very rich

topic clusters (e.g., different research areas)

(dis)similarities among the topic clusters (e.g., "Network analysis", "Graph mining", "Security")

subtle semantic relationships (e.g., "machine learning", "deep learning", "CNN")

(Diagram: paper --???--> paper)

Slide6

Previous Work

How to 1) model and 2) utilize these (latent) textual linkages is seriously underexplored:

Previous work on text-rich graphs focuses on using the text independently:

Uses deep models like Bi-LSTM to vectorize each node's textual tags independently

Treats the text attribute as a generic feature vector

Previous work on document classification:

Cannot cooperatively use textual linkage and non-textual linkage

Fails to capture the complexity of the textual linkages

(e.g., simple keyword matching as in Text-GNN [AAAI'19], or k-nearest neighbors as in paper2vec)
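For contrast, the k-nearest-neighbor style of textual linkage referred to above (the paper2vec-like idea) amounts to linking each document to its most similar documents under a simple text similarity. A sketch under that assumption, with hypothetical helper names and bag-of-words cosine similarity standing in for whatever features such methods actually use:

```python
# Sketch of kNN-style textual linkage: connect each document to its
# k most similar documents under bag-of-words cosine similarity.
import math
from collections import Counter

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_text_links(docs, k=1):
    bows = {i: Counter(t.split()) for i, t in docs.items()}
    links = set()
    for i in bows:
        sims = sorted(((cosine(bows[i], bows[j]), j)
                       for j in bows if j != i), reverse=True)
        for _, j in sims[:k]:
            links.add((i, j))
    return links

docs = {0: "graph mining", 1: "graph neural mining", 2: "food recipes"}
links = knn_text_links(docs, k=1)
```

Note how coarse this is: the link is a single hard edge, with no notion of topics or of the semantic structure the slide argues is missing.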

Slide7

Proposed Method: Two Phases

Basic idea: make the best use of the latent relationships of topics and word semantics

To model and utilize these (latent) textual linkages:

Phase 1: graph construction, textual + non-textual -> heterogeneous graph

Phase 2: specialized GNN to model the interaction

Whether or not the two phases are combined yields two variants

Slide8

Phased Version

Slide9

Phase 1: Heterogeneous Graph Construction

Node types: Doc, Topic, Term

Edge types: Doc <has mixture of> Topic, Topic <distributes over> Term, Doc <cite> Doc (the non-textual citation linkage)

Node attributes are derived from the node text via a topic model and an embedding lookup

Input: node attributes and node text; Supervision: node/edge labels; Loss and graph definitions: (equations lost in transcript)
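A minimal sketch of this construction, assuming a pretrained topic model (e.g. LDA) has already produced a doc-topic mixture matrix `theta` and a topic-term distribution matrix `phi`; the function name, thresholds, and toy values below are illustrative, not the paper's actual implementation:

```python
# Sketch of Phase 1: turn a topic model's output into the heterogeneous
# doc-topic-term graph with <has mixture of>, <distributes over>, and
# <cite> edges.

def build_hetero_edges(theta, phi, citations, top_terms=2, min_weight=0.1):
    edges = []
    # <has mixture of>: doc -> topic, weighted by the mixture proportion
    for d, mix in enumerate(theta):
        for t, w in enumerate(mix):
            if w >= min_weight:
                edges.append(("doc", d, "topic", t, w))
    # <distributes over>: topic -> its top terms under the distribution
    for t, dist in enumerate(phi):
        top = sorted(range(len(dist)), key=lambda i: -dist[i])[:top_terms]
        for term in top:
            edges.append(("topic", t, "term", term, dist[term]))
    # <cite>: pregiven non-textual linkage, unit weight
    for u, v in citations:
        edges.append(("doc", u, "doc", v, 1.0))
    return edges

theta = [[0.9, 0.1], [0.2, 0.8]]          # 2 docs x 2 topics
phi = [[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]  # 2 topics x 3 terms
edges = build_hetero_edges(theta, phi, citations=[(0, 1)])
```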

Slide10

Phase 2: Neural Propagation on the Heterogeneous Graph

Step 1: encode the different edge types and edge weights

Step 2: project all edges and node attributes to a unified feature space

Step 3: propagate the features

where σ is the sigmoid, g is the softplus, and the edge encoder encodes edge types and weights (propagation equations lost in transcript)
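The three steps can be illustrated with scalar features standing in for the real vectors. This is a sketch only: `propagate` and the per-edge-type scores are hypothetical names, and the real model projects attributes into a shared space before this step; only the σ (sigmoid) / g (softplus) roles come from the slide.

```python
# Toy one-step propagation: each node aggregates neighbor features,
# with each edge's contribution scaled by softplus of a learned
# per-edge-type score, then squashed through a sigmoid.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    return math.log1p(math.exp(x))

def propagate(h, edges, edge_type_score):
    # h: node id -> scalar feature; edges: (source, target, edge type)
    out = dict(h)
    for u, v, etype in edges:
        out[v] = out.get(v, 0.0) + softplus(edge_type_score[etype]) * h[u]
    return {n: sigmoid(x) for n, x in out.items()}

h = {"d0": 0.5, "t0": 0.2}
h_new = propagate(h, [("t0", "d0", "doc_topic")], {"doc_topic": 0.0})
```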

Slide11

Joint Version

Slide12

What's Captured and What's Missed

Pros:

Much richer latent textual relationships

Interaction between textual and non-textual linkage

Phased framework: clean and transparent

Problem: the two phases are completely independent:

The topic clustering information is hard-coded by the pretrained topic model and remains frozen throughout the later GNN training

Graph construction receives no benefit from the supervision signals in the 2nd phase

GNN training has to accept whatever the topic model yields in the 1st phase

(Diagram: Doc <has mixture of> Topic, Topic <distributes over> Term, Doc <cite> Doc)

Slide13

Making Graph Construction Trainable

Same heterogeneous graph as before (Doc <has mixture of> Topic, Topic <distributes over> Term, Doc <cite> Doc), but the topic model now becomes a trainable module in the pipeline

Input: node text (the doc-term matrix) and node attributes, plus the non-textual linkage

Supervision: node/edge labels

Loss: (equations lost in transcript)

Slide14

Integrating Supervision from the Text

Input: doc node features, topic node features, term node features

Supervision: the doc-term matrix, extracted from raw text

A learnable diagonal matrix parametrizes the topic weights, and scalar coefficients weight the loss terms (equations lost in transcript)

1) Techniques to enforce sparsity; 2) Gumbel softmax to learn the distribution
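The Gumbel softmax mentioned in 2) can be sketched in plain Python. The real model would apply it to tensors so that sampling from the learned distribution stays differentiable; this toy version only shows the sampling mechanics:

```python
# Gumbel-softmax sketch: perturb logits with Gumbel(0, 1) noise, then
# take a temperature-scaled softmax to get a near-one-hot sample.
import math, random

def gumbel_softmax(logits, tau=1.0, rng=random):
    # Gumbel(0, 1) noise via the inverse-CDF trick: -log(-log(U)).
    g = [l - math.log(-math.log(rng.random())) for l in logits]
    exps = [math.exp(x / tau) for x in g]
    z = sum(exps)
    return [e / z for e in exps]

random.seed(0)
probs = gumbel_softmax([2.0, 0.5, 0.1], tau=0.5)
```

Lower temperatures `tau` push the output closer to a one-hot vector, which is what makes it a useful relaxation of discrete choices such as topic assignments.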

Slide15

Experiments

Method                  | Cora  | Citeseer | Pubmed
GCN                     | 87.63 | 77.28    | 87.17
GAT                     | 87.71 | 76.21    | 86.92
GraphSAGE               | 86.82 | 75.19    | 84.74
Text-rich GNN (phased)  | 88.92 | 78.56    | 88.08
Text-rich GNN (joint)   | On-going

Slide16

Conclusion

We propose to model and leverage the latent textual relationships in text-rich graphs

Slide17

Project Update

Worked on several technical details of the GNN architecture

Experimented with the 20NewsGroup dataset

Systematic experiment setup

Slide18

Review

Heterogeneous graph: Doc <has mixture of> Topic, Topic <distributes over> Term, Doc <cite> Doc (pregiven links), with the topic model as part of the trainable pipeline

Input: node text (the doc-term matrix) and node attributes; Supervision: node/edge labels; Loss: (equations lost in transcript)

Slide19

20NewsGroup Dataset

Document type: news reports, 20 news categories; no pregiven link information

Documents: 18,846; vocab size: 42,757; average length: 221.3

Method                          | Accuracy
LSTM                            | 65.71
Bi-LSTM                         | 73.18
PTE                             | 76.74
CNN                             | 82.15
Text-GCN (doc-word, word-word)  | 86.34
Our Method                      | 83~84

Slide20

Experiment Setup

Prediction tasks

Ablation & comparison study

Effect of the topic model's parameters (#Topics, #Terms)

Analysis of the learned attention & topic models

Slide21

Prediction Tasks

Text-rich graphs with pregiven but weak link data; document classification datasets without any links

Name         | Node Meaning               | Pregiven Link | Classification Target
CORA_ML      | ML papers                  | Citation      | 7 ML areas
Hep-th       | High-energy physics papers | Citation      | 4 high-energy physics areas
WebKB        | Webpages of top univs      | Hyperlink     | 5 types of target readers (previous SOTA 0.6)
20NewsGroup  | News reports               | -             | 20 news categories
MovieReviews | Movie reviews              | -             | 2 (positive/negative)
Reuters      | News reports               | -             | 8 news categories

Slide22

Baselines

GNN-based methods: GCN, GAT

Random-walk based methods: paper2vec, TADW

Text-network based methods: Text-GCN, PTE (Predictive Text Embedding)

Slide23

Ablation & Comparison Study

Remove different types of links in our network: pregiven, doc-topic, topic-term

Initialize doc node attributes with different feature extractors for text: TF-IDF (default), GloVe vectors (mean-pooled), Bi-LSTM, Text CNN, (BERT)
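A sketch of the default TF-IDF initialization, using one common smoothed variant (the slide does not specify the exact formula used in the experiments, so treat this as an assumption):

```python
# Smoothed TF-IDF sketch: term frequency times a smoothed inverse
# document frequency, idf(w) = log((1 + N) / (1 + df(w))) + 1.
import math
from collections import Counter

def tfidf(docs):
    n = len(docs)
    df = Counter(w for d in docs for w in set(d.split()))
    feats = []
    for d in docs:
        tf = Counter(d.split())
        feats.append({w: tf[w] * (math.log((1 + n) / (1 + df[w])) + 1)
                      for w in tf})
    return feats

feats = tfidf(["graph mining", "graph learning"])
```

Words that appear in every document ("graph" here) get a lower weight than rarer, more discriminative ones ("mining", "learning").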

Slide24

Further Analysis

Robustness to the topic model setting: what happens if we use different numbers of topic nodes and term nodes?

Analysis of the learned attention & topic models: can we visualize the learned attention and check how our GNN model actually learns the text relationships?

Slide25

Experiment Update

Slide26

Dataset Overview

Name         | Node Meaning               | Pregiven Link? | #Targets | Raw Text? | #docs, #edges, #vocab
CORA_ML      | ML papers                  | Citation       | 7        | yes       | 2708, 5278, N/A
Citeseer     | ML papers                  | Citation       | 6        | no        | 3327, 4552, N/A
Pubmed       | Biomed papers              | Citation       | 3        | no        | 19717, 44324, N/A
Hep-th       | High-energy physics papers | Citation       | 4        | yes       | 11752, 134956, 21614
Wikipedia    | Wikipedia webpages         | Hyperlink      | 19       | no        | 2405, 17981, N/A
20NewsGroup  | News reports               | None           | 20       | yes       | 18846, N/A, 42757
Reuters      | News reports               | None           | 8        | yes       | 7647, N/A, 7688

Notes:

When we remove pregiven links from text-rich graphs, we get document collections (the last 2 datasets)

Not all datasets we use come with raw text; some only have TF-IDF / word-frequency features

Our method works well without raw text and/or without pregiven links, while almost all baselines require at least one of them to be available

We claim our major contribution on text-rich graphs (the first 5 datasets)

Slide27

Main Performance Table – Text-rich Graph Datasets

Row-wise comparison:

Our method uniformly and significantly outperforms the state-of-the-art baselines on all these popular text-rich graph datasets

LDA does a better job than MLP

GNN-based methods also generally show very competitive results, but there is limited difference within this line of work

Random-walk based methods rely on matrix factorization, which is essentially a linear treatment of relational data and lacks expressive power (even CANE is limited in this way)

Slide28

Main Performance Table – Text-rich Graph Datasets (cont.)

Row-wise comparison:

Text-GCN is the strongest baseline; the idea of introducing additional text relations is rather game-changing!

Our most significant gain is achieved on the most difficult dataset, Hep-th

Slide29

Ablation Study

Method: remove different components of our framework and check how the performance is affected

Goal: validate the usefulness of each building component

Slide30

Analysis:

Ab. 1 vs. the rest: the importance of using various types of text relationships

Ab. 0 vs. Ab. 2: pregiven links are usually helpful to some extent, though relatively limited on Hep-th

Ab. 0 vs. Ab. 3: usefulness of doc_word links (direct channels between documents and words)

Ab. 0 vs. Ab. 4: our model trains word embeddings that better suit the downstream classification task

Ab. 0 vs. Ab. 5 and Ab. 6 vs. Ab. 7: the existence of topic nodes is highly important in most cases, no matter how many word nodes are used (removing them reduces the model to Text-GCN)

Note: our method includes {doc_doc, doc_topic, topic_word, doc_word} links

Slide31

Analysis:

Ab. 0 vs. Ab. 6: when topic nodes are NOT present, using the full vocabulary without PMI links is a very bad choice

Ab. 5 vs. Ab. 7: when topic nodes are present, using the full vocabulary without PMI links does not have a consistent effect

Ab. 7 vs. Ab. 8: word_word PMI is highly crucial to the success of Text-GCN. However, it also requires the full vocabulary to be used as word nodes, which leads to an implicit tradeoff!
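The word-word PMI links discussed here, as used by Text-GCN, come from co-occurrence counts within a sliding window, keeping only positive-PMI pairs as edges. A toy sketch with hypothetical names (window handling simplified relative to the real implementation):

```python
# Word-word PMI sketch: count word and word-pair occurrences over
# sliding windows, then keep pairs with PMI > 0 as graph edges.
import math
from collections import Counter
from itertools import combinations

def pmi_links(docs, window=2):
    word_cnt, pair_cnt, n_windows = Counter(), Counter(), 0
    for doc in docs:
        toks = doc.split()
        for i in range(max(1, len(toks) - window + 1)):
            win = set(toks[i:i + window])
            n_windows += 1
            word_cnt.update(win)
            pair_cnt.update(frozenset(p) for p in combinations(sorted(win), 2))
    links = {}
    for pair, c in pair_cnt.items():
        a, b = tuple(pair)
        # PMI = log( P(a, b) / (P(a) * P(b)) ), probabilities over windows
        pmi = math.log(c * n_windows / (word_cnt[a] * word_cnt[b]))
        if pmi > 0:
            links[pair] = pmi
    return links

links = pmi_links(["graph mining graph mining", "deep learning"])
```

The tradeoff noted above shows up directly here: PMI links are only defined between word nodes, so benefiting from them pushes the model toward keeping the entire vocabulary in the graph.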

Slide32

Other Interesting Findings

The optimal number of topic nodes is usually 1 to 1.5 times the number of classification categories

The optimal #words per topic usually ranges from 20 to 100, typically accounting for less than 5% of the entire vocabulary

Using the full vocabulary leads to significant overfitting of the model

Slide33

On-going Experiments

Robustness to hyperparameters

Initialize the doc node attributes with different feature extractors

End-to-end training framework

Case study of learned topic and word embeddings