
Slide1

Improving Cross-lingual Entity Alignment via Optimal Transport

Shichao Pei, Lu Yu, Xiangliang Zhang
CEMSE, King Abdullah University of Science and Technology (KAUST), SA

8/19/19

The 28th International Joint Conference on Artificial Intelligence. August 10-16, 2019, Macao, China


Slide2

Outline

The Task
Background and Related Work
Proposed Model
Experiment
Conclusion


Slide3

The Task

Problem
KGs differ in language and content but cover a similar domain, e.g., geography.
Each KG consists of a set of triples, where each triple includes a head entity (e.g., Mexico), a relation (e.g., neighbor of), and a tail entity (e.g., USA).

Entity alignment is to find pairs of entities with the same meaning (one in the English KG and the other in the French KG), so-called aligned entities, e.g., Mexico with Mexique, USA with Etats-Unis.
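To make the setting concrete, here is a minimal sketch in Python with hypothetical toy data (the entity and relation names below are for illustration only): each KG is a set of triples, and a small set of seed alignments links entities across the two KGs.

```python
# Toy illustration of the entity-alignment setting (hypothetical data).
# Each KG is a set of (head, relation, tail) triples; seed alignments
# pair entities that denote the same real-world object in both KGs.

kg_en = {
    ("Mexico", "neighbor_of", "USA"),
}
kg_fr = {
    ("Mexique", "voisin_de", "Etats-Unis"),
}

# Known (labeled) aligned entity pairs available for training.
seed_alignments = [("Mexico", "Mexique"), ("USA", "Etats-Unis")]

# The task: predict the remaining aligned pairs between kg_en and kg_fr.
```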


Slide4

The Task

Motivation
Various methods, sources, and languages have been explored to construct KGs, and most existing KGs are developed separately.
These KGs are inevitably heterogeneous in surface forms and typically complementary in content.

It is thus essential to align entities in multiple KGs and join them into a unified KG for knowledge-driven applications.


Slide5

Background

Related Work
Feature Engineering Methods
The semantics of OWL properties [Hu et al., 2011]
Compatible neighbors and attribute values of entities [Suchanek et al., 2012]
Structural information of relations [Lacoste-Julien et al., 2013]
Well-designed hand-crafted features [Mahdisoltani et al., 2014]
These methods are time-consuming, labor-expensive, and inflexible to extend.
Embedding-based Methods
Encoding the KGs in separated or unified embedding spaces: MTransE [Chen et al., 2017], ITransE [Zhu et al., 2017].
Jointly modeling the KGs and attributes: JAPE [Sun et al., 2017], KDCoE [Chen et al., 2018].
Iteratively enlarging the labeled entity pairs based on the bootstrapping strategy: BootEA [Sun et al., 2018].


Slide6

Background

Limitations of Current Methods
Limited gain due to the shortage of labeled entity pairs
Ignorance of duality
Failure to match the whole distribution


Slide7

Our Objective

Objective
Learning the translation matrix by dually minimizing both the entity-level and the group-level loss.
The group-level loss describes the discrepancy between the distributions of different embeddings.
Challenges
The group-level loss is difficult to measure using a statistical distance.
A GAN still suffers from an unstable, weak learning signal.
Despite the progress of optimal transport theory, how to use it to match these distributions has not yet been explored.


Slide8

Contribution

Proposed to solve entity alignment by dually minimizing both the entity-level loss and the group-level loss via optimal transport theory.

Imposed an L2,1 norm on the dual translation matrices, which enforces the translation matrices to be close to orthogonal.
Conducted extensive experiments on six real-world datasets, showing the superior performance of the proposed model over state-of-the-art methods.


Slide9

Outline

The Task
Background and Related Work
Proposed Model
Experiment
Conclusion


Slide10

Proposed Model

Knowledge Graph Embedding
TransE
Margin-based ranking

Entity-level loss


[Slide shows equations (1)-(3) with annotations: the embeddings of the head entity, tail entity, and relation; the translation matrix M1 transfers embeddings of G_i into the embedding space of G_j.]
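As a rough illustration of the components named on this slide, below is a minimal PyTorch-style sketch of (i) the TransE score, (ii) a margin-based ranking loss, and (iii) an entity-level loss in which a translation matrix M1 maps embeddings of G_i into the space of G_j. The exact formulation in the paper may differ, so treat this as a sketch under those assumptions.

```python
import torch

def transe_score(h, r, t):
    # TransE energy: small when h + r is close to t (L2 distance here).
    return torch.norm(h + r - t, p=2, dim=-1)

def margin_ranking_loss(pos, neg, margin=1.0):
    # Margin-based ranking: corrupted (negative) triples should score at
    # least `margin` worse than observed (positive) triples.
    h_p, r_p, t_p = pos
    h_n, r_n, t_n = neg
    return torch.relu(margin + transe_score(h_p, r_p, t_p)
                      - transe_score(h_n, r_n, t_n)).mean()

def entity_level_loss(M1, e_i, e_j):
    # Entity-level loss on labeled aligned pairs: the translation matrix M1
    # maps embeddings from G_i into the embedding space of G_j, where each
    # transferred embedding should lie close to its aligned counterpart.
    return torch.norm(e_i @ M1.T - e_j, p=2, dim=-1).mean()
```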

Slide11

Proposed Model

Group-level Loss – Optimal Transport based.


[Slide figure: a WGAN-style critic compares the distribution of transferred embeddings with the distribution of entity embeddings in the target KG.]
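A minimal sketch, assuming the group-level loss is approximated with a WGAN-style critic that compares the distribution of transferred embeddings against the distribution of target-KG entity embeddings (the network size and clipping constant are illustrative choices, not values from the paper):

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    # Scores embedding samples; its expectation gap between the two sets
    # approximates the Wasserstein-1 distance between their distributions.
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x)

def critic_loss(critic, target_emb, transferred_emb):
    # Critic objective: maximize E[f(target)] - E[f(transferred)]
    # (written as a loss to minimize).
    return -(critic(target_emb).mean() - critic(transferred_emb).mean())

def group_level_loss(critic, transferred_emb):
    # Embedding/translation-matrix objective: move the distribution of
    # transferred embeddings toward the target-KG embedding distribution.
    return -critic(transferred_emb).mean()

def clip_critic_weights(critic, c=0.01):
    # Weight clipping keeps the critic approximately 1-Lipschitz, as in
    # the original WGAN training procedure.
    for p in critic.parameters():
        p.data.clamp_(-c, c)
```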

Slide12

Proposed Model

Regularizer
The translation matrix is desired to be orthogonal.
Employ the L2,1 norm as the regularizer.
It prevents the matrix from being dense and mitigates the error induced by a dense matrix.
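A short sketch of the L2,1 regularizer on a translation matrix M (computed row-wise here; whether the paper sums over rows or columns, and the weight it uses, are assumptions):

```python
import torch

def l21_norm(M):
    # L2,1 norm: the sum of the L2 norms of the rows of M.
    # Penalizing it discourages dense translation matrices.
    return torch.norm(M, p=2, dim=1).sum()

# Example: add weight * (l21_norm(M1) + l21_norm(M2)) to the total loss,
# where M1 and M2 are the dual translation matrices.
```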


Slide13

Proposed Model


Regularizer

Slide14

Outline

The Task
Background and Related Work
Proposed Model
Experiment
Conclusion


Slide15

Experiment


Evaluation Metrics
We adopt the popular metrics Hits@k and MRR (a small sketch of both follows this slide).
Datasets
Six real-world datasets.
Baselines: three categories
Category 1: Encoding the KGs in separated or unified embedding spaces: MTransE, ITransE.
Category 2: Jointly modeling the KGs and attributes: JAPE, GCN-based method.
Category 3: Iteratively enlarging the labeled entity pairs based on the bootstrapping strategy: BootEA, ITransE.
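For reference, a small sketch of how Hits@k and MRR are typically computed from the (1-indexed) rank of each test entity's true counterpart; this follows the common definitions rather than the paper's exact evaluation script.

```python
def hits_at_k(ranks, k):
    # Fraction of test pairs whose true counterpart is ranked within top k.
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mrr(ranks):
    # Mean reciprocal rank of the true counterparts.
    return sum(1.0 / r for r in ranks) / len(ranks)

# Hypothetical ranks of the correct entities for five test pairs:
ranks = [1, 3, 2, 1, 10]
print(hits_at_k(ranks, 1), hits_at_k(ranks, 10), mrr(ranks))
```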

Slide16

Experiment


OTEA consistently outperforms all baseline methods on all datasets.
Significant improvement (10%-50%) in Hits@1 on almost all datasets.
For the largest KG, OTEA improved by 33%-59% under different metrics.
The accumulated error is unavoidable for bootstrapping-based methods, especially for the largest KG.
OTEA w/o reg results in dense translation matrices, which introduce more noise into the translation.
OTEA w/o dual reaches the optimum and converges with more difficulty than OTEA, since it needs to search a broader parameter space.

Slide17

Experiment


Slide18

Experiment


Sensitivity to the Proportion of Prior Aligned Entities.

All methods perform better as the proportion of prior aligned entities grows.
OTEA and BootEA perform much better than the other baselines, due to their use of unlabeled data and selection of labeled data.

Slide19

Experiment


Sensitivity to the Dimension of KG Embeddings

Time Complexity Comparison

The OTEA method is consistently better than all other baselines, and its performance is quite stable when varying the embedding dimension d.
OTEA is faster than BootEA, because the bootstrapping-based method needs to propose new aligned entities by computing similarities with all unaligned entities.

Slide20

Outline

The Task
Background and Related Work
Proposed Model
Experiment
Conclusion


Slide21

Conclusion

Introduced a novel framework for cross-lingual entity alignment.
Solved entity alignment by dually minimizing both the entity-level loss and the group-level loss via optimal transport theory.
Imposed a regularizer on the dual translation matrices to mitigate the effect of noise during transformation.
Achieved superior results compared with other SOTA methods.
Future work: combining the model with attribute and relation information.


Slide22

Thank you for your attention!

Q&A
Lab of Machine Intelligence and kNowledge Engineering (MINE): http://mine.kaust.edu.sa/
