
Slide1

Triangular Architecture for Rare Language Translation

Shuo Ren 1,2*, Wenhu Chen 3, Shujie Liu 4, Mu Li 4, Ming Zhou 4 and Shuai Ma 1,2

1 SKLSDE Lab, Beihang University, China
2 Beijing Advanced Innovation Center for Big Data and Brain Computing
3 University of California, Santa Barbara, CA, USA
4 Microsoft Research Asia, Beijing, China

* Contribution during internship at Microsoft Research Asia.

Slide2

Motivation

Related Work

Method

Experiment

Background

NMT suffers from the data-sparsity problem: rich language pairs have abundant parallel data, while rare languages do not. How can we improve rare-language translation by leveraging the data of rich languages?

Scenario

A large bilingual corpus (X, Y) between the rich languages X and Y, and small bilingual corpora (X, Z) and (Y, Z) between the rare language Z and the rich languages X and Y.

X: EN (English)
Y: FR (French)
Z: HE (Hebrew)

Slide3

Methods to Tackle the Low-resource Problem

Exploiting monolingual data:
Back-translation to exploit target-side monolingual data (Sennrich et al. 2015).
Self-learning and multi-task learning to exploit source-side monolingual data (Zhang and Zong, 2016).
Joint training to combine the source-target and target-source translation models (Cheng et al. 2016; Zhang et al. 2018).

Multilingual neural machine translation:
Several encoders and decoders for different languages with a shared attention mechanism (Firat et al. 2016).

Exploiting other parallel resources:
Teacher-student method (Chen et al. 2017 use this method to address zero-shot NMT).

Limitations: these methods consider only the source and target sides of a single language pair, or focus on the model architecture rather than the training method.

Slide4

Goal

Improve the translation performance of the four models for the rare language jointly.

Our Method

Language Z is used as a hidden space to translate a sentence from language X to language Y, and from Y to X, so EM training can be leveraged.

X: EN, Y: FR, Z: HE

Slide5

Generalized EM Training

Introduce z as a hidden variable in the log likelihood:

log p(y|x) = log Σ_z p(z|x) p(y|z)

Use Jensen's inequality to find the lower bound L(Q):

log p(y|x) ≥ Σ_z Q(z) log [ p(z|x) p(y|z) / Q(z) ] = L(Q)

Choose a posterior distribution of z as Q(z), and decompose the log likelihood into two parts:

log p(y|x) = L(Q) + KL( Q(z) ‖ p(z|x, y) )

M-step: maximize the lower bound L(Q).
E-step: minimize the gap KL( Q(z) ‖ p(z|x, y) ) between L(Q) and log p(y|x).
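The decomposition above can be checked numerically with toy discrete distributions. This is only an illustrative sketch: the 3-word hidden vocabulary and all probability values are made up, standing in for the paper's NMT models.

```python
import math

# Toy discrete distributions over a 3-word hidden vocabulary for z,
# standing in for p(z|x) and p(y|z); the numbers are illustrative.
p_z_given_x = [0.5, 0.3, 0.2]                 # p(z|x)
p_y_given_z = [0.7, 0.2, 0.1]                 # p(y|z) for one fixed y

# Marginal likelihood: p(y|x) = sum_z p(z|x) p(y|z)
p_y_given_x = sum(a * b for a, b in zip(p_z_given_x, p_y_given_z))

# True posterior: p(z|x,y) = p(z|x) p(y|z) / p(y|x)
posterior = [a * b / p_y_given_x for a, b in zip(p_z_given_x, p_y_given_z)]

# Any distribution Q(z) gives a lower bound L(Q) (Jensen) and the
# exact decomposition log p(y|x) = L(Q) + KL(Q || p(z|x,y)).
Q = [0.4, 0.4, 0.2]
L_Q = sum(q * math.log(a * b / q)
          for q, a, b in zip(Q, p_z_given_x, p_y_given_z))
KL = sum(q * math.log(q / r) for q, r in zip(Q, posterior))

assert L_Q <= math.log(p_y_given_x)                      # Jensen lower bound
assert abs(math.log(p_y_given_x) - (L_Q + KL)) < 1e-12   # exact decomposition
assert KL >= 0
```

Maximizing L(Q) over the model parameters (M-step) and shrinking the KL gap by improving Q (E-step) is exactly the generalized EM scheme on this slide.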

Slide6

Generalized EM Training

E-step: optimize p(z|x) by finding a lower bound; leverage p(z|y) to update p(z|x) by maximizing the agreement between the z generated from x and the z generated from y.

M-step: optimize p(y|z) by maximizing the lower bound; leverage p(z|x) to update p(y|z) by maximizing the expectation of the final translation y.

Similarly for the y → z → x direction, to optimize p(z|y) and p(x|z).

How can we train the 4 models jointly?

Slide7

Joint EM Training

X → Z → Y direction: in the E-step, find the lower bound; in the M-step, maximize the expectation of y.
Y → Z → X direction: in the E-step, find the lower bound; in the M-step, maximize the expectation of x.

The two directions are trained alternately:

E-step of X → Z → Y: update p(z|x) using p(z|y)
M-step of X → Z → Y: update p(y|z) using p(z|x)
E-step of Y → Z → X: update p(z|y) using p(z|x)
M-step of Y → Z → X: update p(x|z) using p(z|y)
……

We call this joint procedure TA-NMT.

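The alternation above can be written down as a schedule. The sketch below only records which model is updated with whose help at each step; in the paper each "update" is a gradient step on an NMT model, which is not implemented here.

```python
def joint_em_schedule(num_iterations):
    """Order of model updates in joint EM training (TA-NMT), as a list
    of (step name, model updated, helper model) records."""
    schedule = []
    for _ in range(num_iterations):
        schedule.append(("E-step of X->Z->Y", "update p(z|x)", "using p(z|y)"))
        schedule.append(("M-step of X->Z->Y", "update p(y|z)", "using p(z|x)"))
        schedule.append(("E-step of Y->Z->X", "update p(z|y)", "using p(z|x)"))
        schedule.append(("M-step of Y->Z->X", "update p(x|z)", "using p(z|y)"))
    return schedule

steps = joint_em_schedule(2)
# Every outer iteration touches all four models exactly once.
updated = {target for _, target, _ in steps}
assert updated == {"update p(z|x)", "update p(y|z)",
                   "update p(z|y)", "update p(x|z)"}
```

The key property of the schedule is that each E-step uses the model trained by the other direction's E-step as its helper, so the four models bootstrap one another.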

Slide8

Dataset

Z is the rare language; X = EN and Y = FR are the rich languages.

MultiUN (simulated rare language scenario)   IWSLT2012 (real rare language scenario)
Lang     Size                                Lang     Size
EN-FR    9.9 M                               EN-FR    7.9 M
EN-AR    116 K                               EN-HE    112.6 K
FR-AR    116 K                               FR-HE    116.3 K
Mono AR  3 M                                 Mono HE  512.5 K
EN-ES    116 K                               EN-RO    467.3 K
FR-ES    116 K                               FR-RO    111.6 K
Mono ES  3 M                                 Mono RO  885.0 K

Slide9

Baselines

RNNSearch: single-layer GRU-based NMT system trained only with the low-resource bilingual data.
PBSMT: phrase-based statistical machine translation system.
T-S: teacher-student training method.
BackTrans: back-translation with monolingual data.

Slide10

Our Method

Method       Required Resources
RNNSearch    (X, Z), (Y, Z)
PBSMT        (X, Z), (Y, Z)
T-S          (X, Z), (Y, Z), (X, Y)
BackTrans    (X, Z), (Y, Z), (X, Y), Mono Z
TA-NMT       (X, Z), (Y, Z), (X, Y)
TA-NMT(GI)   (X, Z), (Y, Z), (X, Y), Mono Z

TA-NMT(GI): our method with back-translation used as a good initialization.

Slide11

Results: MultiUN

[Bar charts of test BLEU for the (EN, FR, AR) and (EN, FR, ES) groups: TA-NMT improves over the baselines on every translation direction, with gains ranging from +0.3 to +1.6 BLEU.]

Slide12

Results: IWSLT

[Bar charts of test BLEU for the (EN, FR, HE) and (EN, FR, RO) groups: gains over the baselines range from +0.6 to +2.2 BLEU across translation directions.]

Slide13

More Discussion

The effect of monolingual Z: the improvement brought by monolingual Z differs between the two datasets.

Method       MultiUN (Avg. BLEU)   IWSLT (Avg. BLEU)
T-S          29.49                 24.59
BackTrans    30.75 (+1.26)         24.89 (+0.30)
TA-NMT       30.22                 25.54
TA-NMT(GI)   31.45 (+1.23)         26.19 (+0.65)
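The bracketed numbers in the table are the gains from adding monolingual Z (BackTrans over T-S, and TA-NMT(GI) over TA-NMT); a quick arithmetic check:

```python
# Average BLEU scores copied from the table above.
multiun = {"T-S": 29.49, "BackTrans": 30.75, "TA-NMT": 30.22, "TA-NMT(GI)": 31.45}
iwslt   = {"T-S": 24.59, "BackTrans": 24.89, "TA-NMT": 25.54, "TA-NMT(GI)": 26.19}

# Gains from monolingual Z in each dataset.
assert round(multiun["BackTrans"] - multiun["T-S"], 2) == 1.26
assert round(iwslt["BackTrans"] - iwslt["T-S"], 2) == 0.30
assert round(multiun["TA-NMT(GI)"] - multiun["TA-NMT"], 2) == 1.23
assert round(iwslt["TA-NMT(GI)"] - iwslt["TA-NMT"], 2) == 0.65
```

The gains are consistently larger on MultiUN, where 3 M monolingual Z sentences are available, than on IWSLT, where far less monolingual Z data exists.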

Slide14

(Dataset table repeated from Slide 8.)

Slide15

(Discussion table repeated from Slide 13.)

Slide16

The EM Training Curves

[Curves of validation BLEU on the (EN, FR, AR) group during training: the four models improve jointly.]

Slide17

Summary

TA-NMT: a triangular architecture to tackle the low-resource problem in NMT.

Jointly trains the four translation models from and to the rare language with the help of a large bilingual corpus between the rich languages.

Takes the rare language as the hidden variable and optimizes the models in an EM framework.

Results on the MultiUN and IWSLT datasets demonstrate its effectiveness.

Slide18

Thanks!

Q & A

Slide19

Related Work (backup): Exploiting Monolingual Data

Back-translation to exploit target-side monolingual data (Sennrich et al. 2015): use the y → x model to translate target-side monolingual sentences, generating pseudo parallel data for training the x → y model, and vice versa.
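The back-translation step above can be sketched with toy lookup-table "models"; `translate` and the dictionary below are stand-ins for trained NMT systems, not anything from the paper.

```python
# Toy stand-ins: a dict plays the role of a trained NMT model, and
# translate() plays the role of beam-search decoding.
def translate(model, sentence):
    return model.get(sentence, "<unk>")

y_to_x = {"cat": "chat"}        # reverse (y -> x) model
y_monolingual = ["cat"]         # target-side monolingual data

# Back-translate the monolingual target sentences to build pseudo
# (x, y) pairs, which then augment training data for the x -> y model.
pseudo_pairs = [(translate(y_to_x, y), y) for y in y_monolingual]
assert pseudo_pairs == [("chat", "cat")]
```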

Slide20

Related Work (backup): Exploiting Other Parallel Resources

Teacher-student method: use a teacher model trained on the rich language pair to generate pseudo data for training the student models on the low-resource pairs.
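The teacher-student idea can be sketched the same way; the teacher dictionary and the (X, Z) corpus below are invented examples (X = EN, Y = FR, Z = HE, following the deck), not data from the paper.

```python
def translate(model, sentence):
    # Hypothetical stand-in for decoding with a trained NMT model.
    return model.get(sentence, "<unk>")

teacher_x_to_y = {"hello": "bonjour"}   # teacher trained on the rich pair (X, Y)
x_z_corpus = [("hello", "shalom")]      # small (X, Z) parallel corpus

# The teacher translates the X side of the (X, Z) corpus, producing
# pseudo (Z, Y) pairs to train a Z -> Y student without direct Z-Y data.
pseudo_z_y = [(z, translate(teacher_x_to_y, x)) for x, z in x_z_corpus]
assert pseudo_z_y == [("shalom", "bonjour")]
```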

Slide21

Generalized EM Training (backup)

Leverage p(z|y) to update p(z|x) by maximizing the agreement between the z generated from x and the z generated from y.
Leverage p(z|x) to update p(y|z) by maximizing the expectation of the final translation y.