Slide 1: Triangular Architecture for Rare Language Translation

Shuo Ren 1,2*, Wenhu Chen 3, Shujie Liu 4, Mu Li 4, Ming Zhou 4 and Shuai Ma 1,2
1 SKLSDE Lab, Beihang University, China
2 Beijing Advanced Innovation Center for Big Data and Brain Computing
3 University of California, Santa Barbara, CA, USA
4 Microsoft Research Asia, Beijing, China
* Contribution during internship at Microsoft Research Asia.
Slide 2: Background
(Outline: Motivation · Related Work · Method · Experiment)

NMT suffers from the data sparsity problem: rich language pairs have a lot of parallel data, while rare languages do not. How can we improve rare language translation by leveraging the data of rich languages?

Scenario:
- A large bilingual corpus (X, Y) between the rich languages X and Y.
- Small bilingual corpora (X, Z) and (Y, Z) between the rare language Z and the rich languages X and Y.

X: EN (English), Y: FR (French), Z: HE (Hebrew)
Slide 3: Methods to Tackle the Low-resource Problem

Exploiting monolingual data:
- Back-translation to exploit target-side monolingual data (Sennrich et al., 2015).
- Self-learning and multi-task learning algorithms to exploit source-side monolingual data (Zhang and Zong, 2016).
- Joint training to combine the source-to-target and target-to-source translation models (Cheng et al., 2016; Zhang et al., 2018).

Multilingual neural machine translation:
- Several encoders and decoders for different languages with a shared attention mechanism (Firat et al., 2016).

Exploiting other parallel resources:
- Teacher-student method (Chen et al., 2017 use this method to address zero-shot NMT).
Limitations of prior work: these methods consider only the source and target sides of a single language pair, or focus on the model architecture rather than the training method.
Slide 4: Goal

Improve the translation performance of the four models for the rare language jointly.

Our Method: language Z is used as a hidden space to translate a sentence from language X to language Y, and from Y to X, so EM training can be leveraged.

X: EN, Y: FR, Z: HE
Slide 5: Generalized EM Training

- Introduce z as the hidden variable into the log-likelihood.
- Use Jensen's inequality to find the lower bound L(Q, Θ).
- Choose a posterior distribution of z as Q(z).
- Decompose the log-likelihood into two parts: the lower bound L(Q, Θ) and the gap KL(Q(z) ‖ p(z|x, y)).
- In the M-step, maximize the lower bound L(Q, Θ).
- In the E-step, minimize the gap between log p(y|x) and L(Q, Θ).
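The E/M steps described above follow the standard EM decomposition. Written out for the X→Z→Y direction, with p(z|x) and p(y|z) as the two component models, the equations are:

```latex
% Lower bound via Jensen's inequality:
\log p(y\mid x) \;=\; \log \sum_{z} p(z\mid x)\, p(y\mid z)
\;\ge\; \sum_{z} Q(z)\, \log \frac{p(z\mid x)\, p(y\mid z)}{Q(z)}
\;\triangleq\; \mathcal{L}(Q,\Theta)

% Decomposition into the lower bound plus the gap:
\log p(y\mid x) \;=\; \mathcal{L}(Q,\Theta)
\;+\; \mathrm{KL}\!\left(Q(z) \,\middle\|\, p(z\mid x, y)\right)
```

Since the KL divergence is non-negative, maximizing the lower bound in the M-step can only push the likelihood up, while the E-step closes the gap by moving Q(z) toward the posterior p(z|x, y).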
Generalized EM Training (cont.)

For the X→Z→Y direction:
- E-step: optimize Q(z) by finding a lower bound; leverage y to update p(z|x) by maximizing the agreement of z generated from x and from y.
- M-step: optimize p(y|z) by maximizing the lower bound; leverage x to update p(y|z) by maximizing the expectation of the final translation y.

Similarly for the Y→Z→X direction, to optimize p(z|y) and p(x|z).

How can we train the four models jointly?
Slide 7: TA-NMT — Joint EM Training

X→Z→Y: the E-step finds the lower bound; the M-step maximizes the expectation of y.
Y→Z→X: the E-step finds the lower bound; the M-step maximizes the expectation of x.

The four steps alternate iteratively:
- E-step of X→Z→Y: update p(z|x) using p(z|y).
- M-step of X→Z→Y: update p(y|z) using p(z|x).
- E-step of Y→Z→X: update p(z|y) using p(z|x).
- M-step of Y→Z→X: update p(x|z) using p(z|y).
- ... (repeat)
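To make the EM alternation concrete, here is a minimal runnable sketch of the same objective, log p(y|x) = log Σ_z p(z|x) p(y|z), on a toy discrete "corpus". Everything here is a hypothetical stand-in: the real method uses NMT models and approximate posteriors, not exact probability tables.

```python
import math
import random

random.seed(0)

X, Y, Z = range(2), range(2), range(2)  # toy "vocabularies"

def normalize(row):
    s = sum(row)
    return [v / s for v in row]

# Parameter tables: p_zx[x][z] plays the role of p(z|x),
# p_yz[z][y] the role of p(y|z).
p_zx = [normalize([random.random() + 0.1 for _ in Z]) for _ in X]
p_yz = [normalize([random.random() + 0.1 for _ in Y]) for _ in Z]

# Toy "bilingual corpus" of (x, y) pairs; z is never observed.
corpus = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 1)]

def log_likelihood():
    # log p(y|x) = log sum_z p(z|x) p(y|z), summed over the corpus
    return sum(math.log(sum(p_zx[x][z] * p_yz[z][y] for z in Z))
               for x, y in corpus)

def em_step():
    global p_zx, p_yz
    # E-step: posterior Q(z) = p(z|x,y), proportional to p(z|x) p(y|z)
    posts = [normalize([p_zx[x][z] * p_yz[z][y] for z in Z])
             for x, y in corpus]
    # M-step: re-estimate both tables from expected counts
    zx = [[0.0] * len(Z) for _ in X]
    yz = [[0.0] * len(Y) for _ in Z]
    for (x, y), q in zip(corpus, posts):
        for z in Z:
            zx[x][z] += q[z]
            yz[z][y] += q[z]
    p_zx = [normalize(r) for r in zx]
    p_yz = [normalize(r) for r in yz]

history = [log_likelihood()]
for _ in range(10):
    em_step()
    history.append(log_likelihood())

# EM guarantees the data log-likelihood never decreases.
assert all(b >= a - 1e-9 for a, b in zip(history, history[1:]))
```

The toy version computes the posterior Q(z) exactly; the paper's setting replaces these tables with neural translation models and approximates the E- and M-step updates with samples from the companion models.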
Slide 8: Dataset

            MultiUN               IWSLT2012
Pair        Lang     Size         Lang     Size
X-Y         EN-FR    9.9 M        EN-FR    7.9 M
X-Z         EN-AR    116 K        EN-HE    112.6 K
Y-Z         FR-AR    116 K        FR-HE    116.3 K
Mono Z      AR       3 M          HE       512.5 K
X-Z         EN-ES    116 K        EN-RO    467.3 K
Y-Z         FR-ES    116 K        FR-RO    111.6 K
Mono Z      ES       3 M          RO       885.0 K

X: EN, Y: FR; Z is the rare language (AR/ES for MultiUN, HE/RO for IWSLT2012).
MultiUN is a simulated rare language scenario; IWSLT2012 is a real rare language scenario.
Slide 9: Baselines

- RNNSearch: single-layer GRU-based NMT system trained only with the low-resource bilingual data.
- PBSMT: phrase-based statistical machine translation system.
- T-S: teacher-student training method.
- BackTrans: back-translation with monolingual data.
Slide 10: Our Method

Required resources per method (corpora over the rich languages X, Y and the rare language Z):

Method        Required Resources
RNNSearch     (X, Z), (Y, Z)
PBSMT         (X, Z), (Y, Z)
T-S           (X, Z), (Y, Z), (X, Y)
BackTrans     (X, Z), (Y, Z), (X, Y), Mono Z
TA-NMT        (X, Z), (Y, Z), (X, Y)
TA-NMT(GI)    (X, Z), (Y, Z), (X, Y), Mono Z

TA-NMT(GI): our method combining back-translation as a good initialization.
Slide 11: Results on MultiUN

[BLEU bar charts for the (EN, FR, AR) and (EN, FR, ES) groups: TA-NMT improves over the baselines on every translation direction, with gains ranging from +0.3 to +1.6 BLEU.]
Slide 12: Results on IWSLT

[BLEU bar charts for the (EN, FR, HE) and (EN, FR, RO) groups: gains range from +0.6 to +2.2 BLEU across the translation directions.]
Slide 13: More Discussion — The Effect of Monolingual Z

The improvements brought by monolingual Z differ between the two datasets:

Method        MultiUN (Avg. BLEU)    IWSLT (Avg. BLEU)
T-S           29.49                  24.59
BackTrans     30.75 (+1.26)          24.89 (+0.30)
TA-NMT        30.22                  25.54
TA-NMT(GI)    31.45 (+1.23)          26.19 (+0.65)
Slide 14: Dataset (a recap of the tables from Slide 8).
Slide 15: The Effect of Monolingual Z (a recap of the table from Slide 13).
Slide 16: The EM Training Curves

Changes of validation BLEU on the (EN, FR, AR) group during training: the four models are improved jointly.
Slide 17: Summary

- TA-NMT: a triangular architecture to tackle the low-resource problem in NMT.
- Jointly trains the four translation models from and to the rare language Z, with the help of a large bilingual corpus between the rich languages X and Y.
- Takes the rare language as the hidden variable and optimizes in an EM framework.
- Results on the MultiUN and IWSLT datasets demonstrate the effectiveness of the approach.
Slide 18: Thanks! Q & A
Slide 19 (Appendix): Related Work — Exploiting Monolingual Data

Back-translation to exploit target-side monolingual data (Sennrich et al., 2015):
- Use the model of one translation direction to generate pseudo bilingual data for training the reverse direction, and vice versa (applied to each language pair).
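As a concrete illustration, the back-translation step can be sketched as below. This is a minimal hypothetical example: `toy_tgt2src` stands in for a trained target-to-source model, and the sentences are placeholders.

```python
def backtranslate(mono_target, tgt2src):
    """Turn target-side monolingual sentences into pseudo-parallel
    (source, target) pairs using a target-to-source model."""
    return [(tgt2src(y), y) for y in mono_target]

# Toy stand-in for a trained target->source translation model.
toy_tgt2src = lambda sentence: " ".join(reversed(sentence.split()))

mono = ["le monde entier", "bonjour le monde"]
pseudo_parallel = backtranslate(mono, toy_tgt2src)
# The pseudo pairs are then mixed with the real parallel data to
# retrain the source->target model.
```

The key property, preserved in this sketch, is that the target side of every pseudo pair is genuine text, so the retrained source-to-target model learns to produce fluent target-language output.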
Related Work — Exploiting Other Parallel Resources

Teacher-student method:
- Use the teacher model to generate pseudo data for training the student models of both rare-language directions.

Generalized EM Training (appendix):
- Leverage p(z|y) to update p(z|x) by maximizing the agreement of z generated from x and from y.
- Leverage p(z|x) to update p(y|z) by maximizing the expectation of the final translation y.