Slide 1: Addressing the Rare Word Problem in Neural Machine Translation
Thang Luong, ACL 2015. Joint work with Ilya Sutskever, Quoc Le, Oriol Vinyals, & Wojciech Zaremba.
Slide 2: Standard Machine Translation (MT)
Translates locally, phrase by phrase:
- Good progress: Moses (Koehn et al., 2007), among many others.
- Many subcomponents need to be tuned separately.
Hybrid systems with neural components:
- Language model: (Schwenk et al., 2006), (Vaswani et al., 2013).
- Translation model: (Schwenk, 2012), (Devlin et al., 2014).
- Complex pipeline.
Desire: a simple system that translates globally.
Example (phrase by phrase): "Cindy | loves | cute cats" → "Cindy | aime | les chats mignons"
Slide 3: Neural Machine Translation (NMT)
Encoder-decoder: first proposed at Google & Montreal.
Advantages:
- Minimal domain knowledge.
- Dimensionality reduction: up to 100-gram source-conditioned LMs.
- No gigantic phrase tables or LMs.
- Simple beam-search decoder.
[Figure: sequence-to-sequence diagram from (Sutskever et al., 2014): the encoder reads the source sentence "A B C D", then the decoder emits the target sentence "X Y Z".]
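The "simple beam-search decoder" mentioned above can be sketched generically. This is a minimal sketch: `next_token_log_probs` stands in for the trained model's next-word distribution, and the tiny hard-coded table below is purely illustrative, not the paper's LSTM.

```python
import math

def beam_search(next_token_log_probs, start, end, beam_size=4, max_len=20):
    """Generic left-to-right beam search.

    next_token_log_probs(prefix) returns {token: log-probability}.
    Keeps the beam_size best partial hypotheses at each step; hypotheses
    ending in `end` are moved to the finished pool.
    """
    beam = [([start], 0.0)]            # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beam:
            for tok, lp in next_token_log_probs(tuple(seq)).items():
                candidates.append((seq + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = []
        for seq, score in candidates:
            if seq[-1] == end:
                finished.append((seq, score))
            elif len(beam) < beam_size:
                beam.append((seq, score))
        if not beam:                   # every hypothesis has terminated
            break
    finished.extend(beam)              # fall back to unfinished hypotheses
    return max(finished, key=lambda c: c[1])[0]

# Toy next-word distribution (illustrative only, not a real model).
table = {
    ("<s>",): {"X": math.log(0.6), "Y": math.log(0.4)},
    ("<s>", "X"): {"Y": math.log(0.9), "</s>": math.log(0.1)},
    ("<s>", "X", "Y"): {"Z": math.log(1.0)},
    ("<s>", "X", "Y", "Z"): {"</s>": math.log(1.0)},
    ("<s>", "Y"): {"</s>": math.log(1.0)},
}

def model(prefix):
    return table.get(prefix, {"</s>": 0.0})

print(beam_search(model, "<s>", "</s>"))
# ['<s>', 'X', 'Y', 'Z', '</s>']
```

Note how greedy decoding would also work here, but the beam keeps lower-scoring prefixes alive until their continuations prove better, which is why NMT systems decode this way.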
Slide 4: Existing NMT Work

System                                        Encoder                         Decoder
(Kalchbrenner & Blunsom, 2013)                Convolutional net               RNN
(Sutskever et al., 2014)                      Long short-term memory (LSTM)   LSTM
(Cho et al., 2014), (Bahdanau et al., 2015)   Gated recurrent unit (GRU)      GRU

All decoders use recurrent networks.
All* NMT work uses a fixed, modest-size vocabulary; <unk> represents all OOV words.
Translations with <unk> are troublesome!
*Except the very recent work of (Jean et al., 2015), which scales to a large vocabulary.
Slide 5: The Rare Word Problem
NMT systems translate sentences with rare words poorly.
Original:     The ecotax portico in Pont-de-Buis → Le portique écotaxe de Pont-de-Buis
Actual input: The <unk> portico in <unk> → Le <unk> <unk> de <unk>
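The "actual input" above comes from capping the vocabulary at the most frequent words and mapping everything else to <unk>. A minimal sketch of that preprocessing, with an illustrative toy corpus and cutoff:

```python
from collections import Counter

def build_vocab(corpus, size):
    """Keep the `size` most frequent word types; everything else is OOV."""
    counts = Counter(w for sent in corpus for w in sent.split())
    return {w for w, _ in counts.most_common(size)}

def mask_oov(sentence, vocab, unk="<unk>"):
    """Replace every out-of-vocabulary word with the unk symbol."""
    return " ".join(w if w in vocab else unk for w in sentence.split())

# Tiny illustrative corpus; a real system would cap at 40K-80K types.
corpus = [
    "the ecotax portico in Pont-de-Buis",
    "the portico in the city",
    "in the city",
]
vocab = build_vocab(corpus, size=4)     # small cap, so rare words drop out
print(mask_oov("the ecotax portico in Pont-de-Buis", vocab))
# the <unk> portico in <unk>
```

With the cap this low, "ecotax" and "Pont-de-Buis" (frequency 1) fall outside the vocabulary, reproducing the masked input on this slide.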
Slide 6: Our Approach
Idea: track where each target <unk> comes from.
- Annotate training data: unsupervised alignments & relative indices.
- Post-process test translations: word/identity translations.
- "Attention" for rare words (Bahdanau et al., 2015).
Original:     The ecotax portico in Pont-de-Buis → Le portique écotaxe de Pont-de-Buis
Actual input: The <unk> portico in <unk> → Le unk1 unk-1 de unk1
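The annotation step can be sketched as follows. This is a minimal sketch under two assumptions: the relative index d in unk<d> is taken to be the aligned source position minus the target position, and the word alignments (which would come from an unsupervised aligner) are given as a plain dict; the vocabulary and alignment values are illustrative.

```python
def annotate_target(tgt_tokens, vocab_tgt, alignment, max_rel=7):
    """Rewrite out-of-vocabulary target words as positional unknowns unk<d>.

    alignment maps target index -> aligned source index. d = source index
    minus target index (sign convention assumed here). OOV words that are
    unaligned, or aligned too far away, fall back to a plain <unk>.
    """
    out = []
    for j, w in enumerate(tgt_tokens):
        if w in vocab_tgt:
            out.append(w)
        elif j in alignment and abs(alignment[j] - j) <= max_rel:
            out.append(f"unk{alignment[j] - j}")
        else:
            out.append("<unk>")
    return out

# Toy example; source sentence: "The ecotax portico in Pont-de-Buis"
tgt = "Le portique écotaxe de Pont-de-Buis".split()
vocab_tgt = {"Le", "de"}                 # tiny illustrative target vocabulary
alignment = {1: 2, 2: 1, 4: 4}           # target index -> source index
print(annotate_target(tgt, vocab_tgt, alignment))
# ['Le', 'unk1', 'unk-1', 'de', 'unk0']
```

The exact indices depend on the tokenization and on the sign convention chosen; the point is that each positional unknown records which source word it should be recovered from.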
Slide 9: Our Approach
Idea: track where each target <unk> comes from.
- Annotate training data: unsupervised alignments & relative indices.
- Post-process test translations: word/identity translations.
- "Attention" for rare words (Bahdanau et al., 2015).
Treat any neural MT system as a black box: annotate the training data & post-process the translations.
Original:     The ecotax portico in Pont-de-Buis → Le portique écotaxe de Pont-de-Buis
Actual input: The <unk> portico in <unk> → Le unk1 unk-1 de unk1
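The post-processing step (word/identity translations) can be sketched as below. This is a minimal sketch: the word dictionary, which in practice would be extracted from the same unsupervised alignments, is a toy stand-in, and the unk<d> convention matches the one used on these slides (d points from the target position back to the source position).

```python
import re

def restore_unks(src_tokens, trans_tokens, dictionary):
    """Post-process a translation: each positional unk<d> at target position j
    points back to source position j + d. Translate that source word with a
    word dictionary, or copy it verbatim (identity translation)."""
    out = []
    for j, w in enumerate(trans_tokens):
        m = re.fullmatch(r"unk(-?\d+)", w)
        if not m:
            out.append(w)                       # ordinary in-vocabulary word
            continue
        i = j + int(m.group(1))
        if 0 <= i < len(src_tokens):
            s = src_tokens[i]
            out.append(dictionary.get(s, s))    # fall back to copying
        else:
            out.append("<unk>")                 # index points outside source
    return out

src = "The ecotax portico in Pont-de-Buis".split()
trans = "Le unk1 unk-1 de unk0".split()
dictionary = {"ecotax": "écotaxe", "portico": "portique"}  # illustrative
print(" ".join(restore_unks(src, trans, dictionary)))
# Le portique écotaxe de Pont-de-Buis
```

Identity copying is what recovers names like "Pont-de-Buis" that no dictionary would contain, which is exactly the case rare words usually fall into.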
Slide 10: Experiments
WMT'14 English-French.
- Hyper-parameter tuning: newstest2012+2013. BLEU: newstest2014.
- Setup similar to (Sutskever et al., 2014): stacked LSTMs, 1000 cells, 1000-dim embeddings; reversed source sentences.
Slide 11: Results

System                                            BLEU
SOTA in WMT'14 (Durrani et al., 2014)             37.0
Our NMT systems (40K target vocab):
  Single 6-layer LSTM                             30.4
  Single 6-layer LSTM + our technique             32.7 (+2.3)
  Ensemble of 8 LSTMs                             34.1
  Ensemble of 8 LSTMs + our technique             36.9 (+2.8)

Better models: better gains with our technique.
Naive approach (monotonic alignments of <unk>): only +0.8 BLEU gain.
Slide 12: Results

System                                            BLEU
SOTA in WMT'14 (Durrani et al., 2014)             37.0
Our NMT systems (40K target vocab):
  Single 6-layer LSTM                             30.4
  Single 6-layer LSTM + our technique             32.7 (+2.3)
  Ensemble of 8 LSTMs                             34.1
  Ensemble of 8 LSTMs + our technique             36.9 (+2.8)
Our NMT systems (80K target vocab):
  Single 6-layer LSTM                             31.5
  Single 6-layer LSTM + our technique             33.1 (+1.6)
  Ensemble of 8 LSTMs                             35.6
  Ensemble of 8 LSTMs + our technique             37.5 (+1.9)

New SOTA: about +2.0 BLEU gain with our technique.
Slide 13: Existing Work

System                                                              Vocab   BLEU
Ensemble of 8 LSTMs (this work)                                     80K     37.5
SOTA in WMT'14 (Durrani et al., 2014)                               All     37.0
Standard MT + neural components:
  Neural language model (Schwenk, 2014)                             All     33.3
  Phrase table neural features (Cho et al., 2014)                   All     34.5
  Ensemble of 5 LSTMs, rerank n-best lists (Sutskever et al., 2014) All     36.5
Slide 14: Existing Work

System                                                              Vocab   BLEU
Ensemble of 8 LSTMs (this work)                                     80K     37.5
SOTA in WMT'14 (Durrani et al., 2014)                               All     37.0
Standard MT + neural components:
  Neural language model (Schwenk, 2014)                             All     33.3
  Phrase table neural features (Cho et al., 2014)                   All     34.5
  Ensemble of 5 LSTMs, rerank n-best lists (Sutskever et al., 2014) All     36.5
End-to-end NMT systems:
  Ensemble of 5 LSTMs (Sutskever et al., 2014)                      80K     34.8
  Single RNNsearch (Bahdanau et al., 2015)                          30K     28.5
  Ensemble of 8 RNNsearch + unknown replace (Jean et al., 2015)     500K    37.2

Still the SOTA performance to date! We obtained 37.7 after the ACL camera-ready version.
Slide 15: Effects of Translating Rare Words
Better than the existing SOTA on both frequent and rare words.
Slide 16: Effects of Network Depth
Each additional layer gives on average about +1 BLEU.
More accurate models see better gains with our technique: +1.9, +2.0, +2.2 as depth increases.
Slide 17: Perplexity vs. BLEU
Training objective: perplexity.
Strong correlation: a 0.5 reduction in perplexity gives about +1.0 BLEU.
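As a reminder of the metric being optimized: perplexity is the exponentiated average negative log-likelihood per word, so lower is better. A minimal sketch with illustrative numbers:

```python
import math

def perplexity(word_log_probs):
    """exp of the average negative log-likelihood (natural log) per word."""
    return math.exp(-sum(word_log_probs) / len(word_log_probs))

# Toy example: a model assigning probability 0.25 to each of 4 words
# is as uncertain as a uniform choice among 4 options.
lps = [math.log(0.25)] * 4
print(perplexity(lps))   # approximately 4.0
```

This is why small perplexity reductions matter: they compound multiplicatively over every word in the sentence.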
Slide 18: Sample Translations
Predicts long-distance alignments well.
src: An additional 2600 operations including orthopedic and cataract surgery will help clear a backlog .
ref: 2600 opérations supplémentaires , notamment dans le domaine de la chirurgie orthopédique et de la cataracte , aideront à rattraper le retard .
trans: En outre , unk1 opérations supplémentaires , dont la chirurgie unk5 et la unk6 , permettront de résorber l' arriéré .
trans+unk: En outre , 2600 opérations supplémentaires , dont la chirurgie orthopédiques et la cataracte , permettront de résorber l' arriéré .
Slide 19: Sample Translations
Translates long sentences well.
src: This trader , Richard Usher , left RBS in 2010 and is understood to have been given leave from his current position as European head of forex spot trading at JPMorgan .
ref: Ce trader , Richard Usher , a quitté RBS en 2010 et aurait été suspendu de son poste de responsable européen du trading au comptant pour les devises chez JPMorgan .
trans: Ce unk0 , Richard unk0 , a quitté unk1 en 2010 et a compris qu ' il est autorisé à quitter son poste actuel en tant que leader européen du marché des points de vente au unk5 .
trans+unk: Ce négociateur , Richard Usher , a quitté RBS en 2010 et a compris qu ' il est autorisé à quitter son poste actuel en tant que leader européen du marché des points de vente au JPMorgan .
Slide 20: Sample Translations
Incorrect alignment prediction: "was" aligned to "était" instead of "abandonnait".
src: But concerns have grown after Mr Mazanga was quoted as saying Renamo was abandoning the 1992 peace accord .
ref: Mais l' inquiétude a grandi après que M. Mazanga a déclaré que la Renamo abandonnait l' accord de paix de 1992 .
trans: Mais les inquiétudes se sont accrues après que M. unkpos3 a déclaré que la unk3 unk3 l' accord de paix de 1992 .
trans+unk: Mais les inquiétudes se sont accrues après que M. Mazanga a déclaré que la Renamo était l' accord de paix de 1992 .
Slide 21: Conclusion
Simple technique to tackle rare words:
- Applicable to any NMT system (about +2.0 BLEU improvement).
- State-of-the-art result on WMT'14 English-French.
Future work: more challenging language pairs.
Thank you!