Presentation Transcript

Slide1

Addressing the Rare Word Problem in Neural Machine Translation

Thang Luong, ACL 2015
Joint work with: Ilya Sutskever, Quoc Le, Oriol Vinyals, & Wojciech Zaremba

Slide2

Standard Machine Translation (MT)

Translate locally, phrase by phrase:
- Good progress: Moses (Koehn et al., 2007), among many others.
- Many subcomponents need to be tuned separately.

Hybrid systems with neural components:
- Language model: (Schwenk et al., 2006), (Vaswani et al., 2013).
- Translation model: (Schwenk, 2012), (Devlin et al., 2014).
- Complex pipeline.

Desire: a simple system that translates globally.

Example (phrase by phrase): "Cindy loves cute cats" → "Cindy aime les chats mignons" (Cindy → Cindy, loves → aime, cute cats → les chats mignons).

Slide3

Neural Machine Translation (NMT)

Encoder-decoder: first proposed at Google & Montreal.

Advantages:
- Minimal domain knowledge.
- Dimensionality reduction: up to 100-gram source-conditioned LMs.
- No gigantic phrase tables or LMs.
- Simple beam-search decoder.

[Figure: an encoder reads the source sentence A B C D and a decoder emits the target sentence X Y Z (Sutskever et al., 2014).]
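To make the "simple beam-search decoder" concrete, here is a minimal sketch in Python. It assumes a hypothetical `step(state, token)` interface returning the decoder's next state and a map of next-token log-probabilities; that interface and all names below are illustrative, not from the paper.

```python
import heapq

def beam_search(step, init_state, bos, eos, beam_size=8, max_len=50):
    # Each hypothesis: (total log-prob, token list, decoder state).
    beam = [(0.0, [bos], init_state)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens, state in beam:
            next_state, logprobs = step(state, tokens[-1])
            for tok, lp in logprobs.items():
                candidates.append((score + lp, tokens + [tok], next_state))
        # Keep only the top `beam_size` partial translations.
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
        # Hypotheses that just produced <eos> are complete.
        finished += [c for c in beam if c[1][-1] == eos]
        beam = [c for c in beam if c[1][-1] != eos]
        if not beam:
            break
    best = max(finished or beam, key=lambda c: c[0])
    return best[1]
```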

Slide4

Existing NMT Work

System | Encoder | Decoder
(Kalchbrenner & Blunsom, 2013) | Convolutional net | RNN
(Sutskever et al., 2014) | Long short-term memory (LSTM) | LSTM
(Cho et al., 2014), (Bahdanau et al., 2015) | Gated recurrent unit (GRU) | GRU

All decoders use recurrent networks. All* NMT work uses a fixed, modest-size vocabulary, with <unk> representing all OOV words. Translations with <unk> are troublesome!

*Except the very recent work of Jean et al. (2015), which scales to a large vocabulary.

Slide5

The Rare Word Problem

NMT systems translate sentences with rare words poorly.

Original: The ecotax portico in Pont-de-Buis → Le portique écotaxe de Pont-de-Buis
Actual input: The <unk> portico in <unk> → Le <unk> <unk> de <unk>
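A minimal sketch of how the "actual input" above arises: any token outside the fixed vocabulary is mapped to <unk> before the NMT system ever sees it. The toy vocabulary below is only for illustration.

```python
def apply_vocab(tokens, vocab, unk="<unk>"):
    # Replace every out-of-vocabulary token with the <unk> symbol.
    return [tok if tok in vocab else unk for tok in tokens]

src_vocab = {"The", "portico", "in"}  # toy vocabulary
print(apply_vocab("The ecotax portico in Pont-de-Buis".split(), src_vocab))
# -> ['The', '<unk>', 'portico', 'in', '<unk>']
```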

Slide6

Our Approach

Idea: track where each target <unk> comes from.
- Annotate training data: unsupervised alignments & relative indices.
- Post-process test translations: word/identity translations.
- "Attention" for rare words (Bahdanau et al., 2015).

Original: The ecotax portico in Pont-de-Buis → Le portique écotaxe de Pont-de-Buis
Actual input: The <unk> portico in <unk> → Le unk1 unk-1 de unk1
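A minimal sketch of the annotation step, assuming the alignments come from an unsupervised word aligner and each target OOV token is rewritten as unk<d>, with d the aligned source position minus the target position. The offset convention and the toy alignment are assumptions for illustration, not the paper's released code.

```python
def annotate_target(tgt_tokens, tgt_vocab, alignment):
    # alignment: target index -> aligned source index (from an
    # unsupervised aligner); unaligned OOV tokens stay plain <unk>.
    out = []
    for t, tok in enumerate(tgt_tokens):
        if tok in tgt_vocab:
            out.append(tok)
        elif t in alignment:
            out.append("unk%d" % (alignment[t] - t))  # relative index
        else:
            out.append("<unk>")
    return out

tgt = "Le portique écotaxe de Pont-de-Buis".split()
print(annotate_target(tgt, {"Le", "de"}, {1: 2, 2: 1, 4: 4}))
# -> ['Le', 'unk1', 'unk-1', 'de', 'unk0']
# (the slide shows unk1 for the last token, presumably under a
#  different tokenization of Pont-de-Buis)
```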


Slide9

Our Approach (cont.)

Treat any neural MT system as a black box: annotate the training data & post-process the translations.
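A minimal sketch of the post-processing step: each unk<d> in the output points back to the source word at (its own position + d), which is then either translated with a word dictionary (e.g. one extracted from the alignments) or copied verbatim. The function and the tiny dictionary are illustrative assumptions.

```python
import re

UNK = re.compile(r"^unk(-?\d+)$")

def postprocess(trans_tokens, src_tokens, word_dict):
    out = []
    for t, tok in enumerate(trans_tokens):
        m = UNK.match(tok)
        if not m:
            out.append(tok)
            continue
        s = t + int(m.group(1))  # position of the aligned source word
        if 0 <= s < len(src_tokens):
            # Dictionary translation if available, identity copy otherwise.
            out.append(word_dict.get(src_tokens[s], src_tokens[s]))
        else:
            out.append(tok)  # dangling pointer: leave the token alone
    return out

src = "The ecotax portico in Pont-de-Buis".split()
trans = "Le unk1 unk-1 de unk0".split()
print(" ".join(postprocess(trans, src,
                           {"ecotax": "écotaxe", "portico": "portique"})))
# -> Le portique écotaxe de Pont-de-Buis
```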

Slide10

Experiments

WMT'14 English-French:
- Hyper-parameters tuned on newstest2012+2013.
- BLEU reported on newstest2014.

Setup similar to (Sutskever et al., 2014):
- Stacked LSTMs: 1000 cells, 1000-dimensional embeddings.
- Source sentences reversed.

Slide11

Results

Systems | BLEU
SOTA in WMT'14 (Durrani et al., 2014) | 37.0
Our NMT systems (40K target vocab):
  Single 6-layer LSTM | 30.4
  Single 6-layer LSTM + our technique | 32.7 (+2.3)
  Ensemble of 8 LSTMs | 34.1
  Ensemble of 8 LSTMs + our technique | 36.9 (+2.8)

Better models: better gains with our technique.

Naïve approach (monotonic alignment of <unk>): only a +0.8 BLEU gain.

Slide12

Results (cont.)

Systems | BLEU
SOTA in WMT'14 (Durrani et al., 2014) | 37.0
Our NMT systems (80K target vocab):
  Single 6-layer LSTM | 31.5
  Single 6-layer LSTM + our technique | 33.1 (+1.6)
  Ensemble of 8 LSTMs | 35.6
  Ensemble of 8 LSTMs + our technique | 37.5 (+1.9)

New SOTA: about a +2.0 BLEU gain with our technique.

Slide13

Existing Work

Systems | Vocab | BLEU
Ensemble of 8 LSTMs (this work) | 80K | 37.5
SOTA in WMT'14 (Durrani et al., 2014) | All | 37.0
Standard MT + neural components:
  Neural language model (Schwenk, 2014) | All | 33.3
  Phrase table with neural features (Cho et al., 2014) | All | 34.5
  Ensemble of 5 LSTMs, reranking n-best lists (Sutskever et al., 2014) | All | 36.5
End-to-end NMT systems:
  Ensemble of 5 LSTMs (Sutskever et al., 2014) | 80K | 34.8
  Single RNNsearch (Bahdanau et al., 2015) | 30K | 28.5
  Ensemble of 8 RNNsearch + unknown replacement (Jean et al., 2015) | 500K | 37.2

Still the SOTA performance until now! We obtained 37.7 after the ACL camera-ready version.

Slide15

Effects of Translating Rare Words

Better than the existing SOTA on both frequent and rare words.

Slide16

Effects of Network Depth

Each layer gives on average about a +1 BLEU gain. More accurate models: better gains with our technique (+1.9, +2.0, +2.2).

Slide17

Perplexity vs. BLEU

Training objective: perplexity.
Strong correlation: a 0.5 reduction in perplexity gives about +1.0 BLEU.
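For reference, a minimal sketch of the quantity being reported: per-word perplexity is the exponential of the average negative log-likelihood the model assigns to the data. The toy numbers below are illustrative only.

```python
import math

def perplexity(word_logprobs):
    # exp of the average negative log-likelihood (natural log).
    return math.exp(-sum(word_logprobs) / len(word_logprobs))

print(perplexity([math.log(1 / 6)] * 10))  # 6.0
print(perplexity([math.log(1 / 5)] * 10))  # 5.0: a 1.0 perplexity drop
```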

Slide18

Sample Translations

Predicts long-distance alignments well.

src: An additional 2600 operations including orthopedic and cataract surgery will help clear a backlog.
ref: 2600 opérations supplémentaires, notamment dans le domaine de la chirurgie orthopédique et de la cataracte, aideront à rattraper le retard.
trans: En outre, unk1 opérations supplémentaires, dont la chirurgie unk5 et la unk6, permettront de résorber l'arriéré.
trans+unk: En outre, 2600 opérations supplémentaires, dont la chirurgie orthopédiques et la cataracte, permettront de résorber l'arriéré.

Slide19

Sample Translations

Translates long sentences well.

src: This trader, Richard Usher, left RBS in 2010 and is understand to have be given leave from his current position as European head of forex spot trading at JPMorgan.
ref: Ce trader, Richard Usher, a quitté RBS en 2010 et aurait été mis suspendu de son poste de responsable européen du trading au comptant pour les devises chez JPMorgan.
trans: Ce unk0, Richard unk0, a quitté unk1 en 2010 et a compris qu'il est autorisé à quitter son poste actuel en tant que leader européen du marché des points de vente au unk5.
trans+unk: Ce négociateur, Richard Usher, a quitté RBS en 2010 et a compris qu'il est autorisé à quitter son poste actuel en tant que leader européen du marché des points de vente au JPMorgan.

Slide20

Sample Translations

Incorrect alignment prediction: "was" rendered as "était" instead of aligning to "abandonnait".

src: But concerns have grown after Mr Mazanga was quoted as saying Renamo was abandoning the 1992 peace accord.
ref: Mais l'inquiétude a grandi après que M. Mazanga a déclaré que la Renamo abandonnait l'accord de paix de 1992.
trans: Mais les inquiétudes se sont accrues après que M. unkpos3 a déclaré que la unk3 unk3 l'accord de paix de 1992.
trans+unk: Mais les inquiétudes se sont accrues après que M. Mazanga a déclaré que la Renamo était l'accord de paix de 1992.

Slide21

Conclusion

Simple technique to tackle rare words:
- Applicable to any NMT system (+2.0 BLEU improvement).
- State-of-the-art result on WMT'14 English-French.

Future work: more challenging language pairs.

Thank you!