Uploaded On 2016-10-07

Presentation Transcript

Slide1

Post-MT Term Swapper: Supplementing a Statistical Machine Translation System with a User Dictionary

Masaki Itagaki (Language Excellence)

Takako Aikawa (Machine Translation Incubation at MSR)

Microsoft

Slide2

Motivation

MSR-MT (Quirk et al., 2005)

A statistical machine translation system

Trained on bilingual content from software, user guides, Web content, etc.

Used for localizing software and user content

Issues

SMT may not use “product-specific translations”.

“Contact list”

連絡先リスト: Windows, Office, etc.

メンバー リスト ("member list"): Windows Live

Slide3

Conditions

Do not apply dictionary data BEFORE the MT system's input sentence analysis

This could disrupt “treelet mapping” (e.g. “access information”).

Try to find a “black box” solution

Do not touch the MT engine itself: customizing mapping information “by product” is not realistic.

The solution should work for ANY MT system

Correct translations in the MT output.

Slide4

Overview

Step 1: Get a raw MT output

[Source] Your contact list is empty.

[Target] 連絡先リストが空です。

Step 2: Identify noun terms

[Source] Your contact list is empty.

[Target] 連絡先リストが空です。

Step 3: Find a match in the user dictionary

[Dict] contact list = メンバーリスト

Step 4: Swap the translation

[Target] メンバー リストが空です。

Slide5
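The four overview steps on the preceding slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presenters' implementation: `swap_terms`, `user_dict`, and `candidates` are hypothetical names, noun-term identification (Step 2) is reduced to a plain substring check on the source sentence, and the list of possible MT renderings per term is assumed to be available already.

```python
def swap_terms(source, mt_output, user_dict, candidates):
    """Replace MT renderings of dictionary terms with the user's translations.

    user_dict:  source term -> preferred target translation
    candidates: source term -> renderings the MT system may produce
                (hypothetical input; collecting them is outside this sketch)
    """
    result = mt_output
    for term, preferred in user_dict.items():
        # Steps 2 and 3 (simplified): is a dictionary term in the source?
        if term.lower() not in source.lower():
            continue
        # Try longer candidates first so a short candidate does not
        # overwrite part of a longer one.
        for cand in sorted(candidates.get(term, []), key=len, reverse=True):
            if cand in result:
                # Step 4: swap the first matching rendering.
                result = result.replace(cand, preferred, 1)
                break
    return result


user_dict = {"contact list": "メンバー リスト"}
candidates = {"contact list": ["連絡先リスト", "連絡先の一覧"]}

# Step 1 (the raw MT output) is taken from the slide rather than computed.
print(swap_terms("Your contact list is empty.",
                 "連絡先リストが空です。",
                 user_dict, candidates))
# prints: メンバー リストが空です。
```

Because the swap only touches the MT output string, the MT engine stays a black box, which matches the conditions stated earlier.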

Identifying a translation

How does MT translate “contact list”?

[Source] Contact list

[Target] 連絡先リスト

[Source] This is a contact list.

[Target] これは、連絡先の一覧です。

[Source] My contact list already exists.

[Target] 既に自分の連絡先リストが存在する。

[Source] (contact list)

[Target] (連絡先)

Slide6

Translation templates

Found 15 pattern sentences (or templates) that may generate most of the variations.

Templates | Patterns | Descriptions
SUBJ+V | X exists | A term as the subject of an intransitive verb.
PREP_WITH | with X | A term following a common preposition, “with”.
SUBJ+BE | X is a word. | A term as the subject of a copula.
OBJ_V | Select X. | A term as an object of a transitive verb.
PARENTHESIS | (X) | A term in parentheses.

Slide7
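The five example templates on the preceding slide can be written down as data and filled in for any dictionary term. The sketch below is an assumption about how that might look; the names `TEMPLATES` and `instantiate` are hypothetical, and the remaining ten templates are not listed in the slides, so they are not reproduced here.

```python
# Template name -> pattern sentence, with {X} standing for the term.
# Only the five templates shown on the slide are included.
TEMPLATES = {
    "SUBJ+V": "{X} exists",
    "PREP_WITH": "with {X}",
    "SUBJ+BE": "{X} is a word.",
    "OBJ_V": "Select {X}.",
    "PARENTHESIS": "({X})",
}


def instantiate(term):
    """Return one source sentence per template for the given term."""
    return {name: pattern.format(X=term) for name, pattern in TEMPLATES.items()}


for name, sentence in instantiate("contact list").items():
    print(name, "->", sentence)
```

Each generated sentence would then be sent through the MT system to observe how the term is rendered in that grammatical context.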

Process

How could “contact list” be translated?

[Diagram: “contact list” is passed through MT several times (MT, MT, MT) to produce candidates]

Candidates: 連絡先リスト, 連絡先の一覧, メンバーリスト, etc.

Strip out all “template text translations”: e.g. “This is”, “is a word”, etc.

Slide8
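The stripping step on the Process slide might be sketched as follows. The slides do not say how the template text translations are located, so this sketch assumes they are known in advance as a prefix and a suffix of each MT output; `strip_template_text`, `collect_candidates`, and the prefix/suffix strings are illustrative assumptions, while the MT outputs are the ones quoted on the "Identifying a translation" slide.

```python
def strip_template_text(mt_output, prefix, suffix):
    """Remove known template text from an MT output.

    Returns the remaining candidate, or None when the output does not
    start and end with the expected template text.
    """
    if not (mt_output.startswith(prefix) and mt_output.endswith(suffix)):
        return None
    if len(mt_output) <= len(prefix) + len(suffix):
        return None  # nothing left once the template text is removed
    end = len(mt_output) - len(suffix)
    return mt_output[len(prefix):end]


def collect_candidates(observations):
    """Collect distinct candidates, keeping first-seen order."""
    seen = []
    for mt_output, prefix, suffix in observations:
        cand = strip_template_text(mt_output, prefix, suffix)
        if cand and cand not in seen:
            seen.append(cand)
    return seen


# (MT output, assumed template prefix, assumed template suffix)
observations = [
    ("連絡先リスト", "", ""),
    ("これは、連絡先の一覧です。", "これは、", "です。"),
    ("既に自分の連絡先リストが存在する。", "既に自分の", "が存在する。"),
]
print(collect_candidates(observations))
# prints: ['連絡先リスト', '連絡先の一覧']
```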

Overview again…

Step 1: Get a raw MT output

[Source] Your contact list is empty.

[Target] 連絡先リストが空です。

Step 2: Identify noun terms

[Source] Your contact list is empty.

[Target] 連絡先リストが空です。

Step 3: Find a match in the user dictionary

[Dict] contact list = メンバーリスト

Step 4: Swap the translation

[Target] メンバー リストが空です。

Slide9

Coverage Experiment, using MSR-MT

Design

A dummy user dictionary: 634 nouns

Language pairs: English -> Japanese, Chinese, and Korean systems

Test data: 500 sentences from a game product

(each sentence contains at least one candidate for DUMMY)

[Diagram: dictionary entries such as “Aztec army”, “British Fort Command building”, etc. each map to DUMMY]

Language pair | EJ | EC | EK
Coverage | 90.6% | 92% | 86%

Slide10
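The slides do not define how the coverage figures were computed. One plausible reading, sketched below as an assumption, is the share of test sentences whose post-swap output contains the DUMMY marker; a per-term count would be an equally reasonable alternative. The function name `coverage` and the sample outputs are hypothetical.

```python
def coverage(outputs, marker="DUMMY"):
    """Percentage of post-swap outputs that contain the marker."""
    if not outputs:
        raise ValueError("no outputs to score")
    hits = sum(1 for text in outputs if marker in text)
    return 100.0 * hits / len(outputs)


# Made-up outputs for illustration: three of four contain the marker.
sample = [
    "DUMMY が空です。",
    "連絡先リストが空です。",
    "DUMMY を選択します。",
    "これは、DUMMY です。",
]
print(coverage(sample))
# prints: 75.0
```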

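Error Analysis

Why not 100%?

[Input] Choose what shader model to use.

[Dictionary] shader model: [DUMMY] (シェーダのモデル)

[MT Output] どのシェーダを使用してモデルを選択します。

(In this output, シェーダ and モデル are separated, so no contiguous rendering of “shader model” is available to swap.)

Slide11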

Impact of Term Swapper on MT Quality

Experiment Design

Metrics: BLEU, edit distance

Three MT systems

A “real” user dictionary with real entries (634 noun entries)

Language pair: English -> Japanese

500 sentences from a game domain (the same as those used for the coverage experiment)

Slide12
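Edit distance is named as a metric in the experiment design, but the slides do not state which variant was used. As a hedged illustration only, the sketch below computes a character-level Levenshtein distance and normalizes it by the longer string's length so that scores fall between 0 and 1; both choices are assumptions, as are the function names.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(hypothesis, reference):
    """Levenshtein distance divided by the longer length (0.0 = identical)."""
    longest = max(len(hypothesis), len(reference))
    return levenshtein(hypothesis, reference) / longest if longest else 0.0


print(levenshtein("kitten", "sitting"))
# prints: 3
print(normalized_edit_distance("連絡先リストが空です。", "メンバー リストが空です。"))
```

Lower values mean the MT output is closer to the reference, so a drop after term swapping indicates an improvement.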

Results

System | Metric | Without Term Swapper | With Term Swapper
MSR-MT | Bleu | 12.43 | 22.51
MSR-MT | Edit-distance | 0.63 | 0.52
System A | Bleu | 6.39 | 13.80
System A | Edit-distance | 0.66 | 0.6
System B | Bleu | 5.93 | 18.26
System B | Edit-distance | 0.66 | 0.56

Slide13
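The size of the change for each system on the preceding Results slide follows from simple subtraction. The snippet below is just that arithmetic over the reported figures; the `RESULTS` layout and the `deltas` helper are illustrative names, not part of the original work.

```python
# (without Term Swapper, with Term Swapper), as reported on the slide.
RESULTS = {
    "MSR-MT": {"bleu": (12.43, 22.51), "edit_distance": (0.63, 0.52)},
    "System A": {"bleu": (6.39, 13.80), "edit_distance": (0.66, 0.6)},
    "System B": {"bleu": (5.93, 18.26), "edit_distance": (0.66, 0.56)},
}


def deltas(results):
    """Difference (with minus without) per system and metric, rounded to 2 places."""
    return {
        system: {metric: round(after - before, 2)
                 for metric, (before, after) in metrics.items()}
        for system, metrics in results.items()
    }


for system, change in deltas(RESULTS).items():
    print(system, change)
```

In every case Bleu rises and edit distance falls, with the largest Bleu gain (12.33 points) on System B.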

Future Work

An automatic way to leverage our templates?

Term Swapper for languages with rich inflections/agreement?

Term Swapper for other types of lexical items (not just for nouns)?

Slide14

Q&A

Thank you!!

Slide15

© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.

MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.