Slide1
Post-MT Term Swapper: Supplementing a Statistical Machine Translation System with a User Dictionary
Masaki Itagaki (Language Excellence)
Takako Aikawa (Machine Translation Incubation at MSR)
Microsoft
Slide2
Motivation
MSR-MT (Quirk et al., 2005)
A statistical machine translation system
Trained on bilingual content from software, user guides, Web content, etc.
Used for localizing software and user content
Issues
SMT may not use “product-specific translations”.
“Contact list”
連絡先リスト: Windows, Office, etc.
メンバー リスト: Windows Live
Slide3
Conditions
Do not apply dictionary data BEFORE the MT system’s input sentence analysis.
This could diffuse “treelet mapping” (e.g. “access information”).
Try to find a “black box” solution.
Do not touch the MT engine itself: customizing mapping information “by product” is not realistic.
The solution should work for ANY MT system.
Correct translations in the MT output.
Slide4
Overview
Step 1: Get a raw MT output
[Source] Your contact list is empty.
[Target] 連絡先リストが空です。
Step 2: Identify noun terms
[Source] Your contact list is empty.
[Target] 連絡先リストが空です。
Step 3: Find a match in the user dictionary
[Dict] contact list = メンバーリスト
Step 4: Swap the translation
[Target] メンバー リストが空です。
Slide5
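The four steps on the Overview slide can be sketched in Python. This is a minimal illustration, not the authors' implementation: `swap_terms`, `candidates_for`, and the candidate list are hypothetical, Step 1 (the MT call) is replaced by the hard-coded output from the slide, and noun-term identification is reduced to a case-insensitive substring match.

```python
from typing import Callable, Dict, Iterable


def swap_terms(
    source: str,
    raw_target: str,
    user_dict: Dict[str, str],
    candidates_for: Callable[[str], Iterable[str]],
) -> str:
    """Steps 2-4: find dictionary terms in the source and swap their MT renderings."""
    result = raw_target
    for term, preferred in user_dict.items():
        # Steps 2 and 3 (simplified): a substring match stands in for
        # real noun-term identification and dictionary lookup.
        if term.lower() not in source.lower():
            continue
        # Step 4: replace the first known MT rendering found in the output.
        # Longer renderings are tried first so a short one cannot pre-empt them.
        for candidate in sorted(candidates_for(term), key=len, reverse=True):
            if candidate in result:
                result = result.replace(candidate, preferred)
                break
    return result


# Step 1 is stubbed with the raw MT output shown on the slide.
source = "Your contact list is empty."
raw_target = "連絡先リストが空です。"
user_dict = {"contact list": "メンバー リスト"}
# Hypothetical list of renderings the MT system may produce for the term.
known_candidates = {"contact list": ["連絡先リスト", "連絡先の一覧", "連絡先"]}

swapped = swap_terms(
    source, raw_target, user_dict, lambda t: known_candidates.get(t, [])
)
print(swapped)  # メンバー リストが空です。
```

Because the function only reads the source sentence and the MT output, it treats the MT engine as a black box, in line with the Conditions slide.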
Identifying a translation
How does MT translate “contact list”?
[Source] Contact list
[Target] 連絡先リスト
[Source] This is a contact list.
[Target] これは、連絡先の一覧です。
[Source] My contact list already exists.
[Target] 既に自分の連絡先リストが存在する。
[Source] (contact list)
[Target] (連絡先)
Slide6
Translation templates
Found 15 pattern sentences (or templates) that may generate most of the variations.
Templates     Patterns        Descriptions
SUBJ+V        X exists        A term as the subject of an intransitive verb.
PREP_WITH     with X          A term following a common preposition, “with”.
SUBJ+BE       X is a word.    A term as the subject of a copula.
OBJ_V         Select X.       A term as an object of a transitive verb.
PARENTHESIS   (X)             A term in parentheses.
Slide7
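The templates on the Translation templates slide can be treated as strings with a slot `X` that is filled with a dictionary term. The sketch below covers only the five templates shown on the slide (the remaining ten are not listed in the source); `TEMPLATES` and `instantiate` are hypothetical names.

```python
# The five templates shown on the slide; "X" marks the slot for the term.
TEMPLATES = {
    "SUBJ+V": "X exists",
    "PREP_WITH": "with X",
    "SUBJ+BE": "X is a word.",
    "OBJ_V": "Select X.",
    "PARENTHESIS": "(X)",
}


def instantiate(term, templates=TEMPLATES, slot="X"):
    """Fill the slot of every template with the term (first occurrence only)."""
    return {
        name: pattern.replace(slot, term, 1)
        for name, pattern in templates.items()
    }


sentences = instantiate("contact list")
for name, sentence in sentences.items():
    print(f"{name}: {sentence}")
```

Each resulting sentence would then be sent to the MT system to see how the term is rendered in that syntactic context.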
Process
How could “contact list” be translated?
“contact list” (inserted into each template) → MT → Candidates: 連絡先リスト、連絡先の一覧、メンバーリスト, etc.
Strip out all “template text translations”, e.g. “This is”, “is a word”, etc.
Slide8
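The Process slide (translate the filled templates, then strip the template text) can be sketched as follows. This is an illustration under assumptions: the MT system is replaced by canned outputs taken from the Identifying a translation slide, and the Japanese prefixes and suffixes in `TEMPLATE_FRAMES` are guesses at the "template text translations", not the authors' actual data. All names are hypothetical.

```python
from typing import Callable, List, Tuple

# (source pattern, target-side prefix to strip, target-side suffix to strip).
# The prefixes and suffixes are illustrative guesses.
TEMPLATE_FRAMES: List[Tuple[str, str, str]] = [
    ("X", "", ""),
    ("This is a X.", "これは、", "です。"),
    ("My X already exists.", "既に自分の", "が存在する。"),
    ("(X)", "(", ")"),
]


def collect_candidates(
    term: str,
    translate: Callable[[str], str],
    frames: List[Tuple[str, str, str]] = TEMPLATE_FRAMES,
) -> List[str]:
    """Translate each filled template and keep what remains after stripping the frame."""
    candidates: List[str] = []
    for pattern, prefix, suffix in frames:
        output = translate(pattern.replace("X", term, 1))
        # Skip outputs that do not follow the expected frame.
        if not (output.startswith(prefix) and output.endswith(suffix)):
            continue
        core = output[len(prefix): len(output) - len(suffix)]
        if core and core not in candidates:
            candidates.append(core)
    return candidates


# Canned MT outputs standing in for a real MT call.
CANNED_MT = {
    "contact list": "連絡先リスト",
    "This is a contact list.": "これは、連絡先の一覧です。",
    "My contact list already exists.": "既に自分の連絡先リストが存在する。",
    "(contact list)": "(連絡先)",
}

candidates = collect_candidates("contact list", CANNED_MT.__getitem__)
print(candidates)  # ['連絡先リスト', '連絡先の一覧', '連絡先']
```

The candidate list is what the swap step later searches for in the raw MT output.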
Overview (again)
Step 1: Get a raw MT output
[Source] Your contact list is empty.
[Target] 連絡先リストが空です。
Step 2: Identify noun terms
[Source] Your contact list is empty.
[Target] 連絡先リストが空です。
Step 3: Find a match in the user dictionary
[Dict] contact list = メンバーリスト
Step 4: Swap the translation
[Target] メンバー リストが空です。
Slide9
Coverage Experiment, using MSR-MT
Design
A Dummy User Dictionary: 634 nouns
Language Pairs: English->Japanese, Chinese, and Korean systems
Test data: 500 sentences from a game product
(each sentence contains at least one candidate for DUMMY)
[Dict] Aztec army = DUMMY
[Dict] British Fort Command building = DUMMY
...
etc.
EJ      EC     EK
90.6%   92%    86%
Slide10
Error Analysis
Why not 100%?
[Input] Choose what shader model to use.
[Dictionary] shader model: [DUMMY] (シェーダのモデル)
[MT Output] どのシェーダを使用してモデルを選択します。
Slide11
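One reading of the Error Analysis example is that the MT system split "shader model" into two separate pieces (シェーダ and モデル), so no expected rendering of the whole term appears contiguously in the output and there is nothing to swap. The check below illustrates this; `find_swappable` is a hypothetical helper, and only the first candidate comes from the slide (the other two are made up for illustration).

```python
def find_swappable(mt_output, candidates):
    """Return the first candidate that occurs contiguously in the MT output, else None."""
    for candidate in candidates:
        if candidate in mt_output:
            return candidate
    return None


mt_output = "どのシェーダを使用してモデルを選択します。"
# The first entry is from the slide; the others are hypothetical variants.
candidates = ["シェーダのモデル", "シェーダ モデル", "シェーダモデル"]

match = find_swappable(mt_output, candidates)
print(match)  # None: the term was split, so the swap cannot fire
```

A sentence in which the term is translated as one unit, such as "シェーダのモデルを選択します。", would return the first candidate and be swappable.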
Impact of Term Swapper on MT Quality
Experiment Design
BLEU, Edit Distance
Three MT systems
A “real” user dictionary with real entries (634 noun entries)
Language Pair: English -> Japanese
500 sentences from a game domain (the same as those used for the coverage experiment)
Slide12
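The Experiment Design slide lists edit distance as a metric, and the values on the Results slide lie between 0 and 1, which suggests a normalized score. The slides do not define it, so the sketch below assumes one common choice: character-level Levenshtein distance divided by the length of the longer string. Function names are hypothetical.

```python
def levenshtein(a, b):
    """Character-level Levenshtein distance using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(hypothesis, reference):
    """Assumed normalization: distance divided by the longer string's length."""
    if not hypothesis and not reference:
        return 0.0
    return levenshtein(hypothesis, reference) / max(len(hypothesis), len(reference))


raw = "連絡先リストが空です。"            # MT output before swapping
reference = "メンバー リストが空です。"   # product-specific translation

print(levenshtein(raw, reference))                    # 5
print(normalized_edit_distance(reference, reference))  # 0.0
```

Under this definition a lower score means the output is closer to the reference, so swapping in the dictionary term moves this example from 5 edits to 0.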
Results
MSR-MT           Without Term Swapper   With Term Swapper
BLEU             12.43                  22.51
Edit distance    0.63                   0.52

System A         Without Term Swapper   With Term Swapper
BLEU             6.39                   13.80
Edit distance    0.66                   0.6

System B         Without Term Swapper   With Term Swapper
BLEU             5.93                   18.26
Edit distance    0.66                   0.56
Slide13
Future Work
An automatic way to leverage our templates?
Term Swapper for languages with rich inflections/agreement?
Term Swapper for other types of lexical items (not just for nouns)?
Slide14
Q&A
Thank you!!
Slide15
© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.