Slide1
Machine Translation
Introduction to MT
Slide2
Machine Translation
Fully automatic
Helping human translators
Enter Source Text: 这 不过 是 一 个 时间 的 问题 .
Translation from Stanford’s Phrasal: This is only a matter of time.
Slide3
Google Translate
Fried ripe plantains: http://laylita.com/recetas/2008/02/28/platanos-maduros-fritos/
Slide4
Machine Translation
The Story of the Stone (“The Dream of the Red Chamber”)
Cao Xueqin 1792
Chinese gloss: Dai-yu alone at bed on think-of-with-gratitude Bao-chai… again listen to window outside bamboo tip plantain leaf of on, rain sound sigh drop, clear cold penetrate curtain, not feeling again fall down tears come.
Hawkes translation: As she lay there alone, Dai-yu’s thoughts turned to Bao-chai… Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.
Slide5
Difficulties in Chinese to English translation
Long Chinese sentences: 4 English sentences to 1 Chinese
Chinese: no pronouns or articles (English the, a)
Chinese has locative post-positions, English prepositions
  Chinese bed on, window outside; English on the bed, outside the window
Chinese rarely marks tense:
  English as, turned to, had begun
  Chinese tou, ‘penetrate’ -> English penetrated
Chinese relative clauses are before the noun, English after
  Chinese: [window outside bamboo on] rain
  English: rain [on the bamboo outside the window]
Stylistic and cultural differences
  Chinese bamboo tip plantain leaf -> bamboos and plantains
  Chinese rain sound sigh drop -> insistent rustle of the rain
  Chinese ma ‘curtain’ -> curtains of her bed
Slide6
Alignment in Machine Translation
Slide7
Early MT History
1946 Booth and Weaver discuss MT in New York
1947-48 idea of dictionary-based direct translation
1947 Warren Weaver suggests translation by computer
1949 Weaver memorandum
1952 all 18 MT researchers in world meet at MIT
1954 IBM/Georgetown Demo Russian-English MT
1955-65 lots of labs take up MT
http://www.hutchinsweb.me.uk/PPF-TOC.htm
Slide8
1949 Weaver memorandum
http://www.mt-archive.info/Weaver-1949.pdf
“There are certain invariant properties which are… common to all languages”
“When I look at an article in Russian, I say ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’”
“[If] one can see… N words on either side, then, if N is large enough, one can unambiguously decide the meaning of the central word.”
Slide9
The History of MT: Pessimism
1959/1960 Yehoshua Bar-Hillel “Report on the state of MT in US and GB”
FAHQ MT too hard because we would have to encode all of human knowledge
Instead we should work on computer tools for human translators
Slide10
The claim that fully automatic high quality MT is impossible
Yehoshua Bar-Hillel. 1960. A Demonstration of the Nonfeasibility of Fully Automatic High Quality Translation.
Little John was looking for his toy box. Finally he found it. The box was in the pen. John was very happy.
Pen1: Enclosure for small children
Pen2: Writing utensil
Sense intended here: Pen1, enclosure for small children
Slide11
The box was in the pen.
Slide12
The claim that fully automatic high quality MT is impossible
Yehoshua Bar-Hillel, 1960
“I now claim that no existing or imaginable program will enable an electronic computer to determine…”
Slide13
The state of the art in MT
Slide14
The state of the art in MT
Slide15
History of MT: Further Pessimism
The ALPAC report
Headed by John R. Pierce of Bell Labs
Conclusions:
  MT doesn’t work
  MT a failure: all current MT work had to be post-edited
  Intelligibility and informativeness worse than human
  We don’t need MT anyhow
  Already too many human translators from Russian
Results:
  MT research suffered
  Funding loss
  Number of research labs declined
  Association for Machine Translation and Computational Linguistics dropped MT from its name
Slide16
MT in the modern age
1975-1985 Resurgence of MT in Europe and Japan
  Domain-specific rule-based systems
1990-present Rise of Statistical Machine Translation
Slide17
Machine Translation
Introduction to MT
Slide18
Machine Translation
Language Divergences
Slide19
Language Similarities and Divergences
Typology: the study of systematic cross-linguistic similarities and differences
What are the dimensions along which human languages vary?
Slide20
Syntactic Variation: Basic Word Orders
SVO (Subject-Verb-Object) languages
  English, German, French, Mandarin
  I baked a pizza
SOV languages
  Japanese, Hindi
  English: He adores listening to music
  Japanese: kare ha ongaku wo kiku no ga daisuki desu
            he music to listening adores
VSO languages
  Irish, Classical Arabic, Tagalog
In many languages one word order is more basic
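As a rough illustration of what a basic word order is, the sketch below linearizes one subject/verb/object triple in each of the three orders named on this slide. The function name `linearize` is hypothetical, and English words stand in for the words of an SOV or VSO language; real languages also differ in morphology and function words, which this toy ignores.

```python
# Toy illustration: one (subject, verb, object) triple in three basic orders.
# English words stand in for target-language words; this is not translation.

def linearize(subject: str, verb: str, obj: str, order: str) -> str:
    """Arrange the three constituents according to a word-order label."""
    slots = {"S": subject, "V": verb, "O": obj}
    if sorted(order) != ["O", "S", "V"]:
        raise ValueError(f"not a basic word order: {order!r}")
    return " ".join(slots[symbol] for symbol in order)


clause = ("I", "baked", "a pizza")
for order in ("SVO", "SOV", "VSO"):
    print(order, "->", linearize(*clause, order))
```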
Slide21
Morphology
Morpheme: “Minimal meaningful unit of language”
Word = Morpheme + Morpheme + Morpheme + …
Stems (base form, root): hope+ing -> hoping, hop -> hopping
Affixes
  Prefixes: Antidisestablishmentarianism
  Suffixes: Antidisestablishmentarianism
  Infixes: hingi (borrow) – humingi (borrower) in Tagalog
  Circumfixes: sagen (say) – gesagt (said) in German
Slide22
Morphemes per Word
[Figure: languages placed on a scale from isolating to synthetic; axis from 1 to 4 morphemes per word]
Vietnamese 1.06
English 1.68
Yakut (Turkic) 2.17
Swahili 2.55
West Greenlandic (Eskimo-Inuit) 3.72
Joseph Greenberg. 1954. A Quantitative Approach to the Morphological Typology of Language. IJAL 26:3.
Slide23
Few morphemes per word: Cantonese
“He said this was the biggest building in the whole country”
Each word in this sentence has one morpheme (and one syllable):
keui wa chyuhn gwok jeui daaih gaan nguk haih li gaan
he say entire country most big bldg house is this bldg
Slide24
Many Morphemes per word: Turkish
uygarlaştıramadıklarımızdanmışsınızcasına
uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına
Behaving as if you are among those whom we could not cause to become civilized
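The contrast between the Cantonese and Turkish examples can be put into numbers with a morphemes-per-word ratio, in the spirit of the index on the earlier slide. This is a minimal sketch that assumes hand-segmented input in which spaces separate words and "+" separates morphemes; the function name `morphemes_per_word` is hypothetical, and a single sentence is far too small a sample to characterize a language.

```python
# Morphemes-per-word ratio from hand-segmented text.
# Convention assumed here: words are separated by spaces and morphemes
# inside a word are joined with "+", as in the Turkish example.

def morphemes_per_word(segmented: str) -> float:
    words = segmented.split()
    if not words:
        raise ValueError("empty text")
    morpheme_count = sum(len(word.split("+")) for word in words)
    return morpheme_count / len(words)


cantonese = "keui wa chyuhn gwok jeui daaih gaan nguk haih li gaan"
turkish = "uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına"

print(morphemes_per_word(cantonese))  # 11 morphemes over 11 words
print(morphemes_per_word(turkish))    # 11 morphemes over 1 word
```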
Slide25
Word Segmentation
Are word boundaries marked in writing?
Some writing systems: boundaries between words not marked
  Chinese, Japanese, Thai
  Word segmentation becomes an important part of text normalization for MT
Some languages tend to have sentences that are quite long, closer to English paragraphs than sentences:
  Modern Standard Arabic, Chinese
  Sentence segmentation may be necessary for MT between these languages and languages like English
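To make the word segmentation step concrete, here is a minimal sketch of greedy maximum matching, a simple dictionary-based baseline; the slides do not prescribe a particular segmenter, so this choice is an assumption. The function name `max_match` and the eight-entry lexicon are hypothetical and cover only the example sentence from Slide 2.

```python
# Greedy maximum-matching word segmentation (a simple baseline).
# The lexicon below is a toy assumption covering only the example sentence.

def max_match(text: str, lexicon: set[str]) -> list[str]:
    """Repeatedly take the longest lexicon entry starting at the current position."""
    longest = max(map(len, lexicon))
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                # Unknown single characters are emitted as one-character words.
                words.append(candidate)
                i += size
                break
    return words


lexicon = {"这", "不过", "是", "一", "个", "时间", "的", "问题"}
print(" ".join(max_match("这不过是一个时间的问题", lexicon)))
```

With this lexicon the output matches the space-separated source text shown on Slide 2. A larger lexicon can change the result (for example, an entry for 一个 would merge two of the words), which is one reason greedy matching is only a baseline.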
Slide26
Inferential Load: cold vs. hot languages
Hot languages:
  Who did what to whom is marked explicitly
  English
Cold languages:
  The hearer has more “figuring out” of who the various actors in the various events are
  Japanese, Chinese
Balthasar Bickel. 2003. Referential density in discourse and syntactic typology. Language 79:2, 708-36
Slide27
Inferential Load: The bracketed noun phrases (blue on the slide) are not in the Chinese original
飓风丽塔已经减弱为第三级飓风,
Rita weakened and was downgraded to a Category 3 storm;
ø 迫近美国德课萨斯州和路易斯安那州,
[Rita/it/the storm] is moving close to Texas and Louisiana;
当局表示,
the authorities announced;
虽然 ø 在登陆前可能再稍微减弱,
although [Rita/it/the storm] might weaken again before landing,
但 ø 仍然会非常危险,
[Rita/it/the storm] is still very dangerous;
ø 预料 ø 会在当地时间星期六凌晨在德州和路易斯安那州之间登陆,
[the authorities] predict [Rita/it/the storm] will arrive at the Texas-Louisiana border on Saturday morning local time;
ø 直接吹袭休斯敦市东面的主要炼油设施。
[Rita/it/the storm] will directly hit the oil-refining industry east of Houston.
Slide28
Lexical Divergences
Word to phrases:
  English computer science
  French informatique
Part of Speech divergences
  English She likes to sing
  German Sie singt gerne [She sings likefully]
  English I’m hungry
  Spanish Tengo hambre [I have hunger]
Slide29
Lexical Specificity Divergences
Grammatical specificity
  Spanish: plural pronouns have gender (ellos/ellas)
  English: plural pronouns no gender (they)
So translating “they” from English to Spanish, need to figure out gender of the referent!
Slide30
Lexical Divergences: Semantic Specificity
English brother
Mandarin gege (older brother), didi (younger brother)
English wall
German Wand (inside) Mauer (outside)
English fish
Spanish pez (the creature) pescado (fish as food)
Cantonese ngau
English cow beef
Slide31
Predicate Argument divergences
English: The bottle floated out.
Spanish: La botella salió flotando. (The bottle exited floating)
Satellite-framed languages: direction of motion is marked on the satellite
  Crawl out, float off, jump down, walk over to, run after
  Most of Indo-European, Hungarian, Finnish, Chinese
Verb-framed languages: direction of motion is marked on the verb
  Spanish, French, Arabic, Hebrew, Japanese, Tamil, Polynesian, Mayan, Bantu families
L. Talmy. 1985. Lexicalization patterns: Semantic Structure in Lexical Form.
Slide32
Predicate Argument divergences: Heads and Argument swapping
Heads:
  English: X swim across Y
  Spanish: X cruzar Y nadando
  English: I like to eat
  German: Ich esse gern
  English: I’d prefer vanilla
  German: Mir wäre Vanille lieber
Arguments:
  Spanish: Y me gusta
  English: I like Y
  German: Der Termin fällt mir ein
  English: I forget the date
Dorr, Bonnie J., "Machine Translation Divergences: A Formal Description and Proposed Solution," Computational Linguistics, 20:4, 597--633
Slide33
Predicate-Argument Divergence Counts
Found divergences in 32% of sentences in UN Spanish/English Corpus

Divergence type      Spanish                English            %
Part of Speech       X tener hambre         Y have hunger      98%
Phrase/Light verb    X dar puñaladas a Z    X stab Z           83%
Structural           X entrar en Y          X enter Y          35%
Heads swap           X cruzar Y nadando     X swim across Y    8%
Arguments swap       X gustar a Y           Y likes X          6%

B. Dorr et al. 2002. DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-Level Alignment
Slide34
Machine Translation
Language Divergences
Slide35
Machine Translation
Three classical methods for MT
Slide36
3 Classical methods for MT
Direct
Transfer
Interlingua
Slide37
Three MT Approaches: Direct, Transfer, Interlingual
Slide38
Direct Translation
Proceed word-by-word through text
Translating each word
No intermediate structures except morphology
Knowledge is in the form of
  Huge bilingual dictionary
  word-to-word translation information
After word translation, can do simple reordering
  Adjective ordering English -> French/Spanish
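The direct approach described above can be sketched in a few lines: look each word up in a bilingual dictionary, then apply one local reordering rule. This is only an illustration under stated assumptions: the three-entry English-to-Spanish dictionary, its part-of-speech tags, and the function name `direct_translate` are all hypothetical, and the dictionary hard-codes the feminine article, so no agreement is actually computed.

```python
# Direct MT sketch: word-for-word dictionary lookup, then one local reordering rule.
# The three-entry English-to-Spanish dictionary and its tags are toy assumptions.

BILINGUAL_DICT = {
    "the": ("la", "DET"),
    "green": ("verde", "ADJ"),
    "witch": ("bruja", "NOUN"),
}


def direct_translate(sentence: str) -> str:
    # Step 1: translate each word independently; unknown words pass through.
    translated = [BILINGUAL_DICT.get(w, (w, "UNK")) for w in sentence.lower().split()]

    # Step 2: simple reordering, swapping each adjective with the noun right after it.
    i = 0
    while i < len(translated) - 1:
        if translated[i][1] == "ADJ" and translated[i + 1][1] == "NOUN":
            translated[i], translated[i + 1] = translated[i + 1], translated[i]
            i += 2
        else:
            i += 1
    return " ".join(word for word, _tag in translated)


print(direct_translate("the green witch"))
```

Because there is no intermediate structure, the reordering rule only sees adjacent words, which is the kind of limitation the later slides on problems with direct MT point at.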
Slide39
Direct MT Dictionary entry
Slide40
Direct MT
Slide41
Problems with direct MT
German
Chinese
Slide42
The Transfer Model
Idea: apply contrastive knowledge, i.e., knowledge about the difference between two languages
Steps:
  Analysis: Syntactically parse source language
  Transfer: Rules to turn this parse into parse for target language
  Generation: Generate target sentence from parse tree
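The three steps can be sketched on a tiny hand-built tree. Everything specific here is an assumption for illustration: the nested-tuple tree format, the stand-in `analyze` function (which returns a fixed parse instead of running a parser), the single adjective-noun rule, and the three-word English-to-Spanish lexicon.

```python
# Transfer sketch: analysis -> transfer -> generation on a tiny hand-built tree.
# Trees are nested tuples: (label, child, child, ...) with (tag, word) leaves.
# The parse, the single rule, and the lexicon are toy assumptions.

LEXICON = {"the": "la", "green": "verde", "witch": "bruja"}


def analyze(_sentence: str):
    """Stand-in for a real parser: returns a fixed parse of 'the green witch'."""
    return ("NP", ("DET", "the"), ("NOMINAL", ("ADJ", "green"), ("NOUN", "witch")))


def transfer(tree):
    """Recursively rewrite the tree: NOMINAL -> ADJ NOUN becomes NOMINAL -> NOUN ADJ."""
    label, *children = tree
    if isinstance(children[0], str):          # leaf: (tag, word)
        return tree
    children = [transfer(child) for child in children]
    if label == "NOMINAL" and [c[0] for c in children] == ["ADJ", "NOUN"]:
        children.reverse()
    return (label, *children)


def generate(tree) -> str:
    """Read off the leaves of the target tree, translating each word."""
    _label, *children = tree
    if isinstance(children[0], str):
        return LEXICON.get(children[0], children[0])
    return " ".join(generate(child) for child in children)


source_tree = analyze("the green witch")
print(generate(transfer(source_tree)))
```

Unlike a word-by-word approach, the rule here is stated over a syntactic constituent, so it would still apply if the adjective and noun were themselves larger subtrees.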
Slide43
English to French
English: Adjective Noun
French: Noun Adjective
This is not always true
  Route mauvaise ‘bad road, badly-paved road’
  Mauvaise route ‘wrong road’
But is a reasonable first approximation
Rule: Adjective Noun -> Noun Adjective
Slide44
Transfer rules
Slide45
Transferring the green witch…
Slide46
Interlingua
Instead of N² sets of transfer rules
Use meaning as a representation language
  Parse source sentence into meaning representation
  Generate target sentence from meaning.
Intuition: Use other NLP applications to do MT work
  English book to Spanish: libro or reservar
  Disambiguate book into concepts BOOKVOLUME and RESERVE
Need 2N systems (a parser and generator for each language)
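The N² versus 2N comparison is easy to check numerically. In the sketch below the function names are hypothetical, and the transfer count assumes one rule set per ordered pair of distinct languages, which gives N(N-1); that grows on the order of N², the figure the slide uses.

```python
# Component counts for translating among n languages (all directed pairs).

def transfer_rule_sets(n: int) -> int:
    """One rule set per ordered pair of distinct languages: n * (n - 1), i.e. O(n^2)."""
    return n * (n - 1)


def interlingua_components(n: int) -> int:
    """One analyzer into the interlingua plus one generator out of it per language."""
    return 2 * n


for n in (2, 5, 10, 20):
    print(n, transfer_rule_sets(n), interlingua_components(n))
```

For 10 languages this gives 90 transfer rule sets against 20 interlingua components, and the gap widens quickly as languages are added.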
Slide47
Interlingua for Mary did not slap the green witch
Slide48
Machine Translation
Three classical methods for MT