Machine Translation: Introduction to MT

Machine Translation Introduction to MT - PowerPoint Presentation

ubiquad
Uploaded On 2020-08-29




Presentation Transcript

Slide1

Machine Translation

Introduction to MT

Slide2

Machine Translation

Fully automatic

Helping human translators

Enter Source Text:

这 不过 是 一 个 时间 的 问题 .

Translation from Stanford’s Phrasal:

This is only a matter of time.

Slide3

Google Translate

Fried ripe plantains:

http://laylita.com/recetas/2008/02/28/platanos-maduros-fritos/

Slide4

Machine Translation

The Story of the Stone (“The Dream of the Red Chamber”)

Cao Xueqin 1792

Chinese gloss: Dai-yu alone at bed on think-of-with-gratitude Bao-chai… again listen to window outside bamboo tip plantain leaf of on, rain sound sigh drop, clear cold penetrate curtain, not feeling again fall down tears come.

Hawkes translation: As she lay there alone, Dai-yu’s thoughts turned to Bao-chai… Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.

Slide5

Difficulties in Chinese to English translation

Long Chinese sentences: 4 English sentences to 1 Chinese
Chinese: no pronouns or articles (English the, a)
Chinese has locative post-positions, English prepositions
    Chinese bed on, window outside; English on the bed, outside the window
Chinese rarely marks tense
    English as, turned to, had begun
    Chinese tou, ‘penetrate’ -> English penetrated
Chinese relative clauses are before the noun, English after
    Chinese: [window outside bamboo on] rain
    English: rain [on the bamboo outside the window]
Stylistic and cultural differences
    Chinese bamboo tip plantain leaf -> bamboos and plantains
    Chinese rain sound sigh drop -> insistent rustle of the rain
    Chinese ma ‘curtain’ -> curtains of her bed

Slide6

Alignment in Machine Translation

Slide7

Early MT History

1946 Booth and Weaver discuss MT in New York
1947-48 idea of dictionary-based direct translation
1947 Warren Weaver suggests translation by computer
1949 Weaver memorandum
1952 all 18 MT researchers in world meet at MIT
1954 IBM/Georgetown Demo Russian-English MT
1955-65 lots of labs take up MT

http://www.hutchinsweb.me.uk/PPF-TOC.htm

Slide8

1949 Weaver memorandum

http://www.mt-archive.info/Weaver-1949.pdf

“There are certain invariant properties which are… common to all languages”

“When I look at an article in Russian, I say ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’”

“[If] one can see… N words on either side, then, if N is large enough, one can unambiguously decide the meaning of the central word.”

Slide9

The History of MT: Pessimism

1959/1960 Yehoshua Bar-Hillel, “Report on the state of MT in US and GB”

FAHQ (fully automatic high quality) MT is too hard because we would have to encode all of human knowledge

Instead we should work on computer tools for human translators

Slide10

The claim that fully automatic high quality MT is impossible

Yehoshua Bar-Hillel. 1960. A Demonstration of the Nonfeasibility of Fully Automatic High Quality Translation.

Little John was looking for his toy box. Finally he found it. The box was in the pen. John was very happy.

Pen1: Enclosure for small children
Pen2: Writing utensil

Slide11

The box was in the pen.

Slide12

The claim that fully automatic high quality MT is impossible

Yehoshua Bar-Hillel, 1960

“I now claim that no existing or imaginable program will enable an electronic computer to determine…”

Slide13

The state of the art in MT

Slide14

The state of the art in MT

Slide15

History of MT: Further Pessimism

The ALPAC report
Headed by John R. Pierce of Bell Labs
Conclusions:
    MT doesn’t work
        MT a failure: all current MT work had to be post-edited
        Intelligibility and informativeness worse than human
    We don’t need MT anyhow
        Already too many human translators from Russian
Results:
    MT research suffered
        Funding loss
        Number of research labs declined
        Association for Machine Translation and Computational Linguistics dropped MT from its name

Slide16

MT in the modern age

1975-1985 Resurgence of MT in Europe and Japan
    Domain-specific rule-based systems
1990-present
    Rise of Statistical Machine Translation

Slide17

Machine Translation

Introduction to MT

Slide18

Machine Translation

Language Divergences

Slide19

Language Similarities and Divergences

Typology: the study of systematic cross-linguistic similarities and differences

What are the dimensions along which human languages vary?

Slide20

Syntactic Variation: Basic Word Orders

SVO (Subject-Verb-Object) languages
    English, German, French, Mandarin
    I baked a pizza
SOV languages
    Japanese, Hindi
    English: He adores listening to music
    Japanese: kare ha ongaku wo kiku no ga daisuki desu
              he music to listening adores
VSO languages
    Irish, Classical Arabic, Tagalog
In many languages one word order is more basic

Slide21

Morphology

Morpheme: “Minimal meaningful unit of language”
Word = Morpheme + Morpheme + Morpheme +…
Stems: (base form, root)
    hope+ing -> hoping
    hop -> hopping
Affixes
    Prefixes: Antidisestablishmentarianism (anti-, dis-)
    Suffixes: Antidisestablishmentarianism (-ment, -arian, -ism)
    Infixes: hingi (borrow) – humingi (borrower) in Tagalog
    Circumfixes: sagen (say) – gesagt (said) in German

Slide22

Morphemes per Word

Scale from isolating (about 1 morpheme per word) to synthetic (about 4):

Vietnamese                         1.06
English                            1.68
Yakut (Turkic)                     2.17
Swahili                            2.55
West Greenlandic (Eskimo-Inuit)    3.72

Joseph Greenberg. 1954. A Quantitative Approach to the Morphological Typology of Language. IJAL 26:3.

Slide23

Few morphemes per word: Cantonese

“He said this was the biggest building in the whole country”

Each word in this sentence has one morpheme (and one syllable):

keui wa  chyuhn gwok    jeui daaih gaan nguk  haih li   gaan
he   say entire country most big   bldg house is   this bldg

Slide24

Many Morphemes per word: Turkish

uygarlaştıramadıklarımızdanmışsınızcasına

uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına

Behaving as if you are among those whom we could not cause to become civilized
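The contrast between the Cantonese and Turkish examples can be made concrete by counting morphemes per word. The following is only a minimal sketch: it assumes morpheme boundaries are already marked with `+` (as in the Turkish segmentation above), and the function name `morphemes_per_word` is made up for illustration.

```python
def morphemes_per_word(segmented_words):
    """Average number of morphemes per word.

    Each word is a string whose morphemes are separated by '+'.
    A word without '+' counts as a single morpheme.
    """
    counts = [len(word.split("+")) for word in segmented_words]
    return sum(counts) / len(counts)


# Turkish: one word, segmented as in the example above.
turkish = ["uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına"]

# Cantonese: every word is a single morpheme, so there is no '+'.
cantonese = "keui wa chyuhn gwok jeui daaih gaan nguk haih li gaan".split()

print(morphemes_per_word(turkish))    # 11.0
print(morphemes_per_word(cantonese))  # 1.0
```

Real corpora need a morphological analyzer to produce the segmentation in the first place; this sketch only does the counting.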

Slide25

Word Segmentation

Are word boundaries marked in writing?
Some writing systems: boundaries between words not marked
    Chinese, Japanese, Thai
    Word segmentation becomes an important part of text normalization for MT
Some languages tend to have sentences that are quite long, closer to English paragraphs than sentences:
    Modern Standard Arabic, Chinese
    Sentence segmentation may be necessary for MT between these languages and languages like English
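To give a feel for what word segmentation involves, here is a minimal sketch of greedy maximum matching, a simple baseline technique that the slides do not themselves name. The function `max_match` and the tiny lexicon are assumptions for illustration. The input is the Chinese sentence from the Phrasal example, written without spaces.

```python
def max_match(text, lexicon):
    """Greedy left-to-right segmentation: take the longest lexicon entry
    that starts at the current position, or a single character if none does."""
    longest = max(len(entry) for entry in lexicon)
    words = []
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Toy lexicon (an assumption for illustration, not a real dictionary).
LEXICON = {"这", "不过", "是", "一", "个", "时间", "的", "问题"}

print(max_match("这不过是一个时间的问题", LEXICON))
# ['这', '不过', '是', '一', '个', '时间', '的', '问题']
```

Greedy matching depends entirely on the lexicon and can pick the wrong boundary when a longer entry happens to match, which is part of why segmentation counts as a real preprocessing problem for MT.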

Slide26

Inferential Load: cold vs. hot languages

Hot languages:
    Who did what to whom is marked explicitly
    English
Cold languages:
    The hearer has more “figuring out” of who the various actors in the various events are
    Japanese, Chinese

Balthasar Bickel. 2003. Referential density in discourse and syntactic typology. Language 79:2, 708-36

Slide27

Inferential Load: The bracketed noun phrases (blue on the original slide) are not in the Chinese original

飓风丽塔已经减弱为第三级飓风,
Rita weakened and was downgraded to a Category 3 storm;

ø 迫近美国德克萨斯州和路易斯安那州,
[Rita/it/the storm] is moving close to Texas and Louisiana;

当局表示,
the authorities announced;

虽然 ø 在登陆前可能再稍微减弱,
although [Rita/it/the storm] might weaken again before landing,

但 ø 仍然会非常危险,
[Rita/it/the storm] is still very dangerous;

ø 预料 ø 在当地时间星期六凌晨在德州和路易斯安那州之间登陆
[the authorities] predict [Rita/it/the storm] will arrive at the Texas-Louisiana border on Saturday morning local time;

ø 直接吹袭休斯敦市东面的主要炼油设施。
[Rita/it/the storm] will directly hit the oil-refining industry east of Houston.

Slide28

Lexical Divergences

Word to phrases:
    English computer science
    French informatique
Part of Speech divergences
    English She likes to sing
    German Sie singt gerne [She sings likefully]
    English I’m hungry
    Spanish Tengo hambre [I have hunger]

Slide29

Lexical Specificity Divergences

Grammatical specificity
    Spanish: plural pronouns have gender (ellos/ellas)
    English: plural pronouns no gender (they)
    So translating “they” from English to Spanish, need to figure out gender of the referent!
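As a small illustration of why the referent matters, the sketch below picks between ellos and ellas. It assumes the genders of the referents are already known, for example from some upstream coreference step, and finding them is the hard part. The function name `translate_they` and the `'f'`/`'m'` encoding are hypothetical.

```python
def translate_they(referent_genders):
    """Pick the Spanish subject pronoun for English 'they'.

    referent_genders: genders ('f' or 'm') of the entities 'they' refers to,
    assumed to be supplied by an earlier analysis step.
    """
    if not referent_genders:
        raise ValueError("cannot choose a pronoun without a referent")
    # 'ellas' only when every referent is feminine; otherwise 'ellos'.
    return "ellas" if all(g == "f" for g in referent_genders) else "ellos"


print(translate_they(["f", "f"]))  # ellas
print(translate_they(["f", "m"]))  # ellos
```

The English source sentence carries no such information, so a system must recover it from context before this choice can be made.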

Slide30

Lexical Divergences: Semantic Specificity

English brother
    Mandarin gege (older brother), didi (younger brother)
English wall
    German Wand (inside), Mauer (outside)
English fish
    Spanish pez (the creature), pescado (fish as food)
Cantonese ngau
    English cow, beef

Slide31

Predicate Argument divergences

English: The bottle floated out.
Spanish: La botella salió flotando. (The bottle exited floating)

Satellite-framed languages: direction of motion is marked on the satellite
    Crawl out, float off, jump down, walk over to, run after
    Most of Indo-European, Hungarian, Finnish, Chinese
Verb-framed languages: direction of motion is marked on the verb
    Spanish, French, Arabic, Hebrew, Japanese, Tamil, Polynesian, Mayan, Bantu families

L. Talmy. 1985. Lexicalization patterns: Semantic Structure in Lexical Form.

Slide32

Predicate Argument divergences: Heads and Argument swapping

Heads:
    English: X swim across Y
    Spanish: X cruzar Y nadando
    English: I like to eat
    German: Ich esse gern
    English: I’d prefer vanilla
    German: Mir wäre Vanille lieber
Arguments:
    Spanish: Y me gusta
    English: I like Y
    German: Der Termin fällt mir ein
    English: I forget the date

Dorr, Bonnie J., "Machine Translation Divergences: A Formal Description and Proposed Solution," Computational Linguistics, 20:4, 597--633

Slide33

Predicate-Argument Divergence Counts

Found divergences in 32% of sentences in UN Spanish/English Corpus

Type                 Spanish                English            %
Part of Speech       X tener hambre         Y have hunger      98%
Phrase/Light verb    X dar puñaladas a Z    X stab Z           83%
Structural           X entrar en Y          X enter Y          35%
Heads swap           X cruzar Y nadando     X swim across Y    8%
Arguments swap       X gustar a Y           Y likes X          6%

B. Dorr et al. 2002. DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-Level Alignment

Slide34

Machine Translation

Language Divergences

Slide35

Machine Translation

Three classical methods for MT

Slide36

3 Classical methods for MT

Direct

Transfer

Interlingua

Slide37

Three MT Approaches: Direct, Transfer, Interlingual

Slide38

Direct Translation

Proceed word-by-word through text
Translating each word
No intermediate structures except morphology
Knowledge is in the form of
    Huge bilingual dictionary
    word-to-word translation information
After word translation, can do simple reordering
    Adjective ordering English -> French/Spanish
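The direct approach can be sketched in a few lines: look each word up in a bilingual dictionary, then apply a local adjective-noun swap. This is a toy illustration only. The dictionary entries, the part-of-speech tags, and the function name `direct_translate` are all assumptions, and a real direct system would hold far more information per entry.

```python
# Toy English-to-Spanish dictionary: word -> (translation, part of speech).
# The entries and tags are illustrative assumptions.
BILINGUAL_DICT = {
    "the": ("la", "DET"),
    "green": ("verde", "ADJ"),
    "witch": ("bruja", "NOUN"),
}


def direct_translate(sentence):
    """Translate word by word, then apply one local reordering rule."""
    # Step 1: look up each word; unknown words are passed through unchanged.
    translated = [BILINGUAL_DICT.get(w, (w, "UNK")) for w in sentence.lower().split()]

    # Step 2: simple reordering, swapping each adjective with a following noun.
    i = 0
    while i < len(translated) - 1:
        if translated[i][1] == "ADJ" and translated[i + 1][1] == "NOUN":
            translated[i], translated[i + 1] = translated[i + 1], translated[i]
            i += 2
        else:
            i += 1
    return " ".join(word for word, _ in translated)


print(direct_translate("the green witch"))  # la bruja verde
```

Note that the toy dictionary hard-codes "la" for "the". Choosing the article by the gender of the noun already needs knowledge that a purely word-to-word lookup does not have.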

Slide39

Direct MT Dictionary entry

Slide40

Direct MT

Slide41

Problems with direct MT

German

Chinese

Slide42

The Transfer Model

Idea: apply contrastive knowledge, i.e., knowledge about the difference between two languages
Steps:
    Analysis: Syntactically parse source language
    Transfer: Rules to turn this parse into parse for target language
    Generation: Generate target sentence from parse tree

Slide43

English to French

English: Adjective Noun
French: Noun Adjective
This is not always true
    Route mauvaise ‘bad road, badly-paved road’
    Mauvaise route ‘wrong road’
But is a reasonable first approximation
Rule:
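A minimal sketch of how an Adjective Noun to Noun Adjective transfer rule might operate on a parse tree, followed by generation. The nested-tuple tree encoding, the labels `NP`, `ADJ`, `NOUN`, the toy lexicon, and the function names are assumptions made for this illustration. The analysis step is replaced by a hand-written parse.

```python
# A parse tree is a nested tuple: (label, child, child, ...);
# a leaf is (part_of_speech, word). This encoding is an assumption.

def transfer(tree):
    """Apply the rule NP -> ... ADJ NOUN  =>  NP -> ... NOUN ADJ, recursively."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return tree  # leaf: nothing to reorder
    children = [transfer(child) for child in children]
    if label == "NP":
        i = 0
        while i < len(children) - 1:
            if children[i][0] == "ADJ" and children[i + 1][0] == "NOUN":
                children[i], children[i + 1] = children[i + 1], children[i]
                i += 2
            else:
                i += 1
    return (label, *children)


def generate(tree, lexicon):
    """Read the leaves left to right, translating each word."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [lexicon.get(children[0], children[0])]
    return [word for child in children for word in generate(child, lexicon)]


# Analysis is assumed to have been done already (hand-written parse).
english_parse = ("NP", ("DET", "the"), ("ADJ", "bad"), ("NOUN", "road"))
LEXICON = {"the": "la", "bad": "mauvaise", "road": "route"}  # toy entries

french_tree = transfer(english_parse)
print(" ".join(generate(french_tree, LEXICON)))  # la route mauvaise
```

Because the rule always swaps, it can never produce "mauvaise route" ("wrong road"). That is the sense in which it is only a first approximation.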

Slide44

Transfer rules

Slide45

Transferring the green witch…

Slide46

Interlingua

Instead of N² sets of transfer rules
Use meaning as a representation language
    Parse source sentence into meaning representation
    Generate target sentence from meaning.
Intuition: Use other NLP applications to do MT work
    English book to Spanish: libro or reservar
    Disambiguate book into concepts BOOKVOLUME and RESERVE
Need 2N systems (a parser and generator for each language)
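The scaling argument can be checked with a little arithmetic. The sketch below counts one set of transfer rules per ordered language pair, which gives N(N-1) and therefore grows on the order of the N² quoted above, against one parser plus one generator per language for the interlingua. The function names are made up for illustration.

```python
def transfer_rule_sets(n):
    """One set of transfer rules per ordered (source, target) language pair."""
    return n * (n - 1)


def interlingua_components(n):
    """One parser and one generator per language."""
    return 2 * n


for n in (2, 5, 10, 20):
    print(n, transfer_rule_sets(n), interlingua_components(n))
```

For 10 languages this gives 90 rule sets against 20 components. With only two languages the interlingua actually needs more pieces, so the advantage only appears as the number of languages grows.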

Slide47

Interlingua for Mary did not slap the green witch

Slide48

Machine Translation

Three classical methods for MT