SYNTAX BASED MACHINE TRANSLATION

Uploaded On 2019-11-09

Presentation Transcript

SYNTAX BASED MACHINE TRANSLATION
Under the guidance of Prof. Pushpak Bhattacharyya
Presented by:
  Rouven Röhrig (10V05101)
  Eranki Kiran (10438004)
  Sriharsa Mohapatra (10405004)
  Arjun Atreya (09405011)
9/4/2011

OUTLINE
  Motivation
  Introduction
  Synchronous grammar
  Syntax based Language Model for SMT
  Hierarchical Phrase-Based MT
  Example Hindi translations
  Joshua Toolkit
  Conclusions

Motivation
Consider the following English-Japanese example:
(1) The boy stated that the student said that the teacher danced
(2) shoonen-ga gakusei-ga sensei-ga odotta to itta to hanasita
    The-boy the-student the-teacher danced that said that stated
-> Easy to translate the words.
-> Very hard to find the correct reordering!
Syntax-based machine translation techniques start with the syntax. Some can deliver guaranteed correct syntax!
David Chiang - An Introduction to Synchronous Grammars, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, 21 June 2006.

Introduction (1)
Syntax-based Language Model
Noisy channel model
Uses 3 steps starting from the parse tree:
1. Reordering - create the foreign-language syntax tree
2. Insertion - add extra words that are required in the target language
3. Translation - translate the leaf words
Eugene Charniak, Kevin Knight et al. - Syntax based Language Models for Statistical Machine Translation, Brown Univ. (2002)

Introduction (2)
Basic phrase-based model
Uses phrases instead of words
Instance of the noisy channel model
Modeled in the usual way:
  $\arg\max_e P(e \mid f) = \arg\max_e P(e, f) = \arg\max_e P(e)\, P(f \mid e)$
Then:
1. Segmentation of e into phrases $\bar{e}_1 \ldots \bar{e}_I$
2. Reordering of the $\bar{e}_i$
3. Translation of the $\bar{e}_i$ using $P(\bar{f} \mid \bar{e})$
Problem: phrases are usually reordered independently of their content
-> It is desirable to include a larger scope
David Chiang - A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, June 2005.

Introduction (3)
Hierarchical Phrase-Based Model
Consists of words and phrases.
For example:
  English: "Australia is one of the few countries that have diplomatic relations with North Korea"
  German: "Australien ist eines der wenigen Länder, das diplomatische Beziehungen mit Nord-Korea hat"
One example of a hierarchical phrase is
  <[1] mit [2] hat, have [1] with [2]>
The [i] are placeholders for sub-phrases.
The rule captures the different placement of the verb in German and English.
David Chiang - A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, June 2005.

Synchronous grammar (1)
Producing a syntactically correct source-language string always delivers a syntactically correct target-language string.
Generalizes context-free grammars (CFGs)
Generates pairs of strings, e.g.
  (1) S → ⟨NP[1] VP[2], NP[1] VP[2]⟩
  (2) VP → ⟨V[1] NP[2], NP[2] V[1]⟩
The indices [i] link the non-terminal symbols on the two sides.
Applying rules (1) and (2) produces:
  S => ⟨NP[1] VP[2], NP[1] VP[2]⟩
    => ⟨NP[1] V[3] NP[4], NP[1] NP[4] V[3]⟩
David Chiang - An Introduction to Synchronous Grammars, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, 21 June 2006.

Synchronous grammar (2)
  ⟨NP[1] V[3] NP[4], NP[1] NP[4] V[3]⟩
When applying a rule, both sides have to be rewritten together!
When NP[1] is replaced on the left side, NP[1] on the right side is replaced as well.
  NP → ⟨I, watashi wa⟩
  NP → ⟨the box, hako wo⟩
  V → ⟨open, akemasu⟩
  => ⟨I open the box, watashi wa hako wo akemasu⟩
David Chiang - An Introduction to Synchronous Grammars, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, 21 June 2006.
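The linked rewriting on this slide can be sketched in a few lines of Python. This is only an illustrative toy, not code from the cited tutorial: the `Node` class and the `realise` function are hypothetical names, and each node stores one rule's source side and target side, where an integer stands for a linked non-terminal [i] and a string stands for a word.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# A str is a terminal word; an int is a link index such as [1].
Item = Union[str, int]

@dataclass
class Node:
    """One applied synchronous rule (hypothetical helper type)."""
    source: List[Item]
    target: List[Item]
    children: Dict[int, "Node"] = field(default_factory=dict)

def realise(node: Node) -> Tuple[List[str], List[str]]:
    """Expand a derivation into its (source words, target words) pair."""
    expanded = {i: realise(child) for i, child in node.children.items()}
    src: List[str] = []
    tgt: List[str] = []
    for item in node.source:
        src.extend([item] if isinstance(item, str) else expanded[item][0])
    for item in node.target:
        # The same child is used on both sides, so the sides stay linked.
        tgt.extend([item] if isinstance(item, str) else expanded[item][1])
    return src, tgt

# Lexical rules from the slide.
np_i = Node(["I"], ["watashi", "wa"])
np_box = Node(["the", "box"], ["hako", "wo"])
v_open = Node(["open"], ["akemasu"])
# VP -> <V[1] NP[2], NP[2] V[1]>
vp = Node([1, 2], [2, 1], {1: v_open, 2: np_box})
# S -> <NP[1] VP[2], NP[1] VP[2]>
s = Node([1, 2], [1, 2], {1: np_i, 2: vp})

english, japanese = realise(s)
print(" ".join(english))
print(" ".join(japanese))
```

Because every child is expanded once and reused on both sides, the two strings cannot drift apart; the only difference between them is the order given by each rule's target side.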

Synchronous grammar (3)
Solution for everything?
-> Lowering or raising of tree nodes is not possible!
Example:
  John misses Mary
  Mary manque à John
  (Mary is-missed by John)
  S → <NP[0] VP[1], NP[0] VP[1]>  -> "à John" is part of the VP
  NP → <John, John>
  NP → <Mary, Mary>
Not possible to replace correctly!
David Chiang - An Introduction to Synchronous Grammars, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, 21 June 2006.

Syntax based Language Model for SMT
Noisy channel model: the source-language sentence E is distorted by the channel into the foreign-language sentence F.
  $\arg\max_E p(E \mid F) = \arg\max_E p(E)\, p(F \mid E)$   (1)
  where p(E) is the language model (LM) and p(F|E) is the translation model (TM).
Base SMT system: a parse-tree-to-string translation model (English parse tree [input] --> French sentence [output]):
  $p(E \mid F) \propto \sum_{\pi} p(E, \pi)\, p(F \mid E, \pi)$, where π is a parse tree of the English sentence.
This model performs 3 types of operations: reordering, insertion, translation.
The direction of real translation (decoding) is the reverse of the translation model.
CFG rules are extracted from a parsed corpus of English, using a standard bottom-up parser.
The decoder is given a Chinese sentence and searches for the best English parse tree using p(E) and p(F|E).
Eugene Charniak, Kevin Knight et al. - Syntax based Language Models for Statistical Machine Translation, Brown Univ. (2002)
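To make equation (1) concrete, here is a minimal sketch of noisy-channel decoding over a fixed list of candidate sentences. Everything in it is an assumption made for illustration: the tables `LM` and `TM`, the function `decode`, the French example sentence, and all probability values are invented, and a real decoder searches over parse trees rather than a hand-written candidate list.

```python
import math

# Hypothetical language model p(E); the numbers are invented.
LM = {
    "the house is small": 0.02,
    "small the house is": 0.0001,
}

# Hypothetical translation model p(F | E), keyed by (F, E); invented numbers.
TM = {
    ("la maison est petite", "the house is small"): 0.3,
    ("la maison est petite", "small the house is"): 0.4,
}

def decode(foreign, candidates):
    """Return argmax_E p(E) * p(F | E), computed in log space."""
    def score(english):
        return math.log(LM[english]) + math.log(TM[(foreign, english)])
    return max(candidates, key=score)

best = decode("la maison est petite", list(LM))
print(best)
```

In this toy the translation model slightly prefers the badly ordered candidate, and the language model overrules it, which is the division of labour the noisy channel formulation relies on.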

Syntax based Language Model for SMT
Parsing/Language Model: comprises 2 parsing stages (a, c) based on the Penn Treebank corpus, with a pruning step (b) in between
a. Non-lexical PCFG (creates a large parse forest for the sentence)
b. Pruning step: computes the inside and outside probabilities of the parse forest and eliminates edges that fall below an empirically set threshold of 0.00001:
   $p(e^k_{i,j} \mid w_{1,n}) = \dfrac{\alpha(n^k_{i,j})\; p(\mathrm{rule}(e^k_{i,j})) \prod_{n^n_{l,m} \in \mathrm{rhs}(e^k_{i,j})} \beta(n^n_{l,m})}{p(w_{1,n})}$
c. Lexical PCFG (examines the edges and pulls out the most probable parse tree from the forest)
Issues while parsing: incompatibilities with the translation model, phrasal translations, non-linear word ordering.
Eugene Charniak, Kevin Knight et al. - Syntax based Language Models for Statistical Machine Translation, Brown Univ. (2002)
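The pruning step can be illustrated with a small sketch. It assumes the outside probability α, the rule probability, the inside probabilities β of the right-hand-side constituents, and the sentence probability p(w_1,n) have already been computed; the function names and the example numbers are hypothetical, and only the 0.00001 threshold comes from the slide.

```python
import math

THRESHOLD = 0.00001  # empirically set threshold quoted on the slide

def edge_posterior(outside, rule_prob, child_insides, sentence_prob):
    """alpha * p(rule) * product of child betas, divided by p(w_1,n)."""
    return outside * rule_prob * math.prod(child_insides) / sentence_prob

def keep_edge(outside, rule_prob, child_insides, sentence_prob):
    """True if the edge survives pruning."""
    posterior = edge_posterior(outside, rule_prob, child_insides, sentence_prob)
    return posterior >= THRESHOLD

# Invented numbers: one likely edge and one very unlikely edge.
likely = edge_posterior(0.2, 0.5, [0.1, 0.3], 0.01)
print(likely, keep_edge(0.2, 0.5, [0.1, 0.3], 0.01))
print(keep_edge(1e-6, 0.5, [0.1, 0.3], 0.01))
```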

Syntax based Translation Model for SMT
Input: "He adores listening to music" [English parse tree]
Output: "Kare ha ongaku wo kiku no ga daisuki desu" [Japanese sentence]
[Figure: the English parse tree (node labels VB, PRP, VB1, VB2, TO, NN) shown in four panels: Channel Input; Reordering (R-table, SVO -> SOV); Insertion (N-table, adds ha, no, ga, desu); Translation (T-table).]
Kenji Yamada, Kevin Knight et al. - Syntax based Translation Model - Southern California Univ. (2002)

Syntax based Translation Model for SMT
The model parameters are the probabilities n(v|N), r(p|R), and t(t|T) for insertion, reordering, and translation respectively; they decide the behaviour of the translation model.
Kenji Yamada, Kevin Knight et al. - Syntax based Translation Model - Southern California Univ. (2002)
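The three channel operations can be sketched on the example sentence "He adores listening to music". The sketch below is a deterministic toy under stated assumptions: the tables `REORDER`, `INSERT_RIGHT`, and `TRANSLATE` hold a single hand-picked choice per entry, whereas the actual model assigns probabilities r, n, and t to many alternatives. The tree encoding and all function names are hypothetical.

```python
# A node is (label, word) for a leaf or (label, [children]) otherwise.
TREE = ("VB", [
    ("PRP", "He"),
    ("VB1", "adores"),
    ("VB2", [("VB", "listening"),
             ("TO", [("TO", "to"), ("NN", "music")])]),
])

# Hand-picked stand-ins for the R-, N- and T-tables (assumptions).
REORDER = {("PRP", "VB1", "VB2"): (0, 2, 1), ("VB", "TO"): (1, 0), ("TO", "NN"): (1, 0)}
INSERT_RIGHT = {("VB", "PRP"): "ha", ("VB2", "VB"): "no", ("VB", "VB2"): "ga", ("VB", "VB1"): "desu"}
TRANSLATE = {"He": "kare", "music": "ongaku", "to": "wo", "listening": "kiku", "adores": "daisuki"}

def reorder(node):
    label, body = node
    if isinstance(body, str):
        return node
    key = tuple(child[0] for child in body)
    order = REORDER.get(key, range(len(body)))
    return (label, [reorder(body[i]) for i in order])

def insert_words(node, parent="TOP"):
    """Return the node, followed by an inserted word if the table has one."""
    label, body = node
    if not isinstance(body, str):
        node = (label, [piece for child in body for piece in insert_words(child, label)])
    extra = INSERT_RIGHT.get((parent, label))
    return [node, ("INS", extra)] if extra else [node]

def translate(node):
    label, body = node
    if isinstance(body, str):
        # Inserted words are already target-language words.
        return (label, body if label == "INS" else TRANSLATE[body])
    return (label, [translate(child) for child in body])

def leaves(node):
    label, body = node
    return [body] if isinstance(body, str) else [w for child in body for w in leaves(child)]

reordered = reorder(TREE)
inserted = insert_words(reordered)[0]
translated = translate(inserted)
print(" ".join(leaves(reordered)))
print(" ".join(leaves(inserted)))
print(" ".join(leaves(translated)))
```

Keying the insertion table on the pair (parent label, node label) is a simplification chosen here so that the two nodes labelled TO can be told apart.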

Hierarchical Phrase-Based MT
Uses hierarchical phrases, not words, as translation units
A phrase is a sequence of words
Uses the bi-text to infer the syntax for both the source and the destination language
The syntax is a synchronous grammar:
  Inherent reordering
  Phrase-to-phrase alignment
  Phrase-to-phrase translation
  Handling divergence
The translation has two phases: training and decoding
The bi-text is a word-aligned corpus, a set of triples < f, e, ~ >:
  f is the French sentence (source language)
  e is the corresponding English sentence (target language)
  ~ is the many-to-many mapping between words in the two sentences
David Chiang - A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, June 2005.

Hierarchical Phrase-Based MT
A phrase grammar rule is represented as (an example):
  $X \to \langle X_1\, f_i^j\, X_2,\; X_2\, e_k^l\, X_1 \rangle$
where (i, j) is the source phrase boundary and (k, l) is the target phrase boundary.
The above example shows that the attachment of a subordinate clause is reversed in English.
In the training phase, the minimal set of all such rules is extracted.
A derivation D is a set of triples [R, i, j]; each triple is one step in the derivation.
  R is the rule used
  $f_i^j$ is the phrase in the source language that was rewritten using the grammar
In the decoding phase, given a French sentence f, a derivation D rewrites the sentence into English.
An alternate notation for f and e is f(D) and e(D) respectively.
David Chiang - A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, June 2005.

Hierarchical Phrase-Based MT
The following is a partial left-most derivation of the sentence
  English: "Australia is one of the few countries that have diplomatic relations with North Korea"
David Chiang - A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, June 2005.

Hierarchical Phrase-Based MT
Decoding uses a CKY parser with beam search.
The highest-probability single derivation is chosen:
  $\hat{e} = e\big(\arg\max_{D\,:\,f(D) = f} w(D)\big)$
The arg max is computed over every derivation tree D that yields f; the corresponding English sentence is given by e(D).
In each cell of the CKY chart, the beam search eliminates:
  each item that has a score worse than β times the best score in the same cell
  each item that is worse than the b-th best item in the same cell
b = 40, β = 10^{-1} for X cells; b = 15, β = 10^{-1} for S cells
David Chiang - A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, June 2005.
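The two pruning criteria for a chart cell can be sketched as follows. This is a minimal illustration and not the decoder's actual code: `prune_cell` is a hypothetical name, scores are treated as probability-like values where higher is better, and the example items and numbers are invented.

```python
import heapq

def prune_cell(items, b, beta):
    """Keep items scoring at least beta * best, then at most the b best.

    items: dict mapping an item label to its score (higher is better).
    """
    if not items:
        return {}
    best = max(items.values())
    # Relative threshold: drop items worse than beta times the best score.
    survivors = {k: s for k, s in items.items() if s >= beta * best}
    # Histogram threshold: keep only the b highest-scoring survivors.
    return dict(heapq.nlargest(b, survivors.items(), key=lambda kv: kv[1]))

cell = {"a": 0.5, "b": 0.2, "c": 0.04, "d": 0.06}
print(prune_cell(cell, b=40, beta=0.1))
print(prune_cell(cell, b=2, beta=0.1))
```

With `beta=0.1` the item scoring 0.04 falls below one tenth of the best score and is removed; lowering `b` to 2 then also removes the third-best survivor.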

Hierarchical Phrase-Based MT
w(r) is the weight of the rule r [the first formula]
P_lm is the language model probability for sentence e
|e| denotes the length of sentence e
λ_lm and λ_wp denote the respective exponent factors
exp(−λ_wp |e|) is the word penalty
φ_i and λ_i denote the i-th feature and its exponent
David Chiang - A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, June 2005.
Franz Josef Och and Hermann Ney - The alignment template approach to statistical machine translation, Computational Linguistics 2004
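The symbols listed above fit a log-linear scoring of derivations. The sketch below assumes the form used in Chiang (2005), namely w(r) as a product of features raised to their exponents, and w(D) as the product of the rule weights times P_lm(e)^λ_lm times the word penalty. The function names and all numeric values are invented for illustration.

```python
import math

def rule_weight(features, lambdas):
    """w(r) = product over i of phi_i(r) ** lambda_i."""
    return math.prod(phi ** lam for phi, lam in zip(features, lambdas))

def derivation_weight(rule_features, lambdas, p_lm, lambda_lm, lambda_wp, length):
    """w(D) = prod of w(r) * P_lm(e)**lambda_lm * exp(-lambda_wp * |e|)."""
    rules = math.prod(rule_weight(f, lambdas) for f in rule_features)
    return rules * p_lm ** lambda_lm * math.exp(-lambda_wp * length)

# Two rules with two features each (invented values).
rule_features = [[0.5, 0.25], [0.2, 1.0]]
lambdas = [1.0, 0.5]

no_penalty = derivation_weight(rule_features, lambdas, p_lm=0.01,
                               lambda_lm=1.0, lambda_wp=0.0, length=3)
with_penalty = derivation_weight(rule_features, lambdas, p_lm=0.01,
                                 lambda_lm=1.0, lambda_wp=1.0, length=3)
print(no_penalty, with_penalty)
```

A positive λ_wp shrinks the weight of longer outputs, so tuning it trades off output length against the other features.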

Example translation: Hierarchical Phrase Based MT
Rules (ENGLISH | hindi):
  S → { X1 X2 , X1 ne X2 }
  X → { RAM , ram }
  X → { HAD TOLD , kaha tha }
The English sentence: RAM HAD TOLD
  S ⇒ < X1 X2 , X1 ne X2 >
    ⇒ < RAM X2 , ram ne X2 >
    ⇒ < RAM HAD TOLD , ram ne kaha tha >
Compared to pure statistical parsing, hierarchical phrase-based (and, in general, syntax-based) MT handles dependency and divergence better.

Example translation: Synchronous CFG Translation
Rules (ENGLISH | hindi):
  S → { NP VP | NP ne VP }
  NP → { N | N }
  N → { RAM | ram }
  VP → { VPAST_P | VPAST_P }
  VPAST_P → { HAD TOLD | kaha tha }
The English sentence parse tree:
  S ⇒ NP VP ⇒ N VP ⇒ RAM VP ⇒ RAM VPAST_P ⇒ RAM HAD TOLD
Hindi translation by applying the dual of each rule:
  S ⇒ NP ne VP ⇒ N ne VP ⇒ ram ne VP ⇒ ram ne VPAST_P ⇒ ram ne kaha tha

Joshua Toolkit
Open source toolkit for parsing-based machine translation
The Joshua decoder is written in Java and implements several algorithms:
  Chart parsing
  n-gram language model integration
  Beam and cube pruning
  Unique k-best extraction

Goals
Extensibility: the implementation is organized as packages for customization.
End-to-end cohesion: integrated with suffix-array grammar extraction (Callison-Burch et al., 2005) and minimum error rate training (Och, 2003).
Scalability: parsing and pruning algorithms are implemented with dynamic programming.

Experiment
Data
  Training: Chinese-English, 570K parallel data; the language model was built on 130M words
  Decoding: SCFG with 3M rules, 49M n-grams
Results
  The decoder is 22 times faster than others
  Translation quality, measured by BLEU-4 (Papineni et al., 2002), is also better

Joshua Features: Decoding Algorithms
Grammar formalism: currently handles only SCFGs
Chart parsing: generates the one-best or k-best translations using the CKY algorithm
Pruning: increases computational efficiency

Joshua Features: Decoding Algorithms
Hyper-graphs and k-best extraction:
  For each source sentence, a hyper-graph is generated that contains a set of derivations
  k-best extraction is used to retrieve a subset of the derivations
Parallel and distributed computing:
  Parallel decoding
  Distributed language model
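The core idea behind k-best extraction can be sketched for a single hyper-edge with two tail nodes: given each tail's candidates sorted by cost, the k cheapest combinations are enumerated lazily with a priority queue instead of building every pair. This is an illustrative sketch and not Joshua's Java implementation; `k_best_pairs`, the candidate lists, and the costs are all hypothetical.

```python
import heapq

def k_best_pairs(left, right, k):
    """Return the k lowest-cost combinations of two sorted candidate lists.

    left, right: lists of (cost, label) sorted by ascending cost.
    A combination's cost is the sum of its two parts.
    """
    if not left or not right or k <= 0:
        return []
    heap = [(left[0][0] + right[0][0], 0, 0)]
    seen = {(0, 0)}
    result = []
    while heap and len(result) < k:
        cost, i, j = heapq.heappop(heap)
        result.append((cost, left[i][1], right[j][1]))
        # Only the two neighbours of the popped cell can be next-best.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (left[ni][0] + right[nj][0], ni, nj))
    return result

# Invented candidate translations with invented costs (lower is better).
subjects = [(1.0, "ram ne"), (2.0, "raam ne")]
verbs = [(0.5, "kaha tha"), (3.0, "bola tha")]
top3 = k_best_pairs(subjects, verbs, 3)
print(top3)
```

Only a handful of combinations are ever scored, which is what makes this kind of lazy enumeration attractive when each node of the hyper-graph has many candidates.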

Conclusions
Syntax-based language and translation models provide a promising technique for use in noisy channel SMT.
A syntax-based LM can be combined with several MT systems.
Parsing models such as YC, YT, and BT have shown perfect translations in 45% of cases by improving the English syntax of the translations.
Using syntactic linguistic information about different word orders and case markers can improve the quality of translation.

Conclusions
Hierarchical phrase-based translation does not require a synchronous grammar as input; it uses the bitext to generate one.
Hierarchical phrase pairs can be learned without any syntactically annotated training data.
It improves translation accuracy over pure statistical phrase-based MT by 7.5%.
The major challenge for the future is to produce a complete provable MT.
Another goal is to reduce the number of derivation trees with a more syntactically motivated grammar.

References
1. Eugene Charniak, Kevin Knight et al. - Syntax based Language Models for Statistical Machine Translation, Brown Univ. (2002)
2. David Chiang - A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, June 2005.
3. David Chiang - An Introduction to Synchronous Grammars, Proceedings of the 43rd Annual Meeting of the ACL, Ann Arbor, 21 June 2006.
4. Franz Josef Och and Hermann Ney - The alignment template approach to statistical machine translation, Computational Linguistics 2004
5. Zhifei Li, Chris Callison-Burch, Chris Dyer, Juri Ganitkevitch, Ann Irvine, Sanjeev Khudanpur, Lane Schwartz, Wren N. G. Thornton, Ziyuan Wang, Jonathan Weese and Omar F. Zaidan - Joshua 2.0: A Toolkit for Parsing-Based Machine Translation with Syntax, Semirings, Discriminative Training and Other Goodies - Proceedings of the Joint 5th Workshop on Statistical Machine Translation and MetricsMATR, Uppsala, Sweden, 15-16 July 2010.

Thank You