“pipeline” of maximum a posteriori inference steps to identify hidden translation structure and estimate the parameters of their translation models. The first step in this pipeline typically involves learning word alignments (Brown et al., 1993) over parallel sentence-aligned training data. The outputs of this step are the model's most probable word-to-word correspondences within each parallel sentence pair. These alignments are used as the input to a phrase extraction step, where multi-word phrase pairs are identified and scored (with multiple features) based on statistics computed across the training data. The most successful methods extract phrases that adhere to heuristic constraints (Koehn et al., 2003; Och and Ney, 2004). Thus, errors made within the single-best alignment are propagated (1) to the identification of phrases, since errors in the alignment affect which phrases are extracted, and (2) to the estimation of phrase weights, since each extracted phrase is counted as evidence for relative frequency estimates. Methods like those described in Wu (1997), Marcu and Wong (2002), and DeNero et al. (2006) address this problem by jointly modeling alignment and phrase identification, yet have not achieved the same empirical results as surface-heuristic-based methods, or require substantially more computational effort to train.

In this work we describe an approach that “widens” the pipeline, rather than performing two steps jointly. We present N-best alignments and parses to the downstream phrase extraction algorithm and define a probability distribution over these alternatives to generate expected, possibly fractional counts for the extracted translation rules, under that distribution. These fractional counts are then used when assigning weights to rules. This technique is directly applicable to both flat and hierarchically-structured translation models. In syntax-based translation, single-best target language parse trees (given by a statistical parser) are used to assign syntactic categories within each rule, and to constrain the combination of those rules. Decisions made during the parsing step of the pipeline affect the choice of nonterminals used for each rule in the PSCFG. Presenting N-best parse alternatives to the rule extraction process allows the identification of more diverse structures for use during translation and, perhaps, better generalization ability.

We integrated our “wider-pipeline” model into the PSCFG grammar construction process of the publicly available Syntax-Augmented Machine Translation system (Zollmann and Venugopal, 2006). We first review PSCFG grammars (Section 2), and then, in Section 3, present a method of integrating PSCFG rules extracted from N-best alignments and parses and allow the posterior fractional counts to influence the rule weights. In Section 4, we show how the widened pipeline improves translation performance on a limited-domain speech translation task, the IWSLT Chinese-English data track (Paul, 2006).

2 Synchronous Grammars for SMT

Probabilistic synchronous context-free grammars (PSCFGs) are defined by a source terminal set (source vocabulary) T_S, a target terminal set (target vocabulary) T_T, a shared nonterminal set N, and induce rules of the form X → ⟨γ, α, ∼, w⟩ where

- X ∈ N is a nonterminal,
- γ ∈ (N ∪ T_S)* is a sequence of nonterminals and source terminals,
- α ∈ (N ∪ T_T)* is a sequence of nonterminals and target terminals,
- the count #NT(γ) of nonterminal tokens in γ is equal to the count #NT(α) of nonterminal tokens in α,
- ∼ : {1, ..., #NT(γ)} → {1, ..., #NT(α)} is a one-to-one mapping from nonterminal tokens in γ to nonterminal tokens in α, and
- w ∈ [0, ∞) is a nonnegative real-valued weight assigned to the rule.

In our notation, we will assume ∼ to be implicitly defined by indexing the NT occurrences in γ from left to right starting with 1, and by indexing the NT occurrences in α by the indices of their corresponding counterparts in γ. Syntax-oriented PSCFG approaches often ignore source structure, instead focusing on generating syntactically well-formed target derivations. Chiang (2005) uses a single nonterminal category, Galley et al. (2006) use syntactic constituents for the PSCFG nonterminal set, and
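The rule form X → ⟨γ, α, ∼, w⟩ above can be made concrete with a small data structure. A minimal sketch in Python, not from the paper itself: the `@`-prefix convention for nonterminal tokens, the class name, and the example rule are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

NT_PREFIX = "@"  # illustrative convention: tokens starting with "@" are nonterminals

def nt_indices(seq):
    """Positions of nonterminal tokens in one rule side."""
    return [i for i, tok in enumerate(seq) if tok.startswith(NT_PREFIX)]

@dataclass(frozen=True)
class PSCFGRule:
    lhs: str                  # X, the left-hand-side nonterminal
    src: Tuple[str, ...]      # gamma: source-side tokens (terminals and nonterminals)
    tgt: Tuple[str, ...]      # alpha: target-side tokens
    mapping: Tuple[int, ...]  # ~: for the k-th NT in src, index of its counterpart in tgt
    weight: float             # w >= 0

    def is_well_formed(self) -> bool:
        n_src = len(nt_indices(self.src))
        n_tgt = len(nt_indices(self.tgt))
        # #NT(gamma) must equal #NT(alpha), ~ must be a bijection, w must be nonnegative
        return (n_src == n_tgt
                and sorted(self.mapping) == list(range(1, n_src + 1))
                and self.weight >= 0.0)

# Example reordering rule:  X -> < @X_1 de @X_2 , the @X_2 of @X_1 >
rule = PSCFGRule("X", ("@X", "de", "@X"), ("the", "@X", "of", "@X"), (2, 1), 0.5)
```

The `mapping` field encodes the left-to-right NT indexing convention described in the text; a rule violating the #NT equality or the bijection constraint fails `is_well_formed()`.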
with the language model weight λ_LM via minimum-error training (MER) (Och, 2003). Here, we focus on the estimation of the feature values during the grammar construction process. The feature values are statistics estimated from rule counts.

2.3 Feature Value Statistics

The features represent multiple criteria by which the decoding process can judge the quality of each rule and, by extension, each derivation. We include both real-valued and boolean-valued features for each rule. The following probabilistic quantities are estimated and used as feature values:

- p̂(r | lhs(r)): probability of a rule given its left-hand-side category;
- p̂(r | src(r)): probability of a rule given its source side;
- p̂(r | tgt(r)): probability of a rule given its target side;
- p̂(ul(tgt(r)) | ul(src(r))): probability of the unlabeled target side of the rule given its unlabeled source side; and
- p̂(ul(src(r)) | ul(tgt(r))): probability of the unlabeled source side of the rule given its unlabeled target side.

In our notation, lhs returns the left-hand side of a rule, src returns the source side γ, and tgt returns the target side α of a rule r. The function ul removes all syntactic labels from its arguments, but retains ordering notation. For example, ul(NP+AUX_1 does not go) = 2_1 does not go. The last two features represent the same kind of relative frequency estimates commonly used in phrase-based systems. The ul function allows us to calculate these estimates for rules with nonterminals as well. To estimate these probabilistic features, we use maximum likelihood estimates based on counts of the rules extracted from the training data. For example, p̂(r | lhs(r)) is estimated by computing #(r)/#(lhs(r)), aggregating counts from all extracted rules. As in phrase-based translation model estimation, the feature set also contains two lexical weights p̂_w(lex(src(r)) | lex(tgt(r))) and p̂_w(lex(tgt(r)) | lex(src(r))) (Koehn et al., 2003) that are based on the lexical symbols of γ and α. These weights are estimated based on a pair of statistical lexicons that represent p̂(s|t) and p̂(t|s), where s and t are single words in the source and target vocabulary. These word-level translation models are typically estimated by maximum likelihood, considering the word-to-word links from “single-best” alignments as evidence.

The feature set also contains several boolean features that indicate whether: (a) the rule is purely lexical in γ and α, (b) the rule is purely non-lexical in γ and α, and (c) the ratio of lexical source and target words in the rule is between 1/5 and 5. It further contains a feature that reflects the number of target lexical symbols and a feature that is 1 for each rule, allowing the decoder to prefer shorter (or longer) derivations based on the corresponding weight.

3 N-best Evidence

The PSCFG rule extraction procedure described above relies on high-quality word alignments and parses. The quality of the alignments affects the set of phrases that can be identified by the heuristics in Koehn et al. (2003). Improving or diversifying the set of initial phrases also affects the rules with nonterminals that are identified via the procedure described above. Since PSCFG systems rely on rules with nonterminal symbols to represent reordering operations, the set of these initial phrases has the potential to have a profound impact on translation quality. The quality of the parses affects the syntactic categories assigned to the left-hand-side and nonterminal symbols of each rule. These categories play an important role in constraining the decoding process to grammatically feasible target parse trees.

Several recent studies explore the relationship between the quality of the initial models in the “pipeline” and final translation quality. Quirk and Corston-Oliver (2006) show improvements in translation quality when the quality of parsing is improved by adding additional training data within the “treelet” paradigm introduced by Quirk et al. (2005). Koehn et al. (2003) show that translation quality in a phrase-based system does not vary significantly when increasing the complexity of the model used for alignment (ranging from IBM Model 1 through 4), but that increasing the amount of parallel training
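Sections 2.3 and 3 together suggest a simple computation: accumulate expected (fractional) rule counts over a distribution of N-best alternatives, then form relative-frequency features such as p̂(r | lhs(r)) = #(r)/#(lhs(r)). A minimal sketch, assuming each alternative carries a normalized posterior probability; the function names, rule encoding, and toy data are illustrative, not from the paper.

```python
from collections import defaultdict

def expected_rule_counts(alternatives, extract_rules):
    """Expected (possibly fractional) rule counts under a distribution
    over N-best alternatives (e.g. alignment or parse hypotheses).

    alternatives: list of (hypothesis, probability), probabilities summing to 1.
    extract_rules: maps one hypothesis to the rules it yields; here each rule
                   is a (lhs, src, tgt) tuple.
    Rules seen only in low-probability hypotheses receive counts near 0.
    """
    counts = defaultdict(float)
    for hyp, prob in alternatives:
        for rule in extract_rules(hyp):
            counts[rule] += prob  # fractional count, weighted by posterior
    return counts

def p_rule_given_lhs(counts):
    """Relative-frequency estimate p(r | lhs(r)) = #(r) / #(lhs(r))."""
    lhs_totals = defaultdict(float)
    for (lhs, _, _), c in counts.items():
        lhs_totals[lhs] += c
    return {r: c / lhs_totals[r[0]] for r, c in counts.items()}

# Toy example: one rule appears only under the low-probability hypothesis.
alts = [("hyp1", 0.9), ("hyp2", 0.1)]
rules_of = {"hyp1": [("X", "qing", "please .")],
            "hyp2": [("X", "qing", "please ."), ("X", "qing", "please")]}.get
counts = expected_rule_counts(alts, rules_of)
probs = p_rule_given_lhs(counts)
```

In the toy run, the rule shared by both hypotheses accumulates a full count of 1.0, while the rule unique to the 0.1-probability hypothesis gets a fractional count of 0.1, which then flows into the relative-frequency feature.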
instances that rely on low-probability or fewer alignments and parses will get lower counts (approaching 0 as certainty increases).

3.2 Refined Alignments

Work by Och and Ney (2004) and Koehn et al. (2003) demonstrates the value of generating word alignments in both source-to-target and target-to-source directions in order to facilitate the extraction of phrases with many-to-many word relationships. We follow Koehn et al. (2003) in generating a refined bidirectional alignment using the heuristic algorithm “grow-diag-final-and” described in that work. Since we require N-best alignments, we first extract N-best alignments in each direction, and then apply the refinement technique to all N² bidirectional alignment pairs. The resulting alignments are assigned the probability (p_f · p_r)^λ, where p_f is the candidate probability for the forward alignment and p_r is the candidate probability for the reverse alignment. We then remove any duplicate refined alignments that came about due to the refinement process (the refined alignment with the highest probability is retained). Finally, we select the top N alignments from this set of refined alignments.

The selection of λ controls the entropy of the resulting distribution over candidate alignments (after normalization). Higher values of λ > 1 make the distribution more peaked (affecting the estimation of features on rules from these alignments), while values of λ between 0 and 1 make the distribution more uniform. A more peaked distribution favors rules from the top alignments, while a more uniform one gives rules from lower-performing alignments more of a chance to participate in translation. We can also use this same technique to control the distribution over parses.

4 Translation Results

4.1 Experimental Setup

We present results on the IWSLT 2007 and 2008 Chinese-to-English translation task, based on the full BTEC corpus of travel expressions with 120K parallel sentences (906K source words and 1.2M target words) as well as the evaluation corpora from the evaluation years preceding 2007. The development data consists of 489 sentences (average length of 10.6 words) from the 2006 evaluation, the 2007 test set contains 489 sentences (average length of 6.47 words), and the 2008 test set contains 507 sentences (average length of 5.59 words). Word alignment was trained using the GIZA++ toolkit, and N-best parses were generated by the Charniak (2000) parser, without additional re-ranking.[2] N-best alignments were generated from source to target and target to source, refined as described above. Initial phrases of up to length 10 were identified using the heuristics proposed by Koehn et al. (2003). Rules with up to 2 nonterminals are extracted using the SAMT toolkit (Zollmann and Venugopal, 2006), modified to handle N-best alignments and parses and posterior counting. Note that lexical weights (Koehn et al., 2003) as described above are assigned based on “single-best” word alignments. Rules that receive a zero probability value for their lexical weights are immediately discarded, since they would then have a prohibitively high cost when used during translation. Rules extracted from single-best evidence as well as N-best evidence can be discarded in this way.

The n-gram language model is trained on the target side of the parallel training corpus[3] and translation experiments use the decoder and MER trainer available in the same toolkit. We use the cube-pruning option (Chiang, 2007) in these experiments.

[2] Reranking might be used to change estimates of p̂(i), but would not change the set of rules extracted, only the fractional counts.
[3] As BTEC is a very domain-specific corpus, training the language model on large available monolingual corpora (e.g., from the news domain) is of limited utility.

4.2 Cumulative (N, N′)-Best

We measure translation quality using the mixed-case IBM-BLEU (Papineni et al., 2002) metric as we vary the size of N and N′ for alignments and parses respectively. Each value of N implies that the first N alternatives have been considered when building the grammar. For each grammar we also track the number of rules relevant for the first sentence in the IWSLT 2007 test set (grammars are subsampled on a per-sentence basis to keep memory requirements low during decoding). We also note the number of seconds required to translate each test set.

System                  #Rules (1 sent.)   Dev     2007    2008    2007 Time (s)   2008 Time (s)
N=1  (lex=1st)          400K               0.309   0.355   0.453   8108            8367
N=1  (λ=1,   lex=m4)    420K               0.301   0.361   0.440   8024            8250
N=5  (λ=1,   lex=m4)    680K               0.322   0.374   0.470   15376           15577
N=10 (λ=1,   lex=m4)    900K               0.313   0.382   0.467   19298           19469
N=50 (λ=1,   lex=m4)    1500K              0.316   0.370   0.478   29500           30894
N=10 (λ=0.5, lex=m4)    900K               0.315   0.395   0.477   20398           20118
N=50 (λ=0.5, lex=m4)    1500K              0.317   0.373   0.477   33682           34760
N=10 (λ=2,   lex=m4)    900K               0.313   0.375   0.464   15117           15070
N=50 (λ=2,   lex=m4)    1500K              0.315   0.373   0.488   26590           27126

Table 1: Grammar statistics and translation quality (IBM-BLEU) on development (IWSLT 2006) and test sets (IWSLT 2007, 2008) when integrating N-best alignments for alternative Syntax-Augmented grammar configurations. #Rules reflects rules that are applicable to the first sentence in IWSLT 2007. Decoding times in seconds are cumulative over all sentences in the respective test set.

System                  #Rules (1 sent.)   Dev     2007    2008    2007 Time (s)   2008 Time (s)
Hier N=1                10K                0.277   0.367   0.460   895             1451
Hier N=5  (λ=1)         12K                0.286   0.374   0.472   906             1476
Hier N=10 (λ=1)         13K                0.291   0.382   0.477   944             1516
Hier N=50 (λ=1)         14K                0.282   0.384   0.463   979             1596
Hier N=10 (λ=0.5)       13K                0.285   0.399   0.476   963             1547
Hier N=50 (λ=0.5)       14K                0.283   0.376   0.470   982             1599
Hier N=10 (λ=2)         13K                0.284   0.372   0.467   965             1570
Hier N=50 (λ=2)         14K                0.290   0.374   0.459   921             1483

Table 2: Grammar statistics and translation quality (IBM-BLEU) on development (IWSLT 2006) and test sets (IWSLT 2007, 2008) when integrating N-best alignments for purely hierarchical grammar configurations. #Rules reflects rules that are applicable to the first sentence in IWSLT 2007. Decoding times in seconds are cumulative over all sentences in the respective test set.
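The refined-alignment selection of Section 3.2 can be sketched as follows. This is an illustrative Python sketch, not the SAMT implementation: the real symmetrization heuristic is grow-diag-final-and, which is passed in as a callback here, and the toy usage below substitutes a simple link intersection in its place.

```python
from itertools import product

def select_refined_alignments(fwd, rev, refine, n, lam=1.0):
    """Refine all N^2 directional alignment pairs and keep the top N.

    fwd, rev: lists of (alignment, probability) in each direction.
    refine:   symmetrization heuristic (grow-diag-final-and in the paper)
              mapping a forward/reverse pair to one bidirectional alignment.
    lam:      exponent on (p_f * p_r); lam > 1 peaks the distribution,
              values between 0 and 1 flatten it.
    """
    best = {}  # keep only the highest-scoring copy of each duplicate refinement
    for (a_f, p_f), (a_r, p_r) in product(fwd, rev):
        refined = refine(a_f, a_r)
        score = (p_f * p_r) ** lam
        if score > best.get(refined, 0.0):
            best[refined] = score
    top = sorted(best.items(), key=lambda kv: -kv[1])[:n]
    z = sum(s for _, s in top)  # normalize over the selected candidates
    return [(a, s / z) for a, s in top]

# Toy usage: alignments as frozensets of (src_pos, tgt_pos) links,
# with plain intersection standing in for grow-diag-final-and.
fwd = [(frozenset({(0, 0), (1, 1)}), 0.6), (frozenset({(0, 1)}), 0.4)]
rev = [(frozenset({(0, 0), (1, 1)}), 0.7), (frozenset({(1, 0)}), 0.3)]
refined = select_refined_alignments(fwd, rev, lambda a, b: a & b, n=2, lam=2.0)
```

Raising `lam` above 1 concentrates the normalized probability mass on the top refined alignments, which in turn sharpens the fractional counts of the rules extracted from them; values below 1 let lower-ranked alignments contribute more.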
4.3 Grammar Rules

Figure 1 shows the most frequently occurring rules that exist only in the best-performing N=10, N′=1 grammar, and not in the baseline (Model-4 lexicon) grammar. We show the estimated counts on these rules as well as their source, target and left-hand-side nonterminal symbol. These rules are particularly interesting when considering the domain of this translation task. The source side of the training data contains no punctuation (since it is transcribed speech), while the target side does (since they were manually generated translations). The system therefore attempts to generate punctuation during translation. Consider the first example, where the Chinese word for “please” (often found at the beginning of a sentence) is aligned to the English “please.” (at the end of the sentence, as indicated by the punctuation). This rule is extracted from a lower-probability alignment with high levels of distortion. This pattern was not seen in any single-best alignments.

Figure 1: Top rules extracted by our method, but not the baseline.

System   #Rules (1 sent.)   #Labels   Dev     2007    2008    2007 Time (s)   2008 Time (s)
N′=1     420K               10K       0.301   0.361   0.440   8024            8250
N′=5     800K               15K       0.300   0.358   0.447   16930           15102
N′=10    1079K              18K       0.299   0.361   0.460   26944           23662

Table 3: Grammar statistics and translation quality (IBM-BLEU) on development (IWSLT 2006) and test sets (IWSLT 2007, 2008) when integrating N-best parses with the Syntax-Augmented grammar. #Rules reflects rules that are applicable to the first sentence in IWSLT 2007. Decoding times in seconds are cumulative over all sentences in the respective test set. All experiments in this table use lex=m4, λ=1 and 1-best alignments.

5 Conclusion

In this work we have demonstrated the feasibility and benefits of widening the MT pipeline to include additional evidence from N-best alignments and parses. We integrate this diverse knowledge under a principled model that uses a probability distribution over these alternatives. We achieve significant improvements in translation quality over grammars built on “single-best” evidence alone when considering N-best alignments, while N′-best parses seem to have no impact on translation quality. Using a relatively small number of additional alternative alignments results in significant improvements in quality, with minimal impact on the number of rules in the grammar and the translation runtime for a hierarchical system, but at significantly increased grammar size and runtime for a syntax-augmented system. In future work we plan to focus on methods to take better advantage of the syntactic labels from alternative parse candidates.

Acknowledgments

This work has been partly funded by GALE HR0011-06-2-0001. N. Smith is supported by NSF IIS-0836431 and an IBM faculty award.

References

Alfred V. Aho and Jeffrey D. Ullman. 1969. Syntax directed translations and the pushdown assembler. Journal of Computer and System Sciences.

Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics.

Eugene Charniak. 2000. A maximum-entropy-inspired parser. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL).

David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).

David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics.

John DeNero, Dan Gillick, James Zhang, and Dan Klein. 2006. Why generative phrase models underperform surface heuristics. In Proceedings of the Workshop on Statistical Machine Translation, ACL.

Christopher Dyer, Smaranda Muresan, and Philip Resnik. 2008. Generalizing word lattice translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
Michael Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. 2006. Scalable inference and training of context-rich syntactic translation models. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL).

Kuzman Ganchev, Joao V. Graca, and Ben Taskar. 2008. Better alignments = better translations? In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).

Philipp Koehn, Franz J. Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL).

Daniel Marcu and William Wong. 2002. A phrase-based, joint probability model for statistical machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).

Daniel Marcu, Wei Wang, Abdessamad Echihabi, and Kevin Knight. 2006. SPMT: Statistical machine translation with syntactified target language phrases. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Sydney, Australia.

Haitao Mi, Liang Huang, and Qun Liu. 2008. Forest-based translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).

Franz J. Och and Hermann Ney. 2003. A systematic comparison of various alignment models. Computational Linguistics.