(allez au banque, go to the bank, −2.5) (similar cases, where there are fewer Englis


stefany-barnette
Uploaded On 2015-09-24







(allez au banque, go to the bank, $-2.5$) (similar cases, where there are fewer English words than French words, would also be allowed). This flexibility in the definition of lexical entries is important. In many cases it is very useful to have a lexical entry where the number of foreign words and the number of English words differ.

We'll soon describe how a phrasal lexicon $L$ can be used in translation. First, however, we describe how a phrasal lexicon can be learned from a set of example translations.

2 Learning Phrasal Lexicons from Translation Examples

As before, we'll assume that our training data consists of English sentences $e^{(k)} = e^{(k)}_1 \ldots e^{(k)}_{l_k}$ paired with French sentences $f^{(k)} = f^{(k)}_1 \ldots f^{(k)}_{m_k}$, for $k = 1 \ldots n$. Here the integer $l_k$ is the length of the $k$'th English sentence, and $e^{(k)}_j$ is the $j$'th word in the $k$'th English sentence. The integer $m_k$ is the length of the $k$'th French sentence, and $f^{(k)}_i$ is the $i$'th word in the $k$'th French sentence.

In addition to the sentences themselves, we will also assume that we have an alignment matrix for each training example. The alignment matrix $A^{(k)}$ for the $k$'th example has $l_k \times m_k$ entries, where

$$A^{(k)}_{i,j} = \begin{cases} 1 & \text{if French word } i \text{ is aligned to English word } j \\ 0 & \text{otherwise} \end{cases}$$

Note that this representation is more general than the alignments considered for IBM models 1 and 2. In those models, we had alignment variables $a_i$ for $i \in \{1 \ldots m_k\}$, specifying which English word the $i$'th French word is aligned to. By definition, in IBM models 1 and 2 each French word could only be aligned to a single English word. With an alignment matrix $A^{(k)}_{i,j}$, the alignments can be many-to-many. For example, a given French word could be aligned to more than one English word (i.e., for a given $i$, we could have $A^{(k)}_{i,j} = 1$ for more than one value of $j$).

We'll remain agnostic as to how the alignment matrices $A^{(k)}$ are derived. In practice, a common method is something like the following (see the lecture slides, and the slides from Philipp Koehn's tutorial, for more details). First, we train IBM model 2, using the EM algorithm described in the previous lecture. Second, we use various heuristics to extract alignment matrices from the IBM model's output on each training example. To be specific, a very simple method would be as follows (the method is too naive to be used in practice, but will suffice as an example):

- Use the training examples $e^{(k)}, f^{(k)}$ for $k = 1 \ldots n$ to train IBM model 2 using the EM algorithm described in the previous lecture. For any English string $e$, French string $f$, and French length $m$, this model gives a conditional probability $p(f, a \mid e, m)$.
- For each training example, define
  $$a^{(k)} = \arg\max_a p(f^{(k)}, a \mid e^{(k)}, m_k)$$
  i.e., $a^{(k)}$ is the most likely alignment under the model for the $k$'th example (see the notes on IBM models 1 and 2 for how to compute this).

Inputs: $e^{(k)}, f^{(k)}, A^{(k)}$ for $k = 1 \ldots n$
Initialization: $L = \emptyset$
Algorithm:
- For $k = 1 \ldots n$
  - For $s = 1 \ldots m_k$, for $t = s \ldots m_k$
    - For $s' = 1 \ldots l_k$, for $t' = s' \ldots l_k$
      - If $\text{consistent}(A^{(k)}, (s, t), (s', t')) = \text{True}$
        (1) Define $f = f^{(k)}_s \ldots f^{(k)}_t$, define $e = e^{(k)}_{s'} \ldots e^{(k)}_{t'}$
        (2) Set $L = L \cup \{(f, e)\}$
        (3) $c(e, f) = c(e, f) + 1$
        (4) $c(e) = c(e) + 1$
- For each $(f, e) \in L$ create a lexical entry $(f, e, g)$ where $g = \log \frac{c(e, f)}{c(e)}$

Figure 1: An algorithm for deriving a phrasal lexicon from a set of training examples with alignments. The function $\text{consistent}(A^{(k)}, (s, t), (s', t'))$ is defined in figure 2.

Definition of $\text{consistent}(A, (s, t), (s', t'))$:
(Recall that $A$ is an alignment matrix with $A_{i,j} = 1$ if French word $i$ is aligned to English word $j$. $(s, t)$ represents the sequence of French words $f_s \ldots f_t$. $(s', t')$ represents the sequence of English words $e_{s'} \ldots e_{t'}$.)
For a given matrix $A$, define $A(i) = \{j : A_{i,j} = 1\}$. Similarly, define $A'(j) = \{i : A_{i,j} = 1\}$. Thus $A(i)$ is the set of English words that French word $i$ is aligned to, and $A'(j)$ is the set of French words that English word $j$ is aligned to.
Then $\text{consistent}(A, (s, t), (s', t'))$ is true if and only if the following conditions are met:
1. For each $i \in \{s \ldots t\}$, $A(i) \subseteq \{s' \ldots t'\}$
2. For each $j \in \{s' \ldots t'\}$, $A'(j) \subseteq \{s \ldots t\}$
3. There is at least one $(i, j)$ pair such that $i \in \{s \ldots t\}$, $j \in \{s' \ldots t'\}$, and $A_{i,j} = 1$

Figure 2: The definition of the consistent function.

…have this substring as their English string. We may end up with more than one phrasal entry for a particular source-language substring.

A derivation $y$ is then a finite sequence of phrases, $p_1, p_2, \ldots, p_L$, where each $p_j$ for $j \in \{1 \ldots L\}$ is a member of $P$. The length $L$ can be any positive integer value. For any derivation $y$ we use $e(y)$ to refer to the underlying translation defined by $y$. It is derived by concatenating the strings $e(p_1), e(p_2), \ldots, e(p_L)$. For example, if

$$y = (1, 3, \text{we must also}), (7, 7, \text{take}), (4, 5, \text{this criticism}), (6, 6, \text{seriously}) \tag{2}$$

then $e(y) = $ we must also take this criticism seriously.

3.2 The Set of Valid Derivations

We will use $\mathcal{Y}(x)$ to denote the set of valid derivations for an input sentence $x = x_1 x_2 \ldots x_n$. The set $\mathcal{Y}(x)$ is the set of finite-length sequences of phrases $p_1 p_2 \ldots p_L$ which satisfy the following conditions:

- Each $p_k$ for $k \in \{1 \ldots L\}$ is a member of the set of phrases $P$ for $x_1 \ldots x_n$. (Recall that each $p_k$ is a triple $(s, t, e)$.)
- Each word is translated exactly once. More formally, for a derivation $y = p_1 \ldots p_L$ define
  $$y(i) = \sum_{k=1}^{L} [[s(p_k) \le i \le t(p_k)]] \tag{3}$$
  to be the number of times word $i$ is translated (we define $[[\pi]]$ to be 1 if $\pi$ is true, 0 otherwise). We must then have $y(i) = 1$ for $i = 1 \ldots n$.
- For all $k \in \{1 \ldots L-1\}$, $|t(p_k) + 1 - s(p_{k+1})| \le d$ where $d \ge 0$ is a parameter of the model. In addition, we must have $|1 - s(p_1)| \le d$.

The first two conditions should be clear. The last condition, which depends on the parameter $d$, deserves more explanation. The parameter $d$ is a limit on how far consecutive phrases can be from each other, and is often referred to as a distortion limit. To illustrate this, consider our previous example derivation:

$$y = (1, 3, \text{we must also}), (7, 7, \text{take}), (4, 5, \text{this criticism}), (6, 6, \text{seriously})$$

In this case $y = p_1 p_2 p_3 p_4$ (i.e., the number of phrases, $L$, is equal to 4). For the sake of argument, assume that the distortion parameter $d$ is equal to 4.

3.3 Scoring Derivations

The next question is the following: how do we score derivations? That is, how do we define the function $f(y)$ which assigns a score to each possible derivation for a sentence? The optimal translation under the model for a source-language sentence $x$ will be

$$\arg\max_{y \in \mathcal{Y}(x)} f(y)$$

In phrase-based systems, the score for any derivation $y$ is calculated as follows:

$$f(y) = h(e(y)) + \sum_{k=1}^{L} g(p_k) + \sum_{k=1}^{L-1} \eta \times |t(p_k) + 1 - s(p_{k+1})| \tag{5}$$

The components of this score are as follows:

- As defined before, $e(y)$ is the target-language string for derivation $y$. $h(e(y))$ is the log-probability for the string $e(y)$ under a trigram language model. Hence if $e(y) = e_1 e_2 \ldots e_m$, then
  $$h(e(y)) = \log \prod_{i=1}^{m} q(e_i \mid e_{i-2}, e_{i-1}) = \sum_{i=1}^{m} \log q(e_i \mid e_{i-2}, e_{i-1})$$
  where $q(e_i \mid e_{i-2}, e_{i-1})$ is the probability of word $e_i$ following the bigram $e_{i-2}, e_{i-1}$ under a trigram language model.
- As defined before, $g(p_k)$ is the score for the phrase $p_k$ (see for example Eq. 1 for one possible way of defining $g(p)$).
- $\eta$ is a "distortion parameter" of the model. It can in general be any positive or negative value, although in practice it is almost always negative. Each term of the form $\eta \times |t(p_k) + 1 - s(p_{k+1})|$ then corresponds to a penalty (assuming that $\eta$ is negative) on how far phrases $p_k$ and $p_{k+1}$ are from each other. Thus, in addition to the hard constraint on the distance between consecutive phrases, we also have a soft constraint: a penalty that increases linearly with this distance.

Given these definitions, the optimal translation in the model for a source-language sentence $x = x_1 \ldots x_n$ is

$$\arg\max_{y \in \mathcal{Y}(x)} f(y)$$

3.4 Summary: Putting it all Together

Definition 2 (Phrase-based translation models) A phrase-based translation model is a tuple $(L, h, d, \eta)$, where:

Any sequence of phrases can be mapped to a corresponding state. For example, the sequence

$$y = (1, 3, \text{we must also}), (7, 7, \text{take}), (4, 5, \text{this criticism})$$

would be mapped to the state

$$(\text{this}, \text{criticism}, 1111101, 5, \alpha)$$

The state records the last two words in the translation underlying this sequence of phrases, namely this and criticism. The bit-string records which words have been translated: the $i$'th bit in the bit-string is equal to 1 if the $i$'th word has been translated, 0 otherwise. In this case, only the 6th bit is 0, as only the 6th word has not been translated. The value $r = 5$ indicates that the final phrase in the sequence, $(4, 5, \text{this criticism})$, ends at position 5. Finally, $\alpha$ will be the score of the partial translation, calculated as

$$\alpha = h(e(y)) + \sum_{k=1}^{L} g(p_k) + \sum_{k=1}^{L-1} \eta \times |t(p_k) + 1 - s(p_{k+1})|$$

where $L = 3$, $e(y) = $ we must also take this criticism, and

$$p_1 = (1, 3, \text{we must also}), \quad p_2 = (7, 7, \text{take}), \quad p_3 = (4, 5, \text{this criticism})$$

Note that the state only records the last two words in a derivation. As we will see shortly, this is because a trigram language model is only sensitive to the last two words in the sequence, so the state only needs to record these two words.

We define the initial state as

$$q_0 = (*, *, 0^n, 0, 0)$$

where $0^n$ is a bit-string of length $n$ with $n$ zeroes. We have used $*$ to refer to the special "start" symbol in the language model. The initial state has no words translated (all bits set to 0), the value for $r$ is 0, and the score $\alpha$ is 0.

Next we define a function $\text{ph}(q)$ that maps a state $q$ to the set of phrases which can be appended to $q$. For a phrase $p$ to be a member of $\text{ph}(q)$, where $q = (e_1, e_2, b, r, \alpha)$, the following conditions must be satisfied:

- $p$ must not overlap with the bit-string $b$. That is, we must have $b_i = 0$ for $i \in \{s(p) \ldots t(p)\}$.
- The distortion limit must not be violated. More specifically, we must have $|r + 1 - s(p)| \le d$ where $d$ is the distortion limit.

In addition, for any state $q$ and any phrase $p \in \text{ph}(q)$, we define $\text{next}(q, p)$ to be the state formed by combining state $q$ with phrase $p$. Formally, if $q = (e_1, e_2, b, r, \alpha)$ and $p = (s, t, \epsilon_1 \ldots \epsilon_M)$, then $\text{next}(q, p)$ is the state $q' = (e'_1, e'_2, b', r', \alpha')$ defined as follows:

Inputs: sentence $x_1 \ldots x_n$. Phrase-based model $(L, h, d, \eta)$. The phrase-based model defines the functions $\text{ph}(q)$ and $\text{next}(q, p)$.
Initialization: set $Q_0 = \{q_0\}$, $Q_i = \emptyset$ for $i = 1 \ldots n$.
- For $i = 0 \ldots n-1$
  - For each state $q \in \text{beam}(Q_i)$, for each phrase $p \in \text{ph}(q)$:
    (1) $q' = \text{next}(q, p)$
    (2) $\text{Add}(Q_j, q', q, p)$ where $j = \text{len}(q')$
Return: the highest scoring state in $Q_n$. Backpointers can be used to find the underlying sequence of phrases (and the translation).

Figure 3: The basic decoding algorithm. $\text{len}(q')$ is the number of bits equal to 1 in the bit-string for $q'$ (i.e., the number of foreign words translated in the state $q'$).

$\text{Add}(Q, q', q, p)$
- If there is some $q'' \in Q$ such that $\text{eq}(q'', q') = \text{True}$:
  - If $\alpha(q') > \alpha(q'')$
    - $Q = \{q'\} \cup Q \setminus \{q''\}$
    - set $\text{bp}(q') = (q, p)$
  - Else return
- Else
  - $Q = Q \cup \{q'\}$
  - set $\text{bp}(q') = (q, p)$

Figure 4: The $\text{Add}(Q, q', q, p)$ function.

Definition of $\text{beam}(Q)$:
Define $\alpha^* = \max_{q \in Q} \alpha(q)$, i.e., $\alpha^*$ is the highest score for any state in $Q$.
Define $\beta \ge 0$ to be the beam-width parameter.
Then $\text{beam}(Q) = \{q \in Q : \alpha(q) \ge \alpha^* - \beta\}$

Figure 5: Definition of the beam function in the algorithm in figure 3.
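The second step of the naive alignment method in section 2 computes $a^{(k)} = \arg\max_a p(f^{(k)}, a \mid e^{(k)}, m_k)$. The following is a minimal Python sketch of that step. It assumes the standard IBM model 2 factorization $p(f, a \mid e, m) = \prod_{i=1}^{m} q(a_i \mid i, l, m)\, t(f_i \mid e_{a_i})$, with position 0 standing for a NULL word. Under that factorization each $a_i$ can be maximized independently. The trained parameters are assumed to be plain dictionaries `t` and `q`. `to_matrix` is a hypothetical helper that sets $A_{i,j} = 1$ iff $a_i = j$. The heuristics the notes actually use to build $A^{(k)}$ fall outside this excerpt.

```python
from typing import Dict, List, Set, Tuple

def best_alignment(f: List[str], e: List[str],
                   t: Dict[Tuple[str, str], float],
                   q: Dict[Tuple[int, int, int, int], float]) -> List[int]:
    """Most likely IBM model 2 alignment a_1..a_m for one sentence pair.

    t[(f_word, e_word)] is t(f|e); q[(j, i, l, m)] is q(j | i, l, m).
    a_i = 0 means French word i is aligned to the NULL word.
    Missing dictionary entries are treated as probability 0.
    """
    l, m = len(e), len(f)
    e_null = ["NULL"] + e  # index 0 is the NULL word
    a = []
    for i in range(1, m + 1):
        # p(f, a | e, m) is a product over French positions i, so the
        # argmax over a decomposes into an independent argmax per a_i.
        scores = [q.get((j, i, l, m), 0.0) * t.get((f[i - 1], e_null[j]), 0.0)
                  for j in range(l + 1)]
        a.append(max(range(l + 1), key=scores.__getitem__))
    return a

def to_matrix(a: List[int]) -> Set[Tuple[int, int]]:
    """Hypothetical conversion of a^(k) into a sparse alignment matrix:
    the set of 1-based (i, j) with A_{i,j} = 1, dropping NULL alignments."""
    return {(i, j) for i, j in enumerate(a, start=1) if j != 0}
```

Because this alignment is one-to-many from English to French only, the resulting matrix never aligns a French word to two English words. That limitation is part of why the notes call the method too naive for practice.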
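Figures 1 and 2 translate fairly directly into code. The sketch below is a minimal Python version. It assumes each alignment matrix is stored sparsely as a set of 1-based $(i, j)$ pairs with $A_{i,j} = 1$. The names `consistent` and `extract_lexicon` are illustrative only. Conditions 1 and 2 of figure 2 together say that no alignment link may have exactly one endpoint inside the phrase pair, and the check is written in that form.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

# Sparse alignment matrix: the set of 1-based (i, j) with A_{i,j} = 1,
# i.e. French word i is aligned to English word j.
Alignment = Set[Tuple[int, int]]

def consistent(A: Alignment, s: int, t: int, s2: int, t2: int) -> bool:
    """Figure 2 for French span s..t and English span s2..t2 (s2, t2 play s', t')."""
    has_link = False
    for i, j in A:
        in_f, in_e = s <= i <= t, s2 <= j <= t2
        if in_f != in_e:
            # A link with one end inside and one outside violates condition 1 or 2.
            return False
        has_link = has_link or (in_f and in_e)
    return has_link  # condition 3: at least one link inside the pair

def extract_lexicon(examples: List[Tuple[List[str], List[str], Alignment]]
                    ) -> Dict[Tuple[Tuple[str, ...], Tuple[str, ...]], float]:
    """Figure 1: returns {(f, e): g} with g = log c(e, f) / c(e)."""
    c_ef: Counter = Counter()
    c_e: Counter = Counter()
    for f_words, e_words, A in examples:
        m, l = len(f_words), len(e_words)
        for s in range(1, m + 1):
            for t in range(s, m + 1):
                for s2 in range(1, l + 1):
                    for t2 in range(s2, l + 1):
                        if consistent(A, s, t, s2, t2):
                            f = tuple(f_words[s - 1:t])
                            e = tuple(e_words[s2 - 1:t2])
                            c_ef[(e, f)] += 1
                            c_e[e] += 1
    # The keys of c_ef are exactly the phrase pairs added to L.
    return {(f, e): math.log(c_ef[(e, f)] / c_e[e]) for (e, f) in c_ef}
```

Like figure 1, the sketch enumerates every pair of French and English spans in each example and places no limit on phrase length.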
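The conditions defining $\mathcal{Y}(x)$ in section 3.2 and the score $f(y)$ in Eq. 5 can be checked mechanically on the running example. What follows is a minimal Python sketch under stated assumptions:
- phrases are $(s, t, e)$ triples with 1-based spans that lie inside $1 \ldots n$;
- `g` is a dictionary of phrase scores;
- `q(w, u, v)` stands for $q(w \mid u, v)$, with `'*'` as the start symbol.

These representations, and the uniform toy language model mentioned below, are illustrative assumptions rather than part of the notes.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

Phrase = Tuple[int, int, Tuple[str, ...]]  # (s, t, e): source span s..t, 1-based
LM = Callable[[str, str, str], float]      # q(w, u, v) = q(w | u, v)

def is_valid(y: List[Phrase], n: int, P: Set[Phrase], d: int) -> bool:
    """The three conditions defining Y(x) in section 3.2."""
    if not y or any(p not in P for p in y):
        return False
    # y(i) from Eq. 3: how many phrases cover source word i.
    cover = [0] * (n + 1)
    for s, t, _ in y:
        for i in range(s, t + 1):
            cover[i] += 1
    if any(cover[i] != 1 for i in range(1, n + 1)):
        return False
    # Distortion limit, including the start condition |1 - s(p_1)| <= d.
    if abs(1 - y[0][0]) > d:
        return False
    return all(abs(y[k][1] + 1 - y[k + 1][0]) <= d for k in range(len(y) - 1))

def h(words: List[str], q: LM) -> float:
    """Trigram log-probability of words, padded with two '*' start symbols."""
    ctx = ["*", "*"] + words
    return sum(math.log(q(ctx[i], ctx[i - 2], ctx[i - 1])) for i in range(2, len(ctx)))

def score(y: List[Phrase], g: Dict[Phrase, float], q: LM, eta: float) -> float:
    """f(y) from Eq. 5 (no distortion term for the first phrase)."""
    e_y = [w for _, _, e in y for w in e]  # e(y): concatenated English strings
    distortion = sum(abs(y[k][1] + 1 - y[k + 1][0]) for k in range(len(y) - 1))
    return h(e_y, q) + sum(g[p] for p in y) + eta * distortion
```

Take derivation (2) over a 7-word source sentence with $d = 4$. The start condition $|1 - s(p_1)| = 0$ holds, and the jumps $|t(p_k) + 1 - s(p_{k+1})|$ are 3, 4 and 0, so `is_valid` accepts it. With $d = 3$, the jump of 4 from take back to this criticism makes it invalid. With a uniform model $q = 0.1$, all $g(p_k) = 0$ and $\eta = -1$, the score is $7 \log 0.1 - 7$.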
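Figures 3 to 5 can be combined into a small decoder. The definitions of $\text{next}(q, p)$ and $\text{eq}(q, q')$ fall in a gap in this excerpt, so the sketch below assumes versions consistent with the state definition above.
- The assumed `next_state` appends the phrase's words and updates $b$ and $r$. It adds to $\alpha$ the same terms that $\alpha$'s formula accumulates: the phrase score, the trigram log-probabilities of the new words, and the distortion penalty. The penalty is skipped for the first phrase, since that formula has no term for it.
- The assumed `eq` test compares states on everything except $\alpha$.

States are plain tuples $(e_1, e_2, b, r, \alpha)$. All function names are illustrative.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Phrase = Tuple[int, int, Tuple[str, ...]]             # (s, t, English words), 1-based
State = Tuple[str, str, Tuple[int, ...], int, float]  # (e1, e2, b, r, alpha)

def decode(n: int, P: List[Phrase], g: Dict[Phrase, float],
           q: Callable[[str, str, str], float], d: int, eta: float,
           beta: float) -> Optional[Tuple[List[Phrase], float]]:
    """Beam decoder of figures 3-5; q(w, u, v) stands for q(w | u, v)."""

    def ph(st: State) -> List[Phrase]:
        _, _, b, r, _ = st
        # No overlap with b, and the distortion limit |r + 1 - s(p)| <= d.
        return [p for p in P
                if all(b[i - 1] == 0 for i in range(p[0], p[1] + 1))
                and abs(r + 1 - p[0]) <= d]

    def next_state(st: State, p: Phrase) -> State:
        # Assumed definition of next(q, p), chosen so that alpha equals the
        # score of the partial derivation as defined in the text.
        e1, e2, b, r, alpha = st
        s, t, eps = p
        w = [e1, e2] + list(eps)  # epsilon_{-1} = e1, epsilon_0 = e2
        lm = sum(math.log(q(w[i], w[i - 2], w[i - 1])) for i in range(2, len(w)))
        dist = eta * abs(r + 1 - s) if r > 0 else 0.0  # no term for the first phrase
        b2 = tuple(1 if s <= i <= t else bit for i, bit in enumerate(b, start=1))
        return (w[-2], w[-1], b2, t, alpha + g[p] + lm + dist)

    def add(Q: Dict[tuple, State], new: State, prev: State, p: Phrase) -> None:
        key = new[:4]  # assumed eq(q'', q'): same (e1, e2, b, r)
        if key in Q and new[4] <= Q[key][4]:
            return
        Q[key] = new  # insert, or replace the lower-scoring q''
        bp[new] = (prev, p)

    def beam(Q: Dict[tuple, State]) -> List[State]:
        if not Q:
            return []
        best = max(st[4] for st in Q.values())  # alpha*
        return [st for st in Q.values() if st[4] >= best - beta]

    q0: State = ("*", "*", (0,) * n, 0, 0.0)
    Qs: List[Dict[tuple, State]] = [{} for _ in range(n + 1)]
    Qs[0][q0[:4]] = q0
    bp: Dict[State, Tuple[State, Phrase]] = {}
    for i in range(n):
        for st in beam(Qs[i]):
            for p in ph(st):
                new = next_state(st, p)
                add(Qs[sum(new[2])], new, st, p)  # j = len(q')
    if not Qs[n]:
        return None
    final = max(Qs[n].values(), key=lambda st: st[4])
    phrases: List[Phrase] = []
    st = final
    while st in bp:  # follow backpointers back to q0
        st, p = bp[st]
        phrases.append(p)
    return phrases[::-1], final[4]
```

Consider a made-up two-word input where the single phrase (1, 2, the house) has a better phrase score than the pair (1, 1, the), (2, 2, house). The pair reaches a state with the same $(e_1, e_2, b, r)$ but a lower $\alpha$, so Add discards it and the decoder returns the single phrase. Keying each $Q_i$ by the non-score part of the state makes the eq lookup in Add a dictionary access.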
