
myesha-ticknor
Uploaded On 2016-07-24



[…] poetry. Greene et al. (2010) use a finite state transducer to infer the syllable-stress assignments in lines of poetry under metrical constraints. Genzel et al. (2010) incorporate constraints on meter and rhyme (where the stress and rhyming information is derived from a pronunciation dictionary) into a machine translation system. Jiang and Zhou (2008) develop a system to generate the second line of a Chinese couplet given the first. A few researchers have also explored the problem of poetry generation under some constraints (Manurung et al., 2000; Netzer et al., 2009; Ramakrishnan et al., 2009). There has also been some work on computational approaches to characterizing rhymes (Byrd and Chodorow, 1985) and global properties of the rhyme network (Sonderegger, 2011) in English. To the best of our knowledge, there has been no language-independent computational work on finding rhyme schemes.

3 Finding Stanza Rhyme Schemes

A collection of rhyming poetry inevitably contains repetition of rhyming pairs. For example, the word trees will often rhyme with breeze across different stanzas, even those with different rhyme schemes and written by different authors. This is partly due to the sparsity of rhymes: many words have no rhymes at all, and many others have only a handful, forcing poets to reuse rhyming pairs.

In this section, we describe an unsupervised algorithm to infer rhyme schemes that harnesses this repetition, based on a model of stanza generation.

3.1 Generative Model of a Stanza

1. Pick a rhyme scheme $r$ of length $n$ with probability $P(r)$.
2. For each $i \in [1, n]$, pick a word sequence, choosing the last word² $x_i$ as follows:
   (a) If, according to $r$, the $i$th line does not rhyme with any previous line in the stanza, pick a word $x_i$ from a vocabulary of line-end words with probability $P(x_i)$.
   (b) If the $i$th line rhymes with some previous line(s) $j$ according to $r$, choose a word $x_i$ that
rhymes with the last words of all such lines with probability $\prod_{j<i:\, r_j = r_i} P(x_i \mid x_j)$.

The probability of a stanza $x$ of length $n$ is given by Eq. (1), where $I_{i,r}$ is the indicator variable for whether line $i$ rhymes with at least one previous line under $r$.

$$P(x) = \sum_{r \in R} P(r)\, P(x \mid r) = \sum_{r \in R} P(r) \prod_{i=1}^{n} \Big[ (1 - I_{i,r})\, P(x_i) + I_{i,r} \prod_{j<i:\, r_j = r_i} P(x_i \mid x_j) \Big] \qquad (1)$$

3.2 Learning

We denote our data by $X$, a set of stanzas. Each stanza $x$ is represented as a sequence of its line-end words, $x_1, \ldots, x_{\mathrm{len}(x)}$. We are also given a large set $R$ of all possible rhyme schemes.³

If each stanza in the data is generated independently (an assumption we relax in §4), the log-likelihood of the data is $\sum_{x \in X} \log P(x)$. We would like to maximize this over all possible rhyme scheme assignments, under the latent variables $\theta$, which represents pairwise rhyme strength, and $\rho$, the distribution of rhyme schemes. $\theta_{v,w}$ is defined for all words $v$ and $w$ as a non-negative real value indicating how strongly the words $v$ and $w$ rhyme, and $\rho_r$ is $P(r)$.

The expectation maximization (EM) learning algorithm for this formulation is described below. The intuition behind the algorithm is this: after one iteration, $\theta_{v,w} = 0$ for all $v$ and $w$ that never occur together in a stanza. If $v$ and $w$ co-occur in more than one stanza, $\theta_{v,w}$ has a high pseudo-count, reflecting the fact that they are likely to be rhymes.

Initialize: $\rho$ and $\theta$ uniformly (giving $\theta$ the same positive value for all word pairs).

Expectation Step: Compute $P(r \mid x) = \dfrac{P(x \mid r)\, \rho_r}{\sum_{q \in R} P(x \mid q)\, \rho_q}$, where

$$P(x \mid r) = \prod_{i=1}^{n} \Big[ (1 - I_{i,r})\, P(x_i) + I_{i,r} \prod_{j<i:\, r_j = r_i} \frac{\theta_{x_i, x_j}}{\sum_{w} \theta_{w, x_j}} \Big] \qquad (2)$$

² A rhyme may span more than one word in a line, as in laureate... / Tory at... / are ye at (Byron, 1824), but this is uncommon. An extension of our model could include a latent variable that selects the entire rhyming portion of a line.
³ While the number of rhyme schemes of length $n$ is technically the number of partitions of an $n$-element set (the Bell number), only a subset of these are typically used.
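The Python below sketches Eqs. (1) and (2) together with an EM loop on a toy corpus. It is a minimal illustration under stated assumptions, not the authors' implementation. The Maximization step is not given in this part of the text, so the update used here is an assumption modeled on the intuition above: $\theta_{v,w}$ becomes the posterior-weighted count of stanzas in which $v$ and $w$ rhyme, and $\rho_r$ becomes the normalized posterior mass of scheme $r$. $P(x_i)$ is also assumed to be the relative frequency of line-end words, and all helper names (`rhyme_schemes`, `stanza_likelihood`, `posterior`, `m_step`, `run_em`) are hypothetical. Rhyme schemes are enumerated as set partitions (restricted growth strings), so a stanza of length $n$ has Bell($n$) candidate schemes, as footnote 3 notes.

```python
from collections import Counter, defaultdict


def rhyme_schemes(n):
    """All rhyme schemes of length n as strings such as 'abab'.

    Schemes are restricted growth strings (each new letter is the next unused
    one), i.e. set partitions of the n lines, so there are Bell(n) of them.
    """
    schemes = []

    def extend(prefix, max_label):
        if len(prefix) == n:
            schemes.append("".join(prefix))
            return
        # Reuse any letter seen so far, or open exactly one new letter.
        for label in range(max_label + 2):
            extend(prefix + [chr(ord("a") + label)], max(max_label, label))

    extend([], -1)
    return schemes


def stanza_likelihood(words, scheme, theta, p_word):
    """P(x | r) as in Eq. (2)."""
    prob = 1.0
    for i, (w_i, r_i) in enumerate(zip(words, scheme)):
        partners = [words[j] for j in range(i) if scheme[j] == r_i]
        if not partners:
            # I_{i,r} = 0: line i starts a new rhyme.
            prob *= p_word[w_i]
        else:
            # I_{i,r} = 1: one normalized theta factor per earlier rhyming line j.
            for w_j in partners:
                prob *= theta[w_i][w_j] / sum(theta[v][w_j] for v in theta)
    return prob


def posterior(words, schemes, theta, rho, p_word):
    """E-step for one stanza: P(r | x) = P(x | r) rho_r / sum_q P(x | q) rho_q."""
    joint = {r: stanza_likelihood(words, r, theta, p_word) * rho[r] for r in schemes}
    z = sum(joint.values())
    return {r: v / z for r, v in joint.items()}


def m_step(stanzas, posteriors, vocab):
    """ASSUMED M-step: expected co-rhyme counts for theta, expected scheme mass for rho."""
    theta = {v: {w: 0.0 for w in vocab} for v in vocab}
    rho = defaultdict(float)
    for words, post in zip(stanzas, posteriors):
        for r, p in post.items():
            rho[r] += p
            for i in range(len(words)):
                for j in range(i):
                    if r[i] == r[j]:  # lines i and j rhyme under scheme r
                        theta[words[i]][words[j]] += p
                        theta[words[j]][words[i]] += p
    total = sum(rho.values())
    return theta, {r: c / total for r, c in rho.items()}


def run_em(stanzas, iterations=5):
    vocab = sorted({w for s in stanzas for w in s})
    counts = Counter(w for s in stanzas for w in s)
    n_words = sum(counts.values())
    p_word = {w: c / n_words for w, c in counts.items()}  # ASSUMED unigram P(x_i)
    schemes_by_len = {n: rhyme_schemes(n) for n in {len(s) for s in stanzas}}
    all_schemes = [r for rs in schemes_by_len.values() for r in rs]
    # Uniform initialization: same positive theta for every pair, uniform rho.
    theta = {v: {w: 1.0 for w in vocab} for v in vocab}
    rho = {r: 1.0 / len(all_schemes) for r in all_schemes}
    for _ in range(iterations):
        posts = [posterior(s, schemes_by_len[len(s)], theta, rho, p_word) for s in stanzas]
        theta, rho = m_step(stanzas, posts, vocab)
    posts = [posterior(s, schemes_by_len[len(s)], theta, rho, p_word) for s in stanzas]
    return theta, rho, posts


if __name__ == "__main__":
    toy = [
        ["trees", "day", "breeze", "way"],
        ["trees", "night", "breeze", "light"],
        ["day", "night", "way", "light"],
    ]
    _, _, posts = run_em(toy)
    for stanza, post in zip(toy, posts):
        print(stanza, max(post, key=post.get))
```

In the toy corpus each true rhyme pair (trees/breeze, day/way, night/light) co-occurs in two stanzas while every other pair co-occurs in only one, so after a couple of iterations the repeated pairs dominate $\theta$ and the most probable scheme for each stanza becomes abab. This is the pseudo-count effect described above.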
Table 1: Rhyme scheme accuracy and F-Score (computed from average precision and recall over all lines) using our algorithm for independent stanzas, with uniform initialization of θ. Rows labeled 'All' refer to training and evaluation on all the data in the language. Other rows refer to training and evaluating on a particular sub-corpus only. Bold indicates that we outperform the naïve baseline, where the most common scheme of the appropriate length from the gold standard of the entire corpus is assigned to every stanza, and italics that we outperform the 'less naïve' baseline, where we assign the most common scheme of the appropriate length from the gold standard of the given sub-corpus.

Lang | Sub-corpus (time period) | # of stanzas | Total # of lines | # of line-end words | Accuracy (%): EM induction | Accuracy (%): Naïve baseline | Accuracy (%): Less naïve baseline | F-Score: EM induction | F-Score: Naïve baseline | F-Score: Less naïve baseline
En | All | 11613 | 93030 | 13807 | 62.15 | 56.76 | 60.24 | 0.79 | 0.74 | 0.77
En | 1450-1550 | 197 | 1250 | 782 | 17.77 | 53.30 | 97.46 | 0.41 | 0.73 | 0.98
En | 1550-1650 | 3786 | 35485 | 7826 | 67.17 | 62.28 | 74.72 | 0.82 | 0.78 | 0.85
En | 1650-1750 | 2198 | 20110 | 4447 | 87.58 | 58.42 | 82.98 | 0.94 | 0.68 | 0.91
En | 1750-1850 | 2555 | 20598 | 5188 | 31.00 | 69.16 | 74.52 | 0.65 | 0.83 | 0.87
En | 1850-1950 | 2877 | 15587 | 4382 | 50.92 | 37.43 | 49.70 | 0.81 | 0.55 | 0.68
Fr | All | 2814 | 26543 | 10781 | 40.29 | 39.66 | 64.46 | 0.58 | 0.57 | 0.80
Fr | 1450-1550 | 1478 | 14126 | 7122 | 28.21 | 58.66 | 77.67 | 0.59 | 0.83 | 0.89
Fr | 1550-1650 | 1336 | 12417 | 5724 | 52.84 | 18.64 | 61.23 | 0.70 | 0.28 | 0.75
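The two baselines described in the Table 1 caption are simple enough to sketch directly. The Python below is an illustrative sketch, assuming gold-standard schemes are available as strings such as 'abab'. The function names are hypothetical, and ties between equally frequent schemes are broken by first occurrence (the behavior of `Counter.most_common`), which the caption does not specify.

```python
from collections import Counter, defaultdict


def majority_scheme_by_length(gold_schemes):
    """Most frequent gold-standard scheme for each stanza length."""
    counts = defaultdict(Counter)
    for scheme in gold_schemes:
        counts[len(scheme)][scheme] += 1
    return {n: c.most_common(1)[0][0] for n, c in counts.items()}


def baseline_accuracy(eval_gold, reference_gold):
    """Accuracy (%) from assigning every stanza in eval_gold the majority
    scheme of its length, where the majority is computed on reference_gold."""
    best = majority_scheme_by_length(reference_gold)
    hits = sum(best.get(len(s)) == s for s in eval_gold)
    return 100.0 * hits / len(eval_gold)


if __name__ == "__main__":
    sub_corpus = ["aabb", "aabb", "abab"]
    corpus = sub_corpus + ["abab", "abab", "abab"]
    print(baseline_accuracy(sub_corpus, corpus))      # naive: whole-corpus majority, ≈ 33.3
    print(baseline_accuracy(sub_corpus, sub_corpus))  # less naive: sub-corpus majority, ≈ 66.7
```

Passing the whole corpus as `reference_gold` gives the naïve baseline; passing the sub-corpus itself gives the 'less naïve' one. This is why the less naïve baseline can be much stronger on small, stylistically uniform sub-corpora, such as English 1450-1550 in Table 1.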
[…] contemporary dictionaries, and therefore benefits more from a model that assumes no pronunciation knowledge. (While we may get better results on older data using dictionaries that are historically accurate, these are not easily available, and require a great deal of effort and linguistic knowledge to create.)

Initializing θ as specified above and then running EM produces some improvement compared to orthographic similarity (Table 2).

Figure 1: Comparison of EM with a definition-based system.
(a) Accuracy and F-Score ratios of the rhyming-definition-based system over those of our model with orthographic similarity. The former is more accurate than EM for post-1850 data (ratio > 1), but is outperformed by our model for older poetry (ratio < 1), largely due to pronunciation changes like the Great Vowel Shift that alter rhyming relations.
(b) Some examples of rhymes in English found by EM but not by the definition-based system (due to divergence from the contemporary dictionary or rhyming definition), and vice versa (due to inadequate repetition):

Period | Found by EM | Found by definitions
1450-1550 | left/craft, shone/done | edify/lie, adieu/hue
1550-1650 | appeareth/weareth, speaking/breaking, proue/moue, doe/two | obtain/vain, amend/depend, breed/heed, prefers/hers
1650-1750 | most/cost, presage/rage, join'd/mind | see/family, blade/shade, noted/quoted
1750-1850 | desponds/wounds, o'er/shore, it/basket | gore/shore, ice/vice, head/tread, too/blew
1850-1950 | of/love, lover/half-over, again/rain | old/enfold, within/win, be/immortality

4 Accounting for Stanza Dependencies

So far, we have treated stanzas as being independent of each other. In reality, stanzas in a poem are usually generated using the same or similar rhyme schemes. Furthermore, some rhyme schemes span multiple stanzas; for example, the Italian form terza rima has the scheme aba bcb cdc ... (the 1st and 3rd lines rhyme with the 2nd line of the previous stanza).

4.1 Generative Model

We model stanza generation within a poem as a Markov process, where each stanza is conditioned on the previous one. To generate a poem $y$ consisting of $m$ stanzas, for each $k \in [1, m]$, generate a stanza $x^k$ of length $n_k$ as described below:

1. If $k = 1$, pick a rhyme scheme $r^k$ of length $n_k$ with probability $P(r^k)$, and generate the stanza as in the previous section.
2. If $k > 1$, pick a scheme $r^k$ of length $n_k$ with probability $P(r^k \mid r^{k-1})$. If no rhymes in $r^k$ are shared with the previous stanza's rhyme scheme, $r^{k-1}$, generate the stanza as before. If $r^k$ shares rhymes with $r^{k-1}$, generate the stanza as a continuation of $x^{k-1}$. For example, if $x^{k-1}$ = [dreams, lay, streams], $r^{k-1}$ = aba, and $r^k$ = bcb, the stanza $x^k$ should be generated so that $x^k_1$ and $x^k_3$ rhyme with lay.

4.2 Learning

This model for a poem can be formalized as an autoregressive HMM, a hidden Markov model where each observation is conditioned on the previous observation as well as on the latent state. The observation at time step $k$ is the stanza $x^k$, and the latent state at that time step is the rhyme scheme $r^k$. This model is parametrized by $\theta$ and $\tau$, where $\tau_{r,q} = P(r \mid q)$ for all schemes $r$ and $q$. $\theta$ is initialized with orthographic similarity. The learning algorithm follows from EM for HMMs and our earlier algorithm.

Expectation Step: Estimate $P(r \mid x)$ for each stanza in the poem using the forward-backward algorithm. The 'emission probability' $P(x \mid r)$ for the first stanza is the same as in §3, and for subsequent stanzas $x^k$, $k > 1$, it is given by:

$$P(x^k \mid x^{k-1}, r^k) = \prod_{i=1}^{n_k} \Big[ (1 - I_{i,r^k})\, P(x^k_i) + I_{i,r^k} \prod_{j<i:\, r^k_j = r^k_i} P(x^k_i \mid x^k_j) \prod_{j:\, r^{k-1}_j = r^k_i} P(x^k_i \mid x^{k-1}_j) \Big] \qquad (6)$$

Maximization Step: Update $\theta$ and $\tau$ analogously to HMM transition and emission probabilities.

4.3 Results

As Table 2 shows, there is considerable improvement over models that assume independent stanzas. The largest gains are found in French, which contains many instances of 'linked' stanzas like the terza rima, as well as in English data containing long poems made of several stanzas with the same scheme.

Table 2: Performance of EM with θ initialized by orthographic similarity (§3.5), pronunciation-based rhyming definitions (§3.6), and the HMM for stanza dependencies (§4). Bold and italics indicate that we outperform the naïve baselines shown in Table 1.

Lang | Sub-corpus (time period) | Accuracy (%): HMM stanzas | Accuracy (%): Rhyming definition init. | Accuracy (%): Orthographic init. | Accuracy (%): Uniform init. | F-Score: HMM stanzas | F-Score: Rhyming definition init. | F-Score: Orthographic init. | F-Score: Uniform init.
En | All | 72.48 | 64.18 | 63.08 | 62.15 | 0.88 | 0.84 | 0.83 | 0.79
En | 1450-1550 | 74.31 | 75.63 | 69.04 | 17.77 | 0.86 | 0.86 | 0.82 | 0.41
En | 1550-1650 | 79.17 | 69.76 | 71.98 | 67.17 | 0.90 | 0.86 | 0.88 | 0.82
En | 1650-1750 | 91.23 | 91.95 | 89.54 | 87.58 | 0.97 | 0.97 | 0.96 | 0.94
En | 1750-1850 | 49.11 | 42.74 | 33.62 | 31.00 | 0.82 | 0.77 | 0.70 | 0.65
En | 1850-1950 | 58.95 | 57.18 | 54.05 | 50.92 | 0.90 | 0.89 | 0.84 | 0.81
Fr | All | 56.47 | - | 48.90 | 40.29 | 0.81 | - | 0.75 | 0.58
Fr | 1450-1550 | 61.28 | - | 35.25 | 28.21 | 0.86 | - | 0.71 | 0.59
Fr | 1550-1650 | 67.96 | - | 63.40 | 52.84 | 0.79 | - | 0.77 | 0.70

5 Future Work

Some possible extensions of our work include automatically generating the set of possible rhyme schemes $R$, incorporating partial supervision into our algorithm, and finding better ways of using and adapting pronunciation information when available. We would also like to test our method on a range of languages and texts.

To return to the motivations, one could use the discovered annotations for machine translation of poetry, or to computationally reconstruct pronunciations, which is useful for historical linguistics as well as other applications involving out-of-vocabulary words.

Acknowledgments

We would like to thank Morgan Sonderegger for providing most of the annotated English data in the rhyming corpus and for helpful discussion, and the anonymous reviewers for their suggestions.