A Retrieve-and-Edit Framework for Predicting Structured Outputs

Tatsunori B. Hashimoto (Department of Computer Science, Stanford University) thashim@stanford.edu
Kelvin Guu (Department of Statistics, Stanford University) kguu@stanford.edu
Yonatan Oren (Department of Computer Science, Stanford University) yonatano@stanford.edu
Percy Liang (Department of Computer Science, Stanford University) pliang@cs.stanford.edu

Abstract

For the task of generating complex outputs such as source code, editing existing outputs can be easier than generating complex outputs from scratch. With this motivation, we propose an approach that first retrieves a training example based on the input (e.g., natural language description) and then edits it to the desired output (e.g., code). Our contribution is a computationally efficient method for learning a retrieval model that embeds the input in a task-dependent way without relying on a hand-crafted metric or incurring the expense of jointly training the retriever with the editor. Our retrieve-and-edit framework can be applied on top of any base model. We show that on a new autocomplete task for GitHub Python code and the Hearthstone cards benchmark, retrieve-and-edit significantly boosts the performance of a vanilla sequence-to-sequence model on both tasks.

1 Introduction

In prediction tasks with complex outputs, generating well-formed outputs is challenging, as is well known in natural language generation [20, 28]. However, the desired output might be a variation of another, previously observed example [14, 13, 30, 18, 24]. Other tasks ranging from music generation to program synthesis exhibit the same phenomenon: many songs borrow chord structure from other songs, and software engineers routinely adapt code from Stack Overflow. Motivated by these observations, we adopt the following retrieve-and-edit framework (Figure 1):

1. Retrieve: Given an input x, e.g., a natural language description 'Sum the first two elements in tmp', we use a retriever to choose a similar training example (x', y'), such as 'Sum the first 5 items in Customers'.
2. Edit: We then treat y' from the retrieved example as a "prototype" and use an editor to edit it into the desired output y appropriate for the input x.

While many existing methods combine retrieval and editing [13, 30, 18, 24], these approaches rely on a fixed hand-crafted or generic retrieval mechanism. One drawback to this approach is that designing a task-specific retriever is time-consuming, and a generic retriever may not perform well on tasks where x is structured or complex [40]. Ideally, the retrieval metric would be learned from the data in a task-dependent way: we wish to consider x and x' similar only if their corresponding outputs y and y' differ by a small, easy-to-perform edit. However, the straightforward way of training a retriever jointly with the editor would require summing over all possible x' for each example, which would be prohibitively slow.

In this paper, we propose a way to train a retrieval model that is optimized for the downstream edit task. We first train a noisy encoder-decoder model, carefully selecting the noise and embedding space to ensure that inputs that receive similar embeddings can be easily edited by an oracle editor. We then train the editor by retrieving according to this learned metric. The main advantage of this approach is that it is computationally efficient and requires no domain knowledge other than an encoder-decoder model with low reconstruction error.

We evaluate our retrieve-and-edit approach on a new Python code autocomplete dataset of 76k functions, where the task is to predict the next token given partially written code and a natural language description. We show that applying the retrieve-and-edit framework to a standard sequence-to-sequence model boosts its performance by 14 points in BLEU score [25]. Comparing retrieval methods, learned retrieval improves over a fixed, bag-of-words baseline by 6 BLEU. We also evaluate on the Hearthstone cards benchmark [22], where systems must predict a code snippet based on card properties and a natural language description. We show that augmenting a standard sequence-to-sequence model with the retrieve-and-edit approach improves the model by 7 BLEU and outperforms the best non-abstract-syntax-tree (AST) based model by 4 points.

2 Problem statement

Task. Our goal is to learn a model $p_{\text{model}}(y \mid x)$ that predicts an output y (e.g., a 5–15 line code snippet) given an input x (e.g., a natural language description) drawn from a distribution $p_{\text{data}}$. See Figure 1 for an illustrative example.

[Figure 1. The retrieve-and-edit approach consists of the retriever, which identifies a relevant example from the training set, and the editor, which predicts the output conditioned on the retrieved example. The figure shows the input "Sum the first two elements in tmp", the retrieved input "Sum the first 5 items in Customers", the prototype np.sum(Customers[:5]), and the generated output.]

Retrieve-and-edit. The retrieve-and-edit framework corresponds to the following generative process: given an input x, we first retrieve an example (x', y') from the training set D by sampling using a retriever of the form $p_{\text{ret}}((x', y') \mid x)$. We then generate an output y using an editor of the form $p_{\text{edit}}(y \mid x, (x', y'))$. The overall likelihood of generating y given x is

$p_{\text{model}}(y \mid x) = \sum_{(x', y') \in D} p_{\text{edit}}(y \mid x, (x', y'))\, p_{\text{ret}}((x', y') \mid x)$,  (1)

and the objective that we seek to maximize is

$\mathcal{L}(p_{\text{edit}}, p_{\text{ret}}) := \mathbb{E}_{(x,y) \sim p_{\text{data}}}[\log p_{\text{model}}(y \mid x)]$.  (2)

For simplicity, we focus on deterministic retrievers, where $p_{\text{ret}}((x', y') \mid x)$ is a point mass on a particular example (x', y'). This matches the typical approach for retrieve-and-edit methods, and we leave extensions to stochastic retrieval [14] and multiple retrievals [13] to future work.

Learning task-dependent similarity. As mentioned earlier, we would like the retriever to incorporate task-dependent similarity: two inputs x and x' should be considered similar only if the editor has a high likelihood of editing y' into y. The optimal retriever for a fixed editor would be one that maximizes the standard maximum marginal likelihood objective in equation (1). An initial idea to learn the retriever might be to optimize for maximum marginal likelihood using standard approaches such as gradient descent or expectation maximization (EM). However, both of these approaches involve summing over all training examples D on each training iteration, which is computationally intractable.

Instead, we break up the optimization problem into two parts. We first train the retriever in isolation, replacing the edit model $p_{\text{edit}}$ with an oracle editor $p^*_{\text{edit}}$ and optimizing a lower bound for the marginal likelihood under this editor. Then, given this retriever, we train the editor using the standard maximum likelihood objective. This decomposition makes it possible to avoid the computational difficulties of learning a task-dependent retrieval metric, but importantly, we will still be able to learn a retriever that is task-dependent.
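Before turning to training, the generative process of equations (1) and (2) with a deterministic retriever can be summarized in code. The following is a minimal sketch, not the paper's implementation: embed and editor are hypothetical stand-ins for the trained encoder of Section 3.1 and the editor of Section 3.2, and the brute-force cosine search stands in for the LSH index used later.

    import numpy as np

    def retrieve(x, train_set, embed):
        """Deterministic retriever: a point mass on the training example
        whose embedding is closest to embed(x) in cosine distance."""
        q = embed(x)
        q = q / np.linalg.norm(q)
        best, best_sim = None, -np.inf
        for (x_p, y_p) in train_set:
            e = embed(x_p)
            sim = float(q @ (e / np.linalg.norm(e)))
            if sim > best_sim:
                best, best_sim = (x_p, y_p), sim
        return best

    def retrieve_and_edit(x, train_set, embed, editor):
        x_p, y_p = retrieve(x, train_set, embed)  # retrieve step
        return editor.generate(x, x_p, y_p)       # edit step: y ~ p_edit(y | x, (x', y'))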

3 Learning to retrieve and edit

We first describe the procedure for training our retriever (Section 3.1), which consists of embedding the inputs x into a vector space (Section 3.1.1) and retrieving according to this embedding. We then describe the editor and its training procedure, which follows immediately from maximizing the marginal likelihood (Section 3.2).

3.1 Retriever

Sections 3.1.1–3.1.3 justify our training procedure as maximization of a lower bound on the likelihood; one can skip to Section 3.1.4 for the actual training procedure if desired.

We would like to train the retriever based on $\mathcal{L}$ (equation (2)), but we do not yet know the behavior of the editor. We can avoid this problem by optimizing the retriever $p_{\text{ret}}$, assuming the editor is the true conditional distribution over the targets y given the retrieved example (x', y') under the joint distribution $p_{\text{ret}}((x', y') \mid x)\, p_{\text{data}}(x, y)$. We call this the oracle editor for $p_{\text{ret}}$:

$p^*_{\text{edit}}(y \mid (x', y')) = \dfrac{\sum_{x} p_{\text{ret}}((x', y') \mid x)\, p_{\text{data}}(x, y)}{\sum_{x, y} p_{\text{ret}}((x', y') \mid x)\, p_{\text{data}}(x, y)}$.

The oracle editor gives rise to the following lower bound on $\sup_{p_{\text{edit}}} \mathcal{L}(p_{\text{ret}}, p_{\text{edit}})$:

$\mathcal{L}^*(p_{\text{ret}}) := \mathbb{E}_{(x,y) \sim p_{\text{data}}}\big[\mathbb{E}_{(x',y') \mid x \sim p_{\text{ret}}}[\log p^*_{\text{edit}}(y \mid (x', y'))]\big]$,  (3)

which follows from Jensen's inequality and using the particular editor $p^*_{\text{edit}}$ rather than the best possible $p_{\text{edit}}$.[1] Unlike the real editor $p_{\text{edit}}$, $p^*_{\text{edit}}$ does not condition on the input x, to ensure that the bound represents the quality of the retrieved example alone.

Next, we wish to find a further lower bound that takes the form of a distance minimization problem:

$\mathcal{L}^*(p_{\text{ret}}) \ge C - \mathbb{E}_{x \sim p_{\text{data}}}\big[\mathbb{E}_{x' \mid x \sim p_{\text{ret}}}[d(x', x)^2]\big]$,  (4)

where C is a constant independent of $p_{\text{ret}}$. The $p_{\text{ret}}$ that maximizes this lower bound is the deterministic retriever which finds the nearest neighbor to x under the metric d. In order to obtain such a lower bound, we will learn an encoder $p_\theta(v \mid x)$ and decoder $p_\phi(y \mid v)$ and use the distance metric in the latent space of v as our distance d. When $p_\theta(v \mid x)$ takes a particular form, we can show that this results in the desired lower bound (4).

3.1.1 The latent space as a task-dependent metric

Consider any encoder-decoder model with a probabilistic encoder $p_\theta(v \mid x)$ and decoder $p_\phi(y \mid v)$. We can show that there is a variational lower bound that takes a form similar to (4) and decouples $p_{\text{ret}}$ from the rest of the objective.

Proposition 1. For any densities $p_\theta(v \mid x)$ and $p_\phi(y \mid v)$ and random variables $(x, y, x', y') \sim p_{\text{ret}}((x', y') \mid x)\, p_{\text{data}}(x, y)$,

$\mathcal{L}^*(p_{\text{ret}}) \ge \underbrace{\mathbb{E}_{(x,y) \sim p_{\text{data}}}\big[\mathbb{E}_{v \sim p_\theta(v \mid x)}[\log p_\phi(y \mid v)]\big]}_{:= \mathcal{L}_{\text{reconstruct}}(\theta, \phi)} - \underbrace{\mathbb{E}_{x}\big[\mathbb{E}_{x' \mid x \sim p_{\text{ret}}}[\mathrm{KL}(p_\theta(v \mid x) \,\|\, p_\theta(v \mid x'))]\big]}_{:= \mathcal{L}_{\text{discrepancy}}(\theta, p_{\text{ret}})}$.  (5)

[1] This expression is the conditional entropy $H(y \mid x', y')$. An alternative interpretation of $\mathcal{L}^*$ is that maximization with respect to $p_{\text{ret}}$ is equivalent to maximizing the mutual information between y and (x', y').

Proof. The inequality follows from standard arguments on variational approximations. Since $p^*_{\text{edit}}(y \mid (x', y'))$ is the conditional distribution implied by the joint distribution over (x', y', x, y), we have

$\mathbb{E}_{y \mid x', y' \sim p^*_{\text{edit}}}[\log p^*_{\text{edit}}(y \mid (x', y'))] \ge \mathbb{E}_{y \mid x', y' \sim p^*_{\text{edit}}}\big[\log \int p_\phi(y \mid v)\, p_\theta(v \mid x')\, dv\big]$,

where $\int p_\phi(y \mid v)\, p_\theta(v \mid x')\, dv$ is just another distribution. Taking the expectation of both sides with respect to (x, x', y') and applying the law of total expectation yields

$\mathcal{L}^*(p_{\text{ret}}) \ge \mathbb{E}_{(x,y) \sim p_{\text{data}}}\,\mathbb{E}_{(x',y') \mid x \sim p_{\text{ret}}}\big[\log \int p_\phi(y \mid v)\, p_\theta(v \mid x')\, dv\big]$.  (6)

Next, we apply the standard evidence lower bound (ELBO) on the latent variable v with variational distribution $p_\theta(v \mid x)$. This continues the lower bounds:

$\ge \mathbb{E}_{(x,y) \sim p_{\text{data}}}\,\mathbb{E}_{(x',y') \mid x \sim p_{\text{ret}}}\big[\mathbb{E}_{v \mid x \sim p_\theta}[\log p_\phi(y \mid v)] - \mathrm{KL}(p_\theta(v \mid x) \,\|\, p_\theta(v \mid x'))\big]$
$\ge \mathbb{E}_{(x,y) \sim p_{\text{data}}}\big[\mathbb{E}_{v \mid x \sim p_\theta}[\log p_\phi(y \mid v)]\big] - \mathbb{E}_{x \sim p_{\text{data}}}\big[\mathbb{E}_{x' \sim p_{\text{ret}}}[\mathrm{KL}(p_\theta(v \mid x) \,\|\, p_\theta(v \mid x'))]\big]$,

where the last inequality is just collapsing expectations.

Proposition 1 takes the form of the desired lower bound (4), since it decouples the reconstruction term $\mathbb{E}_{(x,y) \sim p_{\text{data}}}\,\mathbb{E}_{v \mid x \sim p_\theta}[\log p_\phi(y \mid v)]$ from a discrepancy term $\mathrm{KL}(p_\theta(v \mid x) \,\|\, p_\theta(v \mid x'))$. However, there are two differences between the earlier lower bound (4) and our derived result: the KL divergence may not represent a distance metric, and there is dependence on unknown parameters $(\theta, \phi)$. We will resolve these problems next.

3.1.2 The KL divergence as a distance metric

We will now show that for a particular choice of $p_\theta$, the KL divergence $\mathrm{KL}(p_\theta(v \mid x) \,\|\, p_\theta(v \mid x'))$ takes the form of a squared distance metric. In particular, choose $p_\theta(v \mid x)$ to be a von Mises-Fisher distribution over unit vectors centered on the output of an encoder $\mu_\theta(x)$:

$p_\theta(v \mid x) = \mathrm{vMF}(v; \mu_\theta(x), \kappa) = C_\kappa \exp(\kappa\, \mu_\theta(x)^\top v)$,  (7)

where both v and $\mu_\theta(x)$ are unit vectors, and $C_\kappa$ is a normalization constant depending only on d and $\kappa$. The von Mises-Fisher distribution $p_\theta$ turns the KL divergence term into a squared Euclidean distance on the unit sphere (see Appendix A). This further simplifies the discrepancy term in (5) to

$\mathcal{L}_{\text{discrepancy}}(\theta, p_{\text{ret}}) = C\, \mathbb{E}_{x \sim p_{\text{data}}}\big[\mathbb{E}_{x' \sim p_{\text{ret}}}[\|\mu_\theta(x) - \mu_\theta(x')\|_2^2]\big]$.  (8)

The KL divergence on other distributions such as the Gaussian can also be expressed as a distance metric, but we choose the von Mises-Fisher since its KL divergence is upper bounded by a constant, a property that we will use next.
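The paper defers this computation to Appendix A, which is not reproduced here; the following sketch shows why the vMF KL divergence reduces to a squared Euclidean distance, assuming both distributions share the same concentration $\kappa$ and writing $A_d(\kappa)$ for the mean resultant length, i.e., $\mathbb{E}_{v \sim \mathrm{vMF}(\mu, \kappa)}[v] = A_d(\kappa)\, \mu$:

    \begin{aligned}
    \mathrm{KL}\big(\mathrm{vMF}(\mu_1,\kappa) \,\|\, \mathrm{vMF}(\mu_2,\kappa)\big)
      &= \mathbb{E}_{v \sim \mathrm{vMF}(\mu_1,\kappa)}\big[\kappa\,\mu_1^\top v - \kappa\,\mu_2^\top v\big]
         && \text{(normalizers cancel)} \\
      &= \kappa\, A_d(\kappa)\, (1 - \mu_1^\top \mu_2)
         && \text{since } \mathbb{E}[v] = A_d(\kappa)\,\mu_1 \\
      &= \tfrac{1}{2}\,\kappa\, A_d(\kappa)\, \|\mu_1 - \mu_2\|_2^2
         && \text{since } \|\mu_1 - \mu_2\|_2^2 = 2 - 2\mu_1^\top \mu_2 .
    \end{aligned}

Because $1 - \mu_1^\top \mu_2 \le 2$ for unit vectors, the same computation bounds the KL divergence by the constant $2\kappa A_d(\kappa)$, which is the boundedness property invoked in Section 3.1.3.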
The retriever $p_{\text{ret}}$ that minimizes (8) deterministically retrieves the x' that is closest to x according to the embedding. For efficiency, we implement this retriever using a cosine-LSH hash via the annoy Python library, which we found to be both accurate and scalable.
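The paper does not show its indexing code; as one illustration, a nearest-neighbor index over the encoder embeddings can be built with annoy's angular (cosine) metric roughly as follows. The calls below are annoy's actual API, but train_set, encode (the unit-norm encoder output $\mu_\theta$), and the dimension and tree-count values are illustrative assumptions.

    from annoy import AnnoyIndex

    dim = 300                            # placeholder embedding dimension
    index = AnnoyIndex(dim, 'angular')   # angular distance ~ Euclidean distance on unit vectors
    for i, (x_p, y_p) in enumerate(train_set):
        index.add_item(i, encode(x_p))   # encode(x) returns a length-dim list/array
    index.build(10)                      # 10 trees; more trees: better recall, larger index

    def retrieve(x):
        i = index.get_nns_by_vector(encode(x), 1)[0]  # approximate nearest neighbor id
        return train_set[i]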

3.1.3 Setting the encoder-decoder parameters $(\theta, \phi)$

Any choice of $(\theta, \phi)$ turns Proposition 1 into a lower bound of the form (4), but the bound can potentially be very loose if these parameters are chosen poorly. Joint optimization over $(\theta, \phi, p_{\text{ret}})$ is computationally expensive, as it requires a sum over the potential retrieved examples. Instead, we will optimize $\theta, \phi$ with respect to a conservative lower bound that is independent of $p_{\text{ret}}$. For the von Mises-Fisher distribution, $\mathrm{KL}(p_\theta(v \mid x) \,\|\, p_\theta(v \mid x')) \le 2C$, and thus

$\mathbb{E}_{(x,y) \sim p_{\text{data}}}\big[\mathbb{E}_{v \mid x \sim p_\theta}[\log p_\phi(y \mid v)]\big] - \mathbb{E}_{x \sim p_{\text{data}}}\big[\mathbb{E}_{x' \sim p_{\text{ret}}}[\mathrm{KL}(p_\theta(v \mid x) \,\|\, p_\theta(v \mid x'))]\big]$
$\ge \mathbb{E}_{(x,y) \sim p_{\text{data}}}\big[\mathbb{E}_{v \mid x \sim p_\theta}[\log p_\phi(y \mid v)]\big] - 2C$.

Therefore, we can optimize $\theta, \phi$ with respect to this worst-case bound. This lower bound objective is analogous to the recently proposed hyperspherical variational autoencoder and is straightforward to train using reparametrization gradients [9, 14, 38]. Our training procedure consists of applying minibatch stochastic gradient descent on $(\theta, \phi)$, where gradients involving v are computed with the reparametrization trick.

3.1.4 Overall procedure

The overall retrieval training procedure consists of two steps:

1. Train an encoder-decoder to map each input x into an embedding v that can reconstruct the output y:

$(\hat{\theta}, \hat{\phi}) := \arg\max_{\theta, \phi}\; \mathbb{E}_{(x,y) \sim p_{\text{data}}}\big[\mathbb{E}_{v \mid x \sim p_\theta}[\log p_\phi(y \mid v)]\big]$.  (9)

2. Set the retriever to be the deterministic nearest-neighbor input in the training set under the encoder:

$\hat{p}_{\text{ret}}(x', y' \mid x) := \mathbb{1}\big[(x', y') = \arg\min_{(x', y') \in D} \|\mu_{\hat{\theta}}(x) - \mu_{\hat{\theta}}(x')\|_2^2\big]$.  (10)
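As a minimal sketch of one gradient step for step 1 (equation (9)): encoder and decoder are assumed PyTorch-style modules, and sample_vmf is a hypothetical stand-in for a reparametrized von Mises-Fisher sampler along the lines of [9, 14, 38], which we do not reproduce here.

    def reconstruction_step(x_batch, y_batch, encoder, decoder, sample_vmf, kappa, optimizer):
        mu = encoder(x_batch)                         # unit-norm embeddings mu_theta(x)
        v = sample_vmf(mu, kappa)                     # reparametrized sample v ~ vMF(mu, kappa)
        loss = -decoder.log_prob(y_batch, v).mean()   # maximize E[log p_phi(y | v)]
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()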
3.2 Editor

The procedure in Section 3.1.4 returns a retriever $\hat{p}_{\text{ret}}$ that maximizes a lower bound on $\mathcal{L}^*$, which is defined in terms of the oracle editor $p^*_{\text{edit}}$. Since we do not have access to the oracle editor $p^*_{\text{edit}}$, we train the editor $p_{\text{edit}}$ to directly maximize $\mathcal{L}(p_{\text{edit}}, \hat{p}_{\text{ret}})$. Specifically, we solve the optimization problem

$\arg\max_{p_{\text{edit}}}\; \mathbb{E}_{(x,y) \sim p_{\text{data}}}\big[\mathbb{E}_{(x',y') \sim \hat{p}_{\text{ret}}}[\log p_{\text{edit}}(y \mid x, (x', y'))]\big]$.  (11)

In our experiments, we let $p_{\text{edit}}$ be a standard sequence-to-sequence model with attention and copying [12, 36] (see Appendix B for details), but any model architecture can be used for the editor.
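A minimal sketch of the editor's training loop under equation (11), with the retriever from Section 3.1.4 held fixed: editor.log_prob and retrieve are hypothetical stand-ins (the paper's editor is a sequence-to-sequence model with attention and copying [12, 36], but any editor exposing a log-likelihood would fit this loop).

    def train_editor(editor, train_set, retrieve, optimizer, num_epochs):
        for _ in range(num_epochs):
            for x, y in train_set:
                x_p, y_p = retrieve(x)  # deterministic retrieval under the learned metric
                # maximize log p_edit(y | x, (x', y')) by gradient ascent
                loss = -editor.log_prob(y, x, x_p, y_p)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()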

4 Experiments

We evaluate our retrieve-and-edit framework on two tasks. First, we consider a code autocomplete task over Python functions taken from GitHub and show that retrieve-and-edit substantially outperforms approaches based only on sequence-to-sequence models or retrieval. Then, we consider the Hearthstone cards benchmark and show that retrieve-and-edit can boost the accuracy of existing sequence-to-sequence models.

For both experiments, the dataset is processed by standard space-and-punctuation tokenization, and we run the retrieve-and-edit model with randomly initialized word vectors and $\kappa = 500$, which we obtained by evaluating BLEU scores on the development set of both datasets. Both the retriever and editor were trained for 1000 iterations on Hearthstone and 3000 on GitHub via ADAM minibatch gradient descent, with batch size 16 and a learning rate of 0.001.

4.1 Autocomplete on Python GitHub code

Given a natural language description of a Python function and a partially written code fragment, the task is to return a candidate list of k = 1, 5, 10 next tokens (Figure 2). A model predicts correctly if the ground truth token is in the candidate list. The performance of a model is defined in terms of the average or maximum number of successive tokens correctly predicted.
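As one reading of these completion-length metrics, the following sketch scores a sequence of top-k candidate lists against the ground-truth tokens; the paper's actual evaluation script may differ in details.

    def completion_lengths(candidate_lists, truth_tokens):
        """candidate_lists[i] is the top-k candidate list at position i.
        Returns (longest, average) length of runs of successively
        correct predictions."""
        runs, run = [], 0
        for cands, tok in zip(candidate_lists, truth_tokens):
            if tok in cands:
                run += 1
            else:
                if run > 0:
                    runs.append(run)
                run = 0
        if run > 0:
            runs.append(run)
        if not runs:
            return 0, 0.0
        return max(runs), sum(runs) / len(runs)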
Dataset. Our Python autocomplete dataset is a representative sample of Python code from GitHub, obtained from Google BigQuery by retrieving Python code containing at least one block comment with reStructuredText (reST) formatting (see Appendix C for details). We use this data to form a code prediction task where each example consists of four inputs: the block comment, function name, arguments, and a partially written function body. The output is the next token in the function body. To avoid the possibility that repository forks and duplicated library files result in a large number of duplicate functions, we explicitly deduplicated all files based on both the file contents and repository pathname. We also removed any duplicate function/docstring pairs and split the train and test set at the repository level. We tokenized using space and punctuation and kept only functions with at most 150 tokens, as the longer functions are nearly impossible to predict from the docstring. This resulted in a training set of 76k Python functions.

                                      Longest completed length    Avg completion length    BLEU
                                      k=1    k=5    k=10          k=1    k=5    k=10
    Retrieve-and-edit (Retrieve+Edit) 17.6   20.9   21.9          5.8    7.5    8.1       34.7
    Seq2Seq                           10.6   12.5   13.2          2.5    3.4    3.8       19.2
    Retriever only (TaskRetriever)    13.5                        4.7                     29.9

Table 1. Retrieve-and-edit substantially improves the performance over baseline sequence-to-sequence models (Seq2Seq) and trained retrieval without editing (TaskRetriever) on the Python autocomplete dataset. k indicates the number of candidates over beam search considered for predicting a token, and completion length is the number of successive tokens that are correctly predicted.

                        Longest completed length    Avg completion length    BLEU
    TaskRetriever       13.5                        4.7                      29.9
    InputRetriever      12.3                        4.1                      29.8
    LexicalRetriever     9.8                        3.4                      23.1

Table 2. Retrievers based on the noisy encoder-decoder (TaskRetriever) outperform a retriever based on bag-of-word vectors (LexicalRetriever). Learning an encoder-decoder on the inputs alone (InputRetriever) results in a slight loss in accuracy.

Results. Comparing the retrieve-and-edit model (Retrieve+Edit) to a sequence-to-sequence baseline (Seq2Seq) whose architecture and training procedure matches that of the editor, we find that retrieval adds substantial performance gains on all metrics with no domain knowledge or hand-crafted features (Table 1). We also evaluate various retrievers: TaskRetriever, which is our task-dependent retriever presented in Section 3.1; LexicalRetriever, which embeds the input tokens using bag-of-word vectors and retrieves based on cosine similarity (a sketch follows below); and InputRetriever, which uses the same encoder-decoder architecture as TaskRetriever but modifies the decoder to predict x rather than y. Table 2 shows that TaskRetriever significantly outperforms LexicalRetriever on all metrics, but is comparable to InputRetriever on BLEU and slightly better on the autocomplete metrics. We did not directly compare to abstract syntax tree (AST) based methods here, since they do not have a direct way to condition on partially generated code, which is needed for autocomplete.
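The paper does not spell out the LexicalRetriever; a minimal bag-of-words cosine retriever consistent with its description might look like the following, where train_set is assumed to be a list of (input tokens, output tokens) pairs.

    from collections import Counter
    import math

    def bow_vector(tokens):
        """Sparse bag-of-words vector as a token -> count mapping."""
        return Counter(tokens)

    def cosine(u, v):
        dot = sum(u[t] * v[t] for t in u if t in v)
        norm = math.sqrt(sum(c * c for c in u.values())) * \
               math.sqrt(sum(c * c for c in v.values()))
        return dot / norm if norm > 0 else 0.0

    def lexical_retrieve(x_tokens, train_set):
        """Retrieve the training pair whose input has the highest
        bag-of-words cosine similarity to x."""
        q = bow_vector(x_tokens)
        return max(train_set, key=lambda ex: cosine(q, bow_vector(ex[0])))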

Examples of predicted outputs in Figure 2 demonstrate that the docstring does not fully specify the structure of the output code. Despite this, the retrieval-based methods are sometimes able to retrieve relevant functions. In the example, the retriever learns to return a function that has a similar conditional check. Retrieve+Edit does not have enough information to predict the true function and therefore predicts a generic conditional (if not b_data). In contrast, the Seq2Seq defaults to predicting a generic getter function rather than a conditional.

[Figure 2. Example from the Python autocomplete dataset along with the retrieved example used during prediction (top center) and baselines (right panels). The edited output (bottom center) mostly follows the retrieved example but replaces the conditional with a generic one. The example is an is_encrypted(b_data) function whose docstring asks whether a data blob is recognized as vault-encrypted.]

4.2 Hearthstone cards benchmark

The Hearthstone cards benchmark consists of 533 cards in a computer card game, where each card is associated with a code snippet. The task is to output a Python class given a card description. Figure 3 shows a typical example along with the retrieved example and edited output. The small size of this dataset makes it challenging for sequence-to-sequence models to avoid overfitting to the training set. Indeed, it has been observed that naive sequence-to-sequence approaches perform quite poorly [22]. For quantitative evaluation, we compute BLEU and exact match probabilities using the tokenization and evaluation scheme of [41].

                                            BLEU    Accuracy
    AST based
      Abstract Syntax Network (ASN) [26]    79.2    22.7
      Yin et al. [41]                       75.8    16.2
    Non-AST models
      Retrieve+Edit (this work)             70.0     9.1
      Latent Predictor Network [22]         65.6     4.5
      Retriever [22]                        62.5     0.0
      Sequence-to-sequence [22]             60.4     1.5
      Statistical MT [22]                   43.2     0.0

Table 3. Retrieve-and-edit substantially improves upon standard sequence-to-sequence approaches for Hearthstone, and closes the gap to AST-based models.

Retrieve+Edit provides a 7-point improvement in BLEU over the sequence-to-sequence and retrieval baselines (Table 3) and 4 points over the best non-AST-based method, despite the fact that our editor is a vanilla sequence-to-sequence model. Methods based on ASTs still achieve the highest BLEU and exact match scores, but we are able to significantly narrow the gap between specialized code generation techniques and vanilla sequence-to-sequence models if the latter is boosted with the retrieve-and-edit framework. Note that retrieve-and-edit could also be applied to AST-based models, which would be an interesting direction for future work.

Analysis of example outputs shows that for the most part, the retriever finds relevant cards. As an example, Figure 3 shows a retrieved card (Dark Iron Dwarf) that functions similarly to the desired output (Spellbreaker). Both cards share the same card type and attributes, both have a battlecry, which is a piece of code that executes whenever the card is played, and this battlecry consists of modifying the attributes of another card. Our predicted output corrects nearly all mistakes in the retrieved output, identifying that the modification should be changed from ChangeAttack to Silence. The output differs from the gold standard on only one line: omitting the line minion_type=MINION_TYPE.NONE. Incidentally, it turns out that this is not an actual semantic error, since MINION_TYPE.NONE is the default setting for this field, and the retrieved Dark Iron Dwarf card also omits this field.

[Figure 3. Example from the Hearthstone validation set (left panels) and the retrieved example used during prediction (top right). The output (bottom right) differs from the gold standard only in omitting an optional variable definition (minion_type). The input card is Spellbreaker (ATK 4, DEF 3, COST 4, Type: Minion, Class: Neutral, Rarity: Common), and the ground-truth class is:

    class Spellbreaker(MinionCard):
        def __init__(self):
            super().__init__("Spellbreaker", 4, CHARACTER_CLASS.ALL, CARD_RARITY.COMMON,
                             minion_type=MINION_TYPE.NONE,
                             battlecry=Battlecry(Silence(),
                                                 MinionSelector(players=BothPlayer(),
                                                                picker=UserPicker())))

        def create_minion(self, player):
            return Minion(4, 3)

The edited output is identical except that it omits the minion_type argument.]

5 Related work

Retrieval models for text generation. The use of retrieval in text generation dates back to early example-based machine translation systems that retrieved and adapted phrases from a translation database [33]. Recent work on dialogue generation [40, 30, 37] proposed a joint system in which an RNN is trained to transform a retrieved candidate. Closely related work in machine translation [13] augments a neural machine translation model with sentence pairs from the training set retrieved by an off-the-shelf search engine. Retrieval-augmented models have also been used in image captioning [18, 24]. These models generate captions of an image via a sentence compression scheme from an initial caption retrieved based on image context. Our work differs from all the above conceptually in designing retrieval systems explicitly for the task of editing, rather than using fixed retrievers (e.g., based on lexical overlap). Our work also demonstrates that retrieve-and-edit can boost the performance of vanilla sequence-to-sequence models without the use of domain-specific retrievers.

A related edit-based model [14] has also proposed editing examples as a way to augment text generation. However, the task there was unconditional generation, and examples were chosen by random sampling. In contrast, our work focuses on conditional sequence generation with a deterministic retriever, which cannot be solved using the same random sampling and editing approach.

Embedding models. Embedding sentences using noisy autoencoders has been proposed earlier as a sentence VAE [5], which demonstrated that a Gaussian VAE captures semantic structure in a latent vector space. Related work on using the von Mises-Fisher distribution for VAEs shows that sentences can also be represented using latent vectors on the unit sphere [9, 14, 38]. Our encoder-decoder is based on the same type of VAE, showing that the latent space of a noisy encoder-decoder is appropriate for retrieval. Semantic hashing by autoencoders [16] is a related idea where an autoencoder's latent representation is used to construct a hash function to identify similar images or texts [29, 6]. A related idea is cross-modal embeddings, which jointly embed and align items in different domains (such as images and captions) using autoencoders [39, 2, 31, 10]. Both of these approaches seek to learn general similarity metrics between examples for the purpose of identifying documents or images that are semantically similar. Our work differs from these approaches in that we consider task-specific embeddings that consider items to be similar only if they are useful for the downstream edit task, and derive bounds that connect similarity in a latent metric to editability.

Learned retrieval. Some question answering systems learn to retrieve based on supervision of the correct item to retrieve [35, 27, 19], but these approaches do not apply to our setting, since we do not know which items are easy to edit into our target sequence y and must instead estimate this from the embedding. There have also been recent proposals for scalable large-scale learned memory models [34] that can learn a retrieval mechanism based on a known reward. While these approaches make training $p_{\text{ret}}$ tractable for a known $p_{\text{edit}}$, they do not resolve the problem that $p_{\text{edit}}$ is not fixed or known.

Code generation. Code generation is well studied [21, 17, 4, 23, 1], but these approaches have not explored edit-based generation. Recent code generation models have also constrained the output structure based on ASTs [26, 41] or used specialized copy mechanisms for code [22]. Our goal differs from these works in that we use retrieve-and-edit as a general-purpose method to boost model performance. We considered simple sequence-to-sequence models as an example, but the framework is agnostic to the editor and could also be used with specialized code generation models. Recent work appearing after submission of this work supports this hypothesis by showing that augmenting AST-based models with AST subtrees retrieved via edit distance can boost the performance of AST-based models [15].

Nonparametric models and mixture models. Our model is related to nonparametric regression techniques [32], where in our case, proximity learned by the encoder corresponds to a neighborhood, and the editor is a learned kernel. Adaptive kernels for nonparametric regression are well studied [11] but have mainly focused on learning local smoothness parameters rather than the functional form of the kernel. More generally, the idea of conditioning on retrieved examples is an instance of a mixture model, and these types of ensembling approaches have been shown to boost the performance of simple base models on tasks such as language modeling [7]. One can view retrieve-and-edit as another type of mixture model.

6 Discussion

In this work, we considered the task of generating complex outputs such as source code using standard sequence-to-sequence models augmented by a learned retriever. We show that learning a retriever using a noisy encoder-decoder can naturally combine the desire to retrieve examples that maximize downstream editability with the computational efficiency of cosine LSH. Using this approach, we demonstrated that our model can narrow the gap between specialized code generation models and vanilla sequence-to-sequence models on the Hearthstone dataset, and show substantial improvements on a Python code autocomplete task over sequence-to-sequence baselines.

Reproducibility. Data and code used to generate the results of this paper are available on the CodaLab Worksheets platform at https://worksheets.codalab.org/worksheets/0x1ad3f387005c492ea913cf0f20c9bb89/.

Acknowledgements. This work was funded by the DARPA CwC program under ARO prime contract no. W911NF-15-1-0462.

References

[1] M. Allamanis, D. Tarlow, A. Gordon, and Y. Wei. Bimodal modelling of source code and natural language. In International Conference on Machine Learning (ICML), pages 2123–2132, 2015.
[2] G. Andrew, R. Arora, J. Bilmes, and K. Livescu. Deep canonical correlation analysis. In International Conference on Machine Learning (ICML), pages 1247–1255, 2013.
[3] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations (ICLR), 2015.
[4] M. Balog, A. L. Gaunt, M. Brockschmidt, S. Nowozin, and D. Tarlow. DeepCoder: Learning to write programs. arXiv preprint arXiv:1611.01989, 2016.
[5] S. R. Bowman, L. Vilnis, O. Vinyals, A. M. Dai, R. Jozefowicz, and S. Bengio. Generating sentences from a continuous space. In Computational Natural Language Learning (CoNLL), pages 10–21, 2016.
[6] S. Chaidaroon and Y. Fang. Variational deep semantic hashing for text documents. In ACM Special Interest Group on Information Retrieval (SIGIR), pages 75–84, 2017.
[7] C. Chelba, T. Mikolov, M. Schuster, Q. Ge, T. Brants, P. Koehn, and T. Robinson. One billion word benchmark for measuring progress in statistical language modeling. arXiv preprint arXiv:1312.3005, 2013.
[8] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, 2014.
[9] T. R. Davidson, L. Falorsi, N. D. Cao, T. Kipf, and J. M. Tomczak. Hyperspherical variational auto-encoders. arXiv preprint arXiv:1804.00891, 2018.
[10] F. Feng, X. Wang, and R. Li. Cross-modal retrieval with correspondence autoencoder. In Proceedings of the 22nd ACM International Conference on Multimedia, pages 7–16, 2014.
[11] A. Goldenshluger and A. Nemirovski. On spatially adaptive estimation of nonparametric regression. Mathematical Methods of Statistics, 6:135–170, 1997.
[12] J. Gu, Z. Lu, H. Li, and V. O. Li. Incorporating copying mechanism in sequence-to-sequence learning. In Association for Computational Linguistics (ACL), 2016.
[13] J. Gu, Y. Wang, K. Cho, and V. O. Li. Search engine guided non-parametric neural machine translation. arXiv preprint arXiv:1705.07267, 2017.
[14] K. Guu, T. B. Hashimoto, Y. Oren, and P. Liang. Generating sentences by editing prototypes. Transactions of the Association for Computational Linguistics (TACL), 0, 2018.
[15] S. A. Hayati, R. Olivier, P. Avvaru, P. Yin, A. Tomasic, and G. Neubig. Retrieval-based neural code generation. In Empirical Methods in Natural Language Processing (EMNLP), 2018.
[16] A. Krizhevsky and G. E. Hinton. Using very deep autoencoders for content-based image retrieval. In 19th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), pages 489–494, 2011.
[17] N. Kushman and R. Barzilay. Using semantic unification to generate regular expressions from natural language. In Human Language Technology and North American Association for Computational Linguistics (HLT/NAACL), pages 826–836, 2013.
[18] P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Generalizing image captions for image-text parallel corpus. In Association for Computational Linguistics (ACL), pages 790–796, 2013.
[19] T. Lei, H. Joshi, R. Barzilay, T. Jaakkola, K. Tymoshenko, A. Moschitti, and L. Marquez. Semi-supervised question retrieval with gated convolutions. In North American Association for Computational Linguistics (NAACL), pages 1279–1289, 2016.
[20] J. Li, W. Monroe, T. Shi, A. Ritter, and D. Jurafsky. Adversarial learning for neural dialogue generation. arXiv preprint arXiv:1701.06547, 2017.

[21] P. Liang, M. I. Jordan, and D. Klein. Learning programs: A hierarchical Bayesian approach. In International Conference on Machine Learning (ICML), pages 639–646, 2010.
[22] W. Ling, E. Grefenstette, K. M. Hermann, T. Kociský, A. Senior, F. Wang, and P. Blunsom. Latent predictor networks for code generation. In Association for Computational Linguistics (ACL), pages 599–609, 2016.
[23] C. Maddison and D. Tarlow. Structured generative models of natural source code. In International Conference on Machine Learning (ICML), pages 649–657, 2014.
[24] R. Mason and E. Charniak. Domain-specific image captioning. In Computational Natural Language Learning (CoNLL), pages 2–10, 2014.
[25] K. Papineni, S. Roukos, T. Ward, and W. Zhu. BLEU: A method for automatic evaluation of machine translation. In Association for Computational Linguistics (ACL), 2002.
[26] M. Rabinovich, M. Stern, and D. Klein. Abstract syntax networks for code generation and semantic parsing. In Association for Computational Linguistics (ACL), 2017.
[27] A. Severyn and A. Moschitti. Learning to rank short text pairs with convolutional deep neural networks. In ACM Special Interest Group on Information Retrieval (SIGIR), pages 373–382, 2015.
[28] L. Shao, S. Gouws, D. Britz, A. Goldie, B. Strope, and R. Kurzweil. Generating high-quality and informative conversation responses with sequence-to-sequence models. In Empirical Methods in Natural Language Processing (EMNLP), pages 2210–2219, 2017.
[29] D. Shen, Q. Su, P. Chapfuwa, W. Wang, G. Wang, R. Henao, and L. Carin. NASH: Toward end-to-end neural architecture for generative semantic hashing. In Association for Computational Linguistics (ACL), pages 2041–2050, 2018.
[30] Y. Song, R. Yan, X. Li, D. Zhao, and M. Zhang. Two are better than one: An ensemble of retrieval- and generation-based dialog systems. arXiv preprint arXiv:1610.07149, 2016.
[31] N. Srivastava and R. R. Salakhutdinov. Multimodal learning with deep Boltzmann machines. In Advances in Neural Information Processing Systems (NIPS), pages 2222–2230, 2012.
[32] C. J. Stone. Consistent nonparametric regression. Annals of Statistics, 5, 1977.
[33] E. Sumita and H. Iida. Experiments and prospects of example-based machine translation. In Association for Computational Linguistics (ACL), 1991.
[34] W. Sun, A. Beygelzimer, H. Daume, J. Langford, and P. Mineiro. Contextual memory trees. arXiv preprint arXiv:1807.06473, 2018.
[35] M. Tan, C. dos Santos, B. Xiang, and B. Zhou. LSTM-based deep learning models for non-factoid answer selection. arXiv preprint arXiv:1511.04108, 2015.
[36] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, et al. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.
[37] Y. Wu, F. Wei, S. Huang, Z. Li, and M. Zhou. Response generation by context-aware prototype editing. arXiv preprint arXiv:1806.07042, 2018.
[38] J. Xu and G. Durrett. Spherical latent spaces for stable variational autoencoders. In Empirical Methods in Natural Language Processing (EMNLP), 2018.
[39] F. Yan and K. Mikolajczyk. Deep correlation for matching images and text. In Computer Vision and Pattern Recognition (CVPR), pages 3441–3450, 2015.
[40] R. Yan, Y. Song, and H. Wu. Learning to respond with deep neural networks for retrieval-based human-computer conversation system. In ACM Special Interest Group on Information Retrieval (SIGIR), pages 55–64, 2016.
[41] P. Yin and G. Neubig. A syntactic neural model for general-purpose code generation. In Association for Computational Linguistics (ACL), pages 440–450, 2017.
