Distributed Representations of Sentences and Documents






semantically similar words have similar vector representations (e.g., "strong" is close to "powerful"). Following these successful techniques, researchers have tried to extend the models to go beyond word level to achieve phrase-level or sentence-level representations (Mitchell & Lapata, 2010; Zanzotto et al., 2010; Yessenalina & Cardie, 2011; Grefenstette et al., 2013; Mikolov et al., 2013c). For instance, a simple approach is using a weighted average of all the words in the document. A more sophisticated approach is combining the word vectors in an order given by a parse tree of a sentence, using matrix-vector operations (Socher et al., 2011b). Both approaches have weaknesses. The first approach, weighted averaging of word vectors, loses the word order in the same way as the standard bag-of-words models do. The second approach, using a parse tree to combine word vectors, has been shown to work only for sentences because it relies on parsing.

Paragraph Vector is capable of constructing representations of input sequences of variable length. Unlike some of the previous approaches, it is general and applicable to texts of any length: sentences, paragraphs, and documents. It does not require task-specific tuning of the word weighting function, nor does it rely on parse trees. Further in the paper, we will present experiments on several benchmark datasets that demonstrate the advantages of Paragraph Vector. For example, on a sentiment analysis task we achieve new state-of-the-art results, better than complex methods, yielding a relative improvement of more than 16% in terms of error rate. On a text classification task, our method convincingly beats bag-of-words models, giving a relative improvement of about 30%.

2. Algorithms

We start by discussing previous methods for learning word vectors. These methods are the inspiration for our Paragraph Vector methods.

2.1. Learning Vector Representation of Words

This section introduces the concept of distributed vector representation of words. A well known framework for learning the word vectors is shown in Figure 1. The task is to predict a word given the other words in a context.

Figure 1. A framework for learning word vectors. Context of three words ("the," "cat," and "sat") is used to predict the fourth word ("on"). The input words are mapped to columns of the matrix W to predict the output word.

In this framework, every word is mapped to a unique vector, represented by a column in a matrix W. The column is indexed by the position of the word in the vocabulary. The concatenation or sum of the vectors is then used as features for prediction of the next word in a sentence.

More formally, given a sequence of training words w_1, w_2, w_3, ..., w_T, the objective of the word vector model is to maximize the average log probability

    \frac{1}{T} \sum_{t=k}^{T-k} \log p(w_t \mid w_{t-k}, \ldots, w_{t+k})

The prediction task is typically done via a multiclass classifier, such as softmax. There, we have

    p(w_t \mid w_{t-k}, \ldots, w_{t+k}) = \frac{e^{y_{w_t}}}{\sum_i e^{y_i}}

Each of the y_i is the un-normalized log-probability for output word i, computed as

    y = b + U h(w_{t-k}, \ldots, w_{t+k}; W)    (1)

where U and b are the softmax parameters and h is constructed by a concatenation or average of word vectors extracted from W.

In practice, hierarchical softmax (Morin & Bengio, 2005; Mnih & Hinton, 2008; Mikolov et al., 2013c) is preferred to softmax for fast training. In our work, the structure of the hierarchical softmax is a binary Huffman tree, where short codes are assigned to frequent words. This is a good speedup trick because common words are accessed quickly. This use of the binary Huffman code for the hierarchy is the same as in (Mikolov et al., 2013c).
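To make equation (1) concrete, the following is a minimal NumPy sketch of the forward pass of this word-vector framework: the context word columns of W are concatenated into h, mapped through the softmax parameters U and b, and normalized into a distribution over the vocabulary. The sizes, names, and the use of a plain softmax (rather than the hierarchical softmax the paper prefers for speed) are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Hypothetical sizes: vocabulary of M words, word vectors of dimension q, half-window k.
M, q, k = 10000, 100, 3
rng = np.random.default_rng(0)

W = rng.normal(scale=0.01, size=(q, M))          # word vectors, one column per vocabulary word
U = rng.normal(scale=0.01, size=(M, 2 * k * q))  # softmax weights for the concatenation variant
b = np.zeros(M)                                  # softmax biases

def predict_word(context_ids):
    """Return p(w_t | w_{t-k}, ..., w_{t+k}) over the whole vocabulary.

    h is the concatenation of the context word columns of W; an average would
    work the same way, with U shaped (M, q) instead of (M, 2*k*q)."""
    h = np.concatenate([W[:, i] for i in context_ids])  # features built from the context
    y = b + U @ h                                       # un-normalized log-probabilities (eq. 1)
    y -= y.max()                                        # subtract max for numerical stability
    e = np.exp(y)
    return e / e.sum()                                  # softmax

# Example: the 2*k = 6 surrounding words given as vocabulary indices.
probs = predict_word([12, 7, 431, 9, 88, 5])
print(probs.shape, round(float(probs.sum()), 6))        # (10000,) 1.0
```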
The neural network based word vectors are usually trained using stochastic gradient descent where the gradient is obtained via backpropagation (Rumelhart et al., 1986). This type of model is commonly known as a neural language model (Bengio et al., 2006). A particular implementation of a neural network based algorithm for training the word vectors is available at code.google.com/p/word2vec/ (Mikolov et al., 2013a).

After the training converges, words with similar meaning are mapped to similar positions in the vector space. For example, "powerful" and "strong" are close to each other, whereas "powerful" and "Paris" are more distant. The differences between word vectors also carry meaning. For example, the word vectors can be used to answer analogy questions using simple vector algebra: "King" - "man" + "woman" = "Queen" (Mikolov et al., 2013d). It is also possible to learn a linear matrix to translate words and phrases between languages (Mikolov et al., 2013b). These properties make word vectors attractive for many natural language processing tasks such as language modeling (Bengio et al., 2006; Mikolov, 2012), natural language understanding (Collobert & Weston, 2008; Zhila et al., 2013), statistical machine translation (Mikolov et al., 2013b; Zou et al., 2013), image understanding (Frome et al., 2013) and relational extraction (Socher et al., 2013a).

2.2. Paragraph Vector: A distributed memory model

Our approach for learning paragraph vectors is inspired by the methods for learning the word vectors. The inspiration is that the word vectors are asked to contribute to a prediction task about the next word in the sentence. So despite the fact that the word vectors are initialized randomly, they can eventually capture semantics as an indirect result of the prediction task. We will use this idea in our paragraph vectors in a similar manner: the paragraph vectors are also asked to contribute to the prediction task of the next word given many contexts sampled from the paragraph.

In our Paragraph Vector framework (see Figure 2), every paragraph is mapped to a unique vector, represented by a column in matrix D, and every word is also mapped to a unique vector, represented by a column in matrix W. The paragraph vector and word vectors are averaged or concatenated to predict the next word in a context. In the experiments, we use concatenation as the method to combine the vectors.

More formally, the only change in this model compared to the word vector framework is in equation 1, where h is constructed from W and D. The paragraph token can be thought of as another word. It acts as a memory that remembers what is missing from the current context, or the topic of the paragraph. For this reason, we often call this model the Distributed Memory Model of Paragraph Vectors (PV-DM).

The contexts are fixed-length and sampled from a sliding window over the paragraph. The paragraph vector is shared across all contexts generated from the same paragraph but not across paragraphs. The word vector matrix W, however, is shared across paragraphs; i.e., the vector for "powerful" is the same for all paragraphs.

The paragraph vectors and word vectors are trained using stochastic gradient descent and the gradient is obtained via backpropagation. At every step of stochastic gradient descent, one can sample a fixed-length context from a random paragraph, compute the error gradient from the network in Figure 2 and use the gradient to update the parameters in our model.
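The following is a minimal sketch of one such PV-DM training step under illustrative assumptions (a plain softmax instead of the paper's hierarchical softmax, hypothetical sizes and names): this paragraph's column of D is concatenated with the context word columns of W, the next word is predicted, and the gradient updates D, W, U and b in place.

```python
import numpy as np

# Hypothetical sizes: N paragraphs mapped to p dimensions, M vocabulary words to q dimensions.
N, p = 500, 100        # number of paragraphs, paragraph-vector dimension
M, q = 10000, 100      # vocabulary size, word-vector dimension
window = 3             # context words fed in alongside the paragraph vector
rng = np.random.default_rng(1)

D = rng.normal(scale=0.01, size=(p, N))               # paragraph vectors, one column each
W = rng.normal(scale=0.01, size=(q, M))               # word vectors, shared across paragraphs
U = rng.normal(scale=0.01, size=(M, p + window * q))  # softmax weights
b = np.zeros(M)                                       # softmax biases

def pv_dm_step(D, W, U, b, par_id, context_ids, target_id, lr=0.025):
    """One SGD step of PV-DM: concatenate this paragraph's column of D with the
    context word columns of W, predict the next word with a plain softmax, and
    update D, W, U, b in place."""
    p, q = D.shape[0], W.shape[0]
    h = np.concatenate([D[:, par_id]] + [W[:, i] for i in context_ids])
    y = b + U @ h                          # un-normalized log-probabilities (eq. 1)
    y -= y.max()
    probs = np.exp(y)
    probs /= probs.sum()
    dy = probs
    dy[target_id] -= 1.0                   # gradient of the cross-entropy loss w.r.t. y
    dh = U.T @ dy                          # gradient flowing back into the features

    U -= lr * np.outer(dy, h)              # NumPy updates these arrays in place
    b -= lr * dy
    D[:, par_id] -= lr * dh[:p]            # only this paragraph's vector changes
    for j, i in enumerate(context_ids):    # word vectors are shared across paragraphs
        W[:, i] -= lr * dh[p + j * q : p + (j + 1) * q]

# Example: paragraph 42, a 3-word context (vocabulary ids), target word id 88.
pv_dm_step(D, W, U, b, par_id=42, context_ids=[12, 7, 431], target_id=88)
```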
At prediction time, one needs to perform an inference step to compute the paragraph vector for a new paragraph. This is also obtained by gradient descent. In this step, the parameters for the rest of the model, the word vectors W and the softmax weights, are fixed.

Suppose that there are N paragraphs in the corpus, M words in the vocabulary, and we want to learn paragraph vectors such that each paragraph is mapped to p dimensions and each word is mapped to q dimensions; then the model has a total of N x p + M x q parameters (excluding the softmax parameters). Even though the number of parameters can be large when N is large, the updates during training are typically sparse and thus efficient.

Figure 2. A framework for learning paragraph vectors. This framework is similar to the framework presented in Figure 1; the only change is the additional paragraph token that is mapped to a vector via matrix D. In this model, the concatenation or average of this vector with a context of three words is used to predict the fourth word. The paragraph vector represents the missing information from the current context and can act as a memory of the topic of the paragraph.

After being trained, the paragraph vectors can be used as features for the paragraph (e.g., in lieu of or in addition to bag-of-words). We can feed these features directly to conventional machine learning techniques such as logistic regression, support vector machines or K-means.

In summary, the algorithm itself has two key stages: 1) training to get word vectors W, softmax weights U, b and paragraph vectors D on already seen paragraphs; and 2) "the inference stage" to get paragraph vectors D for new paragraphs (never seen before) by adding more columns in D and gradient descending on D while holding W, U, b fixed. We use D to make a prediction about some particular labels using a standard classifier, e.g., logistic regression.

Tasks and Baselines: In (Socher et al., 2013b), the authors propose two ways of benchmarking. First, one could consider a 5-way fine-grained classification task where the labels are {Very Negative, Negative, Neutral, Positive, Very Positive}, or a 2-way coarse-grained classification task where the labels are {Negative, Positive}. The other axis of variation is in terms of whether we should label the entire sentence or all phrases in the sentence. In this work we only consider labeling the full sentences.

Socher et al. (2013b) apply several methods to this dataset and find that their Recursive Neural Tensor Network works much better than a bag-of-words model. It can be argued that this is because movie reviews are often short and compositionality plays an important role in deciding whether the review is positive or negative, as does similarity between words, given the rather tiny size of the training set.

Experimental protocols: We follow the experimental protocols as described in (Socher et al., 2013b). To make use of the available labeled data, in our model each subphrase is treated as an independent sentence and we learn the representations for all the subphrases in the training set. After learning the vector representations for training sentences and their subphrases, we feed them to a logistic regression to learn a predictor of the movie rating.

At test time, we freeze the vector representation for each word and learn the representations for the sentences using gradient descent. Once the vector representations for the test sentences are learned, we feed them through the logistic regression to predict the movie rating.

In our experiments, we cross validate the window size using the validation set, and the optimal window size is 8. The vector presented to the classifier is a concatenation of two vectors, one from PV-DBOW and one from PV-DM. In PV-DBOW, the learned vector representations have 400 dimensions. In PV-DM, the learned vector representations have 400 dimensions for both words and paragraphs. To predict the 8th word, we concatenate the paragraph vectors and 7 word vectors. Special characters such as , . ! ? are treated as a normal word. If the paragraph has less than 9 words, we pre-pad with a special NULL word symbol.
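One way to approximate this two-stage pipeline with off-the-shelf tools is gensim's Doc2Vec, which provides PV-DM (dm=1), PV-DBOW (dm=0) and an infer_vector inference step; it is not the authors' implementation and differs in details. The toy corpus, labels and epoch counts below are placeholders, while the 400 dimensions and window of 8 follow the settings described above.

```python
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LogisticRegression

# Placeholder corpus: (tokenized review, sentiment label) pairs.
train_texts = [("this movie was powerful and moving".split(), 1),
               ("a dull , predictable film".split(), 0)]
test_texts  = [("strong performances throughout".split(), 1)]

train_docs = [TaggedDocument(words, [i]) for i, (words, _) in enumerate(train_texts)]

# Stage 1: train PV-DM (dm=1) and PV-DBOW (dm=0) on the already seen paragraphs.
pv_dm   = Doc2Vec(train_docs, dm=1, vector_size=400, window=8, min_count=1, epochs=50)
pv_dbow = Doc2Vec(train_docs, dm=0, vector_size=400, window=8, min_count=1, epochs=50)

def features(words):
    # Stage 2: inference. Word vectors and softmax weights stay fixed and
    # gradient descent is run only on the new paragraph's vector.
    return np.concatenate([pv_dm.infer_vector(words), pv_dbow.infer_vector(words)])

X_train = np.array([features(words) for words, _ in train_texts])
y_train = np.array([label for _, label in train_texts])
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

X_test = np.array([features(words) for words, _ in test_texts])
print(clf.predict(X_test))
```

In practice one would also cross-validate the window size on held-out data rather than fixing it, as described above.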
Results: We report the error rates of different methods in Table 1. The first highlight of this table is that bag-of-words or bag-of-n-grams models (NB, SVM, BiNB) perform poorly. Simply averaging the word vectors (in a bag-of-words fashion) does not improve the results. This is because bag-of-words models do not consider how each sentence is composed (e.g., word ordering) and therefore fail to recognize many sophisticated linguistic phenomena, for instance sarcasm. The results also show that more advanced methods (such as the Recursive Neural Network (Socher et al., 2013b)), which require parsing and take into account the compositionality, perform much better. Our method performs better than all these baselines, e.g., recursive networks, despite the fact that it does not require parsing. On the coarse-grained classification task, our method has an absolute improvement of 2.4% in terms of error rate. This translates to a 16% relative improvement.

Table 1. The performance of our method compared to other approaches on the Stanford Sentiment Treebank dataset. The error rates of the other methods are reported in (Socher et al., 2013b).

  Model                             Error rate            Error rate
                                    (Positive/Negative)   (Fine-grained)
  Naive Bayes                       18.2%                 59.0%
  SVMs                              20.6%                 59.3%
  Bigram Naive Bayes                16.9%                 58.1%
  Word Vector Averaging             19.9%                 67.3%
  Recursive Neural Network          17.6%                 56.8%
  Matrix Vector-RNN                 17.1%                 55.6%
  Recursive Neural Tensor Network   14.6%                 54.3%
  Paragraph Vector                  12.2%                 51.3%

3.2. Beyond One Sentence: Sentiment Analysis with the IMDB dataset

Some of the previous techniques only work on sentences, but not on paragraphs/documents with several sentences. For instance, the Recursive Neural Tensor Network (Socher et al., 2013b) is based on parsing each sentence and it is unclear how to combine the representations over many sentences. Such techniques are therefore restricted to work on sentences but not paragraphs or documents.

Our method does not require parsing, thus it can produce a representation for a long document consisting of many sentences. This advantage makes our method more general than some of the other approaches. The following experiment on the IMDB dataset demonstrates this advantage.

Dataset: The IMDB dataset was first proposed by Maas et al. (2011) as a benchmark for sentiment analysis. The dataset consists of 100,000 movie reviews taken from IMDB. One key aspect of this dataset is that each movie review has several sentences. The 100,000 movie reviews are divided into three datasets:
Paragraph 2: do you want to find out who called you from +1 000-000-0000, +1 0000000000 or (000) 000-0000? see reports and share information you have about this caller

Paragraph 3: allina health clinic patients for your convenience, you can pay your allina health clinic bill online. pay your clinic bill now, question and answers...

The triplets are split into three sets: 80% for training, 10% for validation, and 10% for testing. Any method that requires learning will be trained on the training set, while its hyperparameters will be selected on the validation set.

We benchmark four methods to compute features for paragraphs: bag-of-words, bag-of-bigrams, averaging word vectors and Paragraph Vector. To improve bag-of-bigrams, we also learn a weighting matrix such that the distance between the first two paragraphs is minimized whereas the distance between the first and the third paragraph is maximized (the weighting factor between the two losses is a hyperparameter).

We record the number of times each method produces a smaller distance for the first two paragraphs than for the first and the third paragraph. An error is made if a method does not produce that desirable distance metric on a triplet of paragraphs.

The results of Paragraph Vector and other baselines are reported in Table 3. In this task, we find that TF-IDF weighting performs better than raw counts, and therefore we only report the results of methods with TF-IDF weighting. The results show that Paragraph Vector works well and gives a 32% relative improvement in terms of error rate. The fact that the paragraph vector method significantly outperforms bag-of-words and bigrams suggests that our proposed method is useful for capturing the semantics of the input text.

Table 3. The performance of Paragraph Vector and bag-of-words models on the information retrieval task. "Weighted Bag-of-bigrams" is the method where we learn a linear matrix W on TF-IDF bigram features that maximizes the distance between the first and the third paragraph and minimizes the distance between the first and the second paragraph.

  Model                     Error rate
  Vector Averaging          10.25%
  Bag-of-words              8.10%
  Bag-of-bigrams            7.28%
  Weighted Bag-of-bigrams   5.67%
  Paragraph Vector          3.82%
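A small sketch of how the triplet evaluation described above could be scored, assuming cosine distance between feature vectors: a method errs whenever the first paragraph is not strictly closer to the second than to the third. The embedding function, the toy hashed bag-of-words features and the example triplet are placeholders; the paper does not prescribe this exact code.

```python
import numpy as np

def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def triplet_error_rate(triplets, embed):
    """triplets: iterable of (p1, p2, p3) where p1 and p2 come from the same
    search result and p3 is the unrelated paragraph. embed: any feature
    function (TF-IDF bag-of-words, bag-of-bigrams, paragraph vectors, ...).
    An error is counted whenever p1 is not strictly closer to p2 than to p3."""
    triplets = list(triplets)
    errors = 0
    for p1, p2, p3 in triplets:
        v1, v2, v3 = embed(p1), embed(p2), embed(p3)
        if cosine_distance(v1, v2) >= cosine_distance(v1, v3):
            errors += 1
    return errors / len(triplets)

# Toy embedding (hashed bag-of-words), only to exercise the scoring function.
def toy_embed(text, dim=64):
    v = np.zeros(dim)
    for w in text.lower().split():
        v[hash(w) % dim] += 1.0
    return v

toy_triplet = ("pay your clinic bill online",
               "questions and answers about paying your bill",
               "find out who called you from this number")
print(triplet_error_rate([toy_triplet], toy_embed))
```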
3.4. Some further observations

We perform further experiments to understand various aspects of the models. Here are some observations:

- PV-DM is consistently better than PV-DBOW. PV-DM alone can achieve results close to many results in this paper (see Table 2). For example, on IMDB, PV-DM alone achieves 7.63%. The combination of PV-DM and PV-DBOW often works consistently better (7.42% on IMDB) and is therefore recommended.
- Using concatenation in PV-DM is often better than sum. On IMDB, PV-DM with sum can only achieve 8.06%. Perhaps this is because the model loses the ordering information.
- It is better to cross validate the window size. A good guess of the window size in many applications is between 5 and 12. On IMDB, varying the window size between 5 and 12 causes the error rate to fluctuate by 0.7%.
- Paragraph Vector can be expensive, but it can be done in parallel at test time. On average, our implementation takes 30 minutes to compute the paragraph vectors of the IMDB test set, using a 16 core machine (25,000 documents, each document on average has 230 words).

4. Related Work

Distributed representations for words were first proposed in (Rumelhart et al., 1986) and have become a successful paradigm, especially for statistical language modeling (Elman, 1990; Bengio et al., 2006; Mikolov, 2012). Word vectors have been used in NLP applications such as word representation, named entity recognition, word sense disambiguation, parsing, tagging and machine translation (Collobert & Weston, 2008; Turney & Pantel, 2010; Turian et al., 2010; Collobert et al., 2011; Socher et al., 2011b; Huang et al., 2012; Zou et al., 2013).

Representing phrases is a recent trend and has received much attention (Mitchell & Lapata, 2010; Zanzotto et al., 2010; Yessenalina & Cardie, 2011; Grefenstette et al., 2013; Mikolov et al., 2013c). In this direction, autoencoder-style models have also been used to model paragraphs (Maas et al., 2011; Larochelle & Lauly, 2012; Srivastava et al., 2013).

Distributed representations of phrases and sentences are also the focus of Socher et al. (2011a; c; 2013b). Their methods typically require parsing and are shown to work for sentence-level representations, and it is not obvious how to extend their methods beyond single sentences. Their methods are also supervised and thus require more labeled data to work well. Paragraph Vector,
References

Mikolov, Tomas, Yih, Scott Wen-tau, and Zweig, Geoffrey. Linguistic regularities in continuous space word representations. In NAACL HLT, 2013d.

Mitchell, Jeff and Lapata, Mirella. Composition in distributional models of semantics. Cognitive Science, 2010.

Mnih, Andriy and Hinton, Geoffrey E. A scalable hierarchical distributed language model. In Advances in Neural Information Processing Systems, pp. 1081-1088, 2008.

Morin, Frederic and Bengio, Yoshua. Hierarchical probabilistic neural network language model. In Proceedings of the International Workshop on Artificial Intelligence and Statistics, pp. 246-252, 2005.

Pang, Bo and Lee, Lillian. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of Association for Computational Linguistics, pp. 115-124, 2005.

Perronnin, Florent and Dance, Christopher. Fisher kernels on visual vocabularies for image categorization. In IEEE Conference on Computer Vision and Pattern Recognition, 2007.

Perronnin, Florent, Liu, Yan, Sanchez, Jorge, and Poirier, Herve. Large-scale image retrieval with compressed fisher vectors. In IEEE Conference on Computer Vision and Pattern Recognition, 2010.

Rumelhart, David E, Hinton, Geoffrey E, and Williams, Ronald J. Learning representations by back-propagating errors. Nature, 323(6088):533-536, 1986.

Socher, Richard, Huang, Eric H., Pennington, Jeffrey, Manning, Chris D., and Ng, Andrew Y. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In Advances in Neural Information Processing Systems, 2011a.

Socher, Richard, Lin, Cliff C, Ng, Andrew, and Manning, Chris. Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 129-136, 2011b.

Socher, Richard, Pennington, Jeffrey, Huang, Eric H, Ng, Andrew Y, and Manning, Christopher D. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2011c.

Socher, Richard, Chen, Danqi, Manning, Christopher D., and Ng, Andrew Y. Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems, 2013a.

Socher, Richard, Perelygin, Alex, Wu, Jean Y., Chuang, Jason, Manning, Christopher D., Ng, Andrew Y., and Potts, Christopher. Recursive deep models for semantic compositionality over a sentiment treebank. In Conference on Empirical Methods in Natural Language Processing, 2013b.

Srivastava, Nitish, Salakhutdinov, Ruslan, and Hinton, Geoffrey. Modeling documents with deep boltzmann machines. In Uncertainty in Artificial Intelligence, 2013.

Turian, Joseph, Ratinov, Lev, and Bengio, Yoshua. Word representations: a simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 384-394. Association for Computational Linguistics, 2010.

Turney, Peter D. and Pantel, Patrick. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 2010.

Wang, Sida and Manning, Chris D. Baselines and bigrams: Simple, good sentiment and text classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, 2012.

Yessenalina, Ainur and Cardie, Claire. Compositional matrix-space models for sentiment analysis. In Conference on Empirical Methods in Natural Language Processing, 2011.

Zanzotto, Fabio, Korkontzelos, Ioannis, Fallucchi, Francesca, and Manandhar, Suresh. Estimating linear models for compositional distributional semantics. In COLING, 2010.

Zhila, A., Yih, W. T., Meek, C., Zweig, G., and Mikolov, T. Combining heterogeneous models for measuring relational similarity. In NAACL HLT, 2013.

Zou, Will, Socher, Richard, Cer, Daniel, and Manning, Christopher. Bilingual word embeddings for phrase-based machine translation. In Conference on Empirical Methods in Natural Language Processing, 2013.
