Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts
Stanford University, Stanford, CA 94305, USA
richard@socher.org, {aperelyg, jcchuang, ang}@cs.stanford.edu, {jeaneis, manning, cgpotts}@stanford.edu

Abstract: Semantic word spaces have been very useful …


… work on RNNs, several compositionality ideas related to neural networks have been discussed by Bottou (2011) and Hinton (1990), and first models such as Recursive Auto-associative Memories have been experimented with by Pollack (1990). The idea of relating inputs through three-way interactions, parameterized by a tensor, has been proposed for relation classification (Sutskever et al., 2009; Jenatton et al., 2012), for extending Restricted Boltzmann Machines (Ranzato and Hinton, 2010), and as a special layer for speech recognition (Yu et al., 2012).

Sentiment Analysis. Apart from the above-mentioned work, most approaches in sentiment analysis use bag-of-words representations (Pang and Lee, 2008). Snyder and Barzilay (2007) analyzed larger reviews in more detail by analyzing the sentiment of multiple aspects of restaurants, such as food or atmosphere. Several works have explored sentiment compositionality through careful engineering of features or polarity shifting rules on syntactic structures (Polanyi and Zaenen, 2006; Moilanen and Pulman, 2007; Rentoumi et al., 2010; Nakagawa et al., 2010).

3 Stanford Sentiment Treebank

Bag-of-words classifiers can work well on longer documents by relying on a few words with strong sentiment like 'awesome' or 'exhilarating.' However, sentiment accuracy, even for binary positive/negative classification of single sentences, has not exceeded 80% for several years. For the more difficult multiclass case including a neutral class, accuracy is often below 60% for short messages on Twitter (Wang et al., 2012). From a linguistic or cognitive standpoint, ignoring word order in the treatment of a semantic task is not plausible and, as we will show, it cannot accurately classify hard examples of negation. Correctly predicting these hard cases is necessary to further improve performance.

In this section we introduce and provide some analyses for the new Sentiment Treebank, which includes labels for every syntactically plausible phrase in thousands of sentences, allowing us to train and evaluate compositional models.

We consider the corpus of movie review excerpts from the rottentomatoes.com website originally collected and published by Pang and Lee (2005). The original dataset includes 10,662 sentences, half of which were considered positive and the other half negative. Each label is extracted from a longer movie review and reflects the writer's overall intention for this review. The normalized, lowercased text is first used to recover, from the original website, the text with capitalization. Remaining HTML tags and sentences that are not in English are deleted. The Stanford Parser (Klein and Manning, 2003) is used to parse all 10,662 sentences. In approximately 1,100 cases it splits the snippet into multiple sentences. We then used Amazon Mechanical Turk to label the resulting 215,154 phrases. Fig. 3 shows the interface annotators saw. The slider has 25 different values and is initially set to neutral. The phrases in each hit are randomly sampled from the set of all phrases in order to prevent labels being influenced by what follows. For more details on the dataset collection, see the supplementary material.

Figure 3: The labeling interface. Random phrases were shown and annotators had a slider for selecting the sentiment and its degree.

Fig. 2 shows the normalized label distributions at each n-gram length. Starting at length 20, the majority are full sentences. One of the findings from labeling sentences based on readers' perception is that many of them could be considered neutral. We also notice that stronger sentiment often builds up in longer phrases and that the majority of the shorter phrases are neutral. Another observation is that most annotators moved the slider to one of five positions: negative, somewhat negative, neutral, positive or somewhat positive. The extreme values were rarely used and the slider was not often left in between the ticks. Hence, even a 5-class classification into these categories captures the main variability of the labels. We will name this fine-grained sentiment classification, and our main experiment will be to recover these five labels for phrases of all lengths.
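For concreteness, here is a tiny sketch of what collapsing the continuous slider values into the five fine-grained classes could look like. The equal-width bin edges and the function name are assumptions for illustration only, not the dataset's published cutoffs.

```python
# Illustrative only: map a continuous slider value in [0, 1] to the five
# sentiment classes described above. The equal-width bin edges here are
# assumptions, not the treebank's published cutoffs.
def fine_grained_class(value: float) -> str:
    labels = ["negative", "somewhat negative", "neutral",
              "somewhat positive", "positive"]
    value = min(max(value, 0.0), 1.0)   # clamp to [0, 1]
    index = min(int(value * 5), 4)      # bin into five equal intervals
    return labels[index]

assert fine_grained_class(0.5) == "neutral"
assert fine_grained_class(0.95) == "positive"
```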
Figure 2: Distributions of sentiment values for (a) unigrams, (b) 10-grams, (c) 20-grams, and (d) full sentences.

Figure 5: A single layer of the Recursive Neural Tensor Network. Each dashed box represents one of d-many slices and can capture a type of influence a child can have on its parent.

The RNTN uses this definition for computing p_1:

p_1 = f\left( \begin{bmatrix} b \\ c \end{bmatrix}^{T} V^{[1:d]} \begin{bmatrix} b \\ c \end{bmatrix} + W \begin{bmatrix} b \\ c \end{bmatrix} \right),

where W is as defined in the previous models. The next parent vector p_2 in the trigram will be computed with the same weights:

p_2 = f\left( \begin{bmatrix} a \\ p_1 \end{bmatrix}^{T} V^{[1:d]} \begin{bmatrix} a \\ p_1 \end{bmatrix} + W \begin{bmatrix} a \\ p_1 \end{bmatrix} \right).

The main advantage over the previous RNN model, which is a special case of the RNTN when V is set to 0, is that the tensor can directly relate input vectors. Intuitively, we can interpret each slice of the tensor as capturing a specific type of composition.

An alternative to RNTNs would be to make the compositional function more powerful by adding a second neural network layer. However, initial experiments showed that it is hard to optimize this model and that vector interactions are still more implicit than in the RNTN.

4.4 Tensor Backprop through Structure

We describe in this section how to train the RNTN model. As mentioned above, each node has a softmax classifier trained on its vector representation to predict a given ground truth or target vector t. We assume the target distribution vector at each node has a 0-1 encoding: if there are C classes, then it has length C with a 1 at the correct label, and all other entries are 0.

We want to maximize the probability of the correct prediction, or minimize the cross-entropy error between the predicted distribution y^i ∈ R^{C×1} at node i and the target distribution t^i ∈ R^{C×1} at that node. This is equivalent (up to a constant) to minimizing the KL-divergence between the two distributions. The error as a function of the RNTN parameters θ = (V, W, W_s, L) for a sentence is:

E(\theta) = \sum_i \sum_j t_j^i \log y_j^i + \lambda \|\theta\|^2   (2)

The derivatives for the weights of the softmax classifier are standard and simply sum up from each node's error. We define x^i to be the vector at node i (in the example trigram, the x^i ∈ R^{d×1} are (a, b, c, p_1, p_2)). We skip the standard derivative for W_s. Each node backpropagates its error through to the recursively used weights V, W. Let δ^{i,s} ∈ R^{d×1} be the softmax error vector at node i:

\delta^{i,s} = \left( W_s^{T} (y^i - t^i) \right) \otimes f'(x^i),

where ⊗ is the Hadamard product between the two vectors and f' is the element-wise derivative of f, which in the standard case of using f = tanh can be computed using only f(x^i).

The remaining derivatives can only be computed in a top-down fashion from the top node through the tree and into the leaf nodes. The full derivative for V and W is the sum of the derivatives at each of the nodes. We define the complete incoming error message for a node i as δ^{i,com}. The top node, in our case p_2, only receives errors from the top node's softmax. Hence, δ^{p_2,com} = δ^{p_2,s}, which we can use to obtain the standard backprop derivative for W (Goller and Küchler, 1996; Socher et al., 2010). For the derivative of each slice k = 1, …, d, we get:

\frac{\partial E^{p_2}}{\partial V^{[k]}} = \delta_k^{p_2,com} \begin{bmatrix} a \\ p_1 \end{bmatrix} \begin{bmatrix} a \\ p_1 \end{bmatrix}^{T},

where δ_k^{p_2,com} is just the k-th element of this vector. Now, we can compute the error message for the two …
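To make the composition step and the slice-wise tensor gradient above concrete, here is a minimal numpy sketch of a single RNTN layer for the example trigram. The dimensions, random initialization, and names (d, C, compose) are illustrative assumptions, not the authors' released code.

```python
import numpy as np

d, C = 4, 5                  # illustrative: vector size d, C sentiment classes
rng = np.random.default_rng(0)
V  = rng.normal(scale=0.01, size=(d, 2 * d, 2 * d))  # tensor: d slices, each (2d x 2d)
W  = rng.normal(scale=0.01, size=(d, 2 * d))         # standard composition matrix
Ws = rng.normal(scale=0.01, size=(C, d))             # per-node softmax classifier

def compose(left, right):
    """One RNTN layer: p = f([b;c]^T V^{[1:d]} [b;c] + W[b;c]), with f = tanh."""
    x = np.concatenate([left, right])                # [b; c], shape (2d,)
    tensor_term = np.array([x @ V[k] @ x for k in range(d)])
    return np.tanh(tensor_term + W @ x)

b, c, a = rng.normal(size=(3, d))
p1 = compose(b, c)           # parent of (b, c)
p2 = compose(a, p1)          # parent of (a, p1); the same V, W are reused

# Softmax error vector at the top node: delta = (Ws^T (y - t)) ⊗ f'(x).
# For f = tanh, f'(x) = 1 - f(x)^2, so only f(x) = p2 is needed.
logits = Ws @ p2
y = np.exp(logits) / np.exp(logits).sum()
t = np.eye(C)[3]             # 0-1 target vector; class 3 is arbitrary here
delta_com = (Ws.T @ (y - t)) * (1.0 - p2 ** 2)   # p2 only gets its softmax error

# Slice-wise tensor gradient: dE/dV[k] = delta_com[k] * [a;p1][a;p1]^T.
x = np.concatenate([a, p1])
dV = np.array([delta_com[k] * np.outer(x, x) for k in range(d)])
print(p2.shape, dV.shape)    # (4,) (4, 8, 8)
```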
Figure 6: Accuracy curves for fine-grained sentiment classification at each n-gram length. Left: accuracy separately for each set of n-grams. Right: cumulative accuracy of all n-grams.

5.2 Full Sentence Binary Sentiment

This setup is comparable to previous work on the original rottentomatoes.com dataset, which only used full sentence labels and binary classification of positive/negative. Hence, these experiments show the improvement even baseline methods can achieve with the sentiment treebank. Table 1 shows results of this binary classification for both all phrases and for only full sentences. The previous state of the art was below 80% (Socher et al., 2012). With the coarse bag-of-words annotation for training, many of the more complex phenomena could not be captured, even by more powerful models. The combination of the new sentiment treebank and the RNTN pushes the state of the art on short phrases up to 85.4%.

5.3 Model Analysis: Contrastive Conjunction

In this section, we use a subset of the test set which includes only sentences with an 'X but Y' structure: a phrase X being followed by but, which is followed by a phrase Y. The conjunction is interpreted as an argument for the second conjunct, with the first functioning concessively (Lakoff, 1971; Blakemore, 1989; Merin, 1999). Fig. 7 contains an example. We analyze a strict setting, where X and Y are phrases of different sentiment (including neutral). The example is counted as correct if the classifications for both phrases X and Y are correct. Furthermore, the lowest node that dominates both the word but and the node that spans Y also have to have the same correct sentiment. For the resulting 131 cases, the RNTN obtains an accuracy of 41% compared to MV-RNN (37%), RNN (36%) and biNB (27%).

Figure 7: Example of correct prediction for the contrastive conjunction 'X but Y'.

5.4 Model Analysis: High Level Negation

We investigate two types of negation. For each type, we use a separate dataset for evaluation.

Set 1: Negating Positive Sentences. The first set contains positive sentences and their negation. In this set, the negation changes the overall sentiment of a sentence from positive to negative. Hence, we compute accuracy in terms of correct sentiment reversal from positive to negative. Fig. 9 shows two examples of positive negation the RNTN correctly classified, even if negation is less obvious in the case of 'least'. Table 2 (left) gives the accuracies over 21 positive sentences and their negation for all models. The RNTN has the highest reversal accuracy, showing its ability to structurally learn negation of positive sentences. But what if the model simply makes phrases very negative when negation is in the sentence? The next experiments show that the model captures more than such a simplistic negation rule.

Set 2: Negating Negative Sentences. The second set contains negative sentences and their negation. When negative sentences are negated, the sentiment treebank shows that overall sentiment should become less negative, but not necessarily positive. For instance, 'The movie was terrible' is negative, but 'The movie was not terrible' says only that it was less bad than a terrible one, not that it was good (Horn, 1989; Israel, 2001). Hence, we evaluate accuracy in terms of how often each model was able to increase non-negative activation in the sentiment of the sentence.

Figure 9: RNTN predictions of positive and negative (bottom right) sentences and their negation.
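Read literally, the two columns of Table 2 below correspond to two different pair-level checks: a class flip for Set 1 and an activation increase for Set 2. This small Python sketch makes the metrics explicit; the label strings and tuple formats are illustrative assumptions, not the paper's evaluation code.

```python
# Illustrative sketch of the two negation metrics described above.
# Set 1: a pair counts as correct if sentiment flips positive -> negative.
# Set 2: a pair counts as correct if positive activation increases.

def reversal_accuracy(pairs):
    """pairs: (label_before, label_after) predicted for a sentence / its negation."""
    hits = sum(1 for before, after in pairs
               if before == "positive" and after == "negative")
    return hits / len(pairs)

def activation_increase_accuracy(pairs):
    """pairs: (pos_activation_before, pos_activation_after) per sentence."""
    hits = sum(1 for before, after in pairs if after > before)
    return hits / len(pairs)

print(reversal_accuracy([("positive", "negative"), ("positive", "neutral")]))  # 0.5
print(activation_increase_accuracy([(0.2, 0.5), (0.4, 0.3)]))                  # 0.5
```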
Table 2: Accuracy of negation detection. Negated positive is measured as correct sentiment inversions; negated negative is measured as increases in positive activations.

Model    Negated Positive    Negated Negative
biNB     19.0                27.3
RNN      33.3                45.5
MV-RNN   52.4                54.6
RNTN     71.4                81.8

Table 2 (right) shows the accuracy. In over 81% of cases, the RNTN correctly increases the positive activations. Fig. 9 (bottom right) shows a typical case in which sentiment was made more positive by switching the main class from negative to neutral, even though both not and dull were negative. Fig. 8 shows the changes in activation for both sets. Negative values indicate a decrease in average positive activation (for Set 1) and positive values mean an increase in average positive activation (Set 2). The RNTN has the largest shifts in the correct directions. Therefore we can conclude that the RNTN is best able to identify the effect of negation upon both positive and negative sentiment sentences.

Figure 8: Change in activations for negations. Only the RNTN correctly captures both types: it decreases positive sentiment more when it is negated and learns that negating negative phrases (such as not terrible) should increase neutral and positive activations.

References

R. Socher, C. Lin, A. Y. Ng, and C. D. Manning. 2011a. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. In ICML.
R. Socher, J. Pennington, E. H. Huang, A. Y. Ng, and C. D. Manning. 2011b. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. In EMNLP.
R. Socher, B. Huval, C. D. Manning, and A. Y. Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In EMNLP.
I. Sutskever, R. Salakhutdinov, and J. B. Tenenbaum. 2009. Modelling relational data using Bayesian clustered tensor factorization. In NIPS.
P. D. Turney and P. Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141–188.
H. Wang, D. Can, A. Kazemzadeh, F. Bar, and S. Narayanan. 2012. A system for real-time Twitter sentiment analysis of the 2012 U.S. presidential election cycle. In Proceedings of the ACL 2012 System Demonstrations.
D. Widdows. 2008. Semantic vector products: Some initial investigations. In Proceedings of the Second AAAI Symposium on Quantum Interaction.
A. Yessenalina and C. Cardie. 2011. Compositional matrix-space models for sentiment analysis. In EMNLP.
D. Yu, L. Deng, and F. Seide. 2012. Large vocabulary speech recognition using deep tensor neural networks. In INTERSPEECH.
F. M. Zanzotto, I. Korkontzelos, F. Fallucchi, and S. Manandhar. 2010. Estimating linear models for compositional distributional semantics. In COLING.
L. Zettlemoyer and M. Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In UAI.