Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng and Christopher Potts
Stanford University, Stanford, CA 94305, USA
richard@socher.org, {aperelyg,jcchuang,ang}@cs.stanford.edu, {jeaneis,manning,cgpotts}@stanford.edu

Abstract. Semantic word spaces have been very useful
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
work on RNNs, several compositionality ideas related to neural networks have been discussed by Bottou (2011) and Hinton (1990), and first models such as Recursive Auto-associative Memories were experimented with by Pollack (1990). The idea of relating inputs through three-way interactions, parameterized by a tensor, has been proposed for relation classification (Sutskever et al., 2009; Jenatton et al., 2012), for extending Restricted Boltzmann Machines (Ranzato and Hinton, 2010), and as a special layer for speech recognition (Yu et al., 2012).

Sentiment Analysis. Apart from the above-mentioned work, most approaches in sentiment analysis use bag-of-words representations (Pang and Lee, 2008). Snyder and Barzilay (2007) analyzed larger reviews in more detail by analyzing the sentiment of multiple aspects of restaurants, such as food or atmosphere. Several works have explored sentiment compositionality through careful engineering of features or polarity shifting rules on syntactic structures (Polanyi and Zaenen, 2006; Moilanen and Pulman, 2007; Rentoumi et al., 2010; Nakagawa et al., 2010).

3 Stanford Sentiment Treebank

Bag-of-words classifiers can work well on longer documents by relying on a few words with strong sentiment like `awesome' or `exhilarating.' However, sentiment accuracies even for binary positive/negative classification of single sentences have not exceeded 80% for several years. For the more difficult multiclass case including a neutral class, accuracy is often below 60% for short messages on Twitter (Wang et al., 2012). From a linguistic or cognitive standpoint, ignoring word order in the treatment of a semantic task is not plausible, and, as we will show, it cannot accurately classify hard examples of negation. Correctly predicting these hard cases is necessary to further improve performance.

In this section we will introduce and provide some analyses for the new Sentiment Treebank, which includes labels for every syntactically plausible phrase in thousands of sentences, allowing us to train and evaluate compositional models.

We consider the corpus of movie review excerpts from the rottentomatoes.com website originally collected and published by Pang and Lee (2005). The original dataset includes 10,662 sentences, half of which were considered positive and the other half negative. Each label is extracted from a longer movie review and reflects the writer's overall intention for this review. The normalized, lowercased text is first used to recover, from the original website, the text with capitalization. Remaining HTML tags and sentences that are not in English are deleted. The Stanford Parser (Klein and Manning, 2003) is used to parse all 10,662 sentences. In approximately 1,100 cases it splits the snippet into multiple sentences. We then used Amazon Mechanical Turk to label the resulting 215,154 phrases.

Figure 3: The labeling interface. Random phrases were shown and annotators had a slider for selecting the sentiment and its degree.

Fig. 3 shows the interface annotators saw. The slider has 25 different values and is initially set to neutral. The phrases in each hit are randomly sampled from the set of all phrases in order to prevent labels from being influenced by what follows. For more details on the dataset collection, see the supplementary material.

Fig. 2 shows the normalized label distributions at each n-gram length. Starting at length 20, the majority are full sentences. One of the findings from labeling sentences based on the reader's perception is that many of them could be considered neutral. We also notice that stronger sentiment often builds up in longer phrases and the majority of the shorter phrases are neutral. Another observation is that most annotators moved the slider to one of the five positions: negative, somewhat negative, neutral, positive or somewhat positive. The extreme values were rarely used and the slider was not often left in between the ticks. Hence, even a 5-class classification into these categories captures the main variability of the labels. We will name this fine-grained sentiment classification and our main experiment will be to recover these five labels for phrases of all lengths.

Figure 2: Distributions of sentiment values for (a) unigrams, (b) 10-grams, (c) 20-grams, and (d) full sentences.
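Since annotators effectively used five slider positions, the 25-value slider can be collapsed into the five fine-grained classes with a small discretization helper. The function name and cutoff values below are illustrative assumptions for a normalized slider in [0, 1], not the authors' actual binning.

```python
# Map a normalized slider position v in [0, 1] to one of the five
# sentiment classes used for fine-grained classification.
# NOTE: these thresholds are an assumed, evenly spaced binning for
# illustration; the paper does not specify the exact cutoffs.
def slider_to_class(v):
    """Return a sentiment class label for slider position v in [0, 1]."""
    if v <= 0.2:
        return "negative"
    elif v <= 0.4:
        return "somewhat negative"
    elif v <= 0.6:
        return "neutral"
    elif v <= 0.8:
        return "somewhat positive"
    return "positive"
```

Under this sketch, a slider left near the middle maps to neutral, matching the observation that most short phrases received neutral labels.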
Figure 5: A single layer of the Recursive Neural Tensor Network. Each dashed box represents one of d-many slices and can capture a type of influence a child can have on its parent.

The RNTN uses this definition for computing p_1:

p_1 = f\left( [b; c]^T V^{[1:d]} [b; c] + W [b; c] \right),

where W is as defined in the previous models. The next parent vector p_2 in the trigram will be computed with the same weights:

p_2 = f\left( [a; p_1]^T V^{[1:d]} [a; p_1] + W [a; p_1] \right).

The main advantage over the previous RNN model, which is a special case of the RNTN when V is set to 0, is that the tensor can directly relate input vectors. Intuitively, we can interpret each slice of the tensor as capturing a specific type of composition.

An alternative to RNTNs would be to make the compositional function more powerful by adding a second neural network layer. However, initial experiments showed that it is hard to optimize this model and vector interactions are still more implicit than in the RNTN.

4.4 Tensor Backprop through Structure

We describe in this section how to train the RNTN model. As mentioned above, each node has a softmax classifier trained on its vector representation to predict a given ground truth or target vector t. We assume the target distribution vector at each node has a 0-1 encoding: if there are C classes, then it has length C, with a 1 at the correct label and all other entries 0.

We want to maximize the probability of the correct prediction, or equivalently minimize the cross-entropy error between the predicted distribution y^i \in R^{C \times 1} at node i and the target distribution t^i \in R^{C \times 1} at that node. This is equivalent (up to a constant) to minimizing the KL-divergence between the two distributions. The error as a function of the RNTN parameters \theta = (V, W, W_s, L) for a sentence is:

E(\theta) = -\sum_i \sum_j t_j^i \log y_j^i + \lambda \|\theta\|^2    (2)

The derivatives for the weights of the softmax classifier are standard and simply sum up from each node's error. We define x^i to be the vector at node i (in the example trigram, the x^i \in R^{d \times 1} are (a, b, c, p_1, p_2)). We skip the standard derivative for W_s. Each node backpropagates its error through to the recursively used weights V and W. Let \delta^{i,s} \in R^{d \times 1} be the softmax error vector at node i:

\delta^{i,s} = \left( W_s^T (y^i - t^i) \right) \otimes f'(x^i),

where \otimes is the Hadamard product between the two vectors and f' is the element-wise derivative of f, which in the standard case of f = tanh can be computed using only f(x^i). The remaining derivatives can only be computed in a top-down fashion, from the top node through the tree and into the leaf nodes. The full derivative for V and W is the sum of the derivatives at each of the nodes. We define the complete incoming error message for a node i as \delta^{i,com}. The top node, in our case p_2, only receives errors from its own softmax, so \delta^{p_2,com} = \delta^{p_2,s}, which we can use to obtain the standard backprop derivative for W (Goller and Küchler, 1996; Socher et al., 2010). For the derivative of each slice k = 1, \ldots, d, we get:

\frac{\partial E^{p_2}}{\partial V^{[k]}} = \delta_k^{p_2,com} [a; p_1][a; p_1]^T,

where \delta_k^{p_2,com} is just the k-th element of this vector. Now we can compute the error messages for the two children of p_2.

Figure 6: Accuracy curves for fine-grained sentiment classification at each n-gram length. Left: accuracy separately for each set of n-grams. Right: cumulative accuracy of all n-grams.

5.2 Full Sentence Binary Sentiment

This setup is comparable to previous work on the original rottentomatoes.com dataset, which only used full-sentence labels and binary classification of positive/negative. Hence, these experiments show the improvement even baseline methods can achieve with the sentiment treebank. Table 1 shows the results of this binary classification for both all phrases and for only full sentences. The previous state of the art was below 80% (Socher et al., 2012). With the coarse bag-of-words annotation for training, many of the more complex phenomena could not be captured, even by more powerful models. The combination of the new sentiment treebank and the RNTN pushes the state of the art on short phrases up to 85.4%.

5.3 Model Analysis: Contrastive Conjunction

In this section, we use a subset of the test set which includes only sentences with an `X but Y' structure: a phrase X being followed by but, which is followed by a phrase Y. The conjunction is interpreted as an argument for the second conjunct, with the first functioning concessively (Lakoff, 1971; Blakemore, 1989; Merin, 1999). Fig. 7 contains an example. We analyze a strict setting, where X and Y are phrases of different sentiment (including neutral). An example is counted as correct only if the classifications for both phrases X and Y are correct. Furthermore, the lowest node that dominates both the word but and the node that spans Y also has to have the same correct sentiment. For the resulting 131 cases, the RNTN obtains an accuracy of 41%, compared to MV-RNN (37), RNN (36) and biNB (27).

5.4 Model Analysis: High Level Negation

We investigate two types of negation. For each type, we use a separate dataset for evaluation.

Figure 7: Example of correct prediction for the contrastive conjunction X but Y.

Set 1: Negating Positive Sentences. The first set contains positive sentences and their negation. In this set, the negation changes the overall sentiment of a sentence from positive to negative. Hence, we compute accuracy in terms of correct sentiment reversal from positive to negative. Fig. 9 shows two examples of positive negation the RNTN correctly classified, even if negation is less obvious in the case of `least'. Table 2 (left) gives the accuracies over 21 positive sentences and their negation for all models. The RNTN has the highest reversal accuracy, showing its ability to structurally learn negation of positive sentences. But what if the model simply makes phrases very negative when negation is in the sentence? The next experiments show that the model captures more than such a simplistic negation rule.

Set 2: Negating Negative Sentences. The second set contains negative sentences and their negation. When negative sentences are negated, the sentiment treebank shows that overall sentiment should become less negative, but not necessarily positive. For instance, `The movie was terrible' is negative, but `The movie was not terrible' says only that it was less bad than a terrible one, not that it was good (Horn, 1989; Israel, 2001). Hence, we evaluate accuracy in terms of how often each model was able to increase non-negative activation in the sentiment of the sentence.

Figure 9: RNTN predictions of positive and negative (bottom right) sentences and their negation.

Model     Negated Positive    Negated Negative
biNB           19.0                27.3
RNN            33.3                45.5
MV-RNN         52.4                54.6
RNTN           71.4                81.8

Table 2: Accuracy of negation detection. Negated positive is measured as correct sentiment inversions. Negated negative is measured as increases in positive activations.

Table 2 (right) shows the accuracy. In over 81% of cases, the RNTN correctly increases the positive activations. Fig. 9 (bottom right) shows a typical case in which sentiment was made more positive by switching the main class from negative to neutral, even though both not and dull were negative. Fig. 8 shows the changes in activation for both sets. Negative values indicate a decrease in average positive activation (for set 1) and positive values mean an increase in average positive activation (set 2). The RNTN has the largest shifts in the correct directions. Therefore, we can conclude that the RNTN is best able to identify the effect of negation upon both positive and negative sentiment sentences.

Figure 8: Change in activations for negations. Only the RNTN correctly captures both types. It decreases positive sentiment more when it is negated and learns that negating negative phrases (such as not terrible) should increase neutral and positive activations.
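The RNTN composition p = f([b; c]^T V^{[1:d]} [b; c] + W [b; c]) that underlies these results can be sketched in a few lines of NumPy. The dimensionality, the random initialization, and the `compose` helper name below are illustrative assumptions, not the trained model; the sketch also checks that setting V = 0 recovers the plain RNN composition f(W [b; c]).

```python
import numpy as np

# Sketch of one RNTN composition step. Here f = tanh, d is the
# word-vector dimensionality, and V has d slices of size 2d x 2d.
# The values below are random and for illustration only.
d = 4
rng = np.random.default_rng(0)
V = rng.standard_normal((d, 2 * d, 2 * d))  # tensor V^{[1:d]}
W = rng.standard_normal((d, 2 * d))         # standard RNN weight matrix

def compose(left, right, V, W):
    """Parent vector for two child vectors, each in R^d."""
    bc = np.concatenate([left, right])      # stacked children [b; c] in R^{2d}
    # Slice k contributes the bilinear form [b;c]^T V^{[k]} [b;c].
    tensor_term = np.array([bc @ V[k] @ bc for k in range(V.shape[0])])
    return np.tanh(tensor_term + W @ bc)

b, c, a = rng.standard_normal(d), rng.standard_normal(d), rng.standard_normal(d)
p1 = compose(b, c, V, W)    # parent of the (b, c) bigram
p2 = compose(a, p1, V, W)   # parent of the full trigram (a, (b, c))

# With V = 0 the RNTN reduces to the plain RNN: f(W [b; c]).
p1_rnn = compose(b, c, np.zeros_like(V), W)
```

Each slice V^{[k]} produces one coordinate of the tensor term, which is why the paper interprets each slice as capturing one type of composition between the children.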
R. Socher, C. Lin, A. Y. Ng, and C. D. Manning. 2011a. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. In ICML.
R. Socher, J. Pennington, E. H. Huang, A. Y. Ng, and C. D. Manning. 2011b. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. In EMNLP.
R. Socher, B. Huval, C. D. Manning, and A. Y. Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In EMNLP.
I. Sutskever, R. Salakhutdinov, and J. B. Tenenbaum. 2009. Modelling relational data using Bayesian clustered tensor factorization. In NIPS.
P. D. Turney and P. Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141-188.
H. Wang, D. Can, A. Kazemzadeh, F. Bar, and S. Narayanan. 2012. A system for real-time Twitter sentiment analysis of 2012 U.S. presidential election cycle. In Proceedings of the ACL 2012 System Demonstrations.
D. Widdows. 2008. Semantic vector products: Some initial investigations. In Proceedings of the Second AAAI Symposium on Quantum Interaction.
A. Yessenalina and C. Cardie. 2011. Compositional matrix-space models for sentiment analysis. In EMNLP.
D. Yu, L. Deng, and F. Seide. 2012. Large vocabulary speech recognition using deep tensor neural networks. In INTERSPEECH.
F. M. Zanzotto, I. Korkontzelos, F. Fallucchi, and S. Manandhar. 2010. Estimating linear models for compositional distributional semantics. In COLING.
L. Zettlemoyer and M. Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In UAI.