Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank




Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng and Christopher Potts
Stanford University, Stanford, CA 94305, USA
richard@socher.org, {aperelyg,jcchuang,ang}@cs.stanford.edu, {jeaneis,manning,cgpotts}@stanford.edu

Abstract: Semantic word spaces have been very useful …





…work on RNNs, several compositionality ideas related to neural networks have been discussed by Bottou (2011) and Hinton (1990), and first models such as Recursive Auto-associative Memories were experimented with by Pollack (1990). The idea of relating inputs through three-way interactions, parameterized by a tensor, has been proposed for relation classification (Sutskever et al., 2009; Jenatton et al., 2012), for extending Restricted Boltzmann Machines (Ranzato and Hinton, 2010), and as a special layer for speech recognition (Yu et al., 2012).

Sentiment Analysis. Apart from the above-mentioned work, most approaches in sentiment analysis use bag-of-words representations (Pang and Lee, 2008). Snyder and Barzilay (2007) analyzed larger reviews in more detail by analyzing the sentiment of multiple aspects of restaurants, such as food or atmosphere. Several works have explored sentiment compositionality through careful engineering of features or polarity shifting rules on syntactic structures (Polanyi and Zaenen, 2006; Moilanen and Pulman, 2007; Rentoumi et al., 2010; Nakagawa et al., 2010).

3 Stanford Sentiment Treebank

Bag-of-words classifiers can work well on longer documents by relying on a few words with strong sentiment like 'awesome' or 'exhilarating.' However, sentiment accuracy even for binary positive/negative classification of single sentences has not exceeded 80% for several years. For the more difficult multiclass case including a neutral class, accuracy is often below 60% for short messages on Twitter (Wang et al., 2012). From a linguistic or cognitive standpoint, ignoring word order in the treatment of a semantic task is not plausible, and, as we will show, it cannot accurately classify hard examples of negation. Correctly predicting these hard cases is necessary to further improve performance.

In this section we introduce and provide some analyses for the new Sentiment Treebank, which includes labels for every syntactically plausible phrase in thousands of sentences, allowing us to train and evaluate compositional models.

We consider the corpus of movie review excerpts from the rottentomatoes.com website originally collected and published by Pang and Lee (2005). The original dataset includes 10,662 sentences, half of which were considered positive and the other half negative. Each label is extracted from a longer movie review and reflects the writer's overall intention for this review. The normalized, lower-cased text is first used to recover, from the original website, the text with capitalization. Remaining HTML tags and sentences that are not in English are deleted. The Stanford Parser (Klein and Manning, 2003) is used to parse all 10,662 sentences. In approximately 1,100 cases it splits the snippet into multiple sentences. We then used Amazon Mechanical Turk to label the resulting 215,154 phrases.

Figure 3: The labeling interface. Random phrases were shown and annotators had a slider for selecting the sentiment and its degree.

Fig. 3 shows the interface annotators saw. The slider has 25 different values and is initially set to neutral. The phrases in each hit are randomly sampled from the set of all phrases in order to prevent labels from being influenced by what follows. For more details on the dataset collection, see the supplementary material.

Fig. 2 shows the normalized label distributions at each n-gram length. Starting at length 20, the majority are full sentences. One of the findings from labeling sentences based on the reader's perception is that many of them could be considered neutral. We also notice that stronger sentiment often builds up in longer phrases and that the majority of the shorter phrases are neutral. Another observation is that most annotators moved the slider to one of the five positions: negative, somewhat negative, neutral, positive or somewhat positive. The extreme values were rarely used and the slider was not often left in between the ticks. Hence, even a 5-class classification into these categories captures the main variability of the labels. We will name this fine-grained sentiment classification, and our main experiment will be to recover these five labels for phrases of all lengths.
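To make the label construction concrete, here is a minimal sketch of bucketing a normalized slider reading into the five fine-grained classes. The paper does not specify this mapping; the [0, 1] normalization and the bin cutoffs below are illustrative assumptions, not the treebank's official thresholds.

```python
# Hypothetical sketch: map a slider value normalized to [0, 1] onto the five
# fine-grained sentiment classes. The cutoffs are illustrative assumptions.
def fine_grained_class(value: float) -> str:
    assert 0.0 <= value <= 1.0, "expects a normalized slider reading"
    bins = [
        (0.2, "negative"),
        (0.4, "somewhat negative"),
        (0.6, "neutral"),
        (0.8, "somewhat positive"),
        (1.0, "positive"),
    ]
    for upper, label in bins:
        if value <= upper:
            return label

print(fine_grained_class(0.5))  # -> "neutral"
```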
Figure 2: Distributions of sentiment values for (a) unigrams, (b) 10-grams, (c) 20-grams, and (d) full sentences.

Figure 5: A single layer of the Recursive Neural Tensor Network. Each dashed box represents one of $d$-many slices and can capture a type of influence a child can have on its parent.

The RNTN uses this definition for computing $p_1$:

$$p_1 = f\left( \begin{bmatrix} b \\ c \end{bmatrix}^T V^{[1:d]} \begin{bmatrix} b \\ c \end{bmatrix} + W \begin{bmatrix} b \\ c \end{bmatrix} \right),$$

where $W$ is as defined in the previous models. The next parent vector $p_2$ in the trigram will be computed with the same weights:

$$p_2 = f\left( \begin{bmatrix} a \\ p_1 \end{bmatrix}^T V^{[1:d]} \begin{bmatrix} a \\ p_1 \end{bmatrix} + W \begin{bmatrix} a \\ p_1 \end{bmatrix} \right).$$

The main advantage over the previous RNN model, which is a special case of the RNTN when $V$ is set to 0, is that the tensor can directly relate input vectors. Intuitively, we can interpret each slice of the tensor as capturing a specific type of composition.

An alternative to RNTNs would be to make the compositional function more powerful by adding a second neural network layer. However, initial experiments showed that this model is hard to optimize and that vector interactions remain more implicit than in the RNTN.

4.4 Tensor Backprop through Structure

We describe in this section how to train the RNTN model. As mentioned above, each node has a softmax classifier trained on its vector representation to predict a given ground truth or target vector $t$. We assume the target distribution vector at each node has a 0-1 encoding: if there are $C$ classes, it has length $C$ with a 1 at the correct label and 0 everywhere else.

We want to maximize the probability of the correct prediction, or equivalently minimize the cross-entropy error between the predicted distribution $y^i \in \mathbb{R}^{C \times 1}$ at node $i$ and the target distribution $t^i \in \mathbb{R}^{C \times 1}$ at that node. This is equivalent (up to a constant) to minimizing the KL-divergence between the two distributions. The error as a function of the RNTN parameters $\theta = (V, W, W_s, L)$ for a sentence is:

$$E(\theta) = -\sum_i \sum_j t_j^i \log y_j^i + \lambda \lVert \theta \rVert^2 \quad (2)$$

The derivatives for the weights of the softmax classifier are standard and simply sum up each node's error. We define $x^i$ to be the vector at node $i$ (in the example trigram, the $x^i \in \mathbb{R}^{d \times 1}$'s are $(a, b, c, p_1, p_2)$). We skip the standard derivative for $W_s$. Each node backpropagates its error through to the recursively used weights $V, W$. Let $\delta^{i,s} \in \mathbb{R}^{d \times 1}$ be the softmax error vector at node $i$:

$$\delta^{i,s} = \left( W_s^T (y^i - t^i) \right) \otimes f'(x^i),$$

where $\otimes$ is the Hadamard product between the two vectors and $f'$ is the element-wise derivative of $f$, which in the standard case of $f = \tanh$ can be computed using only $f(x^i)$.

The remaining derivatives can only be computed in a top-down fashion, from the top node through the tree and into the leaf nodes. The full derivative for $V$ and $W$ is the sum of the derivatives at each of the nodes. We define the complete incoming error message for a node $i$ as $\delta^{i,com}$. The top node, in our case $p_2$, only receives errors from the top node's softmax. Hence $\delta^{p_2,com} = \delta^{p_2,s}$, which we can use to obtain the standard backprop derivative for $W$ (Goller and Küchler, 1996; Socher et al., 2010). For the derivative of each slice $k = 1, \ldots, d$, we get:

$$\frac{\partial E^{p_2}}{\partial V^{[k]}} = \delta_k^{p_2,com} \begin{bmatrix} a \\ p_1 \end{bmatrix} \begin{bmatrix} a \\ p_1 \end{bmatrix}^T,$$

where $\delta_k^{p_2,com}$ is just the $k$-th element of this vector. Now, we can compute the error message for the two …
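To make the composition and the per-slice tensor gradient concrete, here is a minimal NumPy sketch. It is a hedged illustration, not the paper's training code: the dimensionality, the random initialization, and the placeholder error vector standing in for $\delta^{p_2,com}$ are all assumptions.

```python
import numpy as np

d = 4                                                # word vector size (illustrative)
rng = np.random.default_rng(0)
V = rng.normal(scale=0.01, size=(d, 2 * d, 2 * d))   # tensor V^[1:d], one 2d x 2d slice per output dim
W = rng.normal(scale=0.01, size=(d, 2 * d))          # standard RNN composition weights
f = np.tanh

def rntn_compose(left, right):
    """Compose two child vectors: f([b; c]^T V^[1:d] [b; c] + W [b; c])."""
    child = np.concatenate([left, right])            # [b; c] in R^{2d}
    tensor_term = np.array([child @ V[k] @ child for k in range(d)])  # one scalar per slice
    return f(tensor_term + W @ child)

# The trigram (a, b, c) from the paper's running example:
a, b, c = (rng.normal(size=d) for _ in range(3))
p1 = rntn_compose(b, c)
p2 = rntn_compose(a, p1)

# Per-slice tensor gradient at the top node:
# dE/dV[k] = delta_k^{p2,com} * [a; p1][a; p1]^T
delta = rng.normal(size=d)                           # placeholder for delta^{p2,com}
top_child = np.concatenate([a, p1])
dV = np.stack([delta[k] * np.outer(top_child, top_child) for k in range(d)])
```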
Figure 6: Accuracy curves for fine-grained sentiment classification at each n-gram length. Left: accuracy separately for each set of n-grams. Right: cumulative accuracy of all n-grams.

5.2 Full Sentence Binary Sentiment

This setup is comparable to previous work on the original rotten tomatoes dataset, which only used full sentence labels and binary classification of positive/negative. Hence, these experiments show the improvement even baseline methods can achieve with the sentiment treebank. Table 1 shows results of this binary classification for both all phrases and for only full sentences. The previous state of the art was below 80% (Socher et al., 2012). With the coarse bag-of-words annotation for training, many of the more complex phenomena could not be captured, even by more powerful models. The combination of the new sentiment treebank and the RNTN pushes the state of the art on short phrases up to 85.4%.

5.3 Model Analysis: Contrastive Conjunction

In this section, we use a subset of the test set which includes only sentences with an 'X but Y' structure: a phrase X followed by but, which is followed by a phrase Y. The conjunction is interpreted as an argument for the second conjunct, with the first functioning concessively (Lakoff, 1971; Blakemore, 1989; Merin, 1999). Fig. 7 contains an example. We analyze a strict setting, where X and Y are phrases of different sentiment (including neutral). The example is counted as correct if the classifications for both phrases X and Y are correct. Furthermore, the lowest node that dominates both the word but and the node that spans Y also has to have the same correct sentiment. For the resulting 131 cases, the RNTN obtains an accuracy of 41%, compared to MV-RNN (37%), RNN (36%) and biNB (27%).

5.4 Model Analysis: High Level Negation

We investigate two types of negation. For each type, we use a separate dataset for evaluation.

Figure 7: Example of a correct prediction for the contrastive conjunction X but Y.

Set 1: Negating Positive Sentences. The first set contains positive sentences and their negation. In this set, the negation changes the overall sentiment of a sentence from positive to negative. Hence, we compute accuracy in terms of correct sentiment reversal from positive to negative. Fig. 9 shows two examples of positive negation that the RNTN correctly classified, even though negation is less obvious in the case of 'least'. Table 2 (left) gives the accuracies over 21 positive sentences and their negation for all models. The RNTN has the highest reversal accuracy, showing its ability to structurally learn negation of positive sentences. But what if the model simply makes phrases very negative whenever negation is in the sentence? The next experiments show that the model captures more than such a simplistic negation rule.

Figure 9: RNTN predictions of positive and negative (bottom right) sentences and their negation.

Set 2: Negating Negative Sentences. The second set contains negative sentences and their negation. When negative sentences are negated, the sentiment treebank shows that overall sentiment should become less negative, but not necessarily positive. For instance, 'The movie was terrible' is negative, but 'The movie was not terrible' says only that it was less bad than a terrible one, not that it was good (Horn, 1989; Israel, 2001). Hence, we evaluate accuracy in terms of how often each model was able to increase non-negative activation in the sentiment of the sentence.
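The two evaluation criteria just described reduce to simple checks over (sentence, negated sentence) pairs. A minimal sketch, where model_predict and model_positive_activation are hypothetical stand-ins for a trained sentiment model and are not from the paper:

```python
def reversal_accuracy(pairs, model_predict):
    """Set 1: fraction of (positive sentence, negated sentence) pairs
    whose negation is classified as negative."""
    correct = sum(model_predict(negated) == "negative" for _, negated in pairs)
    return correct / len(pairs)

def non_negative_increase_accuracy(pairs, model_positive_activation):
    """Set 2: fraction of (negative sentence, negated sentence) pairs
    where the model's positive activation increases after negation."""
    correct = sum(
        model_positive_activation(negated) > model_positive_activation(original)
        for original, negated in pairs
    )
    return correct / len(pairs)
```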
Model     Negated Positive   Negated Negative
biNB      19.0               27.3
RNN       33.3               45.5
MV-RNN    52.4               54.6
RNTN      71.4               81.8

Table 2: Accuracy of negation detection. Negated positive is measured as correct sentiment inversions. Negated negative is measured as increases in positive activations.

Table 2 (right) shows the accuracy. In over 81% of cases, the RNTN correctly increases the positive activations. Fig. 9 (bottom right) shows a typical case in which sentiment was made more positive by switching the main class from negative to neutral, even though both not and dull were negative. Fig. 8 shows the changes in activation for both sets. Negative values indicate a decrease in average positive activation (for set 1) and positive values mean an increase in average positive activation (set 2). The RNTN has the largest shifts in the correct directions. Therefore we can conclude that the RNTN is best able to identify the effect of negation upon both positive and negative sentiment sentences.

Figure 8: Change in activations for negations. Only the RNTN correctly captures both types. It decreases positive sentiment more when it is negated and learns that negating negative phrases (such as not terrible) should increase neutral and positive activations.

References

R. Socher, C. Lin, A. Y. Ng, and C. D. Manning. 2011a. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. In ICML.

R. Socher, J. Pennington, E. H. Huang, A. Y. Ng, and C. D. Manning. 2011b. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. In EMNLP.

R. Socher, B. Huval, C. D. Manning, and A. Y. Ng. 2012. Semantic Compositionality through Recursive Matrix-Vector Spaces. In EMNLP.

I. Sutskever, R. Salakhutdinov, and J. B. Tenenbaum. 2009. Modelling Relational Data Using Bayesian Clustered Tensor Factorization. In NIPS.

P. D. Turney and P. Pantel. 2010. From Frequency to Meaning: Vector Space Models of Semantics. Journal of Artificial Intelligence Research, 37:141-188.

H. Wang, D. Can, A. Kazemzadeh, F. Bar, and S. Narayanan. 2012. A System for Real-time Twitter Sentiment Analysis of the 2012 U.S. Presidential Election Cycle. In Proceedings of the ACL 2012 System Demonstrations.

D. Widdows. 2008. Semantic Vector Products: Some Initial Investigations. In Proceedings of the Second AAAI Symposium on Quantum Interaction.

A. Yessenalina and C. Cardie. 2011. Compositional Matrix-Space Models for Sentiment Analysis. In EMNLP.

D. Yu, L. Deng, and F. Seide. 2012. Large Vocabulary Speech Recognition Using Deep Tensor Neural Networks. In INTERSPEECH.

F. M. Zanzotto, I. Korkontzelos, F. Fallucchi, and S. Manandhar. 2010. Estimating Linear Models for Compositional Distributional Semantics. In COLING.

L. Zettlemoyer and M. Collins. 2005. Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. In UAI.