Learning Word Vectors for Sentiment Analysis



Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Stanford University, Stanford, CA 94305. {amaas, rdaly, ptpham, yuze, ang, cgpotts}@stanford.edu

Abstract: Unsupervised vector-based approaches to semantics can model rich lexical meanings ...


Presentation Transcript

...sufficiently general to work also with continuous and multi-dimensional notions of sentiment as well as non-sentiment annotations (e.g., political affiliation, speaker commitment). After presenting the model in detail, we provide illustrative examples of the vectors it learns, and then we systematically evaluate the approach on document-level and sentence-level classification tasks. Our experiments involve the small, widely used sentiment and subjectivity corpora of Pang and Lee (2004), which permit us to make comparisons with a number of related approaches and published results. We also show that this dataset contains many correlations between examples in the training and testing sets. This leads us to evaluate on, and make publicly available, a large dataset of informal movie reviews from the Internet Movie Database (IMDB).

2 Related work

The model we present in the next section draws inspiration from prior work on both probabilistic topic modeling and vector-spaced models for word meanings.

Latent Dirichlet Allocation (LDA; Blei et al., 2003) is a probabilistic document model that assumes each document is a mixture of latent topics. For each latent topic T, the model learns a conditional distribution p(w|T) for the probability that word w occurs in T. One can obtain a k-dimensional vector representation of words by first training a k-topic model and then filling the matrix with the p(w|T) values (normalized to unit length). The result is a word–topic matrix in which the rows are taken to represent word meanings. However, because the emphasis in LDA is on modeling topics, not word meanings, there is no guarantee that the row (word) vectors are sensible as points in a k-dimensional space. Indeed, we show in section 4 that using LDA in this way does not deliver robust word vectors. The semantic component of our model shares its probabilistic foundation with LDA, but is factored in a manner designed to discover word vectors rather than latent topics. Some recent work introduces extensions of LDA to capture sentiment in addition to topical information (Li et al., 2010; Lin and He, 2009; Boyd-Graber and Resnik, 2010). Like LDA, these methods focus on modeling sentiment-imbued topics rather than embedding words in a vector space.

Vector space models (VSMs) seek to model words directly (Turney and Pantel, 2010). Latent Semantic Analysis (LSA), perhaps the best known VSM, explicitly learns semantic word vectors by applying singular value decomposition (SVD) to factor a term–document co-occurrence matrix. It is typical to weight and normalize the matrix values prior to SVD. To obtain a k-dimensional representation for a given word, only the entries corresponding to the k largest singular values are taken from the word's basis in the factored matrix. Such matrix factorization-based approaches are extremely successful in practice, but they force the researcher to make a number of design choices (weighting, normalization, dimensionality reduction algorithm) with little theoretical guidance to suggest which to prefer.

Using term frequency (tf) and inverse document frequency (idf) weighting to transform the values in a VSM often increases the performance of retrieval and categorization systems. Delta idf weighting (Martineau and Finin, 2009) is a supervised variant of idf weighting in which the idf calculation is done for each document class and then one value is subtracted from the other. Martineau and Finin present evidence that this weighting helps with sentiment classification, and Paltoglou and Thelwall (2010) systematically explore a number of weighting schemes in the context of sentiment analysis. The success of delta idf weighting in previous work suggests that incorporating sentiment information into VSM values via supervised methods is helpful for sentiment analysis. We adopt this insight, but we are able to incorporate it directly into our model's objective function. (Section 4 compares our approach with a representative sample of such weighting schemes.)
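The delta idf computation is compact enough to state directly. Below is a minimal NumPy sketch of a smoothed variant; the function name, the +0.5 smoothing constant, and the sign convention are our illustration, not taken from Martineau and Finin (2009):

```python
import numpy as np

def smoothed_delta_idf(X, y):
    """Supervised delta idf: compute idf separately for each document
    class, then subtract one value from the other (one weight per term).

    X: (n_docs, n_terms) binary document-term matrix
    y: (n_docs,) labels, 1 = positive class, 0 = negative class
    """
    pos, neg = X[y == 1], X[y == 0]
    df_pos = pos.sum(axis=0)  # positive documents containing each term
    df_neg = neg.sum(axis=0)
    # +0.5 smoothing (assumed constant) keeps terms absent from one class finite
    idf_pos = np.log2((len(pos) + 0.5) / (df_pos + 0.5))
    idf_neg = np.log2((len(neg) + 0.5) / (df_neg + 0.5))
    # positive weight => term skews toward the positive class
    return idf_neg - idf_pos

# Toy example: term 0 appears only in positive docs, term 2 only in negative,
# so they receive large positive and negative weights respectively.
X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
y = np.array([1, 1, 0, 0])
print(smoothed_delta_idf(X, y))
```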
3 Our Model

To capture semantic similarities among words, we derive a probabilistic model of documents which learns word representations. This component does not require labeled data, and shares its foundation with probabilistic topic models such as LDA. The sentiment component of our model uses sentiment annotations to constrain words expressing similar sentiment to have similar representations. [...]

The hyper-parameters of the model are the regularization weights ($\lambda$ and $\nu$), and the word vector dimensionality $\beta$.

3.2 Capturing Word Sentiment

The model presented so far does not explicitly capture sentiment information. Applying this algorithm to documents will produce representations where words that occur together in documents have similar representations. However, this unsupervised approach has no explicit way of capturing which words are predictive of sentiment as opposed to content-related. Much previous work in natural language processing achieves better representations by learning from multiple tasks (Collobert and Weston, 2008; Finkel and Manning, 2009). Following this theme we introduce a second task to utilize labeled documents to improve our model's word representations.

Sentiment is a complex, multi-dimensional concept. Depending on which aspects of sentiment we wish to capture, we can give some body of text a sentiment label $s$ which can be categorical, continuous, or multi-dimensional. To leverage such labels, we introduce an objective that the word vectors of our model should predict the sentiment label using some appropriate predictor,

    \hat{s} = f(\phi_w).    (8)

Using an appropriate predictor function $f(x)$ we map a word vector $\phi_w$ to a predicted sentiment label $\hat{s}$. We can then improve our word vector $\phi_w$ to better predict the sentiment labels of contexts in which that word occurs.

For simplicity we consider the case where the sentiment label $s$ is a scalar continuous value representing sentiment polarity of a document. This captures the case of many online reviews where documents are associated with a label on a star rating scale. We linearly map such star values to the interval $s \in [0, 1]$ and treat them as a probability of positive sentiment polarity. Using this formulation, we employ a logistic regression as our predictor $f(x)$. We use $w$'s vector representation $\phi_w$ and regression weights $\psi$ to express this as

    p(s = 1 \mid w; R, \psi) = \sigma(\psi^T \phi_w + b_c),    (9)

where $\sigma(x)$ is the logistic function and $\psi \in \mathbb{R}^{\beta}$ is the logistic regression weight vector. We additionally introduce a scalar bias $b_c$ for the classifier.

The logistic regression weights $\psi$ and $b_c$ define a linear hyperplane in the word vector space where a word vector's positive sentiment probability depends on where it lies with respect to this hyperplane. Learning over a collection of documents results in words residing different distances from this hyperplane based on the average polarity of documents in which the words occur.

Given a set of labeled documents $D$ where $s_k$ is the sentiment label for document $d_k$, we wish to maximize the probability of document labels given the documents. We assume documents in the collection and words within a document are i.i.d. samples. By maximizing the log-objective we obtain,

    \max_{R, \psi, b_c} \sum_{k=1}^{|D|} \sum_{i=1}^{N_k} \log p(s_k \mid w_i; R, \psi, b_c).    (10)

The conditional probability $p(s_k \mid w_i; R, \psi, b_c)$ is easily obtained from equation 9.

3.3 Learning

The full learning objective maximizes a sum of the two objectives presented. This produces a final objective function of,

    \nu \lVert R \rVert_F^2 + \sum_{k=1}^{|D|} \Big( \lambda \lVert \hat{\theta}_k \rVert_2^2 + \sum_{i=1}^{N_k} \log p(w_i \mid \hat{\theta}_k; R, b) \Big) + \sum_{k=1}^{|D|} \frac{1}{|S_k|} \sum_{i=1}^{N_k} \log p(s_k \mid w_i; R, \psi, b_c).    (11)

$|S_k|$ denotes the number of documents in the dataset with the same rounded value of $s_k$ (i.e. $s_k < 0.5$ and $s_k \geq 0.5$). We introduce the weighting $1/|S_k|$ to combat the well-known imbalance in ratings present in review collections. This weighting prevents the overall distribution of document ratings from affecting the estimate of document ratings in which a particular word occurs. The hyper-parameters of the model are the regularization weights ($\lambda$ and $\nu$), and the word vector dimensionality $\beta$.

Maximizing the objective function with respect to $R$, $b$, $\psi$, and $b_c$ is a non-convex problem. We use alternating maximization, which first optimizes the [...]
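To make equations (9)–(11) concrete before looking at the learned vectors, here is a minimal NumPy sketch of the sentiment term of the objective. The function layout and all names beyond R, psi, and b_c are our own; the semantic term and the alternating maximization loop are omitted:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sentiment_objective(R, psi, b_c, docs, labels, inv_class_size):
    """Weighted log-likelihood of document labels under each word (eqs. 9-11).

    R: (beta, V) word representation matrix; column i is word i's vector phi_w
    psi: (beta,) logistic regression weights; b_c: scalar classifier bias
    docs: list of integer arrays, the word indices of each document
    labels: s_k in [0, 1] per document; inv_class_size: 1/|S_k| per document
    """
    total = 0.0
    for words, s_k, weight in zip(docs, labels, inv_class_size):
        p = sigmoid(psi @ R[:, words] + b_c)   # p(s=1 | w_i) for every word w_i
        # treat s_k as a probability of positive polarity (cross-entropy form)
        total += weight * np.sum(s_k * np.log(p) + (1.0 - s_k) * np.log(1.0 - p))
    return total

# One 3-word document with label 0.9 under a random 5-dimensional model
rng = np.random.default_rng(0)
R, psi = rng.standard_normal((5, 100)), rng.standard_normal(5)
print(sentiment_objective(R, psi, 0.0, [np.array([3, 17, 42])], [0.9], [1.0]))
```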
melancholy
  Our model (Sentiment + Semantic): bittersweet, heartbreaking, happiness, tenderness, compassionate
  Our model (Semantic only): thoughtful, warmth, layer, gentle, loneliness
  LSA: poetic, lyrical, poetry, profound, vivid

ghastly
  Our model (Sentiment + Semantic): embarrassingly, trite, laughably, atrocious, appalling
  Our model (Semantic only): predators, hideous, tube, baffled, smack
  LSA: hideous, inept, severely, grotesque, unsuspecting

lackluster
  Our model (Sentiment + Semantic): lame, laughable, unimaginative, uninspired, awful
  Our model (Semantic only): passable, unconvincing, amateurish, clichéd, insipid
  LSA: uninspired, flat, bland, forgettable, mediocre

romantic
  Our model (Sentiment + Semantic): romance, love, sweet, beautiful, relationship
  Our model (Semantic only): romance, charming, delightful, sweet, chemistry
  LSA: romance, screwball, grant, comedies, comedy

Table 1: Similarity of learned word vectors. Each target word is given with its five most similar words using cosine similarity of the vectors determined by each model. The full version of our model (left) captures both lexical similarity as well as similarity of sentiment strength and orientation. Our unsupervised semantic component (center) and LSA (right) capture semantic relations.

[...] VSM induction (Turney and Pantel, 2010).

Latent Dirichlet Allocation (LDA; Blei et al., 2003). We use the method described in section 2 for inducing word representations from the topic matrix. To train the 50-topic LDA model we use code released by Blei et al. (2003). We use the same 5,000 term vocabulary for LDA as is used for training word vector models. We leave the LDA hyperparameters at their default values, though some work suggests optimizing over priors for LDA is important (Wallach et al., 2009).

Weighting Variants. We evaluate both binary (b) term frequency weighting with smoothed delta idf (t') and no idf (n) because these variants worked well in previous experiments in sentiment (Martineau and Finin, 2009; Pang et al., 2002). In all cases, we use cosine normalization (c). Paltoglou and Thelwall (2010) perform an extensive analysis of such weighting variants for sentiment tasks.

4.3 Document Polarity Classification

Our first evaluation task is document-level sentiment polarity classification. A classifier must predict whether a given review is positive or negative given the review text.

Given a document's bag of words vector v, we obtain features from our model using a matrix-vector product Rv, where v can have arbitrary tf.idf weighting. We do not cosine normalize v, instead applying cosine normalization to the final feature vector Rv. This procedure is also used to obtain features from the LDA and LSA word vectors. In preliminary experiments, we found 'bnn' weighting to work best for v when generating document features via the product Rv. In all experiments, we use this weighting to get multi-word representations from word vectors.
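This feature pipeline is a single matrix–vector product followed by normalization. A minimal sketch follows, with hypothetical sizes matching the 5,000-term vocabulary and 50-dimensional vectors used in our comparisons; the binary weighting shown is the 'bnn' scheme:

```python
import numpy as np

def document_features(R, v):
    """Map a bag-of-words vector v to model features via Rv (section 4.3).

    R: (beta, V) learned word matrix; v: (V,) document vector with 'bnn'
    weighting (binary tf, no idf, no per-document normalization).
    """
    f = R @ v                      # sums the vectors of the words present
    return f / np.linalg.norm(f)   # cosine-normalize Rv, not v itself

# Hypothetical usage: 50-dimensional vectors over a 5,000-word vocabulary
rng = np.random.default_rng(0)
R = rng.standard_normal((50, 5000))
v = (rng.random(5000) < 0.01).astype(float)  # binary bag of words
features = document_features(R, v)           # feed this to a linear SVM
```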
[...] uses disjoint sets of movies for training and testing. These steps minimize the ability of a learner to rely on idiosyncratic word–class associations, thereby focusing attention on genuine sentiment features.

4.3.2 IMDB Review Dataset

We constructed a collection of 50,000 reviews from IMDB, allowing no more than 30 reviews per movie. The constructed dataset contains an even number of positive and negative reviews, so randomly guessing yields 50% accuracy. Following previous work on polarity classification, we consider only highly polarized reviews. A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. Neutral reviews are not included in the dataset. In the interest of providing a benchmark for future work in this area, we release this dataset to the public.²

We evenly divided the dataset into training and test sets. The training set is the same 25,000 labeled reviews used to induce word vectors with our model. We evaluate classifier performance after cross-validating classifier parameters on the training set, again using a linear SVM in all cases. Table 2 shows classification performance on our subset of IMDB reviews. Our model showed superior performance to other approaches, and performed best when concatenated with bag of words representation. Again the variant of our model which utilized extra unlabeled data during training performed best.

Differences in accuracy are small, but, because our test set contains 25,000 examples, the variance of the performance estimate is quite low. For example, an accuracy increase of 0.1% corresponds to correctly classifying an additional 25 reviews.

4.4 Subjectivity Detection

As a second evaluation task, we performed sentence-level subjectivity classification. In this task, a classifier is trained to decide whether a given sentence is subjective, expressing the writer's opinions, or objective, expressing purely facts. We used the dataset of Pang and Lee (2004), which contains subjective sentences from movie review summaries and objective sentences from movie plot summaries. This task is substantially different from the review classification task because it uses sentences as opposed to entire documents and the target concept is subjectivity instead of opinion polarity. We randomly split the 10,000 examples into 10 folds and report 10-fold cross validation accuracy using the SVM training protocol of Pang and Lee (2004).

² Dataset and further details are available online at: http://www.andrew-maas.net/data/sentiment

Table 2 shows classification accuracies from the sentence subjectivity experiment. Our model again provided superior features when compared against other VSMs. Improvement over the bag-of-words baseline is obtained by concatenating the two feature vectors.
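The evaluation protocol above is standard 10-fold cross validation with a linear SVM. A minimal scikit-learn sketch with placeholder data stands in for it below; the real experiments also cross-validate the SVM's regularization parameter on the training folds:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Placeholder data: one feature vector per sentence (e.g. cosine-normalized
# Rv features, optionally concatenated with bag-of-words counts) and
# subjective/objective labels.
rng = np.random.default_rng(0)
X = rng.standard_normal((10000, 50))
y = rng.integers(0, 2, size=10000)

# 10 folds, linear SVM, mean accuracy across folds
scores = cross_val_score(LinearSVC(), X, y, cv=10)
print(f"10-fold accuracy: {scores.mean():.3f}")
```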
5 Discussion

We presented a vector space model that learns word representations capturing semantic and sentiment information. The model's probabilistic foundation gives a theoretically justified technique for word vector induction as an alternative to the overwhelming number of matrix factorization-based techniques commonly used. Our model is parametrized as a log-bilinear model following recent success in using similar techniques for language models (Bengio et al., 2003; Collobert and Weston, 2008; Mnih and Hinton, 2007), and it is related to probabilistic latent topic models (Blei et al., 2003; Steyvers and Griffiths, 2006). We parametrize the topical component of our model in a manner that aims to capture word representations instead of latent topics. In our experiments, our method performed better than LDA, which models latent topics directly.

We extended the unsupervised model to incorporate sentiment information and showed how this extended model can leverage the abundance of sentiment-labeled texts available online to yield word representations that capture both sentiment and semantic relations. We demonstrated the utility of such representations on two tasks of sentiment classification, using existing datasets as well as a larger one that we release for future research. These tasks involve relatively simple sentiment information, but the model is highly flexible in this regard; it can be used to characterize a wide variety of annotations, and thus is broadly applicable in the growing areas of sentiment analysis and retrieval.