lexicon relative to two previously published lexicons – the lexicon used in Wilson et al. (2005) and the lexicon used in Blair-Goldensohn et al. (2008). Our experiments show that a web-derived lexicon is not only significantly larger, but has improved accuracy on a sentence polarity classification task, which is an important problem in many sentiment analysis applications, including sentiment aggregation and summarization (Hu and Liu, 2004; Carenini et al., 2006; Lerman et al., 2009). These results hold true both when the lexicons are used in conjunction with string matching to classify sentences, and when they are included within a contextual classifier framework (Wilson et al., 2005).

Extracting polarity lexicons from the web has been investigated previously by Kaji and Kitsuregawa (2007), who study the problem exclusively for Japanese. In that work a set of positive/negative sentences are first extracted from the web using cues from a syntactic parser as well as the document structure. Adjective phrases are then extracted from these sentences based on different statistics of their occurrence in the positive or negative set. Our work, on the other hand, does not rely on syntactic parsers or restrict the set of candidate lexicon entries to specific syntactic classes, i.e., adjective phrases. As a result, the lexicon built in our study is on a different scale than that examined in Kaji and Kitsuregawa (2007). Though this hypothesis is not tested here, it also makes our techniques more amenable to adaptation for other languages.

2 Constructing the Lexicon

In this section we describe a method to construct polarity lexicons using graph propagation over a phrase similarity graph constructed from the web.

2.1 Graph Propagation Algorithm

We construct our lexicon using graph propagation techniques, which have previously been investigated in the construction of polarity lexicons (Kim and Hovy, 2004; Hu and Liu, 2004; Esuli and Sabastiani, 2009; Blair-Goldensohn et al., 2008; Rao and Ravichandran, 2009). We assume as input an undirected edge weighted graph G = (V, E), where w_ij ∈ [0, 1] is the weight of edge (v_i, v_j) ∈ E. The node set V is the set of candidate phrases for inclusion in a sentiment lexicon. In practice, G should encode semantic similarities between two nodes, e.g., for sentiment analysis one would hope that w_ij > w_ik if v_i = good, v_j = great and v_k = bad. We also assume as input two sets of seed phrases, denoted P for the positive seed set and N for the negative seed set.

The common property among all graph propagation algorithms is that they attempt to propagate information from the seed sets to the rest of the graph through its edges. This can be done using machine learning, graph algorithms or more heuristic means. The specific algorithm used in this study is given in Figure 1, which is distinct from common graph propagation algorithms, e.g., label propagation (see Section 2.3). The output is a polarity vector pol ∈ R^|V| such that pol_i is the polarity score for the i-th candidate phrase (or the i-th node in G). In particular, we desire pol to have the following semantics:

    pol_i > 0   the i-th phrase has positive polarity
    pol_i < 0   the i-th phrase has negative polarity
    pol_i = 0   the i-th phrase has no sentiment

Intuitively, the algorithm works by computing both a positive and a negative polarity magnitude for each node in the graph, call them pol+_i and pol-_i. These values are equal to the sum over the max weighted path from every seed word (either positive or negative) to node v_i. Phrases that are connected to multiple positive seed words through short yet highly weighted paths will receive high positive values. The final polarity of a phrase is then set to pol_i = pol+_i − β · pol-_i, where β is a constant meant to account for the difference in overall mass of positive and negative flow in the graph. Thus, after the algorithm is run, if a phrase has a higher positive than negative polarity score, then its final polarity will be positive, and negative otherwise.
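To make the best-path propagation concrete, here is a minimal Python sketch of the procedure as described above (Figure 1 itself is not reproduced in this transcript). The graph representation (an adjacency dictionary of edge weights), the names best_path_propagation, graph, pos_seeds, neg_seeds and max_iters, and the particular choice of β as the ratio of total positive to total negative flow are assumptions made for illustration, not the authors' implementation.

def best_path_propagation(graph, pos_seeds, neg_seeds, max_iters=5):
    """Sketch of best-path polarity propagation over a phrase similarity graph.

    graph: dict mapping each node to a dict of neighbor -> edge weight in [0, 1];
           both directions of every undirected edge must be present.
    pos_seeds, neg_seeds: iterables of seed phrases that are nodes of the graph.
    Returns a dict mapping each node to its polarity score pol_i.
    """
    def propagate(seeds):
        # For each seed, find the max-product path weight to every node and
        # sum these contributions over all seeds.
        total = {v: 0.0 for v in graph}
        for seed in seeds:
            best = {v: 0.0 for v in graph}
            best[seed] = 1.0
            # Repeated relaxation toward max-product path weights; long paths
            # contribute little because edge weights are below 1.
            for _ in range(max_iters):
                for u in graph:
                    for v, w in graph[u].items():
                        if best[u] * w > best[v]:
                            best[v] = best[u] * w
            for v in graph:
                total[v] += best[v]
        return total

    pol_pos = propagate(pos_seeds)
    pol_neg = propagate(neg_seeds)
    # Assumed choice: beta rescales negative mass so the overall positive and
    # negative flows are comparable in magnitude.
    beta = sum(pol_pos.values()) / max(sum(pol_neg.values()), 1e-12)
    return {v: pol_pos[v] - beta * pol_neg[v] for v in graph}

A phrase whose score exceeds the threshold described below in absolute value would then be admitted to the lexicon with the corresponding sign.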
There are some implementation details worth pointing out. First, the algorithm in Figure 1 is written in an iterative framework, where on each iteration, paths of increasing lengths are considered. The input variable T controls the max path length considered by the algorithm. This can be set to be a small value in practice, since the multiplicative path weights result in long paths rarely contributing to polarity scores. Second, the parameter γ is a threshold that defines the minimum polarity magnitude a phrase must have to be included in the lexicon.

    Input: G = (V, E), w_ij ∈ [0, 1], P, N
    Output: pol ∈ R^|V|
    Initialize: pol_i = 1.0 for all v_i ∈ P, pol_i = -1.0 for all v_i ∈ N, and pol_i = 0.0 for all v_i ∉ P ∪ N
    1. for t = 1 .. T
    2.     pol_i = ( Σ_{(v_i, v_j) ∈ E} w_ij · pol_j ) / ( Σ_{(v_i, v_j) ∈ E} w_ij ), for all v_i ∈ V
    3.     reset pol_i = 1.0 for all v_i ∈ P and reset pol_i = -1.0 for all v_i ∈ N

Figure 2: The label propagation algorithm (Zhu and Ghahramani, 2002).

… (minimizes squared error between values of adjacent nodes), and an equivalence to computing random walks through graphs.

The primary difference between standard label propagation and the graph propagation algorithm given in Section 2.1 is that a node with multiple paths to a seed will be influenced by all these paths in the label propagation algorithm, whereas only the single path from a seed will influence the polarity of a node in our proposed propagation algorithm – namely the path with highest weight. The intuition behind label propagation seems justified. That is, if a node has multiple paths to a seed, it should be reflected in a higher score. This is certainly true when the graph is of high quality and all paths trustworthy. However, in a graph constructed from web co-occurrence statistics, this is rarely the case. Our graph consisted of many dense subgraphs, each representing some semantic entity class, such as actors, authors, tech companies, etc. Problems arose when polarity flowed into these dense subgraphs with the label propagation algorithm. Ultimately, this flow would amplify since the dense subgraph provided exponentially many paths from each node to the source of the flow, which caused a reinforcement effect. As a result, the lexicon would consist of large groups of actor names, companies, etc. This also led to convergence issues since the polarity is divided proportional to the size of the dense subgraph. Additionally, negative phrases in the graph appeared to be in more densely connected regions, which resulted in the final lexicons being highly skewed towards negative entries due to the influence of multiple paths to seed words. For best path propagation, these problems were less acute as each node in the dense subgraph would only get the polarity a single time from each seed, which is decayed by the fact that edge weights are smaller than 1. Furthermore, the fact that edge weights are less than 1 results in most long paths having weights near zero, which in turn results in fast convergence.
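For comparison, the following is a minimal Python sketch of the label propagation update in Figure 2, using the same assumed adjacency-dictionary representation as the earlier sketch; names such as label_propagation and num_iters are illustrative, and the code is a paraphrase of the pseudocode rather than the authors' implementation.

def label_propagation(graph, pos_seeds, neg_seeds, num_iters=50):
    """Sketch of the label propagation algorithm of Figure 2."""
    pos_seeds, neg_seeds = set(pos_seeds), set(neg_seeds)
    # Initialize: +1 for positive seeds, -1 for negative seeds, 0 elsewhere.
    pol = {v: 0.0 for v in graph}
    for v in pos_seeds:
        pol[v] = 1.0
    for v in neg_seeds:
        pol[v] = -1.0
    for _ in range(num_iters):
        # Step 2: each node takes the weighted average of its neighbors' polarities.
        new_pol = {}
        for v in graph:
            total_weight = sum(graph[v].values())
            if total_weight > 0.0:
                new_pol[v] = sum(w * pol[u] for u, w in graph[v].items()) / total_weight
            else:
                new_pol[v] = pol[v]
        pol = new_pol
        # Step 3: clamp the seeds back to their labels after every iteration.
        for v in pos_seeds:
            pol[v] = 1.0
        for v in neg_seeds:
            pol[v] = -1.0
    return pol

Note how every path from a seed contributes to a node's score through the repeated averaging, which is exactly the reinforcement behavior that the best-path variant avoids.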
3 Lexicon Evaluation

We ran the best path graph propagation algorithm over a graph constructed from the web using manually constructed positive and negative seed sets of 187 and 192 words in size, respectively. These words were generated by a set of five humans and many are morphological variants of the same root, e.g., excel/excels/excelled. The algorithm produced a lexicon that contained 178,104 entries. Depending on the threshold γ (see Figure 1), this lexicon could be larger or smaller. As stated earlier, our selection of γ and all hyperparameters was based on manual inspection of the resulting lexicons and performance on held-out data.

In the rest of this section we investigate the properties of this lexicon to understand both its general characteristics as well as its possible utility in sentiment applications. To this end we compare three different lexicons:

1. Wilson et al.: Described in Wilson et al. (2005). Lexicon constructed by combining the lexicon built in Riloff and Wiebe (2003) with other sources (see http://www.cs.pitt.edu/mpqa/). Entries are coarsely rated – strong/weak positive/negative – which we weighted as 1.0, 0.5, -0.5, and -1.0 for our experiments.

2. WordNet LP: Described in Blair-Goldensohn et al. (2008). Constructed using label propagation over a graph derived from WordNet synonym and antonym links. Note that label propagation is not prone to the kinds of errors discussed in Section 2.3 since the lexical graph is derived from a high quality source.

3. Web GP: The web-derived lexicon described in Section 2.1 and Section 2.2.

                   All Phrases   Pos. Phrases   Neg. Phrases   Recall wrt Wilson et al.   Recall wrt WordNet LP   Recall wrt Web GP
  Wilson et al.        7,628         2,718          4,910          100%                        37%                     2%
  WordNet LP          12,310         5,705          6,605           21%                       100%                     3%
  Web GP             178,104        90,337         87,767           70%                        48%                   100%

Table 1: Lexicon statistics. Wilson et al. is the lexicon used in Wilson et al. (2005), WordNet LP is the lexicon constructed by Blair-Goldensohn et al. (2008) that uses label propagation algorithms over a graph constructed through WordNet, and Web GP is the web-derived lexicon from this study.

  POSITIVE PHRASES
  Typical        Multiword expressions            Spelling variations
  cute           once in a lifetime               loveable
  fabulous       state-of-the-art                 nicee
  cuddly         fail-safe operation              niice
  plucky         just what the doctor ordered     cooool
  ravishing      out of this world                coooool
  spunky         top of the line                  koool
  enchanting     melt in your mouth               kewl
  precious       snug as a bug                    cozy
  charming       out of the box                   cosy
  stupendous     more good than bad               sikk

  NEGATIVE PHRASES
  Typical        Multiword expressions            Vulgarity
  dirty          run of the mill                  fucking stupid
  repulsive      out of touch                     fucked up
  crappy         over the hill                    complete bullshit
  sucky          flash in the pan                 shitty
  subpar         bumps in the road                half assed
  horrendous     foaming at the mouth             jackass
  miserable      dime a dozen                     piece of shit
  lousy          pie-in-the-sky                   son of a bitch
  abysmal        sick to my stomach               son of a bitch
  wretched       pain in my ass                   sonuvabitch
Table 3: Example positive and negative phrases from web lexicon.

…ate, e.g., "shitty", but some are clearly insults and outbursts that are most likely included due to their co-occurrence with angry texts. There were also a number of derogatory terms and racial slurs in the lexicon, again most of which received negative sentiment due to their typical disparaging usage.

3.2 Quantitative Evaluation

To determine the practical usefulness of a polarity lexicon derived from the web, we measured the performance of the lexicon on a sentence classification/ranking task. The input is a set of sentences and the output is a classification of the sentences as being either positive, negative or neutral in sentiment. Additionally, the system outputs two rankings, the first a ranking of the sentences by positive polarity and the second a ranking of the sentences by negative polarity. Classifying sentences by their sentiment is a subtask of sentiment aggregation systems (Hu and Liu, 2004; Gamon et al., 2005). Ranking sentences by their polarity is a critical sub-task in extractive sentiment summarization (Carenini et al., 2006; Lerman et al., 2009).

To classify sentences as being positive, negative or neutral, we used an augmented vote-flip algorithm (Choi and Cardie, 2009), which is given in Figure 3. The intuition behind this algorithm is simple. The number of matched positive and negative phrases from the lexicon are counted and whichever has the most votes wins. The algorithm flips the decision if the number of negations is odd. Though this algorithm appears crude, it benefits from not relying on threshold values for neutral classification, which is difficult due to the fact that the polarity scores in the three lexicons are not on the same scale.

To rank sentences we defined the purity of a sentence X as the normalized sum of the sentiment scores for each phrase x in the sentence:

    purity(X) = ( Σ_{x ∈ X} pol_x ) / ( δ + Σ_{x ∈ X} |pol_x| )

This is a normalized score in the range [-1, 1]. Intuitively, sentences with many terms of the same polarity will have purity scores at the extreme points of the range. Before calculating purity, a simple negation heuristic was implemented that reversed the sentiment scores of terms that were within the scope of negations. The δ term helps to favor sentences with multiple phrase matches. Purity is a common metric used for ranking sentences for inclusion in sentiment summaries (Lerman et al., 2009). Purity and negative purity were used to rank sentences as being positive and negative sentiment, respectively.

The data used in our initial English-only experi…

Figure 4: Lexicon classifier precision/recall curves for positive (left) and negative (right) classes.
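To illustrate the sentence-level scoring just described, here is a small Python sketch that combines a simplified vote count with the purity score. The lexicon interface, the negation handling, the tie-to-neutral rule, and the smoothing constant delta are illustrative assumptions, not the exact augmented vote-flip procedure of Figure 3 (which is not reproduced in this transcript).

def classify_and_score(phrases, lexicon, num_negations=0, delta=0.5):
    """Sketch: vote-flip style classification and purity scoring for one sentence.

    phrases: candidate phrases extracted from the sentence.
    lexicon: dict mapping phrase -> signed polarity score.
    num_negations: number of negation cues detected in the sentence.
    delta: assumed smoothing constant favoring sentences with many matches.
    """
    scores = [lexicon[p] for p in phrases if p in lexicon]

    # Vote: whichever polarity has more matched phrases wins; an odd number
    # of negations flips the decision; ties are treated as neutral here.
    pos_votes = sum(1 for s in scores if s > 0)
    neg_votes = sum(1 for s in scores if s < 0)
    if pos_votes == neg_votes:
        label = "neutral"
    else:
        label = "positive" if pos_votes > neg_votes else "negative"
        if num_negations % 2 == 1:
            label = "negative" if label == "positive" else "positive"

    # Purity: normalized sum of matched phrase scores, lying in (-1, 1).
    purity = sum(scores) / (delta + sum(abs(s) for s in scores)) if scores else 0.0
    return label, purity

Ranking sentences by purity and by negative purity then yields the positive and negative rankings used in the evaluation.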
Figure 5: Contextual classifier precision/recall curves for positive (left) and negative (right) classes.

…ings, again showing that at almost every level of recall, the web-derived lexicon has higher precision. For a final English experiment we built a meta-classification system that is identical to the contextual classifiers, except it is trained using features derived from all lexicons. Results are shown in the last row of Table 4 and precision-recall curves are shown in Figure 5. Not surprisingly, this system has the best performance in terms of average precision as it has access to the largest amount of information, though its performance is only slightly better than the contextual classifier for the web-derived lexicon.

4 Conclusions

In this paper we examined the viability of sentiment lexicons learned semi-automatically from the web, as opposed to those that rely on manual annotation and/or resources such as WordNet. Our qualitative experiments indicate that the web-derived lexicon can include a wide range of phrases that have not been available to previous systems, most notably spelling variations, slang, vulgarity, and multiword expressions. Quantitatively, we observed that the web-derived lexicon had superior performance to previously published lexicons for English classification. Ultimately, a meta classifier that incorporates features from all lexicons provides the best performance. In the future we plan to investigate the construction of web-derived lexicons for languages other than English, which is an active area of research (Mihalcea et al., 2007; Jijkoun and Hofmann, 2009; Rao and Ravichandran, 2009). The advantage of the web-derived lexicons studied here is that they do not rely on language specific resources besides unlabeled data and seed lists. A primary question is whether such lexicons improve performance over a translate-to-English strategy (Banea et al., 2008).

Acknowledgements: The authors thank Andrew Hogue, Raj Krishnan and Deepak Ravichandran for insightful discussions about this work.

References

E. Alfonseca, K. Hall, and S. Hartmann. 2009. Large-scale computation of distributional similarities for queries. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT).

C. Banea, R. Mihalcea, J. Wiebe, and S. Hassan. 2008. Multilingual subjectivity analysis using machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).

S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. A. Reis, and J. Reynar. 2008. Building a sentiment summarizer for local service reviews. In NLP in the Information Explosion Era.

G. Carenini, R. Ng, and A. Pauls. 2006. Multi-document summarization of evaluative text. In Proceedings of the European Chapter of the Association for Computational Linguistics (EACL).

Y. Choi and C. Cardie. 2009. Adapting a polarity lexicon using integer linear programming for domain-specific sentiment classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).

S. R. Das and M. Y. Chen. 2007. Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science, 53(9):1375–1388.

A. Esuli and F. Sabastiani. 2009. SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of the Language Resource and Evaluation Conference (LREC).

M. Gamon, A. Aue, S. Corston-Oliver, and E. Ringger. 2005. Pulse: Mining customer opinions from free text. In Proceedings of the 6th International Symposium on Intelligent Data Analysis (IDA).

V. Hatzivassiloglou and K. R. McKeown. 1997. Predicting the semantic orientation of adjectives. In Proceedings of the European Chapter of the Association for Computational Linguistics (EACL).

M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD).

V. B. Jijkoun and K. Hofmann. 2009. Generating a non-English subjectivity lexicon: Relations that matter. In Proceedings of the European Chapter of the Association for Computational Linguistics (EACL).
N. Kaji and M. Kitsuregawa. 2007. Building lexicon for sentiment analysis from massive collection of HTML documents. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).

S. M. Kim and E. Hovy. 2004. Determining the sentiment of opinions. In Proceedings of the International Conference on Computational Linguistics (COLING).

K. Lerman, S. Blair-Goldensohn, and R. McDonald. 2009. Sentiment summarization: Evaluating and learning user preferences. In Proceedings of the European Chapter of the Association for Computational Linguistics (EACL).

R. McDonald, K. Hannan, T. Neylon, M. Wells, and J. Reynar. 2007. Structured models for fine-to-coarse sentiment analysis. In Proceedings of the Annual Conference of the Association for Computational Linguistics (ACL).

R. Mihalcea, C. Banea, and J. Wiebe. 2007. Learning multilingual subjective language via cross-lingual projections. In Proceedings of the Annual Conference of the Association for Computational Linguistics (ACL).

S. Mohammad, B. Dorr, and C. Dunne. 2009. Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).

B. Pang, L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).

P. Pantel, E. Crestan, A. Borkovsky, A. Popescu, and V. Vyas. 2009. Web-scale distributional similarity and entity set expansion. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).

D. Rao and D. Ravichandran. 2009. Semi-supervised polarity lexicon induction. In Proceedings of the European Chapter of the Association for Computational Linguistics (EACL).

E. Riloff and J. Wiebe. 2003. Learning extraction patterns for subjective expressions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).

P. Turney. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the Annual Conference of the Association for Computational Linguistics (ACL).

J. Wiebe. 2000. Learning subjective adjectives from corpora. In Proceedings of the National Conference on Artificial Intelligence (AAAI).

T. Wilson, J. Wiebe, and P. Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).

X. Zhu and Z. Ghahramani. 2002. Learning from labeled and unlabeled data with label propagation. Technical report, CMU CALD tech report CMU-CALD-02.
