
Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales

Bo Pang and Lillian Lee
(1) Department of Computer Science, Cornell University
(2) Language Technologies Institute, Carnegie Mellon University
(3) Computer Science Department, Carnegie Mellon University

Abstract

We address the rating-inference problem, wherein rather than simply decide whether a review is "thumbs up" or "thumbs down", as in previous sentiment-analysis work, one must determine an author's evaluation with respect to a multi-point scale (e.g., one to five "stars"). This task represents an interesting twist on standard multi-class text categorization because there are several different degrees of similarity between class labels; for example, "three stars" is intuitively closer to "four stars" than to "one star".

We first evaluate human performance at the task. Then, we apply a meta-algorithm, based on a metric-labeling formulation of the problem, that alters a given n-ary classifier's output in an explicit attempt to ensure that similar items receive similar labels. We show that the meta-algorithm can provide significant improvements over both multi-class and regression versions of SVMs when we employ a novel similarity measure appropriate to the problem.

Publication info: Proceedings of the ACL, 2005.

1 Introduction

There has recently been a dramatic surge of interest in sentiment analysis, as more and more people become aware of the scientific challenges posed and the scope of new applications enabled by the processing of subjective language. (The papers collected by Qu, Shanahan, and Wiebe (2004) form a representative sample of research in the area.) Most prior work on the specific problem of categorizing expressly opinionated text has focused on the binary distinction of positive vs. negative (Turney, 2002; Pang, Lee, and Vaithyanathan, 2002; Dave, Lawrence, and Pennock, 2003; Yu and Hatzivassiloglou, 2003). But it is often helpful to have more information than this binary distinction provides, especially if one is ranking items by recommendation or comparing several reviewers' opinions: example applications include collaborative filtering and deciding which conference submissions to accept.

Therefore, in this paper we consider generalizing to finer-grained scales: rather than just determine whether a review is "thumbs up" or not, we attempt to infer the author's implied numerical rating, such as "three stars" or "four stars". Note that this differs from identifying opinion strength (Wilson, Wiebe, and Hwa, 2004): rants and raves have the same strength but represent opposite evaluations, and referee forms often allow one to indicate that one is very confident (high strength) that a conference submission is mediocre (middling rating). Also, our task differs from ranking not only because one can be given a single item to classify (as opposed to a set of items to be ordered relative to one another), but because there are settings in which classification is harder than ranking, and vice versa.

One can apply standard n-ary classifiers or regression to this rating-inference problem; independent work by Koppel and Schler (2005) considers such methods. But an alternative approach that explicitly incorporates information about item similarities together with label similarity information (for instance, "one star" is closer to "two stars" than to "four stars") is to think of the task as one of metric labeling (Kleinberg and Tardos, 2002), where label relations are encoded via a distance metric. This observation yields a meta-algorithm, applicable to both semi-supervised (via graph-theoretic techniques) and supervised settings, that alters a given n-ary classifier's output so that similar items tend to be assigned similar labels.

In what follows, we first demonstrate that humans can discern relatively small differences in (hidden) evaluation scores, indicating that rating inference is indeed a meaningful task. We then present three types of algorithms — one-vs-all, regression, and metric labeling — that can be distinguished by how explicitly they attempt to leverage similarity between items and between labels. Next, we consider what item similarity measure to apply, proposing one based on the positive-sentence percentage. Incorporating this new measure within the metric-labeling framework is shown to often provide significant improvements over the other algorithms.

We hope that some of the insights derived here might apply to other scales for text classification that have been considered, such as clause-level opinion strength (Wilson, Wiebe, and Hwa, 2004); affect types like disgust (Subasic and Huettner, 2001; Liu, Lieberman, and Selker, 2003); reading level (Collins-Thompson and Callan, 2004); and urgency or criticality (Horvitz, Jacobs, and Hovel, 1999).

2 Problem validation and formulation

We first ran a small pilot study on human subjects in order to establish a rough idea of what a reasonable classification granularity is: if even people cannot accurately infer labels with respect to a five-star scheme with half stars, say, then we cannot expect a learning algorithm to do so. Indeed, some potential obstacles to accurate rating inference include lack of calibration (e.g., what an understated author intends as high praise may seem lukewarm), author inconsistency at assigning fine-grained ratings, and ratings not entirely supported by the text. [Footnote 1: For example, the critic Dennis Schwartz writes that "sometimes the review itself [indicates] the letter grade should have been higher or lower, as the review might fail to take into consideration my overall impression of the film — which I hope to capture in the grade" (http://www.sover.net/~ozus/cinema.htm).]

Table 1: Human accuracy at determining relative positivity. Rating differences are given in "notches". Parentheses enclose the number of pairs attempted.

Rating diff.        | Pooled | Subject 1 | Subject 2
3 or more           | 100%   | 100% (35) | 100% (15)
2 (e.g., 1 star)    | 83%    | 77% (30)  | 100% (11)
1 (e.g., 1/2 star)  | 69%    | 65% (57)  | 90% (10)
0                   | 55%    | 47% (15)  | 80% (5)

For data, we first collected Internet movie reviews in English from four authors, removing explicit rating indicators from each document's text automatically. Now, while the obvious experiment would be to ask subjects to guess the rating that a review represents, doing so would force us to specify a fixed rating-scale granularity in advance. Instead, we examined people's ability to discern relative differences, because by varying the rating differences represented by the test instances, we can evaluate multiple granularities in a single experiment. Specifically, at intervals over a number of weeks, we authors (a non-native and a native speaker of English) examined pairs of reviews, attempting to determine whether the first review in each pair was (1) more positive than, (2) less positive than, or (3) as positive as the second. The texts in any particular review pair were taken from the same author to factor out the effects of cross-author divergence.

As Table 1 shows, both subjects performed perfectly when the rating separation was at least 3 "notches" in the original scale (we define a notch as a half star in a four- or five-star scheme and 10 points in a 100-point scheme). Interestingly, although human performance drops as rating difference decreases, even at a one-notch separation, both subjects handily outperformed the random-choice baseline of 33%. However, there was large variation in accuracy between subjects. [Footnote 2: One contributing factor may be that the subjects viewed disjoint document sets, since we wanted to maximize experimental coverage of the types of document pairs within each difference class. We thus cannot report inter-annotator agreement, but since our goal is to recover a reviewer's "true" recommendation, reader-author agreement is more relevant. While another factor might be degree of English fluency, in an informal experiment (six subjects viewing the same three pairs), native English speakers made the only two errors.]

Because of this variation, we defined two different classification regimes. From the evidence above, a three-class task (categories 0, 1, and 2 — essentially "negative", "middling", and "positive", respectively) seems like one that most people would do quite well at (but we should not assume 100% human accuracy: according to our one-notch results, people may misclassify borderline cases like 2.5 stars). Our study also suggests that people could do at least fairly well at distinguishing full stars in a zero- to four-star scheme. However, when we began to construct five-category datasets for each of our four authors (see below), we found that in each case, either the most negative or the most positive class (but not both) contained only about 5% of the documents. To make the classes more balanced, we folded these minority classes into the adjacent class, thus arriving at a four-class problem (categories 0-3, increasing in positivity). Note that the four-class problem seems to offer more possibilities for leveraging class relationship information than the three-class setting, since it involves more class pairs. Also, even the two-category version of the rating-inference problem for movie reviews has proven quite challenging for many automated classification techniques (Pang, Lee, and Vaithyanathan, 2002; Turney, 2002).

We applied the above two labeling schemes to a scale dataset [Footnote 3: Available at http://www.cs.cornell.edu/People/pabo/movie-review-data as scale dataset v1.0.] containing four corpora of movie reviews. All reviews were automatically preprocessed to remove both explicit rating indicators and objective sentences; the motivation for the latter step is that it has previously aided positive vs. negative classification (Pang and Lee, 2004). All of the 1770, 902, 1307, or 1027 documents in a given corpus were written by the same author. This decision facilitates interpretation of the results, since it factors out the effects of different choices of methods for calibrating authors' scales. [Footnote 4: From the Rotten Tomatoes website's FAQ: "star systems are not consistent between critics. For critics like Roger Ebert and James Berardinelli, 2.5 stars or lower out of 4 stars is always negative. For other critics, 2.5 stars can either be positive or negative. Even though Eric Lurio uses a 5 star system, his grading is very relaxed. So, 2 stars can be positive." Thus, calibration may sometimes require strong familiarity with the authors involved, as anyone who has ever needed to reconcile conflicting referee reports probably knows.] We point out that it is possible to gather author-specific information in some practical applications: for instance, systems that use selected authors (e.g., the Rotten Tomatoes movie-review website — where, we note, not all authors provide explicit ratings) could require that someone submit rating-labeled samples of newly-admitted authors' work. Moreover, our results at least partially generalize to mixed-author situations (see Section 5.2).

3 Algorithms

Recall that the problem we are considering is multi-category classification in which the labels can be naturally mapped to a metric space (e.g., points on a line); for simplicity, we assume the distance metric d(ℓ, ℓ') = |ℓ − ℓ'| throughout. In this section, we present three approaches to this problem in order of increasingly explicit use of pairwise similarity information between items and between labels. In order to make comparisons between these methods meaningful, we base all three of them on Support Vector Machines (SVMs) as implemented in Joachims' (1999) SVMlight package.

3.1 One-vs-all

The standard SVM formulation applies only to binary classification. One-vs-all (OVA) (Rifkin and Klautau, 2004) is a common extension to the n-ary case. Training consists of building, for each label ℓ, an SVM binary classifier distinguishing label ℓ from "not-ℓ". We consider the final output to be a label preference function π_ova(x, ℓ), defined as the signed distance of (test) item x to the ℓ side of the ℓ vs. not-ℓ decision plane.

Clearly, OVA makes no explicit use of pairwise label or item relationships. However, it can perform well if each class exhibits sufficiently distinct language; see Section 4 for more discussion.

3.2 Regression

Alternatively, we can take a regression perspective by assuming that the labels come from a discretization of a continuous function g mapping from the feature space to a metric space. [Footnote 5: We discuss the ordinal regression variant in Section 6.] If we choose g from a family of sufficiently "gradual" functions, then similar items necessarily receive similar labels. In particular, we consider linear, ε-insensitive SVM regression (Vapnik, 1995; Smola and Schölkopf, 1998); the idea is to find the hyperplane that best fits the training data, but where training points whose labels are within distance ε of the hyperplane incur no loss. Then, for (test) instance x, the label preference function π_reg(x, ℓ) is the negative of the distance between ℓ and the value predicted for x by the fitted hyperplane function.

Wilson, Wiebe, and Hwa (2004) used SVM regression to classify clause-level strength of opinion, reporting that it provided lower accuracy than other methods. However, independently of our work, Koppel and Schler (2005) found that applying linear regression to classify documents (in a different corpus than ours) with respect to a three-point rating scale provided greater accuracy than OVA SVMs and other algorithms.

3.3 Metric labeling

Regression implicitly encodes the "similar items, similar labels" heuristic, in that one can restrict consideration to "gradual" functions. But we can also think of our task as a metric labeling problem (Kleinberg and Tardos, 2002), a special case of the maximum a posteriori estimation problem for Markov random fields, to explicitly encode our desideratum. Suppose we have an initial label preference function π(x, ℓ), perhaps computed via one of the two methods described above. Also, let d be a distance metric on labels, and let nn_k(x) denote the k nearest neighbors of item x according to some item-similarity function sim. Then, it is quite natural to pose our problem as finding a mapping of instances x to labels ℓ_x (respecting the original labels of the training instances) that minimizes

  Σ_{x ∈ test} [ −π(x, ℓ_x) + α Σ_{y ∈ nn_k(x)} f(d(ℓ_x, ℓ_y)) · sim(x, y) ]

where f is monotonically increasing (we chose f(d) = d unless otherwise specified) and α is a trade-off and/or scaling parameter. (The inner summation is familiar from work in locally-weighted learning [Footnote 6: If we ignore the π(x, ℓ_x) term, different choices of f correspond to different versions of nearest-neighbor learning, e.g., majority-vote, weighted average of labels, or weighted median of labels.] (Atkeson, Moore, and Schaal, 1997).) In a sense, we are using explicit item and label similarity information to increasingly penalize the initial classifier as it assigns more divergent labels to similar items.

In this paper, we only report supervised-learning experiments in which the nearest neighbors for any given test item were drawn from the training set alone. In such a setting, the labeling decisions for different test items are independent, so that solving the requisite optimization problem is simple.

Aside: transduction. The above formulation also allows for transductive semi-supervised learning, in that we could allow nearest neighbors to come from both the training and test sets. We intend to address this case in future work, since there are important settings in which one has a small number of labeled reviews and a large number of unlabeled reviews, in which case considering similarities between unlabeled texts could prove quite helpful. In full generality, the corresponding multi-label optimization problem is intractable, but for many families of f functions (e.g., convex) there exist practical exact or approximation algorithms based on techniques for finding minimum s-t cuts in graphs (Ishikawa and Geiger, 1998; Boykov, Veksler, and Zabih, 1999; Ishikawa, 2003). Interestingly, previous sentiment analysis research found that a minimum-cut formulation for the binary subjective/objective distinction yielded good results (Pang and Lee, 2004). Of course, there are many other related semi-supervised learning algorithms that we would like to try as well; see Zhu (2005) for a survey.

4 Class struggle: finding a label-correlated item-similarity function

We need to specify an item similarity function sim to use the metric-labeling formulation described in Section 3.3. We could, as is commonly done, employ a term-overlap-based measure such as the cosine between term-frequency-based document vectors (henceforth "TO(cos)"). However, Table 2 shows that in aggregate, the vocabularies of distant classes overlap to a degree surprisingly similar to that of the vocabularies of nearby classes. Thus, item similarity as measured by TO(cos) may not correlate well with similarity of the items' true labels.

Table 2: Average over authors and class pairs of between-class vocabulary overlap as the class labels of the pair grow farther apart.

Label difference: | 1   | 2   | 3
Three-class data  | 37% | 33% | —
Four-class data   | 34% | 31% | 30%

We can potentially develop a more useful similarity metric by asking ourselves what, intuitively, accounts for the label relationships that we seek to exploit. A simple hypothesis is that ratings can be determined by the positive-sentence percentage (PSP) of a text, i.e., the number of positive sentences divided by the number of subjective sentences. (Term-based versions of this premise have motivated much sentiment-analysis work for over a decade (Das and Chen, 2001; Tong, 2001; Turney, 2002).) But counterexamples are easy to construct: reviews can contain off-topic opinions, or recount many positive aspects before describing a fatal flaw.

We therefore tested the hypothesis as follows. To avoid the need to hand-label sentences as positive or negative, we first created a sentence polarity dataset [Footnote 7: Available at http://www.cs.cornell.edu/People/pabo/movie-review-data as sentence polarity dataset v1.0.] consisting of 10,662 movie-review "snippets" (a striking extract usually one sentence long) downloaded from www.rottentomatoes.com; each snippet was labeled with its source review's label (positive or negative) as provided by Rotten Tomatoes. Then, we trained a Naive Bayes classifier on this dataset and applied it to our scale dataset to identify the positive sentences (recall that objective sentences were already removed).

Figure 1 shows that all four authors tend to exhibit a higher PSP when they write a more positive review, and we expect that most typical reviewers would follow suit. Hence, PSP appears to be a promising basis for computing document similarity for our rating-inference task.
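For concreteness, PSP is just a ratio over a review's subjective sentences. The following is our own minimal sketch, not code from the paper; the toy word-list predictor merely stands in for the Naive Bayes sentence-polarity classifier the authors trained on the snippet dataset:

```python
def psp(subjective_sentences, is_positive):
    # PSP = (# positive sentences) / (# subjective sentences);
    # objective sentences are assumed to have been removed already,
    # as in the paper's preprocessing step.
    if not subjective_sentences:
        return 0.0
    positive = sum(1 for s in subjective_sentences if is_positive(s))
    return positive / len(subjective_sentences)

# Toy stand-in for the trained Naive Bayes sentence-polarity classifier:
POSITIVE_WORDS = {"pleasant", "gem", "likeable"}
def toy_predictor(sentence):
    return any(w in sentence.lower().split() for w in POSITIVE_WORDS)

review = ["a real gem of a film", "the pacing is tedious at times"]
print(psp(review, toy_predictor))  # → 0.5
```

Any sentence-level polarity model could be dropped in for `toy_predictor`; the statistic itself is independent of how sentence polarity is obtained.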
In particular, we defined PSP→(x) to be the two-dimensional vector (PSP(x), 1 − PSP(x)), and then set the item-similarity function required by the metric-labeling optimization function (Section 3.3) to sim(x, y) = the cosine between PSP→(x) and PSP→(y). [Footnote 8: While admittedly we initially chose this function because it was convenient to work with cosines, post hoc analysis revealed that the corresponding metric space "stretched" certain distances in a useful way.]

[Figure 1: Average and standard deviation of PSP for reviews expressing different ratings; one curve per author (a-d), with rating in notches on the x-axis and mean PSP (roughly 0 to 0.8) on the y-axis.]

But before proceeding, we note that it is possible that similarity information might yield no extra benefit at all. For instance, we don't need it if we can reliably identify each class just from some set of distinguishing terms. If we define such terms as frequent ones that appear in a single class 50% or more of the time, then we do find many instances; some examples for one author are: "meaningless", "disgusting" (class 0); "pleasant", "uneven" (class 1); and "oscar", "gem" (class 2) for the three-class case, and, in the four-class case, "flat", "tedious" (class 1) versus "straightforward", "likeable" (class 2). Some unexpected distinguishing terms for this author are "lion" for class 2 (three-class case), and for class 2 in the four-class case, "jennifer", for a wide variety of Jennifers.

5 Evaluation

This section compares the accuracies of the approaches outlined in Section 3 on the four corpora comprising our scale dataset. (Results using L1 error were qualitatively similar.)
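Putting Sections 3.3 and 4 together: in the supervised setting each test item can be labeled independently, so the metric-labeling objective reduces to a per-item minimization. The sketch below is our own illustration under the paper's stated choices (d(ℓ, ℓ') = |ℓ − ℓ'|, f the identity, cosine similarity over two-dimensional PSP vectors); the initial preference function `pi` stands in for an OVA or regression SVM score:

```python
import math

def psp_vec(p):
    # Two-dimensional PSP vector (PSP(x), 1 - PSP(x)), as in Section 4.
    return (p, 1.0 - p)

def cos_sim(u, v):
    # Cosine similarity between two 2-d vectors.
    num = u[0] * v[0] + u[1] * v[1]
    den = math.hypot(u[0], u[1]) * math.hypot(v[0], v[1])
    return num / den if den else 0.0

def metric_label(x_psp, pi, neighbors, labels, alpha):
    # Choose the label l minimizing
    #   -pi(l) + alpha * sum over nn_k(x) of f(d(l, l_y)) * sim(x, y),
    # with f the identity and d(l, l') = |l - l'| (the paper's choices).
    def cost(l):
        penalty = sum(abs(l - l_y) * cos_sim(psp_vec(x_psp), psp_vec(p_y))
                      for l_y, p_y in neighbors)
        return -pi(l) + alpha * penalty
    return min(labels, key=cost)

# Toy run: with a flat initial preference, the neighbors' labels decide.
neighbors = [(2, 0.80), (2, 0.75), (1, 0.70)]  # (label, PSP) of the k-NN
flat_pi = lambda l: 0.0
print(metric_label(0.78, flat_pi, neighbors, labels=range(3), alpha=1.0))  # → 2
```

When `pi` comes from a real OVA or regression SVM, the balance between its scale and the neighborhood penalty is exactly what the cross-validated meta-parameters k and α control.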
Throughout, when we refer to something as "significant", we mean statistically so with respect to the paired t-test, p < 0.05. The results that follow are based on SVMlight's default parameter settings for SVM regression and OVA. Preliminary analysis of the effect of varying the regression parameter ε in the four-class case revealed that the default value was often optimal.

The notation "A+B" denotes metric labeling where method A provides the initial label preference function π and B serves as similarity measure. To train, we first select the meta-parameters k and α by running 9-fold cross-validation within the training set. Fixing k and α to those values yielding the best performance, we then re-train A (but with SVM parameters fixed, as described above) on the whole training set. At test time, the nearest neighbors of each item are also taken from the full training set.

5.1 Main comparison

Figure 2 summarizes our average 10-fold cross-validation accuracy results.

[Figure 2: Results for main experimental comparisons. Top: average ten-fold cross-validation accuracies on the three-class (left) and four-class (right) data for authors a-d, comparing majority, ova, ova+PSP, reg, and reg+PSP; open icons are SVMs in one-versus-all (square) or regression (circle) mode, and dark versions are metric labeling using the corresponding SVM together with the positive-sentence percentage (PSP); the y-axes of the two plots are aligned. Bottom: tables of significant differences — if the difference between a row and a column algorithm for a given author dataset (a, b, c, or d) is significant, a triangle points to the better one; otherwise, a dot (.) is shown; dark icons highlight the effect of adding PSP information via metric labeling.]

We first observe from the plots that all the algorithms described in Section 3 always definitively outperform the simple baseline of predicting the majority class, although the improvements are smaller in the four-class case. Incidentally, the data was distributed in such a way that the absolute performance of the baseline itself does not change much between the three- and four-class case (which implies that the three-class datasets were relatively more balanced); and Author c's datasets seem noticeably easier than the others.

We now examine the effect of implicitly using label and item similarity. In the four-class case, regression performed better than OVA (significantly so for two authors, as shown in the righthand table); but for the three-category task, OVA significantly outperforms regression for all four authors. One might initially interpret this "flip" as showing that in the four-class scenario, item and label similarities provide a richer source of information relative to class-specific characteristics, especially since for the non-majority classes there is less data available; whereas in the three-class setting the categories are better modeled as quite distinct entities.

However, the three-class results for metric labeling on top of OVA and regression (shown in Figure 2 by black versions of the corresponding icons) show that employing explicit similarities always improves results, often to a significant degree, and yields the best overall accuracies. Thus, we can in fact effectively exploit similarities in the three-class case. Additionally, in both the three- and four-class scenarios, metric labeling often brings the performance of the weaker base method up to that of the stronger one (as indicated by the "disappearance" of upward triangles in corresponding table rows), and never hurts performance significantly.

In the four-class case, metric labeling and regression seem roughly equivalent. One possible interpretation is that the relevant structure of the problem is already captured by linear regression (and perhaps a different kernel for regression would have improved its three-class performance). However, according to additional experiments we ran in the four-class situation, the test-set-optimal parameter settings for metric labeling would have produced significant improvements, indicating there may be greater potential for our framework. At any rate, we view the fact that metric labeling performed quite well for both rating scales as a definitely positive result.

5.2 Further discussion

Q: Metric labeling looks like it's just combining SVMs with nearest neighbors, and classifier combination often improves performance. Couldn't we get the same kind of results by combining SVMs with any other reasonable method?
A: No. For example, if we take the strongest base SVM method for initial label preferences, but replace PSP with the term-overlap-based cosine (TO(cos)), performance often drops significantly. This result, which is in accordance with Section 4's data, suggests that choosing an item similarity function that correlates well with label similarity is important. (ova+PSP ◁◁◁◁ ova+TO(cos) [3c]; reg+PSP ◁ reg+TO(cos) [4c])

Q: Could you explain that notation, please?
A: Triangles point toward the significantly better algorithm for some dataset. For instance, "M ◁◁▷. N [3c]" means, "In the 3-class task, method M is significantly better than N for two author datasets and significantly worse for one dataset (so the algorithms were statistically indistinguishable on the remaining dataset)". When the algorithms being compared are statistically indistinguishable on all four datasets (the "no triangles" case), we indicate this with an equals sign ("=").

Q: Thanks. Doesn't Figure 1 show that the positive-sentence percentage would be a good classifier even in isolation, so metric labeling isn't necessary?
A: No. Predicting class labels directly from the PSP value via trained thresholds isn't as effective (ova+PSP ◁◁◁◁ threshold PSP [3c]; reg+PSP ◁◁ threshold PSP [4c]). Alternatively, we could use only the PSP component of metric labeling by setting the label preference function to the constant function 0, but even with test-set-optimal parameter settings, doing so underperforms the trained metric labeling algorithm with access to an initial SVM classifier (ova+PSP ◁◁◁◁ 0+PSP [3c]; reg+PSP ◁◁ 0+PSP [4c]).

Q: What about using PSP as one of the features for input to a standard classifier?
A: Our focus is on investigating the utility of similarity information. In our particular rating-inference setting, it so happens that the basis for our pairwise similarity measure can be incorporated as an item-specific feature, but we view this as a tangential issue.
That being said, preliminary experiments show that metric labeling can be better, barely (for test-set-optimal parameter settings for both algorithms: significantly better results for one author, four-class case; statistically indistinguishable otherwise), although one needs to determine an appropriate weight for the PSP feature to get good performance.

Q: You defined the "metric transformation" function f as the identity function f(d) = d, imposing greater loss as the distance between labels assigned to two similar items increases. Can you do just as well if you penalize all non-equal label assignments by the same amount, or does the distance between labels really matter?
A: You're asking for a comparison to the Potts model, which sets f to the function f(d) = 1 if d > 0, 0 otherwise. In the one setting in which there is a significant difference between the two, the Potts model does worse (ova+PSP ◁ ova+PSP(Potts) [3c]). Also, employing the Potts model generally leads to fewer significant improvements over a chosen base method (compare Figure 2's tables with the Potts-model results: reg+PSP(Potts) ◁ reg [3c]; ova+PSP(Potts) ◁◁ ova [3c]; ova+PSP(Potts) = ova [4c]; but note that reg+PSP(Potts) ◁ reg [4c]). We note that optimizing the Potts model in the multi-label case is NP-hard, whereas the optimal metric labeling with the identity metric-transformation function can be efficiently obtained (see Section 3.3).

Q: Your datasets had many labeled reviews and only one author each. Is your work relevant to settings with many authors but very little data for each?
A: As discussed in Section 2, it can be quite difficult to properly calibrate different authors' scales, since the same number of "stars" even within what is ostensibly the same rating system can mean different things for different authors. But since you ask: we temporarily turned a blind eye to this serious issue, creating a collection of 5394 reviews by 496 authors with at most 80 reviews per author, where we pretended that our rating conversions mapped correctly into a universal rating scheme. Preliminary results on this dataset were actually comparable to the results reported above, although since we are not confident in the class labels themselves, more work is needed to derive a clear analysis of this setting. (Abusing notation, since we're already playing fast and loose: [3c]:
baseline 52.4%, reg 61.4%, reg+PSP 61.5%, ova (65.4%) ▷ ova+PSP (66.3%); [4c]: baseline 38.8%, reg (51.9%) ▷ reg+PSP (52.7%), ova (53.8%) ▷ ova+PSP (54.6%).)

In future work, it would be interesting to determine author-independent characteristics that can be used on (or suitably adapted to) data for specific authors.

Q: How about trying —
A: — Yes, there are many alternatives. A few that we tested are described in the Appendix, and we propose some others in the next section. We should mention that we have not yet experimented with all-vs.-all (AVA), another standard binary-to-multi-category classifier conversion method, because we wished to focus on the effect of omitting pairwise information. In independent work on 3-category rating inference for a different corpus, Koppel and Schler (2005) found that regression outperformed AVA, and Rifkin and Klautau (2004) argue that in principle OVA should do just as well as AVA. But we plan to try it out.

6 Related work and future directions

In this paper, we addressed the rating-inference problem, showing the utility of employing label similarity and (appropriate choice of) item similarity — either implicitly, through regression, or explicitly and often more effectively, through metric labeling.

In the future, we would like to apply our methods to other scale-based classification problems, and explore alternative methods. Clearly, varying the kernel in SVM regression might yield better results. Another choice is ordinal regression (McCullagh, 1980; Herbrich, Graepel, and Obermayer, 2000), which only considers the ordering on labels, rather than any explicit distances between them; this approach could work well if a good metric on labels is lacking. Also, one could use mixture models (e.g., combine "positive" and "negative" language models) to capture class relationships (McCallum, 1999; Schapire and Singer, 2000; Takamura, Matsumoto, and Yamada, 2004).
We are also interested in framing multi-class but non-scale-based categorization problems as metric labeling tasks. For example, positive vs. negative vs. neutral sentiment distinctions are sometimes considered in which neutral means either objective (Engström, 2004) or a conflation of objective with a rating of mediocre (Das and Chen, 2001). (Koppel and Schler (2005) in independent work also discuss various types of neutrality.) In either case, we could apply a metric in which positive and negative are closer to objective (or objective+mediocre) than to each other. As another example, hierarchical label relationships can be easily encoded in a label metric.

Finally, as mentioned in Section 3.3, we would like to address the transductive setting, in which one has a small amount of labeled data and uses relationships between unlabeled items, since it is particularly well-suited to the metric-labeling approach and may be quite important in practice.

Acknowledgments

We thank Paul Bennett, Dave Blei, Claire Cardie, Shimon Edelman, Thorsten Joachims, Jon Kleinberg, Oren Kurland, John Lafferty, Guy Lebanon, Pradeep Ravikumar, Jerry Zhu, and the anonymous reviewers for many very useful comments and discussion. We learned of Moshe Koppel and Jonathan Schler's work while preparing the camera-ready version of this paper; we thank them for so quickly answering our request for a pre-print. Our descriptions of their work are based on that pre-print; we apologize in advance for any inaccuracies in our descriptions that result from changes between their pre-print and their final version. We also thank CMU for its hospitality during the year. This paper is based upon work supported in part by the National Science Foundation (NSF) under grant no. IIS-0329064 and CCR-0122581; SRI International under subcontract no. 03-000211 on their project funded by the Department of the Interior's National Business Center; and by an Alfred P. Sloan Research Fellowship. Any opinions, findings, and conclusions or recommendations expressed are those of the authors and do not necessarily reflect the views or official policies, either expressed or implied, of any sponsoring institutions, the U.S. government, or any other entity.

References

Atkeson, Christopher G., Andrew W. Moore, and Stefan Schaal. 1997. Locally weighted learning. Artificial Intelligence Review, 11(1):11-73.

Boykov, Yuri, Olga Veksler, and Ramin Zabih. 1999. Fast approximate energy minimization via graph cuts. In Proceedings of the International Conference on Computer Vision (ICCV), pages 377-384. Journal version in IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 23(11):1222-1239, 2001.

Collins-Thompson, Kevyn and Jamie Callan. 2004. A language modeling approach to predicting reading difficulty. In HLT-NAACL: Proceedings of the Main Conference, pages 193-200.

Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).

Dave, Kushal, Steve Lawrence, and David M. Pennock. 2003. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of WWW, pages 519-528.

Engström, Charlotta. 2004. Topic dependence in sentiment classification. Master's thesis, University of Cambridge.

Herbrich, Ralf, Thore Graepel, and Klaus Obermayer. 2000. Large margin rank boundaries for ordinal regression. In Alexander J. Smola, Peter L. Bartlett, Bernhard Schölkopf, and Dale Schuurmans, editors, Advances in Large Margin Classifiers, Neural Information Processing Systems. MIT Press, pages 115-132.

Horvitz, Eric, Andy Jacobs, and David Hovel. 1999. Attention-sensitive alerting. In Proceedings of the Conference on Uncertainty and Artificial Intelligence, pages 305-313.

Ishikawa, Hiroshi. 2003. Exact optimization for Markov random fields with convex priors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(10).

Ishikawa, Hiroshi and Davi Geiger. 1998. Occlusions, discontinuities, and epipolar lines in stereo. In Proceedings of the 5th European Conference on Computer Vision (ECCV), volume I, pages 232-248, London, UK. Springer-Verlag.

Joachims, Thorsten. 1999. Making large-scale SVM learning practical. In Bernhard Schölkopf and Alexander Smola, editors, Advances in Kernel Methods - Support Vector Learning. MIT Press, pages 44-56.

Kleinberg, Jon and Éva Tardos. 2002. Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. Journal of the ACM, 49(5):616-639.

Koppel, Moshe and Jonathan Schler. 2005. The importance of neutral examples for learning sentiment. In Workshop on the Analysis of Informal and Formal Information Exchange during Negotiations (FINEXIN).

Liu, Hugo, Henry Lieberman, and Ted Selker. 2003. A model of textual affect sensing using real-world knowledge. In Proceedings of Intelligent User Interfaces (IUI), pages 125-132.

McCallum, Andrew. 1999. Multi-label text classification with a mixture model trained by EM. In AAAI Workshop on Text Learning.

McCullagh, Peter. 1980. Regression models for ordinal data. Journal of the Royal Statistical Society, 42(2):109-142.

Pang, Bo and Lillian Lee. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the ACL, pages 271-278.

Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of EMNLP, pages 79-86.

Qu, Yan, James Shanahan, and Janyce Wiebe, editors. 2004. Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications. AAAI Press. AAAI technical report SS-04-07.

Rifkin, Ryan M. and Aldebaro Klautau. 2004. In defense of one-vs-all classification. Journal of Machine Learning Research, 5:101-141.

Schapire, Robert E. and Yoram Singer. 2000. BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2/3):135-168.

Smola, Alex J. and Bernhard Schölkopf. 1998. A tutorial on support vector regression. Technical Report NeuroCOLT NC-TR-98-030, Royal Holloway College, University of London.

Subasic, Pero and Alison Huettner. 2001. Affect analysis of text using fuzzy semantic typing. IEEE Transactions on Fuzzy Systems, 9(4):483-496.

Takamura, Hiroya, Yuji Matsumoto, and Hiroyasu Yamada. 2004. Modeling category structures with a kernel function. In Proceedings of CoNLL, pages 57-64.

Tong, Richard M. 2001. An operational system for detecting and tracking opinions in on-line discussion. SIGIR Workshop on Operational Text Classification.

Turney, Peter. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the ACL, pages 417-424.

Vapnik, Vladimir. 1995. The Nature of Statistical Learning Theory. Springer.

Wilson, Theresa, Janyce Wiebe, and Rebecca Hwa. 2004. Just how mad are you? Finding strong and weak opinion clauses. In Proceedings of AAAI, pages 761
–769.Yu,HongandVasileiosHatzivassiloglou.2003.Towardsan-sweringopinionquestions:Separatingfactsfromopinionsandidentifyingthepolarityofopinionsentences.InPro-ceedingsofEMNLP.Zhu,Xiaojin(Jerry).2005.Semi-SupervisedLearningwithGraphs.Ph.D.thesis,CarnegieMellonUniversity.AAppendix:othervariationsattemptedA.1DiscretizingbinaryclassicationInoursetting,wecanalsoincorporateclassrelationsbydirectlyalteringtheoutputofabinaryclassier,asfollows.WersttrainastandardSVM,treatingratingsgreaterthan0.5aspositivelabelsandothersasnegativelabels.Ifwethenconsidertheresultingclassiertooutputapositivity-preferencefunction+\n 107,wecanthenlearnaseriesofthresholdstoconvertthisvalueintothedesiredlabelset,undertheassumptionthatthebigger+ 10?is,themorepositivethereview.9Thisalgorithmalwaysoutper-formsthemajority-classbaseline,butnottothede-greethatthebestofSVMOVAandSVMregres-siondoes.KoppelandSchler(2005)independentlyfoundinathree-classstudythatthresholdingapos-itive/negativeclassiertrainedonlyonclearlyposi-tiveorclearlynegativeexamplesdidnotyieldlargeimprovements.A.2DiscretizingregressionInourexperimentswithSVMregression,wedis-cretizedregressionoutputviaasetofxeddecisionthresholds3|x‚\r‚\r{x‚\r*xdxdx&‘tomapitintooursetofclasslabels.Alternatively,wecanlearnthethresh-oldsinstead.Neitheroptionclearlyoutperformstheotherinthefour-classcase.Inthethree-classset-ting,thelearnedversionprovidesnoticeablybetterperformanceintwoofthefourdatasets.Buttheseresultstakentogetherstillmeanthatinmanycases,thedifferenceisnegligible,andifwehadstarteddownthispath,wewouldhaveneededtoconsidersimilartweaksforone-vs-allSVMaswell.Wethereforestuckwiththesimplerversioninordertomaintainfocusonthecentralissuesathand.9Thisisnotnecessarilytrue:iftheclassier'sgoalistoopti-mizebinaryclassicationerror,itsmajorconcernistoincreasecondenceinthepositive/negativedistinction,whichmaynotcorrespondtohighercondenceinseparating“vestars”from“fourstars”.
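The threshold-learning step of A.1 can be sketched as a one-dimensional search: given one real-valued score per review and its true ordinal label, choose a cutoff between each pair of adjacent classes so as to minimize training error. This is a minimal illustrative sketch, not the paper's exact procedure; the greedy per-cutoff minimization, the function names, and the toy data are all assumptions introduced here.

```python
def learn_thresholds(scores, labels, num_classes):
    """Learn num_classes - 1 cutoffs over a 1-D score (e.g., an SVM
    decision value) so that label k is assigned to scores falling
    between cutoff k-1 and cutoff k.  Each cutoff is chosen to
    minimize training error on the binary split 'label > k'."""
    pairs = sorted(zip(scores, labels))
    # Candidate cutoffs: midpoints between consecutive sorted scores.
    candidates = [(pairs[i][0] + pairs[i + 1][0]) / 2.0
                  for i in range(len(pairs) - 1)]
    thresholds = []
    for k in range(num_classes - 1):
        best = min(candidates,
                   key=lambda t: sum((s > t) != (y > k) for s, y in pairs))
        thresholds.append(best)
    # Sort to guarantee a valid monotonic score-to-label mapping.
    return sorted(thresholds)

def apply_thresholds(score, thresholds):
    """Map a raw score to an ordinal label: count cutoffs it exceeds."""
    return sum(score > t for t in thresholds)
```

For example, with scores [0.1, 0.2, 0.9, 1.1, 2.0, 2.2] and labels [0, 0, 1, 1, 2, 2], the learned cutoffs separate the three classes cleanly, so a score of 1.0 maps to label 1.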
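The fixed-threshold variant of A.2 amounts to snapping a continuous regression prediction to a class label via evenly spaced cutoffs. The sketch below assumes integer labels 0..num_classes-1 with cutoffs halfway between consecutive labels; the paper's actual label encoding and cutoff values are not specified here.

```python
import bisect

def discretize(prediction, num_classes):
    """Map a real-valued regression output to a class in
    0..num_classes-1 using fixed cutoffs midway between consecutive
    integer labels; out-of-range predictions clamp to the end classes."""
    cutoffs = [k + 0.5 for k in range(num_classes - 1)]  # e.g. [0.5, 1.5, 2.5]
    return bisect.bisect_right(cutoffs, prediction)
```

With four classes, a prediction of 1.7 falls between the 1.5 and 2.5 cutoffs and maps to class 2, while predictions below 0.5 or above 2.5 clamp to classes 0 and 3 respectively. The learned-threshold alternative discussed in A.2 would replace the fixed `cutoffs` list with cutoffs fit on training data.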