Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales

Bo Pang and Lillian Lee
(1) Department of Computer Science, Cornell University
(2) Language Technologies Institute, Carnegie Mellon University
(3) Computer Science Department, Carnegie Mellon University

Abstract

We address the rating-inference problem, wherein rather than simply decide whether a review is "thumbs up" or "thumbs down", as in previous sentiment-analysis work, one must determine an author's evaluation with respect to a multi-point scale (e.g., one to five "stars"). This task represents an interesting twist on standard multi-class text categorization because there are several different degrees of similarity between class labels; for example, "three stars" is intuitively closer to "four stars" than to "one star".

We first evaluate human performance at the task. Then, we apply a meta-algorithm, based on a metric-labeling formulation of the problem, that alters a given n-ary classifier's output in an explicit attempt to ensure that similar items receive similar labels. We show that the meta-algorithm can provide significant improvements over both multi-class and regression versions of SVMs when we employ a novel similarity measure appropriate to the problem.

Publication info: Proceedings of the ACL, 2005.

1 Introduction

There has recently been a dramatic surge of interest in sentiment analysis, as more and more people become aware of the scientific challenges posed and the scope of new applications enabled by the processing of subjective language. (The papers collected by Qu, Shanahan, and Wiebe (2004) form a representative sample of research in the area.) Most prior work on the specific problem of categorizing expressly opinionated text has focused on the binary distinction of positive vs. negative (Turney, 2002; Pang, Lee, and Vaithyanathan, 2002; Dave, Lawrence, and Pennock, 2003; Yu and Hatzivassiloglou, 2003). But it is often helpful to have more information than this binary distinction provides, especially if one is ranking items by recommendation or comparing several reviewers' opinions: example applications include collaborative filtering and deciding which conference submissions to accept.

Therefore, in this paper we consider generalizing to finer-grained scales: rather than just determine whether a review is "thumbs up" or not, we attempt to infer the author's implied numerical rating, such as "three stars" or "four stars". Note that this differs from identifying opinion strength (Wilson, Wiebe, and Hwa, 2004): rants and raves have the same strength but represent opposite evaluations, and referee forms often allow one to indicate that one is very confident (high strength) that a conference submission is mediocre (middling rating). Also, our task differs from ranking not only because one can be given a single item to classify (as opposed to a set of items to be ordered relative to one another), but because there are settings in which classification is harder than ranking, and vice versa.

One can apply standard n-ary classifiers or regression to this rating-inference problem; independent work by Koppel and Schler (2005) considers such methods. But an alternative approach that explicitly incorporates information about item similarities together with label similarity information (for instance, "one star" is closer to "two stars" than to "four stars") is to think of the task as one of metric labeling (Kleinberg and Tardos, 2002), where label relations are encoded via a distance metric. This observation yields a meta-algorithm, applicable to both semi-supervised (via graph-theoretic techniques) and supervised settings, that alters a given n-ary classifier's output so that similar items tend to be assigned similar labels.

In what follows, we first demonstrate that humans can discern relatively small differences in (hidden) evaluation scores, indicating that rating inference is indeed a meaningful task. We then present three types of algorithms (one-vs-all, regression, and metric labeling) that can be distinguished by how explicitly they attempt to leverage similarity between items and between labels. Next, we consider what item similarity measure to apply, proposing one based on the positive-sentence percentage. Incorporating this new measure within the metric-labeling framework is shown to often provide significant improvements over the other algorithms.

We hope that some of the insights derived here might apply to other scales for text classification that have been considered, such as clause-level opinion strength (Wilson, Wiebe, and Hwa, 2004); affect types like disgust (Subasic and Huettner, 2001; Liu, Lieberman, and
Selker, 2003); reading level (Collins-Thompson and Callan, 2004); and urgency or criticality (Horvitz, Jacobs, and Hovel, 1999).

2 Problem validation and formulation

We first ran a small pilot study on human subjects in order to establish a rough idea of what a reasonable classification granularity is: if even people cannot accurately infer labels with respect to a five-star scheme with half stars, say, then we cannot expect a learning algorithm to do so. Indeed, some potential obstacles to accurate rating inference include lack of calibration (e.g., what an understated author intends as high praise may seem lukewarm), author inconsistency at assigning fine-grained ratings, and ratings not entirely supported by the text.[1]

Rating diff.          Pooled    Subject 1    Subject 2
3 or more             100%      100% (35)    100% (15)
2 (e.g., 1 star)      83%       77% (30)     100% (11)
1 (e.g., 1/2 star)    69%       65% (57)     90% (10)
0                     55%       47% (15)     80% (5)

Table 1: Human accuracy at determining relative positivity. Rating differences are given in "notches". Parentheses enclose the number of pairs attempted.

For data, we first collected Internet movie reviews in English from four authors, removing explicit rating indicators from each document's text automatically. Now, while the obvious experiment would be to ask subjects to guess the rating that a review represents, doing so would force us to specify a fixed rating-scale granularity in advance. Instead, we examined people's ability to discern relative differences, because by varying the rating differences represented by the test instances, we can evaluate multiple granularities in a single experiment. Specifically, at intervals over a number of weeks, we authors (a non-native and a native speaker of English) examined pairs of reviews, attempting to determine whether the first review in each pair was (1) more positive than, (2) less positive than, or (3) as positive as the second. The texts in any particular review pair were taken from the same author to factor out the effects of cross-author divergence.

As Table 1 shows, both subjects performed perfectly when the rating separation was at least 3 "notches" in the original scale (we define a notch as a half star in a four- or five-star scheme and 10 points in a 100-point scheme). Interestingly, although human performance drops as rating difference decreases, even at a one-notch separation, both subjects handily
outperformed the random-choice baseline of 33%. However, there was large variation in accuracy between subjects.[2]

[1] For example, the critic Dennis Schwartz writes that "sometimes the review itself [indicates] the letter grade should have been higher or lower, as the review might fail to take into consideration my overall impression of the film, which I hope to capture in the grade" (http://www.sover.net/ozus/cinema.htm).

[2] One contributing factor may be that the subjects viewed disjoint document sets, since we wanted to maximize experimental coverage of the types of document pairs within each difference class. We thus cannot report inter-annotator agreement,

Because of this variation, we defined two different classification regimes. From the evidence above, a three-class task (categories 0, 1, and 2, essentially "negative", "middling", and "positive", respectively) seems like one that most people would do quite well at (but we should not assume 100% human accuracy: according to our one-notch results, people may misclassify borderline cases like 2.5 stars). Our study also suggests that people could do at least fairly well at distinguishing full stars in a zero- to four-star scheme. However, when we began to construct five-category datasets for each of our four authors (see below), we found that in each case, either the most negative or the most positive class (but not both) contained only about 5% of the documents. To make the classes more balanced, we folded these minority classes into the adjacent class, thus arriving at a four-class problem (categories 0-3, increasing in positivity). Note that the four-class problem seems to offer more possibilities for leveraging class relationship information than the three-class setting, since it involves more class pairs. Also, even the two-category version of the rating-inference problem for movie reviews has proven quite challenging for many automated classification techniques (Pang, Lee, and Vaithyanathan, 2002; Turney, 2002).

We applied the above two labeling schemes to a scale dataset[3] containing four corpora of movie reviews. All reviews were automatically preprocessed to remove both explicit rating indicators and objective sentences; the motivation for the latter step is that it has previously aided positive vs. negative classification (Pang and Lee, 2004). All of the 1770
, 902, 1307, or 1027 documents in a given corpus were written by the same author. This decision facilitates interpretation of the results, since it factors out the effects of different choices of methods for calibrating authors' scales.[4] We point out that it is possible to gather author-specific information in some practical applications: for instance, systems that use selected authors (e.g., the Rotten Tomatoes movie-review website, where, we note, not all authors provide explicit ratings) could require that someone submit rating-labeled samples of newly-admitted authors' work. Moreover, our results at least partially generalize to mixed-author situations (see Section 5.2).

(Footnote [2], continued:) but since our goal is to recover a reviewer's "true" recommendation, reader-author agreement is more relevant. While another factor might be degree of English fluency, in an informal experiment (six subjects viewing the same three pairs), native English speakers made the only two errors.

[3] Available at http://www.cs.cornell.edu/People/pabo/movie-review-data as scale dataset v1.0.

[4] From the Rotten Tomatoes website's FAQ: "star systems are not consistent between critics. For critics like Roger Ebert and James Berardinelli, 2.5 stars or lower out of 4 stars is always negative. For other critics, 2.5 stars can either be positive or negative. Even though Eric Lurio uses a 5 star system, his grading is very relaxed. So, 2 stars can be positive." Thus, calibration may sometimes require strong familiarity with the authors involved, as anyone who has ever needed to reconcile conflicting referee reports probably knows.

3 Algorithms

Recall that the problem we are considering is multi-category classification in which the labels can be naturally mapped to a metric space (e.g., points on a line); for simplicity, we assume the distance metric d(l, l') = |l - l'| throughout. In this section, we present three approaches to this problem in order of increasingly explicit use of pairwise similarity information between items and between labels. In order to make comparisons between these methods meaningful, we base all three of them on Support Vector Machines (SVMs) as implemented in Joachims' (1999) SVMlight package.

3.1 One-vs-all

The standard SVM formulation applies only to binary classification. One-vs-all (OVA) (Rifkin and Klautau, 2004) is a common extension to the n-ary case. Training consists of building, for each label l, an SVM binary classifier distinguishing label l from "not-l". We consider the final output to be a label preference function pi_ova(x, l), defined as the signed distance of (test) item x to the l side of the l vs. not-l decision plane.

Clearly, OVA makes no explicit use of pairwise label or item relationships. However, it can perform well if each class exhibits sufficiently distinct language; see Section 4 for more discussion.

3.2 Regression

Alternatively, we can take a regression perspective by assuming that the labels come from a discretization of a continuous function g mapping from the feature space to a metric space.[5] If we choose g from a family of sufficiently "gradual" functions, then similar items necessarily receive similar labels. In particular, we consider linear, epsilon-insensitive SVM regression (Vapnik, 1995; Smola and Schölkopf, 1998); the idea is to find the hyperplane that best fits the training data, but where training points whose labels are within distance epsilon of the hyperplane incur no loss. Then, for (test) instance x, the label preference function pi_reg(x, l) is the negative of the distance between l and the value predicted for x by the fitted hyperplane function.

[5] We discuss the ordinal regression variant in Section 6.

Wilson, Wiebe, and Hwa (2004) used SVM regression to classify clause-level strength of opinion, reporting that it provided lower accuracy than other methods. However, independently of our work, Koppel and Schler (2005) found that applying linear regression to classify documents (in a different corpus than ours) with respect to a three-point rating scale provided greater accuracy than OVA SVMs and other algorithms.

3.3 Metric labeling

Regression implicitly encodes the "similar items, similar labels" heuristic, in that one can restrict consideration to "gradual" functions. But we can also think of our task as a metric labeling problem (Kleinberg and Tardos, 2002), a special case of the maximum a posteriori estimation problem for Markov random fields, to explicitly encode our desideratum. Suppose we have an initial label preference function pi(x, l), perhaps computed via one of the two methods described above. Also, let d be a distance metric on labels, and let nn_k(x) denote the k nearest neighbors of item x according to some item-similarity function sim. Then, it is quite natural to pose our problem as finding a mapping of instances x to labels l_x (respecting the original labels of the training instances) that minimizes

    sum_{x in test} [ -pi(x, l_x) + alpha * sum_{y in nn_k(x)} f(d(l_x, l_y)) * sim(x, y) ]

where f is monotonically increasing (we chose f(d) = d unless otherwise specified) and alpha is a trade-off and/or scaling parameter. (The inner summation is familiar from work in locally-weighted learning[6] (Atkeson, Moore, and Schaal, 1997).) In a sense, we are using explicit item and label similarity information to increasingly penalize the initial classifier as it assigns more divergent labels to similar items.

[6] If we ignore the pi(x, l) term, different choices of f correspond to different versions of nearest-neighbor learning, e.g., majority-vote, weighted average of labels, or weighted median of labels.

In this paper, we only report supervised-learning experiments in which the nearest neighbors for any given test item were drawn from the training set alone. In such a setting, the labeling decisions for different test items are independent, so that solving the requisite optimization problem is simple.

Aside: transduction. The above formulation also allows for transductive semi-supervised learning, in that we could allow nearest neighbors to come from both the training and test sets. We intend to address this case in future work, since there are important settings in which one has a small number of labeled reviews and a large number of unlabeled reviews, in which case considering similarities between unlabeled texts could prove quite helpful. In full generality, the corresponding multi-label optimization problem is intractable, but for many families of f functions (e.g., convex) there exist practical exact or approximation algorithms based on techniques for finding minimum s-t cuts in graphs (Ishikawa and Geiger, 1998; Boykov, Veksler, and Zabih, 1999; Ishikawa, 2003). Interestingly, previous sentiment analysis research found that a minimum-cut formulation for the binary subjective/objective distinction yielded good results (Pang and Lee, 2004). Of course, there are many other related semi-supervised learning algorithms that we would like to try as well; see Zhu (2005) for a survey.

4 Class struggle: finding a label-correlated item-similarity function

We need to specify an item similarity function sim to use the metric-labeling formulation described in Section 3.3. We could, as is commonly done, employ a term-overlap-based measure such as the cosine between term-frequency-based document vectors (henceforth "TO(cos)"). However, Table 2 shows that in aggregate, the vocabularies of distant classes overlap to a degree surprisingly similar to that of the vocabularies of nearby classes. Thus, item similarity as measured by TO(cos) may not correlate well with similarity of the items' true labels.

Label difference:    1      2      3
Three-class data     37%    33%
Four-class data      34%    31%    30%

Table 2: Average over authors and class pairs of between-class vocabulary overlap as the class labels of the pair grow farther apart.

We can potentially develop a more useful similarity metric by asking ourselves what, intuitively, accounts for the label relationships that we seek to exploit. A simple hypothesis is that ratings can be determined by the positive-sentence percentage (PSP) of a text, i.e., the number of positive sentences divided by the number of subjective sentences. (Term-based versions of this premise have motivated much sentiment-analysis work for over a decade (Das and Chen, 2001; Tong, 2001; Turney, 2002).) But counterexamples are easy to construct: reviews can contain off-topic opinions, or recount many positive aspects before describing a fatal flaw.

We therefore tested the hypothesis as follows. To avoid the need to hand-label sentences as positive or negative, we first created a sentence polarity dataset[7] consisting of 10,662 movie-review "snippets" (a striking extract usually one sentence long) downloaded from www.rottentomatoes.com; each snippet was labeled with its source review's label (positive or negative) as provided by Rotten Tomatoes. Then, we trained a Naive Bayes classifier on this data set and applied it to our scale dataset to identify the positive sentences (recall that objective sentences were already removed).

[7] Available at http://www.cs.cornell.edu/People/pabo/movie-review-data as sentence polarity dataset v1.0.

Figure 1 shows that all four authors tend to exhibit a higher PSP when they write a more positive review, and we expect that most typical reviewers would follow suit. Hence, PSP appears to be a promising basis for computing document similarity for our rating-inference task. In particular, we defined the vector PSP-vec(x) to be the two-dimensional vector (PSP(x), 1 - PSP(x)), and then set the item-similarity function required by the metric-labeling optimization function (Section 3.3) to

    sim(x, y) = cos( PSP-vec(x), PSP-vec(y) ).[8]

[Figure 1: Average and standard deviation of PSP for reviews expressing different ratings (in notches), for Authors a-d; PSP rises with rating.]

[8] While admittedly we initially chose this function because it was convenient to work with cosines, post hoc analysis revealed that the corresponding metric space "stretched" certain distances in a useful way.

But before proceeding, we note that it is possible that similarity information might yield no extra benefit at all. For instance, we don't need it if we can reliably identify each class just from some set of distinguishing terms. If we define such terms as frequent ones (n >= 20) that appear in a single class 50% or more of the time, then we do find many instances; some examples for one author are: "meaningless", "disgusting" (class 0); "pleasant", "uneven" (class 1); and "oscar", "gem" (class 2) for the three-class case, and, in the four-class case, "flat", "tedious" (class 1) versus "straightforward", "likeable" (class 2). Some unexpected distinguishing terms for this author are "lion" for class 2 (three-class case), and, for class 2 in the four-class case, "jennifer", for a wide variety of Jennifers.

5 Evaluation

This section compares the accuracies of the approaches outlined in Section 3 on the four corpora comprising our scale dataset. (Results using L1 error were qualitatively similar.)
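Before turning to the comparisons, the supervised decision rule of Section 3.3, combined with the PSP-based similarity of Section 4, can be made concrete in code. The sketch below is our own toy reimplementation with invented preference values and PSP numbers, not the authors' implementation; it scores each candidate label l by -pi(x, l) + alpha * sum over the k nearest training neighbors y of |l - l_y| * sim(x, y):

```python
import math

def psp_vec(psp):
    # Two-dimensional PSP vector (PSP(x), 1 - PSP(x)), as in Section 4.
    return (psp, 1.0 - psp)

def cos_sim(u, v):
    # Cosine similarity between two 2-d vectors.
    dot = u[0] * v[0] + u[1] * v[1]
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0

def metric_label(pi, x_psp, neighbors, labels, alpha):
    """Choose the label minimizing
         -pi[l] + alpha * sum_y d(l, l_y) * sim(x, y),
    with d(l, l') = |l - l'| and f = identity. `pi` maps each label to an
    initial preference (e.g., from OVA or regression); `neighbors` is a
    list of (psp_value, training_label) pairs for the k nearest neighbors."""
    def cost(l):
        smooth = sum(abs(l - ly) * cos_sim(psp_vec(x_psp), psp_vec(py))
                     for py, ly in neighbors)
        return -pi[l] + alpha * smooth
    return min(labels, key=cost)

# Toy example: the initial preferences mildly favor label 2, but the item's
# PSP-similar neighbors are all labeled 1, pulling the decision to 1.
prefs = {0: -1.0, 1: 0.4, 2: 0.5}
nbrs = [(0.35, 1), (0.30, 1), (0.40, 1)]
print(metric_label(prefs, 0.33, nbrs, labels=[0, 1, 2], alpha=0.5))  # -> 1
```

Setting alpha = 0 recovers the base classifier's plain argmax decision; larger alpha defers more to the labels of PSP-similar training neighbors.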
Throughout, when we refer to something as "significant", we mean statistically so with respect to the paired t-test, with p < .05. The results that follow are based on SVMlight's default parameter settings for SVM regression and OVA. Preliminary analysis of the effect of varying the regression parameter epsilon in the four-class case revealed that the default value was often optimal.

The notation "A+B" denotes metric labeling where method A provides the initial label preference function pi and B serves as similarity measure. To train, we first select the meta-parameters k and alpha by running 9-fold cross-validation within the training set. Fixing k and alpha to those values yielding the best performance, we then re-train A (but with SVM parameters fixed, as described above) on the whole training set. At test time, the nearest neighbors of each item are also taken from the full training set.

5.1 Main comparison

Figure 2 summarizes our average 10-fold cross-validation accuracy results. We first observe from the plots that all the algorithms described in Section 3 always definitively outperform the simple baseline of predicting the majority class, although the improvements are smaller in the four-class case. Incidentally, the data was distributed in such a way that the absolute performance of the baseline itself does not change much between the three- and four-class case (which implies that the three-class datasets were relatively more balanced); and Author c's datasets seem noticeably easier than the others.

[Figure 2: Results for main experimental comparisons. Two plots (average accuracies, three-class data and four-class data) show average ten-fold cross-validation accuracies for Authors a-d under: majority baseline, ova, ova+PSP, reg, and reg+PSP. Open icons: SVMs in either one-versus-all (square) or regression (circle) mode; dark versions: metric labeling using the corresponding SVM together with the positive-sentence percentage (PSP). The y-axes of the two plots are aligned. Accompanying tables list significant differences: triangles point towards significantly better algorithms for the results plotted; specifically, if the difference between a row and a column algorithm for a given author dataset (a, b, c, or d) is significant, a triangle points to the better one; otherwise, a dot (".") is shown. Dark icons highlight the effect of adding PSP information via metric labeling.]

We now examine the effect of implicitly using label and item similarity. In the four-class case, regression performed better than OVA (significantly so for two authors, as shown in the right-hand table); but for the three-category task, OVA significantly outperforms regression for all four authors. One might initially interpret this "flip" as showing that in the four-class scenario, item and label similarities provide a richer source of information relative to class-specific characteristics, especially since for the non-majority classes there is less data available; whereas in the three-class setting the categories are better modeled as quite distinct entities.

However, the three-class results for metric labeling on top of OVA and regression (shown in Figure 2 by black versions of the corresponding icons) show that employing explicit similarities always improves results, often to a significant degree, and yields the best overall accuracies. Thus, we can in fact effectively exploit similarities in the three-class case. Additionally, in both the three- and four-class scenarios, metric labeling often brings the performance of the weaker base method up to that of the stronger one (as indicated by the disappearance of upward triangles in corresponding table rows), and never hurts performance significantly.

In the four-class case, metric labeling and regression seem roughly equivalent. One possible interpretation is that the relevant structure of the problem is already captured by linear regression (and perhaps a different kernel for regression would have improved its three-class performance). However, according to additional experiments we ran in the four-class situation, the test-set-optimal parameter settings for metric labeling would have produced significant improvements, indicating there may be greater potential for our framework. At any rate, we view the fact that metric labeling performed quite well for both rating scales as a definitely positive result.

5.2 Further discussion

Q: Metric labeling looks like it's just combining SVMs with nearest neighbors, and classifier combination often improves performance. Couldn't we get the same kind of results by combining SVMs with any other reasonable method?

A: No. For example, if we take the strongest base SVM method for initial label preferences, but replace PSP with the term-overlap-based cosine (TO(cos)), performance often drops significantly. This result, which is in accordance with Section 4's data, suggests that choosing an item similarity function that correlates well with label similarity is important. (ova+PSP <<< ova+TO(cos) [3c]; reg+PSP = reg+TO(cos) [4c])

Q: Could you explain that notation, please?

A: Triangles point toward the significantly better algorithm for some dataset; here we render a triangle pointing at the left-hand method as "<" and one pointing at the right-hand method as ">". For instance, "M <<> N [3c]" means, "In the 3-class task, method M is significantly better than N for two author datasets and significantly worse for one dataset (so the algorithms were statistically indistinguishable on the remaining dataset)." When the algorithms being compared are statistically indistinguishable on all four datasets (the "no triangles" case), we indicate this with an equals sign ("=").

Q: Thanks. Doesn't Figure 1 show that the positive-sentence percentage would be a good classifier even in isolation, so metric labeling isn't necessary?

A: No. Predicting class labels directly from the PSP value via trained thresholds isn't as effective (ova+PSP <<< threshold PSP [3c]; reg+PSP < threshold PSP [4c]).

Alternatively, we could use only the PSP component of metric labeling by setting the label preference function to the constant function 0, but even with test-set-optimal parameter settings, doing so underperforms the trained metric labeling algorithm with access to an initial SVM classifier (ova+PSP <<< 0+PSP [3c]; reg+PSP < 0+PSP [4c]).

Q: What about using PSP as one of the features for input to a standard classifier?

A: Our focus is on investigating the utility of similarity information. In our particular rating-inference setting, it so happens that the basis for our pairwise similarity measure can be incorporated as an
item-specific feature, but we view this as a tangential issue. That being said, preliminary experiments show that metric labeling can be better, barely (for test-set-optimal parameter settings for both algorithms: significantly better results for one author, four-class case; statistically indistinguishable otherwise), although one needs to determine an appropriate weight for the PSP feature to get good performance.

Q: You defined the metric transformation function f as the identity function f(d) = d, imposing greater loss as the distance between labels assigned to two similar items increases. Can you do just as well if you penalize all non-equal label assignments by the same amount, or does the distance between labels really matter?

A: You're asking for a comparison to the Potts model, which sets f to the function f(d) = 1 if d > 0, and 0 otherwise. In the one setting in which there is a significant difference between the two, the Potts model does worse (ova+PSP(Potts) > ova+PSP [3c]). Also, employing the Potts model generally leads to fewer significant improvements over a chosen base method (compare Figure 2's tables with: reg+PSP = reg [3c]; ova+PSP < ova [3c]; ova+PSP = ova [4c]; but note that reg+PSP = reg [4c]). We note that optimizing the Potts model in the multi-label case is NP-hard, whereas the optimal metric labeling with the identity metric-transformation function can be efficiently obtained (see Section 3.3).

Q: Your datasets had many labeled reviews and only one author each. Is your work relevant to settings with many authors but very little data for each?

A: As discussed in Section 2, it can be quite difficult to properly calibrate different authors' scales, since the same number of "stars", even within what is ostensibly the same rating system, can mean different things for different authors. But since you ask: we temporarily turned a blind eye to this serious issue, creating a collection of 5394 reviews by 496 authors with at most 80 reviews per author, where we pretended that our rating conversions mapped correctly into a universal rating scheme. Preliminary results on this dataset were actually comparable to the results reported above, although since we are not confident in the class labels themselves, more work is needed to derive a clear analysis of this setting. (Abusing notation, since we're already playing fast and loose: [3c]: baseline 52.4%, reg 61.4%, reg+PSP 61.5%, ova (65.4%) > ova+PSP (66.3%); [4c]: baseline 38.8%, reg (51.9%) > reg+PSP (52.7%), ova (53.8%) > ova+PSP (54.6%).)

In future work, it would be interesting to determine author-independent characteristics that can be used on (or suitably adapted to) data for specific authors.

Q: How about trying...

A: Yes, there are many alternatives. A few that we tested are described in the Appendix, and we propose some others in the next section. We should mention that we have not yet experimented with all-vs.-all (AVA), another standard binary-to-multi-category classifier conversion method, because we wished to focus on the effect of omitting pairwise information. In independent work on 3-category rating inference for a different corpus, Koppel and Schler (2005) found that regression outperformed AVA, and Rifkin and Klautau (2004) argue that in principle OVA should do just as well as AVA. But we plan to try it out.

6 Related work and future directions

In this paper, we addressed the rating-inference problem, showing the utility of employing label similarity and (appropriate choice of) item similarity, either implicitly, through regression, or explicitly and often more effectively, through metric labeling.

In the future, we would like to apply our methods to other scale-based classification problems, and explore alternative methods. Clearly, varying the kernel in SVM regression might yield better results. Another choice is ordinal regression (McCullagh, 1980; Herbrich, Graepel, and Obermayer, 2000), which only considers the ordering on labels, rather than any explicit distances between them; this approach could work well if a good metric on labels is lacking. Also, one could use mixture models (e.g., combine "positive" and "negative" language models) to capture class relationships (McCallum, 1999; Schapire and Singer, 2000; Takamura, Matsumoto, and Yamada, 2004).

We are also interested in framing multi-class but non-scale-based categorization problems as metric labeling tasks.
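Nothing in the metric-labeling objective of Section 3.3 requires the labels to lie on a line: only the label metric d changes. As a concrete illustration (with labels and distances invented here, not taken from the paper), a non-linear label relationship can be encoded as a symmetric lookup table that replaces d(l, l') = |l - l'|:

```python
# Hypothetical label metric for a non-scale-based label set. The distances
# are invented for illustration; the point is only that d is an arbitrary
# symmetric table plugged into the same metric-labeling objective.
DIST = {
    ("positive", "positive"): 0.0,
    ("negative", "negative"): 0.0,
    ("objective", "objective"): 0.0,
    ("positive", "objective"): 1.0,   # each polar class sits near "objective"...
    ("negative", "objective"): 1.0,
    ("positive", "negative"): 2.0,    # ...but far from the opposite polarity
}

def d(l1, l2):
    # Symmetric lookup: the order of the two labels does not matter.
    return DIST.get((l1, l2), DIST.get((l2, l1)))

# Assigning opposite polarities to two similar items costs more than
# disagreeing about objectivity.
print(d("positive", "negative") > d("objective", "negative"))  # -> True
```

Under such a metric, the smoothing term penalizes label pairs according to the table rather than their numeric difference.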
For example, positive vs. negative vs. neutral sentiment distinctions are sometimes considered in which "neutral" means either objective (Engström, 2004) or a conflation of objective with a rating of mediocre (Das and Chen, 2001). (Koppel and Schler (2005) in independent work also discuss various types of neutrality.) In either case, we could apply a metric in which positive and negative are closer to objective (or objective+mediocre) than to each other. As another example, hierarchical label relationships can be easily encoded in a label metric.

Finally, as mentioned in Section 3.3, we would like to address the transductive setting, in which one has a small amount of labeled data and uses relationships between unlabeled items, since it is particularly well-suited to the metric-labeling approach and may be quite important in practice.

Acknowledgments

We thank Paul Bennett, Dave Blei, Claire Cardie, Shimon Edelman, Thorsten Joachims, Jon Kleinberg, Oren Kurland, John Lafferty, Guy Lebanon, Pradeep Ravikumar, Jerry Zhu, and the anonymous reviewers for many very useful comments and discussion. We learned of Moshe Koppel and Jonathan Schler's work while preparing the camera-ready version of this paper; we thank them for so quickly answering our request for a pre-print. Our descriptions of their work are based on that pre-print; we apologize in advance for any inaccuracies in our descriptions that result from changes between their pre-print and their final version. We also thank CMU for its hospitality during the year. This paper is based upon work supported in part by the National Science Foundation (NSF) under grant nos. IIS-0329064 and CCR-0122581; SRI International under subcontract no. 03-000211 on their project funded by the Department of the Interior's National Business Center; and by an Alfred P. Sloan Research Fellowship. Any opinions, findings, and conclusions or recommendations expressed are those of the authors and do not necessarily reflect the views or official policies, either expressed or implied, of any sponsoring institutions, the U.S. government, or any other entity.

References

Atkeson, Christopher G., Andrew W. Moore, and Stefan Schaal. 1997. Locally weighted learning. Artificial Intelligence Review, 11(1):11-73.

Boykov, Yuri, Olga Veksler, and Ramin Zabih. 1999. Fast approximate energy minimization via graph cuts. In Proceedings of the International Conference on Computer Vision (ICCV), pages 377-384. Journal version in IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 23(11):1222-1239, 2001.

Collins-Thompson, Kevyn and Jamie Callan. 2004. A language modeling approach to predicting reading difficulty. In HLT-NAACL: Proceedings of the Main Conference, pages 193-200.

Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).

Dave, Kushal, Steve Lawrence, and David M. Pennock. 2003. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of WWW, pages 519-528.

Engström, Charlotta. 2004. Topic dependence in sentiment classification. Master's thesis, University of Cambridge.

Herbrich, Ralf, Thore Graepel, and Klaus Obermayer. 2000. Large margin rank boundaries for ordinal regression. In Alexander J. Smola, Peter L. Bartlett, Bernhard Schölkopf, and Dale Schuurmans, editors, Advances in Large Margin Classifiers, Neural Information Processing Systems. MIT Press, pages 115-132.

Horvitz, Eric, Andy Jacobs, and David Hovel. 1999. Attention-sensitive alerting. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, pages 305-313.

Ishikawa, Hiroshi. 2003. Exact optimization for Markov random fields with convex priors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(10).

Ishikawa, Hiroshi and Davi Geiger. 1998. Occlusions, discontinuities, and epipolar lines in stereo. In Proceedings of the 5th European Conference on Computer Vision (ECCV), volume I, pages 232-248, London, UK. Springer-Verlag.

Joachims, Thorsten. 1999. Making large-scale SVM learning practical. In Bernhard Schölkopf and Alexander Smola, editors, Advances in Kernel Methods: Support Vector Learning. MIT Press, pages 44-56.

Kleinberg, Jon and Éva Tardos. 2002. Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. Journal of the ACM, 49(5):616-639.

Koppel, Moshe and Jonathan Schler. 2005. The importance of neutral examples for learning sentiment. In Workshop on the Analysis of Informal and Formal Information Exchange during Negotiations (FINEXIN).

Liu, Hugo, Henry Lieberman, and Ted Selker. 2003. A model of textual affect sensing using real-world knowledge. In Proceedings of Intelligent User Interfaces (IUI), pages 125-132.

McCallum, Andrew. 1999. Multi-label text classification with a mixture model trained by EM. In AAAI Workshop on Text Learning.

McCullagh, Peter. 1980. Regression models for ordinal data. Journal of the Royal Statistical Society, 42(2):109-142.

Pang, Bo and Lillian Lee. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the ACL, pages 271-278.

Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of EMNLP, pages 79-86.

Qu, Yan, James Shanahan, and Janyce Wiebe, editors. 2004. Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications. AAAI Press. AAAI technical report SS-04-07.

Rifkin, Ryan M. and Aldebaro Klautau. 2004. In defense of one-vs-all classification. Journal of Machine Learning Research, 5:101-141.

Schapire, Robert E. and Yoram Singer. 2000. BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2/3):135-168.

Smola, Alex J. and Bernhard Schölkopf. 1998. A tutorial on support vector regression. Technical Report NeuroCOLT NC-TR-98-030, Royal Holloway College, University of London.

Subasic, Pero and Alison Huettner. 2001. Affect analysis of text using fuzzy semantic typing. IEEE Transactions on Fuzzy Systems, 9(4):483-496.

Takamura, Hiroya, Yuji Matsumoto, and Hiroyasu Yamada. 2004. Modeling category structures with a kernel function. In Proceedings of CoNLL, pages 57-64.

Tong, Richard M. 2001. An operational system for detecting and tracking opinions in on-line discussion. In SIGIR Workshop on Operational Text Classification.

Turney, Peter. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the ACL, pages 417-424.

Vapnik, Vladimir. 1995. The Nature of Statistical Learning Theory. Springer.

Wilson, Theresa, Janyce Wiebe, and Rebecca Hwa. 2004. Just how mad are you? Finding strong and weak opinion clauses. In Proceedings of AAAI, pages 761-769.

Yu, Hong and Vasileios Hatzivassiloglou. 2003. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In Proceedings of EMNLP.

Zhu, Xiaojin (Jerry). 2005. Semi-Supervised Learning with Graphs. Ph.D. thesis, Carnegie Mellon University.

A Appendix: other variations attempted

A.1 Discretizing binary classification

In our setting, we can also incorporate class relations by directly altering the output of a binary classifier, as follows. We first train a standard SVM, treating ratings greater than 0.5 as positive labels and others as negative labels. If we then consider the resulting classifier to output a positivity-preference function pi(x), we can then learn a series of thresholds to convert this value into the desired label set, under the assumption that the bigger pi(x) is, the more positive the review.[9] This algorithm always outperforms the majority-class baseline, but not to the degree that the best of SVM OVA and SVM regression does. Koppel and Schler (2005) independently found in a three-class study that thresholding a positive/negative classifier trained only on clearly positive or clearly negative examples did not yield large improvements.

[9] This is not necessarily true: if the classifier's goal is to optimize binary classification error, its major concern is to increase confidence in the positive/negative distinction, which may not correspond to higher confidence in separating "five stars" from "four stars".

A.2 Discretizing regression

In our experiments with SVM regression, we discretized regression output via a set of fixed decision thresholds {0.5, 1.5, 2.5, ...} to map it into our set of class labels. Alternatively, we can learn the thresholds instead. Neither option clearly outperforms the other in the four-class case. In the three-class setting, the learned version provides noticeably better performance in two of the four datasets. But these results taken together still mean that in many cases, the difference is negligible, and if we had started down this path, we would have needed to consider similar tweaks for one-vs-all SVM as well. We therefore stuck with the simpler version in order to maintain focus on the central issues at hand.
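The fixed-threshold discretization of A.2 amounts to counting how many cut points a regression prediction exceeds. A minimal sketch (our own illustration, not the authors' code; the thresholds {0.5, 1.5, 2.5} correspond to the four-class case):

```python
import bisect

def discretize(pred, thresholds=(0.5, 1.5, 2.5)):
    """Map a real-valued regression output to a class label in
    0..len(thresholds) using fixed decision thresholds, as in A.2."""
    # bisect_right returns the number of thresholds <= pred,
    # which is exactly the class index.
    return bisect.bisect_right(thresholds, pred)

print([discretize(p) for p in (-0.2, 0.49, 0.5, 1.7, 3.4)])  # -> [0, 0, 1, 2, 3]
```

The learned-threshold variant would fit the cut points to training data instead of fixing them at the class midpoints, at the cost of the extra tuning machinery discussed above.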