Foundations and Trends in Information Retrieval, Vol. 2, Nos. 1-2 (2008) 1-135
© 2008 B. Pang and L. Lee
DOI: 10.1561/1500000001

Opinion Mining and Sentiment Analysis

Bo Pang and Lillian Lee

Yahoo! Research, 701 First Avenue, Sunnyvale, CA 94089, USA, bopang@yahoo-inc.com

... on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

1 Introduction

    Romance should never begin with sentiment. It should begin with science and end with a settlement.
    (Oscar Wilde, An Ideal Husband)

1.1 The Demand for Information on Opinions and Sentiment

"What other people think" has always been an important piece of information for most of us during the decision-making process. Long before awareness of the World Wide Web became widespread, many of us asked our friends to recommend an auto mechanic or to explain who they were planning to vote for in local elections, requested reference letters regarding job applicants from colleagues, or consulted Consumer Reports to decide what dishwasher to buy. But the Internet and the Web have now (among other things) made it possible to find out about the opinions and experiences of those in the vast pool of people that are neither our personal acquaintances nor well-known professional critics, that is, people we have never heard of. And conversely, more and more people are making their opinions available to strangers via the Internet.

Indeed, according to two surveys of more than 2000 American adults each [63, 127],

- 81% of Internet users (or 60% of Americans) have done online research on a product at least once;
- 20% (15% of all Americans) do so on a typical day;
- among readers of online reviews of restaurants, hotels, and various services (e.g., travel agencies or doctors), between 73% and 87% report that reviews had a significant influence on their purchase;
- consumers report being willing to pay from 20% to 99% more for a 5-star-rated item than a 4-star-rated item (the variance stems from what type of item or service is considered);
- 32% have provided a rating on a product, service, or person via an online ratings system, and 30% (including 18% of online senior citizens) have posted an online comment or review regarding a product or service.

Footnote: Section 6.1 discusses quantitative analyses of actual economic impact, as opposed to consumer perception.

Footnote: Interestingly, Hitlin and Rainie [123] report that "Individuals who have rated something online are also more skeptical of the information that is available on the Web."

We hasten to point out that consumption of goods and services is not the only motivation behind people's seeking out or expressing opinions online. A need for political information is another important factor. For example, in a survey of over 2500 American adults, Rainie and Horrigan [248] studied the 31% of Americans (over 60 million people) that were 2006 campaign internet users, defined as those who gathered information about the 2006 elections online and exchanged views via email. Of these,

- 28% said that a major reason for these online activities was to get perspectives from within their community, and 34% said that a major reason was to get perspectives from outside their community;
- 27% had looked online for the endorsements or ratings of external organizations;
- 28% said that most of the sites they use share their point of view, but 29% said that most of the sites they use challenge their point of view, indicating that many people are not simply looking for validations of their pre-existing opinions;
- 8% posted their own political commentary online.

The user hunger for and reliance upon online advice and recommendations that the data above reveals is merely one reason behind the surge of interest in new systems that deal directly with opinions as a first-class object. But, Horrigan [127] reports that while a majority of American internet users report positive experiences during online product research, at the same time, 58% also report that online information was missing, impossible to find, confusing, and/or overwhelming. Thus, there is a clear need to aid consumers of products and of information by building better information-access systems than are currently in existence.

The interest that individual users show in online opinions about products and services, and the potential influence such opinions wield, is something that vendors of these items are paying more and more attention to [124]. The following excerpt from a white paper is illustrative of the envisioned possibilities, or at the least the rhetoric surrounding the possibilities:

    With the explosion of Web 2.0 platforms such as blogs, discussion forums, peer-to-peer networks, and various other types of social media ... consumers have at their disposal a soapbox of unprecedented reach and power by which to share their brand experiences and opinions, positive or negative, regarding any product or service. As major companies are increasingly coming to realize, these consumer voices can wield enormous influence in shaping the opinions of other consumers and, ultimately, their brand loyalties, their purchase decisions, and their own brand advocacy. ... Companies can respond to the consumer insights they generate through social media monitoring and analysis by modifying their
    marketing messages, brand positioning, product development, and other activities accordingly.
    (Zabin and Jefferies [327])

But industry analysts note that the leveraging of new media for the purpose of tracking product image requires new technologies; here is a representative snippet describing their concerns:

    Marketers have always needed to monitor media for information related to their brands, whether it's for public relations activities, fraud violations, competitive intelligence. But fragmenting media and changing consumer behavior have crippled traditional monitoring methods. Technorati estimates that 75,000 new blogs are created daily, along with 1.2 million new posts each day, many discussing consumer opinions on products and services. Tactics [of the traditional sort] such as clipping services, field agents, and ad hoc research simply can't keep pace.
    (Kim [154])

Footnote: Presumably, the author means "the detection or prevention of fraud violations," as opposed to the commission thereof.

Thus, aside from individuals, an additional audience for systems capable of automatically analyzing consumer sentiment, as expressed in no small part in online venues, are companies anxious to understand how their products and services are perceived.

1.2 What Might be Involved? An Example Examination of the Construction of an Opinion/Review Search Engine

Creating systems that can process subjective information effectively requires overcoming a number of novel challenges. To illustrate some of these challenges, let us consider the concrete example of what building an opinion- or review-search application could involve. As we have discussed, such an application would fill an important and prevalent
information need, whether one restricts attention to blog search [213] or considers the more general types of search that have been described above.

The development of a complete review- or opinion-search application might involve attacking each of the following problems.

(1) If the application is integrated into a general-purpose search engine, then one would need to determine whether the user is in fact looking for subjective material. This may or may not be a difficult problem in and of itself: perhaps queries of this type will tend to contain indicator terms like "review," "reviews," or "opinions," or perhaps the application would provide a "checkbox" to the user so that he or she could indicate directly that reviews are what is desired; but in general, query classification is a difficult problem. Indeed, it was the subject of the 2005 KDD Cup challenge [185].

(2) Besides the still-open problem of determining which documents are topically relevant to an opinion-oriented query, an additional challenge we face in our new setting is simultaneously or subsequently determining which documents or portions of documents contain review-like or opinionated material. Sometimes this is relatively easy, as in texts fetched from review-aggregation sites in which review-oriented information is presented in relatively stereotyped format: examples include Epinions.com and Amazon.com. However, blogs also notoriously contain quite a bit of subjective content and thus are another obvious place to look (and are more relevant than shopping sites for queries that concern politics, people, or other non-products), but the desired material within blogs can vary quite widely in content, style, presentation, and even level of grammaticality.

(3) Once one has target documents in hand, one is still faced with the problem of identifying the overall sentiment expressed by these documents and/or the specific opinions regarding particular features or aspects of the items or topics in question, as necessary. Again, while some sites make this
kind of extraction easier (for instance, user reviews posted to Yahoo! Movies must specify grades for pre-defined sets of characteristics of films), more free-form text can be much harder for computers to analyze, and indeed can pose additional challenges; for example, if quotations are included in a newspaper article, care must be taken to attribute the views expressed in each quotation to the correct entity.

(4) Finally, the system needs to present the sentiment information it has garnered in some reasonable summary fashion. This can involve some or all of the following actions:
    (a) Aggregation of "votes" that may be registered on different scales (e.g., one reviewer uses a star system, but another uses letter grades).
    (b) Selective highlighting of some opinions.
    (c) Representation of points of disagreement and points of consensus.
    (d) Identification of communities of opinion holders.
    (e) Accounting for different levels of authority among opinion holders.

Note that it might be more appropriate to produce a visualization of sentiment data rather than a textual summary of it, whereas textual summaries are what is usually created in standard topic-based multi-document summarization.

1.3 Our Charge and Approach

Challenges (2), (3), and (4) in the above list are very active areas of research, and the bulk of this survey is devoted to reviewing work in these three sub-fields. However, due to space limitations and the focus of the journal series in which this survey appears, we do not and cannot aim to be completely comprehensive.

In particular, when we began to write this survey, we were directly charged to focus on information-access applications, as opposed to work of more purely linguistic interest. We stress that the importance of work in the latter vein is absolutely not in question.
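
As a brief aside on action (4)(a) in the list of Section 1.2: aggregating "votes" registered on different scales requires first mapping them onto a common one. The following is only a minimal sketch under stated assumptions, not a method taken from the surveyed literature: the linear mapping onto [0, 1], the letter-to-grade-point table `LETTER_POINTS`, and all function names are hypothetical choices made for illustration.

```python
from statistics import mean

# Hypothetical grade-point table (an assumption, not from the survey).
LETTER_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def normalize_stars(stars, max_stars=5):
    """Map a rating on a 1..max_stars scale linearly onto [0, 1]."""
    if not 1 <= stars <= max_stars:
        raise ValueError(f"star rating out of range: {stars}")
    return (stars - 1) / (max_stars - 1)


def normalize_letter(grade):
    """Map a letter grade such as 'B+' onto [0, 1] via grade points."""
    points = LETTER_POINTS[grade[0].upper()]
    modifier = grade[1:]
    if modifier == "+":
        points += 0.3
    elif modifier == "-":
        points -= 0.3
    # Clip so that, e.g., 'A+' does not exceed the top of the scale.
    return min(max(points / 4.0, 0.0), 1.0)


def aggregate(votes):
    """Average votes given as (scale, value) pairs on a common [0, 1] scale."""
    scores = []
    for scale, value in votes:
        if scale == "stars":
            scores.append(normalize_stars(value))
        elif scale == "letter":
            scores.append(normalize_letter(value))
        else:
            raise ValueError(f"unknown scale: {scale}")
    return mean(scores)
```

For instance, `aggregate([("stars", 5), ("letter", "F")])` averages one top and one bottom vote. A plain mean treats all reviewers equally; accounting for different levels of authority, as in action (4)(e), would call for a weighted variant.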

Given our mandate, the reader will not be surprised that we describe the applications that sentiment-analysis systems can facilitate and review many kinds of approaches to a variety of opinion-oriented classification problems. We have also chosen to attempt to draw attention to single- and multi-document summarization of evaluative text, especially since interesting considerations regarding graphical visualization arise. Finally, we move beyond just the technical issues, devoting significant attention to the broader implications that the development of opinion-oriented information-access services have: we look at questions of privacy, manipulation, and whether or not reviews can have measurable economic impact.

1.4 Early History

Although the area of sentiment analysis and opinion mining has recently enjoyed a huge burst of research activity, there has been a steady undercurrent of interest for quite a while. One could count early projects on beliefs as forerunners of the area [48, 317]. Later work focused mostly on interpretation of metaphor, narrative, point of view, affect, evidentiality in text, and related areas [121, 133, 149, 262, 306, 310, 311, 312, 313].

The year 2001 or so seems to mark the beginning of widespread awareness of the research problems and opportunities that sentiment analysis and opinion mining raise [51, 66, 69, 79, 192, 215, 221, 235, 291, 296, 298, 305, 326], and subsequently there have been literally hundreds of papers published on the subject.

Factors behind this "land rush" include:

- the rise of machine learning methods in natural language processing and information retrieval;
- the availability of datasets for machine learning algorithms to be trained on, due to the blossoming of the World Wide Web and, specifically, the development of review-aggregation web-sites; and, of course
- realization of the fascinating intellectual challenges and commercial and intelligence applications that the area offers.

1.5 A Note on Terminology: Opinion Mining, Sentiment Analysis, Subjectivity, and All that

    The beginning of wisdom is the definition of terms, wrote Socrates. The aphorism is highly applicable when it comes to the world of social media monitoring and analysis, where any semblance of universal agreement on terminology is altogether lacking.

    Today, vendors, practitioners, and the media alike call this still-nascent arena everything from brand monitoring, buzz monitoring and online anthropology, to market influence analytics, conversation mining and online consumer intelligence. ... In the end, the term social media monitoring and analysis is itself a verbal crutch. It is placeholder [sic], to be used until something better (and shorter) takes hold in the English language to describe the topic of this report.
    (Zabin and Jefferies [327])

The above quotation highlights the problems that have arisen in trying to name a new area. The quotation is particularly apt in the context of this survey because the field of "social media monitoring and analysis" (or however one chooses to refer to it) is precisely one that the body of work we review is very relevant to. And indeed, there has been to date no uniform terminology established for the relatively young field we discuss in this survey. In this section, we simply mention some of the terms that are currently in vogue, and attempt to indicate what these terms tend to mean in research papers that the interested reader may encounter.

The body of work we review is that which deals with the computational treatment of (in alphabetical order) opinion, sentiment, and subjectivity in text. Such work has come to be known as opinion mining, sentiment analysis, and/or subjectivity analysis. The phrases review mining and appraisal extraction have been used, too, and there are some connections to affective computing, where the goals include enabling computers to recognize and express emotions [239]. This proliferation of terms reflects differences in the connotations that these terms carry,
both in their original general-discourse usages and in the usages that have evolved in the technical literature of several communities.

Footnote: To see that the distinctions in common usage can be subtle, consider how interrelated the following set of definitions given in Merriam-Webster's Online Dictionary are:

    Synonyms: opinion, view, belief, conviction, persuasion, sentiment mean a judgment one holds as true.
    - Opinion implies a conclusion thought out yet open to dispute (each expert seemed to have a different opinion).
    - View suggests a subjective opinion (very assertive in stating his views).
    - Belief implies often deliberate acceptance and intellectual assent (a firm belief in her party's platform).
    - Conviction applies to a firmly and seriously held belief (conviction that animal life is as sacred as human).
    - Persuasion suggests a belief grounded on assurance (as by evidence) of its truth (was of the persuasion that everything changes).
    - Sentiment suggests a settled opinion reflective of one's feelings (her feminist sentiments are well-known).

In 1994, Wiebe [311], influenced by the writings of the literary theorist Banfield [26], centered the idea of subjectivity around that of private states, defined by Quirk et al. [245] as states that are not open to objective observation or verification. Opinions, evaluations, emotions, and speculations all fall into this category; but a canonical example of research typically described as a type of subjectivity analysis is the recognition of opinion-oriented language in order to distinguish it from objective language. While there has been some research self-identified as subjectivity analysis on the particular application area of determining the value judgments (e.g., "four stars" or "C+") expressed in the evaluative opinions that are found, this application has not tended to be a major focus of such work.

The term opinion mining appears in a paper by Dave et al. [69] that was published in the proceedings of the 2003 WWW conference; the publication venue may explain the popularity of the term within communities strongly associated with Web search or information retrieval. According to Dave et al. [69], the ideal opinion-mining tool would "process a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions
about each of them (poor, mixed, good)." Much of the subsequent research self-identified as opinion mining fits this description in its emphasis on extracting and analyzing judgments on various aspects of given items. However, the term has recently also been interpreted more broadly to include many different types of analysis of evaluative text [190].

The history of the phrase sentiment analysis parallels that of "opinion mining" in certain respects. The term "sentiment" used in reference to the automatic analysis of evaluative text and tracking of the predictive judgments therein appears in 2001 papers by Das and Chen [66] and Tong [296], due to these authors' interest in analyzing market sentiment. It subsequently occurred within 2002 papers by Turney [298] and Pang et al. [235], which were published in the proceedings of the annual meeting of the Association for Computational Linguistics (ACL) and the annual conference on Empirical Methods in Natural Language Processing (EMNLP). Moreover, Nasukawa and Yi [221] entitled their 2003 paper, "Sentiment analysis: Capturing favorability using natural language processing", and a paper in the same year by Yi et al. [323] was named "Sentiment Analyzer: Extracting sentiments about a given topic using natural language processing techniques." These events together may explain the popularity of "sentiment analysis" among communities self-identified as focused on NLP. A sizeable number of papers mentioning "sentiment analysis" focus on the specific application of classifying reviews as to their polarity (either positive or negative), a fact that appears to have caused some authors to suggest that the phrase refers specifically to this narrowly defined task. However, nowadays many construe the term more broadly to mean the computational treatment of opinion, sentiment, and subjectivity in text.

Thus, when broad interpretations are applied, "sentiment analysis" and "opinion mining" denote the same field of study (which itself can be considered a sub-area of subjectivity analysis). We have attempted to use these terms more or less interchangeably in this survey. This is in no small part because we view the field as representing a unified body of work, and would thus like to encourage researchers in the area to share terminology regardless of the publication venues at which
their papers might appear.

2 Applications

    Sentiment without action is the ruin of the soul.
    (Edward Abbey)

We used one application of opinion mining and sentiment analysis as a motivating example in the Introduction, namely, web search targeted toward reviews. But other applications abound. In this section, we seek to enumerate some of the possibilities.

It is important to mention that because of all the possible applications, there are a good number of companies, large and small, that have opinion mining and sentiment analysis as part of their mission. However, we have elected not to mention these companies individually due to the fact that the industrial landscape tends to change quite rapidly, so that lists of companies risk falling out of date rather quickly.

2.1 Applications to Review-Related Websites

Clearly, the same capabilities that a review-oriented search engine would have could also serve very well as the basis for the creation and automated upkeep of review- and opinion-aggregation websites. That is, as an alternative to sites like Epinions that solicit feedback and reviews, one could imagine sites that proactively gather such information. Topics need not be restricted to product reviews, but could include opinions about candidates running for office, political issues, and so forth.

There are also applications of the technologies we discuss to more traditional review-solicitation sites, as well. Summarizing user reviews is an important problem. One could also imagine that errors in user ratings could be fixed: there are cases where users have clearly accidentally selected a low rating when their review indicates a positive evaluation [47]. Moreover, as discussed later in this survey (see Section 5.2.4, for example), there is some evidence that user ratings can be biased or otherwise in need of correction, and automated classifiers could provide such updates.

2.2 Applications as a Sub-Component Technology

Sentiment-analysis and opinion-mining systems also have an important potential role as enabling technologies for other systems.

One possibility is as an augmentation to recommendation systems [292, 293], since it might behoove such a system not to recommend items that receive a lot of negative feedback.

Detection of "flames" (overly heated or antagonistic language) in email or other types of communication [276] is another possible use of subjectivity detection and classification.

In online systems that display ads as sidebars, it is helpful to detect webpages that contain sensitive content inappropriate for ads placement [137]; for more sophisticated systems, it could be useful to bring up product ads when relevant positive sentiments are detected, and perhaps more importantly, nix the ads when relevant negative statements are discovered.

It has also been argued that information extraction can be improved by discarding information found in subjective sentences [256].

Question answering is another area where sentiment analysis can prove useful [274, 284, 189]. For example, opinion-oriented questions may require different treatment. Alternatively, Lita et al. [189] suggest that for definitional questions, providing an answer that includes more information about how an entity is viewed may better inform the user.

Summarization may also benefit from accounting for multiple viewpoints [265].

Additionally, there are potentially relations to citation analysis, where, for example, one might wish to determine whether an author is citing a piece of work as supporting evidence or as research that he or she dismisses [238]. Similarly, one effort seeks to use semantic orientation to track literary reputation [287].

In general, the computational treatment of affect has been motivated in part by the desire to improve human-computer interaction [188, 192, 295].

2.3 Applications in Business and Government Intelligence

The field of opinion mining and sentiment analysis is well-suited to various types of intelligence applications. Indeed, business intelligence seems to be one of the main factors behind corporate interest in the field.

Consider, for instance, the following scenario (the text of which also appears in Lee [181]). A major computer manufacturer, disappointed with unexpectedly low sales, finds itself confronted with the question: "Why aren't consumers buying our laptop?" While concrete data such as the laptop's weight or the price of a competitor's model are obviously relevant, answering this question requires focusing more on people's personal views of such objective characteristics. Moreover, subjective judgments regarding intangible qualities (e.g., "the design is tacky" or "customer service was condescending") or even misperceptions (e.g., "updated device drivers are not available" when such device drivers do in fact exist) must be taken into account as well.

Sentiment-analysis technologies for extracting opinions from unstructured human-authored documents would be excellent tools for handling many business-intelligence tasks related to the one just described. Continuing with our example scenario: it would be difficult to try to directly survey laptop purchasers who have not bought the company's product. Rather, we could employ a system that (a) finds reviews or other expressions of opinion on the Web (newsgroups, individual blogs, and aggregation sites such as Epinions are likely to be productive sources) and then (b) creates condensed versions of individual reviews or a digest of overall consensus points. This would save an analyst from having to read potentially dozens or even hundreds of versions of the same complaints. Note that Internet sources can vary wildly in form, tenor, and even grammaticality; this fact underscores the need for robust techniques even when only one language (e.g., English) is considered.

Besides reputation management and public relations, one might perhaps hope that by tracking public viewpoints, one could perform trend prediction in sales or other relevant data [214]. (See our discussion of Broader Implications (Section 6) for more discussion of potential economic impact.)

Government intelligence is another application that has been considered. For example, it has been suggested that one could monitor sources for increases in hostile or negative communications [1].

2.4 Applications Across Different Domains

One exciting turn of events has been the confluence of interest in opinions and sentiment within computer science with interest in opinions and sentiment in other fields.

As is well known, opinions matter a great deal in politics. Some work has focused on understanding what voters are thinking [83, 110, 126, 178, 219], whereas other projects have as a long term goal the clarification of politicians' positions, such as what public figures support or oppose, to enhance the quality of information that voters have access to [27, 111, 294].

Sentiment analysis has specifically been proposed as a key enabling technology in eRulemaking, allowing the automatic analysis of the opinions that people submit about pending policy or government-regulation proposals [50, 175, 271].

On a related note, there has been investigation into opinion mining in weblogs devoted to legal matters, sometimes known as "blawgs" [64].

Interactions with sociology promise to be extremely fruitful. For instance, the issue of how ideas and innovations diffuse [258] involves the question of who is positively or negatively disposed toward whom, and hence who would be more or less receptive to new information transmission from a given source. To take just one other example: structural balance theory is centrally concerned with the polarity of "ties" between people [54] and how this relates to group cohesion. These ideas have begun to be applied to online media analysis [58, 144].

3 General Challenges

3.1 Contrasts with Standard Fact-Based Textual Analysis

The increasing interest in opinion mining and sentiment analysis is partly due to its potential applications, which we have just discussed. Equally important are the new intellectual challenges that the field presents to the research community. So what makes the treatment of evaluative text different from "classic" text mining and fact-based analysis?

Take text categorization, for example. Traditionally, text categorization seeks to classify documents by topic. There can be many possible categories, the definitions of which might be user- and application-dependent; and for a given task, we might be dealing with as few as two classes (binary classification) or as many as thousands of classes (e.g., classifying documents with respect to a complex taxonomy). In contrast, with sentiment classification (see Section 4.1 for more details on precise definitions), we often have relatively few classes (e.g., "positive" or "3 stars") that generalize across many domains and users. In addition, while the different classes in topic-based categorization can be completely unrelated, the sentiment labels that are widely
considered in previous work typically represent opposing (if the task is binary classification) or ordinal/numerical categories (if classification is according to a multi-point scale). In fact, the regression-like nature of strength of feeling, degree of positivity, and so on seems rather unique to sentiment categorization (although one could argue that the same phenomenon exists with respect to topic-based relevance).

There are also many characteristics of answers to opinion-oriented questions that differ from those for fact-based questions [284]. As a result, opinion-oriented information extraction, as a way to approach opinion-oriented question answering, naturally differs from traditional information extraction (IE) [49]. Interestingly, in a manner that is similar to the situation for the classes in sentiment-based classification, the templates for opinion-oriented IE also often generalize well across different domains, since we are interested in roughly the same set of fields for each opinion expression (e.g., holder, type, strength) regardless of the topic. In contrast, traditional IE templates can differ greatly from one domain to another: the typical template for recording information relevant to a natural disaster is very different from a typical template for storing bibliographic information.

These distinctions might make our problems appear deceptively simpler than their counterparts in fact-based analysis, but this is far from the truth. In the next section, we sample a few examples to show what makes these problems difficult compared to traditional fact-based text analysis.

3.2 Factors that Make Opinion Mining Difficult

Let us begin with a sentiment polarity text-classification example. Suppose we wish to classify an opinionated text as either positive or negative, according to the overall sentiment expressed by the author within it. Is this a difficult task?

To answer this question, first consider the following example, consisting of only one sentence (by Mark Twain): "Jane Austen's books madden me so that I can't conceal my frenzy from the reader." Just as the topic of this text segment can be identified by the phrase "Jane Austen," the presence of words like "madden" and "frenzy" suggests
negative sentiment. So one might think this is an easy task, and hypothesize that the polarity of opinions can generally be identified by a set of keywords.

But, the results of an early study by Pang et al. [235] on movie reviews suggest that coming up with the right set of keywords might be less trivial than one might initially think. The purpose of Pang et al.'s pilot study was to better understand the difficulty of the document-level sentiment-polarity classification problem. Two human subjects were asked to pick keywords that they would consider to be good indicators of positive and negative sentiment. As shown in Figure 3.1, the use of the subjects' lists of keywords achieves about 60% accuracy when employed within a straightforward classification policy. In contrast, word lists of the same size but chosen based on examination of the corpus' statistics achieve almost 70% accuracy, even though some of the terms, such as "still," might not look that intuitive at first.

However, the fact that it may be non-trivial for humans to come up with the best set of keywords does not in itself imply that the problem is harder than topic-based categorization. While the feature "still" might not be likely for any human to propose from introspection, given training data, its correlation with the positive class can be discovered via a data-driven approach, and its utility (at least in the movie review domain) does make sense in retrospect. Indeed, applying machine learning techniques based on unigram models can achieve over 80% in accuracy [235], which is much better than the performance based on hand-picked keywords reported above. However, this level of accuracy is not quite on par with the performance one would expect in typical topic-based binary classification.

Proposed word lists, with accuracy and ties:

Human 1 (accuracy 58%, ties 75%)
    positive: dazzling, brilliant, phenomenal, excellent, fantastic
    negative: suck, terrible, awful, unwatchable, hideous

Human 2 (accuracy 64%, ties 39%)
    positive: gripping, mesmerizing, riveting, spectacular, cool, awesome, thrilling, badass, excellent, moving, exciting
    negative: bad, cliched, sucks, boring, stupid, slow

Statistics-based (accuracy 69%, ties 16%)
    positive: love, wonderful, best, great, superb, still, beautiful
    negative: bad, worst, stupid, waste, boring, ?, !

Fig. 3.1 Sentiment classification using keyword lists created by human subjects ("Human 1" and "Human 2"), with corresponding results using keywords selected via examination of simple statistics of the test data ("Statistics-based"). Adapted from Figures 1 and 2 in Pang et al. [235].

Why does this problem appear harder than the traditional task when the two classes we are considering here are so different from each other? Our discussion of algorithms for classification and extraction (Section 4) will provide a more in-depth answer to this question, but the following are a few examples (from among the many we know) showing that the upper bound on problem difficulty, from the viewpoint of machines, is very high. Note that not all of the issues these examples raise have been fully addressed in the existing body of work in this area.

Compared to topic, sentiment can often be expressed in a more subtle manner, making it difficult to be identified by any of a sentence or document's terms when considered in isolation. Consider the following examples:

- "If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut." (review by Luca Turin and Tania Sanchez of the Givenchy perfume Amarige, in Perfumes: The Guide, Viking 2008.) No ostensibly negative words occur.
- "She runs the gamut of emotions from A to B." (Dorothy Parker, speaking about Katharine Hepburn.) No ostensibly negative words occur.
- In fact, the example that opens this section, which was taken from the following quote from Mark Twain, is also followed by a sentence with no ostensibly negative words:

    Jane Austen's books madden me so that I can't conceal my frenzy from the reader. Every time I read Pride and Prejudice I want to dig her up and beat her over the skull with her own shin-bone.
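
To make the keyword-list baseline of Figure 3.1 concrete, here is a minimal sketch of one plausible "straightforward classification policy." The survey does not spell the policy out, so the counting rule below (compare the number of positive and negative keyword hits, and report a tie when they are equal) is an assumption, as are the tokenizer and the names `classify`, `POSITIVE`, and `NEGATIVE`. The word lists are the statistics-based lists from Figure 3.1.

```python
import re

# Statistics-based keyword lists as given in Figure 3.1.
POSITIVE = {"love", "wonderful", "best", "great", "superb", "still", "beautiful"}
NEGATIVE = {"bad", "worst", "stupid", "waste", "boring", "?", "!"}

# Assumed tokenizer: lowercase words (with apostrophes), plus "?" and "!"
# as separate tokens, since both appear in the negative list.
TOKEN = re.compile(r"[a-z']+|[?!]")


def classify(text, positive=POSITIVE, negative=NEGATIVE):
    """Label text by comparing counts of positive and negative keywords."""
    tokens = TOKEN.findall(text.lower())
    pos_hits = sum(token in positive for token in tokens)
    neg_hits = sum(token in negative for token in tokens)
    if pos_hits > neg_hits:
        return "positive"
    if neg_hits > pos_hits:
        return "negative"
    return "tie"  # no keyword evidence either way, or evenly split
```

Under this rule both Twain sentences quoted above come out as a tie, since neither contains any listed keyword; this is the kind of case counted in the "ties" column of Figure 3.1, and it illustrates why strongly negative text can be invisible to a keyword-only approach.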
A related observation is that although the second sentence indicates an extremely strong opinion, it is difficult to associate the presence of this strong opinion with specific keywords or phrases in this sentence. Indeed, subjectivity detection can be a difficult task in itself. Consider the following quote from Charlotte Brontë, in a letter to George Lewes:

You say I must familiarise my mind with the fact that "Miss Austen is not a poetess, has no 'sentiment'" (you scornfully enclose the word in inverted commas), "has no eloquence, none of the ravishing enthusiasm of poetry"; and then you add, I must "learn to acknowledge her as one of the greatest artists, of the greatest painters of human character, and one of the writers with the nicest sense of means to an end that ever lived."

Note the fine line between facts and opinions: while "Miss Austen is not a poetess" can be considered to be a fact, "none of the ravishing enthusiasm of poetry" should probably be considered as an opinion, even though the two phrases (arguably) convey similar information. Thus, not only can we not easily identify simple keywords for subjectivity, but we also find that patterns like "the fact that" do not necessarily guarantee the objective truth of what follows them, and bigrams like "no sentiment" apparently do not guarantee the absence of opinions, either. We can also get a glimpse of how opinion-oriented information extraction can be difficult. For instance, it is non-trivial to recognize opinion holders. In the example quoted above, the opinion is not that of the author, but the opinion of "You," which refers to George Lewes in this particular letter. Also, observe that given the context ("you scornfully enclose the word in inverted commas," together with the reported endorsement of Austen as a great artist), it is clear that "has no sentiment" is not meant to be a show-stopping criticism of Austen from Lewes, and Brontë's disagreement with him on this subject is also subtly revealed.

One can challenge our analysis of the "poetess" clause, as an anonymous reviewer indeed did, which disagreement perhaps supports our greater point about the difficulties that can sometimes present themselves.

Different researchers express different opinions about whether distinguishing between subjective and objective language is difficult for humans in the general case. For example, Kim and Hovy [159] note that in a pilot study sponsored by NIST, "human annotators often disagreed on whether a belief statement was or was not an opinion." However, other researchers have found inter-annotator agreement rates in various types of subjectivity-classification tasks to be satisfactory [45, 273, 274, 309]; a summary provided by one of the anonymous referees is that "[although] there is variation from study to study, on average, about 85% of annotations are not marked as uncertain by either annotator, and for these cases, inter-coder agreement is very high (kappa values over 80)." As in other settings, more careful definitions of the distinctions to be made tend to lead to better agreement rates.

In any event, the points we are exploring in the Brontë quote may be made more clear by replacing "Jane Austen is not a poetess" with something like "Jane Austen does not write poetry for a living, but is also no poet in the broader sense."

In general, sentiment and subjectivity are quite context-sensitive, and, at a coarser granularity, quite domain dependent (in spite of the fact that the general notion of positive and negative opinions is fairly consistent across different domains). Note that although domain dependency is in part a consequence of changes in vocabulary, even the exact same expression can indicate different sentiment in different domains. For example, "go read the book" most likely indicates positive sentiment for book reviews, but negative sentiment for movie reviews. (This example was furnished to us by Bob Bland.) We will discuss topic-sentiment interaction in more detail in Section 4.4.

Fig. 3.2 Example of movie reviews produced by web users: a (slightly reformatted) screenshot of user reviews for The Nightmare Before Christmas.

It does not take a seasoned writer or a professional journalist to produce texts that are difficult for machines to analyze. The writings of Web users can be just as challenging, if not as subtle, in their own way; see Figure 3.2 for an example. In the case of Figure 3.2, it should be pointed out that it might be more useful to learn to recognize the quality of a review (see Section 5.2 for more detailed discussions on that subject). Still, it is interesting to observe the importance of modeling discourse structure. While the overall topic of a document
should be what the majority of the content is focusing on regardless of the order in which potentially different subjects are presented, for opinions, the order in which different opinions are presented can result in a completely opposite overall sentiment polarity.

In fact, somewhat in contrast with topic-based text categorization, order effects can completely overwhelm frequency effects. Consider the following excerpt, again from a movie review:

This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can't hold up.

As indicated by the (inserted) emphasis, words that are positive in orientation dominate this excerpt, and yet the overall sentiment is negative because of the crucial last sentence; whereas in traditional text classification, if a document mentions "cars" relatively frequently, then the document is most likely at least somewhat related to cars.

Order dependence also manifests itself at more fine-grained levels of analysis: "A is better than B" conveys the exact opposite opinion from "B is better than A." In general, modeling sequential information and discourse structure seems more crucial in sentiment analysis (further discussion appears in Section 4.7).

As noted earlier, not all of the issues we have just discussed have been fully addressed in the literature. This is perhaps part of the charm of this emerging area. In the following sections, we aim to give an overview of a selection of past heroic efforts to address some of these issues, and march through the positives and the negatives, charged with unbiased feeling, armed with hard facts.

Fasten your seat belts. It's going to be a bumpy night!
(Bette Davis, All About Eve, screenplay by Joseph Mankiewicz)

One could argue about whether in the context of movie reviews the word "Stallone" has a semantic orientation.

Note that this is not unique to opinion expressions; "A killed B" and "B killed A" also convey different factual information.
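The way order effects can overwhelm frequency effects in the movie-review excerpt above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the tiny word lists `POSITIVE` and `NEGATIVE` and the helper names are hypothetical, chosen for this example, and are not taken from any system discussed in this survey.

```python
import re

# Hypothetical mini-lexicons, chosen only for this illustration.
POSITIVE = {"brilliant", "great", "good"}
NEGATIVE = {"bad", "awful", "boring"}

EXCERPT = (
    "This film should be brilliant. It sounds like a great plot, "
    "the actors are first grade, and the supporting cast is good as well, "
    "and Stallone is attempting to deliver a good performance. "
    "However, it can't hold up."
)

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def count_score(text):
    """Frequency-based score: positive-word count minus negative-word count."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg

def last_sentence(text):
    """Return the final sentence, splitting after ., ! or ?."""
    parts = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return parts[-1]

print(count_score(EXCERPT))                  # counting favors "positive"
print(last_sentence(EXCERPT))                # the sentence that decides the polarity
print(count_score(last_sentence(EXCERPT)))   # no lexicon word occurs in it
```

Under these assumptions the counting score is positive although the review is negative, and the decisive final sentence contains no lexicon word at all, which is why position and discourse structure matter here.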
4 Classification and Extraction

"The Bucket List," which was written by Justin Zackham and directed by Rob Reiner, seems to have been created by applying algorithms to sentiment.
(David Denby movie review, The New Yorker, January 7, 2007)

A fundamental technology in many current opinion-mining and sentiment-analysis applications is classification; note that in this survey, we generally construe the term "classification" broadly, so that it encompasses regression and ranking. The reason that classification is so important is that many problems of interest can be formulated as applying classification/regression/ranking to given textual units; examples include making a decision for a particular phrase or document ("how positive is it?"), ordering a set of texts ("rank these reviews by how positive they are"), giving a single label to an entire document collection ("where on the scale between liberal and conservative do the writings of this author lie?"), and categorizing the relationship between two entities based on textual evidence ("does A approve of B's actions?"). This section is centered on approaches to these kinds of problems.

Part One covers fundamental background. Specifically, Section 4.1 provides a discussion of key concepts involved in common formulations of classification problems in sentiment analysis and opinion mining. Features that have been explored for sentiment analysis tasks are discussed in Section 4.2.

Part Two is devoted to an in-depth discussion of different types of approaches to classification, regression, and ranking problems. The beginning of Part Two should be consulted for a detailed outline, but it is appropriate here to indicate how we cover extraction, since it plays a key role in many sentiment-oriented applications and so some readers may be particularly interested in it.

First, extraction problems (e.g., retrieving opinions on various features of a laptop) are often solved by casting many sub-problems as classification problems (e.g., given a text span, determine whether it expresses any opinion at all). Therefore, rather than have a separate section devoted completely to the entirety of the extraction task, we have integrated discussion of extraction-oriented classification sub-problems into the appropriate places in our discussion of different types of approaches to classification in general (Sections 4.3–4.8). Section 4.9 covers those remaining aspects of extraction that can be thought of as distinct from classification.

Second, extraction is often a means to the further goal of providing effective summaries of the extracted information to users. Details on how to combine information mined from multiple subjective text segments into a suitable summary can be found in Section 5.

Part One: Fundamentals

4.1 Problem Formulations and Key Concepts

Motivated by different real-world applications, researchers have considered a wide range of problems over a variety of different types of corpora. We now examine the key concepts involved in these problems. This discussion also serves as a loose grouping of the major problems, where each group consists of problems that are suitable for similar treatment as learning tasks.

4.1.1 Sentiment Polarity and Degrees of Positivity

One set of problems share the following general character: given an opinionated piece of text, wherein it is assumed that the overall opinion in it is about one single issue or item, classify the opinion as falling under one of two opposing sentiment polarities, or locate its position on the continuum between these two polarities. A large portion of work in sentiment-related classification/regression/ranking falls within this category. Eguchi and Lavrenko [84] point out that the polarity or positivity labels so assigned may be used simply for summarizing the content of opinionated text units on a topic, whether they be positive or negative, or for only retrieving items of a given sentiment orientation (say, positive).

The binary classification task of labeling an opinionated document as expressing either an overall positive or an overall negative opinion is called sentiment polarity classification or polarity classification. Although this binary decision task has also been termed sentiment classification in the literature, as mentioned above, in this survey we will use "sentiment classification" to refer broadly to binary categorization, multi-class categorization, regression, and/or ranking.

Much work on sentiment polarity classification has been conducted in the context of reviews (e.g., "thumbs up" or "thumbs down" for movie reviews). While in this context "positive" and "negative" opinions are often evaluative (e.g., "like" vs. "dislike"), there are other problems where the interpretation of "positive" and "negative" is subtly different. One example is determining whether a political speech is in support of or opposition to the issue under debate [27, 294]; a related task is classifying predictive opinions in election forums into "likely to win" and "unlikely to win" [160]. Since these problems are all concerned with two opposing subjective classes, as machine learning tasks they are often amenable to similar techniques. Note that a number of other aspects of politically oriented text, such as whether liberal or conservative views are expressed, have been explored; since the labels used in those problems can usually be considered properties of a set of documents representing authors' attitudes over multiple issues rather than positive or negative sentiment with respect to a single issue, we discuss them under a different heading further below ("viewpoints and perspectives," Section 4.1.4).

The input to a sentiment classifier is not necessarily always strictly opinionated. Classifying a news article into good or bad news has been considered a sentiment classification task in the literature [168]. But a piece of news can be good or bad news without being subjective (i.e., without being expressive of the private states of the author): for instance, "the stock price rose" is objective information that is generally considered to be good news in appropriate contexts. It is not our main intent to provide a clean-cut definition for what should be considered "sentiment polarity classification" problems, but it is perhaps useful to point out that (a) in determining the sentiment polarity of opinionated texts where the authors do explicitly express their sentiment through statements like "this laptop is great," (arguably) objective information such as "long battery life" is often used to help determine the overall sentiment; (b) the task of determining whether a piece of objective information is good or bad is still not quite the same as classifying it into one of several topic-based classes, and hence inherits the challenges involved in sentiment analysis; and (c) as we will discuss in more detail later, the distinction between subjective and objective information can be subtle. Is "long battery life" objective? Also consider the difference between "the battery lasts 2 hours" vs. "the battery only lasts 2 hours."

While it is of utter importance that the problem itself should be well-defined, it is of less, if any, importance to decide which tasks should be labeled as "polarity classification."

Whether this should be considered as an objective statement may be up for debate: one can imagine another reviewer retorting, "you call that battery life?"

Related categories. An alternative way of summarizing reviews is to extract information on why the reviewers liked or disliked the product. Kim and Hovy [158] note that such "pro and con" expressions can differ from positive and negative opinion expressions, although the two concepts, opinion ("I think this laptop is terrific") and reason for opinion ("This laptop only costs $399"), are for the purposes of analyzing evaluative text strongly related. In addition to potentially forming the basis for the production of more informative sentiment-oriented summaries, identifying pro and con reasons can potentially be used to help decide the helpfulness of individual reviews: evaluative judgments that are supported by reasons are likely to be more trustworthy.

Another type of categorization related to degrees of positivity is considered by Niu et al. [226], who seek to determine the polarity of outcomes (improvement vs. death, say) described in medical texts.

Additional problems related to the determination of degree of positivity surround the analysis of comparative sentences [139]. The main idea is that sentences such as "The new model is more expensive than the old one" or "I prefer the new model to the old model" are important sources of information regarding the author's evaluations.

Rating inference (ordinal regression). The more general problem of rating inference, where one must determine the author's evaluation with respect to a multi-point scale (e.g., one to five "stars" for a review) can be viewed simply as a multi-class text categorization problem. Predicting degree of positivity provides more fine-grained rating information; at the same time, it is an interesting learning problem in itself.

But in contrast to many topic-based multi-class classification problems, sentiment-related multi-class classification can also be naturally formulated as a regression problem because ratings are ordinal. It can be argued to constitute a special type of (ordinal) regression problem because the semantics of each class may not simply directly correspond to a point on a scale. More specifically, each class may have its own distinct vocabulary. For instance, if we are classifying an author's evaluation into one of the positive, neutral, and negative classes, an overall neutral opinion could be a mixture of positive and negative language, or it could be identified with signature words such as "mediocre." This presents us with interesting opportunities to explore the relationships between classes.

Note the difference between rating inference and predicting strength of opinion (discussed in Section 4.1.2); for instance, it is possible to feel quite strongly (high on the "strength" scale) that something is mediocre (middling on the "evaluation" scale).

Also, note that the label "neutral" is sometimes used as a label for the objective class ("lack of opinion") in the literature. In this survey, we use neutral only in the aforementioned sense of a sentiment that lies between positive and negative.
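Because ratings are ordinal, a rating-inference system can be judged either as a multi-class classifier or as a regressor, and the two views score the same mistakes differently. The sketch below is a hedged illustration only: the star ratings and the two prediction lists are invented for this example, and the two error functions are generic measures rather than the evaluation protocol of any particular paper cited here.

```python
def zero_one_error(true, pred):
    """Multi-class view: every wrong label costs the same."""
    return sum(t != p for t, p in zip(true, pred)) / len(true)

def mean_absolute_error(true, pred):
    """Ordinal view: a miss costs more the farther it lies on the scale."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)

# Invented one-to-five star ratings for four reviews.
true_stars = [1, 3, 5, 4]
near_miss = [2, 3, 4, 4]   # wrong twice, each time by one star
far_miss = [5, 3, 1, 4]    # wrong twice, each time by four stars

for name, pred in [("near_miss", near_miss), ("far_miss", far_miss)]:
    print(name,
          zero_one_error(true_stars, pred),
          mean_absolute_error(true_stars, pred))
```

The classification view cannot tell the two predictors apart, while the distance-based view separates them. Neither measure captures the further point made above, that a class such as "neutral" may have its own vocabulary rather than simply sitting at a point on the scale.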
Interestingly, Cabral and Hortaçsu [47] observe that neutral comments in feedback systems are not necessarily perceived by users as lying at the exact mid-point between positive and negative comments; rather, "the information contained in a neutral rating is perceived by users to be much closer to negative feedback than positive." On the other hand, they also note that in their data, "sellers were less likely to retaliate against neutral comments, as opposed to negatives: ... a buyer leaving a negative comment has a 40% chance of being hit back, while a buyer leaving a neutral comment only has a 10% chance of being retaliated upon by the seller."

Agreement. The opposing nature of polarity classes also gives rise to exploration of agreement detection, e.g., given a pair of texts, deciding whether they should receive the same or differing sentiment-related labels based on the relationship between the elements of the pair. This is often not defined as a standalone problem but considered as a sub-task whose result is used to improve the labeling of the opinions held by the entities involved [272, 294]. A different type of agreement task has also been considered in the context of perspectives, where, for example, a label of "conservative" tends to indicate agreement with particular positions on a wide variety of issues.

4.1.2 Subjectivity Detection and Opinion Identification

Work in polarity classification often assumes the incoming documents to be opinionated. For many applications, though, we may need to decide whether a given document contains subjective information or not, or identify which portions of the document are subjective. Indeed, this problem was the focus of the 2006 Blog track at TREC [227]. At least one opinion-tracking system rates subjectivity and sentiment separately [108]. Mihalcea et al. [209] summarize the evidence of several projects on subsentential analysis [12, 90, 289, 319] as follows: "the problem of distinguishing subjective versus objective instances has often proved to be more difficult than subsequent polarity classification, so improvements in subjectivity classification promise to positively impact sentiment classification."
Early work by Hatzivassiloglou and Wiebe [120] examined the effects of adjective orientation and gradability on sentence subjectivity. The goal was to tell whether a given sentence is subjective or not judging from the adjectives appearing in that sentence. A number of projects address sentence-level or sub-sentence-level subjectivity detection in different domains [33, 156, 232, 255, 308, 315, 319, 326]. Wiebe et al. [316] present a comprehensive survey of subjectivity recognition using different clues and features.

Wilson et al. [320] address the problem of determining clause-level opinion strength (e.g., "how mad are you?"). Note that the problem of determining opinion strength is different from rating inference. Classifying a piece of text as expressing a neutral opinion (giving it a mid-point score) for rating inference does not equal classifying that piece of text as objective (lack of opinion): one can have a strong opinion that something is "mediocre" or "so-so."

Recent work also considers relations between word sense disambiguation and subjectivity [307].

Subjectivity detection or ranking at the document level can be thought of as having its roots in studies in genre classification (see Section 4.1.5 for more detail). For instance, Yu and Hatzivassiloglou [326] achieve high accuracy (97%) with a Naive Bayes classifier on a particular corpus consisting of Wall Street Journal articles, where the task is to distinguish articles under News and Business (facts) from articles under Editorial and Letter to the Editor (opinions). (This task was suggested earlier by Wiebe et al. [315], and a similar corpus was explored in previous work [308, 316].) Work in this direction is not limited to the binary distinction between subjective and objective labels. Recent work includes the research by participants in the 2006 TREC Blog track [227] and others [69, 97, 222, 223, 234, 279, 316, 326].

4.1.3 Joint Topic–Sentiment Analysis

One simplifying assumption sometimes made by work on document-level sentiment classification is that each document under consideration is focused on the subject matter we are interested in. This is in part because one can often assume that the document set was created
by first collecting only on-topic documents (e.g., by first running a topic-based query through a standard search engine). However, it is possible that there are interactions between topic and opinion that make it desirable to consider the two simultaneously; for example, Riloff et al. [256] find that "topic-based text filtering and subjectivity filtering are complementary" in the context of experiments in information extraction.

Also, even a relevant opinion-bearing document may contain off-topic passages that the user may not be interested in, and so one may wish to discard such passages.

Another interesting case is when a document contains material on multiple subjects that may be of interest to the user. In such a setting, it is useful to identify the topics and separate the opinions associated with each of them. Two examples of the types of documents for which this kind of analysis is appropriate are (1) comparative studies of related products, and (2) texts that discuss various features, aspects, or attributes. (When the context is clear, we often use the term "feature" to refer to "feature, aspect, or attribute" in this survey.)

4.1.4 Viewpoints and Perspectives

Much work on analyzing sentiment and opinions in politically oriented text focuses on general attitudes expressed through texts that are not necessarily targeted at a particular issue or narrow subject. For instance, Grefenstette et al. [112] experimented with determining the political orientation of websites essentially by classifying the concatenation of all the documents found on that site. We group this type of work under the heading of "viewpoints and perspectives," and include under this rubric work on classifying texts as liberal, conservative, libertarian, etc. [219], placing texts along an ideological scale [178, 202], or representing Israeli versus Palestinian viewpoints [186, 187].

Although binary or n-ary classification may be used, here, the classes typically correspond not to opinions on a single, narrowly defined topic, but to a collection of bundled attitudes and beliefs. This could potentially enable different approaches from polarity
classification. On the other hand, if we treat the set of documents as a meta-document, and the different issues being discussed as meta-features, then this problem still shares some common ground with polarity classification or its multi-class, regression, and ranking variants. Indeed, some of the approaches explored in the literature for these two problems individually could very well be adapted to work for either one of them.

The other point of departure from the polarity classification problem is that the labels being considered are more about attitudes that do not naturally correspond with degree of positivity. While assigning simple labels remains a classification problem, if we move farther and aim at serving more expressive and open-ended opinions to the user, we need to solve extraction problems. For instance, one may be interested in obtaining descriptions of opinions of a greater complexity than simple labels drawn from a very small set, i.e., one might be seeking something more like "achieving world peace is difficult" than like "mildly positive." In fact, much of the prior work on perspectives and viewpoints seeks to extract more perspective-related information (e.g., opinion holders). The motivation was to enable multi-perspective question answering, where the user could ask questions such as "what is Miss America's perspective on world peace?" rather than a fact-based question (e.g., "who is the new Miss America?"). Naturally, such work is often framed in the context of extraction problems, the particular characteristics of which are covered in Section 4.9.

4.1.5 Other Non-Factual Information in Text

Researchers have considered various affect types, such as the six "universal" emotions [86]: anger, disgust, fear, happiness, sadness, and surprise [9, 192, 285]. An interesting application is in human–computer interaction: if a system determines that a user is upset or annoyed, for instance, it could switch to a different mode of interaction [188].

Other related areas of research include computational approaches for humor recognition and generation [210]. Many interesting affectual aspects of text like "happiness" or "mood" are also being explored in the context of informal text resources such as weblogs [224]. Potential
applications include monitoring levels of hateful or violent rhetoric, perhaps in multilingual settings [1].

In addition to classification based on affect and emotion, another related area of research that addresses non-topic-based categorization is that of determining the genre of texts [97, 98, 150, 153, 182, 277]. Since subjective genres, such as "editorial," are often one of the possible categories, such work can be viewed as closely related to subjectivity detection. Indeed, this relation has been observed in work focused on learning subjective language [316].

There has also been research that concentrates on classifying documents according to their source or source style, with statistically detected stylistic variation [38] serving as an important cue. Authorship identification is perhaps the most salient example; Mosteller and Wallace's [216] classic Bayesian study of the authorship of the Federalist Papers is one well-known instance. Argamon-Engelson et al. [18] consider the related problem of identifying not the particular author of a text, but its publisher (e.g., the New York Times vs. The Daily News); the work of Kessler et al. [153] on determining a document's "brow" (e.g., high-brow vs. "popular," or low-brow) has similar goals. Several recent workshops have been dedicated to style analysis in text [15, 16, 17]. Determining stylistic characteristics can be useful in faceted search [10].

Another problem that has been considered in intelligence and security settings is the detection of deceptive language [46, 117, 329].

4.2 Features

Converting a piece of text into a feature vector or other representation that makes its most salient and important features available is an important part of data-driven approaches to text processing. There is an extensive body of work that addresses feature selection for machine learning approaches in general, as well as for learning approaches tailored to the specific problems of classic text categorization and information extraction [101, 263]. A comprehensive discussion of such work is beyond the scope of this survey. In this section, we focus on findings in feature engineering that are specific to sentiment analysis.
4.2.1 Term Presence vs. Frequency

It is traditional in information retrieval to represent a piece of text as a feature vector wherein the entries correspond to individual terms.

One influential finding in the sentiment-analysis area is as follows. Term frequencies have traditionally been important in standard IR, as the popularity of tf-idf weighting shows; but in contrast, Pang et al. [235] obtained better performance using presence rather than frequency. That is, binary-valued feature vectors in which the entries merely indicate whether a term occurs (value 1) or not (value 0) formed a more effective basis for review polarity classification than did real-valued feature vectors in which entry values increase with the occurrence frequency of the corresponding term. This finding may be indicative of an interesting difference between typical topic-based text categorization and polarity classification: While a topic is more likely to be emphasized by frequent occurrences of certain keywords, overall sentiment may not usually be highlighted through repeated use of the same terms. (We discussed this point previously in Section 3.2 on factors that make opinion mining difficult.)

On a related note, hapax legomena, or words that appear a single time in a given corpus, have been found to be high-precision indicators of subjectivity [316]. Yang et al. [322] look at rare terms that are not listed in a pre-existing dictionary, on the premise that novel versions of words, such as "bugfested," might correlate with emphasis and hence subjectivity in blogs.

4.2.2 Term-based Features Beyond Term Unigrams

Position information finds its way into features from time to time. The position of a token within a textual unit (e.g., in the middle vs. near the end of a document) can potentially have important effects on how much that token affects the overall sentiment or subjectivity status of the enclosing textual unit. Thus, position information is sometimes encoded into the feature vectors that are employed [158, 235].

Whether higher-order n-grams are useful features appears to be a matter of some debate. For example, Pang et al. [235] report that unigrams outperform bigrams when classifying movie reviews by sentiment
polarity, but Dave et al. [69] find that in some settings, bigrams and trigrams yield better product-review polarity classification.

Riloff et al. [254] explore the use of a subsumption hierarchy to formally define different types of lexical features and the relationships between them in order to identify useful complex features for opinion analysis. Airoldi et al. [5] apply a Markov Blanket Classifier to this problem together with a meta-heuristic search strategy called Tabu search to arrive at a dependency structure encoding a parsimonious vocabulary for the positive and negative polarity classes.

The "contrastive distance" between terms (an example of a high-contrast pair of words in terms of the implicit evaluation polarity they express is "delicious" and "dirty") was used as an automatically computed feature by Snyder and Barzilay [272] as part of a rating-inference system.

4.2.3 Parts of Speech

Part-of-speech (POS) information is commonly exploited in sentiment analysis and opinion mining. One simple reason holds for general textual analysis, not just opinion mining: part-of-speech tagging can be considered to be a crude form of word sense disambiguation [318].

Adjectives have been employed as features by a number of researchers [217, 303]. One of the earliest proposals for the data-driven prediction of the semantic orientation of words was developed for adjectives [119]. Subsequent work on subjectivity detection revealed a high correlation between the presence of adjectives and sentence subjectivity [120]. This finding has often been taken as evidence that (certain) adjectives are good indicators of sentiment, and sometimes has been used to guide feature selection for sentiment classification, in that a number of approaches focus on the presence or polarity of adjectives when trying to decide the subjectivity or polarity status of textual units, especially in the unsupervised setting. Rather than focusing on isolated adjectives, Turney [298] proposed to detect document sentiment based on selected phrases, where the phrases are chosen via a number of pre-specified part-of-speech patterns, most including an adjective or an adverb.
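The term-level representations discussed in Sections 4.2.1 and 4.2.2 (frequency vectors, presence vectors, and higher-order n-grams) can be sketched in a few lines. This is a minimal illustration, not the feature pipeline of any cited study: the whitespace tokenizer, the toy review, the four-word vocabulary, and all function names are assumptions made for the example.

```python
from collections import Counter

def tokenize(text):
    """Naive whitespace tokenizer (an assumption made for this sketch)."""
    return text.lower().split()

def ngrams(tokens, n):
    """All contiguous n-grams, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def frequency_features(tokens, vocab):
    """Entry values grow with how often each vocabulary term occurs."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]

def presence_features(tokens, vocab):
    """Binary entries: 1 if the term occurs at all, 0 otherwise."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocab]

review = "great great great cast but not a great film"   # toy example
vocab = ["great", "cast", "film", "boring"]              # toy vocabulary
tokens = tokenize(review)

print(frequency_features(tokens, vocab))
print(presence_features(tokens, vocab))
print(ngrams(tokens, 2))
```

In the presence vector the repeated word carries no more weight than a word used once, which is the property the finding in Section 4.2.1 concerns. The bigram list shows the kind of local context, such as "not a", that unigram features discard.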
The fact that adjectives are good predictors of a sentence being subjective does not, however, imply that other parts of speech do not contribute to expressions of opinion or sentiment. In fact, in a study by Pang et al. [235] on movie-review polarity classification, using only adjectives as features was found to perform much worse than using the same number of most frequent unigrams. The researchers point out that nouns (e.g., "gem") and verbs (e.g., "love") can be strong indicators for sentiment. Riloff et al. [257] specifically studied the extraction of subjective nouns (e.g., "concern," "hope") via bootstrapping. There have been several targeted comparisons of the effectiveness of adjectives, verbs, and adverbs, where further subcategorization often plays a role [34, 221, 316].

4.2.4 Syntax

There have also been attempts at incorporating syntactic relations within feature sets. Such deeper linguistic analysis seems particularly relevant with short pieces of text. For instance, Kudo and Matsumoto [173] report that for two sentence-level classification tasks, sentiment polarity classification and modality identification ("opinion," "assertion," or "description"), a subtree-based boosting algorithm using dependency-tree-based features outperformed the bag-of-words baseline (although there were no significant differences with respect to using n-gram-based features). Nonetheless, the use of higher-order n-grams and dependency or constituent-based features has also been considered for document-level classification; Dave et al. [69] on the one hand and Gamon [103], Matsumoto et al. [204], and Ng et al. [222] on the other hand come to opposite conclusions regarding the effectiveness of dependency information. Parsing the text can also serve as a basis for modeling valence shifters such as negation, intensifiers, and diminishers [152]. Collocations and more complex syntactic patterns have also been found to be useful for subjectivity detection [255, 316].

4.2.5 Negation

Handling negation can be an important concern in opinion- and sentiment-related analysis. While the bag-of-words representations
of "I like this book" and "I don't like this book" are considered to be very similar by most commonly-used similarity measures, the only differing token, the negation term, forces the two sentences into opposite classes. There does not really exist a parallel situation in classic IR where a single negation term can play such an instrumental role in classification (except in cases like "this document is about cars" vs. "this document is not about cars").

It is possible to deal with negations indirectly as a second-order feature of a text segment, that is, where an initial representation, such as a feature vector, essentially ignores negation, but that representation is then converted into a different representation that is negation-aware. Alternatively, as was done in previous work, negation can be encoded directly into the definitions of the initial features. For example, Das and Chen [66] propose attaching "NOT" to words occurring close to negation terms such as "no" or "don't," so that in the sentence "I don't like deadlines," the token "like" is converted into the new token "like-NOT."

However, not all appearances of explicit negation terms reverse the polarity of the enclosing sentence. For instance, it is incorrect to attach "NOT" to "best" in "No wonder this is considered one of the best." Na et al. [220] attempt to model negation more accurately. They look for specific part-of-speech tag patterns (where these patterns differ for different negation words), and tag the complete phrase as a negation phrase. For their dataset of electronics reviews, they observe about 3% improvement in accuracy resulting from their modeling of negations. Further improvement probably needs deeper (syntactic) analysis of the sentence [152].

Another difficulty with modeling negation is that negation can often be expressed in rather subtle ways. Sarcasm and irony can be quite difficult to detect, but even in the absence of such sophisticated rhetorical devices, we still see examples such as "[it] avoids all clichés and predictability found in Hollywood movies" (internet review by "Margie24"), where the word "avoid" is an arguably unexpected "polarity reverser." Wilson et al. [319] discuss other complex negation effects.
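Encoding negation directly into the initial features, in the spirit of the "like-NOT" example above, might be sketched as follows. This is not a reconstruction of Das and Chen's procedure: the list of negation terms, the fixed-size `window` that stands in for "close to", and the function name `tag_negation` are all assumptions introduced for illustration.

```python
import re

# Illustrative negation terms; a real system would need a fuller list.
NEGATION_TERMS = {"no", "not", "don't", "never"}

def tag_negation(text, window=2):
    """Append "-NOT" to the `window` tokens that follow a negation term.

    `window` is a hypothetical stand-in for "words occurring close to"
    a negation term; the source does not specify a scope.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    tagged = []
    remaining = 0  # upcoming tokens still inside a negation's scope
    for tok in tokens:
        if tok in NEGATION_TERMS:
            tagged.append(tok)
            remaining = window
        elif remaining > 0:
            tagged.append(tok + "-NOT")
            remaining -= 1
        else:
            tagged.append(tok)
    return tagged

print(tag_negation("I like this book"))
print(tag_negation("I don't like deadlines"))
# With a wide scope, the scheme wrongly negates "best" in the counterexample.
print(tag_negation("No wonder this is considered one of the best", window=8))
```

After tagging, "like" and "like-NOT" are distinct features, so a bag-of-words model can separate the two book sentences. The last call shows the failure mode noted above: a purely positional scope has no way to know that "No wonder" does not reverse polarity.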
4.2.6 Topic-Oriented Features

Interactions between topic and sentiment play an important role in opinion mining. For example, in a hypothetical article on Wal-mart, the sentences "Wal-mart reports that profits rose" and "Target reports that profits rose" could indicate completely different types of news (good vs. bad) regarding the subject of the document, Wal-mart [116]. To some extent, topic information can be incorporated into features.

Mullen and Collier [217] examine the effectiveness of various features based on topic (e.g., they take into account whether a phrase follows a reference to the topic under discussion) under the experimental condition that topic references are manually tagged. Thus, for example, in a review of a particular work of art or music, references to the item receive a "THIS WORK" tag.

For the analysis of predictive opinions (e.g., whether a message predicts a given party to win), Kim and Hovy [160] propose to employ feature generalization. Specifically, for each sentence in the message, each party name and candidate name is replaced by PARTY (i.e., the party in question) or OTHER (any other party). Patterns such as "PARTY will win," "go PARTY again," and "OTHER will win" are then extracted as n-gram features. This scheme outperforms using simple n-gram features by about 10% in accuracy when classifying which party a given message predicts to win.

Topic–sentiment interaction has also been modeled through parse tree features, especially in opinion extraction tasks. Relationships between candidate opinion phrases and the given subject in a dependency tree can be useful in such settings [244].

Part Two: Approaches

The approaches we will now discuss all share the common theme of mapping a given piece of text, such as a document, paragraph, or sentence, to a label drawn from a pre-specified finite set or to a real number. (However, unlike classification and regression, ranking does not require such a mapping for each individual document.) As discussed in Section 4.1, opinion-oriented classification can range from sentiment-polarity categorization in reviews to determining
the strength of opinions in news articles to identifying perspectives in political debates to analyzing mood in blogs. Part of what is particularly interesting about these problems is the new challenges and opportunities that they present to us. In the remainder of this section, we examine different solutions proposed in the literature to these problems, loosely organized around different aspects of machine learning approaches. Although these aspects may seem to be general themes underlying most machine learning problems, we attempt to highlight what is unique for sentiment analysis and opinion mining tasks. For instance, some unsupervised learning approaches follow a sentiment-specific paradigm for how labels for words and phrases are obtained. Also, supervised and semi-supervised learning approaches for opinion mining and sentiment analysis differ from standard approaches to classification tasks in part due to the different features involved; but we also see a great variety of attempts at modeling various kinds of relationships between items, classes, or sub-document units. Some of these relationships are unique to our tasks; some become more imperative to model due to the subtleties of the problems we address.

The rest of this section is organized as follows. Section 4.3 covers the impact that the increased availability of labeled data has had, including the rise of supervised learning. Section 4.4 considers issues surrounding topic and domain dependencies. Section 4.5 describes unsupervised approaches. We next consider incorporating relationships between various types of entities (Section 4.6). This is followed by a section on incorporating discourse structure (4.7). Section 4.8 is concerned with the use of language models. Finally, Section 4.9 investigates certain issues in extraction that are somewhat particular to it, and thus are not otherwise discussed in the sections that precede it. One such issue is the identification of features and expressions of opinions in reviews. Another set of issues arises when opinion-holder identification needs to be applied.

4.3 The Impact of Labeled Data

Work up to the early 1990s on sentiment-related tasks, such as determination of point of view and other types of complex recognition
problems, generally assumed the existence of sub-systems for sometimes rather sophisticated NLP tasks, ranging from parsing to the resolution of pragmatic ambiguities [121, 262, 310, 311, 313]. Given the state of the art of NLP at the time and, just as importantly, the lack of sufficient amounts of appropriate labeled data, the research described in these early papers necessarily considered only proposals for systems or prototype systems without large-scale empirical evaluation; typically, no learning component was involved (an interesting exception is Wiebe and Bruce [306], who proposed but did not evaluate the use of decomposable graphical models). Operational systems were focused on simpler classification tasks, relatively speaking (e.g., categorization according to affect), and relied instead on relatively shallow analysis based on manually constructed discriminant-word lexicons [133, 296], since with such a lexicon in hand, one can classify a text unit by considering which indicator terms or phrases from the lexicon appear in the given text.

The rise of the widespread availability to researchers of organized collections of opinionated documents (two examples: financial-news discussion boards and review aggregation sites such as Epinions) and of other corpora of more general texts (e.g., newswire) and of other resources (e.g., WordNet) was a major contributor to a large shift in direction toward data-driven approaches. To begin with, the availability of the raw texts themselves made it possible to learn opinion-relevant lexicons in an unsupervised fashion, as is discussed in more detail in Section 4.5.1, rather than create them manually. But the increase in the amount of labeled sentiment-relevant data, in particular (where the labels are derived either through explicit researcher-initiated manual annotation efforts or by other means; see Section 7.1.1), was a major contributing factor to activity in both supervised and unsupervised learning. In the unsupervised case, described in Section 4.5, it facilitated research by making it possible to evaluate proposed algorithms in a large-scale fashion. Unsupervised (and supervised) learning also benefitted from the improvements to sub-component systems for tagging, parsing, and so on that occurred due to the application of data-driven techniques in those areas. And, of course, the importance to supervised learning of having access to labeled data is paramount.

One very active line of work can be roughly glossed as the application of standard text-categorization algorithms, surveyed by Sebastiani [263], to opinion-oriented classification problems. For example, Pang et al. [235] compare Naive Bayes, Support Vector Machines, and maximum-entropy-based classification on the sentiment-polarity classification problem for movie reviews. More extensive comparisons of the performance of standard machine learning techniques with other types of features or feature selection schemes have been engaged in later work [5, 69, 103, 204, 217]; see Section 4.2 for more detail. We note that there has been some research that explicitly considers regression or ordinal-regression formulations of opinion-mining problems [109, 201, 233, 320]: example questions include, "how positive is this text?" and "how strongly held is this opinion?"

Another role that labeled data can play is in lexicon induction, although, as detailed in Section 4.5.1, the use of the unsupervised paradigm is more common. Morinaga et al. [215] and Bethard et al. [37] create an opinion-indicator lexicon by looking for terms that tend to be associated more highly with subjective-genre newswire, such as editorials, than with objective-genre newswire. Das and Chen [66, 67] start with a manually created lexicon specific to the finance domain (example terms: "bull," "bear"), but then assign discrimination weights to the items in the lexicon based on their cooccurrence with positively labeled vs. negatively labeled documents.

Other topics related to supervised learning are discussed in some of the more specific sections that follow.

4.4 Domain Adaptation and Topic-Sentiment Interaction

4.4.1 Domain Considerations

The accuracy of sentiment classification can be influenced by the domain of the items to which it is applied [21, 40, 88, 249, 298]. One reason is that the same phrase can indicate different sentiment in different domains: recall the Bob Bland example mentioned earlier, where "go read the book" most likely indicates positive sentiment for book reviews, but negative sentiment for movie reviews; or consider Turney's [298] observation that "unpredictable" is a positive description for a movie plot but a negative description for a car's steering abilities. Difference in vocabularies across different domains also adds to the difficulty when applying classifiers trained on labeled data in one domain to test data in another.

Several studies show concrete performance differences from domain to domain. In an experiment auxiliary to their main work, Dave et al. [69] apply a classifier trained on a pre-assembled dataset of reviews of a certain type to product reviews of a different type. But they do not investigate the effect of training-test mismatch in detail. Engström [88] studies how the accuracy of sentiment classification can be influenced by topic. Read [249] finds standard machine learning techniques for opinion analysis to be both domain-dependent (with domains ranging from movie reviews to newswire articles) and temporally dependent (based on datasets spanning different ranges of time periods but written at least one year apart). Owsley et al. [229] also show the importance of building a domain-specific classifier.

Aue and Gamon [21] explore different approaches to customizing a sentiment classification system to a new target domain in the absence of large amounts of labeled data. The different types of data they consider range from lengthy movie reviews to short, phrase-level user feedback from web surveys. Due to significant differences in these domains along several dimensions, simply applying the classifier learned on data from one domain barely outperforms the baseline for another domain. In fact, with 100 or 200 labeled items in the target domain, an EM algorithm that utilizes in-domain unlabeled data and ignores out-of-domain data altogether outperforms the method based exclusively on (both in- and out-of-domain) labeled data.

Yang et al. [321] take the following simple approach to domain transfer: they find features that are good subjectivity indicators in both of two different domains (in their case, movie reviews versus product reviews), and consider these features to be good domain-independent features.

Blitzer et al. [40] explicitly address the domain transfer problem for sentiment polarity classification by extending the structural correspondence learning (SCL) algorithm [11], achieving an average of 46% improvement over a supervised baseline for sentiment polarity classification of 5 different types of product reviews mined from Amazon.com. The success of SCL depends on the choice of pivot features in both domains, based on which the algorithm learns a projection matrix that maps features in the target domain into the feature space of the source domain. Unlike previous work that applied SCL for tagging, where frequent words in both domains happened to be good predictors for the target labels (part-of-speech tags), and were therefore good candidates for pivots, here the pivots are chosen from those with highest mutual information with the source label. The projection is able to capture correspondences (in terms of expressed sentiment polarity) between "predictable" for book reviews and "poorly designed" for kitchen appliance reviews. Furthermore, they also show that a measure of domain similarity can correlate well with the ease of adaptation from one domain to another, thereby enabling better scheduling of annotation efforts.

Cross-lingual adaptation. Much of the literature on sentiment analysis has focused on text written in English. As a result, most of the resources developed, such as lexica with sentiment labels, are in English. Adapting such resources to other languages is related to domain adaptation: the former aims at adapting from the source language to the target language in order to utilize existing resources in the source language; whereas the latter seeks to adapt from one domain to another in order to utilize the labeled data available in the source domain. Not surprisingly, we observe parallel techniques: instead of projecting unseen tokens from the new domain into the old one via co-occurrence information in the corpus [40], expressions in the new language can be aligned with expressions in the language with existing resources. For instance, one can determine cross-lingual projections through bilingual dictionaries [209], or parallel corpora [159, 209]. Alternatively, one can simply apply machine translation as a sentiment-analysis pre-processing step [32].

4.4.2 Topic (and sub-topic or feature) Considerations

Even when one is handling documents in the same domain, there is still an important and related source of variation: document topic. It is
true that sometimes the topic is pre-determined, such as in the case of free-form responses to survey questions. However, in many sentiment analysis applications, topic is another important consideration; for instance, one may be searching the blogosphere just for opinionated comments about Cornell University.

One approach to integrating sentiment and topic when one is looking for opinionated documents on a particular user-specified topic is to simply first perform one analysis pass, say for topic, and then analyze the results with respect to sentiment [134]. (See Sebastiani [263] for a survey of machine learning approaches to topic-based text categorization.) Such a two-pass approach was taken by a number of systems at the TREC Blog track in 2006, according to Ounis et al. [227], and others [234]. Alternatively, one may jointly model topic and sentiment simultaneously [84, 206], or treat one as a prior for the other [85].

But even in the case where one is working with documents known to be on-topic, not all the sentences within these documents need to be on-topic. Hurst and Nigam [134, 225] propose a two-pass process similar to that mentioned above, where each sentence in the document is first labeled as on-topic or off-topic, and sentiment analysis is conducted only for those that are found to be on-topic. Their work relies on a collocation assumption that if a sentence is found to be topical and to exhibit a sentiment polarity, then the polarity is expressed with respect to the topic in question. This assumption is also used by Nasukawa and Yi [221] and Gamon [103].

A related issue is that it is also possible for a document to contain multiple topics. For instance, a review can be a comparison of two products. Or, even when a single item is discussed in a document, one can consider features or aspects of the product to represent multiple (sub-)topics. If all but the main topic can be disregarded, then one possibility is as follows: simply consider the overall sentiment detected within the document (regardless of the fact that it may be formed from a mixture of opinions on different topics) to be associated with the primary topic, leaving the sentiment toward other topics undetermined (indeed, these other topics may never be identified). But it is more common to try to identify the topics and then determine the opinions regarding each of these topics separately. In some work, the important topics are pre-defined, making this task easier [323]. In other work in extraction, this is not the case; the problem of the identification of product features is addressed in Section 4.9, and Section 4.6.3 discusses techniques that incorporate relationships between different features.

4.5 Unsupervised Approaches

4.5.1 Unsupervised Lexicon Induction

Quite a number of unsupervised learning approaches take the tack of first creating a sentiment lexicon in an unsupervised manner, and then determining the degree of positivity (or subjectivity) of a text unit via some function based on the positive and negative (or simply subjective) indicators, as determined by the lexicon, within it. Early examples of such an approach include Hatzivassiloglou and Wiebe [120], Turney [298], and Yu and Hatzivassiloglou [326]. Some interesting variants of this general technique are to use the polarity of the previous sentence as a tie-breaker when the scoring function does not indicate a definitive classification of a given sentence [130], or to incorporate information drawn from some labeled data as well [33].

A crucial component to applying this type of technique is, of course, the creation of the lexicon via the unsupervised labeling of words or phrases with their sentiment polarity (also referred to as semantic orientation in the literature) or subjectivity status [12, 45, 89, 90, 91, 92, 119, 130, 143, 146, 257, 286, 288, 289, 290, 299, 303, 304]. In early work, Hatzivassiloglou and McKeown [119] present an approach based on linguistic heuristics. Their technique is built on the fact that in the case of polarity classification, the two classes of interest represent opposites, and we can utilize "opposition constraints" to help make labeling decisions. Specifically, constraints between pairs of adjectives are induced from a large corpus by looking at whether the two words are linked by conjunctions such as "but" (evidence for opposing orientations: "elegant but over-priced") or "and" (evidence for the same orientation: "clever and informative"). The task is then cast as a clustering or binary-partitioning problem where the inferred constraints are to be obeyed.
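
The binary-partitioning view of conjunction constraints described above can be illustrated with a small sketch. This is not the procedure of Hatzivassiloglou and McKeown [119]; it is a deliberately simplified stand-in that assumes the constraints are already extracted and mutually consistent, and it propagates "same" and "opposite" relations through a graph by breadth-first search. The function name partition_by_constraints, the relation strings, and the toy constraint list (including the word "dull") are all hypothetical.

```python
from collections import defaultdict, deque


def partition_by_constraints(pairs):
    """Split words into two groups (0 and 1) that obey pairwise constraints.

    pairs: iterable of (word_a, word_b, relation), where relation is
    "same" (e.g., linked by "and") or "opposite" (e.g., linked by "but").
    Assumption: constraints are consistent; on a conflict, the first
    assignment reached by the search is kept.
    """
    graph = defaultdict(list)
    for a, b, relation in pairs:
        flip = 1 if relation == "opposite" else 0
        graph[a].append((b, flip))
        graph[b].append((a, flip))

    side = {}
    for start in sorted(graph):  # sorted only to make the result deterministic
        if start in side:
            continue
        side[start] = 0  # group ids are arbitrary within each connected component
        queue = deque([start])
        while queue:
            word = queue.popleft()
            for neighbor, flip in graph[word]:
                if neighbor not in side:
                    side[neighbor] = side[word] ^ flip
                    queue.append(neighbor)
    return side


# Toy constraints in the spirit of the examples in the text.
constraints = [
    ("elegant", "over-priced", "opposite"),  # "elegant but over-priced"
    ("clever", "informative", "same"),       # "clever and informative"
    ("clever", "elegant", "same"),           # hypothetical extra constraint
    ("over-priced", "dull", "same"),         # hypothetical extra constraint
]
groups = partition_by_constraints(constraints)
```

The sketch only separates the words into two unnamed groups; deciding which group is the positive one is a separate step, and with noisy corpus evidence a real system would need to tolerate violated constraints rather than assume consistency.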
Once the clustering has been completed, the labels of "positive orientation" and "negative orientation" need to be assigned; rather than use external information to make this decision, Hatzivassiloglou and McKeown [119] simply give the "positive orientation" label to the class whose members have the highest average frequency. (For the purposes of the current discussion, we ignore the supervised aspects of their work.) But in other work, seed words for which the polarity is already known are assumed to be supplied, in which case labels can be determined by propagating the labels of the seed words to terms that co-occur with them in general text or in dictionary glosses, or to synonyms, words that co-occur with them in other WordNet-defined relations, or other related words (and, along the same lines, opposite labels can be given based on similar information) [12, 20, 89, 90, 130, 146, 148, 155, 288, 298, 299]. The joint use of mutual information and co-occurrence in a general corpus with a small set of seed words, a technique employed by a number of researchers, was suggested by Turney [298]; his idea was to essentially compare whether a phrase has a greater tendency to co-occur within certain context windows with the word "poor" or with the word "excellent," taking care to account for the frequencies with which "poor" and "excellent" occur, where the data on which such computations are to be made come from the results of particular types of Web search-engine queries.

Much of the work cited above focuses on identifying the prior polarity of terms or phrases, to use the terminology of Wilson et al. [319], or what we might by extension call terms' and phrases' prior subjectivity, meaning the semantic orientation that these items might be said to generally bear when taken out of context. Such prior information is meant, of course, to serve toward further identifying contextual polarity or subjectivity [242, 319].

Lexicons for generation. It is worth noting that Higashinaka et al. [122] focus on a lexicon-induction task that facilitates natural language generation. They consider the problem of learning a dictionary that maps semantic representations to verbalizations, where the data comes from reviews. Although reviews are not explicitly marked up with respect to their semantics, they do contain explicit rating and aspect indicators. For example, from such data, they learn that one way to express the concept "atmosphere rating: 5" is "nice and comfortable."

4.5.2 Other Unsupervised Approaches

Bootstrapping is another approach. The idea is to use the output of an available initial classifier to create labeled data, to which a supervised learning algorithm may be applied. Riloff and Wiebe [255] use this method in conjunction with an initial high-precision classifier to learn extraction patterns for subjective expressions. (An interesting, if simple, pattern discovered: the noun "fact," as in "The fact is ...," exhibits high correlation with subjectivity.) Kaji and Kitsuregawa [142] use a similar method to automatically construct a corpus of HTML documents with polarity labels. Similar work involving self-training is described in Wiebe and Riloff [314] and Riloff et al. [257].

Pang and Lee [234] experiment with a different type of unsupervised approach. The problem they consider is to rank search results for review-seeking queries so that documents that contain evaluative text are placed ahead of those that do not. They propose a simple "blank slate" method based on the rarity of words within the search results that are retrieved (as opposed to within a training corpus). The intuition is that words that appear frequently within the set of documents returned for a narrow topic (the search set) are more likely to describe objective information, since objective information should tend to be repeated within the search set; in contrast, it would seem that people's opinions and how they express them may differ. Counterintuitively, though, Pang and Lee find that when the vocabulary to be considered is restricted to the most frequent words in the search set (as a noise-reduction measure), the subjective documents tend to be those that contain a higher percentage of words that are less rare, perhaps due to the fact that most reviews cover the main features or aspects of the object being reviewed. (This echoes our previous observation that understanding the objective information in a document can be critical for understanding the opinions and sentiment it expresses.) The performance of this simple method is on par with that of a method based on a state-of-the-art subjectivity detection system, OpinionFinder [255, 314].

A comparison of supervised and unsupervised methods can be found in Chaovalit and Zhou [55].

4.6 Classification Based on Relationship Information

4.6.1 Relationships Between Sentences and Between Documents

One interesting characteristic of document-level sentiment analysis is the fact that a document can consist of sub-document units (paragraphs or sentences) with different, sometimes opposing labels, where the overall sentiment label for the document is a function of the set or sequence of labels at the sub-document level. As an alternative to treating a document as a bag of features, then, there have been various attempts to model the structure of a document via analysis of sub-document units, and to explicitly utilize the relationships between these units, in order to achieve a more accurate global labeling. Modeling the relationships between these sub-document units may lead to better sub-document labeling as well.

An opinionated piece of text can often consist of evaluative portions (those that contribute to the overall sentiment of the document, e.g., "this is a great movie") and non-evaluative portions (e.g., "the Powerpuff girls learned that with great power comes great responsibility"). The overlap between the vocabulary used for evaluative portions and non-evaluative portions makes it particularly important to model the context in which these text segments occur. Pang and Lee [232] propose a two-step procedure for polarity classification for movie reviews, wherein they first detect the objective portions of a document (e.g., plot descriptions) and then apply polarity classification to the remainder of the document after the removal of these presumably uninformative portions. Importantly, instead of making the subjective-objective decision for each sentence individually, they postulate that there might be a certain degree of continuity in subjectivity labels (an author usually does not switch too frequently between being subjective and being objective), and incorporate this intuition by assigning preferences for pairs of nearby sentences to receive similar labels. All the sentences in the document are then labeled as being either subjective or objective through a collective classification process, where this process employs a reformulation of the task as one of finding a minimum s-t cut in the appropriate graph [165]. Two key properties of this approach are (1) it affords the finding of an exact solution to the underlying optimization problem via an algorithm that is efficient both in theory and in practice, and (2) it makes it easy to integrate a wide variety of knowledge sources both about individual preferences that items may have for one or the other class and about the pair-wise preferences that items may have for being placed in the same class regardless of which particular class that is. Follow-up work has used alternate techniques to determine edge weights within a minimum-cut framework for various types of sentiment-related binary classification problems at the document level [3, 27, 111, 294]. (The more general rating-inference problem can also, in special cases, be solved using a minimum-cut formulation [233].) Others have considered more sophisticated graph-based techniques [109].

4.6.2 Relationships Between Discourse Participants

An interesting setting for opinion mining is when the texts to be analyzed form part of a running discussion, such as in the case of individual turns in political debates, posts to online discussion boards, and comments on blog posts. One fascinating aspect of this kind of setting is the rich information source that references between such texts represent, since such information can be exploited for better collective labeling of the set of documents. Utilizing such relationships can be particularly helpful because many documents in the settings we have described can be quite terse (or complicated), and hence difficult to classify on their own, but we can easily categorize a difficult document if we find within it indications of agreement with a clearly, say, positive text.

Based on manual examination of 100 responses in newsgroups devoted to three distinct controversial topics (abortion, gun control, and immigration), Agrawal et al. [4] observe that the relationship between two individuals in the "responded-to" network is more likely to be antagonistic: overall, 74% of the responses examined were found to be antagonistic, whereas only 7% were found to be reinforcing. By then assuming that "respond-to" links imply disagreement, they effectively classify users into opposite camps via graph partitioning, outperforming methods that depend solely on the textual information within a particular document.

Similarly, Mullen and Malouf [218] examine "quoting" behavior among users of the politics.com discussion site (a user can refer to another post by quoting part of it or by addressing the other user by name or user ID) who have been classified as either liberal or conservative. The researchers find that a significant fraction of the posts of interest to them contain quoted material, and that, in contrast to inter-blog linking patterns discussed in Adamic and Glance [2], where liberal and conservative blog sites were found to tend to link to sites of similar political orientations, and in accordance with the Agrawal et al. [4] findings cited above, politics.com posters tend to quote users at the opposite end of the political spectrum. To perform the final political-orientation classification, users are clustered so that those who tend to quote the same entities are placed in the same cluster. (Efron [83] similarly uses co-citation analysis for the same problem.)

Rather than assume that quoting always indicates agreement or disagreement regardless of the context, Thomas et al. [294] build an agreement detector for the task of analyzing transcripts of congressional floor-debates, where the classifier categorizes certain explicit references to other speakers as representing agreement (e.g., "I heartily support Mr. Smith's views!") or disagreement. They then encode evidence of a high likelihood of agreement between two speakers as a relationship constraint between the utterances made by the speakers, and collectively classify the individual speeches as to whether they support or oppose the legislation under discussion, using a minimum-cut formulation of the classification problem, as described above. Follow-up work attempts to make more refined use of disagreement information [27].

4.6.3 Relationships Between Product Features

Popescu and Etzioni [244] treat the labeling of opinion words regarding product features as a collective labeling process. They propose an iterative algorithm wherein the polarity assignments for individual words are collectively adjusted through a relaxation-labeling process. Starting from "global" word labels computed over a large text collection that reflect the sentiment orientation for each particular word in general settings, Popescu and Etzioni gradually re-define the label from one that is generic to one that is specific to a review corpus to one that is specific to a given product feature to, finally, one that is specific to the particular context in which the word occurs. They make sure to respect sentence-level local constraints that opinions connected by connectives such as "but" or "and" should receive opposite or the same polarities.

The idea of utilizing discourse information to help with the inference of relationships between product attributes can also be found in the work of Snyder and Barzilay [272], who utilize agreement information in a task where one must predict ratings for multiple aspects of the same item (e.g., food and ambiance for a restaurant). Their approach is to construct a linear classifier to predict whether all aspects of a product are given the same rating, and combine this prediction with that of individual-aspect classifiers so as to minimize a certain loss function (which they term the "grief"). Interestingly, Snyder and Barzilay [272] give an example where a collection of independent aspect-rating predictors cannot assign a correct set of aspect ratings, but augmentation with their agreement classification allows perfect rating assignment; in their specific example, the agreement classifier is able to use the presence of the phrase "but not" to predict a contrasting rating between two aspects. An important observation that Snyder and Barzilay [272] make about their formulation is that having the piece of information that all aspect ratings agree cuts down the space of possible rating tuples to a far greater degree than having the information that not all the aspect ratings are the same.

Note that the considerations discussed here relate to the topic-specific nature of opinions that we discussed in the context of domain adaptation in Section 4.4.

4.6.4 Relationships Between Classes

Regression formulations (where we include ordinal regression under this umbrella term) are quite well-suited to the rating-inference problem of predicting the degree of positivity in opinionated documents such as product reviews, and to similar problems such as determining the strength with which an opinion is held. In a sense, regression implicitly models similarity relationships between classes that correspond to points on a scale, such as the number of "stars" given by a reviewer. In contrast, standard multi-class categorization focuses on capturing the distinct features present in each class, and ignores the fact that "5 stars" is much more like "4 stars" than "2 stars." On a movie review dataset, Pang and Lee [233] observe that a one-vs-all multi-class categorization scheme can outperform regression for a three-class classification problem (positive, neutral, and negative), perhaps due to each class exhibiting a sufficiently distinct vocabulary, but for more fine-grained classification, regression emerges as the better of the two.

Furthermore, while regression-based models implicitly encode the intuition that similar items should receive similar labels, Pang and Lee [233] formulate rating inference as a metric labeling problem [164], so that a natural notion of distance between classes ("2 stars" and "3 stars" are more similar to each other than "1 star" and "4 stars" are) is captured explicitly. More specifically, an optimal labeling is computed that balances the output of a classifier that considers items in isolation with the importance of assigning similar labels to similar items.

Koppel and Schler [167] consider a similar version of this problem, but where one of the classes, corresponding to "objective," does not lie on the positive-to-negative continuum. Goldberg and Zhu [109] present a graph-based algorithm that addresses the rating inference problem in the semi-supervised learning setting, where a closed-form solution to the underlying optimization problem is found through computation on a matrix induced by a graph representing inter-document similarity relationships, and the loss function encodes the desire for similar items to receive similar labels. Mao and Lebanon [201] (Mao and Lebanon [200] is a shorter version) propose to use isotonic conditional random fields to capture the ordinal labels of local (sentence-level) sentiments. Given words that are strongly associated with positive and negative sentiment, they formulate constraints on the parameters to reflect the intuition that adding a positive (negative) word should affect the local sentiment label positively (negatively).

Wilson et al. [320] treat classification (e.g., classifying an opinion according to its strength) as an ordinal regression task.

McDonald et al. [205] leverage relationships between labels assigned at different classification stages, such as the word level or sentence level, finding that a "fine-to-coarse" categorization procedure is an effective strategy.

4.7 Incorporating Discourse Structure

Compared to the case for traditional topic-based information access tasks, discourse structure (e.g., twists and turns in documents) tends to have more effect on overall sentiment labels. For instance, Pang et al. [235] observe that some form of discourse structure modeling can help to extract the correct label in the following example:

I hate the Spice Girls. ... [3 things the author hates about them] ... Why I saw this movie is a really, really, really long story, but I did, and one would think I'd despise every minute of it. But ... Okay, I'm ashamed of it, but I enjoyed it. I mean, I admit it's a really awful movie, ... [they] act wacky as hell ... the ninth floor of hell ... a cheap [beep] movie ... The plot is such a mess that it's terrible. But I loved it.

In spite of the predominant number of negative sentences, the overall sentiment toward the movie under discussion is positive, largely due to the order in which these sentences are presented. Needless to say, such information is lost in a bag-of-words representation.

Early work attempts to partially address this problem via incorporating location information in the feature set [235]. Specifically, the position at which a token appears can be appended to the token itself to form position-tagged features, so that the same unigram appearing in, say, the first quarter and the last quarter of the document are treated as two different features; but the performance of this simple scheme does not differ greatly from that which results from using unigrams alone.

On a related note, it has been observed that position matters in the context of summarizing sentiment in a document. In particular, in contrast to topic-based text summarization, where the beginnings of articles usually serve as strong baselines in terms of summarizing the objective information in them, the last sentences of a review have
been shown to serve as a much better summary of the overall sentiment of the document than the first sentences, and to be almost as good as using the most (automatically-computed) subjective sentences, in terms of how accurately they represent the overall sentiment of the document [232].

Theories of lexical cohesion motivate the representation used by Devitt and Ahmad [73] for sentiment polarity classification of financial news.

Another way of capturing discourse structure information in documents is to model the global sentiment of a document as a trajectory of local sentiments. For example, Mao and Lebanon [200] propose using sentiment flow as a sequential model to represent an opinionated document. More specifically, each sentence in the document receives a local sentiment score from an isotonic-conditional-random-field-based sentence level predictor. The sentiment flow is defined as a function from [0, 1) to the ordinal set of labels, where the interval [(t - 1)/n, t/n) is mapped to the label of the t-th sentence in a document with n sentences. The flow is then smoothed out through convolution with a smoothing kernel. Finally, the distances between two flows (e.g., the distance between the two smoothed, continuous functions) should reflect, to some degree, the distances between global sentiments. On a small dataset, Mao and Lebanon observe that the sentiment flow representation (especially when objective sentences are excluded) outperforms a plain bag-of-words representation in predicting global sentiment with a nearest neighbor classifier.

4.8 Language Models

The rise of the use of language models in information retrieval has been an interesting recent development [65, 177, 179, 243]. They have been applied to various opinion-mining and sentiment-analysis tasks, and in fact the subjectivity-extraction work of Pang and Lee [232] is a demo application for the heavily language-modeling-oriented LingPipe (http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html).
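
As a rough illustration of how a language model can be put to work on a sentiment task, the sketch below trains one unigram model per polarity class and assigns a text to the class whose model gives it the higher probability. This is a generic textbook construction, not the method of any work cited here and not the LingPipe implementation; the toy training sentences, the add-one smoothing, the equal class priors, and the names train_unigram_lm, log_prob, and classify are all assumptions made for the example.

```python
import math
from collections import Counter


def train_unigram_lm(documents):
    """Return (token counts, total token count) for a list of strings."""
    counts = Counter(token for doc in documents for token in doc.split())
    return counts, sum(counts.values())


def log_prob(text, model, vocab_size):
    """Log-probability of text under a unigram model with add-one smoothing."""
    counts, total = model
    return sum(
        math.log((counts[token] + 1) / (total + vocab_size))
        for token in text.split()
    )


def classify(text, models, vocab_size):
    """Pick the label whose language model scores the text highest.

    Equal class priors are assumed, so only the likelihoods are compared.
    """
    return max(models, key=lambda label: log_prob(text, models[label], vocab_size))


# Hypothetical toy training data, for illustration only.
train = {
    "positive": ["i loved this movie", "a great and moving film"],
    "negative": ["i hated this movie", "a dull and terrible film"],
}
models = {label: train_unigram_lm(docs) for label, docs in train.items()}
vocab = {token for docs in train.values() for doc in docs for token in doc.split()}
vocab_size = len(vocab) + 1  # one extra slot reserved for unseen tokens

prediction = classify("a great film", models, vocab_size)
```

The models themselves only assign probabilities to text; the comparison in classify is one simple way of turning those probabilities into a label.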

One characteristic of language modeling approaches that differentiates them somewhat from other classification-oriented data-driven techniques we have discussed so far is that language models are often constructed using labeled data, but, given that they are mechanisms for assigning probabilities to text rather than labels drawn from a finite set, they cannot, strictly speaking, be defined as either supervised or unsupervised classifiers. On the other hand, there are various ways to convert their output to labels when necessary.

An example of work in the language-modeling vein is that of Eguchi and Lavrenko [84], who rank sentences by both sentiment relevancy and topic relevancy, based on previous work on relevance language models [179]. They propose a generative model that jointly models sentiment words, topic words, and sentiment polarity in a sentence as a triple. Lin and Hauptmann [186] consider the problem of examining whether two collections of texts represent different perspectives. In their study, employing Reuters data, two examples of different perspectives are the Palestinian viewpoint vs. the Israeli viewpoint in written text and Bush vs. Kerry in presidential debates. They base their notion of difference in perspective upon the Kullback-Leibler (KL) divergence between posterior distributions induced from document collection pairs, and discover that the KL divergence between different aspects is an order of magnitude smaller than that between different topics. This perhaps provides yet another reason that opinion-oriented classification has been found to be more difficult than topic-based classification.

Research employing probabilistic latent semantic analysis [125] or latent Dirichlet allocation (LDA) [39] can also be cast as language-modeling work [41, 195, 206]. The basic idea is to infer language models that correspond to unobserved "factors" in the data, with the hope that the factors that are learned represent topics or sentiment.

4.9 Special Considerations for Extraction

Opinion-oriented extraction. Many applications, such as summarization or question answering, require working with pieces of information that need to be pulled from one or more textual units. For example, 
a multi-perspective question-answering (MPQA) system might need to respond to opinion-oriented questions such as "Was the most recent presidential election in Zimbabwe regarded as a fair election?" [51]; the answer may be encoded in a particular sentence of a particular document, or may need to be stitched together from pieces of evidence found in multiple documents. Information extraction (IE) is precisely the field of natural language processing devoted to this type of task [49]. Hence, it is not surprising that the application of information-extraction techniques to opinion mining and sentiment analysis has been proposed [51, 79].

In this survey, we use the term opinion-oriented information extraction (opinion-oriented IE) to refer to information extraction problems particular to sentiment analysis and opinion mining. (We sometimes shorten the phrase to opinion extraction, which should not be construed narrowly as focusing on the extraction of opinion expressions; for instance, determining product features is included under the umbrella of this term.) Past research in this area has been dominated by work on two types of texts:

• Opinion-oriented information extraction from reviews has, as noted above, attracted a great deal of interest in recent years. In fact, the term "opinion mining," when construed in its narrow sense, has often been used to describe work in this context. Reviews, while typically (but not always) devoted to a single item, such as a product, service, or event, generally comment on multiple aspects, facets, or features of that item, and all such commentary may be important. Extracting and analyzing opinions associated with each individual aspect can help provide more informative summarizations or enable more fine-grained opinion-oriented retrieval.

• Other work has focused on newswire. Unlike reviews, a news article is relatively likely to contain descriptions of opinions that do not belong to the article's author; an example is a quotation from a political figure. This property of journalistic text makes the identification of opinion holders (also known as opinion sources) and the correct association of opinion
holders with opinions important tasks, whereas for reviews, all expressed opinions are typically those of the author, so opinion-holder identification is a less salient problem. Thus, when newswire articles are the focus, the emphasis has tended to be on identifying expressions of opinions, the agent expressing each opinion, and/or the type and strength of each opinion. Early work in this direction first carefully developed and evaluated a low-level opinion annotation scheme [45, 283, 309, 312], which facilitated the study of sub-tasks such as identifying opinion holders and analyzing opinions at the phrase level [37, 42, 43, 51, 60, 61, 157, 320].

It is important to understand the similarities and differences between opinion-oriented IE and standard fact-oriented IE. They share some sub-tasks, such as entity recognition; for example, as mentioned above, determination of opinion holders is an active line of research [37, 42, 61, 158]. What truly sets the problem apart from standard or classic IE is the specific types of entities and relations that are considered important. For instance, although identification of product features is in some sense a standard entity recognition problem, an opinion extraction system would be mostly interested in features for which associated opinions exist; similarly, an opinion holder is not just any named entity in a news article, but one that expresses opinions. Examples of the types of relations particularly pertinent to opinion mining are those centered around comparisons, such as the relations encoded by sentences like "The new model is more expensive than the old one" or "I prefer product A over product B" [139, 191] (a longer version of the latter is available as Jindal and Liu [138]), and those between agents and reported beliefs, as described in Section 4.9.2. Note that the relations of interest can form a complex hierarchical structure, as in the case where an opinion is attributed to one party by another, so that it is unclear whether the first party truly holds the opinion in question [42].

It is also important to understand which aspects of opinion-oriented extraction are mentioned in this section as opposed to the previous sections. As discussed earlier, many sub-problems of opinion extraction are
in fact classification problems for relatively small textual units. Examples include both determining whether or not a text span is subjective and classifying a given text span already determined to be subjective by the strength of the opinion expressed. Thus, many key techniques involved in building an opinion extraction system are already discussed in the previous sections. In this section, we instead focus on the "missing pieces," describing approaches to problems that are somewhat special to extraction tasks in sentiment analysis. While these sub-tasks can be (and often are) cast as classification problems, they do not have natural counterparts outside of the extraction context. Specifically, Section 4.9.1 is devoted to the identification of features and expressions of opinions in reviews. Section 4.9.2 considers techniques that have been employed when opinion-holder identification is an issue.

Finally, we make the following organizational note. One may often want to present the output of opinion extraction in summarized form; conversely, some forms of sentiment summarization rely on the output of opinion extraction. Opinion-oriented summarization is discussed in Section 5.

4.9.1 Identifying Product Features and Opinions in Reviews

In the context of review mining [130, 166, 215, 244, 323, 324], two important extraction-related sub-tasks are (1) the identification of product features, and (2) the extraction of opinions associated with these features. While the key features or aspects are known in some cases, many systems start from problem (1).

As noted above, identification of product features is in some sense a standard information extraction task with little to distinguish it from other non-sentiment-related problems. After all, the notion of the features that a given product has seems fairly objective. However, Hu and Liu [130] show that one can benefit from light sentiment analysis even for this sub-task, as described shortly.
Existing work on identifying product features discussed in reviews (task (1)) often relies on the simple linguistic heuristic that (explicit) features are usually expressed as nouns or noun phrases. This narrows down the candidate words or phrases to be considered, but obviously not all nouns or noun phrases are product features. Yi et al. [323] consider three increasingly strict heuristics to select from noun phrases based on part-of-speech-tag patterns. Hu and Liu [130] follow the intuition that frequent nouns or noun phrases are likely to be features. They identify frequent features through association mining, and then apply heuristic-guided pruning aimed at removing (a) multi-word candidates in which the words do not appear together in a certain order, and (b) single-word candidates for which subsuming super-strings have been collected (the idea is to concentrate on more specific concepts, so that, for example, "life" is discarded in favor of "battery life"). These techniques by themselves outperform a general-purpose term-extraction and -indexing system known as FASTR [135]. Furthermore (and here is the observation that is relevant to sentiment), the F-measure can be further improved (although precision drops slightly) via the following expansion procedure: adjectives appearing in the same sentence as frequent features are assumed to be opinion words, and nouns and noun phrases co-occurring with these opinion words in other sentences are taken to be infrequent features.

In contrast, Popescu and Etzioni [244] consider product features to be concepts forming certain relationships with the product (for example, for a scanner, its size is one of its properties, whereas its cover is one of its parts) and seek to identify the features connected with the product name through corresponding meronymy discriminators. Note that this approach, which does not involve sentiment analysis per se but simply focuses more on the task of identifying different types of features, achieved better performance than that yielded by the techniques of Hu and Liu [130].

There has also been work that focuses on extracting attribute-value pairs from textual product descriptions, but not necessarily in the context of opinion mining. Of work in this vein, Ghani et al. [105] directly compare against the method proposed by Hu and Liu [130].

To identify expressions of opinions associated with features (task (2)), a simple heuristic is to extract adjectives that appear in the same sentence as the features [130]. Deeper analyses can make use of parse information and manually or semi-automatically developed rules or sentiment-relevant lexicons [215, 244].

4.9.2 Problems Involving Opinion Holders

In the context of analysis of newswire and related genres, we need to identify text spans corresponding both to opinion holders and to expressions of the opinions held by them.

As is true with other segmentation tasks, identifying opinion holders can be viewed as a sequence labeling problem. Choi et al. [61] experiment with an approach that combines Conditional Random Fields (CRFs) [176] and extraction patterns. A CRF model is trained on a certain collection of lexical, syntactic, and semantic features. In particular, extraction patterns are learned to provide semantic tagging as part of the semantic features. (CRFs have also been used to detect opinion expressions [43].)

Alternatively, given that the status of an opinion holder depends by definition on the expression of an opinion, the identification of opinion holders can benefit from, or perhaps even require, accounting for opinion expressions either simultaneously or as a pre-processing step.

One example of simultaneous processing is the work of Bethard et al. [37], who specifically address the task of identifying both opinions and opinion sources. Their approach is based on semantic parsing where semantic constituents of sentences (e.g., "agent" or "proposition") are marked. By utilizing opinion words automatically learned by a bootstrapping approach, they further refine the semantic roles to identify propositional opinions, i.e., opinions that generally function as the sentential complement of a predicate. This enables them to concentrate on verbs and extract verb-specific information from semantic frames such as are defined in FrameNet [25] and PropBank [230].

As another example of the simultaneous approach, Choi et al. [60] employ an integer linear programming approach to handle the joint
extraction of entities and relations, drawing on the work of Roth and Yih [260] on using global inference based on constraints.

As an alternative to the simultaneous approach, a system can start by identifying opinion expressions, and then proceed to the analysis of the opinions, including the identification of opinion holders. Indeed, Kim and Hovy [159] define the problem of opinion holder identification as identifying opinion sources given an opinion expression in a sentence. In particular, structural features from a syntactic parse tree are selected to model the long-distance, structural relation between a holder and an expression. Kim and Hovy show that incorporating the patterns of paths between holder and expression outperforms a simple combination of local features (e.g., the type of the holder node) and other non-structural features (e.g., the distance between the candidate holder node and the expression node).

One final remark is that the task of determining which mentions of opinion holders are co-referent (source coreference resolution) differs in practice in interesting ways from typical noun phrase coreference resolution, due in part to the way in which opinion-oriented data sets may be annotated [282].
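To make the feature-identification heuristics of Section 4.9.1 concrete, the following is a rough sketch in the spirit of the frequent-noun intuition and expansion procedure attributed to Hu and Liu [130]. It is a simplification under stated assumptions, not their system: a plain sentence-frequency threshold stands in for association mining, the pruning steps are omitted, only single-word nouns are handled, and the input is assumed to be already part-of-speech tagged with the hypothetical tag names "NOUN" and "ADJ". The function name mine_features and the min_support parameter are likewise illustrative.

```python
from collections import Counter

def mine_features(tagged_sentences, min_support=2):
    """Return (frequent_features, opinion_words, infrequent_features).

    tagged_sentences: list of sentences, each a list of (token, tag)
    pairs. Tags "NOUN" and "ADJ" are assumed; other tags are ignored.
    """
    # Step 1: nouns that occur in at least min_support sentences are
    # treated as frequent features (a stand-in for association mining).
    noun_counts = Counter()
    for sentence in tagged_sentences:
        noun_counts.update({tok.lower() for tok, tag in sentence if tag == "NOUN"})
    frequent = {noun for noun, count in noun_counts.items() if count >= min_support}

    # Step 2: adjectives sharing a sentence with a frequent feature
    # are assumed to be opinion words.
    opinion_words = set()
    for sentence in tagged_sentences:
        nouns = {tok.lower() for tok, tag in sentence if tag == "NOUN"}
        if nouns & frequent:
            opinion_words.update(tok.lower() for tok, tag in sentence if tag == "ADJ")

    # Step 3: other nouns co-occurring with an opinion word are taken
    # to be infrequent features.
    infrequent = set()
    for sentence in tagged_sentences:
        adjectives = {tok.lower() for tok, tag in sentence if tag == "ADJ"}
        if adjectives & opinion_words:
            infrequent.update(
                tok.lower() for tok, tag in sentence
                if tag == "NOUN" and tok.lower() not in frequent
            )
    return frequent, opinion_words, infrequent
```

The adjectives collected in step 2 also correspond to the simple heuristic for task (2), in which adjectives appearing in the same sentence as a feature are extracted as the opinions about it.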
5 Summarization

So far, we have talked about analyzing and extracting opinion information from individual documents. The focus of this section is on aggregating and representing sentiment information drawn from an individual document or from a collection of documents. For example, a user might desire an at-a-glance presentation of the main points made in a single review; creating such single-document sentiment summaries is described in Section 5.1. Another application considered within this paradigm is the automatic determination of market sentiment, or the majority "leaning" of an entire body of investors, from the individual remarks of those investors [66, 67]; this is a type of multi-document opinion-oriented summarization, described in Section 5.2.

5.1 Single-Document Opinion-Oriented Summarization

There is clearly a tight connection between extraction of topic-based information from a single document [49] and topic-based summarization of that document, since the information that is pulled out can serve as a summary; see Radev et al. [247, Section 2.1] for a brief review. Obviously, this connection between extraction and summarization holds in the case of sentiment-based summarization as well.

One way in which this connection is made manifest in single-document opinion-oriented summarization is as follows: there are approaches that create textual sentiment summaries based on extraction of sentences or similar text units. For example, Beineke et al. [33] attempt to select a single passage that reflects the opinion of the document's author(s), mirroring the practice of film advertisements that present "snippets" from reviews of the movie. Training and test data is acquired from the website Rotten Tomatoes (http://www.rottentomatoes.com), which provides a roughly sentence-length snippet for each review. However, Beineke et al. [33] note that low accuracy can result even for high-quality extraction methods because the Rotten Tomatoes data includes only a single snippet per review, whereas several sentences might be perfectly viable alternatives. In terms of creating longer summaries, Mao and Lebanon [200] suggest that by tracking the sentiment flow within a document (i.e., how sentiment orientation changes from one sentence to the next, as discussed in Section 4.7), one can create sentiment summaries by choosing the sentences at local extrema of the flow (plus the first and last sentence). An interesting feature of this approach is that by incorporating a document's flow, the technique takes into account the entire document in a holistic way. Both approaches just mentioned seek to select the absolutely most important sentences to present. Alternatively, one could simply extract all subjective sentences, as was done by Pang and Lee [232] to create "subjectivity extracts." They suggested that these extracts could be used as summaries, although, as mentioned above, they focused on the use of these extracts as an aid to downstream polarity classification, rather than as summaries per se. Finally, we note that sentences are also used in multi-document sentiment summarization, as described in Section 5.2.

[Footnote: Beineke et al. [33] use the term "sentiment summary" to refer to a single passage, but we prefer to not restrict that term's definition so tightly.]

Other sentiment summarization methods can work directly off the output of opinion-oriented information-extraction systems. Indeed, Cardie et al. [51], speaking about the more restricted type of extraction referred to by the technical term "information extraction," propose to view "... summary representations as information extraction (IE) scenario templates ... [thus] we postulate that methods from information extraction ... will be adequate for the automatic creation of opinion-based summary representations." (A similar observation was made by Dini and Mazzini [79].) Note that these IE templates do not form coherent text on their own. However, they can be incorporated as is into visual summaries.

Indeed, one interesting aspect of the problem of extracting sentiment information from a single document (or from multiple documents, as discussed in Section 5.2) is that sometimes graph-based output seems much more appropriate or useful than text-based output. For example, graph-based summaries are very suitable when the information that is most important is the set of entities described and the opinions that some of these entities hold about each other [305]. Figure 5.1 shows an example of a human-generated summary in the form of a graph depicting various negative opinions expressed during the aftermath of Hurricane Katrina. Note
the inclusion of text snippets on the arrows to support the inference of a negative opinion; in general, providing some sense of the evidence from which opinions are inferred is likely to be helpful to the user.

[Footnote: The exceptions are the edges from "news media" and the edges from "people who didn't evacuate." It is (perhaps intentionally) ambiguous whether the lack of supporting quotes is due merely to the lack of sufficiently "juicy" ones or is meant to indicate that it is utterly obvious that these entities blame many others. We also note that the hurricane itself is not represented.]

Fig. 5.1 Graphic by Bill Marsh for The New York Times, October 1, 2005, depicting negative opinions of various entities toward each other in the aftermath of Hurricane Katrina. Relation to opinion summarization pointed out by Eric Breck (Claire Cardie, personal communication).

While summarization technologies may not be able to achieve the level of sophistication of information presentation exhibited by Figure 5.1, current research is making progress toward that goal. In Figure 5.2, we see a proposed summary where opinion holders and the objects of their opinions are connected by edges, and various annotations derived from IE output are included, such as the strength of various attitudes.

Fig. 5.2 Figure 2 (labeled 3) of Cardie et al. [51]: proposal for a summary representation derived from the output of an information-extraction system.

Of course, graphical elements can also be used to represent a single bit, number or grade as a very succinct summary of a document's sentiment. Variations of stars, letter grades, and "thumbs up/thumbs down" icons are common. More complex visualization schemes applied on a sentence-by-sentence basis have also been proposed [7].

5.2 Multi-Document Opinion-Oriented Summarization

  Language is itself the collective art of expression, a summary of thousands upon thousands of individual intuitions. The individual gets lost in the collective creation, but his personal expression has left some trace in a certain give and flexibility that are inherent in all collective works of the human spirit.
  Edward Sapir, Language and Literature, 1921. Connection to sentiment analysis pointed out by Das and Chen [67].

5.2.1 Some Problem Considerations

  There never was in the world two opinions alike, no more than two hairs, or two grains; the most universal quality is diversity.
  Michel de Montaigne, Essays

  Where an opinion is general, it is usually correct.
  Jane Austen, Mansfield Park

We briefly discuss here some points to keep in mind with regard to multi-document sentiment summarization, although to a certain degree, work in sentiment summarization has not yet reached a level where these problems have come to the fore.

Determining which documents or portions of documents express the same opinion is not always an easy task; but clearly it is one that needs to be addressed in the summarization setting, since readers of sentiment summaries surely are interested in the overall sentiment in the corpus, which means the system must determine shared sentiments within the document collection at hand.

This issue can still arise even when labels have been predetermined, if the items that have been pre-labeled come from different sub-collections. For instance, some documents may have polarity labels, whereas others may contain ratings on a 1-to-5 scale. And even when the ratings are drawn from the same set, calibration issues may arise. Consider the following from the Rotten Tomatoes frequently-asked-questions page (http://www.rottentomatoes.com/pages/faq#judge):

  On the Blade 2 reviews page, you have a negative review from James Berardinelli (2.5/4 stars), and a positive review from Eric Lurio. Why is Berardinelli's review labeled Rotten and Lurio's review labeled Fresh?

  You're seeing this discrepancy because star systems are not consistent between critics. For critics like Roger Ebert and James Berardinelli, 2.5 stars or lower out of 4 stars is always negative. For other critics, 2.5 stars can either be positive or negative. Even though Eric Lurio uses a 5 star system, his grading is very relaxed. So, 2 stars can be positive. Also, there's always the possibility of the webmaster or critic putting the wrong rating on a review.

As another example, in reconciling reviews of conference submissions, program-committee members must often take into account the fact that certain reviewers always tend to assign low scores to papers, while others have the opposite tendency. Indeed, we believe this calibration issue may be the reason why reviews of cars on Epinions come not only with a "number of stars" annotation, but also a "thumbs up/thumbs down" indicator, in order to clarify whether, regardless of the rating assigned, the review author actually intends to make a positive recommendation or not.

Note also that when two reviewers agree on a rating, they may have different reasons for doing so, and it may be important to indicate these reasons in the summary. A related point is that when a reviewer assigns a middling rating, it may be because he or she thinks that most aspects of the item under discussion are so-so, but it may also be because he or she sees both strong positives and strong negatives. Or, reviewers may have the same opinions about individual item features, but weight these individual factors differently, leading to a different overall sentiment. Indeed, Rotten Tomatoes summarizes a set of reviews both with the Tomatometer (the percentage of reviews judged to be positive) and an average rating on a 1-to-10 scale. The idea, again according to the FAQ (http://www.rottentomatoes.com/pages/faq#avgvstmeter), is as follows:

  The Average Rating measures the overall quality of a product based on an average of individual critic scores. The Tomatometer simply measures the percentage of critics who recommend a certain product. For example, while "Men in Black" scored 90% on the Tomatometer, the average rating is only 7.5/10. That means that while you're likely to enjoy MIB, it probably wasn't a contender for Best Picture at the Oscars. In contrast, "Toy Story 2" received a perfect 100% on the Tomatometer with an average rating of 9.6/10. That means, not only are you certain to enjoy it, you'll also be impressed with the direction, story, cinematography, and all the other things that make truly great films great.

The problem of deciding whether two sentences or text passages have the same semantic content is one that is faced not just by opinion-oriented multi-document summarizers, but by topic-based multi-document summarizers as well [247]; this has been one of the motivations behind work on paraphrase recognition [29, 30, 231] and textual entailment [28]. But, as pointed out in Ku et al. [170], while in traditional summarization redundant information is often discarded, in opinion summarization one wants to track and report the degree of "redundancy," since in the opinion-oriented setting the user is typically interested in the (relative) number of times a given sentiment is expressed in the corpus.

Carenini et al. [52] note that a challenge in sentiment summarization is that the pieces of information to be summarized (people's opinions) are often conflicting, which is a bit different from the usual situation in topic-based summarization, where typically one does not assume that there are conflicting sets of facts in the document set (although there are exceptions [301, 302]).

5.2.2 Textual Summaries

In standard topic-based multi-document summarization, creating textual summaries has been a main focus of effort. Hence, despite the differences in topic- and opinion-based summarization mentioned above, several researchers have developed systems that create textual summaries of opinion-oriented information.

5.2.2.1 Leveraging Existing Topic-Based Technologies

One line of attack is to adapt existing topic-based multi-document summarization algorithms to the sentiment setting.

Sometimes the adaptation consists simply of modifying the input to these pre-existing algorithms. For instance, Seki et al. [264] propose that one apply standard multi-document summarization to a sub-collection of documents that are on the same topic and that are determined to belong to some relevant genre of text, such as "argumentative."

In other cases, pre-existing topic-based summarization techniques are modified. For example, Carenini et al. [52] generate natural-language summaries in the form of an "evaluative argument" using the classic natural-language generation pipeline of content selection, lexical selection and sentence planning, and sentence realization [251], assuming the existence of a pre-defined product-feature hierarchy. The system explicitly produces textual descriptions of aggregate information. It is capable of relaying data about the average sentiment and signaling, if appropriate, that the distribution of responses is bi-modal (this allows one to report "split votes"). They compare this system against a modification of an existing sentence-extraction system, MEAD [246]. The former approach seems better suited for general overviews, whereas the latter seems better at providing more variety in expression and more detail; see Figure 5.3. Related to the latter approach, sentence extraction methods have also been used to create summaries for opinion-oriented queries or topics [265, 266].

  Summary created via a true "natural-language-generation" approach:
  Almost all users loved the Canon G3 possibly because some users thought the physical appearance was very good. Furthermore, several users found the manual features and the special features to be very good. Also, some users liked the convenience because some users thought the battery was excellent. Finally, some users found the editing/viewing interface to be good despite the fact that several customers really disliked the viewfinder. However, there were some negative evaluations. Some customers thought the lens was poor even though some customers found the optical zoom capability to be excellent. Most customers thought the quality of the images was very good.

  Summary created by a modified sentence-extraction system:
  Bottom line, well made camera, easy to use, very flexible and powerful features to include the ability to use external flash and lense/filters choices. It has a beautiful design, lots of features, very easy to use, very configurable and customizable, and the battery duration is amazing! Great colors, pictures, and white balance. The camera is a dream to operate in automode, but also gives tremendous flexibility in aperture priority, shutter priority, and manual modes. I'd highly recommend this camera for anyone who is looking for excellent quality pictures and a combination of ease of use and the flexibility to get advanced with many options to adjust if you like.

Fig. 5.3 Sample automatically generated summaries. Adapted from Figure 2 of Carenini et al. [52].

While we are not aware of the following technique being used in standard topic-based summarization, we see no reason why it is not applicable to that setting, at least in principle. Ku et al. [170] (short version available as Ku et al. [169]) propose the following simple scheme to create a textual summary of a set of documents known in advance to be on the same topic. Sentences considered to be representative of the topic are collected, and the polarity of each such sentence is computed based on what sentiment-bearing words it contains, with negation taken into account. Then, to create a summary of the positive documents, the system simply returns the headline of the document with the most positive on-topic sentences, and similarly for the negative documents. The authors show the following examples for the positive and the negative summary, respectively:

  Positive: "Chinese Scientists Suggest Proper Legislation for Clone Technology."
  Negative: "UK Government Stops Funding for Sheep Cloning Team."

The cleverness of this method is that headlines are, by construction, good summaries (at least of the article they are drawn from), so that fluency and informativeness, although perhaps not appropriateness, are guaranteed.

Another perhaps unconventional type of multi-document "summary" is the selection of a few documents of interest from the corpus for presentation to the user. In this vein, Kawai et al. [151] have developed a news portal site called "Fair News Reader" that attempts to determine the affect characteristics of articles the user has been reading so far (e.g., "happiness" or "fear") and then recommends articles that are on the same topic but have opposite affect characteristics. One could imagine extending this concept to a news portal that presented to the user opinions opposing his or her pre-conceived ones (Phoebe Sengers, personal communication). On a related note, Liu [190] mentions that one might desire a summarization system to present a "representative sample" of opinions, so that both positive and negative points of view are covered, rather than just the dominant sentiment. As of the time of this writing, Amazon presents the most helpful favorable review side-by-side with the most helpful critical review if one clicks on the "[x] customer reviews" link next to the stars indicator. Additionally, one could interpret the opinion-leader identification work of Song et al. [275] as suggesting that blog posts written by opinion leaders could serve as an alternative type of representative sample.

Summarizing online discussions and blogs is an area of related work [131, 300, 330]. The focus of such work is not on summarizing the opinions per se, although Zhou and Hovy [330] note that one may want to vary the emphasis on the opinions expressed versus the facts.

5.2.2.2 Textual Summarization Without Topic-based Summarization Techniques

Other work in the area of textual multi-document sentiment summarization departs from topic-based work. The main reason seems to be that redundancy elimination is much less of a concern: users may wish to look at many individual opinions regardless of whether these individual opinions express the same overall sentiment, and these users may not particularly care whether the textual overview they peruse is coherent. Thus, in several cases, textual "summaries" are generated simply by listing some or all opinionated sentences. These are often grouped by feature (sub-topic) and/or polarity, perhaps with some ranking heuristic such as feature importance applied [129, 170, 324, 332].

5.2.3 Non-textual Summaries

In the previous section, we have discussed the creation of textual summaries of the opinion information expressed within a corpus. But in settings where the polarity or orientation of each individual document within a collection is summed up in a single bit (e.g., thumbs up/thumbs down), number (e.g., 3.5 stars), or grade (e.g., B+), an alternative way to obtain a succinct summary of the overall sentiment is to report summary statistics, such as the number of reviews that are "thumbs up" or the average number of stars or average grade. Many systems take this approach to summarization.

Summary statistics are often quite suited to graphical representations; we describe some noteworthy visual aspects of these summaries here (evaluation of the user-interface aspects has not been a focus of attention in the community to date).

5.2.3.1 "Bounded" Summary Statistics: Averages and Relative Frequencies

We use the term bounded to refer to summary statistics that lie within a predetermined range. Examples are the average number of stars (range: 0 to 5 stars, say) or the percentage of positive opinions (range: 0% to 100%).

"Thermometer"-type images are one means for displaying such statistics. One example is the "Tomatometer" on the Rotten Tomatoes website, which is simply a bar broken into two differently colored portions; the portion of the bar that is colored red indicates the fraction of positive reviews of a given movie. This representation extends straightforwardly to n-ary categorization schemes, such as positive/middling/negative, via the use of n colors. The thermometer-graphic concept also generalizes in other ways; for instance, the depiction of a number of stars can be considered to be a variant of this idea.

Instead of using size or extent to depict bounded summary statistics, as is done with thermometer representations, one can use color shading. This choice seems particularly appropriate in settings where the amount of display real-estate that can be devoted to any particular item under evaluation is highly limited or where size or location is reserved to represent some other information. For instance, Gamon et al. [104] use color to represent the general assessment of (automatically determined) product features. In Figure 5.4, we see that each of many features or topics, such as "handling" or "vw, service," is represented by a shaded box. The colors for any given box range from red to white to green, indicating gradations of the average sentiment toward that topic, moving from negative to neutral (or objective) to positive, respectively. Note that one can quickly glean from such a display what was liked and what was disliked about the product under discussion, despite the large number of topics under evaluation: people like driving this car but dislike the service. As shown in Figure 5.5, a similar interface (together with a usability study) is presented in Carenini et al. [53]. Some differences are that natural-language summarization is also employed, so that the summary is both "verbal" and visual; the features are grouped into a hierarchy, thus leveraging the ability of Treemaps [270] to display hierarchical data via nesting; and the interface also includes
away(notdepictedinthe“gure)toseeanat-a-glanceŽsummaryofthepolaritiesoftheindividualsentencescommentingonaparticularfeature.Ademoisavailableonlineathttp://www.cs.ubc.ca/carenini/storage/SEA/demo.html. 5.2Multi-DocumentOpinion-OrientedSummarization Fig.5.4Figure2ofGamonetal.[104],depicting(automaticallydetermined)topicsdis-cussedinreviewsoftheVolkswagenGolf.Thesizeofeachtopicboxindicatesthenumberofmentionsofthattopic.Theshadingofeachtopicbox,rangingfromredtowhitetogreen,indicatestheaveragesentiment,rangingfromnegativetoneutral/nonetonegative,respectively.Atthebottom,thesentencesmostindicativeofnegativesentimentforthetopicvw,serviceŽaredisplayed.5.2.3.2UnboundedSummaryStatisticsAsjustdescribed,thermometergraphicsandcolorshadingcanbeusedtorepresentboundedstatisticssuchasthemeanor,inthecaseof-colorthermometers,relativedistributionsofratingsacrossdierentclasses.Butboundedstatisticsbythemselvesdonotprovideotherimportantpiecesofinformation,suchastheactualnumberofopinionswithineachclass.(Weconsiderrawfrequenciestobeconceptuallyunbounded,althoughtherearepracticallimitstohowmanyopinionscanbeaccountedfor.)Intuitively,theobservationthat50%ofthereviewsofaparticularproductarenegativeismoreofa Weadmittobeingglass-half-emptyŽpeople. 
74Summarization Fig.5.5Figure4ofCareninietal.[53],showingasummaryofreviewsofaparticularproduct.Anautomaticallygeneratedtextsummaryisontheleft;avisualsummaryisontheright.Thesizeofeacheachboxinthevisualsummaryindicatesthenumberofmentionsofthecorrespondingtopic,whichoccupyapre-de“nedhierarchy.Theshadingofeachtopicbox,rangingfromredtoblacktogreen,indicatestheaveragesentiment,rangingfromnegativetoneutral/nonetonegative,respectively.Atthebottomisshownthesourcefortheportionofthegeneratednatural-languagesummarythathasbeenlabeledwiththefootnote4.Žbigdealifthatstatisticisbasedon10,000reviewsthanifitbasedononlytwo.Anotherproblemspeci“ctothemeanasasummarystatisticisthatreview-aggregationsitesseemtooftenexhibithighlyskewedrat-ingdistributions,withaparticularbiastowardhighlypositivereviews[74,59,128,253,132,240].Sincetherecanoftenbeasecondmode,orbump,attheextremelowendoftheratingscale,indicatingpolariza-tion„forexample,Huetal.[132]remarkthat54%oftheitemsinasampleofAmazonbook,DVD,andvideoproductswithmorethan20reviewsfailbothstatisticalnormalityandunimodalitytests„report-ingonlythemeanratingscoremaynotprovideenoughinformation.Toputitanotherway,divulgingtheaveragedoesnotgivetheuser Onarelatednote,WilliamSa“resNewYorkTimesMay1,2005articleBlurbosphereŽquotesCharlesMcGrath,formereditoroftheNewYorkTimesBookReview,asasking,hasthereeverbeenabookthatwasntacclaimed?Ž 
enough information to distinguish between a set of middling reviews and a set of polarized reviews.

On the other hand, it is worth pointing out that just giving the number of positive and negative reviews, respectively, on the assumption that the user can always derive the percentages from these counts, may not suffice. Cabral and Hortaçsu [47] observe that once eBay switched to displaying the percentage of pieces of feedback on sellers that were negative, as opposed to simply the raw numbers, then negative reviews began to have a measurable economic impact (see Section 6). Hence, not surprisingly, sentiment summaries tend to include data on the average rating, the distribution of ratings, and/or the number of ratings.

Visualization of unbounded summary statistics. Of the two systems described above that represent the average polarity of opinions via color, both represent the quantity of the opinions on a given topic via size. This means that the count data for positive and for negative opinions are not explicitly presented separately. In other systems, this is not the case; rather, frequencies for different classes are broken out and displayed. For instance, as of the time of this writing, Amazon displays an average rating as a number of stars with the number of reviews next to it; mousing over the stars brings up a histogram of reviewer ratings annotated with counts for the 5-star reviews, 4-star reviews, etc. (Further mousing over the bars of the histogram brings up the percentage of reviews that each of those counts represent.)

As another example, a sample output of the Opinion Observer system [191] is depicted in Figure 5.6, where the portion of a bar projecting above the centered "horizon" line represents the number of positive opinions about a certain product feature, and the portion of the bar below the line represents the number of negative opinions. (The same idea can be used to represent percentages too, of course.) A nice feature of this visualization is that because of the use of a horizon line, two separate frequency datapoints (the positive and negative counts) can be represented by what is visually one object, namely, a solid bar, and one can easily simultaneously compare negatives against negatives
and positives against positives. This simultaneous comparison is made much more difficult if the bars all have one end "planted" at the same location, as is the case for standard histograms such as the one depicted in Figure 5.7.

Fig. 5.6 Figure 2 of Liu et al. [191]. Three cell phones are represented, each by a different color. For each feature ("General," "LCD," etc.), three bars are shown, one for each of three cell phones. For a given feature and phone, the portions of the bar above and below the horizontal line represent the number of reviews that express positive or negative views about that phone's feature, respectively. (The system can also plot the percentages of positive or negative opinions, rather than the raw numbers.) The pane on the upper-right displays the positive sentences regarding one of the products.

While the data for the features are presented sequentially in Figure 5.6 (first "General," then "LCD," and so forth), an alternative visualization technique called a rose plot is exemplified in Figure 5.8, which depicts a sample output of the system developed by Gregory et al. [113]. The median and quartiles across a document sub-collection of the percentage of positive and negative words per document, together with similar data for other possible affect-classification dimensions, are represented via a variant of box plots. (Adaptation to raw counts rather
than percentages is straightforward.) Mapping this idea to product comparisons in the style of Opinion Observer, one could associate different features with different "compass directions," e.g., the feature "battery life" with "southwest," as long as the number of features being reported on is not too large. The reason that this representation might prove advantageous in some settings is that in some situations, a circular arrangement may be more compact than a sequential one, and it may be easier for a user to remember a feature as being "southwest" than as being the "fifth of eight." An additional functionality of the system that is not shown in the figure is the ability to depict how much an individual document's positive/negative percentage differs from the average for a given document group to which the document belongs. A similar circular layout is proposed in Subasic and Huettner [285] for visualizing various dimensions of affect within a single document.

Fig. 5.7 A portion of Figure 4 of Yi and Niblack [324], rotated to save space and facilitate comparison with Figure 5.6. Notice that simultaneous comparison of the negative counts and the positive counts for two different products is not as easy as it is in Figure 5.6.

Morinaga et al. [215] opt to represent degrees of association between products and opinion-indicative terms of a pre-specified polarity. First,
opinions are gathered using the authors' pre-existing system [291]. Coding-length and probabilistic criteria are used to determine which terms to focus on, and principal component analysis is then applied to produce a two-dimensional visualization, such that nearness corresponds to strength of association, as in the authors' previous work [184]. Thus, in Figure 5.9, we see that cellphone A is associated with what we recognize as positive terms, whereas cellphone C is associated with negative terms.

Fig. 5.8 Figure 7 of Gregory et al. [113]. On the right are two rose plots, one for each of two products; on the left is the plot's legend. In each rose plot, the upper two "petals" represent positivity and negativity, respectively (of the other six petals, the bottom two are vice and virtue, etc.). Similarly to box plots, the median value is indicated by a dark arc, and the quartiles by (colored) bands around the median arc. Darker shading for one of the two petals in a pair (e.g., "positive" and "negative") is meant to indicate the negative end of the spectrum for the affect dimension represented by the given petal pair. The histogram below each rose relates to the number of documents represented.

Fig. 5.9 Figure 5 of Morinaga et al. [215]: principal-components-analysis visualization of associations between products (squares) and automatically selected opinion-oriented terms.

5.2.3.3 Temporal Variation and Sentiment Timelines

So far, the summaries we have considered do not explicitly incorporate any temporal dimension. However, time is often an important consideration.

First, users may wish to view individual reviews in reverse chronological order, i.e., newest first. Indeed, at the time of this writing, this is one of the two sorting options that Amazon presents.
Second, in many applications, analysts and other users are interested in tracking changes in sentiment about a product, political candidate, company, or issue over time. Clearly, one can create a sentiment timeline simply by plotting the value of a chosen summary statistic at different times; the chosen statistic can reflect the prevailing polarity [170,296] or simply the number of mentions, in which case what is being measured is perhaps not so much public opinion, but rather public awareness [102,197,211,212]. Such work is strongly related at a conceptual level to topic detection and tracking [8], a review of which is beyond the scope of this survey. Mishne and de Rijke [212] also depict the derivative of the summary statistic considered as a function of time.

5.2.4 Review(er) Quality

How do we identify what is good? And how do we censure what is bad? We will argue that developing a humane reputation system ecology can provide better answers to these two general questions, restraining the baser side of human nature, while liberating the human spirit to reach for ever higher goals.
"Manifesto for the reputation society," Masum and Zhang [203]

When creating summaries of reviews or opinionated text, an important type of information that deserves careful consideration is whether or not individual reviews are helpful or useful. For example, a system might want to downweight or even discard unhelpful reviews before creating summaries or computing aggregate statistics, as in Liu et al. [193]. Alternatively, the system could use all reviews, but provide helpfulness indicators for individual reviews as a summary of their expected utility. Indeed, non-summarization systems could use such information, too: for instance, a review-oriented search engine could rank its search results by helpfulness.

Some websites already gather helpfulness information from human readers. For example, Amazon.com annotates reviews with comments like "120 of 140 people found the following review helpful," meaning
that of the 140 people who pressed one of the "yes" or "no" buttons in response to the question "Was this review helpful to you?" (we deem these 140 people utility evaluators), 120 chose "yes." Similarly, the Internet Movie Database (IMDb, http://www.imdb.com) also annotates user comments with "X out of Y people found the following comment useful." This similarity is perhaps not surprising due to the fact that Amazon owns IMDb, although from a research point of view, note that the two populations of users are probably at least somewhat disjoint, meaning that there might be interesting differences between the sources of data. Other sites soliciting utility evaluations include Yahoo! Movies and Yahoo! TV, which allow the user to sort reviews by helpfulness; Citysearch, which solicits utility evaluations from general users and gives more helpful reviews greater prominence; and Epinions, which only allows registered members to rate reviews and does not appear to have helpfulness as a sort criterion, at least for non-registered visitors. (We learned about the solicitation of utility evaluations by IMDb from Zhuang et al. [332] and by Citysearch from Dellarocas [71].)

Despite the fact that many review-aggregation sites already provide helpfulness information gathered from human users, there are still at least two reasons why automatic helpfulness classification is a useful line of work to pursue.

Items that lack utility evaluations. Many reviews receive very few utility evaluations. For example, 38% of a sample of roughly 20,000 Amazon MP3-player reviews, and 31% of those aged at least three months, received three or fewer utility evaluations [161]. Similarly, Liu et al. [193] confirm one's prior intuitions that Amazon reviews that are youngest and reviews that are most lowly ranked (i.e., determined to be least helpful) by the site receive the fewest utility evaluations.
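
To make the vote counts discussed above concrete, here is a minimal Python sketch that computes the fraction of "yes" responses for a review and the share of reviews that received few utility evaluations. The `Review` record, the function names, and the sample vote counts (apart from the 120-of-140 example) are assumptions introduced purely for illustration; they do not reflect any site's actual data format or ranking method.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Review:
    """Hypothetical record of the utility evaluations one review received."""
    review_id: str
    yes_votes: int    # evaluators who answered "yes" to "Was this review helpful to you?"
    total_votes: int  # all utility evaluators ("yes" plus "no")


def helpfulness_fraction(review: Review) -> Optional[float]:
    """Return the fraction of "yes" votes, or None if there are no evaluations."""
    if review.total_votes == 0:
        return None  # no evidence either way; not the same as "unhelpful"
    return review.yes_votes / review.total_votes


def share_with_few_evaluations(reviews: List[Review], threshold: int = 3) -> float:
    """Return the share of reviews with `threshold` or fewer utility evaluations."""
    if not reviews:
        return 0.0
    few = sum(1 for r in reviews if r.total_votes <= threshold)
    return few / len(reviews)


# Illustrative data only: the first entry mirrors the "120 of 140" example.
sample = [
    Review("r1", 120, 140),
    Review("r2", 2, 3),
    Review("r3", 0, 0),
    Review("r4", 9, 10),
]
fractions = [helpfulness_fraction(r) for r in sample]
few_share = share_with_few_evaluations(sample)
```

Returning `None` rather than zero for a review with no evaluations keeps "not yet rated" distinct from "rated unhelpful," which is the distinction an automatic helpfulness classifier would be asked to fill in.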
Perhaps some reviews receive no utility evaluations simply because they are so obviously bad that nobody bothers to rate them. But this does not imply that reviews without utility evaluations must necessarily be unhelpful; certainly we cannot assume this of reviews too recently written to have been read by many people. One important role that automated helpfulness classifiers can play, then, is to provide utility ratings in the many cases when human evaluations are lacking.

(We note that we were unable to find Amazon's definition of "helpful," and conclude that they do not supply one. In contrast, Yahoo! specifies the following: "Was [a review] informative, well written or amusing; above all was it helpful to you in learning about the [film or show]? If so, then you should rate that review as helpful." It might be interesting to investigate whether these differing policies have implications. There have in fact been some comments that Amazon should clarify its question (http://www.amazon.com/Was-this-review-helpful-you/forum/Fx1JS1YLZ490S1O/Tx3QHE2JPEXQ1V7/1? encoding=UTF8&asin=B000FL7CAU).)

Skew in utility evaluations. Another interesting potential application of automated helpfulness classification is to correct for biases in human-generated utility evaluations.

We first consider indirect evidence that such biases exist. It turns out that just as the distribution of ratings can often be heavily skewed toward the positive end, as discussed in Section 5.2.3.2, the distribution of utility evaluations can also be heavily skewed toward the helpful end, probably due at least in part to similar reasons as in the product-ratings case. In a crawl of approximately 4 million unique Amazon reviews for about 670,000 books (excluding alternate editions), the average percentage of "yes" responses among the utility evaluations is between 74% and 70%, depending on whether reviews with fewer than 10 utility evaluations are excluded (Gueorgi Kossinets and Cristian Danescu Niculescu-Mizil, personal communication). Similarly, half of a sample of about 23,000 Amazon digital-camera reviews had helpful/unhelpful vote ratios of over 9 to 1 [193]. As in the ratings distribution case, one's intuition is that the percentage of reviews that are truly helpful is not as high as these statistics would seem to indicate. Another type of indirect evidence of bias is that the number of utility evaluations received by a review appears to decrease exponentially in helpfulness rank as computed by Amazon [193]. (Certainly there has to be some sort of decrease, since Amazon's helpfulness ranking is based in part on the number of utility evaluations a review receives.) Liu et al. [193] conjecture that reviews that have many utility evaluations will have a disproportionate influence on readers (and utility evaluators) because they are viewed as more authoritative, but reviews could get many utility evaluations only because they are more prominently displayed, not because readers actually compared them against other reviews. (Liu et al. [193] call this tendency for often-evaluated reviews to quickly accumulate even more utility evaluations "winner circle" bias; in other literature on power-law effects, related phenomena are also referred to as "rich-get-richer.")

As for more direct evidence: Liu et al. [193] conduct a re-annotation study in which the Amazon reviewers' utility evaluations often did not match those of the human re-labelers. However, this latter evidence should be taken with a grain of salt. First, in some of the experiments in the study, "ground truth" helpfulness was measured by, among other things, the number of aspects of a product that are discussed by a review. Second, in all experiments, the test items appear to have consisted of only the text of a single review considered in isolation. It is not clear that the first point corresponds to the standard that all Amazon reviewers used, or should be required to use, and clearly, the second point describes an isolated-text setting that is not the one that real Amazon reviewers work in. To exemplify both these objections: a very short review written by a reputable critic (e.g., a "top reviewer") that points out something that other reviews missed can, indeed, be quite helpful, but would score poorly according to the specification of Liu et al. [193]. Indeed, the sample provided of a review that should be labeled "bad" starts,

I want to point out that you should never buy a generic battery, like the person from San Diego who reviewed the S410 on May 15, 2004, was recommending. Yes you'd save money, but there have been many reports of generic batteries exploding when charged for too long.

We would view this comment, if true, to be quite helpful, despite the fact that it fails the specification. Another technical issue is that the re-labelers used a four-class categorization scheme, whereas essentially every possible percentage of positive utility evaluations could form a distinct class for the Amazon labels: it might have been better to treat reviews with helpfulness percentages of 60% and 61% as equivalent, rather than saying that Amazon reviewers rated the latter as better than the former.

Nonetheless, given the large predominance of "helpful" among utility evaluations despite the fact that anecdotal evidence we have gathered indicates that not all reviews deserve to be called "helpful," and given the suggestive results of the re-annotation experiment just described, it is likely that some of the human utility evaluations are not strongly related to the quality of the review at hand. Thus, we believe that correction of these utility evaluations by automatic means is a valid potential application.

A note regarding the effect of utility evaluations. It is important to mention one caveat before proceeding to describe research in this area. Park et al. [236] attempted to determine what the effect of review quality actually is on purchasing intention, running a study in which subjects engaged in hypothetical buying behavior. They found non-uniform effects: "low-involvement [i.e., motivated] consumers are affected by the quantity rather than the quality of reviews ... high-involvement consumers are affected by review quantity mainly when the review quality is high ... The effect of review quality on high-involvement consumers is more pronounced with a sizable number of reviews, whereas the effect of review quantity is significant even when the review quality is low." (More on the economic impacts of sentiment analysis is described in Section 6.)

5.2.4.1 Methods for Automatically Determining Review Quality

In a way, one could consider the review-quality determination problem as a type of readability assessment and apply essay-scoring techniques [19,99]. However, while some of the systems described below do try to take into account some readability-related features, they are tailored specifically to product reviews.

Kim et al. [161], Zhang and Varadarajan [328], and Ghose and Ipeirotis [106] attempt to automatically rank certain sets of reviews on the Amazon.com website according to their helpfulness or utility, using a regression formulation of the problem. The domains considered are a bit different: MP3 players and digital cameras in the first case; Canon electronics, engineering books, and PG-13 movies in the second case; and AV players plus digital cameras in the third case. Liu et al. [193] convert the problem into one of low-quality review detection (i.e., binary classification), experimenting mostly with manually (re-)annotated reviews of digital cameras, although CNet editorial ratings were also considered on the assumption that these can be considered trustworthy. Rubin and Liddy [261] also sketch a proposal to consider whether reviews can be considered credible.

Kim et al. [161] study which of a multitude of length-based, lexical, POS-count, product-aspect-mention count, and metadata features are most effective when utilizing SVM regression. The best feature combination turned out to be review length plus tf-idf scores for lemmatized unigrams in the review plus the number of "stars" the reviewer assigned to the product. Somewhat disappointingly, the best pair of features among these was the length of the review and the number of stars. (Using "number of stars" as the only feature yielded similar results to using just the deviation of the number of stars given by the particular reviewer from the average number of stars granted by all reviewers for the item.) The effectiveness of using all unigrams appears to subsume that of using a select subset, such as sentiment-bearing words from the General Inquirer lexicon [281].

Zhang and Varadarajan [328] use a different feature set. They employ a finer classification of lexical types, and more sources for subjective terms, but do not include any meta-data information. Interestingly, they also consider the similarity between the review in question and the product specification, on the premise that a good review should discuss many aspects of the product; and they include the review's similarity to editorial reviews, on the premise that editorial reviews represent high-quality examples of opinion-oriented text. (David and Pinch [70] observe, however, that editorial reviews for books are paid for and are meant to induce sales of
the book.) However, these latter two original features do not appear to enhance performance. The features that appear to contribute the most are the class of shallow syntactic features, which, the authors speculate, seem to characterize style; examples include counts of words, sentences, wh-words, comparatives and superlatives, proper nouns, etc. Review length seems to be very weakly correlated with utility score.

We thus see that Kim et al. [161] find that meta-data and very simple term statistics suffice, whereas Zhang and Varadarajan [328] observe that more sophisticated cues that appear correlated with linguistic aspects appear to be most important. Possibly, the difference is a result of the difference in domain choice: we speculate that book and movie reviews can involve more sophisticated language use than what is exhibited in reviews of electronics.

Declaring themselves influenced by prior work on creating subjectivity extracts [232], Ghose and Ipeirotis [106] take a different approach. They focus on the relationship between the subjectivity of a review and its helpfulness. The basis for measuring review subjectivity is as follows: using a classifier that outputs the probability of a sentence being subjective, one can compute for a given review the average subjectiveness-probability over all its sentences, or the standard deviation of the subjectivity scores of the sentences within the review. They found that both the standard deviation of the sentence subjectivity scores and a readability score (review length in characters divided by number of sentences) have a strongly statistically significant effect on utility evaluations, and that this is sometimes true of the average subjectiveness-probability as well. They then suggest on the basis of this and other evidence that it is extreme reviews that are considered to be most helpful, and develop a helpfulness predictor based on their analysis.

Liu et al. [193] considered features related to review and sentence length; brand, product and product-aspect mentions, with special consideration for appearances in review titles; sentence subjectivity and polarity; and "paragraph structure." This latter refers to paragraphs as delimited by automatically determined keywords. Interestingly, the technique of taking the 30 most frequent pairs of nouns or noun phrases that appear at the
beginning of a paragraph as keywords yields separator pairs such as "pros"/"cons," "strength"/"weakness," and "the upsides"/"downsides." (Note that this differs from identifying pro or con reasons themselves [157], or identifying the polarity of sentences. Note also that other authors have claimed that different techniques are needed for situations in which pro/con delimiters are mandated by the format imposed by a review aggregation site but a separate detailed textual description must also be included, as in Epinions, as opposed to settings where such delimiters need not be present or where all text is placed in the context of such delimiters [191].) Somewhat unconventionally with respect to other text-categorization work, the baseline was taken as SVMlight run with three sentence-level statistics as features; that is, the performance of a classifier trained using bag-of-word features is not reported. Given this unconventional starting point, the addition of the features that do not reflect subjectivity or sentiment helps. Including subjectivity and polarity on top of what has already been mentioned does not yield further improvement, and use of title-appearance for mentions did not seem to help.

Review- or opinion-spam detection, the identification of deliberately misleading reviews, is a line of work by Jindal and Liu ([141], short version available as Jindal and Liu [140]) in the same vein. One challenge these researchers faced was the difficulty in obtaining ground truth. Therefore, for experimental purposes they first re-framed the problem as one of trying to recognize duplicate reviews, since a priori it is hard to see why posting repeats of reviews is justified. (However, one potential problem with the assumption that repeated reviews constitute some sort of manipulation attempt, at least for the Amazon data that was considered, is that Amazon itself cross-posts reviews across different products, where "different" includes different instantiations (e.g., e-book vs. hardcover) or subsequent editions of the same item (Gueorgi Kossinets and Cristian Danescu Niculescu-Mizil, personal communication). Specifically, in a sample of over 1 million Amazon book reviews, about one-third were duplicates, but these were all due to Amazon's cross-posting. Human error (
e.g., accidentally hitting the "submit" button twice) causes other cases of non-malicious duplicates.) A second round of experiments attempted to identify "reviews on brands only," ads, and "other irrelevant reviews containing no opinions" (e.g., questions, answers, and random texts). Some of the features used were similar to those employed in the studies described above; others included features on the review author and the utility evaluations themselves. The overall message was that this kind of spam is relatively easy to detect.

5.2.4.2 Reviewer-Identity Considerations

In the above, we have discussed determining the quality of individual reviews. An alternate approach is to look at the quality of the reviewers; doing so can be thought of as a way of classifying all the reviews authored by the same person at once.

Interestingly, one study has found that there is a real economic effect to be observed when factoring in reviewer credibility: Gu et al. [114] note that a weighted average of message-board postings in which poster credibility is factored in has "prediction power over future abnormal returns of the stock," but if postings are weighted uniformly, the predictive power disappears.

There has been work in a number of areas in the human-language-technologies community that incorporates the authority, trustworthiness, influentialness, or credibility of authors [94,96,141,275]. PageRank [44,241] and hubs and authorities (also known as HITS) [163] are very influential examples of work in link analysis on identifying items of great importance. Trust metrics also appear in other work, such as research into peer-to-peer and reputation networks and information credibility [71,115,147,174,252].
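
The difference between a credibility-weighted and a uniformly weighted average of postings, as mentioned above in connection with Gu et al. [114], can be illustrated with a minimal Python sketch. This is not the method of that study: the function name, the numeric sentiment scale (-1 to 1), the credibility scores, and the sample postings are all assumptions made up for illustration.

```python
from typing import Iterable, Tuple


def average_sentiment(postings: Iterable[Tuple[float, float]],
                      use_credibility: bool = True) -> float:
    """Average sentiment over (sentiment, credibility) pairs.

    With use_credibility=False every posting gets weight 1 (uniform weighting).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for sentiment, credibility in postings:
        weight = credibility if use_credibility else 1.0
        weighted_sum += weight * sentiment
        total_weight += weight
    if total_weight == 0:
        raise ValueError("total weight is zero; no average is defined")
    return weighted_sum / total_weight


# Hypothetical postings: one positive posting from a highly credible poster,
# two negative postings from posters with low credibility.
postings = [(1.0, 0.9), (-1.0, 0.1), (-1.0, 0.1)]
credibility_weighted = average_sentiment(postings)
uniform = average_sentiment(postings, use_credibility=False)
```

On this invented data the two averages have opposite signs, which shows how the choice of weighting alone can change the signal extracted from the same set of postings.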
6 Broader Implications

Sentiment is the mightiest force in civilization
J. Ellen Foster, What America Owes to Women, 1893

As we have seen, sentiment-analysis technologies have many potential applications. In this section, we briefly discuss some of the larger implications that the existence of opinion-oriented information-access services has.

One point that should be mentioned is that applications that gather data about people's preferences can trigger concerns about privacy violations. We suspect that in many people's minds, having one's public blog scanned by a coffee company for positive mentions of its product is one thing; having one's cell-phone conversations monitored by the ruling party of one's own country for negative mentions of government officials is quite another. It is not our intent to comment further here on privacy issues, these not being issues on which we are qualified to speak; rather, we simply want to be thorough by reminding the reader that these issues do exist and are important, and that these concerns apply to all data-mining technologies in general.

But even if we restrict attention to the apparently fairly harmless domain of business intelligence, certain questions regarding the potential for manipulation do arise. Companies already participate in managing online perceptions as part of the normal course of public-relations efforts:

... companies can't control consumer-generated content. They can, however, pay close attention to it. In many cases, often to a large degree, they can even influence it. In fact, in a survey conducted by Aberdeen [of "more than 250 enterprises using social media monitoring and analysis solutions in a diverse set of enterprises"], more than twice as many companies with social media monitoring capabilities actively contribute to consumer conversations than remain passive observers (67% versus 33%). Over a third of all companies (39%) contribute to online conversations on a frequent basis, interacting with consumers in an effort to sway opinion, correct misinformation, solicit feedback, reward loyalty, test new ideas, or for any number of other reasons.
Zabin and Jefferies [327]

And it is also the case that some arguably mild forms of manipulation have been suggested. For instance, one set of authors, in studying the strategic implications for a company of offering online consumer reviews, notes that "if it is possible for the seller to decide the timing to offer consumer reviews at the individual product level, it may not always be optimal to offer consumer reviews at a very early stage of new product introduction, even if such reviews are available" ([57], quotation from the July 2004 working-paper version), and others have worked on a manufacturer-oriented system that ranks reviews "according to their expected effect on sales," noting that these might not be the ones that are considered to be most helpful to users [106].

But still, there are concerns that corporations might try to further "game the system" by taking advantage of knowledge of how ranking systems work in order to suppress negative publicity [124] or engage in other so-called "black-hat search engine optimization" and related activities. Indeed, there has already been a term, "sock puppet," coined to refer to ostensibly distinct online identities created to give the false impression of external support for a position or opinion; Stone and Richtel [280] list several rather attention-grabbing examples of well-known writers and CEOs engaging in sock-puppetry. On a related note, Das and Chen [67] recommend Leinweber and Madhavan [183] as an interesting review of the history of market manipulation through disinformation.

One reason these potentials for abuse are relevant to this survey is that, as pointed out earlier in the Introduction, sentiment-analysis technologies allow users to consult many people who are unknown to them; but this means precisely that it is harder for users to evaluate the trustworthiness of those people (or "people") they are consulting. Thus, opinion-mining systems might potentially make it easier for users to be misled by malicious entities, a problem that designers of such systems might wish to prevent. On the flip side, an information-access system that is (perhaps unfairly) perceived to be vulnerable to manipulation is one that is unlikely to be widely used; thus, again, builders of such systems might wish to take measures to make it difficult to game the system.

In the remainder of this section, then, we discuss several aspects of the problem of possible manipulation of reputation. In particular, we look at evidence as to whether reviews have a demonstrable economic impact: if reviews do significantly affect customer purchases, then there is arguably an economic incentive for companies to engage in untoward measures to manipulate public perception; if reviews do not significantly affect customer purchases, then there is little reason, from an economic point of view, for entities to try to artificially change the output of sentiment-analysis systems; or, as Dewally [74] asserts, "the stock market does not appear to react to these recommendations ... The fears raised by the media about the destabilizing power of such traders who participate in these discussions are thus groundless." If such claims are true, then it would seem that trying to manipulate perceptions conveyed by online review-access systems would offer little advantage to companies, and so they would not engage in it.

6.1 Economic Impact of Reviews

As mentioned earlier in the Introduction to this survey, many readers of online reviews say that these reviews significantly influence their purchasing decisions [63]. However, while these readers may have believed that they were "significantly influenced," perception and reality can differ. A key reason to understand the real economic impact of reviews is that the results of such an analysis have important implications for how much effort companies might or should want to expend on online reputation monitoring and management.

Given the rise of online commerce, it is not surprising that a body of work centered within the economics and marketing literature studies the question of whether the polarity (often referred to as "valence") and/or volume of reviews available online have a measurable, significant influence on actual consumer purchasing. Ever since the classic "market for lemons" paper [6] demonstrating some problems for makers of high-quality goods, economists have looked at the value of maintaining a good reputation as a means to overcome these problems [77,162,268,269], among other strategies. (See the introduction to Dewally and Ederington [75], from which the above references have been taken, for a brief review.) One way to acquire a good reputation is, of course, by receiving many positive reviews of oneself as a merchant; another is for the products one offers to receive many positive reviews. For the purposes of our discussion, we regard experiments wherein the buying is hypothetical as being out of scope; instead, we focus on economic analyses of the behavior of people engaged in real shopping and spending real money.

(Note that researchers in the economics community have a tradition of circulating and revising working papers, sometimes for years, before producing an archival version. In the references that follow, we have cited the archival version when journal-version publication data has been available to us, in order to enable the interested reader to access the final, peer-reviewed version of the work. But because of this policy, the reader who wishes to delve into this literature further should keep in mind the following two points. First, many citations within the literature are to preliminary working papers. This means that our citations may not precisely match those given in the papers themselves (e.g., there may be title mismatches). Second, work that was done earlier may be cited with a later publication date; therefore, the dates given in our citations should not be taken to indicate research precedence.)

The general form that most studies take is to use some form of hedonic regression [259] to analyze the value and the significance of different item features to some function, such as a measure of utility to the customer, using previously recorded data. (Exceptions include Resnick et al. [253], who ran an empirical experiment creating "new" sellers on eBay, and Jin and Kato [136], who made actual purchases to validate seller claims.) Specific economic functions that have been examined include revenue (box-office take, sales rank on Amazon, etc.), revenue growth, stock trading volume, and measures that auction sites like eBay make available, such as bid price or probability of a bid or sale being made. The type of product considered varies (although, understandably, those offered by eBay and Amazon have received more attention): examples include books, collectible coins, movies, craft beer, stocks, and used cars. It is important to note that some conclusions drawn from one domain often do not carry over to another; for instance, reviews seem to be influential for big-ticket items but less so for cheaper items. But there are also conflicting findings within the same domain. Moreover, different subsegments of the consumer population may react differently: for example, people who are more highly motivated to purchase may take ratings more seriously. Additionally, in some studies, positive ratings have an effect but negative ones do not, and in other studies the opposite effect is seen; the timing of such feedback and various characteristics of the merchant or of the feedback itself (e.g., volume) may also be a factor.

Nonetheless, to gloss over many details for the sake of brevity: if one allows any effect (including correlation even if said correlation is shown to be not predictive) that passes a statistical significance test at the 0.05 level to be classed as "significant," then many studies find that review polarity has a significant economic effect [13,14,23,31,35,47,59,62,68,72,75,76,81,82,128,136,145,180,195,196,198,207,208,214,237,250,253,278,297,331]. But there are a few studies that conclude emphatically that review positivity or negativity has no significant economic effect [56,74,80,87,100,194,325]. Duan et al. [80] explicitly relate their findings to the issue of corporate manipulation: "From the managerial perspective, we show that consumers are rational in inferring movie quality from online user reviews without being unduly influenced by the rating, thus presenting a challenge to businesses that try to influence sales through 'planting' online word-of-mouth."

With respect to effects that have been found, the literature survey contained in Resnick et al. [253] states that

At the larger end of effect sizes for positive evaluations, the model in [Livingston [196]] finds that sellers with more than 675 positive comments earned a premium of $45.76, more than 10% of the mean selling price, as compared to new sellers with no feedback. ... At the larger end of effect sizes for negatives, [Lucking-Reiley et al. [198]], looking at collectible coins, finds that a move from 2 to 3 negatives cuts the price by 11%, about $19 from a mean price of $173.

But in general, the claims of statistically significant effects that have been made tend to be (a) qualified by a number of important caveats, and (b) quite small in absolute terms per item, although on the other hand again, small effects per item can add up when many items are involved. With regard to this discussion, the following excerpt from Houser and Wooders [128] is perhaps illuminating:

... on average, 3.46 percent of sales is attributable to the seller's positive reputation stock. Similarly, our estimates imply that the average cost to sellers stemming from neutral or negative reputation scores is $2.28, or 0.93 percent of the final sales price. If these percentages are applied to all of eBay's auctions ($1.6 billion in the fourth quarter of 2000), this would imply that sellers' positive reputations added more than $55 million to the value of sales, while non-positives reduced sales by about $15 million.

Ignoring for the moment the fact that, as mentioned above, other papers report differing or even opposite findings, we simply note that the choice of whether to focus on "0.93%," "$2.28," or "$55 million" (and whether to view the latter amount as seeming particularly large or not) is one we prefer to leave to the reader.

Let us now mention some particular papers and findings of particular interest.

6.1.1 Surveys Summarizing Relevant Economic Literature

Resnick et al. [253] and Bajari and Hortaçsu [24] are good entry points into this body of literature. They provide very thorough overviews and discussion of the methodological issues underlying the studies mentioned above. Hankin [118] supplies several visual summaries that are modeled after the literature-comparison tables in Dellarocas [71], Resnick et al. [253], and Bajari and Hortaçsu [24]. A list of a number of papers on the general concept of sentiment in behavioral finance can be found at http://sentiment.behaviouralfinance.net/.

6.1.2 Economic-Impact Studies Employing Automated Text Analysis

In most of the studies cited above, the orientation of a review was derived from an explicit rating indication such as number of stars, but a few studies applied manual or automatic sentiment classification to review text [13,14,35,47,67,68,214,237].

At least one related set of studies claims that "the text of the reviews contains information that influences the behavior of the consumers, and that the numeric ratings alone cannot capture the information in the text" [106]; see also Ghose et al. [107], who additionally attempt to assign a "dollar value" to various adjective-noun pairs, adverb-verb pairs, or similar lexical configurations. In a related vein, Pavlou and Dimoka [237] suggest that "the apparent success of feedback mechanisms to facilitate transactions among strangers does not mainly come from their crude numerical ratings, but rath
erfromtheirrichfeed-backtextcomments.ŽAlso,ChevalierandMayzlin[59]interprettheir“ndingsontheeectofreviewlengthasprovidingsomeevidencethatpeopledoreadthereviewsratherthansimplyrelyingonnumerical BroaderImplicationsOntheotherhand,CabralandHorta¸csu[47],inaninterestingexperiment,lookat41oddcasesoffeedbackonsellerspostedoneBay:whatwasunusualwasthatthefeedbacktextwasclearlypositive,butthenumericalratingwasnegative(presumablyduetousererror).Anal-ysisrevealsthatthesereviewshaveastronglysigni“cant(botheco-nomicallyandstatisticallyŽ)detrimentaleectonsalesgrowthrate„indicatingthatcustomersseemedtoignorethetextinfavoroftheincorrectsummaryinformation.Insomeofthesetext-basedstudies,whatwasanalyzedwasnotsentimentpersebutthedegreeofpolarization(disagreement)amongasetofopinionateddocuments[13,68]or,inspiredinpartbyPangandLee[233],theaverageprobabilityofasentencebeingsubjectivewithinagivenreview[106].GhoseandIpeirotis[106]alsotakeintoaccountthestandarddeviationforsentencesubjectivitywithinareview,inordertoexaminewhetherreviewscontainingamixofsubjectiveandobjectivesentencesseemtohavedierenteectsfromreviewsthataremostlypurelysubjectiveorpurelyobjective.Someinitiallyunexpectedtexteectsareoccasionallyreported.Forexample,Archaketal.[14]foundthatamazingcamera,Žexcellentcamera,Žandrelatedphraseshaveanegativeeectondemand.Theyhypothesizethatconsumersconsidersuchphrases,especiallyiffewdetailsaresubsequentlyfurnishedinthereview,toindicatehyperboleandhenceviewthereviewitselfasuntrustworthy.Similarly,Archaketal.[14]andGhoseetal.[107]discoverthatapparentlypositivecom-mentslikedecentqualityŽorgoodpackagingŽalsohadanegativeeect,andhypothesizethattheveryfactthatmanyreviewscontainhyperboliclanguagemeanthatwordslikedecentŽareinterpretedaslukewarm.These“ndingsmightseempertinenttothedistinctionbetweenthepolarityandthecontextualpolarityoftermsandphrases,bor-rowingtheterminologyofWilsonetal.[319].Priorpolarityreferstothesentimentatermevokesinisolation,asopposedtothesentimentthetermevokeswithinaparti
cularsurroundingcontext;PolanyiandZaenen[242]pointoutthatidentifyingpriorpolarityalonemaynotsuf-“ce.Withrespecttothisdistinction,thestatusoftheobservationsofArchaketal.[14]justmentionedisnotentirelyclear.Thesuperlatives 6.2ImplicationsforManipulation(amazingŽ)areclearlyintendedtoconveypositivesentimentregard-lessofwhetherthereviewauthorsactuallymanagedtoconvinceread-ers;thatis,contextisonlyneededtoexplaintheeconomiceectofloweredsales,nottheinterpretationofthereviewitself.Inthecaseofwordslikedecent,Žonecouldpotentiallymakethecasethatthepriororientationofsuchwordsisinfactneutralratherthanpositive;butalternatively,onecouldargueinsteadthatinasettingwheremanyreviewsarehighlyenthusiastic,thecontextualorientationofdecentŽisindeeddierentfromitspriororientation.6.1.3InteractionswithWordofMouth(WOM)Onefactorthatsomestudiespointoutisthatthenumberofreviews,positiveornegative,maysimplyre”ectwordofmouth,Žsothatinsomecases,whatisreallytheunderlyingcorrelative(ifany)ofeconomicimpactisnottheamountofpositivefeedbackpersebutmerelytheamountoffeedbackintotal.Thisexplainswhyinsomesettings(butnotall),negativefeedbackisseentoincreaseŽsales:theincreasedbuzzŽbringsmoreattentiontotheproduct(orperhapssimplyindicatesmoreattentionisbeingpaidtotheproduct,inwhichcaseitwouldnotbepredictiveperse).6.2ImplicationsforManipulationRegardingtheincentivesformanipulation,itisdiculttodrawacon-clusiononewayortheotherfromthestudieswehavejustexamined.Onecautiouswaytoreadtheresultssummarizedintheprevioussectionisasfollows.Whiletheremaybesomeeconomicbene“tinsomesettingsforacorporationtoplantpositivereviewsorotherwiseattempttouseuntowardmeanstomanufactureanarti“ciallyin”atedreputationorsuppressnegativeinformation,itseemsthatingeneral,agreatdealofeortandresourceswouldberequiredtodosoforperhapsfairlymarginalreturns.Moreworkisclearlyrequired,though;asBajariandHorta¸csu[24]conclude,Thereisstillplentyofworktobedonetounderstandhowmarketparticipantsutilizetheinformationcontainedinthefeedbackforumsystem.ŽSurveyingt
hestateoftheartinthissubjectisbeyondthescopeofthissurvey;afairlyconcise BroaderImplicationsreviewofissuesregardingonlinereputationsystemsmaybefoundinDellarocas[71].Wewouldliketoconclude,though,bypointingoutaresultthatindicatesthatevenifillegitimatereviewsdogetthrough,opinion-miningsystemscanstillbevaluabletoconsumers.AwerbuchandKleinberg[22]studythecompetitivecollaborativelearningŽsettinginwhichsomeoftheusersareassumedtobeByzantineŽ(mali-cious,dishonest,coordinated,andabletoeavesdroponcommunica-tions),andproductorresourcequalityvariesovertime.Theauthorsformulatetheproductselectionproblemasatypeofmulti-armedban-ditŽproblem.Theyshowthestrikingresultthatevenifonlyaconstantfractionofusersarehonestand(unbeknownsttothem)groupedintomarketsegmentssuchthatallmembersofablocksharethesameproductpreferences„withtheimplicationthattherecommendationsofanhonestusermaybeuselesstohonestusersindierentmarketsegments„thenthereisstillanalgorithmbywhich,intimepolyno-mialinlog(),theaverageregretperhonestuserisarbitrarilysmall(assumingthatthenumberofproductsorresourcesonoerisRoughlyspeaking,thealgorithmcausesuserstotendtoraisetheprob-abilityofgettingrecommendationsfromvaluablesources.Thus,eveninthefaceofratherstioddsandformidableadversaries,honestuserscan„atleastintheory„stillgetgoodadvicefromsentiment-analysis 7 PubliclyAvailableResources 
7.1 Datasets

7.1.1 Acquiring Labels for Data

One source of opinion, sentiment, and subjectivity labels is, of course, manual annotation [172, 309]. However, researchers in the field have also managed to find ways to avoid manual annotation by leveraging pre-existing resources. A common technique is to use labels that have been manually assigned, but not by the experimenters themselves; this explains why researchers in opinion mining and sentiment analysis have taken advantage of Rotten Tomatoes, Epinions, Amazon, and other sites where users furnish ratings along with their reviews. Some other noteworthy techniques are as follows:

• Sentiment summaries can be gathered by treating the review snippets that Rotten Tomatoes furnishes as one-sentence summaries [33].
• Subjective vs. non-subjective texts on the same topic can be gathered by selecting editorials versus non-editorial newswire [308, 326] or by selecting movie reviews versus plot summaries [222, 232].
• If sentiment-oriented search engines already exist (one example used to be Opinmind), then one can issue topical queries to such search engines and harvest the results to get sentiment-bearing sentences more or less guaranteed to be on-topic [206]. (On the other hand, there is something circular about this approach, since it bootstraps off of someone else's solution to the opinion-mining problem.)
• One might be able to derive affect labels from emoticons.
• Text polarity may be inferred from correlations with stock-market behavior or other economic indicators [168, 107].
• Viewpoint labels can be derived from images of party logos that users display [160].
• Negative opinions can be gathered by assuming that when one newsgroup post cites another, it is typically done to indicate negative sentiment toward the cited post [4]. A more refined approach takes into account indications of "shouting," such as text rendered all in capital letters [110].

One point to mention with regard to sites where users rate the contributions of other users, such as the examples of Amazon and Epinions mentioned above, is a potential bias toward positive scores [59, 74, 128, 132, 240, 253], as we have mentioned above. In some cases, this comes about because of sociological effects. For example, Pinch and Athanasiades [240], in a study of a music-oriented site called ACIDplanet, found that various forces tend to cause users to give high ratings to each other's music. The users themselves refer to this phenomenon as "R=R" (review me and I will review you), among other, less polite, names, and the ACIDplanet administrators introduced a form of anonymous reviewing to avoid this issue in certain scenarios.

Thus, there is the question of whether one can trust the automatically determined labels that one is training one's classifiers upon. (After all, you often get what you pay for, as they say.) Indeed, Liu et al. [193] essentially re-labeled their review-quality Amazon data due to concerns about bias, as discussed in Section 5.2.4. On the other hand, while this phenomenon implies that reviewers may not always be sincere, we hypothesize that this phenomenon does not greatly affect the quality of the authors' meta-data labels at reflecting the intended sentiment of the review itself. That is, we hypothesize that in many cases one can still trust the review's label, even if one does not trust the review.

7.1.2 An Annotated List of Datasets

The following list is in alphabetical order.

Blog06 [registration and fee required]
The University of Glasgow distributes this 25GB TREC test collection, consisting of blog posts over a range of topics. Access information is available at http://ir.dcs.gla.ac.uk/test_collections/access_to_data.html. Included in the dataset are "top blogs" that were "provided by Nielsen BuzzMetrics and supplemented by the University of Amsterdam" [227], and some spam blogs, also known as "splogs," that were planted in the corpus in order to simulate a more realistic setting. Assessments include relevance judgments and labels as to whether posts contain relevant opinions and what the polarity of the opinions was (positive, negative, or a mixture of both). Macdonald and Ounis [199] give more details on the creation of the corpus and the collection's features, and include some comparison with another collection of blog postings, the BlogPulse dataset (contact information can be found on the following agreement form: http://www.blogpulse.com/www2006-workshop/datashare-agreement.pdf, but it may be out of date).

Congressional floor-debate transcripts: http://www.cs.cornell.edu/home/llee/data/convote.html
This dataset, first introduced in Thomas et al. [294], includes speeches as individual documents together with:

• Automatically derived labels for whether the speaker supported or opposed the legislation discussed in the debate the speech appears in, allowing for experiments with this kind of sentiment analysis.
• Indications of which "debate" each speech comes from, allowing for consideration of conversational structure.
• Indications of by-name references between speakers, allowing for experiments on agreement classification if one assigns gold-standard agreement labels from the support/oppose labels assigned to the pair of speakers in question.
• The edge weights and other information derived to create the graphs used in Thomas et al. [294], facilitating implementation of alternative graph-based methods upon the graphs constructed in that earlier work.

Cornell movie-review datasets: http://www.cs.cornell.edu/people/pabo/movie-review-data/
These corpora, first introduced in Pang and Lee [232, 233], consist of the following datasets, which include automatically derived labels.

• Sentiment polarity datasets:
  - document-level: polarity dataset v2.0: 1000 positive and 1000 negative processed reviews. (An earlier version of this dataset (v1.0) was first introduced in Pang et al. [235].)
  - sentence-level: sentence polarity dataset v1.0: 5331 positive and 5331 negative processed sentences/snippets.
• Sentiment-scale datasets: scale dataset v1.0: a collection of documents whose labels come from a rating scale.
• Subjectivity dataset v1.0: 5000 subjective and 5000 objective processed sentences.

We should point out that the existence of the polarity-based datasets does not indicate that the curators (i.e., us) believe that reviews with middling ratings are not important to consider in practice (indeed, the sentiment-scale corpora contain such documents). Rather, the rationale in creating the polarity dataset was as follows. At the time the corpus creation was begun, the application of machine learning techniques to sentiment classification was very new, and, as discussed in Section 3, it was natural to assume that the problem could be very challenging to such techniques. Therefore, the polarity corpus was constructed to be as "easy" for text-categorization techniques as possible: the documents fell into one of two well-separated and size-balanced categories. The point was, then, to use this corpus as a lens to study the relative difficulty of sentiment polarity classification as compared to standard topic-based classification, where two-balanced-class problems with well-separated categories pose very little challenge.

A list of papers that use or report performance on the Cornell movie-review datasets can be found at http://www.cs.cornell.edu/people/pabo/movie-review-data/otherexperiments.html.

Customer review datasets: http://www.cs.uic.edu/
This dataset, introduced in Hu and Liu [129], consists of reviews of five electronics products downloaded from Amazon and Cnet. The sentences have been manually labeled as to whether an opinion is expressed, and if so, what feature from a pre-defined list is being evaluated. An addendum with nine products is also available (http://www.cs.uic.edu/liub/FBS/Reviews-9-products.rar) and has been utilized in recent work [78]. The curator, Bing Liu, also distributes a comparative-sentence dataset that is available by request.

Economining: http://economining.stern.nyu.edu/datasets.html
This site, hosted by the Stern School at New York University, consists of three sets of data:

• Transactions and price premiums.
• Feedback postings for merchants at Amazon.com.
• Automatically derived sentiment scores for frequent evaluation phrases at Amazon.com.
These formed the basis for the work reported in Ghose et al. [107], which focuses on interactions between sentiment, subjectivity, and economic indicators.

French sentences: http://www.psor.ucl.ac.be/personal/yb/Resource.html
This dataset, introduced in Bestgen et al. [36], consists of 702 sentences from a Belgian-French newspaper, with labels assigned by ten judges as to unpleasant, neutral or pleasant content, using a seven-point scale.

MPQA Corpus: http://www.cs.pitt.edu/mpqa/databaserelease/
The MPQA Opinion Corpus contains 535 news articles from a wide variety of news sources, manually annotated at the sentential and sub-sentential level for opinions and other private states (i.e., beliefs, emotions, sentiments, speculations, and so on). Wiebe et al. [309] describe the overall annotation scheme; Wilson et al. [319] describe the contextual polarity annotations and an agreement study.

Multiple-aspect restaurant reviews: http://people.csail.mit.edu/bsnyder/naacl07
The corpus, introduced in Snyder and Barzilay [272], consists of 4,488 reviews, both in raw-text and in feature-vector form. Each review gives an explicit 1-to-5 rating for five different aspects (food, ambiance, service, value, and overall experience) along with the text of the review itself, all provided by the review author. A rating of five was the most common over all aspects, and Snyder and Barzilay [272] report that 30.5% of the 3,488 reviews in their randomly selected training set had a rating of five for all five aspects, although no other tuple of ratings was represented by more than 5% of the training set. The code used in Snyder and Barzilay [272] is also distributed at the aforementioned URL. The original source for the reviews was http://www.we8there.com/; data from the same website was also used by Higashinaka et al. [122].

Multi-Domain Sentiment Dataset: http://www.cis.upenn.edu/mdredze/datasets/sentiment/
This dataset, introduced in Blitzer et al. [40], consists of product reviews from several different product types taken from Amazon.com, some with 1-to-5 star labels, some unlabeled.

NTCIR multilingual corpus [registration required]
The corpus for the NTCIR-6 pilot task consists of news articles in Japanese, Chinese, and English and formed the basis of the Opinion Analysis Task at NTCIR-6 [267]. The training data contains annotations regarding opinion holders, the opinions held by each opinion holder, and sentiment polarity, as well as relevance information for a set of pre-determined topics. The corpus of the NTCIR Multilingual Opinion-Analysis Task (MOAT) is drawn from Japanese, Chinese, and English blogs.

Review-search results sets: http://www.cs.cornell.edu/home/llee/data/search-subj.html
This corpus, used by Pang and Lee [234], consists of the top 20 results returned by the Yahoo! search engine in response to each of a set of 69 queries containing the word "review." The queries were drawn from the publicly available list of real MSN users' queries released for the 2005 KDD Cup competition [185]; the KDD data itself is available at http://www.acm.org/sigs/sigkdd/kdd2005/Labeled800Queries.zip. The search-engine results in the corpus are annotated as to whether they are subjective or not. Note that "sales pitches" were marked objective on the premise that they represent biased reviews that users might wish to avoid seeing.

7.2 Evaluation Campaigns

7.2.1 TREC Opinion-Related Competitions

The "TREC-BLOG" wiki, http://ir.dcs.gla.ac.uk/wiki/TREC-BLOG/, is a useful source of information on the competitions sketched below.

TREC 2006 Blog Track. TREC 2006 involved a Blog track, with an opinion retrieval task designed precisely to focus on the opinionated character that many blogs have: participating systems had to retrieve blog posts expressing an opinion about a specified topic. Fourteen groups participated; Ounis et al. [227] give an overview of the results.

Some findings are as follows. With respect to performance on opinion detection, the participating systems seemed to fall into two groups. Opinion-detection ability and relevance-determination ability seemed to be strongly correlated. While the best systems were about equally good at detecting negative sentiment as positive sentiment, systems performing at the median seemed to be a bit more effective at locating documents with negative sentiment. Most participants followed a pipelined approach, where first topic relevance was tackled, and then opinion detection was applied upon the results. Perhaps the most surprising observation was that the organizers discovered that it was possible to achieve very good relative performance by omitting the second phase of the pipeline; but we take heart in the fact that the field is still relatively young and has room to grow and mature.

TREC 2007 Blog Track. The TREC 2007 Blog track retained the opinion retrieval task and instituted determining the sentiment status (positive, negative, or mixed) of the retrieved opinions as a subtask.

The 2007 and 2006 Blog Track results are analyzed in Ounis et al. [228]. They found that lexicon-based approaches, either where the discriminativeness of terms was determined on labeled training data or where the terms were manually compiled, constituted the main effective approaches.

TREC 2008 Blog Track. In the TREC 2008 Blog track, the polarity-identification problem was re-posed as one of ranking of positive-polarity retrieved documents by degree of positivity, and, similarly, ranking of negative-polarity retrieved documents by degree of negativity. ("Mixed opinionated documents" were not to be included in these rankings.)

7.2.2 NTCIR Opinion-Related Competitions

The National Institute of Informatics (NII) runs annual meetings code-named NTCIR (NII Test Collection for Information Retrieval Systems). Opinion analysis was featured at an NTCIR-5 workshop, and served as a pilot task at NTCIR-6 and a full-blown task at NTCIR-7.
NTCIR-6 opinion analysis pilot task. The dataset consists of newswire documents in Chinese, Japanese, and English; the organizers describe this as "what we believe to be the first multilingual opinion analysis dataset over comparable data" [93]. The four constituent tasks, intentionally designed to be fairly simple so as to encourage participation from many groups, were as follows:

• Detection of opinionated sentences.
• Detection of opinion holders.
• (optional) Polarity labeling of opinionated sentences as positive, negative, or neutral.
• (optional) Detection of sentences relevant to a given topic.

Due to variation in annotator labelings, two evaluation standards were defined. In the strict evaluation, an answer is considered correct if all three annotators agreed on it. In the lenient evaluation, only a majority (i.e., two) of the annotators were required to agree with an answer for it to be considered correct.

Seki et al. [267] give an overview and the results of this evaluation exercise, noting that differences between languages make direct comparison difficult, especially since precision and recall were defined (slightly) differently across languages. A shortened version of this overview also exists [93].

NTCIR-7 Multilingual opinion analysis task (MOAT), 2008. Subsequent to the NTCIR-6 pilot task, a new dataset was selected, drawn from blogs in Japanese, traditional and simplified Chinese, and English; according to the organizers, "We plan to select and balance useful topics for opinion mining researchers, such as topics concerning product reviews, movie reviews, and so on." This exercise involves six subtasks:

• Detection of opinionated sentences and opinion fragments within opinionated sentences.
• Polarity labeling of opinion fragments as positive, negative or neutral.
• (optional) Strength labeling of opinion fragments as very weak, average, or very strong.
• (optional) Detection of opinion holders.
• (optional) Detection of opinion targets.
• (optional) Detection of sentences that are relevant to a given topic.

As in the previous competition, both strict and lenient evaluation standards are to be applied.

OpQA Corpus [available by request]
Stoyanov et al. [283] describe the construction of this corpus, which is a collection of opinion questions and answers together with 98 documents selected from the MPQA dataset.

7.3 Lexical Resources

The following list is in alphabetical order.

General Inquirer: http://www.wjh.harvard.edu/
This site provides entry points to various resources associated with the General Inquirer [281]. Included are manually classified terms labeled with various types of positive or negative semantic orientation, and words having to do with agreement or disagreement.

NTU Sentiment Dictionary [registration required]
This sentiment dictionary listing the polarities of many Chinese words was developed by a combination of automated and manual means [171]. A registration form for acquiring it is available at http://nlg18.csie.ntu.edu.tw:8080/opinion/userform.jsp.

OpinionFinder's Subjectivity Lexicon: http://www.cs.pitt.edu/mpqa/
The list of subjectivity clues that is part of OpinionFinder is available for download. These clues were compiled from several sources, representing several years of effort, and were used in Wilson et al. [319].
SentiWordnet: http://sentiwordnet.isti.cnr.it/
SentiWordnet [91] is a lexical resource for opinion mining. Each synset of WordNet [95], a publicly available thesaurus-like resource, is assigned three sentiment scores (positive, negative, and objective), where these scores were automatically generated using a semi-supervised method described in Esuli and Sebastiani [90].

Taboada and Grieve's Turney adjective list [available through the Yahoo! sentimentAI group]
Reported are the semantic-orientation values according to the method proposed by Turney [298] for 1700 adjectives.

7.4 Tutorials, Bibliographies, and Other References

Bing Liu has a chapter on opinion mining in his book on Web data mining [190]. Slides for that chapter are available at http://www.cs.uic.edu/liub/teach/cs583-spring-07/opinion-mining.pdf.

Slides for Janyce Wiebe's tutorial, "Semantics, opinion, and sentiment in text," at the EUROLAN 2007 Summer School are available at http://www.cs.pitt.edu/wiebe/pubs/papers/EUROLAN07/eurolan07wiebe.ppt.

The following are online bibliographies that contain information in BibTeX format:

• http://www.cs.cornell.edu/home/llee/opinion-mining-sentiment-analysis-survey.html, the main website for this survey,
• http://liinwww.ira.uka.de/bibliography/Misc/Sentiment.html, maintained by Andrea Esuli,
• http://research.microsoft.com/jtsun/OpinionMiningPaperList.html, maintained by Jian-Tao Sun,
• http://www.cs.pitt.edu/wiebe/pubs/papers/EUROLAN07/eurolan07bib.html with actual .bib file at http://www.cs.pitt.edu/wiebe/pubs/papers/EUROLAN07/eurolan07.bib, maintained by Janyce Wiebe.

Esuli and Wiebe's sites have additional search capabilities.

Members of the Yahoo! group "sentimentAI" (http://tech.groups.yahoo.com/group/SentimentAI/) have access to the resources that have been contributed there (such as some links to corpora and papers) and are subscribed to the associated mailing list. Joining is free.
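As a concrete illustration of the strict and lenient evaluation standards described for the NTCIR competitions in Section 7.2.2, the following is a minimal Python sketch. It is not the organizers' scoring code: it assumes that every system answer is compared against one label per annotator, and it reports plain accuracy, whereas the campaigns themselves reported precision and recall. The function names `is_correct` and `accuracy`, and the toy labels, are hypothetical.

```python
from typing import Sequence


def is_correct(answer: str, annotator_labels: Sequence[str],
               standard: str = "lenient") -> bool:
    """Judge one system answer against the labels of several annotators.

    strict:  every annotator must have given the same label as the answer.
    lenient: more than half of the annotators must have given that label
             (two out of three in the three-annotator setting).
    """
    if not annotator_labels:
        raise ValueError("at least one annotator label is required")
    votes = sum(1 for label in annotator_labels if label == answer)
    if standard == "strict":
        return votes == len(annotator_labels)
    if standard == "lenient":
        return 2 * votes > len(annotator_labels)
    raise ValueError(f"unknown standard: {standard!r}")


def accuracy(answers: Sequence[str], gold: Sequence[Sequence[str]],
             standard: str = "lenient") -> float:
    """Fraction of answers judged correct under the chosen standard."""
    judged = [is_correct(a, labels, standard) for a, labels in zip(answers, gold)]
    return sum(judged) / len(judged)


# Hypothetical toy data: three annotators per sentence.
gold = [
    ["positive", "positive", "positive"],  # unanimous
    ["positive", "positive", "negative"],  # majority only
    ["neutral", "negative", "positive"],   # no agreement
]
answers = ["positive", "positive", "neutral"]

print(accuracy(answers, gold, "strict"))   # 1 of the 3 answers counts as correct
print(accuracy(answers, gold, "lenient"))  # 2 of the 3 answers count as correct
```

On this toy data the second answer is counted only under the lenient standard, which is the kind of gap between the two standards that annotator disagreement produces.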
8 Concluding Remarks

    When asked how he knew a piece was finished, he responded, "When the dinner bell rings."
    - apocryphal anecdote about Alexander Calder

Our goal in this survey has been to cover techniques and approaches that promise to directly enable opinion-oriented information-seeking systems, and to convey to the reader a sense of our excitement about the intellectual richness and breadth of the area. We very much encourage the reader to take up the many open challenges that remain, and hope we have provided some resources that will prove helpful in this regard.

On the topic of resources: we have already indicated above that the bibliographic database used in this survey is publicly available. In fact, the URL mentioned above, http://www.cs.cornell.edu/home/llee/opinion-mining-sentiment-analysis-survey.html, is our personally maintained homepage for this survey. Any subsequent editions or versions of this survey that may be produced, or related news, will be announced there.*

* Indeed, we have vague aspirations to producing a "director's cut" one day. We certainly have accumulated some number of outtakes: we did not manage to find a way to work some variant of "Once more, with feeling" into the title, or to find a place for the heading "Sentiment of a woman," or to formally prove a potential undecidability result for subjectivity detection (Jon Kleinberg, personal communication) based on reviews of Brotherhood of the Wolf ("it's the best darned French werewolf kung-fu movie I've ever seen").

Speaking of resources, we have drawn considerably on those of many others during the course of this work. We thus have a number of sincere acknowledgments to make.

This survey is based upon work supported in part by the National Science Foundation under grant no. IIS-0329064, a Cornell University Provost's Award for Distinguished Scholarship, a Yahoo! Research Alliance gift, and an Alfred P. Sloan Research Fellowship. Any opinions, findings, and conclusions or recommendations expressed are those of the authors and do not necessarily reflect the views or official policies, either expressed or implied, of any sponsoring institutions, the US government, or any other entity.

We would like to wholeheartedly thank the anonymous referees, who provided outstanding feedback astonishingly quickly. Their insights contributed immensely to the final form of this survey on many levels. It is hard to describe our level of gratitude to them for their time and their wisdom, except to say this: we have, in various capacities, seen many examples of reviewing in the community, but this is the best we have ever encountered.

We also thank Eric Breck for his careful reading of and commentary on portions of this survey. All remaining errors and faults are, of course, our own.

We are also very thankful to Fabrizio Sebastiani, for all of his editorial guidance and care. We owe him a great debt. We also greatly appreciate the help we received from Jamie Callan, who, along with Fabrizio, serves as Editor in Chief of the Foundations and Trends in Information Retrieval series, and James Finlay, of Now Publishers, the publisher of this series.

Finally, a number of unexpected health problems arose in our families during the writing of this survey. Despite this, it was our families who sustained us with their cheerful and unlimited support (on many levels), not the other way around. Thus, to end on a sentimental note, this work is dedicated to them.

References

[1] A. Abbasi, "Affect intensity analysis of dark web forums," in Proceedings of Intelligence and Security Informatics (ISI), pp. 282–288, 2007.
[2] L. A. Adamic and N. Glance, "The political blogosphere and the 2004 U.S. election: Divided they blog," in Proceedings of LinkKDD, 2005.
[3] A. Agarwal and P. Bhattacharyya, "Sentiment analysis: A new approach for effective use of linguistic knowledge and exploiting similarities in a set of documents to be classified," in Proceedings of the International Conference on Natural Language Processing (ICON), 2005.
[4] R. Agrawal, S. Rajagopalan, R. Srikant, and Y. Xu, "Mining newsgroups using networks arising from social behavior," in Proceedings of WWW, pp. 529–535, 2003.
[5] E. M. Airoldi, X. Bai, and R. Padman, "Markov blankets and meta-heuristic search: Sentiment extraction from unstructured text," Lecture Notes in Computer Science, vol. 3932 (Advances in Web Mining and Web Usage Analysis), pp. 167–187, 2006.
[6] G. A. Akerlof, "The market for 'Lemons': Quality uncertainty and the market mechanism," The Quarterly Journal of Economics, vol. 84, pp. 488–500, 1970.
[7] S. M. Al Masum, H. Prendinger, and M. Ishizuka, "SenseNet: A linguistic tool to visualize numerical-valence based sentiment of textual data," in Proceedings of the International Conference on Natural Language Processing (ICON), pp. 147–152, 2007. (Poster paper).
[8] J. Allan, "Introduction to topic detection and tracking," in Topic Detection and Tracking: Event-based Information Organization, (J. Allan, ed.), pp. 1–16, Norwell, MA, USA: Kluwer Academic Publishers, ISBN 0-7923-7664-1, 2002.
[9] C. O. Alm, D. Roth, and R. Sproat, "Emotions from text: Machine learning for text-based emotion prediction," in Proceedings of the Human Language Technology Conference and the Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), 2005.
[10] A. Anagnostopoulos, A. Z. Broder, and D. Carmel, "Sampling search-engine results," World Wide Web, vol. 9, pp. 397–429, 2006.
[11] R. K. Ando and T. Zhang, "A framework for learning predictive structures from multiple tasks and unlabeled data," Journal of Machine Learning Research, vol. 6, pp. 1817–1853, 2005.
[12] A. Andreevskaia and S. Bergler, "Mining WordNet for a fuzzy sentiment: Sentiment tag extraction from WordNet glosses," in Proceedings of the European Chapter of the Association for Computational Linguistics (EACL), 2006.
[13] W. Antweiler and M. Z. Frank, "Is all that talk just noise? The information content of internet stock message boards," Journal of Finance, vol. 59, pp. 1259–1294, 2004.
[14] N. Archak, A. Ghose, and P. Ipeirotis, "Show me the money! Deriving the pricing power of product features by mining consumer reviews," in Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2007.
[15] S. Argamon, ed., Proceedings of the IJCAI Workshop on DOING IT WITH STYLE: Computational Approaches to Style Analysis and Synthesis. 2003.
[16] S. Argamon, J. Karlgren, and J. G. Shanahan, eds., Proceedings of the SIGIR Workshop on Stylistic Analysis of Text For Information Access. ACM, 2005.
[17] S. Argamon, J. Karlgren, and O. Uzuner, eds., Proceedings of the SIGIR Workshop on Stylistics for Text Retrieval in Practice. ACM, 2006.
[18] S. Argamon-Engelson, M. Koppel, and G. Avneri, "Style-based text categorization: What newspaper am I reading?" in Proceedings of the AAAI Workshop on Text Categorization, pp. 1–4, 1998.
[19] Y. Attali and J. Burstein, "Automated essay scoring with e-rater v.2," Journal of Technology, Learning, and Assessment, vol. 26, February 2006.
[20] A. Aue and M. Gamon, "Automatic identification of sentiment vocabulary: Exploiting low association with known sentiment terms," in Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing, 2005.
[21] A. Aue and M. Gamon, "Customizing sentiment classifiers to new domains: A case study," in Proceedings of Recent Advances in Natural Language Processing (RANLP), 2005.
[22] B. Awerbuch and R. Kleinberg, "Competitive collaborative learning," in Proceedings of the Conference on Learning Theory (COLT), pp. 233–248, 2005. (Journal version to appear in Journal of Computer and System Sciences, special issue on computational learning theory).
[23] P. Bajari and A. Hortaçsu, "The winner's curse, reserve prices, and endogenous entry: Empirical insights from eBay auctions," RAND Journal of Economics, vol. 34, pp. 329–355, 2003.
[24] P. Bajari and A. Hortaçsu, "Economic insights from internet auctions," Journal of Economic Literature, vol. 42, pp. 457–486, 2004.
[25] C. F. Baker, C. J. Fillmore, and J. B. Lowe, "The Berkeley FrameNet Project," in Proceedings of COLING/ACL, 1998.
[26] A. Banfield, Unspeakable Sentences: Narration and Representation in the Language of Fiction. Routledge and Kegan Paul, 1982.
[27] M. Bansal, C. Cardie, and L. Lee, "The power of negative thinking: Exploiting label disagreement in the min-cut classification framework," in Proceedings of the International Conference on Computational Linguistics (COLING), 2008. (Poster paper).
[28] R. Bar-Haim, I. Dagan, B. Dolan, L. Ferro, D. Giampiccolo, B. Magnini, and I. Szpektor, "The second PASCAL recognising textual entailment challenge," in Proceedings of the Second PASCAL Challenges Workshop on Textual Entailment, 2006.
[29] R. Barzilay and L. Lee, "Learning to paraphrase: An unsupervised approach using multiple-sequence alignment," in Proceedings of the Joint Human Language Technology/North American Chapter of the ACL Conference (HLT-NAACL), pp. 16–23, 2003.
[30] R. Barzilay and K. McKeown, "Extracting paraphrases from a parallel corpus," in Proceedings of the Association for Computational Linguistics (ACL), pp. 50–57, 2001.
[31] S. Basuroy, S. Chatterjee, and S. A. Ravid, "How critical are critical reviews? The box office effects of film critics, star power and budgets," Journal of Marketing, vol. 67, pp. 103–117, 2003.
[32] M. Bautin, L. Vijayarenu, and S. Skiena, "International sentiment analysis for news and blogs," in Proceedings of the International Conference on Weblogs and Social Media (ICWSM), 2008.
[33] P. Beineke, T. Hastie, C. Manning, and S. Vaithyanathan, "Exploring sentiment summarization," in Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text, AAAI technical report SS-04-07, 2004.
[34] F. Benamara, C. Cesarano, A. Picariello, D. Reforgiato, and V. S. Subrahmanian, "Sentiment analysis: Adjectives and adverbs are better than adjectives alone," in Proceedings of the International Conference on Weblogs and Social Media (ICWSM), 2007. (Short paper).
[35] J. Berger, A. T. Sorensen, and S. J. Rasmussen, "Negative publicity: When is negative a positive?," Manuscript. PDF file's last modification date: October 16, 2007, URL: http://www.stanford.edu/asorense/papers/Negative_Publicity.pdf, 2007.
[36] Y. Bestgen, C. Fairon, and L. Kerves, "Un baromètre affectif effectif: Corpus de référence et méthode pour déterminer la valence affective de phrases," in Journées internationales d'analyse statistique des données textuelles (JADT), pp. 182–191, 2004.
[37] S. Bethard, H. Yu, A. Thornton, V. Hatzivassiloglou, and D. Jurafsky, "Automatic extraction of opinion propositions and their holders," in Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text, 2004.
[38] D. Biber, Variation Across Speech and Writing. Cambridge University Press, 1988.
[39]D.M.Blei,A.Y.Ng,andM.I.Jordan,LatentDirichletallocation,ŽofMachineLearningResearch,vol.3,pp.993…1022,2003.[40]J.Blitzer,M.Dredze,andF.Pereira,Biographies,Bollywood,boom-boxesandblenders:Domainadaptationforsentimentclassi“cation,ŽinProceeoftheAssociationforComputationalLinguistics(ACL),2007.[41]S.R.K.Branavan,H.Chen,J.Eisenstein,andR.Barzilay,Learningdocument-levelsemanticpropertiesfromfree-textannotations,ŽinProceed-ingsoftheAssociationforComputationalLinguistics(ACL),2008.[42]E.BreckandC.Cardie,Playingthetelephonegame:Determiningthehier-archicalstructureofperspectiveandspeechexpressions,ŽinProceedingsoftheInternationalConferenceonComputationalLinguistics(COLING),2004.[43]E.Breck,Y.Choi,andC.Cardie,Identifyingexpressionsofopinionincon-text,ŽinProceedingsoftheInternationalJointConferenceonArti“cialIntel-ligence(IJCAI),Hyderabad,India,2007.[44]S.BrinandL.Page,Theanatomyofalarge-scalehypertextualwebsearchengine,ŽinProceedingsofthe7thInternationalWorldWideWebConferencepp.107…117,1998.[45]R.F.BruceandJ.M.Wiebe,Recognizingsubjectivity:Acasestudyinmanualtagging,ŽNaturalLanguageEngineering,vol.5,1999.[46]J.K.Burgoon,J.P.Blair,T.Qin,andJ.F.Nunamaker,Jr.,Detectingdecep-tionthroughlinguisticanalysis,ŽinProceedingsofIntelligenceandSecurityInformatics(ISI),number2665inLectureNotesinComputerScience,p.958,[47]L.CabralandA.Horta¸csu,Thedynamicsofsellerreputation:TheoryandevidencefromeBay,ŽWorkingPaper,downloadedversionrevisedinMarch,2006,URLhttp://pages.stern.nyu.edu/lcabral/workingpapers/Cabral 
Mar06.pdf,2006.[48]J.Carbonell,SubjectiveUnderstanding:ComputerModelsofBeliefSystemsPhDthesis,Yale,1979.[49]C.Cardie,Empiricalmethodsininformationextraction,ŽAIMagazinevol.18,pp.65…79,1997.[50]C.Cardie,C.Farina,T.Bruce,andE.Wagner,UsingnaturallanguageprocessingtoimproveeRulemaking,ŽinProceedingsofDigitalGovernmentResearch(dg.o),2006.[51]C.Cardie,J.Wiebe,T.Wilson,andD.Litman,Combininglow-levelandsummaryrepresentationsofopinionsformulti-perspectivequestionanswer-ing,ŽinProceedingsoftheAAAISpringSymposiumonNewDirectionsinQuestionAnswering,pp.20…27,2003.[52]G.Carenini,R.Ng,andA.Pauls,Multi-documentsummarizationofeval-uativetext,ŽinProceedingsoftheEuropeanChapteroftheAssociationforComputationalLinguistics(EACL),pp.305…312,2006.[53]G.Carenini,R.T.Ng,andA.Pauls,Interactivemultimediasummariesofevaluativetext,ŽinProceedingsofIntelligentUserInterfaces(IUI),pp.124…131,ACMPress,2006.[54]D.CartwrightandF.Harary,Structuralbalance:AgeneralizationofHeiderstheory,ŽPsychologicalReview,vol.63,pp.277…293,1956. 
[55]P.ChaovalitandL.Zhou,Moviereviewmining:Acomparisonbetweensupervisedandunsupervisedclassi“cationapproaches,ŽinProceedingsoftheHawaiiInternationalConferenceonSystemSciences(HICSS),2005.[56]P.-Y.S.Chen,S.-Y.Wu,andJ.Yoon,Theimpactofonlinerecommendationsandconsumerfeedbackonsales,ŽinInternationalConferenceonInformationSystems(ICIS),pp.711…724,2004.[57]Y.ChenandJ.Xie,Onlineconsumerreview:Word-of-mouthasanewelementofmarketingcommunicationmix,ŽManagementScience,vol.54,pp.477…491,2008.[58]P.Chesley,B.Vincent,L.Xu,andR.Srihari,Usingverbsandadjectivestoautomaticallyclassifyblogsentiment,ŽinAAAISymposiumonComputationalApproachestoAnalysingWeblogs(AAAI-CAAW),pp.27…29,2006.[59]J.A.ChevalierandD.Mayzlin,Theeectofwordofmouthonsales:Onlinebookreviews,ŽJournalofMarketingResearch,vol.43,pp.345…354,August[60]Y.Choi,E.Breck,andC.Cardie,Jointextractionofentitiesandrelationsforopinionrecognition,ŽinProceedingsoftheConferenceonEmpiricalMethodsinNaturalLanguageProcessing(EMNLP),2006.[61]Y.Choi,C.Cardie,E.Rilo,andS.Patwardhan,Identifyingsourcesofopin-ionswithconditionalrandom“eldsandextractionpatterns,ŽinProceedingsoftheHumanLanguageTechnologyConferenceandtheConferenceonEmpiricalMethodsinNaturalLanguageProcessing(HLT/EMNLP),2005.[62]E.K.Clemons,G.Gao,andL.M.Hitt,Whenonlinereviewsmeethyper-dierentiation:Astudyofthecraftbeerindustry,ŽJournalofManagementInformationSystems,vol.23,pp.149…171,2006.[63]comScore/theKelseygroup,Onlineconsumer-generatedreviewshavesig-ni“cantimpactonoinepurchasebehavior,ŽPressRelease,http://www.comscore.com/press/release.asp?press=1928,November2007.[64]J.G.ConradandF.Schilder,Opinionmininginlegalblogs,ŽinProceeoftheInternationalConferenceonArti“cialIntelligenceandLaw(ICAIL)pp.231…236,NewYork,NY,USA:ACM,2007.[65]W.B.CroftandJ.Laerty,eds.,Languagemodelingforinformationretrieval.Number13intheInformationRetrievalSeries.Kluwer/Springer,2003.[66]S.DasandM.Chen,Yahoo!forAmazon:Extractingmarketsentimentfromstockmessageboards,ŽinProceedingsoftheAsiaPaci“cFinanceA
ssociationAnnualConference(APFA),2001.[67]S.R.DasandM.Y.Chen,Yahoo!forAmazon:SentimentextractionfromsmalltalkontheWeb,ŽManagementScience,vol.53,pp.1375…1388,2007.[68]S.R.Das,P.Tufano,andF.deAsisMartinez-Jerez,eInformation:Aclinicalstudyofinvestordiscussionandsentiment,ŽFinancialManagement,vol.34,pp.103…137,2005.[69]K.Dave,S.Lawrence,andD.M.Pennock,Miningthepeanutgallery:Opin-ionextractionandsemanticclassi“cationofproductreviews,ŽinProceeofWWW,pp.519…528,2003.[70]S.DavidandT.J.Pinch,Sixdegreesofreputation:Theuseandabuseofonlinereviewandrecommendationsystems,ŽFirstMonday,July2006.(SpecialIssueonCommercialApplicationsoftheInternet). [71]C.Dellarocas,Thedigitizationofword-of-mouth:Promiseandchallengesofonlinereputationsystems,ŽManagementScience,vol.49,pp.1407…1424,2003.(Specialissueone-businessandmanagementscience).[72]C.Dellarocas,X.Zhang,andN.F.Awad,Exploringthevalueofonlineproductratingsinrevenueforecasting:Thecaseofmotionpictures,ŽofInteractiveMarketing,vol.21,pp.23…45,2007.[73]A.DevittandK.Ahmad,Sentimentanalysisin“nancialnews:Acohesion-basedapproach,ŽinProceedingsoftheAssociationforComputationalLin-guistics(ACL),pp.984…991,2007.[74]M.Dewally,Internetinvestmentadvice:Investingwitharockofsalt,ŽcialAnalystsJournal,vol.59,pp.65…77,July/August2003.[75]M.DewallyandL.Ederington,Reputation,certi“cation,warranties,andinformationasremediesforseller-buyerinformationasymmetries:Lessonsfromtheonlinecomicbookmarket,ŽJournalofBusiness,vol.79,pp.693…730,March2006.[76]S.DewanandV.Hsu,Adverseselectioninelectronicmarkets:Evidencefromonlinestampauctions,ŽJournalofIndustrialEconomics,vol.52,pp.497…516,December2004.[77]D.W.Diamond,Reputationacquisitionindebtmarkets,ŽJournalofPolit-icalEconomy,vol.97,pp.828…862,1989.[78]X.Ding,B.Liu,andP.S.Yu,Aholisticlexicon-basedapproachtoopin-ionmining,ŽinProceedingsoftheConferenceonWebSearchandWebDataMining(WSDM),2008.[79]L.DiniandG.Mazzini,Opinionclassi“cationthroughinformationextrac-tion,ŽinProceedingsoftheConferenceonDataMiningMethod
sandDatabasesforEngineering,FinanceandOtherFields(DataMining)pp.299…310,2002.[80]W.Duan,B.Gu,andA.B.Whinston,Doonlinereviewsmatter?„Anempiricalinvestigationofpaneldata,ŽSocialScienceResearchNetwork(SSRN)WorkingPaperSeries,http://ssrn.com/paper=616262,versionasofJanuary,2005.[81]D.H.Eaton,Valuinginformation:EvidencefromguitarauctionsoneBay,ŽJournalofAppliedEconomicsandPolicy,vol.24,pp.1…19,2005.[82]D.H.Eaton,Theimpactofreputationtimingandsourceonauctionout-TheB.E.JournalofEconomicAnalysisandPolicy,vol.7,2007.[83]M.Efron,Culturalorientation:Classifyingsubjectivedocumentsbycocia-tion[sic]analysis,ŽinProceedingsoftheAAAIFallSymposiumonStyleandMeaninginLanguage,Art,Music,andDesign,pp.41…48,2004.[84]K.EguchiandV.Lavrenko,Sentimentretrievalusinggenerativemodels,ŽProceedingsoftheConferenceonEmpiricalMethodsinNaturalLanguageProcessing(EMNLP),pp.345…354,2006.[85]K.EguchiandC.Shah,Opinionretrievalexperimentsusinggenerativemod-els:ExperimentsfortheTREC2006blogtrack,ŽinProceedingsofTREC[86]P.Ekman,EmotionintheHumanFace.CambridgeUniversityPress,Seconded.,1982. 
[87]J.EliashbergandS.M.Shugan,Filmcritics:In”uencersorpredictors?,ŽJournalofMarketing,vol.61,pp.68…78,April1997.[88]C.Engstr¨TopicDependenceinSentimentClassi“cation.Mastersthesis,UniversityofCambridge,2004.[89]A.EsuliandF.Sebastiani,Determiningthesemanticorientationoftermsthroughglossanalysis,ŽinProceedingsoftheACMSIGIRConferenceonInformationandKnowledgeManagement(CIKM),2005.[90]A.EsuliandF.Sebastiani,Determiningtermsubjectivityandtermorien-tationforopinionmining,ŽinProceedingsoftheEuropeanChapteroftheAssociationforComputationalLinguistics(EACL),2006.[91]A.EsuliandF.Sebastiani,SentiWordNet:Apubliclyavailablelexicalresourceforopinionmining,ŽinProceedingsofLanguageResourcesandEval-uation(LREC),2006.[92]A.EsuliandF.Sebastiani,PageRankingWordNetsynsets:Anapplicationtoopinionmining,ŽinProceedingsoftheAssociationforComputationalLin-guistics(ACL),2007.[93]D.K.Evans,L.-W.Ku,Y.Seki,H.-H.Chen,andN.Kando,Opinionanalysisacrosslanguages:AnoverviewofandobservationsfromtheNTCIR6opinionanalysispilottask,ŽinProceedingsoftheWorkshoponCross-LanguageInfor-mationProcessing,vol.4578(ApplicationsofFuzzySetsTheory)ofLectureNotesinComputerScience,pp.456…463,2007.[94]A.Fader,D.R.Radev,M.H.Crespin,B.L.Monroe,K.M.Quinn,andM.Colaresi,MavenRank:Identifyingin”uentialmembersoftheUSsenateusinglexicalcentrality,ŽinProceedingsoftheConferenceonEmpiricalMeth-odsinNaturalLanguageProcessing(EMNLP),2007.[95]C.Fellbaum,ed.,Wordnet:AnElectronicLexicalDatabase.MITPress,1998.[96]D.Feng,E.Shaw,J.Kim,andE.Hovy,Learningtodetectconversationfocusofthreadeddiscussions,ŽinProceedingsoftheJointHumanLanguageTechnology/NorthAmericanChapteroftheACLConference(HLT-NAACL)pp.208…215,2006.[97]A.FinnandN.Kushmerick,LearningtoclassifydocumentsaccordingtoJournaloftheAmericanSocietyforInformationScienceandTech-nology(JASIST),vol.7,2006.(Specialissueoncomputationalanalysisofstyle).[98]A.Finn,N.Kushmerick,andB.Smyth,Genreclassi“cationanddomaintransferforinformation“ltering,ŽinProceedingsofthe24thBCS-IRSGEuro-peanColloqu
iumonIRResearch:AdvancesinInformationRetrieval,number2291inLectureNotesinComputerScience,pp.353…362,Glasgow,2002.[99]P.W.Foltz,D.Laham,andT.K.Landauer,Automatedessayscoring:Appli-cationstoeducationtechnology,ŽinProceedingsofED-MEDIA,pp.939…944,[100]C.Forman,A.Ghose,andB.Wiesenfeld,Examiningtherelationshipbetweenreviewsandsales:Theroleofrevieweridentitydisclosureinelec-tronicmarkets,ŽInformationSystemsResearch,vol.19,2008.(Specialissueontheinterplaybetweendigitalandsocialnetworks). [101]G.Forman,AnextensiveempiricalstudyoffeatureselectionmetricsfortextJournalofMachineLearningResearch,vol.3,pp.1289…1305,[102]T.Fukuhara,H.Nakagawa,andT.Nishida,Understandingsentimentofpeoplefromnewsarticles:Temporalsentimentanalysisofsocialevents,ŽinProceedingsoftheInternationalConferenceonWeblogsandSocialMedia,2007.[103]M.Gamon,Sentimentclassi“cationoncustomerfeedbackdata:Noisydata,largefeaturevectors,andtheroleoflinguisticanalysis,ŽinProceedingsoftheInternationalConferenceonComputationalLinguistics(COLING),2004.[104]M.Gamon,A.Aue,S.Corston-Oliver,andE.Ringger,Pulse:Miningcus-tomeropinionsfromfreetext,ŽinProceedingsoftheInternationalSymposiumonIntelligentDataAnalysis(IDA),number3646inLectureNotesinCom-puterScience,pp.121…132,2005.[105]R.Ghani,K.Probst,Y.Liu,M.Krema,andA.Fano,Textminingforproductattributeextraction,ŽSIGKDDExplorationsNewsletter,vol.8,pp.41…48,[106]A.GhoseandP.G.Ipeirotis,Designingnovelreviewrankingsystems:Pre-dictingusefulnessandimpactofreviews,ŽinProceedingsoftheInternationalConferenceonElectronicCommerce(ICEC),2007.(Invitedpaper).[107]A.Ghose,P.G.Ipeirotis,andA.Sundararajan,Opinionminingusingecono-metrics:Acasestudyonreputationsystems,ŽinProceedingsoftheAssociationforComputationalLinguistics(ACL),2007.[108]N.Godbole,M.Srinivasaiah,andS.Skiena,Large-scalesentimentanalysisfornewsandblogs,ŽinProceedingsoftheInternationalConferenceonWeblogsandSocialMedia(ICWSM),2007.[109]A.B.GoldbergandX.Zhu,Seeingstarswhentherearentmanystars:Graph-basedsemi-supervisedlearning
forsentimentcategorization,ŽinTextGraphs:HLT/NAACLWorkshoponGraph-basedAlgorithmsforNaturalLanguageProcessing,2006.[110]A.B.Goldberg,X.Zhu,andS.Wright,Dissimilarityingraph-basedsemi-supervisedclassi“cation,ŽinArti“cialIntelligenceandStatistics(AISTATS)[111]S.Greene,Spin:LexicalSemantics,Transitivity,andtheIdenti“cationofImplicitSentiment.PhDthesis,UniversityofMaryland,2007.[112]G.Grefenstette,Y.Qu,J.G.Shanahan,andD.A.Evans,Couplingnichebrowsersandaectanalysisforanopinionminingapplication,ŽinProceedingsofRecherchedInformationAssist´eeparOrdinateur(RIAO)[113]M.L.Gregory,N.Chinchor,P.Whitney,R.Carter,E.Hetzler,andA.Turner,User-directedsentimentanalysis:Visualizingtheaectivecontentofdocu-ments,ŽinProceedingsoftheWorkshoponSentimentandSubjectivityinTextpp.23…30,Sydney,Australia,July2006.[114]B.Gu,P.Konana,A.Liu,B.Rajagopalan,andJ.Ghosh,Predictivevalueofstockmessageboardsentiments,ŽMcCombsResearchPaperNo.IROM-11-06,versiondatedNovember,2006. [115]R.V.Guha,R.Kumar,P.Raghavan,andA.Tomkins,Propagationoftrustanddistrust,ŽinProceedingsofWWW,pp.403…412,2004.[116]B.A.Hagedorn,M.Ciaramita,andJ.Atserias,Worldknowledgeinbroad-coverageinformation“ltering,ŽinProceedingsoftheACMSpecialInterestGrouponInformationRetrieval(SIGIR),2007.(Posterpaper).[117]J.T.Hancock,L.Curry,S.Goorha,andM.Woodworth,Automatedlinguis-ticanalysisofdeceptiveandtruthfulsynchronouscomputer-mediatedcommu-nication,ŽinProceedingsoftheHawaiiInternationalConferenceonSystemSciences(HICSS),p.22c,2005.[118]L.Hankin,Theeectsofuserreviewsononlinepurchasingbehavioracrossmultipleproductcategories,ŽMasters“nalprojectreport,UCBerkeleySchoolofInformation,http://www.ischool.berkeley.edu/ 
report.pdf,May2007.[119]V.HatzivassiloglouandK.McKeown,Predictingthesemanticorientationofadjectives,ŽinProceedingsoftheJointACL/EACLConference,pp.174…181,[120]V.HatzivassiloglouandJ.Wiebe,Eectsofadjectiveorientationandgrad-abilityonsentencesubjectivity,ŽinProceedingsoftheInternationalConfer-enceonComputationalLinguistics(COLING),2000.[121]M.Hearst,Direction-basedtextinterpretationasaninformationaccessre“nement,ŽinText-BasedIntelligentSystems,(P.Jacobs,ed.),pp.257…274,LawrenceErlbaumAssociates,1992.[122]R.Higashinaka,M.Walker,andR.Prasad,Learningtogeneratenaturalisticutterancesusingreviewsinspokendialoguesystems,ŽACMTransactionsonSpeechandLanguageProcessing(TSLP),2007.[123]P.HitlinandL.Rainie,Theuseofonlinereputationandratingsystems,ŽPewInternet&AmericanLifeProjectMemo,October2004.[124]T.Homan,Onlinereputationmanagementishot„butisitethical?ŽComputerworld,February2008.[125]T.Hofmann,Probabilisticlatentsemanticindexing,ŽinProceedingsof,pp.50…57,1999.[126]D.HopkinsandG.King,Extractingsystematicsocialsciencemeaningfromtext,Ž.Manuscriptavailableathttp://gking.harvard.edu/“les/words.pdf,2007versionwastheonemostrecentlyconsulted,2007.[127]J.A.Horrigan,Onlineshopping,ŽPewInternet&AmericanLifeProjectReport,2008.[128]D.HouserandJ.Wooders,Reputationinauctions:Theory,andevi-dencefromeBay,ŽJournalofEconomicsandManagementStrategy,vol.15,pp.252…369,2006.[129]M.HuandB.Liu,Miningandsummarizingcustomerreviews,ŽinPro-ceedingsoftheACMSIGKDDConferenceonKnowledgeDiscoveryandDataMining(KDD),pp.168…177,2004.[130]M.HuandB.Liu,Miningopinionfeaturesincustomerreviews,ŽinProceed-ingsofAAAI,pp.755…760,2004.[131]M.Hu,A.Sun,andE.-P.Lim,Comments-orientedblogsummarizationbysentenceextraction,ŽinProceedingsoftheACMSIGIRConferenceon 
InformationandKnowledgeManagement(CIKM),pp.901…904,2007.(Posterpaper).[132]N.Hu,P.A.Pavlou,andJ.Zhang,Canonlinereviewsrevealaproductstruequality?:Empirical“ndingsandanalyticalmodelingofonlineword-of-mouthcommunication,ŽinProceedingsofElectronicCommerce(EC),pp.324…330,USA,NewYork,NY:ACM,2006.[133]A.HuettnerandP.Subasic,Fuzzytypingfordocumentmanagement,ŽinACL2000CompanionVolume:TutorialAbstractsandDemonstrationNotespp.26…27,2000.[134]M.HurstandK.Nigam,Retrievingtopicalsentimentsfromonlinedocumentcollections,ŽinDocumentgnitionandRetrievalXI,pp.27…34,2004.[135]C.Jacquemin,SpottingandDiscoveringTermsthroughNaturalLanguagePro-cessing.MITPress,2001.[136]G.JinandA.Kato,Price,qualityandreputation:Evidencefromanonline“eldexperiment,ŽTheRANDJournalofEconomics,vol.37,2006.[137]X.Jin,Y.Li,T.Mah,andJ.Tong,Sensitivewebpageclassi“cationforcontentadvertising,ŽinProceedingsoftheInternationalWorkshoponDataMiningandAudienceIntelligenceforAdvertising,2007.[138]N.JindalandB.Liu,Identifyingcomparativesentencesintextdocuments,ŽProceedingsoftheACMSpecialInterestGrouponInformationRetrieval,2006.[139]N.JindalandB.Liu,Miningcomparativesentencesandrelations,ŽinPro-ceedingsofAAAI,2006.[140]N.JindalandB.Liu,Reviewspamdetection,ŽinProceedingsofWWW2007.(Posterpaper).[141]N.JindalandB.Liu,Opinionspamandanalysis,ŽinProceedingsoftheConferenceonWebSearchandWebDataMining(WSDM),pp.219…230,[142]N.KajiandM.Kitsuregawa,Automaticconstructionofpolarity-taggedcor-pusfromHTMLdocuments,ŽinProceedingsoftheCOLING/ACLMainCon-ferencePosterSessions,2006.[143]N.KajiandM.Kitsuregawa,BuildinglexiconforsentimentanalysisfrommassivecollectionofHTMLdocuments,ŽinProceedingsoftheJointCon-ferenceonEmpiricalMethodsinNaturalLanguageProcessingandCom-putationalNaturalLanguageLearning(EMNLP-CoNLL),pp.1075…1083,[144]A.Kale,A.Karandikar,P.Kolari,A.Java,T.Finin,andA.Joshi,Modelingtrustandin”uenceintheblogosphereusinglinkpolarity,ŽinProceedingsoftheInternationalConferenceonWeblogsandSocialMedia(ICWSM),2007.(Shortpaper).[14
5]K.KalyanamandS.H.McIntyre,Theroleofreputationinonlineauctionmarkets,ŽSantaClaraUniversityWorkingPaper02/03-10-WP,2001,datedJune26.[146]J.Kamps,M.Marx,R.J.Mokken,andM.deRijke,UsingWordNettomeasuresemanticorientationofadjectives,ŽinProceedingsofLREC,2004. [147]S.D.Kamvar,M.T.Schlosser,andH.Garcia-Molina,TheEigentrustalgo-rithmforreputationmanagementinP2Pnetworks,ŽinProceedingsofWWWpp.640…651,NewYork,NY,USA:ACM,ISBN1-58113-680-3,2003.[148]H.KanayamaandT.Nasukawa,Fullyautomaticlexiconexpansionfordomain-orientedsentimentanalysis,ŽinProceedingsoftheConferenceonEmpiricalMethodsinNaturalLanguageProcessing(EMNLP),(Sydney,Australia),pp.355…363,July2006.[149]M.Kantrowitz,Methodandapparatusforanalyzingaectandemotionintext,ŽU.S.Patent6622140,Patent“ledinNovember2000,2003.[150]J.KarlgrenandD.Cutting,Recognizingtextgenreswithsimplemetricsusingdiscriminantanalysis,ŽinProceedingsofCOLING,pp.1071…1075,1994.[151]Y.Kawai,T.Kumamoto,andK.Tanaka,Fairnewsreader:Recommend-ingnewsarticleswithdierentsentimentsbasedonuserpreference,ŽinPro-ceedingsofKnowledge-BasedIntelligentInformationandEngineeringSystemsnumber4692inLectureNotesinComputerScience,pp.612…622,[152]A.KennedyandD.Inkpen,Sentimentclassi“cationofmoviereviewsusingcontextualvalenceshifters,ŽComputationalIntelligence,vol.22,pp.110…125,[153]B.Kessler,G.Nunberg,andH.Sch¨utze,Automaticdetectionoftextgenre,ŽProceedingsoftheThirty-FifthAnnualMeetingoftheAssociationforCom-putationalLinguisticsandEighthConferenceoftheEuropeanChapteroftheAssociationforComputationalLinguistics,pp.32…38,1997.[154]P.Kim,Theforresterwave:Brandmonitoring,Q32006,ŽForresterWave(whitepaper),2006.[155]S.-M.KimandE.Hovy,Determiningthesentimentofopinions,ŽinPro-ceedingsoftheInternationalConferenceonComputationalLinguistics(COL-,2004.[156]S.-M.KimandE.Hovy,Automaticdetectionofopinionbearingwordsandsentences,ŽinCompanionVolumetothePdingsoftheInternationalJointConferenceonNaturalLanguageProcessing(IJCNLP),2005.[157]S.-M.KimandE.Hovy,Identifyingopinionholders
forquestionansweringinopiniontexts,ŽinProceedingsoftheAAAIWorkshoponQuestionAnsweringinRestrictedDomains,2005.[158]S.-M.KimandE.Hovy,Automaticidenti“cationofproandconreasonsinonlinereviews,ŽinProceedingsoftheCOLING/ACLMainConferencePoster,pp.483…490,2006.[159]S.-M.KimandE.Hovy,Identifyingandanalyzingjudgmentopinions,ŽinProceedingsoftheJointHumanLanguageTechnology/NorthAmericanChap-teroftheACLConference(HLT-NAACL),2006.[160]S.-M.KimandE.Hovy,Crystal:Analyzingpredictiveopinionsontheweb,ŽProceedingsoftheJointConferenceonEmpiricalMethodsinNaturalLan-guageProcessingandComputationalNaturalLanguageLearning(EMNLP-,2007.[161]S.-M.Kim,P.Pantel,T.Chklovski,andM.Pennacchiotti,Automaticallyassessingreviewhelpfulness,ŽinProceedingsoftheConferenceonEmpirical MethodsinNaturalLanguageProcessing(EMNLP),pp.423…430,Sydney,Australia,July2006.[162]B.KleinandK.Leer,Theroleofmarketforcesinassuringcontractualperformance,ŽJournalofPoliticalEconomy,vol.89,pp.615…641,1981.[163]J.Kleinberg,Authoritativesourcesinahyperlinkedenvironment,ŽinPro-ceedingsofthe9thACM-SIAMSymposiumonDiscreteAlgorithms(SODA)pp.668…677,1998.(ExtendedversioninJournaloftheACM,46:604…632,[164]J.KleinbergandE.Tardos,Approximationalgorithmsforclassi“cationprob-lemswithpairwiserelationships:MetriclabelingandMarkovrandom“elds,ŽJournaloftheACM,vol.49,pp.616…639,ISSN0004-5411,2002.[165]J.KleinbergandE.Tardos,AlgorithmDesign.AddisonWesley,2006.[166]N.Kobayashi,K.Inui,Y.Matsumoto,K.Tateishi,andT.Fukushima,Col-lectingevaluativeexpressionsforopinionextraction,ŽinProceedingsoftheInternationalJointConferenceonNaturalLanguageProcessing(IJCNLP)[167]M.KoppelandJ.Schler,Theimportanceofneutralexamplesforlearningsentiment,ŽinWorkshopontheAnalysisofInformalandFormalInformationExchangeDuringNegotiations(FINEXIN),2005.[168]M.KoppelandI.Shtrimberg,Goodnewsorbadnews?Letthemarketdecide,ŽinProceedingsoftheAAAISpringSymposiumonExploringAttitudeandAectinText:TheoriesandApplications,pp.86…88,2004.[169]L.-W.Ku,L.-Y.Li,T.-H.Wu,andH.-H.Chen,M
ajortopicdetectionanditsapplicationtoopinionsummarization,ŽinProceedingsoftheACMSpecialInterestGrouponInformationRetrieval(SIGIR),pp.627…628,2005.(Posterpaper).[170]L.-W.Ku,Y.-T.Liang,andH.-H.Chen,Opinionextraction,summarizationandtrackinginnewsandblogcorpora,ŽinAAAISymposiumonComputa-tionalApproachestoAnalysingWeblogs(AAAI-CAAW),pp.100…107,2006.[171]L.-W.Ku,Y.-T.Liang,andH.-H.Chen,Taggingheterogeneousevaluationcorporaforopinionatedtasks,ŽinConferenceonLanguageResourcesandEvaluation(LREC),2006.[172]L.-W.Ku,Y.-S.Lo,andH.-H.Chen,Testcollectionselectionandgoldstan-dardgenerationforamultiply-annotatedopinioncorpus,ŽinProceedingsoftheACLDemoandPosterSessions,pp.89…92,2007.[173]T.KudoandY.Matsumoto,Aboostingalgorithmforclassi“cationofsemi-structuredtext,ŽinProceedingsoftheConferenceonEmpiricalMethodsinNaturalLanguageProcessing(EMNLP),2004.[174]S.Kurohashi,K.Inui,andY.Kato,eds.,WorkshoponInformationCredibilityontheWeb,2007.[175]N.Kwon,S.Shulman,andE.Hovy,MultidimensionaltextanalysisforeRule-making,ŽinProceedingsofDigitalGovernmentResearch(dg.o),2006.[176]J.Laerty,A.McCallum,andF.Pereira,Conditionalrandom“elds:Proba-bilisticmodelsforsegmentingandlabelingsequencedata,ŽinProceedingsof,pp.282…289,2001. 
[177]J.D.LaertyandC.Zhai,Documentlanguagemodels,querymodels,andriskminimizationforinformationretrieval,ŽinProceedingsofSIGIRpp.111…119,2001.[178]M.Laver,K.Benoit,andJ.Garry,Extractingpolicypositionsfrompolit-icaltextsusingwordsasdata,ŽAmericanPoliticalScienceReview,vol.97,pp.311…331,2003.[179]V.LavrenkoandW.BruceCroft,Relevance-basedlanguagemodels,ŽinProceedingsofSIGIR,pp.120…127,2001.[180]C.G.LawsonandV.C.Slawson,Reputationinaninternetauctionmarket,ŽEconomicInquiry,vol.40,pp.533…650,2002.[181]L.Lee,ImsorryDave,ImafraidIcantdothatŽ:Linguistics,statistics,andnaturallanguageprocessingcirca2001,ŽinComputerScience:Re”ectionsontheField,Re”ectionsfromtheField,(CommitteeontheFundamentalsofComputerScience:ChallengesandOpportunities,ComputerScienceandTelecommunicationsBoard,NationalResearchCouncil,ed.),pp.111…118,TheNationalAcademiesPress,2004.[182]Y.-B.LeeandS.H.Myaeng,Textgenreclassi“cationwithgenre-revealingandsubject-revealingfeatures,ŽinProceedingsoftheACMSpecialInterestGrouponInformationRetrieval(SIGIR),2002.[183]D.LeinweberandA.Madhavan,Threehundredyearsofstockmarketmanip-JournalofInvesting,vol.10,pp.7…16,Summer2001.[184]H.LiandK.Yamanishi,Miningfromopenanswersinquestionnairedata,ŽProceedingsoftheACMSIGKDDConferenceonKnowledgeDiscoveryandDataMining(KDD),pp.443…449,2001.(JournalversioninIEEEIntelligentvol.17,no.5,pp.58…63,2002).[185]Y.Li,Z.Zheng,andH.Dai,KDDCUP-2005report:Facingagreatchal-SIGKDDExplorations,vol.7,pp.91…99,2005.[186]W.-H.LinandA.Hauptmann,Arethesedocumentswrittenfromdierentperspectives?Atestofdierentperspectivesbasedonstatisticaldistributiondivergence,ŽinProceedingsoftheInternationalConferenceonComputationalLinguistics(COLING)/PdingsoftheAssociationforComputationalLin-guistics(ACL),pp.1057…1064,Sydney,Australia:AssociationforComputa-tionalLinguistics,July2006.[187]W.-H.Lin,T.Wilson,J.Wiebe,andA.Hauptmann,Whichsideareyouon?Identifyingperspectivesatthedocumentandsentencelevels,ŽinProceeoftheConferenceonNaturalLanguageLearning(CoNLL),2006.[1
88]J.Liscombe,G.Riccardi,andD.Hakkani-T¨ur,Usingcontexttoimproveemotiondetectioninspokendialogsystems,Žinpeech,pp.1845…1848,[189]L.V.Lita,A.H.Schlaikjer,W.Hong,andE.Nyberg,Qualitativedimensionsinquestionanswering:Extendingthede“nitionalQAtask,ŽinProceedingsof,pp.1616…1617,2005.(Studentabstract).[190]B.Liu,Webdatamining;Exploringhyperlinks,contents,andusagedata,ŽOpinionMining.Springer,2006.[191]B.Liu,M.Hu,andJ.Cheng,Opinionobserver:Analyzingandcomparingopinionsontheweb,ŽinProceedingsofWWW,2005. [192]H.Liu,H.Lieberman,andT.Selker,Amodeloftextualaectsensingusingreal-worldknowledge,ŽinProceedingsofIntelligentUserInterfaces(IUI)pp.125…132,2003.[193]J.Liu,Y.Cao,C.-Y.Lin,Y.Huang,andM.Zhou,Low-qualityprod-uctreviewdetectioninopinionsummarization,ŽinProceedingsoftheJointConferenceonEmpiricalMethodsinNaturalLanguageProcessingandComputationalNaturalLanguageLearning(EMNLP-CoNLL),pp.334…342,2007.(Posterpaper).[194]Y.Liu,Word-of-mouthformovies:Itsdynamicsandimpactonboxocerevenue,ŽJournalofMarketing,vol.70,pp.74…89,2006.[195]Y.Liu,J.Huang,A.An,andX.Yu,ARSA:Asentiment-awaremodelforpredictingsalesperformanceusingblogs,ŽinProceedingsoftheACMSpecialInterestGrouponInformationRetrieval(SIGIR),2007.[196]J.A.Livingston,Howvaluableisagoodreputation?Asampleselectionmodelofinternetauctions,ŽTheReviewofEconomicsandStatistics,vol.87,pp.453…465,August2005.[197]L.Lloyd,D.Kechagias,andS.Skiena,Lydia:Asystemforlarge-scalenewsanalysis,ŽinProceedingsofStringProcessingandInformationRetrievalnumber3772inLectureNotesinComputerScience,pp.161…166,[198]D.Lucking-Reiley,D.Bryan,N.Prasad,andD.Reeves,PenniesfromeBay:Thedeterminantsofpriceinonlineauctions,ŽJournalofIndustrialEco-,vol.55,pp.223…233,2007.[199]C.MacdonaldandI.Ounis,TheTRECBlogs06collection:Creatingandanalysingablogtestcollection,ŽTechnicalReportTR-2006-224,DepartmentofComputerScience,UniversityofGlasgow,2006.[200]Y.MaoandG.Lebanon,Sequentialmodelsforsentimentprediction,ŽinICMLWorkshoponLearninginStructuredOutputSpaces,2006.[201]Y.Ma
oandG.Lebanon,Isotonicconditionalrandom“eldsandlocalsenti-ment”ow,ŽinAdvancesinNeuralInformationProcessingSystems,2007.[202]L.W.MartinandG.Vanberg,Arobusttransformationprocedureforinter-pretingpoliticaltext,ŽPoliticalAnalysis,vol.16,pp.93…100,2008.[203]H.MasumandY.-C.Zhang,Manifestoforthereputationsociety,Ž,vol.9,2004.[204]S.Matsumoto,H.Takamura,andM.Okumura,Sentimentclassi“cationusingwordsub-sequencesanddependencysub-trees,ŽinProceedingsofPAKDD05,the9thPaci“c-AsiaConferenceonAdvancesinKnowledgeDiscoveryandDataMining,2005.[205]R.McDonald,K.Hannan,T.Neylon,M.Wells,andJ.Reynar,Structuredmodelsfor“ne-to-coarsesentimentanalysis,ŽinProceedingsoftheAssociationforComputationalLinguistics(ACL),pp.432…439,Prague,CzechRepublic:AssociationforComputationalLinguistics,June2007.[206]Q.Mei,X.Ling,M.Wondra,H.Su,andC.X.Zhai,Topicsentimentmixture:Modelingfacetsandopinionsinweblogs,ŽinProceedingsofWWW,pp.171…180,NewYork,NY,USA:ACMPress,2007.(ISBN978-1-59593-654-7). [207]M.I.MelnikandJ.Alm,DoesasellerseCommercereputationmatter?Evi-dencefromeBayauctions,ŽJournalofIndustrialEconomics,vol.50,pp.337…349,2002.[208]M.I.MelnikandJ.Alm,Sellerreputation,informationsignals,andpricesforheterogeneouscoinsoneBay,ŽSouthernEconomicJournal,vol.72,pp.305…328,2005.[209]R.Mihalcea,C.Banea,andJ.Wiebe,Learningmultilingualsubjectivelan-guageviacross-lingualprojections,ŽinProceedingsoftheAssociationforComputationalLinguistics(ACL),pp.976…983,Prague,CzechRepublic,June[210]R.MihalceaandC.Strapparava,Learningtolaugh(automatically):Com-putationalmodelsforhumorrecognition,ŽJournalofComputationalIntelli-gence,2006.[211]G.MishneandM.deRijke,Capturingglobalmoodlevelsusingblogposts,ŽAAAISymposiumonComputationalApproachestoAnalysingWeblogs(AAAI-CAAW),pp.145…152,2006.[212]G.MishneandM.deRijke,Moodviews:Toolsforblogmoodanalysis,ŽAAAISymposiumonComputationalApproachestoAnalysingWeblogs(AAAI-CAAW),pp.153…154,2006.[213]G.MishneandM.deRijke,Astudyofblogsearch,ŽinProceedingsoftheEuropeanConferenceonInformation
RetrievalResearch(ECIR),2006.[214]G.MishneandN.Glance,Predictingmoviesalesfrombloggersentiment,ŽAAAISymposiumonComputationalApproachestoAnalysingWeblogs(AAAI-CAAW),pp.155…158,2006.[215]S.Morinaga,K.Yamanishi,K.Tateishi,andT.Fukushima,MiningproductreputationsontheWeb,ŽinProceedingsoftheACMSIGKDDConferenceonKnowledgeDiscoveryandDataMining(KDD),pp.341…349,2002.(Industrytrack).[216]F.MostellerandD.L.Wallace,AppliedBayesianandClassicalInference:TheCaseoftheFederalistPapers.Springer-Verlag,1984.[217]T.MullenandN.Collier,Sentimentanalysisusingsupportvectormachineswithdiverseinformationsources,ŽinProceedingsoftheConferenceonEmpir-icalMethodsinNaturalLanguageProcessing(EMNLP),pp.412…418,July2004.(Posterpaper).[218]T.MullenandR.Malouf,Takingsides:Userclassi“cationforinformalonlinepoliticaldiscourse,ŽInternetResearch,vol.18,pp.177…190,2008.[219]T.MullenandR.Malouf,Apreliminaryinvestigationintosentimentanalysisofinformalpoliticaldiscourse,ŽinAAAISymposiumonCompu-tationalApproachestoAnalysingWeblogs(AAAI-CAAW),pp.159…162,[220]J.-C.Na,H.Sui,C.Khoo,S.Chan,andY.Zhou,Eectivenessofsimplelin-guisticprocessinginautomaticsentimentclassi“cationofproductreviews,ŽinConferenceoftheInternationalSocietyforKnowledgeOrganization(ISKO)pp.49…54,2004.[221]T.NasukawaandJ.Yi,Sentimentanalysis:Capturingfavorabilityusingnaturallanguageprocessing,ŽinProceedingsoftheConferenceonKnowledgeCapture(K-CAP),2003. 
