Foundations and Trends in Information Retrieval
Vol. 2, Nos. 1–2 (2008) 1–135
© 2008 B. Pang and L. Lee
DOI: 10.1561/1500000001

Opinion Mining and Sentiment Analysis
Bo Pang and Lillian Lee
Yahoo! Research, 701 First Avenue, Sunnyvale, CA 94089, USA, bopang@yahoo-inc.com

… on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

1 Introduction

Romance should never begin with sentiment. It should begin with science and end with a settlement.
Oscar Wilde, An Ideal Husband

1.1 The Demand for Information on Opinions and Sentiment

"What other people think" has always been an important piece of information for most of us during the decision-making process. Long before awareness of the World Wide Web became widespread, many of us asked our friends to recommend an auto mechanic or to explain who they were planning to vote for in local elections, requested reference letters regarding job applicants from colleagues, or consulted Consumer Reports to decide what dishwasher to buy. But the Internet and the Web have now (among other things) made it possible to find out about the opinions and experiences of those in the vast pool of people that are neither our personal acquaintances nor well-known professional critics, that is, people we have never heard of. And conversely, more and more people are making their opinions available to strangers via the Internet.
Indeed, according to two surveys of more than 2000 American adults each [63, 127], 81% of Internet users (or 60% of Americans) have done online research on a product at least once; 20% (15% of all Americans) do so on a typical day; among readers of online reviews of restaurants, hotels, and various services (e.g., travel agencies or doctors), between 73% and 87% report that reviews had a significant influence on their purchase; consumers report being willing to pay from 20% to 99% more for a 5-star-rated item than a 4-star-rated item (the variance stems from what type of item or service is considered); 32% have provided a rating on a product, service, or person via an online ratings system, and 30% (including 18% of online senior citizens) have posted an online comment or review regarding a product or service.

Footnote: Section 6.1 discusses quantitative analyses of actual economic impact, as opposed to consumer perception.

Footnote: Interestingly, Hitlin and Rainie [123] report that "Individuals who have rated something online are also more skeptical of the information that is available on the Web."

We hasten to point out that consumption of goods and services is not the only motivation behind people's seeking out or expressing opinions online. A need for political information is another important factor. For example, in a survey of over 2500 American adults, Rainie and Horrigan [248] studied the 31% of Americans (over 60 million people) who were 2006 campaign internet users, defined as those who gathered information about the 2006 elections online and exchanged views via email. Of these, 28% said that a major reason for these online activities was to get perspectives from within their community, and 34% said that a major reason was to get perspectives from outside their community; 27% had looked online for the endorsements or ratings of external organizations;
28% said that most of the sites they use share their point of view, but 29% said that most of the sites they use challenge their point of view, indicating that many people are not simply looking for validations of their pre-existing opinions; 8% posted their own political commentary online.

The user hunger for and reliance upon online advice and recommendations that the data above reveals is merely one reason behind the surge of interest in new systems that deal directly with opinions as a first-class object. But Horrigan [127] reports that while a majority of American internet users report positive experiences during online product research, at the same time, 58% also report that online information was missing, impossible to find, confusing, and/or overwhelming. Thus, there is a clear need to aid consumers of products and of information by building better information-access systems than are currently in existence.

The interest that individual users show in online opinions about products and services, and the potential influence such opinions wield, is something that vendors of these items are paying more and more attention to [124]. The following excerpt from a whitepaper is illustrative of the envisioned possibilities, or at the least the rhetoric surrounding the possibilities:

With the explosion of Web 2.0 platforms such as blogs, discussion forums, peer-to-peer networks, and various other types of social media ... consumers have at their disposal a soapbox of unprecedented reach and power by which to share their brand experiences and opinions, positive or negative, regarding any product or service. As major companies are increasingly coming to realize, these consumer voices can wield enormous influence in shaping the opinions of other consumers and, ultimately, their brand loyalties, their purchase decisions, and their own brand advocacy. ... Companies can respond to the consumer insights they generate through social media monitoring and analysis by modifying their
marketing messages, brand positioning, product development, and other activities accordingly.
Zabin and Jefferies [327]

But industry analysts note that the leveraging of new media for the purpose of tracking product image requires new technologies; here is a representative snippet describing their concerns:

Marketers have always needed to monitor media for information related to their brands, whether it's for public relations activities, fraud violations, competitive intelligence. But fragmenting media and changing consumer behavior have crippled traditional monitoring methods. Technorati estimates that 75,000 new blogs are created daily, along with 1.2 million new posts each day, many discussing consumer opinions on products and services. Tactics [of the traditional sort] such as clipping services, field agents, and ad hoc research simply can't keep pace.
Kim [154]

Footnote: Presumably, the author means the detection or prevention of fraud violations, as opposed to the commission.

Thus, aside from individuals, an additional audience for systems capable of automatically analyzing consumer sentiment, as expressed in no small part in online venues, are companies anxious to understand how their products and services are perceived.

1.2 What Might be Involved? An Example Examination of the Construction of an Opinion/Review Search Engine

Creating systems that can process subjective information effectively requires overcoming a number of novel challenges. To illustrate some of these challenges, let us consider the concrete example of what building an opinion- or review-search application could involve. As we have discussed, such an application would fill an important and prevalent
information need, whether one restricts attention to blog search [213] or considers the more general types of search that have been described above.

The development of a complete review- or opinion-search application might involve attacking each of the following problems.

(1) If the application is integrated into a general-purpose search engine, then one would need to determine whether the user is in fact looking for subjective material. This may or may not be a difficult problem in and of itself: perhaps queries of this type will tend to contain indicator terms like "review," "reviews," or "opinions," or perhaps the application would provide a "checkbox" to the user so that he or she could indicate directly that reviews are what is desired; but in general, query classification is a difficult problem. Indeed, it was the subject of the 2005 KDD Cup challenge [185].

(2) Besides the still-open problem of determining which documents are topically relevant to an opinion-oriented query, an additional challenge we face in our new setting is simultaneously or subsequently determining which documents or portions of documents contain review-like or opinionated material. Sometimes this is relatively easy, as in texts fetched from review-aggregation sites in which review-oriented information is presented in relatively stereotyped format: examples include Epinions.com and Amazon.com. However, blogs also notoriously contain quite a bit of subjective content and thus are another obvious place to look (and are more relevant than shopping sites for queries that concern politics, people, or other non-products), but the desired material within blogs can vary quite widely in content, style, presentation, and even level of grammaticality.

(3) Once one has target documents in hand, one is still faced with the problem of identifying the overall sentiment expressed by these documents and/or the specific opinions regarding particular features or aspects of the items or topics in question, as necessary. Again, while some sites make this
kind of extraction easier (for instance, user reviews posted to Yahoo! Movies must specify grades for pre-defined sets of characteristics of films), more free-form text can be much harder for computers to analyze, and indeed can pose additional challenges; for example, if quotations are included in a newspaper article, care must be taken to attribute the views expressed in each quotation to the correct entity.

(4) Finally, the system needs to present the sentiment information it has garnered in some reasonable summary fashion. This can involve some or all of the following actions:
(a) Aggregation of "votes" that may be registered on different scales (e.g., one reviewer uses a star system, but another uses letter grades).
(b) Selective highlighting of some opinions.
(c) Representation of points of disagreement and points of consensus.
(d) Identification of communities of opinion holders.
(e) Accounting for different levels of authority among opinion holders.
Note that it might be more appropriate to produce a visualization of sentiment data rather than a textual summary of it, whereas textual summaries are what is usually created in standard topic-based multi-document summarization.

1.3 Our Charge and Approach

Challenges (2), (3), and (4) in the above list are very active areas of research, and the bulk of this survey is devoted to reviewing work in these three sub-fields. However, due to space limitations and the focus of the journal series in which this survey appears, we do not and cannot aim to be completely comprehensive.

In particular, when we began to write this survey, we were directly charged to focus on information-access applications, as opposed to work of more purely linguistic interest. We stress that the importance of work in the latter vein is absolutely not in question.
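Challenge (4)(a) in Section 1.2, aggregating votes registered on different scales, can be made concrete with a small sketch. The approach below (linearly mapping each scale onto [0, 1] and then averaging) is only one plausible choice, assumed here for illustration; the five-star scale, the letter-grade ordering, and all function names are hypothetical, and a real system might also need to correct for scale-specific reviewer biases that a linear mapping ignores.

```python
from statistics import mean

# Hypothetical letter-grade scale, ordered from worst to best (an assumption).
LETTER_GRADES = ["F", "D-", "D", "D+", "C-", "C", "C+",
                 "B-", "B", "B+", "A-", "A", "A+"]


def normalize_stars(stars: float, min_stars: int = 1, max_stars: int = 5) -> float:
    """Linearly map a star rating in [min_stars, max_stars] onto [0, 1]."""
    if not min_stars <= stars <= max_stars:
        raise ValueError(f"star rating {stars} outside [{min_stars}, {max_stars}]")
    return (stars - min_stars) / (max_stars - min_stars)


def normalize_letter(grade: str) -> float:
    """Map a letter grade onto [0, 1] by its rank in LETTER_GRADES."""
    rank = LETTER_GRADES.index(grade.strip().upper())  # ValueError if unknown
    return rank / (len(LETTER_GRADES) - 1)


def aggregate(votes: list[tuple[str, object]]) -> float:
    """Average (scale, value) votes after putting them on a common [0, 1] scale.

    scale must be "stars" or "letter", the only two scales this sketch knows.
    """
    normalized = []
    for scale, value in votes:
        if scale == "stars":
            normalized.append(normalize_stars(float(value)))
        elif scale == "letter":
            normalized.append(normalize_letter(str(value)))
        else:
            raise ValueError(f"unknown scale: {scale!r}")
    return mean(normalized)


if __name__ == "__main__":
    # Two star ratings and one letter grade for the same item.
    votes = [("stars", 4), ("letter", "B+"), ("stars", 2)]
    print(round(aggregate(votes), 3))  # prints 0.583
```

Here 4 stars and a B+ both map to 0.75 and 2 stars maps to 0.25. Treating adjacent letter grades as equally spaced is itself an assumption; a system could instead learn a mapping from reviews whose authors use both scales.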
Given our mandate, the reader will not be surprised that we describe the applications that sentiment-analysis systems can facilitate and review many kinds of approaches to a variety of opinion-oriented classification problems. We have also chosen to attempt to draw attention to single- and multi-document summarization of evaluative text, especially since interesting considerations regarding graphical visualization arise. Finally, we move beyond just the technical issues, devoting significant attention to the broader implications that the development of opinion-oriented information-access services has: we look at questions of privacy, manipulation, and whether or not reviews can have measurable economic impact.

1.4 Early History

Although the area of sentiment analysis and opinion mining has recently enjoyed a huge burst of research activity, there has been a steady undercurrent of interest for quite a while. One could count early projects on beliefs as forerunners of the area [48, 317]. Later work focused mostly on interpretation of metaphor, narrative, point of view, affect, evidentiality in text, and related areas [121, 133, 149, 262, 306, 310, 311, 312, 313].

The year 2001 or so seems to mark the beginning of widespread awareness of the research problems and opportunities that sentiment analysis and opinion mining raise [51, 66, 69, 79, 192, 215, 221, 235, 291, 296, 298, 305, 326], and subsequently there have been literally hundreds of papers published on the subject. Factors behind this "land rush" include: the rise of machine learning methods in natural language processing and information retrieval; the availability of datasets for machine learning algorithms to be trained on, due to the blossoming of the World Wide Web and, specifically, the development of review-aggregation websites; and, of course, realization of the fascinating intellectual challenges and commercial and intelligence applications that the area offers.
1.5 A Note on Terminology: Opinion Mining, Sentiment Analysis, Subjectivity, and All That

"The beginning of wisdom is the definition of terms," wrote Socrates. The aphorism is highly applicable when it comes to the world of social media monitoring and analysis, where any semblance of universal agreement on terminology is altogether lacking. Today, vendors, practitioners, and the media alike call this still-nascent arena everything from "brand monitoring," "buzz monitoring" and "online anthropology," to "market influence analytics," "conversation mining" and "online consumer intelligence." ... In the end, the term "social media monitoring and analysis" is itself a verbal crutch. It is placeholder [sic], to be used until something better (and shorter) takes hold in the English language to describe the topic of this report.
Zabin and Jefferies [327]

The above quotation highlights the problems that have arisen in trying to name a new area. The quotation is particularly apt in the context of this survey because the field of "social media monitoring and analysis" (or however one chooses to refer to it) is precisely one that the body of work we review is very relevant to. And indeed, there has been to date no uniform terminology established for the relatively young field we discuss in this survey. In this section, we simply mention some of the terms that are currently in vogue, and attempt to indicate what these terms tend to mean in research papers that the interested reader may encounter.

The body of work we review is that which deals with the computational treatment of (in alphabetical order) opinion, sentiment, and subjectivity in text. Such work has come to be known as opinion mining, sentiment analysis, and/or subjectivity analysis. The phrases review mining and appraisal extraction have been used, too, and there are some connections to affective computing, where the goals include enabling computers to recognize and express emotions [239]. This proliferation of terms reflects differences in the connotations that these terms carry,
both in their original general-discourse usages and in the usages that have evolved in the technical literature of several communities.

In 1994, Wiebe [311], influenced by the writings of the literary theorist Banfield [26], centered the idea of subjectivity around that of private states, defined by Quirk et al. [245] as states that are not open to objective observation or verification. Opinions, evaluations, emotions, and speculations all fall into this category; but a canonical example of research typically described as a type of subjectivity analysis is the recognition of opinion-oriented language in order to distinguish it from objective language. While there has been some research self-identified as subjectivity analysis on the particular application area of determining the value judgments (e.g., "four stars" or "C+") expressed in the evaluative opinions that are found, this application has not tended to be a major focus of such work.

Footnote: To see that the distinctions in common usage can be subtle, consider how interrelated the following set of definitions given in Merriam-Webster's Online Dictionary are. Synonyms: opinion, view, belief, conviction, persuasion, sentiment mean a judgment one holds as true. Opinion implies a conclusion thought out yet open to dispute (each expert seemed to have a different opinion). View suggests a subjective opinion (very assertive in stating his views). Belief implies often deliberate acceptance and intellectual assent (a firm belief in her party's platform). Conviction applies to a firmly and seriously held belief (the conviction that animal life is as sacred as human). Persuasion suggests a belief grounded on assurance (as by evidence) of its truth (was of the persuasion that everything changes). Sentiment suggests a settled opinion reflective of one's feelings (her feminist sentiments are well-known).

The term opinion mining appears in a paper by Dave et al. [69] that was published in the proceedings of the 2003 WWW conference; the publication venue may explain the popularity of the term within communities strongly associated with Web search or information retrieval. According to Dave et al. [69], the ideal opinion-mining tool would process a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions
about each of them (poor, mixed, good). Much of the subsequent research self-identified as opinion mining fits this description in its emphasis on extracting and analyzing judgments on various aspects of given items. However, the term has recently also been interpreted more broadly to include many different types of analysis of evaluative text [190].

The history of the phrase sentiment analysis parallels that of "opinion mining" in certain respects. The term "sentiment" used in reference to the automatic analysis of evaluative text and tracking of the predictive judgments therein appears in 2001 papers by Das and Chen [66] and Tong [296], due to these authors' interest in analyzing market sentiment. It subsequently occurred within 2002 papers by Turney [298] and Pang et al. [235], which were published in the proceedings of the annual meeting of the Association for Computational Linguistics (ACL) and the annual conference on Empirical Methods in Natural Language Processing (EMNLP). Moreover, Nasukawa and Yi [221] entitled their 2003 paper "Sentiment analysis: Capturing favorability using natural language processing," and a paper in the same year by Yi et al. [323] was named "Sentiment Analyzer: Extracting sentiments about a given topic using natural language processing techniques." These events together may explain the popularity of "sentiment analysis" among communities self-identified as focused on NLP. A sizeable number of papers mentioning "sentiment analysis" focus on the specific application of classifying reviews as to their polarity (either positive or negative), a fact that appears to have caused some authors to suggest that the phrase refers specifically to this narrowly defined task. However, nowadays many construe the term more broadly to mean the computational treatment of opinion, sentiment, and subjectivity in text.

Thus, when broad interpretations are applied, "sentiment analysis" and "opinion mining" denote the same field of study (which itself can be considered a sub-area of subjectivity analysis). We have attempted to use these terms more or less interchangeably in this survey. This is in no small part because we view the field as representing a unified body of work, and would thus like to encourage researchers in the area to share terminology regardless of the publication venues at which their papers might appear.
2 Applications

Sentiment without action is the ruin of the soul.
Edward Abbey

We used one application of opinion mining and sentiment analysis as a motivating example in the Introduction, namely, web search targeted toward reviews. But other applications abound. In this section, we seek to enumerate some of the possibilities.

It is important to mention that because of all the possible applications, there are a good number of companies, large and small, that have opinion mining and sentiment analysis as part of their mission. However, we have elected not to mention these companies individually, due to the fact that the industrial landscape tends to change quite rapidly, so that lists of companies risk falling out of date rather quickly.

2.1 Applications to Review-Related Websites

Clearly, the same capabilities that a review-oriented search engine would have could also serve very well as the basis for the creation and automated upkeep of review- and opinion-aggregation websites. That is, as an alternative to sites like Epinions that solicit feedback and reviews, one could imagine sites that proactively gather such information. Topics need not be restricted to product reviews, but could include opinions about candidates running for office, political issues, and so forth.

There are also applications of the technologies we discuss to more traditional review-solicitation sites, as well. Summarizing user reviews is an important problem. One could also imagine that errors in user ratings could be fixed: there are cases where users have clearly accidentally selected a low rating when their review indicates a positive evaluation [47]. Moreover, as discussed later in this survey (see Section 5.2.4, for example), there is some evidence that user ratings can be biased or otherwise in need of correction, and automated classifiers could provide such updates.

2.2 Applications as a Sub-Component Technology

Sentiment-analysis and opinion-mining systems also have an important potential role as enabling technologies for other systems.

One possibility is as an augmentation to recommendation systems [292, 293], since it might behoove such a system not to recommend items that receive a lot of negative feedback.

Detection of "flames" (overly heated or antagonistic language) in email or other types of communication [276] is another possible use of subjectivity detection and classification.

In online systems that display ads as sidebars, it is helpful to detect webpages that contain sensitive content inappropriate for ads placement [137]; for more sophisticated systems, it could be useful to bring up product ads when relevant positive sentiments are detected, and perhaps more importantly, nix the ads when relevant negative statements are discovered.

It has also been argued that information extraction can be improved by discarding information found in subjective sentences [256].

Question answering is another area where sentiment analysis can prove useful [274, 284, 189]. For example, opinion-oriented questions may require different treatment. Alternatively, Lita et al. [189] suggest that for definitional questions, providing an answer that includes more information about how an entity is viewed may better inform the user.

Summarization may also benefit from accounting for multiple viewpoints [265].

Additionally, there are potentially relations to citation analysis, where, for example, one might wish to determine whether an author is citing a piece of work as supporting evidence or as research that he or she dismisses [238]. Similarly, one effort seeks to use "semantic orientation" to track literary reputation [287].

In general, the computational treatment of affect has been motivated in part by the desire to improve human-computer interaction [188, 192, 295].

2.3 Applications in Business and Government Intelligence

The field of opinion mining and sentiment analysis is well-suited to various types of intelligence applications. Indeed, business intelligence seems to be one of the main factors behind corporate interest in the field.

Consider, for instance, the following scenario (the text of which also appears in Lee [181]). A major computer manufacturer, disappointed with unexpectedly low sales, finds itself confronted with the question: "Why aren't consumers buying our laptop?" While concrete data such as the laptop's weight or the price of a competitor's model are obviously relevant, answering this question requires focusing more on people's personal views of such objective characteristics. Moreover, subjective judgments regarding intangible qualities (e.g., "the design is tacky" or "customer service was condescending") or even misperceptions (e.g., "updated device drivers aren't available" when such device drivers do in fact exist) must be taken into account as well.

Sentiment-analysis technologies for extracting opinions from unstructured human-authored documents would be excellent tools for handling many business-intelligence tasks related to the one just described. Continuing with our example scenario: it would be difficult to try to directly survey laptop purchasers who have not bought the company's product. Rather, we could employ a system that (a) finds reviews or other expressions of opinion on the Web, where newsgroups, individual blogs, and aggregation sites such as Epinions are likely to
be productive sources, and then (b) creates condensed versions of individual reviews or a digest of overall consensus points. This would save an analyst from having to read potentially dozens or even hundreds of versions of the same complaints. Note that Internet sources can vary wildly in form, tenor, and even grammaticality; this fact underscores the need for robust techniques even when only one language (e.g., English) is considered.

Besides reputation management and public relations, one might perhaps hope that by tracking public viewpoints, one could perform trend prediction in sales or other relevant data [214]. (See our discussion of Broader Implications (Section 6) for more discussion of potential economic impact.)

Government intelligence is another application that has been considered. For example, it has been suggested that one could monitor sources for increases in hostile or negative communications [1].

2.4 Applications Across Different Domains

One exciting turn of events has been the confluence of interest in opinions and sentiment within computer science with interest in opinions and sentiment in other fields.

As is well known, opinions matter a great deal in politics. Some work has focused on understanding what voters are thinking [83, 110, 126, 178, 219], whereas other projects have as a long term goal the clarification of politicians' positions, such as what public figures support or oppose, to enhance the quality of information that voters have access to [27, 111, 294].

Sentiment analysis has specifically been proposed as a key enabling technology in eRulemaking, allowing the automatic analysis of the opinions that people submit about pending policy or government-regulation proposals [50, 175, 271].

On a related note, there has been investigation into opinion mining in weblogs devoted to legal matters, sometimes known as "blawgs" [64].

Interactions with sociology promise to be extremely fruitful. For instance, the issue of how ideas and innovations diffuse [258] involves the question of who is positively or negatively disposed toward whom,
and hence who would be more or less receptive to new information transmission from a given source. To take just one other example: structural balance theory is centrally concerned with the polarity of "ties" between people [54] and how this relates to group cohesion. These ideas have begun to be applied to online media analysis [58, 144].

3 General Challenges

3.1 Contrasts with Standard Fact-Based Textual Analysis

The increasing interest in opinion mining and sentiment analysis is partly due to its potential applications, which we have just discussed. Equally important are the new intellectual challenges that the field presents to the research community. So what makes the treatment of evaluative text different from "classic" text mining and fact-based analysis?

Take text categorization, for example. Traditionally, text categorization seeks to classify documents by topic. There can be many possible categories, the definitions of which might be user- and application-dependent; and for a given task, we might be dealing with as few as two classes (binary classification) or as many as thousands of classes (e.g., classifying documents with respect to a complex taxonomy). In contrast, with sentiment classification (see Section 4.1 for more details on precise definitions), we often have relatively few classes (e.g., "positive" or "3 stars") that generalize across many domains and users. In addition, while the different classes in topic-based categorization can be completely unrelated, the sentiment labels that are widely
considered in previous work typically represent opposing (if the task is binary classification) or ordinal/numerical categories (if classification is according to a multi-point scale). In fact, the regression-like nature of strength of feeling, degree of positivity, and so on seems rather unique to sentiment categorization (although one could argue that the same phenomenon exists with respect to topic-based relevance).

There are also many characteristics of answers to opinion-oriented questions that differ from those for fact-based questions [284]. As a result, opinion-oriented information extraction, as a way to approach opinion-oriented question answering, naturally differs from traditional information extraction (IE) [49]. Interestingly, in a manner that is similar to the situation for the classes in sentiment-based classification, the templates for opinion-oriented IE also often generalize well across different domains, since we are interested in roughly the same set of fields for each opinion expression (e.g., holder, type, strength) regardless of the topic. In contrast, traditional IE templates can differ greatly from one domain to another: the typical template for recording information relevant to a natural disaster is very different from a typical template for storing bibliographic information.

These distinctions might make our problems appear deceptively simpler than their counterparts in fact-based analysis, but this is far from the truth. In the next section, we sample a few examples to show what makes these problems difficult compared to traditional fact-based text analysis.

3.2 Factors that Make Opinion Mining Difficult

Let us begin with a sentiment polarity text-classification example. Suppose we wish to classify an opinionated text as either positive or negative, according to the overall sentiment expressed by the author within it. Is this a difficult task?

To answer this question, first consider the following example, consisting of only one sentence (by Mark Twain): "Jane Austen's books madden me so that I can't conceal my frenzy from the reader." Just as the topic of this text segment can be identified by the phrase "Jane Austen," the presence of words like "madden" and "frenzy" suggests
negative sentiment. So one might think this is an easy task, and hypothesize that the polarity of opinions can generally be identified by a set of keywords.

But the results of an early study by Pang et al. [235] on movie reviews suggest that coming up with the right set of keywords might be less trivial than one might initially think. The purpose of Pang et al.'s pilot study was to better understand the difficulty of the document-level sentiment-polarity classification problem. Two human subjects were asked to pick keywords that they would consider to be good indicators of positive and negative sentiment. As shown in Figure 3.1, the use of the subjects' lists of keywords achieves about 60% accuracy when employed within a straightforward classification policy. In contrast, word lists of the same size but chosen based on examination of the corpus' statistics achieve almost 70% accuracy, even though some of the terms, such as "still," might not look that intuitive at first.

Proposed word lists, with accuracy (%) and ties (%):

Human 1 (accuracy 58%, ties 75%)
  positive: dazzling, brilliant, phenomenal, excellent, …
  negative: suck, terrible, awful, unwatchable, hideous
Human 2 (accuracy 64%, ties 39%)
  positive: gripping, mesmerizing, riveting, spectacular, cool, awesome, thrilling, badass, excellent, moving, exciting
  negative: bad, cliched, sucks, boring, stupid, slow
Statistics-based (accuracy 69%, ties 16%)
  positive: love, wonderful, best, great, superb, still, beautiful
  negative: bad, worst, stupid, waste, boring, ?, !

Fig. 3.1 Sentiment classification using keyword lists created by human subjects ("Human 1" and "Human 2"), with corresponding results using keywords selected via examination of simple statistics of the test data ("Statistics-based"). Adapted from Figures 1 and 2 in Pang et al. [235].

However, the fact that it may be non-trivial for humans to come up with the best set of keywords does not in itself imply that the problem is harder than topic-based categorization. While the feature "still" might not be likely for any human to propose from introspection, given training data, its correlation with the positive class can be discovered via a data-driven approach, and its utility (at least in
the movie review domain) does make sense in retrospect. Indeed, applying machine learning techniques based on unigram models can achieve over 80% accuracy [235], which is much better than the performance based on hand-picked keywords reported above. However, this level of accuracy is not quite on par with the performance one would expect in typical topic-based binary classification.

Why does this problem appear harder than the traditional task when the two classes we are considering here are so different from each other? Our discussion of algorithms for classification and extraction (Section 4) will provide a more in-depth answer to this question, but the following are a few examples (from among the many we know) showing that the upper bound on problem difficulty, from the viewpoint of machines, is very high. Note that not all of the issues these examples raise have been fully addressed in the existing body of work in this area.

Compared to topic, sentiment can often be expressed in a more subtle manner, making it difficult to identify by any of a sentence's or document's terms when considered in isolation. Consider the following:

"If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut." (Review by Luca Turin and Tania Sanchez of the Givenchy perfume Amarige, in Perfumes: The Guide, Viking 2008.) No ostensibly negative words occur.

"She runs the gamut of emotions from A to B." (Dorothy Parker, speaking about Katharine Hepburn.) No ostensibly negative words occur.

In fact, the example that opens this section, which was taken from the following quote from Mark Twain, is also followed by a sentence with no ostensibly negative words:

"Jane Austen's books madden me so that I can't conceal my frenzy from the reader. Every time I read 'Pride and Prejudice' I want to dig her up and beat her over the skull with her own shin-bone."
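The "straightforward classification policy" behind Figure 3.1 can be sketched as follows, assuming it simply counts positive and negative keyword occurrences, picks the larger count, and reports a tie when the counts are equal (presumably what the figure's Ties column measures). The word lists are Human 2's from Figure 3.1; the tokenizer and function names are illustrative assumptions, not Pang et al.'s code. Running it on the quotes above shows the problem just described: none of them contains a keyword, so the classifier cannot decide.

```python
import re

# Human 2's keyword lists from Figure 3.1.
POSITIVE = {"gripping", "mesmerizing", "riveting", "spectacular", "cool", "awesome",
            "thrilling", "badass", "excellent", "moving", "exciting"}
NEGATIVE = {"bad", "cliched", "sucks", "boring", "stupid", "slow"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of ASCII letters as tokens (an assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text: str) -> str:
    """Count keyword hits and return 'positive', 'negative', or 'tie'."""
    tokens = tokenize(text)
    pos_hits = sum(token in POSITIVE for token in tokens)
    neg_hits = sum(token in NEGATIVE for token in tokens)
    if pos_hits > neg_hits:
        return "positive"
    if neg_hits > pos_hits:
        return "negative"
    return "tie"  # equal counts, including the common case of zero hits


if __name__ == "__main__":
    examples = [
        "A gripping, thrilling ride.",
        "She runs the gamut of emotions from A to B.",
        "Jane Austen's books madden me so that I can't conceal my frenzy from the reader.",
    ]
    for text in examples:
        print(classify(text), "|", text)
```

The first (made-up) example is classified as positive, while the Parker and Twain quotes both come out as ties: "madden" and "frenzy" are not on the human-chosen list, and the Parker quote contains no sentiment keyword at all.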
A related observation is that although the second sentence indicates an extremely strong opinion, it is difficult to associate the presence of this strong opinion with specific keywords or phrases in this sentence. Indeed, subjectivity detection can be a difficult task in itself. Consider the following quote from Charlotte Brontë, in a letter to George Lewes:

You say I must familiarise my mind with the fact that "Miss Austen is not a poetess, has no 'sentiment'" (you scornfully enclose the word in inverted commas), "has no eloquence, none of the ravishing enthusiasm of poetry"; and then you add, I must "learn to acknowledge her as one of the greatest artists, of the greatest painters of human character, and one of the writers with the nicest sense of means to an end that ever lived."

Note the fine line between facts and opinions: while "Miss Austen is not a poetess" can be considered to be a fact, "none of the ravishing enthusiasm of poetry" should probably be considered as an opinion, even though the two phrases (arguably) convey similar information. Thus, not only can we not easily identify simple keywords for subjectivity, but we also find that patterns like "the fact that" do not necessarily guarantee the objective truth of what follows them, and bigrams like "no sentiment" apparently do not guarantee the absence of opinions, either. We can also get a glimpse of how opinion-oriented information extraction can be difficult. For instance, it is non-trivial to recognize opinion holders. In the example quoted above, the opinion is not that of the author, but the opinion of "You," which refers to George Lewes in this particular letter. Also, observe that given the context ("you scornfully enclose the word in inverted commas," together with the reported endorsement of Austen as a great artist), it is clear that "has no sentiment" is not meant to be a show-stopping criticism of Austen from Lewes, and Brontë's disagreement with him on this subject is also subtly revealed.

Footnote: One can challenge our analysis of the "poetess" clause, as an anonymous reviewer indeed did, which disagreement perhaps supports our greater point about the difficulties that can sometimes present themselves. Different researchers express different opinions about whether distinguishing between subjective and objective language is difficult for humans in the general case. For example, Kim and Hovy [159] note that in a pilot study sponsored by NIST, "human annotators often disagreed on whether a belief statement was or was not an opinion." However, other researchers have found inter-annotator agreement rates in various types of subjectivity-classification tasks to be satisfactory [45, 273, 274, 309]; a summary provided by one of the anonymous referees is that "[although] there is variation from study to study, on average, about 85% of annotations are not marked as uncertain by either annotator, and for these cases, inter-coder agreement is very high (kappa values over .80)." As in other settings, more careful definitions of the distinctions to be made tend to lead to better agreement rates. In any event, the points we are exploring in the Brontë quote may be made more clear by replacing "Jane Austen is not a poetess" with something like "Jane Austen does not write poetry for a living, but is also no poet in the broader sense."

In general, sentiment and subjectivity are quite context-sensitive, and, at a coarser granularity, quite domain dependent (in spite of the fact that the general notion of positive and negative opinions is fairly consistent across different domains). Note that although domain dependency is in part a consequence of changes in vocabulary, even the exact same expression can indicate different sentiment in different domains. For example, "go read the book" most likely indicates positive sentiment for book reviews, but negative sentiment for movie reviews. (This example was furnished to us by Bob Bland.) We will discuss topic-sentiment interaction in more detail in Section 4.4.

It does not take a seasoned writer or a professional journalist to produce texts that are difficult for machines to analyze. The writings of Web users can be just as challenging, if not as subtle, in their own way; see Figure 3.2 for an example.

Fig. 3.2 Example of movie reviews produced by web users: a (slightly reformatted) screenshot of user reviews for The Nightmare Before Christmas.

In the case of Figure 3.2, it should be pointed out that it might be more useful to learn to recognize the quality of a review (see Section 5.2 for more detailed discussions on that subject). Still, it is interesting to observe the importance of modeling discourse structure. While the overall topic of a document
GeneralChallengesshouldbewhatthemajorityofthecontentisfocusingonregardlessoftheorderinwhichpotentiallydierentsubjectsarepresented,foropinions,theorderinwhichdierentopinionsarepresentedcanresultinacompletelyoppositeoverallsentimentpolarity.Infact,somewhatincontrastwithtopic-basedtextcategorization,ordereectscancompletelyoverwhelmfrequencyeects.Considerthefollowingexcerpt,againfromamoviereview:Thislmshouldbebrilliant.Itsoundslikeagreattheactorsarerstgrade,andthesupportingcastisgoodaswell,andStalloneisattemptingtodeliveragoodperformance.However,itcantholdup.Asindicatedbythe(inserted)emphasis,wordsthatarepositiveinorientationdominatethisexcerpt,andyettheoverallsentimentisnegativebecauseofthecruciallastsentence;whereasintraditionaltextclassication,ifadocumentmentionscarsrelativelyfrequently,thenthedocumentismostlikelyatleastsomewhatrelatedtocars.Orderdependencealsomanifestsitselfatmorene-grainedlevelsofanalysis:AisbetterthanBconveystheexactoppositeopinionfromBisbetterthanA.Ingeneral,modelingsequentialinformationanddiscoursestructureseemsmorecrucialinsentimentanalysis(furtherdiscussionappearsinSection4.7).Asnotedearlier,notalloftheissueswehavejustdiscussedhavebeenfullyaddressedintheliterature.Thisisperhapspartofthecharmofthisemergingarea.Inthefollowingsections,weaimtogiveanoverviewofaselectionofpastheroiceortstoaddresssomeoftheseissues,andmarchthroughthepositivesandthenegatives,chargedwithunbiasedfeeling,armedwithhardfacts.Fastenyourseatbelts.Itsgoingtobeabumpynight!BetteDavis,AllAboutEvescreenplaybyJosephMankiewicz OnecouldargueaboutwhetherinthecontextofmoviereviewsthewordStallonehasasemanticorientation.Notethatthisisnotuniquetoopinionexpressions;AkilledBandBkilledAalsoconveydierentfactualinformation. 
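The short Python sketch below illustrates the order-dependence point from this chapter. It builds unigram and bigram "bags" for "A is better than B" and "B is better than A". The lowercase whitespace tokenizer is a deliberately naive assumption of this example, not a method from the surveyed literature.

The unigram bags come out identical, so a classifier that sees only which words occur (or how often) must give both sentences the same label.

```python
from collections import Counter


def tokens(text: str) -> list[str]:
    """Naive tokenizer (an assumption for this sketch): lowercase, split on whitespace."""
    return text.lower().split()


def unigram_bag(text: str) -> Counter:
    """Unigram counts: word order is discarded entirely."""
    return Counter(tokens(text))


def bigram_bag(text: str) -> Counter:
    """Counts of adjacent token pairs: only local word order is kept."""
    toks = tokens(text)
    return Counter(zip(toks, toks[1:]))


s1 = "A is better than B"
s2 = "B is better than A"

print(unigram_bag(s1) == unigram_bag(s2))  # True: identical unigram bags
print(bigram_bag(s1) == bigram_bag(s2))    # False: ("a", "is") vs. ("b", "is"), etc.
```

Bigrams separate these two sentences only because they encode local word order. They do not capture discourse-level ordering such as the Stallone excerpt, where the decisive negative sentence comes last.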
4 Classification and Extraction

"'The Bucket List,' which was written by Justin Zackham and directed by Rob Reiner, seems to have been created by applying algorithms to sentiment."
David Denby movie review, The New Yorker, January 7, 2007

A fundamental technology in many current opinion-mining and sentiment-analysis applications is classification. Note that in this survey, we generally construe the term "classification" broadly, so that it encompasses regression and ranking. Classification is so important because many problems of interest can be formulated as applying classification/regression/ranking to given textual units. Examples include:
- making a decision for a particular phrase or document ("how positive is it?");
- ordering a set of texts ("rank these reviews by how positive they are");
- giving a single label to an entire document collection ("where on the scale between liberal and conservative do the writings of this author lie?");
- categorizing the relationship between two entities based on textual evidence ("does A approve of B's actions?").

This section is centered on approaches to these kinds of problems.

Part One covers fundamental background. Specifically, Section 4.1 provides a discussion of key concepts involved in common formulations of classification problems in sentiment analysis and opinion mining. Features that have been explored for sentiment analysis tasks are discussed in Section 4.2.

Part Two is devoted to an in-depth discussion of different types of approaches to classification, regression, and ranking problems. The beginning of Part Two should be consulted for a detailed outline. It is appropriate here, however, to indicate how we cover extraction, since it plays a key role in many sentiment-oriented applications and so some readers may be particularly interested in it.

First, extraction problems (e.g., retrieving opinions on various features of a laptop) are often solved by casting many sub-problems as classification problems (e.g., given a text span, determine whether it expresses any opinion at all). Therefore, rather than devote a separate section to the entirety of the extraction task, we have integrated discussion of extraction-oriented classification sub-problems into the appropriate places in our discussion of different types of approaches to classification in general (Sections 4.3–4.8). Section 4.9 covers those remaining aspects of extraction that can be thought of as distinct from classification.

Second, extraction is often a means to the further goal of providing effective summaries of the extracted information to users. Details on how to combine information mined from multiple subjective text segments into a suitable summary can be found in Section 5.

Part One: Fundamentals

4.1 Problem Formulations and Key Concepts

Motivated by different real-world applications, researchers have considered a wide range of problems over a variety of different types of corpora. We now examine the key concepts involved in these problems. This discussion also serves as a loose grouping of the major problems, where each group consists of problems that are suitable for similar treatment as learning tasks.

4.1.1 Sentiment Polarity and Degrees of Positivity

One set of problems shares the following general character. We are given an opinionated piece of text, where it is assumed that the overall opinion in it is about one single issue or item. The task is to classify the opinion as falling under one of two opposing sentiment polarities, or to locate its position on the continuum between these two polarities. A large portion of work in sentiment-related classification/regression/ranking falls within this category. Eguchi and Lavrenko [84] point out two possible uses for the polarity or positivity labels so assigned. They may be used simply for summarizing the content of opinionated text units on a topic, whether those units are positive or negative. Alternatively, they may be used for retrieving only items of a given sentiment orientation (say, positive).

The binary classification task of labeling an opinionated document as expressing either an overall positive or an overall negative opinion is called sentiment polarity classification or polarity classification. This binary decision task has also been termed sentiment classification in the literature. As mentioned above, however, in this survey we will use "sentiment classification" to refer broadly to binary categorization, multi-class categorization, regression, and/or ranking.

Much work on sentiment polarity classification has been conducted in the context of reviews (e.g., "thumbs up" or "thumbs down" for movie reviews). While in this context "positive" and "negative" opinions are
often evaluative (e.g., "like" vs. "dislike"), there are other problems where the interpretation of "positive" and "negative" is subtly different. One example is determining whether a political speech is in support of or opposition to the issue under debate [27, 294]. A related task is classifying predictive opinions in election forums into "likely to win" and "unlikely to win" [160]. Since these problems are all concerned with two opposing subjective classes, as machine learning tasks they are often amenable to similar techniques.

A number of other aspects of politically oriented text have also been explored, such as whether liberal or conservative views are expressed. The labels used in those problems can usually be considered properties of a set of documents, representing authors' attitudes over multiple issues, rather than positive or negative sentiment with respect to a single issue. We therefore discuss them under a different heading further below ("viewpoints and perspectives," Section 4.1.4).

The input to a sentiment classifier is not necessarily always strictly opinionated. Classifying a news article into good or bad news has been considered a sentiment classification task in the literature [168]. But a piece of news can be good or bad news without being subjective (i.e., without being expressive of the private states of the author). For instance, "the stock price rose" is objective information that is generally considered to be good news in appropriate contexts.

It is not our main intent to provide a clean-cut definition for what should be considered "sentiment polarity classification" problems. It is perhaps useful, however, to point out three things.
(a) Some authors explicitly express their sentiment through statements like "this laptop is great." In determining the sentiment polarity of such texts, (arguably) objective information such as "long battery life" is often used to help determine the overall sentiment.
(b) The task of determining whether a piece of objective information is good or bad is still not quite the same as classifying it into one of several topic-based classes. Hence it inherits the challenges involved in sentiment analysis.
(c) As we will discuss in more detail later, the distinction between subjective and objective information can be subtle. Is "long battery life" objective? Also consider the difference between "the battery lasts 2 hours" vs. "the battery only lasts 2 hours."

While it is of utter importance that the problem itself should be well-defined, it is of less, if any, importance to decide which tasks should be labeled as "polarity classification."

Whether this should be considered as an objective statement may be up for debate: one can imagine another reviewer retorting, "you call that battery life?"

Related categories. An alternative way of summarizing reviews is to extract information on why the reviewers liked or disliked the product. Kim and Hovy [158] note that such "pro and con" expressions can differ from positive and negative opinion expressions. Even so, the two concepts are, for the purposes of analyzing evaluative text, strongly related: opinion ("I think this laptop is terrific") and reason for opinion ("This laptop only costs $399"). Identifying pro and con reasons could form the basis for more informative sentiment-oriented summaries. It could also help decide the helpfulness of individual reviews, since evaluative judgments that are supported by reasons are likely to be more trustworthy.

Another type of categorization related to degrees of positivity is considered by Niu et al. [226], who seek to determine the polarity of outcomes (improvement vs. death, say) described in medical texts.

Additional problems related to the determination of degree of positivity surround the analysis of comparative sentences [139]. The main idea is that sentences such as "The new model is more expensive than the old one" or "I prefer the new model to the old model" are important sources of information regarding the author's evaluations.

Rating inference (ordinal regression). The more general problem of rating inference can be viewed simply as a multi-class text categorization problem. Here one must determine the author's evaluation with respect to a multi-point scale (e.g., one to five "stars" for a review). Predicting degree of positivity provides more fine-grained rating information; at the same time, it is an interesting learning problem in itself.

But in contrast to many topic-based multi-class classification problems, sentiment-related multi-class classification can also be naturally formulated as a regression problem, because ratings are ordinal. It can be argued to constitute a special type of (ordinal) regression problem, because the semantics of each class may not simply correspond directly to a point on a scale. More specifically, each class may have its own distinct vocabulary. For instance, suppose we are classifying an author's evaluation into one of the positive, neutral, and negative classes. An overall neutral opinion could be a mixture of positive and negative language, or it could be identified with signature words such as "mediocre." This presents us with interesting opportunities to explore the relationships between classes.

Note the difference between rating inference and predicting strength of opinion (discussed in Section 4.1.2). For instance, it is possible to feel quite strongly (high on the "strength" scale) that something is mediocre (middling on the "evaluation" scale).

Also, note that the label "neutral" is sometimes used in the literature as a label for the objective class ("lack of opinion"). In this survey, we use "neutral" only in the aforementioned sense of a sentiment that lies between positive and negative.

Interestingly, Cabral and Hortaçsu [47] observe that neutral comments in feedback systems are not necessarily perceived by users as lying at the exact mid-point between positive and negative comments. Rather, "the information contained in a neutral rating is perceived by users to be much closer to negative feedback than positive." On the other hand, they also note that in their data, "sellers were less likely to retaliate against neutral comments, as opposed to negatives: ... a buyer leaving a negative comment has a 40% chance of being hit back, while a buyer leaving a neutral comment only has a 10% chance of being retaliated upon by the seller."

Agreement. The opposing nature of polarity classes also gives rise to exploration of agreement detection. For example, given a pair of texts, the task is to decide whether they should receive the same or differing sentiment-related labels, based on the relationship between the elements of the pair. This is often not defined as a standalone problem. Instead, it is considered a sub-task whose result is used to improve the labeling of the opinions held by the entities involved [272, 294]. A different type of agreement task has also been considered in the context of perspectives. There, for example, a label of "conservative" tends to indicate agreement with particular positions on a wide variety of issues.

4.1.2 Subjectivity Detection and Opinion Identification

Work in polarity classification often assumes the incoming documents to be opinionated. For many applications, though, we may need to decide whether a given document contains subjective information or not, or identify which portions of the document are subjective. Indeed, this problem was the focus of the 2006 Blog track at TREC [227]. At least one opinion-tracking system rates subjectivity and sentiment separately [108]. Mihalcea et al. [209] summarize the evidence of several projects on subsentential analysis [12, 90, 289, 319] as follows: "the problem of distinguishing subjective versus objective instances has often proved to be more difficult than subsequent polarity classification, so improvements in subjectivity classification promise to positively impact sentiment classification."

Early work by Hatzivassiloglou and Wiebe [120] examined the effects of adjective orientation and gradability on sentence subjectivity. The goal was to tell whether a given sentence is subjective or not, judging from the adjectives appearing in that sentence. A number of projects address sentence-level or sub-sentence-level subjectivity detection in different domains [33, 156, 232, 255, 308, 315, 319, 326]. Wiebe et al. [316] present a comprehensive survey of subjectivity recognition using different clues and features.

Wilson et al. [320] address the problem of determining clause-level opinion strength (e.g., "how mad are you?"). Note that the problem of determining opinion strength is different from rating inference. For rating inference, one may classify a piece of text as expressing a neutral opinion (giving it a mid-point score). That is not the same as classifying that piece of text as objective (lack of opinion): one can have a strong opinion that something is "mediocre" or "so-so."

Recent work also considers relations between word sense disambiguation and subjectivity [307].

Subjectivity detection or ranking at the document level can be thought of as having its roots in studies in genre classification (see Section 4.1.5 for more detail). For instance, Yu and Hatzivassiloglou [326] achieve high accuracy (97%) with a Naive Bayes classifier on a particular corpus of Wall Street Journal articles. Their task is to distinguish articles under News and Business (facts) from articles under Editorial and Letter to the Editor (opinions). (This task was suggested earlier by Wiebe et al. [315], and a similar corpus was explored in previous work [308, 316].) Work in this direction is not limited to the binary distinction between subjective and objective labels. Recent work includes the research by participants in the 2006 TREC Blog track [227] and others [69, 97, 222, 223, 234, 279, 316, 326].

4.1.3 Joint Topic–Sentiment Analysis

One simplifying assumption sometimes made by work on document-level sentiment classification is that each document under consideration is focused on the subject matter we are interested in. This is in part because one can often assume that the document set was created by first collecting only on-topic documents (e.g., by first running a topic-based query through a standard search engine). However, there may be interactions between topic and opinion that make it desirable to consider the two simultaneously. For example, Riloff et al. [256] find that "topic-based text filtering and subjectivity filtering are complementary" in the context of experiments in information extraction.

Also, even a relevant opinion-bearing document may contain off-topic passages that the user may not be interested in, and so one may wish to discard such passages.

Another interesting case is when a document contains material on multiple subjects that may be of interest to the user. In such a setting, it is useful to identify the topics and separate the opinions associated with each of them. Two examples of the types of documents for which this kind of analysis is appropriate are (1) comparative studies of related products, and (2) texts that discuss various features, aspects, or attributes.

When the context is clear, we often use the term "feature" to refer to "feature, aspect, or attribute" in this survey.

4.1.4 Viewpoints and Perspectives

Much work on analyzing sentiment and opinions in politically oriented text focuses on general attitudes expressed through texts that are not necessarily targeted at a particular issue or narrow subject. For instance, Grefenstette et al. [112] experimented with determining the political orientation of websites, essentially by classifying the concatenation of all the documents found on that site. We group this type of work under the heading of "viewpoints and perspectives." Under this rubric we include work on classifying texts as liberal, conservative, libertarian, etc. [219], placing texts along an ideological scale [178, 202], or representing Israeli versus Palestinian viewpoints [186, 187].

Binary or n-ary classification may be used here. However, the classes typically correspond not to opinions on a single, narrowly defined topic, but to a collection of bundled attitudes and beliefs. This could potentially enable different approaches from polarity classification. On the other hand, suppose we treat the set of documents as a meta-document, and the different issues being discussed as meta-features. Then this problem still shares some common ground with polarity classification or its multi-class, regression, and ranking variants. Indeed, some of the approaches explored in the literature for these two problems individually could very well be adapted to work for either one of them.

The other point of departure from the polarity classification problem is that the labels being considered are more about attitudes that do not naturally correspond with degree of positivity. Assigning simple labels remains a classification problem. If we move farther and aim at serving more expressive and open-ended opinions to the user, however, we need to solve extraction problems. For instance, one may want descriptions of opinions of greater complexity than simple labels drawn from a very small set. One might be seeking something more like "achieving world peace is difficult" than like "mildly positive." In fact, much of the prior work on perspectives and viewpoints seeks to extract more perspective-related information (e.g., opinion holders). The motivation was to enable multi-perspective question answering. There, the user could ask questions such as "what is Miss America's perspective on world peace?" rather than a fact-based question (e.g., "who is the new Miss America?"). Naturally, such work is often framed in the context of extraction problems, the particular characteristics of which are covered in Section 4.9.

4.1.5 Other Non-Factual Information in Text

Researchers have considered various affect types, such as the "six universal emotions" [86]: anger, disgust, fear, happiness, sadness, and surprise [192, 9, 285]. An interesting application is in human–computer interaction: if a system determines that a user is upset or annoyed, for instance, it could switch to a different mode of interaction [188].

Other related areas of research include computational approaches for humor recognition and generation [210]. Many interesting affectual aspects of text like "happiness" or "mood" are also being explored in the context of informal text resources such as weblogs [224]. Potential applications include monitoring levels of hateful or violent rhetoric, perhaps in multilingual settings [1].

Besides classification based on affect and emotion, another related area of non-topic-based categorization is determining the genre of texts [97, 98, 150, 153, 182, 277]. Subjective genres, such as "editorial," are often among the possible categories, so such work can be viewed as closely related to subjectivity detection. Indeed, this relation has been observed in work focused on learning subjective language [316].

There has also been research that concentrates on classifying documents according to their source or source style, with statistically detected stylistic variation [38] serving as an important cue. Authorship identification is perhaps the most salient example; Mosteller and Wallace's [216] classic Bayesian study of the authorship of the Federalist Papers is one well-known instance. Argamon-Engelson et al. [18] consider the related problem of identifying not the particular author of a text, but its publisher (e.g., the New York Times vs. The Daily News). The work of Kessler et al. [153] on determining a document's "brow" (e.g., high-brow vs. "popular," or low-brow) has similar goals. Several recent workshops have been dedicated to style analysis in text [15, 16, 17]. Determining stylistic characteristics can be useful in faceted search [10].

Another problem that has been considered in intelligence and security settings is the detection of deceptive language [46, 117, 329].

4.2 Features

Converting a piece of text into a feature vector or other representation that makes its most salient and important features available is an important part of data-driven approaches to text processing. There is an extensive body of work that addresses feature selection for machine learning approaches in general. There is also work on learning approaches tailored to the specific problems of classic text categorization and information extraction [101, 263]. A comprehensive discussion of such work is beyond the scope of this survey. In this section, we focus on findings in feature engineering that are specific to sentiment analysis.

4.2.1 Term Presence vs. Frequency

It is traditional in information retrieval to represent a piece of text as a feature vector wherein the entries correspond to individual terms. One influential finding in the sentiment-analysis area is as follows. Term frequencies have traditionally been important in standard IR, as the popularity of tf-idf weighting shows. In contrast, Pang et al. [235] obtained better performance using presence rather than frequency. That is, they compared two kinds of feature vectors for review polarity classification. Binary-valued vectors, whose entries merely indicate whether a term occurs (value 1) or not (value 0), formed a more effective basis than real-valued vectors, whose entry values increase with the occurrence frequency of the corresponding term. This finding may indicate an interesting difference between typical topic-based text categorization and polarity classification. A topic is more likely to be emphasized by frequent occurrences of certain keywords, whereas overall sentiment may not usually be highlighted through repeated use of the same terms. (We discussed this point previously in Section 3.2 on factors that make opinion mining difficult.)

On a related note, hapax legomena, or words that appear a single time in a given corpus, have been found to be high-precision indicators of subjectivity [316]. Yang et al. [322] look at rare terms that are not listed in a pre-existing dictionary. Their premise is that novel versions of words, such as "bugfested," might correlate with emphasis and hence subjectivity in blogs.

4.2.2 Term-based Features Beyond Term Unigrams

Position information finds its way into features from time to time. The position of a token within a textual unit (e.g., in the middle vs. near the end of a document) can potentially have important effects on how much that token affects the overall sentiment or subjectivity status of the enclosing textual unit. Thus, position information is sometimes encoded into the feature vectors that are employed [158, 235].

Whether higher-order n-grams are useful features appears to be a matter of some debate. For example, Pang et al. [235] report that unigrams outperform bigrams when classifying movie reviews by sentiment polarity. Dave et al. [69], however, find that in some settings, bigrams and trigrams yield better product-review polarity classification.

Riloff et al. [254] explore the use of a subsumption hierarchy to formally define different types of lexical features and the relationships between them, in order to identify useful complex features for opinion analysis. Airoldi et al. [5] apply a Markov Blanket Classifier to this problem together with a meta-heuristic search strategy called Tabu search. The result is a dependency structure encoding a parsimonious vocabulary for the positive and negative polarity classes.

Snyder and Barzilay [272] used the "contrastive distance" between terms as an automatically computed feature in a rating-inference system. An example of a high-contrast pair of words, in terms of the implicit evaluation polarity they express, is "delicious" and "dirty."

4.2.3 Parts of Speech

Part-of-speech (POS) information is commonly exploited in sentiment analysis and opinion mining. One simple reason holds for general textual analysis, not just opinion mining: part-of-speech tagging can be considered to be a crude form of word sense disambiguation [318].

Adjectives have been employed as features by a number of researchers [217, 303]. One of the earliest proposals for the data-driven prediction of the semantic orientation of words was developed for adjectives [119]. Subsequent work on subjectivity detection revealed a high correlation between the presence of adjectives and sentence subjectivity [120]. This finding has often been taken as evidence that (certain) adjectives are good indicators of sentiment. It has sometimes been used to guide feature selection for sentiment classification: a number of approaches focus on the presence or polarity of adjectives when deciding the subjectivity or polarity status of textual units, especially in the unsupervised setting. Rather than focusing on isolated adjectives, Turney [298] proposed to detect document sentiment based on selected phrases. These phrases are chosen via a number of pre-specified part-of-speech patterns, most of which include an
adjective or an adverb.

The fact that adjectives are good predictors of a sentence being subjective does not, however, imply that other parts of speech do not contribute to expressions of opinion or sentiment. In fact, in a study by Pang et al. [235] on movie-review polarity classification, using only adjectives as features was found to perform much worse than using the same number of most frequent unigrams. The researchers point out that nouns (e.g., "gem") and verbs (e.g., "love") can be strong indicators for sentiment. Riloff et al. [257] specifically studied the extraction of subjective nouns (e.g., "concern," "hope") via bootstrapping. There have been several targeted comparisons of the effectiveness of adjectives, verbs, and adverbs, where further subcategorization often plays a role [34, 221, 316].

4.2.4 Syntax

There have also been attempts at incorporating syntactic relations within feature sets. Such deeper linguistic analysis seems particularly relevant with short pieces of text. For instance, Kudo and Matsumoto [173] report results on two sentence-level classification tasks: sentiment polarity classification and modality identification ("opinion," "assertion," or "description"). On both, a subtree-based boosting algorithm using dependency-tree-based features outperformed the bag-of-words baseline, although there were no significant differences with respect to using n-gram-based features. Nonetheless, the use of higher-order n-grams and dependency or constituent-based features has also been considered for document-level classification. Here Dave et al. [69] on the one hand, and Gamon [103], Matsumoto et al. [204], and Ng et al. [222] on the other, come to opposite conclusions regarding the effectiveness of dependency information. Parsing the text can also serve as a basis for modeling valence shifters such as negation, intensifiers, and diminishers [152]. Collocations and more complex syntactic patterns have also been found to be useful for subjectivity detection [255, 316].

4.2.5 Negation

Handling negation can be an important concern in opinion- and sentiment-related analysis. Most commonly used similarity measures consider the bag-of-words representations of "I like this book" and "I don't like this book" to be very similar. Yet the only differing token, the negation term, forces the two sentences into opposite classes. There does not really exist a parallel situation in classic IR where a single negation term can play such an instrumental role in classification (except in cases like "this document is about cars" vs. "this document is not about cars").

It is possible to deal with negations indirectly, as a second-order feature of a text segment. In this approach, an initial representation, such as a feature vector, essentially ignores negation, and that representation is then converted into a different representation that is negation-aware. Alternatively, as was done in previous work, negation can be encoded directly into the definitions of the initial features. For example, Das and Chen [66] propose attaching "NOT" to words occurring close to negation terms such as "no" or "don't." Thus, in the sentence "I don't like deadlines," the token "like" is converted into the new token "like-NOT."

However, not all appearances of explicit negation terms reverse the polarity of the enclosing sentence. For instance, it is incorrect to attach "NOT" to "best" in "No wonder this is considered one of the best." Na et al. [220] attempt to model negation more accurately. They look for specific part-of-speech tag patterns (where these patterns differ for different negation words), and tag the complete phrase as a negation phrase. For their dataset of electronics reviews, they observe about 3% improvement in accuracy resulting from their modeling of negations. Further improvement probably needs deeper (syntactic) analysis of the sentence [152].

Another difficulty with modeling negation is that negation can often be expressed in rather subtle ways. Sarcasm and irony can be quite difficult to detect, but even in the absence of such sophisticated rhetorical devices, we still see examples such as "[it] avoids all cliché and predictability found in Hollywood movies" (internet review by "Margie24"). The word "avoid" here is an arguably unexpected "polarity reverser." Wilson et al. [319] discuss other complex negation effects.
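The NOT-tagging idea described above for Das and Chen [66] can be sketched in a few lines of Python. This is a minimal illustration, not their implementation. Three parts are assumptions made for this example, since the text above says only "close to" negation terms:
- the small negation lexicon;
- the regular-expression tokenizer;
- the rule that a negation's scope runs until the next punctuation mark.

Run on the "No wonder ..." sentence, the sketch reproduces exactly the error discussed above: "best" gets tagged even though the sentence is positive.

```python
import re

# Hypothetical negation lexicon for illustration; not taken from Das and Chen [66].
NEGATION_TERMS = {"no", "not", "never", "don't", "doesn't", "didn't", "can't"}
# Assumed scope rule: a negation applies until the next punctuation mark.
PUNCTUATION = {".", ",", ";", ":", "!", "?"}


def tokenize(text: str) -> list[str]:
    """Lowercase; keep words (allowing one internal apostrophe) and single punctuation marks."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?|[.,;:!?]", text.lower())


def mark_negation(tokens: list[str]) -> list[str]:
    """Append '-NOT' to every token inside an (assumed) negation scope."""
    marked = []
    in_scope = False
    for tok in tokens:
        if tok in PUNCTUATION:
            in_scope = False          # punctuation closes the scope
            marked.append(tok)
        elif tok in NEGATION_TERMS:
            in_scope = True           # open a scope; the negator itself is kept unchanged
            marked.append(tok)
        else:
            marked.append(tok + "-NOT" if in_scope else tok)
    return marked


print(mark_negation(tokenize("I don't like deadlines.")))
# ['i', "don't", 'like-NOT', 'deadlines-NOT', '.']

print(mark_negation(tokenize("No wonder this is considered one of the best.")))
# ['no', 'wonder-NOT', 'this-NOT', 'is-NOT', 'considered-NOT', 'one-NOT', 'of-NOT', 'the-NOT', 'best-NOT', '.']
```

The scope rule is purely surface-based, so it cannot recognize that "no wonder" does not negate what follows. Handling such cases is the aim of the pattern-based and syntactic refinements mentioned above.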
4.2.6 Topic-Oriented Features

Interactions between topic and sentiment play an important role in opinion mining. Consider a hypothetical article on Wal-mart containing the sentences "Wal-mart reports that profits rose" and "Target reports that profits rose." These sentences could indicate completely different types of news (good vs. bad) regarding the subject of the document, Wal-mart [116]. To some extent, topic information can be incorporated into features.

Mullen and Collier [217] examine the effectiveness of various features based on topic (e.g., they take into account whether a phrase follows a reference to the topic under discussion). Their experiments assume that topic references are manually tagged. Thus, for example, in a review of a particular work of art or music, references to the item receive a "THIS WORK" tag.

Kim and Hovy [160] propose to employ feature generalization for the analysis of predictive opinions (e.g., whether a message M with respect to party P predicts P to win). Specifically, for each sentence in each M, each party name and candidate name is replaced by PARTY (i.e., P) or OTHER (not P). Patterns such as "PARTY will win," "go PARTY again," and "OTHER will win" are then extracted as n-gram features. When classifying which party a given message predicts to win, this scheme outperforms simple n-gram features by about 10% in accuracy.

Topic–sentiment interaction has also been modeled through parse tree features, especially in opinion extraction tasks. Relationships between candidate opinion phrases and the given subject in a dependency tree can be useful in such settings [244].

Part Two: Approaches

The approaches we will now discuss all share the common theme of mapping a given piece of text, such as a document, paragraph, or sentence, to a label drawn from a pre-specified finite set or to a real number. (However, unlike classification and regression, ranking does not require such a mapping for each individual document.) As discussed in Section 4.1, opinion-oriented classification covers a wide range of tasks. These run from sentiment-polarity categorization in reviews and determining the strength of opinions in news articles to identifying perspectives in political debates and analyzing mood in blogs. Part of what is particularly interesting about these problems is the new challenges and opportunities that they present to us.

In the remainder of this section, we examine different solutions proposed in the literature to these problems, loosely organized around different aspects of machine learning approaches. These aspects may seem to be general themes underlying most machine learning problems, but we attempt to highlight what is unique to sentiment analysis and opinion mining tasks. For instance, some unsupervised learning approaches follow a sentiment-specific paradigm for how labels for words and phrases are obtained. Also, supervised and semi-supervised learning approaches for opinion mining and sentiment analysis differ from standard approaches to classification tasks, in part due to the different features involved. We also see a great variety of attempts at modeling various kinds of relations between items, classes, or sub-document units. Some of these relationships are unique to our tasks; others become more important to model due to the subtleties of the problems we address.

The rest of this section is organized as follows. Section 4.3 covers the impact that the increased availability of labeled data has had, including the rise of supervised learning. Section 4.4 considers issues surrounding topic and domain dependencies. Section 4.5 describes unsupervised approaches. We next consider incorporating relationships between various types of entities (Section 4.6). This is followed by a section on incorporating discourse structure (Section 4.7). Section 4.8 is concerned with the use of language models. Finally, Section 4.9 investigates certain issues in extraction that are somewhat particular to it, and thus are not otherwise discussed in the sections that precede it. One such issue is the identification of features and expressions of opinions in reviews. Another set of issues arises when opinion-holder identification needs to be applied.

4.3 The Impact of Labeled Data

Work up to the early 1990s on sentiment-related tasks, such as determination of point of view and other types of complex recognition
problems, generally assumed the existence of sub-systems for sometimes rather sophisticated NLP tasks, ranging from parsing to the resolution of pragmatic ambiguities [121, 262, 310, 311, 313]. Given the state of the art of NLP at the time and, just as importantly, the lack of sufficient amounts of appropriate labeled data, the research described in these early papers necessarily considered only proposals for systems or prototype systems without large-scale empirical evaluation; typically, no learning component was involved (an interesting exception is Wiebe and Bruce [306], who proposed but did not evaluate the use of decomposable graphical models). Operational systems were focused on simpler classification tasks, relatively speaking (e.g., categorization according to affect), and relied instead on relatively shallow analysis based on manually constructed discriminant-word lexicons [133, 296], since with such a lexicon in hand, one can classify a text unit by considering which indicator terms or phrases from the lexicon appear in the given text.

The widespread availability to researchers of organized collections of opinionated documents (two examples: financial-news discussion boards and review aggregation sites such as Epinions), of other corpora of more general texts (e.g., newswire), and of other resources (e.g., WordNet) was a major contributor to a large shift in direction toward data-driven approaches. To begin with, the availability of the raw texts themselves made it possible to learn opinion-relevant lexicons in an unsupervised fashion, as is discussed in more detail in Section 4.5.1, rather than create them manually. But the increase in the amount of labeled sentiment-relevant data, in particular where the labels are derived either through explicit researcher-initiated manual annotation efforts or by other means (see Section 7.1.1), was a major contributing factor to activity in both supervised and unsupervised learning. In the unsupervised case, described in Section 4.5, it facilitated research by making it possible to evaluate proposed algorithms in a large-scale fashion. Unsupervised (and supervised) learning also benefited from the improvements to sub-component systems for tagging, parsing, and so on that occurred due to the application of data-driven techniques in those areas. And, of course, the importance to supervised learning of having access to labeled data is paramount.

One very active line of work can be roughly glossed as the application of standard text-categorization algorithms, surveyed by Sebastiani [263], to opinion-oriented classification problems. For example, Pang et al. [235] compare Naive Bayes, Support Vector Machines, and maximum-entropy-based classification on the sentiment-polarity classification problem for movie reviews. More extensive comparisons of the performance of standard machine learning techniques with other types of features or feature selection schemes have been undertaken in later work [5, 69, 103, 204, 217]; see Section 4.2 for more detail.

We note that there has been some research that explicitly considers regression or ordinal-regression formulations of opinion-mining problems [109, 201, 233, 320]: example questions include "how positive is this text?" and "how strongly held is this opinion?"

Another role that labeled data can play is in lexicon induction, although, as detailed in Section 4.5.1, the use of the unsupervised paradigm is more common. Morinaga et al. [215] and Bethard et al. [37] create an opinion-indicator lexicon by looking for terms that tend to be associated more highly with subjective-genre newswire, such as editorials, than with objective-genre newswire. Das and Chen [66, 67] start with a manually created lexicon specific to the finance domain (example terms: "bull", "bear"), but then assign discrimination weights to the items in the lexicon based on their co-occurrence with positively labeled vs. negatively labeled documents.

Other topics related to supervised learning are discussed in some of the more specific sections that follow.

4.4 Domain Adaptation and Topic-Sentiment Interaction

4.4.1 Domain Considerations

The accuracy of sentiment classification can be influenced by the domain of the items to which it is applied [21, 40, 88, 249, 298]. One reason is that the same phrase can indicate different sentiment in different domains: recall the Bob Bland example mentioned earlier, where "go read the book" most likely indicates positive sentiment for book reviews, but negative sentiment for movie reviews; or consider Turney's [298] observation that "unpredictable" is a positive
description for a movie plot but a negative description for a car's steering abilities. Differences in vocabulary across domains also add to the difficulty of applying classifiers trained on labeled data in one domain to test data in another.

Several studies show concrete performance differences from domain to domain. In an experiment auxiliary to their main work, Dave et al. [69] apply a classifier trained on a pre-assembled dataset of reviews of a certain type to product reviews of a different type, but they do not investigate the effect of training-test mismatch in detail. Engström [88] studies how the accuracy of sentiment classification can be influenced by topic. Read [249] finds standard machine learning techniques for opinion analysis to be both domain-dependent (with domains ranging from movie reviews to newswire articles) and temporally dependent (based on datasets spanning different ranges of time periods but written at least one year apart). Owsley et al. [229] also show the importance of building a domain-specific classifier.

Aue and Gamon [21] explore different approaches to customizing a sentiment classification system to a new target domain in the absence of large amounts of labeled data. The different types of data they consider range from lengthy movie reviews to short, phrase-level user feedback from web surveys. Due to significant differences in these domains along several dimensions, simply applying the classifier learned on data from one domain barely outperforms the baseline for another domain. In fact, with 100 or 200 labeled items in the target domain, an EM algorithm that utilizes in-domain unlabeled data and ignores out-of-domain data altogether outperforms the method based exclusively on (both in- and out-of-domain) labeled data.

Yang et al. [321] take the following simple approach to domain transfer: they find features that are good subjectivity indicators in both of two different domains (in their case, movie reviews versus product reviews), and consider these features to be good domain-independent features.

Blitzer et al. [40] explicitly address the domain transfer problem for sentiment polarity classification by extending the structural correspondence learning (SCL) algorithm [11], achieving an average of 46% improvement over a supervised baseline for sentiment polarity
classification of 5 different types of product reviews mined from Amazon.com. The success of SCL depends on the choice of pivot features in both domains, based on which the algorithm learns a projection matrix that maps features in the target domain into the feature space of the source domain. Unlike previous work that applied SCL to tagging, where frequent words in both domains happened to be good predictors for the target labels (part-of-speech tags) and were therefore good candidates for pivots, here the pivots are chosen from those features with the highest mutual information with the source label. The projection is able to capture correspondences (in terms of expressed sentiment polarity) between "predictable" for book reviews and "poorly designed" for kitchen appliance reviews. They also show that a measure of domain similarity can correlate well with the ease of adaptation from one domain to another, thereby enabling better scheduling of annotation efforts.

Cross-lingual adaptation. Much of the literature on sentiment analysis has focused on text written in English. As a result, most of the resources developed, such as lexica with sentiment labels, are in English. Adapting such resources to other languages is related to domain adaptation: the former aims at adapting from the source language to the target language in order to utilize existing resources in the source language, whereas the latter seeks to adapt from one domain to another in order to utilize the labeled data available in the source domain. Not surprisingly, we observe parallel techniques: instead of projecting unseen tokens from the new domain into the old one via co-occurrence information in the corpus [40], expressions in the new language can be aligned with expressions in the language with existing resources. For instance, one can determine cross-lingual projections through bilingual dictionaries [209] or parallel corpora [159, 209]. Alternatively, one can simply apply machine translation as a sentiment-analysis pre-processing step [32].

4.4.2 Topic (and sub-topic or feature) Considerations

Even when one is handling documents in the same domain, there is still an important and related source of variation: document topic. It is
true that sometimes the topic is pre-determined, such as in the case of free-form responses to survey questions. However, in many sentiment analysis applications, topic is another important consideration; for instance, one may be searching the blogosphere just for opinionated comments about Cornell University.

One approach to integrating sentiment and topic when one is looking for opinionated documents on a particular user-specified topic is to simply first perform one analysis pass, say for topic, and then analyze the results with respect to sentiment [134]. (See Sebastiani [263] for a survey of machine learning approaches to topic-based text categorization.) Such a two-pass approach was taken by a number of systems at the TREC Blog track in 2006, according to Ounis et al. [227], and by others [234]. Alternatively, one may jointly model topic and sentiment [84, 206], or treat one as a prior for the other [85].

But even in the case where one is working with documents known to be on-topic, not all the sentences within these documents need to be on-topic. Hurst and Nigam [134, 225] propose a two-pass process similar to that mentioned above, where each sentence in the document is first labeled as on-topic or off-topic, and sentiment analysis is conducted only for those that are found to be on-topic. Their work relies on a collocation assumption that if a sentence is found to be topical and to exhibit a sentiment polarity, then the polarity is expressed with respect to the topic in question. This assumption is also used by Nasukawa and Yi [221] and Gamon [103].

A related issue is that a document can contain multiple topics. For instance, a review can be a comparison of two products. Or, even when a single item is discussed in a document, one can consider features or aspects of the product to represent multiple (sub-)topics. If all but the main topic can be disregarded, then one possibility is simply to associate the overall sentiment detected within the document with the primary topic, regardless of the fact that this sentiment may be formed from a mixture of opinions on different topics, leaving the sentiment toward other topics undetermined (indeed, these other topics may never be identified). But it is more common to try to identify the topics and then determine the opinions regarding each of these topics separately. In some work, the important topics are pre-defined, making this task easier [323]. In other work in extraction, this is not the case; the problem of the identification of product features is addressed in Section 4.9, and Section 4.6.3 discusses techniques that incorporate relationships between different features.

4.5 Unsupervised Approaches

4.5.1 Unsupervised Lexicon Induction

Quite a number of unsupervised learning approaches take the tack of first creating a sentiment lexicon in an unsupervised manner, and then determining the degree of positivity (or subjectivity) of a text unit via some function based on the positive and negative (or simply subjective) indicators, as determined by the lexicon, within it. Early examples of such an approach include Hatzivassiloglou and Wiebe [120], Turney [298], and Yu and Hatzivassiloglou [326]. (For the purposes of the current discussion, we ignore the supervised aspects of their work.) Some interesting variants of this general technique are to use the polarity of the previous sentence as a tie-breaker when the scoring function does not indicate a definitive classification of a given sentence [130], or to incorporate information drawn from some labeled data as well [33].

A crucial component to applying this type of technique is, of course, the creation of the lexicon via the unsupervised labeling of words or phrases with their sentiment polarity (also referred to as semantic orientation in the literature) or subjectivity status [12, 45, 89, 90, 91, 92, 119, 130, 143, 146, 257, 286, 288, 289, 290, 299, 303, 304]. In early work, Hatzivassiloglou and McKeown [119] present an approach based on linguistic heuristics. Their technique is built on the fact that in the case of polarity classification, the two classes of interest represent opposites, and we can utilize "opposition constraints" to help make labeling decisions. Specifically, constraints between pairs of adjectives are induced from a large corpus by looking at whether the two words are linked by conjunctions such as "but" (evidence for opposing orientations: "elegant but over-priced") or "and" (evidence for the same orientation: "clever and informative"). The task is then cast as a clustering or binary-partitioning problem where the inferred constraints are to be obeyed.
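The constraint-based formulation above can be made concrete with a small sketch. This is a simplification under our own assumptions, not Hatzivassiloglou and McKeown's exact procedure (their system is more elaborate): it assumes conjunction observations have already been extracted from a corpus as (adjective, conjunction, adjective) triples, treats each "and" as a vote for the same orientation and each "but" as a vote for opposite orientations, and exhaustively searches for the two-way split that satisfies the most votes. The function names `conjunction_constraints` and `partition` are hypothetical, and the exhaustive search is only practical for a handful of adjectives.

```python
from itertools import product


def conjunction_constraints(triples):
    """Turn (adj1, conjunction, adj2) triples into signed pair weights.

    Each "and" adds +1 (evidence for the same orientation); each "but"
    adds -1 (evidence for opposite orientations). Other conjunctions are ignored.
    """
    weights = {}
    for left, conj, right in triples:
        if left == right or conj not in ("and", "but"):
            continue
        pair = tuple(sorted((left, right)))
        weights[pair] = weights.get(pair, 0) + (1 if conj == "and" else -1)
    return weights


def partition(words, weights):
    """Return the two-way split of `words` that satisfies the most constraint weight.

    Every word mentioned in `weights` must appear in `words`. The search is
    exhaustive over 2**(n-1) splits, so it only suits small vocabularies.
    """
    words = sorted(words)
    if not words:
        return set(), set()
    best_score, best_side = None, None
    # Pin the first word to side 0 so mirror-image splits are not enumerated twice.
    for rest in product((0, 1), repeat=len(words) - 1):
        side = dict(zip(words, (0,) + rest))
        # A positive weight is satisfied by keeping the pair together,
        # a negative weight by separating it.
        score = sum(
            weight if side[a] == side[b] else -weight
            for (a, b), weight in weights.items()
        )
        if best_score is None or score > best_score:
            best_score, best_side = score, side
    group_a = {word for word in words if best_side[word] == 0}
    return group_a, set(words) - group_a
```

For example, the triples ("elegant", "but", "overpriced"), ("clever", "and", "informative"), ("informative", "and", "elegant"), and ("clever", "but", "overpriced") yield the split {clever, elegant, informative} versus {overpriced}. Note that the partition alone does not say which side is positive; that is a separate decision.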
Once the clustering has been completed, the labels of "positive orientation" and "negative orientation" need to be assigned; rather than use external information to make this decision, Hatzivassiloglou and McKeown [119] simply give the "positive orientation" label to the class whose members have the highest average frequency. But in other work, seed words for which the polarity is already known are assumed to be supplied, in which case labels can be determined by propagating the labels of the seed words to terms that co-occur with them in general text or in dictionary glosses, or to synonyms, words that co-occur with them in other WordNet-defined relations, or other related words (and, along the same lines, opposite labels can be given based on similar information) [12, 20, 89, 90, 130, 146, 148, 155, 288, 298, 299]. The joint use of mutual information and co-occurrence in a general corpus with a small set of seed words, a technique employed by a number of researchers, was suggested by Turney [298]; his idea was essentially to compare whether a phrase has a greater tendency to co-occur within certain context windows with the word "poor" or with the word "excellent", taking care to account for the frequencies with which "poor" and "excellent" occur, where the data on which such computations are made come from the results of particular types of Web search-engine queries.

Much of the work cited above focuses on identifying the prior polarity of terms or phrases, to use the terminology of Wilson et al. [319], or what we might by extension call terms' and phrases' prior subjectivity, meaning the semantic orientation that these items might be said to generally bear when taken out of context. Such prior information is meant, of course, to serve toward further identifying contextual polarity or subjectivity [242, 319].

Lexicons for generation. It is worth noting that Higashinaka et al. [122] focus on a lexicon-induction task that facilitates natural language generation. They consider the problem of learning a dictionary that maps semantic representations to verbalizations, where the data comes from reviews. Although reviews are not explicitly marked up with respect to their semantics, they do contain explicit rating and aspect indicators. For example, from such data, they learn that one way to express the concept "atmosphere rating: 5" is "nice and comfortable".

4.5.2 Other Unsupervised Approaches

Bootstrapping is another approach. The idea is to use the output of an available initial classifier to create labeled data, to which a supervised learning algorithm may be applied. Riloff and Wiebe [255] use this method in conjunction with an initial high-precision classifier to learn extraction patterns for subjective expressions. (An interesting, if simple, pattern discovered: the noun "fact", as in "The fact is...", exhibits high correlation with subjectivity.) Kaji and Kitsuregawa [142] use a similar method to automatically construct a corpus of HTML documents with polarity labels. Similar work involving self-training is described in Wiebe and Riloff [314] and Riloff et al. [257].

Pang and Lee [234] experiment with a different type of unsupervised approach. The problem they consider is to rank search results for review-seeking queries so that documents that contain evaluative text are placed ahead of those that do not. They propose a simple "blank slate" method based on the rarity of words within the search results that are retrieved (as opposed to within a training corpus). The intuition is that words that appear frequently within the set of documents returned for a narrow topic (the search set) are more likely to describe objective information, since objective information should tend to be repeated within the search set; in contrast, it would seem that people's opinions and how they express them may differ. Counterintuitively, though, Pang and Lee find that when the vocabulary to be considered is restricted to the most frequent words in the search set (as a noise-reduction measure), the subjective documents tend to be those that contain a higher percentage of words that are less rare, perhaps due to the fact that most reviews cover the main features or aspects of the object being reviewed. (This echoes our previous observation that understanding the objective information in a document can be critical for understanding the opinions and sentiment it expresses.) The performance of this simple method is on par with that of a method based on a state-of-the-art subjectivity detection system, OpinionFinder [255, 314].

A comparison of supervised and unsupervised methods can be found in Chaovalit and Zhou [55].
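A search-set statistic of the kind just described can be sketched in a few lines. This is an illustrative sketch under our own assumptions, not necessarily the exact statistic Pang and Lee use: documents are assumed to be pre-tokenized, document frequency is computed within the search set only, the vocabulary is cut to the `vocab_size` most frequent words, and each document is scored by the mean within-search-set document frequency of its retained tokens. Following the reported finding, documents whose words are less rare are ranked first. The name `rank_search_set` is hypothetical.

```python
from collections import Counter


def rank_search_set(search_set, vocab_size):
    """Rank the documents retrieved for one query, least rare vocabulary first.

    search_set: list of documents, each a list of tokens.
    vocab_size: how many of the highest-document-frequency words to keep.
    Returns document indices in ranked order.
    """
    n_docs = len(search_set)
    # Document frequency is computed within the search set, not a training corpus.
    df = Counter()
    for doc in search_set:
        df.update(set(doc))
    # Noise reduction: keep only the most frequent words
    # (ties broken alphabetically so the vocabulary is deterministic).
    vocab = set(sorted(df, key=lambda w: (-df[w], w))[:vocab_size])

    def commonness(doc):
        # Mean fraction of search-set documents containing each retained token;
        # a higher value means the document's words are less rare.
        kept = [df[w] / n_docs for w in doc if w in vocab]
        return sum(kept) / len(kept) if kept else 0.0

    # sorted() is stable (also with reverse=True), so ties keep input order.
    return sorted(range(n_docs), key=lambda i: commonness(search_set[i]), reverse=True)
```

Because the score depends only on the retrieved documents, no training data is needed, which is what makes the method a "blank slate" approach; the choice of `vocab_size` controls how aggressively noise words are discarded.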
4.6 Classification Based on Relationship Information

4.6.1 Relationships Between Sentences and Between Documents

One interesting characteristic of document-level sentiment analysis is the fact that a document can consist of sub-document units (paragraphs or sentences) with different, sometimes opposing labels, where the overall sentiment label for the document is a function of the set or sequence of labels at the sub-document level. As an alternative to treating a document as a bag of features, then, there have been various attempts to model the structure of a document via analysis of sub-document units, and to explicitly utilize the relationships between these units, in order to achieve a more accurate global labeling. Modeling the relationships between these sub-document units may lead to better sub-document labeling as well.

An opinionated piece of text can often consist of evaluative portions (those that contribute to the overall sentiment of the document, e.g., "this is a great movie") and non-evaluative portions (e.g., "the Powerpuff girls learned that with great power comes great responsibility"). The overlap between the vocabulary used for evaluative portions and non-evaluative portions makes it particularly important to model the context in which these text segments occur. Pang and Lee [232] propose a two-step procedure for polarity classification for movie reviews, wherein they first detect the objective portions of a document (e.g., plot descriptions) and then apply polarity classification to the remainder of the document after the removal of these presumably uninformative portions. Importantly, instead of making the subjective
or objective decision for each sentence individually, they postulate that there might be a certain degree of continuity in subjectivity labels (an author usually does not switch too frequently between being subjective and being objective), and incorporate this intuition by assigning preferences for pairs of nearby sentences to receive similar labels. All the sentences in the document are then labeled as being either subjective or objective through a collective classification process, where this process employs a reformulation of the task as one of finding a minimum s-t cut in the appropriate graph [165]. Two key properties of this approach are (1) it affords the finding of an exact solution to the underlying optimization problem via an algorithm that is efficient both in theory and in practice, and (2) it makes it easy to integrate a wide variety of knowledge sources both about individual preferences that items may have for one or the other class and about the pair-wise preferences that items may have for being placed in the same class regardless of which particular class that is. Follow-up work has used alternate techniques to determine edge weights within a minimum-cut framework for various types of sentiment-related binary classification problems at the document level [3, 27, 111, 294]. (The more general rating-inference problem can also, in special cases, be solved using a minimum-cut formulation [233].) Others have considered more sophisticated graph-based techniques [109].

4.6.2 Relationships Between Discourse Participants

An interesting setting for opinion mining is when the texts to be analyzed form part of a running discussion, such as in the case of individual turns in political debates, posts to online discussion boards, and comments on blog posts. One fascinating aspect of this kind of setting is the rich information source that references between such texts represent, since such information can be exploited for better collective labeling of the set of documents. Utilizing such relationships can be particularly helpful because many documents in the settings we have described can be quite terse (or complicated), and hence difficult to classify on their own, but we can easily categorize a difficult document if we find within it indications of agreement with a clearly, say, positive text.

Based on manual examination of 100 responses in newsgroups devoted to three distinct controversial topics (abortion, gun control, and immigration), Agrawal et al. [4] observe that the relationship between two individuals in the "responded-to" network is more likely to be antagonistic: overall, 74% of the responses examined were found to be antagonistic, whereas only 7% were found to be reinforcing. By then assuming that "respond-to" links imply disagreement, they effectively classify users into opposite camps via graph partitioning, outperforming methods that depend solely on the textual information within a particular document.

Similarly, Mullen and Malouf [218] examine quoting behavior among users of the politics.com discussion site who have been classified as either liberal or conservative (on this site, a user can refer to another post by quoting part of it or by addressing the other user by name or user ID). The researchers find that a significant fraction of the posts of interest to them contain quoted material, and that, in contrast to inter-blog linking patterns discussed in Adamic and Glance [2], where liberal and conservative blog sites were found to tend to link to sites of similar political orientation, and in accordance with the Agrawal et al. [4] findings cited above, politics.com posters tend to quote users at the opposite end of the political spectrum. To perform the final political-orientation classification, users are clustered so that those who tend to quote the same entities are placed in the same cluster. (Efron [83] similarly uses co-citation analysis for the same problem.)

Rather than assume that quoting always indicates agreement or disagreement regardless of the context, Thomas et al. [294] build an agreement detector for the task of analyzing transcripts of congressional floor debates, where the classifier categorizes certain explicit references to other speakers as representing agreement (e.g., "I heartily support Mr. Smith's views!") or disagreement. They then encode evidence of a high likelihood of agreement between two speakers as a relationship constraint between the utterances made by the speakers, and collectively classify the individual speeches as to whether they support or oppose the legislation under discussion, using a minimum-cut formulation of the classification problem, as described above. Follow-up work attempts to make more refined use of disagreement information [27].

4.6.3 Relationships Between Product Features

Popescu and Etzioni [244] treat the labeling of opinion words regarding product features as a collective labeling process. They propose an iterative algorithm wherein the polarity assignments for individual words are collectively adjusted through a relaxation-labeling process. Starting from "global" word labels computed over a large text collection that reflect the sentiment orientation for each particular word in general settings, Popescu and Etzioni gradually re-define the label from one that is generic to one that is specific to a review corpus to one that is specific to a given product feature to, finally, one that is specific to the particular context in which the word occurs. They make sure to respect sentence-level local constraints that opinions connected by connectives such as "but" or "and" should receive opposite or the same polarities, respectively.

The idea of utilizing discourse information to help with the inference of relationships between product attributes can also be found in the work of Snyder and Barzilay [272], who utilize agreement information in a task where one must predict ratings for multiple aspects of the same item (e.g., food and ambiance for a restaurant). Their approach is to construct a linear classifier to predict whether all aspects of a product are given the same rating, and combine this prediction with that of individual-aspect classifiers so as to minimize a certain loss function (which they term the "grief"). Interestingly, Snyder and Barzilay [272] give an example where a collection of independent aspect-rating predictors cannot assign a correct set of aspect ratings, but augmentation with their agreement classification allows perfect rating assignment; in their specific example, the agreement classifier is able to use the presence of the phrase "but not" to predict a contrasting rating between two aspects. An important observation that Snyder and Barzilay [272] make about their formulation is that having the piece of information that all aspect ratings agree cuts down the space of possible rating tuples to a far greater degree than having the information that not all the aspect ratings are the same. Note that the considerations discussed here relate to the topic-specific nature of opinions that we discussed in the context of domain adaptation in Section 4.4.

4.6.4 Relationships Between Classes

Regression formulations (where we include ordinal regression under this umbrella term) are quite well-suited to the rating-inference problem of predicting the degree of positivity in opinionated documents such as product reviews, and to similar problems such as determining the strength with which an opinion is held. In a sense, regression implicitly models similarity relationships between classes that correspond to points on a scale, such as the number of "stars" given by a reviewer. In contrast, standard multi-class categorization focuses on capturing the distinct features present in each class, and ignores the fact that "5 stars" is much more like "4 stars" than "2 stars". On a movie review dataset, Pang and Lee [233] observe that a one-vs-all multi-class categorization scheme can outperform regression for a three-class classification problem (positive, neutral, and negative), perhaps due to each class exhibiting a sufficiently distinct vocabulary, but for more fine-grained classification, regression emerges as the better of the two.

Furthermore, while regression-based models implicitly encode the intuition that similar items should receive similar labels, Pang and Lee [233] formulate rating inference as a metric labeling problem [164], so that a natural notion of distance between classes ("2 stars" and "3 stars" are more similar to each other than "1 star" and "4 stars" are) is captured explicitly. More specifically, an optimal labeling is computed that balances the output of a classifier that considers items in isolation with the importance of assigning similar labels to similar items.

Koppel and Schler [167] consider a similar version of this problem, but where one of the classes, corresponding to "objective", does not lie on the positive-to-negative continuum. Goldberg and Zhu [109] present a graph-based algorithm that addresses the rating-inference problem in the semi-supervised learning setting, where a closed-form solution to the underlying optimization problem is found through computation on a matrix induced by a graph representing inter-document similarity relationships, and the loss function encodes the desire for similar items to receive similar labels. Mao and Lebanon [201] (Mao and Lebanon [200] is a shorter version) propose to use isotonic conditional random fields to capture the ordinal labels of local (sentence-level) sentiments. Given words that are strongly associated with positive and negative sentiment, they formulate constraints on the parameters to reflect the intuition that adding a positive (negative) word should affect the local sentiment label positively (negatively).

Wilson et al. [320] treat opinion-strength classification (e.g., classifying an opinion according to its strength) as an ordinal regression task. McDonald et al. [205] leverage relationships between labels assigned at different classification stages, such as the word level or sentence level, finding that a "fine-to-coarse" categorization procedure is an effective strategy.

4.7 Incorporating Discourse Structure

Compared to the case for traditional topic-based information access tasks, discourse structure (e.g., twists and turns in documents) tends to have more effect on overall sentiment labels. For instance, Pang et al. [235] observe that some form of discourse structure modeling can help to extract the correct label in the following example:

I hate the Spice Girls. ...[3 things the author hates about them]... Why I saw this movie is a really, really, really long story, but I did, and one would think I'd despise every minute of it. But... Okay, I'm ashamed of it, but I enjoyed it. I mean, I admit it's a really awful movie, ...[they] act wacky as hell... the ninth floor of hell... a cheap [beep] movie... The plot is such a mess that it's terrible. But I loved it.

In spite of the predominant number of negative sentences, the overall sentiment toward the movie under discussion is positive, largely due to the order in which these sentences are presented. Needless to say, such information is lost in a bag-of-words representation.

Early work attempts to partially address this problem via incorporating location information in the feature set [235]. Specifically, the position at which a token appears can be appended to the token itself to form position-tagged features, so that occurrences of the same unigram in, say, the first quarter and the last quarter of the document are treated as two different features; but the performance of this simple scheme does not differ greatly from that which results from using unigrams alone.

On a related note, it has been
observed that position matters in the context of summarizing sentiment in a document. In particular, in contrast to topic-based text summarization, where the beginnings of articles usually serve as strong baselines in terms of summarizing the objective information in them, the last n sentences of a review have been shown to serve as a much better summary of the overall sentiment of the document than the first n sentences, and to be almost as good as using the n most (automatically-computed) subjective sentences, in terms of how accurately they represent the overall sentiment of the document [232].

Theories of lexical cohesion motivate the representation used by Devitt and Ahmad [73] for sentiment polarity classification of financial news.

Another way of capturing discourse structure information in documents is to model the global sentiment of a document as a trajectory of local sentiments. For example, Mao and Lebanon [200] propose using sentiment flow as a sequential model to represent an opinionated document. More specifically, each sentence in the document receives a local sentiment score from an isotonic-conditional-random-field-based sentence-level predictor. The sentiment flow is defined as a function $f\colon [0, 1) \to \mathcal{O}$ ($\mathcal{O}$ being the ordinal set), where the interval $[(t-1)/n,\, t/n)$ is mapped to the label of the $t$-th sentence in a document with $n$ sentences. The flow is then smoothed out through convolution with a smoothing kernel. Finally, the distances between two flows (e.g., the $L_p$ distance between the two smoothed, continuous functions) should reflect, to some degree, the distances between global sentiments. On a small dataset, Mao and Lebanon observe that the sentiment flow representation (especially when objective sentences are excluded) outperforms a plain bag-of-words representation in predicting global sentiment with a nearest neighbor classifier.

4.8 Language Models

The rise of the use of language models in information retrieval has been an interesting recent development [65, 177, 179, 243]. They have been applied to various opinion-mining and sentiment-analysis tasks, and in fact the subjectivity-extraction work of Pang and Lee [232] is a demo application for the heavily language-modeling-oriented LingPipe (see http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html).
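To illustrate how a language model, which assigns probabilities to text, can nonetheless be turned into a classifier, here is a minimal sketch under our own assumptions; it does not reproduce LingPipe or any specific cited system. One add-one-smoothed unigram model is trained per label, and a new text receives the label whose model assigns it the highest log-likelihood. Class priors, higher-order n-grams, and better smoothing are omitted for brevity, and the names `UnigramLM`, `train`, and `classify` are hypothetical.

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one (Laplace) smoothed unigram language model.

    One extra slot is reserved so that tokens never seen in training still
    receive non-zero probability.
    """

    def __init__(self, docs, vocab):
        self.counts = Counter(tok for doc in docs for tok in doc)
        self.total = sum(self.counts.values())
        self.slots = len(vocab) + 1  # shared vocabulary plus one unknown-token slot

    def log_prob(self, doc):
        denom = self.total + self.slots
        return sum(math.log((self.counts[tok] + 1) / denom) for tok in doc)


def train(labeled_docs):
    """Fit one language model per label from (tokens, label) pairs."""
    vocab = {tok for doc, _ in labeled_docs for tok in doc}
    by_label = {}
    for doc, label in labeled_docs:
        by_label.setdefault(label, []).append(doc)
    return {label: UnigramLM(docs, vocab) for label, docs in by_label.items()}


def classify(models, doc):
    """Convert model scores to a label: pick the model that finds `doc` most likely."""
    return max(models, key=lambda label: models[label].log_prob(doc))
```

Working in log space avoids numerical underflow on long documents; comparing log-likelihoods is just one of the "various ways" of converting language-model output into labels.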
One characteristic of language modeling approaches that differentiates them somewhat from other classification-oriented data-driven techniques we have discussed so far is that language models are often constructed using labeled data, but, given that they are mechanisms for assigning probabilities to text rather than labels drawn from a finite set, they cannot, strictly speaking, be defined as either supervised or unsupervised classifiers. On the other hand, there are various ways to convert their output to labels when necessary.

An example of work in the language-modeling vein is that of Eguchi and Lavrenko [84], who rank sentences by both sentiment relevancy and topic relevancy, based on previous work on relevance language models [179]. They propose a generative model that jointly models sentiment words, topic words, and sentiment polarity in a sentence as a triple.

Lin and Hauptmann [186] consider the problem of examining whether two collections of texts represent different perspectives. In their study, employing Reuters data, two examples of different perspectives are the Palestinian viewpoint vs. the Israeli viewpoint in written text and Bush vs. Kerry in presidential debates. They base their notion of difference in perspective upon the Kullback-Leibler (KL) divergence between posterior distributions induced from document collection pairs, and discover that the KL divergence between different aspects is an order of magnitude smaller than that between different topics. This perhaps provides yet another reason that opinion-oriented classification has been found to be more difficult than topic-based classification.

Research employing probabilistic latent semantic analysis [125] or latent Dirichlet allocation (LDA) [39] can also be cast as language-modeling work [41, 195, 206]. The basic idea is to infer language models that correspond to unobserved "factors" in the data, with the hope that the factors that are learned represent topics or sentiment.

4.9 Special Considerations for Extraction

Opinion-oriented extraction. Many applications, such as summarization or question answering, require working with pieces of information that need to be pulled from one or more textual units. For example, a multi-perspective question
answering (MPQA) system might need to respond to opinion-oriented questions such as "Was the most recent presidential election in Zimbabwe regarded as a fair election?" [51]; the answer may be encoded in a particular sentence of a particular document, or may need to be stitched together from pieces of evidence found in multiple documents. Information extraction (IE) is precisely the field of natural language processing devoted to this type of task [49]. Hence, it is not surprising that the application of information-extraction techniques to opinion mining and sentiment analysis has been proposed [51, 79].

In this survey, we use the term opinion-oriented information extraction (opinion-oriented IE) to refer to information extraction problems particular to sentiment analysis and opinion mining. (We sometimes shorten the phrase to opinion extraction, which should not be construed narrowly as focusing on the extraction of opinion expressions; for instance, determining product features is included under the umbrella of this term.)

Past research in this area has been dominated by work on two types of texts:

Opinion-oriented information extraction from reviews has, as noted above, attracted a great deal of interest in recent years. In fact, the term "opinion mining", when construed in its narrow sense, has often been used to describe work in this context. Reviews, while typically (but not always) devoted to a single item, such as a product, service, or event, generally comment on multiple aspects, facets, or features of that item, and all such commentary may be important. Extracting and analyzing opinions associated with each individual aspect can help provide more informative summarizations or enable more fine-grained opinion-oriented retrieval.

Other work has focused on newswire. Unlike reviews, a news article is relatively likely to contain descriptions of opinions that do not belong to the article's author; an example is a quotation from a political figure. This property of journalistic text makes the identification of opinion holders (also known as opinion sources) and the correct association of opinion
holders with opinions important tasks, whereas for reviews, all expressed opinions are typically those of the author, so opinion-holder identification is a less salient problem. Thus, when newswire articles are the focus, the emphasis has tended to be on identifying expressions of opinions, the agent expressing each opinion, and/or the type and strength of each opinion. Early work in this direction first carefully developed and evaluated a low-level opinion annotation scheme [45, 283, 309, 312], which facilitated the study of sub-tasks such as identifying opinion holders and analyzing opinions at the phrase level [37, 42, 43, 51, 60, 61, 157, 320].

It is important to understand the similarities and differences between opinion-oriented IE and standard fact-oriented IE. They share some sub-tasks in common, such as entity recognition; for example, as mentioned above, determination of opinion holders is an active line of research [37, 42, 61, 158]. What truly sets the problem apart from standard or classic IE is the specific types of entities and relations that are considered important. For instance, although identification of product features is in some sense a standard entity recognition problem, an opinion extraction system would be mostly interested in features for which associated opinions exist; similarly, an opinion holder is not just any named entity in a news article, but one that expresses opinions. Examples of the types of relations particularly pertinent to opinion mining are those centered around comparisons, such as the relations encoded by sentences like "The new model is more expensive than the old one" or "I prefer product A over product B" [139, 191] (a longer version of the latter is available as Jindal and Liu [138]), or those between agents and reported beliefs, as described in Section 4.9.2. Note that the relations of interest can form a complex hierarchical structure, as in the case where an opinion is attributed to one party by another, so that it is unclear whether the first party truly holds the opinion in question [42].

It is also important to understand which aspects of opinion-oriented extraction are discussed in this section as opposed to the previous sections. As discussed earlier, many sub-problems of opinion extraction are
in fact classification problems for relatively small textual units. Examples include both determining whether or not a text span is subjective and classifying a given text span already determined to be subjective by the strength of the opinion expressed. Thus, many key techniques involved in building an opinion extraction system have already been discussed in the previous sections. In this section, we instead focus on the missing pieces, describing approaches to problems that are somewhat special to extraction tasks in sentiment analysis. While these sub-tasks can be (and often are) cast as classification problems, they do not have natural counterparts outside of the extraction context. Specifically, Section 4.9.1 is devoted to the identification of features and expressions of opinions in reviews. Section 4.9.2 considers techniques that have been employed when opinion-holder identification is an issue.

Finally, we make the following organizational note. One may often want to present the output of opinion extraction in summarized form; conversely, some forms of sentiment summarization rely on the output of opinion extraction. Opinion-oriented summarization is discussed in Section 5.

4.9.1 Identifying Product Features and Opinions in Reviews

In the context of review mining [130, 166, 215, 244, 323, 324], two important extraction-related sub-tasks are (1) the identification of product features, and (2) the extraction of opinions associated with these features. While the key features or aspects are known in some cases, many systems start from problem (1).

As noted above, identification of product features is in some sense a standard information extraction task with little to distinguish it from other non-sentiment-related problems. After all, the notion of the features that a given product has seems fairly objective. However, Hu and Liu [130] show that one can benefit from light sentiment analysis even for this sub-task, as described shortly.
Existing work on identifying product features discussed in reviews (task (1)) often relies on the simple linguistic heuristic that (explicit) features are usually expressed as nouns or noun phrases. This narrows down the candidate words or phrases to be considered, but obviously not all nouns or noun phrases are product features. Yi et al. [323] consider three increasingly strict heuristics to select from noun phrases based on part-of-speech-tag patterns. Hu and Liu [130] follow the intuition that frequent nouns or noun phrases are likely to be features. They identify frequent features through association mining, and then apply heuristic-guided pruning aimed at removing (a) multi-word candidates in which the words do not appear together in a certain order, and (b) single-word candidates for which subsuming super-strings have been collected (the idea is to concentrate on more specific concepts, so that, for example, "life" is discarded in favor of "battery life"). These techniques by themselves outperform a general-purpose term-extraction and -indexing system known as FASTR [135]. Furthermore (and here is the observation that is relevant to sentiment), the F-measure can be further improved (although precision drops slightly) via the following expansion procedure: adjectives appearing in the same sentence as frequent features are assumed to be opinion words, and nouns and noun phrases co-occurring with these opinion words in other sentences are taken to be infrequent features.

In contrast, Popescu and Etzioni [244] consider product features to be concepts forming certain relationships with the product (for example, for a scanner, its size is one of its properties, whereas its cover is one of its parts) and seek to identify the features connected with the product name through corresponding meronymy discriminators. Note that this approach, which does not involve sentiment analysis per se but simply focuses more on the task of identifying different types of features, achieved better performance than that yielded by the techniques of Hu and Liu [130].

There has also been work that focuses on extracting attribute-value pairs from textual product descriptions, but not necessarily in the context of opinion mining. Of work in this vein, Ghani et al. [105] directly compare against the method proposed by Hu
and Liu [130].

To identify expressions of opinions associated with features (task (2)), a simple heuristic is to extract adjectives that appear in the same sentence as the features [130]. Deeper analyses can make use of parse information and manually or semi-automatically developed rules or sentiment-relevant lexicons [215, 244].

4.9.2 Problems Involving Opinion Holders

In the context of analysis of newswire and related genres, we need to identify text spans corresponding both to opinion holders and to expressions of the opinions held by them.

As is true with other segmentation tasks, identifying opinion holders can be viewed as a sequence labeling problem. Choi et al. [61] experiment with an approach that combines Conditional Random Fields (CRFs) [176] and extraction patterns. A CRF model is trained on a certain collection of lexical, syntactic, and semantic features. In particular, extraction patterns are learned to provide semantic tagging as part of the semantic features. (CRFs have also been used to detect opinion expressions [43].)

Alternatively, given that the status of an opinion holder depends by definition on the expression of an opinion, the identification of opinion holders can benefit from, or perhaps even require, accounting for opinion expressions either simultaneously or as a pre-processing step.

One example of simultaneous processing is the work of Bethard et al. [37], who specifically address the task of identifying both opinions and opinion sources. Their approach is based on semantic parsing where semantic constituents of sentences (e.g., "agent" or "proposition") are marked. By utilizing opinion words automatically learned by a bootstrapping approach, they further refine the semantic roles to identify propositional opinions, i.e., opinions that generally function as the sentential complement of a predicate. This enables them to concentrate on verbs and extract verb-specific information from semantic frames such as are defined in FrameNet [25] and PropBank [230].

As another example of the simultaneous approach, Choi et al. [60] employ an integer linear programming approach to handle the joint
extraction of entities and relations, drawing on the work of Roth and Yih [260] on using global inference based on constraints.

As an alternative to the simultaneous approach, a system can start by identifying opinion expressions, and then proceed to the analysis of the opinions, including the identification of opinion holders. Indeed, Kim and Hovy [159] define the problem of opinion holder identification as identifying opinion sources given an opinion expression in a sentence. In particular, structural features from a syntactic parse tree are selected to model the long-distance, structural relation between a holder and an expression. Kim and Hovy show that incorporating the patterns of paths between holder and expression outperforms a simple combination of local features (e.g., the type of the holder node) and other non-structural features (e.g., the distance between the candidate holder node and the expression node).

One final remark is that the task of determining which mentions of opinion holders are co-referent (source coreference resolution) differs in practice in interesting ways from typical noun phrase coreference resolution, due in part to the way in which opinion-oriented datasets may be annotated [282].
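To make the notion of a parse-tree "path" between a candidate holder and an opinion expression concrete, here is a minimal Python sketch in the spirit of Kim and Hovy [159]; it is an illustration, not their actual feature set. The nested-tuple tree encoding, the child-index positions, the function names `label_at` and `path_feature`, and the ↑/↓ notation are all our own assumptions. A real system would obtain the tree from a syntactic parser and pass such path strings, alongside local features, to a learned classifier.

```python
from typing import Tuple, Union

# Hypothetical encoding (our assumption): a node is (label, child, child, ...);
# a bare string is a word. Positions are tuples of child indices from the root.
Tree = Union[str, tuple]
Position = Tuple[int, ...]


def label_at(tree: Tree, pos: Position) -> str:
    """Return the syntactic label of the node at `pos` (or the word itself at a leaf)."""
    node = tree
    for i in pos:
        node = node[i + 1]  # index 0 holds the label, so children start at index 1
    return node if isinstance(node, str) else node[0]


def path_feature(tree: Tree, holder: Position, expr: Position) -> str:
    """Label path from a candidate holder node up to the lowest common ancestor,
    then down to the opinion-expression node, e.g. 'NP↑S↓VP↓VBD'."""
    # The lowest common ancestor is the longest shared prefix of the two positions.
    k = 0
    while k < min(len(holder), len(expr)) and holder[k] == expr[k]:
        k += 1
    up = [label_at(tree, holder[:i]) for i in range(len(holder), k - 1, -1)]
    down = [label_at(tree, expr[:i]) for i in range(k + 1, len(expr) + 1)]
    return "↑".join(up) + "".join("↓" + lab for lab in down)


# "The minister criticized the plan." (hand-built parse, for illustration only)
TREE = ("S",
        ("NP", ("DT", "The"), ("NN", "minister")),
        ("VP", ("VBD", "criticized"),
               ("NP", ("DT", "the"), ("NN", "plan"))))

if __name__ == "__main__":
    expression = (1, 0)  # the VBD node "criticized"
    for candidate in [(0,), (1, 1)]:  # subject NP vs. object NP
        print(candidate, path_feature(TREE, candidate, expression))
```

The two candidate holders yield different strings: the subject NP reaches the verb through S (NP↑S↓VP↓VBD), while the object NP reaches it only through VP (NP↑VP↓VBD). A classifier can learn to associate such structural patterns with holderhood regardless of how many words separate the two nodes.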
5 Summarization

So far, we have talked about analyzing and extracting opinion information from individual documents. The focus of this section is on aggregating and representing sentiment information drawn from an individual document or from a collection of documents. For example, a user might desire an at-a-glance presentation of the main points made in a single review; creating such single-document sentiment summaries is described in Section 5.1. Another application considered within this paradigm is the automatic determination of market sentiment, or the "majority leaning" of an entire body of investors, from the individual remarks of those investors [66, 67]; this is a type of multi-document opinion-oriented summarization, described in Section 5.2.

5.1 Single-Document Opinion-Oriented Summarization

There is clearly a tight connection between extraction of topic-based information from a single document [49] and topic-based summarization of that document, since the information that is pulled out can serve as a summary; see Radev et al. [247, Section 2.1] for a brief review. Obviously, this connection between extraction and summarization holds in the case of sentiment-based summarization, as well.

One way in which this connection is made manifest in single-document opinion-oriented summarization is as follows: there are approaches that create textual sentiment summaries based on extraction of sentences or similar text units. For example, Beineke et al. [33] attempt to select a single passage that reflects the opinion of the document's author(s), mirroring the practice of film advertisements that present "snippets" from reviews of the movie. Training and test data is acquired from the website Rotten Tomatoes (http://www.rottentomatoes.com), which provides a roughly sentence-length snippet for each review. However, Beineke et al. [33] note that low accuracy can result even for high-quality extraction methods because the Rotten Tomatoes data includes only a single snippet per review, whereas several sentences might be perfectly viable alternatives.

In terms of creating longer summaries, Mao and Lebanon [200] suggest that by tracking the sentiment "flow" within a document (i.e., how sentiment orientation changes from one sentence to the next, as discussed in Section 4.7), one can create sentiment summaries by
choosing the sentences at local extrema of the flow (plus the first and last sentence). An interesting feature of this approach is that by incorporating a document's flow, the technique takes into account the entire document in a holistic way. Both approaches just mentioned seek to select the absolutely most important sentences to present. Alternatively, one could simply extract all subjective sentences, as was done by Pang and Lee [232] to create "subjectivity extracts." They suggested that these extracts could be used as summaries, although, as mentioned above, they focused on the use of these extracts as an aid to downstream polarity classification, rather than as summaries per se. Finally, we note that sentences are also used in multi-document sentiment summarization, as described in Section 5.2.

(Beineke et al. [33] use the term "sentiment summary" to refer to a single passage, but we prefer to not restrict that term's definition so tightly.)

Other sentiment summarization methods can work directly off the output of opinion-oriented information-extraction systems. Indeed, Cardie et al. [51], speaking about the more restricted type of extraction referred to by the technical term "information extraction," propose to

"... summary representations as information extraction (IE) scenario templates ... [thus] we postulate that methods from information extraction ... will be adequate for the automatic creation of opinion-based summary representations."

(A similar observation was made by Dini and Mazzini [79].) Note that these IE templates do not form coherent text on their own. However, they can be incorporated as is into visual summaries.

Indeed, one interesting aspect of the problem of extracting sentiment information from a single document (or from multiple documents, as discussed in Section 5.2) is that sometimes graph-based output seems much more appropriate or useful than text-based output. For example, graph-based summaries are very suitable when the information that is most important is the set of entities described and the opinions that some of these entities hold about each other [305]. Figure 5.1 shows an example of a human-generated summary in the form of a graph depicting various negative opinions expressed during the aftermath of Hurricane Katrina. Note the inclusion of text snippets on the arrows to support the inference of a negative opinion; in general, providing some sense of the evidence from which opinions are inferred is likely to be helpful to the user.

(The exceptions are the edges from "news media" and the edges from "people who didn't evacuate." It is (perhaps intentionally) ambiguous whether the lack of supporting quotes is due merely to the lack of sufficiently juicy ones or is meant to indicate that it is utterly obvious that these entities blame many others. We also note that the hurricane itself is not represented.)

While summarization technologies may not be able to achieve the level of sophistication of information presentation exhibited by Figure 5.1, current research is making progress toward that goal. In Figure 5.2, we see a proposed summary where opinion holders and the objects of their opinions are connected by edges, and various annotations derived from IE output are included, such as the strength of various attitudes.

Of course, graphical elements can also be used to represent a single bit, number, or grade as a very succinct summary of a document's sentiment. Variations of stars, letter grades, and thumbs up/thumbs down icons are common. More complex visualization schemes applied on a sentence-by-sentence basis have also been proposed [7].

Fig. 5.1 Graphic by Bill Marsh for The New York Times, October 1, 2005, depicting negative opinions of various entities toward each other in the aftermath of Hurricane Katrina. Relation to opinion summarization pointed out by Eric Breck (Claire Cardie, personal communication).

5.2 Multi-Document Opinion-Oriented Summarization

Language is itself the collective art of expression, a summary of thousands upon thousands of individual intuitions. The individual gets lost in the collective creation, but his personal expression has left some trace in a certain give and flexibility that are inherent in all
collective works of the human spirit.
Edward Sapir, Language and Literature, 1921

(Connection to sentiment analysis pointed out by Das and Chen [67].)

Fig. 5.2 Figure 2 (labeled 3) of Cardie et al. [51]: proposal for a summary representation derived from the output of an information-extraction system.

5.2.1 Some Problem Considerations

There never was in the world two opinions alike, no more than two hairs, or two grains; the most universal quality is diversity.
Michel de Montaigne

Where an opinion is general, it is usually correct.
Jane Austen, Mansfield Park

We briefly discuss here some points to keep in mind in regards to multi-document sentiment summarization, although to a certain degree, work in sentiment summarization has not yet reached a level where these problems have come to the fore.

Determining which documents or portions of documents express the same opinion is not always an easy task; but, clearly it is one that needs to be addressed in the summarization setting, since readers of sentiment summaries surely are interested in the overall sentiment in the corpus, which means the system must determine shared sentiments within the document collection at hand.

This issue can still arise even when labels have been predetermined, if the items that have been pre-labeled come from different sub-collections. For instance, some documents may have polarity labels, whereas others may contain ratings on a 1-to-5 scale. And even when the ratings are drawn from the same set, calibration issues may arise. Consider the following from Rotten Tomatoes' frequently-asked-questions page (http://www.rottentomatoes.com/pages/faq#judge):

"On the Blade 2 reviews page, you have a negative review from James Berardinelli (2.5/4 stars), and a positive review from Eric Lurio. Why is Berardinelli's review labeled Rotten and Lurio's review labeled Fresh?
You're seeing this discrepancy because star systems are not consistent between critics. For critics like Roger Ebert and James Berardinelli, 2.5 stars or lower out of 4 stars is always negative. For other critics, 2.5 stars can either be positive or negative. Even though Eric Lurio uses a 5 star system, his grading is very relaxed. So, 2 stars can be positive. Also, there's always the possibility of the webmaster or critic putting the wrong rating on a review."

As another example, in reconciling reviews of conference submissions, program-committee members must often take into account the fact that certain reviewers always tend to assign low scores to papers, while others have the opposite tendency. Indeed, we believe this calibration issue may be the reason why reviews of cars on Epinions come not only with a "number of stars" annotation, but also a "thumbs up/thumbs down" indicator, in order to clarify whether, regardless of the rating assigned, the review author actually intends to make a positive recommendation or not.

An additional observation to take note of is the fact that when two reviewers agree on a rating, they may have different reasons for doing so, and it may be important to indicate these reasons in the summary. A related point is that when a reviewer assigns a middling rating, it may be because he or she thinks that most aspects of the item under discussion are so-so, but it may also be because he or she sees both strong positives and strong negatives. Or, reviewers may have the same opinions about individual item features, but weight these individual factors differently, leading to a different overall sentiment. Indeed, Rotten Tomatoes summarizes a set of reviews both with the Tomatometer (the percentage of reviews judged to be positive) and an average rating on a 1-to-10 scale. The idea, again according to the FAQ (http://www.rottentomatoes.com/pages/faq#avgvstmeter), is as follows:

"The Average Rating measures the overall quality of a product based on an average of individual critic scores. The Tomatometer simply measures the percentage of critics who recommend a certain product.
For example, while Men in Black scored 90% on the Tomatometer, the average rating is only 7.5/10. That means that while you're likely to enjoy MIB, it probably wasn't a contender for Best Picture at the Oscars.
In contrast, Toy Story 2 received a perfect 100% on the Tomatometer with an average rating of 9.6/10. That means, not only are you certain to enjoy it, you'll also be impressed with the direction, story, cinematography, and all the other things that make truly great films."

The problem of deciding whether two sentences or text passages have the same semantic content is one that is faced not just by opinion-oriented multi-document summarizers, but by topic-based multi-document summarizers as well [247]; this has been one of
the motivations behind work on paraphrase recognition [29, 30, 231] and textual entailment [28]. But, as pointed out in Ku et al. [170], while in traditional summarization redundant information is often discarded, in opinion summarization one wants to track and report the degree of redundancy, since in the opinion-oriented setting the user is typically interested in the (relative) number of times a given sentiment is expressed in the corpus.

Carenini et al. [52] note that a challenge in sentiment summarization is that the pieces of information to be summarized (people's opinions) are often conflicting, which is a bit different from the usual situation in topic-based summarization, where typically one does not assume that there are conflicting sets of facts in the document set (although there are exceptions [301, 302]).

5.2.2 Textual Summaries

In standard topic-based multi-document summarization, creating textual summaries has been a main focus of effort. Hence, despite the differences in topic- and opinion-based summarization mentioned above, several researchers have developed systems that create textual summaries of opinion-oriented information.

5.2.2.1 Leveraging Existing Topic-Based Technologies

One line of attack is to adapt existing topic-based multi-document summarization algorithms to the sentiment setting.

Sometimes the adaptation consists simply of modifying the input to these pre-existing algorithms. For instance, Seki et al. [264] propose that one apply standard multi-document summarization to a sub-collection of documents that are on the same topic and that are determined to belong to some relevant genre of text, such as "argumentative."

In other cases, pre-existing topic-based summarization techniques are modified. For example, Carenini et al. [52] generate natural-language summaries in the form of an evaluative argument using the classic natural-language generation pipeline of content selection, lexical selection and sentence planning, and sentence realization [251], assuming the existence of a pre-defined product-feature hierarchy. The system explicitly produces textual descriptions of aggregate information. The system is capable of relaying data about the average sentiment and signaling, if appropriate, that the distribution of responses is bi-modal (this allows one to report "split
votes"). They compare this system against a modification of an existing sentence-extraction system, MEAD [246]. The former approach seems more well-suited for general overviews, whereas the latter seems better at providing more variety in expression and more detail; see Figure 5.3. Related to the latter approach, sentence extraction methods have also been used to create summaries for opinion-oriented queries or topics [265, 266].

Summary created via a true natural-language-generation approach:
Almost all users loved the Canon G3 possibly because some users thought the physical appearance was very good. Furthermore, several users found the manual features and the special features to be very good. Also, some users liked the convenience because some users thought the battery was excellent. Finally, some users found the editing/viewing interface to be good despite the fact that several customers really disliked the viewfinder. However, there were some negative evaluations. Some customers thought the lens was poor even though some customers found the optical zoom capability to be excellent. Most customers thought the quality of the images was very good.

Summary created by a modified sentence-extraction system:
Bottom line, well made camera, easy to use, very flexible and powerful features to include the ability to use external flash and lense/filters choices. It has a beautiful design, lots of features, very easy to use, very configurable and customizable, and the battery duration is amazing! Great colors, pictures, and white balance. The camera is a dream to operate in auto mode, but also gives tremendous flexibility in aperture priority, shutter priority, and manual modes. I'd highly recommend this camera for anyone who is looking for excellent quality pictures and a combination of ease of use and the flexibility to get advanced with many options to adjust if you like.

Fig. 5.3 Sample automatically generated summaries. Adapted from Figure 2 of Carenini et al. [52].

While we are not aware of the following technique being used in standard topic-based summarization, we see no reason why it is not applicable to that setting, at least in principle. Ku et al. [170] (short version available as Ku et al. [169]) propose the following simple scheme to create a textual summary of a set of documents known in advance to be on the same topic. Sentences considered to be representative of the topic are collected, and the polarity of each such sentence is computed based on what sentiment-bearing words it contains, with negation taken into account. Then, to create a summary of the positive documents, the system simply returns the headline of the document with the most positive on-topic sentences, and similarly for the negative documents. The authors show the following examples for the positive and the negative summary, respectively:

Positive: Chinese Scientists Suggest Proper Legislation for Clone Technology.
Negative: UK Government Stops Funding for Sheep Cloning Team.

The cleverness of this method is that headlines are, by construction, good summaries (at least of the article they are drawn from), so that fluency and informativeness, although perhaps not appropriateness, are guaranteed.

Another perhaps unconventional type of multi-document summary is the selection of a few documents of interest from the corpus for presentation to the user. In this vein, Kawai et al. [151] have developed a news portal site called Fair News Reader that attempts to determine the affect characteristics of articles the user has been reading so far (e.g., happiness or fear) and then recommends articles that are on the same topic but have opposite affect characteristics. One could imagine extending this concept to a news portal that presented to the user opinions opposing his or her pre-conceived ones (Phoebe Sengers, personal communication). On a related note, Liu [190] mentions that one might desire a summarization system to present a "representative sample" of opinions, so that both positive and negative points of view are covered, rather than just the dominant sentiment. As of the time of this writing, Amazon presents the most helpful favorable review side-by-side with the most helpful critical review if one clicks on the "[x] customer reviews" link next to the stars indicator. Additionally, one could interpret the opinion-leader identification work of Song et al. [275] as suggesting that blog posts written by opinion leaders could serve as an alternative type of representative sample.

Summarizing online discussions and blogs is an area of related work [131, 300, 330]. The focus of such work is not on summarizing the opinions per se, although Zhou and Hovy [330] note that one may want to vary the emphasis on the opinions expressed versus the "facts."

5.2.2.2 Textual Summarization Without Topic-based Summarization Techniques

Other work in the area of textual multi-document sentiment summarization departs from topic-based work. The main reason seems to be that redundancy elimination is much less of a concern: users may wish to look at many individual opinions regardless of whether these individual opinions express the same overall sentiment, and these users may not particularly care whether the textual overview they peruse is coherent. Thus, in several cases, textual summaries are generated simply by listing some or all opinionated sentences. These are often grouped by feature (sub-topic) and/or polarity, perhaps with some ranking heuristic such as feature importance applied [129, 170, 324, 332].

5.2.3 Non-textual Summaries

In the previous section, we have discussed the creation of textual summaries of the opinion information expressed within a corpus. But in settings where the polarity or orientation of each individual document within a collection is summed up in a single bit (e.g., thumbs up/thumbs down), number (e.g., 3.5 stars), or grade (e.g., B+), an alternative way to obtain a succinct summary of the overall sentiment is to report summary statistics, such as the number of reviews that are "thumbs up" or the average number of stars or average grade. Many systems take this approach to summarization.

Summary statistics are often quite suited to graphical representations; we describe some noteworthy visual aspects of these summaries here (evaluation of the user-interface aspects has not been a focus of attention in the community to date).

5.2.3.1 Bounded Summary Statistics: Averages and Relative Frequencies

We use the term "bounded" to refer to summary statistics that lie within a predetermined range. Examples are the average number of stars (range: 0 to 5 stars, say) or the percentage of positive opinions (range: 0% to 100%).

Thermometer-type images are one means for displaying such statistics. One example is the "Tomatometer" on the Rotten Tomatoes website, which is simply a bar broken into two differently colored portions; the portion of the bar that is colored red indicates the fraction of positive reviews of a given movie. This representation extends straightforwardly to k-ary categorization schemes, such as positive/middling/negative, via the use of k colors. The thermometer-graphic concept also generalizes in other ways; for instance, the depiction of a number of stars can be considered to be a variant of this idea.

Instead of using size or extent to depict bounded summary statistics, as is done with thermometer representations, one can use color shading. This choice seems particularly appropriate in settings where the amount of display real-estate that can be devoted to any particular item under evaluation is highly limited or where size or location is reserved to represent some other information. For instance, Gamon et al. [104] use color to represent the general assessment of (automatically determined) product features. In Figure 5.4, we see that each of many "features" or topics, such as "handling" or "vw, service," is represented by a shaded box. The colors for any given box range from red to white to green, indicating gradations of the average sentiment toward that topic, moving from negative to neutral (or objective) to positive, respectively. Note that one can quickly glean from such a display what was liked and what was disliked about the product under discussion, despite the large number of topics under evaluation: people like driving this car but dislike the service.

As shown in Figure 5.5, a similar interface (together with a usability study) is presented in Carenini et al. [53]. Some differences are that natural-language summarization is also employed, so that the summary is both verbal and visual; the features are grouped into a hierarchy, thus leveraging the ability of Treemaps [270] to display hierarchical data via nesting; and the interface also includes a way (not depicted in the figure) to see an at-a-glance summary of the polarities of the individual sentences commenting on a particular feature. A demo is available online at http://www.cs.ubc.ca/carenini/storage/SEA/demo.html.

Fig. 5.4 Figure 2 of Gamon et al. [104], depicting (automatically determined) topics discussed in reviews of the Volkswagen Golf. The size of each topic box indicates the number of mentions of that topic. The shading of each topic box, ranging from red to white to green, indicates the average sentiment, ranging from negative to neutral/none to positive, respectively. At the bottom, the sentences most indicative of negative sentiment for the topic "vw, service" are displayed.

Fig. 5.5 Figure 4 of Carenini et al. [53], showing a summary of reviews of a particular product. An automatically generated text summary is on the left; a visual summary is on the right. The size of each box in the visual summary indicates the number of mentions of the corresponding topic, which occupy a pre-defined hierarchy. The shading of each topic box, ranging from red to black to green, indicates the average sentiment, ranging from negative to neutral/none to positive, respectively. At the bottom is shown the source for the portion of the generated natural-language summary that has been labeled with the footnote 4.

5.2.3.2 Unbounded Summary Statistics

As just described, thermometer graphics and color shading can be used to represent bounded statistics such as the mean or, in the case of k-color thermometers, relative distributions of ratings across different classes. But bounded statistics by themselves do not provide other important pieces of information, such as the actual number of opinions within each class. (We consider raw frequencies to be conceptually unbounded, although there are practical limits to how many opinions can be accounted for.) Intuitively, the observation that 50% of the reviews of a particular product are negative is more of a big deal if that statistic is based on 10,000 reviews than if it is based on only two.

(We admit to being glass-half-empty people.)

Another problem specific to the mean as a summary statistic is that review-aggregation sites seem to often exhibit highly skewed rating distributions, with a particular bias toward highly positive reviews [74, 59, 128, 253, 132, 240]. Since there can often be a second mode, or "bump," at the extreme low end of the rating scale, indicating polarization (for example, Hu et al. [132] remark that 54% of the items in a sample of Amazon book, DVD, and video products with more than 20 reviews fail both statistical normality and unimodality tests), reporting only the mean rating score may not provide enough information.

(On a related note, William Safire's New York Times May 1, 2005 article "Blurbosphere" quotes Charles McGrath, former editor of the New York Times Book Review, as asking, "has there ever been a book that wasn't acclaimed?")

To put it another way, divulging the average does not give the user
enough information to distinguish between a set of middling reviews and a set of polarized reviews.

On the other hand, it is worth pointing out that just giving the number of positive and negative reviews, respectively, on the assumption that the user can always derive the percentages from these counts, may not suffice. Cabral and Hortaçsu [47] observe that once eBay switched to displaying the percentage of pieces of feedback on sellers that were negative, as opposed to simply the raw numbers, then negative reviews began to have a measurable economic impact (see Section 6).

Hence, not surprisingly, sentiment summaries tend to include data on the average rating, the distribution of ratings, and/or the number of ratings.

Visualization of unbounded summary statistics. Of the two systems described above that represent the average polarity of opinions via color, both represent the quantity of the opinions on a given topic via size. This means that the count data for positive and for negative opinions are not explicitly presented separately. In other systems, this is not the case; rather, frequencies for different classes are broken out and displayed. For instance, as of the time of this writing, Amazon displays an average rating as a number of stars with the number of reviews next to it; mousing over the stars brings up a histogram of reviewer ratings annotated with counts for the 5-star reviews, 4-star reviews, etc. (Further mousing over the bars of the histogram brings up the percentage of reviews that each of those counts represent.)

As another example, a sample output of the Opinion Observer system [191] is depicted in Figure 5.6, where the portion of a bar projecting above the centered "horizon" line represents the number of positive opinions about a certain product feature, and the portion of the bar below the line represents the number of negative opinions. (The same idea can be used to represent percentages too, of course.) A nice feature of this visualization is that because of the use of a horizon line, two separate frequency data points (the positive and negative counts) can be represented by what is visually one object, namely, a solid bar, and one can easily simultaneously compare negatives against negatives
Fig.5.6Figure2ofLiuetal.[191].Threecellphonesarerepresented,eachbyadierentcolor.Foreachfeature(General,LCD,etc.),threebarsareshown,oneforeachofthreecellphones.Foragivenfeatureandphone,theportionsofthebaraboveandbelowthehorizontallinerepresentthenumberofreviewsthatexpresspositiveornegativeviewsaboutthatcamerasfeature,respectively.(Thesystemcanalsoplotthepercentagesofpositiveornegativeopinions,ratherthantherawnumbers.)Thepaneontheupper-rightdisplaysthepositivesentencesregardingoneoftheproducts.andpositivesagainstpositives.Thissimultaneouscomparisonismademuchmoredicultifthebarsallhaveoneendplantedatthesamelocation,asisthecaseforstandardhistogramssuchastheonedepictedinFigure5.7.WhilethedataforthefeaturesarepresentedsequentiallyinFigure5.6(rstGeneral,thenLCD,andsoforth),analternativevisualizationtechniquecalledaroseplotisexempliedinFigure5.8,whichdepictsasampleoutputofthesystemdevelopedbyGregoryetal.[113].Themedianandquartilesacrossadocumentsub-collectionofthepercentageofpositiveandnegativewordsperdocument,togetherwithsimilardataforotherpossibleaect-classicationdimensions,arerepresentedviaavariantofboxplots.(Adaptationtorawcountsrather 5.2Multi-DocumentOpinion-OrientedSummarization 
Fig.5.7AportionofFigure4ofYiandNiblack[324],rotatedtosavespaceandfacilitatecomparisonwithFigure5.6.NoticethatsimultaneouscomparisonofthenegativecountsandthepositivecountsfortwodierentproductsisnotaseasyasitisinFigure5.6.thanpercentagesisstraightforward.)MappingthisideatoproductcomparisonsinthestyleofOpinionObserver,onecouldassociatedif-ferentfeatureswithdierentcompassdirections,e.g.,thefeaturebatterylifewithsouthwest,aslongasthenumberoffeaturesbeingreportedonisnottoolarge.Thereasonthatthisrepresentationmightproveadvantageousinsomesettingsisthatinsomesituations,acir-culararrangementmaybemorecompactthanasequentialone,anditmaybeeasierforausertorememberafeatureasbeingsouthwestthanasbeingthefthofeight.Anadditionalfunctionalityofthesystemthatisnotshowninthegureistheabilitytodepicthowmuchanindividualdocumentspositive/negativepercentagediersfromtheaverageforagivendocumentgrouptowhichthedocumentbelongs.AsimilarcircularlayoutisproposedinSubasicandHuettner[285]forvisualizingvariousdimensionsofaectwithinasingledocument.Morinagaetal.[215]opttorepresentdegreesofassociationbetweenproductsandopinion-indicativetermsofapre-speciedpolarity.First, 78Summarization 
Fig.5.8Figure7ofGregoryetal.[113].Ontherightaretworoseplots,oneforeachoftwoproducts;ontheleftistheplotslegend.Ineachroseplot,theuppertwopetalsrepresentpositivityandnegativity,respectively(oftheothersixpetals,thebottomtwoareviceandvirtue,etc.).Similarlytoboxplots,themedianvalueisindicatedbyadarkarc,andthequartilesby(colored)bandsaroundthemedianarc.Darkershadingforoneofthetwopetalsinapair(e.g.,positiveandnegative)aremeanttoindicatethenegativeendofthespectrumfortheaectdimensionrepresentedbythegivenpetalpair.Thehistogrambeloweachroserelatestothenumberofdocumentsrepresented.opinionsaregatheredusingtheauthorspre-existingsystem[291].Coding-lengthandprobabilisticcriteriaareusedtodeterminewhichtermstofocuson,andprincipalcomponentanalysisisthenappliedtoproduceatwo-dimensionalvisualization,suchthatnearnesscorre-spondstostrengthofassociation,asintheauthorspreviouswork[184].Thus,inFigure5.9,weseethatcellphoneAisassociatedwithwhatwerecognizeaspositiveterms,whereascellphoneCisassociatedwithnegativeterms. 5.2Multi-DocumentOpinion-OrientedSummarization Fig.5.9Figure5ofMorinagaetal.[215]:principal-components-analysisvisualizationofassociationsbetweenproducts(squares)andautomaticallyselectedopinion-orientedterms5.2.3.3TemporalVariationandSentimentTimelinesSofar,thesummarieswehaveconsidereddonotexplicitlyincorpo-rateanytemporaldimension.However,timeisoftenanimportantFirst,usersmaywishtoviewindividualreviewsinreversechrono-logicalorder,i.e.,newestrst.Indeed,atthetimeofthiswriting,thisisoneofthetwosortingoptionsthatAmazonpresents. 
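Listing reviews newest first amounts to a sort on the posting date. The sketch below is a minimal illustration under assumed names: the `Review` record and its fields are hypothetical and do not reflect Amazon's or any other site's data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Review:
    # Hypothetical review record: only the fields needed for ordering.
    review_id: str
    posted: date
    stars: int


def newest_first(reviews: List[Review]) -> List[Review]:
    """Return the reviews in reverse chronological order (newest first).

    Python's sort remains stable with reverse=True, so reviews posted on
    the same day keep their original relative order.
    """
    return sorted(reviews, key=lambda r: r.posted, reverse=True)


reviews = [
    Review("a", date(2007, 3, 1), 5),
    Review("b", date(2008, 1, 15), 2),
    Review("c", date(2007, 11, 30), 4),
]
print([r.review_id for r in newest_first(reviews)])  # ['b', 'c', 'a']
```

The same timestamps are what any time-based summary would be computed from; only the ordering step is shown here.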
Second, in many applications, analysts and other users are interested in tracking changes in sentiment about a product, political candidate, company, or issue over time. Clearly, one can create a sentiment timeline simply by plotting the value of a chosen summary statistic at different times; the chosen statistic can reflect the prevailing polarity [170, 296] or simply the number of mentions, in which case what is being measured is perhaps not so much public opinion, but rather public awareness [102, 197, 211, 212]. Such work is strongly related at a conceptual level to topic detection and tracking [8], a review of which is beyond the scope of this survey. Mishne and de Rijke [212] also depict the derivative of the summary statistic considered as a function of time.

5.2.4 Review(er) Quality

How do we identify what is good? And how do we censure what is bad? We will argue that developing a humane reputation system ecology can provide better answers to these two general questions, restraining the baser side of human nature, while liberating the human spirit to reach for ever higher goals.
Manifesto for the reputation society, Masum and Zhang [203]

When creating summaries of reviews or opinionated text, an important type of information that deserves careful consideration is whether or not individual reviews are helpful or useful. For example, a system might want to downweight or even discard unhelpful reviews before creating summaries or computing aggregate statistics, as in Liu et al. [193]. Alternatively, the system could use all reviews, but provide helpfulness indicators for individual reviews as a summary of their expected utility. Indeed, non-summarization systems could use such information, too: for instance, a review-oriented search engine could rank its search results by helpfulness.

Some websites already gather helpfulness information from human readers. For example, Amazon.com annotates reviews with comments like "120 of 140 people found the following review helpful", meaning that of the 140 people who pressed one of the "yes" or "no" buttons in response to the question "Was this review helpful to you?" (we deem these 140 people utility evaluators), 120 chose "yes". Similarly, the Internet Movie Database (IMDb, http://www.imdb.com) also annotates user comments with "x out of y people found the following comment useful". This similarity is perhaps unsurprising given that Amazon owns IMDb, although from a research point of view, note that the two populations of users are probably at least somewhat disjoint, meaning that there might be interesting differences between the sources of data. Other sites soliciting utility evaluations include Yahoo! Movies and Yahoo! TV, which allow the user to sort reviews by helpfulness; Citysearch, which solicits utility evaluations from general users and gives more helpful reviews greater prominence; and Epinions, which only allows registered members to rate reviews and does not appear to have helpfulness as a sort criterion, at least for non-registered visitors. (We learned about the solicitation of utility evaluations by IMDb from Zhuang et al. [332] and by Citysearch from Dellarocas [71].)

Despite the fact that many review-aggregation sites already provide helpfulness information gathered from human users, there are still at least two reasons why automatic helpfulness classification is a useful line of work to pursue.

Items that lack utility evaluations. Many reviews receive very few utility evaluations. For example, 38% of a sample of roughly 20,000 Amazon MP3-player reviews, and 31% of those aged at least three months, received three or fewer utility evaluations [161]. Similarly, Liu et al. [193] confirm one's prior intuition that the Amazon reviews that are youngest and the reviews that are most lowly ranked (i.e., determined to be least helpful) by the site receive the fewest utility evaluations.
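An annotation such as "120 of 140 people found the following review helpful" corresponds to a simple ratio of "yes" answers to all utility evaluations. The sketch below computes that ratio and, as a purely illustrative policy, withholds it when a review has three or fewer evaluations (mirroring the threshold in the MP3-player statistic above); the `Review` fields and the `min_evals` cutoff are assumptions, not any site's actual rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Review:
    # Hypothetical record of the votes a review has received.
    review_id: str
    yes_votes: int    # "yes" answers to "Was this review helpful to you?"
    total_votes: int  # all utility evaluations ("yes" plus "no")


def helpfulness(review: Review, min_evals: int = 4) -> Optional[float]:
    """Fraction of utility evaluators who answered "yes".

    Returns None when the review has fewer than `min_evals` evaluations
    (with the default of 4, that is, three or fewer): the case in which an
    automatic classifier would have to supply the missing utility rating.
    """
    if review.total_votes < min_evals:
        return None
    return review.yes_votes / review.total_votes


def share_lacking(reviews: List[Review], min_evals: int = 4) -> float:
    """Share of reviews whose human utility evaluations are too sparse."""
    if not reviews:
        return 0.0
    lacking = sum(1 for r in reviews if helpfulness(r, min_evals) is None)
    return lacking / len(reviews)


sample = [Review("r1", 120, 140), Review("r2", 2, 3), Review("r3", 0, 0)]
print(round(helpfulness(sample[0]), 3))  # 0.857
print(helpfulness(sample[1]))            # None
print(round(share_lacking(sample), 3))   # 0.667
```

Returning None rather than 0 keeps "no evidence" distinct from "judged unhelpful", which matters given that reviews too recent to have been read cannot be assumed unhelpful.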
Footnote: We note that we were unable to find Amazon's definition of "helpful", and conclude that they do not supply one. In contrast, Yahoo! specifies the following: "Was [a review] informative, well written or amusing, above all was it helpful to you in learning about the [film or show]? If so, then you should rate that review as helpful." It might be interesting to investigate whether these differing policies have implications. There have in fact been some comments that Amazon should clarify its question (http://www.amazon.com/Was-this-review-helpful-you/forum/Fx1JS1YLZ490S1O/Tx3QHE2JPEXQ1V7/1? encoding=UTF8&asin=B000FL7CAU).

Perhaps some reviews receive no utility evaluations simply because they are so obviously bad that nobody bothers to rate them. But this does not imply that reviews without utility evaluations must necessarily be unhelpful; certainly we cannot assume this of reviews too recently written to have been read by many people. One important role that automated helpfulness classifiers can play, then, is to provide utility ratings in the many cases when human evaluations are lacking.

Skew in utility evaluations. Another interesting potential application of automated helpfulness classification is to correct for biases in human-generated utility evaluations. We first consider indirect evidence that such biases exist. It turns out that just as the distribution of ratings can often be heavily skewed toward the positive end, as discussed in Section 5.2.3.2, the distribution of utility evaluations can also be heavily skewed toward the helpful end, probably due at least in part to reasons similar to those in the product-ratings case. In a crawl of approximately 4 million unique Amazon reviews for about 670,000 books (excluding alternate editions), the average percentage of "yes" responses among the utility evaluations is between 74% and 70%, depending on whether reviews with fewer than 10 utility evaluations are excluded (Gueorgi Kossinets and Cristian Danescu Niculescu-Mizil, personal communication). Similarly, half of a sample of about 23,000 Amazon digital-camera reviews had "helpful"/"unhelpful" vote ratios of over 9 to 1 [193]. As in the ratings-distribution case, one's intuition is that the percentage of reviews that are truly helpful is not as high as these statistics would seem to indicate.

Another type of indirect evidence of bias is that the number of utility evaluations received by a review appears to decrease exponentially in helpfulness rank as computed by Amazon [193]. (Certainly there has to be some sort of decrease, since Amazon's helpfulness ranking is based in part on the number of utility evaluations a review receives.) Liu et al. [193] conjecture that reviews that have many utility evaluations will have a disproportionate influence on readers (and utility evaluators) because they are viewed as more authoritative, but reviews could get many utility evaluations only because they are more prominently displayed, not because readers actually compared them against other reviews. (Liu et al. [193] call this tendency for often-evaluated reviews to quickly accumulate even more utility evaluations "winner circle" bias; in other literature on power-law effects, related phenomena are also referred to as "rich-get-richer".)

As for more direct evidence: Liu et al. [193] conduct a re-annotation study in which the Amazon reviewers' utility evaluations often did not match those of the human re-labelers. However, this latter evidence should be taken with a grain of salt. First, in some of the experiments in the study, ground-truth helpfulness was measured by, among other things, the number of aspects of a product that are discussed by a review. Second, in all experiments, the test items appear to have consisted of only the text of a single review considered in isolation. It is not clear that the first point corresponds to the standard that all Amazon reviewers used, or should be required to use, and clearly, the second point describes an isolated-text setting that is not the one that real Amazon reviewers work in. To exemplify both these objections: a very short review written by a reputable critic (e.g., a "top reviewer") that points out something that other reviews missed can, indeed, be quite helpful, but would score poorly according to the specification of Liu et al. [193]. Indeed, the sample provided of a review that should be labeled "bad" starts, "I want to point out that you should never buy a generic battery, like the person from San Diego who reviewed the S410 on May 15, 2004, was recommending. Yes you'd save money, but there have been many reports of generic batteries exploding when charged for too long." We would view this comment, if true, to be quite helpful, despite the fact that it fails the specification. Another technical issue is that the re-labelers used a four-class categorization scheme, whereas essentially every possible percentage of positive utility evaluations could form a distinct class for the Amazon labels: it might have been better to treat reviews with helpfulness percentages of 60% and 61% as equivalent, rather than saying that Amazon reviewers rated the latter as better than the former.

Nonetheless, given the large predominance of "helpful" among utility evaluations, despite the fact that anecdotal evidence we have gathered indicates that not all reviews deserve to be called helpful, and given the suggestive results of the re-annotation experiment just described, it is likely that some of the human utility evaluations are not strongly related to the quality of the review at hand. Thus, we believe that correction of these utility evaluations by automatic means is a valid potential application.

A note regarding the effect of utility evaluations. It is important to mention one caveat before proceeding to describe research in this area. Park et al. [236] attempted to determine what the effect of review quality actually is on purchasing intention, running a study in which subjects engaged in hypothetical buying behavior. They found non-uniform effects: "low-involvement [i.e., motivated] consumers are affected by the quantity rather than the quality of reviews ... high-involvement consumers are affected by review quantity mainly when the review quality is high ... The effect of review quality on high-involvement consumers is more pronounced with a sizable number of reviews, whereas the effect of review quantity is significant even when the review quality is low." (More on the economic impacts of sentiment analysis is described in Section 6.)

5.2.4.1 Methods for Automatically Determining Review Quality

In a way, one could consider the review-quality determination problem as a type of readability assessment and apply essay-scoring techniques [19, 99]. However, while some of the systems described below do try to take into account some readability-related features, they are tailored specifically to product reviews.

Kim et al. [161], Zhang and Varadarajan [328], and Ghose and Ipeirotis [106] attempt to automatically rank certain sets of reviews on the Amazon.com website according to their helpfulness or utility, using a regression formulation of the problem. The domains considered are a bit different: MP3 players and digital cameras in the first case; Canon electronics, engineering books, and PG-13 movies in the second case; and AV players plus digital cameras in the third case. Liu et al. [193] convert the problem into one of low-quality review detection (i.e., binary classification), experimenting mostly with manually (re-)annotated reviews of digital cameras, although CNet editorial ratings were also considered on the assumption that these can be considered trustworthy. Rubin and Liddy [261] also sketch a proposal to consider whether reviews can be considered credible.

Kim et al. [161] study which of a multitude of length-based, lexical, POS-count, product-aspect-mention count, and metadata features are most effective when utilizing SVM regression. The best feature combination turned out to be review length plus tf-idf scores for lemmatized unigrams in the review plus the number of stars the reviewer assigned to the product. Somewhat disappointingly, the best pair of features among these was the length of the review and the number of stars. (Using number of stars as the only feature yielded results similar to using just the deviation of the number of stars given by the particular reviewer from the average number of stars granted by all reviewers for the item.) The effectiveness of using all unigrams appears to subsume that of using a select subset, such as sentiment-bearing words from the General Inquirer lexicon [281].

Zhang and Varadarajan [328] use a different feature set. They employ a finer classification of lexical types, and more sources for subjective terms, but do not include any meta-data information. Interestingly, they also consider the similarity between the review in question and the product specification, on the premise that a good review should discuss many aspects of the product; and they include the review's similarity to editorial reviews, on the premise that editorial reviews represent high-quality examples of opinion-oriented text. (David and Pinch [70] observe, however, that editorial reviews for books are paid for and are meant to induce sales of the book.) However, these latter two original features do not appear to enhance performance. The features that appear to contribute the most are the class of shallow syntactic features, which, the authors speculate, seem to characterize style; examples include counts of words, sentences, wh-words, comparatives and superlatives, proper nouns, etc. Review length seems to be very weakly correlated with utility score.

We thus see that Kim et al. [161] find that meta-data and very simple term statistics suffice, whereas Zhang and Varadarajan [328] observe that more sophisticated cues that appear correlated with linguistic aspects appear to be most important. Possibly, the difference is a result of the difference in domain choice: we speculate that book and movie reviews can involve more sophisticated language use than what is exhibited in reviews of electronics.

Declaring themselves influenced by prior work on creating subjectivity extracts [232], Ghose and Ipeirotis [106] take a different approach. They focus on the relationship between the subjectivity of a review and its helpfulness. The basis for measuring review subjectivity is as follows: using a classifier that outputs the probability of a sentence being subjective, one can compute for a given review the average subjectiveness-probability over all its sentences, or the standard deviation of the subjectivity scores of the sentences within the review. They found that both the standard deviation of the sentence subjectivity scores and a readability score (review length in characters divided by number of sentences) have a strongly statistically significant effect on utility evaluations, and that this is sometimes true of the average subjectiveness-probability as well. They then suggest on the basis of this and other evidence that it is extreme reviews that are considered to be most helpful, and develop a helpfulness predictor based on their [...]

Liu et al. [193] considered features related to review and sentence length; brand, product, and product-aspect mentions, with special consideration for appearances in review titles; sentence subjectivity and polarity; and "paragraph structure". This latter refers to paragraphs as delimited by automatically determined keywords. Interestingly, the technique of taking the 30 most frequent pairs of nouns or noun phrases that appear at the beginning of a paragraph as keywords yields separator pairs such as "pros"/"cons", "strength"/"weakness", and "the upsides"/"downsides". (Note that this differs from identifying pro or con reasons themselves [157], or identifying the polarity of sentences. Note also that other authors have claimed that different techniques are needed for situations in which pro/con delimiters are mandated by the format imposed by a review-aggregation site but a separate detailed textual description must also be included, as in Epinions, as opposed to settings where such delimiters need not be present or where all text is placed in the context of such delimiters [191].) Somewhat unconventionally with respect to other text-categorization work, the baseline was taken as SVMlight run with three sentence-level statistics as features; that is, the performance of a classifier trained using bag-of-words features is not reported. Given this unconventional starting point, the addition of the features that do not reflect subjectivity or sentiment helps. Including subjectivity and polarity on top of what has already been mentioned does not yield further improvement, and use of title appearance for mentions did not seem to help.

Review- or opinion-spam detection (the identification of deliberately misleading reviews) is a line of work by Jindal and Liu ([141], short version available as Jindal and Liu [140]) in the same vein. One challenge these researchers faced was the difficulty in obtaining ground truth. Therefore, for experimental purposes they first re-framed the problem as one of trying to recognize duplicate reviews, since a priori it is hard to see why posting repeats of reviews is justified. (However, one potential problem with the assumption that repeated reviews constitute some sort of manipulation attempt, at least for the Amazon data that was considered, is that Amazon itself cross-posts reviews across "different" products, where "different" includes different instantiations (e.g., e-book vs. hardcover) or subsequent editions of the same item (Gueorgi Kossinets and Cristian Danescu Niculescu-Mizil, personal communication). Specifically, in a sample of over 1 million Amazon book reviews, about one-third were duplicates, but these were all due to Amazon's cross-posting. Human error (e.g., accidentally hitting the submit button twice) causes other cases of non-malicious duplicates.) A second round of experiments attempted to identify reviews on brands only, ads, and other irrelevant reviews containing no opinions (e.g., questions, answers, and random texts). Some of the features used were similar to those employed in the studies described above; others included features on the review author and the utility evaluations themselves. The overall message was that this kind of spam is relatively easy to detect.

5.2.4.2 Reviewer-Identity Considerations

In the above, we have discussed determining the quality of individual reviews. An alternate approach is to look at the quality of the reviewers; doing so can be thought of as a way of classifying all the reviews authored by the same person at once. Interestingly, one study has found that there is a real economic effect to be observed when factoring in reviewer credibility: Gu et al. [114] note that a weighted average of message-board postings in which poster credibility is factored in has predictive power over future abnormal returns of the stock, but if postings are weighted uniformly, the predictive power disappears.

There has been work in a number of areas in the human-language-technologies community that incorporates the authority, trustworthiness, influentialness, or credibility of authors [94, 96, 141, 275]. PageRank [44, 241] and hubs and authorities (also known as HITS) [163] are very influential examples of work in link analysis on identifying items of great importance. Trust metrics also appear in other work, such as research into peer-to-peer and reputation networks and information credibility [71, 115, 147, 174, 252].
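The contrast drawn by Gu et al. [114] between credibility-weighted and uniformly weighted postings can be illustrated with a plain weighted mean. In the minimal sketch below, the per-posting sentiment scores and poster credibility weights are hypothetical inputs; how credibility is estimated in that study is not described here, so the weights are simply given, and the function names are illustrative only.

```python
from typing import Sequence


def weighted_sentiment(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of posting sentiment scores.

    scores:  hypothetical per-posting sentiment, e.g., in [-1, 1]
    weights: hypothetical non-negative poster credibility values
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(s * w for s, w in zip(scores, weights)) / total


def uniform_sentiment(scores: Sequence[float]) -> float:
    """Unweighted mean: every poster counts equally."""
    return weighted_sentiment(scores, [1.0] * len(scores))


# One credible bullish poster versus two low-credibility bearish posters.
scores = [1.0, -1.0, -1.0]
credibility = [0.9, 0.1, 0.1]
print(round(uniform_sentiment(scores), 3))                # -0.333
print(round(weighted_sentiment(scores, credibility), 3))  # 0.636
```

In this toy input the two aggregates even disagree in sign, which shows why the choice of weighting can change what a downstream predictor sees.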
6 BroaderImplications SentimentisthemightiestforceincivilizationJ.EllenFoster,WhatAmericaOwestoWomen,1893Aswehaveseen,sentiment-analysistechnologieshavemanypoten-tialapplications.Inthissection,webrieydiscusssomeofthelargerimplicationsthattheexistenceofopinion-orientedinformation-accessserviceshas..Onepointthatshouldbementionedisthatapplicationsthatgatherdataaboutpeoplespreferencescantriggerconcernsaboutpri-vacyviolations.Wesuspectthatinmanypeoplesminds,havingonespublicblogscannedbyacoeecompanyforpositivementionsofitsproductisonething;havingonescell-phoneconversationsmonitoredbytherulingpartyofonesowncountryfornegativementionsofgov-ernmentocialsisquiteanother.Itisnotourintenttocommentfur-therhereonprivacyissues,thesenotbeingissuesonwhichwearequaliedtospeak;rather,wesimplywanttobethoroughbyremind-ingthereaderthattheseissuesdoexistandareimportant,andthattheseconcernsapplytoalldata-miningtechnologiesingeneral. BroaderImplications.Butevenifwerestrictattentiontotheapparentlyfairlyharmlessdomainofbusinessintelligence,certainquestionsregardingthepotentialformanipulationdoarise.Companiesalreadyparticipateinmanagingonlineperceptionsaspartofthenormalcourseofpublic-relationseorts:...companiescantcontrolconsumer-generatedcon-tent.Theycan,however,paycloseattentiontoit.Inmanycases,oftentoalargedegree,theycaneveninu-enceit.Infact,inasurveyconductedbyAberdeen[ofmorethan250enterprisesusingsocialmediamoni-toringandanalysissolutionsinadiversesetofenter-prises],morethantwiceasmanycompanieswithsocialmediamonitoringcapabilitiesactivelycontributetoconsumerconversationsthanremainpassiveobservers(67%versus33%).Overathirdofallcompanies(39%)contributetoonlineconversationsonafrequentbasis,interactingwithconsumersinaneorttoswayopinion,correctmisinformation,solicitfeedback,rewardloyalty,testnewideas,orforanynumberofotherreasons.ZabinandJeeries[327]Anditisalsothecasethatsomearguablymildformsofmanipula-tionhavebeensuggested.Forinstance,onesetofauthors,instudyingthestrategicimplicationsforacompanyof
oeringonlineconsumerreviews,notesthatifitispossibleforthesellertodecidethetim-ingtooerconsumerreviewsattheindividualproductlevel,itmaynotalwaysbeoptimaltooerconsumerreviewsataveryearlystageofnewproductintroduction,evenifsuchreviewsareavailable([57],quotationfromtheJuly2004working-paperversion),andothershaveworkedonamanufacturer-orientedsystemthatranksreviewsaccord-ingtotheirexpectedeectonsales,notingthatthesemightnotbetheonesthatareconsideredtobemosthelpfultousers[106].Butstill,thereareconcernsthatcorporationsmighttrytofurthergamethesystembytakingadvantageofknowledgeofhowrankingsystemsworkinordertosuppressnegativepublicity[124]orengageinotherso-calledblack-hatsearchengineoptimizationandrelated activities.Indeed,therehasalreadybeenatermsockpuppetcoinedtorefertoostensiblydistinctonlineidentitiescreatedtogivethefalseimpressionofexternalsupportforapositionoropinion;StoneandRichtel[280]listseveralratherattention-grabbingexamplesofwell-knownwritersandCEOsengaginginsock-puppetry.Onarelatednote,DasandChen[67]recommendLeinweberandMadhavan[183]asaninterestingreviewofthehistoryofmarketmanipulationthroughdis-Onereasonthesepotentialsforabusearerelevanttothissurveyisthat,aspointedoutearlierintheIntroduction,sentiment-analysistechnologiesallowuserstoconsultmanypeoplewhoareunknowntothem;butthismeanspreciselythatitisharderforuserstoevaluatethetrustworthinessofthosepeople(orpeople)theyareconsulting.Thus,opinion-miningsystemsmightpotentiallymakeiteasierforuserstobemis-ledbymaliciousentities,aproblemthatdesignersofsuchsystemsmightwishtoprevent.Ontheipside,aninformation-accesssystemthatis(perhapsunfairly)perceivedtobevulnerabletomanipulationisonethatisunlikelytobewidelyused;thus,again,buildersofsuchsystemsmightwishtotakemeasurestomakeitdiculttogametheIntheremainderofthissection,then,wediscussseveralaspectsoftheproblemofpossiblemanipulationofreputation.Inparticular,welookatevidenceastowhetherreviewshaveademonstrableeco-nomicimpact:ifreviewsdosignicantlyaectcustomerpurchases,thenthereisar
guablyaneconomicincentiveforcompaniestoengageinuntowardmeasurestomanipulatepublicperception;ifreviewsdonotsignicantlyaectcustomerpurchases,thenthereislittlerea-son,fromaneconomicpointofview,forentitiestotrytoarticiallychangetheoutputofsentiment-analysissystemsor,asDewally[74]asserts,thestockmarketdoesnotappeartoreacttotheserecom-...Thefearsraisedbythemediaaboutthedestabilizingpowerofsuchtraderswhoparticipateinthesediscussionsarethusgroundless.Ifsuchclaimsaretrue,thenitwouldseemthattryingtomanipulateperceptionsconveyedbyonlinereview-accesssystemswouldoerlittleadvantagestocompanies,andsotheywouldnotengageinit. BroaderImplications6.1EconomicImpactofReviewsAsmentionedearlierintheIntroductiontothissurvey,manyreadersofonlinereviewssaythatthesereviewssignicantlyinuencetheirpur-chasingdecisions[63].However,whilethesereadersmayhavebelievedthattheyweresignicantlyinuenced,perceptionandrealitycandier.Akeyreasontounderstandtherealeconomicimpactofreviewsisthattheresultsofsuchananalysishaveimportantimplicationsforhowmucheortcompaniesmightorshouldwanttoexpendononlinereputationmonitoringandmanagement.Giventheriseofonlinecommerce,itisnotsurprisingthatabodyofworkcenteredwithintheeconomicsandmarketingliteraturestud-iesthequestionofwhetherthepolarity(oftenreferredtoasvalence)and/orvolumeofreviewsavailableonlinehaveameasurable,signif-icantinuenceonactualconsumerpurchasing.Eversincetheclas-sicmarketforlemonspaper[6]demonstratingsomeproblemsformakersofhigh-qualitygoods,economistshavelookedatthevalueofmaintainingagoodreputationasameanstoovercometheseprob-lems[77,162,268,269],amongotherstrategies.(SeetheintroductiontoDewallyandEderington[75],fromwhichtheabovereferenceshavebeentaken,forabriefreview.)Onewaytoacquireagoodreputationis,ofcourse,byreceivingmanypositivereviewsofoneselfasamerchant;anotherisfortheproductsoneoerstoreceivemanypositivereviews.Forthepurposesofourdiscussion,weregardexperimentswhereinthebuyingishypotheticalasbeingoutofscope;instead,wefocusoneco-nomicanalysesofthebehaviorof
peopleengagedinrealshoppingandspendingrealmoney. Notethatresearchersintheeconomicscommunityhaveatraditionofcirculatingandrevisingworkingpapers,sometimesforyears,beforeproducinganarchivalversion.Inthereferencesthatfollow,wehavecitedthearchivalversionwhenjournal-versionpublicationdatahasbeenavailabletous,inordertoenabletheinterestedreadertoaccessthenal,peer-reviewedversionofthework.Butbecauseofthispolicy,thereaderwhowishestodelveintothisliteraturefurthershouldkeepinmindthefollowingtwopoints.First,manycitationswithintheliteraturearetopreliminaryworkingpapers.Thismeansthatourcitationsmaynotpreciselymatchthosegiveninthepapersthemselves(e.g.,theremaybetitlemismatches).Second,workthatwasdoneearliermaybecitedwithalaterpublicationdate;therefore,thedatesgiveninourcitationsshouldnotbetakentoindicateresearch 6.1EconomicImpactofReviewsThegeneralformthatmoststudiestakeistousesomeformofhedo-nicregression[259]toanalyzethevalueandthesignicanceofdier-entitemfeaturestosomefunction,suchasameasureofutilitytothecustomer,usingpreviouslyrecordeddata.(ExceptionsincludeResnicketal.[253],whorananempiricalexperimentcreatingnewsellersoneBay,andJinandKato[136],whomadeactualpurchasestovalidatesellerclaims.)Speciceconomicfunctionsthathavebeenexaminedincluderevenue(box-ocetake,salesrankonAmazon,etc.),revenuegrowth,stocktradingvolume,andmeasuresthatauction-siteslikeeBaymakeavailable,suchasbidpriceorprobabilityofabidorsalebeingmade.Thetypeofproductconsideredvaries(although,understand-ably,thoseoeredbyeBayandAmazonhavereceivedmoreattention):examplesincludebooks,collectiblecoins,movies,craftbeer,stocks,andusedcars.Itisimportanttonotethatsomeconclusionsdrawnfromonedomainoftendonotcarryovertoanother;forinstance,reviewsseemtobeinuentialforbig-ticketitemsbutlesssoforcheaperitems.Buttherearealsoconictingndingswithinthesamedomain.Moreover,dierentsubsegmentsoftheconsumerpopulationmayreactdierently:forexample,peoplewhoaremorehighlymotivatedtopurchasemaytakeratingsmoreseriously.Additionally,insomestudies,positi
verat-ingshaveaneectbutnegativeonesdonot,andinotherstudiestheoppositeeectisseen;thetimingofsuchfeedbackandvariouschar-acteristicsofthemerchantorofthefeedbackitself(e.g.,volume)mayalsobeafactor.Nonetheless,toglossovermanydetailsforthesakeofbrevity:ifoneallowsanyeectincludingcorrelationevenifsaidcorre-lationisshowntobenotpredictivethatpassesastatisticalsig-nicancetestatthe0.05leveltobeclassedassignicant,thenmanystudiesndthatreviewpolarityhasasignicanteconomiceect[13,14,23,31,35,47,59,62,68,72,75,76,81,82,128,136,145,180,195,196,198,207,208,214,237,250,253,278,297,331].Butthereareafewstudiesthatconcludeemphaticallythatreviewpositivityorneg-ativityhasnosignicanteconomiceect[56,74,80,87,100,194,325].Duanetal.[80]explicitlyrelatetheirndingstotheissueofcorpo-ratemanipulation:Fromthemanagerialperspective,weshowthatconsumersarerationalininferringmoviequalityfromonlineuser BroaderImplicationsreviewswithoutbeingundulyinuencedbytherating,thuspresentingachallengetobusinessesthattrytoinuencesalesthroughplantingonlineword-of-mouth.Withrespecttoeectsthathavebeenfound,theliteraturesurveycontainedinResnicketal.[253]statesthatAtthelargerendofeectsizesforpositiveevaluations,themodelin[Livingston[196]]ndsthatsellerswithmorethan675positivecommentsearnedapremiumof$45.76,morethan10%ofthemeansellingprice,ascomparedtonewsellerswithnofeedback....Atthelargerendofeectsizesfornegatives,[Lucking-Reileyetal.[198]],lookingatcollectiblecoins,ndsthatamovefrom2to3negativescutsthepriceby11%,about$19fromameanpriceof$173.Butingeneral,theclaimsofstatisticallysignicanteectsthathavebeenmadetendtobe(a)qualiedbyanumberofimportantcaveats,and(b)quitesmallinabsolutetermsperitem,althoughontheotherhandagain,smalleectsperitemcanaddupwhenmanyitemsareinvolved.Withregardtothisdiscussion,thefollowingexcerptfromHouserandWooders[128]isperhapsilluminating:...onaverage,3.46percentofsalesisattributabletothesellerspositivereputationstock.Similarly,ouresti-matesimplythattheaveragecosttosellersstemmingfromneutralornegativerepu
tationscoresis$2.28,or0.93percentofthenalsalesprice.Ifthesepercent-agesareappliedtoallofeBaysauctions($1.6billioninthefourthquarterof2000),thiswouldimplythatsellerspositivereputationsaddedmorethan$55mil-liontothevalueofsales,whilenon-positivesreducedsalesbyabout$15million.Ignoringforthemomentthefactthat,asmentionedabove,otherpapersreportdieringorevenoppositendings,wesimplynotethatthechoiceofwhethertofocuson0.93%,$2.28,or$55million 6.1EconomicImpactofReviews(andwhethertoviewthelatteramountasseemingparticularlylargeornot)isoneweprefertoleavetothereader.Letusnowmentionsomeparticularpapersandndingsofpartic-ularinterest.6.1.1SurveysSummarizingRelevantEconomicLiteratureResnicketal.[253]andBajariandHorta¸csu[24]aregoodentrypointsintothisbodyofliterature.Theyprovideverythoroughoverviewsanddiscussionofthemethodologicalissuesunderlyingthestudiesmentionedabove.Hankin[118]suppliesseveralvisualsummariesthataremodeledaftertheliterature-comparisontablesinDellarocas[71],Resnicketal.[253],andBajariandHorta¸csu[24].Alistofanumberofpapersonthegeneralconceptofsentimentinbehavioralnancecanbefoundathttp://sentiment.behaviouralnance.net/.6.1.2Economic-ImpactStudiesEmployingAutomatedTextAnalysisInmostofthestudiescitedabove,theorientationofareviewwasderivedfromanexplicitratingindicationsuchasnumberofstars,butafewstudiesappliedmanualorautomaticsentimentclassicationtoreviewtext[13,14,35,47,67,68,214,237].Atleastonerelatedsetofstudiesclaimsthatthetextofthereviewscontainsinformationthatinuencesthebehavioroftheconsumers,andthatthenumericratingsalonecannotcapturetheinformationinthetext[106]seealsoGhoseetal.[107],whoadditionallyattempttoassignadollarvaluetovariousadjective-nounpairs,adverb-verbpairs,orsimilarlexicalcongurations.Inarelatedvein,PavlouandDimoka[237]suggestthattheapparentsuccessoffeedbackmecha-nismstofacilitatetransactionsamongstrangersdoesnotmainlycomefromtheircrudenumericalratings,butratherfromtheirrichfeed-backtextcomments.Also,ChevalierandMayzlin[59]interprettheirndingsontheeectof
review length as providing some evidence that people do read the reviews rather than simply relying on numerical ratings.

On the other hand, Cabral and Hortaçsu [47], in an interesting experiment, look at 41 odd cases of feedback on sellers posted on eBay: what was unusual was that the feedback text was clearly positive, but the numerical rating was negative (presumably due to user error). Analysis reveals that these reviews have a strongly significant (both economically and statistically) detrimental effect on sales growth rate, indicating that customers seemed to ignore the text in favor of the incorrect summary information.

In some of these text-based studies, what was analyzed was not sentiment per se but the degree of polarization (disagreement) among a set of opinionated documents [13, 68] or, inspired in part by Pang and Lee [233], the average probability of a sentence being subjective within a given review [106]. Ghose and Ipeirotis [106] also take into account the standard deviation of sentence subjectivity within a review, in order to examine whether reviews containing a mix of subjective and objective sentences seem to have different effects from reviews that are mostly purely subjective or purely objective.

Some initially unexpected text effects are occasionally reported. For example, Archak et al. [14] found that "amazing camera," "excellent camera," and related phrases have a negative effect on demand. They hypothesize that consumers consider such phrases, especially if few details are subsequently furnished in the review, to indicate hyperbole and hence view the review itself as untrustworthy. Similarly, Archak et al. [14] and Ghose et al. [107] discover that apparently positive comments like "decent quality" or "good packaging" also had a negative effect, and hypothesize that the very fact that many reviews contain hyperbolic language means that words like "decent" are interpreted as lukewarm. These findings might seem pertinent to the distinction between the prior polarity and the contextual polarity of terms and phrases, borrowing the terminology of Wilson et al. [319]. Prior polarity refers to the sentiment a term evokes in isolation, as opposed to the sentiment the term evokes within a particular surrounding context; Polanyi and Zaenen [242] point out that identifying prior polarity alone may not suffice. With respect to this distinction, the status of the observations of Archak et al. [14] just mentioned is not entirely clear. The superlatives ("amazing") are clearly intended to convey positive sentiment regardless of whether the review authors actually managed to convince readers; that is, context is only needed to explain the economic effect of lowered sales, not the interpretation of the review itself. In the case of words like "decent," one could potentially make the case that the prior orientation of such words is in fact neutral rather than positive; but alternatively, one could argue instead that in a setting where many reviews are highly enthusiastic, the contextual orientation of "decent" is indeed different from its prior orientation.

6.1.3 Interactions with Word of Mouth (WOM)

One factor that some studies point out is that the number of reviews, positive or negative, may simply reflect word of mouth, so that in some cases, what is really the underlying correlative (if any) of economic impact is not the amount of positive feedback per se but merely the amount of feedback in total. This explains why in some settings (but not all), negative feedback is seen to increase sales: the increased buzz brings more attention to the product (or perhaps simply indicates that more attention is being paid to the product, in which case it would not be predictive per se).

6.2 Implications for Manipulation

Regarding the incentives for manipulation, it is difficult to draw a conclusion one way or the other from the studies we have just examined. One cautious way to read the results summarized in the previous section is as follows. While there may be some economic benefit in some settings for a corporation to plant positive reviews or otherwise attempt to use untoward means to manufacture an artificially inflated reputation or suppress negative information, it seems that in general, a great deal of effort and resources would be required to do so for perhaps fairly marginal returns. More work is clearly required, though; as Bajari and Hortaçsu [24] conclude, "There is still plenty of work to be done to understand how market participants utilize the information contained in the feedback forum system."

Surveying the state of the art in this subject is beyond the scope of this survey; a fairly concise
review of issues regarding online reputation systems may be found in Dellarocas [71].

We would like to conclude, though, by pointing out a result that indicates that even if illegitimate reviews do get through, opinion-mining systems can still be valuable to consumers. Awerbuch and Kleinberg [22] study the competitive collaborative learning setting in which some of the users are assumed to be Byzantine (malicious, dishonest, coordinated, and able to eavesdrop on communications), and product or resource quality varies over time. The authors formulate the product selection problem as a type of multi-armed bandit problem. They show the striking result that even if only a constant fraction of users are honest and (unbeknownst to them) grouped into market segments such that all members of a block share the same product preferences (with the implication that the recommendations of an honest user may be useless to honest users in different market segments), there is still an algorithm by which, in time polynomial in log(…), the average regret per honest user is arbitrarily small (assuming that the number of products or resources on offer is …). Roughly speaking, the algorithm causes users to tend to raise the probability of getting recommendations from valuable sources. Thus, even in the face of rather stiff odds and formidable adversaries, honest users can, at least in theory, still get good advice from sentiment-analysis systems.

7 Publicly Available Resources
7.1 Datasets

7.1.1 Acquiring Labels for Data

One source of opinion, sentiment, and subjectivity labels is, of course, manual annotation [172, 309]. However, researchers in the field have also managed to find ways to avoid manual annotation by leveraging pre-existing resources. A common technique is to use labels that have been manually assigned, but not by the experimenters themselves; this explains why researchers in opinion mining and sentiment analysis have taken advantage of Rotten Tomatoes, Epinions, Amazon, and other sites where users furnish ratings along with their reviews. Some other noteworthy techniques are as follows:

• Sentiment summaries can be gathered by treating the review snippets that Rotten Tomatoes furnishes as one-sentence summaries [33].
• Subjective vs. non-subjective texts on the same topic can be gathered by selecting editorials versus non-editorial newswire [308, 326] or by selecting movie reviews versus plot summaries [222, 232].
• If sentiment-oriented search engines already exist (one example used to be Opinmind), then one can issue topical queries to such search engines and harvest the results to get sentiment-bearing sentences more or less guaranteed to be on-topic [206]. (On the other hand, there is something circular about this approach, since it bootstraps off someone else's solution to the opinion-mining problem.)
• One might be able to derive affect labels from emoticons.
• Text polarity may be inferred from correlations with stock-market behavior or other economic indicators [168, 107].
• Viewpoint labels can be derived from images of party logos that users display [160].
• Negative opinions can be gathered by assuming that when one newsgroup post cites another, it is typically done to indicate negative sentiment toward the cited post [4]. A more refined approach takes into account indications of "shouting," such as text rendered all in capital letters [110].

One point to mention with regard to sites where users rate the contributions of other users, such as the examples of Amazon and Epinions mentioned above, is a potential bias toward positive scores [59, 74, 128, 132, 240, 253], as noted earlier. In some cases, this comes about because of sociological effects. For example, Pinch and Athanasiades [240], in a study of a music-oriented site called ACIDplanet, found that various forces tend to cause users to give high ratings to each other's music. The users themselves refer to this phenomenon as "R=R" ("review me and I will review you"), among other, less polite, names, and the ACIDplanet administrators introduced a form of anonymous reviewing to avoid this issue in certain scenarios.

Thus, there is the question of whether one can trust the automatically determined labels that one is training one's classifiers upon. (After all, you often get what you pay for, as they say.) Indeed, Liu et al. [193] essentially re-labeled their review-quality Amazon data due to concerns about bias, as discussed in Section 5.2.4. On the other hand, while this phenomenon implies that reviewers may not always be sincere, we hypothesize that it does not greatly affect how well the authors' meta-data labels reflect the intended sentiment of the review itself. That is, we hypothesize that in many cases one can still trust the review's label, even if one does not trust the review.

7.1.2 An Annotated List of Datasets

The following list is in alphabetical order.

Blog06 [registration and fee required]
The University of Glasgow distributes this 25 GB TREC test collection, consisting of blog posts over a range of topics. Access information is available at http://ir.dcs.gla.ac.uk/test collections/access to data.html. Included in the dataset are top blogs that were provided by Nielsen BuzzMetrics and supplemented by the University of Amsterdam [227], and some spam blogs, also known as splogs, that were planted in the corpus in order to simulate a more realistic setting. Assessments include relevance judgments and labels as to whether posts contain relevant opinions and what the polarity of the opinions was (positive, negative, or a mixture of both). Macdonald and Ounis [199] give more details on the creation of the corpus and the collection's features, and include some comparison with another collection of blog postings, the BlogPulse dataset (contact information can be found on the following agreement form: http://www.blogpulse.com/www2006-workshop/datashare-agreement.pdf, but it may be out of date).

Congressional floor-debate transcripts: http://www.cs.cornell.edu/home/llee/data/convote.html
This dataset, first introduced in Thomas et al. [294], includes speeches as individual documents together with:
• Automatically derived labels for whether the speaker supported or opposed the legislation discussed in the debate the speech appears in, allowing for experiments with this kind of sentiment analysis.
• Indications of which debate each speech comes from, allowing for consideration of conversational structure.
• Indications of by-name references between speakers, allowing for experiments on agreement classification if one assigns gold-standard agreement labels from the support/oppose labels assigned to the pair of speakers in question.
• The edge weights and other information derived to create the graphs used in Thomas et al. [294], facilitating implementation of alternative graph-based methods upon the graphs constructed in that earlier work.

Cornell movie-review datasets: http://www.cs.cornell.edu/people/pabo/movie-review-data/
These corpora, first introduced in Pang and Lee [232, 233], consist of the following datasets, which include automatically derived labels.
• Sentiment polarity datasets:
  ◦ document-level: polarity dataset v2.0: 1000 positive and 1000 negative processed reviews. (An earlier version of this dataset (v1.0) was first introduced in Pang et al. [235].)
  ◦ sentence-level: sentence polarity dataset v1.0: 5331 positive and 5331 negative processed sentences/snippets.
• Sentiment-scale datasets: scale dataset v1.0: a collection of documents whose labels come from a rating scale.
• Subjectivity dataset v1.0: 5000 subjective and 5000 objective processed sentences.

We should point out that the existence of the polarity-based datasets does not indicate that the curators (i.e., us) believe that reviews with middling ratings are not important to consider in practice (indeed, the sentiment-scale corpora contain such documents). Rather, the rationale in creating the polarity dataset was as follows. At the
time the corpus creation was begun, the application of machine learning techniques to sentiment classification was very new, and, as discussed in Section 3, it was natural to assume that the problem could be very challenging to such techniques. Therefore, the polarity corpus was constructed to be as easy for text-categorization techniques as possible: the documents fell into one of two well-separated and size-balanced categories. The point was, then, to use this corpus as a lens to study the relative difficulty of sentiment polarity classification as compared to standard topic-based classification, where two-balanced-class problems with well-separated categories pose very little challenge.

A list of papers that use or report performance on the Cornell movie-review datasets can be found at http://www.cs.cornell.edu/people/pabo/movie-review-data/otherexperiments.html.

Customer review datasets: http://www.cs.uic.edu/
This dataset, introduced in Hu and Liu [129], consists of reviews of five electronics products downloaded from Amazon and Cnet. The sentences have been manually labeled as to whether an opinion is expressed, and if so, what feature from a predefined list is being evaluated. An addendum with nine products is also available (http://www.cs.uic.edu/liub/FBS/Reviews-9-products.rar) and has been utilized in recent work [78]. The curator, Bing Liu, also distributes a comparative-sentence dataset that is available by request.

Economining: http://economining.stern.nyu.edu/datasets.html
This site, hosted by the Stern School at New York University, consists of three sets of data:
• Transactions and price premiums.
• Feedback postings for merchants at Amazon.com.
• Automatically derived sentiment scores for frequent evaluation phrases at Amazon.com.
These formed the basis for the work reported in Ghose et al. [107], which focuses on interactions between sentiment, subjectivity, and economic …

French sentences: http://www.psor.ucl.ac.be/personal/yb/Resource.html
This dataset, introduced in Bestgen et al. [36], consists of 702 sentences from a Belgian French newspaper, with labels assigned by ten judges as to unpleasant, neutral, or pleasant content, using a seven-point scale.

MPQA Corpus: http://www.cs.pitt.edu/mpqa/databaserelease/
The MPQA Opinion Corpus contains 535 news articles from a wide variety of news sources, manually annotated at the sentential and sub-sentential level for opinions and other private states (i.e., beliefs, emotions, sentiments, speculations, and so on). Wiebe et al. [309] describe the overall annotation scheme; Wilson et al. [319] describe the contextual polarity annotations and an agreement study.

Multiple-aspect restaurant reviews: http://people.csail.mit.edu/bsnyder/naacl07
The corpus, introduced in Snyder and Barzilay [272], consists of 4,488 reviews, both in raw-text and in feature-vector form. Each review gives an explicit 1-to-5 rating for five different aspects (food, ambiance, service, value, and overall experience) along with the text of the review itself, all provided by the review author. A rating of five was the most common over all aspects, and Snyder and Barzilay [272] report that 30.5% of the 3,488 reviews in their randomly selected training set had a rating of five for all five aspects, although no other tuple of ratings was represented by more than 5% of the training set. The code used in Snyder and Barzilay [272] is also distributed at the aforementioned URL. The original source for the reviews was http://www.we8there.com/; data from the same website was also used by Higashinaka et al. [122].

Multi-Domain Sentiment Dataset: http://www.cis.upenn.edu/mdredze/datasets/sentiment/
This dataset, introduced in Blitzer et al. [40], consists of product
reviews from several different product types taken from Amazon.com, some with 1-to-5 star labels, some unlabeled.

NTCIR multilingual corpus [registration required]
The corpus for the NTCIR-6 pilot task consists of news articles in Japanese, Chinese, and English and formed the basis of the Opinion Analysis Task at NTCIR-6 [267]. The training data contains annotations regarding opinion holders, the opinions held by opinion holders, and sentiment polarity, as well as relevance information for a set of pre-determined topics. The corpus of the NTCIR Multilingual Opinion-Analysis Task (MOAT) is drawn from Japanese, Chinese, and English blogs.

Review-search results sets: http://www.cs.cornell.edu/home/llee/data/search-subj.html
This corpus, used by Pang and Lee [234], consists of the top 20 results returned by the Yahoo! search engine in response to each of a set of 69 queries containing the word "review." The queries were drawn from the publicly available list of real MSN users' queries released for the 2005 KDD Cup competition [185]; the KDD data itself is available at http://www.acm.org/sigs/sigkdd/kdd2005/Labeled800Queries.zip. The search-engine results in the corpus are annotated as to whether they are subjective or not. Note that sales pitches were marked "objective" on the premise that they represent biased reviews that users might wish to avoid seeing.

7.2 Evaluation Campaigns

7.2.1 TREC Opinion-Related Competitions

The TREC-BLOG wiki, http://ir.dcs.gla.ac.uk/wiki/TREC-BLOG/, is a useful source of information on the competitions sketched below.

TREC 2006 Blog Track. TREC 2006 involved a Blog track, with an opinion retrieval task designed precisely to focus on the opinionated character that many blogs have: participating systems had to retrieve
blog posts expressing an opinion about a specified topic. Fourteen groups participated; Ounis et al. [227] give an overview of the results. Some findings are as follows. With respect to performance on opinion detection, the participating systems seemed to fall into two groups. Opinion-detection ability and relevance-determination ability seemed to be strongly correlated. While the best systems were about equally good at detecting negative and positive sentiment, systems performing at the median seemed to be a bit more effective at locating documents with negative sentiment. Most participants followed a pipelined approach, where first topic relevance was tackled, and then opinion detection was applied upon the results. Perhaps the most surprising observation was that the organizers discovered that it was possible to achieve very good relative performance by omitting the second phase of the pipeline; but we take heart in the fact that the field is still relatively young and has room to grow and mature.

TREC 2007 Blog Track. The TREC 2007 Blog track retained the opinion retrieval task and instituted determining the sentiment status (positive, negative, or mixed) of the retrieved opinions as a subtask. The 2007 and 2006 Blog Track results are analyzed in Ounis et al. [228]. They found that lexicon-based approaches (either where the discriminativeness of terms was determined on labeled training data or where the terms were manually compiled) constituted the main effective approaches.

TREC 2008 Blog Track. In the TREC 2008 Blog track, the polarity-identification problem was re-posed as one of ranking positive-polarity retrieved documents by degree of positivity, and, similarly, ranking negative-polarity retrieved documents by degree of negativity. (Mixed opinionated documents were not to be included in these rankings.)

7.2.2 NTCIR Opinion-Related Competitions

The National Institute of Informatics (NII) runs annual meetings code-named NTCIR (NII Test Collection for Information Retrieval Systems). Opinion analysis was featured at an NTCIR-5 workshop, and served as a pilot task at NTCIR-6 and a full-blown task at NTCIR-7.
NTCIR-6 opinion analysis pilot task. The dataset consists of newswire documents in Chinese, Japanese, and English; the organizers describe this as "what we believe to be the first multilingual opinion analysis dataset over comparable data" [93]. The four constituent tasks, intentionally designed to be fairly simple so as to encourage participation from many groups, were as follows:
• Detection of opinionated sentences.
• Detection of opinion holders.
• (optional) Polarity labeling of opinionated sentences as positive, negative, or neutral.
• (optional) Detection of sentences relevant to a given topic.

Due to variation in annotator labelings, two evaluation standards were defined. In the strict evaluation, an answer is considered correct if all three annotators agreed on it. In the lenient evaluation, only a majority (i.e., two) of the annotators were required to agree with an answer for it to be considered correct. Seki et al. [267] give an overview and the results of this evaluation exercise, noting that differences between languages make direct comparison difficult, especially since precision and recall were defined (slightly) differently across languages. A shortened version of this overview also exists [93].

NTCIR-7 multilingual opinion analysis task (MOAT), 2008. Subsequent to the NTCIR-6 pilot task, a new dataset was selected, drawn from blogs in Japanese, traditional and simplified Chinese, and English; according to the organizers, "We plan to select and balance useful topics for opinion mining researchers, such as topics concerning product reviews, movie reviews, and so on." This exercise involves six subtasks:
• Detection of opinionated sentences and opinion fragments within opinionated sentences.
• Polarity labeling of opinion fragments as positive, negative, or neutral.
• (optional) Strength labeling of opinion fragments as very weak, average, or very strong.
• (optional) Detection of opinion holders.
• (optional) Detection of opinion targets.
• (optional) Detection of sentences that are relevant to a given topic.

As in the previous competition, both strict and lenient evaluation standards are to be applied.

OpQA Corpus [available by request]
Stoyanov et al. [283] describe the construction of this corpus, which is a collection of opinion questions and answers together with 98 documents selected from the MPQA dataset.

7.3 Lexical Resources

The following list is in alphabetical order.

General Inquirer: http://www.wjh.harvard.edu/
This site provides entry points to various resources associated with the General Inquirer [281]. Included are manually classified terms labeled with various types of positive or negative semantic orientation, and words having to do with agreement or disagreement.

NTU Sentiment Dictionary [registration required]
This sentiment dictionary listing the polarities of many Chinese words was developed by a combination of automated and manual means [171]. A registration form for acquiring it is available at http://nlg18.csie.ntu.edu.tw:8080/opinion/userform.jsp.

OpinionFinder's Subjectivity Lexicon: http://www.cs.pitt.edu/mpqa/
The list of subjectivity clues that is part of OpinionFinder is available for download. These clues were compiled from several sources, representing several years of effort, and were used in Wilson et al. [319].
SentiWordnet: http://sentiwordnet.isti.cnr.it/
SentiWordnet [91] is a lexical resource for opinion mining. Each synset of WordNet [95], a publicly available thesaurus-like resource, is assigned three sentiment scores (positive, negative, and objective), where these scores were automatically generated using a semi-supervised method described in Esuli and Sebastiani [90].

Taboada and Grieve's Turney adjective list [available through the Yahoo! sentimentAI group]
The list reports semantic-orientation values, computed according to the method proposed by Turney [298], for 1700 adjectives.

7.4 Tutorials, Bibliographies, and Other References

• Bing Liu has a chapter on opinion mining in his book on Web data mining [190]. Slides for that chapter are available at http://www.cs.uic.edu/liub/teach/cs583-spring-07/opinion-mining.pdf.
• Slides for Janyce Wiebe's tutorial, "Semantics, opinion, and sentiment in text," at the EUROLAN 2007 Summer School are available at http://www.cs.pitt.edu/wiebe/pubs/papers/EUROLAN07/eurolan07wiebe.ppt.
• The following are online bibliographies that contain information in BibTeX format:
  ◦ http://www.cs.cornell.edu/home/llee/opinion-mining-sentiment-analysis-survey.html, the main website for this survey,
  ◦ http://liinwww.ira.uka.de/bibliography/Misc/Sentiment.html, maintained by Andrea Esuli,
  ◦ http://research.microsoft.com/jtsun/OpinionMiningPaperList.html, maintained by Jian-Tao Sun,
  ◦ http://www.cs.pitt.edu/wiebe/pubs/papers/EUROLAN07/eurolan07bib.html, with the actual .bib file at http://www.cs.pitt.edu/wiebe/pubs/papers/EUROLAN07/eurolan07.bib, maintained by Janyce Wiebe.
  Esuli's and Wiebe's sites have additional search capabilities.
• Members of the Yahoo! group SentimentAI (http://tech.groups.yahoo.com/group/SentimentAI/) have access to the resources that have been contributed there (such as some links to corpora and papers) and are subscribed to the associated mailing list. Joining is free.
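To make the Turney-style semantic-orientation values behind the Taboada and Grieve adjective list (Section 7.3) more concrete, the sketch below implements the PMI-IR formula as we understand it from Turney [298]:

SO(phrase) = log2[(hits(phrase NEAR "excellent") · hits("poor")) / (hits(phrase NEAR "poor") · hits("excellent"))]

A small constant (0.01) is added to the NEAR counts to avoid division by zero. This is a minimal sketch under stated assumptions, not the code used to build the list. It takes the four hit counts as plain numbers rather than querying a search engine, and the function name, parameter names, and example counts are all hypothetical.

```python
import math


def semantic_orientation(near_excellent: float, near_poor: float,
                         hits_excellent: float, hits_poor: float,
                         smoothing: float = 0.01) -> float:
    """Hypothetical helper: PMI-IR semantic orientation of one phrase.

    near_excellent / near_poor: hit counts for the phrase occurring NEAR
    the reference words "excellent" / "poor".
    hits_excellent / hits_poor: hit counts for each reference word alone.
    Returns a positive value for phrases that co-occur relatively more with
    "excellent", and a negative value for those that co-occur more with "poor".
    """
    if hits_excellent <= 0 or hits_poor <= 0:
        raise ValueError("reference-word hit counts must be positive")
    # Assumed smoothing: 0.01 is added to the co-occurrence counts so that a
    # phrase never seen near one reference word does not cause a zero division.
    numerator = (near_excellent + smoothing) * hits_poor
    denominator = (near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    score = semantic_orientation(120, 30, 1_000_000, 1_000_000)
    print(f"{score:.3f}")  # roughly 2, i.e., positive orientation
```

A phrase whose co-occurrence with the two reference words is balanced, relative to the overall frequencies of those words, scores near zero. Swapping the roles of "excellent" and "poor" flips the sign of the score.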
8 Concluding Remarks

"When asked how he knew a piece was finished, he responded, 'When the dinner bell rings.'"
(apocryphal anecdote about Alexander Calder)

Our goal in this survey has been to cover techniques and approaches that promise to directly enable opinion-oriented information-seeking systems, and to convey to the reader a sense of our excitement about the intellectual richness and breadth of the area. We very much encourage the reader to take up the many open challenges that remain, and hope we have provided some resources that will prove helpful in this regard.

On the topic of resources: we have already indicated above that the bibliographic database used in this survey is publicly available. In fact, the URL mentioned above, http://www.cs.cornell.edu/home/llee/opinion-mining-sentiment-analysis-survey.html, is our personally maintained homepage for this survey. Any subsequent editions or versions of this survey that may be produced, or related news, will be announced there.¹

Speaking of resources, we have drawn considerably on those of many others during the course of this work. We thus have a number of sincere acknowledgments to make.

This survey is based upon work supported in part by the National Science Foundation under grant no. IIS-0329064, a Cornell University Provost's Award for Distinguished Scholarship, a Yahoo! Research Alliance gift, and an Alfred P. Sloan Research Fellowship. Any opinions, findings, and conclusions or recommendations expressed are those of the authors and do not necessarily reflect the views or official policies, either expressed or implied, of any sponsoring institutions, the US government, or any other entity.

We would like to wholeheartedly thank the anonymous referees, who provided outstanding feedback astonishingly quickly. Their insights contributed immensely to the final form of this survey on many levels. It is hard to describe our level of gratitude to them for their time and their wisdom, except to say this: we have, in various capacities, seen many examples of reviewing in the community, but this is the best we have ever encountered. We also thank Eric Breck for his careful reading of and commentary on portions of this survey. All remaining errors and faults are, of course, our own. We are also very thankful to Fabrizio Sebastiani, for all of his editorial guidance and care. We owe him a great debt. We also greatly appreciate the help we received from Jamie Callan, who, along with Fabrizio, serves as Editor in Chief of the Foundations and Trends in Information Retrieval series, and James Finlay, of Now Publishers, the publisher of this series.

Finally, a number of unexpected health problems arose in our families during the writing of this survey. Despite this, it was our families who sustained us with their cheerful and unlimited support (on many levels), not the other way around. Thus, to end on a sentimental note, this work is dedicated to them.

¹ Indeed, we have vague aspirations to producing a "director's cut" one day. We certainly have accumulated some number of outtakes: we did not manage to find a way to work some variant of "Once more, with feeling" into the title, or to find a place for the heading "Sentiment of a woman," or to formally prove a potential undecidability result for subjectivity detection (Jon Kleinberg, personal communication) based on reviews of Brotherhood of the Wolf ("it's the best darned French werewolf kung-fu movie I've ever seen").

References

[1] A. Abbasi, "Affect intensity analysis of dark web forums," in Proceedings of Intelligence and Security Informatics, pp. 282–288, 2007.
[2] L. A. Adamic and N. Glance, "The political blogosphere and the 2004 U.S. election: Divided they blog," in Proceedings of LinkKDD, 2005.
[3] A. Agarwal and P. Bhattacharyya, "Sentiment analysis: A new approach for effective use of linguistic knowledge and exploiting similarities in a set of documents to be classified," in Proceedings of the International Conference on Natural Language Processing (ICON), 2005.
[4] R. Agrawal, S. Rajagopalan, R. Srikant, and Y. Xu, "Mining newsgroups using networks arising from social behavior," in Proceedings of WWW, pp. 529–535.
[5] E. M. Airoldi, X. Bai, and R. Padman, "Markov blankets and meta-heuristic search: Sentiment extraction from unstructured text," Lecture Notes in Computer Science, vol. 3932 (Advances in Web Mining and Web Usage Analysis), pp. 167–187, 2006.
[6] G. A. Akerlof, "The market for 'Lemons': Quality uncertainty and the market mechanism," The Quarterly Journal of Economics, vol. 84, pp. 488–500, 1970.
[7] S. M. Al Masum, H. Prendinger, and M. Ishizuka, "SenseNet: A linguistic tool to visualize numerical-valence based sentiment of textual data," in Proceedings of the International Conference on Natural Language Processing (ICON), pp. 147–152, 2007. (Poster paper).
[8] J. Allan, "Introduction to topic detection and tracking," in Topic Detection and Tracking: Event-based Information Organization, (J. Allan, ed.), pp. 1–16, Norwell, MA, USA: Kluwer Academic Publishers, ISBN 0-7923-7664-1, 2002.
[9] C. O. Alm, D. Roth, and R. Sproat, "Emotions from text: Machine learning for text-based emotion prediction," in Proceedings of the Human Language Technology Conference and the Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), 2005.
[10] A. Anagnostopoulos, A. Z. Broder, and D. Carmel, "Sampling search-engine …," World Wide Web, vol. 9, pp. 397–429, 2006.
[11] R. K. Ando and T. Zhang, "A framework for learning predictive structures from multiple tasks and unlabeled data," Journal of Machine Learning Research, vol. 6, pp. 1817–1853, 2005.
[12] A. Andreevskaia and S. Bergler, "Mining WordNet for a fuzzy sentiment: Sentiment tag extraction from WordNet glosses," in Proceedings of the European Chapter of the Association for Computational Linguistics (EACL), 2006.
[13] W. Antweiler and M. Z. Frank, "Is all that talk just noise? The information content of internet stock message boards," Journal of Finance, vol. 59, pp. 1259–1294, 2004.
[14] N. Archak, A. Ghose, and P. Ipeirotis, "Show me the money! Deriving the pricing power of product features by mining consumer reviews," in Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2007.
[15] S. Argamon, ed., Proceedings of the IJCAI Workshop on DOING IT WITH STYLE: Computational Approaches to Style Analysis and Synthesis. 2003.
[16] S. Argamon, J. Karlgren, and J. G. Shanahan, eds., Proceedings of the SIGIR Workshop on Stylistic Analysis of Text For Information Access. ACM, 2005.
[17] S. Argamon, J. Karlgren, and O. Uzuner, eds., Proceedings of the SIGIR Workshop on Stylistics for Text Retrieval in Practice. ACM, 2006.
[18] S. Argamon-Engelson, M. Koppel, and G. Avneri, "Style-based text categorization: What newspaper am I reading?," in Proceedings of the AAAI Workshop on Text Categorization, pp. 1–4, 1998.
[19] Y. Attali and J. Burstein, "Automated essay scoring with e-rater v.2," Journal of Technology, Learning, and Assessment, vol. 26, February 2006.
[20] A. Aue and M. Gamon, "Automatic identification of sentiment vocabulary: Exploiting low association with known sentiment terms," in Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing, 2005.
[21] A. Aue and M. Gamon, "Customizing sentiment classifiers to new domains: A case study," in Proceedings of Recent Advances in Natural Language Processing (RANLP), 2005.
[22] B. Awerbuch and R. Kleinberg, "Competitive collaborative learning," in Proceedings of the Conference on Learning Theory (COLT), pp. 233–248, 2005. (Journal version to appear in Journal of Computer and System Sciences, special issue on computational learning theory).
[23] P. Bajari and A. Hortaçsu, "The winner's curse, reserve prices, and endogenous entry: Empirical insights from eBay auctions," RAND Journal of Economics, vol. 34, pp. 329–355, 2003.
[24] P. Bajari and A. Hortaçsu, "Economic insights from internet auctions," Journal of Economic Literature, vol. 42, pp. 457–486, 2004.
[25] C. F. Baker, C. J. Fillmore, and J. B. Lowe, "The Berkeley FrameNet Project," in Proceedings of COLING/ACL, 1998.
[26] A. Banfield, Unspeakable Sentences: Narration and Representation in the Language of Fiction. Routledge and Kegan Paul, 1982.
[27] M. Bansal, C. Cardie, and L. Lee, "The power of negative thinking: Exploiting label disagreement in the min-cut classification framework," in Proceedings of the International Conference on Computational Linguistics (COLING), 2008. (Poster paper).
[28] R. Bar-Haim, I. Dagan, B. Dolan, L. Ferro, D. Giampiccolo, B. Magnini, and I. Szpektor, "The second PASCAL recognising textual entailment challenge," in Proceedings of the Second PASCAL Challenges Workshop on Textual Entailment, 2006.
[29] R. Barzilay and L. Lee, "Learning to paraphrase: An unsupervised approach using multiple-sequence alignment," in Proceedings of the Joint Human Language Technology/North American Chapter of the ACL Conference (HLT-NAACL), pp. 16–23, 2003.
[30] R. Barzilay and K. McKeown, "Extracting paraphrases from a parallel corpus," in Proceedings of the Association for Computational Linguistics (ACL), pp. 50–57, 2001.
[31] S. Basuroy, S. Chatterjee, and S. A. Ravid, "How critical are critical reviews? The box office effects of film critics, star power and budgets," Journal of …, vol. 67, pp. 103–117, 2003.
[32] M. Bautin, L. Vijayarenu, and S. Skiena, "International sentiment analysis for news and blogs," in Proceedings of the International Conference on Weblogs and Social Media (ICWSM), 2008.
[33] P. Beineke, T. Hastie, C. Manning, and S. Vaithyanathan, "Exploring sentiment summarization," in Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text, AAAI technical report SS-04-07, 2004.
[34] F. Benamara, C. Cesarano, A. Picariello, D. Reforgiato, and V. S. Subrahmanian, "Sentiment analysis: Adjectives and adverbs are better than adjectives alone," in Proceedings of the International Conference on Weblogs and Social Media (ICWSM), 2007. (Short paper).
[35] J. Berger, A. T. Sorensen, and S. J. Rasmussen, "Negative publicity: When is negative a positive?," Manuscript. PDF file's last modification date: October 16, 2007, http://www.stanford.edu/asorense/papers/Negative Publicity.pdf, 2007.
[36] Y. Bestgen, C. Fairon, and L. Kerves, "Un baromètre affectif effectif: Corpus de référence et méthode pour déterminer la valence affective de phrases," in Journées internationales d'analyse statistique des données textuelles (JADT), pp. 182–191, 2004.
[37] S. Bethard, H. Yu, A. Thornton, V. Hatzivassiloglou, and D. Jurafsky, "Automatic extraction of opinion propositions and their holders," in Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text.
[38] D. Biber, Variation Across Speech and …. Cambridge University Press.
[39] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
[40] J. Blitzer, M. Dredze, and F. Pereira, "Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification," in Proceedings of the Association for Computational Linguistics (ACL), 2007.
[41] S. R. K. Branavan, H. Chen, J. Eisenstein, and R. Barzilay, "Learning document-level semantic properties from free-text annotations," in Proceedings of the Association for Computational Linguistics (ACL), 2008.
[42] E. Breck and C. Cardie, "Playing the telephone game: Determining the hierarchical structure of perspective and speech expressions," in Proceedings of the International Conference on Computational Linguistics (COLING), 2004.
[43] E. Breck, Y. Choi, and C. Cardie, "Identifying expressions of opinion in context," in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, 2007.
[44] S. Brin and L. Page, "The anatomy of a large-scale hypertextual web search engine," in Proceedings of the 7th International World Wide Web Conference, pp. 107–117, 1998.
[45] R. F. Bruce and J. M. Wiebe, "Recognizing subjectivity: A case study in manual tagging," Natural Language Engineering, vol. 5, 1999.
[46] J. K. Burgoon, J. P. Blair, T. Qin, and J. F. Nunamaker, Jr., "Detecting deception through linguistic analysis," in Proceedings of Intelligence and Security Informatics (ISI), number 2665 in Lecture Notes in Computer Science, p. 958.
[47] L. Cabral and A. Hortaçsu, "The dynamics of seller reputation: Theory and evidence from eBay," Working Paper, downloaded version revised in March, 2006, URL http://pages.stern.nyu.edu/lcabral/workingpapers/Cabral Mar06.pdf, 2006.
[48] J. Carbonell, Subjective Understanding: Computer Models of Belief Systems. PhD thesis, Yale, 1979.
[49] C. Cardie, "Empirical methods in information extraction," AI Magazine, vol. 18, pp. 65–79, 1997.
[50] C. Cardie, C. Farina, T. Bruce, and E. Wagner, "Using natural language processing to improve eRulemaking," in Proceedings of Digital Government Research (dg.o), 2006.
[51] C. Cardie, J. Wiebe, T. Wilson, and D. Litman, "Combining low-level and summary representations of opinions for multi-perspective question answering," in Proceedings of the AAAI Spring Symposium on New Directions in Question Answering, pp. 20–27, 2003.
[52] G. Carenini, R. Ng, and A. Pauls, "Multi-document summarization of evaluative text," in Proceedings of the European Chapter of the Association for Computational Linguistics (EACL), pp. 305–312, 2006.
[53] G. Carenini, R. T. Ng, and A. Pauls, "Interactive multimedia summaries of evaluative text," in Proceedings of Intelligent User Interfaces (IUI), pp. 124–131, ACM Press, 2006.
[54] D. Cartwright and F. Harary, "Structural balance: A generalization of Heider's theory," Psychological Review, vol. 63, pp. 277–293, 1956.
[55] P. Chaovalit and L. Zhou, "Movie review mining: A comparison between supervised and unsupervised classification approaches," in Proceedings of the Hawaii International Conference on System Sciences (HICSS), 2005.
[56] P.-Y. S. Chen, S.-Y. Wu, and J. Yoon, "The impact of online recommendations and consumer feedback on sales," in International Conference on Information Systems (ICIS), pp. 711–724, 2004.
[57] Y. Chen and J. Xie, "Online consumer review: Word-of-mouth as a new element of marketing communication mix," Management Science, vol. 54, pp. 477–491, 2008.
[58] P. Chesley, B. Vincent, L. Xu, and R. Srihari, "Using verbs and adjectives to automatically classify blog sentiment," in AAAI Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), pp. 27–29, 2006.
[59] J. A. Chevalier and D. Mayzlin, "The effect of word of mouth on sales: Online book reviews," Journal of Marketing Research, vol. 43, pp. 345–354, August.
[60] Y. Choi, E. Breck, and C. Cardie, "Joint extraction of entities and relations for opinion recognition," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2006.
[61] Y. Choi, C. Cardie, E. Riloff, and S. Patwardhan, "Identifying sources of opinions with conditional random fields and extraction patterns," in Proceedings of the Human Language Technology Conference and the Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), 2005.
[62] E. K. Clemons, G. Gao, and L. M. Hitt, "When online reviews meet hyperdifferentiation: A study of the craft beer industry," Journal of Management Information Systems, vol. 23, pp. 149–171, 2006.
[63] comScore/the Kelsey group, "Online consumer-generated reviews have significant impact on offline purchase behavior," Press Release, http://www.comscore.com/press/release.asp?press=1928, November 2007.
[64] J. G. Conrad and F. Schilder, "Opinion mining in legal blogs," in Proceedings of the International Conference on Artificial Intelligence and Law (ICAIL), pp. 231–236, New York, NY, USA: ACM, 2007.
[65] W. B. Croft and J. Lafferty, eds., Language Modeling for Information Retrieval. Number 13 in the Information Retrieval Series. Kluwer/Springer, 2003.
[66] S. Das and M. Chen, "Yahoo! for Amazon: Extracting market sentiment from stock message boards," in Proceedings of the Asia Pacific Finance Association Annual Conference (APFA), 2001.
[67] S. R. Das and M. Y. Chen, "Yahoo! for Amazon: Sentiment extraction from small talk on the Web," Management Science, vol. 53, pp. 1375–1388, 2007.
[68] S. R. Das, P. Tufano, and F. de Asis Martinez-Jerez, "eInformation: A clinical study of investor discussion and sentiment," Financial Management, vol. 34, pp. 103–137, 2005.
[69] K. Dave, S. Lawrence, and D. M. Pennock, "Mining the peanut gallery: Opinion extraction and semantic classification of product reviews," in Proceedings of WWW, pp. 519–528, 2003.
[70] S. David and T. J. Pinch, "Six degrees of reputation: The use and abuse of online review and recommendation systems," First Monday, July 2006. (Special Issue on Commercial Applications of the Internet).
[71] C. Dellarocas, "The digitization of word-of-mouth: Promise and challenges of online reputation systems," Management Science, vol. 49, pp. 1407–1424, 2003. (Special issue on e-business and management science).
[72] C. Dellarocas, X. Zhang, and N. F. Awad, "Exploring the value of online product ratings in revenue forecasting: The case of motion pictures," Journal of Interactive Marketing, vol. 21, pp. 23–45, 2007.
[73] A. Devitt and K. Ahmad, "Sentiment analysis in financial news: A cohesion-based approach," in Proceedings of the Association for Computational Linguistics (ACL), pp. 984–991, 2007.
[74] M. Dewally, "Internet investment advice: Investing with a rock of salt," Financial Analysts Journal, vol. 59, pp. 65–77, July/August 2003.
[75] M. Dewally and L. Ederington, "Reputation, certification, warranties, and information as remedies for seller-buyer information asymmetries: Lessons from the online comic book market," Journal of Business, vol. 79, pp. 693–730, March 2006.
[76] S. Dewan and V. Hsu, "Adverse selection in electronic markets: Evidence from online stamp auctions," Journal of Industrial Economics, vol. 52, pp. 497–516, December 2004.
[77] D. W. Diamond, "Reputation acquisition in debt markets," Journal of Political Economy, vol. 97, pp. 828–862, 1989.
[78] X. Ding, B. Liu, and P. S. Yu, "A holistic lexicon-based approach to opinion mining," in Proceedings of the Conference on Web Search and Web Data Mining (WSDM), 2008.
[79] L. Dini and G. Mazzini, "Opinion classification through information extraction," in Proceedings of the Conference on Data Mining Methods and Databases for Engineering, Finance and Other Fields (Data Mining), pp. 299–310, 2002.
[80] W. Duan, B. Gu, and A. B. Whinston, "Do online reviews matter? An empirical investigation of panel data," Social Science Research Network (SSRN) Working Paper Series, http://ssrn.com/paper=616262, version as of January, 2005.
[81] D. H. Eaton, "Valuing information: Evidence from guitar auctions on eBay," Journal of Applied Economics and Policy, vol. 24, pp. 1–19, 2005.
[82] D. H. Eaton, "The impact of reputation timing and source on auction outcomes," The B.E. Journal of Economic Analysis and Policy, vol. 7, 2007.
[83] M. Efron, "Cultural orientation: Classifying subjective documents by cociation [sic] analysis," in Proceedings of the AAAI Fall Symposium on Style and Meaning in Language, Art, Music, and Design, pp. 41–48, 2004.
[84] K. Eguchi and V. Lavrenko, "Sentiment retrieval using generative models," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 345–354, 2006.
[85] K. Eguchi and C. Shah, "Opinion retrieval experiments using generative models: Experiments for the TREC 2006 blog track," in Proceedings of TREC.
[86] P. Ekman, Emotion in the Human Face. Cambridge University Press, Second ed., 1982.
[87] J. Eliashberg and S. M. Shugan, "Film critics: Influencers or predictors?," Journal of Marketing, vol. 61, pp. 68–78, April 1997.
[88] C. Engström, Topic Dependence in Sentiment Classification. Master's thesis, University of Cambridge, 2004.
[89] A. Esuli and F. Sebastiani, "Determining the semantic orientation of terms through gloss analysis," in Proceedings of the ACM SIGIR Conference on Information and Knowledge Management (CIKM), 2005.
[90] A. Esuli and F. Sebastiani, "Determining term subjectivity and term orientation for opinion mining," in Proceedings of the European Chapter of the Association for Computational Linguistics (EACL), 2006.
[91] A. Esuli and F. Sebastiani, "SentiWordNet: A publicly available lexical resource for opinion mining," in Proceedings of Language Resources and Evaluation (LREC), 2006.
[92] A. Esuli and F. Sebastiani, "PageRanking WordNet synsets: An application to opinion mining," in Proceedings of the Association for Computational Linguistics (ACL), 2007.
[93] D. K. Evans, L.-W. Ku, Y. Seki, H.-H. Chen, and N. Kando, "Opinion analysis across languages: An overview of and observations from the NTCIR6 opinion analysis pilot task," in Proceedings of the Workshop on Cross-Language Information Processing, vol. 4578 (Applications of Fuzzy Sets Theory) of Lecture Notes in Computer Science, pp. 456–463, 2007.
[94] A. Fader, D. R. Radev, M. H. Crespin, B. L. Monroe, K. M. Quinn, and M. Colaresi, "MavenRank: Identifying influential members of the US senate using lexical centrality," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2007.
[95] C. Fellbaum, ed., Wordnet: An Electronic Lexical Database. MIT Press, 1998.
[96] D. Feng, E. Shaw, J. Kim, and E. Hovy, "Learning to detect conversation focus of threaded discussions," in Proceedings of the Joint Human Language Technology/North American Chapter of the ACL Conference (HLT-NAACL), pp. 208–215, 2006.
[97] A. Finn and N. Kushmerick, "Learning to classify documents according to genre," Journal of the American Society for Information Science and Technology (JASIST), vol. 7, 2006. (Special issue on computational analysis of style).
[98] A. Finn, N. Kushmerick, and B. Smyth, "Genre classification and domain transfer for information filtering," in Proceedings of the 24th BCS-IRSG European Colloquium on IR Research: Advances in Information Retrieval, number 2291 in Lecture Notes in Computer Science, pp. 353–362, Glasgow, 2002.
[99] P. W. Foltz, D. Laham, and T. K. Landauer, "Automated essay scoring: Applications to education technology," in Proceedings of ED-MEDIA, pp. 939–944, 1999.
[100] C. Forman, A. Ghose, and B. Wiesenfeld, "Examining the relationship between reviews and sales: The role of reviewer identity disclosure in electronic markets," Information Systems Research, vol. 19, 2008. (Special issue on the interplay between digital and social networks).
[101] G. Forman, "An extensive empirical study of feature selection metrics for text classification," Journal of Machine Learning Research, vol. 3, pp. 1289–1305, 2003.
[102] T. Fukuhara, H. Nakagawa, and T. Nishida, "Understanding sentiment of people from news articles: Temporal sentiment analysis of social events," in Proceedings of the International Conference on Weblogs and Social Media, 2007.
[103] M. Gamon, "Sentiment classification on customer feedback data: Noisy data, large feature vectors, and the role of linguistic analysis," in Proceedings of the International Conference on Computational Linguistics (COLING), 2004.
[104] M. Gamon, A. Aue, S. Corston-Oliver, and E. Ringger, "Pulse: Mining customer opinions from free text," in Proceedings of the International Symposium on Intelligent Data Analysis (IDA), number 3646 in Lecture Notes in Computer Science, pp. 121–132, 2005.
[105] R. Ghani, K. Probst, Y. Liu, M. Krema, and A. Fano, "Text mining for product attribute extraction," SIGKDD Explorations Newsletter, vol. 8, pp. 41–48, 2006.
[106] A. Ghose and P. G. Ipeirotis, "Designing novel review ranking systems: Predicting usefulness and impact of reviews," in Proceedings of the International Conference on Electronic Commerce (ICEC), 2007. (Invited paper).
[107] A. Ghose, P. G. Ipeirotis, and A. Sundararajan, "Opinion mining using econometrics: A case study on reputation systems," in Proceedings of the Association for Computational Linguistics (ACL), 2007.
[108] N. Godbole, M. Srinivasaiah, and S. Skiena, "Large-scale sentiment analysis for news and blogs," in Proceedings of the International Conference on Weblogs and Social Media (ICWSM), 2007.
[109] A. B. Goldberg and X. Zhu, "Seeing stars when there aren't many stars: Graph-based semi-supervised learning for sentiment categorization," in TextGraphs: HLT/NAACL Workshop on Graph-based Algorithms for Natural Language Processing, 2006.
[110] A. B. Goldberg, X. Zhu, and S. Wright, "Dissimilarity in graph-based semi-supervised classification," in Artificial Intelligence and Statistics (AISTATS), 2007.
[111] S. Greene, Spin: Lexical Semantics, Transitivity, and the Identification of Implicit Sentiment. PhD thesis, University of Maryland, 2007.
[112] G. Grefenstette, Y. Qu, J. G. Shanahan, and D. A. Evans, "Coupling niche browsers and affect analysis for an opinion mining application," in Proceedings of Recherche d'Information Assistée par Ordinateur (RIAO), 2004.
[113] M. L. Gregory, N. Chinchor, P. Whitney, R. Carter, E. Hetzler, and A. Turner, "User-directed sentiment analysis: Visualizing the affective content of documents," in Proceedings of the Workshop on Sentiment and Subjectivity in Text, pp. 23–30, Sydney, Australia, July 2006.
[114] B. Gu, P. Konana, A. Liu, B. Rajagopalan, and J. Ghosh, "Predictive value of stock message board sentiments," McCombs Research Paper No. IROM-11-06, version dated November, 2006.
[115] R. V. Guha, R. Kumar, P. Raghavan, and A. Tomkins, "Propagation of trust and distrust," in Proceedings of WWW, pp. 403–412, 2004.
[116] B. A. Hagedorn, M. Ciaramita, and J. Atserias, "World knowledge in broad-coverage information filtering," in Proceedings of the ACM Special Interest Group on Information Retrieval (SIGIR), 2007. (Poster paper).
[117] J. T. Hancock, L. Curry, S. Goorha, and M. Woodworth, "Automated linguistic analysis of deceptive and truthful synchronous computer-mediated communication," in Proceedings of the Hawaii International Conference on System Sciences (HICSS), p. 22c, 2005.
[118] L. Hankin, "The effects of user reviews on online purchasing behavior across multiple product categories," Master's final project report, UC Berkeley School of Information, http://www.ischool.berkeley.edu/ report.pdf, May 2007.
[119] V. Hatzivassiloglou and K. McKeown, "Predicting the semantic orientation of adjectives," in Proceedings of the Joint ACL/EACL Conference, pp. 174–181, 1997.
[120] V. Hatzivassiloglou and J. Wiebe, "Effects of adjective orientation and gradability on sentence subjectivity," in Proceedings of the International Conference on Computational Linguistics (COLING), 2000.
[121] M. Hearst, "Direction-based text interpretation as an information access refinement," in Text-Based Intelligent Systems, (P. Jacobs, ed.), pp. 257–274, Lawrence Erlbaum Associates, 1992.
[122] R. Higashinaka, M. Walker, and R. Prasad, "Learning to generate naturalistic utterances using reviews in spoken dialogue systems," ACM Transactions on Speech and Language Processing (TSLP), 2007.
[123] P. Hitlin and L. Rainie, "The use of online reputation and rating systems," Pew Internet & American Life Project Memo, October 2004.
[124] T. Hoffman, "Online reputation management is hot but is it ethical?," Computerworld, February 2008.
[125] T. Hofmann, "Probabilistic latent semantic indexing," in Proceedings of SIGIR, pp. 50–57, 1999.
[126] D. Hopkins and G. King, "Extracting systematic social science meaning from text," Manuscript available at http://gking.harvard.edu/files/words.pdf, 2007 version was the one most recently consulted, 2007.
[127] J. A. Horrigan, "Online shopping," Pew Internet & American Life Project Report, 2008.
[128] D. Houser and J. Wooders, "Reputation in auctions: Theory, and evidence from eBay," Journal of Economics and Management Strategy, vol. 15, pp. 252–369, 2006.
[129] M. Hu and B. Liu, "Mining and summarizing customer reviews," in Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 168–177, 2004.
[130] M. Hu and B. Liu, "Mining opinion features in customer reviews," in Proceedings of AAAI, pp. 755–760, 2004.
[131] M. Hu, A. Sun, and E.-P. Lim, "Comments-oriented blog summarization by sentence extraction," in Proceedings of the ACM SIGIR Conference on Information and Knowledge Management (CIKM), pp. 901–904, 2007. (Poster paper).
[132] N. Hu, P. A. Pavlou, and J. Zhang, "Can online reviews reveal a product's true quality?: Empirical findings and analytical modeling of online word-of-mouth communication," in Proceedings of Electronic Commerce (EC), pp. 324–330, USA, New York, NY: ACM, 2006.
[133] A. Huettner and P. Subasic, "Fuzzy typing for document management," in ACL 2000 Companion Volume: Tutorial Abstracts and Demonstration Notes, pp. 26–27, 2000.
[134] M. Hurst and K. Nigam, "Retrieving topical sentiments from online document collections," in Document Recognition and Retrieval XI, pp. 27–34, 2004.
[135] C. Jacquemin, Spotting and Discovering Terms through Natural Language Processing. MIT Press, 2001.
[136] G. Jin and A. Kato, "Price, quality and reputation: Evidence from an online field experiment," The RAND Journal of Economics, vol. 37, 2006.
[137] X. Jin, Y. Li, T. Mah, and J. Tong, "Sensitive webpage classification for content advertising," in Proceedings of the International Workshop on Data Mining and Audience Intelligence for Advertising, 2007.
[138] N. Jindal and B. Liu, "Identifying comparative sentences in text documents," in Proceedings of the ACM Special Interest Group on Information Retrieval, 2006.
[139] N. Jindal and B. Liu, "Mining comparative sentences and relations," in Proceedings of AAAI, 2006.
[140] N. Jindal and B. Liu, "Review spam detection," in Proceedings of WWW, 2007. (Poster paper).
[141] N. Jindal and B. Liu, "Opinion spam and analysis," in Proceedings of the Conference on Web Search and Web Data Mining (WSDM), pp. 219–230, 2008.
[142] N. Kaji and M. Kitsuregawa, "Automatic construction of polarity-tagged corpus from HTML documents," in Proceedings of the COLING/ACL Main Conference Poster Sessions, 2006.
[143] N. Kaji and M. Kitsuregawa, "Building lexicon for sentiment analysis from massive collection of HTML documents," in Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 1075–1083, 2007.
[144] A. Kale, A. Karandikar, P. Kolari, A. Java, T. Finin, and A. Joshi, "Modeling trust and influence in the blogosphere using link polarity," in Proceedings of the International Conference on Weblogs and Social Media (ICWSM), 2007. (Short paper).
[145] K. Kalyanam and S. H. McIntyre, "The role of reputation in online auction markets," Santa Clara University Working Paper 02/03-10-WP, 2001, dated June 26.
[146] J. Kamps, M. Marx, R. J. Mokken, and M. de Rijke, "Using WordNet to measure semantic orientation of adjectives," in Proceedings of LREC, 2004.
[147] S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina, "The Eigentrust algorithm for reputation management in P2P networks," in Proceedings of WWW, pp. 640–651, New York, NY, USA: ACM, ISBN 1-58113-680-3, 2003.
[148] H. Kanayama and T. Nasukawa, "Fully automatic lexicon expansion for domain-oriented sentiment analysis," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), (Sydney, Australia), pp. 355–363, July 2006.
[149] M. Kantrowitz, "Method and apparatus for analyzing affect and emotion in text," U.S. Patent 6622140, Patent filed in November 2000, 2003.
[150] J. Karlgren and D. Cutting, "Recognizing text genres with simple metrics using discriminant analysis," in Proceedings of COLING, pp. 1071–1075, 1994.
[151] Y. Kawai, T. Kumamoto, and K. Tanaka, "Fair news reader: Recommending news articles with different sentiments based on user preference," in Proceedings of Knowledge-Based Intelligent Information and Engineering Systems, number 4692 in Lecture Notes in Computer Science, pp. 612–622, 2007.
[152] A. Kennedy and D. Inkpen, "Sentiment classification of movie reviews using contextual valence shifters," Computational Intelligence, vol. 22, pp. 110–125, 2006.
[153] B. Kessler, G. Nunberg, and H. Schütze, "Automatic detection of text genre," in Proceedings of the Thirty-Fifth Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics, pp. 32–38, 1997.
[154] P. Kim, "The Forrester wave: Brand monitoring, Q3 2006," Forrester Wave (white paper), 2006.
[155] S.-M. Kim and E. Hovy, "Determining the sentiment of opinions," in Proceedings of the International Conference on Computational Linguistics (COLING), 2004.
[156] S.-M. Kim and E. Hovy, "Automatic detection of opinion bearing words and sentences," in Companion Volume to the Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP), 2005.
[157] S.-M. Kim and E. Hovy, "Identifying opinion holders for question answering in opinion texts," in Proceedings of the AAAI Workshop on Question Answering in Restricted Domains, 2005.
[158] S.-M. Kim and E. Hovy, "Automatic identification of pro and con reasons in online reviews," in Proceedings of the COLING/ACL Main Conference Poster Sessions, pp. 483–490, 2006.
[159] S.-M. Kim and E. Hovy, "Identifying and analyzing judgment opinions," in Proceedings of the Joint Human Language Technology/North American Chapter of the ACL Conference (HLT-NAACL), 2006.
[160] S.-M. Kim and E. Hovy, "Crystal: Analyzing predictive opinions on the web," in Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2007.
[161] S.-M. Kim, P. Pantel, T. Chklovski, and M. Pennacchiotti, "Automatically assessing review helpfulness," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 423–430, Sydney, Australia, July 2006.
[162] B. Klein and K. Leffler, "The role of market forces in assuring contractual performance," Journal of Political Economy, vol. 89, pp. 615–641, 1981.
[163] J. Kleinberg, "Authoritative sources in a hyperlinked environment," in Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 668–677, 1998. (Extended version in Journal of the ACM, 46:604–632, 1999).
[164] J. Kleinberg and E. Tardos, "Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields," Journal of the ACM, vol. 49, pp. 616–639, ISSN 0004-5411, 2002.
[165] J. Kleinberg and E. Tardos, Algorithm Design. Addison Wesley, 2006.
[166] N. Kobayashi, K. Inui, Y. Matsumoto, K. Tateishi, and T. Fukushima, "Collecting evaluative expressions for opinion extraction," in Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP), 2004.
[167] M. Koppel and J. Schler, "The importance of neutral examples for learning sentiment," in Workshop on the Analysis of Informal and Formal Information Exchange During Negotiations (FINEXIN), 2005.
[168] M. Koppel and I. Shtrimberg, "Good news or bad news? Let the market decide," in Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications, pp. 86–88, 2004.
[169] L.-W. Ku, L.-Y. Li, T.-H. Wu, and H.-H. Chen, "Major topic detection and its application to opinion summarization," in Proceedings of the ACM Special Interest Group on Information Retrieval (SIGIR), pp. 627–628, 2005. (Poster paper).
[170] L.-W. Ku, Y.-T. Liang, and H.-H. Chen, "Opinion extraction, summarization and tracking in news and blog corpora," in AAAI Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), pp. 100–107, 2006.
[171] L.-W. Ku, Y.-T. Liang, and H.-H. Chen, "Tagging heterogeneous evaluation corpora for opinionated tasks," in Conference on Language Resources and Evaluation (LREC), 2006.
[172] L.-W. Ku, Y.-S. Lo, and H.-H. Chen, "Test collection selection and gold standard generation for a multiply-annotated opinion corpus," in Proceedings of the ACL Demo and Poster Sessions, pp. 89–92, 2007.
[173] T. Kudo and Y. Matsumoto, "A boosting algorithm for classification of semi-structured text," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2004.
[174] S. Kurohashi, K. Inui, and Y. Kato, eds., Workshop on Information Credibility on the Web, 2007.
[175] N. Kwon, S. Shulman, and E. Hovy, "Multidimensional text analysis for eRulemaking," in Proceedings of Digital Government Research (dg.o), 2006.
[176] J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proceedings of ICML, pp. 282–289, 2001.
[177] J. D. Lafferty and C. Zhai, "Document language models, query models, and risk minimization for information retrieval," in Proceedings of SIGIR, pp. 111–119, 2001.
[178] M. Laver, K. Benoit, and J. Garry, "Extracting policy positions from political texts using words as data," American Political Science Review, vol. 97, pp. 311–331, 2003.
[179] V. Lavrenko and W. Bruce Croft, "Relevance-based language models," in Proceedings of SIGIR, pp. 120–127, 2001.
[180] C. G. Lawson and V. C. Slawson, "Reputation in an internet auction market," Economic Inquiry, vol. 40, pp. 533–650, 2002.
[181] L. Lee, "'I'm sorry Dave, I'm afraid I can't do that': Linguistics, statistics, and natural language processing circa 2001," in Computer Science: Reflections on the Field, Reflections from the Field, (Committee on the Fundamentals of Computer Science: Challenges and Opportunities, Computer Science and Telecommunications Board, National Research Council, ed.), pp. 111–118, The National Academies Press, 2004.
[182] Y.-B. Lee and S. H. Myaeng, "Text genre classification with genre-revealing and subject-revealing features," in Proceedings of the ACM Special Interest Group on Information Retrieval (SIGIR), 2002.
[183] D. Leinweber and A. Madhavan, "Three hundred years of stock market manipulation," Journal of Investing, vol. 10, pp. 7–16, Summer 2001.
[184] H. Li and K. Yamanishi, "Mining from open answers in questionnaire data," in Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 443–449, 2001. (Journal version in IEEE Intelligent Systems, vol. 17, no. 5, pp. 58–63, 2002).
[185] Y. Li, Z. Zheng, and H. Dai, "KDD CUP-2005 report: Facing a great challenge," SIGKDD Explorations, vol. 7, pp. 91–99, 2005.
[186] W.-H. Lin and A. Hauptmann, "Are these documents written from different perspectives? A test of different perspectives based on statistical distribution divergence," in Proceedings of the International Conference on Computational Linguistics (COLING)/Proceedings of the Association for Computational Linguistics (ACL), pp. 1057–1064, Sydney, Australia: Association for Computational Linguistics, July 2006.
[187] W.-H. Lin, T. Wilson, J. Wiebe, and A. Hauptmann, "Which side are you on? Identifying perspectives at the document and sentence levels," in Proceedings of the Conference on Natural Language Learning (CoNLL), 2006.
[188] J. Liscombe, G. Riccardi, and D. Hakkani-Tür, "Using context to improve emotion detection in spoken dialog systems," in Proceedings of Interspeech, pp. 1845–1848, 2005.
[189] L. V. Lita, A. H. Schlaikjer, W. Hong, and E. Nyberg, "Qualitative dimensions in question answering: Extending the definitional QA task," in Proceedings of AAAI, pp. 1616–1617, 2005. (Student abstract).
[190] B. Liu, Web data mining; Exploring hyperlinks, contents, and usage data, Opinion Mining. Springer, 2006.
[191] B. Liu, M. Hu, and J. Cheng, "Opinion observer: Analyzing and comparing opinions on the web," in Proceedings of WWW, 2005.
[192] H. Liu, H. Lieberman, and T. Selker, "A model of textual affect sensing using real-world knowledge," in Proceedings of Intelligent User Interfaces (IUI), pp. 125–132, 2003.
[193] J. Liu, Y. Cao, C.-Y. Lin, Y. Huang, and M. Zhou, "Low-quality product review detection in opinion summarization," in Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 334–342, 2007. (Poster paper).
[194] Y. Liu, "Word-of-mouth for movies: Its dynamics and impact on box office revenue," Journal of Marketing, vol. 70, pp. 74–89, 2006.
[195] Y. Liu, J. Huang, A. An, and X. Yu, "ARSA: A sentiment-aware model for predicting sales performance using blogs," in Proceedings of the ACM Special Interest Group on Information Retrieval (SIGIR), 2007.
[196] J. A. Livingston, "How valuable is a good reputation? A sample selection model of internet auctions," The Review of Economics and Statistics, vol. 87, pp. 453–465, August 2005.
[197] L. Lloyd, D. Kechagias, and S. Skiena, "Lydia: A system for large-scale news analysis," in Proceedings of String Processing and Information Retrieval, number 3772 in Lecture Notes in Computer Science, pp. 161–166, 2005.
[198] D. Lucking-Reiley, D. Bryan, N. Prasad, and D. Reeves, "Pennies from eBay: The determinants of price in online auctions," Journal of Industrial Economics, vol. 55, pp. 223–233, 2007.
[199] C. Macdonald and I. Ounis, "The TREC Blogs06 collection: Creating and analysing a blog test collection," Technical Report TR-2006-224, Department of Computer Science, University of Glasgow, 2006.
[200] Y. Mao and G. Lebanon, "Sequential models for sentiment prediction," in ICML Workshop on Learning in Structured Output Spaces, 2006.
[201] Y. Mao and G. Lebanon, "Isotonic conditional random fields and local sentiment flow," in Advances in Neural Information Processing Systems, 2007.
[202] L. W. Martin and G. Vanberg, "A robust transformation procedure for interpreting political text," Political Analysis, vol. 16, pp. 93–100, 2008.
[203] H. Masum and Y.-C. Zhang, "Manifesto for the reputation society," First Monday, vol. 9, 2004.
[204] S. Matsumoto, H. Takamura, and M. Okumura, "Sentiment classification using word sub-sequences and dependency sub-trees," in Proceedings of PAKDD'05, the 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, 2005.
[205] R. McDonald, K. Hannan, T. Neylon, M. Wells, and J. Reynar, "Structured models for fine-to-coarse sentiment analysis," in Proceedings of the Association for Computational Linguistics (ACL), pp. 432–439, Prague, Czech Republic: Association for Computational Linguistics, June 2007.
[206] Q. Mei, X. Ling, M. Wondra, H. Su, and C. X. Zhai, "Topic sentiment mixture: Modeling facets and opinions in weblogs," in Proceedings of WWW, pp. 171–180, New York, NY, USA: ACM Press, 2007. (ISBN 978-1-59593-654-7).
[207] M. I. Melnik and J. Alm, "Does a seller's eCommerce reputation matter? Evidence from eBay auctions," Journal of Industrial Economics, vol. 50, pp. 337–349, 2002.
[208] M. I. Melnik and J. Alm, "Seller reputation, information signals, and prices for heterogeneous coins on eBay," Southern Economic Journal, vol. 72, pp. 305–328, 2005.
[209] R. Mihalcea, C. Banea, and J. Wiebe, "Learning multilingual subjective language via cross-lingual projections," in Proceedings of the Association for Computational Linguistics (ACL), pp. 976–983, Prague, Czech Republic, June 2007.
[210] R. Mihalcea and C. Strapparava, "Learning to laugh (automatically): Computational models for humor recognition," Journal of Computational Intelligence, 2006.
[211] G. Mishne and M. de Rijke, "Capturing global mood levels using blog posts," in AAAI Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), pp. 145–152, 2006.
[212] G. Mishne and M. de Rijke, "Moodviews: Tools for blog mood analysis," in AAAI Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), pp. 153–154, 2006.
[213] G. Mishne and M. de Rijke, "A study of blog search," in Proceedings of the European Conference on Information Retrieval Research (ECIR), 2006.
[214] G. Mishne and N. Glance, "Predicting movie sales from blogger sentiment," in AAAI Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), pp. 155–158, 2006.
[215] S. Morinaga, K. Yamanishi, K. Tateishi, and T. Fukushima, "Mining product reputations on the Web," in Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 341–349, 2002. (Industry track).
[216] F. Mosteller and D. L. Wallace, Applied Bayesian and Classical Inference: The Case of the Federalist Papers. Springer-Verlag, 1984.
[217] T. Mullen and N. Collier, "Sentiment analysis using support vector machines with diverse information sources," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 412–418, July 2004. (Poster paper).
[218] T. Mullen and R. Malouf, "Taking sides: User classification for informal online political discourse," Internet Research, vol. 18, pp. 177–190, 2008.
[219] T. Mullen and R. Malouf, "A preliminary investigation into sentiment analysis of informal political discourse," in AAAI Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), pp. 159–162, 2006.
[220] J.-C. Na, H. Sui, C. Khoo, S. Chan, and Y. Zhou, "Effectiveness of simple linguistic processing in automatic sentiment classification of product reviews," in Conference of the International Society for Knowledge Organization (ISKO), pp. 49–54, 2004.
[221] T. Nasukawa and J. Yi, "Sentiment analysis: Capturing favorability using natural language processing," in Proceedings of the Conference on Knowledge Capture (K-CAP), 2003.
[222] V. Ng, S. Dasgupta, and S. M. N. Arifin, "Examining the role of linguistic knowledge sources in the automatic identification and classification of reviews," in Proceedings of the COLING/ACL Main Conference Poster Sessions, pp. 611–618, Sydney, Australia: Association for Computational Linguistics, July 2006.
[223] X. Ni, G.-R. Xue, X. Ling, Y. Yu, and Q. Yang, "Exploring in the weblog space by detecting informative and affective articles," in Proceedings of WWW, 2007. (Industrial practice and experience track).
[224] N. Nicolov, F. Salvetti, M. Liberman, and J. H. Martin, eds., AAAI Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), AAAI Press, 2006.
[225] K. Nigam and M. Hurst, "Towards a robust metric of polarity," in Attitude and Affect in Text: Theories and Applications, number 20 in Information Retrieval Series, (J. G. Shanahan, Y. Qu, and J. Wiebe, eds.).
[226] Y. Niu, X. Zhu, J. Li, and G. Hirst, "Analysis of polarity information in medical text," in Proceedings of the American Medical Informatics Association 2005 Annual Symposium, 2005.
[227] I. Ounis, M. de Rijke, C. Macdonald, G. Mishne, and I. Soboroff, "Overview of the TREC-2006 blog track," in Proceedings of the 15th Text Retrieval Conference (TREC), 2006.
[228] I. Ounis, C. Macdonald, and I. Soboroff, "On the TREC blog track," in Proceedings of the International Conference on Weblogs and Social Media, 2008.
[229] S. Owsley, S. Sood, and K. J. Hammond, "Domain specific affective classification of documents," in AAAI Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), pp. 181–183, 2006.
[230] M. Palmer, D. Gildea, and P. Kingsbury, "The proposition bank: A corpus annotated with semantic roles," Computational Linguistics, vol. 31, March 2005.
[231] B. Pang, K. Knight, and D. Marcu, "Syntax-based alignment of multiple translations: Extracting paraphrases and generating new sentences," in Proceedings of HLT/NAACL, 2003.
[232] B. Pang and L. Lee, "A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts," in Proceedings of the Association for Computational Linguistics (ACL), pp. 271–278, 2004.
[233] B. Pang and L. Lee, "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales," in Proceedings of the Association for Computational Linguistics (ACL), pp. 115–124, 2005.
[234] B. Pang and L. Lee, "Using very simple statistics for review search: An exploration," in Proceedings of the International Conference on Computational Linguistics (COLING), 2008. (Poster paper).
[235] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment classification using machine learning techniques," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 79–86, 2002.
[236] D.-H. Park, J. Lee, and I. Han, "The effect of on-line consumer reviews on consumer purchasing intention: The moderating role of involvement," International Journal of Electronic Commerce, vol. 11, pp. 125–148, (ISSN 1086-4415), 2007.
[237] P. A. Pavlou and A. Dimoka, "The nature and role of feedback text comments in online marketplaces: Implications for trust building, price premiums, and seller differentiation," Information Systems Research, vol. 17, pp. 392–414, 2006.
[238] S. Piao, S. Ananiadou, Y. Tsuruoka, Y. Sasaki, and J. McNaught, "Mining opinion polarity relations of citations," in International Workshop on Computational Semantics (IWCS), pp. 366–371, 2007. (Short paper).
[239] R. Picard, Affective Computing. MIT Press, 1997.
[240] T. Pinch and K. Athanasiades, "ACIDplanet: A study of users of an on-line music community," 2005. http://sts.nthu.edu.tw/sts camp/files/ACIDplanet%20by%20Trevor%20Pinch.ppt, Presented at the 50th Society for Ethnomusicology (SEM) conference.
[241] G. Pinski and F. Narin, "Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics," Information Processing and Management, vol. 12, pp. 297–312, 1976.
[242] L. Polanyi and A. Zaenen, "Contextual lexical valence shifters," in Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text, AAAI technical report SS-04-07, 2004.
[243] J. M. Ponte and W. Bruce Croft, "A language modeling approach to information retrieval," in Proceedings of SIGIR, pp. 275–281, 1998.
[244] A.-M. Popescu and O. Etzioni, "Extracting product features and opinions from reviews," in Proceedings of the Human Language Technology Conference and the Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), 2005.
[245] R. Quirk, S. Greenbaum, G. Leech, and J. Svartvik, A comprehensive grammar of the English language. Longman, 1985.
[246] D. Radev, T. Allison, S. Blair-Goldensohn, J. Blitzer, A. Çelebi, S. Dimitrov, E. Drabek, A. Hakim, W. Lam, D. Liu, J. Otterbacher, H. Qi, H. Saggion, S. Teufel, M. Topper, A. Winkel, and Z. Zhang, "MEAD: A platform for multidocument multilingual text summarization," in Conference on Language Resources and Evaluation (LREC), Lisbon, Portugal, May 2004.
[247] D. R. Radev, E. Hovy, and K. McKeown, "Introduction to the special issue on summarization," Computational Linguistics, vol. 28, pp. 399–408, (ISSN 0891-2017), 2002.
[248] L. Rainie and J. Horrigan, "Election 2006 online," Pew Internet & American Life Project Report, January 2007.
[249] J. Read, "Using emoticons to reduce dependency in machine learning techniques for sentiment classification," in Proceedings of the ACL Student Research Workshop, 2005.
[250] D. A. Reinstein and C. M. Snyder, "The influence of expert reviews on consumer demand for experience goods: A case study of movie critics," Journal of Industrial Economics, vol. 53, pp. 27–51, 2005.
[251] E. Reiter and R. Dale, Building Natural Language Generation Systems. Cambridge, 2000.
[252] P. Resnick, K. Kuwabara, R. Zeckhauser, and E. Friedman, Reputation, Communications of the Association for Computing Machinery (CACM), vol. 43, pp. 45–48, (ISSN 0001-0782), 2000.
[253] P. Resnick, R. Zeckhauser, J. Swanson, and K. Lockwood, The value of reputation on eBay: A controlled experiment, Experimental Economics, vol. 9, pp. 79–101, 2006.
[254] E. Riloff, S. Patwardhan, and J. Wiebe, Feature subsumption for opinion analysis, in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2006.
[255] E. Riloff and J. Wiebe, Learning extraction patterns for subjective expressions, in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2003.
[256] E. Riloff, J. Wiebe, and W. Phillips, Exploiting subjectivity classification to improve information extraction, in Proceedings of AAAI, pp. 1106–1111.
[257] E. Riloff, J. Wiebe, and T. Wilson, Learning subjective nouns using extraction pattern bootstrapping, in Proceedings of the Conference on Natural Language Learning (CoNLL), pp. 25–32, 2003.
[258] E. Rogers, Diffusion of Innovations. Free Press, New York, 1962. (ISBN 0743222091. Fifth edition dated 2003).
[259] S. Rosen, Hedonic prices and implicit markets: Product differentiation in pure competition, The Journal of Political Economy, vol. 82, pp. 34–55, Jan–Feb.
[260] D. Roth and W. Yih, Probabilistic reasoning for entity and relation recognition, in Proceedings of the International Conference on Computational Linguistics (COLING), 2004.
[261] V. L. Rubin and E. D. Liddy, Assessing credibility of weblogs, in AAAI Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), pp. 187–190, 2006.
[262] W. Sack, On the computation of point of view, in Proceedings of AAAI, p. 1488, 1994. (Student abstract).
[263] F. Sebastiani, Machine learning in automated text categorization, ACM Computing Surveys, vol. 34, pp. 1–47, 2002.
[264] Y. Seki, K. Eguchi, and N. Kando, Analysis of multi-document viewpoint summarization using multi-dimensional genres, in Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications, pp. 142–145, 2004.
[265] Y. Seki, K. Eguchi, N. Kando, and M. Aono, Multi-document summarization with subjectivity analysis at DUC 2005, in Proceedings of the Document Understanding Conference (DUC), 2005.
[266] Y. Seki, K. Eguchi, N. Kando, and M. Aono, Opinion-focused summarization and its analysis at DUC 2006, in Proceedings of the Document Understanding Conference (DUC), pp. 122–130, 2006.
[267] Y. Seki, D. Kirk Evans, L.-W. Ku, H.-H. Chen, N. Kando, and C.-Y. Lin, Overview of opinion analysis pilot task at NTCIR-6, in Proceedings of the Workshop Meeting of the National Institute of Informatics (NII) Test Collection for Information Retrieval Systems (NTCIR), pp. 265–278, 2007.
[268] C. Shapiro, Consumer information, product quality, and seller reputation, Bell Journal of Economics, vol. 13, pp. 20–35, 1982.
[269] C. Shapiro, Premiums for high quality products as returns to reputations, Quarterly Journal of Economics, vol. 98, pp. 659–680, 1983.
[270] B. Shneiderman, Tree visualization with tree-maps: 2-d space-filling approach, ACM Transactions on Graphics, vol. 11, pp. 92–99, 1992.
[271] S. Shulman, J. Callan, E. Hovy, and S. Zavestoski, Language processing technologies for electronic rulemaking: A project highlight, in Proceedings of Digital Government Research (dg.o), pp. 87–88, 2005.
[272] B. Snyder and R. Barzilay, Multiple aspect ranking using the Good Grief algorithm, in Proceedings of the Joint Human Language Technology/North American Chapter of the ACL Conference (HLT-NAACL), pp. 300–307, 2007.
[273] S. Somasundaran, J. Ruppenhofer, and J. Wiebe, Detecting arguing and sentiment in meetings, in Proceedings of the SIGdial Workshop on Discourse and Dialogue, 2007.
[274] S. Somasundaran, T. Wilson, J. Wiebe, and V. Stoyanov, QA with attitude: Exploiting opinion type analysis for improving question answering in on-line discussions and the news, in Proceedings of the International Conference on Weblogs and Social Media (ICWSM), 2007.
[275] X. Song, Y. Chi, K. Hino, and B. Tseng, Identifying opinion leaders in the blogosphere, in Proceedings of the ACM SIGIR Conference on Information and Knowledge Management (CIKM), pp. 971–974, 2007.
[276] E. Spertus, Smokey: Automatic recognition of hostile messages, in Proceedings of Innovative Applications of Artificial Intelligence (IAAI), pp. 1058–1065, 1997.
[277] E. Stamatatos, N. Fakotakis, and G. Kokkinakis, Text genre detection using common word frequencies, in Proceedings of the International Conference on Computational Linguistics (COLING), 2000.
[278] S. S. Standifird, Reputation and e-commerce: eBay auctions and the asymmetrical impact of positive and negative ratings, Journal of Management, vol. 27, pp. 279–295, 2001.
[279] A. Stepinski and V. Mittal, A fact/opinion classifier for news articles, in Proceedings of the ACM Special Interest Group on Information Retrieval, pp. 807–808, New York, NY, USA: ACM Press, 2007. (ISBN 978-1-
[280] B. Stone and M. Richtel, The hand that controls the sock puppet could get slapped, The New York Times, July 16 2007.
[281] P. J. Stone, The General Inquirer: A Computer Approach to Content Analysis. The MIT Press, 1966.
[282] V. Stoyanov and C. Cardie, Partially supervised coreference resolution for opinion summarization through structured rule learning, in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 336–344, Sydney, Australia: Association for Computational Linguistics, July 2006.
[283] V. Stoyanov, C. Cardie, D. Litman, and J. Wiebe, Evaluating an opinion annotation scheme using a new multi-perspective question and answer corpus, in Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text, AAAI Technical Report SS-04-07.
[284] V. Stoyanov, C. Cardie, and J. Wiebe, Multi-perspective question answering using the OpQA corpus, in Proceedings of the Human Language Technology Conference and the Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pp. 923–930, Vancouver, British Columbia, Canada: Association for Computational Linguistics, October 2005.
[285] P. Subasic and A. Huettner, Affect analysis of text using fuzzy semantic typing, IEEE Transactions on Fuzzy Systems, vol. 9, pp. 483–496, 2001.
[286] M. Taboada, C. Anthony, and K. Voll, Methods for creating semantic orientation dictionaries, in Conference on Language Resources and Evaluation, pp. 427–432, 2006.
[287] M. Taboada, M. A. Gillies, and P. McFetridge, Sentiment classification techniques for tracking literary reputation, in LREC Workshop: Towards Computational Models of Literary Analysis, pp. 36–43, 2006.
[288] H. Takamura, T. Inui, and M. Okumura, Extracting semantic orientation of words using spin model, in Proceedings of the Association for Computational Linguistics (ACL), pp. 133–140, 2005.
[289] H. Takamura, T. Inui, and M. Okumura, Latent variable models for semantic orientations of phrases, in Proceedings of the European Chapter of the Association for Computational Linguistics (EACL), 2006.
[290] H. Takamura, T. Inui, and M. Okumura, Extracting semantic orientations of phrases from dictionary, in Proceedings of the Joint Human Language Technology/North American Chapter of the ACL Conference (HLT-NAACL).
[291] K. Tateishi, Y. Ishiguro, and T. Fukushima, Opinion information retrieval from the internet, Information Processing Society of Japan (IPSJ) SIG 2001, vol. 69, no. 7, pp. 75–82, 2001. (Also cited as A reputation search engine that gathers people's opinions from the Internet, IPSJ Technical Report NL-14411. In Japanese).
[292] J. Tatemura, Virtual reviewers for collaborative exploration of movie reviews, in Proceedings of Intelligent User Interfaces (IUI), pp. 272–275.
[293] L. Terveen, W. Hill, B. Amento, D. McDonald, and J. Creter, PHOAKS: A system for sharing recommendations, Communications of the Association for Computing Machinery (CACM), vol. 40, pp. 59–62, 1997.
[294] M. Thomas, B. Pang, and L. Lee, Get out the vote: Determining support or opposition from congressional floor-debate transcripts, in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 327–335, 2006.
[295] R. Tokuhisa and R. Terashima, Relationship between utterances and enthusiasm in non-task-oriented conversational dialogue, in Proceedings of the SIGdial Workshop on Discourse and Dialogue, pp. 161–167, Sydney, Australia: Association for Computational Linguistics, July 2006.
[296] R. M. Tong, An operational system for detecting and tracking opinions in on-line discussion, in Proceedings of the Workshop on Operational Text Classification (OTC), 2001.
[297] R. Tumarkin and R. F. Whitelaw, News or noise? Internet postings and stock prices, Financial Analysts Journal, vol. 57, pp. 41–51, May/June.
[298] P. Turney, Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews, in Proceedings of the Association for Computational Linguistics (ACL), pp. 417–424, 2002.
[299] P. D. Turney and M. L. Littman, Measuring praise and criticism: Inference of semantic orientation from association, ACM Transactions on Information Systems (TOIS), vol. 21, pp. 315–346, 2003.
[300] S. Wan and K. McKeown, Generating overview summaries of ongoing email thread discussions, in Proceedings of the International Conference on Computational Linguistics (COLING), pp. 549–555, Geneva, Switzerland, 2004.
[301] M. White, C. Cardie, and V. Ng, Detecting discrepancies in numeric estimates using multidocument hypertext summaries, in Proceedings of the Conference on Human Language Technology, pp. 336–341, 2002.
[302] M. White, C. Cardie, V. Ng, K. Wagstaff, and D. McCullough, Detecting discrepancies and improving intelligibility: Two preliminary evaluations of RIPTIDES, in Proceedings of the Document Understanding Conference (DUC).
[303] C. Whitelaw, N. Garg, and S. Argamon, Using appraisal groups for sentiment analysis, in Proceedings of the ACM SIGIR Conference on Information and Knowledge Management (CIKM), pp. 625–631, ACM, 2005.
[304] J. Wiebe, Learning subjective adjectives from corpora, in Proceedings of, 2000.
[305] J. Wiebe, E. Breck, C. Buckley, C. Cardie, P. Davis, B. Fraser, D. Litman, D. Pierce, E. Riloff, T. Wilson, D. Day, and M. Maybury, Recognizing and organizing opinions expressed in the world press, in Proceedings of the AAAI Spring Symposium on New Directions in Question Answering, 2003.
[306] J. Wiebe and R. Bruce, Probabilistic classifiers for tracking point of view, in Proceedings of the AAAI Spring Symposium on Empirical Methods in Discourse Interpretation and Generation, pp. 181–187, 1995.
[307] J. Wiebe and R. Mihalcea, Word sense and subjectivity, in Proceedings of the Conference on Computational Linguistics/Association for Computational Linguistics (COLING/ACL), 2006.
[308] J. Wiebe and T. Wilson, Learning to disambiguate potentially subjective expressions, in Proceedings of the Conference on Natural Language Learning, pp. 112–118, 2002.
[309] J. Wiebe, T. Wilson, and C. Cardie, Annotating expressions of opinions and emotions in language, Language Resources and Evaluation (formerly Computers and the Humanities), vol. 39, pp. 164–210, 2005.
[310] J. M. Wiebe, Identifying subjective characters in narrative, in Proceedings of the International Conference on Computational Linguistics (COLING), pp. 401–408, 1990.
[311] J. M. Wiebe, Tracking point of view in narrative, Computational Linguistics, vol. 20, pp. 233–287, 1994.
[312] J. M. Wiebe, R. F. Bruce, and T. P. O'Hara, Development and use of a gold standard data set for subjectivity classifications, in Proceedings of the Association for Computational Linguistics (ACL), pp. 246–253, 1999.
[313] J. M. Wiebe and W. J. Rapaport, A computational theory of perspective and reference in narrative, in Proceedings of the Association for Computational Linguistics (ACL), pp. 131–138, 1988.
[314] J. M. Wiebe and E. Riloff, Creating subjective and objective sentence classifiers from unannotated texts, in Proceedings of the Conference on Computational Linguistics and Intelligent Text Processing (CICLing), number 3406 in Lecture Notes in Computer Science, pp. 486–497, 2005.
[315] J. M. Wiebe, T. Wilson, and M. Bell, Identifying collocations for recognizing opinions, in Proceedings of the ACL/EACL Workshop on Collocation: Computational Extraction, Analysis, and Exploitation, 2001.
[316] J. M. Wiebe, T. Wilson, R. Bruce, M. Bell, and M. Martin, Learning subjective language, Computational Linguistics, vol. 30, pp. 277–308, September.
[317] Y. Wilks and J. Bien, Beliefs, points of view and multiple environments, in Proceedings of the international NATO symposium on artificial and human intelligence, pp. 147–171, USA, New York, NY: Elsevier North-Holland, Inc.
[318] Y. Wilks and M. Stevenson, The grammar of sense: Using part-of-speech tags as a first step in semantic disambiguation, Journal of Natural Language Engineering, vol. 4, pp. 135–144, 1998.
[319] T. Wilson, J. Wiebe, and P. Hoffmann, Recognizing contextual polarity in phrase-level sentiment analysis, in Proceedings of the Human Language Technology Conference and the Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pp. 347–354, 2005.
[320] T. Wilson, J. Wiebe, and R. Hwa, Just how mad are you? Finding strong and weak opinion clauses, in Proceedings of AAAI, pp. 761–769, 2004. (Extended version in Computational Intelligence, vol. 22, no. 2, pp. 73–99, 2006).
[321] H. Yang, L. Si, and J. Callan, Knowledge transfer and opinion detection in the TREC 2006 blog track, in Proceedings of TREC, 2006.
[322] K. Yang, N. Yu, A. Valerio, and H. Zhang, WIDIT in TREC-2006 blog track, in Proceedings of TREC, 2006.
[323] J. Yi, T. Nasukawa, R. Bunescu, and W. Niblack, Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques, in Proceedings of the IEEE International Conference on Data Mining (ICDM), 2003.
[324] J. Yi and W. Niblack, Sentiment mining in WebFountain, in Proceedings of the International Conference on Data Engineering (ICDE), 2005.
[325] P.-L. Yin, Information dispersion and auction prices, Social Science Research Network (SSRN) Working Paper Series, Version dated March 2005.
[326] H. Yu and V. Hatzivassiloglou, Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences, in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2003.
[327] J. Zabin and A. Jefferies, Social media monitoring and analysis: Generating consumer insights from online conversation, Aberdeen Group Benchmark Report, January 2008.
[328] Z. Zhang and B. Varadarajan, Utility scoring of product reviews, in Proceedings of the ACM SIGIR Conference on Information and Knowledge Management (CIKM), pp. 51–57, 2006.
[329] L. Zhou, J. K. Burgeon, and D. P. Twitchell, A longitudinal analysis of language behavior of deception in e-mail, in Proceedings of Intelligence and Security Informatics (ISI), number 2665 in Lecture Notes in Computer Science, p. 959, 2008.
[330] L. Zhou and E. Hovy, On the summarization of dynamically introduced information: Online discussions and blogs, in AAAI Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), pp. 237–242, 2006.
[331] F. Zhu and X. Zhang, The influence of online consumer reviews on the demand for experience goods: The case of video games, in International Conference on Information Systems (ICIS), 2006.
[332] L. Zhuang, F. Jing, X.-Y. Zhu, and L. Zhang, Movie review mining and summarization, in Proceedings of the ACM SIGIR Conference on Information and Knowledge Management (CIKM), 2006.