Influence in Ratings-Based Recommender Systems: An Algorithm-Independent Approach

Al Mamunur Rashid, George Karypis, John Riedl

Department of Computer Science & Engineering, University of Minnesota, Minneapolis, MN-55455, {arashid, karypis, riedl}@cs.umn.edu

Abstract

Recommender systems have been shown to help users find items of interest from among a large pool of potentially interesting items. Influence is a measure of the effect of a user on the recommendations from a recommender system. Influence is a powerful tool for understanding the workings of a recommender system. Experiments show that users have widely varying degrees of influence in ratings-based recommender systems. Proposed influence measures have been algorithm-specific, which limits their generality and comparability. We propose an algorithm-independent definition of influence that can be applied to any ratings-based recommender system. We show experimentally that influence may be effectively estimated using simple, inexpensive metrics.

1 Introduction

Sociologists have long tried to characterize the influence of a person in a social network of many people [1]. Identifying the influential people can bring twin advantages to those who study group dynamics: (1) the influential people can be directly studied, yielding insight since their choices may be predictive of group choices; or (2) the influential people may be influenced to change the behavior of the group. Many social networks are formed and maintained through informal, qualitative, and unobserved interactions. Capturing data about these interactions is difficult, and the act of capturing those data may change the social interactions themselves.

Collaborative Filtering (CF) recommender systems [2, 3, 4] base their decisions on the opinions of users. In contrast to other social networks, recommender systems capture interactions that are formal, quantitative, and observed. The social network can be analyzed directly through data already captured in the computer system.

Past research has demonstrated that analyzing the social network can provide leverage in influencing the group [5]. The analysis performed in these studies is based on a deep investigation of the characteristics of one particular recommender algorithm, the well-known user-user nearest neighbor algorithm [2]. Careful analysis of this type has many advantages, but one key disadvantage: it is tied closely to the details of the algorithm. In principle, similar techniques could be applied to other algorithms, but doing so would be laborious, and the resulting influence measure would apply only to algorithms that work precisely according to the details of the analysis. Since many commercial operators tweak the operation of the recommender in many ways to fit the needs of their business, this analysis may not apply in practice. Further, the resulting measures of influence would be unlikely to be comparable between different algorithms, since they have been produced through very different techniques.

A key goal of the present research is to identify a measure of influence for recommender systems that is applicable to any ratings-based recommender system, independent of the particulars of the algorithm. Such a measure would allow for consistent, black-box analysis of influence.

2 Related Work

2.1 Recommender Systems. Resnick et al. [2] introduced an automatic collaborative filtering algorithm based on a k-nearest neighbors (kNN) algorithm among users; this algorithm is now called user-user CF. The user-user algorithm we use in this paper is a version of the original kNN algorithm, tuned to achieve best known performance. Sarwar et al. [4] proposed an alternative kNN CF algorithm based on similarity among items. This variant is often called item-item CF. Breese et al. [3] have divided a number of CF algorithms into two classes: memory-based algorithms and model-based algorithms. Over the years many other algorithms were proposed, including ones based on SVD, clustering, and Bayesian networks [3]. We focus on the user-user and item-item algorithms in this paper because they are the most common in existing systems.

2.2 Social Networks and Influence. A social network is a form of graph delineating relationships and interactions among individuals. Finding the important nodes in such graphs has been an object of interest to sociologists for a long time. One proposed measure for importance is centrality [1]. Two examples of "centrality" measures are "degree centrality", which treats high-degree nodes as important, and "distance centrality", which treats nodes with short paths to many other nodes as important [1]. Kleinberg's HITS [6] and Brin and Page's PageRank [7] algorithms for ordering nodes in a graph of the web are based on social network principles.

[Figure 1: Showing the notion of in-links for the k closest neighbors of $u_i$. Here, a prediction is being computed for the (user, item) pair $(u_i, m)$.]

Domingos et al. [5] have studied the problem of choosing influential users for marketers who wish to attract attention to their products. They show that selecting the right set of users for a marketing campaign can make a big difference. Kempe et al. [8] focus on a collection of models widely studied in social networks, as well as the models in [5], under two categories: Linear Threshold Models and Independent Cascade Models.

Our research also investigates influence in social networks. Like Domingos et al., we focus on networks in recommender systems. We extend their research to general measures of influence that are independent of the particular recommender algorithm being used.

3 Defining Influential Users in CF Systems

We first discuss the data used in this project, then analyze a popular CF algorithm to understand a possible formation process of influential users, and then try different ways to set the definition.

3.1 The Data. We have used a publicly available dataset from www.grouplens.org. The dataset is a fraction of the usage data drawn from MovieLens (www.movielens.org), a CF-based online movie recommendation system. It contains 6,040 users, 3,593 movies, and about one million ratings on a 5-star scale. Each user has rated at least 20 movies in the dataset. We have partitioned this data into training and test sets by a random 80%/20% split.

3.2 The User-User Algorithm. The most widely cited and arguably the most commonly used CF algorithm in research is a kNN-based algorithm. In this scheme the users' preference data is represented in an $n \times m$ user-item matrix for a system with $n$ users and $m$ items, where the $(i, j)$-th entry of this matrix holds user $u_i$'s rating on item $j$, or null if $u_i$ has not rated item $j$. The user-user algorithm can be thought of as working in two stages. In the first stage, similarities between every pair of users are computed and stored as a model. Although many different formulations are possible for similarity weight calculations, the mechanism proposed by GroupLens [2] is the Pearson correlation coefficient. Accordingly, the similarity weight between two users, $u_i$ and $u_j$, is measured by equation 3.1:

$$W_{ij} = \frac{\sum_{k \in I} (R_{ik} - \bar{R}_i)(R_{jk} - \bar{R}_j)}{\sqrt{\sum_{k \in I} (R_{ik} - \bar{R}_i)^2 \sum_{k \in I} (R_{jk} - \bar{R}_j)^2}} \tag{3.1}$$

$I$ is the set of items rated by both of the users, $R_{ik}$ is user $u_i$'s rating on item $k$, and $\bar{R}_i$ is the average rating of $u_i$. Using this similarity metric, the next step, prediction generation, is carried out as follows. The prediction on item $a$ for user $u_i$ is computed by picking the $k$ nearest users who have also rated item $a$, and by applying a weighted average of deviations from the selected users' means:

$$P_{ia} = \bar{R}_i + \frac{\sum_{u=1}^{k} (R_{ua} - \bar{R}_u) W_{iu}}{\sum_{u=1}^{k} W_{iu}} \tag{3.2}$$

3.3 Some Plausible Influence Metrics Based on Prior Work. We can now propose several influence metrics. One type of metric is motivated by targeted marketing. Another type of metric exploits connections between users based on similarity.

3.3.1 Expected Lift in Profit: Network Values. This approach, as outlined in [5], is based on the goal of targeted marketing. In this scheme, the users who can yield the most expected lift in profit (ELP) by triggering a cascading adoption of a product are considered influential users. Domingos et al. [5] have applied this idea to a recommendation system dataset based on the user-user CF algorithm described in the last section. The probabilistic model in [5] is based on Markov Random Fields, which require the neighbors to be symmetric; i.e., two users are neighbors of each other if one of them is a neighbor of the other. The authors mention that in a kNN-based CF system, this might not hold. Moreover, ELP Network Value is tied to a particular product; more specifically, it is specific to a set of features of the product being marketed. Translating this issue into the recommender system (RS) domain, ELP Network Values
are specific to particular genre vectors. Thus a user's ELP Network Value will differ for movies with different genre vectors.

3.3.2 Network Structure: Similarity Links. By closely observing the process of neighbor selection, we notice some network structure that could inform a definition of influential users. Figure 1 demonstrates a situation where the system is computing a prediction on item $m$ for user $u_i$. In order to do so, it selects the top $k$ neighbors who have also rated the item $m$. Now we can imagine directed edges from $u_i$ towards each of the $k$ neighbors. Equations 3.3 and 3.4 show the updated authority and hub equations. To account for the fact that the links may not all have the same weight, we have incorporated a weight term similar to [9] into the basic HITS [6] equations. Here the conditional probability $p(i \mid j)$ refers to the degree to which user $u_j$'s presence indicates user $u_i$'s presence.

$$a(i) = \sum_{j \to i} p(i \mid j)\, h(j)\, W_{ij} \tag{3.3}$$

$$h(i) = \sum_{i \to j} p(j \mid i)\, a(j)\, W_{ij} \tag{3.4}$$

We can use this modified authority to represent influence. The drawback of this scheme of influence, however, is algorithm dependence: the network structure captured here is very much algorithm-specific, and for other algorithms the structure might not be as apparent. In order to derive a definition that is generic enough, yet simple, we use the Hide-one-User approach discussed next. The fundamental concept of this approach is figuring out which user causes the largest cumulative change of prediction in the system.

4 Algorithm-Independent Influence

These metrics define influence as the amount of effect a user has over others via the predictions they receive. One way to observe this effect is to exclude a user and measure the net changes in predictions caused by the removal. The idea: let $U$ be the set of available users in the system, and $M_U$ be the model built with the preference data of this set of users. We call $NPD_{u_i}$ (Number of Prediction-Differences) the number of times the following expression holds true:

$$\left| P_{ja}(M_U) - P_{ja}(M_{U - \{u_i\}}) \right| \geq \epsilon, \quad \forall j \neq i$$

Here, $P_{ja}(M_U)$ is the prediction on item $a$ for the user $u_j$ using the model $M_U$, and $\epsilon$ is a threshold that can be tied to the smallest prediction change perceivable to the users via the available user interface. As an example, the smallest prediction change a MovieLens user would notice is 0.5, or a half-star. In essence, the expression for $NPD_{u_i}$ says how many times the predictions would change beyond some threshold if we build the model without the user $u_i$. $NPD_{u_i}$ is the influence level of user $u_i$.

[Figure 2: (a) Distribution of influence: normalized NUPD against the ranks of the users by NUPD. (b) NUPD values of a group of 20 users who have rated almost the same number of items: normalized NUPD against the number of ratings of the selected users.]

There is a problem with $NPD_{u_i}$: if the group of users who are affected by $u_i$'s removal need predictions on many items, $u_i$ could exhibit a large $NPD_{u_i}$. To overcome this problem, we propose another version of this definition and call it $NUPD_{u_i}$. $NUPD_{u_i}$ counts the number of unique users whose predictions change by at least the threshold amount when we keep the $i$-th user out during model building. As is evident from the definition of $NUPD_{u_i}$, it is equally applicable to any CF algorithm, provided that we have the historical data to compute it from.

Notice that a straightforward computation of NUPDs can become very expensive; if we are to compute NUPD online or on a regular basis, we need to find a cheaper way. Section 5 details such an endeavor.

4.1 The Nature of Influence. Figure 2(a) shows normalized NUPD values of the top 1500 influential users and highlights the fact that only a handful of the users possess high influence. This is true for both the authority and NUPD measures. The shapes demonstrate a power-law or Zipf-like distribution. A similar shape is reported in [5] for ELP Network Values. Note that the correlation between authority and NUPD is 0.96.

5 Building a Predictive Model

As stated before, NUPD suffers from a drawback: the computation is quite time consuming. In order to circumvent this limitation, we seek a predictive model that can provide users' influence levels on the fly while
maintaining good accuracy. Although the correlation coefficient between NUPD and the number of ratings is 0.75, Figure 2(b) shows that the amount of influence can vary widely between users who have rated approximately the same number of movies. This suggests we look for a model that can account for factors not captured by the number of ratings. In the following section we compile a list of qualitative factors that seem to affect influence levels.

5.1 Qualitative Factors

Number of ratings: This is the most immediate factor one would come up with. If a user rates more items, she has a greater chance to be close to many users. Moreover, such a user can be useful to many users who are looking for recommendations for a wide variety of items.

Degree of agreement with others: This measure attempts to estimate how much, on average, a user agrees with the average opinion of others: $\frac{1}{k} \sum_{a=1}^{k} |R_{ia} - \bar{R}_a|$. This expression computes the extent to which user $u_i$'s ratings deviate from the average rating of each corresponding item.

Rarity of the rated items: This is a measure very similar to Inverse Document Frequency (IDF), which penalizes frequent items, as they are considered to have little discriminating power: $\frac{1}{k} \sum_{j \in I_{u_i}} 1/\mathrm{freq}(j)$, where $I_{u_i}$ is the set of items that user $u_i$ has rated.

Standard deviation in one's ratings: This amounts to the degree to which a user's ratings deviate from her rating average. The implication is that a higher standard deviation contributes a greater value through the term $(R_{ik} - \bar{R}_i)$ in equation 3.1.

Degree of similarity with top neighbors: This is the average similarity weight of the top $k$ neighbors of a user $u_i$: $\frac{1}{k} \sum_{j=1}^{k} W_{ij}$. This factor can be associated with two opposing implications: users having higher values of this expression might be able to exert more effect and so be influential; on the other hand, a user might be easier to replace if she is very similar to a number of other users.

Aggregated popularity of the rated items: If the sum of the popularities of the rated items is high enough, the user has a greater chance to have overlapping items with many users.

Aggregated Movie Popularity * Entropy: The entropy of a movie simply indicates the dispersion of the ratings it received. Multiplying this by the popularity of the movie gives a measure that tries to balance popularity and variance.

5.2 The Regression Model

We chose to use SVM Regression (SVR) for our modeling. SVMs follow the Structural Risk Minimization Principle, which seeks to minimize an upper bound on the generalization error, rather than the principle used in most learning machines, the Empirical Risk Minimization Principle, which minimizes the training error. Hence, SVMs have shown better generalization in many results. Although most practical uses of SVMs used to be in classification problems, SVMs have been extended to solve non-linear regression problems, mostly because of the introduction of the $\epsilon$-insensitive loss function [11]; the resulting regression method is called $\epsilon$-SVR. We have tried various kernel functions to perform the non-linear mapping from the input space to the feature space; the radial basis function (RBF) produced the best regression result. In order to select the values of the parameters $C$ and $\epsilon$, a cross-validation approach was carried out.

We have randomly selected 2416 users (40% of the total) and partitioned them into training and test sets by an 8:2 split. libsvm [10] was used to generate regression models using the following: the seven factors outlined before as predictors (independent variables), an RBF kernel, $\epsilon$-SVR, and the parameters $C$ and $\epsilon$. The model gave a squared correlation coefficient of 0.94. Figure 3 shows the prediction performance by plotting predicted NUPDs against the corresponding actual NUPDs taken from the test set. A five-fold cross validation was carried out to ensure the results' validity. Table 1 has the regression results as well as a few statistics of the actual NUPD values in the test set, averaged over the five folds.

6 Influence in an Item-based Algorithm

We now turn to how the influence picture looks when using another prediction algorithm, in order to see how algorithm-dependent our measures are.

The item-item Algorithm. The kNN-based CF algorithm proposed in [4] differs in many ways from the user-based algorithm we have addressed so far. The algorithm first builds the model by computing item-item similarities. [4] proposed the adjusted cosine measure for estimating the similarity between two items $i$ and $j$:

$$s_{i,j} = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_u)^2 \sum_{u \in U} (R_{u,j} - \bar{R}_u)^2}}$$

The prediction for the (user, item) pair $(u, i)$ is computed as:

$$\frac{\sum_{\text{all similar items},\, N} (s_{i,N} \cdot R_{u,N})}{\sum_{\text{all similar items},\, N} |s_{i,N}|}$$

[Figure 3: Performance of SVM regression for NUPD on the user-user algorithm, plotting NUPD values against test data points. The dotted line shows the actual values, whereas the continuous line represents the predicted values.]

We could not employ authority on this algorithm, as it is not quite straightforward to establish directed edges between users. We could not compute ELP Network Values on this algorithm either, since ELP Network Values involve the notion of how neighbors affect a user, and corresponding probability computations based on this. However, applying NUPD by the Hide-one-User method was easy. We have estimated NUPDs for the same set of users we selected for the user-based approach. Modeling with $\epsilon$-SVR gave a very good performance: the squared correlation coefficient was 0.989.

7 Conclusion

In this paper, we have continued the investigation into influence in recommenders begun in [5]. We have shown that how many opinions a user expresses is an important component of influence, but not the whole story. We have defined several plausible influence metrics and shown that, in general, they correlate strongly.

We believe our proposed metric, NUPD, is explainable both to researchers and to operators of recommender systems. NUPD is also algorithm independent: it applies to any recommender system algorithm that makes predictions. NUPD is computationally inefficient. However, we have demonstrated how to build dataset- and algorithm-specific regression models that allow for the rapid, accurate estimation of a user's influence.

Much remains to be done. Research is needed to understand how the role in which influence is used changes its characteristics. For instance, when influence is used to help retailers sell products it may have very different characteristics than when it is used to encourage community members to contribute opinions. Another rich area of research is in interfaces for communicating influence to community members. The interface is likely to impact both the interpretation of influence and its effectiveness in changing behavior.

References

[1] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, 1994.
[2] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, GroupLens: An open architecture for collaborative filtering of netnews, in Proceedings of CSCW 1994, ACM SIG Computer Supported Cooperative Work, 1994.
[3] J. S. Breese, D. Heckerman, and C. Kadie, Empirical analysis of predictive algorithms for collaborative filtering, in Proceedings of the Fourteenth Annual Conference on Uncertainty in AI, July 1998.
[4] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl, Item-based collaborative filtering recommendation algorithms, in Proceedings of the 10th International World Wide Web Conference (WWW10), Hong Kong, May 2001.
[5] P. Domingos and M. Richardson, Mining the Network Value of Customers, in Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, 2001, ACM Press, pp. 57-66.
[6] J. Kleinberg, Authoritative sources in a hyperlinked environment, Journal of the ACM, 46, 1999.
[7] L. Page, S. Brin, R. Motwani, and T. Winograd, The PageRank citation ranking: Bringing order to the web, Technical Report, Stanford University, Stanford, CA, 1998.
[8] D. Kempe, J. Kleinberg, and É. Tardos, Maximizing the spread of influence through a social network, in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington DC, 2003, pp. 137-146.
[9] K. Wang and M. Y. T. Su, Item Selection by "Hub-Authority" Profit Ranking, in SIGKDD'02, Canada.
[10] C. C. Chang and C. J. Lin, LIBSVM: a library for support vector machines, 2001.
[11] V. N. Vapnik, The Nature of Statistical Learning Theory, New York, Springer-Verlag, 1995.

Table 1: Regression results on both CF algorithms

                            User-User    Item-Item
Regression performance
  MAE                       15.26        30.6
  Sq. corr. coeff.          0.94         0.99
  MSE                       1036         2252.6
NUPD, test set
  Avg.                      81.57        405.6
  Min                       0            0
  Max                       980          2487
  Std Dev                   123.25       454.6
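
As an illustration of the Hide-one-User computation of NUPD from Section 4, combined with the user-user predictor of equations 3.1 and 3.2, the following is a minimal sketch and not the authors' implementation. Several choices are assumptions made only for this example: ratings are held in a dense NumPy array with NaN for missing entries, predictions are compared on every item the affected user has not rated, neighbors with non-positive weight are dropped, and the function names `pearson`, `predict`, and `nupd` are hypothetical.

```python
import numpy as np


def pearson(R, i, j):
    """Equation 3.1: Pearson weight over the items co-rated by users i and j."""
    both = ~np.isnan(R[i]) & ~np.isnan(R[j])
    if both.sum() < 2:
        return 0.0
    di = R[i, both] - np.nanmean(R[i])
    dj = R[j, both] - np.nanmean(R[j])
    denom = np.sqrt((di ** 2).sum() * (dj ** 2).sum())
    return float((di * dj).sum() / denom) if denom > 0 else 0.0


def predict(R, i, a, k=2):
    """Equation 3.2: user i's mean plus a weighted average of neighbor deviations."""
    mean_i = np.nanmean(R[i])
    # Candidate neighbors are the other users who have rated item a.
    cands = [u for u in range(R.shape[0]) if u != i and not np.isnan(R[u, a])]
    top = sorted(((pearson(R, i, u), u) for u in cands), reverse=True)[:k]
    # Assumption: keep only positively correlated neighbors.
    top = [(w, u) for w, u in top if w > 0]
    total = sum(w for w, _ in top)
    if total == 0:
        return mean_i  # no usable neighbor: fall back to the user's mean
    dev = sum(w * (R[u, a] - np.nanmean(R[u])) for w, u in top)
    return mean_i + dev / total


def nupd(R, i, eps=0.5, k=2):
    """Count the unique users whose prediction on some unrated item changes
    by at least eps when user i's ratings are hidden (Hide-one-User)."""
    hidden = R.copy()
    hidden[i, :] = np.nan
    affected = set()
    for j in range(R.shape[0]):
        if j == i:
            continue
        for a in np.flatnonzero(np.isnan(R[j])):
            if abs(predict(R, j, a, k) - predict(hidden, j, a, k)) >= eps:
                affected.add(j)
                break  # user j is counted once, however many items change
    return len(affected)


# Toy data: 4 users x 4 items, NaN marks an unrated item.
nan = np.nan
R_toy = np.array([
    [5.0, 4.0, 1.0, nan],
    [5.0, 4.0, 1.0, 5.0],
    [1.0, 2.0, 5.0, 1.0],
    [4.0, 5.0, 2.0, nan],
])
print(nupd(R_toy, 1))  # 2: users 0 and 3 lose their only like-minded neighbor
print(nupd(R_toy, 2))  # 0: user 2 is never selected as a neighbor
```

Because `nupd` only calls `predict` as a black box, swapping in a different predictor (for instance an item-item one) would leave the influence computation unchanged, which is the sense in which the measure is algorithm independent. The nested loops also make the cost visible: every hidden user requires recomputing predictions for all other users.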