Influence in Ratings-Based Recommender Systems: An Algorithm-Independent Approach

Al Mamunur Rashid, George Karypis, John Riedl
Department of Computer Science & Engineering, University of Minnesota, Minneapolis, MN-55455, {arashid, karypis, riedl}@cs.umn.edu

Abstract

Recommender systems have been shown to help users find items of interest from among a large pool of potentially interesting items. Influence is a measure of the effect of a user on the recommendations from a recommender system. Influence is a powerful tool for understanding the workings of a recommender system. Experiments show that users have widely varying degrees of influence in ratings-based recommender systems. Proposed influence measures have been algorithm-specific, which limits their generality and comparability. We propose an algorithm-independent definition of influence that can be applied to any ratings-based recommender system. We show experimentally that influence may be effectively estimated using simple, inexpensive metrics.

1 Introduction

Sociologists have long tried to characterize the influence of a person in a social network of many people [1]. Identifying the influential people can bring twin advantages to those who study group dynamics: (1) the influential people can be directly studied, yielding insight since their choices may be predictive of group choices; or (2) the influential people may be influenced to change the behavior of the group. Many social networks are formed and maintained through informal, qualitative, and unobserved interactions. Capturing data about these interactions is difficult, and the act of capturing those data may change the social interactions themselves.

Collaborative Filtering (CF) recommender systems [2, 3, 4] base their decisions on the opinions of users. In contrast to other social networks, recommender systems capture interactions that are formal, quantitative, and observed. The social network can be analyzed directly through data already captured in the computer system.

Past research has demonstrated that analyzing the social network can provide leverage in influencing the group [5]. The analysis performed in these studies is based on a deep investigation of the characteristics of one particular recommender algorithm, the well-known user-user nearest neighbor algorithm [2]. Careful analysis of this type has many advantages, but one key disadvantage: it is tied closely to the details of the algorithm. In principle, similar techniques could be applied to other algorithms, but doing so would be laborious, and the resulting influence measure would apply only to algorithms that work precisely according to the details of the analysis. Since many commercial operators tweak the operation of the recommender in many ways to fit the needs of their business, this analysis may not apply in practice. Further, the resulting measures of influence would be unlikely to be comparable between different algorithms, since they have been produced through very different techniques.

A key goal of the present research is to identify a measure of influence for recommender systems that is applicable to any ratings-based recommender system, independent of the particulars of the algorithm. Such a measure would allow for consistent, black-box analysis of influence.

2 Related Work

2.1 Recommender Systems. Resnick et al. [2] introduced an automatic collaborative filtering algorithm based on a k-nearest neighbors (kNN) algorithm among users; this algorithm is now called user-user CF. The user-user algorithm we use in this paper is a version of the original kNN algorithm, tuned to achieve best known performance. Sarwar et al. [4] proposed an alternative kNN CF algorithm based on similarity among items. This variant is often called item-item CF. Breese et al. [3] have divided a number of CF algorithms into two classes: memory-based algorithms and model-based algorithms. Over the years many other algorithms were proposed, including ones based on SVD, clustering, and Bayesian Networks [3]. We focus on the user-user and item-item algorithms in this paper because they are the most common in existing systems.

2.2 Social Networks and Influence. A social network is a form of graph delineating relationships and interactions among individuals. Finding the important nodes in such graphs has been an object of interest to sociologists for a long time. One proposed measure for importance is centrality [1]. Two examples of "centrality" measures are "degree centrality", which treats high degree nodes as important, and "distance centrality", which treats nodes with short paths to many other nodes as important [1]. Kleinberg's HITS [6] and Brin and Page's PageRank [7] algorithms for ordering nodes in a graph of web pages are based on social network principles.

Domingos et al. [5] have studied the problem of choosing influential users for marketers who wish to attract attention to their products. They show that selecting the right set of users for a marketing campaign can make a big difference. Kempe et al. [8] focus on a collection of models widely studied in social networks, as well as the models in [5], under the categories Linear Threshold Models and Independent Cascade Models.

Our research also investigates influence in social networks. Like Domingos et al., we focus on networks in recommender systems. We extend their research to general measures of influence that are independent of the particular recommender algorithm being used.

3 Defining Influential Users in CF Systems

We first discuss the data used in this project, then analyze a popular CF algorithm to understand a possible formation process of influential users, and then try different ways to set the definition.

3.1 The Data. We have used a publicly available dataset from www.grouplens.org. The dataset is a fraction of the usage data drawn from MovieLens (www.movielens.org), a CF-based online movie recommendation system. It contains 6,040 users, 3,593 movies, and about one million ratings on a 5-star scale. Each user has rated at least 20 movies in the dataset. We have partitioned this data into training and test sets by a random 80%/20% split.

3.2 The User-User Algorithm. The most widely cited and arguably the most commonly used CF algorithm in research is a kNN-based algorithm. In this scheme the users' preference data is represented in an $n \times m$ user-item matrix for a system with $n$ users and $m$ items, where the $(i, j)$-th entry of this matrix stands for user $u_i$'s rating on item $j$, or null, depending on whether user $u_i$ has rated item $j$ or not, respectively. The user-user algorithm can be thought of as working in two stages. In the first stage, similarities between every pair of users are computed and stored as a model. Although many different formulations are possible for similarity weight calculations, the mechanism proposed by GroupLens [2] is the Pearson correlation coefficient. Accordingly, the similarity weight between two users, $u_i$ and $u_j$, is measured by equation 3.1:

$$W_{ij} = \frac{\sum_{k \in I} (R_{ik} - \bar{R}_i)(R_{jk} - \bar{R}_j)}{\sqrt{\sum_{k \in I} (R_{ik} - \bar{R}_i)^2 \sum_{k \in I} (R_{jk} - \bar{R}_j)^2}} \tag{3.1}$$

$I$ is the set of items rated by both of the users, $R_{ik}$ is user $u_i$'s rating on item $k$, and $\bar{R}_i$ is the average rating of $u_i$. Using this similarity metric, the next step, prediction generation, is carried out as follows. The prediction on item $a$ for user $u_i$ is computed by picking the $k$ nearest users who have also rated item $a$, and by applying a weighted average of deviations from the selected users' means:

$$P_{ia} = \bar{R}_i + \frac{\sum_{u=1}^{k} (R_{ua} - \bar{R}_u) W_{iu}}{\sum_{u=1}^{k} W_{iu}} \tag{3.2}$$

3.3 Some Plausible Influence Metrics Based on Prior Work. We can now propose several influence metrics. One type of metric is motivated by targeted marketing. Another type of metric exploits connections between users based on similarity.

3.3.1 Expected Lift in Profit: Network Values. This approach, as outlined in [5], is based on the goal of targeted marketing. In this scheme, users who can yield the most expected lift in profit by triggering a cascading adoption of a product are considered influential users. Domingos et al. [5] have applied this idea to a recommendation system dataset based on the user-user CF algorithm described in the last section. The probabilistic model in [5] is based on Markov Random Fields, which require that the neighbors be symmetric; i.e., two users are neighbors to each other if one of them is a neighbor to the other. The authors mention that in a kNN-based CF system, this might not hold. Again, ELP Network Value is tied to a particular product; more specifically, it is specific to a set of features of the product being marketed. Translating this issue into the RS domain, ELP Network Values are specific to particular genre vectors. Thus a user's ELP Network Value will differ for movies with different genre vectors.

3.3.2 Network Structure: Similarity Links. By closely observing the process of neighbor selection, we notice some network structure that could help in forming a definition for influential users. Figure 1 demonstrates a situation where the system is computing a prediction on item $m$ for user $u_i$. In order to do so, it selects the top $k$ neighbors who have also rated item $m$. Now we can imagine directed edges from $u_i$ towards each of the $k$ neighbors.

[Figure 1: Showing the notion of in-links for the $k$ closest neighbors of $u_i$. Here, the prediction is being computed for the (user, item) pair $(u_i, m)$.]

Equations 3.3 and 3.4 show the updated authority and hub equations. In order to account for the fact that the links may not all be of the same weight, we have incorporated a weight term similar to [9] into the basic HITS [6] equations. Here the conditional probability $p(i|j)$ refers to the degree to which user $u_j$'s presence indicates user $u_i$'s presence.

$$a(i) = \sum_{j \to i} p(i|j)\, h(j)\, W_{ij} \tag{3.3}$$

$$h(i) = \sum_{i \to j} p(j|i)\, a(j)\, W_{ij} \tag{3.4}$$

We can use this modified authority to represent influence. The drawback of this scheme of influence, however, is algorithm dependence: the network structure captured here is very much algorithm-specific, and for other algorithms the structure might not be as apparent. In order to derive a definition that is generic enough, yet simple, we use the Hide-one-User approach discussed next. The fundamental concept of this approach is figuring out which user causes the largest cumulative change of prediction in the system.

4 Algorithm-Independent Influence

These metrics define influence as the amount of effect a user has over others via the predictions they receive. One way to observe this effect is to exclude a user and measure the net changes in predictions caused by the removal. The idea: let $U$ be the set of available users in the system, and $M_U$ be the model built with the preference data of this set of users. We call $NPD_{u_i}$ (Number of Prediction-Differences) the number of times the following expression holds true:

$$|P_{ja}(M_U) - P_{ja}(M_{U - \{u_i\}})| \ge \delta, \quad \forall j \ne i$$

Here, $P_{ja}(M_U)$ is the prediction on item $a$ for user $u_j$ using the model $M_U$, and $\delta$ is a threshold that can be tied to the smallest prediction change perceivable to the users via the available user interface. As an example, the smallest prediction change a MovieLens user would notice is 0.5, or a half-star. In essence, the expression for $NPD_{u_i}$ says how many times the predictions would change beyond some threshold if we build the model without user $u_i$. $NPD_{u_i}$ is the influence level of user $u_i$.

There is a problem with $NPD_{u_i}$: if the users who are affected by $u_i$'s removal need predictions on many items, $u_i$ could exhibit a large $NPD_{u_i}$. To overcome this problem, we propose another version of this definition and call it $NUPD_{u_i}$. $NUPD_{u_i}$ counts the number of unique users whose predictions changed by at least the threshold amount when we keep the $i$-th user out during model building.

As is evident from the definition of $NUPD_{u_i}$, it is equally applicable to any CF algorithm, provided that we have the historical data to compute it from. Notice that a straightforward computation of NUPDs can become very expensive; if we are to compute NUPD online or on a regular basis, we need to find a cheaper way. Section 5 details such an endeavor.

4.1 The Nature of Influence. Figure 2(a) shows normalized NUPD values of the top 1500 influential users and highlights the fact that only a handful of the users possess high influence. This is true for both the authority and NUPD measures. The shapes demonstrate a power-law or Zipf-like distribution. A similar shape is reported in [5] for ELP Network Values. Note that the correlation between authority and NUPD is 0.96.

[Figure 2: (a) Distribution of influence (x-axis: ranks of the users by NUPD; y-axis: normalized NUPD). (b) NUPD values of a group of 20 users who have rated almost the same number of items (x-axis: # of ratings of the selected users; y-axis: normalized NUPD).]

5 Building a Predictive Model

As stated before, NUPD suffers from a drawback: the computation is quite time consuming. In order to circumvent this limitation, we seek a predictive model that can provide users' influence levels on the fly while maintaining good accuracy.

Although the correlation coefficient between NUPD and the number of ratings is 0.75, Figure 2(b) shows that the amount of influence can vary widely between users who have rated approximately the same number of movies. This suggests we look for a model that can account for factors not captured by the number of ratings.

In the following section we compile a list of qualitative factors that seem to affect influence levels.

5.1 Qualitative Factors

Number of ratings: This is the most immediate factor one would come up with. If a user rates more items, she has a greater chance to be close to many users. Moreover, such a user can be useful to many users who are looking for recommendations for a wide variety of items.

Degree of agreement with others: This measure attempts to estimate, on average, how much a user agrees with the average opinion of others: $\frac{1}{k} \sum_{a=1}^{k} |R_{ia} - \bar{R}_a|$. This expression computes the extent to which user $u_i$'s ratings deviate from each of the corresponding items' average ratings.

Rarity of the rated items: This is a measure very similar to the Inverse Document Frequency (IDF), which penalizes frequent items, as they are considered to have little discriminating power: $\frac{1}{k} \sum_{j \in I_{u_i}} \frac{1}{freq(j)}$, where $I_{u_i}$ is the set of items that user $u_i$ has rated.

Standard deviation in one's rating: This amounts to the degree to which a user's ratings deviate from her rating average. The implication is that a higher standard deviation contributes a greater value through the term $(R_{ik} - \bar{R}_i)$ in equation 3.1.

Degree of similarity with top neighbors: This is the average similarity weight of the top $k$ neighbors of a user $u_i$: $\frac{1}{k} \sum_{j=1}^{k} W_{ij}$. This factor can be associated with two opposing implications: users having higher values from this expression might be able to exert more effect and so be influential; whereas a user might be easier to replace if she is very similar to a number of other users.

Aggregated popularity of the rated items: If the sum of the popularities of the rated items is high enough, the user has a greater chance to have overlapping items with many users.

Aggregated Movie Popularity * Entropy: The entropy of a movie simply indicates the dispersion of the ratings it received. Multiplying this with the popularity of the movie gives a measure that tries to balance between popularity and variance.

5.2 The Regression Model

We chose to use SVM Regression (SVR) for our modeling. SVMs follow the Structural Risk Minimization Principle, which seeks to minimize an upper bound on the generalization error, rather than the principle used in most learning machines, the Empirical Risk Minimization Principle (minimizing the training error). Hence, SVMs have shown better generalization in many results. Although most of the practical uses of SVMs used to be in classification problems, SVMs have been extended to solve non-linear regression problems, mostly because of the introduction of the $\varepsilon$-insensitive loss function [11]; the resulting regression method is called $\varepsilon$-SVR.

We have tried various kernel functions to perform the non-linear mapping from the input space to the feature space. However, the radial basis function (RBF) produced the best regression result. In order to select the values of the parameters $C$ and $\varepsilon$, a cross-validation approach was carried out.

We have randomly selected 2416 users (40% of the total) and partitioned them into training and test sets by an 8:2 split. libsvm [10] was used to generate regression models using the following: the seven factors outlined before as predictors (independent variables), an RBF kernel, $\varepsilon$-SVR, and the parameters $C$ and $\varepsilon$. The model gave a squared correlation coefficient of 0.94. Figure 3 shows the prediction performance by plotting predicted NUPDs against the corresponding actual NUPDs taken from the test set. A five-fold cross validation was carried out to ensure the results' validity. Table 1 has the regression results as well as a few statistics of the actual NUPD values in the test set, averaged over the five folds.

[Figure 3: Performance of SVM regression for NUPD on the user-user algorithm (x-axis: test data points; y-axis: NUPD values). The dotted line shows the actual values, whereas the continuous line represents the predicted values.]

6 Influence in an Item-based Algorithm

We now turn to how the influence picture looks when using another prediction algorithm, in order to see how algorithm-dependent our measures are.

The item-item Algorithm. The kNN-based CF algorithm proposed in [4] differs in many ways from the user-based algorithm we have addressed so far. The algorithm first builds the model by computing item-item similarities. [4] proposed the adjusted cosine measure for estimating the similarity between two items $i$ and $j$:

$$s_{i,j} = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_u)^2 \sum_{u \in U} (R_{u,j} - \bar{R}_u)^2}}$$

The prediction for the (user, item) pair $(u, i)$ is computed as:

$$P_{u,i} = \frac{\sum_{\text{all similar items},\, N} (s_{i,N} \cdot R_{u,N})}{\sum_{\text{all similar items},\, N} (|s_{i,N}|)}$$

We could not employ authority on this algorithm, as it is not straightforward to establish directed edges between users. We could not compute ELP Network Values on this algorithm either, since ELP Network Values involve the notion of how neighbors affect a user, and corresponding probability computations based on this. However, applying NUPD by the Hide-one-User method was easy. We have estimated NUPDs for the same set of users we selected for the user-based approach. Modeling with $\varepsilon$-SVR gave a very good performance: the squared correlation coefficient was 0.989.

Table 1: Regression results on both CF algorithms

                                        User-User   Item-Item
Regression performance
    MAE                                 15.26       30.6
    Sq. corr. coeff.                    0.94        0.99
    MSE                                 1036        2252.6
NUPD test set
    Avg.                                81.57       405.6
    Min                                 0           0
    Max                                 980         2487
    Std Dev                             123.25      454.6

7 Conclusion

In this paper, we have continued the investigation into influence in recommenders begun in [5]. We have shown that how many opinions a user expresses is an important component of influence, but not the whole story. We have defined several plausible influence metrics and shown that, in general, they correlate strongly.

We believe our proposed metric, NUPD, is explainable both to researchers and to operators of recommender systems. NUPD is also algorithm independent: it applies to any recommender system algorithm that makes predictions. NUPD is computationally inefficient. However, we have demonstrated how to build dataset- and algorithm-specific regression models that allow for the rapid, accurate estimation of a user's influence.

Much remains to be done. Research is needed to understand how the role of influence changes it. For instance, when influence is used to help retailers sell products it may have very different characteristics than when it is used to encourage community members to contribute opinions. Another rich area of research is in interfaces for communicating influence to community members. The interface is likely to impact both the interpretation of influence and its effectiveness in changing behavior.

References

[1] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, 1994.
[2] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, GroupLens: An open architecture for collaborative filtering of netnews, in Proceedings of CSCW 1994, ACM SIG Computer Supported Cooperative Work, 1994.
[3] J. S. Breese, D. Heckerman, and C. Kadie, Empirical analysis of predictive algorithms for collaborative filtering, in Proceedings of the Fourteenth Annual Conference on Uncertainty in AI, July 1998.
[4] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl, Item-based collaborative filtering recommendation algorithms, in Proceedings of the 10th International World Wide Web Conference (WWW10), Hong Kong, May 2001.
[5] P. Domingos and M. Richardson, Mining the Network Value of Customers, in Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, 2001, ACM Press, pp. 57-66.
[6] J. Kleinberg, Authoritative sources in a hyperlinked environment, Journal of the ACM, 46, 1999.
[7] L. Page, S. Brin, R. Motwani, and T. Winograd, The PageRank citation ranking: Bringing order to the web, Technical Report, Stanford University, Stanford, CA, 1998.
[8] D. Kempe, J. Kleinberg, and É. Tardos, Maximizing the spread of influence through a social network, in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington DC, 2003, pp. 137-146.
[9] K. Wang and M. Y. T. Su, Item Selection by "Hub-Authority" Profit Ranking, in SIGKDD '02, Canada.
[10] C. C. Chang and C. J. Lin, LIBSVM: a library for support vector machines, 2001.
[11] V. N. Vapnik, The Nature of Statistical Learning Theory, New York, Springer-Verlag, 1995.
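
To make the Hide-one-User procedure of Section 4 concrete, the following is a minimal Python sketch of NUPD on top of a small user-user predictor in the spirit of equations 3.1 and 3.2. It is an illustration under stated assumptions, not the authors' implementation: the function names (`pearson`, `predict`, `nupd`), the dictionary layout of the ratings, the restriction to positively correlated neighbors, the fallback to the user's mean when no neighbor exists, and the choice to compare predictions on each user's unrated items are all assumptions made for this example.

```python
from math import sqrt


def mean(r):
    """Average rating of one user (r maps item -> rating)."""
    return sum(r.values()) / len(r)


def pearson(ra, rb):
    """Pearson similarity over co-rated items, as in equation 3.1."""
    common = set(ra) & set(rb)
    ma, mb = mean(ra), mean(rb)
    num = sum((ra[k] - ma) * (rb[k] - mb) for k in common)
    den = sqrt(sum((ra[k] - ma) ** 2 for k in common)
               * sum((rb[k] - mb) ** 2 for k in common))
    return num / den if den else 0.0


def predict(ratings, user, item, k=2):
    """Weighted average of neighbor deviations, as in equation 3.2.

    Assumption: only positively correlated neighbors are used, and the
    user's own mean is returned when no such neighbor rated the item.
    """
    mine = ratings[user]
    neigh = []
    for v, rv in ratings.items():
        if v != user and item in rv:
            w = pearson(mine, rv)
            if w > 0:
                neigh.append((w, v))
    top = sorted(neigh, reverse=True)[:k]
    if not top:
        return mean(mine)
    num = sum(w * (ratings[v][item] - mean(ratings[v])) for w, v in top)
    return mean(mine) + num / sum(w for w, _ in top)


def nupd(ratings, target, delta=0.5, k=2):
    """Number of unique users whose prediction moves by >= delta
    when `target` is hidden from the model (Hide-one-User)."""
    reduced = {u: r for u, r in ratings.items() if u != target}
    items = {i for r in ratings.values() for i in r}
    affected = set()
    for u in reduced:
        # Assumption: predictions are compared on the user's unrated items.
        for item in items - set(ratings[u]):
            before = predict(ratings, u, item, k)
            after = predict(reduced, u, item, k)
            if abs(before - after) >= delta:
                affected.add(u)
                break
    return len(affected)


# Hypothetical toy data: three users, three movies, 5-star scale.
toy = {
    "alice": {"m1": 5, "m2": 1, "m3": 5},
    "bob": {"m1": 5, "m2": 1},
    "carol": {"m1": 1, "m2": 5, "m3": 1},
}
influence = {u: nupd(toy, u) for u in toy}
```

In this toy data, hiding alice removes bob's only positively correlated neighbor for m3, so bob's prediction falls back to his mean and alice's NUPD is 1, while hiding bob or carol changes no prediction by half a star. The sketch rebuilds every prediction once per hidden user, which illustrates why a direct NUPD computation becomes expensive on a dataset of MovieLens size.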