Influence in Ratings-Based Recommender Systems: An Algorithm-Independent Approach

Al Mamunur Rashid, George Karypis, John Riedl

Abstract

Recommender systems have been shown to help users find items of interest from among a large pool of potentially interesting items. Influence is a measure of the effect of a user on the recommendations from a recommender system. Influence is a powerful tool for understanding the workings of a recommender system. Experiments show that users have widely varying degrees of influence in ratings-based recommender systems. Proposed influence measures have been algorithm-specific, which limits their generality and comparability. We propose an algorithm-independent definition of influence that can be applied to any ratings-based recommender system. We show experimentally that influence may be effectively estimated using simple, inexpensive metrics.

1 Introduction

Sociologists have long tried to characterize the influence of a person in a social network of many people [1]. Identifying the influential people can bring twin advantages to those who study group dynamics: (1) the influential people can be directly studied, yielding insight since their choices may be predictive of group choices; or (2) the influential people may be influenced to change the behavior of the group. Many social networks are formed and maintained through informal, qualitative, and unobserved interactions. Capturing data about these interactions is difficult, and the act of capturing those data may change the social interactions themselves.

Collaborative Filtering (CF) recommender systems [2, 3, 4] base their decisions on the opinions of users. In contrast to other social networks, recommender systems capture interactions that are formal, quantitative, and observed. The social network can be analyzed directly through data already captured in the computer system.

Past research has demonstrated that analyzing the social network can provide leverage in influencing the group [5]. The analysis performed in these studies is based on a deep investigation of the characteristics of one particular recommender algorithm, the well-known user-user nearest neighbor algorithm [2]. Careful analysis of this type has many advantages, but one key disadvantage: it is tied closely to the details of the algorithm. In principle, similar
techniques could be applied to other algorithms, but doing so would be laborious, and the resulting influence measure would apply only to algorithms that work precisely according to the details of the analysis. Since many commercial operators tweak the operation of the recommender in many ways to fit the needs of their business, this analysis may not apply in practice. Further, the resulting measures of influence would be unlikely to be comparable between different algorithms, since they have been produced through very different techniques.

A key goal of the present research is to identify a measure of influence for recommender systems that is applicable to any ratings-based recommender system, independent of the particulars of the algorithm. Such a measure would allow for consistent, black-box analysis of influence.

Authors' affiliation: Department of Computer Science & Engineering, University of Minnesota, Minneapolis, MN 55455, {arashid, karypis, riedl}@cs.umn.edu

2 Related Work

2.1 Recommender Systems. Resnick et al. [2] introduced an automatic collaborative filtering algorithm based on a k-nearest neighbors (kNN) algorithm among users; this algorithm is now called user-user CF. The user-user algorithm we use in this paper is a version of the original kNN algorithm, tuned to achieve best known performance. Sarwar et al. [4] proposed an alternative kNN CF algorithm based on similarity among items. This variant is often called item-item CF. Breese et al. [3] have divided a number of CF algorithms into two classes: memory-based algorithms and model-based algorithms. Over the years many other algorithms have been proposed, including ones based on SVD, clustering, and Bayesian networks [3]. We focus on the user-user and item-item algorithms in this paper because they are the most common in existing systems.

2.2 Social Networks and Influence. A social network is a form of graph delineating relationships and interactions among individuals. Finding the important nodes in such graphs has been an object of interest to sociologists for a long time. One proposed measure for importance is centrality [1]. Two examples of "centrality" measures are "degree centrality", which treats high-degree nodes as important, and "distance centrality", which treats nodes with short paths to many other nodes as important [1]. Kleinberg's HITS [6] and Brin and Page's PageRank [7] algorithms for ordering nodes in a graph of the web are based on social network principles.

Figure 1: Showing the notion of in-links for the k closest neighbors of $u_i$. Here, the prediction is being computed for the (user, item) pair $(u_i, m)$.

Domingos et al. [5] have studied the problem of choosing influential users for marketers who wish to attract attention to their products. They show that selecting the right set of users for a marketing campaign can make a big difference. Kempe et al. [8] focus on a collection of models widely studied in social networks, as well as the models in [5], under the categories Linear Threshold Models and Independent Cascade Models.

Our research also investigates influence in social networks. Like Domingos et al., we focus on networks in recommender systems. We extend their research to general measures of influence that are independent of the particular recommender algorithm being used.

3 Defining Influential Users in CF Systems

We first discuss the data used in this project, then analyze a popular CF algorithm to understand a possible formation process of influential users, and then try different ways to set the definition.

3.1 The Data. We have used a publicly available dataset from www.grouplens.org. The dataset is a fraction of the usage data drawn from MovieLens (www.movielens.org), a CF-based online movie recommendation system. It contains 6,040 users, 3,593 movies, and about one million ratings on a 5-star scale. Each user has rated at least 20 movies in the dataset. We have partitioned this data into training and test sets by a random 80%/20% split.

3.2 The User-User Algorithm. The most widely cited and arguably the most commonly used CF algorithm in research is a kNN-based algorithm. In this scheme the users' preference data is represented in an $n \times m$ user-item matrix for a system with $n$ users and $m$ items, where the $(i, j)$-th entry of this matrix is user $u_i$'s rating on item $j$ if $u_i$ has rated item $j$, and null otherwise. The user-user algorithm can be thought of as working in two stages. In the
first stage, similarities between every pair of users are computed and stored as a model. Although many different formulations are possible for similarity weight calculations, the mechanism proposed by GroupLens [2] is the Pearson correlation coefficient. Accordingly, the similarity weight between two users $u_i$ and $u_j$ is measured by equation 3.1:

\[ W_{ij} = \frac{\sum_{k \in I} (R_{ik} - \bar{R}_i)(R_{jk} - \bar{R}_j)}{\sqrt{\sum_{k \in I} (R_{ik} - \bar{R}_i)^2 \sum_{k \in I} (R_{jk} - \bar{R}_j)^2}} \tag{3.1} \]

$I$ is the set of items rated by both of the users, $R_{ik}$ is user $u_i$'s rating on item $k$, and $\bar{R}_i$ is the average rating of $u_i$. Using this similarity metric, the next step, prediction generation, is carried out as follows. The prediction on item $a$ for user $u_i$ is computed by picking the $k$ nearest users who have also rated item $a$, and by applying a weighted average of deviations from the selected users' means:

\[ P_{ia} = \bar{R}_i + \frac{\sum_{u=1}^{k} (R_{ua} - \bar{R}_u) W_{iu}}{\sum_{u=1}^{k} W_{iu}} \tag{3.2} \]

3.3 Some Plausible Influence Metrics Based on Prior Work. We can now propose several influence metrics. One type of metric is motivated by targeted marketing. Another type of metric exploits connections between users based on similarity.

3.3.1 Expected Lift in Profit: Network Values. This approach, as outlined in [5], is based on the goal of targeted marketing. In this scheme, the users who can yield the largest expected lift in profit by triggering a cascading adoption of a product are considered influential. Domingos et al. [5] have applied this idea to a recommendation system dataset based on the user-user CF algorithm described in the last section. The probabilistic model in [5] is based on Markov Random Fields, which requires the neighbor relation to be symmetric; i.e., two users are neighbors of each other if one of them is a neighbor of the other. The authors mention that in a kNN-based CF system this might not hold. Moreover, the ELP Network Value is tied to a particular product; more specifically, it is specific to a set of features of the product being marketed. Translated into the RS domain, ELP Network Values are specific to particular genre vectors. Thus a user's ELP Network Value will differ for movies with different genre vectors.

3.3.2 Network Structure: Similarity Links. By closely observing the process of neighbor selection, we notice some network structure that could help in forming a definition for influential users. Figure 1 demonstrates a situation where the system is computing a prediction on item $m$ for user $u_i$. In order to do so, it selects the top $k$ neighbors who have also rated item $m$. Now we can imagine directed edges from $u_i$ towards each of the $k$ neighbors. Equations 3.3 and 3.4 show the updated authority and hub equations. To account for the fact that the links may not all be of the same weight, we have incorporated a weight term similar to [9] into the basic HITS [6] equations. Here the conditional probability $p(i \mid j)$ refers to the degree to which user $u_j$'s presence indicates user $u_i$'s presence.

\[ a(i) = \sum_{j \to i} p(i \mid j)\, h(j)\, W_{ij} \tag{3.3} \]

\[ h(i) = \sum_{i \to j} p(j \mid i)\, a(j)\, W_{ij} \tag{3.4} \]

We can use this modified authority to represent influence. The drawback of this scheme of influence, however, is algorithm dependence: the network structure captured here is very much algorithm-specific, and for other algorithms the structure might not be as apparent. In order to derive a definition that is generic enough, yet simple, we use the Hide-one-User approach discussed next. The fundamental concept of this approach is figuring out which user causes the largest cumulative change of prediction in the system.

4 Algorithm-Independent Influence

These metrics define influence as the amount of effect a user has over others via the predictions they receive. One way to observe this effect is to exclude a user and measure the net changes in predictions caused by the removal. The idea: let $U$ be the set of available users in the system, and $M_U$ be the model built with the preference data of this set of users. We define $NPD_{u_i}$ (Number of Prediction-Differences) as the number of times the following expression holds true:

\[ \lvert P_{ja}(M_U) - P_{ja}(M_{U - \{u_i\}}) \rvert \ge \delta, \quad \forall j \ne i \]

Here, $P_{ja}(M_U)$ is the prediction on item $a$ for the user $u_j$ using the model $M_U$, and $\delta$ is a threshold that can be tied to the smallest prediction change perceivable to the users via the available user interface. As an example, the smallest prediction change a MovieLens user would notice is 0.5, or a half-star. In essence, the expression for $NPD_{u_i}$ says how many times the predictions would change by at least the threshold if we built the model without the user $u_i$. $NPD_{u_i}$ is the influence level of user $u_i$.

Figure 2: (a) Distribution of influence (x-axis: ranks of the users by NUPD; y-axis: normalized NUPD). (b) NUPD values of a group of 20 users who have rated almost the same number of items (x-axis: # of ratings of the selected users; y-axis: normalized NUPD).

There is a problem with $NPD_{u_i}$: if the users who are affected by $u_i$'s removal need predictions on many items, $u_i$ can end up with a large $NPD_{u_i}$ simply because of the number of predictions requested. To overcome this problem, we propose another version of this definition and call it $NUPD_{u_i}$. $NUPD_{u_i}$ counts the number of unique users whose predictions changed by at least the threshold amount when the $i$-th user is kept out during model building. As is evident from the definition of $NUPD_{u_i}$, it is equally applicable to any CF algorithm, provided that we have the historical data to compute it from.

Notice that a straightforward computation of NUPDs can become very expensive, since it requires rebuilding the model once per hidden user; if we are to compute NUPD online or on a regular basis, we need to find a cheaper way. Section 5 details such an endeavor.

4.1 The Nature of Influence. Figure 2(a) shows normalized NUPD values of the top 1500 influential users and highlights the fact that only a handful of the users possess high influence. This is true for both the authority and NUPD measures. The shapes demonstrate a power-law or Zipf-like distribution. A similar shape is reported in [5] for ELP Network Values. Note that the correlation between authority and NUPD is 0.96.

5 Building a Predictive Model

As stated before, NUPD suffers from a drawback: the computation is quite time consuming. In order to circumvent this limitation, we seek a predictive model that can provide users' influence levels on the fly while maintaining good accuracy. Although the correlation coefficient between NUPD and the number of ratings is 0.75, Figure 2(b) shows that the amount of influence can vary widely between users who have rated approximately the same number of movies. This suggests we look for a model that can account for factors not captured by the number of ratings. In the following section we compile a list of qualitative factors that seem to affect influence levels.

5.1 Qualitative Factors

• Number of ratings: This is the most immediate factor one would think of. If a user rates more items, she has a greater chance to be close to many users. Moreover, such a user can be useful to many users who are looking for recommendations for a wide variety of items.

• Degree of agreement with others: This measure attempts to estimate how much, on average, a user agrees with the average opinion of others: $\frac{1}{k} \sum_{a=1}^{k} \lvert R_{ia} - \bar{R}_a \rvert$, where the sum runs over the $k$ items that $u_i$ has rated and $\bar{R}_a$ is the average rating of item $a$. This expression computes the extent to which user $u_i$'s ratings deviate from the corresponding items' average ratings.

• Rarity of the rated items: This is a measure very similar to the Inverse Document Frequency (IDF), which penalizes frequent items, as they are considered to have little discriminating power: $\frac{1}{k} \sum_{j \in I_{u_i}} \frac{1}{freq(j)}$, where $I_{u_i}$ is the set of items that user $u_i$ has rated.

• Standard deviation in one's rating: This amounts to the degree to which a user's ratings deviate from her rating average. The implication is that a higher standard deviation contributes a greater value through the term $(R_{ik} - \bar{R}_i)$ in equation 3.1.

• Degree of similarity with top neighbors: This is the average similarity weight of the top $k$ neighbors of a user $u_i$: $\frac{1}{k} \sum_{j=1}^{k} W_{ij}$. This factor can be associated with two opposing implications: users with higher values of this expression might be able to exert more effect and so be influential; on the other hand, a user might be easier to replace if she is very similar to a number of other users.

• Aggregated popularity of the rated items: If the sum of the popularities of the rated items is high enough, the user has a greater chance of having items in common with many users.

• Aggregated Movie Popularity * Entropy: The entropy of a movie simply indicates the dispersion of the ratings it received. Multiplying this by the popularity of the movie gives a measure that tries to balance popularity and variance.

5.2 The Regression Model

We chose to use SVM Regression (SVR) for our modeling. SVMs follow the Structural Risk Minimization Principle, which seeks to minimize an upper bound on the generalization error, rather than the principle used in most learning machines, the Empirical Risk Minimization Principle, which minimizes the training error. Hence, SVMs have shown better generalization in many results. Although most practical uses of SVMs used to be in classification problems, SVMs have been extended to solve non-linear regression problems, mostly because of the introduction of the $\varepsilon$-insensitive loss function [11]; the resulting regression method is called $\varepsilon$-SVR. We have tried various kernel functions to perform the non-linear mapping from the input space to the feature space; the radial basis function (RBF) produced the best regression result. In order to select the values of the parameters $C$ and $\varepsilon$, a cross-validation approach was carried out.

We have randomly selected 2416 users (40% of the total) and partitioned them into training and test sets by an 8:2 split. libsvm [10] was used to generate regression models using the following: the seven factors outlined before as predictors (independent variables), an RBF kernel, $\varepsilon$-SVR, and the parameters $C$ and $\varepsilon$. The model gave a squared correlation coefficient of 0.94. Figure 3 shows the prediction performance by plotting predicted NUPDs against the corresponding actual NUPDs taken from the test set. A five-fold cross validation was carried out to ensure the results' validity. Table 1 has the regression results as well as a few statistics of the actual NUPD values in the test set, averaged over the
five folds.

6 Influence in an Item-based Algorithm

We now turn to how the influence picture looks when using another prediction algorithm, in order to see how algorithm-dependent our measures are.

The item-item Algorithm. The kNN-based CF algorithm proposed in [4] differs in many ways from the user-based algorithm we have addressed so far. The algorithm first builds the model by computing item-item similarities. [4] proposed the adjusted cosine measure for estimating the similarity between two items $i$ and $j$:

\[ s_{i,j} = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_u)^2 \sum_{u \in U} (R_{u,j} - \bar{R}_u)^2}} \]

The prediction for the (user, item) pair $(u, i)$ is computed as:

\[ \frac{\sum_{\text{all similar items},\, N} (s_{i,N} \cdot R_{u,N})}{\sum_{\text{all similar items},\, N} \lvert s_{i,N} \rvert} \]

We could not employ authority on this algorithm, as it is not straightforward to establish directed edges between users. We could not compute ELP Network Values on this algorithm either, since ELP Network Values involve the notion of how neighbors affect a user, and corresponding probability computations based on this. However, applying NUPD by the Hide-one-User method was easy. We have estimated NUPDs for the same set of users we selected for the user-based approach. Modeling with $\varepsilon$-SVR gave a very good performance: the squared correlation coefficient was 0.989.

Figure 3: Performance of SVM regression for NUPD on the user-user algorithm (x-axis: test data points; y-axis: NUPD values). The dotted line shows the actual values, whereas the continuous line represents the predicted values.

7 Conclusion

In this paper, we have continued the investigation into influence in recommenders begun in [5]. We have shown that how many opinions a user expresses is an important component of influence, but not the whole story. We have defined several plausible influence metrics and shown that, in general, they correlate strongly.

We believe our proposed metric, NUPD, is explainable both to researchers and to operators of recommender systems. NUPD is also algorithm independent: it applies to any recommender system algorithm that makes predictions. NUPD is computationally inefficient. However, we have demonstrated how to build dataset- and algorithm-specific regression models that allow for the rapid, accurate estimation of a user's influence.

Much remains to be done. Research is needed to understand how the role that influence plays changes its characteristics. For instance, when influence is used to help retailers sell products it may have very different characteristics than when it is used to encourage community members to contribute opinions. Another rich area of research is in interfaces for communicating influence to community members. The interface is likely to impact both the interpretation of influence and its effectiveness in changing behavior.

References

[1] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, 1994.
[2] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, GroupLens: An open architecture for collaborative filtering of netnews, in Proceedings of CSCW 1994, ACM SIG Computer Supported Cooperative Work, 1994.
[3] J. S. Breese, D. Heckerman, and C. Kadie, Empirical analysis of predictive algorithms for collaborative filtering, in Proceedings of the Fourteenth Annual Conference on Uncertainty in AI, July 1998.
[4] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl, Item-based collaborative filtering recommendation algorithms, in Proceedings of the 10th International World Wide Web Conference (WWW10), Hong Kong, May 2001.
[5] P. Domingos and M. Richardson, Mining the Network Value of Customers, in Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, 2001, ACM Press, pp. 57-66.
[6] J. Kleinberg, Authoritative sources in a hyperlinked environment, Journal of the ACM, 46, 1999.
[7] L. Page, S. Brin, R. Motwani, and T. Winograd, The PageRank citation ranking: Bringing order to the web, Technical Report, Stanford University, Stanford, CA, 1998.
[8] D. Kempe, J. Kleinberg, and É. Tardos, Maximizing the spread of influence through a social network, in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington DC, 2003, pp. 137-146.
[9] K. Wang and M. Y. T. Su, Item Selection by "Hub-Authority" Profit Ranking, in SIGKDD'02, Canada.
[10] C. C. Chang and C. J. Lin, LIBSVM: a library for support vector machines, 2001.
[11] V. N. Vapnik, The Nature of Statistical Learning Theory, New York, Springer-Verlag, 1995.

Table 1: Regression results on both CF algorithms

                            User-User    Item-Item
Regression performance
  MAE                       15.26        30.6
  Sq. corr. coeff.          0.94         0.99
  MSE                       1036         2252.6
NUPD (test set)
  Avg.                      81.57        405.6
  Min                       0            0
  Max                       980          2487
  Std Dev                   123.25       454.6
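
The Hide-one-User procedure of Section 4 can be illustrated with a minimal Python sketch. It assumes a toy user-user predictor built from equations 3.1 and 3.2; the function names (`pearson`, `predict`, `nupd`), the toy ratings, the choice of k = 2, and the decision to keep only positively correlated neighbors are all assumptions made for illustration and are not taken from the paper. Any other predictor could be substituted for `predict`, which is the sense in which NUPD is algorithm independent.

```python
from math import sqrt

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(ratings, ui, uj):
    """Similarity weight W_ij of equation 3.1 over the items both users rated."""
    common = set(ratings[ui]) & set(ratings[uj])
    mi, mj = mean(ratings[ui].values()), mean(ratings[uj].values())
    num = sum((ratings[ui][k] - mi) * (ratings[uj][k] - mj) for k in common)
    den = sqrt(sum((ratings[ui][k] - mi) ** 2 for k in common)
               * sum((ratings[uj][k] - mj) ** 2 for k in common))
    return num / den if den else 0.0

def predict(ratings, ui, item, k=2):
    """Prediction P_ia of equation 3.2 from the k nearest users who rated item.

    Assumption: only neighbors with positive similarity are used, and the
    user's own mean is returned when no such neighbor exists.
    """
    mi = mean(ratings[ui].values())
    candidates = [(pearson(ratings, ui, u), u)
                  for u in ratings if u != ui and item in ratings[u]]
    top = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    norm = sum(w for w, _ in top)
    if not norm:
        return mi
    deviation = sum(w * (ratings[u][item] - mean(ratings[u].values()))
                    for w, u in top)
    return mi + deviation / norm

def nupd(ratings, hidden, targets, delta=0.5, k=2):
    """Number of unique users whose prediction moves by at least delta
    when `hidden` is left out of the model (Hide-one-User)."""
    reduced = {u: r for u, r in ratings.items() if u != hidden}
    affected = set()
    for user, item in targets:
        if user == hidden:
            continue  # the definition only considers users j != i
        before = predict(ratings, user, item, k)
        after = predict(reduced, user, item, k)
        if abs(before - after) >= delta:
            affected.add(user)
    return len(affected)

# Hypothetical toy data: user -> {item: rating on a 5-star scale}.
ratings = {
    "ann": {"m1": 5, "m2": 1, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 5, "m4": 5},
    "cat": {"m1": 5, "m2": 1, "m4": 1},
    "dan": {"m1": 1, "m2": 5, "m4": 2},
}
# (user, item) pairs for which predictions are requested.
targets = [("ann", "m4"), ("cat", "m3"), ("dan", "m3")]

for user in ratings:
    print(user, nupd(ratings, user, targets))
```

Each call to `nupd` rebuilds the predictions without one user, so computing it for every user requires one model rebuild per user; this is the cost that motivates the regression model of Section 5.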
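
Several of the qualitative factors of Section 5.1 are simple aggregates over the ratings data, which is why they are cheap compared with NUPD. The sketch below computes five of them for one user. It is an illustration under stated assumptions rather than the authors' feature extraction: the function names and toy data are hypothetical, item averages include the user's own rating, the standard deviation is the population form, and an item's "popularity" is taken to be the number of ratings it received, which the paper does not spell out. The neighbor-similarity and entropy factors are omitted.

```python
from math import sqrt

def item_stats(ratings):
    """Per-item average rating and rating count, freq(j), over all users."""
    totals, counts = {}, {}
    for user_ratings in ratings.values():
        for item, r in user_ratings.items():
            totals[item] = totals.get(item, 0.0) + r
            counts[item] = counts.get(item, 0) + 1
    averages = {item: totals[item] / counts[item] for item in totals}
    return averages, counts

def factors(ratings, user):
    """A subset of the Section 5.1 factors for one user."""
    own = ratings[user]
    k = len(own)                      # number of items the user has rated
    item_avg, freq = item_stats(ratings)
    user_mean = sum(own.values()) / k
    return {
        "num_ratings": k,
        # (1/k) * sum |R_ia - item average| over the user's rated items
        "agreement": sum(abs(r - item_avg[i]) for i, r in own.items()) / k,
        # (1/k) * sum 1/freq(j): IDF-like, rare items contribute more
        "rarity": sum(1.0 / freq[i] for i in own) / k,
        # population standard deviation of the user's ratings (assumption)
        "std_dev": sqrt(sum((r - user_mean) ** 2 for r in own.values()) / k),
        # popularity taken as rating count per item (assumption)
        "popularity": sum(freq[i] for i in own),
    }

# Hypothetical toy data: user -> {item: rating}.
toy = {
    "ann": {"m1": 5, "m2": 3},
    "bob": {"m1": 3},
    "cat": {"m1": 4, "m2": 1, "m3": 2},
}

for name in toy:
    print(name, factors(toy, name))
```

A vector of such factors per user is what a regression model like the $\varepsilon$-SVR of Section 5.2 would take as input, with the user's measured NUPD as the target.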