25, Prague 1, Czech Republic
lbrozovsky@centrum.cz, petricek@acm.org

Abstract. Users of online dating sites are facing information overload that requires them to manually construct queries and browse huge numbers of matching user profiles. This becomes ev...
2 Lukas Brozovsky et al.

typically presented with random profiles (or with random profiles preselected with a simple condition, for example only men of a certain age) and rates the profiles on a given numerical scale.²

Online dating sites generally allow users to post and browse profiles for free and require payment for contact details. From the business model point of view, dating agencies should have a strong interest in improving the quality of the recommendations they provide. Recommender systems [14] have been used in many areas. They have been popularized especially by recommenders at Amazon, Netflix, Movielens and others. Even though matchmaking is often cited as a potential area for recommender system applications, there has been a surprising lack of work published in this area.

In this paper we describe a recommender system for an online dating agency, benchmark algorithms on a real-world dataset and perform a study with real users to evaluate the quality of recommendations. The presented improvement in online dating and matchmaking has many benefits for users of the service as well as for its owners. These benefits include higher user satisfaction and loyalty, and also better monetization of the service.

2 Related Work

Recommender systems [14] are a popular and successful way of tackling information overload. Recommender systems have been popularized by applications such as the Amazon [10] or Netflix recommenders³. The most widely used recommender systems are based on collaborative filtering algorithms. One of the first collaborative filtering systems was Tapestry [5]. Other notable CF systems include Jester [6], Ringo [18], Movielens and Launch.com.

There are two classes of CF algorithms: memory based (e.g. user-user) and model based. Memory based systems require the whole rating matrix to be stored in memory. A typical example of a memory based algorithm is the user-user k-nearest neighbor algorithm. Model based algorithms include methods based on rating matrix decomposition, for example SVD [15] and MMMF [12], and other methods like item-item [16], Bayes networks [1] and personality diagnosis [11].

Collaborative filtering research has been addressing many areas including prediction accuracy, scalability, the cold start problem, robustness and recommendation quality. Breese et al. [1] and [8] contain empirical comparisons of different types of algorithms in terms of prediction MAE and a review of results in this area. Deshpande et al. [4] performed a comparison of top-N item based recommender systems on several datasets including movies, credit card transactions, and others, using metrics such as hit rate.

In addition to these cross-validation experiments, several studies with real users were performed. Ziegler et al. [19] showed how topic diversification in the recommended set increases book recommendation quality as perceived by users. The role of recommendation explanation has been recognized and/or used in [3, 17, 9]. Cosley et al. [3] compared several recommender systems in CiteSeer

² http://hotornot.com, http://libimseti.cz, http://chceteme.volny.cz
³ http://amazon.com, http://netflix.com
The Item-Item algorithm [16] uses a different view of the ratings matrix. Instead of utilizing the similarity between rows of the matrix as User-User does, it computes similarities between profiles.⁵ We used the adjusted Pearson correlation as item-item similarity:

$$w_{adj}(j,l) = \frac{\sum_i (r_{i,j} - \bar{r}_i)(r_{i,l} - \bar{r}_i)}{\sqrt{\sum_i (r_{i,j} - \bar{r}_i)^2 \sum_i (r_{i,l} - \bar{r}_i)^2}} \tag{3}$$

where the summations over $i$ are over the users who rated both profiles $j$ and $l$. Each pair of common ratings of two profiles comes from a different user with a different rating scale. The adjusted version of the Pearson correlation subtracts each user's mean $\bar{r}_i$; otherwise the similarity computation would suffer from the fact that different users use different rating scales.

When making the prediction $p_{a,j}$ for the active user $a$, the ratings of the active user $a$ for the $k$ most similar neighbors of profile $j$ are used:

$$p_{a,j} = \bar{r}_{\bullet,j} + \xi \sum_{i=1}^{k} \tilde{w}(j,n_i)\,(r_{a,n_i} - \bar{r}_{\bullet,n_i}) \tag{4}$$

where $n_i$ is the $i$-th most similar profile to profile $j$ that user $a$ rated, $\xi$ is a normalizing factor, and $\bar{r}_{\bullet,u}$ is the mean rating of profile $u$.

Similar to the User-User algorithm, our implementation of the Item-Item algorithm is parametrized by two parameters: i) MinO, the minimum number of common ratings between profiles necessary to calculate item-item similarity, and ii) MaxN, the maximum number of item neighbors to be used during the computation. We refer to the Item-Item algorithm with parameters MinO=5, MaxN=50 simply as 'Item-Item (5,50)'.

4 Libimseti Dataset

The dataset we used consists of data from a real online dating service, Libimseti. A snapshot of the rating matrix from July 2005⁶ was used in this study. In the dataset users appear in two different roles: i) as active providers of ratings for photographs of others (we will denote them as 'users' in this situation) and ii) as objects on photographs when they are rated by others (we will refer to them in this situation as 'profiles').

Overall the dataset contains 194,439 users, who provided 11,767,448 ratings. The sparsity of the matrix is 0.03%. Table 1 compares our dataset to the Movielens [7] and Jester [6] datasets that are well known in the collaborative filtering research community. We can see that Libimseti is much larger but also sparser than the other two datasets. The distribution of the number of ratings provided by a single user and the distribution of the rating value are shown in Figure 1. The similarity distributions for ratings matrix rows and columns are shown in Figure 2.
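As an illustration of the Item-Item algorithm described by Eqs. (3) and (4), the following minimal Python sketch computes the adjusted Pearson similarity and a prediction. It is not the ColFi implementation. The dictionary layout, the function names, the use of the raw similarity as the weight in Eq. (4), and the choice of the normalizing factor as one over the sum of absolute weights are all assumptions made for this example.

```python
from math import sqrt


def user_means(ratings):
    """Mean rating given by each user; ratings maps user -> {profile: rating}."""
    return {u: sum(row.values()) / len(row) for u, row in ratings.items() if row}


def profile_means(ratings):
    """Mean rating received by each profile."""
    totals = {}
    for row in ratings.values():
        for profile, r in row.items():
            s, n = totals.get(profile, (0.0, 0))
            totals[profile] = (s + r, n + 1)
    return {p: s / n for p, (s, n) in totals.items()}


def adjusted_pearson(ratings, means, j, l, min_o=5):
    """Eq. (3): similarity of profiles j and l over users who rated both.

    Returns None if fewer than min_o users rated both profiles (MinO)
    or if the denominator is zero.
    """
    common = [u for u, row in ratings.items() if j in row and l in row]
    if len(common) < min_o:
        return None
    dev_j = [ratings[u][j] - means[u] for u in common]
    dev_l = [ratings[u][l] - means[u] for u in common]
    den = sqrt(sum(d * d for d in dev_j) * sum(d * d for d in dev_l))
    if den == 0:
        return None
    return sum(x * y for x, y in zip(dev_j, dev_l)) / den


def predict(ratings, a, j, min_o=5, max_n=50):
    """Eq. (4): predicted rating of active user a for profile j, or None."""
    u_means = user_means(ratings)
    p_means = profile_means(ratings)
    if j not in p_means:
        return None
    neighbors = []
    for n in ratings.get(a, {}):
        if n == j:
            continue
        w = adjusted_pearson(ratings, u_means, j, n, min_o)
        if w is not None:
            neighbors.append((w, n))
    # Keep at most max_n of the most similar profiles rated by a (MaxN).
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    neighbors = neighbors[:max_n]
    # Assumption: the normalizing factor is 1 / (sum of absolute weights).
    norm = sum(abs(w) for w, _ in neighbors)
    if norm == 0:
        return None
    offset = sum(w * (ratings[a][n] - p_means[n]) for w, n in neighbors)
    return p_means[j] + offset / norm
```

With these assumptions, `predict(ratings, a, j, min_o=5, max_n=50)` would play the role of 'Item-Item (5,50)'. Recomputing the means on every call keeps the sketch short; a real system would cache them along with the similarities.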
⁵ Please note that in the dating service environment, both users and items could be seen as representing users.
⁶ http://www.ksi.ms.mff.cuni.cz/~petricek/data/

server. It consists of a global data manager, a set of ColFi services and a communication module implementing CCP, the ColFi Communication Protocol. The data manager serves as the main data provider for all the components in the system, and ColFi services expose an API to clients. Particular data manager and ColFi service implementations are provided as server plug-ins. Figure 3 summarizes the ColFi architecture design.

Fig. 3. ColFi architecture. The communication protocol between the data storage and the data manager is implementation dependent.

The system was implemented as a stateless TCP/IP client-server solution to achieve simplicity, platform independence, scalability and good performance. Each ColFi service instance has its own unique TCP/IP port number and is independent of all the other ColFi services. Multiple client connections are therefore supported and handled in parallel. We implemented the system in Java. The implementation allows easy addition of collaborative filtering algorithms. We use a lazy approach to similarity computations and store the similarities in a local cache object. A MySQL server is used as a back end, but the database is fully abstracted in the code. For more details on the implementation see [2].

ColFi Communication Protocol is similar to the popular Java RMI. CCP is a very simple protocol and supports only invocation of a limited set of remote methods. That makes it very fast (unlike SOAP Web services), and it does not require special setup on the client side (unlike Java RMI, EJB, JSP, or Servlets).

Data Manager is the heart of the ColFi system. It is a general interface that provides all the other ColFi components with the data necessary for collaborative filtering. There is always exactly one data manager instance per server instance, in the form of a plug-in for the ColFi server, and the actual behavior can differ significantly between implementations. The data manager can, for example, cache data in memory, or it can directly access a database or a special file system. In some other cases, a read-only implementation may be desired. Although both cached and direct access are implemented, only the cached variants achieve reasonable performance. Direct access is generally usable only with small datasets.

Recommender System for Online Dating Service 7

ColFi Service is the only part of the system exposed to the outside world. Each service has its own TCP/IP server port number where it listens for incoming client requests. ColFi services are stateless, so each request is both atomic and isolated and should not affect any other requests. This allows simple parallel handling of multiple client connections on the server. The ColFi service is not limited to collaborative filtering; it can also provide statistics, for example.

6 Benchmarks

We compare the implemented algorithms on the Libimseti dataset using three types of cross-validation. In each of the scenarios we provide the algorithms with one subset of ratings for training and withhold another subset of ratings for testing the prediction accuracy.

6.1 Setup

We used three types of cross-validation: i) AllButOne, ii) GivenRandomX and iii) production. Each validation uses NMAE as a metric of prediction quality:

$$\mathrm{NMAE} = \frac{\frac{1}{c}\sum_{k=1}^{c}\left|\tilde{p}^{k}_{ij} - r_{ij}\right|}{r_{\max} - r_{\min}} \tag{5}$$

Here $c$ is the number of ratings for which a prediction was made, and $r_{\max}$ and $r_{\min}$ are the bounds of the rating scale. It is possible that some algorithms fail to predict certain ratings due to lack of data. These cases are ignored and not included in the NMAE computation⁸, and only the total number of such unsuccessful predictions is reported.

AllButOne Validation is the most commonly used benchmark among collaborative filtering papers [6, 1]. It is the simplest protocol of the three. The idea behind this protocol is to select exactly one rating and hide it from the tested algorithm. The algorithm is then asked to predict that one hidden rating, and the prediction error is the absolute difference between the hidden rating value and the predicted rating value. We repeat this for every single rating in the test dataset.

GivenRandomX Validation is intended to examine the algorithm's performance with less data available from the active user. This situation is sometimes referred to as user cold-start and happens when a user is new to the system and has not provided enough ratings yet. The validation works on the subset of users who provided more than 100 ratings. We split the ratings into two: a training group $T$ containing 99 random ratings from each user and a test set with the remaining ratings. Group $T$ is then used to generate 99 training sets $t_i$ for $i = 1..99$ in such a way that $|t_i| = i$ and $\forall i:\ t_i \subset t_{i+1}$. These sets represent the ratings gradually provided by the active user. The result of the GivenRandomX validation is a graph of the NMAE values as a function of the training set size $i$.

⁸ Our experiments showed that this does not affect the ordering of algorithms.

Algorithms show improved performance in terms of NMAE values for the first few ratings inserted. After the user enters about 30-40 ratings, the algorithms' accuracies stabilize (the algorithm has enough information to derive the active user type). Overall, the User-User algorithm generally has the lowest NMAE values.

Fig. 4. GivenRandomX benchmark results. NMAE values based on the number of ratings given by a potential new user. The graph shows the behavior of the algorithms as they get more information about a particular user. Data for the Random algorithm are not displayed as it has a relatively high constant NMAE value of 38.27%.

Production Validation Results are presented in Figure 5. Results for the worst performing Random algorithm are again not displayed in the NMAE graphs to improve the legibility of the figure. The performance ordering of algorithms corresponds to the AllButOne and GivenRandomX results. During the course of this experiment we also verified that the ColFi recommender can handle significant load at the level of real-world application traffic.

7 User experiment

We conducted a web based experiment with human subjects. Each participant was asked to provide ratings for 150 random profiles. Based on the ratings provided, we generated two recommendation lists of top 10 profiles. We tested the Random, Mean, and User-User algorithms. User-User (10,50)⁹ was selected as the best performing collaborative algorithm, Mean represented the currently deployed functionality at Libimseti.cz and Random served as a baseline. Recommendation lists contained only profiles that the active user had not rated yet. Each

⁹ MinO=10, MaxN=50
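The evaluation protocols of Section 6.1 can be illustrated with a short Python sketch covering NMAE as in Eq. (5), AllButOne, and the nested training sets of GivenRandomX. This is a simplified illustration, not the benchmark code used in the paper. The function names, the dictionary layout and the `mean_predictor` baseline (which predicts the mean rating a profile received) are assumptions introduced for the example, and the rating scale is passed in as a parameter.

```python
import random


def nmae(pairs, r_min, r_max):
    """Eq. (5) over (predicted, actual) pairs.

    Pairs whose prediction is None count as failed predictions: they are
    excluded from the error and reported separately.
    """
    pairs = list(pairs)
    errors = [abs(p - r) for p, r in pairs if p is not None]
    failed = len(pairs) - len(errors)
    if not errors:
        return None, failed
    return (sum(errors) / len(errors)) / (r_max - r_min), failed


def mean_predictor(ratings, user, profile):
    """Hypothetical baseline: mean of the visible ratings of a profile."""
    values = [row[profile] for row in ratings.values() if profile in row]
    return sum(values) / len(values) if values else None


def all_but_one(ratings, predict, r_min, r_max):
    """Hide each rating in turn and ask predict() to recover it."""
    pairs = []
    for user, row in ratings.items():
        for profile, actual in row.items():
            visible = dict(ratings)
            visible[user] = {p: r for p, r in row.items() if p != profile}
            pairs.append((predict(visible, user, profile), actual))
    return nmae(pairs, r_min, r_max)


def nested_training_sets(user_ratings, size, rng):
    """GivenRandomX: sets t_1 .. t_size with |t_i| = i and t_i inside t_(i+1)."""
    chosen = rng.sample(sorted(user_ratings.items()), size)
    return [dict(chosen[:i]) for i in range(1, size + 1)]
```

For example, `all_but_one(ratings, mean_predictor, r_min, r_max)` returns the NMAE together with the number of failed predictions, and `nested_training_sets(row, 99, random.Random(0))` yields the 99 growing training sets for one user with more than 100 ratings.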
the Random algorithm (12.5% and 16.87%) are quite high, which indicates that perceived profile quality is clustered due to self-selection bias (good looking people are more likely to put their photo online). The surprising success of the Mean algorithm (35.62% wins against the collaborative filtering algorithm) suggests a strong universal preference that is shared by users; in fact this is a prerequisite for any beauty contest to be possible.

8 Discussion

We have shown that collaborative filtering based recommender systems can provide good recommendations to users of online dating services. We demonstrated this using cross-validation on a snapshot of a real-world dating service. The User-User and Item-Item algorithms outperformed global popularity in terms of prediction NMAE by 3.08% and 2.04% respectively. We also verified that this difference in recommendation quality is noticeable to users, who preferred recommendations by the User-User algorithm to global popularity recommendations.

A logical continuation of our work is a more complete evaluation of current state of the art recommender systems, including model based collaborative filtering methods, in the setting of dating services. These methods could also help to address scalability and performance issues. Domain specific improvements may be possible.

The user interface may introduce bias in the sense that users, instead of providing their personal preference, try to guess the global preference. This reduces the usefulness of the ratings provided. It remains an open issue how best to design an interface which motivates users to provide a sufficient number of truthful ratings. Application of index structures to speed up nearest neighbor search is an interesting research direction.

Recommendations can be further improved by hybrid algorithms. These algorithms combine the collaborative filtering approach with content information. Another problem specific to dating is that "A likes B" does not imply "B likes A". Therefore each user should probably be presented with recommendations of users who are also interested in him/her. There is a need for reciprocal matching algorithms.

9 Acknowledgments

We would like to thank our three anonymous reviewers for their ideas and suggestions. We would also like to thank Oldrich Neuberger for providing the anonymized Libimseti dataset and Tomas Skopal for his valuable comments.

References

1. John S. Breese, David Heckerman, and Carl Kadie. Empirical analysis of predictive algorithms for collaborative filtering. pages 43-52, 1998.