Recommender System for Online Dating Service
Lukas Brozovsky and Vaclav Petricek
KSI MFF


[...] 25, Prague 1, Czech Republic
lbrozovsky@centrum.cz, petricek@acm.org

Abstract. Users of online dating sites are facing information overload that requires them to manually construct queries and browse huge amount of matching user profiles. This becomes ev [...]


[...] typically presented with random profiles (or with random profiles preselected with a simple condition, for example only men of a certain age) and rates the profiles on a given numerical scale.^2

Online dating sites generally allow users to post and browse profiles for free and require payment for contact details. From the business model point of view, the dating agencies should have a high interest in improving the quality of the recommendations they provide. Recommender systems [14] have been used in many areas. They have been popularized especially by recommenders at Amazon, Netflix, Movielens and others. Even though matchmaking is often cited as a potential area for recommender systems application, there has been a surprising lack of work published in this area.

In this paper we describe a recommender system for an online dating agency, benchmark algorithms on a real world dataset and perform a study with real users to evaluate the quality of recommendations. The presented improvement in online dating and matchmaking has many benefits for users of the service as well as for the owners. These benefits include higher user satisfaction and loyalty, and also a better monetization of the service.

2 Related Work

Recommender systems [14] are a popular and successful way of tackling information overload. Recommender systems have been popularized by applications such as the Amazon [10] or Netflix recommenders^3. The most widely used recommender systems are based on collaborative filtering algorithms. One of the first collaborative filtering systems was Tapestry [5]. Other notable CF systems include Jester [6], Ringo [18], Movielens and Launch.com.

There are two classes of CF algorithms: memory based (e.g. user-user) and model based. Memory based systems require the whole rating matrix stored in memory. A typical example of a memory based algorithm is the user-user k-nearest neighbor algorithm. Model based algorithms include methods based on rating matrix decomposition, for example SVD [15] and MMMF [12], and other methods like item-item [16], Bayes-networks [1] and personality diagnosis [11].

Collaborative filtering research has been addressing many areas including prediction accuracy, scalability, the cold start problem, robustness and recommendation quality. Breese et al. [1] and [8] contain empirical comparisons of different types of algorithms in terms of prediction MAE and a review of results in this area. Deshpande et al. [4] performed a comparison of top-N item based recommender systems on several datasets including movies, credit card transactions, and others, using metrics such as hit rate.

In addition to these cross-validation experiments, several studies with real users were performed. Ziegler et al. [19] showed how topic diversification in the recommended set increases book recommendation quality as perceived by users. The role of recommendation explanation has been recognized and/or used in [3,17,9]. Cosley et al. [3] compared several recommender systems in CiteSeer [...]

^2 http://hotornot.com, http://libimseti.cz, http://chceteme.volny.cz
^3 http://amazon.com, http://netflix.com

The Item-Item algorithm [16] uses a different view on the ratings matrix. Instead of utilizing the similarity between rows of the matrix as User-User does, it computes similarities between profiles.^5 We used the adjusted Pearson correlation as item-item similarity:

$$w_{adj}(j,l) = \frac{\sum_i (r_{i,j} - \bar{r}_i)(r_{i,l} - \bar{r}_i)}{\sqrt{\sum_i (r_{i,j} - \bar{r}_i)^2 \sum_i (r_{i,l} - \bar{r}_i)^2}} \qquad (3)$$

where the summations over i are over the users who rated both profiles j and l. Each pair of common ratings of two profiles comes from a different user with a different rating scale. The adjusted version of the Pearson correlation subtracts each user's mean; otherwise the similarity computation would suffer from these differing rating scales.

When making the prediction p_{a,j} for the active user a, the ratings of the active user a for the k most similar neighbors of profile j are used:

$$p_{a,j} = \bar{r}_{\bullet,j} + \xi \sum_{i=1}^{k} \tilde{w}(j,n_i)\,(r_{a,n_i} - \bar{r}_{\bullet,n_i}) \qquad (4)$$

where n_i is the i-th most similar profile to profile j that user a rated, \xi is a normalizing factor, and \bar{r}_{\bullet,u} is the mean rating of profile u.

Similar to the User-User algorithm, our implementation of the Item-Item algorithm is also parametrized by two parameters: i) MinO, the minimum number of common ratings between profiles necessary to calculate item-item similarity, and ii) MaxN, the maximum number of item neighbors to be used during the computation. We refer to the Item-Item algorithm with parameters MinO=5, MaxN=50 simply as 'Item-Item (5,50)'.

4 Libimseti Dataset

The dataset we used consists of data from a real online dating service, Libimseti. A snapshot of the rating matrix from July 2005^6 was used in this study. In the dataset, users appear in two different roles: i) as active providers of ratings for photographs of others (we will denote them as 'users' in this situation) and ii) as objects on photographs when they are rated by others (we will refer to them in this situation as 'profiles').

Overall the dataset contains 194,439 users, who provided 11,767,448 ratings. The sparsity of the matrix is 0.03%. Table 1 compares our dataset to the Movielens [7] and Jester [6] datasets that are well known in the collaborative filtering research community. We can see that Libimseti is much larger but also sparser than the other two datasets. The distribution of the number of ratings provided by a single user and the distribution of the rating value are shown in Figure 1. The similarity distributions for ratings matrix rows and columns are shown in Figure 2.

^5 Please note that in the dating service environment, both users and items could be seen as representing users.
^6 http://www.ksi.ms.mff.cuni.cz/~petricek/data/

[...] server. It consists of a global data manager, a set of ColFi services and a communication module implementing CCP, the ColFi Communication Protocol. The data manager serves as the main data provider for all the components in the system and ColFi services expose an API to clients. Particular data manager and ColFi service implementations are provided as server plug-ins. Figure 3 summarizes the ColFi architecture design.

Fig. 3. ColFi architecture. The communication protocol between the data storage and the data manager is implementation dependent.
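The plug-in split described above (one data manager as the sole data provider, services that answer client requests from it) can be illustrated with a minimal Python sketch. It is an illustration only: the paper's system is written in Java, and the names `DataManager`, `CachedDataManager`, `MeanService`, `ratings_for_profile` and `predict` are hypothetical, not the ColFi API.

```python
from abc import ABC, abstractmethod


class DataManager(ABC):
    """Hypothetical plug-in interface: the single data provider of a server."""

    @abstractmethod
    def ratings_for_profile(self, profile_id):
        """Return the list of ratings given to a profile (possibly empty)."""


class CachedDataManager(DataManager):
    """Assumed 'cached' variant: keeps all ratings in memory."""

    def __init__(self, rows):
        # rows is an iterable of (user_id, profile_id, rating) triples.
        self._by_profile = {}
        for _user_id, profile_id, rating in rows:
            self._by_profile.setdefault(profile_id, []).append(rating)

    def ratings_for_profile(self, profile_id):
        return list(self._by_profile.get(profile_id, []))


class MeanService:
    """Hypothetical stateless service: each request reads only from the data manager."""

    def __init__(self, data_manager):
        self._data = data_manager

    def predict(self, profile_id):
        ratings = self._data.ratings_for_profile(profile_id)
        if not ratings:
            return None  # not enough data to predict
        return sum(ratings) / len(ratings)


manager = CachedDataManager([(1, 10, 8), (2, 10, 6), (1, 11, 3)])
service = MeanService(manager)
print(service.predict(10))
```

Because the service holds no per-request state, swapping `CachedDataManager` for a direct-access implementation would not require changing the service, which is the point of the plug-in design.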
The system was implemented as a stateless TCP/IP client-server solution to achieve simplicity, platform independence, scalability and good performance. Each ColFi service instance has its own unique TCP/IP port number and is independent of all the other ColFi services. Multiple client connections are therefore supported and handled in parallel.

We implemented the system in Java. The implementation allows easy addition of collaborative filtering algorithms. We use a lazy approach to similarity computations and store the similarities in a local cache object. A MySQL server is used as a back end but the database is fully abstracted in the code. For more details on the implementation see [2].

ColFi Communication Protocol is similar to the popular Java RMI. CCP is a very simple protocol and it supports only invocation of a limited set of remote methods. That makes it very fast (unlike SOAP web services) and it does not require special setup on the client side (unlike Java RMI, EJB, JSP, or Servlets).

Data Manager is the heart of the ColFi system. It is a general interface that provides all the other ColFi components with the data necessary for collaborative filtering. There is always exactly one data manager instance per server instance, in the form of a plug-in for the ColFi server, and the actual behavior can differ significantly between implementations. The data manager can, for example, cache data in memory, or it can directly access a database or a special file system. In some other cases, a read-only implementation may be desired. Although both cached and direct access are implemented, only the cached variants achieve reasonable performance. Direct access is generally usable only with small datasets.

ColFi Service is the only part of the system exposed to the outside world. Each service has its own TCP/IP server port number where it listens for incoming client requests. ColFi services are stateless, so each request is both atomic and isolated and should not affect any other requests. This allows simple parallel handling of multiple client connections on the server. The ColFi service is not limited to collaborative filtering; it can also provide statistics, for example.

6 Benchmarks

We compare the implemented algorithms on the Libimseti dataset using three types of cross-validation. In each of the scenarios we provide the algorithms with one subset of ratings for training and withhold another subset of ratings for testing the prediction accuracy.

6.1 Setup

We used three types of cross-validation: i) AllButOne, ii) GivenRandomX and iii) production. Each validation uses NMAE as a metric of prediction quality:

$$NMAE = \frac{\frac{1}{c}\sum_{k=1}^{c} |\tilde{p}^{k}_{ij} - r_{ij}|}{r_{max} - r_{min}} \qquad (5)$$

It is possible that some algorithms fail to predict certain ratings due to lack of data. These cases are ignored and not included in the NMAE computation^8 and only the total number of such unsuccessful predictions is reported.

AllButOne Validation is the most commonly used benchmark among many collaborative filtering papers [6,1]. It is the simplest protocol of the three. The idea behind this protocol is to select exactly one rating and hide it from the tested algorithm. The algorithm is then asked to predict that one hidden rating, and the prediction error is the absolute difference between the hidden rating value and the predicted rating value. We repeat this for every single rating in the test dataset.

GivenRandomX Validation is intended to examine the algorithm's performance with less data available from the active user. This situation is sometimes referred to as user cold-start and happens when a user is new to the system and has not provided enough ratings yet. The validation works on the subset of users who provided more than 100 ratings. We split the ratings into two: a training group T containing 99 random ratings from each user and a test set with the remaining ratings. Group T is then used to generate 99 training sets t_i for i = 1..99 in such a way that |t_i| = i and \forall i: t_i \subset t_{i+1}. These sets represent the ratings gradually provided by the active user. The result of the GivenRandomX validation is a graph of the NMAE values as a function of training set size i.

^8 Our experiments showed that this does not affect the ordering of algorithms.
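The NMAE metric of Eq. (5) and the nested training sets of the GivenRandomX protocol can be sketched as follows. This is a minimal illustration, not the authors' Java implementation: the function names, the use of `None` to mark a failed prediction, the fixed random seed and the 1..10 rating scale in the usage lines are all assumptions made for the example, and c is taken to be the number of successful predictions.

```python
import random


def nmae(pairs, r_min, r_max):
    """Normalized mean absolute error over (predicted, actual) pairs.

    Pairs whose prediction is None (assumed marker for "could not predict")
    are left out of the average and only counted.
    Returns (nmae_or_None, number_of_failed_predictions).
    """
    errors = [abs(p - r) for p, r in pairs if p is not None]
    failed = sum(1 for p, _ in pairs if p is None)
    if not errors:
        return None, failed
    mae = sum(errors) / len(errors)
    return mae / (r_max - r_min), failed


def nested_training_sets(user_ratings, size=99, seed=0):
    """Split one user's ratings into nested training sets and a test set.

    user_ratings is a list of (profile_id, rating) tuples with more than
    `size` entries. Returns (training_sets, test) where training_sets[i-1]
    has exactly i ratings and each set is contained in the next one.
    """
    shuffled = list(user_ratings)
    random.Random(seed).shuffle(shuffled)
    group_t, test = shuffled[:size], shuffled[size:]
    training_sets = [group_t[:i] for i in range(1, size + 1)]
    return training_sets, test


# Usage with made-up numbers on an assumed 1..10 scale.
score, failed = nmae([(7.0, 8), (None, 3), (2.5, 1)], r_min=1, r_max=10)
ratings = [(pid, 1 + pid % 10) for pid in range(120)]
training_sets, test = nested_training_sets(ratings)
print(score, failed, len(training_sets), len(test))
```

Taking prefixes of one shuffled list is a simple way to guarantee the nesting condition, so the NMAE curve over i reflects one user gradually adding ratings rather than 99 unrelated samples.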
Algorithms show improved performance in terms of NMAE values for the first few ratings inserted. After the user enters about 30-40 ratings, the algorithms' accuracies stabilize (the algorithm has enough information to derive the active user type). Overall, the User-User algorithm has generally the lowest NMAE values.

Fig. 4. GivenRandomX benchmark results. NMAE values based on the number of ratings given by a potential new user. The graph shows the behavior of the algorithms as they get more information about a particular user. Data for the Random algorithm are not displayed as it has a relatively high constant NMAE value of 38.27%.

Production Validation results are presented in Figure 5. Results for the worst performing Random algorithm are again not displayed in the NMAE graphs to improve the legibility of the figure. The performance ordering of algorithms corresponds to the AllButOne and GivenRandomX results. During the course of this experiment we also verified that the ColFi recommender can handle significant load at the level of real world application traffic.

7 User experiment

We conducted a web based experiment with human subjects. Each participant was asked to provide ratings for 150 random profiles. Based on the ratings provided we generated two recommendation lists of top 10 profiles. We tested the Random, Mean, and User-User algorithms. User-User (10,50)^9 was selected as the best performing collaborative algorithm, Mean represented the currently deployed functionality at Libimseti.cz and Random served as a baseline. Recommendation lists contained only profiles that the active user had not rated yet. Each [...]

^9 MinO=10, MaxN=50

[...] the Random algorithm (12.5% and 16.87%) are quite high, which indicates that perceived profile quality is clustered due to self-selection bias (good looking people are more likely to put their photo online). The surprising success of the Mean algorithm (35.62% wins against the collaborative filtering algorithm) suggests a strong universal preference that is shared by users; in fact this is a prerequisite for any beauty contest to be possible.

8 Discussion

We have shown that collaborative filtering based recommender systems can provide good recommendations to users of online dating services. We demonstrated this using cross-validation on a snapshot of a real world dating service. The User-User and Item-Item algorithms outperformed global popularity in terms of prediction NMAE (3.08% and 2.04% respectively). We also verified that this difference in recommendation quality is noticeable to users, who preferred recommendations by the User-User algorithm to global popularity recommendations.

A logical continuation of our work is a more complete evaluation of current state of the art recommender systems, including model based collaborative filtering methods, in the setting of dating services. These methods could also help to address scalability and performance issues. Domain specific improvements may be possible.

The user interface may introduce bias in the sense that users, instead of providing their personal preference, try to guess the global preference. This reduces the usefulness of the ratings provided. It remains an open issue how to best design an interface which motivates users to provide a sufficient amount of truthful ratings. Application of index structures to speed up nearest neighbor search is an interesting research direction.

Recommendations can be further improved by hybrid algorithms. These algorithms combine the collaborative filtering approach with content information. Another problem specific to dating is that "A likes B" does not imply "B likes A". Therefore each user should probably be presented with recommendations of users who are also interested in him/her. There is a need for reciprocal matching algorithms.

9 Acknowledgments

We would like to thank our three anonymous reviewers for their ideas and suggestions. We would also like to thank Oldrich Neuberger for providing the anonymized Libimseti dataset and Tomas Skopal for his valuable comments.

References

1. John S. Breese, David Heckerman, and Carl Kadie. Empirical analysis of predictive algorithms for collaborative filtering. Pages 43-52, 1998.