Chapter 9
Recommendation Systems

There is an extensive class of Web applications that involve predicting user responses to options. Such a facility is called a recommendation system. We shall begin this chapter with a survey of the most important examples of these systems. However, to bring the problem into focus, two good examples of recommendation systems are:

1. Offering news articles to on-line newspaper readers, based on a prediction of reader interests.

2. Offering customers of an on-line retailer suggestions about what they might like to buy, based on their past history of purchases and/or product searches.

Recommendation systems use a number of different technologies. We can classify these systems into two broad groups.

- Content-based systems examine properties of the items recommended. For instance, if a Netflix user has watched many cowboy movies, then recommend a movie classified in the database as having the "cowboy" genre.

- Collaborative filtering systems recommend items based on similarity measures between users and/or items. The items recommended to a user are those preferred by similar users. This sort of recommendation system can use the groundwork laid in Chapter 3 on similarity search and Chapter 7 on clustering. However, these technologies by themselves are not sufficient, and there are some new algorithms that have proved effective for recommendation systems.

9.1 A Model for Recommendation Systems

In this section we introduce a model for recommendation systems, based on a utility matrix of preferences. We introduce the concept of a "long tail,"
which explains the advantage of on-line vendors over conventional, brick-and-mortar vendors. We then briefly survey the sorts of applications in which recommendation systems have proved useful.

9.1.1 The Utility Matrix

In a recommendation-system application there are two classes of entities, which we shall refer to as users and items. Users have preferences for certain items, and these preferences must be teased out of the data. The data itself is represented as a utility matrix, giving for each user-item pair a value that represents what is known about the degree of preference of that user for that item. Values come from an ordered set, e.g., integers 1-5 representing the number of stars that the user gave as a rating for that item. We assume that the matrix is sparse, meaning that most entries are "unknown." An unknown rating implies that we have no explicit information about the user's preference for the item.

Example 9.1: In Fig. 9.1 we see an example utility matrix, representing users' ratings of movies on a 1-5 scale, with 5 the highest rating. Blanks represent the situation where the user has not rated the movie. The movie names are HP1, HP2, and HP3 for Harry Potter I, II, and III, TW for Twilight, and SW1, SW2, and SW3 for Star Wars episodes 1, 2, and 3. The users are represented by capital letters A through D.

       HP1  HP2  HP3  TW   SW1  SW2  SW3
    A   4              5    1
    B   5    5    4
    C                  2    4    5
    D             3              3

Figure 9.1: A utility matrix representing ratings of movies on a 1-5 scale

Notice that most user-movie pairs have blanks, meaning the user has not rated the movie. In practice, the matrix would be even sparser, with the typical user rating only a tiny fraction of all available movies. □

The goal of a recommendation system is to predict the blanks in the utility matrix. For example, would user A like SW2? There is little evidence from the tiny matrix in Fig. 9.1. We might design our recommendation system to take into account properties of movies, such as their producer, director, stars, or even the similarity of their names. If so, we might then note the similarity between SW1 and SW2, and then conclude that since A did not like SW1, they were unlikely to enjoy SW2 either. Alternatively, with much more data, we might observe that the people who rated both SW1 and SW2 tended to give them similar ratings. Thus, we could conclude that A would also give SW2 a low rating, similar to A's rating of SW1.

We should also be aware of a slightly different goal that makes sense in many applications. It is not necessary to predict every blank entry in a utility matrix. Rather, it is only necessary to discover some entries in each row that are likely to be high. In most applications, the recommendation system does not offer users a ranking of all items, but rather suggests a few that the user should value highly. It may not even be necessary to find all items with the highest expected ratings, but only to find a large subset of those with the highest ratings.

9.1.2 The Long Tail

Before discussing the principal applications of recommendation systems, let us ponder the long tail phenomenon that makes recommendation systems necessary. Physical delivery systems are characterized by a scarcity of resources. Brick-and-mortar stores have limited shelf space, and can show the customer only a small fraction of all the choices that exist. On the other hand, on-line stores can make anything that exists available to the customer. Thus, a physical bookstore may have several thousand books on its shelves, but Amazon offers millions of books. A physical newspaper can print several dozen articles per day, while on-line news services offer thousands per day.

Recommendation in the physical world is fairly simple. First, it is not possible to tailor the store to each individual customer. Thus, the choice of what is made available is governed only by the aggregate numbers. Typically, a bookstore will display only the books that are most popular, and a newspaper will print only the articles it believes the most people will be interested in. In the first case, sales figures govern the choices; in the second case, editorial judgement serves.

The distinction between the physical and on-line worlds has been called the long tail phenomenon, and it is suggested in Fig. 9.2. The vertical axis represents popularity (the number of times an item is chosen). The items are ordered on the horizontal axis according to their popularity. Physical institutions provide only the most popular items to the left of the vertical line, while the corresponding on-line institutions provide the entire range of items: the tail as well as the popular items.

The long-tail phenomenon forces on-line institutions to recommend items to individual users. It is not possible to present all available items to the user, the way physical institutions can. Neither can we expect users to have heard of each of the items they might like.

9.1.3 Applications of Recommendation Systems

We have mentioned several important applications of recommendation systems, but here we shall consolidate the list in a single place.

1. Product Recommendations: Perhaps the most important use of recommendation systems is at on-line retailers. We have noted how Amazon or similar on-line vendors strive to present each returning user with some
Figure 9.2: The long tail: physical institutions can only provide what is popular, while on-line institutions can make everything available

suggestions of products that they might like to buy. These suggestions are not random, but are based on the purchasing decisions made by similar customers or on other techniques we shall discuss in this chapter.

2. Movie Recommendations: Netflix offers its customers recommendations of movies they might like. These recommendations are based on ratings provided by users, much like the ratings suggested in the example utility matrix of Fig. 9.1. The importance of predicting ratings accurately is so high that Netflix offered a prize of one million dollars for the first algorithm that could beat its own recommendation system by 10%.¹ The prize was finally won in 2009, by a team of researchers called "Bellkor's Pragmatic Chaos," after over three years of competition.

3. News Articles: News services have attempted to identify articles of interest to readers, based on the articles that they have read in the past. The similarity might be based on the similarity of important words in the documents, or on the articles that are read by people with similar reading tastes. The same principles apply to recommending blogs from among the millions of blogs available, videos on YouTube, or other sites where content is provided regularly.

¹ To be exact, the algorithm had to have a root-mean-square error (RMSE) that was 10% less than the RMSE of the Netflix algorithm on a test set taken from actual ratings of Netflix users. To develop an algorithm, contestants were given a training set of data, also taken from actual Netflix data.
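The RMSE criterion mentioned in the footnote is simple to compute. Below is a minimal sketch in Python; the rating lists are invented for illustration. Under the prize rule, a challenger "beats" a baseline by 10% if its RMSE is at most 90% of the baseline's.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between parallel lists of ratings."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

# Invented held-out ratings and two sets of predictions, for illustration.
truth = [4, 5, 1, 3]
baseline = [3, 4, 2, 4]     # an incumbent recommender's guesses
challenger = [4, 4, 1, 3]   # a competitor's guesses

print(rmse(baseline, truth))      # 1.0
print(rmse(challenger, truth))    # 0.5
print(rmse(challenger, truth) <= 0.9 * rmse(baseline, truth))  # True
```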
Into Thin Air and Touching the Void

An extreme example of how the long tail, together with a well designed recommendation system, can influence events is the story told by Chris Anderson about a book called Touching the Void. This mountain-climbing book was not a big seller in its day, but many years after it was published, another book on the same topic, called Into Thin Air, was published. Amazon's recommendation system noticed a few people who bought both books, and started recommending Touching the Void to people who bought, or were considering, Into Thin Air. Had there been no on-line bookseller, Touching the Void might never have been seen by potential buyers, but in the on-line world, Touching the Void eventually became very popular in its own right, in fact, more so than Into Thin Air.

9.1.4 Populating the Utility Matrix

Without a utility matrix, it is almost impossible to recommend items. However, acquiring data from which to build a utility matrix is often difficult. There are two general approaches to discovering the value users place on items.

1. We can ask users to rate items. Movie ratings are generally obtained this way, and some on-line stores try to obtain ratings from their purchasers. Sites providing content, such as some news sites or YouTube, also ask users to rate items. This approach is limited in its effectiveness, since generally users are unwilling to provide responses, and the information from those who do may be biased by the very fact that it comes from people willing to provide ratings.

2. We can make inferences from users' behavior. Most obviously, if a user buys a product at Amazon, watches a movie on YouTube, or reads a news article, then the user can be said to "like" this item. Note that this sort of rating system really has only one value: 1 means that the user likes the item. Often, we find a utility matrix with this kind of data shown with 0's rather than blanks where the user has not purchased or viewed the item. However, in this case 0 is not a lower rating than 1; it is no rating at all. More generally, one can infer interest from behavior other than purchasing. For example, if an Amazon customer views information about an item, we can infer that they are interested in the item, even if they don't buy it.
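The second approach can be sketched in a few lines of Python. The purchase log below is invented; the point is that blanks are represented as absent entries rather than 0's, since the absence of a purchase is no rating at all.

```python
# A minimal sketch: infer a one-valued utility "matrix" from a
# hypothetical purchase log.  Unpurchased items are simply absent,
# not stored as 0, because 0 would wrongly suggest a low rating.
purchases = [
    ("Alice", "HP1"), ("Alice", "TW"),
    ("Bob", "HP1"), ("Bob", "HP2"), ("Bob", "HP3"),
]

utility = {}  # user -> set of items the user is inferred to "like"
for user, item in purchases:
    utility.setdefault(user, set()).add(item)

print(sorted(utility["Alice"]))   # ['HP1', 'TW']
print("SW1" in utility["Bob"])    # False: a blank, not a rating of 0
```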
9.2 Content-Based Recommendations

As we mentioned at the beginning of the chapter, there are two basic architectures for a recommendation system:

1. Content-based systems focus on properties of items. Similarity of items is determined by measuring the similarity in their properties.

2. Collaborative-filtering systems focus on the relationship between users and items. Similarity of items is determined by the similarity of the ratings of those items by the users who have rated both items.

In this section, we focus on content-based recommendation systems. The next section will cover collaborative filtering.

9.2.1 Item Profiles

In a content-based system, we must construct for each item a profile, which is a record or collection of records representing important characteristics of that item. In simple cases, the profile consists of some characteristics of the item that are easily discovered. For example, consider the features of a movie that might be relevant to a recommendation system.

1. The set of actors of the movie. Some viewers prefer movies with their favorite actors.

2. The director. Some viewers have a preference for the work of certain directors.

3. The year in which the movie was made. Some viewers prefer old movies; others watch only the latest releases.

4. The genre or general type of movie. Some viewers like only comedies, others dramas or romances.

There are many other features of movies that could be used as well. Except for the last, genre, the information is readily available from descriptions of movies. Genre is a vaguer concept. However, movie reviews generally assign a genre from a set of commonly used terms. For example, the Internet Movie Database (IMDB) assigns a genre or genres to every movie. We shall discuss mechanical construction of genres in Section 9.3.3.

Many other classes of items also allow us to obtain features from available data, even if that data must at some point be entered by hand. For instance, products often have descriptions written by the manufacturer, giving features relevant to that class of product (e.g., the screen size and cabinet color for a TV). Books have descriptions similar to those for movies, so we can obtain features such as author, year of publication, and genre. Music products such as CD's and MP3 downloads have available features such as artist, composer, and genre.

9.2.2 Discovering Features of Documents

There are other classes of items where it is not immediately apparent what the values of features should be. We shall consider two of them: document collections and images. Documents present special problems, and we shall discuss the technology for extracting features from documents in this section. Images will be discussed in Section 9.2.3 as an important example where user-supplied features have some hope of success.

There are many kinds of documents for which a recommendation system can be useful. For example, there are many news articles published each day, and we cannot read all of them. A recommendation system can suggest articles on topics a user is interested in, but how can we distinguish among topics? Web pages are also a collection of documents. Can we suggest pages a user might want to see? Likewise, blogs could be recommended to interested users, if we could classify blogs by topics.

Unfortunately, these classes of documents do not tend to have readily available information giving features. A substitute that has been useful in practice is the identification of words that characterize the topic of a document. How we do the identification was outlined in Section 1.3.1. First, eliminate stop words, the several hundred most common words, which tend to say little about the topic of a document. For the remaining words, compute the TF.IDF score for each word in the document. The ones with the highest scores are the words that characterize the document.

We may then take as the features of a document the n words with the highest TF.IDF scores. It is possible to pick n to be the same for all documents, or to let n be a fixed percentage of the words in the document. We could also choose to make all words whose TF.IDF scores are above a given threshold to be a part of the feature set.

Now, documents are represented by sets of words. Intuitively, we expect these words to express the subjects or main ideas of the document. For example, in a news article, we would expect the words with the highest TF.IDF score to include the names of people discussed in the article, unusual properties of the event described, and the location of the event. To measure the similarity of two documents, there are several natural distance measures we can use:

1. We could use the Jaccard distance between the
sets of words (recall Section 3.5.3).

2. We could use the cosine distance (recall Section 3.5.4) between the sets, treated as vectors.

Two Kinds of Document Similarity

Recall that in Section 3.4 we gave a method for finding documents that were "similar," using shingling, minhashing, and LSH. There, the notion of similarity was lexical: documents are similar if they contain large, identical sequences of characters. For recommendation systems, the notion of similarity is different. We are interested only in the occurrences of many important words in both documents, even if there is little lexical similarity between the documents. However, the methodology for finding similar documents remains almost the same. Once we have a distance measure, either Jaccard or cosine, we can use minhashing (for Jaccard) or random hyperplanes (for cosine distance; see Section 3.7.2), feeding data to an LSH algorithm to find the pairs of documents that are similar in the sense of sharing many common keywords.

To compute the cosine distance in option (2), think of the sets of high-TF.IDF words as a vector, with one component for each possible word. The vector has 1 if the word is in the set and 0 if not. Since between two documents there are only a finite number of words among their two sets, the infinite dimensionality of the vectors is unimportant. Almost all components are 0 in
both, and 0's do not impact the value of the dot product. To be precise, the dot product is the size of the intersection of the two sets of words, and the lengths of the vectors are the square roots of the numbers of words in each set. That calculation lets us compute the cosine of the angle between the vectors as the dot product divided by the product of the vector lengths.

9.2.3 Obtaining Item Features From Tags

Let us consider a database of images as an example of a way that features have been obtained for items. The problem with images is that their data, typically an array of pixels, does not tell us anything useful about their features. We can calculate simple properties of pixels, such as the average amount of red in the picture, but few users are looking for red pictures or especially like red pictures.

There have been a number of attempts to obtain information about features of items by inviting users to tag the items by entering words or phrases that describe the item. Thus, one picture with a lot of red might be tagged "Tiananmen Square," while another is tagged "sunset at Malibu." The distinction is not something that could be discovered by existing image-analysis programs.

Almost any kind of data can have its features described by tags. One of the earliest attempts to tag massive amounts of data was the site del.icio.us, later bought by Yahoo!, which invited users to tag Web pages. The goal of this tagging was to make a new method of search available, where users entered a set of tags as their search query, and the system retrieved the Web pages that had been tagged that way. However, it is also possible to use the tags as a recommendation system. If it is observed that a user retrieves or bookmarks many pages with a certain set of tags, then we can recommend other pages with the same tags.

The problem with tagging as an approach to feature discovery is that the
Tags from Computer Games

An interesting direction for encouraging tagging is the "games" approach pioneered by Luis von Ahn. He enabled two players to collaborate on the tag for an image. In rounds, they would suggest a tag, and the tags would be exchanged. If they agreed, then they "won," and if not, they would play another round with the same image, trying to agree simultaneously on a tag. While an innovative direction to try, it is questionable whether sufficient public interest can be generated to produce enough free work to satisfy the needs for tagged data.

process only works if users are willing to take the trouble to create the tags, and there are enough tags that occasional erroneous ones will not bias the system too much.

9.2.4 Representing Item Profiles

Our ultimate goal for content-based recommendation is to create both an item profile consisting of feature-value pairs and a user profile summarizing the preferences of the user, based on their row of the utility matrix. In Section 9.2.2 we suggested how an item profile could be constructed. We imagined a vector of 0's and 1's, where a 1 represented the occurrence of a high-TF.IDF word in the document. Since features for documents were all words, it was easy to represent profiles this way. We shall try to generalize this vector approach to all sorts of features.

It is easy to do so for features that are sets of discrete values. For example, if one feature of movies is the set of actors, then imagine that there is a component for each actor, with 1 if the actor is in the movie, and 0 if not. Likewise, we can have a component for each possible director, and each possible genre. All these features can be represented using only 0's and 1's.

There is another class of features that is not readily represented by Boolean vectors: those features that are numerical. For instance, we might take the average rating for movies to be a feature,² and this average is a real number. It does not make sense to have one component for each of the possible average ratings, and doing so would cause us to lose the structure implicit in numbers. That is, two ratings that are close but not identical should be considered more similar than widely differing ratings. Likewise, numerical features of products, such as screen size or disk capacity for PC's, should be considered similar if their values do not differ greatly. Numerical features should be represented by single components of vectors representing items. These components hold the exact value of that feature.

² The rating is not a very reliable feature, but it will serve as an example.

There is no harm if some components of the vectors are Boolean and others are real-valued or integer-valued. We can still compute the cosine distance between vectors, although if we do so, we should give some thought to the appropriate scaling of the non-Boolean components, so that they neither dominate the calculation nor are they irrelevant.

Example 9.2: Suppose the only features of movies are the set of actors and the average rating. Consider two movies with five actors each. Two of the actors are in both movies. Also, one movie has an average rating of 3 and the other an average of 4. The vectors look something like

    [0 1 1 0 1 1 0 1 3α]
    [1 1 0 1 0 1 1 0 4α]

However, there are in principle an infinite number of additional components, each with 0's for both vectors, representing all the possible actors that neither movie has. Since cosine distance of vectors is not affected by components in which both vectors have 0, we need not worry about the effect of actors that are in neither movie.

The last component shown represents the average rating. We have shown it as having an unknown scaling factor α. In terms of α, we can compute the cosine of the angle between the vectors. The dot product is 2 + 12α², and the lengths of the vectors are √(5 + 9α²) and √(5 + 16α²). Thus, the cosine of the angle between the vectors is

    (2 + 12α²) / √(25 + 125α² + 144α⁴)

If we choose α = 1, that is, we take the average ratings as they are, then the value of the above expression is 0.816. If we use α = 2, that is, we double the ratings, then the cosine is 0.940. That is, the vectors appear much closer in direction than if we use α = 1. Likewise, if we use α = 1/2, then the cosine is 0.619, making the vectors look quite different. We cannot tell which value of α is "right," but we see that the choice of scaling factor for numerical features affects our decision about how similar items are. □

9.2.5 User Profiles

We not only need to create vectors describing items; we need to create vectors with the same components that describe the user's preferences. We have the utility matrix representing the connection between users and items. Recall the nonblank matrix entries could be just 1's representing user purchases or a similar connection, or they could be arbitrary numbers representing a rating or degree of affection that the user has for the item.

With this information, the best estimate we can make regarding which items the user likes is some aggregation of the profiles of those items. If the utility matrix has only 1's, then the natural aggregate is the average of the components
of the vectors representing the item profiles for the items in which the utility matrix has 1 for that user.

Example 9.3: Suppose items are movies, represented by Boolean profiles with components corresponding to actors. Also, the utility matrix has a 1 if the user has seen the movie and is blank otherwise. If 20% of the movies that user U likes have Julia Roberts as one of the actors, then the user profile for U will have 0.2 in the component for Julia Roberts. □

If the utility matrix is not Boolean, e.g., ratings 1-5, then we can weight the vectors representing the profiles of items by the utility value. It makes sense to normalize the utilities by subtracting the average value for a user. That way, we get negative weights for items with a below-average rating, and positive weights for items with above-average ratings. That effect will prove useful when we discuss in Section 9.2.6 how to find items that a user should like.

Example 9.4: Consider the same movie information as in Example 9.3, but now suppose the utility matrix has nonblank entries that are ratings in the 1-5 range. Suppose user U gives an average rating of 3. There are three movies with Julia Roberts as an actor, and those movies got ratings of 3, 4, and 5. Then in the user profile of U, the component for Julia Roberts will have value that is the average of 3−3, 4−3, and 5−3, that is, a value of 1.

On the other hand, user V gives an average rating of 4, and has also rated three movies with Julia Roberts (it doesn't matter whether or not they are the same three movies U rated). User V gives these three movies ratings of 2, 3, and 5. The user profile for V has, in the component for Julia Roberts, the average of 2−4, 3−4, and 5−4, that is, the value
−2/3. □

9.2.6 Recommending Items to Users Based on Content

With profile vectors for both users and items, we can estimate the degree to which a user would prefer an item by computing the cosine distance between the user's and item's vectors. As in Example 9.2, we may wish to scale various components whose values are not Boolean. The random-hyperplane and locality-sensitive-hashing techniques can be used to place (just) item profiles in buckets. In that way, given a user to whom we want to recommend some items, we can apply the same two techniques, random hyperplanes and LSH, to determine in which buckets we must look for items that might have a small cosine distance from the user.

Example 9.5: Consider first the data of Example 9.3. The user's profile will have components for actors proportional to the likelihood that the actor will appear in a movie the user likes. Thus, the highest recommendations (lowest cosine distance) belong to the movies with lots of actors that appear in many of the movies the user likes. As long as actors are the only information we have about features of movies, that is probably the best we can do.³

Now, consider Example 9.4. There, we observed that the vector for a user will have positive numbers for actors that tend to appear in movies the user likes and negative numbers for actors that tend to appear in movies the user doesn't like. Consider a movie with many actors the user likes, and only a few or none that the user doesn't like. The cosine of the angle between the user's and movie's vectors will be a large positive fraction. That implies an angle close to 0, and therefore a small cosine distance between the vectors.

Next, consider a movie with about as many actors that the user likes as those the user doesn't like. In this situation, the cosine of the angle between the user and movie is around 0, and therefore the angle between the two vectors is around 90 degrees. Finally, consider a movie with mostly actors the user doesn't like. In that case, the cosine will be a large negative fraction, and the angle between the two vectors will be close to 180 degrees, the maximum possible cosine distance. □

9.2.7 Classification Algorithms

A completely different approach to a recommendation system using item profiles and utility matrices is to treat the problem as one of machine learning. Regard the given data as a training set, and for each user, build a classifier that predicts the rating of all items. There are a great number of different classifiers, and it is not our purpose to teach this subject here. However, you should be aware of the option of developing a classifier for recommendation, so we shall discuss one common classifier, decision trees, briefly.

A decision tree is a collection of nodes, arranged as a binary tree. The leaves render decisions; in our case, the decision would be "likes" or "doesn't like." Each interior node is a condition on the objects being classified; in our case the condition would be a predicate involving one or more features of an item.

To classify an item, we start at the root, and apply the predicate at the root to the item. If the predicate is true, go to the left child, and if it is false, go to the right child. Then repeat the same process at the node visited, until a leaf is reached. That leaf classifies the item as liked or not.

Construction of a decision tree requires selection of a predicate for each interior node. There are many ways of picking the best predicate, but they all try to arrange that one of the children gets all or most of the positive examples in the training set (i.e., the items that the given user likes, in our case) and the other child gets all or most of the negative examples (the items this user does not like).

³ Note that the fact all user-vector components will be small fractions does not affect the recommendation, since the cosine calculation involves dividing by the length of each vector. That is, user vectors will tend to be much shorter than movie vectors, but only the direction of vectors matters.
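The predicate-selection step just described can be sketched concretely. The sketch below scores each candidate single-keyword predicate by how many training items would land on the minority side once each child takes its majority label; this misclassification count is only one of the many possible selection criteria, and the articles and keywords here are invented.

```python
def split_errors(items, labels, keyword):
    """Misclassified items if we split on "keyword in item" and each
    child predicts the majority label of the examples it receives."""
    left = [lab for it, lab in zip(items, labels) if keyword in it]
    right = [lab for it, lab in zip(items, labels) if keyword not in it]
    minority = lambda g: min(g.count(True), g.count(False)) if g else 0
    return minority(left) + minority(right)

# Invented training set: items are keyword sets; True = the user liked it.
items = [{"homerun", "batter"}, {"homerun", "pitcher"},
         {"election"}, {"senate"}]
labels = [True, True, False, False]

candidates = {"homerun", "batter", "pitcher", "election", "senate"}
best = min(candidates, key=lambda k: split_errors(items, labels, k))
print(best, split_errors(items, labels, best))   # homerun 0
```

Splitting on "homerun" sends both liked articles left and both disliked ones right, so it misclassifies nothing; every other keyword leaves at least one example on the wrong side.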
Once we have selected a predicate for a node N, we divide the items into two groups: those that satisfy the predicate and those that do not. For each group, we again find the predicate that best separates the positive and negative examples in that group. These predicates are assigned to the children of N. This process of dividing the examples and building children can proceed to any number of levels. We can stop, and create a leaf, if the group of items for a node is homogeneous; i.e., they are all positive or all negative examples.

However, we may wish to stop and create a leaf with the majority decision for a group, even if the group contains both positive and negative examples. The reason is that the statistical significance of a small group may not be high enough to rely on. For that reason a variant strategy is to create an ensemble of decision trees, each using different predicates, but allow the trees to be deeper than what the available data justifies. Such trees are called overfitted. To classify an item, apply all the trees in the ensemble, and let them vote on the outcome. We shall not consider this option here, but give a simple hypothetical example of a decision tree.

Example 9.6: Suppose our items are news articles, and features are the high-TF.IDF words (keywords) in those documents. Further suppose there is a user U who likes articles about baseball, except articles about the New York Yankees. The row of the utility matrix for U has 1 if U has read the article and is blank if not. We shall take the 1's as "like" and the blanks as "doesn't like." Predicates will be Boolean expressions of keywords.

Since U generally likes baseball, we might find that the best predicate for the root is "homerun" OR ("batter" AND "pitcher"). Items that satisfy the predicate will tend to be positive examples (articles with 1 in the row for U in the utility matrix), and items that fail to satisfy the predicate will tend to be negative examples (blanks in the utility-matrix row for U). Figure 9.3 shows the root as well as the rest of the decision tree.

Suppose that the group of articles that do not satisfy the predicate includes sufficiently few positive examples that we can conclude all of these items are in the "don't-like" class. We may then put a leaf with decision "don't like" as the right child of the root. However, the articles that satisfy the predicate include a number of articles that user U doesn't like; these are the articles that mention the Yankees. Thus, at the left child of the root, we build another predicate. We might find that the predicate "Yankees" OR "Jeter" OR "Teixeira" is the best possible indicator of an article about baseball and about the Yankees. Thus, we see in Fig. 9.3 the left child of the root, which applies this predicate. Both children of this node are leaves, since we may suppose that the items satisfying this predicate are predominantly negative and those not satisfying it are predominantly positive. □

Figure 9.3: A decision tree

Unfortunately, classifiers of all types tend to take a long time to construct. For instance, if we wish to use decision trees, we need one tree per user. Constructing a tree not only requires that we look at all the item profiles, but we have to consider many different predicates, which could involve complex combinations of features. Thus, this approach tends to be used only for relatively small problem sizes.

9.2.8 Exercises for Section 9.2

Exercise 9.2.1: Three computers, A, B, and C, have the numerical features listed below:

    Feature               A     B     C
    Processor Speed      3.06  2.68  2.92
    Disk Size             500   320   640
    Main-Memory Size        6     4     6

We may imagine these values as defining a vector for each computer; for instance, A's vector is [3.06, 500, 6]. We can compute the cosine distance between any two of the vectors, but if we do not scale the components, then the disk size will dominate and make differences in the other components essentially invisible. Let us use 1 as the scale factor for processor speed, α for the disk size, and β for the main memory size.

(a) In terms of α and β, compute the cosines of the angles between the vectors for each pair of the three computers.

(b) What are the angles between the vectors if α = β = 1?

(c) What are the angles between the vectors if α = 0.01 and β = 0.5?
9.3.COLLABORATIVEFILTERING333!(d)Onefairwayofselectingscalefactorsistomakeeachinverselypropor-tionaltotheaveragevalueinitscomponent.Whatwouldbethevaluesofand,andwhatwouldbetheanglesbetweenthevectors?Exercise9.2.2:Analternativewayofscalingcomponentsofavectoristobeginbynormalizingthevectors.Thatis,computetheaverageforeachcom-ponentandsubtractitfromthatcomponent'svalueineachofthevectors.(a)NormalizethevectorsforthethreecomputersdescribedinExercise9.2.1.!!(b)Thisquestiondoesnotrequiredicultcalculation,butitrequiressomeseriousthoughtaboutwhatanglesbetweenvectorsmean.Whenallcom-ponentsarenonnegative,astheyareinthedataofExercise9.2.1,novectorscanhaveananglegreaterthan90degrees.However,whenwenormalizevectors,wecan(andmust)getsomenegativecomponents,sotheanglescannowbeanything,thatis,0to180degrees.Moreover,averagesarenow0ineverycomponent,sothesuggestioninpart(d)ofExercise9.2.1thatweshouldscaleininverseproportiontotheaveragemakesnosense.Suggestawayofndinganappropriatescaleforeachcomponentofnormalizedvectors.Howwouldyouinterpretalargeorsmallanglebetweennormalizedvectors?WhatwouldtheanglesbeforthenormalizedvectorsderivedfromthedatainExercise9.2.1?Exercise9.2.3:AcertainuserhasratedthethreecomputersofExercise9.2.1asfollows:A:4stars,B:2stars,C:5stars.(a)Normalizetheratingsforthisuser.(b)Computeauserprolefortheuser,withcomponentsforprocessorspeed,disksize,andmainmemorysize,basedonthedataofExercise9.2.1.9.3CollaborativeFilteringWeshallnowtakeupasignicantlydierentapproachtorecommendation.Insteadofusingfeaturesofitemstodeterminetheirsimilarity,wefocusonthesimilarityoftheuserratingsfortwoitems.Thatis,inplaceoftheitem-prolevectorforanitem,weuseitscolumnintheutilitymatrix.Further,insteadofcontrivingaprolevectorforusers,werepresentthembytheirrowsintheutilitymatrix.UsersaresimilariftheirvectorsarecloseaccordingtosomedistancemeasuresuchasJaccardorcosinedistance.RecommendationforauserUisthenmadebylookingattheusersthataremostsimilartoUinthissense,andrecommendingitemsthattheseuserslike.Theproce
ssofidentifyingsimilarusersandrecommendingwhatsimilaruserslikeiscalledcollaborativeltering 334CHAPTER9.RECOMMENDATIONSYSTEMS9.3.1MeasuringSimilarityTherstquestionwemustdealwithishowtomeasuresimilarityofusersoritemsfromtheirrowsorcolumnsintheutilitymatrix.WehavereproducedFig.9.1hereasFig.9.4.Thisdataistoosmalltodrawanyreliableconclusions,butitssmallsizewillmakeclearsomeofthepitfallsinpickingadistancemeasure.ObservespecicallytheusersAandC.Theyratedtwomoviesincommon,buttheyappeartohavealmostdiametricallyoppositeopinionsofthesemovies.Wewouldexpectthatagooddistancemeasurewouldmakethemratherfarapart.Herearesomealternativemeasurestoconsider. HP1HP2HP3TWSW1SW2SW3 A 451B 554C 245D 33Figure9.4:TheutilitymatrixintroducedinFig.9.1JaccardDistanceWecouldignorevaluesinthematrixandfocusonlyonthesetsofitemsrated.Iftheutilitymatrixonlyre\rectedpurchases,thismeasurewouldbeagoodonetochoose.However,whenutilitiesaremoredetailedratings,theJaccarddistancelosesimportantinformation.Example9.7:AandBhaveanintersectionofsize1andaunionofsize5.Thus,theirJaccardsimilarityis1/5,andtheirJaccarddistanceis4/5;i.e.,theyareveryfarapart.Incomparison,AandChaveaJaccardsimilarityof2/4,sotheirJaccarddistanceisthesame,1/2.Thus,AappearsclosertoCthantoB.Yetthatconclusionseemsintuitivelywrong.AandCdisagreeonthetwomoviestheybothwatched,whileAandBseembothtohavelikedtheonemovietheywatchedincommon.2CosineDistanceWecantreatblanksasa0value.Thischoiceisquestionable,sinceithastheeectoftreatingthelackofaratingasmoresimilartodislikingthemoviethanlikingit.Example9.8:ThecosineoftheanglebetweenAandBis45 p 42+52+12p 52+52+42=0380 9.3.COLLABORATIVEFILTERING335ThecosineoftheanglebetweenAandCis52+14 p 42+52+12p 
Since a larger (positive) cosine implies a smaller angle and therefore a smaller distance, this measure tells us that A is slightly closer to B than to C. □

Rounding the Data

We could try to eliminate the apparent similarity between movies a user rates highly and those with low scores by rounding the ratings. For instance, we could consider ratings of 3, 4, and 5 as a "1" and consider ratings 1 and 2 as unrated. The utility matrix would then look as in Fig. 9.5. Now, the Jaccard distance between A and B is 3/4, while between A and C it is 1; i.e., C appears further from A than B does, which is intuitively correct. Applying cosine distance to Fig. 9.5 allows us to draw the same conclusion.

         HP1  HP2  HP3  TW   SW1  SW2  SW3
    A     1              1
    B     1    1    1
    C                         1    1
    D          1                        1

Figure 9.5: Utilities of 3, 4, and 5 have been replaced by 1, while ratings of 1 and 2 are omitted

Normalizing Ratings

If we normalize ratings, by subtracting from each rating the average rating of that user, we turn low ratings into negative numbers and high ratings into positive numbers. If we then take the cosine distance, we find that users with opposite views of the movies they viewed in common will have vectors in almost opposite directions, and can be considered as far apart as possible. However, users with similar opinions about the movies rated in common will have a relatively small angle between them.

Example 9.9 : Figure 9.6 shows the matrix of Fig. 9.4 with all ratings normalized. An interesting effect is that D's ratings have effectively disappeared, because a 0 is the same as a blank when cosine distance is computed. Note that D gave only 3's and did not differentiate among movies, so it is quite possible that D's opinions are not worth taking seriously.

         HP1   HP2   HP3    TW    SW1   SW2   SW3
    A    2/3                5/3  −7/3
    B    1/3   1/3  −2/3
    C                      −5/3   1/3   4/3
    D           0                              0

Figure 9.6: The utility matrix of Fig. 9.4, normalized

Let us compute the cosine of the angle between A and B:

    (2/3)(1/3) / (√((2/3)² + (5/3)² + (−7/3)²) √((1/3)² + (1/3)² + (−2/3)²)) = 0.092

The cosine of the angle between A and C is

    ((5/3)(−5/3) + (−7/3)(1/3)) / (√((2/3)² + (5/3)² + (−7/3)²) √((−5/3)² + (1/3)² + (4/3)²)) = −0.559
Notice that under this measure, A and C are much further apart than A and B, and neither pair is very close. Both these observations make intuitive sense, given that A and C disagree on the two movies they rated in common, while A and B give similar scores to the one movie they rated in common. □

9.3.2 The Duality of Similarity

The utility matrix can be viewed as telling us about users or about items, or both. It is important to realize that any of the techniques we suggested in Section 9.3.1 for finding similar users can be used on columns of the utility matrix to find similar items. There are two ways in which the symmetry is broken in practice.

1. We can use information about users to recommend items. That is, given a user, we can find some number of the most similar users, perhaps using the techniques of Chapter 3. We can base our recommendation on the decisions made by these similar users, e.g., recommend the items that the greatest number of them have purchased or rated highly. However, there is no symmetry. Even if we find pairs of similar items, we need to take an additional step in order to recommend items to users. This point is explored further at the end of this subsection.

2. There is a difference in the typical behavior of users and items, as it pertains to similarity. Intuitively, items tend to be classifiable in simple terms. For example, music tends to belong to a single genre. It is impossible, e.g., for a piece of music to be both 60's rock and 1700's baroque. On the other hand, there are individuals who like both 60's rock and 1700's baroque, and who buy examples of both types of music. The consequence is that it is easier to discover items that are similar because they belong to the same genre, than it is to detect that two users are similar because they prefer one genre in common, while each also likes some genres that the other doesn't care for.
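Before applying these ideas, the distance measures of Section 9.3.1 are easy to check numerically. The following sketch is ours, not code from the text; it represents the matrix of Fig. 9.4 as rows keyed by user, with None for blanks, and reproduces the numbers of Examples 9.7 through 9.9.

```python
from math import sqrt

# The utility matrix of Fig. 9.4; None marks a blank (no rating).
# Columns: HP1, HP2, HP3, TW, SW1, SW2, SW3.
M = {
    "A": [4, None, None, 5, 1, None, None],
    "B": [5, 5, 4, None, None, None, None],
    "C": [None, None, None, 2, 4, 5, None],
    "D": [None, 3, None, None, None, None, 3],
}

def jaccard_distance(u, v):
    # Ignore the rating values; compare only the sets of items rated.
    su = {i for i, x in enumerate(u) if x is not None}
    sv = {i for i, x in enumerate(v) if x is not None}
    return 1 - len(su & sv) / len(su | sv)

def cosine(u, v):
    # Treat blanks as 0, as in Example 9.8.
    u = [0 if x is None else x for x in u]
    v = [0 if x is None else x for x in v]
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def normalize(u):
    # Subtract the user's average rating from each nonblank entry.
    rated = [x for x in u if x is not None]
    avg = sum(rated) / len(rated)
    return [None if x is None else x - avg for x in u]

print(jaccard_distance(M["A"], M["B"]))                        # 0.8, i.e., 4/5
print(round(cosine(M["A"], M["B"]), 3))                        # 0.38
print(round(cosine(M["A"], M["C"]), 3))                        # 0.322
print(round(cosine(normalize(M["A"]), normalize(M["B"])), 3))  # 0.092
print(round(cosine(normalize(M["A"]), normalize(M["C"])), 3))  # -0.559
```

The rounded utilities of Fig. 9.5 can be fed through the same jaccard_distance function after replacing ratings of 3, 4, and 5 by 1 and the rest by None.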
As we suggested in (1) above, one way of predicting the value of the utility-matrix entry for user U and item I is to find the n users (for some predetermined n) most similar to U and average their ratings for item I, counting only those among the n similar users who have rated I. It is generally better to normalize the matrix first. That is, for each of the n users, subtract their average rating for items from their rating for I. Average the difference for those users who have rated I, and then add this average to the average rating that U gives for all items. This normalization adjusts the estimate in the case that U tends to give very high or very low ratings, or a large fraction of the similar users who rated I (of which there may be only a few) are users who tend to rate very high or very low.

Dually, we can use item similarity to estimate the entry for user U and item I. Find the m items most similar to I, for some m, and take the average rating, among the m items, of the ratings that U has given. As for user-user similarity, we consider only those items among the m that U has rated, and it is probably wise to normalize item ratings first.

Note that whichever approach to estimating entries in the utility matrix we use, it is not sufficient to find only one entry. In order to recommend items to a user U, we need to estimate every entry in the row of the utility matrix for U, or at least find all or most of the entries in that row that are blank but have a high estimated value.

There is a tradeoff regarding whether we should work from similar users or similar items. If we find similar users, then we only have to do the process once for user U. From the set of similar users we can estimate all the blanks in the utility matrix for U. If we work from similar items, we have to compute similar items for almost all items, before we can estimate the row for U. On the other hand, item-item similarity often provides more reliable information, because of the phenomenon observed above, namely that it is easier to find items of the same genre than it is to find users that like only items of a single genre.

Whichever method we choose, we should precompute preferred items for each user, rather than waiting until we need to make a decision. Since the utility matrix evolves slowly, it is generally sufficient to compute it infrequently and assume that it remains fixed between recomputations.
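The user-based estimate just described (find similar users, average their normalized ratings for I, then add back U's average) can be sketched as follows. The toy matrix and all names here are our own invention, not data from the text; similarity is measured by the cosine over co-rated entries.

```python
from math import sqrt

# A hypothetical toy utility matrix: rows are users, columns items; None = blank.
M = {
    "U": [5, 4, None, 1],
    "V": [5, 5, 4, 2],
    "W": [1, 2, 5, 4],
    "X": [4, 4, 3, None],
}

def avg(row):
    rated = [x for x in row if x is not None]
    return sum(rated) / len(rated)

def cosine_sim(u, v):
    # Cosine over the items both users rated.
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if not pairs:
        return 0.0
    dot = sum(a * b for a, b in pairs)
    nu = sqrt(sum(a * a for a, _ in pairs))
    nv = sqrt(sum(b * b for _, b in pairs))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(user, item, n=2):
    """Estimate M[user][item] from the n most similar users who rated the item."""
    others = [w for w in M if w != user and M[w][item] is not None]
    others.sort(key=lambda w: cosine_sim(M[user], M[w]), reverse=True)
    neighbors = others[:n]
    # Average of the neighbors' deviations from their own mean rating...
    dev = sum(M[w][item] - avg(M[w]) for w in neighbors) / len(neighbors)
    # ...added back onto the target user's mean rating.
    return avg(M[user]) + dev

print(round(predict("U", 2), 2))  # 3.0
```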
9.3.3 Clustering Users and Items

It is hard to detect similarity among either items or users, because we have little information about user-item pairs in the sparse utility matrix. In the perspective of Section 9.3.2, even if two items belong to the same genre, there are likely to be very few users who bought or rated both. Likewise, even if two users both like a genre or genres, they may not have bought any items in common.

One way of dealing with this pitfall is to cluster items and/or users. Select any of the distance measures suggested in Section 9.3.1, or any other distance measure, and use it to perform a clustering of, say, items. Any of the methods suggested in Chapter 7 can be used. However, we shall see that there may be little reason to try to cluster into a small number of clusters immediately. Rather, a hierarchical approach, where we leave many clusters unmerged, may suffice as a first step. For example, we might leave half as many clusters as there are items.

         HP    TW    SW
    A     4     5     1
    B    4.67
    C           2    4.5
    D     3           3

Figure 9.7: Utility matrix for users and clusters of items

Example 9.10 : Figure 9.7 shows what happens to the utility matrix of Fig. 9.4 if we manage to cluster the three Harry-Potter movies into one cluster, denoted HP, and also cluster the three Star-Wars movies into one cluster SW. □

Having clustered items to an extent, we can revise the utility matrix so the columns represent clusters of items, and the entry for user U and cluster C is the average rating that U gave to those members of cluster C that U did rate. Note that U may have rated none of the cluster members, in which case the entry for U and C is still blank.

We can use this revised utility matrix to cluster users, again using the distance measure we consider most appropriate. Use a clustering algorithm that again leaves many clusters, e.g., half as many clusters as there are users. Revise the utility matrix, so the rows correspond to clusters of users, just as the columns correspond to clusters of items. As for item-clusters, compute the entry for a user cluster by averaging the ratings of the users in the cluster.

Now, this process can be repeated several times if we like. That is, we can cluster the item clusters and again merge the columns of the utility matrix that belong to one cluster. We can then turn to the users again, and cluster the user clusters. The process can repeat until we have an intuitively reasonable number of clusters of each kind.
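The construction of a cluster-level utility matrix, as in Example 9.10, can be sketched as follows; the dictionary representation and function name are our own choices, not from the text.

```python
# Collapse the columns of Fig. 9.4 into the item clusters of Example 9.10:
# HP = {HP1, HP2, HP3}, TW = {TW}, SW = {SW1, SW2, SW3}.
M = {
    "A": [4, None, None, 5, 1, None, None],
    "B": [5, 5, 4, None, None, None, None],
    "C": [None, None, None, 2, 4, 5, None],
    "D": [None, 3, None, None, None, None, 3],
}
clusters = {"HP": [0, 1, 2], "TW": [3], "SW": [4, 5, 6]}

def cluster_matrix(M, clusters):
    out = {}
    for user, row in M.items():
        out[user] = {}
        for name, cols in clusters.items():
            rated = [row[c] for c in cols if row[c] is not None]
            # The entry stays blank (None) if the user rated nothing in the cluster.
            out[user][name] = round(sum(rated) / len(rated), 2) if rated else None
    return out

print(cluster_matrix(M, clusters))
# Row B becomes {'HP': 4.67, 'TW': None, 'SW': None}, matching Fig. 9.7.
```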
Once we have clustered the users and/or items to the desired extent and computed the cluster-cluster utility matrix, we can estimate entries in the original utility matrix as follows. Suppose we want to predict the entry for user U and item I:

(a) Find the clusters to which U and I belong, say clusters C and D, respectively.

(b) If the entry in the cluster-cluster utility matrix for C and D is something other than blank, use this value as the estimated value for the U–I entry in the original utility matrix.

(c) If the entry for C–D is blank, then use the method outlined in Section 9.3.2 to estimate that entry by considering clusters similar to C or D. Use the resulting estimate as the estimate for the U–I entry.

9.3.4 Exercises for Section 9.3

          a    b    c    d    e    f    g    h
    A     4    5         5    1         3    2
    B          3    4    3    1    2    1
    C     2         1    3         4    5    3

Figure 9.8: A utility matrix for exercises

Exercise 9.3.1 : Figure 9.8 is a utility matrix, representing the ratings, on a 1–5 star scale, of eight items, a through h, by three users A, B, and C. Compute the following from the data of this matrix.

(a) Treating the utility matrix as boolean, compute the Jaccard distance between each pair of users.

(b) Repeat Part (a), but use the cosine distance.

(c) Treat ratings of 3, 4, and 5 as 1 and 1, 2, and blank as 0. Compute the Jaccard distance between each pair of users.

(d) Repeat Part (c), but use the cosine distance.

(e) Normalize the matrix by subtracting from each nonblank entry the average value for its user.

(f) Using the normalized matrix from Part (e), compute the cosine distance between each pair of users.

Exercise 9.3.2 : In this exercise, we cluster items in the matrix of Fig. 9.8. Do the following steps.

(a) Cluster the eight items hierarchically into four clusters. The following method should be used to cluster. Replace all 3's, 4's, and 5's by 1 and replace 1's, 2's, and blanks by 0. Use the Jaccard distance to measure the distance between the resulting column vectors. For clusters of more than one element, take the distance between clusters to be the minimum distance between pairs of elements, one from each cluster.
(b) Then, construct from the original matrix of Fig. 9.8 a new matrix whose rows correspond to users, as before, and whose columns correspond to clusters. Compute the entry for a user and cluster of items by averaging the nonblank entries for that user and all the items in the cluster.

(c) Compute the cosine distance between each pair of users, according to your matrix from Part (b).

9.4 Dimensionality Reduction

An entirely different approach to estimating the blank entries in the utility matrix is to conjecture that the utility matrix is actually the product of two long, thin matrices. This view makes sense if there is a relatively small set of features of items and users that determines the reaction of most users to most items. In this section, we sketch one approach to discovering two such matrices; the approach is called "UV-decomposition," and it is an instance of a more general theory called SVD (singular-value decomposition).

9.4.1 UV-Decomposition

Consider movies as a case in point. Most users respond to a small number of features; they like certain genres, they may have certain famous actors or actresses that they like, and perhaps there are a few directors with a significant following. If we start with the utility matrix M, with n rows and m columns (i.e., there are n users and m items), then we might be able to find a matrix U with n rows and d columns and a matrix V with d rows and m columns, such that UV closely approximates M in those entries where M is nonblank. If so, then we have established that there are d dimensions that allow us to characterize both users and items closely. We can then use the entry in the product UV to estimate the corresponding blank entry in utility matrix M. This process is called UV-decomposition of M.

    5 2 4 4 3       u11 u12
    3 1 2 4 1       u21 u22      v11 v12 v13 v14 v15
    2   3 1 4   =   u31 u32   ×  v21 v22 v23 v24 v25
    2 5 4 3 5       u41 u42
    4 4 5 4         u51 u52

Figure 9.9: UV-decomposition of matrix M

Example 9.11 : We shall use as a running example a 5-by-5 matrix M with all but two of its entries known. We wish to decompose M into a 5-by-2 and 2-by-5 matrix, U and V, respectively. The matrices M, U, and V are shown in Fig. 9.9 with the known entries of M indicated and the matrices U and
V shown with their entries as variables to be determined. This example is essentially the smallest nontrivial case where there are more known entries than there are entries in U and V combined, and we therefore can expect that the best decomposition will not yield a product that agrees exactly in the nonblank entries of M. □

9.4.2 Root-Mean-Square Error

While we can pick among several measures of how close the product UV is to M, the typical choice is the root-mean-square error (RMSE), where we

1. Sum, over all nonblank entries in M, the square of the difference between that entry and the corresponding entry in the product UV.

2. Take the mean (average) of these squares by dividing by the number of terms in the sum (i.e., the number of nonblank entries in M).

3. Take the square root of the mean.

Minimizing the sum of the squares is the same as minimizing the square root of the average square, so we generally omit the last two steps in our running example.

    1 1                        2 2 2 2 2
    1 1                        2 2 2 2 2
    1 1   ×   1 1 1 1 1   =    2 2 2 2 2
    1 1       1 1 1 1 1        2 2 2 2 2
    1 1                        2 2 2 2 2

Figure 9.10: Matrices U and V with all entries 1

Example 9.12 : Suppose we guess that U and V should each have entries that are all 1's, as shown in Fig. 9.10. This is a poor guess, since the product, consisting of all 2's, has entries that are much below the average of the entries in M. Nonetheless, we can compute the RMSE for this U and V; in fact the regularity in the entries makes the calculation especially easy to follow. Consider the first rows of M and UV. We subtract 2 (each entry in UV) from the entries in the first row of M, to get 3, 0, 2, 2, 1. We square and sum these to get 18. In the second row, we do the same to get 1, −1, 0, 2, −1; we square and sum to get 7. In the third row, the second column is blank, so that entry is ignored when computing the RMSE. The differences are 0, 1, −1, 2 and the sum of squares is 6. For the fourth row, the differences are 0, 3, 2, 1, 3 and the sum of squares is 23. The fifth row has a blank entry in the last column, so the differences are 2, 2, 3, 2 and the sum of squares is 21. When we sum the sums from each of the five rows, we get 18 + 7 + 6 + 23 + 21 = 75. Generally, we shall stop at this point, but if we want to compute the true RMSE, we divide by 23 (the number of nonblank entries in M) and take the square root. In this case √(75/23) = 1.806 is the RMSE. □
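Example 9.12 can be checked with a short sketch of our own (None marks the two unknown entries of M; the function names are invented for illustration):

```python
from math import sqrt

# The matrix M of Fig. 9.9; None marks the two unknown entries.
M = [
    [5, 2, 4, 4, 3],
    [3, 1, 2, 4, 1],
    [2, None, 3, 1, 4],
    [2, 5, 4, 3, 5],
    [4, 4, 5, 4, None],
]

def matmul(U, V):
    return [[sum(U[i][k] * V[k][j] for k in range(len(V)))
             for j in range(len(V[0]))] for i in range(len(U))]

def rmse(M, P):
    # Sum squared differences over the nonblank entries of M only.
    errs = [(m - p) ** 2
            for mrow, prow in zip(M, P)
            for m, p in zip(mrow, prow) if m is not None]
    return sqrt(sum(errs) / len(errs))

U = [[1, 1] for _ in range(5)]
V = [[1] * 5, [1] * 5]
P = matmul(U, V)             # every entry of UV is 2
print(round(rmse(M, P), 3))  # 1.806, as in Example 9.12
```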
9.4.3 Incremental Computation of a UV-Decomposition

Finding the UV-decomposition with the least RMSE involves starting with some arbitrarily chosen U and V, and repeatedly adjusting U and V to make the RMSE smaller. We shall consider only adjustments to a single element of U or V, although in principle, one could make more complex adjustments. Whatever adjustments we allow, in a typical example there will be many local minima, that is, matrices U and V such that no allowable adjustment reduces the RMSE. Unfortunately, only one of these local minima will be the global minimum: the matrices U and V that produce the least possible RMSE. To increase our chances of finding the global minimum, we need to pick many different starting points, that is, different choices of the initial matrices U and V. However, there is never a guarantee that our best local minimum will be the global minimum.

We shall start with the U and V of Fig. 9.10, where all entries are 1, and do a few adjustments to some of the entries, finding the values of those entries that give the largest possible improvement to the RMSE. From these examples, the general calculation should become obvious, but we shall follow the examples by the formula for minimizing the RMSE by changing a single entry. In what follows, we shall refer to entries of U and V by their variable names u11, and so on, as given in Fig. 9.9.

Example 9.13 : Suppose we start with U and V as in Fig. 9.10, and we decide to alter u11 to reduce the RMSE as much as possible. Let the value of u11 be x. Then the new U and V can be expressed as in Fig. 9.11.

    x 1                        x+1 x+1 x+1 x+1 x+1
    1 1                        2   2   2   2   2
    1 1   ×   1 1 1 1 1   =    2   2   2   2   2
    1 1       1 1 1 1 1        2   2   2   2   2
    1 1                        2   2   2   2   2

Figure 9.11: Making u11 a variable

Notice that the only entries of the product that have changed are those in the first row. Thus, when we compare UV with M, the only change to the RMSE comes from the first row. The contribution to the sum of squares from the first row is

    (5 − (x+1))² + (2 − (x+1))² + (4 − (x+1))² + (4 − (x+1))² + (3 − (x+1))²

This sum simplifies to

    (4 − x)² + (1 − x)² + (3 − x)² + (3 − x)² + (2 − x)²

We want the value of x that minimizes the sum, so we take the derivative and set that equal to 0, as:

    −2 × [(4 − x) + (1 − x) + (3 − x) + (3 − x) + (2 − x)] = 0

or −2 × (13 − 5x) = 0, from which it follows that x = 2.6.
    2.6 1                        3.6 3.6 3.6 3.6 3.6
    1   1                        2   2   2   2   2
    1   1   ×   1 1 1 1 1   =    2   2   2   2   2
    1   1       1 1 1 1 1        2   2   2   2   2
    1   1                        2   2   2   2   2

Figure 9.12: The best value for u11 is found to be 2.6

Figure 9.12 shows U and V after u11 has been set to 2.6. Note that the sum of the squares of the errors in the first row has been reduced from 18 to 5.2, so the total RMSE (ignoring average and square root) has been reduced from 75 to 62.2.

    2.6 1                        2.6y+1 3.6 3.6 3.6 3.6
    1   1                        y+1    2   2   2   2
    1   1   ×   y 1 1 1 1   =    y+1    2   2   2   2
    1   1       1 1 1 1 1        y+1    2   2   2   2
    1   1                        y+1    2   2   2   2

Figure 9.13: v11 becomes a variable y

Suppose our next entry to vary is v11. Let the value of this entry be y, as suggested in Fig. 9.13. Only the first column of the product is affected by y, so we need only to compute the sum of the squares of the differences between the entries in the first columns of M and UV. This sum is

    (5 − (2.6y+1))² + (3 − (y+1))² + (2 − (y+1))² + (2 − (y+1))² + (4 − (y+1))²

This expression simplifies to

    (4 − 2.6y)² + (2 − y)² + (1 − y)² + (1 − y)² + (3 − y)²

As before, we find the minimum value of this expression by differentiating and equating to 0, as:

    −2 × [2.6(4 − 2.6y) + (2 − y) + (1 − y) + (1 − y) + (3 − y)] = 0

    2.6   1                            5.204 3.6 3.6 3.6 3.6
    1     1                            2.617 2   2   2   2
    1     1   ×   1.617 1 1 1 1   =    2.617 2   2   2   2
    1     1       1     1 1 1 1        2.617 2   2   2   2
    1     1                            2.617 2   2   2   2

Figure 9.14: Replace y by 1.617

The solution for y is y = 17.4/10.76 = 1.617. The improved estimates of U and V are shown in Fig. 9.14.

We shall do one more change, to illustrate what happens when entries of M are blank. We shall vary u31, calling it z temporarily. The new U and V are shown in Fig. 9.15. The value of z affects only the entries in the third row.

    2.6 1                            5.204    3.6 3.6 3.6 3.6
    1   1                            2.617    2   2   2   2
    z   1   ×   1.617 1 1 1 1   =    1.617z+1 z+1 z+1 z+1 z+1
    1   1       1     1 1 1 1        2.617    2   2   2   2
    1   1                            2.617    2   2   2   2

Figure 9.15: u31 becomes a variable z

We can express the sum of the squares of the errors as

    (2 − (1.617z+1))² + (3 − (z+1))² + (1 − (z+1))² + (4 − (z+1))²

Note that there is no contribution from the element in the second column of the third row, since this element is blank in M. The expression simplifies to

    (1 − 1.617z)² + (2 − z)² + (−z)² + (3 − z)²

The usual process of setting the derivative to 0 gives us

    −2 × [1.617(1 − 1.617z) + (2 − z) + (−z) + (3 − z)] = 0
whose solution is z = 6.617/5.615 = 1.178. The next estimate of the decomposition UV is shown in Fig. 9.16. □

    2.6   1                            5.204 3.6   3.6   3.6   3.6
    1     1                            2.617 2     2     2     2
    1.178 1   ×   1.617 1 1 1 1   =    2.905 2.178 2.178 2.178 2.178
    1     1       1     1 1 1 1        2.617 2     2     2     2
    1     1                            2.617 2     2     2     2

Figure 9.16: Replace z by 1.178

9.4.4 Optimizing an Arbitrary Element

Having seen some examples of picking the optimum value for a single element in the matrix U or V, let us now develop the general formula. As before, assume that M is an n-by-m utility matrix with some entries blank, while U and V are matrices of dimensions n-by-d and d-by-m, for some d. We shall use mij, uij, and vij for the entries in row i and column j of M, U, and V, respectively. Also, let P = UV, and use pij for the element in row i and column j of the product matrix P.

Suppose we want to vary urs and find the value of this element that minimizes the RMSE between M and UV. Note that urs affects only the elements in row r of the product P = UV. Thus, we need only concern ourselves with the elements

    prj = Σ_{k=1}^{d} urk vkj = Σ_{k≠s} urk vkj + x vsj

for all values of j such that mrj is nonblank. In the expression above, we have replaced urs, the element we wish to vary, by a variable x, and we use the convention that Σ_{k≠s} is shorthand for the sum for k = 1, 2, ..., d, except for k = s.

If mrj is a nonblank entry of the matrix M, then the contribution of this element to the sum of the squares of the errors is

    (mrj − prj)² = (mrj − Σ_{k≠s} urk vkj − x vsj)²

We shall use another convention: Σ_j is shorthand for the sum over all j such that mrj is nonblank. Then we can write the sum of the squares of the errors that are affected by the value of x = urs as

    Σ_j (mrj − Σ_{k≠s} urk vkj − x vsj)²

Take the derivative of the above with respect to x, and set it equal to 0, in order to find the value of x that minimizes the RMSE. That is,

    Σ_j −2 vsj (mrj − Σ_{k≠s} urk vkj − x vsj) = 0

As in the previous examples, the common factor −2 can be dropped. We solve the above equation for x, and get

    x = [ Σ_j vsj (mrj − Σ_{k≠s} urk vkj) ] / Σ_j vsj²

There is an analogous formula for the optimum value of an element of V. If we want to vary vrs = y, then the value of y that minimizes the RMSE is

    y = [ Σ_i uir (mis − Σ_{k≠r} uik vks) ] / Σ_i uir²
Here, Σ_i is shorthand for the sum over all i such that mis is nonblank, and Σ_{k≠r} is the sum over all values of k between 1 and d, except for k = r.

9.4.5 Building a Complete UV-Decomposition Algorithm

Now, we have the tools to search for the global optimum decomposition of a utility matrix M. There are four areas where we shall discuss the options.

1. Preprocessing of the matrix M.

2. Initializing U and V.

3. Ordering the optimization of the elements of U and V.

4. Ending the attempt at optimization.

Preprocessing

Because the differences in the quality of items and the rating scales of users are such important factors in determining the missing elements of the matrix M, it is often useful to remove these influences before doing anything else. The idea was introduced in Section 9.3.1. We can subtract from each nonblank element mij the average rating of user i. Then, the resulting matrix can be modified by subtracting the average rating (in the modified matrix) of item j. It is also possible to first subtract the average rating of item j and then subtract the average rating of user i in the modified matrix. The results one obtains from doing things in these two different orders need not be the same, but will tend to be close. A third option is to normalize by subtracting from mij the average of the average rating of user i and item j, that is, subtracting one half the sum of the user average and the item average.

If we choose to normalize M, then when we make predictions, we need to undo the normalization. That is, if whatever prediction method we use results in estimate e for an element mij of the normalized matrix, then the value we predict for mij in the true utility matrix is e plus whatever amount was subtracted from row i and from column j during the normalization process.
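As an illustration, here is a sketch (ours, with invented function names) of the first preprocessing option: subtract each user's average, then subtract each item's average of the modified matrix, and undo both when turning a prediction on the normalized matrix back into a rating.

```python
# Normalization of the running-example matrix M; None marks blanks.
M = [
    [5, 2, 4, 4, 3],
    [3, 1, 2, 4, 1],
    [2, None, 3, 1, 4],
    [2, 5, 4, 3, 5],
    [4, 4, 5, 4, None],
]

def normalize(M):
    n, m = len(M), len(M[0])
    # First subtract each user's (row's) average from its nonblank entries.
    row_avg = [sum(x for x in row if x is not None) /
               sum(1 for x in row if x is not None) for row in M]
    N = [[M[i][j] - row_avg[i] if M[i][j] is not None else None
          for j in range(m)] for i in range(n)]
    # Then subtract each item's (column's) average of the modified matrix.
    col_avg = []
    for j in range(m):
        col = [N[i][j] for i in range(n) if N[i][j] is not None]
        col_avg.append(sum(col) / len(col))
    for i in range(n):
        for j in range(m):
            if N[i][j] is not None:
                N[i][j] -= col_avg[j]
    return N, row_avg, col_avg

def denormalize(e, i, j, row_avg, col_avg):
    # Undo the normalization for a prediction e of entry (i, j).
    return e + row_avg[i] + col_avg[j]

N, row_avg, col_avg = normalize(M)
# A prediction of 0 for a blank entry of the normalized matrix maps back to
# the row average plus the residual column average:
print(round(denormalize(0.0, 2, 1, row_avg, col_avg), 3))
```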
Initialization

As we mentioned, it is essential that there be some randomness in the way we seek an optimum solution, because the existence of many local minima justifies our running many different optimizations in the hope of reaching the global minimum on at least one run. We can vary the initial values of U and V, or we can vary the way we seek the optimum (to be discussed next), or both.

A simple starting point for U and V is to give each element the same value, and a good choice for this value is one that gives the elements of the product UV the average of the nonblank elements of M. Note that if we have normalized M, then this value will necessarily be 0. If we have chosen d as the lengths of the short sides of U and V, and a is the average nonblank element of M, then the elements of U and V should be √(a/d).

If we want many starting points for U and V, then we can perturb the value √(a/d) randomly and independently for each of the elements. There are many options for how we do the perturbation. We have a choice regarding the distribution of the difference. For example, we could add to each element a normally distributed value with mean 0 and some chosen standard deviation. Or we could add a value uniformly chosen from the range −c to +c for some c.
Performing the Optimization

In order to reach a local minimum from a given starting value of U and V, we need to pick an order in which we visit the elements of U and V. The simplest thing to do is pick an order, e.g., row-by-row, for the elements of U and V, and visit them in round-robin fashion. Note that just because we optimized an element once does not mean we cannot find a better value for that element after other elements have been adjusted. Thus, we need to visit elements repeatedly, until we have reason to believe that no further improvements are possible.

Alternatively, we can follow many different optimization paths from a single starting value by randomly picking the element to optimize. To make sure that every element is considered in each round, we could instead choose a permutation of the elements and follow that order for every round.

Converging to a Minimum

Ideally, at some point the RMSE becomes 0, and we know we cannot do better. In practice, since there are normally many more nonblank elements in M than there are elements in U and V together, we have no right to expect that we can reduce the RMSE to 0. Thus, we have to detect when there is little benefit to be had in revisiting elements of U and/or V. We can track the amount of improvement in the RMSE obtained in one round of the optimization, and stop when that improvement falls below a threshold. A small variation is to observe the improvements resulting from the optimization of individual elements, and stop when the maximum improvement during a round is below a threshold.
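The pieces of this section fit together into a short sketch of one complete local search. This is our own illustrative code, not an implementation from the text: it starts from the all-1's matrices of Fig. 9.10 (rather than the √(a/d) initialization) so that the first update reproduces Example 9.13, visits elements round-robin using the formulas of Section 9.4.4, and stops when a full round improves the RMSE by less than a threshold. A serious run would restart from many perturbed starting points, as discussed above.

```python
from math import sqrt

# The running-example matrix M; None marks blanks.
M = [
    [5, 2, 4, 4, 3],
    [3, 1, 2, 4, 1],
    [2, None, 3, 1, 4],
    [2, 5, 4, 3, 5],
    [4, 4, 5, 4, None],
]
n, m, d = 5, 5, 2

def rmse(U, V):
    errs = []
    for i in range(n):
        for j in range(m):
            if M[i][j] is not None:
                p = sum(U[i][k] * V[k][j] for k in range(d))
                errs.append((M[i][j] - p) ** 2)
    return sqrt(sum(errs) / len(errs))

def update_u(U, V, r, s):
    # x = sum_j v_sj (m_rj - sum_{k != s} u_rk v_kj) / sum_j v_sj^2,
    # where j ranges over the nonblank entries of row r.  (Assumes the
    # denominator is nonzero, which holds throughout this example.)
    num = den = 0.0
    for j in range(m):
        if M[r][j] is None:
            continue
        rest = sum(U[r][k] * V[k][j] for k in range(d) if k != s)
        num += V[s][j] * (M[r][j] - rest)
        den += V[s][j] ** 2
    U[r][s] = num / den

def update_v(U, V, r, s):
    # The analogous formula for v_rs, over nonblank entries of column s.
    num = den = 0.0
    for i in range(n):
        if M[i][s] is None:
            continue
        rest = sum(U[i][k] * V[k][s] for k in range(d) if k != r)
        num += U[i][r] * (M[i][s] - rest)
        den += U[i][r] ** 2
    V[r][s] = num / den

U = [[1.0] * d for _ in range(n)]
V = [[1.0] * m for _ in range(d)]
update_u(U, V, 0, 0)
print(U[0][0])        # 2.6, the value found for u11 in Example 9.13

prev = rmse(U, V)
while True:           # round-robin until one round improves RMSE very little
    for r in range(n):
        for s in range(d):
            update_u(U, V, r, s)
    for r in range(d):
        for s in range(m):
            update_v(U, V, r, s)
    cur = rmse(U, V)
    if prev - cur < 1e-6:
        break
    prev = cur
print(round(rmse(U, V), 3))   # well below the 1.806 of the all-1's start
```

Each single-element update exactly minimizes the sum of squared errors in that element, so the RMSE never increases; the loop therefore descends to one of the local minima discussed above.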
Gradient Descent

The technique for finding a UV-decomposition discussed in Section 9.4 is an example of gradient descent. We are given some data points (the nonblank elements of the matrix M), and for each data point we find the direction of change that most decreases the error function: the RMSE between the current UV product and M. We shall have much more to say about gradient descent in Section 12.3.4. It is also worth noting that while we have described the method as visiting each nonblank point of M several times until we approach a minimum-error decomposition, that may well be too much work on a large matrix M. Thus, an alternative approach has us look at only a randomly chosen fraction of the data when seeking to minimize the error. This approach, called stochastic gradient descent, is discussed in Section 12.3.5.

Avoiding Overfitting

One problem that often arises when performing a UV-decomposition is that we arrive at one of the many local minima that conform well to the given data, but pick up values in the data that don't reflect well the underlying process that gives rise to the data. That is, although the RMSE may be small on the given data, it doesn't do well predicting future data. There are several things that can be done to cope with this problem, which is called overfitting by statisticians.

1. Avoid favoring the first components to be optimized by only moving the value of a component a fraction of the way, say halfway, from its current value toward its optimized value.

2. Stop revisiting elements of U and V well before the process has converged.

3. Take several different UV-decompositions, and when predicting a new entry in the matrix M, take the average of the results of using each decomposition.

9.4.6 Exercises for Section 9.4

Exercise 9.4.1 : Starting with the decomposition of Fig. 9.10, we may choose any of the 20 entries in U or V to optimize first. Perform this first optimization step assuming we choose: (a) u32 (b) v41.

Exercise 9.4.2 : If we wish to start out, as in Fig. 9.10, with all U and V entries set to the same value, what value minimizes the RMSE for the matrix M of our running example?
Exercise 9.4.3 : Starting with the U and V matrices in Fig. 9.16, do the following in order:

(a) Reconsider the value of u11. Find its new best value, given the changes that have been made so far.

(b) Then choose the best value for u52.

(c) Then choose the best value for v22.

Exercise 9.4.4 : Derive the formula for y (the optimum value of element vrs) given at the end of Section 9.4.4.

Exercise 9.4.5 : Normalize the matrix M of our running example by:

(a) First subtracting from each element the average of its row, and then subtracting from each element the average of its (modified) column.

(b) First subtracting from each element the average of its column, and then subtracting from each element the average of its (modified) row.

Are there any differences in the results of (a) and (b)?

9.5 The Netflix Challenge

A significant boost to research into recommendation systems was given when Netflix offered a prize of $1,000,000 to the first person or team to beat their own recommendation algorithm, called CineMatch, by 10%. After over three years of work, the prize was awarded in September, 2009.

The Netflix challenge consisted of a published dataset, giving the ratings by approximately half a million users on (typically small subsets of) approximately 17,000 movies. This data was selected from a larger dataset, and proposed algorithms were tested on their ability to predict the ratings in a secret remainder of the larger dataset. The information for each (user, movie) pair in the published dataset included a rating (1–5 stars) and the date on which the rating was made.

The RMSE was used to measure the performance of algorithms. CineMatch has an RMSE of approximately 0.95; i.e., the typical rating would be off by almost one full star. To win the prize, it was necessary that your algorithm have an RMSE that was at most 90% of the RMSE of CineMatch. The bibliographic notes for this chapter include references to descriptions of the winning algorithms. Here, we mention some interesting and perhaps unintuitive facts about the challenge.

• CineMatch was not a very good algorithm. In fact, it was discovered early that the obvious algorithm of predicting, for the rating by user u on movie m, the average of:
  1. The average rating given by u on all rated movies, and

  2. The average of the ratings for movie m by all users who rated that movie,

  was only 3% worse than CineMatch.

• The UV-decomposition algorithm described in Section 9.4 was found by three students (Michael Harris, Jeffrey Wang, and David Kamm) to give a 7% improvement over CineMatch, when coupled with normalization and a few other tricks.

• The winning entry was actually a combination of several different algorithms that had been developed independently. A second team, which submitted an entry that would have won, had it been submitted a few minutes earlier, also was a blend of independent algorithms. This strategy of combining different algorithms has been used before in a number of hard problems and is something worth remembering.

• Several attempts have been made to use the data contained in IMDB, the Internet movie database, to match the names of movies from the Netflix challenge with their names in IMDB, and thus extract useful information not contained in the Netflix data itself. IMDB has information about actors and directors, and classifies movies into one or more of 28 genres. It was found that genre and other information was not useful. One possible reason is the machine-learning algorithms were able to discover the relevant information anyway, and a second is that the entity-resolution problem of matching movie names as given in Netflix and IMDB data is not that easy to solve exactly.

• Time of rating turned out to be useful. It appears there are movies that are more likely to be appreciated by people who rate them immediately after viewing than by those who wait a while and then rate them. "Patch Adams" was given as an example of such a movie. Conversely, there are other movies that were not liked by those who rated them immediately, but were better appreciated after a while; "Memento" was cited as an example. While one cannot tease out of the data information about how long was the delay between viewing and rating, it is generally safe to assume that most people see a movie shortly after it comes out. Thus, one can examine the ratings of any movie to see if its ratings have an upward or downward slope with time.

9.6 Summary of Chapter 9

• Utility Matrices: Recommendation systems deal with users and items. A utility matrix offers known information about the degree to which a user likes an item.
Normally, most entries are unknown, and the essential problem of recommending items to users is predicting the values of the unknown entries based on the values of the known entries.

• Two Classes of Recommendation Systems: These systems attempt to predict a user's response to an item by discovering similar items and the response of the user to those. One class of recommendation system is content-based; it measures similarity by looking for common features of the items. A second class of recommendation system uses collaborative filtering; these measure similarity of users by their item preferences and/or measure similarity of items by the users who like them.

• Item Profiles: These consist of features of items. Different kinds of items have different features on which content-based similarity can be based. Features of documents are typically important or unusual words. Products have attributes such as screen size for a television. Media such as movies have a genre and details such as actor or performer. Tags can also be used as features if they can be acquired from interested users.

• User Profiles: A content-based system can construct profiles for users by measuring the frequency with which features appear in the items the user likes. We can then estimate the degree to which a user will like an item by the closeness of the item's profile to the user's profile.

• Classification of Items: An alternative to constructing a user profile is to build a classifier for each user, e.g., a decision tree. The row of the utility matrix for that user becomes the training data, and the classifier must predict the response of the user to all items, whether or not the row had an entry for that item.

• Similarity of Rows and Columns of the Utility Matrix: Collaborative-filtering algorithms must measure the similarity of rows and/or columns of the utility matrix. Jaccard distance is appropriate when the matrix consists only of 1's and blanks (for "not rated"). Cosine distance works for more general values in the utility matrix. It is often useful to normalize the utility matrix by subtracting the average value (either by row, by column, or both) before measuring the cosine distance.

• Clustering Users and Items: Since the utility matrix tends to be mostly blanks, distance measures such as Jaccard or cosine often have too little data with which
to compare two rows or two columns. A preliminary step or steps, in which similarity is used to cluster users and/or items into small groups with strong similarity, can help provide more common components with which to compare rows or columns.

• UV-Decomposition: One way of predicting the blank values in a utility matrix is to find two long, thin matrices U and V, whose product is an approximation to the given utility matrix. Since the matrix product UV gives values for all user-item pairs, that value can be used to predict the value of a blank in the utility matrix. The intuitive reason this method makes sense is that often there are a relatively small number of issues (that number is the "thin" dimension of U and V) that determine whether or not a user likes an item.

• Root-Mean-Square Error: A good measure of how close the product UV is to the given utility matrix is the RMSE (root-mean-square error). The RMSE is computed by averaging the square of the differences between UV and the utility matrix, in those elements where the utility matrix is nonblank. The square root of this average is the RMSE.

• Computing U and V: One way of finding a good choice for U and V in a UV-decomposition is to start with arbitrary matrices U and V. Repeatedly adjust one of the elements of U or V to minimize the RMSE between the product UV and the given utility matrix. The process converges to a local optimum, although to have a good chance of obtaining a global optimum we must either repeat the process from many starting matrices, or search from the starting point in many different ways.

• The Netflix Challenge: An important driver of research into recommendation systems was the Netflix challenge. A prize of $1,000,000 was offered for a contestant who could produce an algorithm that was 10% better than Netflix's own algorithm at predicting movie ratings by users. The prize was awarded in Sept., 2009.

9.7 References for Chapter 9

[1] is a survey of recommendation systems as of 2005. The argument regarding the importance of the long tail in on-line systems is from [2], which was expanded into a book [3]. [8] discusses the use of computer games to extract tags for items.

See [5] for a discussion of item-item similarity and how Amazon designed its collaborative-filtering algorithm for product recommendations.

There are three papers describing the three algorithms
that, in combination, won the Netflix challenge. They are [4], [6], and [7].

1. G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions," IEEE Trans. on Data and Knowledge Engineering 17:6, pp. 734–749, 2005.

2. C. Anderson, http://www.wired.com/wired/archive/12.10/tail.html, 2004.

3. C. Anderson, The Long Tail: Why the Future of Business is Selling Less of More, Hyperion Books, New York, 2006.

4. Y. Koren, "The BellKor solution to the Netflix grand prize," www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf, 2009.

5. G. Linden, B. Smith, and J. York, "Amazon.com recommendations: item-to-item collaborative filtering," Internet Computing 7:1, pp. 76–80, 2003.

6. M. Piotte and M. Chabbert, "The Pragmatic Theory solution to the Netflix grand prize," www.netflixprize.com/assets/GrandPrize2009_BPC_PragmaticTheory.pdf, 2009.

7. A. Töscher, M. Jahrer, and R. Bell, "The BigChaos solution to the Netflix grand prize," www.netflixprize.com/assets/GrandPrize2009_BPC_BigChaos.pdf, 2009.

8. L. von Ahn, "Games with a purpose," IEEE Computer Magazine, pp. 96–98, June 2006.