
Chapter 9

Recommendation Systems

There is an extensive class of Web applications that involve predicting user responses to options. Such a facility is called a recommendation system. We shall begin this chapter with a survey of the most important examples of these systems. However, to bring the problem into focus, two good examples of recommendation systems are:

1. Offering news articles to on-line newspaper readers, based on a prediction of reader interests.

2. Offering customers of an on-line retailer suggestions about what they might like to buy, based on their past history of purchases and/or product searches.

Recommendation systems use a number of different technologies. We can classify these systems into two broad groups.

• Content-based systems examine properties of the items recommended. For instance, if a Netflix user has watched many cowboy movies, then recommend a movie classified in the database as having the "cowboy" genre.

• Collaborative filtering systems recommend items based on similarity measures between users and/or items. The items recommended to a user are those preferred by similar users. This sort of recommendation system can use the groundwork laid in Chapter 3 on similarity search and Chapter 7 on clustering. However, these technologies by themselves are not sufficient, and there are some new algorithms that have proven effective for recommendation systems.

9.1 A Model for Recommendation Systems

In this section we introduce a model for recommendation systems, based on a utility matrix of preferences. We introduce the concept of a "long tail,"
which explains the advantage of on-line vendors over conventional, brick-and-mortar vendors. We then briefly survey the sorts of applications in which recommendation systems have proved useful.

9.1.1 The Utility Matrix

In a recommendation-system application there are two classes of entities, which we shall refer to as users and items. Users have preferences for certain items, and these preferences must be teased out of the data. The data itself is represented as a utility matrix, giving for each user-item pair a value that represents what is known about the degree of preference of that user for that item. Values come from an ordered set, e.g., integers 1-5 representing the number of stars that the user gave as a rating for that item. We assume that the matrix is sparse, meaning that most entries are "unknown." An unknown rating implies that we have no explicit information about the user's preference for the item.

Example 9.1: In Fig. 9.1 we see an example utility matrix, representing users' ratings of movies on a 1-5 scale, with 5 the highest rating. Blanks represent the situation where the user has not rated the movie. The movie names are HP1, HP2, and HP3 for Harry Potter I, II, and III, TW for Twilight, and SW1, SW2, and SW3 for Star Wars episodes 1, 2, and 3. The users are represented by capital letters A through D.

     HP1  HP2  HP3  TW   SW1  SW2  SW3
A    4              5    1
B    5    5    4
C                   2    4    5
D         3                        3

Figure 9.1: A utility matrix representing ratings of movies on a 1-5 scale

Notice that most user-movie pairs have blanks, meaning the user has not rated the movie. In practice, the matrix would be even sparser, with the typical user rating only a tiny fraction of all available movies.

The goal of a recommendation system is to predict the blanks in the utility matrix. For example, would user A like SW2? There is little evidence from the tiny matrix in Fig. 9.1. We might design our recommendation system to take into account properties of movies, such as their producer, director, stars, or even the similarity of their names. If so, we might then note the similarity between SW1 and SW2, and then conclude that since A did not like SW1, they were unlikely to enjoy SW2 either. Alternatively, with much more data, we might observe that the people who rated both SW1 and SW2 tended to give them similar ratings. Thus, we could conclude that A would also give SW2 a low rating, similar to A's rating of SW1.

We should also be aware of a slightly different goal that makes sense in many applications. It is not necessary to predict every blank entry in a utility matrix. Rather, it is only necessary to discover some entries in each row that are likely to be high. In most applications, the recommendation system does not offer users a ranking of all items, but rather suggests a few that the user should value highly. It may not even be necessary to find all items with the highest expected ratings, but only to find a large subset of those with the highest ratings.

9.1.2 The Long Tail

Before discussing the principal applications of recommendation systems, let us ponder the long tail phenomenon that makes recommendation systems necessary. Physical delivery systems are characterized by a scarcity of resources. Brick-and-mortar stores have limited shelf space, and can show the customer only a small fraction of all the choices that exist. On the other hand, on-line stores can make anything that exists available to the customer. Thus, a physical bookstore may have several thousand books on its shelves, but Amazon offers millions of books. A physical newspaper can print several dozen articles per day, while on-line news services offer thousands per day.

Recommendation in the physical world is fairly simple. First, it is not possible to tailor the store to each individual customer. Thus, the choice of what is made available is governed only by the aggregate numbers. Typically, a bookstore will display only the books that are most popular, and a newspaper will print only the articles it believes the most people will be interested in. In the first case, sales figures govern the choices; in the second case, editorial judgement serves.

The distinction between the physical and on-line worlds has been called the long tail phenomenon, and it is suggested in Fig. 9.2. The vertical axis represents popularity (the number of times an item is chosen). The items are ordered on the horizontal axis according to their popularity. Physical institutions provide only the most popular items to the left of the vertical line, while the corresponding on-line institutions provide the entire range of items: the tail as well as the popular items.

[Figure 9.2: The long tail: physical institutions can only provide what is popular, while on-line institutions can make everything available]

The long-tail phenomenon forces on-line institutions to recommend items to individual users. It is not possible to present all available items to the user, the way physical institutions can. Neither can we expect users to have heard of each of the items they might like.

9.1.3 Applications of Recommendation Systems

We have mentioned several important applications of recommendation systems, but here we shall consolidate the list in a single place.

1. Product Recommendations: Perhaps the most important use of recommendation systems is at on-line retailers. We have noted how Amazon or similar on-line vendors strive to present each returning user with some suggestions of products that they might like to buy. These suggestions are not random, but are based on the purchasing decisions made by similar customers or on other techniques we shall discuss in this chapter.

2. Movie Recommendations: Netflix offers its customers recommendations of movies they might like. These recommendations are based on ratings provided by users, much like the ratings suggested in the example utility matrix of Fig. 9.1. The importance of predicting ratings accurately is so high that Netflix offered a prize of one million dollars for the first algorithm that could beat its own recommendation system by 10%.[1] The prize was finally won in 2009, by a team of researchers called "Bellkor's Pragmatic Chaos," after over three years of competition.

3. News Articles: News services have attempted to identify articles of interest to readers, based on the articles that they have read in the past. The similarity might be based on the similarity of important words in the documents, or on the articles that are read by people with similar reading tastes. The same principles apply to recommending blogs from among the millions of blogs available, videos on YouTube, or other sites where content is provided regularly.

[1] To be exact, the algorithm had to have a root-mean-square error (RMSE) that was 10% less than the RMSE of the Netflix algorithm on a test set taken from actual ratings of Netflix users. To develop an algorithm, contestants were given a training set of data, also taken from actual Netflix data.
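The RMSE criterion in the footnote above can be made concrete with a short sketch. The ratings and predictions below are invented for illustration and are not Netflix data; the helper names `rmse` and `beats_baseline` are hypothetical, chosen only for this example.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))


def beats_baseline(candidate_rmse, baseline_rmse, improvement=0.10):
    """True if the candidate RMSE is at least `improvement` below the baseline."""
    return baseline_rmse * (1.0 - improvement) >= candidate_rmse


# Hypothetical test-set ratings and two sets of predictions (illustrative only).
actual = [4, 5, 1, 3]
baseline_pred = [3, 3, 3, 3]    # a baseline that always predicts 3 stars
candidate_pred = [4, 4, 2, 3]   # a candidate algorithm's predictions

baseline_rmse = rmse(baseline_pred, actual)    # sqrt(9/4) = 1.5
candidate_rmse = rmse(candidate_pred, actual)  # sqrt(2/4), about 0.707
print(baseline_rmse, candidate_rmse, beats_baseline(candidate_rmse, baseline_rmse))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, which is one reason it is a demanding target for rating prediction.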
Into Thin Air and Touching the Void

An extreme example of how the long tail, together with a well-designed recommendation system, can influence events is the story told by Chris Anderson about a book called Touching the Void. This mountain-climbing book was not a big seller in its day, but many years after it was published, another book on the same topic, called Into Thin Air, was published. Amazon's recommendation system noticed a few people who bought both books, and started recommending Touching the Void to people who bought, or were considering, Into Thin Air. Had there been no on-line bookseller, Touching the Void might never have been seen by potential buyers, but in the on-line world, Touching the Void eventually became very popular in its own right, in fact, more so than Into Thin Air.

9.1.4 Populating the Utility Matrix

Without a utility matrix, it is almost impossible to recommend items. However, acquiring data from which to build a utility matrix is often difficult. There are two general approaches to discovering the value users place on items.

1. We can ask users to rate items. Movie ratings are generally obtained this way, and some on-line stores try to obtain ratings from their purchasers. Sites providing content, such as some news sites or YouTube, also ask users to rate items. This approach is limited in its effectiveness, since generally users are unwilling to provide responses, and the information from those who do may be biased by the very fact that it comes from people willing to provide ratings.

2. We can make inferences from users' behavior. Most obviously, if a user buys a product at Amazon, watches a movie on YouTube, or reads a news article, then the user can be said to "like" this item. Note that this sort of rating system really has only one value: 1 means that the user likes the item. Often, we find a utility matrix with this kind of data shown with 0's rather than blanks where the user has not purchased or viewed the item. However, in this case 0 is not a lower rating than 1; it is no rating at all. More generally, one can infer interest from behavior other than purchasing. For example, if an Amazon customer views information about an item, we can infer that they are interested in the item, even if they don't buy it.
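The second approach can be sketched in a few lines. This is only an illustration under stated assumptions: the event log, the user and item names, and the helpers `build_utility_matrix` and `rating` are all hypothetical. The point it shows is that an unobserved pair is stored as a missing entry, not as a 0.

```python
from collections import defaultdict


def build_utility_matrix(events):
    """Map user -> {item: 1} from (user, item) events; unseen pairs stay absent."""
    utility = defaultdict(dict)
    for user, item in events:
        utility[user][item] = 1   # repeated events still yield a single 1
    return dict(utility)


def rating(utility, user, item):
    """Return the known utility value, or None when the entry is blank."""
    return utility.get(user, {}).get(item)


# Hypothetical purchase/view log (illustrative only).
events = [("U", "book1"), ("U", "book2"), ("V", "book2"), ("U", "book1")]
utility = build_utility_matrix(events)

print(rating(utility, "U", "book1"))  # 1: U is taken to "like" book1
print(rating(utility, "V", "book1"))  # None: no rating at all, not a low rating
```

Keeping blanks as absent keys also stores the matrix sparsely, which matters when each user has touched only a tiny fraction of the items.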
9.2 Content-Based Recommendations

As we mentioned at the beginning of the chapter, there are two basic architectures for a recommendation system:

1. Content-Based systems focus on properties of items. Similarity of items is determined by measuring the similarity in their properties.

2. Collaborative-Filtering systems focus on the relationship between users and items. Similarity of items is determined by the similarity of the ratings of those items by the users who have rated both items.

In this section, we focus on content-based recommendation systems. The next section will cover collaborative filtering.

9.2.1 Item Profiles

In a content-based system, we must construct for each item a profile, which is a record or collection of records representing important characteristics of that item. In simple cases, the profile consists of some characteristics of the item that are easily discovered. For example, consider the features of a movie that might be relevant to a recommendation system.

1. The set of actors of the movie. Some viewers prefer movies with their favorite actors.

2. The director. Some viewers have a preference for the work of certain directors.

3. The year in which the movie was made. Some viewers prefer old movies; others watch only the latest releases.

4. The genre or general type of movie. Some viewers like only comedies, others dramas or romances.

There are many other features of movies that could be used as well. Except for the last, genre, the information is readily available from descriptions of movies. Genre is a vaguer concept. However, movie reviews generally assign a genre from a set of commonly used terms. For example, the Internet Movie Database (IMDB) assigns a genre or genres to every movie. We shall discuss mechanical construction of genres in Section 9.3.3.

Many other classes of items also allow us to obtain features from available data, even if that data must at some point be entered by hand. For instance, products often have descriptions written by the manufacturer, giving features relevant to that class of product (e.g., the screen size and cabinet color for a TV). Books have descriptions similar to those for movies, so we can obtain features such as author, year of publication, and genre. Music products such as CD's and MP3 downloads have available features such as artist, composer, and genre.

9.2.2 Discovering Features of Documents

There are other classes of items where it is not immediately apparent what the values of features should be. We shall consider two of them: document collections and images. Documents present special problems, and we shall discuss the technology for extracting features from documents in this section. Images will be discussed in Section 9.2.3 as an important example where user-supplied features have some hope of success.

There are many kinds of documents for which a recommendation system can be useful. For example, there are many news articles published each day, and we cannot read all of them. A recommendation system can suggest articles on topics a user is interested in, but how can we distinguish among topics? Web pages are also a collection of documents. Can we suggest pages a user might want to see? Likewise, blogs could be recommended to interested users, if we could classify blogs by topics.

Unfortunately, these classes of documents do not tend to have readily available information giving features. A substitute that has been useful in practice is the identification of words that characterize the topic of a document. How we do the identification was outlined in Section 1.3.1. First, eliminate stop words (the several hundred most common words, which tend to say little about the topic of a document). For the remaining words, compute the TF.IDF score for each word in the document. The ones with the highest scores are the words that characterize the document.

We may then take as the features of a document the n words with the highest TF.IDF scores. It is possible to pick n to be the same for all documents, or to let n be a fixed percentage of the words in the document. We could also choose to make all words whose TF.IDF scores are above a given threshold to be a part of the feature set.

Now, documents are represented by sets of words. Intuitively, we expect these words to express the subjects or main ideas of the document. For example, in a news article, we would expect the words with the highest TF.IDF score to include the names of people discussed in the article, unusual properties of the event described, and the location of the event. To measure the similarity of two documents, there are several natural distance measures we can use:

1. We could use the Jaccard distance between the sets of words (recall Section 3.5.3).

2. We could use the cosine distance (recall Section 3.5.4) between the sets, treated as vectors.

Two Kinds of Document Similarity

Recall that in Section 3.4 we gave a method for finding documents that were "similar," using shingling, minhashing, and LSH. There, the notion of similarity was lexical: documents are similar if they contain large, identical sequences of characters. For recommendation systems, the notion of similarity is different. We are interested only in the occurrences of many important words in both documents, even if there is little lexical similarity between the documents. However, the methodology for finding similar documents remains almost the same. Once we have a distance measure, either Jaccard or cosine, we can use minhashing (for Jaccard) or random hyperplanes (for cosine distance; see Section 3.7.2) feeding data to an LSH algorithm to find the pairs of documents that are similar in the sense of sharing many common keywords.

To compute the cosine distance in option (2), think of the sets of high-TF.IDF words as a vector, with one component for each possible word. The vector has 1 if the word is in the set and 0 if not. Since between two documents there are only a finite number of words among their two sets, the infinite dimensionality of the vectors is unimportant. Almost all components are 0 in
both, and 0's do not impact the value of the dot product. To be precise, the dot product is the size of the intersection of the two sets of words, and the lengths of the vectors are the square roots of the numbers of words in each set. That calculation lets us compute the cosine of the angle between the vectors as the dot product divided by the product of the vector lengths.

9.2.3 Obtaining Item Features From Tags

Let us consider a database of images as an example of a way that features have been obtained for items. The problem with images is that their data, typically an array of pixels, does not tell us anything useful about their features. We can calculate simple properties of pixels, such as the average amount of red in the picture, but few users are looking for red pictures or especially like red pictures.

There have been a number of attempts to obtain information about features of items by inviting users to tag the items by entering words or phrases that describe the item. Thus, one picture with a lot of red might be tagged "Tiananmen Square," while another is tagged "sunset at Malibu." The distinction is not something that could be discovered by existing image-analysis programs.

Almost any kind of data can have its features described by tags. One of the earliest attempts to tag massive amounts of data was the site del.icio.us, later bought by Yahoo!, which invited users to tag Web pages. The goal of this tagging was to make a new method of search available, where users entered a set of tags as their search query, and the system retrieved the Web pages that had been tagged that way. However, it is also possible to use the tags as a recommendation system. If it is observed that a user retrieves or bookmarks many pages with a certain set of tags, then we can recommend other pages with the same tags.

The problem with tagging as an approach to feature discovery is that the
process only works if users are willing to take the trouble to create the tags, and there are enough tags that occasional erroneous ones will not bias the system too much.

Tags from Computer Games

An interesting direction for encouraging tagging is the "games" approach pioneered by Luis von Ahn. He enabled two players to collaborate on the tag for an image. In rounds, they would suggest a tag, and the tags would be exchanged. If they agreed, then they "won," and if not, they would play another round with the same image, trying to agree simultaneously on a tag. While an innovative direction to try, it is questionable whether sufficient public interest can be generated to produce enough free work to satisfy the needs for tagged data.

9.2.4 Representing Item Profiles

Our ultimate goal for content-based recommendation is to create both an item profile consisting of feature-value pairs and a user profile summarizing the preferences of the user, based on their row of the utility matrix. In Section 9.2.2 we suggested how an item profile could be constructed. We imagined a vector of 0's and 1's, where a 1 represented the occurrence of a high-TF.IDF word in the document. Since features for documents were all words, it was easy to represent profiles this way.

We shall try to generalize this vector approach to all sorts of features. It is easy to do so for features that are sets of discrete values. For example, if one feature of movies is the set of actors, then imagine that there is a component for each actor, with 1 if the actor is in the movie, and 0 if not. Likewise, we can have a component for each possible director, and each possible genre. All these features can be represented using only 0's and 1's.

There is another class of features that is not readily represented by Boolean vectors: those features that are numerical. For instance, we might take the average rating for movies to be a feature,[2] and this average is a real number. It does not make sense to have one component for each of the possible average ratings, and doing so would cause us to lose the structure implicit in numbers. That is, two ratings that are close but not identical should be considered more similar than widely differing ratings. Likewise, numerical features of products, such as screen size or disk capacity for PC's, should be considered similar if their values do not differ greatly.

Numerical features should be represented by single components of vectors representing items. These components hold the exact value of that feature. There is no harm if some components of the vectors are Boolean and others are real-valued or integer-valued. We can still compute the cosine distance between vectors, although if we do so, we should give some thought to the appropriate scaling of the non-Boolean components, so that they neither dominate the calculation nor become irrelevant.

[2] The rating is not a very reliable feature, but it will serve as an example.

Example 9.2: Suppose the only features of movies are the set of actors and the average rating. Consider two movies with five actors each. Two of the actors are in both movies. Also, one movie has an average rating of 3 and the other an average of 4. The vectors look something like

\[
\begin{array}{ccccccccc}
0 & 1 & 1 & 0 & 1 & 1 & 0 & 1 & 3\alpha \\
1 & 1 & 0 & 1 & 0 & 1 & 1 & 0 & 4\alpha
\end{array}
\]

However, there are in principle an infinite number of additional components, each with 0's for both vectors, representing all the possible actors that neither movie has. Since cosine distance of vectors is not affected by components in which both vectors have 0, we need not worry about the effect of actors that are in neither movie.

The last component shown represents the average rating. We have shown it as having an unknown scaling factor $\alpha$. In terms of $\alpha$, we can compute the cosine of the angle between the vectors. The dot product is $2 + 12\alpha^2$, and the lengths of the vectors are $\sqrt{5 + 9\alpha^2}$ and $\sqrt{5 + 16\alpha^2}$. Thus, the cosine of the angle between the vectors is

\[
\frac{2 + 12\alpha^2}{\sqrt{25 + 125\alpha^2 + 144\alpha^4}}
\]

If we choose $\alpha = 1$, that is, we take the average ratings as they are, then the value of the above expression is 0.816. If we use $\alpha = 2$, that is, we double the ratings, then the cosine is 0.940. That is, the vectors appear much closer in direction than if we use $\alpha = 1$. Likewise, if we use $\alpha = 1/2$, then the cosine is 0.619, making the vectors look quite different. We cannot tell which value of $\alpha$ is "right," but we see that the choice of scaling factor for numerical features affects our decision about how similar items are.

9.2.5 User Profiles

We not only need to create vectors describing items; we need to create vectors with the same components that describe the user's preferences. We have the utility matrix representing the connection between users and items. Recall the nonblank matrix entries could be just 1's representing user purchases or a similar connection, or they could be arbitrary numbers representing a rating or degree of affection that the user has for the item.

With this information, the best estimate we can make regarding which items the user likes is some aggregation of the profiles of those items. If the utility matrix has only 1's, then the natural aggregate is the average of the components of the vectors representing the item profiles for the items in which the utility matrix has 1 for that user.

Example 9.3: Suppose items are movies, represented by Boolean profiles with components corresponding to actors. Also, the utility matrix has a 1 if the user has seen the movie and is blank otherwise. If 20% of the movies that user U likes have Julia Roberts as one of the actors, then the user profile for U will have 0.2 in the component for Julia Roberts.

If the utility matrix is not Boolean, e.g., ratings 1-5, then we can weight the vectors representing the profiles of items by the utility value. It makes sense to normalize the utilities by subtracting the average value for a user. That way, we get negative weights for items with a below-average rating, and positive weights for items with above-average ratings. That effect will prove useful when we discuss in Section 9.2.6 how to find items that a user should like.

Example 9.4: Consider the same movie information as in Example 9.3, but now suppose the utility matrix has nonblank entries that are ratings in the 1-5 range. Suppose user U gives an average rating of 3. There are three movies with Julia Roberts as an actor, and those movies got ratings of 3, 4, and 5. Then in the user profile of U, the component for Julia Roberts will have a value that is the average of $3-3$, $4-3$, and $5-3$, that is, a value of 1.

On the other hand, user V gives an average rating of 4, and has also rated three movies with Julia Roberts (it doesn't matter whether or not they are the same three movies U rated). User V gives these three movies ratings of 2, 3, and 5. The user profile for V has, in the component for Julia Roberts, the average of $2-4$, $3-4$, and $5-4$, that is, the value $-2/3$.

9.2.6 Recommending Items to Users Based on Content

With profile vectors for both users and items, we can estimate the degree to which a user would prefer an item by computing the cosine distance between the user's and item's vectors. As in Example 9.2, we may wish to scale various components whose values are not Boolean. The random-hyperplane and locality-sensitive-hashing techniques can be used to place (just) item profiles in buckets. In that way, given a user to whom we want to recommend some items, we can apply the same two techniques, random hyperplanes and LSH, to determine in which buckets we must look for items that might have a small cosine distance from the user.

Example 9.5: Consider first the data of Example 9.3. The user's profile will have components for actors proportional to the likelihood that the actor will appear in a movie the user likes. Thus, the highest recommendations (lowest cosine distance) belong to the movies with lots of actors that appear in many of the movies the user likes. As long as actors are the only information we have about features of movies, that is probably the best we can do.[3]

Now, consider Example 9.4. There, we observed that the vector for a user will have positive numbers for actors that tend to appear in movies the user likes and negative numbers for actors that tend to appear in movies the user doesn't like. Consider a movie with many actors the user likes, and only a few or none that the user doesn't like. The cosine of the angle between the user's and movie's vectors will be a large positive fraction. That implies an angle close to 0, and therefore a small cosine distance between the vectors.

Next, consider a movie with about as many actors that the user likes as those the user doesn't like. In this situation, the cosine of the angle between the user and movie is around 0, and therefore the angle between the two vectors is around 90 degrees. Finally, consider a movie with mostly actors the user doesn't like. In that case, the cosine will be a large negative fraction, and the angle between the two vectors will be close to 180 degrees, the maximum possible cosine distance.

[3] Note that the fact that all user-vector components will be small fractions does not affect the recommendation, since the cosine calculation involves dividing by the length of each vector. That is, user vectors will tend to be much shorter than movie vectors, but only the direction of vectors matters.

9.2.7 Classification Algorithms

A completely different approach to a recommendation system using item profiles and utility matrices is to treat the problem as one of machine learning. Regard the given data as a training set, and for each user, build a classifier that predicts the rating of all items. There are a great number of different classifiers, and it is not our purpose to teach this subject here. However, you should be aware of the option of developing a classifier for recommendation, so we shall briefly discuss one common classifier: decision trees.

A decision tree is a collection of nodes, arranged as a binary tree. The leaves render decisions; in our case, the decision would be "likes" or "doesn't like." Each interior node is a condition on the objects being classified; in our case the condition would be a predicate involving one or more features of an item.

To classify an item, we start at the root, and apply the predicate at the root to the item. If the predicate is true, go to the left child, and if it is false, go to the right child. Then repeat the same process at the node visited, until a leaf is reached. That leaf classifies the item as liked or not.

Construction of a decision tree requires selection of a predicate for each interior node. There are many ways of picking the best predicate, but they all try to arrange that one of the children gets all or most of the positive examples in the training set (i.e., the items that the given user likes, in our case) and the other child gets all or most of the negative examples (the items this user does not like).
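The classification walk described above (apply the predicate, go left when it is true and right when it is false, stop at a leaf) can be sketched as follows. This is a minimal illustration, not a tree-construction algorithm: the `Node` class and `classify` function are hypothetical names, items are assumed to be represented as sets of keywords, and the tree is built by hand from the predicates used in Example 9.6.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class Node:
    """Interior nodes carry a predicate and two children; leaves carry a decision."""
    predicate: Optional[Callable[[Set[str]], bool]] = None
    left: Optional["Node"] = None    # followed when the predicate is true
    right: Optional["Node"] = None   # followed when the predicate is false
    decision: Optional[str] = None   # set only on leaves


def classify(node, keywords):
    """Walk from the root to a leaf and return that leaf's decision."""
    while node.decision is None:
        node = node.left if node.predicate(keywords) else node.right
    return node.decision


likes = Node(decision="likes")
dislikes = Node(decision="doesn't like")

# Left child of the root: is this baseball article about the Yankees?
yankees = Node(
    predicate=lambda kw: bool(kw & {"Yankees", "Jeter", "Teixeira"}),
    left=dislikes,
    right=likes,
)
# Root: is the article about baseball at all?
root = Node(
    predicate=lambda kw: "homerun" in kw or {"batter", "pitcher"} <= kw,
    left=yankees,
    right=dislikes,
)

print(classify(root, {"homerun", "inning"}))               # likes
print(classify(root, {"batter", "pitcher", "Yankees"}))    # doesn't like
print(classify(root, {"election", "senate"}))              # doesn't like
```

Classification costs one predicate evaluation per level of the tree; the expensive part, as the text goes on to explain, is choosing the predicates.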
Once we have selected a predicate for a node N, we divide the items into the two groups: those that satisfy the predicate and those that do not. For each group, we again find the predicate that best separates the positive and negative examples in that group. These predicates are assigned to the children of N. This process of dividing the examples and building children can proceed to any number of levels. We can stop, and create a leaf, if the group of items for a node is homogeneous; i.e., they are all positive or all negative examples.

However, we may wish to stop and create a leaf with the majority decision for a group, even if the group contains both positive and negative examples. The reason is that the statistical significance of a small group may not be high enough to rely on. For that reason a variant strategy is to create an ensemble of decision trees, each using different predicates, but allow the trees to be deeper than what the available data justifies. Such trees are called overfitted. To classify an item, apply all the trees in the ensemble, and let them vote on the outcome. We shall not consider this option here, but give a simple hypothetical example of a decision tree.

Example 9.6: Suppose our items are news articles, and features are the high-TF.IDF words (keywords) in those documents. Further suppose there is a user U who likes articles about baseball, except articles about the New York Yankees. The row of the utility matrix for U has 1 if U has read the article and is blank if not. We shall take the 1's as "like" and the blanks as "doesn't like." Predicates will be Boolean expressions of keywords.

Since U generally likes baseball, we might
find that the best predicate for the root is "homerun" OR ("batter" AND "pitcher"). Items that satisfy the predicate will tend to be positive examples (articles with 1 in the row for U in the utility matrix), and items that fail to satisfy the predicate will tend to be negative examples (blanks in the utility-matrix row for U). Figure 9.3 shows the root as well as the rest of the decision tree.

Suppose that the group of articles that do not satisfy the predicate includes sufficiently few positive examples that we can conclude all of these items are in the "don't-like" class. We may then put a leaf with decision "don't like" as the right child of the root. However, the articles that satisfy the predicate include a number of articles that user U doesn't like; these are the articles that mention the Yankees. Thus, at the left child of the root, we build another predicate. We might find that the predicate "Yankees" OR "Jeter" OR "Teixeira" is the best possible indicator of an article about baseball and about the Yankees. Thus, we see in Fig. 9.3 the left child of the root, which applies this predicate. Both children of this node are leaves, since we may suppose that the items satisfying this predicate are predominantly negative and those not satisfying it are predominantly positive.

[Figure 9.3: A decision tree]

Unfortunately, classifiers of all types tend to take a long time to construct. For instance, if we wish to use decision trees, we need one tree per user. Constructing a tree not only requires that we look at all the item profiles, but we have to consider many different predicates, which could involve complex combinations of features. Thus, this approach tends to be used only for relatively small problem sizes.

9.2.8 Exercises for Section 9.2

Exercise 9.2.1: Three computers, A, B, and C, have the numerical features listed below:

Feature            A      B      C
Processor Speed    3.06   2.68   2.92
Disk Size          500    320    640
Main-Memory Size   6      4      6

We may imagine these values as defining a vector for each computer; for instance, A's vector is [3.06, 500, 6]. We can compute the cosine distance between any two of the vectors, but if we do not scale the components, then the disk size will dominate and make differences in the other components essentially invisible. Let us use 1 as the scale factor for processor speed, $\alpha$ for the disk size, and
for the main memory size.

(a) In terms of $\alpha$ and $\beta$, compute the cosines of the angles between the vectors for each pair of the three computers.

(b) What are the angles between the vectors if $\alpha = \beta = 1$?

(c) What are the angles between the vectors if $\alpha = 0.01$ and $\beta = 0.5$?

! (d) One fair way of selecting scale factors is to make each inversely proportional to the average value in its component. What would be the values of $\alpha$ and $\beta$, and what would be the angles between the vectors?

Exercise 9.2.2: An alternative way of scaling components of a vector is to begin by normalizing the vectors. That is, compute the average for each component and subtract it from that component's value in each of the vectors.

(a) Normalize the vectors for the three computers described in Exercise 9.2.1.

!! (b) This question does not require difficult calculation, but it requires some serious thought about what angles between vectors mean. When all components are nonnegative, as they are in the data of Exercise 9.2.1, no vectors can have an angle greater than 90 degrees. However, when we normalize vectors, we can (and must) get some negative components, so the angles can now be anything, that is, 0 to 180 degrees. Moreover, averages are now 0 in every component, so the suggestion in part (d) of Exercise 9.2.1 that we should scale in inverse proportion to the average makes no sense. Suggest a way of finding an appropriate scale for each component of normalized vectors. How would you interpret a large or small angle between normalized vectors? What would the angles be for the normalized vectors derived from the data in Exercise 9.2.1?

Exercise 9.2.3: A certain user has rated the three computers of Exercise 9.2.1 as follows: A: 4 stars, B: 2 stars, C: 5 stars.

(a) Normalize the ratings for this user.

(b) Compute a user profile for the user, with components for processor speed, disk size, and main memory size, based on the data of Exercise 9.2.1.

9.3 Collaborative Filtering

We shall now take up a significantly different approach to recommendation. Instead of using features of items to determine their similarity, we focus on the similarity of the user ratings for two items. That is, in place of the item-profile vector for an item, we use its column in the utility matrix. Further, instead of contriving a profile vector for users, we represent them by their rows in the utility matrix. Users are similar if their vectors are close according to some distance measure such as Jaccard or cosine distance. Recommendation for a user U is then made by looking at the users that are most similar to U in this sense, and recommending items that these users like. The process of identifying similar users and recommending what similar users like is called collaborative filtering.

9.3.1 Measuring Similarity

The first question we must deal with is how to measure similarity of users or items from their rows or columns in the utility matrix. We have reproduced Fig. 9.1 here as Fig. 9.4. This data is too small to draw any reliable conclusions, but its small size will make clear some of the pitfalls in picking a distance measure. Observe specifically the users A and C. They rated two movies in common, but they appear to have almost diametrically opposite opinions of these movies. We would expect that a good distance measure would make them rather far apart. Here are some alternative measures to consider.

     HP1   HP2   HP3   TW    SW1   SW2   SW3
A    4                 5     1
B    5     5     4
C                      2     4     5
D          3                             3

Figure 9.4: The utility matrix introduced in Fig. 9.1

Jaccard Distance

We could ignore values in the matrix and focus only on the sets of items rated. If the utility matrix only reflected purchases, this measure would be a good one to choose. However, when utilities are more detailed ratings, the Jaccard distance loses important information.

Example 9.7: A and B have an intersection of size 1 and a union of size 5. Thus, their Jaccard similarity is 1/5, and their Jaccard distance is 4/5; i.e., they are very far apart. In comparison, A and C have a Jaccard similarity of 2/4, so their Jaccard distance is the same, 1/2. Thus, A appears closer to C than to B. Yet that conclusion seems intuitively wrong. A and C disagree on the two movies they both watched, while A and B seem both to have liked the one movie they watched in common. □

Cosine Distance

We can treat blanks as a 0 value. This choice is questionable, since it has the effect of treating the lack of a rating as more similar to disliking the movie than liking it.

Example 9.8: The cosine of the angle between A and B is

$$\frac{4 \times 5}{\sqrt{4^2 + 5^2 + 1^2}\,\sqrt{5^2 + 5^2 + 4^2}} = 0.380$$

The cosine of the angle between A and C is

$$\frac{5 \times 2 + 1 \times 4}{\sqrt{4^2 + 5^2 + 1^2}\,\sqrt{2^2 + 4^2 + 5^2}} = 0.322$$

Since a larger (positive) cosine implies a smaller angle and therefore a smaller distance, this measure tells us that A is slightly closer to B than to C. □

Rounding the Data

We could try to eliminate the apparent similarity between movies a user rates highly and those with low scores by rounding the ratings. For instance, we could consider ratings of 3, 4, and 5 as a "1" and consider ratings 1 and 2 as unrated. The utility matrix would then look as in Fig. 9.5. Now, the Jaccard distance between A and B is 3/4, while between A and C it is 1; i.e., C appears further from A than B does, which is intuitively correct. Applying cosine distance to Fig. 9.5 allows us to draw the same conclusion.

     HP1   HP2   HP3   TW    SW1   SW2   SW3
A    1                 1
B    1     1     1
C                            1     1
D          1                             1

Figure 9.5: Utilities of 3, 4, and 5 have been replaced by 1, while ratings of 1 and 2 are omitted

Normalizing Ratings

If we normalize ratings, by subtracting from each rating the average rating of that user, we turn low ratings into negative numbers and high ratings into positive numbers. If we then take the cosine distance, we find that users with opposite views of the movies they viewed in common will have vectors in almost opposite directions, and can be considered as far apart as possible. However, users with similar opinions about the movies rated in common will have a relatively small angle between them.

Example 9.9: Figure 9.6 shows the matrix of Fig. 9.4 with all ratings normalized. An interesting effect is that D's ratings have effectively disappeared, because a 0 is the same as a blank when cosine distance is computed. Note that D gave only 3's and did not differentiate among movies, so it is quite possible that D's opinions are not worth taking seriously.

Let us compute the cosine of the angle between A and B:

$$\frac{(2/3) \times (1/3)}{\sqrt{(2/3)^2 + (5/3)^2 + (-7/3)^2}\,\sqrt{(1/3)^2 + (1/3)^2 + (-2/3)^2}} = 0.092$$

     HP1   HP2   HP3   TW    SW1   SW2   SW3
A    2/3               5/3   -7/3
B    1/3   1/3   -2/3
C                      -5/3  1/3   4/3
D          0                             0

Figure 9.6: The utility matrix introduced in Fig. 9.1, with all ratings normalized

The cosine of the angle between A and C is

$$\frac{(5/3) \times (-5/3) + (-7/3) \times (1/3)}{\sqrt{(2/3)^2 + (5/3)^2 + (-7/3)^2}\,\sqrt{(-5/3)^2 + (1/3)^2 + (4/3)^2}} = -0.559$$

Notice that under this measure, A and C are much further apart than A and B, and neither pair is very close. Both these observations make intuitive sense, given that A and C disagree on the two movies they rated in common, while A and B give similar scores to the one movie they rated in common. □

9.3.2 The Duality of Similarity

The utility matrix can be viewed as telling us about users or about items, or both. It is important to realize that any of the techniques we suggested in Section 9.3.1 for finding similar users can be used on columns of the utility matrix to find similar items. There are two ways in which the symmetry is broken in practice.

1. We can use information about users to recommend items. That is, given a user, we can find some number of the most similar users, perhaps using the techniques of Chapter 3. We can base our recommendation on the decisions made by these similar users, e.g., recommend the items that the greatest number of them have purchased or rated highly. However, there is no symmetry. Even if we find pairs of similar items, we need to take an additional step in order to recommend items to users. This point is explored further at the end of this subsection.

2. There is a difference in the typical behavior of users and items, as it pertains to similarity. Intuitively, items tend to be classifiable in simple terms. For example, music tends to belong to a single genre. It is impossible, e.g., for a piece of music to be both 60's rock and 1700's baroque. On the other hand, there are individuals who like both 60's rock and 1700's baroque, and who buy examples of both types of music. The consequence is that it is easier to discover items that are similar because they belong to the same genre, than it is to detect that two users are similar because they prefer one genre in common, while each also likes some genres that the other doesn't care for.
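The distance measures of Section 9.3.1 can be checked with a short sketch. The Python below is only an illustration under stated assumptions: the dictionary layout and the function names (`jaccard_distance`, `cosine`, `rounded`, `normalized`) are hypothetical choices, not part of the text, and returning a cosine of 0 for an all-zero vector (as happens for user D after normalization) is a convention chosen here.

```python
from math import sqrt

# Utility matrix of Fig. 9.4; an item missing from a user's dict is a blank.
ratings = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
    "D": {"HP2": 3, "SW3": 3},
}

def jaccard_distance(r1, r2):
    """Jaccard distance between the sets of items rated (values ignored)."""
    s1, s2 = set(r1), set(r2)
    union = s1 | s2
    if not union:
        return 0.0
    return 1.0 - len(s1 & s2) / len(union)

def cosine(r1, r2):
    """Cosine of the angle between two rating vectors, blanks treated as 0."""
    dot = sum(v * r2[item] for item, v in r1.items() if item in r2)
    n1 = sqrt(sum(v * v for v in r1.values()))
    n2 = sqrt(sum(v * v for v in r2.values()))
    if n1 == 0 or n2 == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (n1 * n2)

def rounded(r, threshold=3):
    """Keep ratings >= threshold as 1 and drop the rest, as in Fig. 9.5."""
    return {item: 1 for item, v in r.items() if v >= threshold}

def normalized(r):
    """Subtract the user's average rating from each rating, as in Fig. 9.6."""
    avg = sum(r.values()) / len(r)
    return {item: v - avg for item, v in r.items()}

A, B, C = ratings["A"], ratings["B"], ratings["C"]
print(jaccard_distance(A, B), jaccard_distance(A, C))
print(round(cosine(A, B), 3), round(cosine(A, C), 3))
print(round(cosine(normalized(A), normalized(B)), 3),
      round(cosine(normalized(A), normalized(C)), 3))
```

Only the normalized cosine separates A from C by sign, which is the behavior Example 9.9 argues for.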
As we suggested in (1) above, one way of predicting the value of the utility-matrix entry for user U and item I is to find the n users (for some predetermined n) most similar to U and average their ratings for item I, counting only those among the n similar users who have rated I. It is generally better to normalize the matrix first. That is, for each of the n users subtract their average rating for items from their rating for I. Average the difference for those users who have rated I, and then add this average to the average rating that U gives for all items. This normalization adjusts the estimate in the case that U tends to give very high or very low ratings, or a large fraction of the similar users who rated I (of which there may be only a few) are users who tend to rate very high or very low.

Dually, we can use item similarity to estimate the entry for user U and item I. Find the m items most similar to I, for some m, and take the average rating, among the m items, of the ratings that U has given. As for user-user similarity, we consider only those items among the m that U has rated, and it is probably wise to normalize item ratings first.

Note that whichever approach to estimating entries in the utility matrix we use, it is not sufficient to find only one entry. In order to recommend items to a user U, we need to estimate every entry in the row of the utility matrix for U, or at least find all or most of the entries in that row that are blank but have a high estimated value. There is a tradeoff regarding whether we should work from similar users or similar items.

• If we find similar users, then we only have to do the process once for user U. From the set of similar users we can estimate all the blanks in the utility matrix for U. If we work from similar items, we have to compute similar items for almost all items, before we can estimate the row for U.

• On the other hand, item-item similarity often provides more reliable information, because of the phenomenon observed above, namely that it is easier to find items of the same genre than it is to find users that like only items of a single genre.

Whichever method we choose, we should precompute preferred items for each user, rather than waiting until we need to make a decision. Since the utility matrix evolves slowly, it is generally sufficient to compute it infrequently and assume that it remains fixed between recomputations.

9.3.3 Clustering Users and Items

It is hard to detect similarity among either items or users, because we have little information about user-item pairs in the sparse utility matrix. In the perspective of Section 9.3.2, even if two items belong to the same genre, there are likely to be very few users who bought or rated both. Likewise, even if two users both like a genre or genres, they may not have bought any items in common.

One way of dealing with this pitfall is to cluster items and/or users. Select any of the distance measures suggested in Section 9.3.1, or any other distance measure, and use it to perform a clustering of, say, items. Any of the methods suggested in Chapter 7 can be used. However, we shall see that there may be little reason to try to cluster into a small number of clusters immediately. Rather, a hierarchical approach, where we leave many clusters unmerged, may suffice as a first step. For example, we might leave half as many clusters as there are items.

     HP     TW    SW
A    4      5     1
B    4.67
C           2     4.5
D    3            3

Figure 9.7: Utility matrix for users and clusters of items

Example 9.10: Figure 9.7 shows what happens to the utility matrix of Fig. 9.4 if we manage to cluster the three Harry-Potter movies into one cluster, denoted HP, and also cluster the three Star-Wars movies into one cluster SW. □

Having clustered items to an extent, we can revise the utility matrix so the columns represent clusters of items, and the entry for user U and cluster C is the average rating that U gave to those members of cluster C that U did rate. Note that U may have rated none of the cluster members, in which case the entry for U and C is still blank.

We can use this revised utility matrix to cluster users, again using the distance measure we consider most appropriate. Use a clustering algorithm that again leaves many clusters, e.g., half as many clusters as there are users. Revise the utility matrix, so the rows correspond to clusters of users, just as the columns correspond to clusters of items. As for item-clusters, compute the entry for a user cluster by averaging the ratings of the users in the cluster.

Now, this process can be repeated several times if we like. That is, we can cluster the item clusters and again merge the columns of the utility matrix that belong to one cluster. We can then turn to the users again, and cluster the user clusters. The process can repeat until we have an intuitively reasonable number of clusters of each kind.

Once we have clustered the users and/or items to the desired extent and computed the cluster-cluster utility matrix, we can estimate entries in the original utility matrix as follows. Suppose we want to predict the entry for user U and item I:

(a) Find the clusters to which U and I belong, say clusters C and D, respectively.

(b) If the entry in the cluster-cluster utility matrix for C and D is something other than blank, use this value as the estimated value for the U-I entry in the original utility matrix.

(c) If the entry for C-D is blank, then use the method outlined in Section 9.3.2 to estimate that entry by considering clusters similar to C or D. Use the resulting estimate as the estimate for the U-I entry.

9.3.4 Exercises for Section 9.3

     a   b   c   d   e   f   g   h
A    4   5       5   1       3   2
B        3   4   3   1   2   1
C    2       1   3       4   5   3

Figure 9.8: A utility matrix for exercises

Exercise 9.3.1: Figure 9.8 is a utility matrix, representing the ratings, on a 1-5 star scale, of eight items, a through h, by three users A, B, and C. Compute the following from the data of this matrix.

(a) Treating the utility matrix as boolean, compute the Jaccard distance between each pair of users.

(b) Repeat Part (a), but use the cosine distance.

(c) Treat ratings of 3, 4, and 5 as 1 and 1, 2, and blank as 0. Compute the Jaccard distance between each pair of users.

(d) Repeat Part (c), but use the cosine distance.

(e) Normalize the matrix by subtracting from each nonblank entry the average value for its user.

(f) Using the normalized matrix from Part (e), compute the cosine distance between each pair of users.

Exercise 9.3.2: In this exercise, we cluster items in the matrix of Fig. 9.8. Do the following steps.

(a) Cluster the eight items hierarchically into four clusters. The following method should be used to cluster. Replace all 3's, 4's, and 5's by 1 and replace 1's, 2's, and blanks by 0. Use the Jaccard distance to measure the distance between the resulting column vectors. For clusters of more than one element, take the distance between clusters to be the minimum distance between pairs of elements, one from each cluster.
(b) Then, construct from the original matrix of Fig. 9.8 a new matrix whose rows correspond to users, as before, and whose columns correspond to clusters. Compute the entry for a user and cluster of items by averaging the nonblank entries for that user and all the items in the cluster.

(c) Compute the cosine distance between each pair of users, according to your matrix from Part (b).

9.4 Dimensionality Reduction

An entirely different approach to estimating the blank entries in the utility matrix is to conjecture that the utility matrix is actually the product of two long, thin matrices. This view makes sense if there is a relatively small set of features of items and users that determine the reaction of most users to most items. In this section, we sketch one approach to discovering two such matrices; the approach is called "UV-decomposition," and it is an instance of a more general theory called SVD (singular-value decomposition).

9.4.1 UV-Decomposition

Consider movies as a case in point. Most users respond to a small number of features; they like certain genres, they may have certain famous actors or actresses that they like, and perhaps there are a few directors with a significant following. If we start with the utility matrix M, with n rows and m columns (i.e., there are n users and m items), then we might be able to find a matrix U with n rows and d columns and a matrix V with d rows and m columns, such that UV closely approximates M in those entries where M is nonblank. If so, then we have established that there are d dimensions that allow us to characterize both users and items closely. We can then use the entry in the product UV to estimate the corresponding blank entry in utility matrix M. This process is called UV-decomposition of M.

$$\begin{bmatrix} 5 & 2 & 4 & 4 & 3 \\ 3 & 1 & 2 & 4 & 1 \\ 2 & & 3 & 1 & 4 \\ 2 & 5 & 4 & 3 & 5 \\ 4 & 4 & 5 & 4 & \end{bmatrix} = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \\ u_{31} & u_{32} \\ u_{41} & u_{42} \\ u_{51} & u_{52} \end{bmatrix} \times \begin{bmatrix} v_{11} & v_{12} & v_{13} & v_{14} & v_{15} \\ v_{21} & v_{22} & v_{23} & v_{24} & v_{25} \end{bmatrix}$$

Figure 9.9: UV-decomposition of matrix M

Example 9.11: We shall use as a running example a 5-by-5 matrix M with all but two of its entries known. We wish to decompose M into a 5-by-2 and 2-by-5 matrix, U and V, respectively. The matrices M, U, and V are shown in Fig. 9.9 with the known entries of M indicated and the matrices U and V shown with their entries as variables to be determined. This example is essentially the smallest nontrivial case where there are more known entries than there are entries in U and V combined, and we therefore can expect that the best decomposition will not yield a product that agrees exactly in the nonblank entries of M. □

9.4.2 Root-Mean-Square Error

While we can pick among several measures of how close the product UV is to M, the typical choice is the root-mean-square error (RMSE), where we

1. Sum, over all nonblank entries in M, the square of the difference between that entry and the corresponding entry in the product UV.

2. Take the mean (average) of these squares by dividing by the number of terms in the sum (i.e., the number of nonblank entries in M).

3. Take the square root of the mean.

Minimizing the sum of the squares is the same as minimizing the square root of the average square, so we generally omit the last two steps in our running example.

$$\begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 \end{bmatrix}$$

Figure 9.10: Matrices U and V with all entries 1

Example 9.12: Suppose we guess that U and V should each have entries that are all 1's, as shown in Fig. 9.10. This is a poor guess, since the product, consisting of all 2's, has entries that are much below the average of the entries in M. Nonetheless, we can compute the RMSE for this U and V; in fact the regularity in the entries makes the calculation especially easy to follow. Consider the first rows of M and UV. We subtract 2 (each entry in UV) from the entries in the first row of M, to get 3, 0, 2, 2, 1. We square and sum these to get 18. In the second row, we do the same to get 1, -1, 0, 2, -1, square and sum to get 7. In the third row, the second column is blank, so that entry is ignored when computing the RMSE. The differences are 0, 1, -1, 2 and the sum of squares is 6. For the fourth row, the differences are 0, 3, 2, 1, 3 and the sum of squares is 23. The fifth row has a blank entry in the last column, so the differences are 2, 2, 3, 2 and the sum of squares is 21. When we sum the sums from each of the five rows, we get 18 + 7 + 6 + 23 + 21 = 75. Generally, we shall stop at this point, but if we want to compute the true RMSE, we divide by 23 (the number of nonblank entries in M) and take the square root. In this case $\sqrt{75/23} = 1.806$ is the RMSE. □

9.4.3 Incremental Computation of a UV-Decomposition

Finding the UV-decomposition with the least RMSE involves starting with some arbitrarily chosen U and V, and repeatedly adjusting U and V to make the RMSE smaller. We shall consider only adjustments to a single element of U or V, although in principle, one could make more complex adjustments. Whatever adjustments we allow, in a typical example there will be many local minima: matrices U and V such that no allowable adjustment reduces the RMSE. Unfortunately, only one of these local minima will be the global minimum: the matrices U and V that produce the least possible RMSE. To increase our chances of finding the global minimum, we need to pick many different starting points, that is, different choices of the initial matrices U and V. However, there is never a guarantee that our best local minimum will be the global minimum.

We shall start with the U and V of Fig. 9.10, where all entries are 1, and do a few adjustments to some of the entries, finding the values of those entries that give the largest possible improvement to the RMSE. From these examples, the general calculation should become obvious, but we shall follow the examples by the formula for minimizing the RMSE by changing a single entry. In what follows, we shall refer to entries of U and V by their variable names $u_{11}$, and so on, as given in Fig. 9.9.

Example 9.13: Suppose we start with U and V as in Fig. 9.10, and we decide to alter $u_{11}$ to reduce the RMSE as much as possible. Let the value of $u_{11}$ be x. Then the new U and V can be expressed as in Fig. 9.11.

$$\begin{bmatrix} x & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} x+1 & x+1 & x+1 & x+1 & x+1 \\ 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 \end{bmatrix}$$

Figure 9.11: Making $u_{11}$ a variable

Notice that the only entries of the product that have changed are those in the first row. Thus, when we compare UV with M, the only change to the RMSE comes from the first row. The contribution to the sum of squares from the first row is

$$\bigl(5-(x+1)\bigr)^2 + \bigl(2-(x+1)\bigr)^2 + \bigl(4-(x+1)\bigr)^2 + \bigl(4-(x+1)\bigr)^2 + \bigl(3-(x+1)\bigr)^2$$

This sum simplifies to

$$(4-x)^2 + (1-x)^2 + (3-x)^2 + (3-x)^2 + (2-x)^2$$

We want the value of x that minimizes the sum, so we take the derivative and set that equal to 0, as:

$$-2 \times \bigl((4-x) + (1-x) + (3-x) + (3-x) + (2-x)\bigr) = 0$$

or $-2 \times (13 - 5x) = 0$, from which it follows that $x = 2.6$.

$$\begin{bmatrix} 2.6 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 3.6 & 3.6 & 3.6 & 3.6 & 3.6 \\ 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 \end{bmatrix}$$

Figure 9.12: The best value for $u_{11}$ is found to be 2.6

Figure 9.12 shows U and V after $u_{11}$ has been set to 2.6. Note that the sum of the squares of the errors in the first row has been reduced from 18 to 5.2, so the total RMSE (ignoring average and square root) has been reduced from 75 to 62.2.

$$\begin{bmatrix} 2.6 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{bmatrix} \times \begin{bmatrix} y & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 2.6y+1 & 3.6 & 3.6 & 3.6 & 3.6 \\ y+1 & 2 & 2 & 2 & 2 \\ y+1 & 2 & 2 & 2 & 2 \\ y+1 & 2 & 2 & 2 & 2 \\ y+1 & 2 & 2 & 2 & 2 \end{bmatrix}$$

Figure 9.13: $v_{11}$ becomes a variable y

Suppose our next entry to vary is $v_{11}$. Let the value of this entry be y, as suggested in Fig. 9.13. Only the first column of the product is affected by y, so we need only to compute the sum of the squares of the differences between the entries in the first columns of M and UV. This sum is

$$\bigl(5-(2.6y+1)\bigr)^2 + \bigl(3-(y+1)\bigr)^2 + \bigl(2-(y+1)\bigr)^2 + \bigl(2-(y+1)\bigr)^2 + \bigl(4-(y+1)\bigr)^2$$

This expression simplifies to

$$(4-2.6y)^2 + (2-y)^2 + (1-y)^2 + (1-y)^2 + (3-y)^2$$

As before, we find the minimum value of this expression by differentiating and equating to 0, as:

$$-2 \times \bigl(2.6(4-2.6y) + (2-y) + (1-y) + (1-y) + (3-y)\bigr) = 0$$

$$\begin{bmatrix} 2.6 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{bmatrix} \times \begin{bmatrix} 1.617 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 5.204 & 3.6 & 3.6 & 3.6 & 3.6 \\ 2.617 & 2 & 2 & 2 & 2 \\ 2.617 & 2 & 2 & 2 & 2 \\ 2.617 & 2 & 2 & 2 & 2 \\ 2.617 & 2 & 2 & 2 & 2 \end{bmatrix}$$

Figure 9.14: Replace y by 1.617

The solution for y is $y = 17.4/10.76 = 1.617$. The improved estimates of U and V are shown in Fig. 9.14.

We shall do one more change, to illustrate what happens when entries of M are blank. We shall vary $u_{31}$, calling it z temporarily. The new U and V are shown in Fig. 9.15. The value of z affects only the entries in the third row.

$$\begin{bmatrix} 2.6 & 1 \\ 1 & 1 \\ z & 1 \\ 1 & 1 \\ 1 & 1 \end{bmatrix} \times \begin{bmatrix} 1.617 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 5.204 & 3.6 & 3.6 & 3.6 & 3.6 \\ 2.617 & 2 & 2 & 2 & 2 \\ 1.617z+1 & z+1 & z+1 & z+1 & z+1 \\ 2.617 & 2 & 2 & 2 & 2 \\ 2.617 & 2 & 2 & 2 & 2 \end{bmatrix}$$

Figure 9.15: $u_{31}$ becomes a variable z

We can express the sum of the squares of the errors as

$$\bigl(2-(1.617z+1)\bigr)^2 + \bigl(3-(z+1)\bigr)^2 + \bigl(1-(z+1)\bigr)^2 + \bigl(4-(z+1)\bigr)^2$$

Note that there is no contribution from the element in the second column of the third row, since this element is blank in M. The expression simplifies to

$$(1-1.617z)^2 + (2-z)^2 + (-z)^2 + (3-z)^2$$

The usual process of setting the derivative to 0 gives us

$$-2 \times \bigl(1.617(1-1.617z) + (2-z) + (-z) + (3-z)\bigr) = 0$$

whose solution is $z = 6.617/5.615 = 1.178$. The next estimate of the decomposition UV is shown in Fig. 9.16. □

$$\begin{bmatrix} 2.6 & 1 \\ 1 & 1 \\ 1.178 & 1 \\ 1 & 1 \\ 1 & 1 \end{bmatrix} \times \begin{bmatrix} 1.617 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 5.204 & 3.6 & 3.6 & 3.6 & 3.6 \\ 2.617 & 2 & 2 & 2 & 2 \\ 2.905 & 2.178 & 2.178 & 2.178 & 2.178 \\ 2.617 & 2 & 2 & 2 & 2 \\ 2.617 & 2 & 2 & 2 & 2 \end{bmatrix}$$

Figure 9.16: Replace z by 1.178

9.4.4 Optimizing an Arbitrary Element

Having seen some examples of picking the optimum value for a single element in the matrix U or V, let us now develop the general formula. As before, assume that M is an n-by-m utility matrix with some entries blank, while U and V are matrices of dimensions n-by-d and d-by-m, for some d. We shall use $m_{ij}$, $u_{ij}$, and $v_{ij}$ for the entries in row i and column j of M, U, and V, respectively. Also, let P = UV, and use $p_{ij}$ for the element in row i and column j of the product matrix P.

Suppose we want to vary $u_{rs}$ and find the value of this element that minimizes the RMSE between M and UV. Note that $u_{rs}$ affects only the elements in row r of the product P = UV. Thus, we need only concern ourselves with the elements

$$p_{rj} = \sum_{k=1}^{d} u_{rk} v_{kj} = \sum_{k \neq s} u_{rk} v_{kj} + x v_{sj}$$

for all values of j such that $m_{rj}$ is nonblank. In the expression above, we have replaced $u_{rs}$, the element we wish to vary, by a variable x, and we use the convention that $\sum_{k \neq s}$ is shorthand for the sum for k = 1, 2, ..., d, except for k = s.

If $m_{rj}$ is a nonblank entry of the matrix M, then the contribution of this element to the sum of the squares of the errors is

$$(m_{rj} - p_{rj})^2 = \Bigl(m_{rj} - \sum_{k \neq s} u_{rk} v_{kj} - x v_{sj}\Bigr)^2$$

We shall use another convention: $\sum_j$ is shorthand for the sum over all j such that $m_{rj}$ is nonblank. Then we can write the sum of the squares of the errors that are affected by the value of $x = u_{rs}$ as

$$\sum_j \Bigl(m_{rj} - \sum_{k \neq s} u_{rk} v_{kj} - x v_{sj}\Bigr)^2$$

Take the derivative of the above with respect to x, and set it equal to 0, in order to find the value of x that minimizes the RMSE. That is,

$$\sum_j -2 v_{sj} \Bigl(m_{rj} - \sum_{k \neq s} u_{rk} v_{kj} - x v_{sj}\Bigr) = 0$$

As in the previous examples, the common factor -2 can be dropped. We solve the above equation for x, and get

$$x = \frac{\sum_j v_{sj} \bigl(m_{rj} - \sum_{k \neq s} u_{rk} v_{kj}\bigr)}{\sum_j v_{sj}^2}$$

There is an analogous formula for the optimum value of an element of V. If we want to vary $v_{rs} = y$, then the value of y that minimizes the RMSE is

$$y = \frac{\sum_i u_{ir} \bigl(m_{is} - \sum_{k \neq r} u_{ik} v_{ks}\bigr)}{\sum_i u_{ir}^2}$$

Here, $\sum_i$ is shorthand for the sum over all i such that $m_{is}$ is nonblank, and $\sum_{k \neq r}$ is the sum over all values of k between 1 and d, except for k = r.

9.4.5 Building a Complete UV-Decomposition Algorithm

Now, we have the tools to search for the global optimum decomposition of a utility matrix M. There are four areas where we shall discuss the options.

1. Preprocessing of the matrix M.

2. Initializing U and V.

3. Ordering the optimization of the elements of U and V.

4. Ending the attempt at optimization.

Preprocessing

Because the differences in the quality of items and the rating scales of users are such important factors in determining the missing elements of the matrix M, it is often useful to remove these influences before doing anything else. The idea was introduced in Section 9.3.1. We can subtract from each nonblank element $m_{ij}$ the average rating of user i. Then, the resulting matrix can be modified by subtracting the average rating (in the modified matrix) of item j. It is also possible to first subtract the average rating of item j and then subtract the average rating of user i in the modified matrix. The results one obtains from doing things in these two different orders need not be the same, but will tend to be close. A third option is to normalize by subtracting from $m_{ij}$ the average of the average rating of user i and item j, that is, subtracting one half the sum of the user average and the item average.

If we choose to normalize M, then when we make predictions, we need to undo the normalization. That is, if whatever prediction method we use results in estimate e for an element $m_{ij}$ of the normalized matrix, then the value we predict for $m_{ij}$ in the true utility matrix is e plus whatever amount was subtracted from row i and from column j during the normalization process.
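The first preprocessing option (subtract user averages, then item averages of the modified matrix) and its undo step can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function names `normalize` and `denormalize`, the use of `None` for blanks, and the small 3-by-3 matrix `M_demo` are all hypothetical and chosen only for this example.

```python
def normalize(M):
    """Subtract each user's (row) average, then each item's (column) average
    of the already-modified matrix. Blanks are None and stay None.
    Returns the normalized matrix plus the amounts subtracted."""
    n, m = len(M), len(M[0])
    row_avg = []
    for row in M:
        known = [x for x in row if x is not None]
        row_avg.append(sum(known) / len(known) if known else 0.0)
    N = [[None if x is None else x - row_avg[i] for x in row]
         for i, row in enumerate(M)]
    col_avg = []
    for j in range(m):
        known = [N[i][j] for i in range(n) if N[i][j] is not None]
        col_avg.append(sum(known) / len(known) if known else 0.0)
    N = [[None if x is None else x - col_avg[j] for j, x in enumerate(row)]
         for row in N]
    return N, row_avg, col_avg

def denormalize(e, i, j, row_avg, col_avg):
    """Undo the normalization for an estimate e of entry (i, j)."""
    return e + row_avg[i] + col_avg[j]

# Hypothetical 3-user, 3-item utility matrix used only for illustration.
M_demo = [
    [5, 3, None],
    [4, None, 2],
    [1, 2, 3],
]
N, row_avg, col_avg = normalize(M_demo)
print(row_avg, col_avg)
print(denormalize(N[0][0], 0, 0, row_avg, col_avg))
```

Keeping `row_avg` and `col_avg` alongside the normalized matrix is what makes the undo step possible at prediction time; an estimate for a blank entry is denormalized the same way as a known one.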
Initialization

As we mentioned, it is essential that there be some randomness in the way we seek an optimum solution, because the existence of many local minima justifies our running many different optimizations in the hope of reaching the global minimum on at least one run. We can vary the initial values of U and V, or we can vary the way we seek the optimum (to be discussed next), or both.

A simple starting point for U and V is to give each element the same value, and a good choice for this value is that which gives the elements of the product UV the average of the nonblank elements of M. Note that if we have normalized M, then this value will necessarily be 0. If we have chosen d as the lengths of the short sides of U and V, and a is the average nonblank element of M, then the elements of U and V should be $\sqrt{a/d}$.

If we want many starting points for U and V, then we can perturb the value $\sqrt{a/d}$ randomly and independently for each of the elements. There are many options for how we do the perturbation. We have a choice regarding the distribution of the difference. For example we could add to each element a normally distributed value with mean 0 and some chosen standard deviation. Or we could add a value uniformly chosen from the range $-c$ to $+c$ for some c.

Performing the Optimization

In order to reach a local minimum from a given starting value of U and V, we need to pick an order in which we visit the elements of U and V. The simplest thing to do is pick an order, e.g., row-by-row, for the elements of U and V and visit them in round-robin fashion. Note that just because we optimized an element once does not mean we cannot find a better value for that element after other elements have been adjusted. Thus, we need to visit elements repeatedly, until we have reason to believe that no further improvements are possible.

Alternatively, we can follow many different optimization paths from a single starting value by randomly picking the element to optimize. To make sure that every element is considered in each round, we could instead choose a permutation of the elements and follow that order for every round.

Converging to a Minimum

Ideally, at some point the RMSE becomes 0, and we know we cannot do better. In practice, since there are normally many more nonblank elements in M than there are elements in U and V together, we have no right to expect that we can reduce the RMSE to 0. Thus, we have to detect when there is little benefit to be had in revisiting elements of U and/or V. We can track the amount of improvement in the RMSE obtained in one round of the optimization, and stop when that improvement falls below a threshold. A small variation is to observe the improvements resulting from the optimization of individual elements, and stop when the maximum improvement during a round is below a threshold.

Gradient Descent

The technique for finding a UV-decomposition discussed in Section 9.4 is an example of gradient descent. We are given some data points (the nonblank elements of the matrix M) and for each data point we find the direction of change that most decreases the error function: the RMSE between the current UV product and M. We shall have much more to say about gradient descent in Section 12.3.4. It is also worth noting that while we have described the method as visiting each nonblank point of M several times until we approach a minimum-error decomposition, that may well be too much work on a large matrix M. Thus, an alternative approach has us look at only a randomly chosen fraction of the data when seeking to minimize the error. This approach, called stochastic gradient descent, is discussed in Section 12.3.5.
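The pieces of Sections 9.4.2 through 9.4.5 can be put together in a short sketch. The Python below is one possible reading, under stated assumptions: the function names, the `None` encoding of blanks, the default threshold, and the uniform perturbation parameter `c` are illustrative choices, and the round-robin order (all of U row-by-row, then all of V) is just one of the orders the text allows.

```python
from math import sqrt
import random

# Running example M of Fig. 9.9; None marks a blank entry.
M = [
    [5, 2, 4, 4, 3],
    [3, 1, 2, 4, 1],
    [2, None, 3, 1, 4],
    [2, 5, 4, 3, 5],
    [4, 4, 5, 4, None],
]

def sum_sq_error(M, U, V):
    """Sum of squared differences between M and UV over nonblank entries."""
    return sum((M[i][j] - sum(U[i][k] * V[k][j] for k in range(len(V)))) ** 2
               for i in range(len(M)) for j in range(len(M[0]))
               if M[i][j] is not None)

def rmse(M, U, V):
    count = sum(x is not None for row in M for x in row)
    return sqrt(sum_sq_error(M, U, V) / count)

def best_u(M, U, V, r, s):
    """Optimum value of U[r][s] with everything else fixed (Section 9.4.4)."""
    num = den = 0.0
    for j in range(len(M[0])):
        if M[r][j] is None:
            continue
        rest = sum(U[r][k] * V[k][j] for k in range(len(V)) if k != s)
        num += V[s][j] * (M[r][j] - rest)
        den += V[s][j] ** 2
    return num / den if den else U[r][s]  # leave unchanged if undefined

def best_v(M, U, V, r, s):
    """Optimum value of V[r][s] with everything else fixed (Section 9.4.4)."""
    num = den = 0.0
    for i in range(len(M)):
        if M[i][s] is None:
            continue
        rest = sum(U[i][k] * V[k][s] for k in range(len(V)) if k != r)
        num += U[i][r] * (M[i][s] - rest)
        den += U[i][r] ** 2
    return num / den if den else V[r][s]

def uv_decompose(M, d=2, threshold=1e-4, max_rounds=1000, c=0.0, seed=None):
    """Round-robin optimization from sqrt(a/d), perturbed uniformly in [-c, c].
    For a normalized M (average 0), pass c > 0 so the start is not all zeros."""
    rng = random.Random(seed)
    n, m = len(M), len(M[0])
    known = [x for row in M for x in row if x is not None]
    a = sum(known) / len(known)
    base = sqrt(a / d) if a > 0 else 0.0
    U = [[base + rng.uniform(-c, c) for _ in range(d)] for _ in range(n)]
    V = [[base + rng.uniform(-c, c) for _ in range(m)] for _ in range(d)]
    prev = rmse(M, U, V)
    for _ in range(max_rounds):
        for r in range(n):
            for s in range(d):
                U[r][s] = best_u(M, U, V, r, s)
        for r in range(d):
            for s in range(m):
                V[r][s] = best_v(M, U, V, r, s)
        cur = rmse(M, U, V)
        if prev - cur < threshold:  # stop when a round barely helps
            break
        prev = cur
    return U, V

U, V = uv_decompose(M)
print(rmse(M, U, V))
```

Starting instead from all-ones matrices and calling `best_u(M, U, V, 0, 0)`, `best_v(M, U, V, 0, 0)` and `best_u(M, U, V, 2, 0)` in turn retraces the three steps of Example 9.13. Running `uv_decompose` with several seeds and a nonzero `c` gives the multiple starting points the text recommends.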
Avoiding Overfitting

One problem that often arises when performing a UV-decomposition is that we arrive at one of the many local minima that conform well to the given data, but picks up values in the data that don't reflect well the underlying process that gives rise to the data. That is, although the RMSE may be small on the given data, it doesn't do well predicting future data. There are several things that can be done to cope with this problem, which is called overfitting by statisticians.

1. Avoid favoring the first components to be optimized by only moving the value of a component a fraction of the way, say half way, from its current value toward its optimized value.

2. Stop revisiting elements of U and V well before the process has converged.

3. Take several different UV decompositions, and when predicting a new entry in the matrix M, take the average of the results of using each decomposition.

9.4.6 Exercises for Section 9.4

Exercise 9.4.1: Starting with the decomposition of Fig. 9.10, we may choose any of the 20 entries in U or V to optimize first. Perform this first optimization step assuming we choose: (a) $u_{32}$ (b) $v_{41}$.

Exercise 9.4.2: If we wish to start out, as in Fig. 9.10, with all U and V entries set to the same value, what value minimizes the RMSE for the matrix M of our running example?
Exercise 9.4.3: Starting with the U and V matrices in Fig. 9.16, do the following in order:

(a) Reconsider the value of $u_{11}$. Find its new best value, given the changes that have been made so far.

(b) Then choose the best value for $u_{52}$.

(c) Then choose the best value for $v_{22}$.

Exercise 9.4.4: Derive the formula for y (the optimum value of element $v_{rs}$) given at the end of Section 9.4.4.

Exercise 9.4.5: Normalize the matrix M of our running example by:

(a) First subtracting from each element the average of its row, and then subtracting from each element the average of its (modified) column.

(b) First subtracting from each element the average of its column, and then subtracting from each element the average of its (modified) row.

Are there any differences in the results of (a) and (b)?

9.5 The Netflix Challenge

A significant boost to research into recommendation systems was given when Netflix offered a prize of $1,000,000 to the first person or team to beat their own recommendation algorithm, called CineMatch, by 10%. After over three years of work, the prize was awarded in September, 2009.

The Netflix challenge consisted of a published dataset, giving the ratings by approximately half a million users on (typically small subsets of) approximately 17,000 movies. This data was selected from a larger dataset, and proposed algorithms were tested on their ability to predict the ratings in a secret remainder of the larger dataset. The information for each (user, movie) pair in the published dataset included a rating (1-5 stars) and the date on which the rating was made.

The RMSE was used to measure the performance of algorithms. CineMatch has an RMSE of approximately 0.95; i.e., the typical rating would be off by almost one full star. To win the prize, it was necessary that your algorithm have an RMSE that was at most 90% of the RMSE of CineMatch.

The bibliographic notes for this chapter include references to descriptions of the winning algorithms. Here, we mention some interesting and perhaps unintuitive facts about the challenge.

• CineMatch was not a very good algorithm. In fact, it was discovered early that the obvious algorithm of predicting, for the rating by user u on movie m, the average of:

1. The average rating given by u on all rated movies and

2. The average of the ratings for movie m by all users who rated that movie.

was only 3% worse than CineMatch.

• The UV-decomposition algorithm described in Section 9.4 was found by three students (Michael Harris, Jeffrey Wang, and David Kamm) to give a 7% improvement over CineMatch, when coupled with normalization and a few other tricks.

• The winning entry was actually a combination of several different algorithms that had been developed independently. A second team, which submitted an entry that would have won, had it been submitted a few minutes earlier, also was a blend of independent algorithms. This strategy (combining different algorithms) has been used before in a number of hard problems and is something worth remembering.

• Several attempts have been made to use the data contained in IMDB, the Internet movie database, to match the names of movies from the Netflix challenge with their names in IMDB, and thus extract useful information not contained in the Netflix data itself. IMDB has information about actors and directors, and classifies movies into one or more of 28 genres. It was found that genre and other information was not useful. One possible reason is the machine-learning algorithms were able to discover the relevant information anyway, and a second is that the entity resolution problem of matching movie names as given in Netflix and IMDB data is not that easy to solve exactly.

• Time of rating turned out to be useful. It appears there are movies that are more likely to be appreciated by people who rate it immediately after viewing than by those who wait a while and then rate it. "Patch Adams" was given as an example of such a movie. Conversely, there are other movies that were not liked by those who rated it immediately, but were better appreciated after a while; "Memento" was cited as an example. While one cannot tease out of the data information about how long was the delay between viewing and rating, it is generally safe to assume that most people see a movie shortly after it comes out. Thus, one can examine the ratings of any movie to see if its ratings have an upward or downward slope with time.

9.6 Summary of Chapter 9

✦ Utility Matrices: Recommendation systems deal with users and items. A utility matrix offers known information about the degree to which a user likes an item. Normally, most entries are unknown, and the essential problem of recommending items to users is predicting the values of the unknown entries based on the values of the known entries.

✦ Two Classes of Recommendation Systems: These systems attempt to predict a user's response to an item by discovering similar items and the response of the user to those. One class of recommendation system is content-based; it measures similarity by looking for common features of the items. A second class of recommendation system uses collaborative filtering; these measure similarity of users by their item preferences and/or measure similarity of items by the users who like them.

✦ Item Profiles: These consist of features of items. Different kinds of items have different features on which content-based similarity can be based. Features of documents are typically important or unusual words. Products have attributes such as screen size for a television. Media such as movies have a genre and details such as actor or performer. Tags can also be used as features if they can be acquired from interested users.

✦ User Profiles: A content-based collaborative filtering system can construct profiles for users by measuring the frequency with which features appear in the items the user likes. We can then estimate the degree to which a user will like an item by the closeness of the item's profile to the user's profile.

✦ Classification of Items: An alternative to constructing a user profile is to build a classifier for each user, e.g., a decision tree. The row of the utility matrix for that user becomes the training data, and the classifier must predict the response of the user to all items, whether or not the row had an entry for that item.

✦ Similarity of Rows and Columns of the Utility Matrix: Collaborative filtering algorithms must measure the similarity of rows and/or columns of the utility matrix. Jaccard distance is appropriate when the matrix consists only of 1's and blanks (for "not rated"). Cosine distance works for more general values in the utility matrix. It is often useful to normalize the utility matrix by subtracting the average value (either by row, by column, or both) before measuring the cosine distance.

✦ Clustering Users and Items: Since the utility matrix tends to be mostly blanks, distance measures such as Jaccard or cosine often have too little data with which to compare two rows or two columns. A preliminary step or steps, in which similarity is used to cluster users and/or items into small groups with strong similarity, can help provide more common components with which to compare rows or columns.

✦ UV-Decomposition: One way of predicting the blank values in a utility matrix is to find two long, thin matrices U and V, whose product is an approximation to the given utility matrix. Since the matrix product UV gives values for all user-item pairs, that value can be used to predict the value of a blank in the utility matrix. The intuitive reason this method makes sense is that often there are a relatively small number of issues (that number is the "thin" dimension of U and V) that determine whether or not a user likes an item.

✦ Root-Mean-Square Error: A good measure of how close the product UV is to the given utility matrix is the RMSE (root-mean-square error). The RMSE is computed by averaging the square of the differences between UV and the utility matrix, in those elements where the utility matrix is nonblank. The square root of this average is the RMSE.

✦ Computing U and V: One way of finding a good choice for U and V in a UV-decomposition is to start with arbitrary matrices U and V. Repeatedly adjust one of the elements of U or V to minimize the RMSE between the product UV and the given utility matrix. The process converges to a local optimum, although to have a good chance of obtaining a global optimum we must either repeat the process from many starting matrices, or search from the starting point in many different ways.

✦ The Netflix Challenge: An important driver of research into recommendation systems was the Netflix challenge. A prize of $1,000,000 was offered for a contestant who could produce an algorithm that was 10% better than Netflix's own algorithm at predicting movie ratings by users. The prize was awarded in Sept., 2009.

9.7 References for Chapter 9

[1] is a survey of recommendation systems as of 2005. The argument regarding the importance of the long tail in on-line systems is from [2], which was expanded into a book [3]. [8] discusses the use of computer games to extract tags for items. See [5] for a discussion of item-item similarity and how Amazon designed its collaborative-filtering algorithm for product recommendations. There are three papers describing the three algorithms that, in combination, won the Netflix challenge. They are [4], [6], and [7].

1. G. Adomavicius and A. Tuzhilin, "Towards the next generation of recommender systems: a survey of the state-of-the-art and possible extensions," IEEE Trans. on Data and Knowledge Engineering 17:6, pp. 734-749, 2005.

2. C. Anderson, http://www.wired.com/wired/archive/12.10/tail.html 2004.

3. C. Anderson, The Long Tail: Why the Future of Business is Selling Less of More, Hyperion Books, New York, 2006.

4. Y. Koren, "The BellKor solution to the Netflix grand prize," www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf 2009.

5. G. Linden, B. Smith, and J. York, "Amazon.com recommendations: item-to-item collaborative filtering," Internet Computing 7:1, pp. 76-80, 2003.

6. M. Piotte and M. Chabbert, "The Pragmatic Theory solution to the Netflix grand prize," www.netflixprize.com/assets/GrandPrize2009_BPC_PragmaticTheory.pdf 2009.

7. A. Töscher, M. Jahrer, and R. Bell, "The BigChaos solution to the Netflix grand prize," www.netflixprize.com/assets/GrandPrize2009_BPC_BigChaos.pdf 2009.

8. L. von Ahn, "Games with a purpose," IEEE Computer Magazine, pp. 96-98, June 2006.
