Scaling Factorization Machines to Relational Data

Steffen Rendle
University of Konstanz
78457 Konstanz, Germany
steffen.rendle@uni-konstanz.de

ABSTRACT

The most common approach in predictive modeling is to describe cases with feature vectors (aka design matrix). Many machine learning methods such as linear regression or support vector machines rely on this representation. However, when the underlying data has strong relational patterns, especially relations with high cardinality, the design matrix can get very large which can make learning and prediction slow or even infeasible.

This work solves this issue by making use of repeating patterns in the design matrix which stem from the underlying relational structure of the data. It is shown how coordinate descent learning and Bayesian Markov Chain Monte Carlo inference can be scaled for linear regression and factorization machine models. Empirically, it is shown on two large scale and very competitive datasets (Netflix prize, KDD Cup 2012), that (1) standard learning algorithms based on the design matrix representation cannot scale to relational predictor variables, (2) the proposed new algorithms scale and (3) the predictive quality of the proposed generic feature-based approach is as good as the best specialized models that have been tailored to the respective tasks.

1. INTRODUCTION

Predictive analytics is an important technique with applications in many fields ranging from business to science. Typically, a predictive model is defined as a function of predictor variables to some target. E.g. how much (target) does a customer (first predictor variable) like a product (second predictor variable). The most common approach is that a data analyst selects some predictor variables (aka feature engineering) and applies a machine learning (ML) method to learn the target function from observations of the past. The ML model is then a function of the predictor variables (aka feature vector). Many important ML methods are based on this principle incl. linear regression (LR), support vector machines (SVM), decision trees, etc. The runtime of learning and prediction depends on the (sparse) size of the feature vector and is typically linear at best.

Nowadays, feature engineering based ML is the dominant technique in predictive analytics. However, if it is applied to relational data, especially involving relations of high cardinality, the feature vectors can grow very large which can make learning and prediction very slow or even infeasible. E.g. to follow the example from above, the friends of a customer might be predictive for his/her taste. Using the variable "friends of a customer" in the feature vector (e.g. for a SVM, LR, etc.) can result in a very long feature vector because all friends (i.e. their IDs) are included in the feature vector.

In this paper, it is shown how prediction and learning algorithms for linear regression and factorization machines can be scaled to predictor variables generated from relational data involving relations of high cardinality. The idea is to make use of repeating patterns over a set of feature vectors. No change is made on the predictive modeling approach and also not on the underlying statistical model. Thus the proposed algorithms learn the same parameters and make the same predictions but with a much lower runtime complexity.

The paper starts with linear regression as it is one of the best-known ML models and still achieves high prediction accuracy in competitive problems (e.g. KDD Cup 2010 [23]). Moreover the idea of scaling is easier to understand for this basic model first. The main contribution is scaling factorization machines [12] which is a generic factorization model including among others matrix factorization [17], SVD++ [3], PITF [15], timeSVD++ [5], etc. Factorization models have shown great predictive performance in very competitive machine learning problems including the Netflix prize¹, recent KDD Cups² (2010, 2011, 2012) as well as other prediction challenges (e.g. 'What Do You Know?' Challenge³, EMI Music Hackathon⁴). For both models, scaling is shown for coordinate descent (CD) learning and for a Markov Chain Monte Carlo (MCMC) Gibbs sampler. CD is one of the most effective point estimators [2] and MCMC a state-of-the-art Bayesian inference method.

From a practical point of view, the proposed algorithms allow to handle predictive modeling as usual: defining predictor variables (also variables from relations of high cardinality) by feature engineering and applying a feature-vector-based ML algorithm. Internally, the algorithms make use of the repeating patterns stemming from the relational structure of the data to largely speed up computation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Articles from this volume were invited to present their results at The 39th International Conference on Very Large Data Bases, August 26th-30th 2013, Riva del Garda, Trento, Italy.
Proceedings of the VLDB Endowment, Vol. 6, No. 5
Copyright 2013 VLDB Endowment 2150-8097/13/03... $10.00.

¹ http://www.netflixprize.com/
² http://www.sigkdd.org/kddcup/
³ http://www.kaggle.com/c/WhatDoYouKnow
⁴ http://www.kaggle.com/c/MusicHackathon
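
The "friends of a customer" example from the introduction can be made concrete with a small Python sketch. It is only an illustration under stated assumptions, not the coordinate descent or MCMC algorithms of the paper: the data is synthetic, every name (`friends`, `cases`, `flat_row`, `friend_score`) is hypothetical, and the model is a plain linear function with randomly drawn weights. The sketch builds the flat design matrix, in which the friend block of a customer is repeated in every case of that customer, and compares it with a representation that stores and evaluates each friend block only once per customer.

```python
import random

N_CUSTOMERS, N_PRODUCTS, N_FRIENDS, N_CASES = 50, 20, 30, 2000
rng = random.Random(0)

# Synthetic relational data (assumption): every customer has N_FRIENDS friends.
friends = {c: rng.sample(range(N_CUSTOMERS), N_FRIENDS) for c in range(N_CUSTOMERS)}
# Each case is a (customer, product) pair; customers occur in many cases.
cases = [(rng.randrange(N_CUSTOMERS), rng.randrange(N_PRODUCTS)) for _ in range(N_CASES)]

# Column layout: [customer one-hot | product one-hot | friend indicators].
FRIEND_OFFSET = N_CUSTOMERS + N_PRODUCTS
N_FEATURES = FRIEND_OFFSET + N_CUSTOMERS
# Arbitrary weights of a linear model, drawn at random for illustration.
w = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]


def flat_row(customer, product):
    """Sparse feature vector {column: value} of one case in the flat design matrix."""
    row = {customer: 1.0, N_CUSTOMERS + product: 1.0}
    for f in friends[customer]:
        row[FRIEND_OFFSET + f] = 1.0 / N_FRIENDS  # normalized friend indicators
    return row


# Flat representation: the friend block is copied into every case of a customer.
design_matrix = [flat_row(c, p) for c, p in cases]
nnz_flat = sum(len(row) for row in design_matrix)
pred_flat = [sum(w[j] * v for j, v in row.items()) for row in design_matrix]

# Block representation: each friend block is stored and scored once per customer.
friend_score = {
    c: sum(w[FRIEND_OFFSET + f] for f in fs) / N_FRIENDS for c, fs in friends.items()
}
nnz_blocked = 2 * len(cases) + sum(len(fs) for fs in friends.values())
pred_blocked = [w[c] + w[N_CUSTOMERS + p] + friend_score[c] for c, p in cases]

# Both representations give the same predictions up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(pred_flat, pred_blocked))
print(nnz_flat, nnz_blocked, max_diff < 1e-9)
```

With these settings the flat design matrix holds 2000 × 32 = 64000 non-zero entries, while the block representation holds 2000 × 2 + 50 × 30 = 5500, and the predictions agree. The gap grows with the number of cases per customer and with the cardinality of the friend relation, which is the situation the introduction describes.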