Factorization Machines

Steffen Rendle
Department of Reasoning for Intelligence, The Institute of Scientific and Industrial Research, Osaka University, Japan
rendle@ar.sanken.osaka-u.ac.jp

Abstract: In this paper, we introduce Factorization Machines (FM), which are a new model class that combines the advantages of Support Vector Machines (SVM) with factorization models. Like SVMs, FMs are a general predictor working with any real-valued feature vector. In contrast to SVMs, FMs model all interactions between variables using factorized parameters. Thus they are able to estimate interactions even in problems with huge sparsity (like recommender systems) where SVMs fail. We show that the model equation of FMs can be calculated in linear time and thus FMs can be optimized directly. So unlike nonlinear SVMs, a transformation into the dual form is not necessary and the model parameters can be estimated directly without the need of any support vector in the solution. We show the relationship to SVMs and the advantages of FMs for parameter estimation in sparse settings. On the other hand, there are many different factorization models like matrix factorization, parallel factor analysis or specialized models like SVD++, PITF or FPMC. The drawback of these models is that they are not applicable to general prediction tasks but work only with special input data. Furthermore, their model equations and optimization algorithms are derived individually for each task. We show that FMs can mimic these models just by specifying the input data (i.e. the feature vectors). This makes FMs easily applicable even for users without expert knowledge in factorization models.

Index Terms: factorization machine; sparse data; tensor factorization; support vector machine

I. INTRODUCTION

Support Vector Machines are one of the most popular predictors in machine learning and data mining. Nevertheless, in settings like collaborative filtering, SVMs play no important role and the best models are either direct applications of standard matrix/tensor factorization models like PARAFAC [1] or specialized models using factorized parameters [2], [3], [4]. In this paper, we show that the only reason why standard SVM predictors are not successful in these tasks is that they cannot learn reliable parameters ('hyperplanes') in complex (non-linear) kernel spaces under very sparse data. On the other hand, the drawback of tensor factorization models, and even more of specialized factorization models, is that (1) they are not applicable to standard prediction data (e.g. a real-valued feature vector in $\mathbb{R}^n$) and (2) specialized models are usually derived individually for a specific task, requiring effort in modelling and in the design of a learning algorithm.

In this paper, we introduce a new predictor, the Factorization Machine (FM), that is a general predictor like SVMs but is also able to estimate reliable parameters under very high sparsity. The factorization machine models all nested variable interactions (comparable to a polynomial kernel in SVM), but uses a factorized parametrization instead of a dense parametrization like in SVMs. We show that the model equation of FMs can be computed in linear time and that it depends only on a linear number of parameters. This allows direct optimization and storage of model parameters without the need of storing any training data (e.g. support vectors) for prediction. In contrast to this, non-linear SVMs are usually optimized in the dual form and computing a prediction (the model equation) depends on parts of the training data (the support vectors). We also show that FMs subsume many of the most successful approaches for the task of collaborative filtering, including biased MF, SVD++ [2], PITF [3] and FPMC [4].

In total, the advantages of our proposed FM are:
1) FMs allow parameter estimation under very sparse data where SVMs fail.
2) FMs have linear complexity, can be optimized in the primal and do not rely on support vectors like SVMs. We show that FMs scale to large datasets like Netflix with 100 million training instances.
3) FMs are a general predictor that can work with any real-valued feature vector. In contrast to this, other state-of-the-art factorization models work only on very restricted input data. We will show that just by defining the feature vectors of the input data, FMs can mimic state-of-the-art models like biased MF, SVD++, PITF or FPMC.
II. PREDICTION UNDER SPARSITY

The most common prediction task is to estimate a function $\hat{y}: \mathbb{R}^n \to T$ from a real-valued feature vector $\mathbf{x} \in \mathbb{R}^n$ to a target domain $T$ (e.g. $T = \mathbb{R}$ for regression or $T = \{+, -\}$ for classification). In supervised settings, it is assumed that a training dataset $D = \{(\mathbf{x}^{(1)}, y^{(1)}), (\mathbf{x}^{(2)}, y^{(2)}), \ldots\}$ of examples for the target function $y$ is given. We also investigate the ranking task, where the function $y$ with target $T = \mathbb{R}$ can be used to score feature vectors $\mathbf{x}$ and sort them according to their score. Scoring functions can be learned with pairwise training data [5], where a feature tuple $(\mathbf{x}^{(A)}, \mathbf{x}^{(B)}) \in D$ means that $\mathbf{x}^{(A)}$ should be ranked higher than $\mathbf{x}^{(B)}$. As the pairwise ranking relation is antisymmetric, it is sufficient to use only positive training instances.

In this paper, we deal with problems where $\mathbf{x}$ is highly sparse, i.e. almost all of the elements $x_i$ of a vector $\mathbf{x}$ are zero. Let $m(\mathbf{x})$ be the number of non-zero elements in the feature vector $\mathbf{x}$ and $\bar{m}_D$ be the average number of non-zero elements $m(\mathbf{x})$ over all vectors $\mathbf{x} \in D$. Huge sparsity ($\bar{m}_D \ll n$) appears in much real-world data, like feature vectors of event transactions (e.g. purchases in recommender systems) or text analysis (e.g. the bag-of-words approach). One reason for huge sparsity is that the underlying problem deals with large categorical variable domains.

Example 1: Assume we have the transaction data of a movie review system. The system records which user $u \in U$ rates a movie (item) $i \in I$ at a certain time $t \in \mathbb{R}$ with a rating $r \in \{1, 2, 3, 4, 5\}$. Let the users $U$ and items $I$ be:

U = {Alice (A), Bob (B), Charlie (C), ...}
I = {Titanic (TI), Notting Hill (NH), Star Wars (SW), Star Trek (ST), ...}

Let the observed data S be:

S = {(A, TI, 2010-1, 5), (A, NH, 2010-2, 3), (A, SW, 2010-4, 1), (B, SW, 2009-5, 4), (B, ST, 2009-8, 5), (C, TI, 2009-9, 1), (C, SW, 2009-12, 5)}

An example of a prediction task using this data is to estimate a function $\hat{y}$ that predicts the rating behaviour of a user for an item at a certain point in time.

Fig. 1. Example of sparse real-valued feature vectors $\mathbf{x}$ that are created from the transactions of Example 1. Every row represents a feature vector $\mathbf{x}^{(i)}$ with its corresponding target $y^{(i)}$. The first 4 columns (blue) represent indicator variables for the active user; the next 5 (red) indicator variables for the active item. The next 5 columns (yellow) hold additional implicit indicators (i.e. other movies the user has rated). One feature (green) represents the time in months. The last 5 columns (brown) have indicators for the last movie the user has rated before the active one. The rightmost column is the target, here the rating.

Figure 1 shows one example of how feature vectors can be created from S for this task.¹ Here, first there are $|U|$ binary indicator variables (blue) that represent the active user of a transaction; there is always exactly one active user in each transaction $(u, i, t, r) \in S$, e.g. user Alice in the first one ($x_A^{(1)} = 1$). The next $|I|$ binary indicator variables (red) hold the active item; again there is always exactly one active item (e.g. $x_{TI}^{(1)} = 1$). The feature vectors in figure 1 also contain indicator variables (yellow) for all the other movies the user has ever rated. For each user, these variables are normalized such that they sum up to 1 (e.g. Alice has rated Titanic, Notting Hill and Star Wars). Additionally, the example contains a variable (green) holding the time in months starting from January 2009. And finally the vector contains indicators (brown) for the last movie the user has rated before (s)he rated the active one, e.g. for $\mathbf{x}^{(2)}$, Alice rated Titanic before she rated Notting Hill. In Section V, we show how factorization machines using such feature vectors as input data are related to specialized state-of-the-art factorization models.

¹To simplify readability, we will use categorical levels (e.g. Alice (A)) instead of numbers (e.g. 1) to identify elements in vectors wherever it makes sense (e.g. we write $x_A$ or $x_{\text{Alice}}$ instead of $x_1$).

We will use this example data throughout the paper for illustration. However, please note that FMs are general predictors like SVMs and thus are applicable to any real-valued feature vectors and are not restricted to recommender systems.
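To make the encoding of Figure 1 concrete, the following minimal NumPy sketch builds such sparse feature vectors from the transactions of Example 1. It is an illustration only: the column layout and helper names are choices of the sketch, time is encoded directly in months since January 2009, and the last-movie (brown) block of Fig. 1 is omitted for brevity.

```python
import numpy as np

# Transactions from Example 1: (user, item, time in months since Jan. 2009, rating)
S = [("A", "TI", 13, 5), ("A", "NH", 14, 3), ("A", "SW", 16, 1),
     ("B", "SW", 5, 4), ("B", "ST", 8, 5),
     ("C", "TI", 9, 1), ("C", "SW", 12, 5)]

users = ["A", "B", "C"]
items = ["TI", "NH", "SW", "ST"]
# All movies each user has ever rated (the yellow "implicit" block)
rated = {u: [i for (v, i, _, _) in S if v == u] for u in users}

def encode(u, i, t):
    """One row in the style of Fig. 1: user block | item block | implicit block | time."""
    x = np.zeros(len(users) + 2 * len(items) + 1)
    x[users.index(u)] = 1.0                                  # active user (blue)
    x[len(users) + items.index(i)] = 1.0                     # active item (red)
    for j in rated[u]:                                       # normalized so the block sums to 1
        x[len(users) + len(items) + items.index(j)] = 1.0 / len(rated[u])
    x[-1] = float(t)                                         # time in months (green)
    return x

X = np.array([encode(u, i, t) for (u, i, t, _) in S])        # design matrix, one row per case
y = np.array([r for (*_, r) in S], dtype=float)              # targets (ratings)
```

Each row of X then has only a handful of non-zero entries, exactly the kind of huge sparsity discussed above.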
III. FACTORIZATION MACHINES (FM)

In this section, we introduce factorization machines. We discuss the model equation in detail and briefly show how to apply FMs to several prediction tasks.

A. Factorization Machine Model

1) Model Equation: The model equation for a factorization machine of degree $d = 2$ is defined as:

$$\hat{y}(\mathbf{x}) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j \quad (1)$$

where the model parameters that have to be estimated are:

$$w_0 \in \mathbb{R}, \quad \mathbf{w} \in \mathbb{R}^n, \quad V \in \mathbb{R}^{n \times k} \quad (2)$$

and $\langle \cdot, \cdot \rangle$ is the dot product of two vectors of size $k$:

$$\langle \mathbf{v}_i, \mathbf{v}_j \rangle := \sum_{f=1}^{k} v_{i,f} \, v_{j,f} \quad (3)$$

A row $\mathbf{v}_i$ within $V$ describes the $i$-th variable with $k$ factors. $k \in \mathbb{N}_0^+$ is a hyperparameter that defines the dimensionality of the factorization. A 2-way FM (degree $d = 2$) captures all single and pairwise interactions between variables:
- $w_0$ is the global bias.
- $w_i$ models the strength of the $i$-th variable.
- $\hat{w}_{i,j} := \langle \mathbf{v}_i, \mathbf{v}_j \rangle$ models the interaction between the $i$-th and $j$-th variable. Instead of using an own model parameter $w_{i,j} \in \mathbb{R}$ for each interaction, the FM models the interaction by factorizing it. We will see later on that this is the key point which allows high-quality parameter estimates of higher-order interactions ($d \geq 2$) under sparsity.

2) Expressiveness: It is well known that for any positive definite matrix $W$, there exists a matrix $V$ such that $W = V V^t$ provided that $k$ is sufficiently large. This shows that a FM can express any interaction matrix $W$ if $k$ is chosen large enough. Nevertheless, in sparse settings typically a small $k$ should be chosen because there is not enough data to estimate complex interactions $W$. Restricting $k$, and thus the expressiveness of the FM, leads to better generalization and thus improved interaction matrices under sparsity.

3) Parameter Estimation Under Sparsity: In sparse settings, there is usually not enough data to estimate interactions between variables directly and independently. Factorization machines can estimate interactions well even in these settings because they break the independence of the interaction parameters by factorizing them. In general, this means that the data for one interaction also helps to estimate the parameters for related interactions. We will make the idea clearer with an example from the data in figure 1. Assume we want to estimate the interaction between Alice (A) and Star Trek (ST) for predicting the target $y$ (here the rating). Obviously, there is no case $\mathbf{x}$ in the training data where both variables $x_A$ and $x_{ST}$ are non-zero, and thus a direct estimate would lead to no interaction ($w_{A,ST} = 0$). But with the factorized interaction parameters $\langle \mathbf{v}_A, \mathbf{v}_{ST} \rangle$ we can estimate the interaction even in this case. First of all, Bob and Charlie will have similar factor vectors $\mathbf{v}_B$ and $\mathbf{v}_C$ because both have similar interactions with Star Wars ($\mathbf{v}_{SW}$) for predicting ratings, i.e. $\langle \mathbf{v}_B, \mathbf{v}_{SW} \rangle$ and $\langle \mathbf{v}_C, \mathbf{v}_{SW} \rangle$ have to be similar. Alice ($\mathbf{v}_A$) will have a different factor vector from Charlie ($\mathbf{v}_C$) because she has different interactions with the factors of Titanic and Star Wars for predicting ratings. Next, the factor vectors of Star Trek are likely to be similar to those of Star Wars because Bob has similar interactions for both movies for predicting $y$. In total, this means that the dot product (i.e. the interaction) of the factor vectors of Alice and Star Trek will be similar to the one of Alice and Star Wars, which also makes intuitive sense.
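As a reference point for the complexity discussion that follows, here is a hedged sketch of eq. (1) computed literally: the double loop over pairs mirrors the sum term by term and costs $O(k n^2)$. The parameter values are random placeholders, not a trained model.

```python
import numpy as np

def fm_predict_naive(x, w0, w, V):
    """Eq. (1) computed literally: O(k n^2), one term per pair (i, j) with j > i."""
    n = len(x)
    y_hat = w0 + w @ x
    for i in range(n):
        for j in range(i + 1, n):
            y_hat += (V[i] @ V[j]) * x[i] * x[j]   # <v_i, v_j> x_i x_j, eqs. (1) and (3)
    return y_hat

rng = np.random.default_rng(0)
n, k = 12, 4                                       # n features, k factors
w0 = 0.0                                           # global bias
w = rng.normal(size=n)                             # per-variable strengths
V = rng.normal(scale=0.1, size=(n, k))             # factor matrix, one row v_i per variable
x = rng.random(n)
print(fm_predict_naive(x, w0, w, V))
```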
4) Computation: Next, we show how to make FMs applicable from a computational point of view. The complexity of straightforward computation of eq. (1) is in $O(k n^2)$ because all pairwise interactions have to be computed. But with a reformulation it drops to linear runtime.

Lemma 3.1: The model equation of a factorization machine (eq. (1)) can be computed in linear time $O(kn)$.

Proof: Due to the factorization of the pairwise interactions, there is no model parameter that directly depends on two variables (e.g. a parameter with an index $(i,j)$). So the pairwise interactions can be reformulated:

$$\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j - \frac{1}{2} \sum_{i=1}^{n} \langle \mathbf{v}_i, \mathbf{v}_i \rangle \, x_i x_i$$
$$= \frac{1}{2} \left( \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{f=1}^{k} v_{i,f} \, v_{j,f} \, x_i x_j - \sum_{i=1}^{n} \sum_{f=1}^{k} v_{i,f} \, v_{i,f} \, x_i x_i \right)$$
$$= \frac{1}{2} \sum_{f=1}^{k} \left( \left( \sum_{i=1}^{n} v_{i,f} \, x_i \right) \left( \sum_{j=1}^{n} v_{j,f} \, x_j \right) - \sum_{i=1}^{n} v_{i,f}^2 \, x_i^2 \right)$$
$$= \frac{1}{2} \sum_{f=1}^{k} \left( \left( \sum_{i=1}^{n} v_{i,f} \, x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 \, x_i^2 \right)$$

This equation has only linear complexity in both $k$ and $n$, i.e. its computation is in $O(kn)$. ∎

Moreover, under sparsity most of the elements in $\mathbf{x}$ are 0 (i.e. $m(\mathbf{x})$ is small) and thus the sums only have to be computed over the non-zero elements. Thus in sparse applications, the computation of the factorization machine is in $O(k \, \bar{m}_D)$, e.g. $\bar{m}_D = 2$ for typical recommender systems like MF approaches (see section V-A).
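The reformulation in Lemma 3.1 translates directly into code. The sketch below (an illustration under the same placeholder parameters as the previous sketch) computes the final expression of the proof in $O(kn)$, plus a variant restricted to the non-zero entries for the $O(k \, m(\mathbf{x}))$ sparse case.

```python
import numpy as np

def fm_predict_fast(x, w0, w, V):
    """Lemma 3.1: 0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2 ], O(k n)."""
    s = V.T @ x                          # per-factor sums  sum_i v_{i,f} x_i
    s2 = (V ** 2).T @ (x ** 2)           # per-factor sums  sum_i v_{i,f}^2 x_i^2
    return w0 + w @ x + 0.5 * float(np.sum(s ** 2 - s2))

def fm_predict_sparse(idx, vals, w0, w, V):
    """The same sums restricted to the m(x) non-zero entries: O(k m(x))."""
    s = V[idx].T @ vals
    s2 = (V[idx] ** 2).T @ (vals ** 2)
    return w0 + w[idx] @ vals + 0.5 * float(np.sum(s ** 2 - s2))
```

Reusing `x, w0, w, V` from the previous sketch, `np.isclose(fm_predict_naive(x, w0, w, V), fm_predict_fast(x, w0, w, V))` holds, confirming that the reformulation computes the same value as the pairwise double loop.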
B. Factorization Machines as Predictors

FMs can be applied to a variety of prediction tasks. Among them are:
- Regression: $\hat{y}(\mathbf{x})$ can be used directly as the predictor and the optimization criterion is e.g. the minimal least square error on $D$.
- Binary classification: the sign of $\hat{y}(\mathbf{x})$ is used and the parameters are optimized for hinge loss or logit loss.
- Ranking: the vectors $\mathbf{x}$ are ordered by the score of $\hat{y}(\mathbf{x})$ and optimization is done over pairs of instance vectors $(\mathbf{x}^{(a)}, \mathbf{x}^{(b)}) \in D$ with a pairwise classification loss (e.g. like in [5]).

In all these cases, regularization terms like L2 are usually added to the optimization objective to prevent overfitting.

C. Learning Factorization Machines

As we have shown, FMs have a closed model equation that can be computed in linear time. Thus, the model parameters ($w_0$, $\mathbf{w}$ and $V$) of FMs can be learned efficiently by gradient descent methods, e.g. stochastic gradient descent (SGD), for a variety of losses, among them square, logit or hinge loss. The gradient of the FM model is:

$$\frac{\partial}{\partial \theta} \hat{y}(\mathbf{x}) = \begin{cases} 1, & \text{if } \theta \text{ is } w_0 \\ x_i, & \text{if } \theta \text{ is } w_i \\ x_i \sum_{j=1}^{n} v_{j,f} \, x_j - v_{i,f} \, x_i^2, & \text{if } \theta \text{ is } v_{i,f} \end{cases} \quad (4)$$

The sum $\sum_{j=1}^{n} v_{j,f} \, x_j$ is independent of $i$ and thus can be precomputed (e.g. when computing $\hat{y}(\mathbf{x})$). In general, each gradient can be computed in constant time $O(1)$, and all parameter updates for a case $(\mathbf{x}, y)$ can be done in $O(kn)$, or $O(k \, m(\mathbf{x}))$ under sparsity (a worked SGD sketch follows at the end of this section). We provide a generic implementation, LIBFM², that uses SGD and supports both element-wise and pairwise losses.

²http://www.libfm.org

D. d-way Factorization Machine

The 2-way FM described so far can easily be generalized to a d-way FM:

$$\hat{y}(\mathbf{x}) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{l=2}^{d} \sum_{i_1=1}^{n} \cdots \sum_{i_l = i_{l-1}+1}^{n} \left( \prod_{j=1}^{l} x_{i_j} \right) \left( \sum_{f=1}^{k_l} \prod_{j=1}^{l} v_{i_j, f}^{(l)} \right) \quad (5)$$

where the interaction parameters for the $l$-th interaction are factorized by the PARAFAC model [1] with the model parameters:

$$V^{(l)} \in \mathbb{R}^{n \times k_l}, \quad k_l \in \mathbb{N}_0^+ \quad (6)$$

The straightforward complexity for computing eq. (5) is $O(k_d \, n^d)$. But with the same arguments as in Lemma 3.1, one can show that it can be computed in linear time.

E. Summary

FMs model all possible interactions between values in the feature vector $\mathbf{x}$ using factorized interactions instead of fully parametrized ones. This has two main advantages:
1) The interactions between values can be estimated even under high sparsity. Especially, it is possible to generalize to unobserved interactions.
2) The number of parameters as well as the time for prediction and learning is linear. This makes direct optimization using SGD feasible and allows optimizing against a variety of loss functions.

In the remainder of this paper, we will show the relationships between factorization machines and support vector machines as well as matrix, tensor and specialized factorization models.
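The following is a minimal sketch of one SGD update for squared loss with L2 regularization, built from eq. (4). The learning rate and regularization strength are illustrative assumptions, and LIBFM's actual implementation may differ in details.

```python
import numpy as np

def sgd_step(x, target, w0, w, V, lr=0.01, reg=0.001):
    """One SGD update for squared loss with L2 regularization, using eq. (4).

    The per-factor sums s_f = sum_j v_{j,f} x_j are precomputed once and shared
    by the prediction and by every v_{i,f} gradient, so the step costs O(k n).
    """
    s = V.T @ x                                         # precomputed sums from eq. (4)
    y_hat = w0 + w @ x + 0.5 * float(np.sum(s ** 2 - (V ** 2).T @ (x ** 2)))
    g = 2.0 * (y_hat - target)                          # d(squared loss)/d(y_hat)
    w0 = w0 - lr * g                                    # d y_hat / d w0 = 1
    w = w - lr * (g * x + reg * w)                      # d y_hat / d w_i = x_i
    grad_V = np.outer(x, s) - V * (x ** 2)[:, None]     # x_i s_f - v_{i,f} x_i^2
    V = V - lr * (g * grad_V + reg * V)
    return w0, w, V
```

Looping this step over the rows of a design matrix like the one built in Section II gives a complete, if bare-bones, FM regression trainer.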
IV. FMS VS. SVMS

A. SVM model

The model equation of an SVM [6] can be expressed as the dot product between the transformed input $\mathbf{x}$ and model parameters $\mathbf{w}$: $\hat{y}(\mathbf{x}) = \langle \phi(\mathbf{x}), \mathbf{w} \rangle$, where $\phi$ is a mapping from the feature space $\mathbb{R}^n$ into a more complex space $\mathcal{F}$. The mapping $\phi$ is related to the kernel by:

$$K: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}, \quad K(\mathbf{x}, \mathbf{z}) = \langle \phi(\mathbf{x}), \phi(\mathbf{z}) \rangle$$

In the following, we discuss the relationships of FMs and SVMs by analyzing the primal form of the SVMs³.

³In practice, SVMs are solved in the dual form and the mapping $\phi$ is not performed explicitly. Nevertheless, the primal and dual have the same solution (optimum), so all our arguments about the primal also hold for the dual form.

1) Linear kernel: The simplest kernel is the linear kernel $K_l(\mathbf{x}, \mathbf{z}) := 1 + \langle \mathbf{x}, \mathbf{z} \rangle$, which corresponds to the mapping $\phi(\mathbf{x}) := (1, x_1, \ldots, x_n)$. Thus the model equation of a linear SVM can be rewritten as:

$$\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i, \quad w_0 \in \mathbb{R}, \; \mathbf{w} \in \mathbb{R}^n \quad (7)$$

It is obvious that a linear SVM (eq. (7)) is identical to a FM of degree $d = 1$ (eq. (5)).

2) Polynomial kernel: The polynomial kernel allows the SVM to model higher interactions between variables. It is defined as $K(\mathbf{x}, \mathbf{z}) := (\langle \mathbf{x}, \mathbf{z} \rangle + 1)^d$. E.g. for $d = 2$ this corresponds to the following mapping:

$$\phi(\mathbf{x}) := (1, \sqrt{2}\,x_1, \ldots, \sqrt{2}\,x_n, x_1^2, \ldots, x_n^2, \sqrt{2}\,x_1 x_2, \ldots, \sqrt{2}\,x_1 x_n, \sqrt{2}\,x_2 x_3, \ldots, \sqrt{2}\,x_{n-1} x_n) \quad (8)$$

And so, the model equation for polynomial SVMs can be rewritten as:

$$\hat{y}(\mathbf{x}) = w_0 + \sqrt{2} \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} w_{i,i}^{(2)} x_i^2 + \sqrt{2} \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_{i,j}^{(2)} x_i x_j \quad (9)$$

where the model parameters are $w_0 \in \mathbb{R}$, $\mathbf{w} \in \mathbb{R}^n$ and $W^{(2)} \in \mathbb{R}^{n \times n}$ (a symmetric matrix). Comparing a polynomial SVM (eq. (9)) to a FM (eq. (1)), one can see that both model all nested interactions up to degree $d = 2$. The main difference between SVMs and FMs is the parametrization: all interaction parameters $w_{i,j}$ of SVMs are completely independent, e.g. $w_{i,j}$ and $w_{i,l}$. In contrast to this, the interaction parameters of FMs are factorized and thus $\langle \mathbf{v}_i, \mathbf{v}_j \rangle$ and $\langle \mathbf{v}_i, \mathbf{v}_l \rangle$ depend on each other as they overlap and share parameters (here $\mathbf{v}_i$).

Fig. 2. Netflix: rating prediction error. FMs succeed in estimating 2-way variable interactions in very sparse problems where SVMs fail (see sections III-A3 and IV-B for details).

B. Parameter Estimation Under Sparsity

In the following, we will show why linear and polynomial SVMs fail for very sparse problems. We show this for the example of collaborative filtering with user and item indicator variables (see the first two groups (blue and red) in the example of figure 1). Here, the feature vectors are sparse and only two elements are non-zero (the active user $u$ and active item $i$).

1) Linear SVM: For this kind of data $\mathbf{x}$, the linear SVM model (eq. (7)) is equivalent to:

$$\hat{y}(\mathbf{x}) = w_0 + w_u + w_i \quad (10)$$

because $x_j = 1$ if and only if $j = u$ or $j = i$. This model corresponds to one of the most basic collaborative filtering models where only the user and item biases are captured. As this model is very simple, the parameters can be estimated well even under sparsity. However, the empirical prediction quality is typically low (see figure 2).

2) Polynomial SVM: With the polynomial kernel, the SVM can capture higher-order interactions (here between users and items). In our sparse case with $m(\mathbf{x}) = 2$, the model equation for SVMs is equivalent to:

$$\hat{y}(\mathbf{x}) = w_0 + \sqrt{2}\,(w_u + w_i) + w_{u,u}^{(2)} + w_{i,i}^{(2)} + \sqrt{2}\, w_{u,i}^{(2)}$$

First of all, $w_u$ and $w_{u,u}^{(2)}$ express the same thing, i.e. one can drop one of them (e.g. $w_{u,u}^{(2)}$). Now the model equation is the same as for the linear case but with an additional user-item interaction $w_{u,i}^{(2)}$. In typical collaborative filtering (CF) problems, for each interaction parameter $w_{u,i}^{(2)}$ there is at most one observation $(u, i)$ in the training data, and for cases $(u', i')$ in the test data there are usually no observations at all in the training data. For example, in figure 1 there is just one observation for the interaction (Alice, Titanic) and none for the interaction (Alice, Star Trek). That means the maximum margin solution for the interaction parameters $w_{u,i}^{(2)}$ for all test cases $(u, i)$ is 0 (e.g. $w_{A,ST}^{(2)} = 0$). And thus the polynomial SVM can make no use of any 2-way interaction for predicting test examples; so the polynomial SVM only relies on the user and item biases and cannot provide better estimations than a linear SVM.

For SVMs, estimating higher-order interactions is not only an issue in CF but in all scenarios where the data is hugely sparse. For a reliable estimate of the parameter $w_{i,j}^{(2)}$ of a pairwise interaction $(i,j)$, there must be 'enough' cases $\mathbf{x} \in D$ where $x_i \neq 0 \wedge x_j \neq 0$. As soon as either $x_i = 0$ or $x_j = 0$, the case $\mathbf{x}$ cannot be used for estimating the parameter $w_{i,j}^{(2)}$. To summarize, if the data is too sparse, i.e. there are too few or even no cases for $(i,j)$, SVMs are likely to fail.

C. Summary

1) The dense parametrization of SVMs requires direct observations for the interactions, which are often not given in sparse settings. Parameters of FMs can be estimated well even under sparsity (see section III-A3).
2) FMs can be directly learned in the primal. Non-linear SVMs are usually learned in the dual.
3) The model equation of FMs is independent of the training data. Prediction with SVMs depends on parts of the training data (the support vectors).

V. FMS VS. OTHER FACTORIZATION MODELS

There is a variety of factorization models, ranging from standard models for m-ary relations over categorical variables (e.g. MF, PARAFAC) to specialized models for specific data and tasks (e.g. SVD++, PITF, FPMC). Next, we show that FMs can mimic many of these models just by using the right input data (e.g. feature vector $\mathbf{x}$).

A. Matrix and Tensor Factorization

Matrix factorization (MF) is one of the most studied factorization models (e.g. [7], [8], [2]). It factorizes a relationship between two categorical variables (e.g. $U$ and $I$). The standard approach to deal with categorical variables is to define binary indicator variables for each level of $U$ and $I$ (e.g. see fig. 1, first (blue) and second (red) group)⁴:

$$n := |U \cup I|, \quad x_j := \delta(j = i \vee j = u) \quad (11)$$

⁴To shorten notation, we address elements in $\mathbf{x}$ (e.g. $x_j$) and the parameters both by numbers (e.g. $j \in \{1, \ldots, n\}$) and categorical levels (e.g. $j \in (U \cup I)$). That means we implicitly assume a bijective mapping from numbers to categorical levels.

A FM using this feature vector $\mathbf{x}$ is identical to the matrix factorization model [2] because $x_j$ is only non-zero for $u$ and $i$, so all other biases and interactions drop:

$$\hat{y}(\mathbf{x}) = w_0 + w_u + w_i + \langle \mathbf{v}_u, \mathbf{v}_i \rangle \quad (12)$$

With the same argument, one can see that for problems with more than two categorical variables, FMs include a nested parallel factor analysis model (PARAFAC) [1].
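To illustrate the claim above, the following hedged sketch encodes a (user, item) pair per eq. (11) and feeds it to `fm_predict_fast` from the Lemma 3.1 sketch; with only two non-zero indicators, the FM reduces exactly to biased MF, eq. (12). Sizes and parameter values are placeholders.

```python
import numpy as np

n_users, n_items, k = 3, 4, 2
n = n_users + n_items                        # eq. (11): one indicator per level of U and I
rng = np.random.default_rng(1)
w0 = 0.0
w = rng.normal(size=n)
V = rng.normal(scale=0.1, size=(n, k))

def mf_as_fm(u, i):
    """FM on the indicator encoding: only x_u and x_i are non-zero."""
    x = np.zeros(n)
    x[u] = 1.0                               # active user indicator
    x[n_users + i] = 1.0                     # active item indicator
    return fm_predict_fast(x, w0, w, V)      # predictor from the Lemma 3.1 sketch

# All other biases and interactions drop, leaving exactly eq. (12):
u, i = 0, 2
assert np.isclose(mf_as_fm(u, i),
                  w0 + w[u] + w[n_users + i] + V[u] @ V[n_users + i])
```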
B. SVD++

For the task of rating prediction (i.e. regression), Koren improves the matrix factorization model to the SVD++ model [2]. A FM can mimic this model by using the following input data $\mathbf{x}$ (like in the first three groups of figure 1):

$$n := |U \cup I \cup L|, \quad x_j := \begin{cases} 1, & \text{if } j = i \vee j = u \\ \frac{1}{\sqrt{|N_u|}}, & \text{if } j \in N_u \\ 0, & \text{else} \end{cases}$$

where $N_u$ is the set of all movies the user has ever rated⁵. A FM ($d = 2$) behaves as follows on this data:

$$\hat{y}(\mathbf{x}) = \underbrace{w_0 + w_u + w_i + \langle \mathbf{v}_u, \mathbf{v}_i \rangle + \frac{1}{\sqrt{|N_u|}} \sum_{l \in N_u} \langle \mathbf{v}_i, \mathbf{v}_l \rangle}_{\text{SVD++}} + \frac{1}{\sqrt{|N_u|}} \sum_{l \in N_u} \left( w_l + \langle \mathbf{v}_u, \mathbf{v}_l \rangle + \frac{1}{\sqrt{|N_u|}} \sum_{l' \in N_u, \, l' > l} \langle \mathbf{v}_l, \mathbf{v}_{l'} \rangle \right)$$

where the first part is exactly the same as the SVD++ model. But the FM also contains some additional interactions between users and the movies in $N_u$, as well as basic effects for the movies in $N_u$ and interactions between pairs of movies in $N_u$.

⁵To distinguish elements in $N_u$ from elements in $I$, they are transformed with any bijective function $\omega: I \to L$ into a space $L$ with $L \cap I = \emptyset$.
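The SVD++ encoding differs from eq. (11) only in the extra implicit block. A small sketch of that feature map (the function name is hypothetical, and the bijection $\omega$ into $L$ is realized here simply as an index shift by the number of items):

```python
import numpy as np

def svdpp_features(u, i, N_u, n_users, n_items):
    """Feature map of Section V-B: user block | item block | rated-set block in L."""
    x = np.zeros(n_users + 2 * n_items)      # L realized as a shifted copy of I
    x[u] = 1.0                               # active user
    x[n_users + i] = 1.0                     # active item
    for l in N_u:                            # implicit feedback, scaled by 1/sqrt(|N_u|)
        x[n_users + n_items + l] = 1.0 / np.sqrt(len(N_u))
    return x
```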
C. PITF for Tag Recommendation

The problem of tag prediction is defined as ranking tags for a given user and item combination. That means there are three categorical domains involved: users $U$, items $I$ and tags $T$. In the ECML/PKDD Discovery Challenge about tag recommendation, a model based on factorizing pairwise interactions (PITF) achieved the best score [3]. We will show how a FM can mimic this model. A factorization machine with binary indicator variables for the active user $u$, item $i$ and tag $t$ results in the following model:

$$n := |U \cup I \cup T|, \quad x_j := \delta(j = i \vee j = u \vee j = t) \quad (13)$$
$$\Rightarrow \hat{y}(\mathbf{x}) = w_0 + w_u + w_i + w_t + \langle \mathbf{v}_u, \mathbf{v}_i \rangle + \langle \mathbf{v}_u, \mathbf{v}_t \rangle + \langle \mathbf{v}_i, \mathbf{v}_t \rangle$$

As this model is used for ranking between two tags $t_A$, $t_B$ within the same user/item combination $(u, i)$ [3], both the optimization and the prediction always work on differences between scores for the cases $(u, i, t_A)$ and $(u, i, t_B)$. Thus, with optimization for pairwise ranking (like in [5], [3]), the FM model is equivalent to:

$$\hat{y}(\mathbf{x}) := w_t + \langle \mathbf{v}_u, \mathbf{v}_t \rangle + \langle \mathbf{v}_i, \mathbf{v}_t \rangle \quad (14)$$

Now the original PITF model [3] and the FM model with binary indicators (eq. (14)) are almost identical. The only differences are that (i) the FM model has a bias term $w_t$ for $t$ and (ii) the factorization parameters for the tags ($\mathbf{v}_t$) between the $(u,t)$- and $(i,t)$-interaction are shared for the FM model but individual for the original PITF model. Besides this theoretical analysis, figure 3 shows empirically that both models also achieve comparable prediction quality for this task.

Fig. 3. ECML Discovery Challenge 2009, Task 2. Recommendation quality of a FM compared to the winning PITF model [3] of the ECML/PKDD Discovery Challenge 2009. The quality is plotted against the number of model parameters.

D. Factorized Personalized Markov Chains (FPMC)

The FPMC model [4] tries to rank products in an online shop based on the last purchases (at time $t - 1$) of the user $u$. Again, just by feature generation a factorization machine ($d = 2$) behaves similarly:

$$n := |U \cup I \cup L|, \quad x_j := \begin{cases} 1, & \text{if } j = i \vee j = u \\ \frac{1}{|B_{t-1}^u|}, & \text{if } j \in B_{t-1}^u \\ 0, & \text{else} \end{cases} \quad (15)$$

where $B_t^u \subseteq L$ is the set ('basket') of all items a user $u$ has purchased at time $t$ (for details see [4]). Then:

$$\hat{y}(\mathbf{x}) = w_0 + w_u + w_i + \langle \mathbf{v}_u, \mathbf{v}_i \rangle + \frac{1}{|B_{t-1}^u|} \sum_{l \in B_{t-1}^u} \langle \mathbf{v}_i, \mathbf{v}_l \rangle + \frac{1}{|B_{t-1}^u|} \sum_{l \in B_{t-1}^u} \left( w_l + \langle \mathbf{v}_u, \mathbf{v}_l \rangle + \frac{1}{|B_{t-1}^u|} \sum_{l' \in B_{t-1}^u, \, l' > l} \langle \mathbf{v}_l, \mathbf{v}_{l'} \rangle \right)$$

Like for tag recommendation, this model is used and optimized for ranking (here ranking items $i$) and thus only score differences between $(u, i_A, t)$ and $(u, i_B, t)$ are used in the prediction and optimization criterion [4]. Thus, all additive terms that do not depend on $i$ vanish and the FM model equation is equivalent to:

$$\hat{y}(\mathbf{x}) = w_i + \langle \mathbf{v}_u, \mathbf{v}_i \rangle + \frac{1}{|B_{t-1}^u|} \sum_{l \in B_{t-1}^u} \langle \mathbf{v}_i, \mathbf{v}_l \rangle \quad (16)$$

Now one can see that the original FPMC model [4] and the FM model are almost identical and differ only in the additional item bias $w_i$ and the sharing of factorization parameters of the FM model for the items in both the $(u,i)$- and $(i,l)$-interaction.

E. Summary

1) Standard factorization models like PARAFAC or MF are not general prediction models like factorization machines. Instead, they require that the feature vector is partitioned in $m$ parts and that in each part exactly one element is 1 and the rest 0.
2) There are many proposals for specialized factorization models designed for a single task. We have shown that factorization machines can mimic many of the most successful factorization models (including MF, PARAFAC, SVD++, PITF, FPMC) just by feature extraction, which makes FMs easily applicable in practice.

VI. CONCLUSION AND FUTURE WORK

In this paper, we have introduced factorization machines. FMs bring together the generality of SVMs with the benefits of factorization models. In contrast to SVMs, (1) FMs are able to estimate parameters under huge sparsity, (2) the model equation is linear and depends only on the model parameters and thus (3) they can be optimized directly in the primal. The expressiveness of FMs is comparable to the one of polynomial SVMs. In contrast to tensor factorization models like PARAFAC, FMs are a general predictor that can handle any real-valued vector. Moreover, simply by using the right indicators in the input feature vector, FMs are identical or very similar to many of the specialized state-of-the-art models that are applicable only for a specific task, among them biased MF, SVD++, PITF and FPMC.

REFERENCES

[1] R. A. Harshman, "Foundations of the PARAFAC procedure: models and conditions for an 'exploratory' multimodal factor analysis," UCLA Working Papers in Phonetics, pp. 1-84, 1970.
[2] Y. Koren, "Factorization meets the neighborhood: a multifaceted collaborative filtering model," in KDD '08: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 2008, pp. 426-434.
[3] S. Rendle and L. Schmidt-Thieme, "Pairwise interaction tensor factorization for personalized tag recommendation," in WSDM '10: Proceedings of the Third ACM International Conference on Web Search and Data Mining. New York, NY, USA: ACM, 2010, pp. 81-90.
[4] S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme, "Factorizing personalized Markov chains for next-basket recommendation," in WWW '10: Proceedings of the 19th International Conference on World Wide Web. New York, NY, USA: ACM, 2010, pp. 811-820.
[5] T. Joachims, "Optimizing search engines using clickthrough data," in KDD '02: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 2002, pp. 133-142.
[6] V. N. Vapnik, The Nature of Statistical Learning Theory. New York, NY, USA: Springer-Verlag, 1995.
[7] N. Srebro, J. D. M. Rennie, and T. S. Jaakola, "Maximum-margin matrix factorization," in Advances in Neural Information Processing Systems 17. MIT Press, 2005, pp. 1329-1336.
[8] R. Salakhutdinov and A. Mnih, "Bayesian probabilistic matrix factorization using Markov chain Monte Carlo," in Proceedings of the International Conference on Machine Learning, vol. 25, 2008.